Saturday, 22 March 2014

Visualize Linux Processes via Interactive Tree diagram with d3.js

The following post is a portion of the D3 Tips and Tricks book which is free to download. To use this post in context, consider it with the others in the blog or just download the the book as a pdf / epub or mobi .
----------------------------------------------------------

Linux Processes via Interactive Tree diagram

Purpose

This page was developed to play with the idea of visualizing the relationship between processes running on a Linux server. To my shame, I never twigged that since the processes were ordered in a hierarchy they would therefore make an excellent tree diagram. I am therefore indebted to a friend for pointing out the obvious (as he often needs to do :-)).
Ultimately I have grand visions of this type of display being used to illustrate excessive memory or CPU usage when fault conditions occur, but for the purposes of simply showing the relationships, this example is suitable.
There are obviously a lot of processes running on a Linux server, so it was necessary to make the diagram interactive to allow branches to collapse where required for clarity. Indeed, there are a few of the tree diagram features which are covered separately in the chapter on tree diagrams which are combined here (interactive nodes, loading from an external source and making the tree interactive). Additionally, there is a great deal of data about each node that is available when running the ps command. I have chosen to show some of these in a tool tip that will appear when hovered over a node.
The tree is fairly large, so the following is a section of the tree with tool tip in action
Linux processes in tree form

The Code

The following is the full code for the example. A live version is available online at bl.ocks.org or GitHub. It is also available as the files ‘process-tree.html’ and ‘ps.csv’ as separate downloads with the book D3 Tips and Tricks. A a copy of most the files that appear in the book can be downloaded (in a zip file) when you download the book from Leanpub.
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">

    <title>Linux Process Tree</title>

    <style>

  div.tooltip {
    position: absolute;
    text-align: left;
    width: 180px;
    height: 80px;
    padding: 2px;
    font: 12px sans-serif;
    background: lightsteelblue;
    border: 0px;
    border-radius: 8px;
    pointer-events: none;
  }
  
  .node circle {
    fill: #fff;
    stroke: steelblue;
    stroke-width: 3px;
  }

  .node text { font: 12px sans-serif; }

  .link {
    fill: none;
    stroke: #ccc;
    stroke-width: 2px;
  }
  
    </style>

  </head>

  <body>

<!-- load the d3.js library -->  
<script src="http://d3js.org/d3.v3.min.js"></script>
  
<script>

// ************** Generate the tree diagram   *****************
var margin = {top: 20, right: 120, bottom: 20, left: 120},
  width = 1200 - margin.right - margin.left,
  height = 900 - margin.top - margin.bottom;
  
var i = 0;
  duration = 750;

var tree = d3.layout.tree()
  .size([height, width]);

var diagonal = d3.svg.diagonal()
  .projection(function(d) { return [d.y, d.x]; });

var svg = d3.select("body").append("svg")
  .attr("width", width + margin.right + margin.left)
  .attr("height", height + margin.top + margin.bottom)
  .append("g")
  .attr("transform", "translate(" + margin.left + "," + 
                                    margin.top + ")");

// load the external data
d3.csv("ps.csv", function(error, data) {

  // *********** Convert flat data into a nice tree ***************
  // create a name: node map
  var dataMap = data.reduce(function(map, node) {
    map[node.name] = node;
    return map;
  }, {});

  // create the tree array
  var treeData = [];
  data.forEach(function(node) {
    // add to parent
    var parent = dataMap[node.parent];
    if (parent) {
      // create child array if it doesn't exist
      (parent.children || (parent.children = []))
        // add node to child array
        .push(node);
    } else {
      // parent is null or missing
      treeData.push(node);
    }
  });

  root = treeData[0];
  root.x0 = height / 2;
  root.y0 = 0;
  
  update(root);
});

d3.select(self.frameElement).style("height", "500px");

function update(source) {

  // Compute the new tree layout.
  var nodes = tree.nodes(root).reverse(),
    links = tree.links(nodes);

  // Normalize for fixed-depth.
  nodes.forEach(function(d) { d.y = d.depth * 180; });

  // Update the nodes…
  var node = svg.selectAll("g.node")
    .data(nodes, function(d) { return d.id || (d.id = ++i); });

  // Enter any new nodes at the parent's previous position.
  var nodeEnter = node.enter().append("g")
    .attr("class", "node")
    .attr("transform", function(d) {
      return "translate(" + source.y0 + "," + source.x0 + ")"; })
    .on("click", click)
      // add tool tip for ps -eo pid,ppid,pcpu,size,comm,ruser,s
      .on("mouseover", function(d) {
        div.transition()
          .duration(200)
          .style("opacity", .9);
        div .html(
            "PID: " + d.name + "<br/>" + 
            "Command: " + d.COMMAND + "<br/>" +
            "User: " + d.RUSER + "<br/>" +
            "%CPU: " + d.CPU + "<br/>" +
            "Memory: " + d.SIZE
            )
          .style("left", (d3.event.pageX) + "px")
          .style("top", (d3.event.pageY - 28) + "px");
        })
      .on("mouseout", function(d) {
        div.transition()
          .duration(500)
          .style("opacity", 0);
        });

  nodeEnter.append("circle")
    .attr("r", 1e-6)
    .style("fill", function(d) {
      return d._children ? "lightsteelblue" : "#fff"; });

  nodeEnter.append("text")
    .attr("x", function(d) {
      return d.children || d._children ? -13 : 13; })
    .attr("dy", ".35em")
    .attr("text-anchor", function(d) { 
      return d.children || d._children ? "end" : "start"; })
    .text(function(d) { return d.COMMAND; })
    .style("fill-opacity", 1e-6);

// add the tool tip
  var div = d3.select("body").append("div")
    .attr("class", "tooltip")
    .style("opacity", 0);

  // Transition nodes to their new position.
  var nodeUpdate = node.transition()
    .duration(duration)
    .attr("transform", function(d) { 
        return "translate(" + d.y + "," + d.x + ")";
      });

  nodeUpdate.select("circle")
    .attr("r", 10)
    .style("fill", function(d) { 
	  return d._children ? "lightsteelblue" : "#fff"; });

  nodeUpdate.select("text")
    .style("fill-opacity", 1);

  // Transition exiting nodes to the parent's new position.
  var nodeExit = node.exit().transition()
    .duration(duration)
    .attr("transform", function(d) { return "translate(" + source.y + 
                                             "," + source.x + ")"; })
    .remove();

  nodeExit.select("circle")
    .attr("r", 1e-6);

  nodeExit.select("text")
    .style("fill-opacity", 1e-6);

  // Update the links…
  var link = svg.selectAll("path.link")
    .data(links, function(d) { return d.target.id; });

  // Enter any new links at the parent's previous position.
  link.enter().insert("path", "g")
    .attr("class", "link")
    .attr("d", function(d) {
      var o = {x: source.x0, y: source.y0};
      return diagonal({source: o, target: o});
    });

  // Transition links to their new position.
  link.transition()
    .duration(duration)
    .attr("d", diagonal);

  // Transition exiting nodes to the parent's new position.
  link.exit().transition()
    .duration(duration)
    .attr("d", function(d) {
      var o = {x: source.x, y: source.y};
      return diagonal({source: o, target: o});
    })
    .remove();

  // Stash the old positions for transition.
  nodes.forEach(function(d) {
  d.x0 = d.x;
  d.y0 = d.y;
  });
}

// Toggle children on click.
function click(d) {
  if (d.children) {
    d._children = d.children;
    d.children = null;
  } else {
    d.children = d._children;
    d._children = null;
  }
  update(d);
}

</script>
  
  </body>
</html>

Description

I will describe both the code for the example and the csv file that accompanies it since (in this case) the data that generates the tree is not gathered entirely automatically and has some manual intervention applied to make it suitable for purpose.
The csv file (ps.csv) was generated by running the command…
ps -eo pid,ppid,pcpu,size,comm,ruser,s
…and converting the resultant output to a csv file.
name,parent,CPU,SIZE,COMMAND,RUSER,S
0, ,0,0,start,nul,u
1,0,0.0,1140,init,root,S
2,0,0.0,0,kthreadd,root,S
3,2,0.0,0,ksoftirqd/0,root,S
5,2,0.0,0,kworker/0:0H,root,S
6,2,0.0,0,kworker/u2:0,root,S
7,2,0.0,0,migration/0,root,S
8,2,0.0,0,rcu_bh,root,S
9,2,0.0,0,rcuob/0,root,S
10,2,0.0,0,rcu_sched,root,S
...
The column names I have specifically asked for with the command are;
  • pid: Process ID - The unique numeric identifier assigned to the process
  • ppid: Parent Process ID - Indicates the decimal value of the parent process ID
  • pcpu: Percentage of CPU - Time used (total CPU time divided by length of time the process has been running)
  • size: Size - Memory size in kilobytes
  • comm: Command -Indicates the short name of the command being executed
  • ruser: Real User ID - The textual user ID
  • s: State - Process state with possible values:
    • R Running
    • S Sleeping (may be interrupted)
    • D Sleeping (may not be interrupted) used to indicate process is handling input/output
    • T Stopped or being traced
    • Z Zombie or “hung” process
I manually added the ‘start’ line (0, ,0,0,start,nul,u) to include the root node and removed the ‘%’ sign from the ‘%CPU’ label (which is produces when running ‘ps’) to reduce chance of errors.
In theory the process of formatting the data file could be automated (indeed, there may be a much better way to gather and include it!).
The code is essentially an amalgam of four components which have been covered separately in earlier sections of the book;
  1. Tree code where the data is loaded from an external source
  2. Tree code where the data is converted into a hierarchy from a flat file
  3. Tree code to allow the diagram to collapse and expand.
  4. Tool tips as an HTML object.
The loading of the data from an external source occurs in this portion of the code;
// load the external data
d3.csv("ps.csv", function(error, data) {

  // *********** Convert flat data into a nice tree ***************
  // create a name: node map
  var dataMap = data.reduce(function(map, node) {
    map[node.name] = node;
    return map;
  }, {});

  // create the tree array
  var treeData = [];
  data.forEach(function(node) {
    // add to parent
    var parent = dataMap[node.parent];
    if (parent) {
      // create child array if it doesn't exist
      (parent.children || (parent.children = []))
        // add node to child array
        .push(node);
    } else {
      // parent is null or missing
      treeData.push(node);
    }
  });

  root = treeData[0];
  root.x0 = height / 2;
  root.y0 = 0;
  
  update(root);
});
In reality the loading process is just the wrapping part of that code segment as the inner portion is the section that takes the flat data and creates the hierarchical treeData.
The function update is the main section that allows the tree to expand and collapse (along with the clickfunction). I won’t repeat that code here as it is quite lengthy and probably unnecessary (as it appears only a few pages previously). The only difference between the update function here and the one that is used in the example in the tree chapter is that we include a portion of code that allows us to include some of the details of each process in a tool tip;
  // Enter any new nodes at the parent's previous position.
  var nodeEnter = node.enter().append("g")
    .attr("class", "node")
    .attr("transform", function(d) {
      return "translate(" + source.y0 + "," + source.x0 + ")"; })
    .on("click", click)
      // add tool tip for ps -eo pid,ppid,pcpu,size,comm,ruser,s
      .on("mouseover", function(d) {
        div.transition()
          .duration(200)
          .style("opacity", .9);
        div.html(
            "PID: " + d.name + "<br/>" + 
            "Command: " + d.COMMAND + "<br/>" +
            "User: " + d.RUSER + "<br/>" +
            "%CPU: " + d.CPU + "<br/>" +
            "Memory: " + d.SIZE
            )
          .style("left", (d3.event.pageX) + "px")
          .style("top", (d3.event.pageY - 28) + "px");
        })
      .on("mouseout", function(d) {
        div.transition()
          .duration(500)
          .style("opacity", 0);
        });
We use the mouseover and mouseout calls to find out when the cursor is over a portion of a node (and this includes the text part as well as the circle) and print out the name of the process (which is the Process ID (PID)), the command name (a nice textual equivalent of the PID) the user that the process is run by along with the CPU use and memory used.
The tree is pretty large, so depending on your use it might want to expand or perhaps contract. It could be drawn radially perhaps and there is certainly scope for encoding information about memory and CPU usage in the associated colouring of the nodes / links.
I would recommend visiting the demo page on bl.ocks.org to get a good look at the result, as no picture in a book will be able to capture it sufficiently :-)
Linux processes in tree form


The description above (and heaps of other stuff) is in the D3 Tips and Tricks book that can be downloaded for free (or donate if you really want to :-)).

2 comments:

  1. Hello, pretty cool.

    I assume you can tweak the code to have colors and sizes relying on information from the csv, rite?

    Say, size of the circle on how much memory is using, and a color scale to CPU usage..

    in that way you can spot easily memory/cpu hogs..

    Alvaro.

    ReplyDelete
    Replies
    1. Oh yes :-). If you check out the post on building a tree diagram here (http://www.d3noob.org/2014/01/tree-diagrams-in-d3js_11.html) you can see how to do it.

      Delete