Thursday, 20 February 2014

Grouping and summing data using d3.nest

The following post is a portion of the D3 Tips and Tricks book, which is free to download. To use this post in context, consider it with the others in the blog or just download the book as a pdf, epub or mobi.
----------------------------------------------------------

Grouping and summing data (d3.nest).

Often we will wish to group elements in an array into a hierarchical structure similar to the GROUP BY operator in SQL (but with the scope for multiple levels). This can be achieved using the d3.nest operator. Additionally we will sometimes wish to collapse the elements that we are grouping in a specific way (for instance to sum values). This can be achieved using the rollup function.
The example we will use is the following csv file, consisting of a column of dates and a column of corresponding values;
date,value
2011-03-23,3
2011-03-23,2
2011-03-24,3
2011-03-24,3
2011-03-24,6
2011-03-24,2
2011-03-24,7
2011-03-25,4
2011-03-25,5
2011-03-25,1
2011-03-25,4
We will nest the data according to the date and sum the data for each date so that our data is in the equivalent form of;
key,values
2011-03-23,5
2011-03-24,21
2011-03-25,14
We will do this with the following script;
d3.csv("source-data.csv", function(error, csv_data) {
 var data = d3.nest()
  .key(function(d) { return d.date;})
  .rollup(function(d) { 
   return d3.sum(d, function(g) {return g.value; });
  }).entries(csv_data);
...
});
We are assuming the data is in the form of our initial csv file and is named source-data.csv.
The first thing we do is load that file and assign the loaded array the variable name csv_data.
d3.csv("source-data.csv", function(error, csv_data) {
Then we declare our new array’s name will be data and we initiate the nest function;
 var data = d3.nest()
We assign the key for our new array as date. A ‘key’ is a way of saying “This is the thing we will be grouping on”. In other words, our resultant array will have a single entry for each unique date value.
  .key(function(d) { return d.date;})
Then we include the rollup function, which takes all the individual value variables that belong to each unique date field and sums them;
  .rollup(function(d) { 
   return d3.sum(d, function(g) {return g.value; });
Lastly we tell the entire nest function which data array we will be using for our source of data.
  }).entries(csv_data);
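To make the result of the nest and rollup steps concrete, here is a minimal plain-JavaScript sketch of the same grouping and summing, written without D3 so that it can be run on its own. The helper name groupAndSum and the inline rows array are assumptions made for illustration (the array stands in for the loaded csv_data). This is not how d3.nest works internally; it only imitates the shape of the output described in this post.

```javascript
// Rows as d3.csv would deliver them: every field arrives as a string.
var rows = [
  { date: "2011-03-23", value: "3" },
  { date: "2011-03-23", value: "2" },
  { date: "2011-03-24", value: "3" },
  { date: "2011-03-24", value: "3" },
  { date: "2011-03-24", value: "6" },
  { date: "2011-03-24", value: "2" },
  { date: "2011-03-24", value: "7" },
  { date: "2011-03-25", value: "4" },
  { date: "2011-03-25", value: "5" },
  { date: "2011-03-25", value: "1" },
  { date: "2011-03-25", value: "4" }
];

// Hypothetical helper (not part of D3): group rows by date and sum each group.
function groupAndSum(rows) {
  var totals = {}; // date -> running total
  var order = [];  // dates in order of first appearance
  rows.forEach(function(d) {
    if (!Object.prototype.hasOwnProperty.call(totals, d.date)) {
      totals[d.date] = 0;
      order.push(d.date);
    }
    totals[d.date] += +d.value; // unary + converts the string to a number
  });
  // Mirror the { key, values } entries described in this post.
  return order.map(function(date) {
    return { key: date, values: totals[date] };
  });
}

var data = groupAndSum(rows);
console.log(JSON.stringify(data));
// [{"key":"2011-03-23","values":5},{"key":"2011-03-24","values":21},{"key":"2011-03-25","values":14}]
```

Note that each entry carries the fields key and values. In D3 version 4 and later, the rolled-up result of a nest is stored under value rather than values, so the field name depends on the version of D3 in use.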
You should note that the fields in our data will have changed name from date and value to key and values. This is a result of the nest and rollup process. But never fear, it’s a simple task to rename them if necessary using the following function (which could include a call to parse the date, but I have omitted it for clarity);
data.forEach(function(d) {
 d.date = d.key;
 d.value = d.values;
});
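As a rough illustration of the date parsing that was left out above, the renaming step could look something like the sketch below. The parseDate helper is hypothetical and uses the built-in JavaScript Date object rather than any D3 time-formatting function, and the data array is typed in by hand to match the nested result from this post, so treat it as one possible approach under those assumptions.

```javascript
// The nested result from this post, written out by hand for the example.
var data = [
  { key: "2011-03-23", values: 5 },
  { key: "2011-03-24", values: 21 },
  { key: "2011-03-25", values: 14 }
];

// Hypothetical helper: turn a "YYYY-MM-DD" string into a local-time Date.
function parseDate(text) {
  var parts = text.split("-");
  // Months are zero-based in the Date constructor, hence the "- 1".
  return new Date(+parts[0], +parts[1] - 1, +parts[2]);
}

// Rename key/values back to date/value, parsing the date on the way.
data.forEach(function(d) {
  d.date = parseDate(d.key);
  d.value = d.values;
});

console.log(data[0].date.getFullYear(), data[0].value);
```

The original key and values fields are left in place, exactly as in the renaming function above, so code that relies on either set of names will still work.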

The description above (and heaps of other stuff) is in the D3 Tips and Tricks book that can be downloaded for free (or donate if you really want to :-)).

11 comments:

  1. Hi. I have a following json :
    {
    "ReturnCode":0,
    "ReturnMessage":"Success",
    "List":[
    {
    "Client":"Ad",
    "Department":"DP",
    "ProjectId":"12355",
    "ProjectName":"4940"
    },
    {
    "Client":"Ad",
    "Department":"SP",
    "ProjectId":"12355",
    "ProjectName":"4940"
    },
    {
    "Client":"Ad",
    "Department":"Co",
    "ProjectId":"12355",
    "ProjectName":"asdf"
    },
    {
    "Client":"Ad",
    "Department":"Co",
    "ProjectId":"212355",
    "ProjectName":"45ed"
    },
    {
    "Client":"Ad",
    "Department":"Co",
    "ProjectId":"212355",
    "ProjectName":"45ed "
    },
    {
    "Client":"we",
    "Department":" SP ",
    "ProjectId":"123455",
    "ProjectName":"asdf"
    },
    {
    "Client":"we",
    "Department":"Co",
    "ProjectId":"123455",
    "ProjectName":"asdf"
    },
    {
    "Client":"oc",
    "Department":"Co",
    "ProjectId":"24355",
    "ProjectName":"qwe"
    }]
    }
    Here I just need to count the number of projects to each client like below using d3.nest
    [{Key:"Ad",value:2}, {Key:"we",value:1}, {Key:"oc",value:1}]
    Any suggestion ?

    1. The best suggestion I can make is for you to solve the problem yourself. That way you will gain a better understanding of the process and be able to repeat it in the future. Having said that, it is definitely something that you will need to concentrate on for a while to get right, so I would recommend checking out the 'Mister Nester' page (http://bl.ocks.org/shancarter/raw/4748131/) which is excellent for illustrating the differences in the techniques. Good luck

  2. Thank you for the illustrative example. In the rollup part I think you have g and d reversed. The rollup parameter should be a function over groups of the data, so that sum is called on an array g, and the individual data items are added up, d.value. So we have

    .rollup(function(g) {
    return d3.sum(g, function(d) {return d.value; });

    Programmatically they are the same, but I think it makes more logical sense this way.

    By the way, good call making the other commenter figure out his problem by himself.

    1. Great question. This had me thinking for quite a bit. And in fact I had a really interesting answer all lined up before I really understood what your question was stating. You are right. I believe that my code could be misconstrued. Your example is better and more logical. In fact I should take one more step and change the 'd' in `function(d) {return d.value; }` to something completely different like 'v'.

  3. Hey Guys...
    Sorry for the late response. I sorted this out a long time ago with some suggestions from http://stackoverflow.com/questions/32996575/counting-distinct-values-from-json-using-d3-nest/32997817#32997817

    soln:
    d3.json("json/data.json", function(data) {
    console.log(data);

    var nested_data = d3.nest()
    .key(function(d) { return d.Client; })
    .key(function(d) { return d.ProjectId; })
    .rollup(function(leaves) { return leaves.length; })
    .entries(data.List);

    for (var item in nested_data) { console.log(nested_data[item].key+'--'+ Object.keys(nested_data[item].values).length); }
    })

  4. Oh, great, thank you for such a wonderful solution, it is very useful, thank you!

  5. Can we use calculations in a D3.js chart? Is it possible to have a single label with multiple values and sum the values to generate a chart in d3.js? If so, please give a solution.

    1. Sorry, I don't think that this is possible.

  6. var x = "Type";
    var y = "Fat";
    var svg = d3v4.select("svg"),
    width = +svg.attr("width"),
    height = +svg.attr("height"),
    radius = Math.min(width, height) / 2,
    g = svg.append("g").attr("transform", "translate(" + width / 2 + "," + height / 2 + ")");

    var color = d3v4.scaleOrdinal(["#98abc5", "#8a89a6", "#7b6888", "#6b486b", "#a05d56", "#d0743c", "#ff8c00"]);

    var pie = d3v4.pie()
    .sort(null)
    .value(function(d) { return d[y]; });

    var path = d3v4.arc()
    .outerRadius(radius - 10)
    .innerRadius(0);

    var label = d3v4.arc()
    .outerRadius(radius - 40)
    .innerRadius(radius - 40);

    // d3v4.csv("cereal.csv", function(csv_data) {
    //d.population = +d.population;
    //return d;
    //},

    d3v4.csv("cereal.csv", function(csv_data) {
    var data = d3v4.nest()
    .key(function(d) { return d[x];})
    .rollup(function(d) {
    return d3v4.sum(d, function(g) {return g[y]; });
    }).entries(csv_data);

    data.forEach(function(d) {
    d[x] = d.key;
    d[y] = d.values;
    });

    }, function(error,data) {
    if (error) throw error;

    var arc = g.selectAll(".arc")
    .data(pie(data))
    .enter().append("g")
    .attr("class", "arc");

    arc.append("path")
    .attr("d", path)
    .attr("fill", function(d) { return color(d.data[x]); });

    arc.append("text")
    .attr("transform", function(d) { return "translate(" + label.centroid(d) + ")"; })
    .attr("dy", "0.35em")
    .text(function(d) { return d.data[x]; });
    });

    this is the script of pie chart

    Please help with this. I've done exactly as you've said but I'm not getting any output.

    1. This comment has been removed by the author.

    2. From a quick look at the code and the example from above there are quite a few differences. Please try not to paste long lists of code in the comments here, the formatting gets pretty messed up and it's a difficult way to try to solve software problems. What I recommend for the future is to use stackoverflow. Not only is the interface really good for helping with code, but there are a lot of other people reading it (rather than just this simple blog :-)).
      In this particular case, the starting point I would advise is that you are currently using the operator 'd3v4' (instead of 'd3'). Unless you've declared that somewhere else in your code (which I can't see), this will be causing you problems.
      What you have is a pretty complex piece of code (compared to the nest example for this blog post). What I recommend is that you try to simplify your code as much as possible (even remove the pie display) so that you can test the different components. This is a good technique for trying to eliminate problems.
      Good luck.
