Friday, 25 April 2014

Adding in zero values into a time series in d3.js

The following post is a portion of the D3 Tips and Tricks book, which is free to download. To use this post in context, consider it with the others in the blog or just download the book as a pdf, epub or mobi.
----------------------------------------------------------

Padding for zero values

The ‘Padding for zero values’ question is one that I posted on Stack Overflow in April 2014 and which was elegantly answered by the user 'explunit'.
The premise is that I want to be able to graph points in a time series that has regular intervals but where some of those intervals are missing. For example a line graph of the following data (notice that August 2013 is missing)…
date,value
2013-01,53
2013-02,165
2013-03,269
2013-04,344
2013-05,376
2013-06,410
2013-07,421
2013-09,376
2013-10,359
2013-11,392
2013-12,433
2014-01,455
2014-02,478
… would produce a graph that looked like the following
Interpolated time series
But if the reason for the missing August month was that the value would have been zero, then the graph should actually look like this;
What actually happened
Now, this may sound like a contrived example, but I have come across it when querying a MySQL database and counting (COUNT(*)) the number of instances of an event in a time series (the number of sales of an individual item on a monthly basis, for instance). The end result is a list of months with values for those months where sales occurred, but for months where no sales occurred there is no result at all. This can be overcome by some serious MySQL-fu or PHP trickery, but every solution I implemented had a ‘cracking a walnut with a sledgehammer’ feel about it.
Now I know that shifting the resolution of this problem from the server to the client might not sit well for everyone, but that doesn’t mean it’s not a suitable solution.
The solution that was provided by the user ‘explunit’ has a nice natural feel to it, and in spite of introducing an additional JavaScript library to load (Lo-Dash) it’s a solution that I will be using in the future for this type of problem. Bear in mind that the lodash.js library needs to be loaded for this solution to work.
<script src=
  "http://cdnjs.cloudflare.com/ajax/libs/lodash.js/2.4.1/lodash.min.js">
</script>

Lo-Dash

Lo-Dash is a utility JavaScript library released under the MIT license that provides a range of useful functional programming helpers, such as low level functions for iterating over arrays, strings, objects, and arguments objects. It is often compared to underscore.js in terms of functionality.
_.FIND
_.find(collection, [callback=identity], [thisArg])
The _.find function iterates over elements of a collection, returning the first element that the callback returns a ‘truthy’ value for. The callback is bound to thisArg and invoked with three arguments; (value, index|key, collection).
If a property name is provided for callback the created “_.pluck” style callback will return the property value of the given element.
If an object is provided for callback the created “_.where” style callback will return true for elements that have the properties of the given object, else false.
Arguments
  1. collection (Array|Object|string): The collection to iterate over.
  2. [callback=identity] (Function|Object|string): The function called per iteration. If a property name or object is provided it will be used to create a “_.pluck” or “_.where” style callback, respectively.
  3. [thisArg] (*): The this binding of callback.
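To make the object-style (“_.where”) form of the callback more concrete, here is a minimal, dependency-free sketch of roughly how that kind of matching behaves. The helper names findWhere and sameValue are hypothetical and this is not Lo-Dash's own code; it only compares top-level properties, and it assumes that two Date values count as equal when their timestamps match, which is the behaviour the padding code in this post relies on.

```javascript
// Hypothetical stand-in for the object-style form of _.find (not Lo-Dash's code).
// Returns the first element whose properties match every property of `pattern`,
// or undefined when nothing matches.
function findWhere(collection, pattern) {
  var keys = Object.keys(pattern);
  for (var i = 0; i < collection.length; i++) {
    var candidate = collection[i];
    var matches = keys.every(function (key) {
      return sameValue(candidate[key], pattern[key]);
    });
    if (matches) { return candidate; }
  }
  return undefined;
}

// Dates are compared by timestamp; everything else by strict equality.
function sameValue(a, b) {
  if (a instanceof Date && b instanceof Date) {
    return a.getTime() === b.getTime();
  }
  return a === b;
}

var sales = [
  { date: new Date(2013, 6, 1), value: 421 },  // July 2013
  { date: new Date(2013, 8, 1), value: 376 }   // September 2013
];

var july = findWhere(sales, { date: new Date(2013, 6, 1) });   // the July object
var august = findWhere(sales, { date: new Date(2013, 7, 1) }); // nothing to find
```

The important detail is the second lookup: when nothing matches, the result is undefined rather than an error, which is what makes a fallback value easy to supply.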

The explunit method

The following is the layout of the critical part of the code that the explunit method uses.
// Scale the range of the data
x.domain(d3.extent(data, function(d) { return d.date; }))
     .ticks(d3.time.month);
y.domain([0, d3.max(data, function(d) { return d.value; })]);

// Populate the new array
var newData = x.ticks()
               .map(function(monthBucket) { 
                   return _.find(data, 
                       { date: monthBucket }) || 
                       { date: monthBucket, value: 0 };
                });
It can be thought of as breaking the problem down into two steps. Firstly a new array is built that covers the range of values declared in the data and contains dates at a specified interval. Secondly, the script steps through the new array and, for each date, looks for an entry in the old data with a matching date. Where it finds one it uses that entry; where there is no match, it creates an entry with the value of 0.
BUILD THE ARRAY
scale.ticks([count])
The domain in the x axis is set using the following line;
x.domain(d3.extent(data, function(d) { return d.date; }))
     .ticks(d3.time.month);
Here the extent of the domain is declared in a standard way (d3.extent(data, function(d) { return d.date; })). Chaining the .ticks function onto the scale then breaks the domain into uniformly spaced time intervals (in this case months, per the d3.time.month argument) and returns them as an array of dates. This gives us an array across the x axis with all the month values!
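For readers who want to see what such an array of monthly dates contains, the following is a rough plain-JavaScript sketch. The function name monthBuckets is hypothetical and this is not how D3 is implemented; it simply returns the first day of each month between two dates, inclusive.

```javascript
// Hypothetical helper: one Date per month (local midnight on the 1st),
// from the month containing `start` through the month containing `end`.
function monthBuckets(start, end) {
  var buckets = [];
  var cursor = new Date(start.getFullYear(), start.getMonth(), 1);
  while (cursor <= end) {
    buckets.push(new Date(cursor.getTime()));   // push a copy, not the cursor
    cursor.setMonth(cursor.getMonth() + 1);     // step forward one month
  }
  return buckets;
}

// January 2013 through February 2014, the range of the sample data.
var months = monthBuckets(new Date(2013, 0, 1), new Date(2014, 1, 1));
```

Note that August 2013 appears in this array even though it is absent from the data, which is the whole point of building it.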
POPULATE THE ARRAY
In the next section we generate our new array (newData) with the values from the original data array and, where necessary (where there’s no value), a 0 in the value field.
It’s all contained within the following code;
var newData = x.ticks()
               .map(function(monthBucket) { 
                   return _.find(data, 
                       { date: monthBucket }) || 
                       { date: monthBucket, value: 0 };
                });
x.ticks()
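The x.ticks() portion generates the array of evenly spaced dates across the x domain. In D3 v3 the scale does not remember an interval passed to an earlier .ticks call, so when called without an argument it lets D3 choose the interval, which for this data works out to monthly. Passing the interval explicitly, as in x.ticks(d3.time.month), guarantees monthly buckets whatever the date range.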
.map(callback)
In a wider (JavaScript) sense the map() method creates a new array from the results of calling a supplied function on every element of an existing array. This is the standard JavaScript .map function, which is used here;
               .map(function(monthBucket) { 
                   return _.find(data, 
                       { date: monthBucket }) || 
                       { date: monthBucket, value: 0 };
                });
In the example we are examining, map calls the function once for each date in the array returned by x.ticks(), passing that date in as monthBucket. Whatever is returned by the function will get placed into newData. It’s within this function that we use the Lo-Dash _.find utility.
In plain terms, _.find will look over the objects in an array (data) and return the first one that matches its specified pattern (in our case it will match whenever the date element in data equals the array element supplied by x.ticks() as monthBucket). If no match is made, _.find returns undefined, so the || operator falls through to a new object containing the date from monthBucket and the value of 0.
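The combination of map and a || fallback can be seen in isolation in the short sketch below. The names known and monthNames are made up for illustration, and a plain object lookup stands in for _.find.

```javascript
// Values we actually have; "mar" is deliberately missing.
var known = { jan: 53, feb: 165 };
var monthNames = ["jan", "feb", "mar"];

// map calls the function once per element and collects the return values.
// A failed lookup yields undefined, so || supplies the 0 instead.
var rows = monthNames.map(function (name) {
  return { month: name, value: known[name] || 0 };
});
```

The new array always has the same length as the array being mapped, so every bucket ends up with an entry whether or not the original data had one.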
The end result is an array of objects called newData containing equally spaced date elements across the range that we provided, at an interval that we specify. Where a date matches one in our original data array the object carries the original value, and where it doesn’t, we introduce the value 0.
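Pulling the two steps together, here is a self-contained sketch that runs without D3 or Lo-Dash so the result can be inspected directly. It is an illustration under assumptions rather than the explunit code itself: the names parseMonth and padWithZeros are hypothetical, the dates are parsed from the "YYYY-MM" strings by hand, the data is assumed to be sorted by date, and a lookup object keyed on timestamp stands in for _.find.

```javascript
// Hypothetical helper: turn "2013-08" into a Date at local midnight on the 1st.
function parseMonth(text) {
  var parts = text.split("-");
  return new Date(Number(parts[0]), Number(parts[1]) - 1, 1);
}

// Hypothetical helper: return one { date, value } object per month across
// the extent of `data`, using 0 for any month that has no entry.
function padWithZeros(data) {
  // Index the existing rows by timestamp for quick lookup.
  var byTime = {};
  data.forEach(function (d) { byTime[d.date.getTime()] = d; });

  var first = data[0].date;
  var last = data[data.length - 1].date;   // assumes data is sorted by date
  var padded = [];
  var cursor = new Date(first.getFullYear(), first.getMonth(), 1);
  while (cursor <= last) {
    var bucket = new Date(cursor.getTime());
    padded.push(byTime[bucket.getTime()] || { date: bucket, value: 0 });
    cursor.setMonth(cursor.getMonth() + 1);
  }
  return padded;
}

// A slice of the sample data; August 2013 is missing.
var data = [
  ["2013-06", 410], ["2013-07", 421], ["2013-09", 376], ["2013-10", 359]
].map(function (row) {
  return { date: parseMonth(row[0]), value: row[1] };
});

var newData = padWithZeros(data);
```

Four rows go in and five come out, with the extra row being August 2013 carrying a value of 0. Keying the lookup on getTime() matters because two separate Date objects are never equal under ===, even when they represent the same moment.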

The description above (and heaps of other stuff) is in the D3 Tips and Tricks book that can be downloaded for free (or donate if you really want to :-)).

2 comments:

  1. Minor correction: "D3.js implements its own version of this function with .map which is used here". This is actually just the standard JavaScript .map function. Since D3 requires IE9 (and above), any code we write using it we can also use the ES5 methods. Unfortunately .find is ES6 so we have to depend on lodash/underscore for that.
    -explunit

  2. Ahh.... Many thanks. This was not the only part of your code I had to think long and hard about :-). It's well out of my comfort zone, but worth the effort. I've made some slight edit above and amended the book as well. Thanks again!
