Difference Chart: Science vs Style.
Dear readers, please forgive me for including this example in D3 Tips and Tricks. While it demonstrates a really cool graphing technique, I have chosen to apply it to a topic that has a potential to raise a couple of sets of eyebrows in the form of Messrs Roger Peng and Jeff Leek. Both work at the Johns Hopkins Bloomberg School of Public Health where Roger is an Associate Professor of Biostatistics, and Jeff is an Associate Professor of Biostatistics and Oncology.
While both are doing amazing work to improve people's health and well-being (amongst other things), both are also authors of highly successful books published by Leanpub. In particular Roger has written R Programming for Data Science and Exploratory Data Analysis with R while Jeff has penned The Elements of Data Analytic Style. As we could anticipate, there is a possibility that there is something of a competitive element to publishing for both gentlemen as they see the number of downloads of their books climb ever higher.
While I would hate to promote an increase to these tensions, the opportunity was too attractive given that I had access to some data on the number of downloads that each of the books had been achieving and I really wanted to write about difference charts using d3.js (and the method of sourcing the data for the book Raspberry Pi: Measure, Record, Explore).
So at the risk of providing some form of offence to these fine gentlemen or inciting an increased rivalry, I have forged ahead and hopefully the worst that will happen is that someone interested in d3.js will also find some interesting reading in R Programming for Data Science, Exploratory Data Analysis with R or The Elements of Data Analytic Style. Ultimately we should be left with a graph that will look something like this;
[Figure: Science vs Style - Daily Leanpub Book Sales]
Purpose
A difference chart is a variation on a bivariate area chart. This is a line chart that includes two lines that are interlinked by filling the space between the lines. A difference chart as demonstrated in the example here by Mike Bostock is able to highlight the differences between the lines by filling the area between them with different colours depending on which line is the greater value.
As Mike points out in his example, this technique harks back at least as far as William Playfair when he was describing the time series of exports and imports of Denmark and Norway in 1786.
All that remains is for us to work out how d3.js can help us out by doing the job programmatically. The example that I use here is based on Mike Bostock’s, with the addition of a few niceties in the form of a legend, a title, and some minor changes.
We will start with a simple example of the code and then add blocks until we arrive at the example with a legend and title.
The Code
The following is the code for the simple difference chart. A live version is available online at bl.ocks.org or GitHub. It is also available as the files ‘diff-basic.html’ and ‘downloads.csv’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub.
A sample of the associated csv file (downloads.csv) is formatted as follows;
Description
The graph has some portions that are common to the simple line graph example.
We start the HTML file, load some styling for the upcoming elements, and set up the margins, time formatting, scales, ranges and axes.
Because the graph is composed of two lines we need to declare two separate line functions;
To fill an area we declare an area function using one of the lines as the baseline (y1) and when it comes time to fill the area later in the script we declare y0 separately to define the area to be filled as an intersection of two paths.
In this instance we are using the green ‘Science’ line as the y1 line.
The svg area is then set up using the height, width and margin values and we load our csv file with the number of downloads for each book. We then carry out a standard forEach to ensure that the time and numerical values are formatted correctly.
Nesting the data
The data that we are starting with is formatted in a way that we could reasonably expect data to be available in this instance where a value is saved for distinct elements on an element by element basis. This style of recording data makes it easy to add new elements into the data stream or a database rather than relying on having them as discrete columns.
In this case, we will need to ‘pivot’ the data to produce a multi-column representation where we have a single row for each date, and the number of downloads for each book as separate columns as follows;
This can be achieved using the d3 nest function.
We declare our new array’s name as data and we initiate the nest function;
We assign the key for our new array as dtg. A ‘key’ is like a way of saying “This is the thing we will be grouping on”. In other words our resultant array will have a single entry for each unique date (dtg) which will have the values of the number of downloaded books associated with it.
Then we tell the nest function which data array we will be using for our source of data.
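To make the grouping step a little more concrete, here is a minimal sketch in plain JavaScript of what grouping on a key produces. It is a stand-in for illustration only, not d3's implementation. The helper name groupByKey is hypothetical, and only dtg is a column name taken from the text (book and downloads are assumed names, as are the sample values).

```javascript
// Hypothetical sample rows in the "long" format: one row per book per date.
// Only `dtg` is named in the text; `book` and `downloads` are assumed names.
const rows = [
  { dtg: "2015-04-19", book: "Science", downloads: 100 },
  { dtg: "2015-04-19", book: "Style", downloads: 80 },
  { dtg: "2015-04-20", book: "Science", downloads: 130 },
  { dtg: "2015-04-20", book: "Style", downloads: 95 }
];

// Group rows by a key function and return an array of { key, values }
// entries, which is the general shape that a nest grouped on one key yields.
function groupByKey(source, keyFn) {
  const index = new Map();
  source.forEach(function (row) {
    const key = keyFn(row);
    if (!index.has(key)) {
      index.set(key, { key: key, values: [] });
    }
    index.get(key).values.push(row);
  });
  return Array.from(index.values());
}

const nested = groupByKey(rows, function (d) { return d.dtg; });
console.log(nested.length);           // 2 (one entry per unique date)
console.log(nested[0].key);           // 2015-04-19
console.log(nested[0].values.length); // 2 (one row per book)
```

Each entry now carries every row recorded for that date, which is the starting point for building one row per date with a column per book.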
Wrangle the data
Once we have our pivoted data we can format it in a way that will suit the code for the visualisation. This involves storing the values for the ‘Science’ and ‘Style’ variables as part of a named index.
We then loop through the ‘Science’ and ‘Style’ array to convert the incrementing value of the total number of downloads into a value of the number that have been downloaded each day;
Finally because we are adjusting from total downloaded to daily values we are left with an orphan value that we need to remove from the front of the array;
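The arithmetic behind the last two steps can be sketched as follows. This is an illustration under assumptions rather than the code from the chart itself: the helper name toDaily, the date field name and the sample totals are all hypothetical, and the sketch builds a new array instead of adjusting the values in place.

```javascript
// Hypothetical pivoted rows: one entry per date with running totals per book.
const totals = [
  { date: "2015-04-19", Science: 100, Style: 80 },
  { date: "2015-04-20", Science: 130, Style: 95 },
  { date: "2015-04-21", Science: 150, Style: 125 }
];

// Convert running totals into per-day counts by subtracting the previous
// day's total. The first entry has no predecessor, so it has no daily value.
function toDaily(cumulative, fields) {
  const daily = cumulative.map(function (row, i) {
    const out = { date: row.date };
    fields.forEach(function (f) {
      out[f] = i === 0 ? null : row[f] - cumulative[i - 1][f];
    });
    return out;
  });
  daily.shift(); // remove the orphan value from the front of the array
  return daily;
}

const daily = toDaily(totals, ["Science", "Style"]);
console.log(daily.length);     // 2
console.log(daily[0].Science); // 30 (130 - 100)
console.log(daily[1].Style);   // 30 (125 - 95)
```

Three days of totals give two days of daily values, which is why one entry has to go.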
Cheating with the domain
The observant d3.js reader will have noticed that the setting of the y domain has a large section commented out;
That’s because I want to be able to provide an ideal way for the graph to represent the data in an appropriate range, but because we are using the basis smoothing modifier, and the data is ‘peaky’, there is a tendency for the y scale to be fairly broad and the resultant graph looks a little lost;
Alternatively, we could remove the smoothing and let the true data be shown;
It could be argued that this is a truer representation of the data, but in this case I feel comfortable sacrificing accuracy for aesthetics (what have I become?).
Therefore, the domain for the y axis is set manually to between 0 and 1400, but feel free to remove that at the point when you introduce your own data :-).
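As a rough illustration of the trade-off, the sketch below compares a domain taken from the largest value in either series with the fixed 0 to 1400 range. Everything except the 1400 figure is an assumption: the sample values, the chart height and the linearScale helper, which is a plain JavaScript stand-in for a linear scale rather than d3's own.

```javascript
// Hypothetical daily values containing one spike.
const daily = [
  { Science: 300, Style: 250 },
  { Science: 2900, Style: 400 },
  { Science: 350, Style: 500 }
];

// Data-driven upper bound: the largest value found in either series.
function maxOfBoth(rows) {
  return rows.reduce(function (best, d) {
    return Math.max(best, d.Science, d.Style);
  }, 0);
}

// Map a value from a domain onto a pixel range (a stand-in for a linear scale).
function linearScale(domain, range) {
  return function (v) {
    const t = (v - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

const height = 400;                       // assumed chart height in pixels
const dataDomain = [0, maxOfBoth(daily)]; // [0, 2900]
const fixedDomain = [0, 1400];            // the manual range used in the text

const yData = linearScale(dataDomain, [height, 0]);
const yFixed = linearScale(fixedDomain, [height, 0]);

console.log(dataDomain[1]); // 2900
console.log(yFixed(350));   // 300
console.log(yData(350));    // a larger pixel value, so closer to the bottom axis
```

With the spike setting the domain, a typical value of 350 sits much nearer the bottom of the chart than it does with the fixed range, which is the 'lost' look described above.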
data vs datum
One small line gets its own section. That line is;
A casual d3.js user could be forgiven for thinking that this doesn’t seem too fearsome a line, but it has hidden depths.
As Mike Bostock explains here, if we want to bind data to elements as a group we would use *.data, but if we want to bind that data to individual elements, we should use *.datum.
It’s a function of how the data is stored. If there is an expectation that the data will be dynamic then data is the way to go since it has the feature of preparing enter and exit selections. If the data is static (it won’t be changing) then datum is the way to go.
In our case we are assigning data to individual elements and as a result we will be using datum.
Setting up the clipPaths
The clipPath element is used to define an area that is used to create a shape by intersecting one area with another.
In our case we are going to set up two clip paths. One is the area above the green ‘Science’ line (which we defined earlier as being the y1 component of an area selection);
This is declared via this portion of the code;
Then we set up the clip path that will exist for the area below the green ‘Science’ line;
This is declared via this portion of the code;
Each of these paths has an ‘id’ which can be subsequently used by the following code.
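To picture the two clip shapes, the sketch below builds the outline of each as an SVG path string in plain JavaScript. It is an illustration under assumptions, not the chart's code: areaPath is a hypothetical helper, the pixel coordinates and height are made up, and it joins points with straight segments instead of the basis smoothing used in the chart.

```javascript
// Hypothetical points on the reference line, already in pixel coordinates
// (x grows to the right, y grows downwards, as in SVG).
const science = [
  { x: 0, y: 120 },
  { x: 50, y: 80 },
  { x: 100, y: 150 }
];
const height = 200; // assumed chart height in pixels

// Build a closed path between a line (the y1 edge) and a constant baseline
// (the y0 edge): along the line, then back along the baseline.
function areaPath(points, y0) {
  const top = points.map(function (p) { return p.x + "," + p.y; });
  const base = points.slice().reverse().map(function (p) { return p.x + "," + y0; });
  return "M" + top.concat(base).join("L") + "Z";
}

const clipAbove = areaPath(science, 0);      // between the line and the top edge
const clipBelow = areaPath(science, height); // between the line and the bottom edge

console.log(clipAbove); // M0,120L50,80L100,150L100,0L50,0L0,0Z
console.log(clipBelow); // M0,120L50,80L100,150L100,200L50,200L0,200Z
```

Both shapes share the reference line as one edge. Only the baseline changes, which is why a single area function with a different y0 can serve for both.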
Clipping and adding the areas
Now we come to clipping our shape and filling it with the appropriate colour.
We do this by having a shape that represents the area between the two lines and applying our clip path for the values above and below our reference line (the green ‘Science’ line). Where the two intersect, we fill it with the appropriate colour. The code to fill the area above the reference line is as follows;
Here we have two lines that are defining the shape between the two science and style lines;
If we were to look at the shape that this produces it would look as follows (greyed out for highlighting);
We apply a class to the shape so that it is filled with the colour that we want;
... and apply the clip path so that only the areas that intersect the two shapes are filled with the appropriate colour;
Here the intersection of those two shapes is shown as pink;
Then we do the same for the area below;
With the corresponding areas showing the intersection of the two shapes coloured differently;
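The outcome of the two clipped fills can be checked numerically: on any given day the shaded region falls either above or below the reference line, depending on which series is larger. The sketch below labels each day accordingly. It does no clipping and is only a way of reasoning about the result. The function name regionFor, the labels and the sample values are assumptions.

```javascript
// Hypothetical daily values for the two books.
const daily = [
  { date: "2015-04-20", Science: 30, Style: 15 },
  { date: "2015-04-21", Science: 20, Style: 30 },
  { date: "2015-04-22", Science: 25, Style: 25 }
];

// "above": the Style value sits above the Science reference line, so the
//          shaded area lies in the region covered by the 'above' clip path.
// "below": the Style value sits beneath the reference line.
// "none":  the lines touch, so there is no area to fill.
function regionFor(d) {
  if (d.Style > d.Science) { return "above"; }
  if (d.Style < d.Science) { return "below"; }
  return "none";
}

const regions = daily.map(regionFor);
console.log(regions.join(",")); // below,above,none
```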
Draw the lines and the axes
The final part of our basic difference chart is to draw in the lines over the top so that they are highlighted and to add in the axes;
Et voilà! We have our difference chart!
As mentioned earlier, the code for the simple difference chart is available online at bl.ocks.org or GitHub. It is also available as the files ‘diff-basic.html’ and ‘downloads.csv’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub.
Adding a bit more to our difference chart.
The chart itself is a thing of beauty, but given the subject matter (it’s describing two books after all) we should include a bit more information on what it is we’re looking at and provide some links so that a fascinated viewer of the graphs can read the books!
Add a Y axis label
Because it’s not immediately obvious what we’re looking at on the Y axis we should add in a nice subtle label on the Y axis;
Add a title
Every graph should have a title. The following code adds this to the top(ish) centre of the chart and provides a white drop-shadow for readability;
Adding the legend
A respectable legend in this case should provide visual context of what it is describing in relation to the graph (by way of colour) and should actually name the book. We can also go a little bit further and provide a link to the books in the legend so that potential readers can access them easily.
Firstly the rectangles filled with the right colour, sized appropriately and arranged just right;
Then we add the text (with a drop-shadow) and a link;
I’ll be the first to admit that this could be done more efficiently with some styling via css, but then it would leave nothing for the reader to try :-).
Link the areas
As a last touch we can include the links to the respective books in the shading for the graph itself;
Perhaps not strictly required, but a nice touch nonetheless.
The final result
And here it is;
The code for the full difference chart is available online at bl.ocks.org or GitHub. It is also available as the files ‘diff-full.html’ and ‘downloads.csv’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub.
The description above (and heaps of other stuff) is in the D3 Tips and Tricks book that can be downloaded for free (or donate if you really want to :-)).