Purpose
In the process of implementing a method of measuring and displaying the passage of a cat through a cat-door (as described in the book ‘Raspberry Pi: Measure, Record, Explore’) I built a graph that showed events indicated by both date and time on separate axes. It was then that I figured that this would be useful for exploring event data, or data that exists as a series of date/time stamps that signify a particular ‘thing’ as having occurred. In the cat door example it was the use of the door by the cat, but this is applicable to a huge range of data sets.
One that I thought of straight away was the dates and times that people downloaded the book D3 Tips and Tricks. Leanpub has an API for accessing the history of book activity and I was able to download it and store it in a database for examination.
Ultimately what I developed was a scatter plot that shows the date of the events on the X axis and the time of the events on the Y axis. This was augmented by two line graphs that showed the accumulated sums of each axis on their respective sides.
Figure: Data Event Exploration
The full code for this example is available online at bl.ocks.org or GitHub. It is also available as the files ‘book-downloads.html’ and ‘downloads.zip’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub. The file ‘downloads.zip’ contains downloads.json, which is zipped up because it is otherwise a bit too large for Leanpub. For the ideal viewing experience, check it out in full screen mode.
There is also a separate blog post describing the information that I learned from looking at the data here.
To make the information slightly more accessible, when the user hovers their mouse over the scatter plot the mouse position is projected across to the other two graphs to show how it relates to them, and the graph presents the appropriate values of date, time and number downloaded by date and by time.
This graph is a relatively complex combination of a range of different techniques presented in the book, including wrangling and nesting of data, combination of multiple graphs and the use of mouse movement to display tool-tips and additional data.
The Code
The code is extremely lengthy, so in lieu of placing it in the book it can be found on bl.ocks.org or GitHub. It is liberally commented to assist readers, and I will describe particular sections of the code below where that is likely to help.
Wrangling the data
The graph uses four sets of data.
- The raw event data (an array called events)
- The scatter plot data (an array called data)
- The date graph data (an array called dataDate)
- The time graph data (an array called dataTime)
The raw event data is ingested from an external JSON file using the standard d3.json call.
The data itself is simply a collection of dates.
    { "dtg": "2013-01-24 09:10:59" },
    { "dtg": "2013-01-24 09:17:37" },
    { "dtg": "2013-01-24 09:48:48" },
    { "dtg": "2013-01-24 15:01:59" },
    { "dtg": "2013-01-24 18:11:44" },
    { "dtg": "2013-01-24 18:47:05" },
    { "dtg": "2013-01-24 18:47:23" },
    { "dtg": "2013-01-24 19:55:53" },
    { "dtg": "2013-01-24 22:37:39" },
    { "dtg": "2013-01-25 01:22:48" },
    { "dtg": "2013-01-25 06:37:38" },
    { "dtg": "2013-01-25 08:28:20" },
Each date represents the time that a book was downloaded.
Once loaded we run a forEach over the file to put it in a format for manipulation into the remaining three data sets.
    // parse and format all the event data
    events.forEach(function(d) {
        d.dtg = d.dtg.slice(0,-4) + '0:00';  // get the 10 minute block
        var dtgSplit = d.dtg.split(" ");     // split on the space
        d.date = dtgSplit[0];                // get the date separately
        d.time = dtgSplit[1];                // get the time separately
        d.number_downloaded = 1;             // Number of downloads
    });
The first thing we do is to slice off the last four characters of the dtg string and replace them with 0:00. This leaves us with a set of dtg values that are only represented by the 10 minute window in which they were downloaded.
We then split the dtg string on the space that separates the date and the time, and we designate one half date and the other half time.
Lastly we represent the number of books downloaded for each event as 1 (this helps us sum them up later).
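To see what the slice and split do to a single value, here is a minimal stand-alone sketch in plain JavaScript (no D3 required). The helper names toTenMinuteBlock and splitEvent are my own, purely for illustration, and the sketch assumes every dtg string follows the fixed 'YYYY-MM-DD HH:MM:SS' layout shown in the sample data.

```javascript
// Hypothetical helper (not part of the original code): floor a
// "YYYY-MM-DD HH:MM:SS" string to the start of its 10 minute block.
function toTenMinuteBlock(dtg) {
    // drop the last four characters ("M:SS") and put "0:00" in their place
    return dtg.slice(0, -4) + '0:00';
}

// Build the same fields that the forEach attaches to each event.
function splitEvent(dtg) {
    var block = toTenMinuteBlock(dtg);
    var parts = block.split(' ');   // [date, time]
    return { dtg: block, date: parts[0], time: parts[1], number_downloaded: 1 };
}

var example = splitEvent('2013-01-24 09:17:37');
console.log(example.dtg);   // 2013-01-24 09:10:00
console.log(example.date);  // 2013-01-24
console.log(example.time);  // 09:10:00
```

Because the trick relies on string positions rather than date parsing, it would silently misbehave on a timestamp in any other layout, so it is worth checking the input format first.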
Using the events data we create the data-set for the scatter plot (data) by nesting the information on the 10 minute dtg value of date/time and by summing the number of downloads;
    var data = d3.nest()
        .key(function(d) { return d.dtg; })
        .rollup(function(d) {
            return d3.sum(d, function(g) { return g.number_downloaded; });
        })
        .entries(events);
We carry out a similar process for the date…
    var dataDate = d3.nest()
        .key(function(d) { return d.date; })
        .rollup(function(d) {
            return d3.sum(d, function(g) { return g.number_downloaded; });
        })
        .entries(events);
… and the time;
    var dataTime = d3.nest()
        .key(function(d) { return d.time; })
        .sortKeys(d3.ascending)
        .rollup(function(d) {
            return d3.sum(d, function(g) { return g.number_downloaded; });
        })
        .entries(events);
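If the nest-and-rollup pattern is unfamiliar, the following plain JavaScript sketch imitates the grouping and summing without D3. The function name sumByKey and the sample events are my own assumptions for illustration. It returns entries shaped as {key, values} objects, which is how I understand the D3 version that provides d3.nest reports rolled-up results; treat that shape as an assumption and check it against your own D3 version.

```javascript
// Hypothetical stand-in for d3.nest().key(...).rollup(sum).entries(...)
function sumByKey(events, keyFn, sortKeys) {
    var totals = new Map();  // remembers the order keys were first seen
    events.forEach(function(d) {
        var k = keyFn(d);
        totals.set(k, (totals.get(k) || 0) + d.number_downloaded);
    });
    var entries = Array.from(totals, function(pair) {
        return { key: pair[0], values: pair[1] };
    });
    if (sortKeys) {
        // similar in spirit to .sortKeys(d3.ascending)
        entries.sort(function(a, b) {
            return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
        });
    }
    return entries;
}

// A few made-up events, already formatted as in the forEach step
var sampleEvents = [
    { dtg: '2013-01-24 18:40:00', date: '2013-01-24', time: '18:40:00', number_downloaded: 1 },
    { dtg: '2013-01-24 18:40:00', date: '2013-01-24', time: '18:40:00', number_downloaded: 1 },
    { dtg: '2013-01-25 01:20:00', date: '2013-01-25', time: '01:20:00', number_downloaded: 1 }
];

var byDate = sumByKey(sampleEvents, function(d) { return d.date; }, false);
var byTime = sumByKey(sampleEvents, function(d) { return d.time; }, true);
console.log(byDate); // 2013-01-24 with 2 downloads, then 2013-01-25 with 1
console.log(byTime); // sorted by time: 01:20:00 with 1, then 18:40:00 with 2
```

Setting number_downloaded to 1 on every event is what lets a plain sum double as a count here.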
Sizing Everything Up
The size of the graph is determined by a number of fixed variables which are fairly self explanatory;
- scatterplotHeight (which is also the height of the time graph)
- dateGraphHeight
- timeGraphWidth
But we need to let the width of the scatter plot (and the date graph) be a function of the number of days that have been collected. This width is held in the variable scatterplotWidth.
The set-up is handled in the following block of code;
    var oneDay = 24*60*60*1000; // hours*minutes*seconds*milliseconds
    var dateStart = d3.min(data, function(d) { return d.date; });
    var dateFinish = d3.max(data, function(d) { return d.date; });
    var numberDays = Math.round(Math.abs((dateStart.getTime() -
        dateFinish.getTime())/(oneDay)));

    var margin = {top: 20, right: 20, bottom: 20, left: 50},
        scatterplotHeight = 520,
        scatterplotWidth = numberDays * 1.5,
        dateGraphHeight = 220,
        timeGraphWidth = 220;
The overall size of the graphic (height and width) is therefore a combination of these variables;
    var height = scatterplotHeight + dateGraphHeight,
        width = scatterplotWidth + timeGraphWidth;
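As a worked example of the arithmetic, the sketch below computes the number of days and the resulting overall size in plain JavaScript. The helper names daysBetween and graphicSize are hypothetical, and the one-year date range is made up for illustration; only the fixed values (520, 220, 220 and 1.5 pixels per day) come from the code above.

```javascript
var ONE_DAY = 24 * 60 * 60 * 1000; // hours*minutes*seconds*milliseconds

// Hypothetical helper: whole days between two "YYYY-MM-DD" strings.
// Date-only ISO strings are parsed as UTC, so daylight saving
// changes do not affect the difference.
function daysBetween(first, last) {
    var start = new Date(first).getTime();
    var finish = new Date(last).getTime();
    return Math.round(Math.abs(start - finish) / ONE_DAY);
}

// Overall graphic size from the fixed values used in the text.
function graphicSize(numberDays) {
    var scatterplotHeight = 520,
        scatterplotWidth = numberDays * 1.5,  // 1.5 pixels per day
        dateGraphHeight = 220,
        timeGraphWidth = 220;
    return {
        width: scatterplotWidth + timeGraphWidth,
        height: scatterplotHeight + dateGraphHeight
    };
}

var days = daysBetween('2013-01-24', '2014-01-24');
var size = graphicSize(days);
console.log(days, size.width, size.height); // 365 767.5 740
```

The margins are left out of this sketch; they would be added on top of these figures when sizing the svg element.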
The Scatter Plot
There is no real surprise with the scatter plot itself. The only thing slightly unusual is the use of a time scale for both the X and Y axes;
    var x = d3.time.scale().range([0, scatterplotWidth]);
    var y = d3.time.scale().range([0, scatterplotHeight]);
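For readers wondering what a time scale on the Y axis actually does, here is a rough miniature written in plain JavaScript. It is not D3's implementation, just a linear mapping between dates and pixels under my own assumptions: the name makeTimeScale is hypothetical, and the times of day are pinned to an arbitrary reference day because only the time portion matters.

```javascript
// Hypothetical miniature of a time scale: maps a Date within `domain`
// linearly onto the pixel interval `range`, and back again with invert().
function makeTimeScale(domain, range) {
    var d0 = domain[0].getTime(), d1 = domain[1].getTime(),
        r0 = range[0], r1 = range[1];
    function scale(date) {
        return r0 + (date.getTime() - d0) / (d1 - d0) * (r1 - r0);
    }
    scale.invert = function(px) {
        return new Date(d0 + (px - r0) / (r1 - r0) * (d1 - d0));
    };
    return scale;
}

// One full day, midnight to midnight, on an arbitrary reference date
// (an assumption for this sketch), spread over 520 pixels.
var yDemo = makeTimeScale(
    [new Date(Date.UTC(2000, 0, 1, 0, 0, 0)), new Date(Date.UTC(2000, 0, 2, 0, 0, 0))],
    [0, 520]
);

console.log(yDemo(new Date(Date.UTC(2000, 0, 1, 12, 0, 0)))); // 260
console.log(yDemo.invert(130).toISOString());                 // 2000-01-01T06:00:00.000Z
```

The invert direction is the part that matters later on, when a mouse position has to be turned back into a date or a time.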
When the circles are drawn, the size of the circle is determined by the radius, which is the number of downloads multiplied by 1.5. I know that this is a bit of a visualization ‘no-no’ because the area of the circle should be representative of the number, not the radius, but I tried it both ways and to my simple way of viewing the data, the radius adjustment provided the best comparison.
    svg.selectAll(".dot")
        .data(data)
      .enter().append("circle")
        .attr("class", "dot")
        .attr("r", function(d) { return d.number_downloaded*1.5; })
        .style("opacity", 0.3)
        .style("fill", "#e31a1c")
        .attr("cx", function(d) { return x(d.date); })
        .attr("cy", function(d) { return y(d.time); });
I know that this is a topic of some academic debate, and it is fascinating, so here are both results for comparison;
Figure: Circle Area Representing Downloads
Figure: Circle Radius Representing Downloads
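To make the difference between the two approaches concrete, the sketch below compares a radius that grows linearly with the count (as in the code above) against one where the area grows linearly, which means the radius grows with the square root. The function names are hypothetical, and the scaling factor in the area version is my own arbitrary choice; I don't know exactly how the area variant pictured here was scaled.

```javascript
// Radius grows linearly with the count, as in the scatter plot code.
function radiusLinear(n) {
    return n * 1.5;
}

// Alternative: area grows linearly with the count, so the radius grows
// with the square root. The factor 1.5 is kept so that a single download
// draws the same circle in both versions (an arbitrary choice here).
function radiusByArea(n) {
    return Math.sqrt(n) * 1.5;
}

[1, 4, 9].forEach(function(n) {
    console.log(n, radiusLinear(n), radiusByArea(n));
});
// 1 1.5 1.5
// 4 6 3
// 9 13.5 4.5
```

With the linear radius, nine downloads cover 81 times the area of one download, which is why busy periods stand out so strongly in that version.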
Date and Time Graphs
Both of these graphs are fairly routine. The time graph has the X and Y axes reversed from what would be ordinarily expected, but otherwise not much else to write home about.
Mouse Movement Information Display
This portion of the graph is an expansion of the ‘Favorite tool tip’ method from the previous section in this chapter. We expand the number of elements to update dynamically to about 10, each of which is designated with its own class.
We append the rectangle to capture the mouse movement over the scatter plot;
    svg.append("rect")
        .attr("width", scatterplotWidth)
        .attr("height", scatterplotHeight)
        .style("fill", "none")
        .style("pointer-events", "all")
        .on("mouseover", function() { focus.style("display", null); })
        .on("mouseout", function() { focus.style("display", "none"); })
        .on("mousemove", mousemove);
We capture the position of the mouse and convert it to figures we can use to compare to our data;
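    function mousemove() {
        var xpos = d3.mouse(this)[0],   // mouse x position in pixels
            x0 = x.invert(xpos),        // date under the mouse
            y0 = d3.mouse(this)[1],     // mouse y position in pixels
            y1 = y.invert(y0),          // time under the mouse
            date1 = d3.mouse(this)[0];
        // ... (the function continues with the focus.select statements)
    }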
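One way to turn a mouse position into something that can be compared with the rolled-up data is to snap it to the same 10 minute blocks used as keys. The plain JavaScript sketch below shows that idea for the vertical (time) direction only. It is an illustration under my own assumptions rather than the method used in the full code: the helper names and sample totals are made up, and it assumes midnight sits at pixel 0 with the following midnight at the bottom of the scatter plot.

```javascript
// Hypothetical helper: convert a vertical mouse position (in pixels)
// into the "HH:MM:00" label of the 10 minute block it falls in.
// Assumes midnight sits at pixel 0 and the next midnight at `height`.
function pixelToTimeBlock(ypos, height) {
    var minutes = ypos / height * 24 * 60;
    var block = Math.floor(minutes / 10) * 10;
    block = Math.max(0, Math.min(block, 24 * 60 - 10)); // keep within the day
    var hh = Math.floor(block / 60), mm = block % 60;
    return (hh < 10 ? '0' : '') + hh + ':' + (mm < 10 ? '0' : '') + mm + ':00';
}

// Look up the rolled-up total for a key in a nest-style array of
// {key, values} entries; 0 when nothing was recorded for that block.
function lookupTotal(entries, key) {
    for (var i = 0; i < entries.length; i++) {
        if (entries[i].key === key) { return entries[i].values; }
    }
    return 0;
}

// Made-up totals in the shape of the dataTime array
var sampleTime = [
    { key: '09:10:00', values: 2 },
    { key: '18:40:00', values: 3 }
];

var label = pixelToTimeBlock(200, 520);
console.log(label);                          // 09:10:00
console.log(lookupTotal(sampleTime, label)); // 2
```

The same approach would apply horizontally, with the mouse x position converted to a date and looked up in the per-date totals.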
And then we place our dynamic text and lines with our focus.select statements.
Labeling
The last order of business is to place some labels.
The location of labeling in this example is an interesting problem in itself. I’m personally torn between the desire to maintain simplicity and to ensure clarity. Hopefully what I have is enough to satisfy both requirements, but as always, each user and requirement will differ, so label as desired.
If there are additional parts of the code that you would like explained, please feel free to get in touch.
thank you! really useful stuff.
Hi!
This is a great visualisation!
I'd like to use your visualization in an OSS project (https://github.com/lwindolf/polscan, project is GPLv3+, most JS is MIT licensed) and would like to know if you consider this visualization as open source? If yes, what license do you prefer?
With Best Regards
Lars
Hi Lars. Yes I would consider it as open source and I would go with an MIT licence. I have annotated the visualisation on GitHub https://gist.github.com/d3noob/a0cbcddc6bf0eb9569fe. Enjoy