Raspberry Pi Pico Tips and Tricks

Wednesday, 19 December 2012

Setting Scales Domains and Ranges in d3.js

This is another example where if you set it up right, D3 will look after you forever.

The “Ah Ha!” moment for me in understanding ranges and scales was after reading Jerome Cukier's great page on 'd3:scales and color'. I thoroughly recommend you read it (and plenty more of the great work by Jerome) as he really does nail the description in my humble opinion. I will put my own description down here, but if it doesn't seem clear, head on over to Jerome's page.

From our basic web page we have now moved to the section that includes the following lines;
var x = d3.time.scale().range([0, width]);
var y = d3.scale.linear().range([height, 0]);

The purpose of these portions of the script is to ensure that the data we ingest fits onto our graph correctly. Since we have two different types of data (date/time and numeric values) they need to be treated separately (but they do essentially the same job). To examine this whole concept of scales, domains and ranges properly, we will also move slightly out of sequence and (in conjunction with the earlier scale statements) take a look at the lines of script that occur later and set the domain. They are as follows;

x.domain(d3.extent(data, function(d) { return d.date; }));
y.domain([0, d3.max(data, function(d) { return d.close; })]);

So, the idea of scaling is to take the values of data that we have and to fit them into the space we have available.

If we have data that goes from 53.98 to 636.23 (as the data we have for 'close' in our tsv file does), but we have a graph that is 210 pixels high (height = 270 - margin.top - margin.bottom;), we clearly need to make an adjustment.

Not only that. Even though our data goes from 53.98 to 636.23, that would look slightly misleading on the graph and it should really go from 0 to a bit over 636.23.

It sounds really complicated, but let's simplify it a bit.

First we make sure that any quantity we specify on the x axis fits onto our graph.

var x = d3.time.scale().range([0, width]);

Here we set our variable that will tell D3 where to draw something on the x axis. By using the d3.time.scale() function we make sure that D3 knows to treat the values as date / time entities (with all their ingrained peculiarities). Then we specify the range that those values will cover (.range) and we specify the range as being from 0 to the width of our graphing area (See! Setting those variables for margins and widths is starting to pay off now!).
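To make the idea a little more concrete, here is a minimal sketch in plain JavaScript of what a time scale does conceptually: it treats each date as a number of milliseconds and maps it in a straight line onto the pixel range. This is a hand-rolled stand-in and not D3's own code. The function name makeTimeScaleSketch and the width of 600 pixels are assumptions for illustration only, and the two dates are the start and end dates of the data used in this example.

```javascript
// Hand-rolled stand-in for a time scale; illustrative only, not D3's own code.
function makeTimeScaleSketch(domain, range) {
  var start = domain[0].getTime(); // dates become milliseconds since 1970
  var end = domain[1].getTime();
  return function (date) {
    // How far along the date span are we (0 = start, 1 = end)?
    var fraction = (date.getTime() - start) / (end - start);
    // Apply the same fraction to the pixel span.
    return range[0] + fraction * (range[1] - range[0]);
  };
}

var sketchWidth = 600; // assumed graph width in pixels
var xSketch = makeTimeScaleSketch(
  [new Date(Date.UTC(2012, 2, 26)), new Date(Date.UTC(2012, 4, 1))], // 26 March to 1 May 2012
  [0, sketchWidth]
);

console.log(xSketch(new Date(Date.UTC(2012, 2, 26)))); // left edge of the graph
console.log(xSketch(new Date(Date.UTC(2012, 3, 13)))); // halfway across
console.log(xSketch(new Date(Date.UTC(2012, 4, 1))));  // right edge of the graph
```

The real d3.time.scale() does a good deal more (tick generation and calendar handling for a start), but the mapping from a date to a pixel position follows this same idea.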

Then we do the same for the Y axis. 

var y = d3.scale.linear().range([height, 0]);

There's a different function call (d3.scale.linear()) but the .range setting is still there. In the interests of drawing a (semi) pretty picture to try and explain, hopefully this will assist;
I know, I know, it's a little misleading because nowhere have we actually told D3 that our data runs from 53.98 to 636.23. All we've said is that when we get the data, we'll be scaling it into this space.

Now hang on, what's going on with the [height, 0] part in the y axis scale statement? The astute amongst you will note that for the time scale we set the range as [0, width] but for this one ([height, 0]) the values look backwards.

Well spotted.

This is all to do with how the screen is laid out and referenced. Take a look at the following diagram showing how the coordinates for drawing on your screen work;
The top left hand corner of the screen is the origin or 0,0 point, and as we go right or down the corresponding x and y values increase to the full values defined by width and height.

So that's all well and good for the time values on the x axis that will start at lower values and increase, but for the values on the y axis we're trying to go against the flow. We want the low values to be at the bottom and the high values to be at the top.

No problem. We just tell D3 via the statement y = d3.scale.linear().range([height, 0]); that the lowest values in our data should be drawn at the pixel position 'height' (the bottom of the graph) and the highest values at pixel position 0 (the top). As you most probably will have guessed by this stage, the .range statement uses the format .range([position_for_the_lowest_value, position_for_the_highest_value]). So when we put the height variable first, the low values are associated with the bottom of the screen and the high values with the top.
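If the flipped range still feels odd, the following plain JavaScript sketch may help. It is a hand-rolled stand-in for a linear scale (not D3's own code), and the name makeLinearScaleSketch is made up for this illustration. It uses the 210 pixel height and the 0 to 636.23 span of values mentioned earlier.

```javascript
// Hand-rolled stand-in for a linear scale; illustrative only, not D3's own code.
function makeLinearScaleSketch(domain, range) {
  return function (value) {
    // How far along the data span is this value (0 = lowest, 1 = highest)?
    var fraction = (value - domain[0]) / (domain[1] - domain[0]);
    // Apply the same fraction to the pixel span.
    return range[0] + fraction * (range[1] - range[0]);
  };
}

var sketchHeight = 210; // height of the graphing area in pixels
var ySketch = makeLinearScaleSketch([0, 636.23], [sketchHeight, 0]);

console.log(ySketch(0));          // 210, the bottom of the graph
console.log(ySketch(636.23 / 2)); // 105, halfway up
console.log(ySketch(636.23));     // 0, the top of the graph
```

Because the range is written as [height, 0], bigger data values come out as smaller pixel values, which is exactly what we need on a screen where y grows downwards.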
OK, so we've scaled our data to the graph size and ensured that the range of values is set appropriately, so what's with the domain part that was in the title for this section?

Come on, you remember this little piece of script don't you?

x.domain(d3.extent(data, function(d) { return d.date; }));
y.domain([0, d3.max(data, function(d) { return d.close; })]);

While it exists in a separate part of the file from the scale / range part, it is certainly linked.

That's because there's something missing from what we have been describing so far with the set up of the data ranges for the graphs. We haven't actually told D3 what the range of the data is. That's also the reason this part of the script occurs where it does. It is within the portion where the data.tsv file has been loaded as 'data' and it's therefore ready to act on it.

So, the .domain function is designed to let D3 know what the scope of the data will be. This is what is then passed to the scale function.

Looking at the first part that is setting up the x axis values, it is saying that the domain for the x axis values will be determined by the d3.extent function which in turn is acting on a separate function which looks through all the 'date' values that occur in the 'data' array. In this case the .extent function returns the minimum and maximum values in the given array.
  • So function(d) { return d.date; } returns all the 'date' values in 'data'. This is then passed to...
  • The .extent function that finds the maximum and minimum values in the array and then...
  • The .domain function which hands those maximum and minimum values to D3 as the domain for the x axis.
Pretty neat really. At first you might think it was slightly overly complex, but breaking the function down into these components allows additional functionality with differing scales, values and quantities. In short, don't sweat it. It's a good thing.
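The three steps above can be sketched in plain JavaScript as follows. The extentSketch function is a hand-rolled stand-in written for this illustration and is not D3's own code. The rows in sampleRows are invented too: only the first and last dates and the lowest and highest 'close' values match the figures quoted in this post, and which value sits on which date is made up.

```javascript
// Hand-rolled stand-in for an "extent" helper; illustrative only, not D3's own code.
function extentSketch(array, accessor) {
  var min, max;
  array.forEach(function (d) {
    var value = accessor(d); // pull out the field we care about
    if (min === undefined || value < min) { min = value; }
    if (max === undefined || value > max) { max = value; }
  });
  return [min, max];
}

// Invented sample rows, deliberately out of order.
var sampleRows = [
  { date: new Date(Date.UTC(2012, 3, 13)), close: 300.00 },
  { date: new Date(Date.UTC(2012, 4, 1)), close: 53.98 },
  { date: new Date(Date.UTC(2012, 2, 26)), close: 636.23 }
];

var dateExtent = extentSketch(sampleRows, function (d) { return d.date; });

console.log(dateExtent[0].toISOString()); // earliest date in the rows
console.log(dateExtent[1].toISOString()); // latest date in the rows
```

The pair of values that comes back is what would be handed to x.domain() in the real script.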

Now, the x axis values are dates, so the domain for them is basically from the 26th of March 2012 till 1st of May 2012. The y axis is done slightly differently.

y.domain([0, d3.max(data, function(d) { return d.close; })]);

Because the range of values desired on the y axis goes from 0 to the maximum in the data range, that's exactly what we tell D3. The '0' in the .domain function is the starting point and the finishing point is found by employing a separate function that sorts through all the 'close' values in the 'data' array and returns the largest one. Therefore the domain is from 0 to 636.23.
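As a rough illustration of that statement, here is a plain JavaScript sketch that builds the same [0, maximum] pair by hand. The maxSketch function is a stand-in made up for this example (it is not D3's own code), and the rows in closeRows are invented apart from the 53.98 and 636.23 figures quoted above.

```javascript
// Hand-rolled stand-in for a "max" helper; illustrative only, not D3's own code.
function maxSketch(array, accessor) {
  return array.reduce(function (best, d) {
    var value = accessor(d); // pull out the field we care about
    return (best === undefined || value > best) ? value : best;
  }, undefined);
}

// Invented sample rows; only the lowest and highest values come from the text.
var closeRows = [
  { close: 300.00 },
  { close: 53.98 },
  { close: 636.23 }
];

// Start the domain at 0 and finish it at the largest 'close' value.
var yDomainSketch = [0, maxSketch(closeRows, function (d) { return d.close; })];

console.log(yDomainSketch); // [ 0, 636.23 ]
```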

Let's try a small experiment. Let's change the y axis domain to use the .extent function (the same way the x axis does) to see what it produces.

So the JavaScript for the y domain will be;

y.domain(d3.extent(data, function(d) { return d.close; }));

You can see that, apart from a quick copy and paste of the internals, all I had to change was the reference to 'close' rather than 'date'.

And the result is...
Look at that. The starting point for the y axis looks like it's pretty much on the 53.98 mark and the graph itself certainly touches the x axis where the data would indicate it should.
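To see why the line now touches the x axis, this plain JavaScript sketch compares where the lowest value (53.98) lands with each choice of domain. The helper positionSketch is made up for this illustration and simply repeats the straight-line mapping a linear scale performs; it is not D3's own code.

```javascript
// Straight-line mapping from a data value to a pixel position; illustrative only.
function positionSketch(value, domain, range) {
  var fraction = (value - domain[0]) / (domain[1] - domain[0]);
  return range[0] + fraction * (range[1] - range[0]);
}

var pixelRange = [210, 0]; // [height, 0] for a 210 pixel high graph

// Domain starting at zero, as in the original script.
var fromZero = positionSketch(53.98, [0, 636.23], pixelRange);

// Domain taken from the extent of the data, as in the experiment.
var fromExtent = positionSketch(53.98, [53.98, 636.23], pixelRange);

console.log(fromZero);   // roughly 192, a little above the bottom of the graph
console.log(fromExtent); // 210, right on the bottom of the graph
```

With the zero-based domain the lowest point sits a few pixels above the x axis, while with the extent-based domain it sits right on it.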

Now, I'm not really advocating making a graph like this since I think it looks a bit nasty (and a casual observer might be fooled into thinking that the x axis was at 0). However, this would be a useful thing to do if the data was concentrated in a narrow range of values that are quite distant from zero.

For instance, if I change the data.tsv file so that the values are represented like the following;
Then it kind of loses the ability to distinguish between values around the median of the data.

But, if I put in our magic .extent function for the y axis and redraw the graph...
---------------------------------------------

The above description (and heaps of other stuff aimed at helping those with limited understanding, but plenty of desire to play with D3) is in the D3 Tips and Tricks document that can be accessed from the main page of d3noob.org.  


7 comments:

  1. How do we set range for irregular domain. Domain: [300, 580, 670, 740, 800, 850]. X-axis will have 12 months - starting from current month and goes back 12 months. For this irregular domain I'll need a equidistant grid lines.

    1. The good news is that D3 will totally take care of the gridlines and make the axis look great. The other good news is that D3 can also ingest your data simply. You could in theory do something as crude as;
      y.domain(d3.extent([300, 580, 670, 740, 800, 850]));
      Although this is untested, it should be that easy.

  2. This was a wonderful article helping me understand the basics. Thanks so much for putting the work into this!

  3. Hello, how can i show all dates in x axis , extent function shows some of them. I want to show all. Thanks for your interest.

    1. Hmm... If I read your question correctly, what you want to do is to show all the date ticks on the x axis. That isn't what the extent function is designed to do. Instead, play with the var xAxis = d3.svg.axis().scale(x).orient("bottom").ticks(5); line and vary the ticks value. Even then D3 will try to be clever about what it shows so that they don't overlap, so you might need to do some research on the d3.svg.axis function. Good luck.

  4. hello please i need help with d3 assignment, please can you assist?

    1. I am sorry, I don't have time to complete the tasks that I have on my plate at the moment. I couldn't commit to assisting. I would recommend using Stack Overflow for good troubleshooting assistance if you strike trouble. There are a bunch of really smart people there.
