Sunday, January 17, 2016

How to make Tufte's line graph sparklines with d3.js

In this blog post, I will cover how to make a sparkline using d3.js. Our final product will look like this:

For those who are unfamiliar with them, sparklines were first discussed by Edward Tufte, who defines them as follows:

A sparkline is a small intense, simple, word-sized graphic with typographic resolution. Sparklines mean that graphics are no longer cartoonish special occasions with captions and boxes, but rather sparkline graphics can be everywhere a word or number can be: embedded in a sentence, table, headline, map, spreadsheet, graphic. Data graphics should have the resolution of typography. -- Edward Tufte, Beautiful Evidence

The first step in creating our sparkline is to create our HTML template. Our `d3.js` script will then add elements where necessary, based on our script's specifications.

<html>
<meta charset='utf-8'>




    <link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
    <style>
    text {
        font-family: 'Open Sans', sans-serif;
    }
    .sparkline-container {
        position:absolute;
        top:0px;
        left:0;
        float:left;
        width:300px;
        height:100px;
    }
    .sparkcircle {
      fill: #f00;
      stroke: none;
    }
    path {
        stroke: #000;
        stroke-width: 0.35px;
        fill: none;
    }
    </style>



<body>


    <div id="graph" class="aGraph sparkline-container" style=""></div>


</body>


<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.12/d3.js" charset='utf-8'></script>

<script src='https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.8.3/underscore-min.js'></script>
<script>

$(document).ready(function() {



});
</script>

</html>

We will add our `d3.js` code inside the `$(document).ready(function() {});` callback near the bottom of the template. This wrapper simply ensures that our DOM is completely ready before the `d3.js` script runs.

First, let's define some test data. I'll go ahead and use the currency fluctuations of the Brazilian Real vs. the USD over the past month, which I have lying around from the sparkline dashboard project:

        var data = { 'rates.BRL': [3.7629, 3.853, 3.853, 3.853, 3.9045,
        3.8771, 3.9369, 3.877, 3.9004, 3.9004, 3.9004, 3.976,
        3.9827, 3.9677, 3.941, 3.941, 3.941, 3.941, 3.9257,
        3.852, 3.898, 3.9604, 3.9604, 3.9604, 3.9604, 4.0395,
        4.0036, 4.0338, 4.0487, 4.0487]}

Next, we define the dimensions of the box that contains our sparkline. While this may seem trivial, it is the essential element that sets a sparkline apart from a regular line graph -- it is a small visual representation of data. We will use this variable in a bit:

        var boxDimensions = {width: 300, height: 15};

Now, we create a skeleton of our sparkline object. We will have the side text which corresponds to the sparkline label, an svg element that will show the sparkline graph, and a little red circle that will highlight the sparkline's endpoint. Feel free to play around with including or not including the end circle -- I thought that it added to the visualization, but depending on where exactly you are using your sparkline, it might make sense to leave it out.

        d3.select('body')
          .selectAll('g')
          .data(d3.keys(data))
            .enter()
            .append('g')
            .classed('sparkline-container', function(d) {
                return true;
            })
            .style('top', function(d, i) { return (i + 1) * 35 + 'px';})
            .style('left', '500px')

          .append('text')
            // get last 3 characters since format is like rates.JPY
            .text(function(d) {return d.slice(-3);})
          .append('svg')
            .attr('id', function(d) { return 'graph' + d.slice(-3);})
          .append('circle')
            .attr('id', function(d) { return 'circle' + d.slice(-3);})
            .attr('r', 1.5)
            .classed('sparkcircle', true);

Let's break down that last chunk of code. In line 1, we select our body element, to which we will be appending the rest of our visualization. Then, we append one 'g' element per key in our data object, that is, one per time series. In this case, we only have one time series, but we could easily reuse this code to add numerous sparklines. In line 6, we make sure that our sparkline g elements have the `sparkline-container` class. Based on the styles that we defined, this gives them a certain width, height, and alignment. In line 9, we give each one a distance from the top that depends on `i`, its index in the data -- this ensures that our sparklines will not sit on top of each other if we have more than one.
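
To make the per-series values concrete, here is a minimal sketch in plain JavaScript of what the chain above computes for each key, without touching the DOM. `describeContainers` is a hypothetical helper written for this illustration (it is not part of `d3.js`), and the JPY values are made up:

```javascript
// Sketch only: mirrors the label, offset and ids that the d3 chain derives per key.
// `describeContainers` is a hypothetical name, not a d3 function.
function describeContainers(data) {
  return Object.keys(data).map(function (key, i) {
    var code = key.slice(-3); // 'rates.BRL' -> 'BRL'
    return {
      label: code,              // text shown next to the sparkline
      top: (i + 1) * 35 + 'px', // same offset as .style('top', ...)
      svgId: 'graph' + code,    // id given to the svg element
      circleId: 'circle' + code // id given to the end circle
    };
  });
}

var containers = describeContainers({
  'rates.BRL': [3.7629, 3.853],
  'rates.JPY': [130.1, 129.8] // made-up values, for illustration only
});
```

With two series, the second container ends up 70px from the top, so the rows do not overlap.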

You'll notice that so far, the svg doesn't have any line elements within it, and the circle has neither a `cx` nor a `cy` attribute, so neither of these elements will render anything visible yet. Let's make a function that will actually draw our sparklines and end circles for us!

         function drawSparkline(svgElemId, circleElemId, data) {

            // create a nested SVG element inside the element matched by svgElemId, filling 100% of it
        var graph = d3.select(svgElemId)
            .append("svg:svg")
            .attr("width", "100%")
            .attr("height", "100%");


        // X scale maps indices 0-60 onto pixels 0-width (our 30 points fill the left half)
        var x = d3.scale.linear()
            .domain([0, 60])
            .range([0, boxDimensions.width]);
        // Y scale maps the extent of our currency fluctuations dataset onto pixels
        // 0-height; SVG's y axis points down, so larger values are drawn lower
        var y = d3.scale.linear()
            .domain(d3.extent(data, function(d) { return d; }))
            .range([0, boxDimensions.height]);

        // create a line object that represents the SVG line we're creating
        var line = d3.svg.line()
            // Makes our plots smooth
            .interpolate('monotone')
            // assign the X function to plot our line as we wish
            .x(function(d,i) {
                return x(i);
            })
            .y(function(d) {
                return y(d);
            });

            // display the line by appending an svg:path element with the data line we created above
            graph.append("svg:path").attr("d", line(data));

        var circle = d3.select(circleElemId)
            .attr('cx', function(d, i) {return x(29); })
            .attr('cy', function(d) {
                /* use the last value itself; data.slice(-1) would return a one-element array */
                return y(data[data.length - 1]); });

        }

The key part of this chunk of code is lines 21-30 -- these create a line generator that represents the SVG line that we are drawing. It uses the x and y scale functions that we defined previously; we then display the line by appending an svg:path. Finally, lines 35-39 give our circle `cx` and `cy` attributes, which define where the center of the circle should be. Our `cx` is fixed at index 29, since we always want the circle to be drawn at the last of our 30 data points, and our `cy` is the scaled value of the last element in our data set.
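
If the scales feel opaque, the following sketch shows roughly what a linear scale does, in plain JavaScript with no `d3.js` dependency. `linearScale` is a hypothetical stand-in written for this post, not the library's implementation, and the box dimensions are assumed to be 300 by 15 pixels:

```javascript
// Hypothetical stand-in for d3.scale.linear().domain(domain).range(range):
// maps a value from the domain interval onto the range interval.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (value) {
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0);
  };
}

var boxDimensions = { width: 300, height: 15 }; // assumed dimensions
var series = [3.7629, 3.853, 4.0487];           // a few values from the BRL data

var x = linearScale([0, 60], [0, boxDimensions.width]);
var y = linearScale(
  [Math.min.apply(null, series), Math.max.apply(null, series)],
  [0, boxDimensions.height]
);

var cx = x(29);                        // horizontal position of index 29
var cy = y(series[series.length - 1]); // vertical position of the last value
```

Here the last value is also the largest, so `cy` lands on the bottom edge of the box: with a `[0, height]` range, larger values map to larger pixel offsets, and SVG's y axis points down.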

Now that we have this handy sparkline function, let's call it for each data element!

 
        $.each(data, function(country_name, country_series){

            drawSparkline('#graph' + country_name.slice(-3), '#circle' + country_name.slice(-3),data[country_name]);
        });

And we do a tiny bit of visual cleanup to ensure that the labels and the SVGs are not on top of each other:

        $('svg').css({left: 45, position: 'absolute'});

Which gives us a working sparkline.

Check out the full example on GitHub. If you're interested in sparklines, you can read about creating a dashboard that shows numerous sparklines for comparative purposes here. You can learn more about sparklines by reading Tufte's Beautiful Evidence.

How to make a dashboard with Tufte's sparklines in d3.js

Thus far, we've used traditional charts to visualize our data. We will now be adding a visualization that is less common: a sparkline. What is a sparkline?

A sparkline is a small intense, simple, word-sized graphic with typographic resolution.
Sparklines mean that graphics are no longer cartoonish special occasions with captions and boxes,
but rather sparkline graphics can be everywhere a word or number can be: embedded in a sentence,
table, headline, map, spreadsheet, graphic. Data graphics should have the resolution of typography.
-- Edward Tufte, Beautiful Evidence

Sparklines can be very handy for quickly analyzing several time series at a time. In this blog post, I'll explain how you can create a dashboard using `d3.js` that visualizes sparklines of currency fluctuations. This dashboard not only visualizes currency fluctuations over the past month for major currencies, it also updates in real time.

You can view a live version of the dashboard at http://sparklines-dash.rowanv.com.

Why Visualize Currency Exchange Rates using Sparklines?

So first off, what prompted me to make this dashboard? I was looking at the exchange rates put out by the European Central Bank. The ECB publishes exchange rates for a number of major currencies each day. However, their interface misses out on a great opportunity to use sparklines:

While they show a small arrow that illustrates the derivative of each of the major currencies, their dashboard does not show the currency fluctuations over time. In order to view this type of a graph, one has to click through to the individual graphs by clicking on the 'Chart' icon. This prevents quick, side-by-side comparisons of the currency fluctuations over time. A quick peek at last month's currency fluctuations would be a great visualization for using sparklines in order to quickly compare the shapes of the time series. While I don't think that they completely replace the chart tool that the ECB has published, they'd be a nice complement to the data that they provide.

Getting the Data:

This merits its own blog post. For now, you can check out my iPython notebook here for a quick overview of the process.

Making the d3.js visualization:

After the data acquisition and cleaning process, I had a view within my web application which served a JSON response of my currency data. This data included time series for the past month for each of the major currencies. If we visualize the data in a tabular format, it would look like this:

Each column corresponded to a currency type, and each row corresponded to a day of observations. With this data in hand, I created a basic HTML document that would read the data within d3:

<html>
<meta charset='utf-8'>


    
    <link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'> <!--#1-->
    <style>
    text {
        font-family: 'Open Sans', sans-serif;
    }
    /* #2 */
    .sparkline-container {
        position:absolute;
        top:0px;
        left:0;
        float:left;
        width:300px;
        height:100px;
    }
    /* #3 */
    .sparkcircle {
      fill: #f00;
      stroke: none;
    }
    /* #4 */
    path {
        stroke: #000;
        stroke-width: 0.35px;
        fill: none;
    }
    </style>


<body>
    <!--#5-->
    <div id="graph" class="aGraph sparkline-container" style=""></div>

</body>


<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.12/d3.js" charset='utf-8'></script>

<script src='https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.8.3/underscore-min.js'></script>
<script>
// # 6
$(document).ready(function() {


    d3.json('/data/not_realtime/currency_rates_month/', function(error, data) {
 
    });
});

</script>


</html>

Let's highlight some key things that our template is doing:
# 1: We specify the font that we will be using for the text next to our sparklines
# 2: We specify the width and height of our sparklines container
# 3: We specify how we'd like to format the little red circles at the end of the sparklines
# 4: Our sparklines will be 0.35px thick and use a black stroke colour
# 5: This is the actual div element which we will be using as our starting point for the visualization. We'll select this element and then attach all other visual elements to it.
# 6: We write our actual `d3.js` script between these `script` tags. So far, we've used a jQuery helper function to keep our `d3.js` script from running until the DOM is ready for it, and we've requested our data set from the URL at `/data/not_realtime/currency_rates_month/`, where it will be served as a JSON response.

Actually Drawing Our First Sparkline

Now that we have our HTML template, we can draw our first sparkline. You can read through an example of how to draw an individual sparkline in my sparkline blogpost.

Reshaping Our Data
Based on the way that we have written our sparkline visualization function, it would be useful to reshape our data. It will be easiest to work with if we have a single object in which each key is a currency name (such as `rates.JPY`) and each value is a list of time series values. We do some reshaping:

$(document).ready(function() {


    d3.json('/data/not_realtime/currency_rates_month/', function(error, data) {
        var boxDimensions = {width: 300, height: 15}

        // cleaning data
        // loop over each currency's series in the parsed JSON object
        $.each(data, function(country_name, country_series){

            var data_list = [];
            for (var i = 0; i < 30; i++) {

                data_list.push(country_series[i]);
            }
            // flip it so the sparkline will have the oldest dates on the left
            // and the newest dates on the right
            data[country_name] = data_list.reverse();
        });

    });
});
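
The reshaping step can also be tried on its own, outside the browser. The sketch below is a plain JavaScript version that does not need jQuery or underscore; `reshapeSeries` is a hypothetical helper, and the input is made up under the assumption that the response lists the newest observation first:

```javascript
// Hypothetical helper: keep the first `n` observations of each series and
// reverse them, so the oldest value comes first (left) and the newest last (right).
function reshapeSeries(raw, n) {
  var reshaped = {};
  Object.keys(raw).forEach(function (name) {
    var values = [];
    for (var i = 0; i < n; i++) {
      values.push(raw[name][i]);
    }
    reshaped[name] = values.reverse();
  });
  return reshaped;
}

// Made-up input for illustration, newest observation first
var sample = {
  'rates.BRL': [4.05, 4.03, 4.0],
  'rates.JPY': [130, 131, 132]
};
var reshaped = reshapeSeries(sample, 3);
```

Unlike the snippet above, this version returns a new object instead of overwriting `data` in place, which makes it easier to check the result against the input.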

Next, let's start adding our visual elements right after the `$.each` cleaning loop in the last code snippet, still inside the `d3.json` callback. We can add our container divs for our set of sparklines. For each sparkline, this includes a general container div, a text element, an svg element, and a circle element:

        // Add container divs:

        d3.select('body')
          .selectAll('div')
          .data(d3.keys(data))
            .enter()
            .append('div')
            .classed('sparkline-container', function(d) {
                console.log(d);
                return true;
            })
            .style('top', function(d, i) { return i * 35 + 'px';})
            .style('left', '500px')

          .append('text')
            // get last 3 characters since format is like rates.JPY
            .text(function(d) {return d.slice(-3);})
          .append('svg')
            .attr('id', function(d) { return 'graph' + d.slice(-3);})
          .append('circle')
            .attr('id', function(d) { return 'circle' + d.slice(-3);})
            .attr('r', 1.5)
            .classed('sparkcircle', true);

Let's break down this code snippet. This code snippet basically creates the skeleton for our visualization -- it creates the text and a set of elements that will then be 'filled in' by the `drawSparkline` function in the following code snippet. In line 3, we select the body element, to which we will be appending our sparkline elements. We then add a set of divs in lines 4-13, with one div for each sparkline, since we have bound it to our dataset. Each div has a `sparkline-container` class, is offset from the left, and is offset from the top by a variable amount so that the sparklines are not all on top of each other.

In lines 15-17, we append the 3-character labels that represent the currency codes. Then, in lines 18-23, we add the svg containers for our sparklines and the elements that will become our little red circles.

Putting it all together:

Now that we have our visualization skeleton, we can use the `drawSparkline` function to actually append our sparklines:

         function drawSparkline(svgElemId, circleElemId, data) {

            // create a nested SVG element inside the element matched by svgElemId, filling 100% of it
        var graph = d3.select(svgElemId)
            .append("svg:svg")
            .attr("width", "100%")
            .attr("height", "100%");


        // X scale maps indices 0-60 onto pixels 0-width (our 30 points fill the left half)
        var x = d3.scale.linear()
            .domain([0, 60])
            .range([0, boxDimensions.width]);
        // Y scale maps the extent of our currency fluctuations dataset onto pixels
        // 0-height; SVG's y axis points down, so larger values are drawn lower
        var y = d3.scale.linear()
            .domain(d3.extent(data, function(d) { return d; }))
            .range([0, boxDimensions.height]);

        // create a line object that represents the SVG line we're creating
        var line = d3.svg.line()
            // Makes our plots smooth
            .interpolate('monotone')
            // assign the X function to plot our line as we wish
            .x(function(d,i) {
                return x(i);
            })
            .y(function(d) {
                return y(d);
            });

            // display the line by appending an svg:path element with the data line we created above
            graph.append("svg:path").attr("d", line(data));

        var circle = d3.select(circleElemId)
            .attr('cx', function(d, i) {return x(29); })
            .attr('cy', function(d) {console.log('circle');
                /* And this is really helpful for figuring out what is going on */
                console.log(svgElemId);
                console.log(data);
                console.log(data[data.length - 1]);
                return y(data[data.length - 1]); });

        }

Lines 11 to 23 cover the same process that we would follow to add a line graph, except that we are making each graph tiny since it is a sparkline. Then in lines 35 to 42, we give our circle elements `cx` and `cy` attributes -- these tell us where the center of each circle will be. Without these attributes, our circles would not show up.
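
To see what ends up in the path's `d` attribute, here is a rough sketch in plain JavaScript that builds a straight-segment path string from a series. `sparklinePath` is a hypothetical helper; `d3.svg.line()` with `.interpolate('monotone')` emits curve commands rather than the plain `L` segments used here:

```javascript
// Hypothetical helper: build the `d` attribute for a straight-segment sparkline.
// maxIndex plays the role of the upper bound of the x scale's domain.
function sparklinePath(values, width, height, maxIndex) {
  var min = Math.min.apply(null, values);
  var max = Math.max.apply(null, values);
  var span = max - min || 1; // avoid dividing by zero for a flat series
  return values.map(function (v, i) {
    var px = i * width / maxIndex;       // index -> horizontal pixel
    var py = (v - min) * height / span;  // value -> vertical pixel
    return (i === 0 ? 'M' : 'L') + px + ',' + py;
  }).join('');
}

var d = sparklinePath([1, 3, 2], 300, 15, 60);
// d is "M0,0L5,15L10,7.5"
```

The first point is a move command (`M`) and every later point is a line command (`L`), which is the same structure a line graph's path has at any size.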

Finally, we call this function in order to draw our set of sparklines:

          // Then add sparklines and dots for circles
        $.each(data, function(country_name, country_series){

            drawSparkline('#graph' + country_name.slice(-3), '#circle' + country_name.slice(-3),data[country_name]);
        });
        // And move the SVGs by a bit so they are not on top of the
        // text boxes
        $('svg').css({left: 45, position: 'absolute'});

Check out the live example, or the GitHub repository with the complete codebase.

Thursday, January 7, 2016

Visualizing Big Data with d3.js

You're creating a dashboard or other visualization of Big Data in `d3.js`. You've heard that SVG-based visualizations can experience a performance hit as they render more and more elements.
Perhaps you've even hit the object threshold where you are starting to notice significant lags. Where to now? How do you build your visualization to ensure that it scales?

Simplify your data/dashboard by aggregating. Are all of the elements that you are rendering necessary? Make sure that each pixel counts and that an aggregate data view wouldn't more effectively convey your point. For example, Google Maps successfully manages to visualize a large quantity of data. However, they don't just throw all of their data at you -- depending on the zoom level that you choose, you will be presented with a certain amount of map detail. This allows for a better and more responsive visual experience than if they did not have aggregated data as part of their visualization.
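
As a small illustration of aggregating before rendering, the sketch below collapses a long series into a handful of bucket averages, so the chart draws a few elements instead of one per observation. `aggregateMeans` is a hypothetical helper, and averaging is only one of several reasonable aggregates:

```javascript
// Hypothetical helper: collapse a series into roughly `bucketCount` averages.
function aggregateMeans(values, bucketCount) {
  var size = Math.ceil(values.length / bucketCount); // observations per bucket
  var means = [];
  for (var start = 0; start < values.length; start += size) {
    var bucket = values.slice(start, start + size);
    var sum = bucket.reduce(function (a, b) { return a + b; }, 0);
    means.push(sum / bucket.length);
  }
  return means;
}

var means = aggregateMeans([1, 3, 5, 7, 9, 11], 3);
// means is [2, 6, 10]
```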

Simplify your data/dashboard by sampling. Try sampling a subset of your dataset and visualizing that -- that is often enough to perceive the general shape of your data.
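
One simple way to sample is to keep every k-th point, as in the sketch below. `sampleEvery` is a hypothetical helper; random or stratified sampling may suit some datasets better than this evenly spaced approach:

```javascript
// Hypothetical helper: keep evenly spaced points so at most `maxPoints` remain.
function sampleEvery(values, maxPoints) {
  if (values.length <= maxPoints) {
    return values.slice(); // nothing to drop
  }
  var step = Math.ceil(values.length / maxPoints);
  return values.filter(function (_, i) {
    return i % step === 0;
  });
}

var sampled = sampleEvery([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 4);
// sampled is [0, 3, 6, 9]
```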

Try to move some of the rendering server-side. Normally, the bulk of rendering is done client-side with `d3.js`. However, you can generate an HTML document of part of your visualization, save it in your server's memory, and then inject it into your client's visualization, where the chunk that your server rendered appears together with the rest of the visualization that your client renders. For more details, check out a tutorial here.

Try a hybrid canvas/svg approach. This gives you some of the benefits of the SVG, while still avoiding the performance constraints of too many SVG objects.

Change tools. At some point, the performance hit of creating and manipulating your DOM will be significant enough that the other optimization techniques will not be sufficient. The weakest point in the `d3.js` stack is your browser -- change your rendering process by hooking up `d3.js` to Google BigQuery. You can check out this link for more info.

Do you have other tips and tricks that you've used to visualize big data with `d3.js`? Have another stack that you recommend? Tell us about it in the comments below!

Wednesday, January 6, 2016

Using Tufte's discrete sparklines to monitor Nginx status codes

Since I manage several of my own sites end-to-end, including configuring and running their servers, it is important for me to know what the sites are up to. This includes knowing when they may be down, as well as checking out any error codes that may be generated. I went ahead and created this simple dashboard vignette:

to keep track of my `Nginx` response status codes. I can see if there are any error codes that have been generated (those in the 400s or the 500s), and act accordingly.

Why did I go ahead and make my own dashboard? First off, reading through Nginx logs in their raw form is a pain. For those who haven't worked with them before, this is what they look like:


180.76.15.141 - - [04/Jan/2016:07:42:26 -0500] "GET / HTTP/1.1" 200 36005 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.147 - - [04/Jan/2016:07:43:33 -0500] "GET / HTTP/1.1" 200 36005 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
91.65.255.241 - - [04/Jan/2016:07:50:31 -0500] "GET /portfolio/resume HTTP/1.1" 301 5 "http://rowanv.com/portfolio/about/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"

Bleh. So I decided to go out and hunt for some existing visualizations, none of which seemed too great. They seemed far too cluttered with chart junk and lacked customizability. Here is an example of an existing dashboard element that aims to visualize something similar to my final visualization:

I found this graph fairly distracting, and for some reason the non-2xx status codes didn't show up on here. Also, any spikes in traffic would make it very hard to read -- if we had hundreds of requests that returned a 200 response, but only a handful of requests that returned a 400 or 500 response, it would be very difficult to see the 400s and 500s because they are graphed on the same axis. I didn't just want to see how many 200 responses occurred -- I wanted visual confirmation that 400s and 500s were not occurring. Besides, I didn't really care how many responses of each type were occurring -- just the presence or absence of error responses was what mattered to me, so a simple discrete chart would abstract away a lot of the distracting additional noise. It could be useful to have error rates / total requests available as well, but I think that would be best visualized in another chart.

Along the way, I also found a few completely text-based interfaces to monitor one's servers. I found these completely text-based outputs difficult to parse at a glance. In the end, creating a custom dashboard made the most sense for my needs.

The process was fairly straightforward. First, I located my Nginx access logs and wrote a Python script to read them into a pandas DataFrame, which is more manageable than their native format. Then, I aggregated and reshaped the dataframe to get a dataset that would identify which status codes had occurred over the course of each hour. This left me with a dataframe that looked like this:

Each row consisted of an hour; each column identified how many times each type of status code (one in the 200s, one in the 300s, etc.) had occurred over the course of that hour.

Then, I used Flask to serve an HTTP response within a web application which served my reshaped dataset in a JSON format. This would enable my front-end view to locate the data and read it in. Here's my basic view code that serves my HTTP response:

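import re
import datetime
import pandas as pd
from flask import Flask, Response

# Assumed setup (not shown in the original post): create the Flask app object
app = Flask(__name__)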

@app.route('/nginx_dash/data/status_code_hourly/')
def status_code_hourly_data():


    log_path = 'access.log'

    conf = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
    regex = ''.join(
        '(?P<' + g + '>.*?)' if g else re.escape(c)
        for g, c in re.findall(r'\$(\w+)|(.)', conf))

    conf_list = []

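    # the with block closes the log file once we are done reading it
    with open(log_path, 'r') as f:
        for line in f:
            parsed_string = re.match(regex, line)
            # skip lines that do not match the expected log format
            if parsed_string:
                conf_list.append(parsed_string.groupdict())
    df = pd.DataFrame(conf_list)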

    df['date'] = df['time_local'].apply(lambda x: datetime.datetime.strptime(x[:11], '%d/%b/%Y'))
    df['date_time'] = df['time_local'].apply(lambda x: datetime.datetime.strptime(x[:20], '%d/%b/%Y:%H:%M:%S'))
    df['status'] = df['status'].apply(lambda x: str(x))
    df['is_100'] = df['status'].apply(lambda x: x[0] == '1')
    df['is_200'] = df['status'].apply(lambda x: x[0] =='2')
    df['is_300'] = df['status'].apply(lambda x: x[0] =='3')
    df['is_400'] = df['status'].apply(lambda x: x[0] == '4')
    df['is_500'] = df['status'].apply(lambda x: x[0] == '5')
    df['date_hour'] = df['date_time'].apply(lambda dt: dt.replace(minute=0, second=0))
    df_status_codes = df[['date_hour', 'is_100', 'is_200', 'is_300', 'is_400', 'is_500']]
    df_status_codes_grouped = df_status_codes.groupby('date_hour').sum()
    df_status_codes_grouped['date_hour'] = df_status_codes_grouped.index
    json_response = df_status_codes_grouped.to_json(orient='records')
    return Response(response=json_response,
                    status=200,
                    mimetype='application/json')

Finally, I wrote the frontend in `d3.js` which actually renders the discrete sparklines. You can check out my previous tutorial on making discrete sparklines to learn how to make this type of a visualization.
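
Since only the presence or absence of each status class matters for a discrete sparkline, the frontend can reduce the hourly counts to flags before drawing anything. The sketch below shows one way to do that in plain JavaScript; `toPresence` is a hypothetical helper and the records are made-up examples shaped like the JSON that the view above returns:

```javascript
// Hypothetical helper: turn hourly counts into presence/absence flags.
function toPresence(records) {
  var classes = ['is_100', 'is_200', 'is_300', 'is_400', 'is_500'];
  return records.map(function (record) {
    var flags = { date_hour: record.date_hour };
    classes.forEach(function (key) {
      flags[key] = record[key] > 0; // true if at least one response of this class occurred
    });
    return flags;
  });
}

// Made-up records for illustration; date_hour values are epoch milliseconds
var hourly = [
  { is_100: 0, is_200: 42, is_300: 3, is_400: 0, is_500: 0, date_hour: 1451908800000 },
  { is_100: 0, is_200: 17, is_300: 0, is_400: 2, is_500: 0, date_hour: 1451912400000 }
];
var presence = toPresence(hourly);
```

In the second record, two 4xx responses among seventeen 2xx responses still produce a clear `is_400: true` flag, which is exactly the signal that gets lost when both are plotted on a shared axis.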

And that's it! I now have an awesome, live-updating, minimalist visualization that lets me know if any concerning error codes have occurred over the last couple of days.

If you want to learn more about sparklines, check out Tufte's Beautiful Evidence.

Tuesday, January 5, 2016

To scroll or not to scroll -- changing the rules of the game for dashboard UX

I was rereading Few's Information Dashboard Design recently and was struck by one of his "13 key mistakes in dashboard design":

Exceeding the Boundaries of a Single Screen

My insistence [is] that a dashboard should confine its display to a single screen, with no need for scrolling or switching between multiple screens...

-- Few, Information Dashboard Design, Ch. 3.1

First, what does this mean in a post-mobile revolution world? Many devices no longer have screens big enough to fit a traditional dashboard. Refusing to make online applications mobile compatible is simply no longer an option given the ubiquity of devices. What are we left with? We need to find a way to provide meaningful data for our dashboard viewers without necessitating side-by-side visual comparison. We need to use dynamic, flexible layouts such that when viewers are on a smaller device, their screen renders content in a way that best uses their screen real estate. Finally, we need to make sure that we are presenting data in ways that allow for meaningful comparisons to be made. Given the plethora of data manipulation libraries, we have no excuse not to slice, dice, and aggregate the underlying data to facilitate viewers' analysis.

What about the cases where viewers are using devices that do have sufficient screen real estate to make a meaningful one-page layout? It used to be the case that many well-designed web pages also followed this steadfast rule: keep your content to a single page to avoid the burden of having your user scroll down the page. Additionally, if we had too much content on one page, it took a long time for content to load which created a sub-par experience.

Now that we have the ability to widely make asynchronous requests (e.g. with Ajax), this is not as much of an issue. Users are also used to web applications that make use of infinite scrolling -- scrolling is a more natural part of the online user experience than it was even 5 years ago. To put it in the language that Donald Norman uses in The Design of Everyday Things: the nature of the browser environment's affordances has changed. Browser users have always been able to scroll through a web application. Nowadays, though, browser users are aware that they can scroll easily, and use this as a primary method to interact with web applications, whereas before it was a secondary method.

Does this mean that we should let our dashboards grow into multi-page mammoths? No. The point of a dashboard should still be to succinctly provide informative data. There is more justification, though, to think about extending the dashboard boundary beyond the edge of the page when it is necessary.

How to make a real-time updating map for your dashboard with JS

It is often useful to be able to quickly analyze events that have a geographic component. A real-time updating map that shows events that occur at certain locations on the map can be very handy for a dashboard. While it is possible to create these maps from scratch using `d3.js`, it can be useful to use some higher-level abstractions to quickly generate our maps. I'll take you through the process of creating an interactive map like this:

Check out the interactive version here

by using `DataMaps`, a library that uses `d3.js` to produce interactive maps (you can check out the library's documentation here). Rather than needing to source your geo data from scratch and process it using command line tools like `ogr2ogr` and `topojson`, then render your polygons and boundaries, and ultimately add labels and locations to your map, I'll go through a more agile process. In addition, we'll go ahead and query a live-updating JSON data source within our visualization script. This will allow our data visualization to fetch the latest version of the earthquake data each time that it is rendered.

First, let's create the HTML structure for our visualization:

<html>
<head>
<meta charset='utf-8'>


</head>




<body>
    <div id="map_container" style="position: relative; width: 80%; max-height: 450px;"></div>
</body>

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.12/d3.js" charset='utf-8'></script>
<script src="http://d3js.org/topojson.v1.min.js"></script>
<script src="http://datamaps.github.io/scripts/datamaps.world.min.js?v=1"></script>


<script>

</script>



</html>

We've created a div element onto which our script will bind SVG objects to create our map. We've also included the dependencies that we will be needing (`jQuery`, `d3.js`, `topojson` and `DataMaps`). We'll now start filling in the script tags on lines 21-23:

$(document).ready(function() {

    d3.json("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.geojson",function(error,data){
        data = data.features;

      var map = new Datamap({
        scope: 'world',
        element: document.getElementById('map_container'),
        projection: 'equirectangular',
        height: 500,
        fills: {
          defaultFill: '#F1EBF4',
          lt50: 'rgba(0,244,244,0.9)',
          gt50: '#20438A',
        },
        geographyConfig: {
          highlightFillColor: '#bfdfec',
          highlightBorderColor: 'white',
        },
        bubblesConfig: {
          highlightFillColor: '#2265aa',
          highlightBorderColor: '#61b3b7',
        }
      });

     
 });
});
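One thing the callback above does not do is look at its `error` argument. As a minimal sketch (the helper name `extractFeatures` is my own and is not part of `d3.js` or `DataMaps`), a guard like this could sit at the top of the callback, so that a failed request or an unexpected payload yields an empty list rather than an exception on `data.features`:

```javascript
// Hypothetical helper: takes the (error, data) pair that d3.json passes to
// its callback and returns the array of GeoJSON features, or [] on failure.
function extractFeatures(error, data) {
  if (error) {
    // The request failed (network problem, bad status code, invalid JSON...).
    console.error('Could not load the earthquake feed:', error);
    return [];
  }
  if (!data || !Array.isArray(data.features)) {
    // The response parsed, but it is not shaped like a GeoJSON FeatureCollection.
    console.error('Unexpected feed format: no "features" array');
    return [];
  }
  return data.features;
}

// Inside the callback, this would replace `data = data.features;`:
//   data = extractFeatures(error, data);
var loaded = extractFeatures(null, { features: [{ id: 'a' }, { id: 'b' }] });
var failed = extractFeatures(new Error('request failed'), undefined);
```

With an empty list the map still renders; it simply has no bubbles to draw.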

We wrap our entire script in a document ready function to ensure that it doesn't run until our DOM is ready. Then, we use the `d3.json` function to read in our data from the USGS earthquake GeoJSON feed. For more information about this feed, check out their docs here. We then create a variable called `map`, which is a new `Datamap` object. This object renders our map based on the configuration that we pass to it. This is great so far, but it does not include the bubbles that indicate where recent earthquakes have occurred. To add them, we'll insert another set of functions at lines 25-26:

      var bubble_array = [];

      function getBubble(lat, lon, name){
        var bubble = {
            name: name,
            latitude: lat,
            longitude: lon,
            radius: 10,
            fillKey: 'gt50',
        }
        bubble_array.push(bubble);
        return bubble;
      }

      for (var i = 0; i < data.length; i++) {
          getBubble(data[i].geometry.coordinates[1],
            data[i].geometry.coordinates[0],
            data[i].properties.title);
        }

       //bubbles, custom popup on hover template
     map.bubbles(bubble_array, {
       popupTemplate: function(geo, data) {
         return "<div class='hoverinfo'>" + data.name + "</div>";
       }
     });

Line 1: We create an array that will hold the data for the earthquake bubbles that we will add to the map.
Lines 3-13: We create a function that creates a bubble object with the attributes that we need in order to place it on the map. This includes the latitude and longitude, which our `Datamap` object will use to plot the bubbles in their correct location when we pass the array to `map.bubbles()`.
Lines 15-19: We iterate through our data set, calling the `getBubble` function at each data point. This pushes a `bubble` item onto our array with the correct data for each point.
Lines 22-26: We finally pass the `bubble_array` to `map.bubbles`. We also specify the `popupTemplate` option, which tells our map what to render when we hover over a bubble.
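The snippet above gives every bubble the same radius and fill. As a rough sketch of one possible variation, the same bubble objects can be built with `Array.prototype.map`, sizing each bubble by the earthquake's magnitude. This assumes each feature carries a numeric `properties.mag`, as the USGS feed documents; the helper names, the scaling factor, and the magnitude-5 cut-off are my own choices for illustration, and the two features below are made-up sample data rather than real earthquakes. The `lt50` and `gt50` fill keys are the ones already defined in the map configuration.

```javascript
// Hypothetical helper: turn a magnitude into a bubble radius in pixels.
function radiusForMagnitude(mag) {
  // Some feed entries may have no magnitude; fall back to a small fixed radius.
  if (typeof mag !== 'number' || isNaN(mag)) {
    return 2;
  }
  return Math.max(2, mag * 2);
}

// Hypothetical helper: build one bubble object from one GeoJSON feature.
function featureToBubble(feature) {
  var coords = feature.geometry.coordinates; // GeoJSON order: [longitude, latitude, depth]
  var mag = feature.properties.mag;
  return {
    name: feature.properties.title,
    latitude: coords[1],
    longitude: coords[0],
    radius: radiusForMagnitude(mag),
    // Arbitrary cut-off at magnitude 5, chosen only for this sketch.
    fillKey: (typeof mag === 'number' && mag >= 5) ? 'gt50' : 'lt50'
  };
}

// Made-up sample features in the same shape as the feed's `features` array.
var sampleFeatures = [
  { geometry: { coordinates: [-122.5, 38.5, 10] },
    properties: { title: 'Sample quake A', mag: 6 } },
  { geometry: { coordinates: [140.25, 35.75, 30] },
    properties: { title: 'Sample quake B', mag: 1.5 } }
];

var bubble_array = sampleFeatures.map(featureToBubble);
```

In the real script, `sampleFeatures` would be the `data` array from the feed, and `bubble_array` would be passed to `map.bubbles` exactly as before.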

And we're done! You can see the full code snippet on GitHub here and the live example here.

Monday, January 4, 2016

How to make Tufte's discrete sparklines using d3.js

I've been working on a dashboard and found a data set that would look great represented as a discrete sparkline. This type of visualization is great for quickly showing trends at a glance in discrete variable datasets. For those who are not familiar with them, discrete sparklines look like this:

This example is from Tufte's site, and was made in collaboration with Adam Schwartz. The presence of a horizontal line represents a home game.

I decided to use his example as a starting point, working from the HTML file in Adam's GitHub, which he kindly passed along. For my purposes, I only needed to illustrate a dichotomous variable, so I removed the horizontal bars that represented home vs. away games, as well as the colouring that represented winning streaks. After recoding it to leverage the `d3.js` library, simplifying the styles, and reworking the code to make it reusable for future dichotomous-variable projects, I was able to create my own example using d3, which you can see below Adam's:

Here's the relevant d3 code:

Let's break it down further.
Lines 7 - 10: We bind each data point to a div element
Line 11: We give each div the 'drawn-line' class, which formats each div element as a line (black background colour, 1px by 8px).
Lines 12-19: We then give it an HTML5 data attribute called data-binary-var which will be a 0 if the game was lost and a 1 if the game was won.
Line 20: Finally, we use our counter variable i to correctly place the lines along the x axis.
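The steps above can be sketched without the DOM. This is not the code from the post, only an illustration of the logic it describes: the helper name `sparklineTicks`, the `spacing` parameter, and the sample results are all assumptions of mine. Each data point becomes a description of one div: its class, its `data-binary-var` value, and a horizontal offset derived from the index `i`.

```javascript
// Hypothetical helper: describe one "tick" div per data point.
// `outcomes` is an array of truthy (won) / falsy (lost) values,
// `spacing` is the assumed horizontal distance between ticks in pixels.
function sparklineTicks(outcomes, spacing) {
  return outcomes.map(function (won, i) {
    return {
      className: 'drawn-line',      // styles the div as a thin black line
      binaryVar: won ? 1 : 0,       // value for the data-binary-var attribute
      left: (i * spacing) + 'px'    // position along the x axis from the index
    };
  });
}

// Made-up sample results: 1 = game won, 0 = game lost.
var ticks = sparklineTicks([1, 0, 1, 1, 0], 3);

// With d3 loaded, descriptions like these could be applied to a selection of
// divs using selection.attr('class', ...), selection.attr('data-binary-var', ...)
// and selection.style('left', ...).
```

How the 0 and 1 values are then styled (for example, shifting wins upward) is left to the CSS rules that target `data-binary-var`.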

And here's our final product!

You can check out the full example I've written up on GitHub.

If you want to learn more about sparklines, check out Tufte's Beautiful Evidence. Thanks as well to Adam Schwartz for creating the original example that inspired this recreation - you can check out more of his work here and follow him on Twitter here.

And here's some of the work in progress for my dashboard sparkline visualization element which prompted this whole exploration:

Saturday, January 2, 2016

Make great-looking d3.js charts in Python without coding a line of JavaScript

d3.js is an excellent visualization library because it gives a developer control over every aspect of a visualization. However, that power can also be a problem -- sometimes, we just want to create a set of standard bar and line charts that look good, without spending hours fine-tuning each chart. This is where nvd3 comes in: it is a library that offers a set of reusable chart presets built on d3.js. Though you still have some control over the output, you can move more quickly by reusing the visualization types that the library supports.

As a Pythonista, I've particularly enjoyed using Python-nvd3 -- a great library that offers a Python wrapper for the nvd3.js interface. You can create great-looking d3.js charts without ever touching a line of JavaScript. I'll go through a quick bar chart example.

We use initialize_javascript so that we can render our plots within the IPython notebook, and np.random.seed so that we can reproduce our results.

In [2]:

import numpy as np
import nvd3


nvd3.ipynb.initialize_javascript(use_remote=True)
np.random.seed(100)

Next, we initialize our discreteBarChart object with our desired chart configuration. We pass it our y and x data series, making sure to convert the values from numpy.float64 to native Python floats and ints first, since nvd3 cannot work with unconverted numpy types at this point.

In [3]:

chart_type = 'discreteBarChart'
chart = nvd3.discreteBarChart(name=chart_type, height=500, width=500)

ydata = [float(x) for x in np.random.randn(10)]
xdata = [int(x) for x in np.arange(10)]

chart.add_serie(y=ydata, x=xdata)
chart.buildhtml()
chart_html = chart.htmlcontent

We can see the output of the rendered HTML -- it has created some JS and HTML elements:

In [4]:

chart_html

Out[4]:

'<!DOCTYPE html>\n<html lang="en">\n    <head>\n        <meta charset="utf-8" />\n        <link href="https://cdnjs.cloudflare.com/ajax/libs/nvd3/1.7.0/nv.d3.min.css" rel="stylesheet" />\n        <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script>\n        <script src="https://cdnjs.cloudflare.com/ajax/libs/nvd3/1.7.0/nv.d3.min.js"></script>\n    </head>\n    <body>\n        \n    <div id="discretebarchart"><svg style="width:500px;height:500px;"></svg></div>\n\n\n    <script>\n\n\n\n                data_discretebarchart=[{"key": "Serie 1", "values": [{"x": 0, "y": -1.7497654730546974}, {"x": 1, "y": 0.34268040332750216}, {"x": 2, "y": 1.153035802563644}, {"x": 3, "y": -0.25243603652138985}, {"x": 4, "y": 0.9813207869512316}, {"x": 5, "y": 0.5142188413943821}, {"x": 6, "y": 0.22117966922140045}, {"x": 7, "y": -1.0700433305682933}, {"x": 8, "y": -0.18949583082317534}, {"x": 9, "y": 0.25500144427338167}], "yAxis": "1"}];\n\n\n            nv.addGraph(function() {\n        var chart = nv.models.discreteBarChart();\n\n        chart.margin({top: 30, right: 60, bottom: 20, left: 60});\n\n        var datum = data_discretebarchart;\n\n\n\n                    chart.yAxis\n                .tickFormat(d3.format(\',.0f\'));\n\n    \n    \n\n        \n\n\n\n            d3.select(\'#discretebarchart svg\')\n            .datum(datum)\n            .transition().duration(500)\n            .attr(\'width\', 500)\n            .attr(\'height\', 500)\n            .call(chart);\n\n    \n        });\n\n\n\n    </script>\n\n    </body>\n</html>'

And if we want to view the actual chart, we can use the display and HTML functions below:

In [5]:

from IPython.display import display, HTML


display(HTML(chart_html))

For more info, check out the official nvd3.js docs here, or the Python-nvd3 docs here.