My visualization of presidential election voter turnout by country over time raised some interesting questions (you can check it out here).
First off, why use d3 to visualize this data? As data practitioners, we often have many tools at our disposal. d3 really shines, though, when we want to use its interactivity to explore high-dimensional data.
I started out wanting to know more about presidential voter turnout -- how had it changed over time worldwide? Were there outlier countries that bucked the worldwide trend? Could I look at the time series for each individual country and see the participation rise and fall in response to the country's own historical events?
Ideally, I'd be able to answer these questions using a tool that lets me generate plots quickly. My two favourite libraries are Python's matplotlib and R's ggplot2. After cleaning the data (you can check out an IPython notebook with the whole process here), I was able to generate this plot.
Bleh! This plot should help us answer our questions, since it graphs a voter turnout time series with each line representing a different country. We even have some nice colours that help us distinguish between the lines. It is virtually illegible, though: our data simply has too many dimensions, and the visual attribute we have used to represent each time series does not let the human eye effectively tell the individual series apart. We are relying on colour differences to separate the lines, but given the nature of the dataset, there is simply too much noise for this to be a useful mapping. Furthermore, we can't tell which line belongs to which country, and thus can't answer questions about the countries' individual events. We could add labels to each of the lines, or play around with the line weights a little more, but the result would be visually noisy and unappealing.
What we really need is a better mapping of our data features to visual attributes. Enter d3.js. I took the opacity of each line down to 0.2, which gives us faint line outlines and lets us see the general shape of the data without being too noisy. Then I added an interactive feature: users can hover over or tap a line to raise its opacity to 1 and bring up a tooltip showing which country's time series they have selected. We are no longer using only colour to tell our lines apart; we have added shading and a textual element.
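To make the interaction concrete, here is a minimal sketch of how the dimming and highlighting could be wired up. It is not the code from the repo: it assumes d3 v6 or later (where event listeners are called with `(event, datum)`), and the names `lineOpacity`, `attachHighlight`, the `country` field on each datum, and the `tooltip` selection are all made up for illustration.

```javascript
// Opacity levels taken from the post: faint by default, solid when selected.
const BASE_OPACITY = 0.2;
const ACTIVE_OPACITY = 1;

// Pure helper: decide how opaque one country's line should be.
// `series.country` is a hypothetical field name on the bound datum.
function lineOpacity(series, activeCountry) {
  return series.country === activeCountry ? ACTIVE_OPACITY : BASE_OPACITY;
}

// Wire the interaction onto a d3 selection of <path> elements (`lines`)
// and a d3 selection of an absolutely positioned tooltip element.
// Assumes d3 v6+, where listeners receive (event, datum).
function attachHighlight(lines, tooltip) {
  lines
    .style("opacity", BASE_OPACITY)
    // Pointer events cover both mouse hover and touch taps.
    .on("pointerover pointerdown", (event, d) => {
      // Re-evaluate every line: only the selected country becomes solid.
      lines.style("opacity", (s) => lineOpacity(s, d.country));
      tooltip
        .text(d.country)
        .style("left", `${event.pageX + 10}px`)
        .style("top", `${event.pageY - 20}px`)
        .style("visibility", "visible");
    })
    .on("pointerout", () => {
      // Return everything to the faint default and hide the tooltip.
      lines.style("opacity", BASE_OPACITY);
      tooltip.style("visibility", "hidden");
    });
}
```

Keeping the opacity rule in a small pure function means the visual mapping can be checked without a browser, while `attachHighlight` only deals with the d3 plumbing.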
The less busy composition of the d3 visualization also lets us leave the world average time series continuously at an opacity of 1, facilitating comparisons between the world trend and an individual country's trend. And there we have it, an actually useful visualization! By using a slightly different set of visual attributes that take advantage of d3's interactive components, we've created an easy way for users to get far more information than they could from the simple line graph created in matplotlib.
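As a rough illustration of the world-average line, the sketch below builds one averaged series and exempts it from dimming. The post does not say how the average was computed, so this assumes an unweighted mean of turnout across whichever countries held an election in a given year; the record shape `{ country, year, turnout }`, the label `"World average"`, and both function names are hypothetical.

```javascript
// Label used for the synthetic series (an assumption, not from the repo).
const WORLD_LABEL = "World average";

// Build a year-sorted series of mean turnout across countries.
// Hypothetical record shape: { country, year, turnout } with turnout in percent.
function worldAverageSeries(records) {
  const byYear = new Map(); // year -> { sum, count }
  for (const { year, turnout } of records) {
    const acc = byYear.get(year) || { sum: 0, count: 0 };
    acc.sum += turnout;
    acc.count += 1;
    byYear.set(year, acc);
  }
  return [...byYear.entries()]
    .sort(([a], [b]) => a - b)
    .map(([year, { sum, count }]) => ({ year, turnout: sum / count }));
}

// Opacity rule with the exception described above:
// the world line is always solid, country lines only when selected.
function opacityWithWorldLine(series, activeCountry) {
  if (series.country === WORLD_LABEL) return 1;
  return series.country === activeCountry ? 1 : 0.2;
}
```

A turnout-weighted or population-weighted mean would be an equally plausible reading of "world average"; only the accumulation step would need to change.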
Git repo with the d3 and Python files used to generate these visualizations.