How to: draw a nice graph #dataviz

I’ve always found it rather ironic that those of us trained to collect, analyze and interpret data are often the least qualified to communicate it. This is evident from the results sections of millions of scientific papers out there, littered with visually unappealing, cluttered, incomprehensible and, at worst, highly deceptive graphs. The purpose of a graph is to convey quantitative information visually, more effectively than other means such as text or tables can. This matters most in presentations, where it is difficult for the audience to read text on a slide while listening to the presenter. And yet we feel tempted to cram as much information into our graphs as possible, until they become so distracting that it takes just as long to understand them as it would to read a table.

In fairness, the software we use often doesn’t help. Excel, for example, is intended to produce decent graphs with minimal effort, but invariably turns out generic, bland-looking graphs with limited options for customization (at least not without a lot more effort). I tend to use Stata for most graphs (sorry, R users). It has a comprehensive range of plot types and user options, although it lacks much of the flexibility inherent in R. On the other hand, I find its syntax more intuitive, even if some of its limitations are frustrating at times.

This is a basic bar chart in Stata with the default options:

graph bar (count) unique_id, over(sex)

Not too bad, but not the nicest graph. The colour scheme is not that appealing, the rotated y-axis labels are hard to read, and the gridlines and background colour are distracting. In fact, most of the time I spend coding up graphs is to get rid of some of these default options. For a presentation, I would make the background white, get rid of the axes, axis labels and gridlines and put the value information on the bars themselves so that they’re easy to see without the audience having to figure out what scale the y-axis is in, if it’s even legible from the back of the room. For the colour scheme, I often use Adobe Kuler to find complementary colour combinations and feed the corresponding RGB codes to Stata. Below is a much simplified, but visually far more effective version of the same chart. Of course, it takes a bit more coding, but I think it’s well worth the effort.

graph bar (count) unique_id, over(sex, relabel(1 "female" 2 "male") axis(off noline)) ///
    asyvars percentages /// express values as percentages
    blabel(bar, pos(outside) format(%4.1f) size(large)) ///
    ylab(, angle(0) nogrid) /// rotate y-axis labels and suppress gridlines
    legend(off) /// suppress legend
    bargap(50) /// add gap between bars equal to 50% of bar width
    text(-2 24 "female (n=32)") ///
    text(-2 76 "male (n=21)") ///
    yscale(off) /// suppress y-axis
    ylab(-10(10)80, nogrid) /// scale y-axis from -10 to 80 in steps of 10
    ytitle("") ///
    title("Percentage of females and males in sample") ///
    bar(1, fcolor("204 114 14") lcolor("255 179 94")) /// set fill and outline RGB colours for first bar
    bar(2, fcolor("0 128 178") lcolor("17 188 255")) /// set fill and outline RGB colours for second bar
    graphregion(color(white)) plotregion(color(white)) // make graph and plot region background white
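One small refinement, offered as a sketch rather than as part of the chart above: the group sizes in the text() labels are typed in by hand, so they go stale if the sample changes. Assuming sex is coded 1 = female and 2 = male (as the relabel() option implies), the counts can be stored in local macros and dropped into the labels. The toy dataset and the macro names n_female and n_male are made up for illustration, and the chart is stripped down to the options needed to show the idea.

```stata
* Toy data mirroring the sample sizes quoted above (32 females, 21 males).
* Assumption: sex is coded 1 = female, 2 = male.
clear
set obs 53
generate unique_id = _n
generate sex = cond(_n <= 32, 1, 2)

* count leaves the number of matching observations in r(N)
count if sex == 1
local n_female = r(N)
count if sex == 2
local n_male = r(N)

* Reuse the stored counts in the text() labels
graph bar (count) unique_id, over(sex) ///
    asyvars percentages ///
    legend(off) yscale(off) ///
    ylabel(-10(10)80, nogrid) ///
    text(-2 24 "female (n=`n_female')") ///
    text(-2 76 "male (n=`n_male')")
```

Run this from a do-file rather than line by line, since the /// continuation and the local macros only work within a single do-file run.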
