I’ve always found it rather ironic that those of us trained to collect, analyze and interpret data are often the least qualified to communicate it. This is evident from the results sections of millions of scientific papers out there, littered with visually unappealing, cluttered, incomprehensible and, at worst, highly misleading graphs. The purpose of a graph is to convey quantitative information visually in a way that is more effective than other means, such as text or tables. This is particularly true for presentations, in which it is difficult for the audience to focus on reading text on a slide while listening to the presenter at the same time. And yet we feel tempted to cram as much information into our graphs as possible, until they become so distracting that it takes just as long to understand the information as it would to read a table.
In fairness, the software we use often doesn’t help. Excel, for example, is intended to produce decent graphs with minimal effort, but invariably results in generic, bland-looking graphs that offer limited options for customization (at least not without a lot more effort). I tend to use Stata for most graphs (sorry, R users). Stata has a comprehensive range of plot types and user options, although it lacks much of the flexibility inherent in R. Even so, I find its syntax more intuitive, despite some at-times frustrating limitations.
This is a basic bar chart in Stata with the default options:
graph bar (count) unique_id, over(sex)
Not too bad, but not the nicest graph. The colour scheme is not that appealing, the rotated y-axis labels are hard to read, and the gridlines and background colour are distracting. In fact, most of the time I spend coding up graphs is to get rid of some of these default options. For a presentation, I would make the background white, get rid of the axes, axis labels and gridlines and put the value information on the bars themselves so that they’re easy to see without the audience having to figure out what scale the y-axis is in, if it’s even legible from the back of the room. For the colour scheme, I often use Adobe Kuler to find complementary colour combinations and feed the corresponding RGB codes to Stata. Below is a much simplified, but visually far more effective version of the same chart. Of course, it takes a bit more coding, but I think it’s well worth the effort.
graph bar (count) unique_id, over(sex, relabel(1 "female" 2 "male") axis(off noline)) ///
asyvars percentages /// express values as percentages
blabel(bar, pos(outside) format(%4.1f) size(large)) ///
ylab(, angle(0) nogrid) /// rotate y-axis labels and suppress gridlines
legend(off) /// suppress legend
bargap(50) /// add gap between bars equal to 50% of bar width
text(-2 24 "female (n=32)") ///
text(-2 76 "male (n=21)") ///
yscale(off) /// suppress y-axis
ylab(-10(10)80, nogrid) /// scale y-axis from -10 to 80 in steps of 10
ytitle("") ///
title("Percentage of females and males in sample") ///
bar(1, fcolor("204 114 14") lcolor("255 179 94")) /// set fill and outline RGB colours for first bar
bar(2, fcolor("0 128 178") lcolor("17 188 255")) /// set fill and outline RGB colours for second bar
graphregion(color(white)) plotregion(color(white)) // make graph and plot region background white
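If the same colour scheme is reused across several charts, one option is to store the RGB triplets in local macros so they only need changing in one place. The following is a minimal sketch, not the exact code behind the chart above. It assumes Stata's built-in auto dataset as a stand-in for the sample data, and the macro names (fill1, line1, fill2, line2) are purely illustrative.

```stata
* Sketch only: auto.dta stands in for the real data; macro names are illustrative
sysuse auto, clear

* Store the RGB triplets once (same values as in the chart above)
local fill1 "204 114 14"
local line1 "255 179 94"
local fill2 "0 128 178"
local line2 "17 188 255"

* Reuse the macros wherever a colour is needed
graph bar (count) price, over(foreign) ///
    asyvars percentages /// express values as percentages
    blabel(bar, pos(outside) format(%4.1f) size(large)) ///
    bar(1, fcolor("`fill1'") lcolor("`line1'")) ///
    bar(2, fcolor("`fill2'") lcolor("`line2'")) ///
    graphregion(color(white)) plotregion(color(white))
```

Local macros only exist for the duration of a single run, so the local definitions and the graph command need to be executed together, for example as one do-file or one selection in the do-file editor.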