## statistics and applied probability

### national university of singapore

#### Month: December 2010

Chris Monson’s makefile is the best makefile I’ve ever used for LaTeX (the blog software won’t allow me to upload it, citing “security concerns”, even though it’s a plain text file!). It’s just so easy. Check out the webpage describing why.

Of course, it will really only be useful to non-Windows LaTeX users.

It’s actually quite an interesting study, but methinks the reviewers perhaps went easy on the eight-year-olds. There are more than two figures (expressly forbidden by the journal), spelling doesn’t seem to abide by the OED (e.g. “duh duh duuuuhhh”: duh is in, but not duuuuhhh), the introduction starts with “Once upon a time” despite there being no historical aspect to the work, and one author appears to be made up (a certain “P S Blackawton” whose affiliation is Blackawton Primary School). It reminds me of those letters to national newspapers by “Bobby, 7” and his friends.

The article has attracted quite a bit of attention in the press. More so than any of mine, so perhaps my words are tinged with jealousy…

Just spent a bit of time trying to work out why arima() and lm() were giving different estimates on some dengue data, and on solving the problem, thought I’d share what I found via simulated data.
pars=c(0.5,0.3,0.2,0.1,-0.1);sigma=1
set.seed(666) #for luck
n=10000 #large sample size
errors=rnorm(n,0,sigma)
y=c(1,2,3,4) #initial values
for(i in 5:n)
{
meany=sum(pars*c(1,y[(i-1):(i-4)]))
y[i]=meany+errors[i]
}
plot(y,type='l')
arima(y,order=c(4,0,0))
fit=lm(y[5:n]~y[4:(n-1)]+y[3:(n-2)]+y[2:(n-3)]+y[1:(n-4)])
summary(fit)

Calling arima() gives the following coefficients:

• ar1 0.29 [should be 0.3, ok]
• ar2 0.19 [should be 0.2, ok]
• ar3 0.09 [should be 0.1, ok]
• ar4 -0.09 [should be -0.1, ok]
• intercept 1.0 [should be 0.5, wrong!]

while lm() gives

• ar1 0.29
• ar2 0.19
• ar3 0.09
• ar4 -0.09
• intercept 0.53 [ok!]

An explanation can be found via the website of Shumway and Stoffer: the value that arima() labels “intercept” is the estimated mean of the stationary distribution, not the “intercept” that you’d expect having learned basic regression. I wonder how many people have been caught out by this?
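The two numbers can be reconciled with a short calculation. This is only a sketch, assuming the simulated AR(4) process is stationary; the symbols $c$ (the regression intercept that lm() estimates), $\mu$ (the stationary mean that arima() reports) and $\phi_1,\dots,\phi_4$ (the AR coefficients) are notation introduced here, not taken from the R output.

```latex
\begin{align*}
% AR(4) model used in the simulation, written with a regression-style intercept c
y_t &= c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \phi_4 y_{t-4} + \varepsilon_t \\
% take expectations on both sides; under stationarity E[y_t] = \mu for every t
\mu &= c + (\phi_1 + \phi_2 + \phi_3 + \phi_4)\,\mu
  \quad\Longrightarrow\quad
  \mu = \frac{c}{1 - \phi_1 - \phi_2 - \phi_3 - \phi_4} \\
% true simulation values: c = 0.5, \phi = (0.3, 0.2, 0.1, -0.1)
\mu &= \frac{0.5}{1 - (0.3 + 0.2 + 0.1 - 0.1)} = \frac{0.5}{0.5} = 1.0 \\
% going the other way with the rounded arima() estimates listed above
\hat{c} &= \hat{\mu}\,\bigl(1 - \hat{\phi}_1 - \hat{\phi}_2 - \hat{\phi}_3 - \hat{\phi}_4\bigr)
  \approx 1.0 \times \bigl(1 - (0.29 + 0.19 + 0.09 - 0.09)\bigr) = 0.52
\end{align*}
```

So the 1.0 from arima() is what the mean should be, and multiplying it by one minus the sum of the AR coefficients gives roughly the 0.53 that lm() reports (the small gap comes from using rounded estimates).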

PS Just noticed in the original version I wrote anova in place of arima. Doh!

Here is an interesting article from the beeb on how technology is changing higher education. The University of Texas at San Antonio is purportedly the first university in the world to have a book-free library…

For those who don’t know already, NUS libraries subscribe to a fairly large number of e-books that can be downloaded to your computer, iPad, Kindle, or whatever it is young people use nowadays. It’s difficult to find them, but they are there.

I was chatting with Louis about how to give good presentations, and revealed that I’d written up a checklist of things to avoid in giving statistics presentations. Here it is, with details underneath. It is written in the dictatorial style of Strunk and White. Von Bing provided comments on an earlier draft.

Slides

1. Landscape orientation?
2. Margins not excessive?
3. Non-informative contents page?
4. Too many slides?

Text

1. Too many words?
2. Font large enough?
3. Colour visible on projector?
4. Spelling checked?

Equations

1. Are you sure you want every equation?
2. Do the equations come out ok?

Tables

1. Are you sure you want a table?
2. Limited to 2 significant figures?
3. Decimal places lined up?
4. Only relevant entries tabulated?
5. Too much information for one slide?
6. Fonts large enough?

Figures

1. Axes labelled?
2. Labels big enough?
3. Margins too big?
4. Excessive space between figures?
5. Colour visible on projector?
6. Lines and points labelled, e.g. on legend?
7. Can plot cover whole slide?

General suggestions

1. Don’t read from the slide
2. Be aware of colour blindness

Some detailed thoughts:

Slides
1. Landscape orientation?
It’s a good idea to make sure your slides are set up so that they are wider than they are tall, i.e. that they have a landscape, not portrait, orientation. Modern (post-1990s?) projectors project landscape orientation. If you use portrait, your slides will cover only half the available surface area, meaning that everything is smaller than need be.
2. Margins not excessive?
Some slides are formatted to have a “double margin”, that is, a space, a line, another space, and then content. This may or may not be to your taste, but in any case, it wastes space and, again, means the content of the slides is smaller than need be.
3. Non-informative contents page?
If your talk is very short, e.g. 15 minutes, it’s debatable if you should have a contents page at all. But if you do have a contents page, it is advisable to use informative headings. If your contents page says “Introduction, Methods, Results, Discussion”, it will convey no real information (of course you’re going to start with an introduction!) and may irritate the audience.
4. Too many slides?
It is always wise to plan in advance to have the right number of slides for the length of time you will be talking. You don’t want the chair to stop you when you are halfway through discussing the methodology and still have 47 slides to present. You might develop contingency plans that allow you to skip parts if you find you’ve spent too long at the beginning, or to spend more time discussing if you’ve got time to spare. LaTeX beamer is great for this, as it allows you effortlessly to put hyperlinks to different sections of your talk in the margin, thus allowing you to skip slides without having to flick through them and so reveal the bad planning to the audience.

Text
5. Too many words?
You may wish to avoid putting too much text on a single slide. The audience, when faced with a slide full of words in a small font, will either have to ignore you while they read it, or won’t read it. And if you recite the content of the slides to your audience, again, they will probably just ignore you and focus on the words on the slide.
6. Font large enough?
You should ensure the font is sufficiently large that people at the back of the room with bad eyesight will be able to see it. To check this, try putting your slides on a projector and then standing at the back of the room or a similar room (get someone with bad eyesight to help if need be). If it seems too small, it is too small. Making the font large enough will additionally ensure your slides don’t have too much text on them.
7. Colour visible on projector?
Bright fluorescent green text might look great on your computer monitor, but the chances are it won’t be visible on the projector. It’s recommended to check out the colours at the same time you check the size of your fonts.
8. Spelling checked?
Use a spell checker. Do this even if you are a native anglophone (or whatever the language of the conference). Do this especially if you are not.

Equations
9. Are you sure you want every equation?
It takes a lot of concentration to understand an equation, especially if you are not familiar with the details of the topic. Furthermore, if yours is one of a series of talks, the audience will probably forget your notation before you have finished using it. (In fact, I usually forget notation even in individual seminars.) For these reasons, you ought to think very carefully about whether or not you really want each equation. Can the equation be replaced by a pithy description or (ideally) plot? Does it even contribute to the message of your talk?
10. Do the equations come out ok?
Few things show a speaker’s lack of respect for his or her audience more than an equation that has been mistyped in LaTeX by a speaker who hasn’t even checked its final appearance. To avoid this is simple: check that your equations come out the way you intended before the talk, not during it.

Tables
11. Are you sure you want a table?
Many statisticians love tables. Tables can be a powerful tool for presenting a message. But they require work to get right. So many talks I have been to have been let down by ill-thought-out tables, obstreperous and crammed full of numbers that reveal no message. I implore you: think carefully about each table you include in a presentation. Can a graph convey the information instead? If so, use the graph.
12. Limited to 2 significant figures?
Wainer (1997, J. Ed. Behav. Stat. 22:1–30) should be required reading for all statisticians. Do not use more than two significant figures in a table in a presentation.
13. Decimal places lined up?
Make sure you’ve lined up decimal places.
14. Only relevant entries tabulated?
You may be copying a table from a paper of yours. The paper probably contains a lot of information in the table. During the presentation, you probably don’t want to express every single gram of information in that original table. If so, a simple solution is not to put all that information in the table. Take out anything that you’re not going to talk about. If this is difficult to do, you can always highlight the bits you plan to talk about, so the audience can concentrate on them and ignore the other clutter.
15. Too much information for one slide?
If you do wish to talk about many points in a table, you might break the table up over several slides, as this will make it easier for the audience to digest.
16. Fonts large enough?
As with text, make sure the font size in the table is large enough for everyone in the audience to read easily.