statistics and applied probability

national university of singapore

Author: Alex

A book: on Bayes’ theorem

Finally, after Fermat’s last theorem, someone (Sharon McGrayne) has written a book on Bayes’ theorem.  It’s been reviewed by Andrew Robinson in Nature. From the review:

Considering the widespread effectiveness of Bayesian inference in physics and astronomy, genetics, imaging and robotics, Internet communication, finance and commerce, it is surprising that it has remained controversial for so long… McGrayne explains [users’] reticence [to admit to using Bayes] in her impressively researched history of Bayes’ theorem, The Theory That Would Not Die. The statistical method runs counter to the conviction that science requires objectivity and precision, she writes. Bayes’ theorem “is a measure of belief. And it says that we can learn even from missing and inadequate data, from approximations, and from ignorance.”

The reviews on Amazon are mixed, and venom seems present in some of them…

Thanks to Román for the good find!

Class of 2011

It’s farewell to another cohort of graduands.  The statistics committee organised a group photograph—out under the blazing tropical sun.  What were they thinking?  Here are the photos, courtesy of Hu Fan.

[Photos: Class of 2011, shot 1; Class of 2011, shot 2]

Fare ye well and haste ye back, graduands!

An easy way to publish mathematical research

Unfortunately, although JUR guarantees rejection of your paper, RMce doesn’t guarantee acceptance of all previously rejected articles.  But I bet the odds are good!

A “penalty for peeking”

(Nothing lewd follows.)

Just stumbled upon a fascinating short article by Rothman (1990) in the first volume of Epidemiology (1:43–6). The title of the paper says it all: no adjustments are needed for multiple comparisons. An excerpt from the end:

Suppose the drug C differs considerably in its effect from drug B. Will this difference be less worthy of attention when, sometime in the future, information on drug D comes along as part of the same research programme? Should an investigator estimate on the first day of data analysis how many contrasts ultimately will come along before making adjustments for multiple comparisons? Where do the boundaries of a specific study lie…?

What I’ve taken to doing when I have multiple nil hypothesis significance tests to perform is to write in the figure/table caption or methods the expected number of spurious positive findings conditional on the incredibly pessimistic premise that all nil hypotheses really are true (which is probably impossible for observational studies). Maybe I should cease even this?
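For concreteness, the quantity described above is simply the number of tests multiplied by the significance level. A minimal sketch in R follows; the function name and the example of 20 tests at the 5% level are illustrative assumptions, not taken from any particular analysis.

```r
# Expected number of spurious positives among m tests at level alpha,
# assuming every nil hypothesis is true. By linearity of expectation
# this holds whether or not the tests are independent.
expected_spurious <- function(m, alpha = 0.05) {
  m * alpha
}

n_expected <- expected_spurious(20)  # hypothetical: 20 tests at the 5% level
print(n_expected)
```

A caption could then state, for example, that about one of the 20 tests would be expected to reach significance if every nil hypothesis were true.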

This paper should be compulsory reading for everyone interested in statistics (and able to meet the prerequisites).

A super makefile for LaTeX

Chris Monson’s makefile is the best makefile I’ve ever used for LaTeX (the blog software won’t allow me to upload it citing “security concerns”—it’s a plain text file!). It’s just so easy. Check out the webpage describing why.

Of course, it really will only be useful to non-Windows LaTeX users.

How to publish a paper with no references, hand drawn figures, and smiley faces…

Co-author it with primary school children.

It’s actually quite an interesting study, but methinks the reviewers perhaps went easy on the eight-year-olds. There are more than two figures (expressly forbidden by the journal), spelling doesn’t seem to abide by the OED (e.g. “duh duh duuuuhhh”: duh is in, but not duuuuhhh), the introduction starts with “Once upon a time” despite there being no historical aspect to the work, and one author appears to be made up (a certain “P S Blackawton” whose affiliation is Blackawton Primary School). It reminds me of those letters to national newspapers by “Bobby, 7” and his friends.

The article has attracted quite a bit of attention in the press. More so than any of mine, so perhaps my words are tinged with jealousy…

A warning about arima() in R

Just spent a bit of time trying to work out why arima() and lm() were giving different estimates on some dengue data, and on solving the problem, thought I’d share what I found via simulated data.
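pars = c(0.5, 0.3, 0.2, 0.1, -0.1) #intercept, then AR coefficients for lags 1 to 4
sigma = 1
set.seed(666) #for luck
n = 10000 #large sample size
errors = rnorm(n, 0, sigma)
y = c(1, 2, 3, 4) #initial values
for(i in 5:n)
{
  meany = sum(pars * c(1, y[(i-1):(i-4)])) #intercept plus weighted last four values
  y[i] = meany + errors[i]
}
plot(y, type='l')
arima(y, order=c(4,0,0))
#regress y[t] on its four lags, most recent lag first
fit = lm(y[5:n] ~ y[4:(n-1)] + y[3:(n-2)] + y[2:(n-3)] + y[1:(n-4)])
summary(fit)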

Calling arima() gives the following coefficients:

  • ar1 0.29 [should be 0.3, ok]
  • ar2 0.19 [should be 0.2, ok]
  • ar3 0.09 [should be 0.1, ok]
  • ar4 -0.09 [should be -0.1, ok]
  • intercept 1.0 [should be 0.5, wrong!]

while lm() gives

  • ar1 0.29
  • ar2 0.19
  • ar3 0.09
  • ar4 -0.09
  • intercept 0.53 [ok!]

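An explanation can be found via the website of Shumway and Stoffer: what R provides in arima() is the mean of the stationary distribution, not the “intercept” that you’d expect having learned basic regression. For an AR model the two are related by mean = intercept/(1 − sum of the AR coefficients), so here the mean is 0.5/(1 − 0.3 − 0.2 − 0.1 + 0.1) = 1.0, which is exactly what arima() reports. I wonder how many people have been caught out by this?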
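To recover a regression-style intercept from arima() output, multiply the reported mean by one minus the sum of the AR coefficients. The sketch below re-simulates a similar series so that it runs on its own; the seed, sample size and variable names are arbitrary choices for illustration and are not taken from the dengue analysis.

```r
# Simulate an AR(4) process with a known regression-style intercept.
set.seed(1)
intercept <- 0.5
phi <- c(0.3, 0.2, 0.1, -0.1)   # AR coefficients for lags 1 to 4
n <- 5000
y <- numeric(n)
y[1:4] <- 1                      # start at the stationary mean
for (i in 5:n) {
  y[i] <- intercept + sum(phi * y[(i - 1):(i - 4)]) + rnorm(1)
}

fit <- arima(y, order = c(4, 0, 0))
ar_hat <- coef(fit)[c("ar1", "ar2", "ar3", "ar4")]
mu_hat <- coef(fit)["intercept"]  # arima() labels the estimated mean "intercept"

# Convert the stationary mean back to the regression-style intercept.
intercept_hat <- unname(mu_hat * (1 - sum(ar_hat)))

# Stationary mean implied by the true parameters.
theoretical_mean <- intercept / (1 - sum(phi))

print(c(mean = unname(mu_hat), intercept = intercept_hat))
```

The converted value should land close to the lm() intercept, while the raw arima() "intercept" sits near the stationary mean.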

PS Just noticed in the original version I wrote anova in place of arima. Doh!

The future of academic publishing?

Here is an interesting article from the beeb on how technology is changing higher education. The University of Texas in San Antonio is purportedly the first university in the world to have a book-free library…

For those who don’t know already, NUS libraries subscribe to a fairly large number of e-books that can be downloaded to your computer, iPad, Kindle, or whatever it is young people use nowadays. It’s difficult to find them, but they are there.

Statistical presentation guidelines

I was chatting with Louis about how to give good presentations, and revealed that I’d written up a checklist of things to avoid in giving statistics presentations. Here it is, with details underneath. It is written in the dictatorial style of Strunk and White. Von Bing provided comments on an earlier draft.

Slides

  1. Landscape orientation?
  2. Margins not excessive?
  3. Non-informative contents page?
  4. Too many slides?

Text

  1. Too many words?
  2. Font large enough?
  3. Colour visible on projector?
  4. Spelling checked?

Equations

  1. Are you sure you want every equation?
  2. Do the equations come out ok?

Tables

  1. Are you sure you want a table?
  2. Limited to 2 significant figures?
  3. Decimal places lined up?
  4. Only relevant entries tabulated?
  5. Too much information for one slide?
  6. Fonts large enough?

Figures

  1. Axes labelled?
  2. Labels big enough?
  3. Margins too big?
  4. Excessive space between figures?
  5. Colour visible on projector?
  6. Lines and points labelled, e.g. on legend?
  7. Can plot cover whole slide?

General suggestions

  1. Don’t read from the slide
  2. Be aware of colour blindness

Some detailed thoughts:

Slides
1. Landscape orientation?
It’s a good idea to make sure your slides are set up so that they are wider than they are tall, i.e. that they have a landscape, not portrait, orientation. Modern (post-1990s?) projectors project landscape orientation. If you use portrait, your slides will cover only half the available surface area, meaning that everything is smaller than need be.
2. Margins not excessive?
Some slides are formatted to have a “double margin”, that is, a space, a line, another space, and then content. This may or may not be to your taste, but in any case, it wastes space and, again, means the content of the slides is smaller than need be.
3. Non-informative contents page?
If your talk is very short, e.g. 15 minutes, it’s debatable if you should have a contents page at all. But if you do have a contents page, it is advisable to use informative headings. If your contents page says “Introduction, Methods, Results, Discussion”, it will convey no real information (of course you’re going to start with an introduction!) and may irritate the audience.
4. Too many slides?
It is always wise to plan in advance to have the right number of slides for the length of time you will be talking. You don’t want the chair to stop you when you are half way through discussing the methodology and still have 47 slides to present. You might develop contingency plans that allow you to skip parts if you find you’ve spent too long at the beginning, or to spend more time discussing if you’ve got time to spare. LaTeX beamer is great for this, as it allows you effortlessly to put hyperlinks to different sections of your talk in the margin, thus allowing you to skip slides without having to flick through them, hence revealing the bad planning to the audience.

Text
5. Too many words?
You may wish to avoid putting too much text on a single slide. The audience, when faced with a slide full of words in a small font, will either have to ignore you while they read it, or won’t read it. And if you recite the content of the slides to your audience, again, they will probably just ignore you and focus on the words on the slide.
6. Font large enough?
You should ensure the font is sufficiently large that people at the back of the room with bad eyesight will be able to see it. To check this, try putting your slides on a projector and then standing at the back of the room or a similar room (get someone with bad eyesight to help if need be). If it seems too small, it is too small. Making the font large enough will additionally ensure your slides don’t have too much text on them.
7. Colour visible on projector?
Bright fluorescent green text might look great on your computer monitor, but the chances are it won’t be visible on the projector. It’s recommended to check out the colours at the same time you check the size of your fonts.
8. Spelling checked?
Use a spell checker. Do this even if you are a native anglophone (or whatever the language of the conference). Do this especially if you are not.

Equations
9. Are you sure you want every equation?
It takes a lot of concentration to understand an equation, especially if you are not familiar with the details of the topic. Furthermore, if yours is one of a series of talks, the audience will probably forget your notation before you have finished using it. (In fact, I usually forget notation even in individual seminars.) For these reasons, you ought to think very carefully about whether or not you really want each equation. Can the equation be replaced by a pithy description or (ideally) plot? Does it even contribute to the message of your talk?
10. Do the equations come out okay?
Few things show a speaker’s lack of respect for his or her audience more than an equation that has been mistyped in LaTeX by a speaker who hasn’t even checked its final appearance. Avoiding this is simple: check that your equations come out the way you intended before the talk, not during it.

Tables
11. Are you sure you want a table?
Many statisticians love tables. Tables can be a powerful tool for presenting a message. But they require work to get right. So many talks I have been to have been let down by ill-thought-out tables, obstreperous and crammed full of numbers that reveal no message. I implore you: think carefully about each table you include in a presentation. Can a graph convey the information instead? If so, use the graph.
12. Limited to 2 significant figures?
Wainer (1997, J. Ed. Behav. Stat. 22:1–30) should be required reading for all statisticians. Do not use more than two significant figures in a table in a presentation.
13. Decimal places lined up?
Make sure you’ve lined up decimal places.
14. Only relevant entries tabulated?
You may be copying a table from a paper of yours. The paper probably contains a lot of information in the table. During the presentation, you probably don’t want to express every single gram of information in that original table. If so, a simple solution is not to put all that information in the table. Take out anything that you’re not going to talk about. If this is difficult to do, you can always highlight the bits you plan to talk about, so the audience can concentrate on them and ignore the other clutter.
15. Too much information for one slide?
If you do wish to talk about many points on a table, you might break the table up over several slides, as this will make it easy for the audience to digest.
16. Fonts large enough?
As with text, make sure the font size in the table is large enough for everyone in the audience to read easily.
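Two of the table checks above, limiting entries to two significant figures and lining up decimal places, are easy to automate before pasting numbers into a slide. A minimal sketch in R, using made-up estimates (the values and variable names are hypothetical):

```r
# Hypothetical estimates destined for a table on a slide.
estimates <- c(1.2345, 0.8765, 12.512)

# Keep two significant figures.
two_sf <- signif(estimates, 2)

# Fixed number of decimals and fixed width, so the decimal points
# fall in the same column when shown in a monospaced font.
aligned <- formatC(two_sf, format = "f", digits = 2, width = 6)

print(two_sf)
cat(aligned, sep = "\n")
```

Fixing the number of decimals can add trailing zeros to larger entries, so whether to favour strict alignment or strictly two significant figures is a judgment call for each table.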

Figures (continued in the full post)

Two amusing things from Gelman’s blog

1. Check out the authors of this paper in Physics Letters B. (You need to scroll to the end.) This makes medicine look selective.

2. Learn how journalists structure science stories here.
