As you know, hypothesis tests (or null hypothesis significance tests) pervade statistics. Yet many hate them. Yes, hate. I found this delightful quote of Neyman and Pearson (fils), and commentary, in a paper by Steven Goodman (1999, Ann Intern Med 130:995–1004):
In their [N&P’s] words… “Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them… [to] insure that, in the long run of experience, we shall not often be wrong”…
Hypothesis tests are equivalent to a system of justice that is not concerned with which individual defendant is found guilty or innocent (that is, “whether each separate hypothesis is true or false”) but tries instead to control the overall number of incorrect verdicts (that is, “in the long run of experience, we shall not often be wrong”)… just as our sense of justice demands that individual persons be correctly judged, scientific intuition says that we should try to draw the proper conclusions from individual studies.
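Neyman and Pearson’s long-run guarantee is easy to see in simulation. The sketch below is my own illustration, not from Goodman’s paper: it uses a simple two-sample z-test with known variance, and shows that when the null is true in every experiment, testing at α = 0.05 produces a wrong “guilty” verdict in roughly 5% of experiments, with no claim about any individual one.

```python
import math
import random

def z_test_p(x, y, sigma=1.0):
    """Two-sided p-value for a difference in means, known sigma (z-test)."""
    n, m = len(x), len(y)
    se = sigma * math.sqrt(1 / n + 1 / m)
    z = (sum(x) / n - sum(y) / m) / se
    # standard normal CDF built from math.erf
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
alpha, trials, rejections = 0.05, 10_000, 0
for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(30)]
    y = [random.gauss(0, 1) for _ in range(30)]  # same mean: every null is true
    if z_test_p(x, y) < alpha:
        rejections += 1

rate = rejections / trials
print(rate)  # close to alpha = 0.05
```

The long-run error rate is controlled, exactly as promised; what the procedure never tells you is whether *this* experiment’s verdict was one of the wrong ones.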
Mark Chen sent a link over to a brief presentation by Hans Rosling at Gapminder (Gapminder looks excellent, by the way). It’s a little dated now, coming as it does a few weeks into the H1N1 pandemic last year, but it’s quite a nice use of statistics (in the sense of data, not the study of data), and quite damning of the media. Take a look!
Tsung Fei requests a critique of graphs. He points out this list of the top ten bad graphs of all time. This has beauties such as
(Wittke-Thompson JK, Pluzhnikov A, Cox NJ (2005) Rational inferences about departures from Hardy-Weinberg equilibrium. Am J Human Genetics 76:967–86). You have to look hard, but there are data in panel D.
Anyway, although these are bad, I don’t think they really count as a “top ten”. If you want to see worse, check out Kaiser Fung’s blog Junk Charts. Or just go along to the Annals of Statistics, one of the top stats journals, and read a couple of papers. Pretty soon you’ll find some terrible examples of graphs (or tables with 500+ numbers, all recorded to 4 significant figures; as David N once told me, the more figures in a table you have, the more serious the research). Here’s an example, from, I think, the 5th paper I looked at to prove this point:
It’s taken from Wong and Ma (Ann Stat 38:1433–59), an interesting paper, but what a terrible graph! I reckon at most 20% of the figure is made up of, well, figures; the rest is white space between panels and lots and lots of labels.
Anyway, Kaiser Fung has much more to say on this than me (for now anyway), so take a look at his blog!
Romàn found this a while ago and thought it would be good to post. If you use internet searches to look for scientific papers, rather than going via the library webpage, you often find the journal webpage wants to charge you, even when you know NUS subscribes. The slow way to deal with this is to go to the library webpage, find the journal, log in, and search for the volume and page number: precious minutes of your life you will never see again.
Well, no longer! Follow the instructions here and all you need to do is click a bookmark in Firefox (or whatever browser you use, although Firefox really is the best) et voilà! The journal realises you’re at NUS and gives you the article.
The Institute for High Performance Computing is looking to hire a post-doctoral research associate or research engineer to work on the NMRC-funded project entitled “Optimising local management of global pandemics — a modelling approach”. For information on the IHPC Advanced Computing group click here. Here is the abstract to the project:
Emerging infectious diseases have the potential to cause global pandemics which directly impact Singapore, as evidenced by the outbreaks of Severe Acute Respiratory Syndrome in 2003 and the history of influenza pandemics. The novel H1N1 influenza strain recently detected in Mexico and the United States has raised concerns that a new virus with the capability to cause a pandemic has emerged.
A historical review we conducted previously showed that the influenza pandemics of 1957 and 1968 caused epidemics in Singapore earlier than in most Northern hemisphere countries. Moreover, the pandemic of 1918 caused two waves of infection, the second being more severe than the first. The last time a novel strain of influenza emerged and spread around the world was in 1977, and that virus also caused a major epidemic in Singapore, with evidence of a second wave about 1 year later. However, little data were collected during previous pandemic events to characterize the spread of infection or the effectiveness of interventions.
With the lead-time prior to community transmission of the new H1N1 virus in Singapore, studies have been planned to collect data about the spread of the new virus (see Annex A, a list of proposed epidemiological studies, located after the reference section). In addition, there are several existing projects led by parties on this grant application to model the spread of pandemic influenza in different settings within Singapore, as well as community perceptions of the impact of various health outcomes and policy options. The data collected during the epidemic and the different modeling approaches already in progress need to be analyzed and synthesized into a set of results that can produce recommendations for action. In addition, these results should be translated into cost-effectiveness/cost-benefit assessments to help policy-makers prioritize different strategies according to different policy objectives.
The PI for the project is Mark Chen at Tan Tock Seng Hospital, with Fu Xiuju at IHPC and David Matchar at Duke-NUS as co-PIs, and co-investigators Alex Cook (DSAP), Vernon Lee (SAF), Stefan Ma (MOH), Tomi Silander and Gary Lee (IHPC).
A PhD degree in a related field (e.g. epidemiology, statistics, computer science, mathematics, or physics). Candidates with an MSc and demonstrably exceptional research abilities may also be considered.
Solid epidemiological modelling background with a thorough understanding of the fundamentals of computer-based simulation and compartmental models. Significant experience writing code is preferred.
Experience in one or more of the following areas: contact network theory, individual/agent-based modeling, mathematical or statistical modelling in epidemiology.
Good communication skills: ability to present and communicate ideas.
Strong statistical/mathematical/analytical abilities and exceptional problem solving skills.
Be a key player in and contributor to a research team
Consolidate epidemiological and social contact data from different sources
Create new epidemiological models based on existing data and models
Challenge results with scientists and experts for constant improvement
Collaborate closely with research team members to implement models and analyze results
Interact constantly with key people to understand the working environment
For more information contact: Xiuju FU (fuxj AT ihpc.a-star.edu.sg)
The blog’s been quiet for a while, so I thought I’d dig up an excellent, thought-provoking paper by Marks Nester (1996, Appl Stat 45:401–10) called An Applied Statistician’s Creed. He’s quite critical of null hypothesis significance testing (NHST), which I like, as NHST is one of my pet peeves.
Nester opens his argument with a description of how a scientific researcher reacts upon calculation of the p-value (which I paraphrase as I can’t copy and paste from the original article for some reason):
If a significant result is obtained:
This is good. This is what I hoped/expected. The true difference between the treatments can now be equated to the observed difference.
This is bad. I didn’t want this. But it doesn’t matter, as the sample difference is so small, it’s not practically significant.
This is bad. I didn’t want this, and I can’t explain it. Maybe it’s a type I error?
If a non-significant result is obtained:
This is good. This is what I hoped/expected. The true difference between the treatments can now be equated to zero.
This is bad. I didn’t want this, and I can’t explain it. Maybe it’s a type II error?
It’s funny because it’s true.
He later puts forth a creed and describes how it should influence the way we approach NHST.
All treatments differ.
All factors interact.
All variables are correlated.
No two populations are identical.
No data are Gaussian.
Variances are never equal.
All models are wrong.
No two numbers [e.g. parameters] are the same.
Many numbers [I think he means sample sizes?] are very small.
Thus, to answer the question in the title of this post: how many replicates should be used to test whether two treatments have different effects? The answer:
Obviously one need not use any replications, nor even do the experiment, since the treatments must have different effects.
Nice! This would save a lot of work. Well, no, we’d still have to calculate MLEs and CIs (or posterior distributions—much better!), and these actually would address something that scientists are interested in. So, let us strive to move away from p-values, and to be obdurate when our scientific collaborators request them.
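Nester’s point that all treatments differ has a practical corollary: with a large enough sample, any nonzero true difference, however trivial, will come out significant, whereas a confidence interval at least tells you the difference is tiny. Here is a minimal sketch of my own (not from Nester’s paper), again using a simple two-sample z-test with known variance:

```python
import math
import random

def z_test_p(x, y, sigma=1.0):
    """Two-sided p-value for a difference in means, known sigma (z-test)."""
    n, m = len(x), len(y)
    se = sigma * math.sqrt(1 / n + 1 / m)
    z = (sum(x) / n - sum(y) / m) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(2)
delta = 0.01  # a true difference far too small to matter in practice
results = {}
for n in (100, 1_000_000):
    x = [random.gauss(delta, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(x) / n - sum(y) / n
    half = 1.96 * math.sqrt(2 / n)  # 95% CI half-width, sigma = 1
    results[n] = (z_test_p(x, y), diff - half, diff + half)
    print(n, results[n])  # p-value shrinks with n; CI width shows the effect is tiny
```

At n = 100 the test typically sees nothing; at n = 1,000,000 the p-value is minuscule and the null is soundly rejected, yet the confidence interval makes plain that the effect is of no practical consequence. The p-value answered a question nobody asked.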