## statistics and applied probability

### national university of singapore

#### Month: April 2010

We organised quizzes for the graduands and the postgraduate students last week and this week (the two quizzes differed only slightly). The aim was to find out how well the curricula were succeeding in teaching the fundamentals of statistics, and also to give students some extra practice before their real exams. Here are the questions and answers from the graduand quiz (where the postgraduate quiz differs, this is indicated).

1. Here is a function:

The derivative with respect to α has no closed form. Explain briefly how you would maximise this function.
You would have to use some numerical method, such as Newton-Raphson or cross entropy.
2. Suppose you have two samples from populations in which binary data are recorded (e.g. liking of the colour pink in science students versus arts students) and wish to assess whether the two proportions are the same or different. Why do we use H0 : p1 = p2 and H1 : p1 ≠ p2 and not the other way around?
There are several ways to answer this. To my mind, the reason the null hypothesis is what we assume to be true when carrying out the test is that the test cannot be carried out otherwise: the alternative does not fix the value of p1 − p2, so there is no single known distribution of the test statistic under the alternative hypothesis, and technically H1 cannot be rejected. But there’s an easier answer: unless the data are an exact tie (identical sample proportions), the likelihood evaluated at at least one point in the alternative region will be higher than the likelihood at every point satisfying the null, so how could the alternative be rejected in favour of the null?
3. Deal two cards from a well-shuffled, French-suited 52-card deck. (a) Are the events {1st is an ace} and {1st is a spade} independent? (b) Are {1st is an ace} and {2nd is a spade} independent?
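Both pairs are independent. For (a), P(ace and spade) = P(ace of spades) = 1/52 = (1/13) × (1/4). For (b), P(1st is an ace and 2nd is a spade) = (1/52)(12/51) + (3/52)(13/51) = 51/(52 × 51) = 1/52, which again equals P(1st is an ace) × P(2nd is a spade) = (1/13) × (1/4).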
4. You have independent samples X1, …, Xn from a normal population with mean µ and standard deviation σ. What is the MLE of (µ, σ)? (But don’t attempt to derive it!) Is this unbiased?
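The MLE is (xbar, √((n−1)s²/n)); the quantity (n−1)s²/n itself is the MLE of σ², not of σ. It is biased: xbar is unbiased for µ, but the MLE of σ underestimates σ on average.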
5. Which of the following procedures use linear models?
• Two-sample t-test with equal variances
• Paired t-test
• One-way ANOVA
• Two-way ANOVA
• Linear regression
• Multiple regression
• Logistic regression
All except logistic regression are special cases of the linear model.
6. The sex ratio in most countries is approximately 50:50. A city in such a country has two hospitals, one bigger than the other. The bigger hospital has around 45 births per day, the smaller around 15. The two hospitals keep track of the number of days in the year on which the sex ratio deviates from 50:50 by at least as much as 60:40 (or 40:60). Which hospital do you think recorded more such days?
• The larger hospital
• The smaller hospital
• About the same (within 5% of each other)
The smaller hospital. The daily proportion of boys has variance p(1 − p)/n, so with about 15 births a day rather than 45 it fluctuates more, and reaches 60:40 (or 40:60) or beyond on more days of the year.
7. Let the sample mean of a random sample be xbar and the population mean be µ. Which of these is:
• a random variable
• a realisation of a random variable
• a statistic
• a parameter
• an estimator
(You may also answer “both” or “neither” by indicating two yeses or two noes.)
According to frequentist dogma, neither is a random variable, while xbar is a realisation of a random variable, a statistic and an estimator, and µ is a parameter. To a Bayesian, µ is a random variable.
8. A government statistician and an academic economist are investigating the average wage in Singapore (with around 3 million workers). The statistician looks up the wages of all workers in the country on her government database, while the economist takes a simple random sample (with replacement) of 200 people. Both calculate the mean and standard deviation of wages (respectively (xbar_S, s_S) and (xbar_E, s_E)). What are the standard errors of the means calculated by the two investigators?
The statistician has a census at hand, so there is no sampling uncertainty in her mean: its standard error is 0. The standard error for the economist is s_E/√200 ≈ s_E/14.14.
9. The Comte de Buffon described an experiment, called Buffon’s needle experiment. This involves dropping a needle many times onto a table marked with parallel lines separated by a distance equal to the needle’s length. It can be shown using geometry and calculus that the probability the needle will touch a line is p = 2/π. Dropping the needle a few hundred times, a mathematician derived a confidence interval for π by manipulating the standard formula for the confidence interval of a proportion, phat ± Zα √(phat(1 − phat)/n), where phat is the proportion of the n drops that touch a line:
(2/(phat + Zα √(phat(1 − phat)/n)), 2/(phat − Zα √(phat(1 − phat)/n)));
this gave the interval (3.08, 3.18).
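(a) What is the probability the true value of the parameter is inside this confidence interval if Zα = 1.96?
100%. π = 3.14159…, which is in (3.08, 3.18). Once the interval has been computed it either contains π or it does not; the 95% describes how often the procedure captures π over repeated experiments, not this particular interval.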
(b) Why is the value of Zα taken from a normal distribution?
The central limit theorem says that phat is approximately normal for this sample size.
10. Imagine you are going to carry out a two-sample t-test of the null hypothesis that two means are equal. What is the distribution of the resulting p-value under the null hypothesis?
It is uniform on the interval (0, 1), provided the assumptions of the test hold.
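Question 1’s function is not reproduced here, so this sketch maximises a stand-in: the log-likelihood of the shape α of a gamma distribution with rate fixed at 1, whose score equation Σ log xᵢ − nψ(α) = 0 (ψ is the digamma function) has no closed-form solution. The data, the helper `newton_raphson_max` and its tolerances are hypothetical; derivatives are approximated by central finite differences so that only `math.lgamma` from the standard library is needed.

```python
import math

def newton_raphson_max(f, x0, h=1e-4, tol=1e-8, max_iter=100):
    """Maximise a smooth one-dimensional function f by Newton-Raphson from x0.

    First and second derivatives are approximated by central finite
    differences with step h, so only f itself has to be supplied.
    """
    x = x0
    for _ in range(max_iter):
        d1 = (f(x + h) - f(x - h)) / (2 * h)
        d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
        if d2 >= 0:
            raise ValueError("f is not locally concave here; Newton would not head for a maximum")
        step = d1 / d2
        x -= step  # Newton update: x_new = x - f'(x) / f''(x)
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical data, assumed to come from a gamma distribution with rate 1.
data = [0.5, 1.2, 2.3, 3.1, 0.9]
sum_log = sum(math.log(x) for x in data)
sum_x = sum(data)

def loglik(alpha):
    """Gamma(shape=alpha, rate=1) log-likelihood of the data."""
    n = len(data)
    return (alpha - 1) * sum_log - n * math.lgamma(alpha) - sum_x

alpha_hat = newton_raphson_max(loglik, x0=1.0)
print(f"MLE of alpha: {alpha_hat:.4f}")
```

Because this log-likelihood is concave in α, Newton-Raphson converges quickly from any reasonable starting value; for less well-behaved functions a stochastic method such as cross entropy is the safer choice.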
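Both arguments in the answer to question 2 can be checked numerically. With hypothetical counts (18 of 40 science students and 25 of 45 arts students liking pink), the two-proportion z statistic below uses a pooled standard error that is only justified under H0, and the unconstrained maximised log-likelihood exceeds the one under p1 = p2 unless the sample proportions tie exactly. The function names are illustrative.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood of k successes in n trials, dropping log C(n, k).

    The dropped constant does not depend on p, so it cancels in comparisons.
    """
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    return k * math.log(p) + (n - k) * math.log(1 - p)

def max_logliks(k1, n1, k2, n2):
    """Maximised log-likelihood under H0 (p1 = p2) and with p1, p2 unconstrained."""
    p_pooled = (k1 + k2) / (n1 + n2)  # MLE of the common proportion under H0
    l0 = binom_loglik(k1, n1, p_pooled) + binom_loglik(k2, n2, p_pooled)
    l1 = binom_loglik(k1, n1, k1 / n1) + binom_loglik(k2, n2, k2 / n2)
    return l0, l1

def two_proportion_z(k1, n1, k2, n2):
    """Two-proportion z statistic; its N(0, 1) reference distribution assumes H0."""
    p_pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (k1 / n1 - k2 / n2) / se

# Hypothetical counts: 18 of 40 science students and 25 of 45 arts students.
l0, l1 = max_logliks(18, 40, 25, 45)
print(f"max log-lik under H0: {l0:.3f}, unconstrained: {l1:.3f}")
print(f"z = {two_proportion_z(18, 40, 25, 45):.3f}")
```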
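The answer to question 3 can be confirmed by brute force. This sketch enumerates all 52 × 51 ordered two-card deals and compares P(A and B) with P(A)P(B) using exact fractions; the card encoding (a rank character followed by a suit character, with "S" for spades) is an arbitrary choice.

```python
from fractions import Fraction
from itertools import permutations

RANKS = "A23456789TJQK"
SUITS = "SHDC"  # S = spades
DECK = [rank + suit for rank in RANKS for suit in SUITS]
DEALS = list(permutations(DECK, 2))  # all 52 * 51 equally likely ordered deals

def prob(event):
    """Exact probability of an event, as a fraction of all ordered two-card deals."""
    return Fraction(sum(1 for deal in DEALS if event(deal)), len(DEALS))

def first_is_ace(deal):
    return deal[0][0] == "A"

def first_is_spade(deal):
    return deal[0][1] == "S"

def second_is_spade(deal):
    return deal[1][1] == "S"

def independent(e1, e2):
    """True if P(e1 and e2) equals P(e1) * P(e2) exactly."""
    both = prob(lambda deal: e1(deal) and e2(deal))
    return both == prob(e1) * prob(e2)

print(independent(first_is_ace, first_is_spade))
print(independent(first_is_ace, second_is_spade))
```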
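A quick simulation makes the bias in question 4 visible. Assuming samples of size n = 5 from N(0, 2²) (arbitrary choices), the average of xbar should sit near 0, while the average MLE of σ should sit clearly below 2; theory gives about 0.84σ ≈ 1.68 for n = 5. `statistics.pstdev` computes the same σ estimate as the illustrative `normal_mle` below.

```python
import math
import random
import statistics

def normal_mle(xs):
    """MLE of (mu, sigma) for an i.i.d. normal sample (divides by n, not n - 1)."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    sigma_hat = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return xbar, sigma_hat

def average_estimates(mu=0.0, sigma=2.0, n=5, reps=20000, seed=1):
    """Average the MLEs over many simulated samples of size n."""
    rng = random.Random(seed)
    mu_hats, sigma_hats = [], []
    for _ in range(reps):
        m, s = normal_mle([rng.gauss(mu, sigma) for _ in range(n)])
        mu_hats.append(m)
        sigma_hats.append(s)
    return statistics.fmean(mu_hats), statistics.fmean(sigma_hats)

mean_mu_hat, mean_sigma_hat = average_estimates()
print(f"average mu-hat:    {mean_mu_hat:.3f} (true value 0)")
print(f"average sigma-hat: {mean_sigma_hat:.3f} (true value 2)")
```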
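For question 5, the equal-variance two-sample t-test can be checked to be a linear model: regressing the response on a 0/1 dummy for group membership gives a slope t statistic identical to the pooled t statistic. The data below are made up, and the sketch does plain least squares with the standard library rather than calling a statistics package.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t statistic with pooled (equal) variances, for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def slope_t(x, y):
    """t statistic for the slope in a least-squares fit of y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1 / math.sqrt(sse / (n - 2) / sxx)

# Hypothetical measurements for two groups.
a = [5.1, 4.8, 6.0, 5.5]
b = [6.2, 6.8, 5.9, 7.1, 6.5]
x = [0] * len(a) + [1] * len(b)  # dummy variable: 1 marks group b
y = a + b
print(pooled_t(a, b), slope_t(x, y))
```

The two agree because, with a dummy regressor, the fitted slope is the difference in group means and the residual sum of squares is the within-group sum of squares used in the pooled variance.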
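The binomial distribution gives question 6’s answer exactly. Assuming births are independent with P(boy) = 0.5, and reading “deviates by as much as 60:40” as “at least 60% boys or at least 60% girls”, this sketch computes the chance of such a day for 15 and for 45 births.

```python
from math import comb

def prob_extreme_day(n, p=0.5):
    """P(proportion of boys is at least 60% or at most 40%) among n births.

    Integer comparisons (5k >= 3n means k/n >= 0.6) avoid floating-point edge cases.
    """
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if 5 * k >= 3 * n or 5 * k <= 2 * n
    )

for births in (15, 45):
    p_day = prob_extreme_day(births)
    print(f"{births} births/day: P(extreme day) = {p_day:.3f}, about {365 * p_day:.0f} days a year")
```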
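A simulation of question 8, on a hypothetical population of log-normal wages (30,000 workers rather than 3 million, to keep it fast): the spread of the means of many samples of 200 drawn with replacement should match σ/√200, which s_E/√200 estimates in practice. The census mean, by contrast, is a fixed number with no sampling variability.

```python
import math
import random
import statistics

rng = random.Random(2010)
# Hypothetical wages for 30,000 workers (a stand-in for Singapore's ~3 million).
population = [rng.lognormvariate(8.0, 0.5) for _ in range(30_000)]
sigma = statistics.pstdev(population)  # population SD, which a census reveals exactly

n = 200
# Repeat the economist's survey many times: simple random samples with replacement.
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(2000)]
simulated_se = statistics.stdev(sample_means)
formula_se = sigma / math.sqrt(n)  # in practice estimated by s_E / sqrt(200)

print(f"simulated SE of the sample mean: {simulated_se:.1f}")
print(f"sigma / sqrt(200):               {formula_se:.1f}")
```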
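Buffon’s experiment in question 9 can be simulated, and the simulation also illustrates part (a): each computed interval either contains π or it does not, while about 95% of intervals from repeated experiments do. In this sketch the needle’s direction is drawn by rejection sampling from the unit disc, so the simulation never uses π itself; the function names, seed and sample sizes are arbitrary.

```python
import math
import random
from statistics import NormalDist

def needle_touches(rng):
    """Drop a needle of length 1 onto lines 1 apart; return True if it touches a line.

    The needle's direction comes from a uniform random point in the unit disc
    (rejection sampling), so the simulation never uses the value of pi.
    """
    while True:
        u, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
        r2 = u * u + v * v
        if 0 < r2 <= 1:
            break
    half_extent = 0.5 * abs(v) / math.sqrt(r2)  # half the needle's reach across the lines
    centre_gap = rng.uniform(0, 0.5)  # distance from the needle's centre to the nearest line
    return centre_gap <= half_extent

def pi_interval(n, rng, conf=0.95):
    """Drop n needles and invert the normal-approximation interval for p = 2/pi."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 when conf = 0.95
    p_hat = sum(needle_touches(rng) for _ in range(n)) / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return 2 / (p_hat + half_width), 2 / (p_hat - half_width)

rng = random.Random(42)
low, high = pi_interval(100_000, rng)
print(f"one experiment of 100,000 drops: ({low:.3f}, {high:.3f})")

# Many experiments of 300 drops each: roughly 95% of the intervals should contain pi.
intervals = [pi_interval(300, rng) for _ in range(1000)]
coverage = sum(lower < math.pi < upper for lower, upper in intervals) / len(intervals)
print(f"coverage over 1000 experiments: {coverage:.3f}")
```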
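The uniform null distribution in question 10 is easy to see by simulation. To stay within the standard library, this sketch swaps the t-test for a two-sample z-test with known standard deviation; the same result holds for any test with a continuous statistic whose null distribution is correct, because the p-value is then a probability integral transform of the statistic. The sample sizes and seed are arbitrary.

```python
import math
import random
import statistics
from statistics import NormalDist

def two_sample_z_pvalue(a, b, sigma):
    """Two-sided p-value for H0: equal means, with known common SD sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(7)
# Both samples come from the same N(10, 3^2) population, so H0 is true.
pvalues = [
    two_sample_z_pvalue(
        [rng.gauss(10, 3) for _ in range(10)],
        [rng.gauss(10, 3) for _ in range(10)],
        sigma=3,
    )
    for _ in range(20_000)
]
for cut in (0.05, 0.25, 0.5, 0.75):
    share = sum(p < cut for p in pvalues) / len(pvalues)
    print(f"P(p-value < {cut}) is about {share:.3f}")
```

Under a uniform distribution each printed share should be close to its cut-off.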

In case you weren’t aware, in the far-off United Kingdom (of Great Britain and Northern Ireland), there’s an election going on. The news media are getting very excited about it, as a hung parliament looks possible for the first time since nineteen-canteen, with neither of the two major parties particularly loved anymore.

Anyway, on to the stats. Back in the UK, they have a first-past-the-post constituency system, which tends to favour the two big parties: they get far more seats in parliament than their share of the vote would warrant. (My old lecturer, Dennis Mollison, will be arguing in Significance for a fairer system, which he has long promoted and which the Liberal Democrats favour.) The system encourages parties to argue (or pretend) in each constituency that it’s a two-horse race between them and one other party, and that a vote for a third party is a wasted vote.

This leads to gems of propaganda such as this

highlighted in an interesting blog post by the Beeb’s Mark Easton. He provides some more examples from the other parties.

Funny that they’d bother putting the numbers on the graphs: surely that just weakens the whole message created by the creative y-axis?
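A crude toy simulation shows how first-past-the-post can inflate the seat shares of the two biggest parties. It assumes, purely for illustration and not as a model of UK voting, that each of 650 constituencies gives every party its national vote share plus independent normal noise, and awards the seat to the local leader; the vote shares and noise level below are hypothetical.

```python
import random

def simulate_fptp(vote_shares, n_seats=650, noise_sd=0.06, seed=0):
    """Toy first-past-the-post election.

    Each constituency's share for a party is its national share plus independent
    normal noise; the party with the largest local share wins the seat.
    Returns each party's share of the seats.
    """
    rng = random.Random(seed)
    seats = [0] * len(vote_shares)
    for _ in range(n_seats):
        local = [share + rng.gauss(0, noise_sd) for share in vote_shares]
        seats[local.index(max(local))] += 1
    return [s / n_seats for s in seats]

shares = [0.35, 0.34, 0.21, 0.10]  # hypothetical national vote shares
for vote, seat in zip(shares, simulate_fptp(shares)):
    print(f"votes {vote:.0%} -> seats {seat:.0%}")
```

In this toy model the two leading parties take almost every seat while the third party, despite a fifth of the vote, wins very few; real elections also depend on how support is distributed geographically, which the model ignores.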

It’s like buses: you wait ages and then two come at once! There were two news items from Europe on the BBC webpage today that involved statistics.

Story I: De Berk acquitted of murder

In a story reminiscent of the Sally Clark case in the UK 10y ago, the Dutch nurse Lucy De Berk was acquitted of killing seven patients. She’d spent 6y in prison after being found guilty, and had been dubbed the Angel of Death. The original prosecution had made much of the claim that the “probability” of so many suspicious deaths occurring around her, if she were innocent, was 1/300 million. Of course, this is not the same as the probability that she was innocent given all those deaths. There’s a nice post on Andrew Gelman’s blog describing this in more detail. I can’t work out the URL for that particular post, so here’s the article in the NYT he links to.

Let’s hope things work out better for De Berk than they did for Sally Clark, who died a few years ago from alcohol poisoning.
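The gap between P(deaths | innocent) and P(innocent | deaths) is just Bayes’ theorem. This sketch uses purely illustrative inputs, not estimates from the case: the prosecution’s 1/300 million as the probability of the evidence under innocence, probability 1 under guilt, and a hypothetical prior probability of guilt of 1 in a billion. With these numbers the posterior probability of innocence comes out near 0.77, nowhere near 1/300 million.

```python
def prob_innocent_given_evidence(p_evidence_if_innocent, p_evidence_if_guilty, prior_guilty):
    """Bayes' theorem: P(innocent | evidence)."""
    prior_innocent = 1 - prior_guilty
    numerator = p_evidence_if_innocent * prior_innocent
    return numerator / (numerator + p_evidence_if_guilty * prior_guilty)

# Purely illustrative inputs, NOT estimates from the case:
# P(evidence | innocent) = 1/300 million (the prosecution's figure),
# P(evidence | guilty) = 1, and a hypothetical prior of guilt of 1 in a billion.
print(prob_innocent_given_evidence(1 / 300e6, 1.0, 1e-9))
```

The answer is driven as much by the prior as by the 1/300 million, which is exactly the information the original “probability” left out.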

Story II: “Useful for climatologists to work with statisticians in future”

After Climategate, namely the hacking and subsequent leaking of emails relating to how the University of East Anglia’s Climatic Research Unit (CRU) created its predictions of future climate, an independent panel was set up to investigate. The president of the Royal Statistical Society (RSS), David Hand, was one member of the panel. They concluded:

We cannot help remarking that it is very surprising that research in an area that depends so heavily on statistical methods has not been carried out in close collaboration with professional statisticians.

The Beeb quotes Prof Hand as saying the CRU had been “a little naïve” in not working more closely with statisticians.

When it’s a square!  At least, according to a colleague’s partner’s child’s primary teacher, who marked the boy wrong for displaying an exemplary grasp of Euclidean geometry by noting that the square satisfies all the requirements to be a rectangle (it is a planar figure with four sides and four right angles). I wonder which of these conditions the teacher thought a square violated?

Another year, another batch of students is leaving us for the “real world”. We had a tea to celebrate and wish them well. Here are photos!


We held a quiz before tea.  Results to be posted here, soon.

I got an interesting, attractively designed (though poorly edited) booklet through my letter box last week, entitled The Survival of the World is in Your Hands. It seems to be the work of the Supreme Master Ching Hai, and related ads are appearing on bus stops around Singapore. The basic argument is that we can avert climate change by becoming “veg”, which seems to mean adopting a vegetarian diet. (Conflict of interest declaration: I don’t eat meat, although I do eat fish.)

The argument is supported by impressive-sounding statistics alongside charts like this bizarre pie chart:

I’m not quite sure what the things protruding from the charts are. (I edited out the extra text; if you want to read it, check out the original on the Loving Hut webpage.)

Here are some of the claims:

• At least 51% of global greenhouse gas emissions come from livestock.
• Natural disasters have doubled in the last 20 years.
• There is a 99% correlation between the number of pig factory farms and the number of people getting influenza A (H1N1-2009) in Canadian provinces.

Having just read Lomborg’s The Skeptical Environmentalist, and having worked on H1N1 a bit, I find these claims far-fetched. The 51% claim comes from a publication of the Worldwatch Institute, and is much higher than the estimate from the FAO (of 18%), but the latter has recently been criticised as being too high (which the authors of the FAO report conceded). The doubling of natural disasters is, to my mind, fairly convincingly dismissed by Lomborg. The 99% correlation claim comes from newspaper reports by Alex Roslin in Canada; while I could get the data on the number of pig farms, the number of cases at the time of the report no longer seems to be available, alas. But it seems far more likely that the number of pig farms is highly correlated with the number of people (infected or not) in each province, and that population size is what drives this correlation. After all, didn’t we get H1N1 in Singapore? Not too many pigs here…
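The confounding suggested above is easy to reproduce. In this sketch, pig farms and flu cases in ten hypothetical provinces are each proportional to population (with independent noise) and neither affects the other, yet the raw counts come out strongly correlated; dividing by population removes the shared driver. All numbers are made up, and `pearson` is a small helper written for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2009)
# Hypothetical provincial populations in millions (not real census figures).
population = [13.5, 8.5, 5.0, 4.3, 1.4, 1.2, 1.0, 0.8, 0.5, 0.15]
# Farms and cases both scale with population, with independent noise;
# neither variable depends on the other.
farms = [100 * pop * rng.lognormvariate(0, 0.1) for pop in population]
cases = [50 * pop * rng.lognormvariate(0, 0.1) for pop in population]
print(f"raw correlation: {pearson(farms, cases):.2f}")

farms_per_capita = [f / pop for f, pop in zip(farms, population)]
cases_per_capita = [c / pop for c, pop in zip(cases, population)]
print(f"per-capita correlation: {pearson(farms_per_capita, cases_per_capita):.2f}")
```

The per-capita correlation is just noise here, so its value varies with the seed; the raw correlation stays high because population size dominates both variables.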

It’s a shame the author had to resort to scare-mongering. I think she’d have done better to provide more of the recipes that appear in the middle of the booklet: asparagus wrap, pasta with marinara sauce, etc. Yum!

Following up on the last post…

While Lockhart argues that we should scrap (compulsory) mathematics from schools, I read an interesting book (Doman & Doman 2005 How to teach your baby math, New York: Square One Publishers) last night that argues the opposite, namely that you should start teaching your child mathematics from birth. Yes, birth lah. Failing that before 1y, or failing that before 2y, or…

The approach they describe is very simple, and I’ll outline it here, though if you want the full story you have to read the book.

On the left plot are twelve dots. According to Doman², this is the number twelve. On the right plot is “12”, the numeral we use to represent twelve. As adults, we find it easier to understand the numeral, and this is usually how we teach children numbers. But the authors (child educators) argue that this is too abstract for children, and that we should start by showing them numbers (quantities, like the dots) and teaching arithmetic with them, and only then introduce numerals and arithmetic using numerals.

Apparently, a child will learn quite quickly to look at this (left) and know it’s thirty (or however many dots I put down, I’m no good at arithmetic). And according to the authors, you can have your child doing simple algebra before s/he starts school… and enjoying it!

Could this be true?  It seems so implausible, and yet so tantalising… I’m going out to buy some cards and 5050 red stickers today! (Well, tomorrow, I’m at work today.)
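The 5050 stickers presumably correspond to one card for each number from 1 to 100 (an assumption; the plan isn’t spelled out above), which needs

```latex
\sum_{k=1}^{100} k = \frac{100 \times 101}{2} = 5050
```

red dots in total.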

I came across a really interesting paper on the MAA webpage the other day, A Mathematician’s Lament by Paul Lockhart, a school teacher in New York. I think many of the points are relevant to university maths and stats teaching, and even those that are not add perspective to how we deal with students on service courses in particular, some of whom may suffer from dyscalculia.

Here’s an excerpt from the first page:

A musician wakes from a terrible nightmare. In his dream he finds himself in a society where music education has been made mandatory… Since musicians are known to set down their ideas in the form of sheet music, these curious black dots and lines must constitute the “language of music.” It is imperative that students become fluent in this language if they are to attain any degree of musical competence; indeed, it would be ludicrous to expect a child to sing a song or play an instrument without having a thorough grounding in music notation and theory. Playing and listening to music, let alone composing an original piece, are considered very advanced topics and are generally put off until college…  Of course, not many students actually go on to concentrate in music, so only a few will ever get to hear the sounds that the black dots represent. Nevertheless, it is important that every member of society be able to recognize a modulation or a fugal passage, regardless of the fact that they will never hear one. “To tell you the truth, most students just aren’t very good at music. They are bored in class, their skills are terrible, and their homework is barely legible… they just want to take the minimum number of music courses and be done with it.”

Here is my take on the main points:

• Maths is about problem solving, which is intellectually fun.
• But maths teaching often involves memorising formulæ and applying them over and over again, in other words, a class of problems is outlined, a solution for those problems provided, and then students get to practice applying that solution.  This is not fun.
• Once they leave school, most people don’t need to know all the mathematics they were taught, so we should focus on teaching them problem solving (genuine problem solving) rather than “pointless definitions”.

I wonder what messages we can take from this when it comes to teaching stats at university level? Should we refrain from reproving old theorems? Should exams focus exclusively on solving new problems (rather than redoing some questions from tutorials with minor changes)? Should we scrap lectures?

We know the doubling strategy in any martingale game produces sure profits – if only one had unlimited resources at hand. It seems there are other ways – although I don’t quite understand the exact procedure. Here’s a (shortened) article from The Straits Times with the details:

“THREE foreigners were charged on Monday with cheating the Resorts World Sentosa of \$13,400. […] [They] are said to have placed \$3,600 worth of chips on a winning bet in a game of roulette after the result had been declared at the casino, to deceive the dealer into believing that they had won \$7,200. […] [One of the foreigners] is also accused of trying to cheat a dealer into giving him chips valued at \$15,700 […] by placing chips worth \$400 and \$100 on a winning number and between number 19 and 22 respectively after the result had been declared. […]”

(http://www.straitstimes.com/BreakingNews/Singapore/Story/STIStory_510731.html)

Important to notice: this strategy is not a previsible process – no wonder it works! (the proof of that is left to the reader).
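The “unlimited resources” caveat is what makes the doubling strategy work. With bets of 1, 2, 4, … capped at a hypothetical maximum of 10 bets, this sketch computes the exact outcome with fractions: in a fair game the expected profit is exactly zero, and on an even-money roulette bet (assuming a single-zero wheel, so a win has probability 18/37) it is negative. The function name and the cap are illustrative.

```python
from fractions import Fraction

def doubling_outcomes(p_win, max_bets):
    """Exact results of the doubling strategy with at most max_bets bets.

    Bets are 1, 2, 4, ..., each paying even money. A win at any stage nets +1;
    losing all max_bets bets loses 1 + 2 + ... + 2**(max_bets - 1) = 2**max_bets - 1.
    Returns (probability of netting +1, expected profit).
    """
    p_ruin = (1 - p_win) ** max_bets
    expected = (1 - p_ruin) * 1 - p_ruin * (2 ** max_bets - 1)
    return 1 - p_ruin, expected

fair = doubling_outcomes(Fraction(1, 2), 10)
roulette = doubling_outcomes(Fraction(18, 37), 10)  # even-money bet, single-zero wheel assumed
print(f"fair game: P(+1) = {float(fair[0]):.4f}, expected profit = {fair[1]}")
print(f"roulette:  P(+1) = {float(roulette[0]):.4f}, expected profit = {float(roulette[1]):.4f}")
```

A small, very likely gain is balanced (or, with a house edge, outweighed) by a rare but large loss; only an unlimited bankroll removes that loss.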

In case you ever wondered:
“If a man/woman can expect to meet exactly N eligible partners in his/her life, what strategy should he/she use to maximize his chances of choosing the very best one?”