jump to navigation

A warning about arima() in R 2010/12/22

Posted by Alex in : Uncategorized , trackback

Just spent a bit of time trying to work out why arima() and lm() were giving different estimates on some dengue data, and on solving the problem, thought I’d share what I found via simulated data.
pars=c(0.5,0.3,0.2,0.1,-0.1);sigma=1
set.seed(666) #for luck
n=10000 #large sample size
errors=rnorm(n,0,sigma)
y=c(1,2,3,4) #initial values
for(i in 5:n)
{
meany=sum(pars*c(1,y[(i-1):(i-4)]))
y[i]=meany+errors[i]
}
plot(y,type=’l’)
arima(y,order=c(4,0,0))
fit=lm(y[5:n]~y[4:(n-1)]+y[3:(n-2)]+y[2:(n-3)]+y[1:(n-4)])
summary(fit)

Calling arima() gives the following coefficients:

while lm() gives

An explanation can be found via the website of Shumway and Stoffer: what R provides in arima() is the mean of the stationary distribution, not the “intercept” that you’d expect having learned basic regression.  I wonder how many people have been caught out by this?

PS Just noticed in the original version I wrote anova in place of arima. Doh!

Comments»

no comments yet - be the first?