A warning about arima() in R 2010/12/22
Posted by Alex in : Uncategorized , trackbackJust spent a bit of time trying to work out why arima() and lm() were giving different estimates on some dengue data, and on solving the problem, thought I’d share what I found via simulated data.
pars=c(0.5,0.3,0.2,0.1,-0.1);sigma=1
set.seed(666) #for luck
n=10000 #large sample size
errors=rnorm(n,0,sigma)
y=c(1,2,3,4) #initial values
for(i in 5:n)
{
meany=sum(pars*c(1,y[(i-1):(i-4)]))
y[i]=meany+errors[i]
}
plot(y,type=’l')
arima(y,order=c(4,0,0))
fit=lm(y[5:n]~y[4:(n-1)]+y[3:(n-2)]+y[2:(n-3)]+y[1:(n-4)])
summary(fit)
Calling arima() gives the following coefficients:
- ar1 0.29 [should be 0.3, ok]
- ar2 0.19 [should be 0.2, ok]
- ar3 0.09 [should be 0.1, ok]
- ar4 -0.09 [should be -0.1, ok]
- intercept 1.0 [should be 0.5, wrong!]
while lm() gives
- ar1 0.29
- ar2 0.19
- ar3 0.09
- ar4 -0.09
- intercept 0.53 [ok!]
An explanation can be found via the website of Shumway and Stoffer: what R provides in arima() is the mean of the stationary distribution, not the “intercept” that you’d expect having learned basic regression. I wonder how many people have been caught out by this?
PS Just noticed in the original version I wrote anova in place of arima. Doh!

Comments»
no comments yet - be the first?