## 2.8 Autocorrelation

Just as correlation measures the extent of a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values of a time series.

There are several autocorrelation coefficients, one for each panel in the lag plot. For example, $$r_{1}$$ measures the relationship between $$y_{t}$$ and $$y_{t-1}$$, $$r_{2}$$ measures the relationship between $$y_{t}$$ and $$y_{t-2}$$, and so on.

The value of $$r_{k}$$ can be written as

$$
r_{k} = \frac{\sum_{t=k+1}^{T} (y_{t}-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T} (y_{t}-\bar{y})^2},
$$

where $$T$$ is the length of the time series. The autocorrelation coefficients make up the autocorrelation function or ACF.
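To make the formula concrete, here is a minimal base-R sketch that computes $$r_k$$ term by term and compares the result with `stats::acf()`, which uses the same mean-centred estimator with all $$T$$ squared deviations in the denominator. The helper `acf_manual()` and the toy series are hypothetical, introduced only for illustration; they are not part of the book's code or of any package.

```r
# Hypothetical helper (not part of fpp3): r_k computed directly from the formula
acf_manual <- function(y, k) {
  n_obs <- length(y)                      # T in the formula
  ybar <- mean(y)
  # numerator: products of deviations k steps apart, for t = k+1, ..., T
  num <- sum((y[(k + 1):n_obs] - ybar) * (y[1:(n_obs - k)] - ybar))
  # denominator: squared deviations over all T observations
  den <- sum((y - ybar)^2)
  num / den
}

y <- c(5, 3, 8, 6, 9, 4, 7, 10, 6, 8)    # arbitrary toy data
r_manual <- sapply(1:4, function(k) acf_manual(y, k))

# stats::acf() also reports lag 0 (always 1), so drop the first element
r_base <- as.numeric(acf(y, lag.max = 4, plot = FALSE)$acf)[-1]
all.equal(r_manual, r_base)              # should be TRUE
```

Because the numerator has only $$T-k$$ terms while the denominator has $$T$$, the estimate is pulled slightly toward zero at long lags.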

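The autocorrelation coefficients for the beer production data can be computed using the `ACF()` function.

```r
library(fpp3)  # attaches tsibble, feasts (which provides ACF()) and the %>% pipe
# recent_production: the quarterly beer production data shown in the lag plots of Figure 2.16
recent_production %>% ACF(Beer, lag_max = 9)
#> # A tsibble: 9 x 2 [1Q]
#>     lag     acf
#>   <lag>   <dbl>
#> 1    1Q -0.102
#> 2    2Q -0.657
#> 3    3Q -0.0603
#> 4    4Q  0.869
#> 5    5Q -0.0892
#> 6    6Q -0.635
#> 7    7Q -0.0542
#> 8    8Q  0.832
#> 9    9Q -0.108
```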

The values in the acf column are $$r_1,\dots,r_9$$, corresponding to the nine scatterplots in Figure 2.16. We usually plot the ACF to see how the correlations change with the lag $$k$$. The plot is sometimes known as a correlogram.

```r
recent_production %>% ACF(Beer) %>% autoplot()
```

In this graph:

• $$r_{4}$$ is higher than the autocorrelations at the other lags. This is due to the seasonal pattern in the data: the peaks tend to be four quarters apart, and so do the troughs.
• $$r_{2}$$ is more negative than the autocorrelations at the other lags because troughs tend to be two quarters behind peaks.
• The dashed blue lines indicate whether the correlations are significantly different from zero. These are explained in Section 2.9.
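The first two observations can be reproduced with an idealised quarterly series. The sketch below assumes a noise-free cycle of peak, middle, trough, middle repeated for ten years; it is made-up data, not the beer series, and the autocorrelations come from base R's `stats::acf()`.

```r
# Assumed idealised quarterly cycle (not the beer data): peak, mid, trough, mid
y <- rep(c(1, 0, -1, 0), times = 10)     # ten years of quarterly values
r <- as.numeric(acf(y, lag.max = 8, plot = FALSE)$acf)[-1]   # drop lag 0
round(r, 2)
# r[4] = 0.90 is the largest value (peaks four quarters apart) and
# r[2] = -0.95 the most negative (troughs two quarters after peaks)
```

As in the beer data, the autocorrelation at lag 8 (0.80 here) is a little smaller than at lag 4, because fewer pairs of observations contribute to the numerator.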

### Trend and seasonality in ACF plots

When data have a trend, the autocorrelations for small lags tend to be large and positive because observations nearby in time are also nearby in value. So the ACF of a trended time series tends to have positive values that slowly decrease as the lags increase.
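A pure straight-line trend shows this effect in its simplest form. The series below is an assumed toy example (48 points on a line, no noise), and the ACF is computed with base R's `stats::acf()`.

```r
# Assumed toy series: a noise-free linear trend of 48 observations
y <- as.numeric(1:48)
r <- as.numeric(acf(y, lag.max = 12, plot = FALSE)$acf)[-1]   # drop lag 0

all(r > 0)          # every autocorrelation up to lag 12 is positive
all(diff(r) < 0)    # and they shrink steadily as the lag grows
```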

When data are seasonal, the autocorrelations will be larger for the seasonal lags (at multiples of the seasonal period) than for other lags.

When data are both trended and seasonal, you see a combination of these effects. The `a10` data plotted in Figure 2.2 show both trend and seasonality. Their ACF is shown in Figure 2.18.

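```r
# a10: monthly antidiabetic drug sales (Figure 2.2); lag_max = 48 covers four years of lags
a10 %>% ACF(Cost, lag_max = 48) %>% autoplot()
```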

The slow decrease in the ACF as the lags increase is due to the trend, while the "scalloped" shape is due to the seasonality.
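The same combination can be produced synthetically. The sketch below assumes a made-up monthly series, eight years of a linear trend plus a sine wave with a 12-month period (the amplitude of 50 is an arbitrary choice), rather than the `a10` data. It checks that the autocorrelations peak locally at the seasonal lags 12 and 24, while the peaks themselves decline as the lag grows.

```r
# Assumed toy monthly series: linear trend plus a 12-month seasonal cycle
idx <- 1:96                                  # eight years of monthly data
y <- idx + 50 * sin(2 * pi * idx / 12)
r <- as.numeric(acf(y, lag.max = 36, plot = FALSE)$acf)[-1]   # drop lag 0

# Seasonality: r at lags 12 and 24 exceeds both neighbouring lags
c(r[11], r[12], r[13])
c(r[23], r[24], r[25])
# Trend: the autocorrelations at lags 1, 12 and 24 decrease in turn
c(r[1], r[12], r[24])
```

With a larger amplitude the troughs between the seasonal peaks dip further, so the scallops become deeper relative to the trend-driven decay.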