## 12.3 Vector autoregressions

One limitation of the models that we have considered so far is that they impose a unidirectional relationship — the forecast variable is influenced by the predictor variables, but not vice versa. However, there are many cases where the reverse should also be allowed for — where all variables affect each other. In Section 10.2, the changes in personal consumption expenditure (\(C_t\)) were forecast based on the changes in personal disposable income (\(I_t\)). However, in this case a bi-directional relationship may be more suitable: an increase in \(I_t\) will lead to an increase in \(C_t\) and vice versa.

An example of such a situation occurred in Australia during the Global Financial Crisis of 2008–2009. The Australian government issued stimulus packages that included cash payments in December 2008, just in time for Christmas spending. As a result, retailers reported strong sales and the economy was stimulated. Consequently, incomes increased.

Such feedback relationships are allowed for in the vector autoregressive (VAR) framework. In this framework, all variables are treated symmetrically. They are all modelled as if they all influence each other equally. In more formal terminology, all variables are now treated as “endogenous.” To signify this, we now change the notation and write all variables as \(y\)s: \(y_{1,t}\) denotes the \(t\)th observation of variable \(y_1\), \(y_{2,t}\) denotes the \(t\)th observation of variable \(y_2\), and so on.

A VAR model is a generalisation of the univariate autoregressive model for forecasting a vector of time series.^{22} It comprises one equation per variable in the system. The right hand side of each equation includes a constant and lags of all of the variables in the system. To keep it simple, we will consider a two variable VAR with one lag. We write a 2-dimensional VAR(1) model as
\[\begin{align}
y_{1,t} &= c_1+\phi _{11,1}y_{1,t-1}+\phi _{12,1}y_{2,t-1}+\varepsilon_{1,t} \tag{12.1}\\
y_{2,t} &= c_2+\phi _{21,1}y_{1,t-1}+\phi _{22,1}y_{2,t-1}+\varepsilon_{2,t}, \tag{12.2}
\end{align}\]
where \(\varepsilon_{1,t}\) and \(\varepsilon_{2,t}\) are white noise processes that may be contemporaneously correlated. The coefficient \(\phi_{ii,\ell}\) captures the influence of the \(\ell\)th lag of variable \(y_i\) on itself, while the coefficient \(\phi_{ij,\ell}\) captures the influence of the \(\ell\)th lag of variable \(y_j\) on \(y_i\).

If the series are stationary, we forecast them by fitting a VAR to the data directly (known as a “VAR in levels”). If the series are non-stationary, we take differences of the data in order to make them stationary, then fit a VAR model (known as a “VAR in differences”). In both cases, the models are estimated equation by equation using the principle of least squares. For each equation, the parameters are estimated by minimising the sum of squared \(e_{i,t}\) values.

The other possibility, which is beyond the scope of this book and therefore we do not explore here, is that the series may be non-stationary but cointegrated, which means that there exists a linear combination of them that is stationary. In this case, a VAR specification that includes an error correction mechanism (usually referred to as a vector error correction model) should be included, and alternative estimation methods to least squares estimation should be used.^{23}

Forecasts are generated from a VAR in a recursive manner. The VAR generates forecasts for *each* variable included in the system. To illustrate the process, assume that we have fitted the 2-dimensional VAR(1) model described in Equations (12.1)–(12.2), for all observations up to time \(T\). Then the one-step-ahead forecasts are generated by
\[\begin{align*}
\hat y_{1,T+1|T} &=\hat{c}_1+\hat\phi_{11,1}y_{1,T}+\hat\phi_{12,1}y_{2,T} \\
\hat y_{2,T+1|T} &=\hat{c}_2+\hat\phi _{21,1}y_{1,T}+\hat\phi_{22,1}y_{2,T}.
\end{align*}\]
This is the same form as (12.1)–(12.2), except that the errors have been set to zero and parameters have been replaced with their estimates. For \(h=2\), the forecasts are given by
\[\begin{align*}
\hat y_{1,T+2|T} &=\hat{c}_1+\hat\phi_{11,1}\hat y_{1,T+1|T}+\hat\phi_{12,1}\hat y_{2,T+1|T}\\
\hat y_{2,T+2|T}&=\hat{c}_2+\hat\phi_{21,1}\hat y_{1,T+1|T}+\hat\phi_{22,1}\hat y_{2,T+1|T}.
\end{align*}\]
Again, this is the same form as (12.1)–(12.2), except that the errors have been set to zero, the parameters have been replaced with their estimates, and the unknown values of \(y_1\) and \(y_2\) have been replaced with their forecasts. The process can be iterated in this manner for all future time periods.

There are two decisions one has to make when using a VAR to forecast, namely how many variables (denoted by \(K\)) and how many lags (denoted by \(p\)) should be included in the system. The number of coefficients to be estimated in a VAR is equal to \(K+pK^2\) (or \(1+pK\) per equation). For example, for a VAR with \(K=5\) variables and \(p=3\) lags, there are 16 coefficients per equation, giving a total of 80 coefficients to be estimated. The more coefficients that need to be estimated, the larger the estimation error entering the forecast.

In practice, it is usual to keep \(K\) small and include only variables that are correlated with each other, and therefore useful in forecasting each other. Information criteria are commonly used to select the number of lags to be included. Care should be taken when using the AICc as it tends to choose large numbers of lags; instead, for VAR models, we often use the BIC instead. A more sophisticated version of the model is a “sparse VAR” (where many coefficients are set to zero); another approach is to use “shrinkage estimation” (where coefficients are smaller).

A criticism that VARs face is that they are atheoretical; that is, they are not built on some economic theory that imposes a theoretical structure on the equations. Every variable is assumed to influence every other variable in the system, which makes a direct interpretation of the estimated coefficients difficult. Despite this, VARs are useful in several contexts:

- forecasting a collection of related variables where no explicit interpretation is required;
- testing whether one variable is useful in forecasting another (the basis of Granger causality tests);
- impulse response analysis, where the response of one variable to a sudden but temporary change in another variable is analysed;
- forecast error variance decomposition, where the proportion of the forecast variance of each variable is attributed to the effects of the other variables.

### Example: A VAR model for forecasting US consumption

```
<- us_change %>%
fit model(
aicc = VAR(vars(Consumption, Income)),
bic = VAR(vars(Consumption, Income), ic = "bic")
)
fit#> # A mable: 1 x 2
#> aicc bic
#> <model> <model>
#> 1 <VAR(5) w/ mean> <VAR(1) w/ mean>
glance(fit)
#> # A tibble: 2 x 6
#> .model sigma2 log_lik AIC AICc BIC
#> <chr> <list> <dbl> <dbl> <dbl> <dbl>
#> 1 aicc <dbl[,2] [2 × 2]> -373. 798. 806. 883.
#> 2 bic <dbl[,2] [2 × 2]> -408. 836. 837. 869.
```

A VAR(5) model is selected using the AICc (the default), while a VAR(1) model is selected using the BIC. This is not unusual — the BIC will always select a model that has fewer parameters than the AICc model as it imposes a stronger penalty for the number of parameters.

```
%>%
fit augment() %>%
ACF(.innov) %>%
autoplot()
```

We see that the residuals from the VAR(1) model (`bic`

) have significant autocorrelation for Consumption, while the VAR(5) model has effectively captured all the information in the data.

The forecasts generated by the VAR(5) model are plotted in Figure 12.14.

```
%>%
fit select(aicc) %>%
forecast() %>%
autoplot(us_change %>% filter(year(Quarter) > 2010))
```