10.3 Forecasting
To forecast using a regression model with ARIMA errors, we need to forecast the regression part of the model and the ARIMA part of the model, and combine the results. As with ordinary regression models, in order to obtain forecasts we first need to forecast the predictors. When the predictors are known into the future (e.g., calendar-related variables such as time, day-of-week, etc.), this is straightforward. But when the predictors are themselves unknown, we must either model them separately, or use assumed future values for each predictor.
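The two-part calculation can be checked numerically. The following is a minimal sketch, not the method used for the examples in this section: it relies on simulated data and on base R's `stats::arima()` with its `xreg` argument rather than the `ARIMA()` function from fable. The data-generating process, the AR(1) error structure, and all object names are assumptions made for illustration.

```r
# Simulated data (an assumption): y_t = 2 + 0.5 * x_t + eta_t, with AR(1) errors eta_t.
set.seed(123)
n <- 200
x <- rnorm(n)
eta <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- 2 + 0.5 * x + eta

# Regression with AR(1) errors.
fit_sim <- arima(y, order = c(1, 0, 0), xreg = x)

# Assumed future values of the predictor for horizons 1, ..., h.
h <- 5
x_future <- rep(mean(x), h)

# Forecasts produced by the fitted model.
fc <- predict(fit_sim, n.ahead = h, newxreg = x_future)

# The same forecasts built by hand from the two parts.
b <- coef(fit_sim)                       # order: AR coefficient, intercept, slope on x
phi <- b[[1]]
b0 <- b[[2]]
b1 <- b[[3]]
eta_T <- y[n] - b0 - b1 * x[n]           # last estimated regression error
regression_part <- b0 + b1 * x_future    # forecast of the regression part
arima_part <- phi^(1:h) * eta_T          # AR(1) forecast of the error
manual <- regression_part + arima_part

# Compare the hand-built forecasts with those from predict().
forecasts_match <- isTRUE(all.equal(as.numeric(fc$pred), manual, tolerance = 1e-6))
```

The error forecast decays towards zero as the horizon grows, so at long horizons the forecast is driven almost entirely by the regression part and hence by the assumed future values of the predictor.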
Example: US Personal Consumption and Income
We will calculate forecasts for the next eight quarters assuming that the future percentage changes in personal disposable income will be equal to the mean percentage change from the last forty years.
us_change_future <- new_data(us_change, 8) |>
  mutate(Income = mean(us_change$Income))
forecast(fit, new_data = us_change_future) |>
  autoplot(us_change) +
  labs(y = "Percentage change")

Figure 10.4: Forecasts obtained from regressing the percentage change in consumption expenditure on the percentage change in disposable income, with an ARIMA(1,0,2) error model.
The prediction intervals for this model are narrower than if we had fitted an ARIMA model without covariates, because we are now able to explain some of the variation in the data using the income predictor.
It is important to realise that the prediction intervals from regression models (with or without ARIMA errors) do not take into account the uncertainty in the forecasts of the predictors. So they should be interpreted as being conditional on the assumed (or estimated) future values of the predictor variables.
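A small experiment makes this conditioning visible. In the sketch below, the simulated data, the use of base R's `stats::arima()` in place of fable's `ARIMA()`, and the two assumed futures for the predictor are all assumptions made for illustration. Changing the assumed future values shifts the point forecasts but leaves the forecast standard errors, and therefore the interval widths, unchanged.

```r
# Simulated data (an assumption): a linear regression with AR(1) errors.
set.seed(42)
n <- 200
x <- rnorm(n, mean = 1)
y <- 0.3 + 0.6 * x + as.numeric(arima.sim(model = list(ar = 0.5), n = n))
fit_sim <- arima(y, order = c(1, 0, 0), xreg = x)

# Two different assumed futures for the predictor.
h <- 8
fc_low <- predict(fit_sim, n.ahead = h, newxreg = rep(0, h))
fc_high <- predict(fit_sim, n.ahead = h, newxreg = rep(2, h))

# The point forecasts differ by slope * (2 - 0) at every horizon ...
point_shift <- as.numeric(fc_high$pred - fc_low$pred)

# ... but the standard errors are the same under both scenarios.
same_se <- isTRUE(all.equal(as.numeric(fc_low$se), as.numeric(fc_high$se)))

# Approximate 95% interval half-widths, conditional on the assumed predictor values.
half_width <- qnorm(0.975) * as.numeric(fc_low$se)
```

Because nothing about the uncertainty in the predictor enters `half_width`, intervals computed this way will tend to be too narrow whenever the future predictor values are themselves forecasts.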
Example: Forecasting electricity demand
Daily electricity demand can be modelled as a function of temperature. As can be observed on an electricity bill, more electricity is used on cold days due to heating and on hot days due to air conditioning. The higher demand on cold and hot days is reflected in the U-shape of Figure 10.5, where daily demand is plotted against daily maximum temperature.
# Aggregate the half-hourly data for 2014 to daily values
vic_elec_daily <- vic_elec |>
  filter(year(Time) == 2014) |>
  index_by(Date = date(Time)) |>
  summarise(
    Demand = sum(Demand) / 1e3,
    Temperature = max(Temperature),
    Holiday = any(Holiday)
  ) |>
  # With lubridate's default week start (Sunday = 1), 2:6 is Monday to Friday
  mutate(Day_Type = case_when(
    Holiday ~ "Holiday",
    wday(Date) %in% 2:6 ~ "Weekday",
    TRUE ~ "Weekend"
  ))
vic_elec_daily |>
  ggplot(aes(x = Temperature, y = Demand, colour = Day_Type)) +
  geom_point() +
  labs(y = "Electricity demand (GW)",
       x = "Maximum daily temperature")

Figure 10.5: Daily electricity demand versus maximum daily temperature for the state of Victoria in Australia for 2014.
The data stored as vic_elec_daily includes total daily demand, daily maximum temperatures, and an indicator variable for whether that day is a public holiday. Figure 10.6 shows the time series of both daily demand and daily maximum temperatures. The plots highlight the need for both a non-linear and a dynamic model.
vic_elec_daily |>
  pivot_longer(c(Demand, Temperature)) |>
  ggplot(aes(x = Date, y = value)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  ylab("")

Figure 10.6: Daily electricity demand and maximum daily temperature for the state of Victoria in Australia for 2014.
In this example, we fit a quadratic regression model with ARMA errors using the ARIMA() function. The model also includes an indicator variable for whether or not the day was a working day.
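In symbols, the model can be sketched as follows, where $y_t$ is daily demand, $x_t$ is the daily maximum temperature, and $w_t$ equals 1 on working days and 0 otherwise. The symbol names are assumptions chosen here for illustration, the order of the error process is left to the automatic selection carried out by ARIMA(), and the intercept may be dropped when the selected error model involves differencing.

```latex
y_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + \beta_3 w_t + \eta_t,
\qquad \eta_t \sim \text{ARIMA}
```

The quadratic term in $x_t$ is what allows the fitted relationship to follow the U-shape seen in Figure 10.5, while the error term $\eta_t$ captures the remaining day-to-day dynamics.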
fit <- vic_elec_daily |>
  model(ARIMA(Demand ~ Temperature + I(Temperature^2) +
                (Day_Type == "Weekday")))
fit |> gg_tsresiduals()

Figure 10.7: Residual diagnostics for a dynamic regression model for daily electricity demand with workday and quadratic temperature effects.
The fitted model has an ARIMA(2,1,2)(2,0,0)[7] error, so there are 6 AR and MA coefficients.
augment(fit) |>
  features(.innov, ljung_box, dof = 6, lag = 14)
#> # A tibble: 1 × 3
#> .model lb_stat lb_pvalue
#> <chr> <dbl> <dbl>
#> 1 "ARIMA(Demand ~ Temperature + I(Temperature^2) + (Day_Typ… 28.4 0.000404
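The reported p-value can be reproduced from the test statistic. Under the null hypothesis of no remaining autocorrelation, the Ljung-Box statistic is compared with a chi-squared distribution with lag - dof = 14 - 6 = 8 degrees of freedom. A minimal base R check, using the rounded statistic 28.4 copied from the output above (so the agreement is only approximate):

```r
lb_stat <- 28.4   # rounded value copied from the output above
lag <- 14         # number of autocorrelations tested
dof <- 6          # number of AR and MA coefficients in the error model

# Upper-tail probability of a chi-squared distribution with lag - dof degrees of freedom
p_value <- pchisq(lb_stat, df = lag - dof, lower.tail = FALSE)
p_value
```

The result is roughly 0.0004, in line with the lb_pvalue column.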
There is clear heteroscedasticity in the residuals, with higher variance in January and February, and lower variance in May. The model also has some significant autocorrelation in the residuals, and the histogram of the residuals shows long tails. All of these issues with the residuals may affect the coverage of the prediction intervals, but the point forecasts should still be OK.
Using the estimated model, we forecast 14 days ahead starting from Thursday 1 January 2015 (a non-work day, being the public holiday for New Year's Day). In this case, we could obtain weather forecasts from the weather bureau for the next 14 days. But for the sake of illustration, we will use scenario-based forecasting (as introduced in Section 7.6), where we set the temperature for the next 14 days to a constant 26 degrees.
vic_elec_future <- new_data(vic_elec_daily, 14) |>
  mutate(
    Temperature = 26,
    Holiday = c(TRUE, rep(FALSE, 13)),
    Day_Type = case_when(
      Holiday ~ "Holiday",
      wday(Date) %in% 2:6 ~ "Weekday",
      TRUE ~ "Weekend"
    )
  )
forecast(fit, vic_elec_future) |>
  autoplot(vic_elec_daily) +
  labs(title = "Daily electricity demand: Victoria",
       y = "GW")

Figure 10.8: Forecasts from the dynamic regression model for daily electricity demand. All future temperatures have been set to 26 degrees, and the working day dummy variable has been set to known future values.
The point forecasts look reasonable for the first two weeks of 2015. The slowdown in electricity demand at the end of 2014 (due to many people taking summer vacations) has caused the forecasts for the next two weeks to show similarly low demand values.