To forecast using a regression model with ARIMA errors, we need to forecast the regression part of the model and the ARIMA part of the model, and combine the results. As with ordinary regression models, in order to obtain forecasts we first need to forecast the predictors. When the predictors are known into the future (e.g., calendar-related variables such as time, day-of-week, etc.), this is straightforward. But when the predictors are themselves unknown, we must either model them separately, or use assumed future values for each predictor.
Example: US Personal Consumption and Income
We will calculate forecasts for the next eight quarters assuming that the future percentage changes in personal disposable income will be equal to the mean percentage change from the last forty years.
The prediction intervals for this model are narrower than those for the model developed in Section 9.5 because we are now able to explain some of the variation in the data using the income predictor.
It is important to realise that the prediction intervals from regression models (with or without ARIMA errors) do not take into account the uncertainty in the forecasts of the predictors. So they should be interpreted as being conditional on the assumed (or estimated) future values of the predictor variables.
Example: Forecasting electricity demand
Daily electricity demand can be modelled as a function of temperature. As can be observed on an electricity bill, more electricity is used on cold days due to heating and hot days due to air conditioning. The higher demand on cold and hot days is reflected in the u-shape of Figure 10.5, where daily demand is plotted versus daily maximum temperature.
vic_elec_daily <- vic_elec %>% filter(year(Time) == 2014) %>% index_by(Date = date(Time)) %>% summarise( Demand = sum(Demand)/1e3, Temperature = max(Temperature), Holiday = any(Holiday) ) %>% mutate(Day_Type = case_when( Holiday ~ "Holiday", wday(Date) %in% 2:6 ~ "Weekday", TRUE ~ "Weekend" )) vic_elec_daily %>% ggplot(aes(x=Temperature, y=Demand, colour=Day_Type)) + geom_point() + ylab("Electricity demand (GW)") + xlab("Maximum daily temperature")
The data are stored as
vic_elec_daily includes total daily demand,daily maximum temperatures, and an indicator variable for if that day is a public holiday. Figure 10.6 shows the time series of both daily demand and daily maximum temperatures. The plots highlight the need for both a non-linear and a dynamic model.
In this example, we fit a quadratic regression model with ARMA errors using the
ARIMA() function. The model also includes an indicator variable for if the day was a working day or not.
There is clear heteroskedasticity in the residuals, with higher variance in January and February, and lower variance in May. The model also has some significant autocorrelation in the residuals, and the histogram of the residuals shows long tails. All of these issues with the residuals may affect the coverage of the prediction intervals, but the point forecasts should still be ok.
Using the estimated model we forecast 14 days ahead starting from Thursday 1 January 2015 (a non-work-day being a public holiday for New Years Day). In this case, we could obtain weather forecasts from the weather bureau for the next 14 days. But for the sake of illustration, we will use scenario based forecasting (as introduced in Section 7.6) where we set the temperature for the next 14 days to a constant 26 degrees.
vic_elec_future <- new_data(vic_elec_daily, 14) %>% mutate( Temperature = 26, Holiday = c(TRUE, rep(FALSE, 13)), Day_Type = case_when( Holiday ~ "Holiday", wday(Date) %in% 2:6 ~ "Weekday", TRUE ~ "Weekend" ) ) forecast(fit, vic_elec_future) %>% autoplot(vic_elec_daily) + ylab("Electricity demand (GW)")
The point forecasts look reasonable for the first two weeks of 2015. The slow down in electricity demand at the end of 2014 (due to many people taking summer vacations) has caused the forecasts for the next two weeks to show similarly low demand values.