4.4 Exercises

  1. Calculate the residuals from a seasonal naïve forecast applied to the quarterly Australian beer production data from 1992. The following code will help.

    # Extract data of interest
    recent_production <- aus_production %>%
      filter(year(Quarter) >= 1992)
    
    # Define and estimate a model
    fit <- recent_production %>% model(SNAIVE(Beer))
    
    # Look at the residuals
    fit %>% gg_tsresiduals()
    
    # Look at some forecasts
    fit %>% forecast() %>% autoplot(recent_production)

    What do you conclude?

  2. Repeat the exercise for the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
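
    One possible starting point is sketched below; the choice of benchmark is yours to justify, but note that Exports is annual (no seasonality to exploit) while Bricks is quarterly and strongly seasonal.

    # Australian Exports: annual data, so NAIVE() is a natural benchmark
    global_economy %>%
      filter(Country == "Australia") %>%
      model(NAIVE(Exports)) %>%
      gg_tsresiduals()

    # Bricks: quarterly and seasonal, so SNAIVE() may be more appropriate.
    # The series ends with missing values, which are removed first.
    aus_production %>%
      filter(!is.na(Bricks)) %>%
      model(SNAIVE(Bricks)) %>%
      gg_tsresiduals()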

  3. Are the following statements true or false? Explain your answer.

    1. Good forecast methods should have normally distributed residuals.
    2. A model with small residuals will give good forecasts.
    3. The best measure of forecast accuracy is MAPE.
    4. If your model doesn’t forecast well, you should make it more complicated.
    5. Always choose the model with the best forecast accuracy as measured on the test set.
  4. For your retail time series (from Exercise 3 in Section 2.10):

    1. Create a training dataset consisting of observations before 2011 using

      myts_train <- myts %>%
        filter(Month <= yearmonth("2010 Dec"))
    2. Check that your data have been split appropriately by producing the following plot.

      autoplot(myts) +
        autolayer(myts_train, colour = "red")
    3. Calculate seasonal naïve forecasts using SNAIVE() applied to your training data (myts_train).

      fit <- myts_train %>%
        model(SNAIVE())
      fc <- fit %>%
        forecast()
    4. Compare the accuracy of your forecasts against the actual values.

      fit %>% accuracy()
      fc %>% accuracy(myts)
    5. Check the residuals.

      fit %>% gg_tsresiduals()

      Do the residuals appear to be uncorrelated and normally distributed?

    6. How sensitive are the accuracy measures to the amount of training data used?
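
      One way to explore this is to refit the same model on a shorter training window and recompute the test-set accuracy; a sketch (the 2005 cut-off is an arbitrary choice):

      myts_train_short <- myts %>%
        filter(Month >= yearmonth("2005 Jan"),
               Month <= yearmonth("2010 Dec"))
      fc_short <- myts_train_short %>%
        model(SNAIVE()) %>%
        forecast()
      fc_short %>% accuracy(myts)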

  5. tourism contains quarterly visitor nights (in thousands) from 1998 to 2017 for 76 regions of Australia.

    1. Extract data from the Gold Coast region using filter() and aggregate total overnight trips (sum over Purpose) using summarise(). Call this new dataset gc_tourism.
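
      A sketch of one way to do this (tourism has keys Region, State and Purpose, with the measurement Trips):

      gc_tourism <- tourism %>%
        filter(Region == "Gold Coast") %>%
        summarise(Trips = sum(Trips))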

    2. Using slice() or filter(), create three training sets for this data excluding the last 1, 2 and 3 years. For example, gc_train_1 <- gc_tourism %>% slice(1:(n()-4)).

    3. Compute one year of forecasts for each training set using the seasonal naïve (SNAIVE()) method. Call these gc_fc_1, gc_fc_2 and gc_fc_3, respectively.
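
      For example, following the same pattern for each training set from part 2:

      gc_fc_1 <- gc_train_1 %>%
        model(SNAIVE(Trips)) %>%
        forecast(h = "1 year")
      # Repeat with gc_train_2 and gc_train_3 to obtain gc_fc_2 and gc_fc_3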

    4. Use accuracy() to compare the test set forecast accuracy using MAPE. Comment on these.
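
      One way to line the three comparisons up (a sketch; the training labels are added only for readability):

      bind_rows(
        gc_fc_1 %>% accuracy(gc_tourism) %>% mutate(training = "drop 1 year"),
        gc_fc_2 %>% accuracy(gc_tourism) %>% mutate(training = "drop 2 years"),
        gc_fc_3 %>% accuracy(gc_tourism) %>% mutate(training = "drop 3 years")
      ) %>%
        select(training, MAPE)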

  6. Consider the number of pigs slaughtered in New South Wales (data set aus_livestock).

    1. Produce some plots of the data in order to become familiar with it.
    2. Create a training set of 486 observations, withholding a test set of 72 observations (6 years).
    3. Fit various benchmark methods to the training set, forecast the test period, and compare the results on the test set. Which method did best? (A starting sketch follows this list.)
    4. Check the residuals of your preferred method. Do they resemble white noise?
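
    A starting sketch for parts 2 and 3 (the benchmarks shown are one reasonable subset; the model names are arbitrary):

    nsw_pigs <- aus_livestock %>%
      filter(Animal == "Pigs", State == "New South Wales")

    # 486 training observations, leaving 72 (6 years) for the test set
    nsw_pigs_train <- nsw_pigs %>% slice(1:486)

    fit <- nsw_pigs_train %>%
      model(
        mean = MEAN(Count),
        naive = NAIVE(Count),
        snaive = SNAIVE(Count),
        drift = RW(Count ~ drift())
      )
    fit %>%
      forecast(h = 72) %>%
      accuracy(nsw_pigs)
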
  7. Consider the sales of new one-family houses in the USA, Jan 1973 – Nov 1995 (data set fma::hsales).

    1. Convert the data to a tsibble using as_tsibble().
    2. Produce some plots of the data in order to become familiar with it.
    3. Create a training set by withholding the last two years of data.
    4. Fit various benchmark methods to the training set, forecast the test period, and compare the results on the test set. Which method did best? (A starting sketch follows this list.)
    5. Check the residuals of your preferred method. Do they resemble white noise?
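
    A starting sketch for parts 1 and 3 (as_tsibble() names the single measured series value after conversion):

    hsales_tsibble <- fma::hsales %>%
      as_tsibble()

    # Withhold the last two years (24 months) as a test set
    hsales_train <- hsales_tsibble %>%
      slice(1:(n() - 24))

    fit <- hsales_train %>%
      model(
        naive = NAIVE(value),
        snaive = SNAIVE(value),
        drift = RW(value ~ drift())
      )
    fit %>%
      forecast(h = 24) %>%
      accuracy(hsales_tsibble)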