## 11.1 Hierarchical and grouped time series

### Hierarchical time series

Figure 11.1 shows a 2-level hierarchical structure. At the top of the hierarchy (which we call level 0) is the “Total,” the most aggregate level of the data. The $$t$$th observation of the Total series is denoted by $$y_t$$ for $$t=1,\dots,T$$. The Total is disaggregated into two series at level 1, which in turn are divided into three and two series respectively at the bottom-level of the hierarchy. Below the top level, we use $$y_{j,t}$$ to denote the $$t$$th observation of the series corresponding to node $$j$$. For example, $$\y{A}{t}$$ denotes the $$t$$th observation of the series corresponding to node A at level 1, $$\y{AB}{t}$$ denotes the $$t$$th observation of the series corresponding to node AB at level 2, and so on.

In this small example, the total number of series in the hierarchy is $$n=1+2+5=8$$, while the number of series at the bottom level is $$m=5$$. Note that $$n>m$$ in every hierarchy, because $$n$$ counts the aggregate series in addition to the $$m$$ bottom-level series.

For any time $$t$$, the observations at the bottom-level of the hierarchy will sum to the observations of the series above. For example, $$$y_{t}=\y{AA}{t}+\y{AB}{t}+\y{AC}{t}+\y{BA}{t}+\y{BB}{t} \tag{11.1}$$$ and $$$\y{A}{t}=\y{AA}{t}+\y{AB}{t}+\y{AC}{t}\quad \text{and} \quad \y{B}{t}=\y{BA}{t}+\y{BB}{t}. \tag{11.2}$$$ Substituting (11.2) into (11.1), we also get $$y_{t}=\y{A}{t}+\y{B}{t}$$.
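The aggregation constraints above can be checked numerically. The following is a minimal base R sketch, assuming made-up values for the five bottom-level series over three time periods; the numbers and object names (`bottom`, `y_A`, `y_B`, `y_total`) are purely illustrative and are not taken from any data set used in this chapter.

```r
# Hypothetical bottom-level series: one column per node, one row per time period
bottom <- cbind(
  AA = c(5, 6, 7),
  AB = c(3, 4, 2),
  AC = c(2, 2, 3),
  BA = c(8, 7, 9),
  BB = c(4, 5, 4)
)

# Level 1 series, as in equation (11.2)
y_A <- rowSums(bottom[, c("AA", "AB", "AC")])
y_B <- rowSums(bottom[, c("BA", "BB")])

# Level 0 series, as in equation (11.1)
y_total <- rowSums(bottom)

# Collect every series in the hierarchy and count them
all_series <- cbind(Total = y_total, A = y_A, B = y_B, bottom)
n <- ncol(all_series)  # total number of series
m <- ncol(bottom)      # number of bottom-level series

# The Total must also equal the sum of the level 1 series
all(y_total == y_A + y_B)
```

With these values, `n` is 8 and `m` is 5, matching the counts given in the text, and the final comparison holds at every time period.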

### Example: Australian tourism hierarchy

Australia is divided into six states and two territories, with each one having its own government and some economic and administrative autonomy. Each of these can be further subdivided into regions as shown in Figure 11.2 and Table 11.1. For simplicity, we refer to both states and territories as “States.” In total there are 76 such regions. Business planners and tourism authorities are interested in forecasts for the whole of Australia, for each of the states and territories, and also for the regions.

Table 11.1: Australian tourism regions.

| State | Region |
|---|---|
| Australian Capital Territory | Canberra |
| New South Wales | Blue Mountains, Capital Country, Central Coast, Central NSW, Hunter, New England North West, North Coast NSW, Outback NSW, Riverina, Snowy Mountains, South Coast, Sydney, The Murray |
| Northern Territory | Alice Springs, Barkly, Darwin, Kakadu Arnhem, Katherine Daly, Lasseter, MacDonnell |
| Queensland | Brisbane, Bundaberg, Central Queensland, Darling Downs, Fraser Coast, Gold Coast, Mackay, Northern Outback, Sunshine Coast, Tropical North Queensland, Whitsundays |
| South Australia | Adelaide, Adelaide Hills, Barossa, Clare Valley, Eyre Peninsula, Fleurieu Peninsula, Flinders Ranges and Outback, Kangaroo Island, Limestone Coast, Murraylands, Riverland, Yorke Peninsula |
| Tasmania | East Coast, Hobart and the South, Launceston Tamar and the North, North West, Wilderness West |
| Victoria | Ballarat, Bendigo Loddon, Central Highlands, Central Murray, Geelong and the Bellarine, Gippsland, Goulburn, Great Ocean Road, High Country, Lakes, Macedon, Mallee, Melbourne, Melbourne East, Murray East, Peninsula, Phillip Island, Spa Country, Upper Yarra, Western Grampians, Wimmera |
| Western Australia | Australia’s Coral Coast, Australia’s Golden Outback, Australia’s North West, Australia’s South West, Experience Perth |

The tourism tsibble contains data on quarterly domestic tourism demand, measured as the number of overnight trips Australians spend away from home. The key variables State and Region denote the geographical areas, while a further key Purpose describes the purpose of travel. For now, we will ignore the purpose of travel and just consider the geographic hierarchy. To make the graphs and tables simpler, we will recode State to use abbreviations.

tourism <- tsibble::tourism %>%
  mutate(State = recode(State,
    `New South Wales` = "NSW",
    `Northern Territory` = "NT",
    `Queensland` = "QLD",
    `South Australia` = "SA",
    `Tasmania` = "TAS",
    `Victoria` = "VIC",
    `Western Australia` = "WA"
  ))

Using the aggregate_key() function, we can create the hierarchical time series with overnight trips in regions at the bottom-level of the hierarchy, aggregated to states, which are aggregated to the national total. A hierarchical time series corresponding to the nested structure is created using a parent/child specification.

tourism_hts <- tourism %>%
  aggregate_key(State / Region, Trips = sum(Trips))
tourism_hts
#> # A tsibble: 6,800 x 4 [1Q]
#> # Key:       State, Region [85]
#>    Quarter State        Region        Trips
#>      <qtr> <chr*>       <chr*>        <dbl>
#>  1 1998 Q1 <aggregated> <aggregated> 23182.
#>  2 1998 Q2 <aggregated> <aggregated> 20323.
#>  3 1998 Q3 <aggregated> <aggregated> 19827.
#>  4 1998 Q4 <aggregated> <aggregated> 20830.
#>  5 1999 Q1 <aggregated> <aggregated> 22087.
#>  6 1999 Q2 <aggregated> <aggregated> 21458.
#>  7 1999 Q3 <aggregated> <aggregated> 19914.
#>  8 1999 Q4 <aggregated> <aggregated> 20028.
#>  9 2000 Q1 <aggregated> <aggregated> 22339.
#> 10 2000 Q2 <aggregated> <aggregated> 19941.
#> # … with 6,790 more rows
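The 85 keys reported in the output correspond to one national series, eight state series and 76 regional series. A minimal sketch of how this breakdown could be checked is given below. It assumes the fpp3 package is installed, rebuilds the hierarchy from `tsibble::tourism` directly (the recoding of state names does not affect the counts), and uses `level_counts` as an illustrative object name.

```r
library(fpp3)

# Rebuild the hierarchy so that this sketch is self-contained
tourism_hts <- tsibble::tourism %>%
  aggregate_key(State / Region, Trips = sum(Trips))

# key_data() returns one row per series; classify each series by its level
level_counts <- key_data(tourism_hts) %>%
  summarise(
    national = sum(is_aggregated(State) & is_aggregated(Region)),
    states   = sum(!is_aggregated(State) & is_aggregated(Region)),
    regions  = sum(!is_aggregated(Region))
  )
level_counts
```

The three counts should sum to the 85 keys shown in the tsibble header.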

The new tsibble now has some additional rows corresponding to state and national aggregations for each quarter. Figure 11.3 shows the aggregate total overnight trips for the whole of Australia as well as the states, revealing diverse and rich dynamics. For example, there has been noticeable growth since 2010 at the national level and in some states, such as the ACT, New South Wales, Queensland, South Australia, and Victoria. There also appears to be a marked jump for Western Australia in 2014.

tourism_hts %>%
  filter(is_aggregated(Region)) %>%
  autoplot(Trips) +
  ylab("Trips ('000)") +
  ggtitle("Australian tourism: national total and states") +
  facet_wrap(vars(State), scales = "free_y", ncol = 3) +
  theme(legend.position = "none")

The northern states, such as Queensland and the Northern Territory, see peak visits in winter (corresponding to Q3) because of their tropical climate and rainy summer months. In contrast, the southern states tend to peak in summer (corresponding to Q1). This is highlighted in the seasonal plots shown in Figure 11.4 for Queensland and the Northern Territory versus the most southern states of Victoria and Tasmania.

The plots in Figure 11.5 show data for some selected regions. These help us visualise the diverse individual dynamics within each region, with some series showing strong trends or seasonality, some showing contrasting seasonality, while some series appear to be just noise.

### Grouped time series

With grouped time series, the data structure does not naturally disaggregate in a unique hierarchical manner. Figure 11.6 shows a 2-level grouped structure. At the top of the grouped structure is the Total, the most aggregate level of the data, again represented by $$y_t$$. The Total can be disaggregated by attributes (A, B) forming series $$\y{A}{t}$$ and $$\y{B}{t}$$, or by attributes (X, Y) forming series $$\y{X}{t}$$ and $$\y{Y}{t}$$. At the bottom level, the data are disaggregated by both attributes.

This example shows that there are alternative aggregation paths for grouped structures. For any time $$t$$, as with the hierarchical structure, $$$y_{t}=\y{AX}{t}+\y{AY}{t}+\y{BX}{t}+\y{BY}{t}.$$$ However, for the first level of the grouped structure, $$$\y{A}{t}=\y{AX}{t}+\y{AY}{t}\quad \quad \y{B}{t}=\y{BX}{t}+\y{BY}{t} \tag{11.3}$$$ but also $$$\y{X}{t}=\y{AX}{t}+\y{BX}{t}\quad \quad \y{Y}{t}=\y{AY}{t}+\y{BY}{t}. \tag{11.4}$$$
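The two aggregation paths can be illustrated with a small numerical sketch in base R. The four bottom-level values below are made up for a single time period, and the object names (`bottom`, `by_AB`, `by_XY`) are illustrative only.

```r
# Hypothetical bottom-level observations at one time point:
# rows are attribute (A, B), columns are attribute (X, Y)
bottom <- matrix(
  c(10, 20,
    30, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A", "B"), c("X", "Y"))
)

# Path 1: aggregate over X and Y to obtain the A and B series, as in (11.3)
by_AB <- rowSums(bottom)

# Path 2: aggregate over A and B to obtain the X and Y series, as in (11.4)
by_XY <- colSums(bottom)

# Both paths lead to the same Total
total <- sum(bottom)
c(sum(by_AB), sum(by_XY), total)
```

Here the level 1 series differ between the two paths (30 and 70 for A and B, 40 and 60 for X and Y), yet both sum to the same total of 100.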

Grouped time series can sometimes be thought of as hierarchical time series that do not impose a unique hierarchical structure, in the sense that the order by which the series can be grouped is not unique.

### Example: Australian prison population

In this example we consider the Australian prison population data introduced in Chapter 2. The top panel in Figure 11.7 shows the total number of prisoners in Australia over the period 2005Q1–2016Q4. This represents the top-level series in the grouping structure. The panels below show the prison population disaggregated or grouped by (a) state, (b) legal status (whether prisoners have already been sentenced or are on remand, awaiting a sentence), and (c) gender. The three factors are crossed, but none are nested within the others.

The following code, introduced in Section 2.1, builds a tsibble object for the prison data.

prison <- readr::read_csv("https://OTexts.com/fpp3/extrafiles/prison_population.csv") %>%
  mutate(Quarter = yearquarter(Date)) %>%
  select(-Date) %>%
  as_tsibble(key = c(Gender, Legal, State, Indigenous), index = Quarter) %>%
  relocate(Quarter)

We create a grouped time series using aggregate_key() with attributes or groupings of interest now being crossed using the syntax attribute1*attribute2 (in contrast to the parent/child syntax used for hierarchical time series). The following code builds a grouped tsibble for the prison data with crossed attributes: gender, legal status and state.

prison_gts <- prison %>%
  aggregate_key(Gender * Legal * State, Count = sum(Count)/1e3)
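Crossing the attributes produces one series for every combination of levels, where each attribute can also take an "aggregated" value. The base R sketch below enumerates these combinations without downloading the data. It assumes two genders, two legal statuses and eight states, as described above; the level labels and the helper `with_aggregate()` are illustrative placeholders, and only the counts matter.

```r
# Hypothetical helper: add an "aggregated" level to a set of attribute levels
with_aggregate <- function(levels) c("<aggregated>", levels)

# Every combination of (possibly aggregated) gender, legal status and state
combos <- expand.grid(
  Gender = with_aggregate(c("Female", "Male")),
  Legal  = with_aggregate(c("Remanded", "Sentenced")),
  State  = with_aggregate(c("ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA")),
  stringsAsFactors = FALSE
)

# Total number of series in the grouped structure: (2 + 1) * (2 + 1) * (8 + 1)
n_series <- nrow(combos)

# Bottom-level series: no attribute is aggregated, 2 * 2 * 8
is_bottom <- combos$Gender != "<aggregated>" &
  combos$Legal != "<aggregated>" &
  combos$State != "<aggregated>"
n_bottom <- sum(is_bottom)

c(n_series = n_series, n_bottom = n_bottom)
```

Under these assumptions the grouped structure contains 81 series, of which 32 are at the bottom level.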

Using is_aggregated() within filter() is helpful for exploring or plotting the main groups shown in the bottom panels of Figure 11.7. For example, the following code plots the total numbers of female and male prisoners across Australia.

prison_gts %>%
  filter(!is_aggregated(Gender), is_aggregated(Legal), is_aggregated(State)) %>%
  autoplot(Count) +
  ylab("Number of prisoners ('000)") +
  ggtitle("Gender")

Plots of other group combinations can also be obtained in a similar way. Figure 11.8 shows the Australian prison population grouped by all possible combinations of two attributes at a time: state and gender, state and legal status, and legal status and gender.

The following code reproduces the first plot in Figure 11.8.

prison_gts %>%
  filter(
    !is_aggregated(Gender), !is_aggregated(Legal), !is_aggregated(State)
  ) %>%
  mutate(Gender = as.character(Gender)) %>%
  ggplot(aes(x = Quarter, y = Count, group = Gender, colour = Gender)) +
  stat_summary(fun = sum, geom = "line") +
  ggtitle("Prison population by state and gender") +
  ylab("Number of prisoners ('000)") +
  facet_wrap(~ as.character(State), nrow = 1, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Figure 11.9 shows the Australian adult prison population disaggregated by all three attributes: state, legal status and gender. These form the bottom-level series of the grouped structure.

### Mixed hierarchical and grouped structure

Often disaggregating factors are both nested and crossed. For example, the Australian tourism data can also be disaggregated by the four purposes of travel: holiday, business, visiting friends and relatives, and other. This grouping variable does not nest within any of the geographical variables. In fact, we could consider overnight trips split by purpose of travel for the whole of Australia, and for each state, and for each region. We describe such a structure as a “nested” geographic hierarchy “crossed” with the purpose of travel. Using aggregate_key(), this can be specified by simply combining the factors as follows.

tourism_full <- tourism %>%
  aggregate_key((State / Region) * Purpose, Trips = sum(Trips))

The tourism_full tsibble contains 425 series, including the 85 series from the hierarchical structure, as well as another 340 series obtained when each series of the hierarchical structure is crossed with the purpose of travel.
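These counts follow from crossing the 85 geographic series with the four purposes of travel plus the aggregate over purpose, giving $$85 \times 5 = 425$$. A minimal sketch of how this could be confirmed is shown below. It assumes the fpp3 package is installed and rebuilds the structure from `tsibble::tourism` so that it is self-contained; the names `n_all`, `n_geo` and `n_crossed` are illustrative.

```r
library(fpp3)

# Rebuild the mixed structure: nested geography crossed with purpose
tourism_full <- tsibble::tourism %>%
  aggregate_key((State / Region) * Purpose, Trips = sum(Trips))

# Total number of series in the mixed structure
n_all <- n_keys(tourism_full)

# Series from the purely geographic hierarchy (aggregated over Purpose)
n_geo <- tourism_full %>%
  filter(is_aggregated(Purpose)) %>%
  n_keys()

# Remaining series: each geographic series split by purpose of travel
n_crossed <- n_all - n_geo

c(n_all = n_all, n_geo = n_geo, n_crossed = n_crossed)
```

The three values should match the 425, 85 and 340 series described above.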

Figures 11.10 and 11.11 show the aggregate series grouped by purpose of travel, and the series grouped by purpose of travel and state, revealing further rich and diverse dynamics across these series.