11.1 Hierarchical and grouped time series
Hierarchical time series
Figure 11.1 shows a simple hierarchical structure. At the top of the hierarchy is the “Total”, the most aggregate level of the data. The \(t\)th observation of the Total series is denoted by \(y_t\) for \(t=1,\dots,T\). The Total is disaggregated into two series, which in turn are divided into three and two series respectively at the bottom level of the hierarchy. Below the top level, we use \(y_{j,t}\) to denote the \(t\)th observation of the series corresponding to node \(j\). For example, \(\y{A}{t}\) denotes the \(t\)th observation of the series corresponding to node A, \(\y{AB}{t}\) denotes the \(t\)th observation of the series corresponding to node AB, and so on.

Figure 11.1: A two level hierarchical tree diagram.
In this small example, the total number of series in the hierarchy is \(n=1+2+5=8\), while the number of series at the bottom level is \(m=5\). Note that \(n>m\) in all hierarchies.
For any time \(t\), the observations at the bottom level of the hierarchy will sum to the observations of the series above. For example, \[\begin{equation} y_{t}=\y{AA}{t}+\y{AB}{t}+\y{AC}{t}+\y{BA}{t}+\y{BB}{t}, \tag{11.1} \end{equation}\] \[\begin{equation} \y{A}{t}=\y{AA}{t}+\y{AB}{t}+\y{AC}{t}\qquad \text{and} \qquad \y{B}{t}=\y{BA}{t}+\y{BB}{t}. \tag{11.2} \end{equation}\] Substituting (11.2) into (11.1), we also get \(y_{t}=\y{A}{t}+\y{B}{t}\).
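As a quick numerical check of equations (11.1) and (11.2), the sketch below aggregates made-up bottom-level values for a single time \(t\). The numbers and the object names (bottom, parent, level1, total) are illustrative assumptions and are not taken from any data set used in this chapter; only base R is needed.

```r
# Made-up bottom-level observations at one time point (illustrative only).
bottom <- c(AA = 3, AB = 5, AC = 2, BA = 4, BB = 6)

# The parent of each bottom-level node is the first letter of its name.
parent <- substr(names(bottom), 1, 1)

# Level-1 series, as in equation (11.2): sum the children of each parent.
level1 <- tapply(bottom, parent, sum)

# Total, as in equation (11.1): sum all bottom-level series.
total <- sum(bottom)

# Number of series in the hierarchy (n) and at the bottom level (m).
n <- 1 + length(level1) + length(bottom)
m <- length(bottom)

# Substituting (11.2) into (11.1): the Total equals the sum of A and B.
stopifnot(total == sum(level1))
```

The same pattern extends to deeper hierarchies by repeating the grouped sum at each level.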
Example: Australian tourism hierarchy
Australia is divided into six states and two territories, with each one having its own government and some economic and administrative autonomy. For simplicity, we refer to both states and territories as “states”. Each of these states can be further subdivided into regions as shown in Figure 11.2 and Table 11.1. In total there are 76 such regions. Business planners and tourism authorities are interested in forecasts for the whole of Australia, for each of the states and territories, and also for the regions.

Figure 11.2: Australian states and tourism regions.
State | Region |
---|---|
Australian Capital Territory | Canberra |
New South Wales | Blue Mountains, Capital Country, Central Coast, Central NSW, Hunter, New England North West, North Coast NSW, Outback NSW, Riverina, Snowy Mountains, South Coast, Sydney, The Murray. |
Northern Territory | Alice Springs, Barkly, Darwin, Kakadu Arnhem, Katherine Daly, Lasseter, MacDonnell. |
Queensland | Brisbane, Bundaberg, Central Queensland, Darling Downs, Fraser Coast, Gold Coast, Mackay, Northern Outback, Sunshine Coast, Tropical North Queensland, Whitsundays. |
South Australia | Adelaide, Adelaide Hills, Barossa, Clare Valley, Eyre Peninsula, Fleurieu Peninsula, Flinders Ranges and Outback, Kangaroo Island, Limestone Coast, Murraylands, Riverland, Yorke Peninsula. |
Tasmania | East Coast, Hobart and the South, Launceston Tamar and the North, North West, Wilderness West. |
Victoria | Ballarat, Bendigo Loddon, Central Highlands, Central Murray, Geelong and the Bellarine, Gippsland, Goulburn, Great Ocean Road, High Country, Lakes, Macedon, Mallee, Melbourne, Melbourne East, Murray East, Peninsula, Phillip Island, Spa Country, Upper Yarra, Western Grampians, Wimmera. |
Western Australia | Australia’s Coral Coast, Australia’s Golden Outback, Australia’s North West, Australia’s South West, Experience Perth. |
The tourism tsibble contains data on quarterly domestic tourism demand, measured as the number of overnight trips Australians spend away from home. The key variables State and Region denote the geographical areas, while a further key Purpose describes the purpose of travel. For now, we will ignore the purpose of travel and just consider the geographic hierarchy. To make the graphs and tables simpler, we will recode State to use abbreviations.
library(fpp3)

# Abbreviate the state names; values not listed are left unchanged.
tourism <- tsibble::tourism |>
  mutate(State = recode(State,
    `New South Wales` = "NSW",
    `Northern Territory` = "NT",
    `Queensland` = "QLD",
    `South Australia` = "SA",
    `Tasmania` = "TAS",
    `Victoria` = "VIC",
    `Western Australia` = "WA"
  ))
Using the aggregate_key() function, we can create the hierarchical time series with overnight trips in regions at the bottom level of the hierarchy, aggregated to states, which are aggregated to the national total. A hierarchical time series corresponding to the nested structure is created using a parent/child specification.
tourism_hts <- tourism |>
  aggregate_key(State / Region, Trips = sum(Trips))
tourism_hts
#> # A tsibble: 6,800 x 4 [1Q]
#> # Key: State, Region [85]
#> Quarter State Region Trips
#> <qtr> <chr*> <chr*> <dbl>
#> 1 1998 Q1 <aggregated> <aggregated> 23182.
#> 2 1998 Q2 <aggregated> <aggregated> 20323.
#> 3 1998 Q3 <aggregated> <aggregated> 19827.
#> 4 1998 Q4 <aggregated> <aggregated> 20830.
#> 5 1999 Q1 <aggregated> <aggregated> 22087.
#> 6 1999 Q2 <aggregated> <aggregated> 21458.
#> 7 1999 Q3 <aggregated> <aggregated> 19914.
#> 8 1999 Q4 <aggregated> <aggregated> 20028.
#> 9 2000 Q1 <aggregated> <aggregated> 22339.
#> 10 2000 Q2 <aggregated> <aggregated> 19941.
#> # ℹ 6,790 more rows
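The printed header reports 85 keys, that is, 85 separate series. A minimal sketch of how that count could be broken down by level is given below. It rebuilds the hierarchy from tsibble::tourism so that it runs on its own (the State recoding changes labels only), and the names series and level_counts are illustrative assumptions.

```r
library(fpp3)

# Rebuild the hierarchy directly from the packaged data.
tourism_hts <- tsibble::tourism |>
  aggregate_key(State / Region, Trips = sum(Trips))

# key_data() returns one row per series (key combination).
series <- key_data(tourism_hts)

# Count the series at each level of the hierarchy.
level_counts <- c(
  national = sum(is_aggregated(series$State)),
  states   = sum(!is_aggregated(series$State) & is_aggregated(series$Region)),
  regions  = sum(!is_aggregated(series$Region))
)
level_counts
```

The three counts should add up to the 85 keys shown in the output above.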
The new tsibble now has some additional rows corresponding to state and national aggregations for each quarter. Figure 11.3 shows the aggregate total overnight trips for the whole of Australia as well as the states, revealing diverse and rich dynamics. For example, there has been noticeable growth since 2010 at the national level and in some states, such as the ACT, New South Wales, Queensland, South Australia, and Victoria. There seems to be a significant jump for Western Australia in 2014.
tourism_hts |>
  filter(is_aggregated(Region)) |>
  autoplot(Trips) +
  labs(y = "Trips ('000)",
       title = "Australian tourism: national and states") +
  facet_wrap(vars(State), scales = "free_y", ncol = 3) +
  theme(legend.position = "none")

Figure 11.3: Domestic overnight trips from 1998 Q1 to 2017 Q4 aggregated by state.
tourism_hts |>
  filter(State == "NT" | State == "QLD" |
         State == "TAS" | State == "VIC", is_aggregated(Region)) |>
  select(-Region) |>
  mutate(State = factor(State, levels = c("QLD", "VIC", "NT", "TAS"))) |>
  gg_season(Trips) +
  facet_wrap(vars(State), nrow = 2, scales = "free_y") +
  labs(y = "Trips ('000)")

Figure 11.4: Seasonal plots for overnight trips for Queensland and the Northern Territory, and Victoria and Tasmania highlighting the contrast in seasonal patterns between northern and southern states in Australia.
The seasonal pattern of the northern states, such as Queensland and the Northern Territory, leads to peak visits in winter (corresponding to Q3) due to the tropical climate and rainy summer months. In contrast, the southern states tend to peak in summer (corresponding to Q1). This is highlighted in the seasonal plots shown in Figure 11.4 for Queensland and the Northern Territory (shown in the left column) versus the most southern states of Victoria and Tasmania (shown in the right column).

Figure 11.5: Domestic overnight trips from 1998 Q1 to 2017 Q4 for some selected regions.
The plots in Figure 11.5 show data for some selected regions. These help us visualise the diverse regional dynamics within each state, with some series showing strong trends or seasonality, some showing contrasting seasonality, while some series appear to be just noise.
Grouped time series
With grouped time series, the data structure does not naturally disaggregate in a unique hierarchical manner. Figure 11.6 shows a simple grouped structure. At the top of the grouped structure is the Total, the most aggregate level of the data, again represented by \(y_t\). The Total can be disaggregated by attributes (A, B) forming series \(\y{A}{t}\) and \(\y{B}{t}\), or by attributes (X, Y) forming series \(\y{X}{t}\) and \(\y{Y}{t}\). At the bottom level, the data are disaggregated by both attributes.

Figure 11.6: Alternative representations of a two level grouped structure.
This example shows that there are alternative aggregation paths for grouped structures. For any time \(t\), as with the hierarchical structure, \[\begin{equation*} y_{t}=\y{AX}{t}+\y{AY}{t}+\y{BX}{t}+\y{BY}{t}. \end{equation*}\] However, for the first level of the grouped structure, \[\begin{equation} \y{A}{t}=\y{AX}{t}+\y{AY}{t}\quad \quad \y{B}{t}=\y{BX}{t}+\y{BY}{t} \tag{11.3} \end{equation}\] but also \[\begin{equation} \y{X}{t}=\y{AX}{t}+\y{BX}{t}\quad \quad \y{Y}{t}=\y{AY}{t}+\y{BY}{t}. \tag{11.4} \end{equation}\]
Grouped time series can sometimes be thought of as hierarchical time series that do not impose a unique hierarchical structure, in the sense that the order by which the series can be grouped is not unique.
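The two aggregation paths in equations (11.3) and (11.4) can be illustrated with a small numerical sketch. The values below are made up for a single time \(t\), and the object names (bottom, by_AB, by_XY, total) are illustrative assumptions; only base R is used.

```r
# Made-up bottom-level observations at one time point (illustrative only):
# rows are attribute A/B, columns are attribute X/Y.
bottom <- matrix(
  c(4, 1,   # AX, AY
    3, 2),  # BX, BY
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A", "B"), c("X", "Y"))
)

# Equation (11.3): aggregate over X/Y within each of A and B.
by_AB <- rowSums(bottom)

# Equation (11.4): aggregate over A/B within each of X and Y.
by_XY <- colSums(bottom)

# Both paths lead to the same Total.
total <- sum(bottom)
stopifnot(sum(by_AB) == total, sum(by_XY) == total)
```

Neither path is nested within the other, which is what distinguishes a grouped structure from a hierarchy.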
Example: Australian prison population
In this example we consider the Australian prison population data introduced in Chapter 2. The top panel in Figure 11.7 shows the total number of prisoners in Australia over the period 2005Q1–2016Q4. This represents the top-level series in the grouping structure. The panels below show the prison population disaggregated or grouped by (a) state, (b) legal status (whether prisoners have already been sentenced or are on remand awaiting sentence), and (c) gender. The three factors are crossed, but none are nested within the others.

Figure 11.7: Total Australian quarterly adult prison population, disaggregated by state, by legal status, and by gender.
The following code, introduced in Section 2.1, builds a tsibble object for the prison data.
prison <- readr::read_csv("https://OTexts.com/fpp3/extrafiles/prison_population.csv") |>
  mutate(Quarter = yearquarter(Date)) |>
  select(-Date) |>
  as_tsibble(key = c(Gender, Legal, State, Indigenous),
             index = Quarter) |>
  relocate(Quarter)
We create a grouped time series using aggregate_key() with attributes or groupings of interest now being crossed using the syntax attribute1*attribute2 (in contrast to the parent/child syntax used for hierarchical time series). The following code builds a grouped tsibble for the prison data with crossed attributes: gender, legal status and state.
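A minimal sketch of this step is given below. It assumes the prison data are read exactly as in the preceding code, and it rescales the counts to thousands, which is an assumption inferred from the ('000) axis labels in the plots that follow. The object name prison_gts is taken from the later code.

```r
library(fpp3)

# Rebuild the prison tsibble as in the code above (needs internet access).
prison <- readr::read_csv("https://OTexts.com/fpp3/extrafiles/prison_population.csv") |>
  mutate(Quarter = yearquarter(Date)) |>
  select(-Date) |>
  as_tsibble(key = c(Gender, Legal, State, Indigenous),
             index = Quarter) |>
  relocate(Quarter)

# Cross the three attributes; dividing by 1e3 (thousands) is an assumption
# based on the axis labels used in the later plots.
prison_gts <- prison |>
  aggregate_key(Gender * Legal * State, Count = sum(Count) / 1e3)
```

Because the attributes are crossed rather than nested, the result contains a series for every combination of aggregated and disaggregated levels of gender, legal status and state.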
Using is_aggregated() within filter() is helpful for exploring or plotting the main groups shown in the bottom panels of Figure 11.7. For example, the following code plots the total numbers of female and male prisoners across Australia.
prison_gts |>
  filter(!is_aggregated(Gender), is_aggregated(Legal),
         is_aggregated(State)) |>
  autoplot(Count) +
  labs(y = "Number of prisoners ('000)")
Plots of other group combinations can be obtained in a similar way. Figure 11.8 shows the Australian prison population grouped by all possible combinations of two attributes at a time: state and gender, state and legal status, and legal status and gender. The following code will reproduce the first plot in Figure 11.8.

Figure 11.8: Australian adult prison population disaggregated by pairs of attributes.
prison_gts |>
  filter(!is_aggregated(Gender), !is_aggregated(Legal),
         !is_aggregated(State)) |>
  mutate(Gender = as.character(Gender)) |>
  ggplot(aes(x = Quarter, y = Count,
             group = Gender, colour = Gender)) +
  stat_summary(fun = sum, geom = "line") +
  labs(title = "Prison population by state and gender",
       y = "Number of prisoners ('000)") +
  facet_wrap(~ as.character(State),
             nrow = 1, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Figure 11.9 shows the Australian adult prison population disaggregated by all three attributes: state, legal status and gender. These form the bottom-level series of the grouped structure.

Figure 11.9: Bottom-level time series for the Australian adult prison population, grouped by state, legal status and gender.
Mixed hierarchical and grouped structure
Often disaggregating factors are both nested and crossed. For example, the Australian tourism data can also be disaggregated by the four purposes of travel: holiday, business, visiting friends and relatives, and other. This grouping variable does not nest within any of the geographical variables. In fact, we could consider overnight trips split by purpose of travel for the whole of Australia, and for each state, and for each region. We describe such a structure as a “nested” geographic hierarchy “crossed” with the purpose of travel. Using aggregate_key(), this can be specified by simply combining the factors.
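One way to write this specification is sketched below, assuming the nested geography is wrapped in parentheses and then crossed with Purpose. The sketch starts from tsibble::tourism so that it runs on its own (the State recoding used earlier changes labels only, not the structure), and it uses n_keys() from the tsibble package to count the resulting series.

```r
library(fpp3)

# Nested geography (State / Region) crossed with the purpose of travel.
tourism_full <- tsibble::tourism |>
  aggregate_key((State / Region) * Purpose, Trips = sum(Trips))

# Number of series in the mixed structure:
# 85 geographic series x (4 purposes + 1 aggregate over purposes).
n_keys(tourism_full)
```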
The tourism_full tsibble contains 425 series, including the 85 series from the hierarchical structure, as well as another 340 series obtained when each series of the hierarchical structure is crossed with the purpose of travel.

Figure 11.10: Australian domestic overnight trips from 1998 Q1 to 2017 Q4 disaggregated by purpose of travel.

Figure 11.11: Australian domestic overnight trips over the period 1998 Q1 to 2017 Q4 disaggregated by purpose of travel and by state.
Figures 11.10 and 11.11 show the aggregate series grouped by purpose of travel, and the series grouped by purpose of travel and state, revealing further rich and diverse dynamics across these series.