AUSTRALIAN FRONTIERS OF SCIENCE, 2008
The Shine Dome, Canberra, 21-22 February
Forecasting functional time series
by Professor Rob Hyndman
Rob Hyndman is Professor of Statistics at Monash University. He was recently awarded the 2007 Moran Medal for 'substantial contributions to several areas of statistics'. His theoretical and methodological work covers applied probability, time series, statistical inference, forecasting, non-parametric estimation and statistical graphs. Rob is Editor-in-Chief of the International Journal of Forecasting and was previously Theory and Methods Editor of the Australian and New Zealand Journal of Statistics (2001-2004). He has published extensively in leading statistical and forecasting journals. In particular, his papers on non-Gaussian forecasting and automatic forecasting have been very influential.
I should say that although I am in a department of econometrics, I do a lot of collaborative work with people in other disciplines: medicine, human science, social science and demography, all sorts of places.
What I am going to talk about is functional data: data that are infinitely dimensional in one direction, and finite in the other.
I thought I would start by showing a little example of data where it is relatively easy to understand what is going on, because it is mortality, and people generally know something about mortality. The curve here is showing the French male mortality in 1899. Things are not so good when you are young; mortality is pretty high for babies. It drops fairly rapidly, and at least in 1899 the ideal time to be alive was when you were in your early teens. Then it increases till about 20; there is a well-known 'accident hump' that happens around age 20, as you can just see on this graph. (I will show you some data later where it is much more apparent.) And then it flattens out and steadily goes uphill from there.
If we just look at how that evolves over time, we can see that the First World War comes in, then the Second World War pops up, and then there is a big drop after the Second World War with changes in mortality. The accident bump then appears again in the early to late 1970s. We are trying to model these changes and project them into the future.
So there are a few things here. We have got the wars, and alongside the First World War years there was the Spanish flu as well; you have got the Second World War providing more outliers; and then you have got these accident bumps where things actually got worse for people around their late teens and early 20s, largely due to car accidents, with an additional effect from the AIDS epidemic. The graph is ordered by colour in rainbow order, the red being the oldest and the purple being the most recent.
The real trick here is to know what is going to happen for the next 50 years. To gain any kind of understanding of the ageing population, we need good forecasts of death rates.
The second example is Australian fertility rates. Mathematically this is the same problem. Even though it is a completely different set of data, we are still looking at curves. The curves are infinite in the age direction, because you can sample at any level of accuracy across the age dimension, but finite in the time dimension: we have just got one curve per year.
In 1921 this is what the fertility rates looked like.
The fertility rates dropped during the Second World War. Then came the baby boom; you can see it really was a boom. It took off, with all these green curves, straight after the Second World War. Blue is around the start of the 1980s, and what we have had since then, in the last 20 years, is a shift in the curve sideways as people have delayed child-bearing. The purple curve is for the most recent data that I have got there, which is 2003, and the curve, although staying roughly the same size, has been shifting to the right.
Again we are interested in forecasting this for the next 20 to 30 years, so that governments can use the figures for planning.
Nobody else has put up an equation, and I thought hard about how I could do this talk without equations. I decided I really just needed one, or I wouldn't be able to explain what the rest of it was about. So this is it. You will see it three times (this is the first time) but it is the same equation. It is a way of writing down functional data (data that is in the nature of curves) in a way that we can start to analyse them.
Y is the observed data, which might be mortality rates or fertility rates or whatever else we have measured; t is the year; and x is age, in this case. So on the left-hand side of the equation we have got the curve; on the right-hand side is a decomposition of the curve into bits that will be a little more manageable than just looking at individual curves. So it equals a mean and then a sum of other bits. And the trick here is that in the other bits we separate age and time, so that the first term has no x in it, and the second term has no t in it. That makes it a whole lot easier to deal with doing any kind of statistical analysis, because age and time have been separated. And then the last term is error: because no model is perfect, you have got to include an error term.
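The slide itself is not reproduced in this transcript. Going only by the spoken description (a mean, a sum of terms in which one factor has no x and the other has no t, and an error term), the equation can be reconstructed roughly as follows. The symbol names μ, K and e are assumptions chosen to match the betas and phis mentioned in the talk.

```latex
y_t(x) = \mu(x) + \sum_{k=1}^{K} \beta_{t,k}\,\phi_k(x) + e_t(x)
```

Here y_t(x) is the observed curve in year t at age x, μ(x) is the mean curve, β_{t,k} depends only on time, φ_k(x) depends only on age, and e_t(x) is the error.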
We are going to estimate the betas and the phis using a principal components (PC) method. I am not sure that those words have been used before, but I know that a lot of people here have been doing principal components on other sorts of data. We can do them as well. It is a little different, because our data is infinitely dimensional (so the matrix, for example, is infinite in one direction), but that is not actually that difficult.
We will do these principal components and we will have a look at the betas, which are the scores. The phis are the eigenfunctions.
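As a rough illustration of the idea, the sketch below runs ordinary principal components on curves that have been sampled on a finite age grid. It is a simplification under stated assumptions: the talk does not say how the curves are smoothed or which variant of principal components is used, the function name `functional_pca` is hypothetical, and the data are synthetic rather than real mortality or fertility rates.

```python
import numpy as np

def functional_pca(Y, K):
    """Decompose curves into mean + sum_k beta[t, k] * phi[k] + residual.

    Y : array of shape (n_years, n_ages), one discretised curve per row.
    K : number of components to keep.
    """
    mu = Y.mean(axis=0)                      # mean curve over time
    centred = Y - mu
    # SVD of the centred data: the rows of Vt are orthonormal basis curves.
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    phi = Vt[:K]                             # (K, n_ages) eigenfunctions on the grid
    beta = centred @ phi.T                   # (n_years, K) scores
    resid = centred - beta @ phi             # what the K components leave unexplained
    return mu, phi, beta, resid

# Synthetic example (not real data): 40 "years" on a grid of 30 "ages".
rng = np.random.default_rng(0)
ages = np.linspace(0.0, 1.0, 30)
years = np.arange(40)
Y = (np.sin(np.pi * ages)                                        # mean shape
     + np.outer(0.05 * years, ages)                              # slow trend over time
     + 0.1 * np.outer(rng.normal(size=40), np.cos(np.pi * ages)) # second mode of variation
     + rng.normal(scale=0.01, size=(40, 30)))                    # noise

mu, phi, beta, resid = functional_pca(Y, K=2)
recon = mu + beta @ phi + resid              # reproduces Y exactly by construction
```

Each row of `beta` holds the scores for one year, and each row of `phi` is one basis curve evaluated on the age grid.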
So that is the equation. I hope that wasn't too scary. We will now go back to the graphs.
At the left here we have the mean mortality over time. And then every one of the curves gets decomposed into a series of other curves which are constant across time. They are the phis at the top, and the betas are below them. So the curve for a single year is the mean plus Phi 1 multiplied by Beta 1 plus Phi 2 multiplied by Beta 2, and so on. That enables us to do forecasting, because we can forecast the betas, the ones at the bottom, and reconstruct future curves in that way.
If you look at these, though, you get some interesting things. We have got wild outliers (the wars, and the Spanish flu pandemic), and any kind of model is going to have to allow for weird things like that going on. You can't just fit a standard time series model to it.
The residuals from that decomposition are shown here, and again the outliers show up, as interesting little strips. What I often look for here are diagonal strips, because they are cohort effects. If you have these diagonal strips running across the graph, then that is a cohort effect, something that has affected one population over time but doesn't affect the neighbouring cohorts. There may be hints of it in two places here, but it is not very strong. In some other populations we have looked at, it is much stronger.
Here we have the same equation. We want to be able to spot those outliers automatically. It is not always going to be easy to identify where the outliers are and how we should deal with them, so we want an automatic way of doing it. The idea is that the outliers will show up as outliers in the betas, as we saw from the previous graphs. And sure enough, that is what happens.
This is a plot of Beta 1 versus Beta 2, the first principal component score against the second one. The outliers are shown at the right-hand side of the graph, and you can see that it has picked out all of those years associated with the two wars.
And then the lower graph is a plot of the data, with just the outliers shown in different colours.
That is called a 'bagplot', invented by Tukey, famous for a lot of statistical inventions. The bagplot enables you to pick out outliers in a two-dimensional scatter plot.
Another way of doing outliers is called an 'HDR boxplot', invented by somebody much less famous: it was in a paper of mine, about 10 years ago. It turns out that we get the same outliers, using this particular way of spotting them.
So that gives us a way of pulling the outliers out of the data and trying to work with what is left.
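Neither the bagplot nor the HDR boxplot is implemented here. As a much simpler stand-in for the general idea of flagging unusual years from their scores, the sketch below uses robust z-scores (median and MAD) on each component. The function name `flag_outlier_years` and the threshold of 3.5 are illustrative assumptions, and the scores are synthetic.

```python
import numpy as np

def flag_outlier_years(beta, threshold=3.5):
    """Flag rows of `beta` whose robust z-score exceeds `threshold` in any component.

    This is NOT a bagplot or an HDR boxplot, only a simple substitute.
    """
    med = np.median(beta, axis=0)
    mad = np.median(np.abs(beta - med), axis=0)
    # 1.4826 rescales the MAD to match the standard deviation under normality.
    robust_z = np.abs(beta - med) / (1.4826 * mad)
    return np.any(robust_z > threshold, axis=1)

# Synthetic scores: 50 ordinary "years" plus two extreme ones at known positions.
rng = np.random.default_rng(1)
beta = rng.normal(size=(50, 2))
beta[10] = [12.0, 0.0]
beta[30] = [0.0, -15.0]

is_outlier = flag_outlier_years(beta)
clean_beta = beta[~is_outlier]               # scores with the flagged years removed
```

Unlike this per-component check, a bagplot works on the joint two-dimensional scatter of the scores, so the two approaches need not flag the same years.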
Here again we have the same equation. The eigenfunctions, the phis, show the main regions of variation. The betas are the things that are moving through time. They are the bits we are going to try to forecast.
Because we use principal components to get the scores {βt,k}, they are actually uncorrelated by construction. They automatically don't have any relationship with the 'other' beta: Beta 1 is not correlated with Beta 2. So that makes forecasting really easy. We just have to look at them one at a time; we don't have to worry about all these relationships between the betas.
We will just forget the outliers existed, pretend wars won't occur in the future. That means our forecasts will be conditional on a peaceful situation continuing, with no major pandemics.
So here are the forecasts. You see that the outliers have been greyed out. The black is what is left, and the forecasts are shown in yellow. This is our prediction for the next 20 years, with 80 per cent prediction regions around it. These are reasonably easy things to predict once you take those outliers out.
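To make the "one score at a time" idea concrete, here is a minimal sketch that forecasts each score series separately and then rebuilds future curves from the mean and the basis curves. The random walk with drift is an assumption made for illustration only; the talk does not say which univariate model is fitted to each score. The function names and the synthetic inputs are likewise hypothetical, and the intervals ignore uncertainty in the estimated drift.

```python
import numpy as np

def rw_drift_forecast(series, horizon, z=1.2816):
    """Forecast one score series with a random walk with drift.

    Returns point forecasts plus lower and upper bounds; z = 1.2816 gives
    roughly 80 per cent intervals under a normal approximation.
    """
    diffs = np.diff(series)
    drift = diffs.mean()                     # average year-to-year change
    sigma = diffs.std(ddof=1)                # spread of year-to-year changes
    h = np.arange(1, horizon + 1)
    point = series[-1] + drift * h
    half_width = z * sigma * np.sqrt(h)      # widens with the forecast horizon
    return point, point - half_width, point + half_width

def forecast_curves(mu, phi, beta, horizon):
    """Forecast each score separately, then rebuild the future curves."""
    future_beta = np.column_stack(
        [rw_drift_forecast(beta[:, k], horizon)[0] for k in range(beta.shape[1])]
    )
    return mu + future_beta @ phi            # (horizon, n_ages)

# Synthetic inputs (not real data): 60 "years", 25 "ages", 2 components.
ages = np.linspace(0.0, 1.0, 25)
mu = np.sin(np.pi * ages)
phi = np.vstack([np.ones(25) / 5.0, np.cos(np.pi * ages)])
rng = np.random.default_rng(2)
beta = np.column_stack([np.cumsum(rng.normal(-0.1, 0.05, 60)),
                        np.cumsum(rng.normal(0.0, 0.02, 60))])

point, lower, upper = rw_drift_forecast(beta[:, 0], horizon=20)
future = forecast_curves(mu, phi, beta, horizon=20)
```

Because the scores are uncorrelated, each one can be handled by its own call to `rw_drift_forecast` without modelling any cross-dependence.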
Here are the data.
If we take the data away...
...and show the forecasts reconstructed from those extrapolations, the mortality rates will continue to decline, mostly at the younger ages with very, very small declines at the older ages.
The interesting thing is to put some uncertainty limits around those. One year ahead is shown here in red; 20 years ahead is in purple, and even for 20 years ahead we have very tight prediction intervals. To predict 20 years ahead and have an interval of that width is pretty good. The reason is that mortality is pretty stable; it is pretty easy to predict. We have a fairly good idea of what is going to happen over the next 20 years, which is not the case for some other series.
Let me just do a second example, looking at Australia's fertility rates. Here is the picture. Remember that we had the baby boom, we have the shift in child-bearing, and we have the introduction of the Pill at the end of the baby boom, dropping the green curves down.
I am going to try and do the same sort of model for that. You see here the decomposition, with the mean curve, the first phi function, the second one and the betas, so you get a very interesting interpretation. I skipped over the interpretation of the last one, but let me just pause and have a look at the interpretation you get from this.
It is saying that most of the action is happening around people aged 25 and people aged 45, with not much action happening in the middle, when people are in their early thirties. For those two age groups, the behaviour has been to fall rapidly to 1940, to increase rapidly to about the 1970s, and then to drop down until 1980. And then it has been a relatively steady but slight downhill trend from 1980. This only goes to 2003; in the last three or four years there has been a barely perceptible increase in the trend, just very slightly going up. It is not statistically significant. Despite the fact that our previous Treasurer made a big deal of the fact that the fertility rate had gone up as a result of baby bonuses, it is nowhere near statistically significant. It is just a little bit of 'noise' happening at the bottom of the curve.
The second principal component, at the right of this slide, says that a lot of the rest of the action is happening out in the older ages, from age 35, maybe, onwards. For that group there was a drop down to about the middle of the 1970s, and then there has been a rapid increase since. That is reflecting changes in the social situation, but also changes in what is available to stop fertility.
There are some residuals. Some weird things are happening here that we haven't yet explained. I don't know what happened in the 1980s to cause two of these years to be a little odd.
Nevertheless, here are the forecasts. Again the forecasts are just using univariate time series models. The interesting thing for fertility is that the prediction intervals are so wide. The reason for that is that historically there have been big changes. In the curve showing Beta 1 against time you get big jumps (big increases, big drops), and so if you are going to forecast it you have to allow for the fact that such big changes can occur in the future, and so our prediction intervals are relatively wide.
To turn this into forecasts of the curve: here we have the data, and as we grey the data out and bring in the forecasting, you can see that the curve is continuing to shift to the right. The red is the one-year forecast, and then, increasing up to 20 years, is the purple. So essentially it is just shifting in relatively small ways to the right.
But the prediction intervals around that tell the real story.
The real story is that we can have a pretty good idea what is going to happen next year or the year after, but in 20 years' time the prediction intervals are so wide that we really have no idea what is going to happen to fertility. It could be going up or down, and we can't tell from the data.
Let me just say a little bit about the team of people that works with me on this. You see here the names of those who work on this particular topic, functional data forecasting.
We are looking at a few things. One is cohort effects. Where you might have a particular cohort of people (it doesn't have to be people, it depends on the application), you have effects that will move through time diagonally. Our models don't allow for that at the moment, and we are trying to bring that in.
We are looking at how to model synergy and differences between groups. For example, we are looking at white and black breast-cancer mortality in the United States. There are some similarities in the patterns of breast-cancer mortality, but there are some differences, and we need to allow for that. We are looking at breast-screening effects, looking at the differences in cancer rates between those who are screened and those who aren't, using exactly the same sorts of models.
We are looking at whether there is a better way to choose the basis functions, the phis, than just principal components. Even though we are using a fancy version of principal components, it is really designed to explain the historical variation as well as possible, rather than the future variation, and we are looking to see whether we can do better in that respect.
Applications that have been worked on with this sort of technology so far are, firstly, the fertility rates and the all-cause mortality: they are the two that I have shown you. Population flows naturally from that. If you can forecast fertility, mortality and migration, then you forecast population. (Population is just made up of those three components.) We have a new paper out on forecasting Australia's population using this approach, which is actually the first time we have got prediction intervals for population forecasts. It does seem remarkable, but the ABS (Australian Bureau of Statistics) doesn't give any kind of estimate of uncertainty for their forecasts. In fact, they won't even call them forecasts; they call them projections and hide behind that terminology, whereas we are producing forecasts with uncertainty limits.
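The remark that population is made up of those three components can be written, in its simplest aggregate form, as the standard demographic balancing equation. This is a sketch only: the symbols are chosen here for illustration, and the talk does not say whether the paper works at this aggregate level or by age and sex.

```latex
P_{t+1} = P_t + B_t - D_t + M_t
```

Here P_t is the population at the start of year t, B_t is births during the year (driven by fertility rates), D_t is deaths (driven by mortality rates), and M_t is net migration.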
We are looking at cancer mortality and incidence rates. The Australian Institute of Health and Welfare now use this methodology that we have developed for all the official cancer forecasts in Australia.
Some people are using the methodology for yield curves in finance. They are the curves of, essentially, interest rates over terms.
Some people have used it for seasonal electricity demand. We have another way of doing seasonal electricity demand which works better.
Some people have used it for seasonal El Niño sea surface temperatures, where each curve is one year of the temperatures, to try to forecast into the next year or two.
If you can imagine a situation where you collect curves, then the methodology should be able to be used in that context.



