SCIENCE OF SEASONAL CLIMATE PREDICTION

The Shine Dome, 2-3 August 2006

Evaluation of forecast ensembles
by Simon J Mason, International Research Institute for Climate and Society, The Earth Institute of Columbia University

A critical review of methods for evaluating the quality of forecast ensembles will be presented. Most methods currently in use require the forecast to be expressed as probabilities for discrete categories. A number of scores are available when there are only two categories, but options are more limited when there are three or more. It will be argued that the only valid score to use is the ignorance score, because it is the only score that is both strictly proper and local. A local score is one that scores a forecast only on the basis of the probability assigned to the outcome that actually occurred. The desirability of locality will be explained and defended. Graphical verification procedures are also used for forecasts of probabilities for categories; reliability diagrams and ROC graphs are the most widely used. Some issues related to the comparison of graphs for different forecast systems will be raised.
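As an illustration of locality, the following minimal sketch computes the ignorance score as the negative base-2 logarithm of the probability assigned to the observed category, and contrasts it with a multi-category Brier score. The function names and the two three-category forecasts are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import math


def ignorance_score(probs, outcome):
    """Ignorance score in bits: -log2 of the probability given to the observed category."""
    p = probs[outcome]
    if p <= 0.0:
        # A zero probability on the category that occurred is penalised infinitely.
        return math.inf
    return -math.log2(p)


def brier_score(probs, outcome):
    """Multi-category Brier score: squared error summed over all categories."""
    return sum(
        (p - (1.0 if k == outcome else 0.0)) ** 2
        for k, p in enumerate(probs)
    )


# Two hypothetical forecasts that assign the same probability (0.4) to the
# observed category (index 2) but distribute the remainder differently.
forecast_a = [0.4, 0.2, 0.4]
forecast_b = [0.1, 0.5, 0.4]
observed = 2

ign_a = ignorance_score(forecast_a, observed)
ign_b = ignorance_score(forecast_b, observed)
bs_a = brier_score(forecast_a, observed)
bs_b = brier_score(forecast_b, observed)

print("ignorance:", ign_a, ign_b)
print("Brier:", bs_a, bs_b)
```

In this sketch the ignorance score is identical for the two forecasts, because it depends only on the probability given to the category that occurred, whereas the Brier score differs because it also depends on how the remaining probability is spread over categories that did not occur.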

It is often undesirable to have to categorise the ensemble, and so procedures for verifying ensemble distributions will be discussed. Appropriate ways of determining whether there is any information in the ensemble spread (and higher moments) will be identified; procedures based on some form of correlation between ensemble spread and forecast accuracy will be rejected as inappropriate. Graphical procedures, such as the Talagrand diagram, will be considered. Some limitations of the Talagrand diagram will be raised, and the concept of ‘complete calibration’ introduced. Complete calibration refers to the reliability of subsets of forecasts, and is useful for identifying whether the reliability of a forecast is conditional upon the forecast itself.
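For readers unfamiliar with the Talagrand diagram (also called a rank histogram), the sketch below shows one common way of constructing it: for each forecast case, count how many ensemble members fall below the observation and tally those ranks. The function name and the synthetic Gaussian data are assumptions made for illustration, and ties between members and the observation are ignored for simplicity.

```python
import random


def rank_histogram(ensembles, observations):
    """Tally the rank of each observation within its ensemble.

    With m members there are m + 1 possible ranks (0 = below every member,
    m = above every member). Ties are not treated specially in this sketch.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


# Synthetic example: observations and members are drawn from the same
# distribution, so the histogram should be roughly flat.
rng = random.Random(0)
n_cases, n_members = 500, 9
ensembles = [[rng.gauss(0.0, 1.0) for _ in range(n_members)] for _ in range(n_cases)]
observations = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]

counts = rank_histogram(ensembles, observations)
print(counts)
```

A flat histogram of this kind only shows that the ensemble is reliable on average over all cases. Applying the same function separately to subsets of the forecasts (for example, cases with a high ensemble mean versus a low one) is one simple way to probe the conditional reliability that complete calibration is concerned with.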

Predictability limits for seasonal atmospheric climate variability depend on the fraction of seasonal variance that is due to factors external to the atmosphere (e.g. boundary conditions) and the fraction that is internal. Decomposition of observed seasonal variance into predictable (external) and unpredictable (internal) components, however, remains an outstanding and often controversial issue. The importance of this decomposition is highlighted by the fact that the average skill of seasonal prediction has a fundamental limit that is determined by the ratio of external-to-internal variance.
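One common idealisation of this limit, offered here as a sketch under stated assumptions rather than as the formulation used in the talk, treats the observed seasonal mean x as the sum of an external (predictable) component s and an uncorrelated internal (unpredictable) component n, and supposes that a perfect forecast system predicts s exactly. The symbols x, s, n, \sigma_E, \sigma_I and R are introduced for illustration. The highest attainable correlation skill then depends only on the variance ratio:

```latex
\begin{align}
x &= s + n, \qquad \operatorname{var}(s) = \sigma_E^2, \quad
\operatorname{var}(n) = \sigma_I^2, \quad \operatorname{cov}(s, n) = 0, \\
\rho_{\max} &= \frac{\operatorname{cov}(s, x)}{\sqrt{\operatorname{var}(s)\,\operatorname{var}(x)}}
 = \frac{\sigma_E^2}{\sigma_E \sqrt{\sigma_E^2 + \sigma_I^2}}
 = \sqrt{\frac{R}{1 + R}}, \qquad R = \frac{\sigma_E^2}{\sigma_I^2}.
\end{align}
```

Under these assumptions, equal external and internal variance (R = 1) would cap the expected correlation at about 0.71, however good the forecast system.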

In this talk, reasons why limits to seasonal predictability should exist will be briefly discussed. Procedures for estimating atmospheric internal variability will also be outlined, and current estimates of seasonal predictability for surface temperature and rainfall over Australia will be presented.