AUSTRALIAN FRONTIERS OF SCIENCE, 2008
The Shine Dome, Canberra, 21-22 February
Session 8: Statistical challenges with high dimensional data
Chair: Professor Sue Wilson
Sue Wilson is director of the Centre for Bioinformation Science at the Mathematical Sciences Institute, Australian National University (ANU). She obtained her degree from the University of Sydney and her PhD from the ANU, was a lecturer in the Department of Probability and Statistics at the University of Sheffield in the UK, and has held various research positions at ANU. Sue has many publications in biometry and applied statistics, with a particular emphasis on statistical genetics/genomics and bioinformatics. Her extensive consulting experience in the biological and medical sciences has led to developments in statistical modelling to answer research questions. Sue has been elected to the International Statistical Institute, the American Statistical Association and the Institute of Mathematical Statistics. She has held the position of president of the International Biometric Society, holds various editorial responsibilities and serves on the committees of many international societies.
This is both a challenging time for statisticians, and an exciting time. Why is that? It is because we've got data. As you have heard throughout this symposium, in many of the talks, we've got masses of data and there is loads of variation in those data, and that is the lifeblood for statistical science.
Because of modern-day technology producing these masses of data, the world of statistics has been undergoing a very fundamental change. To exemplify this I will refer to the experiment, mentioned yesterday, that has been running at Rothamsted since the middle of the 1800s to look at wheat. It involved looking at data for this and many other wheat experiments, led by Sir Ronald Fisher, whose name many of you would know. Based on the analyses of these data, examining perhaps yield as a function of treatments (different sorts of fertiliser, different environmental conditions), Fisher developed what we call 'the analysis of variance'.
But these days, not only can we look at the treatments and the environmental conditions, we can look at the gene expression data, we can look at the genomes. So we have gone from having perhaps a few hundred observations and a few variables to having a few hundred observations but millions of variables, in the extreme situations. In other words, we have what are called 'large p small n' problems: lots of measurements, and relatively small numbers of samples or observations.
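To make the 'large p small n' difficulty concrete, here is a minimal sketch in Python using NumPy. It is an illustration added for this write-up, not an analysis from the talk. The sizes (50 observations, 2000 variables), the random seed and the simulated data are all assumptions chosen for the example. It shows that when there are more variables than observations, ordinary least squares can reproduce a response exactly even when that response is pure noise, unrelated to any of the variables.

```python
import numpy as np

# Assumed, illustrative sizes: n observations, p variables, with p much larger than n.
n, p = 50, 2000

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

X = rng.standard_normal((n, p))  # simulated measurements (no real data)
y = rng.standard_normal(n)       # response generated independently of X: pure noise

# The rank of X cannot exceed n, so the p coefficients are not uniquely determined.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm least-squares solution when X is rank-deficient.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Largest absolute difference between fitted and observed responses.
max_abs_residual = float(np.max(np.abs(X @ beta - y)))

print("rank of X:", rank)
print("largest absolute residual:", max_abs_residual)
```

The fit is essentially perfect although there is nothing to find, which is why classical methods designed for a few variables and many observations cannot simply be reused when the variables vastly outnumber the samples.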
Our second speaker is going to continue with this genome theme, but our first speaker is Rob Hyndman. You may have noticed that he is from the Department of Econometrics and Business Statistics at Monash University. We have had talks from the life sciences and the physical sciences; I just want to say that statistics also goes across into the social sciences, and we have got fundamentally the same types of challenges across all these sciences.



