Science at the Shine Dome 2010

Symposium: Genomics and mathematics

Friday, 7 May 2010

Dr Jean Yang

Jean Yang is a senior lecturer at the School of Mathematics and Statistics, University of Sydney. She has a degree in statistics from the University of Sydney and she completed her doctoral studies in the Department of Statistics at the University of California, Berkeley where she worked under the supervision of Terry Speed on the design and analysis of microarray experiments. She was an assistant professor in the Department of Medicine at the University of California, San Francisco and relocated back to Sydney four years ago, taking up a lectureship with the University of Sydney.

Jean’s research work has centred on the development of statistical methodology and the application of statistics to problems in genomics, proteomics and biomedical research. In particular, her focus is on developing methods for integrating expression studies and other biological metadata such as miRNA expression, sequence information and clinical data. As a statistician who works in the bioinformatics area, Jean enjoys research in a collaborative environment, working closely with scientific investigators from diverse backgrounds.

Statistical analysis of quantitative proteomics analysis

iTRAQ™ for protein quantitation using mass spectrometry is a recent, powerful means of determining relative protein levels for thousands of proteins simultaneously. In recent years we have witnessed rapid development in mass spectrometry technologies; however, the statistical analysis of raw tandem spectrum data remains a challenging task. This has become an issue for mass spectrometry proteomic research, and an integrated and comprehensive analysis system is highly desirable. We are examining statistical issues at various stages of the iTRAQ analysis and have developed a new pre-processing algorithm and an alternative assessment for protein identification. Based on wavelet theory, our new pre-processing method uses a dynamic peak model to identify peaks and results in the identification of significantly more peptides and proteins in the downstream analysis for a given false discovery rate (FDR). We further examine the commonly used target-decoy strategy for estimating FDR and discuss an alternative decoy database, built using an amino acid substitution-based framework, which leads to a more realistic estimate of the rate of false-positive protein identifications.
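For readers unfamiliar with the target-decoy strategy mentioned above, the following is a minimal sketch of one common convention for the estimate: search the spectra against both real (target) and artificial (decoy) sequences, then take the ratio of decoy matches to target matches above a score threshold as the estimated FDR. It is not the method presented in the talk; the function name, the score values and the decoy labels are all hypothetical and chosen only for illustration.

```python
def target_decoy_fdr(scores, is_decoy, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Assumed convention (one of several in use): FDR ~ #decoys / #targets,
    counted over all matches that pass the threshold.
    """
    n_target = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    n_decoy = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    if n_target == 0:
        # No accepted target matches, so there is nothing to be falsely discovered.
        return 0.0
    # Cap at 1.0 because the ratio can exceed 1 at very permissive thresholds.
    return min(1.0, n_decoy / n_target)


# Hypothetical match scores; True marks a match to a decoy sequence.
scores = [9.1, 8.7, 8.2, 7.9, 7.5, 6.0, 5.5, 5.1]
is_decoy = [False, False, False, True, False, False, True, True]

strict = target_decoy_fdr(scores, is_decoy, threshold=7.0)
lenient = target_decoy_fdr(scores, is_decoy, threshold=5.0)
print(strict, lenient)
```

Lowering the threshold admits more decoy matches relative to target matches, so the estimated FDR rises. The estimate is only as realistic as the decoy sequences are representative of false target matches, which is why the construction of the decoy database matters.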