Bioinformatics: making sense of the information flood
This topic is sponsored by the Australian Research Council Centre of Excellence in Bioinformatics.
Since the completion of the Human Genome Project, scientists have been inundated by biological data. Bioinformatics is helping to make sense of it all.
You will get more from this topic if you have mastered the basics of DNA and genes – these links will take you to an annotated list of sites with helpful background information.
The Human Genome Project was launched in 1990 with the claim that it would be ‘the source book for biomedical science in the 21st century’. On its completion in 2003, scientists marvelled at the volume of data the project had created. But it was only the beginning of perhaps the greatest flood of biological data in human history.
A genome is the entirety of the genetic information of an organism, usually described for a single individual. It usually comes in the form of DNA, each molecule of which consists of two long chains of nucleotides carrying the bases adenine, thymine, guanine and cytosine, usually known as A, T, G and C. The Human Genome Project discovered that the human genome has a sequence of around 3 billion nucleotides.
Determining the sequence of the human genome is one thing; understanding what it means is another. If the genome is a source book, in 2003 no one understood the language in which it was written.
And plenty more such ‘books’ have been read since. Extraordinary technological advances in the last decade mean that what took the Human Genome Project 13 years to do can now be done in a few days and at a fraction of the cost. Thousands of genomes – human as well as from other species – have already been sequenced, and many more can be expected in coming years.
Moreover, the proteins that these genomes code for – which, for some organisms, may number in the hundreds of thousands – are also being increasingly described. Impressive though it was in 2003, the volume of biological data available today is staggering.
Obtaining the data has become relatively easy, but the bigger challenge lies ahead. If it is to be of any value, the huge and growing mass of data needs to be stored, organised, analysed and understood, especially by comparing it with data obtained in other ways. This is the task of a new scientific discipline: bioinformatics.
Defining bioinformatics
An animation that explains how DNA is sequenced. (Nova Online, USA)
In its earliest days, bioinformatics was mostly concerned with understanding molecular evolution and determining the genetic similarities between species. Since then it has evolved rapidly into a sophisticated approach to research. Broadly, it is the application of information technology, mathematics and statistics to biological problems. More narrowly, it has been defined as the use of computers to store, retrieve, analyse and predict the composition, structure or function of biomolecules – particularly DNA, RNA and proteins. The field of bioinformatics draws together a range of disciplines and professions, with mathematicians and information scientists working in close collaboration with biologists and biomedical scientists. These interdisciplinary collaborations allow for the context-free analysis of data that is central to the success of bioinformatics.
Today, bioinformatics is proving useful in agriculture, medicine and evolutionary biology. But because it is so new, scientists are still grappling with the implications and possibilities of this powerful branch of science. Some of the ways they are using bioinformatics include:
- mapping DNA and protein sequences;
- predicting the structure of proteins;
- understanding the role of proteins and genes;
- finding evolutionary relationships.
Mapping DNA and proteins
A graphic showing different ways of controlling protein production from the genome. (New Scientist)
To be sequenced, a genome first needs to be broken into convenient pieces, which are then sequenced individually. One task of bioinformatics is to put all the sequences back together and ‘map’ the DNA, determining the order in which the bases occur and the chromosomes to which they belong.
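The idea of putting sequenced pieces back together can be illustrated with a toy example. The Python sketch below is only a simplified illustration, not the method used by any real sequencing project: it assumes error-free fragments that overlap at their ends, and the function names `overlap` and `greedy_assemble` are invented for this example. Real assembly software has to cope with sequencing errors, repeated regions and far larger volumes of data.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest end overlap.

    Returns the pieces that remain once no overlaps are left
    (a single sequence if everything could be joined).
    """
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; leave the pieces unjoined
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up fragments that overlap end to end.
pieces = ["TACCTTAG", "ATGGCGT", "GCGTACC"]
print(greedy_assemble(pieces))
```

The greedy strategy is easy to follow but can make wrong joins when a genome contains repeated stretches, which is one reason real assemblers use more elaborate approaches.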
The term ‘map’ is used in a different way for the set of proteins (known as the proteome) that are coded for by a genome. The task of mapping the proteome is daunting, even using bioinformatics techniques. Not only is the number of potential proteins large – the same stretch of DNA can code for several different proteins – but they can vary in response to factors such as age, health and diet. Protein molecules may also fold into different, often bizarre shapes depending on the environment in which they are produced. These shapes are difficult to predict but can have a huge effect on what the proteins do in the body.
Comparing sequences
Studies of the human genome have indicated the importance of minor variations in genes between individuals. Even differences in a single base or nucleotide – called a ‘single nucleotide polymorphism’ – may increase the susceptibility of a person to a particular disease.
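As a rough illustration of what finding single-base differences involves, the Python sketch below compares two short sequences position by position. It assumes the sequences have already been aligned and are of equal length, which is the hard part in practice; the function name `find_snps` and the example sequences are invented for this illustration.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List the positions (counted from 1) where two aligned sequences differ.

    Each entry is (position, base in reference, base in sample).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Made-up sequences that differ at a single position.
reference = "ATGGCGTACC"
sample = "ATGGCATACC"
print(find_snps(reference, sample))
```

A difference found this way is only a candidate: whether it counts as a polymorphism, and whether it matters for health, depends on how common it is in a population and on further study.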
One of the opportunities offered by the growing pool of genetic maps is the ability to compare the maps of individuals and species and to correlate differences in them with, for example, health disorders. This could lead to the improved use of different drugs, or changes in diet that are tailored to suit specific genetic profiles.
However, for many diseases such comparisons are unlikely to be straightforward. Heart disease, cancer and many other illnesses usually arise as a result of complex interactions between a number of genes within an individual and that individual’s environment. Such complexities can be unpicked, at least partially, by the use of bioinformatics. For example, comparing and contrasting the genomes of large numbers of individuals, a process known as data mining, may reveal genomic patterns that correlate with other characteristics. When these patterns are viewed by scientists with differing expertise in bioinformatics and biomedical science, they can lead to major advances in knowledge and ultimately ideas for new treatments.
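A very simple form of this kind of pattern hunting can be sketched in Python. The example below assumes two small groups of pre-aligned sequences, one from people with a condition and one from people without it, and reports which base at which position differs most in frequency between the groups. The data and the function name `allele_frequency_gaps` are invented for illustration. Real studies involve many thousands of individuals and use statistical tests that account for chance and for the very large number of positions compared.

```python
from collections import Counter


def allele_frequency_gaps(
    cases: list[str], controls: list[str]
) -> list[tuple[int, str, float]]:
    """Compare how often each base occurs at each position in two groups.

    Returns (position counted from 1, base, frequency in cases minus
    frequency in controls), with the largest absolute gaps first.
    """
    results = []
    for pos in range(len(cases[0])):
        case_counts = Counter(seq[pos] for seq in cases)
        control_counts = Counter(seq[pos] for seq in controls)
        for base in sorted(set(case_counts) | set(control_counts)):
            gap = case_counts[base] / len(cases) - control_counts[base] / len(controls)
            results.append((pos + 1, base, gap))
    results.sort(key=lambda item: abs(item[2]), reverse=True)
    return results


# Made-up data: the groups differ mainly at position 4.
cases = ["ATGA", "ATGA", "ATGC", "ATGA"]
controls = ["ATGC", "ATGC", "ATGC", "ATGA"]
for position, base, gap in allele_frequency_gaps(cases, controls)[:2]:
    print(position, base, gap)
```

A large gap shows only that a base is more common in one group than the other. It does not show that the base causes the condition.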
The sequencing of the genomes of plants and animals also benefits agriculture. The rapid identification of genes and the capacity to compare and contrast genomes can greatly speed up the process of genetic improvement in crops and livestock.
With the help of bioinformatics, drugs can be designed to target specific proteins that cause illness.
(Credit: Cutting Edge: Interactive concepts in Biochemistry, R.Boyer (ed), 2002. Reprinted with permission of John Wiley & Sons, Inc.)
Predicting proteins
Bioinformatics offers more than data mining: approaches that combine number-crunching brute force and an understanding of biochemistry have a range of applications.
Scientists at the Australian National University, for example, used bioinformatics tools (Box 1. Bioinformatics tools) to scan the genomes of a range of species for genes similar to the human prion protein (PRNP) gene. This gene codes for the production of prion proteins, which have important cellular functions but in certain circumstances can become dangerous, causing diseases such as Creutzfeldt-Jakob disease or “mad cow” disease. Scientists have also discovered another gene, SPRN, that encodes the Shadoo protein. It is similar to PRNP and occurs in a range of species. When this protein disappears, it’s a sign that prions are replicating. Investigations are now under way to determine the exact role of SPRN in the hope of shedding light on prion diseases.
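Scanning genomes for similar genes relies on some measure of how alike two sequences are. The Python sketch below shows one crude measure: the fraction of short ‘words’ of three bases that two sequences share. It is not the method used in the study described above, and the function names, species labels and sequences are all invented for illustration. Established search tools align sequences base by base and estimate how likely a match is to have arisen by chance.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All distinct 'words' of length k that occur in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared words divided by all words found in either sequence (0 to 1)."""
    words_a, words_b = kmers(a, k), kmers(b, k)
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def rank_candidates(
    query: str, candidates: dict[str, str], k: int = 3
) -> list[tuple[str, float]]:
    """Score every candidate against the query, most similar first."""
    scores = [(name, jaccard_similarity(query, seq, k)) for name, seq in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical query and candidate sequences, not real gene data.
query = "ATGGCGTACC"
candidates = {
    "species_a": "ATGGCGTACG",
    "species_b": "TTTTTTTTTT",
}
for name, score in rank_candidates(query, candidates):
    print(name, round(score, 2))
```

Word-sharing scores like this are fast to compute, which is why similar ideas are often used as a first filter before slower, more careful comparison.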
ARC Centre of Excellence in Bioinformatics: provides information on the Centre’s research activities. (Australia)
Integrating information
Using bioinformatics approaches, scientists at the ARC Centre of Excellence in Bioinformatics are developing what they say will be the world's first navigable ‘Visible Cell® atlas’, a high-resolution map of the three-dimensional structure of a cell. It will allow us to visualise the three-dimensional space of a cell, and it will show molecular processes – such as the actions of proteins and the functions of organelles – in four dimensions (i.e. over time as well as in space). The scientists involved in the project expect that the Visible Cell® will become a sophisticated tool that will integrate proteomic, genomic, molecular, cell and developmental biology data from different sources.
The same group are also using bioinformatics to investigate the information networks that operate between the genome and the cells and organism it creates. A better understanding of the pathways and regulation of these networks in complex diseases such as cancer could lead to new treatments.
Evolutionary biology
The genome of the Tammar Wallaby (Macropus eugenii) is providing valuable genetic information.
(Image: Assoc. Prof. G Shaw, ARC Centre of Excellence in Kangaroo Genomics, Zoology Department, University of Melbourne)
The analysis of genomes using bioinformatics is generating new information on the way in which species have evolved.
In one investigation, scientists compared the modern human genome with that of the Neanderthal, a human-like species that became extinct about 30,000 years ago. The comparison showed that, to some extent, the two species had once interbred – contradicting earlier studies that had suggested they had not. Scientists have also determined that at least some of the human genome that is not shared with either Neanderthals or apes is associated with cognitive ability – meaning, in effect, that one of the main things separating us from both those groups is our ‘braininess’.
Studies of the platypus and kangaroo genomes have also yielded intriguing results. The platypus diverged from other mammals about 166 million years ago, and the kangaroo diverged about 148 million years ago. Comparisons between the modern-day genomes of these species and the genomes of other modern mammals, such as humans, are providing new genetic insights that can be applied across a range of species. Kangaroo genomics, for example, helped in the identification of the gene SRY, which determines the sex of mammals. It has also led to developments in theories of how globins, a group of proteins found in the blood, form.
The platypus is an unusual creature – it is a warm-blooded, fur-clad, egg-laying mammal with webbed feet and a duck-like bill. It is also one of the few mammals that are venomous: spurs on the hind legs of males are able to inject a nasty venom into predators. This venom contains the same proteins found in snake venom, yet genomic analysis shows that the two venoms evolved independently, an example of convergent evolution.
Risk and reward
Despite the extraordinary potential of bioinformatics, it also comes with issues. Privacy advocates worry about the sharing of personal medical and genetic information. Others are concerned that genomic comparisons between races and ethnic groups will inflame racial tensions. The rapid growth of biochemical data and technology has also meant that regular updates have been needed for the software and algorithms being used.
Yet the power of bioinformatics is undeniable and its techniques are becoming increasingly fast, cheap and revealing. A range of tools has been developed to make the research effort more integrated and potent, and accessible globally.
The powerful tools of bioinformatics are enabling scientists to rise above the data flood and will undoubtedly lead to major advances in medicine, agriculture and our understanding of living things.
Box 1. Bioinformatics tools
Related Academy Links
Nova:
Biology meets industry – genomics, proteomics, phenomics
The Human Genome Project – discovering the human blueprint
Epigenetics - beyond genes
External sites are not endorsed by the Australian Academy of Science.
Posted February 2010, edited August 2012.







