Bioinformatics: making sense of the information flood
Box 1 | Bioinformatics tools
For a discipline founded on masses of data (in some senses, the more data there are, the more powerful the science), it is important that the data are properly stored, labelled, ‘cleaned’ and analysed, and that they are accessible. In parallel with the collection of data through innovations in DNA sequencing and protein analysis, storage databases and analysis tools have been developed. A few of the many thousands of bioinformatics databases and tools are described below.
Storing the sequences
When it comes to storing and organising genomic information, one of the biggest collections of nucleotide sequences is maintained as a collaboration between three databases: the European Molecular Biology Laboratory (EMBL) database in Europe, GenBank in the United States, and the DNA Data Bank of Japan (DDBJ). Each collects sequence data reported worldwide by researchers, genome-sequencing projects and patent applications; these data are shared under the International Nucleotide Sequence Database Collaboration (INSDC) and made available to scientists worldwide.
Browsing the databases
Genome browsers make it easier to search, view and extract information from databases.
Ensembl, a joint scientific project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute, is a publicly available browser for eukaryotic genome data.
The Entrez system allows users to search a range of databases and retrieve records from them, including genetic sequences, protein sequences, structures, literature citations and chromosome maps.
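A search of this kind can also be made from a program rather than a web page. The sketch below assumes NCBI's E-utilities `esearch` endpoint as the programmatic route into Entrez; the helper name `build_esearch_url` and the example query are illustrative only and are not taken from this article. It only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed endpoint: the E-utilities "esearch" service, which returns the
# identifiers of records matching a query in one Entrez database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, query, max_results=20):
    """Return a search URL for one Entrez database (hypothetical helper)."""
    params = {
        "db": database,          # e.g. "nucleotide", "protein", "pubmed"
        "term": query,           # free-text or fielded search expression
        "retmax": max_results,   # upper limit on identifiers returned
        "retmode": "json",       # ask for a JSON reply
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Example query: nucleotide records annotated with a given gene name.
    print(build_esearch_url("nucleotide", "BRCA1[Gene Name]", max_results=5))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would return a list of record identifiers, which a second request could then use to retrieve the sequences themselves.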
Map Viewer allows users to view all of an organism’s chromosomes or to home in on detailed maps of individual chromosomes and their genes.
Analysing the sequences
Because database information is readily available, scientists worldwide can use other analytical tools to explore the data. They may use the data, for example, to investigate common gene or protein functions across different species, to predict the structure of proteins for drug development or to compare sequences for evolutionary relationships.
The Basic Local Alignment Search Tool – BLAST – is one of the most widely used analytical tools in bioinformatics. It comprises a family of algorithms for comparing sequences of nucleic acids or proteins to those stored in databases, with the aim of identifying other, similar sequences (not necessarily in different organisms). The uses to which BLAST can be put are expanding: some BLAST-related algorithms, for example, can translate DNA nucleotide sequences into protein sequences and compare those against databases of proteins. These known database proteins may help to explain the role of the original protein – and the DNA sequence – in the organism being investigated.
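The translate-then-compare idea can be illustrated with a toy example. The sketch below is not BLAST and does not reproduce its scoring or statistics. It translates a DNA sequence in all six reading frames using the standard genetic code, then ranks a small, made-up protein 'database' by the number of three-letter words each entry shares with a translation. The sequences and protein names are invented for illustration.

```python
from itertools import product

# Standard genetic code, with codons listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; '*' is a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three frames on each strand, keyed '+1'..'+3', '-1'..'-3'."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def shared_words(a, b, k=3):
    """Count the distinct k-letter words that occur in both sequences."""
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(words_a & words_b)


def best_hit(dna, proteins, k=3):
    """Return (name, frame, score) for the best-scoring database entry."""
    best = (None, None, 0)
    for frame, peptide in six_frame_translations(dna).items():
        for name, protein in proteins.items():
            score = shared_words(peptide, protein, k)
            if score > best[2]:
                best = (name, frame, score)
    return best


if __name__ == "__main__":
    # Invented query and database, for illustration only.
    query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    toy_database = {"toy_protein_A": "MAIVMGR", "toy_protein_B": "MKTLLWQ"}
    print(best_hit(query, toy_database))
```

Real search tools replace the simple word count with scored alignments and an estimate of how likely a match is to occur by chance, but the overall flow of translating, comparing and ranking is the same.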
The public availability of bioinformatics tools and databases means that genomics and proteomics are no longer confined to the laboratory bench. These tools have themselves become a platform for research, with much of the research into genomes and proteomes now being done without so much as picking up a pipette.
Related sites
Genomic data resources: Challenges and promises (Nature Education)
Bioinformatics tools (European Bioinformatics Institute)
International Nucleotide Sequence Database Collaboration (INSDC)
Ensembl (UK)
Entrez (National Center for Biotechnology Information)
BLAST (National Center for Biotechnology Information)
Map Viewer (National Center for Biotechnology Information)
External sites are not endorsed by the Australian Academy of Science.
Posted February 2010.






