Genomics: Junking the junk DNA
Buyers of new PCs are often dismayed to find that their new toy comes pre-loaded with "bloatware", trial programs that at best eat up precious space and at worst slow down the system. If you think that's bad, imagine if your dream machine turned out to be chock-a-block with rubbish, including millions of computer viruses.
This is the shocking picture of our genome that emerged in the 1970s. Biologists had expected our DNA to be pared down to the bare essentials. Instead, they found that the vast majority of our DNA seemed to be junk - rubbish with no apparent function that had piled up over millions of years of evolution.
Now this view is changing. One study after another has hinted at possible functions for "junk DNA". Recent headlines in the popular press point to a revolution in the field: "Time to stop trashing junk DNA", "DNA junkyard yielding gold" and "Junk DNA isn't junk". Even creationists have leapt on the bandwagon, claiming they predicted from the start that all DNA has a purpose. So is the term "junk DNA" really a misnomer?
The discovery of the structure of DNA led to the idea that genomes are merely a series of DNA sequences, or genes, that code for proteins. Yet a paradox soon emerged: some relatively simple creatures turned out to have much larger genomes than more complex ones (see "Survival of the fattest"). Why would they need more genes?
They don't. It rapidly became clear that in animals and plants, most DNA does not code for proteins. We now know that more than 98 per cent of our DNA is of the non-coding variety. Even back in the 1970s, though, it was obvious that not all non-coding DNA is junk. For instance, there are specific sequences to which certain proteins bind, and the presence of these proteins can boost or block the expression of genes nearby. Although it does not code for any protein, this "regulatory DNA" plays an important role.
Over the years, tiny bits of non-coding DNA have turned out to have a regulatory role or some other function. But until recently such sequences accounted for only a minuscule fraction of non-coding DNA. Only in the past decade, as the genomes of more and more species have been sequenced and compared, has the bigger picture begun to emerge.
Even though it is 450 million years since the ancestors of pufferfish and humans parted ways, everyone expected that we would still share many of the same genes - as proved to be the case. Most of the protein-coding DNA in different vertebrates is very similar or "conserved". The surprise was that even more of the non-coding DNA is conserved, too.
DNA is constantly mutating due to copying mistakes and damage from chemicals and radiation. Specific sequences will be conserved only if natural selection weeds out any offspring with changes in these sequences. This will happen only if the changes are harmful, so researchers are convinced that all the conserved non-coding DNA must do something important. Why else would evolution hang on to it? "Those regions really challenge our understanding of biology," says Gill Bejerano of the University of California, Santa Cruz, who helped discover them.
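To make "conserved" concrete, here is a minimal sketch of how the similarity of two stretches of DNA might be scored, base for base. It assumes the two sequences have already been lined up and are the same length. The function name and the example sequences are invented for illustration and do not come from any of the studies described here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two pre-aligned sequences match.

    Assumes equal-length, already aligned sequences (a simplification:
    real genome comparisons must first align the sequences and handle gaps).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 12-base stretches, made up for this example.
species_1 = "ACGTTGCAACGT"
species_2 = "ACGTTGCAACGA"  # differs from species_1 at the last base only
identity = percent_identity(species_1, species_2)
```

A score that stays close to 100 across species separated by hundreds of millions of years is the signature researchers read as evidence that selection has been removing changes.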
One of the biologists trying to find out what conserved non-coding DNA does, Edward Rubin at the Lawrence Berkeley National Laboratory in California, recently added extra copies of some of these sequences to mice. "It's like taking a few extra pages and stapling them into a book," he says.
Ultra-conserved
His team added copies of the "ultra-conserved" sequences that are almost exactly the same, base for base, in the mouse, rat and human. Nearly half of the sequences the team tested boosted gene expression in specific tissues, especially genes involved in nervous system development, the team reported last year (Nature, vol 444, p 499).
This suggests that much of the conserved non-coding DNA is needed to make a brain cell, say, different from a skin cell. However, conserved DNA still accounts for only a tiny proportion of the genome. Even counting the 1.2 per cent of coding DNA, the human sequences found in other mammals add up to just 5 per cent. What's the other 95 per cent for?
One possibility is that some of the DNA whose sequence is not conserved might be conserved in a different sense. Regulatory sequences are essentially binding sites for proteins, so what matters is their three-dimensional structure. And while the conventional view is that the 3D structure of DNA is closely related to its sequence, Stephen Parker and colleagues at Boston University have found evidence that some regulatory regions share similar structures even though their sequences are different (Genome Research, vol 17, p 940). Looked at this way, the total amount of conserved DNA could be much higher, says Parker.
Another line of evidence suggesting that some non-conserved DNA has a function comes from looking at which DNA sequences get transcribed into RNA. It used to be thought that, with a few exceptions, most RNAs were produced as the first step in making proteins.
Protein-coding genes contain vast stretches of non-coding DNA called introns, which make up a quarter of our genome. These introns are transcribed into RNA but immediately edited out of the "raw" RNA. The resulting "processed" RNAs represent just 2 per cent of the genome.
A few years ago, however, Thomas Gingeras of biotech company Affymetrix in Santa Clara, California, showed that far more than 2 per cent of the genome gets transcribed into RNA. The latest estimates are that 85 to 97 per cent of the entire genome is transcribed into raw RNA, resulting in processed RNAs representing 18 per cent of the genome.
Clearly, most of this RNA is non-coding, or ncRNA. So what is it for? While some of the very small ncRNAs have a big role in the control of gene expression (New Scientist, 27 November 2004, p 36), most ncRNA remains mysterious.
Perhaps the most intriguing are the so-called long ncRNAs, which can be tens of thousands of base pairs long. Only a handful have known functions. One acts as an enhancer for the genes that code for heat shock proteins, which protect cells from environmental stress. Another controls the expression of genes involved in brain, craniofacial and limb development in mice.
Some researchers think that most other ncRNAs are just "noise", generated when nearby genes are transcribed. But Gingeras says that DNA is transcribed into RNA in regions of the genome where no genes exist, making accidental transcription unlikely.
An unpublished study by John Mattick's team at the University of Queensland in St Lucia, Australia, shows many long ncRNAs are transcribed in the brains of mice. However, they are transcribed differently from the genes they are closest to, again suggesting they are not mere accidental byproducts.
Gerton Lunter's team at the University of Oxford, meanwhile, has found that parts of many long ncRNAs are conserved in mammals, even though some come from regions of DNA that were not known to be conserved (Genome Research, vol 17, p 556). "Our current evidence suggests that at least half of the ncRNAs we are studying are functional," Lunter says.
Others are less convinced. Ewan Birney of the European Bioinformatics Institute in Cambridge, UK, has bet Mattick that of the processed RNAs yet to be assigned a function - representing 14 per cent of the entire genome - less than 20 per cent will turn out to be useful. "I'll get a case of vintage champagne if I win," Birney says.
Mostly useless
Whatever the answer turns out to be, no one is saying that most of our genome is vital after all. "You could chuck three-quarters of it," Birney speculates. "If you put a gun to my head, I'd say 10 per cent has a function, maybe," says Lunter. "It's very unlikely to be higher than 50 per cent."
Most researchers agree that 50 per cent is the top limit because half of our genome consists of endless copies of parasitic DNA or "transposons", which do nothing except copy and paste themselves all over the genome until they are inactivated by random mutations. A handful are still active in our genome and can cause diseases such as breast cancer if they land in or near vital genes.
Over the epochs, however, some transposons have landed in the right place at the right time. Bejerano's team recently discovered that at least 5 per cent of the conserved, non-coding DNA in mammals started out as transposons. He thinks a few of the non-conserved transposons found only in primates, called Alus, may also have taken on useful roles as gene regulators. Bejerano is now measuring the gene-enhancing ability of these transposons, much as Rubin did with the ultra-conserved elements.
However, just because some transposons have taken on a useful role, that does not mean all the rest do anything useful. Indeed, transposons show that just because a piece of DNA can have a function, it is not necessarily essential or even beneficial. Much "functional" DNA could still be junk in the sense of being disposable.
Birney has been working on a project called ENCODE that is looking in great detail at 44 big chunks of our genome - 1 per cent of it in total. Of these chunks, 4.9 per cent is conserved among mammals, in line with the estimate for the entire genome, and the researchers have identified possible functions for about half the conserved, non-coding DNA.
The real surprise is that ENCODE has identified many non-coding sequences in humans that seem to have a function, yet are not conserved in rats and mice. There seem to be just as many of these non-conserved functional sequences as there are conserved ones. One explanation is that these are the crucial sequences that make humans different from mice. However, Birney thinks this is likely to be true of only a tiny proportion of these non-conserved yet functional sequences. Instead, he thinks most are neutral. "They have appeared by chance and neither hinder nor help the organism."
Put another way, just because a certain piece of DNA can do something doesn't mean we really need it to do whatever it does. Such DNA may be very like computer bloatware: functional in one sense yet useless as far as users are concerned. If this is right, the ultimate test of whether DNA is junk or not is to see whether anything nasty happens if you delete it. And this is just what Rubin and his colleagues have been doing.
A few years ago, his team deleted two huge chunks of non-coding DNA, each around a million base pairs long, from some mice. Even though these chunks included more than 1000 conserved sequences, to the researchers' surprise there was no apparent effect on any of the mice lacking the two chunks.
Now the team has deleted a few of the ultra-conserved regions - the very highly conserved non-coding sequences everyone agrees must have an important function. What's more, of the 481 ultra-conserved sequences, they selected four that were near vital genes. If the four are crucial for the function of those genes, then deleting them should kill mice or at the very least cause serious defects.
One year on, the animals are healthy, and the researchers are stunned and bewildered. There are no apparent differences in development, lifespan, fertility, body weight or blood chemistry between normal mice and those missing the four ultra-conserved regions. "Everything was saying: if you mess with these, there will be problems," says Rubin. "But we just didn't see that." The unpublished results were presented at a meeting at Cold Spring Harbor Laboratory, New York state, earlier this year.
"It's a mystery," says Sol Katzman, a biomolecular engineer at the University of California, Santa Cruz. His team's work, presented at the same meeting, makes the finding even more puzzling. They have found there is even stronger evolutionary selection for the ultra-conserved elements in humans than for protein-coding genes.
While the deletion study shows that at least four of these sequences are not essential for the survival of mice, this doesn't automatically mean they are junk, Rubin says. Their effects may be too subtle to pick up in a lab setting. "I think they do something, but we're just not able to see it," he says. "Maybe if I were a mouse, I'd be able to recognise it."
Birney agrees. Even a sequence that increases an animal's fitness by as little as 1 per cent will be strongly selected for and thus highly conserved, he points out.
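Birney's point can be illustrated with a textbook-style calculation. The sketch below assumes the simplest possible model: a very large population, one copy of the sequence per individual, and a constant fitness advantage s for carriers of one variant. The function names, the starting frequency and the 99 per cent threshold are all illustrative choices, and the model ignores chance effects, which matter in real, finite populations.

```python
def next_frequency(p, s):
    """Frequency of the favoured variant after one generation.

    Simple haploid selection model (an assumption): carriers leave
    (1 + s) offspring for every 1 left by non-carriers.
    """
    return p * (1 + s) / (1 + p * s)


def generations_to_reach(p0, s, target, max_generations=100_000):
    """Count generations until the variant's frequency reaches `target`.

    Returns None if the target is not reached within `max_generations`.
    """
    p = p0
    for generation in range(max_generations):
        if p >= target:
            return generation
        p = next_frequency(p, s)
    return None


# A variant with a 1 per cent advantage, starting in 1 per cent of individuals.
gens = generations_to_reach(p0=0.01, s=0.01, target=0.99)

# With no advantage at all, selection alone never changes the frequency.
neutral = generations_to_reach(p0=0.01, s=0.0, target=0.99, max_generations=1000)
```

Under these assumptions the favoured variant goes from rare to nearly universal in on the order of a thousand generations, a blink in evolutionary time, while the neutral variant goes nowhere. The same logic explains why an effect too small to spot in a laboratory mouse could still keep a sequence conserved.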
Others think redundancy could be built into the system, meaning other sequences could fulfil the same function as the deleted ones. Genes often have a built-in redundancy, so why not non-coding DNA? "It could be that these things are so important evolution doesn't take any chances on them," says Kelly Frazer of Scripps Genomic Medicine in La Jolla, California.
Indeed, Rubin's team has identified other ultra-conserved regions near the four they deleted that may have the same function. But why are such sequences so strongly conserved if there is so much redundancy? "The field is really conflicted here," says Frazer.
As researchers chip away more bits of the mouse genome, it should become clear just what all the conserved DNA does and how important it is. What is certain is that rumours of the demise of junk DNA have been exaggerated. Most of our genome still looks highly disposable.
From issue 2612 of New Scientist magazine, 11 July 2007, pages 42-45
For the latest from New Scientist visit www.newscientist.com