It's All in the Genes

John Wasmuth had a feeling that a long-sought gene for dwarfism was in his lab freezer: other biologists had narrowed the search for the defective gene, and the University of California, Irvine, geneticist suspected the culprit was among the bits of DNA on ice. But which bit? To find out, he could have painstakingly put each gene into lab animals to see if any of the unknown genes made them short. But these are the '90s, when mice of the bewhiskered kind are giving way in biology labs to mice of the electronic kind. So instead, Wasmuth had his computer jocks send a description of the mystery DNA over the Internet to GenBank, a database at the National Center for Biotechnology Information (part of the National Institutes of Health). In seconds the computer found a match: one of the mystery genes resembled a database gene that lets cells snatch up a growth hormone from the blood. The matched DNA, Wasmuth and colleagues reported last month, is the gene for the most common form of dwarfism. The find should one day allow prenatal screening for a fatal form of the condition. More sweeping, says geneticist Victor McKusick of Johns Hopkins University, the discovery shows that this use of computers "is the future of biology."

That's because computers allow researchers "to make discoveries that were previously not possible," says Scott Humphries of MasPar Computer Corp. MasPar produces software for what is now known as computational biology: one program reads the chemical "letters" that spell out a mystery gene and compares them to sequences that spell out known genes, stored in a database. (Genes are made up of units designated A, T, G or C.) Finding a match is not simply a matter of holding up one string of A-T-G-C's to others, explains Keith Robison, a graduate student in molecular biology at Harvard University. A mere human could do that, if given a few months. Instead, the software performs the biochemical equivalent of preparing a recipe, tasting it and deciding whether it resembles the dish a known recipe makes. In technical terms, it decides whether the A's, T's, G's and C's in the new gene spell the same protein as those in a known gene.
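For readers who want to see the recipe-tasting idea in miniature, here is a rough sketch in Python. It is not the software MasPar or GenBank runs. It simply translates two DNA strings into protein using the standard genetic code and then counts matching positions. The two gene strings, and the function names translate and percent_identity, are invented for illustration. Real search programs also handle gaps, alternate reading frames and near-miss amino acids, which this toy ignores.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read DNA three letters at a time and spell out the protein."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":  # stop codon ends the protein
            break
        protein.append(amino)
    return "".join(protein)


def percent_identity(a, b):
    """Percentage of positions that match, with no gaps allowed."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / n


# Two made-up gene fragments (hypothetical, for illustration only).
mystery_gene = "ATGGCTTTAAGC"
known_gene = "ATGGCCCTGTCA"

dna_score = percent_identity(mystery_gene, known_gene)
protein_score = percent_identity(translate(mystery_gene), translate(known_gene))
print(f"DNA letters matching:     {dna_score:.0f}%")
print(f"Protein letters matching: {protein_score:.0f}%")
```

In this made-up case only half the DNA letters line up, yet both fragments spell the same four-unit protein, which is why comparing the dish rather than the recipe can turn up relatives that a letter-by-letter check would miss.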

"By looking for related genes whose functions are known," says biochemist William Pearson of the University of Virginia, you can "identify candidates for the cause of the [human] disease." Some successes:

The first eureka moment for computational biology came in 1983, when researchers found that a cancer gene is 87 percent similar to a gene that makes a substance that stimulates cells to grow. "It became clear how the [cancer] gene caused cells to begin dividing," says Pearson: by pushing them into hypergrowth.

In 1989 geneticists discovered that the gene for cystic fibrosis matches a bacterial gene. Since this primitive gene carries substances into and out of bacteria, biologists realized that CF must arise when such substances cannot cross into lung cells. And that has suggested new therapies for the fatal disease.

In 1992, researchers led by Greg Lennon of Lawrence Livermore National Laboratory found that a new gene matched a whole slew of database genes: all make enzymes that cripple nerves controlling muscles. With the match, the scientists had discovered the gene that causes myotonic dystrophy, a muscle-wasting disease that afflicts more than 35,000 Americans.

Not all of this work is serious. Last summer NCBI's Mark Boguski sought a match for the DNA sequence printed in "Jurassic Park." It wasn't T. rex; the computer told him it was bacterial DNA.

"It would be hard to find a recent DNA-based discovery that didn't use these tools," says Lennon. That's bad news for scientists who aren't computer-savvy. In one case, Oregon researchers missed discovering a colon-cancer gene because they used the wrong software: it failed to look in every nook and cranny of the database. This spring, researchers at Hopkins found the gene but made their own costly cy-bergoof. They had sought the match not in a free database but in a commercial one. The owner, Human Genome Sciences Inc., of Maryland, requires scientists to give it exclusive rights to any product from the discovery. Any profits from, say, kits for detecting the cancer gene will go to the private firm.

Databases now contain upwards of 200,000 DNA sequences; GenBank doubles in size every 21 months. Boguski, who helps run it, even foresees the day when software agents dubbed "knowbots" will automatically search out genetic matches. "It will be possible for these agents to make discoveries and just sort of let us know," he says. Or, they can team up with a word processor, report the discovery themselves and leave humans out of the loop entirely.