Solving the Next Genome Puzzle

Even in the high-minded field of genetics, the scientists who labor to decipher genomes are, after all, only human. So, thrilled as they are to read the complete sequence of DNA in an organism--its "book of life"--a cruder puzzle also piques their curiosity. As teams of researchers sequenced more than a dozen genomes in the last few years, knocking off the yeast and rice and fruit fly and, last summer, the human, the e-mails flew fast and furious: How big is yours? How many genes does your genome have? For the human one, the answer is, well, not very many. After years of guesstimating that humans have something on the order of 100,000 genes, it turns out we have a humbling 34,000 or so. A roundworm has 19,099; a fruit fly has 13,601; even the mustard plant, for Pete's sake, has about 25,000. "The great abiding mystery of the human genome is how we manage to be so complex with so few genes," says Eric Lander of the Whitehead Institute, a leader of the genome project.

The two long-awaited analyses of the human genome--one by the publicly funded consortium and one by its bitter rival, biotech firm Celera--are finally being released this week, with bells-and-whistles press conferences in five cities on three continents. But industry sources and other scientists got an advance look. The human genome, they say, holds a wealth of information and several surprises. One of the oddest is that, over the eons, hundreds of genes insinuated their way into the human genome from bacteria, probably after a bacterium infected one of our distant vertebrate ancestors and slipped its DNA into its host's. These alien genes are now part of us, some performing important functions (one is an enzyme that processes brain chemicals) and some not. Analysis of the human genome also shows that the mutation rate in sperm is more than twice what it is in eggs, as David Page of the Whitehead estimated even before the genome was completely sequenced. Since mutation is the raw material of evolution, it seems that one half of humankind has been doing the heavy lifting of pulling us up from the primordial ooze. And out of the 3 billion chemical letters in the genome--those now famous A's, T's, C's and G's--there are so few variations that people the world over, from a sumo wrestler to Britney Spears, are 99.95 percent identical. But perhaps the biggest surprise is that little matter of size, and Homo sapiens' lack thereof. Which leads us to why, even with this wealth of new scientific knowledge, many biologists see the sequencing of the human genome as the final triumph of yesterday's biology. You can almost hear them say, sighing, "Genomes? So 20th century."

The new game in town is the proteome. Just as "genome" means all the DNA in an organism, so "proteome" means all the proteins. "Proteomics" is the study of that collection, and if you thought the genome was complicated (it did, after all, take on the order of $1 billion to sequence it), wait till you meet the proteome. "Compared to the human genome, proteomics involves 1,000 times more data," says Caroline Kovac of IBM Life Sciences. For although the DNA in a liver cell is identical to the DNA in a skin cell or a brain neuron or any other cell, the proteins are not. To make things really interesting, the kinds and amounts of a cell's proteins--molecules like hemoglobin and insulin, the receptors for brain chemicals such as dopamine and serotonin and for hormones like testosterone and estrogens, as well as the countless enzymes that keep cells running--vary not just by which type of cell you're looking at. Which proteins a cell contains also depends on things such as whether it is healthy or diseased, how old it is, its stress levels and maybe even the time of day. All told, there are probably 500,000 to 1 million human proteins. Despite the challenge, researchers are tackling the proteome for good reason. When it comes to diagnosis, prognosis and treatment of disease, the dirty little secret of genomics is that "the genome is just the beginning," says Brian Chait of Rockefeller University. "What you really want to know is, in a person's 100 billion cells, what proteins are made in each?"

And for that, the genome is not enough. The genome is the set of instructions for making proteins. But knowing the instructions doesn't get you far. That's because the 34,000 or so genes in every human cell are little more than order forms. Some orders never make it to the cellular factories that produce our proteins. Some orders make proteins that fall apart soon after leaving the factory, like an automotive lemon. Some orders are so popular that the factories make millions of them. You can't tell any of that from the order forms--the genome--alone. Three genes might dispatch order forms for proteins A, B and C, but the factory seems to regard the orders as little more than polite suggestions. It will make proteins A, B and C, sure. But it will also get fancy, making AB, AC, BC, AAB, ABC and so on. This ability to mix and match, and even to accessorize (cells will also dangle little molecules of sugar or phosphate on proteins, changing their function), sets the human genome apart from all others. "From a single gene," says John Richards of the California Institute of Technology, "you can get 10 or more different proteins. A genome analysis alone won't tell you which ones."

The reason you want to identify the proteins in the first place is that rogue genes don't cause disease. Rogue proteins do. "If you really want to understand what's going on in a disease, you have to look at the proteins," says William Rich, CEO of the biotech firm Ciphergen. Alzheimer's disease shows the value of the proteome over the genome. Yes, there are half a dozen genes that increase risk of this disease. But the only unambiguous diagnosis comes from the presence, in the brain, of sticky bits of proteins called beta amyloid fragments. Ciphergen hopes its ProteinChip will detect these killer amyloids. But because there is no beta amyloid gene, you can't screen for Alzheimer's with a DNA chip. "You can never deduce what's happening in this disease with genes alone," says Rich.

Pharmaceutical maker Merck & Co. is using Ciphergen's chip to test candidate Alzheimer's drugs. If the chip shows that a drug eliminates beta amyloid plaques, the drug is a potential winner. Another biotech, Molecular Staging, sells chips that track the progression of diseases like cancer and arthritis, in which changing levels of proteins might serve as early-warning signs of a worsening condition. Millennium Predictive Medicine has identified three dozen proteins that may be markers for hard-to-diagnose ovarian cancer. A federal proteomics project is comparing proteins in normal lung, ovarian, breast and colon tissue with those in cancerous tissue. Proteins that are more abundant in cancer could be diagnosis targets, the way PSA is for prostate cancer. And if it turns out that some of those proteins let cells divide unchecked, then an antibody that bottles up and inactivates the protein might prove an effective cancer drug. Scientists at Large Scale Proteomics Corp. (LSP) and Johns Hopkins University have compiled a list of proteins that seem to mark depression, bipolar disorder and schizophrenia.

Last month LSP unveiled the first database of human proteins, a total of 15,693 proteins from 157 tissues. That's clearly just a minuscule down payment, says Leigh Anderson, president of LSP. But researchers aren't daunted. "Proteomics is standing on the shoulders of the human genome project and asking the next questions," says Trevor Hawkins, director of the Department of Energy's Joint Genome Institute. The payoff will be the stuff of the 21st century.