BERKELEY, CA -- In 90 years of study, the diminutive fruit fly Drosophila melanogaster has yielded many of the most fundamental discoveries in genetics -- beginning with proof, in 1916, that the genes are located on the chromosomes. Only during the last year has the fly's whole genome been sequenced, however, and its 13,601 individual genes enumerated.
The genome of D. melanogaster, the largest yet sequenced in full, is described in the 24 March 2000 issue of Science magazine, in a series of articles jointly authored by hundreds of scientists, technicians, and students from 20 public and private institutions in five countries.
The collaboration was led by Gerald Rubin of the University of California at Berkeley and the Howard Hughes Medical Institute (HHMI), who heads the Berkeley Drosophila Genome Project, and by J. Craig Venter of Celera Genomics in Rockville, Maryland. The Berkeley Drosophila Genome Project (BDGP) is supported by the Department of Energy, the National Human Genome Research Institute, and HHMI, with the largest of its facilities operated by the Life Sciences Division of the Department of Energy's Lawrence Berkeley National Laboratory.
In 1998, when the collaboration with Celera began, extensive but incomplete maps of the locations of specific DNA sequences on the fly chromosomes had been constructed, and about 20 percent of the fly genome had already been sequenced in detail -- mostly by the BDGP group at Berkeley Lab, where Susan Celniker co-directs the sequencing effort with Rubin.
The purpose of the collaboration was to test whether a strategy known as whole-genome shotgun sequencing could be used on organisms having many thousands of genes encoded in millions of DNA base pairs; the strategy had proven effective for small bacterial genomes.
"No one knew whether whole-genome shotgun sequencing would work for the fly genome," says Roger Hoskins, leader of the BDGP physical mapping project, "but we knew that if it did, it would be faster and more efficient than traditional methods."
D. melanogaster has roughly 215 million bases in its genome, arranged on five chromosomes; 80 percent of the genome is located on the large chromosomes labeled 2 and 3. Hoskins and his colleagues set out to produce a physical map of the gene-bearing portion of chromosomes 2 and 3 (about 45 percent of the chromosomal material is highly condensed and does not encode genes).
Although physical maps are not sequences -- a sequence identifies every pair of bases along a given stretch of DNA -- a good map pins down the location of unique short sequences that can be used to establish the correct long-range order of copies of longer DNA sequences, and thus of any genes they represent.
The 17,000 clones used by the Berkeley Lab BDGP group are actual stretches of DNA replicated in Escherichia coli bacteria and known as "bacterial artificial chromosomes" (BACs). Each BAC accurately represents a discrete stretch of the genome, and the map marks each BAC with at least one unique "sequence-tagged site" (STS) -- ideally with two or more such sites.
Using probes tailored to each sequence-tagged site, an STS can be found wherever it occurs in a random collection of clones; 1,923 of these markers, spaced roughly every 50,000 bases, were used to build the BDGP's final map. By matching these sites among overlapping clones, sets of clones of different lengths can be lined up with one another and eventually "tiled" along the entire length of each chromosome. The result is called an STS content map.
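The grouping step behind an STS content map can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the BDGP's own software: the clone names and STS labels are hypothetical, every STS is assumed to be truly unique and scored without error, and only the grouping of clones into contiguous sets ("contigs") is shown. A real map also has to order the markers within each contig and tolerate false-positive and false-negative STS hits.

```python
from __future__ import annotations

from collections import defaultdict


def build_contigs(clone_stss: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs: clones that share any STS end up in the same group."""
    parent = {clone: clone for clone in clone_stss}

    def find(clone: str) -> str:
        # Follow parent links to the group's representative, halving paths as we go.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index which clones carry each STS; all clones carrying the same STS are linked.
    clones_by_sts: dict[str, list[str]] = defaultdict(list)
    for clone, stss in clone_stss.items():
        for sts in stss:
            clones_by_sts[sts].append(clone)
    for clones in clones_by_sts.values():
        for other in clones[1:]:
            union(clones[0], other)

    groups: dict[str, set[str]] = defaultdict(set)
    for clone in clone_stss:
        groups[find(clone)].add(clone)
    # Sort for deterministic output.
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical clones and markers: A-B share sts2, B-C share sts3, D shares nothing.
clone_stss = {
    "BAC_A": {"sts1", "sts2"},
    "BAC_B": {"sts2", "sts3"},
    "BAC_C": {"sts3", "sts4"},
    "BAC_D": {"sts7", "sts8"},
}
contigs = build_contigs(clone_stss)
for contig in contigs:
    print(sorted(contig))
```

Here BAC_A and BAC_C share no marker directly, yet they land in the same contig because each overlaps BAC_B; that chaining is what lets short overlaps be "tiled" into long stretches of chromosome.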
When their map of chromosomes 2 and 3 was complete -- along with maps of the much shorter chromosomes 4 and X produced by others -- the BDGP researchers made a "rough draft" sequence of the genome with shallow coverage (less than two clones deep), which served as a check against Celera's whole-genome shotgun sequence and is being used to close some of its 1,600 gaps.
The multi-author Science paper summarizing the genome-sequence results describes the importance of the BDGP's methods and results: "The BAC end-sequences and STS content map provided the most informative long-range sequence-based information at the lowest cost." Increasing the number of BAC end-sequences is the authors' primary recommendation for future genome-sequencing projects.
D. melanogaster's importance goes far beyond its role as a trial run for the mouse and human genomes, however. In a set of 289 human genes implicated in diseases, 177 are closely similar to fruit fly genes, including genes that play roles in cancers, in kidney, blood, and neurological diseases, and in metabolic and immune-system disorders. "The underlying biochemistry of fruit flies and humans is remarkably similar," says Hoskins, "so fruit flies can provide clues to understanding human diseases caused by defective genes."
"We can find human tumor-suppressing genes in flies easier than we can in the mouse," says Susan Celniker, pointing out that experiments can be done using fly genes that would be impractical (or unthinkable) using human subjects. Especially useful is the identification of networks of other genes that interact with known disease genes, and their associated metabolic pathways. The implications for medicine are immediate.
To this end the BDGP researchers are continuing to refine the D. melanogaster sequence already produced. "We're going to push it to high accuracy," says Hoskins.
The Human Genome Project aims for an accuracy of one error in 10,000 base pairs -- roughly the number of errors that could arise from normal human variation -- but the Drosophila workers intend to achieve an accuracy of one error in 100,000, a goal partly made possible by the limited variation among inbred laboratory flies.
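A rough calculation shows what the difference between the two accuracy goals means in practice. The sketch below assumes, as a simplification, that errors strike each base independently at a fixed rate; it uses the figure of about 120 million euchromatic bases given in the backgrounder, and the 1,000-base stretch standing in for a gene is an arbitrary illustrative choice.

```python
# Assumed figure: about 120 million bases of euchromatin, per the backgrounder.
EUCHROMATIN_BASES = 120_000_000


def expected_errors(length: int, error_rate: float) -> float:
    """Expected number of wrong bases if each base is wrong independently with error_rate."""
    return length * error_rate


def prob_error_free(length: int, error_rate: float) -> float:
    """Probability that a stretch of `length` bases contains no error (independent errors)."""
    return (1.0 - error_rate) ** length


for label, rate in [("1 in 10,000", 1e-4), ("1 in 100,000", 1e-5)]:
    print(
        f"{label}: ~{expected_errors(EUCHROMATIN_BASES, rate):,.0f} errors in the euchromatin; "
        f"P(1,000-base stretch error-free) = {prob_error_free(1_000, rate):.3f}"
    )
```

Under these assumptions the tighter standard cuts the expected number of errors in the euchromatin from about 12,000 to about 1,200, and raises the chance that a 1,000-base stretch is entirely error-free from roughly 90 percent to roughly 99 percent.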
Meanwhile the completed genome of D. melanogaster reported in the 24 March 2000 issue of Science stands as a milestone in the history of genetic research and a doorway to new research methods. For one thing, Celera is now attempting to apply the whole-genome shotgun technique to the much larger human genome.
"Celera did a great job," says Hoskins, "and the project worked better than anyone could have hoped. Now, the BDGP and the rest of the community of 5,000 Drosophila researchers around the world can begin projects to understand how the genome sequence controls the biology."
The Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California. Visit our website at http://www.lbl.gov .
BACKGROUNDER: BUILDING THE FRUIT FLY GENOME MAP
D. melanogaster's five chromosomes are the X and Y sex chromosomes, the two large autosomes (non-sex chromosomes) labeled 2 and 3, and the small autosome labeled 4.
Of the roughly 215 million bases in the genome of D. melanogaster, about 120 million are euchromatin, the less tightly packed form of chromosomal material that contains most of the genes. Heterochromatin, by contrast, forms the regions around the centers and at the ends of the chromosomes and consists mostly of noncoding sequences; much of its DNA resists sequencing because it is made up of very short sequences of bases repeated in long tandem arrays.
DNA is purified from whole flies that are frozen and ground up. Clones are made by cutting all the DNA into pieces with enzymes, then inserting these snippets into various hosts that replicate numerous copies of them.
Each kind of clone has advantages and drawbacks. For example, short viral clones may reproduce sequences of bases very accurately, but these often match more than one location in the genome. At the other extreme, so-called YACs ("yeast artificial chromosomes") are very long stretches of DNA, up to millions of bases. While YACs can reduce the number of steps needed to create a physical map, they are unstable and may incorporate numerous sequence errors.
The STS content map constructed by BDGP's Berkeley Lab researchers relied on BACs, "bacterial artificial chromosomes," DNA clones that are stable at lengths up to hundreds of thousands of bases. The researchers tiled numerous overlapping BAC clones along the length of chromosomes 2 and 3, using sequence-tagged sites at intervals of approximately 50,000 bases.
In a technique pioneered by BDGP workers, over a third of the sequence-tagged sites in the Drosophila mapping project were "end-sequence tags" -- chosen to lie within 500 bases of either end of a BAC clone, greatly aiding the matching of overlapping clones.
The more clones that overlap at any given place along the chromosome, the greater the assurance of high accuracy in sequencing. At most places the BAC-based map of the euchromatin in chromosomes 2 and 3 reached a depth of about 13 overlapping BAC clones.
(Five short gaps in the euchromatin map were not spanned by any clone, however, and these regions were also present as gaps in the whole-genome shotgun sequence produced in collaboration with Celera Genomics.)
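Clone depth and gaps can be read off a simple sweep along the chromosome. The coordinates below are hypothetical, and the sketch assumes each clone's position is already known; in the actual project, positions were inferred from shared sequence-tagged sites rather than given in advance. It shows only what "depth" and "gap" mean here: depth is the number of clones covering a stretch, and a gap is a stretch covered by none.

```python
from __future__ import annotations


def depth_profile(clones: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Split the span of the clones into (start, end, depth) segments.

    Each clone is a half-open interval [start, end) in base pairs; depth is the
    number of clones covering every base in that segment.
    """
    events = []
    for start, end in clones:
        events.append((start, +1))  # a clone begins
        events.append((end, -1))    # a clone ends
    # At equal positions, -1 sorts before +1, so clones that merely abut do not overlap.
    events.sort()
    segments = []
    depth = 0
    prev_pos = None
    for pos, delta in events:
        if prev_pos is not None and pos > prev_pos:
            segments.append((prev_pos, pos, depth))
        depth += delta
        prev_pos = pos
    return segments


# Hypothetical clone coordinates (base pairs) along one chromosome arm.
clones = [(0, 150_000), (100_000, 250_000), (120_000, 300_000), (400_000, 550_000)]
segments = depth_profile(clones)
max_depth = max(depth for _, _, depth in segments)
gaps = [(start, end) for start, end, depth in segments if depth == 0]
print("maximum depth:", max_depth)
print("gaps not spanned by any clone:", gaps)
```

With these made-up clones the deepest stretch is three clones deep, and the region from 300,000 to 400,000 is a gap that no clone spans.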
The BAC-clone map, made mostly at Berkeley Lab, was augmented and checked against other mapping techniques. At Baylor College of Medicine in Houston, Texas, members of BDGP cut BAC clone DNA with restriction enzymes and separated the fragments by gel electrophoresis, giving each clone a distinctive band pattern, or "fingerprint"; clones with shared bands were assembled into a "fingerprint map" that corroborated Berkeley Lab's map.
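Because overlapping clones contain some of the same DNA, their fingerprints share some bands. The sketch below is a simplified stand-in for the statistical scoring used by dedicated fingerprint-mapping software: it counts bands that match within a size tolerance and calls two clones likely overlaps when enough of the smaller fingerprint is shared. The fragment sizes, the 0.2-kilobase tolerance, and the 50 percent threshold are all hypothetical.

```python
from __future__ import annotations


def shared_bands(a: list[float], b: list[float], tolerance: float) -> int:
    """Count bands that pair up between two fingerprints within `tolerance`.

    Each band is used at most once; greedy matching over sorted sizes gives the
    maximum number of pairs for this one-dimensional tolerance rule.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a_sorted) and j < len(b_sorted):
        diff = a_sorted[i] - b_sorted[j]
        if abs(diff) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # band in a is too small to match this or any later band in b
        else:
            j += 1  # band in b is too small to match this or any later band in a
    return matches


def likely_overlap(a: list[float], b: list[float],
                   tolerance: float = 0.2, min_fraction: float = 0.5) -> bool:
    """Call an overlap when enough of the smaller fingerprint's bands are shared."""
    if not a or not b:
        return False
    return shared_bands(a, b, tolerance) / min(len(a), len(b)) >= min_fraction


# Hypothetical fragment sizes in kilobases.
bac_x = [1.2, 2.5, 3.1, 4.8, 6.0, 7.7]
bac_y = [2.5, 3.2, 4.8, 5.5, 6.1, 9.0]
bac_z = [0.9, 8.4, 10.2, 12.9]
print(likely_overlap(bac_x, bac_y), likely_overlap(bac_x, bac_z))
```

A fixed fraction threshold ignores how likely bands are to match by chance, which is why real fingerprint tools use probability-based scores instead; the sketch only illustrates the underlying idea of shared bands.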
Working with BDGP researchers in Gerald Rubin's laboratory on the UC Berkeley campus, Berkeley Lab researchers confirmed the physical BAC map by hybridizing BAC clones directly to the chromosomes themselves, so that each clone's DNA bound to its matching chromosomal sequence.
"The chromosomes of the Drosophila larval salivary glands are unusual in that their DNA can form multiple, perfectly registered copies," says Susan Celniker. "This can help the fly make lots of a particular protein in a short time. Larvae produce glue copiously in their salivary glands -- to get tons of glue, the fly makes tons of glue genes."
These so-called polytene chromosomes have distinct banding patterns, and when they are stained, variations in the pattern unambiguously identify regions of the chromosomes. When BAC clones were labeled with a different color and allowed to hybridize to their matching sites on the polytenes, they demonstrated virtually complete coverage of the chromosomes by corresponding clones.
By integrating the BAC STS content map and the fingerprint map, BDGP researchers achieved contiguous clone coverage of the genome and ensured adequate overlaps, confirming that sequence assemblies reflected the structure of the genome. The in situ polytene hybridization data then established that their physical map covered more than 97.8 percent of the euchromatic portion of chromosomes 2 and 3.
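A coverage figure like this is, at bottom, covered length divided by total length. The sketch below treats each clone's hybridization result as a base-pair interval, which is a simplification (polytene hybridization actually places clones by band position), merges overlapping intervals, and reports the covered fraction of a region. The region and intervals are hypothetical.

```python
from __future__ import annotations


def covered_fraction(intervals: list[tuple[int, int]], region_start: int, region_end: int) -> float:
    """Fraction of the half-open region [region_start, region_end) covered by the
    union of half-open intervals, after clipping each interval to the region."""
    clipped = sorted(
        (max(start, region_start), min(end, region_end))
        for start, end in intervals
        if end > region_start and start < region_end
    )
    covered = 0
    run_start = run_end = None
    for start, end in clipped:
        if run_end is None or start > run_end:
            # Disjoint from the current run: bank the run and start a new one.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or abutting: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / (region_end - region_start)


# Hypothetical hybridization results along a 1,000,000-base region.
signals = [(0, 300_000), (250_000, 600_000), (620_000, 1_000_000), (1_200_000, 1_300_000)]
print(f"{covered_fraction(signals, 0, 1_000_000):.1%} of the region covered")
```

Merging first matters: simply adding up clone lengths would double-count the overlaps and could report more than 100 percent coverage.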