Among the many plants that humans have found useful enough to domesticate, soybean (Glycine max) is a wonder. Like other legumes, it has the important ability to make some of its own essential nutrients by hosting nitrogen-fixing bacteria. Soybean is also a virtual chemical factory, so rich in proteins that it is a major source of protein for animal feed, and so rich in oils that it is used to produce much of the world's cooking oil; it is also a major source for biodiesel.
If it seems that nature could hardly have given agriculture a more useful plant, we may at last be able to understand why. The first complete sequencing of the soybean genome has made available the fine details of the soybean's unusually productive genetic code and is revealing a distinctive evolutionary history that led to its chemical versatility.
The sequencing of the soybean genome will be announced in a paper forthcoming in the January 14 issue of the journal Nature. Authored by Jeremy Schmutz of the Joint Genome Institute and the HudsonAlpha Genome Sequencing Center and 43 other researchers from 18 institutions, the paper details results pointing to key evolutionary events that may be responsible for the plant's unusual capabilities.
In particular, researchers found evidence of two separate instances, one about 59 million years ago and the other about 13 million years ago, when the plant's ancestors doubled their genes by adding an extra copy of the organism's original set of chromosomes, resulting in a genetic condition known as polyploidy.
Most higher animals and plants (including humans) have two copies of their genetic code in most of their cells through most of their life cycle (they are "diploid"), but polyploid organisms have extra copies, usually in multiples of two so the material can be evenly divided during sexual reproduction. In each of the polyploid events in the soybean's evolutionary history, the plant's ancestor changed from having two copies of its genes to four. After each polyploidy event, the new copies either slowly evolved and diverged from the original genes to become new pairs of genes, or disappeared because they were unnecessary, and the plant eventually became diploid again.
The more recent gene-copying event in the soybean lineage was almost certainly an event known as "allopolyploidy," where the duplicated set of genes came from a separate organism that was genetically similar, but probably a distinct species from the other genetic donor. In this condition, the new genes are essentially still duplicates, but may vary somewhat in their specific code.
What makes soybean somewhat unique as a polyploid, according to Jessica Schlueter, a faculty member in Bioinformatics at the University of North Carolina at Charlotte and the paper's third author, is the fact that most of the plant's copied genes diverged to become new genes rather than disappearing, which is the more common evolutionary result of gene duplication.
"One of the characteristics that we've known from studies in soybeans is that there is an over-abundance of multi-gene families," noted Schlueter. "On average, we are finding 2.3 loci (a term designating specific locations in the genetic material) per genetic marker (individual gene). In a simple diploid genome, you would expect one loci per marker."
Schlueter stresses, however, that soybean's polyploidy alone is not the whole story: "In Arabidopsis (the first sequenced plant and also an ancient polyploid), you only have 20% of the genome showing a signature of duplication - it has kicked out 80% of the genes that were duplicated," Schlueter said. "Soybean is the complete opposite of that spectrum - it has kept 75% of that duplicated material. It seems to be very resilient to polyploidy - it handles it very well and retains a lot of similar genetic information."
The team found a particularly high number of genes that provide the genetic codes for soybean's rich complement of proteins, and the vast majority (78%) of those and other identifiable genes occur at the ends of the chromosomes. The chromosome ends are generally distant from the centromeres (where the chromosomes' chromatid strands are linked) and thus contain the regions in the genome, as the authors note, "where nearly all the genetic recombination occurs during reproduction."
"You can see across the genomic sequence these major blocks that have been duplicated and remain within the genome," Schlueter said. "This is one of the big take-home messages that we had. The soybean genome has a unique structural characteristic that we have not seen in a sequenced plant genome before."
Since most plants with histories of genome duplication lose many of their extra gene copies relatively quickly, a major question remaining is why the soybean has not dumped its extras. Schlueter points out that the oldest identified occurrence of polyploidy in the soybean lineage occurred 59 million years ago, a time near the point where the legume family itself first emerged, and the event may be related to the development of these plants' shared ability to form the unique adaptation of root nodules that house nitrogen-fixing bacteria.
The nodules are a particularly valuable evolutionary development, since they give legumes the ability to produce their own biologically usable form of nitrogen, an element that is essential for biological processes (especially protein production) but is also frequently scarce in a usable form. Developing a feature that allowed a biological partnership with nitrogen-fixing bacteria was a game-changer for the legumes.
"One of the concepts with polyploidy is that you get unique morphological characteristics because the plant has twice the genetic information" Schlueter said, "Large seeds, large flowers, the ability to grow in various temperature conditions, and so on. It's like doubling your genetic variability all at once. If you allow genes to mutate, you have a second copy that is suddenly evolutionarily free to go off on its own path."
In the soybean lineage, the team found that many of the duplicated genes were preserved and allowed to diversify after each of the two polyploidy events.
If soybean kept its duplicated genes because it was able to diversify many of them into new genes that gave the organism useful new capabilities, the question is what those new capabilities were, and how they are related to the plant's diverse chemical attributes that humans find so useful. Finding out is the complicated task ahead for Schlueter's research.
As one of the bioinformaticians on the soybean genome project, Schlueter identified the genes and blocks of genes that were duplicated and established dates for when the duplication events had occurred.
The team used a "molecular clock" to establish dates for when genes had been duplicated, measuring specific differences between genes that are known to be essentially random and therefore have a predictable rate of occurrence. For example, certain single substitutions of the DNA bases (A, T, G and C) in the code sequence are "silent," which means they do not affect the organism, so their rate of appearing in the genetic record should be random. The changes have no genetic effect because the new three-letter "codon" they make codes for exactly the same amino acid as the original codon (a change in the code from "AAG" to "AAA," for example - both produce the amino acid lysine). The change thus has no effect on the production of the substance the gene carries the instructions for, and the number of times it occurs in the history of the gene at a specific point in the sequence is a purely random event, with a regular and predictable rate of occurrence. If a researcher counts the number of such letter differences between two gene sequences that were once identical, the count gives a relative measurement of how long ago the copying was done.
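The counting step described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the method the genome team actually used: the function name count_silent_differences and the example sequences are hypothetical, the two sequences are assumed to be already aligned and in the same reading frame, and only codons that differ by a single base are counted. Published analyses also normalize by the number of sites where silent changes are possible and correct for repeated changes at the same site, which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_silent_differences(seq_a: str, seq_b: str) -> int:
    """Count codons that differ by one base but encode the same amino acid.

    Hypothetical helper: assumes both sequences are aligned, in frame,
    of equal length, and contain only the letters A, C, G and T.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    silent = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        mismatches = sum(a != b for a, b in zip(codon_a, codon_b))
        # A "silent" change: one base differs, yet the amino acid is unchanged.
        if mismatches == 1 and CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
    return silent


# Made-up example sequences (not real soybean genes).
original = "ATGAAGCTTGGA"
copy = "ATGAAACTCGGA"
print(count_silent_differences(original, copy))
```

In this made-up pair, the second codon changes from "AAG" to "AAA" (both lysine) and the third from "CTT" to "CTC" (both leucine), so two silent differences are counted. Under the molecular clock idea, a pair of duplicated genes showing more silent differences would, all else being equal, have been copied longer ago than a pair showing fewer.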
In the next stage of her research on the genome, Schlueter will be looking in finer detail at differences between diverged genes and looking for clues regarding the process of gene divergence and its effects.
"In my lab I'm starting to ask why there is a persistence of polyploid genes," Schlueter said. "I'm looking at differences in gene expression between the two duplicated genes - why are they both still being expressed? How are they regulated? What are the epigenetic changes in these regions?
"The big question is," she noted, "why are they both still there?"
Far from being simply an abstract academic question, the issue is potentially a very large one for bioscience and particularly for the biotech industry, as the soybean is a model plant for understanding how natural processes can lead to biochemical diversity.
"There was an article in Newsweek recently that essentially said 'stop all the sequencing - the last thing we need to do is to sequence another genome.' I get the point - we have a lot of sequence data, and we are just starting to utilized all of it," Schlueter said.
"But on the flip side, from an evolutionary biology perspective, there are some very important evolutionary processes that need to be revealed," she said. "It's easier to draw conclusions about what happened millions of years ago if you have access to hundreds of different genomes that have been sequenced and can see differences. The information will help us find the 'why?' of the soybean and many other useful plants."
The soybean genome project was funded by the US Department of Energy's Joint Genome Institute. Other authors include Steven B. Cannon, Jianxin Ma, Therese Mitros, William Nelson, David Hyten, Qijian Song, Jay J. Thelen, Jianlin Cheng, Dong Xu, Uffe Hellsten, Gregory D. May, Yeisoo Yu, Tetsuya Sakurai, Taishi Umezawa, Madan Bhattacharyya, Devinder Sandhu, Babu Valliyodan, Erika Lindquist, Myron Peto, David Grant, Shengqiang Shu, David Goodstein, Kerrie Barry, Montona Futrell-Griggs, Jianchang Du, Zhixin Tian, Liucun Zhu, Navdeep Gill, Trupti Joshi, Marc Libault, Anand Sethuraman, Xue-Cheng Zhang, Kazuo Shinozaki, Henry T. Nguyen, Rod A. Wing, Perry Cregan, James Specht, Jane Grimwood, Dan Rokhsar, Gary Stacey, Randy C. Shoemaker, and Scott A. Jackson, who is the corresponding author (firstname.lastname@example.org, 765-496-3621).