The production of high-quality chocolate, and the farmers who grow cacao, will benefit from the recent sequencing and assembly of the chocolate tree genome, according to an international team led by Claire Lanaud of CIRAD, France, with Mark Guiltinan of Penn State, and including scientists from 18 other institutions.
The team sequenced the DNA of a variety of Theobroma cacao, considered to produce the world's finest chocolate. The Maya domesticated this variety of Theobroma cacao, Criollo, about 3,000 years ago in Central America, and it is one of the oldest domesticated tree crops. Today, many growers prefer to grow hybrid cacao trees that produce chocolate of lower quality but are more resistant to disease.
"Fine cocoa production is estimated to be less than 5 percent of the world cocoa production because of low productivity and disease susceptibility," said Guiltinan, professor of plant molecular biology.
The researchers report in the current issue of Nature Genetics that "consumers have shown an increased interest for high-quality chocolate made with cocoa of good quality and for dark chocolate, containing a higher percentage of cocoa, while also taking into account environmental and ethical criteria for cocoa production."
Currently, most cacao farmers earn about $2 per day, but producers of fine cacao earn more. Increasing the productivity and ease of growing cacao can help to develop a sustainable cacao economy. The trees are now also seen as an environmentally beneficial crop because they grow best under forest shade, allowing for land rehabilitation and enriched biodiversity.
The team's work identified a variety of gene families that may help improve cacao trees and fruit in the future, either by enhancing their attributes or by providing protection from the fungal diseases and insects that affect cacao trees.
"Our analysis of the Criollo genome has uncovered the genetic basis of pathways leading to the most important quality traits of chocolate -- oil, flavonoid and terpene biosynthesis," said Siela Maximova, associate professor of horticulture, Penn State, and a member of the research team. "It has also led to the discovery of hundreds of genes potentially involved in pathogen resistance, all of which can be used to accelerate the development of elite varieties of cacao in the future."
Because the Criollo trees are self-pollinating, they are generally highly homozygous, possessing two identical forms of each gene, making this particular variety a good choice for accurate genome assembly.
The researchers assembled 84 percent of the genome, identifying 28,798 genes that code for proteins. They assigned about 82 percent, or 23,529, of these protein-coding genes to one of the 10 chromosomes in the Criollo cacao tree. They also looked at microRNAs, short noncoding RNAs that regulate genes, and found that microRNAs in Criollo are probably major regulators of gene expression.
"Interestingly, only 20 percent of the genome was made up of transposable elements, one of the natural pathways through which genetic sequences change," said Guiltinan. "They do this by moving around the chromosomes, changing the order of the genetic material. Smaller amounts of transposons than found in other plant species could lead to slower evolution of the chocolate plant, which was shown to have a relatively simple evolutionary history in terms of genome structure."
Guiltinan and his colleagues are interested in specific gene families that could be linked to specific cocoa qualities or disease resistance. They hope that mapping these gene families will point to genes directly involved in useful variations in the plant, which could accelerate plant breeding programs.
The researchers identified two types of disease resistance genes in the Criollo genome. They compared these to previously identified regions on the chromosomes that correlate with disease resistance -- QTLs -- and found that the locations of many of the resistance genes correlated with those QTLs. The team suggests that a functional genomics approach, one that looks at what the genes do, is needed to confirm potential disease resistance genes in the Criollo genome.
Hidden in the genome, the researchers also found genes that code for the production of cocoa butter, a substance highly prized in chocolate making, confectionery, pharmaceuticals and cosmetics. Most cocoa beans are already about 50 percent fat, but these 84 genes control not only the amount but also the quality of the cocoa butter.
Other genes were found that influence the production of flavonoids, which are natural antioxidants, and terpenoids, which include hormones, pigments and aromas. Altering the genes for these chemicals might produce chocolate with better flavors and aromas, and even healthier chocolate.
Penn State researchers involved in this study include Guiltinan and Maximova; Yufan Zhang and Zi Shi, graduate students, plant biology; Stephen Schuster, Department of Biochemistry and Molecular Biology; John E. Carlson, School of Forest Resources; and M.J. Axtell and Z. Ma, Department of Biology.
Other researchers involved were from CIRAD; Institut National de la Recherche Agronomique UMR; Genoscope; Centre National de la Recherche Scientifique; Centre National de Genotypage; Universite d'Evry; INRA-CNRS LIPM Laboratoire des Interactions Plantes Micro-organismes; Universite de Perpignan; Unite de Biometrie et d'Intelligence Artificielle; Institut des Sciences du Vegetal; and Chocolaterie Valrhona, all in France.
Also included are researchers from the University of Arizona; Cold Spring Harbor Laboratory; Centre National de la Recherche Agronomique, Ivory Coast; CEPLAC, Brazil; and Centro Nacional de Biotecnologia Agricola, Instituto de Estudios Avanzados, Venezuela.
CIRAD, the Agropolis foundation, the Région Languedoc Roussillon, Agence Nationale de la Recherche (ANR), Valrhona, the Venezuelan Ministry of Science, Technology and Industry, Hershey Corp., the American Cocoa Research Institute Endowment and the National Science Foundation supported this work.
The Theobroma cacao genome sequences are deposited in the EMBL/GenBank/DDBJ databases under accession numbers CACC01000001-CACC01025912. A genome browser and further information on the project are available from http://cocoagendb.cirad.fr/gbrowse and http://cocoagendb.cirad.fr.