Regions of the X chromosome, one of the two sex chromosomes (Y is the other), have been linked to mental retardation and numerous other disorders, but finding the particular genetic abnormalities involved has been difficult.
The team's accomplishment, described in the April issue of Nature Genetics, should speed research into diseases associated with the X chromosome and encourage similar analyses of other chromosomes.
"To our knowledge, this is the first time critical analysis of an entire chromosome has been done by a group that wasn't involved in determining the chromosome's genetic sequence," says study leader Akhilesh Pandey, M.D., Ph.D., an assistant professor in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins and chief scientific adviser to the Institute of Bioinformatics (IOB) in Bangalore, India, where the analyses took place. "We didn't start small. We wanted to prove that complete annotation can be done, and done in a way that lets you find new and unexpected things."
For 18 months, 26 Indian scientists pored through the publicly available sequence of the X chromosome (information generated by the Wellcome Trust Sanger Institute in England and others) to identify genes and other important parts of its DNA.
But unlike other efforts, the team didn't just "mine the data" by using computers to search for known patterns in the genetic sequence. Instead, Pandey decided they would look for similarities between the human X chromosome's protein-encoding instructions and corresponding regions in the mouse. Regions that were identical or nearly so were then examined carefully by IOB biologists.
"We didn't want to start out by saying that genes had to look a certain way," says Pandey. "So our only initial assumption was that if a genetic region is important and codes for a protein, the sequence will be conserved at the protein level. Thus, even if the genetic sequence is different here and there, the protein sequence could still be the same."
Essentially, the researchers took advantage of the redundancy built into the genetic code. DNA's four building blocks (A, T, C and G) are read in sets of three, and each three-block set, or codon, specifies just one of the 20 amino acids that make up proteins. Because there are 64 possible codons for only 20 amino acids, several different codons can specify the same amino acid. For example, the DNA sequences TTGAGGAGC and CTACGATCA are quite different, but both specify the same three amino acids: leucine, arginine and serine, in that order.
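The leucine-arginine-serine example can be checked with a few lines of Python. This is a minimal illustration of codon redundancy using the standard genetic code, not the team's actual analysis software; the function names `translate` and `dna_identity` are hypothetical helpers introduced here for illustration. It shows that the two DNA sequences share only 2 of their 9 bases yet translate to the identical protein fragment.

```python
# Standard genetic code, built from the conventional "TCAG" ordering:
# first base varies slowest, third base fastest. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Read the sequence three bases (one codon) at a time.
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def dna_identity(seq_a: str, seq_b: str) -> int:
    """Count positions where two equal-length DNA sequences share a base."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x == y)


if __name__ == "__main__":
    first, second = "TTGAGGAGC", "CTACGATCA"
    # L = leucine, R = arginine, S = serine
    print(translate(first))              # LRS
    print(translate(second))             # LRS
    print(dna_identity(first, second))   # 2 (out of 9 bases)
```

Comparing sequences after translation, rather than base by base, is what lets a protein-level comparison treat these two stretches of DNA as equivalent even though most of their bases differ.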
"Instead of telling the computer what to look for, we let nature tell the computer what was important," says Pandey. "When you align the protein-encoding instructions of the human and mouse, the genes jump out at you."
In the regions that were the same between species, the scientists found 43 new "gene structures" that encode proteins. Some of the newly identified genes sit in regions long tied to X-linked mental retardation syndromes, which primarily affect boys, or to other disorders. Quite remarkably, Pandey says, almost half of the new genes don't look like any previously known genes, nor do they look like each other.
"These would not be found any other way, because no one knew to look for them," he says. "No one had ever identified any aspect of their sequences as being important."
The IOB scientists and the U.S. members of the team experimentally investigated a few of the new genes to confirm the comparative approach's validity. Their results, as well as data generated by other scientists since the U.S.-India team started working, confirm the existence of some of the newly identified genes. The team's work also showed that some so-called pseudogenes on the X chromosome are actually expressed, or transcribed, which contradicts the widespread idea that they are functionless.
"We're really trying to show that complete annotation of chromosomes can be done, and that doing it this way means you can find things you don't expect to find," says Pandey. "It's long, painstaking work, but it's worth it."
Pandey hopes that researchers will take the initiative to annotate sequenced genetic information and validate regions used in their work.
The research at the IOB was funded internally by the Institute of Bioinformatics. Authors on the paper are co-first authors H.C. Harsha and Shubha Suresh, Nandan Deshpande, K. Shanker, A.J. Yatish, Babylakshmi Muthusamy, B.M. Vrushabendra, B.P. Rashmi, K.N. Chandrika, N. Padma, M.A. Ramya, H.N. Shivashankar, Dipanwita Roy Choudhury, M.P. Kavitha, R. Saravana, Vidya Niranjan, T.K.B. Gandhi, Neelanjana Ghosh, Sreenath Chandran, Minal Menezes, Mary Joy, Sujatha Mohan, Krishna Deshpande, Chaerkady Raghothama and C.K. Prasad of the Institute of Bioinformatics, Bangalore, India; and Ramars Amanchy, Salil Sharma, Jose Badano, Suraj Peri, Nicholas Katsanis, and Pandey of Johns Hopkins. Salil Sharma is affiliated with both institutions.
Pandey serves as chief scientific adviser to the Institute of Bioinformatics. The terms of this arrangement are being managed by The Johns Hopkins University in accordance with its conflict of interest policies.