Public Release: 

Faster, better, cheaper: A new method to generate extended data for genome assemblies

Earlham Institute


IMAGE: European ash tree and wheat, examples of organisms for which this method has already been used to produce high-quality genome assemblies.

Credit: TGAC

Scientists at The Genome Analysis Centre (TGAC) have developed a new library construction method for genome sequencing that can simultaneously construct up to 12 size-selected long mate pair (LMP) or 'jump' libraries, ranging in size from 1.7 kb to 18 kb, with reduced DNA input, time and cost.

Long-range genetic data is an invaluable resource for plant, crop and animal genetic research. Sequencing genomes requires breaking them into small, manageable pieces and then working out how they go back together - similar to a million- or billion-piece jigsaw puzzle. To do this, a combination of short-range sequence data (a jigsaw piece) and long-range sequence data (which tells you about the nearby pieces) is needed.

While generating the short-range data is relatively straightforward, the long-range data is more problematic, as the quality and quantity of DNA are major factors influencing the outcome. Illumina's Nextera Mate Pair Sample Preparation Kit has helped improve the quality of long-range (LMP) data, but such data can still be challenging to generate. More complex genomes typically benefit from accurately size-selected LMP data to produce the highest quality genome assemblies.

Although producing even a single high-quality, size-selected LMP library can be difficult, larger genome sequencing projects often use several. This new approach allows construction of 12 libraries for less than twice the cost of a single library and reduces the time from 3 to 2 days.

The TGAC team gained early access to a new piece of technology, the SageELF from Sage Science, with the aim of developing a more robust, global approach for generating accurately sized long-range sequence data. The protocol would be more tolerant of variation in DNA quality and quantity and could ensure that the best possible data was generated for any given sample.

"Improving LMP libraries has been a goal of TGAC's for some time, and we have previously published a software tool for processing this data to 'clean it up' before using it in genome assemblies," said Darren Heavens, lead author and Team Leader in Platforms & Pipelines at TGAC.

"Previous approaches limited us to targeting a single size range at a time, whereas using this new protocol we can simultaneously target up to 12 different size fractions, improving the likelihood of achieving the best long-range data from a given DNA source and saving both time and money. While most projects wouldn't require sequencing all 12 fractions, we can select the best fractions for sequencing."

The scientists hope that this new library construction approach will be widely adopted within the scientific community and will help improve genome assemblies. Better assemblies would in turn provide a better understanding of traits of economic interest in crops and animals, such as disease resistance, which are seen as key requirements for breeders.

Matt Clark, last author and Plant and Microbial Genomics Group Leader at TGAC, added: "Generating high-quality genome assemblies is important as they form the base upon which we build an understanding of an organism's genetics. Since its development, this approach has already been successfully implemented in a number of our high-profile sequencing projects such as bread wheat, durum wheat, Aegilops sharonensis and a European ash tree more resistant to ash dieback disease, all with very impressive results.

"For all projects requiring long-range sequence information, we are now promoting this method as our Gold Standard approach. Identifying new technologies and their applications is a key aspect of the work we do and we will continue to monitor opportunities in this area."


The advance online version of the paper, titled "A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost", is published in BioTechniques.
