Jan 24, 2019 -- Differences in genetic diversity among bacterial pathogens correlate with clinically important factors, such as virulence and antimicrobial resistance, prompting the need to identify clusters of similar bacterial strains. However, current bacterial clustering and typing approaches are not suitable for real-time pathogen surveillance and outbreak detection.

In a study published today in Genome Research, researchers developed PopPUNK (Population Partitioning Using Nucleotide K-mers), a computational tool for analyzing tens of thousands of bacterial genomes in a single run, up to 200-fold faster than previous methods. Using k-mers, short sections of DNA of length k, the software rapidly estimates the proportion of k-mers present in one genome that are also shared by another. Differences in k-mer content between genomes may represent changes to individual bases in otherwise similar stretches of DNA or differences in gene content. By calculating these relationships across isolates, the population structure of a species can be efficiently estimated.
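To make the k-mer comparison concrete, the following is a minimal Python sketch of the general idea, not PopPUNK's actual implementation. The function names (`kmers`, `shared_fraction`, `jaccard`) and the toy sequences are hypothetical, and the sketch compares exact sets of k-mers, an approach that would likely be too slow for whole genomes at scale.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(a: str, b: str, k: int) -> float:
    """Proportion of the k-mers in sequence a that also occur in sequence b."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka:
        return 0.0  # a is shorter than k, so it has no k-mers to share
    return len(ka & kb) / len(ka)


def jaccard(a: str, b: str, k: int) -> float:
    """Symmetric similarity: shared k-mers divided by all distinct k-mers."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequences that differ by a single base (position 5).
    genome_a = "ACGTTGCA"
    genome_b = "ACGTAGCA"
    for k in (2, 3):
        print(k, shared_fraction(genome_a, genome_b, k), jaccard(genome_a, genome_b, k))
```

In this toy case a single base change removes a larger share of the 3-mers than of the 2-mers, which illustrates why comparing k-mer content at more than one length can help separate base-level changes from differences in gene content.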

Importantly, PopPUNK applies a machine learning method that enables easy identification of emerging strains in a population. Using a previously published data set of E. coli isolates collected over a ten-year study, PopPUNK was able to efficiently classify the prevalence of different strains in the population each year and identify the emergence of antibiotic-resistant strains over time.

Researchers envision that PopPUNK will expedite the identification of bacterial strains as the number of bacterial genomes being sequenced grows and, importantly, allow public health agencies to quickly identify outbreak strains that pose a public health risk.

Researchers from New York University School of Medicine, Wellcome Sanger Institute, University of Helsinki, University of Cambridge, and Imperial College London contributed to this work. The study was funded by grants from the United States Public Health Service, Wellcome, Bill and Melinda Gates Foundation, European Research Council, and the Royal Society.

About the article:

The manuscript will be published online ahead of print on 24 Jan 2019. Its full citation is as follows: Lees J, Harris S, Tonkin-Hill G, Gladstone R, Lo S, Weiser J, Corander J, Bentley S, Croucher N. 2019. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Research doi: 10.1101/gr.241455.118

