November 11, 2014, Hong Kong, China -- In an article published today in the open-access, open-data journal GigaScience, researchers from Universidad Politécnica de Madrid in Spain and the National Institutes of Health in the USA provide a fantastic example of open data sharing to help build better image-processing tools for cardiac MRI: a wealth of patient imaging data. Even better, to enable reproducible comparisons between new tools, the researchers and journal have taken the unusual step of packaging and publishing the data alongside the tools, scripts and software required to run the experiments. The package is available to download from GigaScience's GigaDB database as a "virtual hard disk" that allows researchers to run the experiments themselves and to add their own annotations to the data set.
The most common cause of heart attacks is coronary heart disease. Diagnosis is key to beginning treatment to prevent such events. One useful tool in the fight against this leading killer is magnetic resonance imaging, which allows direct examination of blood flow to the myocardium (the heart muscle). However, for this perfusion analysis technique to be most effective, the patient's breathing motion must be compensated for, which is done using complex image processing methods. There is therefore a need to improve these tools and algorithms. The key to achieving this is the availability of large public MRI datasets to allow testing, optimization and development of new methods.
As one potential user of these resources, Professor Alistair Young, Technical Director of the Auckland Magnetic Resonance Research Group, commented: "Very large amounts of medical imaging data are now becoming available through registries and large population studies. Well validated, automated methods are required to derive maximum benefit from such resources. The paper by Wollny and Kellman exemplifies how data and algorithm sharing can advance the field by providing a platform by which existing methods can be tested and new methods validated against existing benchmarks. Such benchmarking datasets are essential to advance the field through objective metrics and standards."
Having everything wrapped up in a virtual machine also made things simpler during the scientific peer-review and publication process, as the settings, packages and file locations were already set up in a working configuration. One of the people carrying out this testing process, Dr Robert Davidson, Data Scientist at GigaScience, stated: "Actually testing the code during review is sadly almost a novel concept and one that needs to roll out as a standard. But even more: if it's easy for the reviewers, it's easy for the community to use too."
This work is important for improving diagnosis of the number one cause of death worldwide. In addition, the continuing rise in retractions of published scientific articles makes direct means of improving article reproducibility essential, both for being able to trust current findings, on which future studies are built, and for preventing the public from losing confidence in the research community they fund. Publishing a virtual machine, an interactive and executable publication, provides the scientific community with an example and a test case demonstrating a potential new type of scholarly output.
1. Wollny, G; Kellman, P: Free breathing myocardial perfusion data sets for performance analysis of motion compensation algorithms. GigaScience 2014 3:23 http://www.
2. Wollny, G; Kellman, P (2014): Supporting material for: "Free breathingly acquired myocardial perfusion data sets for performance analysis of motion compensation algorithms". GigaScience Database. http://dx.
Executive Editor, GigaScience, BGI Hong Kong
Tel: +852 3610 3531
Mob: +852 92490853
Notes to News Writers:
GigaScience is co-published by BGI, the world's largest genomics organization, and BioMed Central, the world's first open-access publisher. The journal covers research that uses or produces 'big data' from the full spectrum of the life sciences. It also serves as a forum for discussing the difficulties of, and unique needs for, handling large-scale data from all areas of the life sciences. The journal has a novel publication format, one that integrates manuscript publication with complete data hosting and the incorporation of analysis tools. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to GigaScience that all supporting data and source code be made available in the GigaScience database, GigaDB (http://gigadb.