News Release

Waking up to an interactive coffee cup of data

Researchers make available one of the largest coffee datasets for the industry to easily, cheaply, and interactively assess the validity of the variety under which the coffee is sold

Peer-Reviewed Publication


When coffee is sold as single origin or as the more expensive Arabica beans, do you really know whether you are getting what you’re paying for? Coffee-producing regions need to enforce the standards and reputation of their coffee, so a growing industry is developing technologies to classify and test coffee beans from different origins more accurately. Researchers in Colombia from Universidad del Valle and Universidad del Atlantico and the company Almacafe have taken steps toward making it easier to validate the variety under which coffee is being sold. To do so, they analyzed hundreds of coffee samples from multiple countries using highly sensitive Nuclear Magnetic Resonance (NMR) and made these data broadly available in an inexpensive and interactive manner, allowing researchers to look at their coffee to see ‘what’s in that cup’ (or should be). This study was published in the open science journal GigaByte [1].

NMR is an extremely sensitive technique that provides very detailed information, down to the level of molecular structure, about the contents of any sample analyzed. NMR has long been the gold standard in medical and pharmacology studies for content identification, but it is less often used in the food industry as it has been far too expensive for more general use. To open up the use of this technique in the coffee sector, the researchers here gathered 715 coffee samples from 27 different countries and used NMR to obtain detailed information on the content of those samples. They then made all of these data openly available for general use.

The researchers have primarily used their technique to help the Colombian Coffee Federation enforce the Protected Geographical Indication (PGI), which covers agricultural products, such as Colombian coffee, whose quality and reputation are linked to a specific geographical area. For this, they have mainly used different technologies to classify coffee beans from different origins. However, it quickly became apparent to the scientists that NMR could also give very accurate information about coffee quality.

Lead author Julien Wist from Universidad del Valle noted that “Although roasting is very important as it can ruin the best beans, it is impossible to make good coffee out of bad beans.” With a hint of humor, he added: “Our research group has had a wonderful time working with coffee samples. The whole lab was, for once, smelling nice. The sample preparation is so simple that we just prepared coffee: cold for the [NMR] magnet and hot for us!”

Most important to this work, the authors have made this huge collection of samples and spectra freely available so that it can be shared without restriction quickly, cheaply, and interactively. Readers can directly engage with these gigabytes of NMR data because, in addition to the datasets, the authors have made software called the NMRium-browser [2] available so readers can look through the spectra for themselves. NMRium is the newest iteration of a project that started two decades ago to bring NMR spectra to the browser.

Wist says of this interactive paper: "Visualization of data is often difficult and requires expensive pieces of software. Often, the consequence is that data is overlooked and simply fed into a black box. I think the first step should always be to look at the data. NMRium does that in the browser and for free."

With what is thought to be the largest available database of NMR spectra of coffee samples now shared, researchers across the world have a baseline for assessing the effectiveness of the technology for applications such as determining coffee origin, purity, and adulteration, as well as the effect of roasting.

Making large datasets interactive within the article is possible because the work was published in the journal GigaByte, which uses custom-built publishing technology that can integrate interactive content. Making the data available and interactive as part of the publishing process increases trust in article content and creates living documents rather than the more common publishing-industry standard of posting articles online in a static format. Other GigaByte articles have included many different types of visualization tools, as best suited to the data being presented. These include Hi-C maps [3], 3D imaging viewers [4] that can run on VR headsets, interactive maps [5], and interactive protocols [6]. These types of embedded interactive tools showcase new things that can be done in publishing, and demonstrate this more hands-on approach as a way to share research in a manner better suited to communicate modern research and data, even to the point of letting readers explore the contents of their morning cup of coffee.


Further Reading
1. Osorio J et al. 1D and 2D NMR spectra of coffee from 27 countries. GigaByte, 2022

2. NMRium example: 

3. Lamb S et al. De novo chromosome-length assembly of the mule deer (Odocoileus hemionus) genome. GigaByte, 2021

4. Lehmann P et al. X-ray micro-tomographic data of live larvae of the beetle Cacosceles newmannii. GigaByte, 2021

5. Zarkogiannis S et al. X-ray tomographic data of planktonic foraminifera species Globigerina bulloides from the Eastern Tropical Atlantic across Termination II. GigaByte, 2020

6. Shippy TD et al. Annotation of Hox cluster and Hox cofactor genes in the Asian citrus psyllid, Diaphorina citri, reveals novel features. GigaByte, 2022


Contact information:

Scott Edmunds, PhD, Chief Editor
Tel: +852 3610 3531 Cell: +852 92490853

Sharing on social media?

Find GigaByte online on Twitter (@GigaByte) and Facebook, and keep up to date with our blog.


About GigaScience Press

GigaScience Press is BGI's Open Access Publishing division, which publishes scientific journals and data. Its publishing projects are carried out with international publishing partners and infrastructure providers, including Oxford University Press and River Valley Technologies. It currently publishes two data-centric journals: its premier journal GigaScience (launched 2012) and its new journal GigaByte (launched 2020). It also publishes data, software, and other research objects via its database. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to all GigaScience Press journals that all supporting data and source code be made available in GigaDB or in a community-approved, publicly available repository. See

