Spanish scientists have developed a new piece of software to predict the source of faecal pollution in seas, reservoirs and rivers. The system, called Ichnaea, uses machine learning and the analysis of various biological indicators to make highly reliable predictions of the source of this type of pollution, which poses a serious health risk. The team is now looking for funding to move the whole application to the cloud.
Faecal pollution is increasingly common in rivers and water reserves. The concentration of people in towns increases the demand for water and also generates a high volume of waste water from both humans and animals. For this reason, a multidisciplinary team of Spanish biologists and IT experts has decided to develop a new system that helps to identify the source of the faecal pollution present in the water.
"Identifying the species to which the traces belong would help in resolving conflicts about who is responsible for the faecal pollution of a river: a farm, an abattoir, a sewage treatment plant or a human population nucleus, for example," as Anicet R. Blanch tells SINC. Blanch is microbiologist at the University of Barcelona, and co-director of the project together with Lluis Belanche, from the Polytechnic University of Catalonia.
Microbial, chemical or eukaryotic indicators
According to Blanch, the new piece of software, which they have called Ichnaea (ancient Greek for 'tracker'), "is based on the development of prediction models from the analysis of a series of microbial, chemical or eukaryotic indicators. This information allows the source of the sample to be determined, even in complex cases in which the faecal pollution is very diluted or deteriorated," Blanch highlights.
To be able to make these predictions, the analysis results for several parameters from other water samples, each with a single known source of faecal pollution, must first be entered into the system. "Using this data, the software determines the relevant indicators, which when analysed in water samples with an unknown source of faecal pollution, would allow its source to be determined," he adds.
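The workflow described here (train on samples of known origin, keep the most informative indicators, then classify an unknown sample) can be illustrated with a minimal sketch. It is not Ichnaea's actual method, which the article does not detail: the indicator names, the measurement values, the Fisher-style ranking score and the nearest-centroid classifier are all assumptions chosen for brevity.

```python
from statistics import mean, pvariance

# Hypothetical training data: each sample has one known faecal source.
# Indicator names and values are invented for illustration only.
TRAINING = [
    ("human", {"human_phage": 5.1, "bifido_ratio": 2.0, "turbidity": 1.0}),
    ("human", {"human_phage": 4.8, "bifido_ratio": 2.2, "turbidity": 3.0}),
    ("human", {"human_phage": 5.3, "bifido_ratio": 1.9, "turbidity": 2.0}),
    ("cattle", {"human_phage": 1.0, "bifido_ratio": 0.4, "turbidity": 2.5}),
    ("cattle", {"human_phage": 1.3, "bifido_ratio": 0.5, "turbidity": 1.2}),
    ("cattle", {"human_phage": 0.9, "bifido_ratio": 0.3, "turbidity": 2.8}),
]


def fisher_score(samples, indicator):
    """Variance of the per-source means divided by the mean within-source variance."""
    by_source = {}
    for source, values in samples:
        by_source.setdefault(source, []).append(values[indicator])
    between = pvariance([mean(v) for v in by_source.values()])
    within = mean(pvariance(v) for v in by_source.values())
    return between / (within + 1e-9)  # small constant avoids division by zero


def select_indicators(samples, keep):
    """Rank indicators by how well they separate the sources; keep the best ones."""
    indicators = samples[0][1].keys()
    ranked = sorted(indicators, key=lambda i: fisher_score(samples, i), reverse=True)
    return ranked[:keep]


def train_centroids(samples, indicators):
    """Average the selected indicators for each known source."""
    by_source = {}
    for source, values in samples:
        by_source.setdefault(source, []).append(values)
    return {
        source: {i: mean(row[i] for row in rows) for i in indicators}
        for source, rows in by_source.items()
    }


def predict_source(centroids, sample):
    """Return the source whose centroid is closest to the unknown sample."""
    def squared_distance(centroid):
        return sum((sample[i] - centroid[i]) ** 2 for i in centroid)
    return min(centroids, key=lambda source: squared_distance(centroids[source]))


selected = select_indicators(TRAINING, keep=2)
centroids = train_centroids(TRAINING, selected)
unknown = {"human_phage": 4.5, "bifido_ratio": 1.8, "turbidity": 2.6}
prediction = predict_source(centroids, unknown)
print(selected, prediction)
```

In this toy data the turbidity values overlap between the two sources, so the ranking discards that indicator and the prediction relies on the two that actually separate human from cattle samples.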
"Up until now, each research group proposed the indicators which they believed to be the most important, but Ichnaea eliminates the subjectivity by selecting the most essential for a reliable prediction from among the different variable parameters," the researcher explains.
Reliability of the prediction
Some of these relevant indicators are microbial parameters such as bacteriophages (viruses which infect bacteria) linked to a single species. Others that often appear are related to bifidobacteria (a group of bacteria that live in the intestine) or to the mitochondrial DNA of the specimens, the expert indicates.
The prediction's degree of reliability depends on the number and quality of the samples used to train the machine learning models, and on the parameters that have been analysed and their relevance, given that the presence of certain bacteria varies depending on the geographical location. The freshness and degree of dilution of the samples to be analysed also play a part.
The scientists compared the effectiveness of the system at three sites with different degrees of faecal pollution of human and animal (cattle, pig or poultry) origin. The predictions identified the source in areas with high and medium dilution. Where the dilution was low, in irrigation channels in the Delta del Ebro, a high range of probability was established.
Analysis from the cloud
The software is still at the prototype stage, as its components have been developed separately. The researchers are now looking for funding that will allow them to refine the different modules and integrate them into a single computing platform. They plan to move the whole application to the cloud.
"The idea is that anyone can access the system from any computer, even a tablet, because the calculation is done from a remote machine," explains Blanch.
According to the team's plans, users will be able to adapt the prediction to their geographical location with a customised setting. They will be able to provide the software with their own samples of known origin to complete the learning stage, after which predictions can be made.
Users who do not have their own samples for this training could use the system's database, which will include results made openly available by other scientists. In this way, they will be able to choose analyses from the geographical areas closest or most similar to the one being studied.
Blanch considers that knowing the source of the pollution is also important from a health-risk perspective, "given that human pathogens present in water are significantly more contagious than those of animal origin," concludes the scientist.
Blanch, A. R. et al. "Predicting fecal sources in waters with diverse pollution loads using general and molecular host-specific indicators and applying machine learning methods". Journal of Environmental Management 151: 317-325, 2015. http://www.