Politecnico di Milano against deepfakes: two projects to detect fake video and audio
[Video: the Politecnico di Milano platform recognizes an image generated by artificial intelligence. Credit: Politecnico di Milano]
Two European research projects on detecting deepfakes and mitigating their spread have come to an end: FF4ALL and FUN-Media. The Image and Sound Processing Lab (ISPL) at Politecnico di Milano, funded by the National Recovery and Resilience Plan (NRRP), analyzed emerging phenomena linked to the generation of synthetic images and videos for FF4ALL, while in FUN-Media the lab focused on the detection of vocal deepfakes, one of the most significant emerging threats in the digital security landscape. The results of the projects mark an important step towards the development of reliable technologies for the protection of digital information, the fight against disinformation and the protection of users in an increasingly complex and dynamic media ecosystem.
For several years, the Image and Sound Processing Lab (ISPL) of the Department of Electronics, Information and Bioengineering of Politecnico di Milano has been engaged in the development of advanced techniques for multimedia forensic analysis. The activities in this field are coordinated by Professors Stefano Tubaro and Paolo Bestagini, with the contribution of assistant professors Sara Mandelli and Luca Comanducci.
FF4ALL
ISPL researchers investigated the ways in which fake images and videos are engineered and disseminated. For instance, ISPL explored techniques that make it possible to transform real images into extremely realistic synthetic versions, making verification of their authenticity more complex and masking traces that are paramount for forensic analysis. At the same time, the laboratory developed new tools for the detection of synthetic faces, combining three-dimensional geometric information and structural facial features. These solutions improve the generalization ability of forensic detectors and maintain good performance even in the presence of post-processing operations, such as compression or editing.
"A further contribution concerns the study of the systems used to detect deepfakes," explains Professor Stefano Tubaro. "Understanding which elements they base their decisions on is crucial to increasing their reliability." Artificial-intelligence (AI)-based models, trained on large amounts of data, are often difficult to interpret, and it is not always clear which characteristics they use to classify content. Along these lines, the lab developed techniques to identify the area of the face most relevant for classification, making the decision-making processes of detectors more transparent.
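The idea of locating the face region a detector relies on can be illustrated with occlusion-based saliency, a standard explainability technique. This is a minimal sketch, not the ISPL method: the detector and image are toy stand-ins, and `occlusion_saliency` is a hypothetical helper name.

```python
import numpy as np

def occlusion_saliency(image, classifier, patch=4):
    """Slide an occluding patch over the image and record how much the
    detector's score changes: regions whose occlusion shifts the score
    most are the ones the detector relies on for its decision."""
    base = classifier(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0  # black out one patch
            heat[i // patch, j // patch] = abs(base - classifier(occluded))
    return heat

# Toy "detector": only looks at the top-left quadrant of the image.
def toy_detector(img):
    return float(img[:8, :8].mean())

img = np.ones((16, 16))
heat = occlusion_saliency(img, toy_detector, patch=8)
# Only the top-left cell of the heatmap lights up, exposing where
# the toy detector actually looks.
```

Real forensic detectors would replace `toy_detector` with a trained network and run the occlusion over facial landmarks rather than a uniform grid.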
The project activities also included analysis of the impact of new AI-based compression technologies, which can generate artifacts that make it difficult to distinguish authentic images subjected to such compression from synthetic or manipulated data.
Lastly, in collaboration with the project's partner universities, the lab released the WILD dataset, which collects fake images generated by twenty state-of-the-art models: an important resource for identifying the generative technology used to synthesize an image.
FUN-MEDIA
With FUN-Media, the focus was on the detection of vocal deepfakes. To address this challenge, ISPL researchers developed new architectures based on the so-called Mixture of Experts models, capable of combining multiple specialized systems to improve deepfake detection performance even in the presence of generative techniques never seen during training. These approaches offer greater flexibility and adaptability than traditional detectors, proving particularly effective in complex and constantly evolving scenarios.
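The Mixture of Experts principle described above can be sketched in a few lines: a gating function scores how relevant each specialized detector is to the input, and the final decision is the gate-weighted combination of the experts' outputs. This is an illustrative toy, not the FUN-Media architecture; the experts, gate, and features are all hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class MixtureOfExperts:
    """Combine several specialized deepfake detectors: the gate weighs
    each expert's relevance for the input, and the prediction is the
    gate-weighted average of the experts' fake-probability scores."""
    def __init__(self, experts, gate):
        self.experts = experts  # list of functions: features -> P(fake)
        self.gate = gate        # function: features -> relevance logits

    def predict(self, x):
        weights = softmax(self.gate(x))
        scores = np.array([expert(x) for expert in self.experts])
        return float(weights @ scores)

# Toy experts: one confident on "low" inputs, one on "high" inputs.
low_expert = lambda x: 0.9 if x[0] < 0.5 else 0.5
high_expert = lambda x: 0.9 if x[0] >= 0.5 else 0.5
gate = lambda x: np.array([1.0 - x[0], x[0]]) * 10  # route by first feature

moe = MixtureOfExperts([low_expert, high_expert], gate)
score = moe.predict(np.array([0.1]))  # routed mostly to low_expert
```

In practice each expert would be a network trained on a different family of voice generators, so that the ensemble degrades gracefully on techniques unseen during training.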
A further line of research explored the use of forensic detectors based on anomaly detection. In this case, the models are trained exclusively on authentic voice signals to learn their distinctive characteristics and are therefore able to identify synthetic data as deviations from the expected behaviour.
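The anomaly-detection idea, training only on authentic voices and flagging deviations, can be sketched with a simple one-class statistical model. This is a hedged illustration, not the project's detector: real systems model learned audio embeddings, whereas here the "features" are random stand-ins and `AuthenticVoiceModel` is a hypothetical name.

```python
import numpy as np

class AuthenticVoiceModel:
    """One-class sketch: fit per-feature mean and standard deviation on
    features of real voices only, then score inputs by their average
    z-score distance; large distances mark candidate synthetic speech."""
    def fit(self, real_features):
        self.mu = real_features.mean(axis=0)
        self.sigma = real_features.std(axis=0) + 1e-8  # avoid divide-by-zero
        return self

    def anomaly_score(self, x):
        return float(np.abs((x - self.mu) / self.sigma).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))  # stand-in for real-voice features
model = AuthenticVoiceModel().fit(real)

real_score = model.anomaly_score(rng.normal(0.0, 1.0, size=8))
fake_score = model.anomaly_score(np.full(8, 6.0))  # far from training stats
# fake_score is much larger: the model never needed a synthetic example.
```

The appeal of this formulation is exactly what the paragraph notes: no synthetic data is needed at training time, so the detector does not depend on any particular generation technique.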
"Alongside detection, the project also addressed the problem of attribution, that is, the identification of the generative technology responsible for creating a piece of audio content," says Professor Paolo Bestagini. ISPL developed detectors that can establish whether two voice tracks were produced by the same generative model. Further contributions concern the development of techniques for detailed analysis of the voice signal, for instance at the phoneme level, and the use of models capable of highlighting the most relevant acoustic characteristics for detection.
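Deciding whether two tracks come from the same generator is naturally framed as a verification problem: embed each track into a "model fingerprint" space and compare the embeddings. The sketch below assumes such embeddings exist and uses cosine similarity with a threshold; the vectors, threshold, and function names are illustrative, not the ISPL system.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two fingerprint embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_source(emb_a, emb_b, threshold=0.8):
    """Verification-style attribution sketch: high similarity between
    two tracks' fingerprint embeddings suggests the same generator."""
    return cosine(emb_a, emb_b) >= threshold

# Toy embeddings: two tracks from one generator point in a similar
# direction; a third, from a different generator, points elsewhere.
track1 = np.array([1.0, 0.1, 0.0])
track2 = np.array([0.9, 0.2, 0.1])
track3 = np.array([0.0, 0.1, 1.0])

same_source(track1, track2)  # similar fingerprints
same_source(track1, track3)  # dissimilar fingerprints
```

A verification formulation has the advantage that it can flag a shared source even for generators that were never seen during training, since it compares tracks to each other rather than to a fixed catalogue of models.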
THE BACKGROUND
In recent years, rapid advances in generative artificial intelligence models have profoundly transformed the way digital content such as images, videos and audio is produced, shared and consumed. While these technologies open up new creative opportunities and applications, they also introduce significant risks in terms of security, disinformation and content manipulation. In particular, they make it possible to impersonate individuals with increasing effectiveness and to generate extremely realistic content, often lacking obvious traces of alteration. This scenario amplifies the risk of social engineering attacks and the large-scale spread of fake news, making manipulated content increasingly credible and difficult to detect.