News Release

Spain's most visited websites largely fail to comply with privacy laws and track their users

In addition to cookies, there are other widely used web-tracking techniques that are not well known to the public, such as web beacons

Peer-Reviewed Publication

Universitat Oberta de Catalunya (UOC)

Only a small percentage of the 500 most visited websites in Spain (which include everything from government sites to streaming and adult content platforms) correctly fulfil the requirements set out in the General Data Protection Regulation (GDPR). This is one of the main findings of a study involving researchers from the Universitat Oberta de Catalunya (UOC), the University of Girona and the Center for Cybersecurity Research of Catalonia (CYBERCAT).

The results, which are published in open access in the scientific journal Computers & Security under a Creative Commons licence, were reached using novel automated methods for analysing web-tracking techniques and compliance with internet privacy regulations.

In addition to the incorrect and non-consensual use of cookies, these analysis algorithms detected the use of web-tracking techniques that are little known to the average user, such as web beacons and technologies based on the browser's digital fingerprint.

 

Widespread non-compliance with privacy laws

The European Parliament's approval of the General Data Protection Regulation in 2016 was set to change for good how companies, websites and digital platforms manage users' personal data. The regulation, to which Spain adapted its national legislation in 2018 through the Organic Law on the Protection of Personal Data and Guarantee of Digital Rights, was supposed to mark a turning point in the protection of citizens' privacy. However, six years later, the regulation is still being implemented only haltingly.

"We found that websites still have a long way to go to correctly implement the requirements set out in the General Data Protection Regulation," explained Cristina Pérez-Solà, who took part in analysing this issue as a researcher at the UOC's Faculty of Computer Science, Multimedia and Telecommunications. She said: "Many of the websites analysed inform users of the use of cookies, but either do not wait for their consent to use them or acquire this consent improperly."
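Improperly acquired consent can take a concrete, checkable form. One simple check, sketched below in Python, is whether a cookie banner offers any button to refuse or customize tracking alongside the one to accept it. This is an illustrative heuristic, not the researchers' Consent Inspector Algorithm (which, as described here, relies on screenshots and manual inspection). The class and function names and the keyword lists are hypothetical, and the sketch assumes the banner's HTML has already been extracted from the page.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical keyword lists; real banners use many languages and wordings.
ACCEPT_WORDS = ("accept", "allow all", "aceptar")
REJECT_WORDS = ("reject", "decline", "refuse", "rechazar")
SETTINGS_WORDS = ("settings", "preferences", "customize", "configurar")


class ButtonCollector(HTMLParser):
    """Collect the visible text of <button> and <a> elements in banner HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: list[str] = []
        self._depth = 0  # > 0 while inside a button or link
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("button", "a"):
            if self._depth == 0:
                self._buf = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("button", "a") and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Normalise whitespace and case before keyword matching.
                self.labels.append(" ".join("".join(self._buf).split()).lower())

    def handle_data(self, data):
        if self._depth > 0:
            self._buf.append(data)


def classify_banner(html: str) -> dict[str, bool]:
    """Report which kinds of choice a banner's buttons appear to offer."""
    parser = ButtonCollector()
    parser.feed(html)
    parser.close()

    def has(words: tuple[str, ...]) -> bool:
        return any(w in label for label in parser.labels for w in words)

    return {"accept": has(ACCEPT_WORDS), "reject": has(REJECT_WORDS),
            "settings": has(SETTINGS_WORDS)}


def offers_real_choice(result: dict[str, bool]) -> bool:
    """True if the banner offers some way to refuse or customize tracking."""
    return result["reject"] or result["settings"]
```

For a banner whose only controls read "Accept all" and "More info", classify_banner reports accept as True and both reject and settings as False, so offers_real_choice returns False. Keyword matching is crude (it cannot tell a prominent button from a hidden one), which is one reason a manual inspection step remains useful.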

For this study, the researchers developed several algorithms to analyse the 500 most visited websites in Spain according to the Alexa ranking. The results revealed that a high percentage of sites lack an appropriate mechanism for obtaining users' consent to the use of cookies and other data collection tools. The analysis tools also detected an average of nearly 7 tracking cookies and 11 web beacons per website. Web beacons are small pieces of code embedded in a site to invisibly collect certain types of information from web traffic. In addition, 10% of the sites analysed use browser fingerprinting techniques, which are also difficult to detect.
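Web beacons are often implemented as "tracking pixels": images one pixel (or zero pixels) in size loaded from a third-party server, so that the image request itself reports the visit. The Python sketch below flags such images in a page's HTML. It is an assumed, simplified heuristic and not the researchers' beacon detection or the Website Evidence Collector's logic: it only inspects the width and height attributes (not CSS), and the function names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlparse


class PixelFinder(HTMLParser):
    """Collect the src of <img> tags declared as at most 1x1 pixel."""

    def __init__(self) -> None:
        super().__init__()
        self.pixels: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        tiny = a.get("width") in ("0", "1") and a.get("height") in ("0", "1")
        if tiny and a.get("src"):
            self.pixels.append(a["src"])


def third_party_beacons(html: str, page_url: str) -> list[str]:
    """Return tiny images served from a host other than the page's own host."""
    finder = PixelFinder()
    finder.feed(html)
    finder.close()
    site = urlparse(page_url).hostname
    # Relative URLs have no hostname, so they are treated as first-party.
    return [src for src in finder.pixels
            if urlparse(src).hostname not in (None, site)]


page = ('<img src="https://tracker.example.net/p.gif?id=42" width="1" height="1">'
        '<img src="/logo.png" width="120" height="40">')
print(third_party_beacons(page, "https://www.example.es/"))
# ['https://tracker.example.net/p.gif?id=42']
```

Comparing host names exactly means that a subdomain such as cdn.example.es would also be reported as third-party; grouping hosts by registrable domain would need a public-suffix list, which this sketch omits.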

According to Pérez-Solà, an expert in web security and privacy, "The purpose of all these techniques is usually to track the online behaviour of web users in order to create profiles that can then be used to adjust the advertising that will be shown or the prices that will be offered for services or products." The analysis carried out by the researchers from the UOC (Pérez-Solà and Albert Jové) and the University of Girona (David Martínez and Eusebi Calle) shows that, of the websites that obtain users' consent as required, only 8.91% actually respect the user's choice in practice.
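Whether a site respects consent in practice comes down to timing and outcome. A minimal way to express such a check, assuming each observed cookie carries a timestamp and that the set of tracking cookie names is already known, is to flag any tracking cookie set before the user answered the banner, or set at all after the user rejected tracking. The data model, field names and cookie names below are hypothetical, and this is not the researchers' published procedure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedCookie:
    name: str
    set_at: float  # hypothetical: seconds since the page started loading


def consent_violations(cookies: list[ObservedCookie], tracking_names: set[str],
                       answered_at: float | None, accepted: bool) -> list[str]:
    """Names of tracking cookies set without valid consent.

    answered_at is when the user clicked the banner (None if never);
    accepted says whether the user agreed to tracking.
    """
    bad = []
    for cookie in cookies:
        if cookie.name not in tracking_names:
            continue  # e.g. a strictly necessary session cookie
        before_answer = answered_at is None or cookie.set_at < answered_at
        if before_answer or not accepted:
            bad.append(cookie.name)
    return bad


observed = [ObservedCookie("ad_uid", 0.4), ObservedCookie("session", 0.1),
            ObservedCookie("analytics_id", 5.0)]
tracking = {"ad_uid", "analytics_id"}
print(consent_violations(observed, tracking, answered_at=3.0, accepted=True))
# ['ad_uid']
```

Here ad_uid is flagged even though the user later accepts, because it was placed before any consent existed; analytics_id, set after acceptance, is not.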

 

New algorithms to analyse compliance with the GDPR

Beyond the results themselves, the importance of this research lies in the algorithms used to study compliance with online privacy laws. The sheer number of pages and platforms on the internet makes automation essential, as studying each case manually would be infeasible. Moreover, some web-tracking techniques are extremely hard to detect, with no clear markers to indicate their presence. To overcome these challenges, the researchers developed their own method, involving four algorithms and a measure, the Websites Level of Confidence (WLoC), to assess the state of regulatory compliance.
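Browser fingerprinting is a good example of a technique with no clear marker: it leaves no cookie behind, because a script simply reads properties of the browser and device. A common heuristic, sketched below, is to scan script source for calls to APIs that fingerprinting scripts tend to combine, such as canvas read-back (toDataURL, getImageData), WebGL renderer strings, OfflineAudioContext, text measurement and hardware properties of navigator. The signal list and the threshold of three distinct signals are assumptions for illustration; every API listed also has legitimate uses, and this is not the detection rule used by the Website Evidence Collector or the researchers.

```python
from __future__ import annotations

import re

# Browser APIs often read by fingerprinting scripts. Each one also has
# legitimate uses, so a single hit proves nothing on its own.
SIGNALS = {
    "canvas": re.compile(r"\.toDataURL\(|\.getImageData\("),
    "webgl": re.compile(r"UNMASKED_(?:RENDERER|VENDOR)_WEBGL"),
    "audio": re.compile(r"\bOfflineAudioContext\b"),
    "fonts": re.compile(r"\.measureText\("),
    "hardware": re.compile(r"navigator\.(?:hardwareConcurrency|deviceMemory|plugins)"),
}


def fingerprint_signals(script: str) -> set[str]:
    """Names of the API families that appear in a script's source text."""
    return {name for name, pattern in SIGNALS.items() if pattern.search(script)}


def looks_like_fingerprinting(script: str, threshold: int = 3) -> bool:
    """Hypothetical rule: flag a script that touches several API families."""
    return len(fingerprint_signals(script)) >= threshold


sample = ("var d = canvas.toDataURL();"
          "var n = navigator.hardwareConcurrency;"
          "var ctx = new OfflineAudioContext(1, 44100, 44100);")
print(sorted(fingerprint_signals(sample)), looks_like_fingerprinting(sample))
# ['audio', 'canvas', 'hardware'] True
```

Static matching like this misses obfuscated or dynamically built code, so tools that observe API calls while the page runs are generally more reliable.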

"Our method uses a combination of automation and manual inspection. The implemented algorithms automatically browse the analysed websites and take screenshots that are then manually inspected," said Pérez-Solà. "In order to detect web-tracking techniques, we also used a tool developed by the European Data Protection Supervisor called the Website Evidence Collector. This tool is designed to perform privacy inspections on websites and makes it possible to detect the use of cookies, web beacons and browser fingerprinting tools."
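Per-site inspection reports can then be aggregated into figures such as the per-website averages of cookies and web beacons reported in the study. The sketch below assumes a simplified JSON report per site with "cookies" and "beacons" lists; the real output of the Website Evidence Collector contains more fields and may name them differently, so this layout is an assumption, and the function names are hypothetical. It also counts every recorded cookie, whereas the study's figure refers to tracking cookies specifically.

```python
from __future__ import annotations

import json
from statistics import mean


def load_reports(raw_reports: list[str]) -> list[dict]:
    """Parse one JSON report per analysed website (assumed layout)."""
    return [json.loads(raw) for raw in raw_reports]


def per_site_averages(reports: list[dict]) -> dict[str, float]:
    """Average number of cookies and beacons recorded per website."""
    return {
        "cookies": float(mean(len(r.get("cookies", [])) for r in reports)),
        "beacons": float(mean(len(r.get("beacons", [])) for r in reports)),
    }


raw = [
    '{"url": "https://a.example", "cookies": [{"name": "x"}, {"name": "y"}],'
    ' "beacons": [{"url": "https://t.example/p1"}]}',
    '{"url": "https://b.example", "cookies": [],'
    ' "beacons": [{"url": "https://t.example/p2"}, {"url": "https://t.example/p3"},'
    ' {"url": "https://t.example/p4"}]}',
]
print(per_site_averages(load_reports(raw)))
# {'cookies': 1.0, 'beacons': 2.0}
```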

Each of the algorithms used by the researchers has a well-defined function:

  • The Consent Inspector Algorithm (CIA) captures screenshots of each website's cookie banner and identifies the buttons that should allow users to customize the use of these tracking elements.
  • The Website Evidence Collector (WEC) gathers information on the different web-tracking techniques being used on each website.
  • The Cookies Detector Algorithm (CDA) uses the data provided by the WEC to categorize the cookies that websites set in users' browsers without their consent.
  • The Web Beacons Detection Algorithm (BDA) not only extracts web beacons detected by the WEC, but also identifies browser fingerprinting techniques.
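The release does not say which categories the CDA uses. One common way to categorize cookies, shown here purely as an assumed illustration, is by party (whether the cookie's domain belongs to the visited site) and by lifetime (session, persistent or long-lived). The Cookie type, the one-year cut-off for "long-lived" and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

ONE_YEAR = 365 * 24 * 3600  # hypothetical cut-off for "long-lived", in seconds


@dataclass(frozen=True)
class Cookie:
    name: str
    domain: str            # e.g. ".tracker.example"
    expires: float | None  # Unix time; None means a session cookie


def is_third_party(cookie_domain: str, site_host: str) -> bool:
    """First-party if the cookie domain is the site host or a parent of it."""
    d = cookie_domain.lstrip(".").lower()
    h = site_host.lower()
    return not (h == d or h.endswith("." + d))


def categorize(cookie: Cookie, site_host: str, now: float) -> str:
    """Label a cookie by party and by remaining lifetime."""
    party = "third-party" if is_third_party(cookie.domain, site_host) else "first-party"
    if cookie.expires is None:
        lifetime = "session"
    elif cookie.expires - now > ONE_YEAR:
        lifetime = "long-lived"
    else:
        lifetime = "persistent"
    return f"{party} {lifetime}"


now = 1_700_000_000.0
print(categorize(Cookie("uid", ".tracker.example", now + 2 * ONE_YEAR),
                 "www.example.es", now))
# third-party long-lived
```

Third-party, long-lived cookies set before consent are typically the strongest candidates for tracking, but party and lifetime alone cannot prove a cookie's purpose.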

"Our study focuses on analysing compliance with the General Data Protection Regulation by the most visited websites in Spain," Pérez-Solà added. "We selected the 500 most visited websites according to the Alexa ranking and analysed their use of these web-tracking techniques, as well as the information they give users and the alternative options they offer them. Finally, we compiled the results of this analysis into a measure, the Websites Level of Confidence, which makes it possible to assess the current state of compliance."
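The release does not give the formula behind the Websites Level of Confidence. As a rough illustration of how several per-site checks could be compiled into one figure that can be compared across sites, the sketch below scores a site as the percentage of checks it passes. The check names and the equal weighting are hypothetical and should not be read as the published WLoC definition.

```python
from __future__ import annotations

# Hypothetical checks with equal weights; the published WLoC may differ.
CHECKS = ("banner_shown", "reject_offered", "no_cookies_before_consent",
          "rejection_honoured", "no_fingerprinting")


def confidence_score(results: dict[str, bool]) -> float:
    """Percentage of the (hypothetical) compliance checks a website passes."""
    passed = sum(1 for check in CHECKS if results.get(check, False))
    return 100.0 * passed / len(CHECKS)


site = {"banner_shown": True, "reject_offered": True,
        "no_cookies_before_consent": False, "rejection_honoured": False,
        "no_fingerprinting": True}
print(confidence_score(site))
# 60.0
```

Re-running the same checks on a later crawl and comparing the scores would show whether a site's compliance is improving, which is the kind of change over time the researchers want the measure to reveal.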

"Understanding the details of the regulations that apply at any given time and knowing how to tell which techniques a website is using are beyond most users," she concluded. "Our proposed Websites Level of Confidence (WLoC) measure gives users insight into the compliance status of the most popular websites and lets them see how it changes over time without the need for legal or technical knowledge."

 

This research supports Sustainable Development Goal (SDG) 9: Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation.

UOC R&I

The UOC's research and innovation (R&I) is helping overcome pressing challenges faced by global societies in the 21st century by studying interactions between technology and the human and social sciences, with a specific focus on the network society, e-learning and e-health.

Over 500 researchers and more than 50 research groups work in the UOC's seven faculties, its eLearning Research programme and its two research centres: the Internet Interdisciplinary Institute (IN3) and the eHealth Center (eHC).

The university also develops online learning innovations at its eLearning Innovation Center (eLinC) and supports entrepreneurship and knowledge transfer within the UOC community through the Hubbik platform.

Open knowledge and the goals of the United Nations 2030 Agenda for Sustainable Development serve as strategic pillars for the UOC's teaching, research and innovation. More information: research.uoc.edu.

