News Release

World-first platform for transparent, fair and equitable use of AI in healthcare

Revolutionary AI platform for detecting diabetic eye disease proven safe for NHS

Peer-Reviewed Publication

City St George’s, University of London

Image: Professor Alicja Rudnicka, City St George's, University of London

Credit: Professor Alicja Rudnicka, City St George's, University of London

Researchers have developed the world’s first real-world head-to-head testing platform to determine whether commercial artificial intelligence (AI) algorithms are fit for NHS use to detect disease in a fair, equitable, transparent and trustworthy way, using diabetic eye disease as the first example.

They say the platform removes the biases that can arise when companies seek to deploy their AI software in clinical settings, putting all companies on a level playing field.

Currently, NHS AI algorithm selection focuses on cost-effectiveness and matching human performance. However, broader challenges remain, particularly the need for robust digital infrastructure and more rigorous testing of commercial algorithms. Crucially, software used as a medical device has rarely been assessed for algorithmic fairness on a large scale, particularly across different populations and ethnicities. This oversight has led to unintended disparities in health, such as pulse oximeters used to measure oxygen saturation being less accurate on people with darker skin, prompting a governmental review of the equity of medical devices, including AI.

In a study published today in The Lancet Digital Health [1], researchers led by Professor Alicja Rudnicka at City St George’s, University of London and Adnan Tufail at Moorfields Eye Hospital NHS Foundation Trust, in collaboration with Kingston University and Homerton Healthcare NHS Trust, trialled the independent platform. The platform was used to compare commercial AI algorithms designed to detect diabetic eye disease, which work by identifying signs of blood vessel damage at the back of the eye.

Of the four million people in England and Wales registered in the NHS diabetic eye screening programme, over three million are screened for diabetic eye disease every one to two years. The English NHS screening service alone generates around 18 million images of the back of the eye a year, all of which are analysed by up to three different people. This creates a colossal and increasingly unsustainable workload, taking up valuable time, money and resources that the researchers say could be put towards better care provision.

Working with the forward-looking Homerton Healthcare NHS Trust and its progressive IT department, the team built a ‘trusted research environment’ run by independent researchers. A total of 25 companies with CE-marked algorithms were invited to take part in the study and eight accepted.

The eight AI algorithms were ‘plugged in’ to the platform and run on 1.2 million images of the back of the eye from the North East London Diabetic Eye Screening Programme – one of the largest diabetic eye screening programmes and among the most diverse in ethnicity, age, deprivation level and spectrum of diabetic eye disease.

The performance of the eight algorithms was compared to images analysed by up to three humans who followed the standard protocol currently used in the NHS. Vendor algorithms did not have access to human grading data and companies were excluded from the data ‘safe haven’ where the images were being analysed by their algorithms.
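To illustrate the kind of masked, head-to-head comparison described above, the sketch below shows how each algorithm's output could be scored against the human reference grade and broken down by ethnicity subgroup. This is a minimal illustration in Python, not the study's analysis code; the column names ("algorithm", "ethnicity", "ai_referable", "human_referable") are hypothetical placeholders.

```python
# Minimal sketch (not the study's actual code) of scoring masked vendor
# outputs against the human reference grade, stratified by subgroup.
import pandas as pd

def subgroup_performance(results: pd.DataFrame) -> pd.DataFrame:
    """Sensitivity and specificity per algorithm and ethnicity subgroup,
    with the human grading outcome treated as the reference standard."""
    def metrics(g: pd.DataFrame) -> pd.Series:
        tp = (g.ai_referable & g.human_referable).sum()
        fn = (~g.ai_referable & g.human_referable).sum()
        tn = (~g.ai_referable & ~g.human_referable).sum()
        fp = (g.ai_referable & ~g.human_referable).sum()
        return pd.Series({
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
            "n_visits": len(g),
        })
    return results.groupby(["algorithm", "ethnicity"]).apply(metrics)

# Toy example with two screening visits for one hypothetical vendor.
toy = pd.DataFrame({
    "algorithm": ["vendor_A", "vendor_A"],
    "ethnicity": ["White", "South Asian"],
    "ai_referable": [True, False],
    "human_referable": [True, False],
})
print(subgroup_performance(toy))
```

Reporting these metrics separately for each subgroup is one way a platform of this kind could surface any fairness gaps between algorithms before deployment.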

Professor Alicja Rudnicka from the School of Health and Medical Sciences at City St George’s, University of London, who led the study, said:

“Our revolutionary platform delivers the world’s first fair, equitable and transparent evaluation of AI systems to detect sight-threatening diabetic eye disease. This depth of AI scrutiny is far higher than that ever given to human performance. We’ve shown that these AI systems are safe for use in the NHS by using enormous data sets, and most importantly, showing that they work well across different ethnicities and age groups.”

Co-principal investigator Adnan Tufail from Moorfields Eye Hospital said:

“There are more than 4 million patients with diabetes in the UK who need regular eye checks. This groundbreaking study sets a new benchmark by rigorously testing AI systems to detect sight-threatening diabetic eye disease before potential mass rollout. The approach we have developed paves the way for safer, smarter AI adoption across many healthcare applications.”

In total, 202,886 screening visits were evaluated, representing 1.2 million images from people of white (32%), Black (17%) and South Asian (39%) ethnic groups. The AI systems took between 240 milliseconds and 45 seconds to analyse all of a patient’s images, compared with up to 20 minutes for a trained human.

Across the AI algorithms, accuracy in identifying diabetic eye disease potentially in need of clinical intervention ranged from 83.7% to 98.7%. Importantly, accuracy was 96.7-99.8% for moderate-to-severe diabetic eye disease and 95.8-99.5% for the most advanced (proliferative) sight-threatening diabetic eye disease. In a previously published study [2], the accuracy of humans manually grading images for these levels of diabetic eye disease ranged from 75% to 98%, showing that the AI algorithms performed as well as, or better than, a human in a fraction of the time.

The platform also measured the rate at which healthy cases were incorrectly flagged as having diabetic eye disease by each algorithm, another critical measure of accuracy. It showed that the algorithms performed consistently well across different ethnic groups, the first time this has been assessed.

Professor Alicja Rudnicka added: “This work paves the way to expand the use of our platform from a local to national level.

“Our vision is to deliver centralised AI infrastructure that hosts approved algorithms, enabling all screening centres to upload retinal images securely for analysis. The AI-generated results would be returned to the centre and integrated directly into the patient’s electronic health record. This approach eliminates the need for duplicating infrastructure across multiple sites, reducing setup costs and ensuring consistent, equitable service delivery nationwide.”
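As a purely illustrative sketch of the centralised flow described in the quote above, the snippet below shows how a screening centre might upload a retinal image to a hosted service and receive the AI result back for its own records. The endpoint URL, authentication and response fields are hypothetical assumptions, not a description of any real NHS system.

```python
# Illustrative sketch only: a screening centre uploads one retinal image to a
# hypothetical hosted grading service and receives the AI result in response.
import requests

API_URL = "https://example-screening-platform.nhs.invalid/v1/grade"  # hypothetical

def grade_image(image_path: str, visit_id: str, token: str) -> dict:
    """Upload a retinal image and return the AI grading result."""
    with open(image_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {token}"},
            files={"image": f},
            data={"visit_id": visit_id},
            timeout=60,
        )
    response.raise_for_status()
    # Hypothetical response shape, e.g. {"visit_id": ..., "grade": ..., "referable": ...}
    return response.json()

# In the researchers' vision, the returned result would then be written into
# the patient's electronic health record by the centre's existing systems.
```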

The researchers say their platform benefits everyone: companies get independent feedback to improve their technology, NHS trusts can select the AI tools that work best for them, and highly repetitive screening tasks become more efficient so that screening staff can focus on higher-risk disease and on employing newer types of retinal scans. Patients will also ultimately benefit from much faster diagnosis and optimal care.

The unique and transparent approach could become the blueprint for evaluating AI tools across other chronic diseases such as cancer and heart disease, helping to build public trust and accelerate safe, equitable AI adoption in healthcare.

Professor Sarah Barman, who was involved in the study from Kingston University, said: “This large-scale evaluation of the effectiveness of AI algorithms has allowed us to demonstrate how different algorithms perform across subgroups of the population. It also provides a clear approach that can be applied to other medical domains to help ensure that AI is fair and works well for everyone.”

This study was funded by the NHS Transformation Directorate, The Health Foundation and Wellcome Trust.

