Introduction to the New AI Evaluation Platform
Researchers at City St George’s, University of London, have unveiled a pioneering platform designed to rigorously test the fairness and accuracy of commercial artificial intelligence (AI) algorithms used in screening for diabetic eye disease. This platform is the first of its kind to offer a real-world, head-to-head comparison of AI systems, ensuring they are suitable for use within the National Health Service (NHS) in a fair, equitable, and transparent manner.
Addressing Bias and Ensuring Fairness
The platform aims to eliminate biases that may arise from companies eager to deploy their AI software in clinical environments. By providing a level playing field, it allows for an unbiased assessment of AI algorithms. Traditionally, the NHS has selected AI algorithms based on cost-effectiveness and their ability to match human performance. However, this approach has not adequately addressed broader challenges, such as the need for robust digital infrastructure and comprehensive testing of commercial algorithms.
Challenges in Algorithmic Fairness
A significant oversight in the use of AI as medical devices has been the lack of large-scale assessments for algorithmic fairness, particularly across diverse populations and ethnicities. This gap has led to unintended health disparities, as seen with pulse oximeters that inaccurately measure oxygen saturation levels in individuals with darker skin tones. Such issues have prompted governmental reviews of the equity of medical devices, including AI.
Study and Methodology
The study, published in The Lancet Digital Health, was led by Professor Alicja Rudnicka and Adnan Tufail, in collaboration with Kingston University and Homerton Healthcare NHS Trust. The platform was used to evaluate commercial AI algorithms designed to detect diabetic eye disease by identifying signs of blood vessel damage in the eye. Of the four million people in England and Wales registered in the NHS diabetic eye screening program, over three million undergo screening every one to two years, generating approximately 18 million images annually.
Implementation and Results
The study involved a ‘trusted research environment’ created with Homerton Healthcare NHS Trust, where 25 companies with CE-marked algorithms were invited to participate, and eight accepted. These algorithms were tested on 1.2 million retinal images from the North East London Diabetic Eye Screening Program, one of the most diverse programs in terms of ethnicity, age, and deprivation levels.
The algorithms' performance was compared against human grading, with the AI systems achieving accuracies of 83.7%–98.7% for detecting any diabetic eye disease. For moderate-to-severe disease, accuracy ranged from 96.7% to 99.8%, and for the most advanced cases, from 95.8% to 99.5%. These results show that AI algorithms can match or even surpass human performance in a fraction of the time.
Implications for Future AI Use
The platform also assessed the rate of false positives, where healthy cases were incorrectly flagged as having diabetic eye disease. The algorithms performed consistently well across different ethnic groups, marking the first time such an assessment has been conducted. Professor Rudnicka emphasized the potential to expand the platform’s use from a local to a national level, envisioning a centralized AI infrastructure that hosts approved algorithms for nationwide screening.
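The per-subgroup checks described above can be illustrated with a short sketch. This is not the study's actual code, and the data and group names are invented; it simply shows the kind of calculation involved: computing sensitivity (how often true disease is caught) and the false-positive rate (how often healthy eyes are wrongly flagged) separately for each population subgroup, so that any disparity between groups becomes visible.

```python
# Illustrative fairness check (hypothetical data, not from the study):
# compute sensitivity and false-positive rate per subgroup.

def rates(y_true, y_pred):
    """Return (sensitivity, false_positive_rate) for binary labels,
    where 1 = disease present (truth) or flagged (prediction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return sensitivity, fpr

# Made-up labels for two hypothetical subgroups.
groups = {
    "group_a": ([1, 1, 0, 0, 1, 0], [1, 1, 0, 1, 1, 0]),
    "group_b": ([1, 0, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1]),
}

for name, (y_true, y_pred) in groups.items():
    sens, fpr = rates(y_true, y_pred)
    print(f"{name}: sensitivity={sens:.2f}, false_positive_rate={fpr:.2f}")
```

A real equity assessment would run this kind of comparison over millions of graded images and test whether the between-group differences are statistically and clinically meaningful, rather than eyeballing two small lists.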
Benefits and Future Applications
The platform offers significant benefits, providing companies with independent feedback to improve their technology and enabling NHS trusts to select the most effective AI tools. This approach could streamline repetitive tasks, allowing healthcare professionals to focus on higher-risk cases and newer retinal scans. Patients would benefit from faster diagnoses and optimal care.
The transparent evaluation method could serve as a model for assessing AI tools in other chronic diseases, such as cancer and heart disease, fostering public trust and accelerating the safe, equitable adoption of AI in healthcare.
Conclusion
Professor Sarah Barman from Kingston University highlighted the study’s success in demonstrating the performance of different algorithms across population subgroups. The approach provides a clear framework that can be applied to other medical domains, ensuring AI is fair and effective for all.
For more information, refer to the study “Accuracy of automated retinal image analysis systems (ARIAS) to triage for human grading to detect diabetic retinopathy in a large-scale, multiethnic national screening programme” published in The Lancet Digital Health.
🔗 **Source:** https://medicalxpress.com/news/2025-11-platform-ai-fairness-accuracy-diabetic.html