Article Highlight | 10-Jun-2026

Expert consensus fills critical data gap for reliable AI in dry eye care

Classification, annotation standards, and quality control protocols for five major dry eye imaging modalities to support high-quality AI datasets

Chinese Medical Journals Publishing House Co., Ltd.

**image:**
**NIBUT software analysis. NIBUT: noninvasive tear film breakup time.**

Credit: Intelligent Medicine. Image source link: https://www.sciencedirect.com/science/article/pii/S2667102625001378

As artificial intelligence advances in ophthalmology, one challenge has become increasingly clear: AI systems are only as strong as the data used to train them. Dry eye disease affects hundreds of millions of people worldwide, yet its diagnosis remains stubbornly inconsistent. The same patient, examined at different clinics, with different instruments, or by different physicians, can receive markedly different evaluations. AI holds genuine promise for standardizing this process, but that promise has been stalled by a problem that sits upstream of every algorithm: the absence of unified standards for classifying and annotating the imaging data on which those algorithms are trained.

A new expert consensus published in Intelligent Medicine (February 2026, Volume 6, Issue 1) addresses this gap directly. Developed by a multidisciplinary working group of more than 70 ophthalmologists, AI researchers, and medical imaging specialists from over 40 leading institutions across China, Hong Kong, Singapore, the United Kingdom, and Europe, the consensus is the first to systematically establish a shared framework for the classification, annotation, workflow management, and quality control of dry eye imaging data in AI applications. Existing clinical guidelines, including TFOS DEWS II, define diagnostic criteria and clinical signs but do not provide the AI-trainable labels, annotation boundaries, or data quality standards needed to build generalizable models. This consensus fills that gap by targeting the chain that has held the field back: inconsistent labeling produces non-comparable datasets, which in turn produce algorithms that generalize poorly.

The framework spans the five modalities that form the diagnostic backbone of modern dry eye evaluation, and goes beyond naming them to specify exactly how images within each modality should be classified and labeled. For tear film lipid layer images, a dual approach is specified: a three-tier color scoring system (thin, moderately thick, and thick, scored 0 to 2) alongside a seven-level grading system correlating interference morphology with lipid layer thickness from below 15 nm to above 120 nm. Interference pattern subtypes, including pearl-like, Jupiter-like, and crystal-like appearances, are mapped to specific dry eye clinical subtypes, enabling AI-assisted subtype classification. For tear meniscus height, the consensus resolves a common and consequential annotation error: the true lower edge of the tear meniscus is the junction of tear fluid and eyelid skin, not the Placido ring projection as many annotators have assumed. For tear film breakup time, separate protocols govern the non-invasive and fluorescein-based methods, including standards for frame-by-frame video analysis of rupture onset. For corneal fluorescein staining, a four-grade severity classification is paired with a nine-point three-region scoring system, together encoding the full spectrum of ocular surface damage into AI-trainable labels. For meibomian gland imaging, both infrared meibography and in vivo confocal microscopy are covered. The consensus specifically flags the widespread error of including "ghost glands" in measurements of functional gland area. These are atrophied, non-secreting remnants that appear as low-contrast shadows and should be excluded from annotation.
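
To illustrate what "AI-trainable labels" might look like in practice, the following is a minimal sketch of a validated label record for two of the schemes described above. The class name, the field names, and in particular the assumption that the nine-point corneal staining score is the sum of three regional scores of 0 to 3 each are illustrative choices, not details taken from the consensus.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical label record. Field names and the per-region 0-3 range are
# illustrative assumptions, not quoted from the consensus.
@dataclass(frozen=True)
class DryEyeLabels:
    lipid_color_score: int                    # three-tier color score: 0 (thin) to 2 (thick)
    cfs_region_scores: Tuple[int, int, int]   # three corneal regions, each assumed 0-3

    def __post_init__(self):
        # Reject out-of-range values before they reach a training set.
        if self.lipid_color_score not in (0, 1, 2):
            raise ValueError("lipid_color_score must be 0, 1, or 2")
        if len(self.cfs_region_scores) != 3:
            raise ValueError("expected exactly three regional staining scores")
        if any(s not in (0, 1, 2, 3) for s in self.cfs_region_scores):
            raise ValueError("each regional staining score must be between 0 and 3")

    @property
    def cfs_total(self) -> int:
        # Assumed aggregate: sum of the three regional scores (range 0-9).
        return sum(self.cfs_region_scores)


label = DryEyeLabels(lipid_color_score=1, cfs_region_scores=(0, 2, 3))
print(label.cfs_total)
```

Validating at construction time means a mislabeled record fails loudly when it is created rather than silently skewing a dataset later.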

Beyond classification schemes, the consensus specifies a systematic quality assurance architecture governing the entire data pipeline. Annotators must hold ophthalmic clinical expertise rather than merely image-processing training, and must demonstrate proficiency with the relevant imaging modalities. Image pre-processing requirements define exclusion criteria for low-quality images affected by jitter, blur, over-exposure, or reflection interference, and permit only controlled operations such as cropping, contrast adjustment, and noise reduction. Inter-annotator agreement is to be monitored using the kappa coefficient with regular recalibration, and all annotation outputs, both qualitative and quantitative, must be systematically recorded and integrated into structured inspection reports.
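
As a rough illustration of the agreement monitoring described above, the sketch below computes Cohen's kappa for two annotators who graded the same images. The release does not say which kappa variant the consensus prescribes, so the two-annotator Cohen form, the function name, and the sample grades are all assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up lipid layer color scores (0-2) from two hypothetical annotators.
annotator_1 = [0, 1, 2, 2, 1, 0, 1, 2]
annotator_2 = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.619
```

Here the annotators agree on 6 of 8 images (0.75), but 0.344 of that agreement is expected by chance, so kappa is about 0.62. A recalibration trigger could be tied to kappa falling below a threshold chosen by the project.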

The consensus also addresses the structural difficulties facing the field. Uneven data quality across institutions is met with standardized acquisition protocols and automated image-quality screening. Limited algorithm generalizability is tackled via multi-center collaboration, data augmentation, and transfer learning. Restricted cross-institutional data sharing is addressed through federated learning and privacy-preserving de-identification. The persistent gap between laboratory AI performance and real-world clinical deployment is confronted by embedding real-world data into development cycles and tightening clinician-engineer co-design.
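
One way the automated image-quality screening mentioned above could work is sketched below, covering two of the exclusion criteria (blur and over-exposure). The variance-of-Laplacian sharpness measure is a common heuristic swapped in for illustration, not a method named by the consensus, and the function names and thresholds are hypothetical values that would need tuning for each device.

```python
import numpy as np


def laplacian_variance(gray):
    """Sharpness proxy: variance of a 4-neighbor Laplacian over interior pixels."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def overexposed_fraction(gray, saturation=250):
    """Fraction of pixels at or above a near-saturated intensity (8-bit scale)."""
    return float((gray >= saturation).mean())


def passes_screening(gray, min_sharpness=50.0, max_overexposed=0.05):
    # Thresholds are illustrative assumptions, not values from the consensus.
    sharp_enough = laplacian_variance(gray) >= min_sharpness
    not_blown_out = overexposed_fraction(gray) <= max_overexposed
    return sharp_enough and not_blown_out


# Synthetic 8-bit grayscale examples: a featureless frame and a high-contrast pattern.
flat = np.full((32, 32), 128, dtype=np.uint8)
rows, cols = np.indices((32, 32))
checker = np.where((rows + cols) % 2 == 0, 0, 200).astype(np.uint8)
print(passes_screening(flat), passes_screening(checker))
```

A screen like this only flags candidates for exclusion; jitter and reflection interference would need their own checks, and borderline cases would still call for review by a qualified annotator.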

The significance of this work extends beyond the technical. As AI tools in ophthalmology move toward clinical deployment and regulatory review, training data quality is becoming the pivotal determinant of model performance and generalizability. A model trained on inconsistently annotated tear film images may perform well in its home institution and fail everywhere else. By establishing a shared, clinically grounded language for dry eye imaging data, one that sits in the unaddressed space between clinical diagnostic standards and AI model evaluation guidelines, this consensus lays the foundation for models that can scale across hospitals, diverse patient populations, and real-world clinical workflows.

The consensus was developed under the auspices of the Ophthalmic Imaging and Intelligent Medicine Branch of the China Medical Education Association. Work began in March 2024 with systematic literature review, followed by multiple rounds of expert deliberation and iterative revision. The English-language version published in Intelligent Medicine is consistent with the Chinese version published in the Chinese Journal of Experimental Ophthalmology.

***

Reference
DOI: 10.1016/j.imed.2025.05.012

About the Journal
Intelligent Medicine is a peer-reviewed, open-access journal focusing on the integration of artificial intelligence, data science, and digital technology in clinical medicine and public health. It is published by the Chinese Medical Association in partnership with Elsevier. To learn more about Intelligent Medicine, please visit: https://www.sciencedirect.com/journal/intelligent-medicine

Funding information
This work was supported by the National Administration of Traditional Chinese Medicine Science and Technology Department-Zhejiang Provincial Administration of Traditional Chinese Medicine Co-construction Science and Technology Plan (GZY-ZJ-KJ-23086) and the Sanming Project of Medicine in Shenzhen (SZSM202411007).
