News Release

AI-powered vision model accurately estimates occluded fruit size in vertical farming systems

Peer-Reviewed Publication

Nanjing Agricultural University The Academy of Science

By combining transformer-based amodal segmentation models with diffusion model–generated synthetic data, the study achieved remarkable accuracy in fruit size estimation. The transformer-driven Amodal Mask2Former model reduced size estimation errors by nearly 50 %, setting a new benchmark for automated phenotyping under complex occlusion conditions and paving the way for smarter greenhouse automation.

Accurately estimating fruit size directly on plants is essential for precision agriculture, enabling data-driven crop management and improving yield prediction. Traditional fruit detection and measurement in greenhouses remain challenging because of leaf occlusion, particularly in creeping cultivation systems where manual monitoring is labor-intensive. Although convolutional neural networks (CNNs) have long dominated agricultural image analysis, they often fail to infer occluded fruit regions. Recently, transformer-based vision architectures, originally developed for natural language processing, have demonstrated exceptional capacity for image understanding, motivating their use in crop phenotyping. At the same time, generative diffusion models have emerged as powerful tools for data augmentation, capable of producing realistic and diverse training images. These challenges and opportunities motivate combining advanced vision transformers with generative modeling to estimate fruit size accurately under occlusion.

A study (DOI: 10.1016/j.plaphe.2025.100097) published in Plant Phenomics on 21 August 2025 by Ghiseok Kim's team at Seoul National University provides a vital technological foundation for automated yield estimation, grading, and harvesting in precision agriculture.

The study first enhanced its training data using a diffusion model that generated four synthetic images per real image, adding realistic artificial leaves to partially cover fruits and create controlled occlusion. The resulting dataset showed a near-Gaussian distribution of occlusion ratios from 0.05 to 0.65 (mean 0.31, SD 0.13), indicating diverse, natural-looking leaf coverage without severely distorting fruit shape.

These augmented images were then used to train baseline instance segmentation models (Mask R-CNN, Mask2Former, DETR) and their de-occlusion variants. Among these variants, "amodal" models predicted the complete fruit shape, hidden regions included, as a single mask, while "occlusion-aware" models predicted the visible and hidden regions separately. The transformer-based Amodal Mask2Former delivered the highest segmentation accuracy (AP 85.92 %, AP@50 ≈ 99 %) and estimated fruit dimensions with mean absolute percentage errors as low as 4.86 % (height) and 5.33 % (diameter), cutting error roughly in half compared with conventional models. The de-occlusion models also generalized well to new greenhouse conditions, remaining robust under severe occlusion and variable lighting.

Statistical analysis showed that occlusion ratio, rather than fruit size alone, drives estimation error, and that de-occlusion models consistently outperform standard models across most occlusion levels up to 70 %. Finally, an ablation study revealed that model performance increases sharply once at least ~75 % of the training data include amodal (full-shape) annotations, establishing a practical threshold for building effective agricultural vision datasets.
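
To make the size-estimation step concrete, the short Python sketch below is a minimal illustration rather than code from the paper: the pixel-to-millimetre scale, the synthetic masks, and all function names are assumptions. It shows how a predicted amodal mask could be turned into height and diameter estimates, how an occlusion ratio can be derived from the visible and amodal masks, and how the reported mean absolute percentage error (MAPE) is defined. In the actual study, the amodal masks would come from the trained Amodal Mask2Former model and the pixel scale from camera calibration; here both are mocked with synthetic arrays.

```python
import numpy as np


def occlusion_ratio(visible_mask: np.ndarray, amodal_mask: np.ndarray) -> float:
    """Fraction of the full (amodal) fruit area hidden by leaves."""
    full = float(np.count_nonzero(amodal_mask))
    visible = float(np.count_nonzero(visible_mask & amodal_mask))
    return 1.0 - visible / full


def fruit_dimensions_mm(amodal_mask: np.ndarray, mm_per_px: float) -> tuple[float, float]:
    """Height and diameter (mm) from the bounding extent of a binary amodal mask."""
    rows, cols = np.nonzero(amodal_mask)
    height_px = rows.max() - rows.min() + 1    # vertical extent in pixels
    diameter_px = cols.max() - cols.min() + 1  # horizontal extent in pixels
    return height_px * mm_per_px, diameter_px * mm_per_px


def mape(predicted, measured) -> float:
    """Mean absolute percentage error, the metric reported for height/diameter."""
    p, m = np.asarray(predicted, dtype=float), np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(p - m) / m) * 100.0)


# Toy check: an elliptical "fruit" about 160 x 120 px, half covered by a "leaf",
# imaged at an assumed scale of 0.5 mm per pixel (ground truth 80 x 60 mm).
yy, xx = np.mgrid[0:200, 0:200]
amodal = ((yy - 100) / 80.0) ** 2 + ((xx - 100) / 60.0) ** 2 <= 1.0
visible = amodal & (xx >= 100)  # right half visible, left half occluded

h_mm, d_mm = fruit_dimensions_mm(amodal, mm_per_px=0.5)
print(f"occlusion ratio ~ {occlusion_ratio(visible, amodal):.2f}")
print(f"height ~ {h_mm:.1f} mm, diameter ~ {d_mm:.1f} mm")
print(f"MAPE vs. ground truth = {mape([h_mm, d_mm], [80.0, 60.0]):.2f} %")
```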

The ability to infer complete fruit shapes despite heavy occlusion enables continuous, non-destructive monitoring of fruit growth and quality. In vertically cultivated systems, the integration of diffusion-based data augmentation and transformer-driven segmentation could drastically reduce labor requirements while maintaining accuracy across growth stages. Moreover, this framework can be adapted for other greenhouse crops such as tomatoes, cucumbers, and peppers. The results demonstrate that transformer architectures not only improve prediction accuracy but also establish a scalable pathway toward intelligent, vision-based greenhouse automation systems.

###

References

DOI

10.1016/j.plaphe.2025.100097

Original URL

https://doi.org/10.1016/j.plaphe.2025.100097

Funding information

This work was supported by the Rural Development Administration (RDA) through the Cooperative Research Program for Agriculture Science and Technology Development [Project No. RS-2024-00440583].

About Plant Phenomics

Plant Phenomics is dedicated to publishing novel research that advances all aspects of plant phenotyping, from the cell to the plant population level, using innovative combinations of sensor systems and data analytics. The journal also aims to connect phenomics to other science domains, such as genomics, genetics, physiology, molecular biology, bioinformatics, statistics, mathematics, and computer science. In doing so, Plant Phenomics contributes to advancing plant sciences and agriculture, forestry, and horticulture by addressing key scientific challenges in plant phenomics.


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.