Article Highlight | 24-Jul-2025

Visual question answering model enables smarter diagnosis of crop diseases

Nanjing Agricultural University

This research combines deep learning, visual question answering (VQA), and informed learning to bridge the gap between human-level understanding and machine-driven crop diagnostics. The resulting ILCD model integrates a coattention mechanism, multimodal fusion (MUTAN), and a bias-balancing (BiBa) strategy to enhance performance across diverse visual questions. The work sets a new benchmark for agricultural AI applications.

Crop diseases account for 10–30% of global agricultural losses annually, according to the Food and Agriculture Organization. Traditionally, experts rely on detailed visual inspection—analyzing crop parts, local symptoms, and specific features like color or lesion shape—to diagnose problems. This multi-step process, while thorough, is slow and resource-intensive. Advances in AI and computer vision have enabled disease image classification, but existing models struggle to address nuanced questions about disease attributes. These limitations underscore the need for smarter, multimodal solutions capable of mimicking expert reasoning.

A study (DOI: 10.34133/plantphenomics.0277), published in Plant Phenomics on 16 December 2024 by Shansong Wang's and Qingtian Zeng's team at Shandong University of Science and Technology, represents a critical advancement in intelligent agriculture, where timely and accurate diagnosis can significantly reduce yield losses and guide targeted intervention strategies.

To evaluate the effectiveness of the proposed ILCD model, researchers conducted comparative and ablation experiments across three visual question answering (VQA) datasets: VQA-v2, VQA-CP v2, and the newly developed CDwPK-VQA. VQA-v2, a standard benchmark, includes over 1.1 million question–answer (QA) pairs and is known for its inherent unimodal bias. VQA-CP v2, designed to counter such bias, offers a reshuffled version of VQA-v2 with altered answer distributions to assess model robustness. CDwPK-VQA, curated for agricultural applications, contains 19 crop diseases across 10 crops, with detailed multiattribute annotations and embedded prior knowledge, supporting 22,320 QA pairs.

Experiments were conducted using PyTorch on an NVIDIA 3090 GPU, and accuracy was evaluated across three question types: yes/no, number, and other. Results showed that ILCD consistently outperformed conventional VQA models across all datasets, particularly excelling in the “number” and “other” categories on VQA-CP v2, highlighting its ability to mitigate unimodal bias. On CDwPK-VQA, ILCD achieved the highest overall accuracy (86.06%), demonstrating strong semantic comprehension and generalization capacity.

Additional ablation studies verified the contribution of ILCD’s core components: Inception-v4 for image features, LSTM for text, the coattention mechanism, MUTAN for multimodal fusion, and the BiBa bias-balancing strategy. MUTAN with tanh activation outperformed other fusion techniques, while BiBa proved effective in reducing dataset-driven biases. Furthermore, incorporating prior knowledge significantly enhanced model performance, especially when combined with self-attention encoding. Inception-v4 was the top-performing image encoder, and LSTM provided a balance of efficiency and accuracy as a text encoder.

Qualitative visualizations revealed ILCD’s capacity to attend to relevant disease features and provide contextually accurate answers, even when facing skewed training data or unfamiliar queries. Collectively, these results affirm ILCD’s robustness, interpretability, and potential for real-world agricultural diagnostics.
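For readers curious how such a pipeline fits together in code, the following is a minimal, illustrative sketch in PyTorch (the framework used in the study) of an LSTM question encoder combined with pooled image features, MUTAN-style low-rank bilinear fusion, and tanh activations. It is not the authors' released implementation: the layer sizes, fusion rank, vocabulary, and answer-set sizes are placeholder assumptions, and the coattention mechanism, BiBa strategy, and prior-knowledge encoding described above are omitted for brevity.

import torch
import torch.nn as nn


class MutanFusion(nn.Module):
    """MUTAN-style low-rank bilinear fusion of image and question features."""

    def __init__(self, img_dim, txt_dim, hidden_dim=512, rank=5):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.txt_proj = nn.Linear(txt_dim, hidden_dim)
        # One projection pair per rank; the rank-wise products are summed.
        self.img_heads = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(rank)])
        self.txt_heads = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(rank)])

    def forward(self, img_feat, txt_feat):
        v = torch.tanh(self.img_proj(img_feat))  # (batch, hidden_dim)
        q = torch.tanh(self.txt_proj(txt_feat))  # (batch, hidden_dim)
        fused = torch.zeros_like(v)
        for wv, wq in zip(self.img_heads, self.txt_heads):
            fused = fused + wv(v) * wq(q)        # elementwise bilinear interaction per rank
        return torch.tanh(fused)


class ToyCropVqa(nn.Module):
    """LSTM question encoder + pooled image features + MUTAN fusion + answer classifier."""

    def __init__(self, vocab_size=1000, num_answers=50, img_dim=1536, emb_dim=300, txt_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, txt_dim, batch_first=True)
        self.fusion = MutanFusion(img_dim, txt_dim, hidden_dim=512)
        self.classifier = nn.Linear(512, num_answers)

    def forward(self, img_feat, question_tokens):
        emb = self.embed(question_tokens)        # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(emb)             # final hidden state summarizes the question
        return self.classifier(self.fusion(img_feat, h_n.squeeze(0)))


if __name__ == "__main__":
    model = ToyCropVqa()
    img = torch.randn(2, 1536)                   # e.g., pooled Inception-v4 features (1536-d)
    qs = torch.randint(1, 1000, (2, 12))         # two tokenized questions of length 12
    print(model(img, qs).shape)                  # torch.Size([2, 50]) -> answer logits

The design intuition behind this kind of fusion is that summing a small number of rank-wise elementwise products approximates a full bilinear image–question interaction at a fraction of the parameter cost, which is the core idea of Tucker-decomposition (MUTAN) fusion.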

ILCD offers broad applications for agricultural monitoring, particularly in remote or resource-limited regions. By accurately interpreting complex visual cues, it can aid early disease identification and guide customized treatment protocols, minimizing pesticide misuse and maximizing crop health. The dataset and model architecture are publicly available, providing a foundation for future research and deployment in smart farming platforms and crop management systems.

###

References

DOI

10.34133/plantphenomics.0277

Original Source URL

https://doi.org/10.34133/plantphenomics.0277

Funding information

This research is supported by the National Key R&D Program of China (2022ZD0119501), the NSFC (52374221), the Sci. & Tech. Development Fund of Shandong Province of China (ZR2022MF288 and ZR2023MF097), and the Taishan Scholar Program of Shandong Province (ts20190936).

About Plant Phenomics

Science Partner Journal Plant Phenomics is an online-only Open Access journal published in affiliation with the State Key Laboratory of Crop Genetics & Germplasm Enhancement, Nanjing Agricultural University (NAU) and distributed by the American Association for the Advancement of Science (AAAS). Like all partners participating in the Science Partner Journal program, Plant Phenomics is editorially independent from the Science family of journals. Editorial decisions and scientific activities pursued by the journal's Editorial Board are made independently, based on scientific merit and adhering to the highest standards for accurate and ethical promotion of science. These decisions and activities are in no way influenced by the financial support of NAU, NAU administration, or any other institutions and sponsors. The Editorial Board is solely responsible for all content published in the journal. To learn more about the Science Partner Journal program, visit the SPJ program homepage.

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.