Researchers Develop Vision-Language Foundation Model for Enhancing Medical Diagnostics

Date: 30-09-2024

A research team led by Prof. WANG Shanshan at the Shenzhen Institute of Advanced Technology (SIAT) of the Chinese Academy of Sciences, together with collaborators, has developed MaCo, a chest X-ray vision-language foundation model that reduces dependence on annotations while improving both clinical efficiency and diagnostic accuracy.

The study was published in Nature Communications on Sept. 2.

The rapid evolution of machine learning has driven notable advancements in automated diagnostic systems (ADS), boosting their performance in critical tasks like disease detection and lesion quantification. However, current methods, often relying on task-specific models, require significant computational resources and large amounts of labeled data. This heavy reliance on extensive annotations has impeded the widespread adoption of ADS in medical applications.

To address this problem, the researchers integrated expert medical knowledge into MaCo while combining the advantages of pretext tasks and contrastive learning. They also introduced a novel correlation weighting mechanism that makes masked contrastive learning more effective by weighting masked regions according to their importance. This strategy allows MaCo to improve diagnostic accuracy substantially while reducing its dependence on large annotated datasets. Notably, MaCo retains some ability to recognize and localize anomalies even without annotations.
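The release does not give MaCo's equations, so the following is only a minimal sketch of the general idea of masked image-report contrastive learning, written under stated assumptions. A random fraction of image patches is hidden, a toy mean-pooling function (a hypothetical stand-in for a real image encoder) embeds the visible patches, and an InfoNCE loss pulls each image toward its own report and away from the other reports in the batch. The per-pair `weights` argument is a hypothetical placeholder for MaCo's correlation weighting, not the authors' formulation; the helper names `mask_patches`, `toy_encode`, and `weighted_info_nce` are likewise illustrative.

```python
import math
import random


def mask_patches(patches, mask_ratio, rng):
    """Randomly hide a fraction of patches; return (visible patches, masked indices)."""
    n_mask = int(round(len(patches) * mask_ratio))
    masked = set(rng.sample(range(len(patches)), n_mask))
    visible = [p for i, p in enumerate(patches) if i not in masked]
    return visible, sorted(masked)


def toy_encode(visible_patches):
    """Hypothetical stand-in encoder: mean-pool the visible patch vectors.

    Assumes at least one patch is still visible (mask_ratio < 1).
    """
    dim = len(visible_patches[0])
    n = len(visible_patches)
    return [sum(p[d] for p in visible_patches) / n for d in range(dim)]


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def weighted_info_nce(img_emb, txt_emb, weights, temperature=0.07):
    """Image-to-report InfoNCE loss with a per-pair weight.

    img_emb[k] and txt_emb[k] form a matched pair; every other report in the
    batch serves as a negative. `weights` is a hypothetical importance score
    per pair (a placeholder for a correlation weighting, not MaCo's exact one).
    """
    imgs = [normalize(v) for v in img_emb]
    txts = [normalize(v) for v in txt_emb]
    total, weight_sum = 0.0, 0.0
    for k, img in enumerate(imgs):
        logits = [sum(a * b for a, b in zip(img, t)) / temperature for t in txts]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += weights[k] * (log_denom - logits[k])  # -log softmax of the true pair
        weight_sum += weights[k]
    return total / weight_sum


# Usage with toy data: two "images" of four 2-D patches each, and two report embeddings.
rng = random.Random(0)
images = [[[1.0, 0.0]] * 4, [[0.0, 1.0]] * 4]
reports = [[1.0, 0.1], [0.1, 1.0]]
img_emb = [toy_encode(mask_patches(img, 0.5, rng)[0]) for img in images]
aligned = weighted_info_nce(img_emb, reports, weights=[1.0, 1.0])
swapped = weighted_info_nce(img_emb, reports[::-1], weights=[1.0, 1.0])
print(aligned < swapped)  # matched pairs yield the lower loss
```

Because each loss term is scaled by its weight and the total is divided by the sum of the weights, pairs judged more important contribute proportionally more to the loss; this is one simple way a weighting scheme could plug into a contrastive objective.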

The researchers used six well-known open-source X-ray datasets to comprehensively evaluate MaCo on a range of label-efficient fine-tuning tasks, including classification, segmentation, and detection.
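The release does not say which label fractions were used. As a hedged illustration of how label-efficient fine-tuning experiments are commonly set up, the sketch below draws nested labeled subsets of a training set at several fractions; the fractions (1%, 10%, 100%) and the function name `label_fraction_splits` are hypothetical.

```python
import random


def label_fraction_splits(sample_ids, fractions, seed=0):
    """Draw nested labeled subsets of a training set (hypothetical helper).

    All subsets are prefixes of one shuffled order, so the smallest subset is
    contained in every larger one. Returns {fraction: [ids]}.
    """
    order = list(sample_ids)
    random.Random(seed).shuffle(order)  # fixed seed keeps the splits reproducible
    splits = {}
    for f in sorted(fractions):
        if not 0 < f <= 1:
            raise ValueError(f"fraction must be in (0, 1], got {f}")
        k = max(1, int(len(order) * f))  # keep at least one labeled example
        splits[f] = order[:k]
    return splits


# Usage: 1%, 10%, and 100% of a toy training set of 1,000 image IDs.
splits = label_fraction_splits(range(1000), [0.01, 0.1, 1.0])
print({f: len(ids) for f, ids in splits.items()})
```

Here the three subsets contain 10, 100, and 1,000 IDs. Nesting the subsets keeps comparisons across annotation levels fair, since a model given more labels always sees a superset of the examples given to a model with fewer.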

Experimental results showed that MaCo outperformed more than 10 state-of-the-art methods across tasks with varying levels of annotation. Its strong performance in zero-shot learning tasks, which require no task-specific labels, underscores its potential to deliver better diagnostic performance while substantially reducing the cost of manual annotation in medical applications.
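Zero-shot use of a vision-language model is commonly done CLIP-style: the image embedding is compared with embeddings of one text prompt per candidate class, and the closest prompt wins. The sketch below assumes that approach; the release does not describe MaCo's actual zero-shot procedure, and the prompt labels, the three-dimensional toy embeddings, and the function name `zero_shot_classify` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def zero_shot_classify(image_emb, prompt_embs, temperature=0.07):
    """Score an image against one text-prompt embedding per class (hypothetical helper).

    Returns (best_label, probabilities), where the probabilities come from a
    temperature-scaled softmax over cosine similarities. No labeled training
    images are needed: the class "definitions" are the text prompts.
    """
    labels = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[lab]) / temperature for lab in labels]
    m = max(logits)  # subtract the max before exponentiating for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Usage with made-up embeddings standing in for encoded prompts and an encoded X-ray.
prompts = {"pneumonia": [0.9, 0.1, 0.0], "no finding": [0.0, 0.2, 0.9]}
label, probs = zero_shot_classify([0.8, 0.2, 0.1], prompts)
print(label)
```

With these toy vectors the image is far closer to the "pneumonia" prompt (cosine similarity of about 0.98 versus about 0.17), so that label is returned.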

"Our model addresses the challenge of limited annotations by reducing the burden of manual labeling while maintaining high diagnostic accuracy. We believe that MaCo sets a new benchmark for foundational models in the field of medical AI," said Prof. WANG.

An illustration of the masked contrastive learning strategy employed in MaCo (Image by SIAT)


Media Contact: LU Qun

Email: qun.lu@siat.ac.cn


Paper:

Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning