Vol. 4 No. 7 (2025)
Articles

Joint Modeling of Medical Images and Clinical Text for Early Diabetes Risk Detection

Published 2025-07-30

How to Cite

Zi, Y., & Deng, X. (2025). Joint Modeling of Medical Images and Clinical Text for Early Diabetes Risk Detection. Journal of Computer Technology and Software, 4(7). https://doi.org/10.5281/zenodo.16776999

Abstract

This study addresses key challenges in early diabetes prediction, including complex data modalities, heterogeneous semantic distributions, and hidden risk signals. A multimodal data fusion method based on electronic health records is proposed. The method takes medical image data (EyePACS) and unstructured clinical text records (MIMIC-III/IV) as its core inputs. A unified temporal alignment mechanism and semantic embedding strategy enable dynamic association modeling across modalities. Architecturally, a collaborative mechanism between a CNN and a Transformer is introduced, and a channel attention module is integrated to deepen modality interaction and focus on critical features. In addition, time interval embeddings strengthen the model's perception of the rhythm of event progression. A series of comparative experiments, ablation studies, and sensitivity evaluations systematically assess the model along dimensions such as modality dependence, noise robustness, data distribution, and temporal granularity. The results show that the proposed method remains stable and discriminative when processing multi-source heterogeneous electronic health record data and effectively captures early risk patterns associated with diabetes, providing reliable data and modeling support for the automatic identification of high-risk individuals.
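The abstract names three architectural ingredients: a CNN branch for images, a Transformer branch for clinical text, channel attention over the fused features, and time interval embeddings for event rhythm. The PyTorch sketch below shows one plausible arrangement of these pieces; all class names, dimensions, and the squeeze-and-excitation-style gate are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the fusion idea described in the abstract (PyTorch).
# Everything here is an illustrative assumption, not the paper's code.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gate that reweights fused feature channels."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, dim)
        return x * self.gate(x)


class TimeIntervalEmbedding(nn.Module):
    """Maps the gap (in days) between clinical events to a dense vector,
    giving the model a handle on the rhythm of event progression."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(1, dim), nn.Tanh())

    def forward(self, delta_t: torch.Tensor) -> torch.Tensor:  # (batch, 1)
        return self.proj(torch.log1p(delta_t))  # log-compress long gaps


class MultimodalRiskModel(nn.Module):
    """CNN image branch + Transformer text branch, fused with channel attention."""

    def __init__(self, vocab_size: int = 30_000, dim: int = 128):
        super().__init__()
        # CNN branch for 3-channel retinal images (e.g., EyePACS).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
        # Transformer encoder branch for tokenized clinical notes.
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.time_embed = TimeIntervalEmbedding(dim)
        self.attn = ChannelAttention(3 * dim)
        self.head = nn.Linear(3 * dim, 1)  # binary diabetes-risk logit

    def forward(self, image, tokens, delta_t):
        img_feat = self.cnn(image)                                     # (B, dim)
        txt_feat = self.text_encoder(self.embed(tokens)).mean(dim=1)   # (B, dim)
        t_feat = self.time_embed(delta_t)                              # (B, dim)
        fused = torch.cat([img_feat, txt_feat, t_feat], dim=-1)
        return self.head(self.attn(fused)).squeeze(-1)  # risk logit per patient


if __name__ == "__main__":
    model = MultimodalRiskModel()
    logits = model(
        torch.randn(2, 3, 64, 64),           # image batch
        torch.randint(0, 30_000, (2, 32)),   # token ids for clinical text
        torch.tensor([[14.0], [90.0]]),      # days since previous visit
    )
    print(logits.shape)  # torch.Size([2])
```

Concatenation followed by a channel gate is only one way to realize the "modality interaction" the abstract describes; cross-attention between the two branches would be an equally consistent reading of the text.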