Published 2025-03-30

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This paper proposes a multimodal Transformer-based model for medical insurance claim adjudication, aiming to improve the accuracy of claim decisions and strengthen risk control in the medical insurance domain. By integrating text with structured data, the model jointly analyzes diverse sources, such as customer information, medical records, and claim applications, to capture potential risks, particularly fraudulent and inflated claims. Built on the Transformer architecture, the model uses the self-attention mechanism to perform a weighted fusion of the different data modalities, making information extraction more efficient and accurate. Experimental results show that the proposed model significantly outperforms traditional machine learning algorithms and deep learning models, such as XGBoost, random forests, and VGG16, on metrics including AUC, accuracy, and F1-score, validating the advantages of multimodal learning for medical insurance claim adjudication. The study also examines hyperparameter tuning, analyzing how factors such as the learning rate and the choice of data modalities affect model performance. Overall, integrating multimodal data improves the accuracy of claim adjudication and offers insurance companies more scientific and reliable risk management tools. This research provides a theoretical foundation and practical guidance for applying multimodal learning in the financial sector, particularly the medical insurance industry, and lays the groundwork for future studies.
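The self-attention fusion of modalities described in the abstract can be sketched as follows. This is a minimal single-head illustration under assumed shapes, not the authors' implementation; the function name, pooling choice, and all parameters are hypothetical. Text-token embeddings and structured-feature embeddings are concatenated into one token sequence, and scaled dot-product self-attention lets tokens from each modality attend to the other before pooling into a single claim representation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(text_emb, struct_emb, Wq, Wk, Wv):
    """Illustrative single-head self-attention fusion of two modalities.

    text_emb:   (T_text, d) token embeddings from the text encoder (assumed)
    struct_emb: (T_struct, d) embeddings of structured fields (assumed)
    Wq, Wk, Wv: (d, d) learned projection matrices (random here)
    Returns a pooled (d,) claim representation and the attention weights.
    """
    # Concatenate both modalities into one token sequence.
    tokens = np.concatenate([text_emb, struct_emb], axis=0)   # (T, d)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d = Q.shape[-1]
    # Attention weights mix information across modalities.
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)             # (T, T)
    fused = attn @ V                                          # (T, d)
    # Mean-pool into a single vector for a downstream claim classifier.
    return fused.mean(axis=0), attn

# Toy example: 5 text tokens and 3 structured fields, dimension 8.
rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = rng.normal(size=(3, d, d))
pooled, attn = self_attention_fusion(
    rng.normal(size=(5, d)), rng.normal(size=(3, d)), Wq, Wk, Wv
)
```

Each row of `attn` sums to 1, so every token's output is a weighted average over tokens of both modalities; this is the "weighted fusion" the abstract refers to, shown here without the multi-head and feed-forward components of a full Transformer layer.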