Multimodal Integration of Physiological Signals, Clinical Data, and Medical Imaging for ICU Outcome Prediction
Published 2025-08-30

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This study proposes a multimodal feature fusion method that combines a convolutional neural network (CNN) with a Transformer for ICU patient outcome prediction. The method integrates two complementary types of information: physiological waveforms and structured clinical data. A convolutional front end first extracts local temporal patterns from the waveform data, and a Transformer encoder then captures long-range dependencies, yielding a more comprehensive representation of signal dynamics. The structured clinical data are projected into the same feature space and combined with the waveform features by weighted fusion, producing a joint representation that captures both local and global information. To evaluate the model, systematic experiments are conducted on an ICU dataset containing multiple waveform signals and clinical records. Performance is examined across different regularization coefficients, dropout rates, convolution kernel sizes, pooling strategies, sequence lengths, sliding step sizes, and label noise levels. Experimental results show that the proposed method outperforms several existing approaches in accuracy, AUC, and F1-score, and remains robust under various data perturbations and hyperparameter changes. Comparative analysis and sensitivity experiments further show how individual design parameters of the multimodal fusion affect performance, offering guidance for building and tuning models for similar tasks. The findings indicate that combining deep temporal modeling with multimodal feature fusion can improve accuracy and stability in complex medical prediction tasks, offering a practical technical pathway for ICU clinical decision support systems.
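The abstract describes the pipeline only at a high level: sliding-window segmentation of waveforms, convolution for local temporal patterns, a Transformer encoder for long-range dependencies, projection of structured clinical data into a shared feature space, and weighted fusion. The NumPy sketch below illustrates one way these stages could fit together. It is not the authors' implementation. The window length and step, the single convolution layer, the single-head attention block (with no positional encoding and no feed-forward sublayer), mean pooling, the tanh projection of clinical data, the fixed fusion weight `alpha`, and all function and parameter names (`sliding_windows`, `conv1d_relu`, `attention_block`, `predict_outcome`, and so on) are assumptions made for illustration. The weights are random rather than trained.

```python
import numpy as np


def sliding_windows(signal, seq_len, step):
    """Cut a (T, C) recording into overlapping (seq_len, C) windows taken every `step` samples."""
    if signal.shape[0] < seq_len:
        raise ValueError("recording is shorter than one window")
    starts = range(0, signal.shape[0] - seq_len + 1, step)
    return np.stack([signal[s:s + seq_len] for s in starts])


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution (cross-correlation) followed by ReLU.

    x: (T, C_in), kernels: (K, C_in, D), bias: (D,) -> (T - K + 1, D).
    Extracts local temporal patterns from one waveform window."""
    k = kernels.shape[0]
    out = np.stack([np.einsum("kc,kcd->d", x[t:t + k], kernels)
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out + bias, 0.0)


def layer_norm(h, eps=1e-5):
    """Normalize each time step's feature vector to zero mean and unit variance."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)


def attention_block(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention with residual + layer norm.

    Every time step attends to every other one, modeling long-range dependencies.
    Simplified assumption: no positional encoding, no feed-forward sublayer."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return layer_norm(h + weights @ v)


def init_params(c_in, n_clin, d=16, kernel=5, seed=0):
    """Random (untrained) weights; every shape here is an illustrative assumption."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(0.0, 0.1, shape)

    return {
        "conv_k": w(kernel, c_in, d), "conv_b": np.zeros(d),
        "wq": w(d, d), "wk": w(d, d), "wv": w(d, d),
        "w_clin": w(n_clin, d), "b_clin": np.zeros(d),
        "w_out": w(d), "b_out": 0.0,
    }


def predict_outcome(window, clinical, p, alpha=0.5):
    """Outcome probability for one waveform window plus one clinical feature vector."""
    local = conv1d_relu(window, p["conv_k"], p["conv_b"])       # (T', D) local patterns
    ctx = attention_block(local, p["wq"], p["wk"], p["wv"])     # (T', D) global context
    wave_vec = ctx.mean(axis=0)                                 # mean pooling over time -> (D,)
    clin_vec = np.tanh(clinical @ p["w_clin"] + p["b_clin"])    # clinical data -> same D-dim space
    fused = alpha * wave_vec + (1.0 - alpha) * clin_vec         # weighted fusion
    logit = fused @ p["w_out"] + p["b_out"]
    return float(1.0 / (1.0 + np.exp(-logit)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    recording = rng.normal(size=(1000, 3))  # hypothetical: 1000 samples x 3 waveform channels
    clinical = rng.normal(size=12)          # hypothetical: 12 standardized clinical variables
    params = init_params(c_in=3, n_clin=12)
    for window in sliding_windows(recording, seq_len=250, step=125):
        print(round(predict_outcome(window, clinical, params), 3))
```

Here `alpha` is fixed for clarity; a trained model might instead learn the fusion weight, or per-dimension gates, jointly with the other parameters. Varying `seq_len`, `step`, and the kernel size `kernel` in this sketch corresponds to the kinds of sensitivity experiments the abstract lists.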