Self-Supervised Credit Scoring with Masked Autoencoders: Addressing Data Gaps and Noise Robustly
Published 2024-11-30
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This study explores the use of Masked Autoencoders within a self-supervised learning framework to improve credit scoring models, particularly when financial data are incomplete or noisy. Traditional credit scoring models struggle with missing values and high data variability. Masked Autoencoders address these issues by masking portions of the input data and training the model to reconstruct them, which lets it learn robust feature representations without requiring fully labeled or complete datasets. In experiments, the Masked Autoencoder outperformed baseline models, including Transformers and graph neural networks (GNNs), achieving higher accuracy (ACC) and F1 scores. This indicates strong feature extraction capabilities and resilience to data noise. The approach reduces reliance on manual feature engineering and improves model stability and generalizability on diverse, high-dimensional, and heterogeneous financial data. These results suggest that Masked Autoencoders are a promising way to make credit scoring more reliable, allowing financial institutions to make more accurate credit decisions even in complex data scenarios.
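The abstract does not specify the architecture, masking ratio, or loss used. The sketch below illustrates only the masking-and-reconstruction objective in its common MAE-style form, under stated assumptions. It randomly hides a fraction of the observed numeric features in one applicant record and computes a mean-squared-error loss restricted to the hidden positions. Treating genuinely missing values (`None`) as always masked but never scored is an assumption, as are the hypothetical names `mask_features`, `masked_reconstruction_loss`, and `MASK_VALUE` and the 0.5 masking ratio. An identity mapping stands in for the encoder/decoder so the loss can be exercised without a neural network.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical placeholder shown to the encoder in place of hidden or missing features.
MASK_VALUE = 0.0


def mask_features(
    row: Sequence[Optional[float]], mask_ratio: float, rng: random.Random
) -> Tuple[List[float], List[bool]]:
    """Randomly hide a fraction of the *observed* features in one record.

    Assumption: missing values (None) are always replaced by MASK_VALUE but are
    never reconstruction targets, because their true value is unknown.
    """
    observed = [i for i, v in enumerate(row) if v is not None]
    n_mask = int(round(mask_ratio * len(observed)))
    masked_idx = set(rng.sample(observed, n_mask))

    corrupted: List[float] = []
    is_target: List[bool] = []
    for i, v in enumerate(row):
        if v is None or i in masked_idx:
            corrupted.append(MASK_VALUE)
        else:
            corrupted.append(float(v))
        is_target.append(i in masked_idx)
    return corrupted, is_target


def masked_reconstruction_loss(
    row: Sequence[Optional[float]],
    reconstruction: Sequence[float],
    is_target: Sequence[bool],
) -> float:
    """Mean squared error over the masked (observed) features only."""
    errors = [
        (float(x) - r) ** 2
        for x, r, t in zip(row, reconstruction, is_target)
        if t  # filter runs first, so float(None) is never evaluated
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Usage with a hypothetical normalized record; None marks a missing field.
rng = random.Random(0)
record = [0.42, None, 1.3, 0.0, 5.2]
corrupted, is_target = mask_features(record, mask_ratio=0.5, rng=rng)

# A trained encoder/decoder would map `corrupted` to a full-length reconstruction;
# the identity mapping used here only demonstrates how the loss is computed.
loss = masked_reconstruction_loss(record, corrupted, is_target)
print(corrupted, is_target, loss)
```

Scoring the loss only on masked positions forces the model to infer hidden features from the visible ones, which is what pushes it toward representations that tolerate gaps in real records.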