Published 2025-02-28
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This study proposes a multimodal factor mining method that integrates market data, financial texts, and social media sentiment to improve the accuracy and interpretability of stock market forecasts. Traditional factor models often rely on a single data modality and therefore struggle to capture market dynamics fully. The proposed method extracts three groups of factors: traditional factors from market fundamentals and technical indicators; sentiment and topic factors from financial texts, using natural language processing; and investor influence and social sentiment factors from social media data, modeled with graph neural networks. Experimental results show that integrating multimodal factors significantly improves predictive performance: the proposed model outperforms the benchmark model on mean squared error, directional accuracy, and prediction R². Factor contribution analysis further confirms that market, text, and social factors complement one another, which supports the practical value of the multimodal approach. The results provide a reference for applying multimodal data in financial markets and suggest new directions for building more intelligent factor models.
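The three evaluation indicators named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are the conventional ones and are an assumption: in particular, directional accuracy is taken here as the share of periods in which the predicted and realized returns have the same sign. The function names and the sample return values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between realized and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def directional_accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of periods where prediction and outcome agree on up vs. not-up.

    Assumed definition: a value counts as "up" when it is strictly positive.
    """
    hits = sum(1 for t, p in zip(y_true, y_pred) if (t > 0) == (p > 0))
    return hits / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily returns, for illustration only.
actual = [0.01, -0.02, 0.03, -0.01]
predicted = [0.02, -0.01, 0.01, 0.01]

print("MSE:", mean_squared_error(actual, predicted))
print("Directional accuracy:", directional_accuracy(actual, predicted))
print("R^2:", r_squared(actual, predicted))
```

Lower mean squared error and higher directional accuracy and R² indicate better forecasts. The three can disagree, since a model may predict the direction of a move correctly while misjudging its size, which is one reason to report them together.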