Published 2024-09-30
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
Health queries, as a specialized form of medical text, pose unique classification challenges: they contain complex medical terminology and abbreviations, along with linguistic phenomena such as synonymy, antonymy, and polysemy. Traditional text classification methods often struggle with intricate category labels, hierarchical label relationships, and the scarcity of annotated samples. This study presents a medical text classification method for health queries built on the ALBERT pre-trained language model. We introduce the TLCM and TCLA models, which apply transfer learning and ensemble learning to improve classification accuracy. By fine-tuning ALBERT and integrating CNN, Bi-LSTM, and attention mechanisms, our models achieve approximately 91% precision, recall, and micro-F1, a significant improvement over traditional classification methods. These results demonstrate the potential of pre-trained language models in medical text mining.
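To make the described architecture concrete, below is a minimal sketch of an ALBERT-based hybrid classifier in PyTorch with Hugging Face transformers. The abstract does not specify the exact TLCM/TCLA wiring, so the branch sizes, kernel width, and the way the CNN, Bi-LSTM, and attention outputs are combined here are illustrative assumptions, not the authors' published configuration.

# Minimal sketch of an ALBERT + CNN + Bi-LSTM + attention classifier.
# Assumptions: PyTorch and Hugging Face transformers; "albert-base-v2"
# as the pre-trained checkpoint; hidden sizes and fusion strategy are
# illustrative, since the paper's exact TLCM/TCLA design is not given here.
import torch
import torch.nn as nn
from transformers import AlbertModel

class AlbertHybridClassifier(nn.Module):
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        self.albert = AlbertModel.from_pretrained("albert-base-v2")
        dim = self.albert.config.hidden_size  # 768 for albert-base-v2
        # CNN branch: local n-gram features over the token representations
        self.conv = nn.Conv1d(dim, hidden, kernel_size=3, padding=1)
        # Bi-LSTM branch: sequential context in both directions
        self.bilstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        # Additive attention scores for pooling the Bi-LSTM states
        self.attn = nn.Linear(2 * hidden, 1)
        self.classifier = nn.Linear(hidden + 2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from the fine-tunable ALBERT encoder
        tokens = self.albert(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
        # CNN branch with max-over-time pooling: (batch, hidden)
        c = torch.relu(self.conv(tokens.transpose(1, 2))).max(dim=2).values
        # Bi-LSTM branch with attention-weighted pooling: (batch, 2*hidden)
        h, _ = self.bilstm(tokens)
        scores = self.attn(h).masked_fill(
            attention_mask.unsqueeze(-1) == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)
        a = (weights * h).sum(dim=1)
        # Fuse both branches before the final classification layer
        return self.classifier(torch.cat([c, a], dim=1))

In this sketch the CNN branch captures local n-gram patterns in the query while the attention-pooled Bi-LSTM branch captures longer-range context; transfer learning then amounts to training the whole stack, including the ALBERT weights, on the labeled health-query data.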