Vol. 3 No. 6 (2024)
Articles

Speech Emotion Recognition with Dynamic CNN and Bi-LSTM

Published 2024-09-30

How to Cite

Finch, E. (2024). Speech Emotion Recognition with Dynamic CNN and Bi-LSTM. Journal of Computer Technology and Software, 3(6). Retrieved from https://ashpress.org/index.php/jcts/article/view/82

Abstract

This study presents a speech emotion recognition system that integrates a dynamic convolutional neural network with a bidirectional long short-term memory (Bi-LSTM) network. The dynamic convolutional kernel enables the network to capture global dynamic emotional patterns, improving model performance without a significant increase in computational cost. The Bi-LSTM component then classifies the emotional features more effectively by exploiting temporal information. The system was evaluated on three datasets: the CASIA Chinese speech emotion corpus, the EMO-DB German emotion corpus, and the IEMOCAP English corpus, achieving average emotion recognition accuracies of 59.08%, 89.29%, and 71.25%, respectively. These results improve on the accuracies of existing speech emotion recognition systems built on mainstream models by 1.17%, 1.36%, and 2.97%, demonstrating the effectiveness of the proposed approach.
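This page does not reproduce the paper's implementation, so the following is only a minimal PyTorch sketch of the kind of architecture the abstract describes: a convolution whose kernel is mixed per input sample from several candidate kernels (the "dynamic" kernel), followed by a Bi-LSTM over the time axis of a spectrogram. The log-Mel front end, layer widths, number of candidate kernels, and the four-class output are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of a dynamic-kernel CNN front end + Bi-LSTM classifier.
# Architecture details (layer sizes, 4 emotion classes, log-Mel input) are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv2d(nn.Module):
    """Convolution whose kernel is a per-sample weighted mix of K candidate kernels."""

    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, kernel_size
        # K candidate kernels, mixed by an input-dependent attention vector.
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_kernels), nn.Softmax(dim=-1),
        )

    def forward(self, x):
        b = x.size(0)
        alpha = self.attn(x)                        # (B, K) mixing coefficients
        # Aggregate one kernel per sample: (B, out_ch, in_ch, k, k)
        w = torch.einsum("bk,koihw->boihw", alpha, self.weight)
        # Grouped convolution applies each sample's own kernel in a single call.
        x = x.reshape(1, b * self.in_ch, *x.shape[2:])
        w = w.reshape(b * self.out_ch, self.in_ch, self.k, self.k)
        out = F.conv2d(x, w, padding=self.k // 2, groups=b)
        return out.reshape(b, self.out_ch, *out.shape[2:])


class DynCNNBiLSTM(nn.Module):
    """Dynamic CNN over a log-Mel spectrogram, then a Bi-LSTM over time frames."""

    def __init__(self, n_mels=64, n_classes=4, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            DynamicConv2d(1, 32), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            DynamicConv2d(32, 64), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(64 * (n_mels // 4), hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, spec):                        # spec: (B, 1, n_mels, T)
        h = self.conv(spec)                         # (B, 64, n_mels/4, T/4)
        h = h.permute(0, 3, 1, 2).flatten(2)        # (B, T/4, 64 * n_mels/4)
        out, _ = self.lstm(h)                       # Bi-LSTM over the frame sequence
        return self.fc(out.mean(dim=1))             # average over time, then classify


if __name__ == "__main__":
    model = DynCNNBiLSTM()
    logits = model(torch.randn(2, 1, 64, 128))      # 2 utterances, 64 Mel bins, 128 frames
    print(logits.shape)                             # torch.Size([2, 4])
```

The grouped-convolution trick above is one common way to apply a different aggregated kernel to each sample in a batch with a single conv2d call, which keeps the added cost of the dynamic kernel small relative to a standard convolution.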