Advancements in Voice Conversion: Spectrogram-Based Speech Style Transfer Using Convolutional Neural Networks
Published 2022-01-30
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
Voice conversion (VC) transforms the vocal style of a source speaker's speech into that of a target speaker while preserving the linguistic content. The technology has applications in communication, healthcare, entertainment, and security. Earlier neural-network-based methods have improved the quality of converted speech, and current research aims to reduce the amount of training data they require. Inspired by image style transfer, this paper uses convolutional neural networks (CNNs) to extract and stylize spectrogram features from speech signals. The proposed model achieves high-quality speech style transfer, demonstrating that CNNs are effective for voice conversion with reduced data dependency.