Demo page for "CETTS"
Controllable Emotional Speech Synthesis via Emotion Transfer
Abstract
Synthesizing expressive speech in the style of a reference audio clip is a key area of emotional speech synthesis. While recent models can produce natural and clear speech, controlling emotional intensity remains a challenge. To address this, we propose a VITS-based TTS model with controllable emotional intensity. We incorporate a pre-trained Emotion2Vec model and design an emotion intensity controller. Emotional embeddings extracted from the reference audio via Emotion2Vec are fused with phoneme-level text features to enable emotion transfer. We hypothesize, and confirm through experiments, that emotional intensity correlates with pitch and energy. We therefore build the emotional intensity control module around a pitch predictor and an energy predictor to enable global-level control over emotional strength. Experiments show that our model synthesizes speech with quality comparable to ground truth and enables controllable emotional intensity without degrading audio quality.
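The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the model's implementation: the function names `fuse_emotion` and `apply_intensity`, the additive fusion through a projection matrix, and the use of a single multiplicative factor `alpha` on the predicted pitch and energy contours are all assumptions made here for illustration, and random arrays stand in for real Emotion2Vec embeddings and predictor outputs.

```python
import numpy as np


def fuse_emotion(text_feats, emo_emb, proj):
    """Add one utterance-level emotion embedding to every phoneme feature.

    text_feats: (T, d_text) phoneme-level text features
    emo_emb:    (d_emo,)    emotion embedding from the reference audio
    proj:       (d_emo, d_text) hypothetical projection to the text dimension
    """
    emo_proj = emo_emb @ proj        # (d_text,)
    return text_feats + emo_proj     # broadcast over the T phonemes


def apply_intensity(pitch, energy, alpha):
    """Scale predicted pitch and energy contours by one global factor.

    alpha = 1.0 leaves the predictions unchanged (assumed convention).
    """
    return pitch * alpha, energy * alpha


# Placeholder data: 6 phonemes, 8-dim text features, 4-dim emotion embedding.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(6, 8))
emo_emb = rng.normal(size=(4,))
proj = rng.normal(size=(4, 8))

fused = fuse_emotion(text_feats, emo_emb, proj)

pitch = np.array([180.0, 200.0, 220.0])   # hypothetical predicted pitch (Hz)
energy = np.array([0.5, 0.8, 0.6])        # hypothetical predicted energy
strong_pitch, strong_energy = apply_intensity(pitch, energy, alpha=1.5)
```

Because the factor is applied to the whole contour rather than per phoneme, it acts as a global control, which is the kind of utterance-level adjustment the abstract describes.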
1. The Architecture of the Proposed Model
2. Demo: Style Transfer for Emotional TTS with ESD dataset
To facilitate fair comparison, we synthesize audio samples with five emotions using five models.
Emotion | Reference | Target Speaker | CME-TTS | ME-TTS | wav2vec2+VITS | Ours w/o Intensity Controller | Ours |
---|---|---|---|---|---|---|---|
Happy | |||||||
Angry | |||||||
Neutral | |||||||
Sad | |||||||
Surprise |
3. Demo: Style Transfer for Emotional TTS with DOE dataset
To facilitate fair comparison, we synthesize audio samples with four emotions using five models.
Emotion | Reference | Target Speaker | CME-TTS | ME-TTS | wav2vec2+VITS | Ours w/o Intensity Controller | Ours |
---|---|---|---|---|---|---|---|
Happy | |||||||
Angry | |||||||
Sad | |||||||
Surprise |
4. Demo: Emotion Strength Control in Emotional TTS
To facilitate fair comparison, we use the same text to synthesize speech in four emotions at three intensity levels.
Text: 雨后的空气充斥着青草的味道 ("The air after the rain is filled with the smell of grass.")
Scaling Factor
Emotion | Low Intensity | Medium Intensity | Strong Intensity |
---|---|---|---|
Happy | |||
Angry | |||
Sad | |||
Surprise |
Relative Attribute
Emotion | Low Intensity | Medium Intensity | Strong Intensity |
---|---|---|---|
Happy | |||
Angry | |||
Sad | |||
Surprise |
Ours
Emotion | Low Intensity | Medium Intensity | Strong Intensity |
---|---|---|---|
Happy | |||
Angry | |||
Sad | |||
Surprise |