Demo for CATAD
Enhancing Bone-Conducted Speech with CATAD (Conformer-Based Adversarial Training with Adaptive Diffusion)
- Due to the low-pass filtering effect of human tissue, the high-frequency components of bone-conducted (BC) speech are severely attenuated, which degrades speech quality and intelligibility. To address this issue, this paper proposes a novel adversarial learning network based on the Conformer architecture and an adaptive diffusion process. In the generator, we deploy a TS-Conformer consisting of two Conformer modules that capture temporal and frequency dependencies, respectively, to enhance the expressiveness of speech features. To stabilize adversarial training and keep the generated results diverse, we adopt an adaptive diffusion process that adds noise to both the generated and the real data, so the discriminator must distinguish diffused real data from diffused generated data. Experimental results on the ESMB dataset demonstrate that the proposed method significantly improves BC speech recovery, enhancing both speech quality and intelligibility.
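The abstract does not say exactly how the adaptive diffusion is implemented. The sketch below assumes a Diffusion-GAN-style scheme, which is an assumption and not confirmed as CATAD's method: every sample shown to the discriminator (real or generated) is diffused as y = sqrt(ᾱ_t)·x + sqrt(1 − ᾱ_t)·σ·ε at a random step t, and the chain length T is adapted from the statistic r_d = mean(sign(D(y) − 0.5)) on diffused real data. The linear β schedule, σ, `d_target`, the step size, and all function names are hypothetical; plain Python lists stand in for spectrogram features.

```python
import math
import random

def linear_alpha_bars(T, beta_start=1e-4, beta_end=2e-2):
    """Return [alpha_bar_1, ..., alpha_bar_T] for a (hypothetical) linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
        prod *= 1.0 - beta  # alpha_bar_t = prod_{s<=t} (1 - beta_s)
        alpha_bars.append(prod)
    return alpha_bars

def diffuse(x, t, alpha_bars, sigma=0.05, rng=random):
    """Forward diffusion: y = sqrt(a_t) * x + sqrt(1 - a_t) * sigma * eps, eps ~ N(0, 1)."""
    a = alpha_bars[t - 1]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * sigma * rng.gauss(0.0, 1.0) for v in x]

def diffuse_batch(batch, T, sigma=0.05, rng=random):
    """Diffuse each sample at its own step t ~ Uniform{1..T}; returns (y, t) pairs.
    The same function is applied to real and to generated batches."""
    alpha_bars = linear_alpha_bars(T)
    out = []
    for x in batch:
        t = rng.randint(1, T)
        out.append((diffuse(x, t, alpha_bars, sigma, rng), t))
    return out

def _sign(v):
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def update_T(T, d_real_outputs, d_target=0.6, step=4, T_min=5, T_max=500):
    """Adaptive rule: r_d = mean(sign(D(y) - 0.5)) over diffused real samples.
    An over-confident discriminator (r_d > d_target) gets a longer chain (more noise)."""
    r_d = sum(_sign(d - 0.5) for d in d_real_outputs) / len(d_real_outputs)
    T_new = T + _sign(r_d - d_target) * step
    return min(max(T_new, T_min), T_max)

# Toy usage with made-up feature vectors and discriminator outputs.
rng = random.Random(0)
real = [[0.2, -0.1, 0.4]]
fake = [[0.1, 0.0, 0.3]]
T = 50
real_diffused = diffuse_batch(real, T, rng=rng)
fake_diffused = diffuse_batch(fake, T, rng=rng)
T = update_T(T, [0.9, 0.8, 0.7, 0.95])  # all outputs > 0.5, so r_d = 1 and T grows to 54
```

In an actual training loop, the same diffusion would be applied to the air-conducted targets and to the generator outputs before both reach the discriminator, and `update_T` would run every few iterations. In a tensor framework, the list comprehensions would become vectorized operations.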
BC Speech | AC Speech | CATAD (ours) |
---|---|---|
References
2024
- CATAD: Conformer-Based Adversarial Training with Adaptive Diffusion for Bone-Conducted Speech Enhancement (submitted). The 14th International Symposium on Chinese Spoken Language Processing (ISCSLP 2024), 2024.