Recent advances in time-domain audio separation networks (TasNets) have markedly propelled the field of speech separation. Unlike conventional time-frequency domain methods, TasNets model the mixed speech signal directly in the time domain, using a convolutional encoder-decoder architecture and performing separation on the encoder output. However, the original dual-path framework uses a fixed feature dimension and a constant segment size across all RNN layers, limiting its ability to produce high-resolution features. In this study, we present the Multi-Scale Feature Fusion Transformer Network (MSFFT-Net). In its separation stage, MSFFT-Net incorporates multiple dual-path processing paths, each dedicated to feature modeling at a different scale, so that coarse-grained and fine-grained features are obtained in parallel. In addition, features from one dual-path processing path can be exchanged and shared with the other paths, yielding high feature resolution across layers and thus more accurate mask estimation. Experimental results on several datasets demonstrate that MSFFT-Net outperforms state-of-the-art baselines on the single-channel speech separation task.
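The multi-scale idea above can be illustrated with a minimal NumPy sketch: two parallel paths segment the encoder output with different chunk sizes (coarse vs. fine), and the coarse path's features are resampled so they can be shared with the fine path. The segment sizes, the fusion-by-averaging step, and all function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def segment(x, size):
    """Split a (T, D) feature map into chunks of `size` frames,
    zero-padding the tail so T becomes divisible by size."""
    T, D = x.shape
    pad = (-T) % size
    x = np.pad(x, ((0, pad), (0, 0)))
    return x.reshape(-1, size, D)  # (num_chunks, size, D)

def resample_chunks(chunks, target_num):
    """Nearest-neighbour resample along the chunk axis so features from
    one path can be shared with a path using a different segment size."""
    idx = np.linspace(0, len(chunks) - 1, target_num).round().astype(int)
    return chunks[idx]

def fuse_paths(x, coarse=8, fine=2):
    """Hypothetical fusion: average the fine path's chunk summaries with
    the coarse path's summaries after aligning their chunk counts."""
    c = segment(x, coarse).mean(axis=1)    # (Nc, D) coarse-grained summary
    f = segment(x, fine).mean(axis=1)      # (Nf, D) fine-grained summary
    c_shared = resample_chunks(c, len(f))  # share coarse context at fine rate
    return 0.5 * (f + c_shared)            # simple feature fusion

x = np.random.randn(100, 16)  # (time, feature) encoder output
fused = fuse_paths(x)
print(fused.shape)  # (50, 16): fine-path resolution with coarse context mixed in
```

In the actual network, each path would contain learned dual-path transformer blocks rather than the chunk-mean placeholder used here; the sketch only shows how differently sized segmentations can be aligned and combined.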
Several samples from the WSJ0-2mix dataset

[Audio samples: mixed audio, speaker 1, speaker 2]
Separation results of our proposed MSFFT-3P and MSFFT-2P

[Audio table: for each mixture, rows spk1 and spk2 with the clean reference and the separated outputs of DPRNN, MSFFT-3P, and MSFFT-2P]
References
2024
Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation (Submitted)
Jian Zhou, Yinhao Xu, Cunhang Fan, Liang Tao, Zhao Lv, and Hon Keung Kwan
@article{cll2022,
  author  = {Zhou, Jian and Xu, Yinhao and Fan, Cunhang and Tao, Liang and Lv, Zhao and Kwan, Hon Keung},
  title   = {Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation (Submitted)},
  journal = {Circuits, Systems, and Signal Processing},
  year    = {2024},
}