ISCA Archive · Interspeech 2019

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information

Rongzhi Gu, Lianwu Chen, Shi-Xiong Zhang, Jimeng Zheng, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

The recent exploration of deep learning for supervised speech separation has significantly accelerated progress on the multi-talker speech separation problem. Multi-channel approaches have attracted much research attention due to the benefit of spatial information. In this paper, integrated with the power spectra and inter-channel spatial features at the input level, we explore leveraging directional features, which indicate the speaker source from the desired target direction, for target speaker separation. In addition, we incorporate an attention mechanism that dynamically tunes the model's attention to the reliable input features, alleviating the spatial ambiguity problem that arises when multiple speakers are closely located. We demonstrate, on the far-field WSJ0 2-mix dataset, that our proposed approach significantly improves speech separation performance over baseline single-channel and multi-channel speech separation methods.
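To make the three input feature types concrete, here is a minimal sketch (not the authors' code) of how such features are commonly computed for a two-microphone mixture: a log power spectrum, inter-channel phase differences (IPD), and a directional feature that scores how well the observed IPD matches the target speaker's direction. The two-mic geometry, the microphone spacing, the cos-IPD encoding, and all names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

SPEED_OF_SOUND = 343.0  # m/s

def input_features(y, fs=16000, mic_dist=0.05, target_doa_deg=60.0,
                   n_fft=512, hop=256):
    """y: (2, n_samples) two-channel mixture; returns (freq_bins, frames, 3)."""
    # Per-channel STFT; Y has shape (2, freq_bins, frames).
    _, _, Y = stft(y, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)

    # 1) Log power spectrum of the reference (first) microphone.
    lps = np.log(np.abs(Y[0]) ** 2 + 1e-8)

    # 2) Inter-channel phase difference between the microphone pair.
    ipd = np.angle(Y[0]) - np.angle(Y[1])

    # 3) Directional feature: cosine of the gap between the observed IPD
    #    and the target-direction phase difference (TPD) implied by the
    #    array geometry and the desired DOA. Values near 1 suggest the
    #    T-F bin is dominated by energy from the target direction.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)           # (freq_bins,)
    tau = mic_dist * np.cos(np.deg2rad(target_doa_deg)) / SPEED_OF_SOUND
    tpd = 2.0 * np.pi * freqs * tau                      # (freq_bins,)
    df = np.cos(ipd - tpd[:, None])

    # Stack along a feature axis; a separation network would consume this.
    return np.stack([lps, np.cos(ipd), df], axis=-1)
```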
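The attention mechanism mentioned in the abstract can likewise be sketched in a hedged way: per time-frequency bin, learn weights that decide how much to trust the spatial/directional features versus the spectral ones when speakers are closely located. This is an illustrative toy, not the paper's architecture; the linear scoring layer and the softmax-over-features design are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_attention(feats, W, b):
    """feats: (freq, frames, n_feat); W: (n_feat, n_feat); b: (n_feat,).

    Returns the attention-weighted feature stack, same shape as `feats`.
    """
    scores = feats @ W + b                        # per-bin score per feature
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)     # softmax over feature axis
    return feats * weights                        # down-weight unreliable features

# Toy usage: 257 freq bins, 100 frames, 3 features (LPS, cos-IPD, DF).
feats = rng.standard_normal((257, 100, 3))
W = rng.standard_normal((3, 3)) * 0.1
b = np.zeros(3)
out = feature_attention(feats, W, b)
assert out.shape == feats.shape
```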

@inproceedings{gu19b_interspeech,
  title     = {Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information},
  author    = {Rongzhi Gu and Lianwu Chen and Shi-Xiong Zhang and Jimeng Zheng and Yong Xu and Meng Yu and Dan Su and Yuexian Zou and Dong Yu},
  year      = {2019},
  booktitle = {Interspeech 2019},
  pages     = {4290--4294},
  doi       = {10.21437/Interspeech.2019-2266},
  issn      = {2958-1796},
}

Cite as: Gu, R., Chen, L., Zhang, S.-X., Zheng, J., Xu, Y., Yu, M., Su, D., Zou, Y., Yu, D. (2019) Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information. Proc. Interspeech 2019, 4290-4294, doi: 10.21437/Interspeech.2019-2266

doi: 10.21437/Interspeech.2019-2266
