ISCA Archive, Interspeech 2021

Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers

Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

Automatic transcription of meetings requires handling of overlapped speech, which calls for continuous speech separation (CSS) systems. The uPIT criterion was proposed for utterance-level separation with neural networks and introduces the constraint that the total number of speakers must not exceed the number of output channels. When processing meeting-like data in a segment-wise manner, i.e., by separating overlapping segments independently and stitching adjacent segments to continuous output streams, this constraint has to be fulfilled for every segment. In this contribution, we show that this constraint can be significantly relaxed. We propose a novel graph-based PIT criterion, which casts the assignment of utterances to output channels as a graph coloring problem. It only requires that the number of concurrently active speakers must not exceed the number of output channels. As a consequence, the system can process an arbitrary number of speakers and arbitrarily long segments and thus can handle more diverse scenarios. Further, the stitching algorithm for obtaining a consistent output order in neighboring segments is of less importance and can even be eliminated completely, which not least reduces the computational effort. Experiments on meeting-style WSJ data show improvements in recognition performance over using the uPIT criterion.
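The graph coloring view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`overlap_graph`, `valid_colorings`, `max_concurrent`), the representation of utterances as `(start, end)` times, and the brute-force enumeration are all assumptions made for illustration. Utterances are vertices, two utterances are connected if they overlap in time, and a valid coloring assigns each utterance to an output channel such that overlapping utterances never share a channel.

```python
from itertools import product


def overlap_graph(utterances):
    """Return the set of edges (i, j), i < j, for utterances that overlap in time.

    `utterances` is a list of (start, end) tuples; this layout is an
    assumption for the sketch, not something specified in the abstract.
    """
    edges = set()
    for i, (start_i, end_i) in enumerate(utterances):
        for j in range(i + 1, len(utterances)):
            start_j, end_j = utterances[j]
            # Two intervals overlap if each starts before the other ends.
            if start_i < end_j and start_j < end_i:
                edges.add((i, j))
    return edges


def valid_colorings(num_utterances, edges, num_channels):
    """Yield every utterance-to-channel assignment that is a proper coloring.

    Brute-force enumeration, exponential in the number of utterances;
    used here only to keep the example short.
    """
    for coloring in product(range(num_channels), repeat=num_utterances):
        if all(coloring[i] != coloring[j] for i, j in edges):
            yield coloring


def max_concurrent(utterances):
    """Return the largest number of utterances active at the same time."""
    events = sorted(
        [(start, 1) for start, _ in utterances]
        + [(end, -1) for _, end in utterances]
    )
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Three utterances, e.g. from three different speakers, two output channels.
utterances = [(0.0, 4.0), (3.0, 7.0), (6.0, 10.0)]
edges = overlap_graph(utterances)
print(max_concurrent(utterances))             # 2
print(list(valid_colorings(3, edges, 2)))     # [(0, 1, 0), (1, 0, 1)]
```

In this example three utterances fit on two channels because at most two are active at once, which is the relaxed condition the abstract states. A training criterion would still have to choose among the valid colorings, for instance by comparing a signal-level loss for each; that step is not shown here.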

@inproceedings{neumann21_interspeech,
  title     = {Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers},
  author    = {Thilo von Neumann and Keisuke Kinoshita and Christoph Boeddeker and Marc Delcroix and Reinhold Haeb-Umbach},
  year      = {2021},
  booktitle = {Interspeech 2021},
  pages     = {3490--3494},
  doi       = {10.21437/Interspeech.2021-1177},
  issn      = {2958-1796},
}

Cite as: Neumann, T.v., Kinoshita, K., Boeddeker, C., Delcroix, M., Haeb-Umbach, R. (2021) Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers. Proc. Interspeech 2021, 3490-3494, doi: 10.21437/Interspeech.2021-1177

doi: 10.21437/Interspeech.2021-1177
