As a crucial task in task-oriented dialogue systems, spoken language understanding (SLU) has garnered increasing attention. However, errors from automatic speech recognition (ASR) often hinder understanding performance. To tackle this problem, we propose MoE-SLU, an ASR-robust SLU framework based on the mixture-of-experts technique. Specifically, we first introduce three strategies to generate additional transcripts from clean transcripts. Then, we employ the mixture-of-experts technique to weigh the representations of the generated transcripts, the ASR transcripts, and the corresponding clean manual transcripts. Additionally, we regularize the weighted average of the predictions and the predictions on the ASR transcripts by minimizing the Jensen-Shannon Divergence (JSD) between these two output distributions. Experimental results on three benchmark SLU datasets demonstrate that MoE-SLU achieves state-of-the-art performance. Further model analysis also verifies the superiority of our method.
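The abstract's two core operations, a mixture-of-experts weighted average over per-transcript predictions and a JSD term between that average and the ASR-transcript predictions, can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the paper's implementation; the function names, shapes, and gating inputs (`expert_logits`, `gate_weights`, `asr_logits`) are assumptions for the sketch.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q), clipped to avoid log(0).
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def js_divergence(p, q):
    # Jensen-Shannon divergence: symmetric, non-negative, bounded by ln(2).
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def jsd_regularizer(expert_logits, gate_weights, asr_logits):
    # expert_logits: (num_experts, num_classes) logits, one row per transcript view
    #                (generated, ASR, and clean manual transcripts).
    # gate_weights:  (num_experts,) mixture-of-experts weights summing to 1.
    # asr_logits:    (num_classes,) logits from the ASR transcript alone.
    probs = softmax(expert_logits)
    mixed = gate_weights @ probs      # weighted average of the experts' predictions
    asr_probs = softmax(asr_logits)
    return js_divergence(mixed, asr_probs)
```

In training, a term like `jsd_regularizer(...)` would be added to the task loss so the ASR-transcript predictions are pulled toward the expert consensus; JSD (rather than KL) keeps the penalty symmetric and bounded.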
Xuxin Cheng, Zhihong Zhu, Xianwei Zhuang, Zhanpeng Chen, Zhiqi Huang, and Yuexian Zou. 2024. MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14868–14879, Bangkok, Thailand. Association for Computational Linguistics.
@inproceedings{cheng-etal-2024-moe,
    title = "{M}o{E}-{SLU}: Towards {ASR}-Robust Spoken Language Understanding via Mixture-of-Experts",
    author = "Cheng, Xuxin and Zhu, Zhihong and Zhuang, Xianwei and Chen, Zhanpeng and Huang, Zhiqi and Zou, Yuexian",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.882/",
    doi = "10.18653/v1/2024.findings-acl.882",
    pages = "14868--14879",
    abstract = "As a crucial task in the task-oriented dialogue systems, spoken language understanding (SLU) has garnered increasing attention. However, errors from automatic speech recognition (ASR) often hinder the performance of understanding. To tackle this problem, we propose MoE-SLU, an ASR-Robust SLU framework based on the mixture-of-experts technique. Specifically, we first introduce three strategies to generate additional transcripts from clean transcripts. Then, we employ the mixture-of-experts technique to weigh the representations of the generated transcripts, ASR transcripts, and the corresponding clean manual transcripts. Additionally, we also regularize the weighted average of predictions and the predictions of ASR transcripts by minimizing the Jensen-Shannon Divergence (JSD) between these two output distributions. Experiment results on three benchmark SLU datasets demonstrate that our MoE-SLU achieves state-of-the-art performance. Further model analysis also verifies the superiority of our method."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="cheng-etal-2024-moe">
    <titleInfo>
      <title>MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Xuxin</namePart>
      <namePart type="family">Cheng</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Zhihong</namePart>
      <namePart type="family">Zhu</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Xianwei</namePart>
      <namePart type="family">Zhuang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Zhanpeng</namePart>
      <namePart type="family">Chen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Zhiqi</namePart>
      <namePart type="family">Huang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Yuexian</namePart>
      <namePart type="family">Zou</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2024-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: ACL 2024</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Lun-Wei</namePart>
        <namePart type="family">Ku</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Andre</namePart>
        <namePart type="family">Martins</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Vivek</namePart>
        <namePart type="family">Srikumar</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Bangkok, Thailand</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>As a crucial task in the task-oriented dialogue systems, spoken language understanding (SLU) has garnered increasing attention. However, errors from automatic speech recognition (ASR) often hinder the performance of understanding. To tackle this problem, we propose MoE-SLU, an ASR-Robust SLU framework based on the mixture-of-experts technique. Specifically, we first introduce three strategies to generate additional transcripts from clean transcripts. Then, we employ the mixture-of-experts technique to weigh the representations of the generated transcripts, ASR transcripts, and the corresponding clean manual transcripts. Additionally, we also regularize the weighted average of predictions and the predictions of ASR transcripts by minimizing the Jensen-Shannon Divergence (JSD) between these two output distributions. Experiment results on three benchmark SLU datasets demonstrate that our MoE-SLU achieves state-of-the-art performance. Further model analysis also verifies the superiority of our method.</abstract>
    <identifier type="citekey">cheng-etal-2024-moe</identifier>
    <identifier type="doi">10.18653/v1/2024.findings-acl.882</identifier>
    <location>
      <url>https://aclanthology.org/2024.findings-acl.882/</url>
    </location>
    <part>
      <date>2024-08</date>
      <extent unit="page">
        <start>14868</start>
        <end>14879</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts
%A Cheng, Xuxin
%A Zhu, Zhihong
%A Zhuang, Xianwei
%A Chen, Zhanpeng
%A Huang, Zhiqi
%A Zou, Yuexian
%Y Ku, Lun-Wei
%Y Martins, Andre
%Y Srikumar, Vivek
%S Findings of the Association for Computational Linguistics: ACL 2024
%D 2024
%8 August
%I Association for Computational Linguistics
%C Bangkok, Thailand
%F cheng-etal-2024-moe
%X As a crucial task in the task-oriented dialogue systems, spoken language understanding (SLU) has garnered increasing attention. However, errors from automatic speech recognition (ASR) often hinder the performance of understanding. To tackle this problem, we propose MoE-SLU, an ASR-Robust SLU framework based on the mixture-of-experts technique. Specifically, we first introduce three strategies to generate additional transcripts from clean transcripts. Then, we employ the mixture-of-experts technique to weigh the representations of the generated transcripts, ASR transcripts, and the corresponding clean manual transcripts. Additionally, we also regularize the weighted average of predictions and the predictions of ASR transcripts by minimizing the Jensen-Shannon Divergence (JSD) between these two output distributions. Experiment results on three benchmark SLU datasets demonstrate that our MoE-SLU achieves state-of-the-art performance. Further model analysis also verifies the superiority of our method.
%R 10.18653/v1/2024.findings-acl.882
%U https://aclanthology.org/2024.findings-acl.882/
%U https://doi.org/10.18653/v1/2024.findings-acl.882
%P 14868-14879
[MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts](https://aclanthology.org/2024.findings-acl.882/) (Cheng et al., Findings 2024)