In this paper we report on the first phase of collecting the speech corpus IIS_CSS for the CEST (Chinese-English speech translation) project. The corpus is intended to provide training material for speaker-independent spontaneous Chinese speech recognition and automatic dialogue management over the telephone line. This paper describes the collection procedure, processing methods, annotation, and contents of the corpus, which consists of two parts: human-human dialogues and human-machine dialogues. At present, 10 hours of speech and the associated annotation have been completed. Finally, we present our plan for future collection.
@inproceedings{feng00_icslp,
  title     = {Data collection and processing in a Chinese spontaneous speech corpus IIS_CSS},
  author    = {JunLan Feng and XianFang Wang and LiMin Du},
  year      = {2000},
  booktitle = {6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages     = {vol. 3, 394-397},
  doi       = {10.21437/ICSLP.2000-558},
  issn      = {2958-1796},
}
Cite as: Feng, J., Wang, X., Du, L. (2000) Data collection and processing in a Chinese spontaneous speech corpus IIS_CSS. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 394-397, doi: 10.21437/ICSLP.2000-558