




Technical Field
The present invention relates to a language learning system that assists language learning.
Background Art
In language learning for a foreign language or one's mother tongue, and especially in self-study of pronunciation or reading aloud, a widely used method is to play back a model voice recorded on a recording medium such as a CD (Compact Disk) and to practice pronunciation or reading aloud by imitating it. The aim is to acquire correct pronunciation by imitating the model voice. To make learning more effective, the learner needs to evaluate the difference between the model voice and his or her own voice. In most cases, however, the model voice recorded on a CD is that of one particular announcer or native speaker. For most learners, the model is therefore spoken in a voice whose characteristics are completely different from their own, which makes it difficult to judge how correct their own pronunciation is compared with the model voice.
Techniques addressing this problem include, for example, those described in Patent Documents 1 and 2. The technique of Patent Document 1 reflects parameters such as the user's intonation, speech rate, and voice quality in the model voice, converting the model voice into a voice similar to the user's. The technique of Patent Document 2 lets the learner select any one of a plurality of model voices.
Patent Document 1: JP-A-2002-244547
Patent Document 2: JP-A-2004-133409
Summary of the Invention
However, although the technique of Patent Document 1 can correct intonation, it has difficulty correcting clearly distinct pronunciations such as "r and l" or "s and th" in English. It also requires modification of the voice waveform, which makes the processing complicated. The technique of Patent Document 2 relies on selecting a model voice, so the learner must choose the model voice personally, which is cumbersome.
The present invention was made in view of the above problems, and its object is to provide a language learning system and method that allow learning with a model voice similar to the learner's voice using simpler processing.
To solve the above problems, the present invention provides a language learning system comprising: a database that stores, for each speaker, a feature quantity extracted from that speaker's voice in association with one or more pieces of that speaker's voice data; a voice acquisition unit that acquires a learner's voice; a feature quantity extraction unit that extracts a feature quantity of the learner's voice from the voice acquired by the voice acquisition unit; a voice data selection unit that compares the learner's feature quantity extracted by the feature quantity extraction unit with the feature quantities of the plurality of speakers recorded in the database and, based on the comparison, selects the voice data of one speaker from the database; and a reproduction unit that outputs sound in accordance with the one piece of voice data selected by the voice data selection unit.
In a preferred embodiment, the voice data selection unit includes a similarity calculation unit that calculates, for each speaker, a similarity index representing the difference between the feature quantity of that speaker recorded in the database and the learner's feature quantity extracted by the feature quantity extraction unit, and, based on the similarity index calculated by the similarity calculation unit, selects from the database one piece of voice data corresponding to the feature quantity of one speaker satisfying a predetermined condition. In this case, the predetermined condition may be that the voice data of the one speaker associated with the similarity index indicating the highest similarity is selected.
In another preferred embodiment, the language learning system may further include a speech rate conversion unit that converts the speech rate of the voice data selected by the voice data selection unit, with the reproduction unit outputting sound in accordance with the voice data whose speech rate has been converted by the speech rate conversion unit.
In another preferred embodiment, the language learning system may further include: a storage unit that stores a model voice; a comparison unit that compares the model voice with the learner's voice acquired by the voice acquisition unit and generates information indicating the degree of similarity between the two; and a database update unit that, when the degree of similarity indicated by the information generated by the comparison unit satisfies a predetermined condition, adds the learner's voice acquired by the voice acquisition unit to the database in association with the feature quantity extracted by the feature quantity extraction unit.
According to the present invention, the voice of a speaker whose voice characteristics are similar to the learner's can be reproduced as the voice of the sample text being studied. The learner can therefore recognize more accurately the pronunciation to imitate (the target pronunciation), which improves learning efficiency.
Brief Description of the Drawings
FIG. 1 is a block diagram showing the functional configuration of a language learning system 1 according to a first embodiment of the present invention.
FIG. 2 is a diagram illustrating the contents of the database DB1.
FIG. 3 is a block diagram showing the hardware configuration of the language learning system 1.
FIG. 4 is a flowchart showing the operation of the language learning system 1.
FIG. 5 is a flowchart showing the update operation of the database DB1 in the language learning system 1.
FIG. 6 is a diagram illustrating the spectral envelopes of a model voice (top) and a user voice (bottom).
Detailed Description of the Embodiments
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
<1. Configuration>
FIG. 1 is a block diagram showing the functional configuration of the language learning system 1 according to the first embodiment of the present invention. A storage unit 11 stores a database DB1, which stores a feature quantity extracted from a speaker's voice in association with voice data of that speaker's voice. An input unit 12 acquires the voice of a learner (user) and outputs it as user voice data. A feature extraction unit 13 extracts a feature quantity from the learner's voice. A voice data extraction (selection) unit 14 compares the feature quantity extracted by the feature extraction unit 13 with the feature quantities recorded in the database DB1, extracts the feature quantity of one speaker that satisfies a predetermined condition, and extracts (selects) from the database DB1 the voice data associated with that speaker's feature quantity. A reproduction unit 15 reproduces the voice data extracted (selected) by the voice data extraction (selection) unit 14 and emits audible sound through a speaker, earphones, or the like.
The contents of the database DB1 are described in detail later. The language learning system 1 also has the following components for updating the database DB1. A storage unit 16 stores a model voice database DB2, which stores model voice data serving as language learning samples in association with the text data of each model voice. A comparison unit 17 compares the user voice data acquired by the input unit 12 with the model voice data stored in the storage unit 16. If, as a result of the comparison, the user voice satisfies a predetermined condition, a DB update unit 18 adds the user voice data to the database DB1.
FIG. 2 is a diagram illustrating the contents of the database DB1. The database DB1 records a speaker ID ("ID001" in FIG. 2), which is an identifier identifying a speaker, and the feature quantity extracted from that speaker's voice data. The database DB1 also records, in association with one another, a sample text ID, which is an identifier identifying a sample text, the voice data of that sample text, and the pronunciation level (described later) of that sample text. The database DB1 holds a plurality of data sets, each consisting of a sample text ID, voice data, and a pronunciation level, and each data set is stored in association with the speaker ID of the speaker of the voice data. In other words, the database DB1 holds voice data for a plurality of sample texts obtained from a plurality of speakers, recorded per speaker by means of the speaker ID and feature quantity.
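As a non-limiting illustration, the record layout of the database DB1 described above might be sketched in Python as follows. The class names, field names, and the `register` helper are hypothetical and do not appear in the specification; only the grouping (one feature quantity per speaker ID, with any number of sample text data sets attached) follows the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SampleRecord:
    """One data set: sample text ID, voice data, pronunciation level."""
    sample_text_id: str
    voice_data: bytes
    pronunciation_level: int


@dataclass
class SpeakerEntry:
    """All data recorded for one speaker ID (e.g. "ID001")."""
    speaker_id: str
    features: Tuple[float, ...]  # e.g. formant frequencies in Hz
    samples: List[SampleRecord] = field(default_factory=list)


def register(db: Dict[str, SpeakerEntry], speaker_id, features, record):
    """Attach a data set to its speaker, creating the speaker entry if needed."""
    entry = db.setdefault(speaker_id, SpeakerEntry(speaker_id, features))
    entry.samples.append(record)
    return entry
```

Keying the mapping by speaker ID is one possible design choice; it keeps the feature quantity stored once per speaker while allowing several sample texts per speaker.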
FIG. 3 is a block diagram showing the hardware configuration of the language learning system 1. A CPU (Central Processing Unit) 101 uses a RAM (Random Access Memory) 102 as a work area and reads out and executes programs stored in a ROM (Read Only Memory) 103 or an HDD (Hard Disk Drive) 104. The HDD 104 is a storage device that stores various application programs and data. The HDD 104 also stores the database DB1 and the model voice database DB2. A display 105 is a display device, such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), that displays text and images under the control of the CPU 101. A microphone 106 is a sound-collecting device for acquiring the user's voice and outputs a voice signal corresponding to the voice uttered by the user. A voice processing unit 107 has a function of converting the analog voice signal output by the microphone 106 into digital voice data, and a function of converting voice data stored in the HDD 104 into a voice signal and outputting it to a speaker 108. The user can input instructions to the language learning system 1 by operating a keyboard 109. The components described above are connected to one another via a bus 110. The language learning system 1 can also communicate with other devices through an I/F (interface) 111.
<2. Operation>
Next, the operation of the language learning system 1 according to the present embodiment is described. The operation of reproducing the voice of a sample text is described first, followed by the operation of updating the contents of the database DB1. In the language learning system 1, the CPU 101 executes a language learning program stored in the HDD 104, thereby providing the functions shown in FIG. 1. The learner (user) operates the keyboard 109, for example when the language learning program starts, to input a user ID, which is an identifier identifying the learner. The CPU 101 stores the input user ID in the RAM 102 as the user ID of the learner currently using the system.
<2-1. Voice reproduction>
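FIG. 4 is a flowchart showing the operation of the language learning system 1. When the language learning program is executed, the CPU 101 of the language learning system 1 searches the model voice database DB2 and creates a list of available sample texts. Based on this list, the CPU 101 displays on the display 105 a message prompting the user to select a sample text. Following the message on the display 105, the user selects one sample text from the list. The CPU 101 reproduces the voice of the selected sample text (step S101). Specifically, the CPU 101 reads the model voice data of the sample text from the model voice database DB2 and outputs the read model voice data to the voice processing unit 107. The voice processing unit 107 performs digital-to-analog conversion on the input model voice data and outputs the result to the speaker 108 as an analog voice signal. The model voice is thus reproduced from the speaker 108.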
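After hearing the model voice reproduced from the speaker 108, the user reads the sample text aloud into the microphone 106, imitating the model voice. That is, the user voice is input (step S102). Specifically, this proceeds as follows. When reproduction of the model voice ends, the CPU 101 displays on the display 105 a message prompting the user to read the sample text aloud, such as "Now it is your turn. Please read the sample text aloud." The CPU 101 then displays on the display 105 a message indicating the operation for inputting the user voice, such as "Press the space bar and start reading. When you have finished reading, press the space bar again." The user operates the keyboard 109 according to the messages on the display 105 and inputs the user voice. That is, the user presses the space bar of the keyboard 109 and then reads the sample text aloud into the microphone 106. When finished, the user presses the space bar again.
The user's voice is converted into an electrical signal by the microphone 106, which outputs a user voice signal. The user voice signal is converted into digital voice data by the voice processing unit 107 and recorded in the HDD 104 as user voice data. After reproduction of the model voice is complete, the CPU 101 starts recording the user voice data when triggered by a press of the space bar and ends recording when triggered by the next press of the space bar. That is, the user voice between the first press of the space bar and the second press is recorded in the HDD 104.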
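Next, the CPU 101 performs feature quantity extraction processing on the obtained user voice data (step S103). Specifically, this proceeds as follows. The CPU 101 divides the voice data into predetermined time periods (frames). The CPU 101 obtains the logarithm of the amplitude spectrum obtained by Fourier-transforming the waveform representing the model voice data and the waveform representing the user voice signal, each divided into frames, and then applies an inverse Fourier transform to obtain a spectral envelope for each frame. From the spectral envelopes thus obtained, the CPU 101 extracts the formant frequencies of the first and second formants. In general, a vowel is characterized by the distribution of its first and second formants. Starting from the beginning of the voice data, the CPU 101 matches the formant frequency distribution obtained from each frame against the formant frequency distribution of a predetermined vowel (for example, "a"). If the matching determines that a frame corresponds to the vowel "a", the CPU 101 calculates the formant frequencies of predetermined formants (for example, the first, second, and third formants) among the formants of that frame. The CPU 101 stores the calculated formant frequencies in the RAM 102 as the feature quantity P of the user voice.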
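As a non-limiting illustration, the framing, envelope calculation, and formant picking of step S103 might be sketched in Python as follows. Several points are assumptions not stated in the specification: the naive `dft` helper (a real system would use an FFT), the low-quefrency cutoff `keep` with a second transform (the usual cepstral smoothing steps), and the rule of taking the lowest local maxima of the envelope as formants. The vowel matching described above is omitted.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Divide voice data into consecutive non-overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustrative only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectral_envelope(frame, keep=8):
    """Log amplitude spectrum -> inverse transform -> smoothed envelope."""
    n = len(frame)
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(frame)]
    # log_mag is real and symmetric, so its inverse DFT equals dft(...) / n.
    cepstrum = [c.real / n for c in dft(log_mag)]
    # Assumption: keep only low-quefrency terms, which carry the envelope.
    liftered = [c if (q < keep or q > n - keep) else 0.0
                for q, c in enumerate(cepstrum)]
    return [c.real for c in dft(liftered)]


def pick_formants(envelope, sample_rate, count=3):
    """Return the frequencies (Hz) of the lowest `count` envelope peaks."""
    n = len(envelope)
    peaks = []
    for k in range(1, n // 2):
        if envelope[k - 1] < envelope[k] >= envelope[k + 1]:
            peaks.append(k * sample_rate / n)
    return peaks[:count]
```

Under these assumptions, the feature quantity P for one frame would be `pick_formants(spectral_envelope(frame), sample_rate)`.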
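Then, the CPU 101 extracts (selects) from the database DB1 the voice data associated with a feature quantity similar to the feature quantity P of the user voice (step S104). Specifically, the CPU 101 compares the extracted feature quantity P with the feature quantities recorded in the database DB1 and identifies the feature quantity closest to P. In this comparison, for example, the differences between the first to third formant frequency values of the feature quantity P and those of a feature quantity in the database DB1 are calculated, and the sum of the absolute values of the three formant frequency differences is calculated as a similarity index indicating the degree of similarity between the two. The CPU 101 identifies in the database DB1 the feature quantity with the smallest calculated similarity index, that is, the feature quantity closest to P. The CPU 101 then extracts the voice data associated with the identified feature quantity and stores the extracted voice data in the RAM 102.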
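As a non-limiting illustration, the selection of step S104 might be sketched in Python as follows. The dictionary layout and the function names are hypothetical; the index itself (sum of the absolute differences of the three formant frequencies, smallest value wins) follows the description above.

```python
def similarity_index(p, q):
    """Sum of absolute formant-frequency differences; smaller means closer."""
    return sum(abs(a - b) for a, b in zip(p, q))


def select_voice_data(db, learner_features):
    """Return (speaker_id, voice_data) of the speaker closest to the learner."""
    best_id = min(
        db, key=lambda sid: similarity_index(learner_features, db[sid]["features"])
    )
    return best_id, db[best_id]["voice_data"]


# Hypothetical database with two registered speakers (formants in Hz).
db1 = {
    "ID001": {"features": (700, 1200, 2500), "voice_data": b"voice-1"},
    "ID002": {"features": (300, 2300, 3000), "voice_data": b"voice-2"},
}
# Learner's feature quantity P; the index is 120 for ID001 and 1980 for ID002.
selected = select_voice_data(db1, (680, 1250, 2450))
```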
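Then, the CPU 101 reproduces the voice data (step S105). Specifically, the CPU 101 outputs the voice data to the voice processing unit 107. The voice processing unit 107 performs digital-to-analog conversion on the input voice data and outputs the result to the speaker 108 as a voice signal. The extracted voice data is thus reproduced as sound from the speaker 108. Because the voice data was extracted by matching feature quantities, the reproduced voice has characteristics similar to those of the learner's voice. Consequently, even for a sample text that is hard to imitate when heard only in the voice of a speaker (an announcer, a native speaker, or the like) whose voice characteristics are completely different from the learner's own, the learner hears it spoken by a speaker with very similar voice characteristics, can understand more accurately the pronunciation to imitate, and learns more efficiently.
<2-2. Database update>
Next, the update operation of the database DB1 is described.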
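FIG. 5 is a flowchart showing the update operation of the database DB1 in the language learning system 1. First, the model voice is reproduced and the user voice is input by the processing of steps S101 to S102 described above. Then, the CPU 101 performs comparison processing between the model voice and the user voice (step S201). Specifically, this proceeds as follows. The CPU 101 divides the waveform representing the model voice data into predetermined time periods (frames). The CPU 101 likewise divides the waveform representing the user voice data into frames. The CPU 101 obtains the logarithm of the amplitude spectrum obtained by Fourier-transforming the waveform representing the model voice data and the waveform representing the user voice signal, each divided into frames, and then applies an inverse Fourier transform to obtain a spectral envelope for each frame.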
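FIG. 6 is a diagram illustrating the spectral envelopes of a model voice (top) and a user voice (bottom). The spectral envelopes shown in FIG. 6 consist of three frames, frame I to frame III. For each frame, the CPU 101 compares the obtained spectral envelopes and quantifies the degree of similarity between the two. This quantification (calculation of a similarity index) can be performed, for example, as follows. The CPU 101 may plot the frequency and spectral density of each characteristic formant on a spectral density versus frequency diagram and calculate, over the entire voice data, the sum of the distances between the corresponding two points as the similarity index. Alternatively, the CPU 101 may calculate, over the entire voice data, the value obtained by integrating the difference in spectral density at a specific frequency as the similarity index. Since the model voice and the user voice usually differ in length (duration), processing that equalizes their lengths is preferably performed before the above processing.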
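As a non-limiting illustration, the two similarity indices mentioned above might be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the two voices have already been aligned to the same number of frames, that the integral over the voice data is approximated by a sum over frames, and that the absolute value of each difference is taken; none of these details is fixed by the specification.

```python
import math


def formant_distance_index(model_peaks, user_peaks):
    """Sum of distances between corresponding (frequency, density) points."""
    return sum(
        math.hypot(m[0] - u[0], m[1] - u[1])
        for m, u in zip(model_peaks, user_peaks)
    )


def density_difference_index(model_frames, user_frames, bin_index):
    """Sum over frames of |density difference| at one frequency bin."""
    return sum(
        abs(m[bin_index] - u[bin_index])
        for m, u in zip(model_frames, user_frames)
    )
```

With either index, a smaller value indicates that the user voice is closer to the model voice.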
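The description now returns to FIG. 5. Based on the calculated similarity index, the CPU 101 determines whether to update the database DB1 (step S202). Specifically, this proceeds as follows. A condition for additionally registering acquired voice data in the database DB1 is stored in the HDD 104 in advance. The CPU 101 determines whether the similarity index calculated in step S201 satisfies this registration condition. If the registration condition is satisfied (step S202: YES), the CPU 101 advances the processing to step S203, described below. If the registration condition is not satisfied (step S202: NO), the CPU 101 ends the processing.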
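If the registration condition is satisfied, the CPU 101 performs database update processing (step S203). Specifically, this proceeds as follows. The CPU 101 assigns to the voice data satisfying the registration condition the user ID identifying the learner (user) who is the speaker of that voice data. The CPU 101 searches the database DB1 for a user ID identical to this user ID and additionally registers the voice data in the database DB1 in association with that user ID. If the user ID extracted from the update request is not registered in the database DB1, the CPU 101 additionally registers the user ID and registers the voice data in association with it. In this way, the learner's voice data is additionally registered in the database DB1, and the database is updated.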
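As a non-limiting illustration, steps S202 and S203 might be sketched in Python as follows. The specification only says that a registration condition is stored in advance; expressing it as an upper bound `threshold` on the similarity index (smaller index meaning closer to the model voice) is an assumption, as are the function name and the dictionary layout.

```python
def update_database(db, user_id, features, voice_data, index, threshold):
    """Register the learner's voice data if the index meets the condition.

    Returns True when the data was registered (step S202: YES) and
    False when processing ends without an update (step S202: NO).
    """
    if index > threshold:
        return False
    # Step S203: reuse the entry for an existing user ID, or create one.
    entry = db.setdefault(user_id, {"features": features, "voice_data": []})
    entry["voice_data"].append(voice_data)
    return True
```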
The database update operation described above may be performed concurrently with the voice reproduction operation described earlier, or after the voice reproduction operation is complete. By successively adding learners' voice data to the database DB1 in this way, voice data of many speakers accumulates in the database DB1. Therefore, as the language learning system 1 is used, more and more speakers' voice data is registered in the database DB1, and the probability that a new learner using the language learning system 1 will hear a voice with characteristics similar to his or her own also increases.
<3. Modifications>
The present invention is not limited to the above embodiment, and various modifications are possible.
<3-1. Modification 1>
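In the above embodiment, after the voice data extracted in step S104 is stored in the RAM 102, the CPU 101 may perform speech rate conversion processing on the voice data. Specifically, this proceeds as follows. The RAM 102 stores in advance a variable a that specifies the ratio of the speech rates before and after the speech rate conversion processing. The CPU 101 processes the extracted voice data so that the voice duration (the time required to reproduce the voice data from beginning to end) becomes a times the original. When a > 1, the speech rate conversion processing lengthens the voice, that is, the speech rate becomes slower. Conversely, when a < 1, the processing shortens the voice, that is, the speech rate becomes faster. In this embodiment, the initial value of the variable a is set to a value greater than 1. Therefore, after the model voice is reproduced and the user voice is input, the sample text reproduced in a voice similar to the user's is played back more slowly than the model voice. The learner can thus recognize more clearly the pronunciation to imitate (the target pronunciation).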
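As a non-limiting illustration, changing the voice duration to a times the original might be sketched in Python as follows. The specification does not state how the conversion is performed; the naive linear-interpolation resampling below is an assumption chosen for brevity, and it also shifts the pitch. A practical system would more likely use a pitch-preserving time-stretching method.

```python
def stretch(samples, a):
    """Resample so that the output has about a times as many samples."""
    n_in = len(samples)
    n_out = max(1, round(n_in * a))
    if n_in == 1 or n_out == 1:
        return [samples[0]] * n_out
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis.
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

With a = 2 the output is twice as long and plays back at half the speech rate; with a = 0.6 it is shorter and faster.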
<3-2. Modification 2>
In the above embodiment, step S104 extracts the voice data associated with the feature quantity closest to the feature quantity extracted from the learner's (user's) voice, but the condition for extracting voice data is not limited to being closest to the feature quantity of the learner's voice. For example, the pronunciation level of each voice (an index representing its similarity to the model voice; the higher the pronunciation level, the closer the voice is to the model voice) may be recorded in the database DB1 in advance in association with the voice data of the sample text, and this pronunciation level may be added to the conditions for selecting voice data. As a specific condition, for example, the voice data with the closest feature quantity may be extracted from among the voice data whose pronunciation level is greater than or equal to a certain level. Alternatively, the voice data with the highest pronunciation level may be extracted from among the voice data whose feature quantity similarity is greater than or equal to a certain value. The pronunciation level can be calculated, for example, in the same way as the similarity index in step S201.
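As a non-limiting illustration, the two selection conditions of this modification might be sketched in Python as follows. The function names and the dictionary layout are hypothetical. Because the index used here becomes smaller as two voices become more similar, "similarity greater than or equal to a certain value" is expressed as the index being at most `max_index`, which is an assumption about how the threshold would be encoded.

```python
def feature_index(p, q):
    """Sum of absolute feature differences; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_with_min_level(entries, learner, min_level):
    """Closest features among entries whose pronunciation level is high enough."""
    candidates = [e for e in entries if e["level"] >= min_level]
    if not candidates:
        return None
    return min(candidates, key=lambda e: feature_index(learner, e["features"]))


def best_level_within(entries, learner, max_index):
    """Highest pronunciation level among entries that are similar enough."""
    candidates = [
        e for e in entries if feature_index(learner, e["features"]) <= max_index
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["level"])
```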
<3-3. Modification 3>
Furthermore, the configuration of the system is not limited to that described in the above embodiment. The language learning system 1 may be connected to a server device via a network, with the server taking over some of the functions of the language learning system described above.
In the above embodiment, the CPU 101 realizes the functions of the language learning system in software by executing the language learning program. However, the system may also be realized in hardware, using electronic circuits or the like corresponding to the functional components shown in FIG. 1.
<3-4. Modification 4>
The above embodiment describes using the formant frequencies of the first to third formants as the feature quantity of a speaker's voice, but the voice feature quantity is not limited to formant frequencies. It may be a feature quantity calculated by another voice analysis method, such as a spectrogram.
| Application Number | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| JP2004373815 | | 2004-12-24 | | |
| JP2004373815A | JP2006178334A (en) | 2004-12-24 | 2004-12-24 | Language learning system |

| Publication Number | Publication Date |
|---|---|
| CN1794315A (en) | 2006-06-28 |
| CN100585663C (en) | 2010-01-27 |

| Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN200510132618A | Expired - Fee Related | CN100585663C (en) | 2004-12-24 | 2005-12-23 | Language learning system |

| Country | Link |
|---|---|
| JP (1) | JP2006178334A (en) |
| KR (1) | KR100659212B1 (en) |
| CN (1) | CN100585663C (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2006184813A (en)* | 2004-12-28 | 2006-07-13 | Advanced Telecommunication Research Institute International | Foreign language learning device |
| CN101630448B (en)* | 2008-07-15 | 2011-07-27 | 上海启态网络科技有限公司 | Language learning client and system |
| KR101228909B1 (en)* | 2009-09-10 | 2013-02-01 | 최종근 | Electronic Dictionary Device and Method on Providing Sounds of Words |
| KR101141793B1 (en)* | 2011-08-22 | 2012-05-04 | 광주대학교산학협력단 | A language learning system with variations of voice pitch |
| CN102760434A (en)* | 2012-07-09 | 2012-10-31 | 华为终端有限公司 | Method for updating voiceprint feature model and terminal |
| CN104485115B (en)* | 2014-12-04 | 2019-05-03 | 上海流利说信息技术有限公司 | Pronounce valuator device, method and system |
| JP6613560B2 (en)* | 2014-12-12 | 2019-12-04 | カシオ計算機株式会社 | Electronic device, learning support method and program |
| CN105933635A (en)* | 2016-05-04 | 2016-09-07 | 王磊 | Method for attaching label to audio and video content |
| CN110556095B (en)* | 2018-05-30 | 2023-06-23 | 卡西欧计算机株式会社 | Learning device, robot, learning support system, learning device control method, and storage medium |
| KR102416041B1 (en)* | 2021-11-23 | 2022-07-01 | 진기석 | Multilingual simultaneous learning system |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6449081A (en)* | 1987-08-19 | 1989-02-23 | Chuo Hatsujo Kk | Pronunciation training apparatus |
| JP2844817B2 (en)* | 1990-03-22 | 1999-01-13 | 日本電気株式会社 | Speech synthesis method for utterance practice |
| JP3931442B2 (en)* | 1998-08-10 | 2007-06-13 | ヤマハ株式会社 | Karaoke equipment |
| JP2001051580A (en)* | 1999-08-06 | 2001-02-23 | Nyuuton:Kk | Voice learning device |
| JP2002244547A (en)* | 2001-02-19 | 2002-08-30 | Nippon Hoso Kyokai <Nhk> | Computer program for utterance learning system and server device cooperating with this program |
| JP2004093915A (en)* | 2002-08-30 | 2004-03-25 | Casio Comput Co Ltd | Server device, information terminal device, learning support device, and program |
| JP3842746B2 (en)* | 2003-03-03 | 2006-11-08 | 富士通株式会社 | Teaching material providing program, teaching material providing system, and teaching material providing method |
| Publication number | Publication date |
|---|---|
| JP2006178334A (en) | 2006-07-06 |
| KR20060073502A (en) | 2006-06-28 |
| CN1794315A (en) | 2006-06-28 |
| KR100659212B1 (en) | 2006-12-20 |
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date:20100127 Termination date:20151223 | |
| EXPY | Termination of patent right or utility model |