CN106157974A

Info

Publication number: CN106157974A
Application number: CN201510161880.5A
Authority: CN
Inventors: 石自强; 刘汝杰
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-04-07
Filing date: 2015-04-07
Publication date: 2016-11-23

Abstract

The present disclosure relates to a device and method for assessing text recitation quality. The device includes: an acquisition unit configured to acquire a text recitation characteristic curve generated by reciting a text; a segmentation unit configured to segment the text recitation characteristic curve to obtain a character recitation characteristic curve for each character in the text; a prosody score acquisition unit configured to compare the character recitation characteristic curve of each character with the standard character characteristic curve of that character to obtain a prosody score for each character; an acoustic score acquisition unit configured to determine the recitation accuracy of each character from its character recitation characteristic curve to obtain an acoustic score for each character; and an evaluation unit configured to evaluate the recitation quality of the text based on the prosody score and the acoustic score of each character. Because the prosody and pronunciation of each character are scored individually, the recitation quality of the text can be evaluated with results that are more accurate and realistic.

Description

Translated from Chinese

Apparatus and method for assessing text recitation quality

Technical field

The present disclosure relates to the technical field of audio processing, and in particular to a device and method for assessing text recitation quality.

Background

This section provides background information related to the present disclosure, which is not necessarily prior art.

Automatic assessment of recitation speech quality is a technique in which a student recites a specified text and a computer returns a score based on the quality of the recited speech relative to standard read-aloud speech. The technique helps students understand their own level of recitation and improve their skills on their own, while greatly reducing teachers' workload. Although the technique has wide application, little work has been done in this area so far.

Summary of the invention

This section provides a general summary of the disclosure, not a comprehensive disclosure of its full scope or all of its features.

An object of the present disclosure is to provide a text recitation quality assessment device and method that score the prosody and pronunciation of each character in a text, so that the recitation quality of the text can be evaluated with results that are more accurate and realistic.

According to one aspect of the present disclosure, a text recitation quality assessment device is provided, the device comprising: an acquisition unit configured to acquire a text recitation characteristic curve generated by reciting a text; a segmentation unit configured to segment the text recitation characteristic curve to obtain a character recitation characteristic curve for each character in the text; a prosody score acquisition unit configured to compare the character recitation characteristic curve of each character with the standard character characteristic curve of that character to obtain a prosody score for each character; an acoustic score acquisition unit configured to determine the recitation accuracy of each character from its character recitation characteristic curve to obtain an acoustic score for each character; and an evaluation unit configured to evaluate the recitation quality of the text based on the prosody score and the acoustic score of each character.

According to another aspect of the present disclosure, a method is provided for evaluating the quality of a recitation of a text based on the text, which is composed of characters, and the standard pronunciation of the text. The method comprises: acquiring a text recitation characteristic curve generated by the recitation; segmenting the text recitation characteristic curve to obtain a character recitation characteristic curve for each character in the text; comparing the character recitation characteristic curve of each character with the standard character characteristic curve of that character to obtain a prosody score for each character; determining the recitation accuracy of each character from its character recitation characteristic curve to obtain an acoustic score for each character; and evaluating the recitation quality of the text based on the prosody score and the acoustic score of each character.

According to another aspect of the present disclosure, a machine-readable storage medium is provided that carries a program product including machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to execute the text recitation quality assessment method according to the present disclosure.

With the text recitation quality assessment device and method according to the present disclosure, a prosody score and an acoustic score can be obtained for each character in the text, and the recitation quality of the text can be evaluated based on these per-character scores, which makes the evaluation result more accurate and realistic.

Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.

Brief description of the drawings

The drawings described herein are for illustrative purposes only, showing selected embodiments rather than all possible implementations, and are not intended to limit the scope of the present disclosure. In the drawings:

FIG. 1 is a block diagram illustrating the structure of a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a text recitation characteristic curve;

FIG. 3 is a schematic diagram illustrating the probability that recitation is present and the energy of the recitation;

FIG. 4 is a schematic diagram illustrating the fundamental frequency curves of characters in a text;

FIG. 5 is a block diagram illustrating the structure of an evaluation unit in a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 6 is a block diagram illustrating the structure of a start-stop position determination unit in a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 7 is a block diagram illustrating the structure of a determination unit in the start-stop position determination unit of a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 8 is a block diagram illustrating another structure of the determination unit in the start-stop position determination unit of a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 9 is a block diagram illustrating another structure of the determination unit in the start-stop position determination unit of a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 10 is a block diagram illustrating another structure of the determination unit in the start-stop position determination unit of a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 11 is a block diagram illustrating the structure of a prosody score acquisition unit in a text recitation quality assessment device according to an embodiment of the present disclosure;

FIG. 12 is a flowchart of a text recitation quality assessment method according to an embodiment of the present disclosure; and

FIG. 13 is a block diagram of an exemplary structure of a general-purpose personal computer in which the text recitation quality assessment device and method according to an embodiment of the present disclosure can be implemented.

While the present disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the description of specific embodiments herein is not intended to limit the present disclosure to the particular forms disclosed; on the contrary, the present disclosure is intended to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present disclosure. Note that corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.

Detailed description

Examples of the present disclosure will now be described more fully with reference to the accompanying drawings. The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application or its uses.

Example embodiments are provided so that this disclosure will be thorough and will fully convey its scope to those skilled in the art. Numerous specific details, such as examples of specific components, devices and methods, are set forth to provide a thorough understanding of embodiments of the present disclosure. It will be apparent to those skilled in the art that the specific details need not be employed, that the example embodiments may be embodied in many different forms, and that neither should be construed as limiting the scope of the disclosure. In some example embodiments, well-known processes, well-known structures and well-known technologies are not described in detail.

The present disclosure proposes a technique for automatically evaluating recited speech. First, a characteristic curve of the speech recited by, for example, a student can be extracted. Based on this characteristic curve, the recited speech can be segmented into pronunciation characteristic curves, one for each character. The characteristic curve of each character can then be compared with the characteristic curve of the same character in the standard pronunciation, giving a pronunciation score for each character and, from these, a score for the text. Because this score depends only on prosodic features, different texts may produce the same characteristic curve. To overcome this problem, a previously obtained acoustic model can be used to compute, from the text, an acoustic score for the recited utterance. Finally, the prosody score and the acoustic score are combined to obtain the quality score for the recited speech.

FIG. 1 illustrates the structure of a text recitation quality assessment device 100 according to an embodiment of the present disclosure. As shown in FIG. 1, the text recitation quality assessment device 100 may include an acquisition unit 110, a segmentation unit 120, a prosody score acquisition unit 130, an acoustic score acquisition unit 140 and an evaluation unit 150.

The acquisition unit 110 may acquire a text recitation characteristic curve generated by reciting a text.

FIG. 2 shows an example of a text recitation characteristic curve. In FIG. 2, the abscissa represents time in seconds and the ordinate represents the amplitude of the speech waveform. To facilitate comparison and subsequent scoring, according to a preferred embodiment of the present disclosure, the device 100 may further include a compression unit (not shown) that compresses the amplitude of the text recitation characteristic curve. In FIG. 2, the amplitude of the text recitation characteristic curve is compressed into the range [-1, 1].
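By way of non-limiting illustration, the amplitude compression into [-1, 1] might be sketched as follows. The disclosure does not state how the compression unit works; peak normalization is assumed here, and the function name `compress_amplitude` is hypothetical.

```python
import numpy as np

def compress_amplitude(waveform):
    """Scale a waveform so that all samples lie in [-1, 1].

    Assumption: simple peak normalization. The disclosure only states
    that the amplitude is compressed into [-1, 1], not how.
    """
    samples = np.asarray(waveform, dtype=float)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak == 0.0:
        # An empty or all-zero signal is already within range.
        return samples.copy()
    return samples / peak
```

With this choice, two recordings made at different loudness levels end up on a common amplitude scale before any comparison is made.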

The segmentation unit 120 may segment the text recitation characteristic curve to obtain a character recitation characteristic curve for each character in the text.

The prosody score acquisition unit 130 may then compare the character recitation characteristic curve of each character with the standard character characteristic curve of that character to obtain a prosody score for each character.

Further, the acoustic score acquisition unit 140 may determine the recitation accuracy of each character from its character recitation characteristic curve to obtain an acoustic score for each character.

Finally, the evaluation unit 150 may evaluate the recitation quality of the text based on the prosody score and the acoustic score of each character.

In the text recitation quality assessment device 100 according to an embodiment of the present disclosure, the prosody score acquisition unit 130 may acquire a prosody score for each character, and the acoustic score acquisition unit 140 may acquire an acoustic score for each character. Based on the prosody score and the acoustic score of each character, the evaluation unit 150 can evaluate the recitation quality of the text. Compared with evaluating recitation quality from the continuous prosodic and acoustic characteristics of the entire recited sentence (or text), the technical solution of the present disclosure makes the evaluation result more accurate and realistic.

For a better understanding of the technical solution of the present disclosure, the text recitation quality assessment device is described in more detail below.

FIG. 5 shows an evaluation unit 500 in a text recitation quality assessment device according to an embodiment of the present disclosure. The evaluation unit 500 shown in FIG. 5 corresponds to the evaluation unit 150 shown in FIG. 1.

As shown in FIG. 5, the evaluation unit 500 may include a character recitation score acquisition unit 510, a text recitation score acquisition unit 520 and a quality evaluation unit 530.

The character recitation score acquisition unit 510 may combine the prosody score and the acoustic score of each character to obtain a recitation score for each character.

Further, the text recitation score acquisition unit 520 may combine the recitation scores of all characters contained in the text to obtain a total recitation score for the text.

The quality evaluation unit 530 may then evaluate the recitation quality of the text according to the total recitation score.

The evaluation unit 500 shown in FIG. 5 is only one way of implementing the function of evaluating the recitation quality of a text. Those skilled in the art will appreciate that other implementations can also accomplish this function. For example, it is equally possible to first combine the prosody scores of all characters contained in the text to obtain a total prosody score for the text, then combine the acoustic scores of all characters contained in the text to obtain a total acoustic score for the text, and finally combine the total prosody score and the total acoustic score to obtain the total recitation score of the text.
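The two orders of combination described above might be sketched as follows. The disclosure does not specify how scores are combined; a weighted sum per character and an arithmetic mean over characters are assumed here purely for illustration, and all function names and the `weight` parameter are hypothetical.

```python
def combine(prosody, acoustic, weight=0.5):
    """Assumed combination rule: weighted sum of the two scores."""
    return weight * prosody + (1.0 - weight) * acoustic

def mean(values):
    values = list(values)
    if not values:
        raise ValueError("at least one score is required")
    return sum(values) / len(values)

def total_score_per_character(prosody_scores, acoustic_scores, weight=0.5):
    """Order of FIG. 5: combine per character, then over the text."""
    if len(prosody_scores) != len(acoustic_scores):
        raise ValueError("one prosody and one acoustic score per character")
    per_character = [combine(p, a, weight)
                     for p, a in zip(prosody_scores, acoustic_scores)]
    return mean(per_character)

def total_score_per_type(prosody_scores, acoustic_scores, weight=0.5):
    """Alternative order: total prosody and total acoustic score first."""
    if len(prosody_scores) != len(acoustic_scores):
        raise ValueError("one prosody and one acoustic score per character")
    return combine(mean(prosody_scores), mean(acoustic_scores), weight)
```

Under these linear assumptions the two orders give the same total; they would differ only if a nonlinear combination rule (for example, taking the minimum of the two scores per character) were chosen instead.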

In addition, according to an embodiment of the present disclosure, as shown in FIG. 1, the segmentation unit 120 may include a start-stop position determination unit 121.

The start-stop position determination unit 121 may determine, from the text recitation characteristic curve, the start and end positions of each character of the text within that curve, so as to obtain the character recitation characteristic curve of each character in the text.

FIG. 6 shows a start-stop position determination unit 600 in a text recitation quality assessment device according to an embodiment of the present disclosure. The start-stop position determination unit 600 shown in FIG. 6 corresponds to the start-stop position determination unit 121 shown in FIG. 1.

As shown in FIG. 6, the start-stop position determination unit 600 may include a calculation unit 610, a fundamental frequency curve acquisition unit 620 and a determination unit 630.

The calculation unit 610 may calculate, from the text recitation characteristic curve, the probability that recitation is present in each frame and the energy of the recitation.

FIG. 3 shows curves of the probability that recitation is present and of the recitation energy. For example, a spectrum may first be extracted from the waveform shown in FIG. 2 and converted to a logarithmic scale to compress its amplitude. The log-scale spectrum may then be filtered, after which the probability that someone is speaking and the energy can be computed for each frame. In FIG. 3, the solid line represents the probability that a person is speaking, and the dashed line represents the energy.
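A minimal sketch of such a per-frame computation is given below. Several details are assumptions not found in the disclosure: the frame length and hop (400 and 160 samples), the Hann window, a 3-frame moving average standing in for the unspecified filter, and a logistic function of the log energy standing in for the unspecified speech-probability model. The function name is hypothetical.

```python
import numpy as np

def frame_energy_and_probability(waveform, frame_len=400, hop=160):
    """Return (log_energy, speech_probability), one value per frame."""
    samples = np.asarray(waveform, dtype=float)
    if len(samples) < frame_len:
        raise ValueError("waveform is shorter than one frame")
    n_frames = 1 + (len(samples) - frame_len) // hop
    window = np.hanning(frame_len)
    log_energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = samples[i * hop : i * hop + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        # Log scale compresses the dynamic range; the constant avoids log(0).
        log_energy[i] = np.log(np.sum(spectrum ** 2) + 1e-10)
    # Assumed filter: 3-frame moving average with edge padding.
    padded = np.pad(log_energy, 1, mode="edge")
    log_energy = np.convolve(padded, np.ones(3) / 3.0, mode="valid")
    # Assumed probability model: logistic curve centred midway between
    # the lowest and highest frame energy of this recording.
    lo, hi = log_energy.min(), log_energy.max()
    scale = (hi - lo) / 10.0 if hi > lo else 1.0
    probability = 1.0 / (1.0 + np.exp(-(log_energy - (lo + hi) / 2.0) / scale))
    return log_energy, probability
```

A trained voice activity detector would normally replace the logistic stand-in; the sketch only shows how a probability curve and an energy curve of equal length can be derived frame by frame from the same waveform.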

The fundamental frequency curve acquisition unit 620 may obtain the fundamental frequency curve of each character in the text from the text recitation characteristic curve.

FIG. 4 is a schematic diagram illustrating the fundamental frequency curves of characters in a text. A dynamic programming algorithm can be used to extract the fundamental frequency features of the waveform and thereby obtain the fundamental frequency curve of each individual character. Dynamic programming algorithms are well known in the art and are not described in detail in this disclosure.

Based on the probability that recitation is present in each frame, the energy of the recitation and the fundamental frequency curve of each character, the determination unit 630 can determine the start and end positions of each character of the text in the text recitation characteristic curve.

FIG. 7 shows a determination unit 700 in the start-stop position determination unit of a text recitation quality assessment device according to an embodiment of the present disclosure. The determination unit 700 shown in FIG. 7 corresponds to the determination unit 630 shown in FIG. 6.

As shown in FIG. 7, the determination unit 700 may include a fundamental-frequency segment number determination unit 710 and a determination unit (first determination unit) 720.

The fundamental-frequency segment number determination unit 710 may determine the number of fundamental-frequency segments from the fundamental frequency curve of each character. Here, a fundamental-frequency segment is a continuous, uninterrupted stretch of fundamental frequency in the fundamental frequency curve of each character. Several such segments can be seen in FIG. 4, for example.
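Counting fundamental-frequency segments might be sketched as follows, assuming the fundamental frequency curve is available as one value per frame with 0 marking frames where no fundamental frequency was detected. That representation and the function name are assumptions made for this example.

```python
import numpy as np

def fundamental_frequency_segments(f0):
    """Return [(start, end), ...] for each uninterrupted run of f0 > 0.

    `end` is exclusive, so f0[start:end] is one segment.
    """
    voiced = np.asarray(f0, dtype=float) > 0
    # Pad with False so that runs touching either edge are closed.
    padded = np.concatenate(([False], voiced, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return list(zip(starts.tolist(), ends.tolist()))

# Two uninterrupted runs of fundamental frequency, hence two segments.
segments = fundamental_frequency_segments(
    [0, 0, 210, 215, 220, 0, 0, 180, 175, 0])
```

The number of segments, `len(segments)`, is the quantity that is then related to the number of characters in the text.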

Then, according to the relationship between the number of fundamental-frequency segments and the number of characters in the text, the determination unit 720 can determine the start and end positions of each character of the text in the text recitation characteristic curve.

FIG. 8 shows a determination unit 800 in the start-stop position determination unit of a text recitation quality assessment device according to another embodiment of the present disclosure. The determination unit 800 shown in FIG. 8 corresponds to the determination unit 630 shown in FIG. 6.

As shown in FIG. 8, the determination unit 800 may include a probability curve determination unit 810, an energy curve determination unit 820, a division unit 830 and a determination unit (second determination unit) 840.

The probability curve determination unit 810 may determine a probability curve (shown by the solid line in FIG. 3) from the probability that recitation is present in each frame.

The energy curve determination unit 820 may determine an energy curve (shown by the dashed line in FIG. 3) from the energy of the recitation.

The division unit 830 may then divide the text recitation characteristic curve into curve segments at the valley points of the probability curve or of the energy curve.

After that, according to the relationship between the number of curve segments and the number of characters in the text, the determination unit 840 can determine the start and end positions of each character of the text in the text recitation characteristic curve.
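Division at valley points might be sketched as follows. A valley is taken here to be an interior local minimum of the per-frame curve; this definition and the function names are assumptions, and the disclosure does not state how the number of segments is reconciled with the number of characters when the two differ.

```python
import numpy as np

def valley_points(curve):
    """Indices of interior local minima of a per-frame curve."""
    c = np.asarray(curve, dtype=float)
    interior = np.arange(1, len(c) - 1)
    # Strictly below the left neighbour and not above the right one,
    # so a flat-bottomed valley is reported once, at its first frame.
    is_valley = (c[interior] < c[interior - 1]) & (c[interior] <= c[interior + 1])
    return interior[is_valley].tolist()

def split_at_valleys(curve):
    """Return [(start, end), ...] frame ranges, with `end` exclusive."""
    bounds = [0] + valley_points(curve) + [len(curve)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

energy = [0.1, 0.8, 0.9, 0.2, 0.7, 0.9, 0.3, 0.8, 0.1]
curve_segments = split_at_valleys(energy)
```

In this example the curve has two valleys and therefore yields three curve segments; if the text has three characters, each segment can be taken directly as the start and end position of one character.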

FIG. 9 shows a determination unit 900 in the start-stop position determination unit of a text recitation quality assessment device according to another embodiment of the present disclosure. The determination unit 900 shown in FIG. 9 corresponds to the determination unit 630 shown in FIG. 6.

As shown in FIG. 9, the determination unit 900 may include a probability curve determination unit 910, an energy curve determination unit 920, an energy probability curve determination unit 930, a division unit 940 and a determination unit (third determination unit) 950.

As before, the probability curve determination unit 910 may determine a probability curve from the probability that recitation is present in each frame, and the energy curve determination unit 920 may determine an energy curve from the energy of the recitation.

Further, the energy probability curve determination unit 930 may determine an energy probability curve from the probability curve and the energy curve. Here, the energy probability curve carries the characteristics of both the probability curve and the energy curve.
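One way of forming a curve that carries the characteristics of both inputs is sketched below. The disclosure does not give the combination rule; the product of the probability curve and the min-max normalized energy curve is an assumption chosen for illustration, and the function name is hypothetical.

```python
import numpy as np

def energy_probability_curve(probability, energy):
    """Combine per-frame probability and energy into a single curve.

    Assumption: the energy is rescaled to [0, 1] and multiplied by the
    probability, so a frame scores high only if both inputs are high.
    """
    p = np.asarray(probability, dtype=float)
    e = np.asarray(energy, dtype=float)
    if p.shape != e.shape:
        raise ValueError("curves must have one value per frame each")
    span = e.max() - e.min() if e.size else 0.0
    # A constant energy curve carries no information; fall back to ones.
    e_norm = (e - e.min()) / span if span > 0 else np.ones_like(e)
    return p * e_norm
```

A valley in either input produces a valley in the product, which is the property the subsequent division at valley points relies on.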

The division unit 940 may then divide the text recitation characteristic curve into curve segments at the valley points of the energy probability curve.

After that, according to the relationship between the number of curve segments and the number of characters in the text, the determination unit 950 can determine the start and end positions of each character of the text in the text recitation characteristic curve.

FIG. 10 shows a determination unit 1000 in the start-stop position determination unit of a text recitation quality assessment device according to another embodiment of the present disclosure. The determination unit 1000 shown in FIG. 10 corresponds to the determination unit 630 shown in FIG. 6.

As shown in FIG. 10, the determination unit 1000 may include a probability curve determination unit 1010, an energy curve determination unit 1020, a fundamental frequency energy probability curve determination unit 1030, a division unit 1040 and a determination unit (fourth determination unit) 1050.

As before, the probability curve determination unit 1010 may determine a probability curve from the probability that recitation is present in each frame, and the energy curve determination unit 1020 may determine an energy curve from the energy of the recitation.

Further, the fundamental frequency energy probability curve determination unit 1030 may determine a fundamental frequency energy probability curve from the probability curve, the energy curve and the fundamental frequency curve of each character. Here, the fundamental frequency energy probability curve carries the characteristics of the character fundamental frequency curves, the probability curve and the energy curve.

The division unit 1040 may then divide the text recitation characteristic curve into curve segments at the valley points of the fundamental frequency energy probability curve.

After that, according to the relationship between the number of curve segments and the number of characters in the text, the determination unit 1050 can determine the start and end positions of each character of the text in the text recitation characteristic curve.

FIG. 11 shows a prosody score acquisition unit 1100 in a text recitation quality assessment device according to an embodiment of the present disclosure. The prosody score acquisition unit 1100 shown in FIG. 11 corresponds to the prosody score acquisition unit 130 shown in FIG. 1.

As shown in FIG. 11, the prosody score acquisition unit 1100 may include a conversion unit (first conversion unit) 1110, a conversion unit (second conversion unit) 1120 and a comparison unit 1130.

The conversion unit 1110 may convert the character recitation characteristic curve of each character into a recited fundamental frequency sequence.

The conversion unit 1120 may convert the standard character characteristic curve of each character into a standard fundamental frequency sequence.

After that, the comparison unit 1130 may compare the recited fundamental frequency sequence with the standard fundamental frequency sequence to obtain the prosody score of each character.
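The disclosure does not say how the two fundamental frequency sequences are compared. As one possibility, the sketch below uses dynamic time warping (DTW), a technique substituted here for illustration because the recited and standard sequences generally differ in length. The mean normalization, the mapping of distance to a score in (0, 1], and both function names are likewise assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Length-normalized dynamic time warping distance of two sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Cheapest way to reach (i, j): stretch a, stretch b, or advance both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def prosody_score(recited_f0, standard_f0):
    """Score in (0, 1]; 1 means the contours match after time warping."""
    r = np.asarray(recited_f0, dtype=float)
    s = np.asarray(standard_f0, dtype=float)
    # Divide by the mean so that speakers with different pitch ranges
    # are compared on contour shape rather than absolute pitch.
    return 1.0 / (1.0 + dtw_distance(r / r.mean(), s / s.mean()))
```

With this sketch, a rising contour recited at half speed still scores 1 against a rising standard contour, whereas a falling contour recited for the same character scores lower.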

需要说明的是，将特征曲线转换成基频序列的具体方式在本领域中是众所周知的，并且本公开对此并没有特殊限制。It should be noted that the specific manner of converting the characteristic curve into the fundamental frequency sequence is well known in the art, and the present disclosure is not particularly limited thereto.

如从图4可以看到的那样，不同发音的字的基频曲线是不同的。结合基频特征以及其他语音特征，例如MFCC(Mel-Frequency CepstralCoefficient,梅尔频率倒谱系数)，对比标准朗读以及小学生朗读中的单字发音，可以给出单字朗读的准确度。在此基础上，进而可以给出单句以及整首诗的准确率打分。As can be seen from Figure 4, the fundamental frequency curves of words with different pronunciations are different. Combined with fundamental frequency features and other phonetic features, such as MFCC (Mel-Frequency Cepstral Coefficient, Mel-Frequency Cepstral Coefficient), comparing the pronunciation of single characters in standard reading and primary school students' reading, the accuracy of single-word reading can be given. On this basis, the accuracy rate of a single sentence and the entire poem can be scored.

虽然基于字的背诵语音打分可以得到句子的打分，并进一步得到背诵语音的得分，但由于此分数只与韵律特征相关，因此可能存在不同文本导致相同特征曲线的情况。为了克服这个问题，可以使用预先训练好的声学模型，根据所述预分段文本得到声学得分。然后，结合韵律得分和声学得分，可以获得此背诵的最终语音评估打分。Although the score of the sentence based on the reciting speech can be obtained, and the score of the reciting speech can be further obtained, but since this score is only related to the prosodic feature, there may be cases where different texts lead to the same characteristic curve. To overcome this problem, a pre-trained acoustic model can be used to obtain an acoustic score based on the pre-segmented text. Then, combining the prosody score and the acoustic score, a final phonological assessment score for this recitation can be obtained.

具体地，如图1所示的声学得分获取单元140可以包括建模单元(未示出)和准确度确定单元(未示出)。Specifically, the acoustic score acquisition unit 140 as shown in FIG. 1 may include a modeling unit (not shown) and an accuracy determination unit (not shown).

建模单元可以建立隐马尔可夫模型，以将每个字的字背诵特征曲线转换成特征序列。The modeling unit can establish a hidden Markov model to convert the character recitation characteristic curve of each character into a feature sequence.

进一步，准确度确定单元可以根据特征序列和隐马尔可夫模型确定每个字的背诵准确度，以获取每个字的声学得分。Further, the accuracy determining unit may determine the reciting accuracy of each character according to the feature sequence and the hidden Markov model, so as to obtain the acoustic score of each character.

下面是有关声学得分的计算细节。此处的问题变为：在对应给定文字的隐马尔可夫模型上，得到前面提取出的特征曲线的概率。这里需要计算所有可能的状态序列。因此观测到长度为L的特征曲线The details of the acoustic score calculation are as follows. The problem here becomes: on the hidden Markov model corresponding to the given text, obtain the probability of the previously extracted characteristic curve. All possible state sequences need to be taken into account. The probability of observing a characteristic curve of length L

Y＝y(0)，y(1)，...，y(L-1)Y = y(0), y(1), ..., y(L-1)

的概率，也就是声学得分由下式给定：that is, the acoustic score, is therefore given by:

$P(Y) = \sum_{X} P(Y \mid X)\, P(X),$

其中求和是针对所有可能的状态序列：where the summation is over all possible state sequences:

X＝x(0)，x(1)，...，x(L-1).X = x(0), x(1), ..., x(L-1).

具体的计算方法可以通过动态规划原则进行求解。隐马尔可夫模型在本领域中是众所周知的，本公开对此不再予以详述。The calculation can be carried out using the principle of dynamic programming. Hidden Markov models are well known in the art and are not described in detail in this disclosure.
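
The sum over all state sequences can be evaluated without enumerating them. The following is a minimal sketch of the standard forward algorithm, a dynamic programming recursion, assuming a discrete-observation hidden Markov model. The function name and the parameters `init`, `trans` and `emit` are illustrative assumptions; the disclosure does not specify the model topology or the emission distributions.

```python
def forward_probability(obs, init, trans, emit):
    """Return P(Y) = sum over X of P(Y | X) P(X) for a discrete HMM.

    obs   : observation symbol indices y(0), ..., y(L-1), with L >= 1
    init  : init[i]     = P(x(0) = i)
    trans : trans[i][j] = P(x(t+1) = j | x(t) = i)
    emit  : emit[i][k]  = P(y(t) = k | x(t) = i)
    All names are hypothetical; they do not come from the disclosure.
    """
    n = len(init)
    # alpha[i] holds P(y(0), ..., y(t), x(t) = i) for the current t.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for y in obs[1:]:
        # Extend every partial path by one step and re-weight by the emission.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][y]
            for j in range(n)
        ]
    # Marginalize over the final state.
    return sum(alpha)
```

The recursion costs on the order of L times the square of the number of states, instead of growing exponentially with L. In practice a scaled or log-domain variant is usually preferred for long sequences to avoid numerical underflow.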

下面结合图12来描述根据本公开的实施例的文本背诵质量评估方法。根据本公开的方法可以基于由字组成的文本以及文本的标准读音来评估对文本进行背诵的质量。The text reciting quality assessment method according to an embodiment of the present disclosure will be described below with reference to FIG. 12 . The method according to the present disclosure can evaluate the quality of reciting a text based on the text composed of characters and the standard pronunciation of the text.

如图12所示，根据本公开的实施例的文本背诵质量评估方法开始于步骤S110。在步骤S110中，获取通过背诵而产生的文本背诵特征曲线。As shown in FIG. 12, the text recitation quality assessment method according to the embodiment of the present disclosure starts at step S110. In step S110, a text recitation characteristic curve generated by the recitation is acquired.

接下来，在步骤S120中，对文本背诵特征曲线进行分割，以获取文本中每个字的字背诵特征曲线。Next, in step S120, the text reciting characteristic curve is segmented to obtain the character reciting characteristic curve of each character in the text.

接下来，在步骤S130中，将每个字的字背诵特征曲线与每个字的字标准特征曲线进行比较，以获取每个字的韵律得分。Next, in step S130, the character recitation characteristic curve of each character is compared with the character standard characteristic curve of each character to obtain the prosody score of each character.

接下来，在步骤S140中，根据每个字的字背诵特征曲线确定每个字的背诵准确度，以获取每个字的声学得分。Next, in step S140, the reciting accuracy of each character is determined according to the character reciting characteristic curve of each character, so as to obtain the acoustic score of each character.

接下来，在步骤S150中，基于每个字的韵律得分和声学得分对文本的背诵质量进行评估。在这之后，过程结束。Next, in step S150, the recitation quality of the text is evaluated based on the prosody score and the acoustic score of each character. After this, the process ends.

根据本公开的实施例，在步骤S150中基于每个字的韵律得分和声学得分对文本的背诵质量进行评估可以包括：合并每个字的韵律得分和每个字的声学得分，以获取每个字的背诵得分；合并文本中包含的所有字的背诵得分，以获取所述文本的背诵总得分；以及根据背诵总得分对背诵的质量进行评估。According to an embodiment of the present disclosure, evaluating the recitation quality of the text based on the prosody score and the acoustic score of each character in step S150 may include: combining the prosody score and the acoustic score of each character to obtain a recitation score of each character; combining the recitation scores of all characters contained in the text to obtain a total recitation score of the text; and evaluating the quality of the recitation according to the total recitation score.
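
The two combining steps can be illustrated as follows. This is only a sketch: the disclosure does not state how the scores are combined, so the weighted sum per character, the arithmetic mean over characters, and the `weight` parameter are all assumptions made for illustration.

```python
def recitation_scores(prosody, acoustic, weight=0.5):
    """Combine per-character scores; return (per_character, total).

    prosody, acoustic : one score per character, in the same order
    weight            : hypothetical share given to the prosody score
    """
    if len(prosody) != len(acoustic):
        raise ValueError("expected one prosody and one acoustic score per character")
    # Assumed combination rule: weighted sum of the two scores.
    per_character = [
        weight * p + (1.0 - weight) * a for p, a in zip(prosody, acoustic)
    ]
    # Assumed aggregation rule: mean over all characters of the text.
    total = sum(per_character) / len(per_character) if per_character else 0.0
    return per_character, total
```

A threshold on the total, or a mapping to grades, could then serve as the final quality assessment; the choice is left open here because the text does not specify it.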

根据本公开的实施例，在步骤S120中对文本背诵特征曲线进行分割以获取文本中每个字的字背诵特征曲线可以包括：根据文本背诵特征曲线确定文本中的每个字在文本背诵特征曲线中的起止位置，以获取文本中每个字的字背诵特征曲线。According to an embodiment of the present disclosure, segmenting the text recitation characteristic curve in step S120 to obtain the character recitation characteristic curve of each character in the text may include: determining, according to the text recitation characteristic curve, the start and end positions of each character of the text in the text recitation characteristic curve, so as to obtain the character recitation characteristic curve of each character in the text.

根据本公开的实施例，根据文本背诵特征曲线确定文本中的每个字在文本背诵特征曲线中的起止位置可以包括：根据文本背诵特征曲线计算每一帧中存在进行背诵的概率和背诵的能量；根据文本背诵特征曲线获取文本中每个字的基频曲线；以及根据每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定文本中的每个字在文本背诵特征曲线中的起止位置。According to an embodiment of the present disclosure, determining the start and end positions of each character of the text in the text recitation characteristic curve according to the text recitation characteristic curve may include: calculating, according to the text recitation characteristic curve, the probability that recitation is present in each frame and the energy of the recitation; obtaining the fundamental frequency curve of each character in the text according to the text recitation characteristic curve; and determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character.
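
As an illustration of the per-frame energy mentioned above, the following sketch computes a short-time energy curve. It assumes the characteristic curve is available as a list of amplitude samples; the frame length and hop size are hypothetical parameters, and the per-frame recitation probability is not covered because this section does not say how it is estimated.

```python
def frame_energies(samples, frame_len, hop):
    """Return the short-time energy of each full frame.

    samples   : amplitude samples of the recitation (assumed representation)
    frame_len : number of samples per frame (hypothetical parameter)
    hop       : step between frame starts, must be positive
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    # Energy of a frame = sum of squared samples in that frame.
    return [
        sum(s * s for s in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

Trailing samples that do not fill a complete frame are ignored in this sketch.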

根据本公开的实施例，根据每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定文本中的每个字在文本背诵特征曲线中的起止位置可以包括：根据每个字的基频曲线确定基频段的数目，其中基频段是每个字的基频曲线中连续不间断的基频的片段；以及根据基频段的数目与文本的字数的关系来确定文本中的每个字在文本背诵特征曲线中的起止位置。According to an embodiment of the present disclosure, determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character may include: determining the number of fundamental frequency segments according to the fundamental frequency curve of each character, where a fundamental frequency segment is a continuous, uninterrupted run of fundamental frequency in that curve; and determining the start and end positions of each character of the text in the text recitation characteristic curve according to the relationship between the number of fundamental frequency segments and the number of characters in the text.
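
Counting the fundamental frequency segments can be sketched as below. The sketch assumes the fundamental frequency curve is given as one value per frame, with 0 marking frames where no fundamental frequency was detected; that encoding is an assumption, not something stated in the disclosure.

```python
def f0_segments(f0):
    """Return (start, end) frame index pairs of continuous voiced runs.

    f0  : fundamental frequency per frame, 0 (or less) where none was detected
    end : exclusive index of the run
    """
    segments = []
    start = None
    for i, value in enumerate(f0):
        if value > 0 and start is None:
            start = i                      # a new run begins
        elif value <= 0 and start is not None:
            segments.append((start, i))    # the current run ends
            start = None
    if start is not None:
        segments.append((start, len(f0)))  # run reaching the last frame
    return segments
```

The number of segments is `len(f0_segments(f0))`. Comparing that count with the number of characters in the text then drives the choice of start and end positions; for example, equal counts would suggest one segment per character, though that rule is given here only as an illustration.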

根据本公开的实施例，根据每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定文本中的每个字在文本背诵特征曲线中的起止位置还可以包括：根据每一帧中存在进行背诵的概率确定概率曲线；根据背诵的能量确定能量曲线；根据概率曲线或者能量曲线中的谷点将文本背诵特征曲线划分为曲线段；以及根据曲线段的数目与文本的字数的关系来确定文本中的每个字在文本背诵特征曲线中的起止位置。According to an embodiment of the present disclosure, determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character may also include: determining a probability curve according to the probability that recitation is present in each frame; determining an energy curve according to the energy of the recitation; dividing the text recitation characteristic curve into curve segments at the valley points of the probability curve or of the energy curve; and determining the start and end positions of each character of the text in the text recitation characteristic curve according to the relationship between the number of curve segments and the number of characters in the text.
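
Splitting at valley points can be sketched as follows, assuming the probability curve or energy curve is a list with one value per frame. Treating a valley as a local minimum of the raw curve is an assumption; a practical system would likely smooth the curve first, which is omitted here.

```python
def valley_points(curve):
    """Return indices of local minima; a flat valley is reported once."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i] < curve[i - 1] and curve[i] <= curve[i + 1]
    ]


def split_at_valleys(curve):
    """Return (start, end) frame ranges, end exclusive, cut at each valley."""
    bounds = [0] + valley_points(curve) + [len(curve)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

The resulting number of curve segments, `len(split_at_valleys(curve))`, is the quantity that is compared with the number of characters in the text.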

根据本公开的实施例，根据每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定文本中的每个字在文本背诵特征曲线中的起止位置还可以包括：根据每一帧中存在进行背诵的概率确定概率曲线；根据背诵的能量确定能量曲线；根据概率曲线和能量曲线确定能量概率曲线；根据能量概率曲线中的谷点将文本背诵特征曲线划分为曲线段；以及根据曲线段的数目与文本的字数的关系来确定文本中的每个字在文本背诵特征曲线中的起止位置。According to an embodiment of the present disclosure, determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character may also include: determining a probability curve according to the probability that recitation is present in each frame; determining an energy curve according to the energy of the recitation; determining an energy probability curve according to the probability curve and the energy curve; dividing the text recitation characteristic curve into curve segments at the valley points of the energy probability curve; and determining the start and end positions of each character of the text in the text recitation characteristic curve according to the relationship between the number of curve segments and the number of characters in the text.
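
One possible way to form an energy probability curve is shown below. The disclosure does not state how the two curves are merged, so the rule used here, a frame-wise product of the peak-normalized energy and the probability, is purely an assumption chosen for illustration.

```python
def energy_probability_curve(energy, probability):
    """Merge an energy curve and a probability curve frame by frame.

    Assumed rule: (energy / max energy) * probability for each frame.
    """
    if len(energy) != len(probability):
        raise ValueError("both curves must have one value per frame")
    peak = max(energy, default=0.0)
    if peak <= 0:
        # Silent input: no frame carries recitation energy.
        return [0.0] * len(energy)
    return [(e / peak) * p for e, p in zip(energy, probability)]
```

Under this rule a frame scores high only when it is both energetic and likely to contain recitation, so the valleys of the merged curve tend to fall where either cue drops.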

根据本公开的实施例，根据每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定文本中的每个字在文本背诵特征曲线中的起止位置还可以包括：根据每一帧中存在进行背诵的概率确定概率曲线；根据背诵的能量确定能量曲线；根据概率曲线、能量曲线和每个字的基频曲线确定基频能量概率曲线；根据基频能量概率曲线中的谷点将文本背诵特征曲线划分为曲线段；以及根据曲线段的数目与文本的字数的关系来确定文本中的每个字在文本背诵特征曲线中的起止位置。According to an embodiment of the present disclosure, determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character may also include: determining a probability curve according to the probability that recitation is present in each frame; determining an energy curve according to the energy of the recitation; determining a fundamental frequency energy probability curve according to the probability curve, the energy curve and the fundamental frequency curve of each character; dividing the text recitation characteristic curve into curve segments at the valley points of the fundamental frequency energy probability curve; and determining the start and end positions of each character of the text in the text recitation characteristic curve according to the relationship between the number of curve segments and the number of characters in the text.

根据本公开的实施例，将每个字的字背诵特征曲线与每个字的字标准特征曲线进行比较以获取每个字的韵律得分可以包括：将每个字的字背诵特征曲线转换成背诵基频序列；将每个字的字标准特征曲线转换成标准基频序列；将背诵基频序列与标准基频序列进行比较，以获取每个字的韵律得分。According to an embodiment of the present disclosure, comparing the character recitation characteristic curve of each character with the character standard characteristic curve of each character to obtain the prosody score of each character may include: converting the character recitation characteristic curve of each character into a recitation fundamental frequency sequence; converting the character standard characteristic curve of each character into a standard fundamental frequency sequence; and comparing the recitation fundamental frequency sequence with the standard fundamental frequency sequence to obtain the prosody score of each character.
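
The comparison of the two fundamental frequency sequences could be sketched as follows. This section does not name a comparison measure, so dynamic time warping (DTW) on mean-normalized sequences is swapped in here as one plausible choice; the mapping from distance to a score in (0, 1] is likewise an assumption. Both sequences are assumed to be non-empty and to contain only positive (voiced) values.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion and match from the previous cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def prosody_score(recited_f0, standard_f0):
    """Return a score in (0, 1]; 1.0 means identical normalized contours."""

    def normalize(seq):
        # Divide by the mean so that speakers with different pitch
        # registers (for example a child and an adult) stay comparable.
        mean = sum(seq) / len(seq)
        return [v / mean for v in seq]

    d = dtw_distance(normalize(recited_f0), normalize(standard_f0))
    # Hypothetical mapping: length-normalized distance squashed into (0, 1].
    return 1.0 / (1.0 + d / max(len(recited_f0), len(standard_f0)))
```

DTW tolerates differences in speaking rate between the recitation and the standard reading, which is why it is used in this sketch instead of a frame-by-frame difference.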

根据本公开的实施例，根据每个字的字背诵特征曲线确定每个字的背诵准确度以获取每个字的声学得分可以包括：建立隐马尔可夫模型，以将每个字的字背诵特征曲线转换成特征序列；以及根据特征序列和隐马尔可夫模型确定每个字的背诵准确度，以获取每个字的声学得分。According to an embodiment of the present disclosure, determining the recitation accuracy of each character according to the character recitation characteristic curve of each character to obtain the acoustic score of each character may include: establishing a hidden Markov model to convert the character recitation characteristic curve of each character into a feature sequence; and determining the recitation accuracy of each character according to the feature sequence and the hidden Markov model, so as to obtain the acoustic score of each character.

根据本公开的实施例，在对文本背诵特征曲线进行分割之前，该方法还可以包括：对文本背诵特征曲线进行幅值压缩。According to an embodiment of the present disclosure, before segmenting the text recitation characteristic curve, the method may further include: performing amplitude compression on the text recitation characteristic curve.
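
As an illustration of amplitude compression, the sketch below applies mu-law companding. This section does not say which compression is used, so mu-law is only an assumed example; samples are assumed to be normalized to the range [-1, 1], and `mu = 255` is the value commonly used in telephony.

```python
import math


def mu_law_compress(samples, mu=255.0):
    """Compress amplitudes in [-1, 1] with the mu-law characteristic.

    Small amplitudes are boosted relative to large ones, which reduces
    the dynamic range of the curve before segmentation.
    """
    scale = math.log1p(mu)
    return [
        math.copysign(math.log1p(mu * abs(s)) / scale, s) for s in samples
    ]
```

After compression, quiet and loud passages of the recitation occupy a more similar amplitude range, which can make valley points easier to detect with a single threshold.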

根据本公开的实施例的文本背诵质量评估方法的上述步骤的各种具体实施方式前面已经作过详细描述，在此不再重复说明。Various specific implementations of the above steps of the method for evaluating the quality of text reciting according to the embodiments of the present disclosure have been described in detail above, and will not be repeated here.

显然，根据本公开的文本背诵质量评估方法的各个操作过程可以以存储在各种机器可读的存储介质中的计算机可执行程序的方式实现。Apparently, each operation process of the text reciting quality assessment method according to the present disclosure can be implemented in the form of computer executable programs stored in various machine-readable storage media.

而且，本公开的目的也可以通过下述方式实现：将存储有上述可执行程序代码的存储介质直接或者间接地提供给系统或设备，并且该系统或设备中的计算机或者中央处理单元(CPU)读出并执行上述程序代码。此时，只要该系统或者设备具有执行程序的功能，则本公开的实施方式不局限于程序，并且该程序也可以是任意的形式，例如，目标程序、解释器执行的程序或者提供给操作系统的脚本程序等。Moreover, the object of the present disclosure can also be achieved in the following manner: a storage medium storing the above executable program code is provided directly or indirectly to a system or device, and a computer or central processing unit (CPU) in the system or device reads out and executes the program code. In this case, as long as the system or device has the function of executing a program, the embodiments of the present disclosure are not limited to a program, and the program may take any form, for example, an object program, a program executed by an interpreter, or a script program provided to an operating system.

上述这些机器可读存储介质包括但不限于：各种存储器和存储单元，半导体设备，磁盘单元例如光、磁和磁光盘，以及其它适于存储信息的介质等。The above machine-readable storage media include, but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical disks, magnetic disks and magneto-optical disks, and other media suitable for storing information.

另外，计算机通过连接到因特网上的相应网站，并且将依据本公开的计算机程序代码下载和安装到计算机中然后执行该程序，也可以实现本公开的技术方案。In addition, the technical solution of the present disclosure can also be realized by connecting the computer to a corresponding website on the Internet, downloading and installing the computer program code according to the present disclosure into the computer and then executing the program.

如图13所示，CPU 1301根据只读存储器(ROM)1302中存储的程序或从存储部分1308加载到随机存取存储器(RAM)1303的程序执行各种处理。在RAM 1303中，也根据需要存储当CPU 1301执行各种处理等等时所需的数据。CPU 1301、ROM 1302和RAM 1303经由总线1304彼此连接。输入/输出接口1305也连接到总线1304。As shown in FIG. 13, a CPU 1301 executes various processing according to programs stored in a read only memory (ROM) 1302 or programs loaded from a storage section 1308 into a random access memory (RAM) 1303. The RAM 1303 also stores, as necessary, data required when the CPU 1301 executes the various processes. The CPU 1301, the ROM 1302 and the RAM 1303 are connected to each other via a bus 1304. An input/output interface 1305 is also connected to the bus 1304.

下述部件连接到输入/输出接口1305：输入部分1306(包括键盘、鼠标等等)、输出部分1307(包括显示器，比如阴极射线管(CRT)、液晶显示器(LCD)等，以及扬声器等)、存储部分1308(包括硬盘等)、通信部分1309(包括网络接口卡比如LAN卡、调制解调器等)。通信部分1309经由网络比如因特网执行通信处理。根据需要，驱动器1310也可连接到输入/输出接口1305。可拆卸介质1311比如磁盘、光盘、磁光盘、半导体存储器等等根据需要被安装在驱动器1310上，使得从中读出的计算机程序根据需要被安装到存储部分1308中。The following components are connected to the input/output interface 1305: an input section 1306 (including a keyboard, a mouse, etc.), an output section 1307 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker, etc.), a storage section 1308 (including a hard disk, etc.), and a communication section 1309 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1309 performs communication processing via a network such as the Internet. A drive 1310 may also be connected to the input/output interface 1305 as needed. A removable medium 1311 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive 1310 as necessary, so that a computer program read therefrom is installed into the storage section 1308 as necessary.

在通过软件实现上述系列处理的情况下，从网络比如因特网或存储介质比如可拆卸介质1311安装构成软件的程序。In the case where the above series of processes is implemented by software, the programs constituting the software are installed from a network such as the Internet or from a storage medium such as the removable medium 1311.

本领域的技术人员应当理解，这种存储介质不局限于图13所示的其中存储有程序、与设备相分离地分发以向用户提供程序的可拆卸介质1311。可拆卸介质1311的例子包含磁盘(包含软盘(注册商标))、光盘(包含光盘只读存储器(CD-ROM)和数字通用盘(DVD))、磁光盘(包含迷你盘(MD)(注册商标))和半导体存储器。或者，存储介质可以是ROM 1302、存储部分1308中包含的硬盘等等，其中存有程序，并且与包含它们的设备一起被分发给用户。Those skilled in the art should understand that such a storage medium is not limited to the removable medium 1311 shown in FIG. 13, in which the program is stored and which is distributed separately from the device to provide the program to the user. Examples of the removable medium 1311 include magnetic disks (including floppy disks (registered trademark)), optical disks (including compact disk read only memory (CD-ROM) and digital versatile disks (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)), and semiconductor memories. Alternatively, the storage medium may be the ROM 1302, a hard disk contained in the storage section 1308, or the like, in which the programs are stored and which is distributed to users together with the device containing it.

在本公开的系统和方法中，显然，各部件或各步骤是可以分解和/或重新组合的。这些分解和/或重新组合应视为本公开的等效方案。并且，执行上述系列处理的步骤可以自然地按照说明的顺序按时间顺序执行，但是并不需要一定按照时间顺序执行。某些步骤可以并行或彼此独立地执行。In the systems and methods of the present disclosure, obviously, each component or each step can be decomposed and/or recombined. These decompositions and/or recombinations should be considered equivalents of the present disclosure. Also, the steps for performing the above series of processes can naturally be performed in chronological order in the order described, but need not necessarily be performed in chronological order. Certain steps may be performed in parallel or independently of each other.

以上虽然结合附图详细描述了本公开的实施例，但是应当明白，上面所描述的实施方式只是用于说明本公开，而并不构成对本公开的限制。对于本领域的技术人员来说，可以对上述实施方式作出各种修改和变更而没有背离本公开的实质和范围。因此，本公开的范围仅由所附的权利要求及其等效含义来限定。Although the embodiments of the present disclosure have been described in detail in conjunction with the accompanying drawings, it should be understood that the above-described embodiments are only used to illustrate the present disclosure, and are not intended to limit the present disclosure. Various modifications and changes can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present disclosure. Therefore, the scope of the present disclosure is limited only by the appended claims and their equivalents.

关于包括以上实施例的实施方式，还公开下述的附记：Regarding the implementation manner comprising the above embodiments, the following additional notes are also disclosed:

附记1.一种文本背诵质量评估装置，包括：Supplementary Note 1. A text recitation quality assessment device, comprising:

获取单元，用于获取通过背诵文本而产生的文本背诵特征曲线；An acquisition unit, configured to acquire a text recitation characteristic curve generated by reciting the text;

分割单元，用于对所述文本背诵特征曲线进行分割，以获取所述文本中每个字的字背诵特征曲线；A segmentation unit, configured to segment the text reciting characteristic curve, to obtain the character reciting characteristic curve of each character in the text;

韵律得分获取单元，用于将所述每个字的字背诵特征曲线与每个字的字标准特征曲线进行比较，以获取每个字的韵律得分；A prosody score acquisition unit, used to compare the character recitation characteristic curve of each character with the character standard characteristic curve of each character, to obtain the prosody score of each character;

声学得分获取单元，用于根据所述每个字的字背诵特征曲线确定所述每个字的背诵准确度，以获取每个字的声学得分；以及an acoustic score acquisition unit, configured to determine the recitation accuracy of each character according to the character recitation characteristic curve of each character, so as to obtain the acoustic score of each character; and

评估单元，用于基于每个字的韵律得分和声学得分对所述文本的背诵质量进行评估。An evaluation unit is configured to evaluate the recitation quality of the text based on the prosody score and the acoustic score of each character.

附记2.根据附记1所述的装置，其中，所述评估单元包括：Supplementary Note 2. The device according to Supplementary Note 1, wherein the evaluation unit comprises:

字背诵得分获取单元，用于合并所述每个字的韵律得分和所述每个字的声学得分，以获取每个字的背诵得分；The word recitation score acquisition unit is used to combine the prosody score of each word and the acoustic score of each word to obtain the recitation score of each word;

文本背诵得分获取单元，用于合并所述文本中包含的所有字的背诵得分，以获取所述文本的背诵总得分；以及A text recitation score acquisition unit, configured to combine the recitation scores of all words contained in the text to obtain the total recitation score of the text; and

质量评估单元，用于根据所述背诵总得分对所述文本的背诵质量进行评估。A quality evaluation unit, configured to evaluate the recitation quality of the text according to the total recitation score.

附记3.根据附记1所述的装置，其中，所述分割单元包括：Supplementary Note 3. The device according to Supplementary Note 1, wherein the segmentation unit comprises:

起止位置确定单元，用于根据所述文本背诵特征曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置，以获取所述文本中每个字的字背诵特征曲线。The start-stop position determining unit is configured to determine the start-stop position of each character in the text in the text-recite characteristic curve according to the text-recite characteristic curve, so as to obtain the character-recite characteristic curve of each character in the text.

附记4.根据附记3所述的装置，其中，所述起止位置确定单元包括：Supplementary Note 4. The device according to Supplementary Note 3, wherein the start and end position determining unit comprises:

计算单元，用于根据所述文本背诵特征曲线计算每一帧中存在进行背诵的概率和背诵的能量；A computing unit, configured to calculate the probability of reciting and the energy of reciting in each frame according to the text reciting characteristic curve;

基频曲线获取单元，用于根据所述文本背诵特征曲线获取所述文本中每个字的基频曲线；以及A fundamental frequency curve acquisition unit, configured to acquire the fundamental frequency curve of each word in the text according to the text reciting characteristic curve; and

确定单元，用于根据所述每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。The determining unit is configured to determine the start and end positions of each character in the text in the text reciting characteristic curve according to the probability of reciting and the energy of reciting in each frame and the fundamental frequency curve of each character.

附记5.根据附记4所述的装置，其中，所述确定单元包括：Supplementary Note 5. The device according to Supplementary Note 4, wherein the determining unit comprises:

基频段数目确定单元，用于根据所述每个字的基频曲线确定基频段的数目，其中所述基频段是所述每个字的基频曲线中连续不间断的基频的片段；以及a fundamental frequency segment number determining unit, configured to determine the number of fundamental frequency segments according to the fundamental frequency curve of each character, wherein a fundamental frequency segment is a continuous, uninterrupted run of fundamental frequency in the fundamental frequency curve of each character; and

第一确定单元，用于根据所述基频段的数目与所述文本的字数的关系来确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。a first determining unit, configured to determine the start and end positions of each character of the text in the text recitation characteristic curve according to the relationship between the number of fundamental frequency segments and the number of characters in the text.

附记6.根据附记4所述的装置，其中，所述确定单元包括：Supplementary Note 6. The device according to Supplementary Note 4, wherein the determining unit comprises:

概率曲线确定单元，用于根据所述每一帧中存在进行背诵的概率确定概率曲线；a probability curve determining unit, configured to determine a probability curve according to the probability of recitation in each frame;

能量曲线确定单元，用于根据所述背诵的能量确定能量曲线；an energy curve determining unit, configured to determine an energy curve according to the energy of the recitation;

划分单元，用于根据所述概率曲线或者所述能量曲线中的谷点将所述文本背诵特征曲线划分为曲线段；以及A division unit, configured to divide the text recitation characteristic curve into curve segments according to valley points in the probability curve or the energy curve; and

第二确定单元，用于根据所述曲线段的数目与所述文本的字数的关系来确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。The second determining unit is configured to determine the start and end positions of each word in the text in the text reciting characteristic curve according to the relationship between the number of curve segments and the number of words in the text.

附记7.根据附记4所述的装置，其中，所述确定单元包括：Supplementary Note 7. The device according to Supplementary Note 4, wherein the determining unit comprises:

能量概率曲线确定单元，用于根据所述概率曲线和所述能量曲线确定能量概率曲线；an energy probability curve determining unit, configured to determine an energy probability curve according to the probability curve and the energy curve;

划分单元，用于根据所述能量概率曲线中的谷点将所述文本背诵特征曲线划分为曲线段；以及a division unit, configured to divide the text recitation characteristic curve into curve segments according to valley points in the energy probability curve; and

第三确定单元，用于根据所述曲线段的数目与所述文本的字数的关系来确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。The third determining unit is configured to determine the start and end positions of each word in the text in the text reciting characteristic curve according to the relationship between the number of curve segments and the number of words in the text.

附记8.根据附记4所述的装置，其中，所述确定单元包括：Supplementary Note 8. The device according to Supplementary Note 4, wherein the determining unit comprises:

基频能量概率曲线确定单元，用于根据所述概率曲线、所述能量曲线和所述每个字的基频曲线确定基频能量概率曲线；A fundamental frequency energy probability curve determining unit, configured to determine a fundamental frequency energy probability curve according to the probability curve, the energy curve and the fundamental frequency curve of each word;

划分单元，用于根据所述基频能量概率曲线中的谷点将所述文本背诵特征曲线划分为曲线段；以及A division unit, configured to divide the text recitation characteristic curve into curve segments according to valley points in the fundamental frequency energy probability curve; and

第四确定单元，用于根据所述曲线段的数目与所述文本的字数的关系来确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。The fourth determining unit is configured to determine the start and end positions of each word in the text in the text reciting characteristic curve according to the relationship between the number of curve segments and the number of words in the text.

附记9.根据附记1所述的装置，其中，所述韵律得分获取单元包括：Supplementary Note 9. The device according to Supplementary Note 1, wherein the prosody score acquisition unit comprises:

第一转换单元，用于将所述每个字的字背诵特征曲线转换成背诵基频序列；The first conversion unit is used to convert the character recitation characteristic curve of each character into a recitation fundamental frequency sequence;

第二转换单元，用于将所述每个字的字标准特征曲线转换成标准基频序列；以及The second conversion unit is used to convert the word standard characteristic curve of each word into a standard fundamental frequency sequence; and

比较单元，用于将所述背诵基频序列与所述标准基频序列进行比较，以获取所述每个字的韵律得分。a comparing unit, configured to compare the recitation fundamental frequency sequence with the standard fundamental frequency sequence to obtain the prosody score of each character.

附记10.根据附记1所述的装置，其中，所述声学得分获取单元包括：Supplementary Note 10. The device according to Supplementary Note 1, wherein the acoustic score acquisition unit comprises:

建模单元，用于建立隐马尔可夫模型，以将所述每个字的字背诵特征曲线转换成特征序列；以及A modeling unit is used to set up a Hidden Markov Model to convert the character reciting characteristic curve of each character into a characteristic sequence; and

准确度确定单元，用于根据所述特征序列和所述隐马尔可夫模型确定所述每个字的背诵准确度，以获取所述每个字的声学得分。The accuracy determination unit is configured to determine the reciting accuracy of each character according to the feature sequence and the hidden Markov model, so as to obtain the acoustic score of each character.

附记11.根据附记1所述的装置，进一步包括：Supplementary Note 11. The device according to Supplementary Note 1, further comprising:

压缩单元，用于对所述文本背诵特征曲线进行幅值压缩。A compression unit, configured to perform amplitude compression on the text recitation characteristic curve.

附记12.一种基于由字组成的文本以及所述文本的标准读音来评估对所述文本进行背诵的质量的方法，包括：Supplementary Note 12. A method for evaluating the quality of a recitation of a text based on the text, which is composed of characters, and the standard pronunciation of the text, comprising:

获取通过所述背诵而产生的文本背诵特征曲线；obtaining a text recitation characteristic curve produced by the recitation;

对所述文本背诵特征曲线进行分割，以获取所述文本中每个字的字背诵特征曲线；segmenting the text recitation characteristic curve to obtain the character recitation characteristic curve of each character in the text;

将所述每个字的字背诵特征曲线与每个字的字标准特征曲线进行比较，以获取每个字的韵律得分；comparing the character recitation characteristic curve of each character with the character standard characteristic curve of each character to obtain the prosody score of each character;

根据所述每个字的字背诵特征曲线确定所述每个字的背诵准确度，以获取每个字的声学得分；以及determining the recitation accuracy of each character according to the character recitation characteristic curve of each character, so as to obtain the acoustic score of each character; and

基于每个字的韵律得分和声学得分对所述文本的背诵质量进行评估。evaluating the recitation quality of the text based on the prosody score and the acoustic score of each character.

附记13.根据附记12所述的方法，其中，基于每个字的韵律得分和声学得分对所述文本的背诵质量进行评估包括：Supplementary Note 13. The method according to Supplementary Note 12, wherein evaluating the recitation quality of the text based on the prosody score and the acoustic score of each word comprises:

合并所述每个字的韵律得分和所述每个字的声学得分，以获取每个字的背诵得分；Combining the prosody score of each character and the acoustic score of each character to obtain the recitation score of each character;

合并所述文本中包含的所有字的背诵得分，以获取所述文本的背诵总得分；以及combining the recitation scores for all words contained in the text to obtain a total recitation score for the text; and

根据所述背诵总得分对所述背诵的质量进行评估。The quality of the recitation is evaluated according to the total score of the recitation.

附记14.根据附记12所述的方法，其中，对所述文本背诵特征曲线进行分割以获取所述文本中每个字的字背诵特征曲线包括：Supplementary Note 14. The method according to Supplementary Note 12, wherein segmenting the text reciting characteristic curve to obtain the character reciting characteristic curve of each character in the text comprises:

根据所述文本背诵特征曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置，以获取所述文本中每个字的字背诵特征曲线。Determining the start and end positions of each character in the text in the text reciting characteristic curve according to the text reciting characteristic curve, so as to obtain the character reciting characteristic curve of each character in the text.

附记15.根据附记14所述的方法，其中，根据所述文本背诵特征曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置包括：Supplementary Note 15. The method according to Supplementary Note 14, wherein, according to the text reciting characteristic curve, determining the start and end positions of each character in the text in the text reciting characteristic curve comprises:

根据所述文本背诵特征曲线计算每一帧中存在进行背诵的概率和背诵的能量；calculating, according to the text recitation characteristic curve, the probability that recitation is present in each frame and the energy of the recitation;

根据所述文本背诵特征曲线获取所述文本中每个字的基频曲线；以及Obtaining the fundamental frequency curve of each word in the text according to the text reciting characteristic curve; and

根据所述每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。The starting and ending positions of each word in the text in the text reciting characteristic curve are determined according to the probability and energy of reciting in each frame and the fundamental frequency curve of each word.

附记16.根据附记15所述的方法，其中，根据所述每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置包括：Supplementary Note 16. The method according to Supplementary Note 15, wherein determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character comprises:

根据所述每个字的基频曲线确定基频段的数目，其中所述基频段是所述每个字的基频曲线中连续不间断的基频的片段；以及Determine the number of fundamental frequency segments according to the fundamental frequency curve of each word, wherein the fundamental frequency segment is a continuous uninterrupted fundamental frequency segment in the fundamental frequency curve of each word; and

根据所述基频段的数目与所述文本的字数的关系来确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。determining the start and end positions of each character of the text in the text recitation characteristic curve according to the relationship between the number of fundamental frequency segments and the number of characters in the text.

附记17.根据附记15所述的方法，其中，根据所述每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置包括：Supplementary Note 17. The method according to Supplementary Note 15, wherein determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character comprises:

根据所述每一帧中存在进行背诵的概率确定概率曲线；Determining a probability curve according to the probability of reciting in each frame;

根据所述背诵的能量确定能量曲线；determining an energy curve according to the energy of the recitation;

根据所述概率曲线或者所述能量曲线中的谷点将所述文本背诵特征曲线划分为曲线段；以及dividing the text recitation characteristic curve into curve segments according to valley points in the probability curve or the energy curve; and

根据所述曲线段的数目与所述文本的字数的关系来确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置。The start and end positions of each word in the text in the text reciting characteristic curve are determined according to the relationship between the number of curve segments and the number of words in the text.

附记18.根据附记15所述的方法，其中，根据所述每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置包括：Supplementary Note 18. The method according to Supplementary Note 15, wherein determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character comprises:

根据所述概率曲线和所述能量曲线确定能量概率曲线；determining an energy probability curve based on the probability curve and the energy curve;

根据所述能量概率曲线中的谷点将所述文本背诵特征曲线划分为曲线段；以及dividing the text recitation characteristic curve into curve segments according to valley points in the energy probability curve; and

附记19.根据附记15所述的方法，其中，根据所述每一帧中存在进行背诵的概率和背诵的能量以及每个字的基频曲线确定所述文本中的每个字在所述文本背诵特征曲线中的起止位置包括：Supplementary Note 19. The method according to Supplementary Note 15, wherein determining the start and end positions of each character of the text in the text recitation characteristic curve according to the probability that recitation is present in each frame, the energy of the recitation, and the fundamental frequency curve of each character comprises:

根据所述概率曲线、所述能量曲线和所述每个字的基频曲线确定基频能量概率曲线；Determine the fundamental frequency energy probability curve according to the probability curve, the energy curve and the fundamental frequency curve of each word;

根据所述基频能量概率曲线中的谷点将所述文本背诵特征曲线划分为曲线段；以及dividing the text recitation characteristic curve into curve segments according to valley points in the fundamental frequency energy probability curve; and

附记20.一种机器可读存储介质，其上携带有包括存储在其中的机器可读指令代码的程序产品，其中，所述指令代码当由计算机读取和执行时，能够使所述计算机执行根据附记12-19中任何一项所述的方法。Supplementary Note 20. A machine-readable storage medium carrying a program product that includes machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the method according to any one of Supplementary Notes 12 to 19.