CN115273900A


Info

Publication number: CN115273900A
Application number: CN202210892278.9A
Authority: CN
Inventors: 黄毅华; 武巍; 陈秀敏; 刘瑞强; 许向东
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2022-11-01

Abstract

Translated from Chinese

The present disclosure provides a voice quality assessment method and apparatus, an electronic device, and a storage medium, and relates to the technical field of communications. The method includes: acquiring original voice data, and acquiring lossy voice data that is output after the original voice data is transmitted through a communication network; performing speech recognition analysis on the original voice data and the lossy voice data to obtain a speech recognition analysis result; and evaluating a voice quality score of the communication network according to the speech recognition analysis result. The technical solutions of the embodiments of the present disclosure assess the voice quality of a communication network by means of speech recognition, so that the evaluation result is more consistent with the user's auditory perception. This improves the rationality and accuracy of the voice quality score and the user experience of voice calls over the communication network.

Description


Voice quality assessment method and apparatus, electronic device, and storage medium

Technical Field

The present disclosure relates to the technical field of communications, and in particular to a voice quality assessment method, a voice quality assessment apparatus, an electronic device, and a computer-readable storage medium.

Background

With the continuous development of communication technology, voice call services are used more and more widely. To transmit voice signals better and give end users a good voice service experience, the voice service quality of a communication network needs to be evaluated and tested.

In the related art, the voice service quality of a communication network is generally evaluated with a Mean Opinion Score (MOS) algorithm based on waveform attenuation scoring: a professional instrument applies processing such as level adjustment and filtering to the degraded signal output by the communication network, compares the transformed signal with the original signal, and then maps the comparison, through a cognitive model, to a prediction of the subjective mean opinion score. However, this solution relies on a subjective model and cannot be associated with user perception, so the rationality and accuracy of the resulting evaluation are poor.

It should be noted that the information disclosed in the Background section above is only intended to aid understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.

Summary

The purpose of the embodiments of the present disclosure is to provide a voice quality assessment method, a voice quality assessment apparatus, an electronic device, and a computer-readable storage medium, so as to overcome, at least to some extent, the problem that related technical solutions do not consider the user's auditory perception when assessing voice quality, and to improve the rationality and accuracy of the voice quality score.

Other features and advantages of the present disclosure will become apparent from the following detailed description, or may be learned in part by practice of the present disclosure.

According to a first aspect of the embodiments of the present disclosure, a voice quality assessment method is provided, including:

acquiring original voice data, and acquiring lossy voice data that is output after the original voice data is transmitted through a communication network;

performing speech recognition analysis on the original voice data and the lossy voice data to obtain a speech recognition analysis result; and

evaluating a voice quality score of the communication network according to the speech recognition analysis result.

In some example embodiments of the present disclosure, based on the foregoing solution, the performing speech recognition analysis on the original voice data and the lossy voice data further includes:

performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result; and

performing speech coding analysis on the original voice data and the lossy voice data to obtain a speech coding analysis result.

In some example embodiments of the present disclosure, based on the foregoing solution, the evaluating a voice quality score of the communication network according to the speech recognition analysis result includes:

evaluating the voice quality score of the communication network according to one of, or a combination of more than one of, the network transmission change analysis result, the speech coding analysis result, and the speech recognition analysis result.
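As an illustration only, combining one or more analysis results into a single score could be sketched as a weighted average over whichever results are available. The function name `combine_scores`, the 0-100 scale, and the weight values are assumptions made for this sketch; the disclosure does not specify a combining formula.

```python
from typing import Dict, Optional

# Hypothetical weights; the disclosure does not prescribe any particular values.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "network_transmission": 0.3,
    "speech_coding": 0.2,
    "speech_recognition": 0.5,
}


def combine_scores(results: Dict[str, Optional[float]],
                   weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the available analysis results (each assumed 0-100).

    Results that are missing or None are skipped and the remaining weights
    are renormalised, so a single result or any combination can be used.
    """
    available = {name: value for name, value in results.items()
                 if value is not None and name in weights}
    if not available:
        raise ValueError("at least one analysis result is required")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * value
               for name, value in available.items()) / total_weight
```

Renormalising the weights is one possible design choice: it keeps the score on the same scale whether one, two, or all three results are supplied.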

In some example embodiments of the present disclosure, based on the foregoing solution, the network transmission change analysis result includes transmission delay data, and the performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result includes:

determining a first time point corresponding to a marker position in the lossy voice data;

determining a second time point corresponding to the marker position in the original voice data; and

determining the transmission delay data between the original voice data and the lossy voice data according to the first time point and the second time point.

In some example embodiments of the present disclosure, based on the foregoing solution, determining the marker position in the lossy voice data includes:

determining a target waveform in the lossy voice data according to a preset duration, and using the position of the target waveform in the lossy voice data as the marker position; or

inserting marker audio into the original voice data, and using the position of the marker audio in the lossy voice data as the marker position.
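A minimal sketch of the marker-audio variant follows. It assumes that both signals are lists of samples captured against a common time origin at the same sample rate, and it locates the marker by brute-force correlation. The helper names `make_tone`, `find_marker`, and `transmission_delay_s` are hypothetical, and the disclosure does not state how the marker is detected.

```python
import math
from typing import List, Sequence


def make_tone(freq_hz: float, duration_s: float, sample_rate: int) -> List[float]:
    """Generate a short sine tone to serve as the marker audio."""
    n = int(round(duration_s * sample_rate))
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def find_marker(signal: Sequence[float], marker: Sequence[float]) -> int:
    """Return the sample offset at which the marker correlates best with the signal."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(signal) - len(marker) + 1):
        window = signal[offset:offset + len(marker)]
        score = sum(s * m for s, m in zip(window, marker))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


def transmission_delay_s(original: Sequence[float], lossy: Sequence[float],
                         marker: Sequence[float], sample_rate: int) -> float:
    """Delay = first time point (lossy data) minus second time point (original data)."""
    second_time_point = find_marker(original, marker) / sample_rate
    first_time_point = find_marker(lossy, marker) / sample_rate
    return first_time_point - second_time_point
```

For example, with an 8000 Hz sample rate, a marker found at sample 100 in the original data and at sample 260 in the lossy data gives a delay of 160 / 8000 = 0.02 s. The brute-force search is quadratic in signal length and is meant only to show the principle.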

In some example embodiments of the present disclosure, based on the foregoing solution, the network transmission change analysis result includes voice amplitude change data, and the performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result includes:

determining a first voice amplitude corresponding to the lossy voice data;

determining a second voice amplitude corresponding to the original voice data; and

determining the voice amplitude change data between the original voice data and the lossy voice data according to the first voice amplitude and the second voice amplitude.
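One way to picture the amplitude comparison is to take the RMS level of each signal as its voice amplitude and express the change in decibels. Both choices, and the function names `rms` and `amplitude_change_db`, are assumptions of this sketch rather than details given in the disclosure.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def amplitude_change_db(original: Sequence[float], lossy: Sequence[float]) -> float:
    """Amplitude change of the lossy data relative to the original data, in dB.

    Negative values mean the voice was attenuated by the network.
    """
    second_amplitude = rms(original)  # amplitude of the original voice data
    first_amplitude = rms(lossy)      # amplitude of the lossy voice data
    if first_amplitude == 0.0 or second_amplitude == 0.0:
        raise ValueError("amplitude change is undefined for a silent signal")
    return 20.0 * math.log10(first_amplitude / second_amplitude)
```

Halving the amplitude of every sample yields roughly -6.02 dB under this definition.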

In some example embodiments of the present disclosure, based on the foregoing solution, the network transmission change analysis result includes sound waveform loss data, and the performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result includes:

performing envelope detection on the sound waveform of the original voice data and the sound waveform of the lossy voice data, and determining the sound waveform loss data between the original voice data and the lossy voice data; and/or

acquiring a packet loss parameter and a jitter parameter corresponding to the lossy voice data, and determining the sound waveform loss data between the original voice data and the lossy voice data based on the packet loss parameter and the jitter parameter.
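The envelope-detection branch could be sketched as follows, assuming the two signals are already time-aligned. The envelope here is a simple per-window peak, and a window counts as lost when the original contains speech but the lossy envelope collapses. The window size, the thresholds, and the names `envelope` and `waveform_loss_ratio` are illustrative assumptions. The packet loss and jitter branch is not covered.

```python
from typing import List, Sequence


def envelope(samples: Sequence[float], window: int) -> List[float]:
    """Peak envelope: maximum absolute sample value in each non-overlapping window."""
    return [max(abs(x) for x in samples[i:i + window])
            for i in range(0, len(samples), window)]


def waveform_loss_ratio(original: Sequence[float], lossy: Sequence[float],
                        window: int = 80, speech_floor: float = 0.05,
                        drop_ratio: float = 0.25) -> float:
    """Fraction of speech-bearing windows whose envelope is lost in the lossy data."""
    env_original = envelope(original, window)
    env_lossy = envelope(lossy, window)
    active = lost = 0
    for i, level in enumerate(env_original):
        if level < speech_floor:
            continue  # silence in the original, nothing to lose
        active += 1
        lossy_level = env_lossy[i] if i < len(env_lossy) else 0.0
        if lossy_level < drop_ratio * level:
            lost += 1
    return lost / active if active else 0.0
```

With five speech-bearing windows of which one is zeroed out in the lossy data, the ratio is 0.2.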

In some example embodiments of the present disclosure, based on the foregoing solution, the speech coding analysis result includes a coding scheme score, and the performing speech coding analysis on the original voice data and the lossy voice data to obtain a speech coding analysis result includes:

determining a first coding scheme and a first coding rate corresponding to the original voice data;

determining a second coding scheme and a second coding rate corresponding to the lossy voice data; and

determining the coding scheme score between the original voice data and the lossy voice data based on the first coding scheme, the first coding rate, the second coding scheme, and the second coding rate.

In some example embodiments of the present disclosure, based on the foregoing solution, the speech recognition analysis result includes a speech recognition accuracy, and the performing speech recognition analysis on the original voice data and the lossy voice data to obtain a speech recognition analysis result includes:

performing speech recognition on the lossy voice data, and determining a lossy voice text corresponding to the lossy voice data; and

acquiring an original voice text corresponding to the original voice data, comparing the lossy voice text with the original voice text, and determining the speech recognition accuracy between the original voice data and the lossy voice data.
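The text comparison can be illustrated with a character-level edit distance. This sketch assumes the lossy voice text has already been produced by some speech recognition engine (none is called here) and defines accuracy as one minus the edit distance divided by the length of the original text. That definition and the function names are assumptions, since the disclosure does not fix a formula.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def recognition_accuracy(original_text: str, lossy_text: str) -> float:
    """Accuracy in [0, 1] of the lossy voice text against the original voice text."""
    if not original_text:
        raise ValueError("original_text must not be empty")
    errors = edit_distance(original_text, lossy_text)
    return max(0.0, 1.0 - errors / len(original_text))
```

Working at character level means the same function applies to Chinese text without a word segmenter; a word-level variant would split both texts into tokens first.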

In some example embodiments of the present disclosure, based on the foregoing solution, the method further includes:

determining the voice quality score of the communication network according to a preset evaluation period.

According to a second aspect of the embodiments of the present disclosure, a voice quality assessment apparatus is provided, including:

a voice data acquisition module, configured to acquire original voice data, and to acquire lossy voice data that is output after the original voice data is transmitted through a communication network;

a speech recognition analysis module, configured to perform speech recognition analysis on the original voice data and the lossy voice data to obtain a speech recognition analysis result; and

a voice quality assessment module, configured to evaluate a voice quality score of the communication network according to the speech recognition analysis result.

In an example embodiment of the present disclosure, the voice quality assessment apparatus may further include:

a network transmission change analysis module, configured to perform network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result; and

a speech coding analysis module, configured to perform speech coding analysis on the original voice data and the lossy voice data to obtain a speech coding analysis result.

In an example embodiment of the present disclosure, the voice quality assessment module may further be configured to:

In an example embodiment of the present disclosure, based on the foregoing solution, the network transmission change analysis result may include transmission delay data, and the network transmission change analysis module may be configured to:

In an example embodiment of the present disclosure, based on the foregoing solution, the network transmission change analysis module may further be configured to:

In an example embodiment of the present disclosure, based on the foregoing solution, the network transmission change analysis result may include voice amplitude change data, and the network transmission change analysis module may further be configured to:

In an example embodiment of the present disclosure, based on the foregoing solution, the network transmission change analysis result includes sound waveform loss data, and the network transmission change analysis module may further be configured to:

In an example embodiment of the present disclosure, based on the foregoing solution, the speech coding analysis result may include a coding scheme score, and the speech coding analysis module may be configured to:

In an example embodiment of the present disclosure, based on the foregoing solution, the speech recognition analysis result may include a speech recognition accuracy, and the speech recognition analysis module may be configured to:

In an example embodiment of the present disclosure, based on the foregoing solution, the voice quality assessment apparatus may be configured to:

According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory storing computer-readable instructions that, when executed by the processor, implement the voice quality assessment method according to any one of the foregoing.

According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the voice quality assessment method according to any one of the foregoing.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:

With the voice quality assessment method in the example embodiments of the present disclosure, original voice data played at a sending terminal can be acquired, together with the lossy voice data that is output to a receiving terminal after the original voice data is transmitted through a communication network. Network transmission change analysis can then be performed on the original voice data and the lossy voice data to obtain a network transmission change analysis result; speech coding analysis can be performed on them to obtain a speech coding analysis result; and speech recognition analysis is performed on them to obtain a speech recognition analysis result. Finally, the voice quality score of the communication network can be determined according to the network transmission change analysis result, the speech coding analysis result, and the speech recognition analysis result. On the one hand, performing speech recognition analysis on the lossy voice data transmitted through the communication network associates the evaluation result with the user's auditory perception. Combining this with the influence of network transmission and the influence of different speech coding schemes on human hearing gives a comprehensive assessment of the voice quality of the communication network, which further strengthens the correlation between the evaluation result and user perception and improves the rationality and accuracy of the evaluation result. On the other hand, compared with assessing voice quality by waveform attenuation scoring as in the related art, this solution does not need to evaluate the waveform of the entire speech segment, so the voice quality score is output more efficiently and with a faster response, which effectively improves the efficiency of voice quality assessment.

It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure. The drawings described below are merely some embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

Fig. 1 schematically shows a flowchart of voice quality assessment in a related technical solution according to some embodiments of the present disclosure;

Fig. 2 schematically shows a flowchart of a voice quality assessment method according to some embodiments of the present disclosure;

Fig. 3 schematically shows the principle of determining transmission delay data of a communication network according to some embodiments of the present disclosure;

Fig. 4 schematically shows a flowchart of voice quality assessment based on speech recognition according to some embodiments of the present disclosure;

Fig. 5 schematically shows a voice quality assessment apparatus according to some embodiments of the present disclosure;

Fig. 6 schematically shows the structure of a computer system of an electronic device according to some embodiments of the present disclosure;

Fig. 7 schematically shows a computer-readable storage medium according to some embodiments of the present disclosure.

In the drawings, the same or corresponding reference numerals denote the same or corresponding parts.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the present disclosure may be practiced without one or more of the specific details, or with other methods, components, apparatuses, steps, and so on. In other instances, well-known methods, apparatuses, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the present disclosure.

In addition, the drawings are merely schematic illustrations and are not necessarily drawn to scale. The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.

In related technical solutions, the voice service quality of a communication network is generally evaluated with a Mean Opinion Score (MOS) algorithm based on waveform attenuation scoring. Fig. 1 schematically shows a flowchart of voice quality assessment in a related technical solution according to some embodiments of the present disclosure. Referring to Fig. 1, original voice data 110 is first obtained from various material resources and then played at a sending terminal 120 by a playback device. The original voice data 110 passes through the communication network 130 accessed by the sending terminal 120 and is sent to a receiving terminal 140, which outputs lossy voice data 150. Finally, the MOS algorithm compares the audio signal waveforms of the original voice data 110 and the lossy voice data 150 to determine a waveform comparison score 160, which is used as the voice quality score of the communication network.

However, in this solution the audio signal waveforms of the original voice data 110 and the lossy voice data 150 are mapped, through a cognitive model, to a prediction of the subjective mean opinion score. Because a subjective cognitive model is used, the assessment process cannot be associated with the user's auditory perception, and the rationality and accuracy of the resulting evaluation are poor.

In view of one or more problems in the related art, this example embodiment first provides a voice quality assessment method. The method can be applied to an IMS (IP Multimedia Subsystem); IMS is an important way to converge mobile access and fixed-network access and to introduce differentiated services such as the three-way convergence of voice, data, and video. Fig. 2 schematically shows a flowchart of a voice quality assessment method according to some embodiments of the present disclosure. Referring to Fig. 2, the voice quality assessment method may include the following steps:

Step S210: acquiring original voice data, and acquiring lossy voice data that is output after the original voice data is transmitted through a communication network;

Step S220: performing speech recognition analysis on the original voice data and the lossy voice data to obtain a speech recognition analysis result;

Step S230: evaluating a voice quality score of the communication network according to the speech recognition analysis result.

With the voice quality assessment method in this example embodiment, on the one hand, performing speech recognition analysis on the lossy voice data transmitted through the communication network associates the evaluation result with user perception. Combining this with the influence of network transmission and the influence of different speech coding schemes on human hearing gives a comprehensive assessment of the voice quality of the communication network, which further strengthens the correlation between the evaluation result and user perception and improves the rationality and accuracy of the evaluation result. On the other hand, compared with assessing voice quality by waveform attenuation scoring as in the related art, this solution does not need to evaluate the waveform of the entire speech segment, so the voice quality score is output more efficiently and with a faster response, which effectively improves the efficiency of voice quality assessment.

The voice quality assessment method in this example embodiment is described in detail below.

In step S210, original voice data is acquired, and lossy voice data that is output after the original voice data is transmitted through a communication network is acquired.

In an example embodiment of the present disclosure, the original voice data refers to sample speech played at the sending terminal by a playback device. The original voice data may be of any type: for example, it may be voice data in any language, such as Chinese, English, or French; voice data from male or female speakers; or voice data from speakers of any age group. It may also be other types of sample voice data with different timbres or sound qualities. This example embodiment does not specifically limit this.

The sending terminal is a terminal set up during voice quality assessment to simulate the speaking user in a call. It may be any type of terminal device that can access a communication network to make voice calls; for example, the sending terminal may be a smartphone, a tablet computer, or CPE (Customer Premises Equipment, that is, any connected equipment used to access Ethernet or, more generally, services on an operator's network). This example embodiment does not specifically limit the type of sending terminal that accesses the communication network.

The lossy voice data is the voice data output at the receiving terminal after the original voice data is transmitted through the communication network; it can be captured by recording the output of the receiving terminal with a recording device. The receiving terminal is a terminal set up during voice quality assessment to simulate the listening user in a call. It may be any type of terminal device that can access a communication network to make voice calls; for example, the receiving terminal may be a smartphone, a tablet computer, or CPE. This example embodiment does not specifically limit the type of receiving terminal that accesses the communication network.

Optionally, the communication networks accessed by the sending terminal and the receiving terminal may be provided by the same operator or by different operators. For example, the sending terminal may access a communication network provided by operator A while the receiving terminal accesses a communication network provided by operator B. This example embodiment does not specifically limit how the sending terminal and the receiving terminal access the communication network.

Optionally, when the original voice data is played at the sending terminal by the playback device, its volume can be configured; for example, it may be fixed at 80% or varied randomly. Optionally, the original voice data may be played continuously or at intervals. This example embodiment does not specifically limit parameters such as the playback volume and playback mode of the original voice data at the sending terminal. Setting different playback volumes, playback modes, and other parameters for the original voice data makes it possible to simulate the speaking conditions of users in different environments, which improves the accuracy of the voice quality evaluation result.

在步骤S220中，对所述原始语音数据和所述有损语音数据进行语音识别分析，得到语音识别分析结果。In step S220, speech recognition analysis is performed on the original speech data and the lossy speech data to obtain a speech recognition analysis result.

In an exemplary embodiment of the present disclosure, speech recognition analysis refers to the process of performing speech recognition on the lossy voice data to obtain lossy speech text and then comparing that lossy speech text with the original speech text corresponding to the original voice data; for example, it may be the process of determining the speech recognition accuracy after transmission through the communication network. Automatic Speech Recognition (ASR) is the technology of converting human speech into text.

The speech recognition analysis result may be a parameter obtained by performing speech recognition analysis on the original voice data and the lossy voice data. For example, it may be the speech recognition accuracy obtained by comparing the original speech text with the lossy speech text, or the completeness of the speech text obtained from the same comparison. It may also be any other parameter that measures the loss of speech text in transmission; this example embodiment does not specifically limit this.

在步骤S230中，根据所述语音识别分析结果评估所述通信网络的语音质量分数。In step S230, the speech quality score of the communication network is evaluated according to the speech recognition analysis result.

In an exemplary embodiment of the present disclosure, the voice quality score is a measure of the voice call quality of the communication network. A higher score indicates better call quality on the current communication network; a lower score indicates poorer call quality, in which case the communication network needs to be maintained or optimized.

With speech recognition technology, the speech recognition analysis result can directly reflect the user's auditory perception of mobile network voice quality and effectively guide the maintenance and optimization of the communication network, thereby improving the efficiency of operator network optimization and the user experience.

下面对步骤S210至步骤S230进行展开说明。Steps S210 to S230 will be expanded and described below.

在本公开的一个示例实施例中，除了对原始语音数据和有损语音数据进行语音识别分析，还可以对原始语音数据和有损语音数据进行网络传输变化分析，得到网络传输变化分析结果；对原始语音数据和有损语音数据进行语音编码分析，得到语音编码分析结果。In an exemplary embodiment of the present disclosure, in addition to performing voice recognition analysis on the original voice data and the lossy voice data, network transmission change analysis may also be performed on the original voice data and the lossy voice data to obtain network transmission change analysis results; The original speech data and the lossy speech data are subjected to speech coding analysis, and the speech coding analysis result is obtained.

Further, the voice quality score of the communication network may be evaluated from any one of, or any combination of, the network transmission change analysis result, the speech coding analysis result, and the speech recognition analysis result. For example, the score may be evaluated from the network transmission change analysis result alone, the speech coding analysis result alone, or the speech recognition analysis result alone; from the network transmission change analysis result combined with the speech coding analysis result, the speech coding analysis result combined with the speech recognition analysis result, or the network transmission change analysis result combined with the speech recognition analysis result; or from all three results together. This example embodiment does not specifically limit this.

In this embodiment, network transmission change analysis refers to the process of analyzing the effects that transmission through the communication network has on the original voice data. For example, it may be the process of determining the transmission delay from the time difference between corresponding feature points in the original voice data and the lossy voice data, or the process of determining the amplitude change by comparing the voice signal amplitudes of the original voice data and the lossy voice data. It may also be the analysis of other effects that transmission through the communication network may cause, such as determining the loss of signal energy by comparing the signal-to-noise ratios of the original voice data and the lossy voice data. This example embodiment does not specifically limit this.

网络传输变化分析结果是指对原始语音数据和有损语音数据进行网络传输变化分析后得到的数据，例如，若网络传输分析是通过原始语音数据与有损语音数据之间特征点的时间差距确定传输时延的分析过程，那么网络传输变化分析结果可以是传输时延数据，若网络传输分析是通过比对原始语音数据与有损语音数据之间的语音信号振幅确定幅度变化的分析过程，那么网络传输变化分析结果可以是语音幅度变化数据，本示例实施例不以此为限。The network transmission change analysis result refers to the data obtained after analyzing the network transmission change of the original voice data and the lossy voice data. The analysis process of transmission delay, then the network transmission change analysis result can be transmission delay data, if the network transmission analysis is the analysis process of determining the amplitude change by comparing the voice signal amplitude between the original voice data and the lossy voice data, then The network transmission change analysis result may be voice amplitude change data, which is not limited in this exemplary embodiment.

在本实施例中，语音编码分析是指分析原始语音数据和有损语音数据对应的语音编码方案的分析过程，例如，语音编码分析可以是确定原始语音数据和有损语音数据的语音编码方式，并根据不同的语音编码方式对于用户感知影响进行评分的过程；也可以是确定原始语音数据和有损语音数据的语音编码速率，并根据不同的语音编码速率对于用户感知影响进行评分的过程；当然，还可以是根据两者的语音编码方式以及语音编码速率(即语音编码方案)对于用户感知影响进行评分的过程，本示例实施例对此不做特殊限定。In this embodiment, the speech coding analysis refers to the analysis process of analyzing the speech coding scheme corresponding to the original speech data and the lossy speech data, for example, the speech coding analysis can be to determine the speech coding method of the original speech data and the lossy speech data, And the process of scoring the impact on user perception according to different speech encoding methods; it can also be the process of determining the speech encoding rate of the original speech data and lossy speech data, and scoring the impact on user perception according to different speech encoding rates; of course , it may also be a process of scoring the impact on user perception according to the two speech coding modes and speech coding rates (that is, the speech coding scheme), which is not specifically limited in this exemplary embodiment.

The speech coding analysis result refers to the data obtained after performing speech coding analysis on the original voice data and the lossy voice data. For example, it may be the score corresponding to the speech coding modes used by the original voice data and the lossy voice data, the score corresponding to the speech coding rates they use, or the coding scheme score corresponding to the speech coding schemes (that is, the speech coding mode together with the speech coding rate) they use.

可选的，可以预先获取不同类型通信网络下包含不同语音编码方式以及相应语音编码速率的语音编码参数表，可以通过线性插值或者正态分布的方式确定采用不同语音编码速率和/或语音编码方式对应的编码方案分数。Optionally, a speech coding parameter table containing different speech coding methods and corresponding speech coding rates under different types of communication networks can be obtained in advance, and different speech coding rates and/or speech coding methods can be determined by linear interpolation or normal distribution Corresponding coding scheme score.

通过对通信网络传输后的语音进行语音识别分析，将评估结果与用户感知相关联，并且结合通信网络的网络传输影响，以及不同语音编码方式对人耳感知的影响，对通信网络的语音质量进行综合评估，进一步提升评估结果与用户感知的关联性，提高评估结果的合理性与准确性。Through the speech recognition and analysis of the speech transmitted by the communication network, the evaluation result is associated with the user's perception, and combined with the influence of the network transmission of the communication network and the impact of different speech coding methods on the perception of the human ear, the speech quality of the communication network is analyzed. Comprehensive evaluation, further enhance the correlation between evaluation results and user perception, and improve the rationality and accuracy of evaluation results.

在本公开的一个示例实施例中，网络传输变化分析结果可以包括通信网络的传输时延数据，可以通过以下步骤对原始语音数据和有损语音数据进行网络传输变化分析得到通信网络的传输时延数据：In an exemplary embodiment of the present disclosure, the network transmission change analysis result may include the transmission delay data of the communication network, and the network transmission change analysis of the original voice data and the lossy voice data may be performed through the following steps to obtain the transmission delay of the communication network data:

可以确定有损语音数据中的标识位置对应的第一时间点，确定该标识位置在原始语音数据中对应的第二时间点，进而可以根据第一时间点和第二时间点确定原始语音数据和有损语音数据之间的传输时延数据。The first time point corresponding to the identification position in the lossy voice data can be determined, and the second time point corresponding to the identification position in the original voice data can be determined, and then the original voice data and the original voice data can be determined according to the first time point and the second time point. Transmission delay between lossy voice data.

Here, the identification position refers to the position of the feature point used to determine the transmission delay between the original voice data and the lossy voice data. For example, the identification position may be the waveform position of the voice signal peak in the original voice data or the lossy voice data, or the waveform position of a marker audio inserted into the original voice data or the lossy voice data. This example embodiment does not specifically limit the manner of determining the transmission delay data.

第一时间点是指标识位置在有损语音数据中所对应的时间点，第二时间点是指标识位置在原始语音数据中所对应的时间点，例如，以标识位置为语音信号峰值对应的波形位置为例，有损语音数据中语音信号峰值对应的波形位置对应的第一时间点为2秒35毫秒，原始语音数据中该语音信号峰值对应的波形位置对应的第二时间点为2秒10毫秒，则可以认为原始语音数据和有损语音数据之间的传输时延数据为25毫秒，当然，此处仅是示意性举例说明，并不应对本示例实施例造成任何特殊限定。The first time point refers to the time point corresponding to the marked position in the lossy speech data, and the second time point refers to the time point corresponding to the marked position in the original speech data, for example, taking the marked position as the peak corresponding to the voice signal Taking the waveform position as an example, the first time point corresponding to the waveform position corresponding to the peak value of the voice signal in the lossy voice data is 2 seconds and 35 milliseconds, and the second time point corresponding to the waveform position corresponding to the peak value of the voice signal in the original voice data is 2 seconds 10 milliseconds, it can be considered that the transmission delay data between the original voice data and the lossy voice data is 25 milliseconds. Of course, this is only a schematic illustration and should not cause any special limitation to this example embodiment.
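The peak-based example above can be sketched as follows. This is a minimal illustration only: the function names, the 1 kHz sample rate, and the toy signals are assumptions made for the sketch, and it presumes that both recordings are time-stamped against a common clock.

```python
def peak_time_ms(samples, sample_rate_hz):
    """Time in milliseconds of the sample with the largest absolute value."""
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return 1000.0 * peak_index / sample_rate_hz


def transmission_delay_ms(original, lossy, sample_rate_hz):
    """First time point (lossy data) minus second time point (original data)."""
    first_time_point = peak_time_ms(lossy, sample_rate_hz)
    second_time_point = peak_time_ms(original, sample_rate_hz)
    return first_time_point - second_time_point


# Hypothetical toy signals sampled at 1 kHz: the peak sits at 2.010 s in the
# original voice data and at 2.035 s in the lossy voice data.
original = [0.0] * 3000
original[2010] = 1.0
lossy = [0.0] * 3000
lossy[2035] = 0.8  # attenuated peak after transmission

delay = transmission_delay_ms(original, lossy, 1000)
print(delay)
```

With these toy inputs the two time points are 2035 ms and 2010 ms, matching the 25 ms delay of the example.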

需要说明的是，本示例实施例的“第一时间点”“第二时间点”中的“第一”“第二”仅用于区分标识位置在原始语音数据或者有损语音数据中的不同时间点，没有任何特殊含义，并不应对本示例实施例造成任何特殊限定。It should be noted that the "first" and "second" in the "first time point" and "second time point" in this example embodiment are only used to distinguish the difference between the identification position in the original voice data or the lossy voice data The time point has no special meaning, and should not cause any special limitation in this example embodiment.

可选的，可以根据预设时长在有损语音数据中确定目标波形，将目标波形在有损语音数据中的位置作为标识位置，例如，预设时长可以是8秒，也可以是4秒，本实施例不以此为限。在有损语音数据播放预设时长之后，将当前时刻下有损语音数据对应的波形作为目标波形，并将该目标波形在有损语音数据中的位置作为标识位置，同时将当前时刻作为第一时间点；可以理解的是，在确定该目标波形之后，可以在原始语音数据中匹配该目标波形，并且将该目标波形在原始语音数据的时间点作为第二时间点。当然，也可以直接确定有损语音数据对应的语音信号峰值，并将该语音信号峰值对应的波形作为目标波形，本示例实施例不以此为限。Optionally, the target waveform can be determined in the lossy speech data according to the preset duration, and the position of the target waveform in the lossy speech data is used as the identification position. For example, the preset duration can be 8 seconds or 4 seconds, This embodiment is not limited thereto. After the lossy voice data is played for a preset duration, the waveform corresponding to the lossy voice data at the current moment is used as the target waveform, and the position of the target waveform in the lossy voice data is used as the identification position, and the current moment is used as the first time point; it can be understood that after the target waveform is determined, the target waveform can be matched in the original voice data, and the time point of the target waveform in the original voice data is used as the second time point. Of course, it is also possible to directly determine the peak value of the speech signal corresponding to the lossy speech data, and use the waveform corresponding to the peak value of the speech signal as the target waveform, which is not limited in this exemplary embodiment.
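One possible way to match the target waveform in the original voice data is a sliding normalized correlation, sketched below. The disclosure does not prescribe a matching method, so the correlation measure, the function name `best_match_offset`, and the toy signals are all assumptions of this sketch.

```python
import math


def best_match_offset(signal, template):
    """Offset in `signal` where `template` correlates best (normalized)."""
    t_norm = math.sqrt(sum(v * v for v in template))
    best_offset, best_score = 0, -math.inf
    for offset in range(len(signal) - len(template) + 1):
        window = signal[offset:offset + len(template)]
        w_norm = math.sqrt(sum(v * v for v in window))
        if w_norm == 0 or t_norm == 0:
            continue  # a silent window cannot be matched
        score = sum(a * b for a, b in zip(window, template)) / (w_norm * t_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


# Hypothetical toy data: the lossy copy is attenuated and shifted by 5 samples.
original = [0.0] * 20 + [1.0, 3.0, -2.0, 4.0, 1.0] + [0.0] * 20
lossy = [0.0] * 25 + [0.5, 1.5, -1.0, 2.0, 0.5] + [0.0] * 15

target_start = 25                      # identification position in the lossy data
target = lossy[target_start:target_start + 5]
match_start = best_match_offset(original, target)
delay_samples = target_start - match_start
```

Normalizing the correlation makes the match insensitive to the overall attenuation introduced by the network, which is why the halved lossy segment still aligns with its source. Dividing `delay_samples` by the sample rate converts the offset into a time delay.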

Optionally, a marker audio can be inserted into the original voice data. The marker audio can be an audio signal with a distinctive waveform, for example a square wave or a sawtooth wave; this example embodiment does not specifically limit this. After the marker audio is inserted into the original voice data, the second time point of the marker audio in the original voice data can be determined. The lossy voice data obtained after transmission through the communication network also contains the marker audio, so the position of the marker audio in the lossy voice data can be used as the identification position, and the first time point of the marker audio in the lossy voice data can be determined.

图3示意性示出了根据本公开的一些实施例的确定通信网络的传输时延数据的原理示意图。Fig. 3 schematically shows a schematic diagram of the principle of determining transmission delay data of a communication network according to some embodiments of the present disclosure.

Referring to FIG. 3, the audio signal corresponding to the original voice data may be an audio signal waveform 310, and the audio signal corresponding to the lossy voice data obtained after transmission through the communication network may be an audio signal waveform 320. When determining the transmission delay data of the communication network, an identification position can be determined. For example, the identification position may be the time point 330 corresponding to an arbitrary waveform in the audio signal waveform 310; the time point 340 corresponding to that waveform in the audio signal waveform 320 can then be determined, and the transmission delay data of the communication network can be determined from the time point 330 and the time point 340. Alternatively, the time point 350 corresponding to the voice signal peak in the audio signal waveform 310 and the time point 360 corresponding to the voice signal peak in the audio signal waveform 320 can be determined, and the transmission delay data of the communication network can be determined from the time point 350 and the time point 360.

By determining the identification position in the original voice data or the lossy voice data, and using the second time point of that position in the original voice data and its first time point in the lossy voice data, the transmission delay data between the original voice data and the lossy voice data can be determined. This improves the accuracy of the transmission delay data and thereby helps ensure the accuracy of the voice quality score.

在本公开的一个示例实施例中，网络传输变化分析结果还可以包括语音幅度变化数据，可以通过以下步骤对原始语音数据和有损语音数据进行网络传输变化分析得到语音幅度变化数据：In an example embodiment of the present disclosure, the network transmission change analysis result may also include voice amplitude change data, and the voice amplitude change data may be obtained by performing network transmission change analysis on original voice data and lossy voice data through the following steps:

可以确定有损语音数据对应的第一语音幅度，确定原始语音数据对应的第二语音幅度，进而可以根据第一语音幅度以及第二语音幅度确定原始语音数据和有损语音数据之间的语音幅度变化数据。The first voice amplitude corresponding to the lossy voice data can be determined, the second voice amplitude corresponding to the original voice data can be determined, and then the voice amplitude between the original voice data and the lossy voice data can be determined according to the first voice amplitude and the second voice amplitude change data.

Here, the first voice amplitude refers to the waveform amplitude data of the lossy voice data, and the second voice amplitude refers to the waveform amplitude data of the original voice data. The waveform amplitude data of either signal, that is, the first or second voice amplitude, can be determined by envelope detection (envelope demodulation). Other methods can also be used; for example, the area under the waveform of the lossy voice data or the original voice data can be computed by integration and used as the first or second voice amplitude. This embodiment does not specifically limit the manner of determining the voice amplitude of the voice data.

需要说明的是，本示例实施例的“第一语音幅度”“第二语音幅度”中的“第一”“第二”仅用于区分原始语音数据或者有损语音数据对应的不同语音幅度，没有任何特殊含义，并不应对本示例实施例造成任何特殊限定。It should be noted that the "first" and "second" in the "first voice amplitude" and "second voice amplitude" in this example embodiment are only used to distinguish different voice amplitudes corresponding to original voice data or lossy voice data, does not have any special meaning, and should not cause any special limitation to this exemplary embodiment.
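The integration-based variant described above can be sketched as follows. The disclosure does not define how the amplitude change is expressed; this sketch assumes a relative change in percent with respect to the original, and the function names and sample values are hypothetical.

```python
def waveform_area(samples, sample_rate_hz):
    """Approximate the area under the rectified waveform (rectangle rule)."""
    return sum(abs(v) for v in samples) / sample_rate_hz


def amplitude_change_percent(original, lossy, sample_rate_hz):
    """Relative change of the first (lossy) amplitude vs. the second (original)."""
    second_amplitude = waveform_area(original, sample_rate_hz)
    first_amplitude = waveform_area(lossy, sample_rate_hz)
    if second_amplitude == 0:
        raise ValueError("original voice data is silent")
    return 100.0 * (first_amplitude - second_amplitude) / second_amplitude


# Hypothetical toy data: the lossy signal is the original at half amplitude.
original = [1.0, -2.0, 3.0, -2.0]
lossy = [0.5, -1.0, 1.5, -1.0]
change = amplitude_change_percent(original, lossy, 8000)
```

For these inputs the lossy area is half the original area, so the change is a 50% reduction. An envelope-detection variant would replace `waveform_area` with a measure derived from the signal envelope.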

可选的，网络传输变化分析结果还可以包括声音波形损失数据，可以通过以下步骤对原始语音数据和有损语音数据进行网络传输变化分析得到声音波形损失数据：Optionally, the network transmission change analysis result may also include sound waveform loss data, and the sound waveform loss data may be obtained by performing network transmission change analysis on the original voice data and lossy voice data through the following steps:

可以对原始语音数据的声音波形和有损语音数据的声音波形进行包络检测，确定原始语音数据和有损语音数据之间的声音波形损失数据；和/或获取有损语音数据对应的丢包参数和抖动参数，并基于丢包参数和抖动参数确定原始语音数据和有损语音数据之间的声音波形损失数据。Envelope detection can be performed on the sound waveform of the original voice data and the sound waveform of the lossy voice data to determine the sound waveform loss data between the original voice data and the lossy voice data; and/or obtain the packet loss corresponding to the lossy voice data parameter and the jitter parameter, and determine sound waveform loss data between the original voice data and the lossy voice data based on the packet loss parameter and the jitter parameter.

其中，可以对原始语音数据的声音波形进行包络检测，确定原始语音数据对应的原始波形图，可以对有损语音数据的声音波形进行包络检测，确定有损语音数据对应的有损波形图，可以通过检测分析工具对原始波形图与有损波形图进行比对，确定波形图中的图像特性变化，并将该图像特性变化作为原始语音数据和有损语音数据之间的声音波形损失数据。Among them, the envelope detection can be performed on the sound waveform of the original voice data to determine the original waveform diagram corresponding to the original voice data, and the envelope detection can be performed on the sound waveform of the lossy voice data to determine the lossy waveform diagram corresponding to the lossy voice data , the original waveform image can be compared with the lossy waveform image through the detection and analysis tool, the image characteristic change in the waveform image can be determined, and the image characteristic change can be used as the sound waveform loss data between the original voice data and the lossy voice data .

在接收到有损语音数据之后，可以获取有损语音数据对应的丢包参数和抖动参数，并将丢包参数以及抖动参数作为原始语音数据和有损语音数据之间的声音波形损失数据。After receiving the lossy voice data, the packet loss parameter and the jitter parameter corresponding to the lossy voice data can be obtained, and the packet loss parameter and the jitter parameter can be used as sound waveform loss data between the original voice data and the lossy voice data.

在本公开的一个示例实施例中，语音编码分析结果可以包括编码方案分数，可以通过以下步骤对原始语音数据和所述有损语音数据进行语音编码分析得到编码方案分数：In an exemplary embodiment of the present disclosure, the speech coding analysis result may include a coding scheme score, and the coding scheme score may be obtained by performing speech coding analysis on the original speech data and the lossy speech data through the following steps:

可以确定原始语音数据对应的第一编码方式以及第一编码速率，确定有损语音数据对应的第二编码方式以及第二编码速率，进而可以基于第一编码方式、第一编码速率、第二编码方式和第二编码速率，确定原始语音数据和有损语音数据之间的编码方案分数。The first coding method and the first coding rate corresponding to the original speech data can be determined, the second coding method and the second coding rate corresponding to the lossy speech data can be determined, and then based on the first coding method, the first coding rate, and the second coding rate mode and a second coding rate to determine a coding scheme fraction between the original speech data and the lossy speech data.

其中，第一编码方式是指发送终端接入的通信网络在编码原始语音数据时所采用的语音编码方式，第一编码速率是指在第一编码方式下编码原始语音数据时所采用的语音编码速率；相应的，第二编码方式是指发送终端接入的通信网络在编码有损语音数据时所采用的语音编码方式，第二编码速率是指在第二编码方式下编码有损语音数据时所采用的语音编码速率。Among them, the first coding method refers to the speech coding method adopted by the communication network accessed by the sending terminal when coding the original speech data, and the first coding rate refers to the speech coding method adopted when coding the original speech data under the first coding method. rate; correspondingly, the second encoding method refers to the speech encoding method adopted by the communication network accessed by the sending terminal when encoding the lossy voice data, and the second encoding rate refers to the encoding rate of the lossy voice data in the second encoding mode. The speech encoding rate to use.

可以理解的是，此处的“第一”“第二”仅是用于区分原始语音数据和有损语音数据所对应的语音编码方式或者语音编码速率，没有任何特殊含义，并不应对本示例实施例造成任何特殊限定。It can be understood that the "first" and "second" here are only used to distinguish the speech coding method or speech coding rate corresponding to the original speech data and the lossy speech data, without any special meaning, and should not be used in this example The examples impose no special limitations.

可选的，可以获取不同通信网络下对应的语音编码方式以及语音编码速率对应的语音编码参数表，例如，可以是VoLTE(Voice over Long-Term Evolution，长期演进语音承载)通信网络下对应的语音编码方式以及语音编码速率对应的参数表，也可以是VoNR(Voice over New Radio，5G网络的目标语音解决方案)通信网络下对应的语音编码方式以及语音编码速率对应的参数表，当然，还可以其他类型通信网络对应的语音编码参数表，本示例实施例对此不做特殊限定。Optionally, the speech coding parameter tables corresponding to the corresponding speech coding methods and speech coding rates under different communication networks may be obtained, for example, it may be the corresponding speech encoding method under the VoLTE (Voice over Long-Term Evolution, long-term evolution voice bearer) communication network The parameter table corresponding to the encoding method and the speech encoding rate can also be the corresponding speech encoding method and the corresponding parameter table of the speech encoding rate under the VoNR (Voice over New Radio, 5G network target voice solution) communication network. Of course, it can also be The speech coding parameter tables corresponding to other types of communication networks are not specifically limited in this exemplary embodiment.

举例而言，以通信网络为VoNR为例，可以获取VoNR网络对应的语音编码参数表。For example, taking the communication network as VoNR as an example, the speech coding parameter table corresponding to the VoNR network may be acquired.

Specifically, the speech coding parameter table of the VoNR network may be as shown in Table 1. The score of each speech coding mode, and the score of each speech coding rate under each mode, can be set according to the actual situation. For example, the speech coding modes EVS-NB, EVS-WB, EVS-SWB, EVS-FB, and AMR-WB I/O may be assigned scores of 1, 2, 3, 4, and 5 respectively. If the speech coding mode of the voice data is EVS-NB, its speech coding rates of 5.9, 7.2, 8.0, 9.6, 13.2, 16.4, and 24.4 kbit/s may be assigned scores of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7 respectively. In general, the higher the speech coding rate, the higher the score.

Table 1. Speech coding parameter table of the VoNR network

Encoding mode    Supported speech coding rates (kbit/s)
EVS-NB           5.9, 7.2, 8.0, 9.6, 13.2, 16.4, 24.4
EVS-WB           5.9, 7.2, 8.0, 9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96, 128
EVS-SWB          9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96, 128
EVS-FB           16.4, 24.4, 32, 48, 64, 96, 128
AMR-WB I/O       6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85

Suppose the voice data uses the EVS-NB speech coding mode at a speech coding rate of 9.6 kbit/s. The score of its speech coding scheme is then 1.4, that is, the mode score of 1 plus the rate score of 0.4. This is only an illustrative example; the specific scores can be customized according to the actual application, and this example embodiment does not specifically limit this.

可以通过第一编码方式以及第一编码速率对应的分数确定原始语音数据对应的第一编码方案分数，可以通过第二编码方式以及第二编码速率对应的分数确定有损语音数据对应的第二编码方案分数，最后可以对第一编码方案分数以及第二编码方案分数进行加权平均，得到原始语音数据和有损语音数据之间的编码方案分数。The first encoding scheme score corresponding to the original speech data can be determined by the first encoding method and the score corresponding to the first encoding rate, and the second encoding corresponding to the lossy speech data can be determined by the second encoding method and the second encoding rate corresponding score scheme scores, and finally the scores of the first coding scheme and the scores of the second coding scheme can be weighted and averaged to obtain the coding scheme scores between the original speech data and the lossy speech data.
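The lookup and weighted average described above can be sketched as follows. The score values are the illustrative ones given for Table 1; the additive combination of mode score and rate score is inferred from the 1.4 example, and the equal default weights, the choice of codecs, and all names are assumptions of this sketch.

```python
# Illustrative scores from the example; only EVS-NB rate scores are given.
MODE_SCORES = {"EVS-NB": 1, "EVS-WB": 2, "EVS-SWB": 3, "EVS-FB": 4, "AMR-WB I/O": 5}
RATE_SCORES = {
    "EVS-NB": {5.9: 0.1, 7.2: 0.2, 8.0: 0.3, 9.6: 0.4, 13.2: 0.5, 16.4: 0.6, 24.4: 0.7},
}


def scheme_score(mode, rate_kbps):
    """Mode score plus rate score, as in the 1 + 0.4 = 1.4 example."""
    return MODE_SCORES[mode] + RATE_SCORES[mode][rate_kbps]


def coding_scheme_score(original, lossy, w_original=0.5, w_lossy=0.5):
    """Weighted average of the first (original) and second (lossy) scheme scores."""
    first = scheme_score(*original)
    second = scheme_score(*lossy)
    return (w_original * first + w_lossy * second) / (w_original + w_lossy)


# Hypothetical case: original coded as EVS-NB at 13.2 kbit/s,
# lossy voice data coded as EVS-NB at 9.6 kbit/s.
first_score = scheme_score("EVS-NB", 13.2)
second_score = scheme_score("EVS-NB", 9.6)
combined = coding_scheme_score(("EVS-NB", 13.2), ("EVS-NB", 9.6))
```

With equal weights the combined score lies midway between the two scheme scores; unequal weights would let an operator emphasize either end of the call.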

可选的，可以仅通过原始语音数据的第一编码方式以及第一编码速率对应的分数确定原始语音数据和有损语音数据之间的编码方案分数，本示例实施例对确定编码方案分数的方式不做特殊限定。Optionally, the encoding scheme score between the original voice data and the lossy voice data can be determined only by the first encoding method of the original voice data and the score corresponding to the first encoding rate. No special restrictions are made.

可选的，可以通过线性插值或者正态分布的方式，确定原始语音数据和有损语音数据之间的编码方案分数。Optionally, the coding scheme score between the original speech data and the lossy speech data may be determined by means of linear interpolation or normal distribution.
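As one reading of the linear interpolation option, a rate score could be interpolated between tabulated (rate, score) points, as sketched below. The disclosure does not spell out the interpolation procedure, so the anchor points (taken from the EVS-NB example), the clamping at the ends, and the function name are assumptions.

```python
# (rate in kbit/s, score) anchor points taken from the EVS-NB example.
EVS_NB_POINTS = [(5.9, 0.1), (7.2, 0.2), (8.0, 0.3), (9.6, 0.4),
                 (13.2, 0.5), (16.4, 0.6), (24.4, 0.7)]


def interpolate_rate_score(rate, points):
    """Linearly interpolate a score from (rate, score) points sorted by rate."""
    if rate <= points[0][0]:
        return points[0][1]   # assumption: clamp below the lowest rate
    if rate >= points[-1][0]:
        return points[-1][1]  # assumption: clamp above the highest rate
    for (r0, s0), (r1, s1) in zip(points, points[1:]):
        if r0 <= rate <= r1:
            return s0 + (s1 - s0) * (rate - r0) / (r1 - r0)
    raise ValueError("points must be sorted by rate")


# A rate halfway between 9.6 and 13.2 kbit/s lands halfway between 0.4 and 0.5.
mid_score = interpolate_rate_score(11.4, EVS_NB_POINTS)
table_score = interpolate_rate_score(9.6, EVS_NB_POINTS)
```

Tabulated rates reproduce their tabulated scores, so the interpolation only matters for rates that are not listed.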

在本公开的一个示例实施例中，语音识别分析结果可以包括语音识别准确率，可以通过以下步骤对原始语音数据和有损语音数据进行语音识别分析得到语音识别准确率：In an example embodiment of the present disclosure, the speech recognition analysis result may include the speech recognition accuracy rate, and the speech recognition accuracy rate may be obtained by performing speech recognition analysis on the original speech data and the lossy speech data through the following steps:

可以对有损语音数据进行语音识别，确定有损语音数据对应的有损语音文本，可以获取原始语音数据对应的原始语音文本，并将有损语音文本与原始语音文本进行比对，确定原始语音数据和有损语音数据之间的语音识别准确率。Speech recognition can be performed on the lossy voice data, the lossy voice text corresponding to the lossy voice data can be determined, the original voice text corresponding to the original voice data can be obtained, and the lossy voice text can be compared with the original voice text to determine the original voice Speech recognition accuracy between data and lossy speech data.

其中，有损语音文本是指对有损语音数据进行语音识别转换得到的文本内容，例如，可以通过隐马尔可夫模型(Hidden Markov Model，HMM)对有损语音数据进行语音识别，也可以通过基于卷积神经网络(Convolutional Neural Networks，CNN)的语音识别网络对有损语音数据进行语音识别，当然，还可以通过其他能够实现对有损语音数据进行语音识别得到有损语音文本的处理方式，本示例实施例对此不做特殊限定。Wherein, the lossy voice text refers to the text content obtained by performing voice recognition conversion on the lossy voice data. The speech recognition network based on convolutional neural network (Convolutional Neural Networks, CNN) performs speech recognition on the lossy speech data. Of course, other processing methods that can realize the speech recognition of the lossy speech data to obtain the lossy speech text, This example embodiment does not specifically limit it.

The speech recognition accuracy refers to the proportion of the lossy voice data, obtained after transmission through the communication network, that can be recognized correctly. For example, the original speech text corresponding to the original voice data may be "这是一篇专利申请文件" ("This is a patent application document", 10 characters), while the lossy speech text produced by speech recognition of the lossy voice data may be "这是XXXX申请文件". Only 6 of the 10 characters are recognized correctly; the rest are either unrecognizable or misrecognized. In this case the speech recognition accuracy between the original voice data and the lossy voice data is 60%. This is only an illustrative example and does not impose any special limitation on this example embodiment.
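The character-level comparison in this example can be sketched with Python's standard `difflib` module. The disclosure does not name a comparison algorithm; using `SequenceMatcher` matching blocks as the count of correctly recognized characters is an assumption of this sketch, as is the function name.

```python
from difflib import SequenceMatcher


def recognition_accuracy(original_text, lossy_text):
    """Share of original characters that also appear, in order, in the lossy text."""
    if not original_text:
        raise ValueError("original speech text is empty")
    matcher = SequenceMatcher(None, original_text, lossy_text, autojunk=False)
    # Sum the sizes of the in-order matching blocks found by SequenceMatcher.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(original_text)


# The example from the text: 6 of 10 characters survive recognition.
accuracy = recognition_accuracy("这是一篇专利申请文件", "这是XXXX申请文件")
```

Here the matching blocks are "这是" and "申请文件", giving 6 matched characters out of 10. A production system might instead use a word or character error rate based on edit distance.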

Optionally, the speech recognition analysis result may also be an anti-interference level. Background noise, divided into multiple noise levels, may be inserted while the playback device plays the original voice data at the sending terminal. As long as the speech recognition accuracy between the original voice data and the lossy voice data remains greater than or equal to a speech recognition accuracy threshold, the noise level of the background noise is raised step by step until the accuracy falls below the threshold. The noise level in effect at that point is taken as the anti-interference level of the communication network. For example, the speech recognition accuracy threshold may be 80% or 90%; it may be customized according to actual use, and this example embodiment places no special limitation on it.
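The search for the anti-interference level can be sketched as a simple loop. The function name, the `measure_accuracy` callback, and the simulated accuracy values below are all hypothetical and only illustrate the stopping rule described above. Returning `None` when the accuracy never drops below the threshold is also an assumption, since the source does not cover that case.

```python
from typing import Callable, Optional, Sequence


def anti_interference_level(
    noise_levels: Sequence[int],
    measure_accuracy: Callable[[int], float],
    threshold: float = 0.8,
) -> Optional[int]:
    """Return the first noise level at which the accuracy falls below the threshold."""
    for level in noise_levels:  # assumed to be ordered from quietest to loudest
        if measure_accuracy(level) < threshold:
            return level
    return None  # accuracy stayed at or above the threshold for every tested level


# Hypothetical accuracy per noise level, for illustration only.
simulated = {1: 0.97, 2: 0.91, 3: 0.84, 4: 0.72, 5: 0.55}
level = anti_interference_level(sorted(simulated), simulated.__getitem__, threshold=0.8)
print(level)  # 4
```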

In an example embodiment of the present disclosure, the voice quality score of the communication network may be determined from the network transmission change analysis result, the speech coding analysis result, the speech recognition analysis result, and a preset comprehensive scoring algorithm. For example, the comprehensive scoring algorithm may be as shown in relation (1):

Here, S may denote the voice quality score of the communication network; X may denote the speech recognition accuracy, in percent; Y may denote the transmission delay data, in milliseconds; Z may denote the voice amplitude change data, in percent; and C may denote the coding scheme score, whose value generally lies in [1, 10]. This is only an illustrative example, and this example embodiment places no special limitation on it.

Fig. 4 schematically shows a flowchart of voice quality assessment based on speech recognition according to some embodiments of the present disclosure.

Referring to Fig. 4, original voice data 410 may be obtained from various source materials and then played at a sending terminal 420 by a playback device. The original voice data 410 is sent through a communication network 430, to which the sending terminal 420 is connected, to a receiving terminal 440, and the receiving terminal 440 outputs lossy voice data 450.

A network transmission change analysis module may perform network transmission change analysis 460 on the original voice data 410 and the lossy voice data 450 to determine a network transmission change analysis result, which may include, but is not limited to, transmission delay data and voice amplitude change data of the communication network. A speech coding analysis module may perform speech coding analysis 470 on the original voice data 410 and the lossy voice data 450 to determine a speech coding analysis result, which may include, but is not limited to, a coding scheme score. A speech recognition analysis module may perform speech recognition analysis 480 on the original voice data 410 and the lossy voice data 450 to determine a speech recognition analysis result, which may include, but is not limited to, the speech recognition accuracy. Finally, the network transmission change analysis result (e.g., transmission delay data and voice amplitude change data), the speech coding analysis result (e.g., the coding scheme score), and the speech recognition analysis result (e.g., the speech recognition accuracy) may be combined in a comprehensive score evaluation to obtain the voice quality score 490 corresponding to the communication network 430.
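The data flow of Fig. 4 can be sketched as a small container for the three analysis results plus a pluggable scoring step. The names `AnalysisResults` and `voice_quality_score` are hypothetical. Because the comprehensive scoring relation is not reproduced in this text, the scorer is passed in as a parameter instead of being guessed, and the scorer used in the usage lines is only a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AnalysisResults:
    recognition_accuracy: float   # X: speech recognition accuracy, in percent
    transmission_delay_ms: float  # Y: transmission delay, in milliseconds
    amplitude_change: float       # Z: voice amplitude change, in percent
    coding_scheme_score: float    # C: coding scheme score, generally within [1, 10]


def voice_quality_score(
    results: AnalysisResults,
    scorer: Callable[[AnalysisResults], float],
) -> float:
    """Validate the inputs, then delegate to the supplied comprehensive scoring function."""
    if not 0.0 <= results.recognition_accuracy <= 100.0:
        raise ValueError("recognition_accuracy must be a percentage in [0, 100]")
    if results.transmission_delay_ms < 0.0:
        raise ValueError("transmission_delay_ms must be non-negative")
    return scorer(results)


results = AnalysisResults(
    recognition_accuracy=90.0,
    transmission_delay_ms=120.0,
    amplitude_change=5.0,
    coding_scheme_score=8.0,
)
# Placeholder scorer for demonstration only; it is NOT the disclosure's scoring relation.
score = voice_quality_score(results, scorer=lambda r: r.recognition_accuracy)
print(score)  # 90.0
```

Keeping the scorer separate from the analysis results means a different weighting can be swapped in without touching the analysis modules.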

In an example embodiment of the present disclosure, the voice quality score of the communication network may be determined according to a preset evaluation period. The preset evaluation period may be 8 seconds, 4 seconds, 2 seconds, and so on; this example embodiment places no special limitation on it. For example, with a preset evaluation period of 8 seconds, the voice quality score of the communication network may be re-evaluated when playback of the original voice data reaches the 8th second, the 16th second, the 24th second, and so on. Finally, the voice quality scores of all evaluation periods may be averaged to give the final voice quality score of the communication network.
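The periodic evaluation and final averaging can be sketched as follows. The helper names and the per-period scores are hypothetical, and an unweighted arithmetic mean is assumed because the text only says the scores are averaged.

```python
from statistics import mean
from typing import List, Sequence


def evaluation_times(duration_s: float, period_s: float = 8.0) -> List[float]:
    """Playback times, in seconds, at which the score is re-evaluated."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    count = int(duration_s // period_s)  # only completed periods are evaluated
    return [period_s * (i + 1) for i in range(count)]


def final_score(period_scores: Sequence[float]) -> float:
    """Average the per-period scores into the final voice quality score."""
    return mean(period_scores)


print(evaluation_times(30))          # [8.0, 16.0, 24.0]
print(final_score([4.0, 3.5, 4.5]))  # 4.0
```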

In summary, the original voice data played at the sending terminal may be acquired, together with the lossy voice data output to the receiving terminal after the original voice data is transmitted through the communication network. Network transmission change analysis may then be performed on the original voice data and the lossy voice data to obtain a network transmission change analysis result; speech coding analysis may be performed on them to obtain a speech coding analysis result; and speech recognition analysis may be performed on them to obtain a speech recognition analysis result. Finally, the voice quality score of the communication network may be determined from the network transmission change analysis result, the speech coding analysis result, and the speech recognition analysis result. On the one hand, performing speech recognition analysis on the voice transmitted through the communication network ties the evaluation result to user perception. Combining this with the transmission effects of the communication network and the effect of different speech coding methods on human hearing gives a comprehensive evaluation of the network's voice quality, which further strengthens the correlation between the evaluation result and user perception and improves the rationality and accuracy of the result. On the other hand, compared with related techniques that evaluate voice quality by waveform attenuation scoring, this solution does not need to evaluate the waveform of the entire speech segment, so the voice quality score is output more efficiently and with a faster response, which effectively improves the efficiency of voice quality evaluation.

Using speech recognition and related techniques, this solution directly evaluates how users perceive the voice quality of a mobile network. It does not need to evaluate the entire voice sample, so it responds with lower latency. Unlike general speech recognition, it also takes into account the transmission delay of the communication network, changes in voice amplitude, and changes in the speech coding scheme, which effectively improves the rationality and accuracy of the evaluation result.

Using speech recognition and related techniques, this solution can directly evaluate how users perceive the voice quality of the communication network and effectively guide the maintenance and optimization of the network, thereby improving the operator's optimization efficiency and the user experience. Because speech recognition is used, the entire voice sample does not need to be evaluated, and the response latency is lower, reaching a granularity of 0.5 seconds (assuming a speech rate of 120 characters per minute). Test voice samples do not need to be customized in advance: an ordinary voice sample and the text recognized from the lossless reference sample are enough to complete the test, which reduces implementation cost.
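The 0.5-second figure is consistent with the stated reference speech rate if the granularity is read as the time taken to speak one character. That reading is an assumption; the text gives only the two numbers. As a worked check:

```latex
\[
  \frac{60\ \text{s/min}}{120\ \text{characters/min}} = 0.5\ \text{s per character}
\]
```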

It should be noted that although the steps of the method in the present disclosure are depicted in the drawings in a specific order, this does not require or imply that the steps must be performed in that order, or that all of the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.

In addition, this example embodiment also provides a voice quality assessment device. Referring to Fig. 5, the voice quality assessment device 500 may include a voice data acquisition module 510, a speech recognition analysis module 520, and a voice quality assessment module 530, wherein:

the voice data acquisition module 510 is used to acquire original voice data, and to acquire lossy voice data output after the original voice data is transmitted through a communication network;

the speech recognition analysis module 520 is used to perform speech recognition analysis on the original voice data and the lossy voice data to obtain a speech recognition analysis result;

the voice quality assessment module 530 is used to evaluate the voice quality score of the communication network according to the speech recognition analysis result.

In an example embodiment of the present disclosure, the voice quality assessment device 500 may further include:

In an example embodiment of the present disclosure, the voice quality assessment module 530 may also be used to:

In an example embodiment of the present disclosure, the network transmission change analysis result may include transmission delay data, and the network transmission change analysis module may be used to:

In an example embodiment of the present disclosure, the network transmission change analysis module may also be used to:

In an example embodiment of the present disclosure, the network transmission change analysis result may include voice amplitude change data, and the network transmission change analysis module may also be used to:

In an example embodiment of the present disclosure, the network transmission change analysis result may include sound waveform loss data, and the network transmission change analysis module may also be used to:

In an example embodiment of the present disclosure, the speech coding analysis result may include a coding scheme score, and the speech coding analysis module may be used to:

In an example embodiment of the present disclosure, the speech recognition analysis result may include speech recognition accuracy, and the speech recognition analysis module 520 may be used to:

In an example embodiment of the present disclosure, the voice quality assessment device 500 may be used to:

The specific details of each module of the above voice quality assessment device have already been described in detail in the corresponding voice quality assessment method, so they are not repeated here.

It should be noted that although several modules or units of the voice quality assessment device are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.

In addition, an exemplary embodiment of the present disclosure also provides an electronic device capable of implementing the above voice quality assessment method.

Those skilled in the art will understand that various aspects of the present disclosure may be implemented as a system, a method, or a program product. Accordingly, various aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may be collectively referred to herein as a "circuit", "module", or "system".

An electronic device 600 according to such an embodiment of the present disclosure is described below with reference to Fig. 6. The electronic device 600 shown in Fig. 6 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.

As shown in Fig. 6, the electronic device 600 takes the form of a general-purpose computing device. The components of the electronic device 600 may include, but are not limited to: the at least one processing unit 610 described above, the at least one storage unit 620 described above, a bus 630 connecting different system components (including the storage unit 620 and the processing unit 610), and a display unit 640.

The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification. For example, the processing unit 610 may perform the steps shown in Fig. 2: step S210, acquiring original voice data, and acquiring lossy voice data output after the original voice data is transmitted through a communication network; step S220, performing speech recognition analysis on the original voice data and the lossy voice data to obtain a speech recognition analysis result; and step S230, evaluating the voice quality score of the communication network according to the speech recognition analysis result.

The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 621 and/or a cache storage unit 622, and may further include a read-only memory (ROM) unit 623.

The storage unit 620 may also include a program/utility 624 having a set of (at least one) program modules 625. Such program modules 625 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

The electronic device 600 may also communicate with one or more external devices 670 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., a router, a modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 650. The electronic device 600 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 through the bus 630. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented by software, or by software combined with the necessary hardware. Accordingly, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to perform the method according to the embodiments of the present disclosure.

An exemplary embodiment of the present disclosure also provides a computer-readable storage medium on which is stored a program product capable of implementing the method described above in this specification. In some possible embodiments, various aspects of the present disclosure may also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.

Referring to Fig. 7, a program product 700 for implementing the above voice quality assessment method according to an embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in combination with an instruction execution system, apparatus, or device.

The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.

Program code contained on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.

Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).

In addition, the above drawings are only schematic illustrations of the processing included in the method according to the exemplary embodiments of the present disclosure and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.

From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented by software, or by software combined with the necessary hardware. Accordingly, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and includes several instructions that cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to perform the method according to the embodiments of the present disclosure.

Other embodiments of the present disclosure will be readily apparent to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A speech quality assessment method, comprising:

acquiring original voice data, and acquiring lossy voice data output after the original voice data is transmitted through a communication network;

performing voice recognition analysis on the original voice data and the lossy voice data to obtain a voice recognition analysis result;

and evaluating the voice quality score of the communication network according to the voice recognition analysis result.

2. The speech quality assessment method according to claim 1, characterized in that said method further comprises:

performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result;

and carrying out voice coding analysis on the original voice data and the lossy voice data to obtain a voice coding analysis result.

3. The voice quality assessment method according to claim 2, wherein said evaluating the voice quality score of the communication network according to the voice recognition analysis result comprises:

evaluating the voice quality score of the communication network according to one or a combination of more than one of the network transmission change analysis result, the voice coding analysis result and the voice recognition analysis result.

4. The method according to claim 2, wherein the network transmission change analysis result includes transmission delay data, and the performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result includes:

determining a first time point corresponding to an identification position in the lossy voice data;

determining a second time point corresponding to the identification position in the original voice data;

and determining transmission delay data between the original voice data and the lossy voice data according to the first time point and the second time point.

5. The method of claim 4, wherein determining the identification position in the lossy voice data comprises:

determining a target waveform in the lossy voice data according to a preset duration, and taking the position of the target waveform in the lossy voice data as the identification position; or

inserting identification audio into the original voice data, and taking the position of the identification audio in the lossy voice data as the identification position.

6. The method according to claim 2, wherein the network transmission change analysis result includes voice amplitude change data, and the performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result includes:

determining a first voice amplitude corresponding to the lossy voice data;

determining a second voice amplitude corresponding to the original voice data;

and determining voice amplitude change data between the original voice data and the lossy voice data according to the first voice amplitude and the second voice amplitude.

7. The method according to claim 2, wherein the network transmission change analysis result includes sound waveform loss data, and the performing network transmission change analysis on the original voice data and the lossy voice data to obtain a network transmission change analysis result includes:

carrying out envelope detection on the sound waveform of the original voice data and the sound waveform of the lossy voice data, and determining sound waveform loss data between the original voice data and the lossy voice data; and/or

acquiring a packet loss parameter and a jitter parameter corresponding to the lossy voice data, and determining sound waveform loss data between the original voice data and the lossy voice data based on the packet loss parameter and the jitter parameter.

8. The method of claim 2, wherein the speech coding analysis result comprises a coding scheme score, and performing speech coding analysis on the original speech data and the lossy speech data to obtain a speech coding analysis result comprises:

determining a first coding mode and a first coding rate corresponding to the original voice data;

determining a second coding mode and a second coding rate corresponding to the lossy voice data;

and determining a coding scheme score between the original speech data and the lossy speech data based on the first coding mode, the first coding rate, the second coding mode, and the second coding rate.
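Claim 8 does not state how the four inputs are combined into a score. Purely as an illustration, the sketch below assumes a 0 to 100 scale, scales the score by the ratio of the second coding rate to the first, and applies a fixed penalty when the coding mode changed. The formula, the `mode_penalty` value, and the codec names in the usage lines are hypothetical placeholders, not values from the disclosure.

```python
def coding_scheme_score(first_mode, first_rate, second_mode, second_rate,
                        mode_penalty=0.8):
    """Illustrative 0-100 score comparing original and lossy coding.

    Assumption: a lower rate on the lossy side reduces the score
    proportionally, and a changed coding mode multiplies it by
    `mode_penalty`.
    """
    if first_rate <= 0:
        raise ValueError("first coding rate must be positive")
    # A higher rate on the lossy side is not rewarded beyond 100.
    rate_ratio = max(0.0, min(second_rate / first_rate, 1.0))
    mode_factor = 1.0 if first_mode == second_mode else mode_penalty
    return 100.0 * rate_ratio * mode_factor


# Hypothetical usage: same codec and rate, then a downgraded codec at half rate.
unchanged = coding_scheme_score("AMR-WB", 24.0, "AMR-WB", 24.0)
downgraded = coding_scheme_score("AMR-WB", 24.0, "AMR-NB", 12.0)
```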

9. The method of claim 1, wherein the speech recognition analysis result comprises speech recognition accuracy, and performing speech recognition analysis on the original speech data and the lossy speech data to obtain a speech recognition analysis result comprises:

performing voice recognition on the lossy voice data, and determining a lossy voice text corresponding to the lossy voice data;

and acquiring an original voice text corresponding to the original voice data, comparing the lossy voice text with the original voice text, and determining the voice recognition accuracy between the original voice data and the lossy voice data.
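The text comparison in claim 9 can be sketched with a word-level edit distance. The claim does not name a comparison metric; this sketch assumes accuracy is one minus the word error rate, clamped at zero, and takes the lossy voice text as already produced by some speech recognizer (the recognition step itself is not shown). Function names are hypothetical, and for Chinese text the tokens would more naturally be characters than whitespace-separated words.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, 1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, 1):
            current.append(min(
                previous[j] + 1,                             # deletion
                current[j - 1] + 1,                          # insertion
                previous[j - 1] + (ref_token != hyp_token),  # substitution
            ))
        previous = current
    return previous[-1]


def recognition_accuracy(original_text, lossy_text):
    """Assumed metric: 1 - word error rate, clamped to the range [0, 1]."""
    reference = original_text.split()
    hypothesis = lossy_text.split()
    if not reference:
        raise ValueError("original voice text is empty")
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical texts: one of four words is dropped after transmission.
accuracy = recognition_accuracy("please hold the line", "please hold line")
```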

10. The method according to claim 1, further comprising:

determining the voice quality score of the communication network according to a preset evaluation period.

11. A speech quality assessment apparatus, comprising:

a voice data acquisition module configured to acquire original voice data and to acquire lossy voice data output after the original voice data is transmitted through a communication network;

a voice recognition analysis module configured to perform voice recognition analysis on the original voice data and the lossy voice data to obtain a voice recognition analysis result; and

a voice quality evaluation module configured to evaluate the voice quality score of the communication network according to the voice recognition analysis result.

12. An electronic device, comprising:

a processor; and

a memory having stored thereon computer-readable instructions which, when executed by the processor, implement the speech quality assessment method of any one of claims 1 to 10.

13. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the speech quality assessment method according to any one of claims 1 to 10.