CN118280351A


Info

Publication number: CN118280351A
Application number: CN202410402433.3A
Authority: CN
Inventors: 张磊; 罗剑; 刘占杰
Original assignee: Zhejiang Geely Holding Group Co Ltd; Zhejiang Zeekr Intelligent Technology Co Ltd
Current assignee: Zhejiang Geely Holding Group Co Ltd; Zhejiang Zeekr Intelligent Technology Co Ltd
Priority date: 2024-04-03
Filing date: 2024-04-03
Publication date: 2024-07-02

Abstract

The invention relates to the technical field of in-vehicle voice, and discloses a speech recognition method, apparatus, device and storage medium. The method comprises: when current voice information is received, determining a target NLU model from a plurality of NLU models based on a preset rule; recognizing the current voice information through the target NLU model to obtain a recognition result; determining a user intention according to the current voice information; judging whether the recognition result matches the user intention; and, if not, adjusting the preset rule until the obtained recognition result matches the user intention. When the recognition result does not match the user's intention, the preset rule is adjusted automatically and promptly. It therefore no longer needs to be manually retested and adjusted during idle time, which effectively improves the user experience.

Description


Speech Recognition Method, Apparatus, Device and Storage Medium

Technical Field

The present invention relates to the field of in-vehicle voice technology, and in particular to a speech recognition method, apparatus, device and storage medium.

Background Art

With the development of natural language understanding (NLU), NLU models have been widely used in in-vehicle smart assistants. An in-vehicle smart assistant recognizes the user's speech through multiple NLU models and then generates a corresponding response, thereby carrying on a dialogue with the user.

Processing the same user utterance with several NLU models at once can produce conflicts. To avoid this, corresponding rules, such as preset rules, are currently configured in the voice cloud. For each type of user utterance, technicians use prior testing to select one fixed NLU model from the multiple NLU models to recognize that type of utterance.

However, this rule configuration has a drawback. If the answer returned under the rules contradicts the user's intention, the rules must be manually retested and adjusted during idle time. They cannot be adjusted promptly, which degrades the user experience.

The above content is provided only to assist in understanding the technical solution of the present invention and does not constitute an admission that it is prior art.

Summary of the Invention

The main purpose of the present invention is to provide a speech recognition method, apparatus, device and storage medium. It aims to solve a technical problem in the prior art: when the answer returned under the corresponding rules contradicts the user's intention, the rules must be manually retested and adjusted during idle time. They cannot be adjusted promptly, which degrades the user experience.

To achieve the above object, the present invention provides a speech recognition method, comprising the following steps:

when current voice information is received, selecting a target NLU model from a plurality of NLU models based on a preset rule;

recognizing the current voice information through the target NLU model to obtain a recognition result;

determining a user intention according to the current voice information;

judging whether the recognition result matches the user intention;

if not, adjusting the preset rule until the obtained recognition result matches the user intention.

Optionally, the method further includes the following steps, performed before the step of determining the target NLU model from the plurality of NLU models based on the preset rule when the current voice information is received:

obtaining voice data samples;

calling each NLU model configured in the preset rule to recognize the voice data samples, and obtaining a recognition result of each NLU model;

determining a priority index of each NLU model according to the recognition results, and determining the priority of each NLU model through the priority index;

correspondingly, the step of, if not, adjusting the preset rule until the obtained recognition result matches the user intention includes:

if not, adjusting the preset rule by switching the NLU model currently selected in the preset rule to the NLU model of the next priority, until the obtained recognition result matches the user intention.

Optionally, the voice data samples include positive samples and negative samples, and the priority index includes a first priority index and a second priority index;

the step of determining the priority of each NLU model according to the recognition results includes:

determining, from the recognition results, a first sample count and a second sample count for each NLU model. The first sample count is the number of the positive and negative samples that the model predicts as positive. The second sample count is the number of samples within the first sample count that are positive samples;

determining the first priority index of each NLU model according to the first sample count and the total number of positive and negative samples;

determining the second priority index of each NLU model according to the second sample count and the number of positive samples.

Optionally, the step of determining the priority of each NLU model through the priority index includes:

determining a harmonic mean for each NLU model according to the first priority index and the second priority index;

determining the priority of each NLU model based on the value range of the harmonic mean.

Optionally, the step of determining the user intention according to the current voice information includes:

judging whether there is historical voice information associated with the current voice information;

if so, predicting, according to the intention category to which the historical voice information belongs, the probability that the current voice information belongs to each preset intention category;

taking the target intention category with the highest probability among the preset intention categories as the user intention.
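The text does not say how the probabilities are predicted from the historical intention category. The sketch below shows one possible reading: a first-order transition model that counts which intent followed which in past dialogue sessions, then applies add-alpha smoothing. It makes several assumptions. The category names, the `dialog_logs` format and all function names are hypothetical, and only the most recent associated utterance is used as history.

```python
from collections import Counter, defaultdict

# Hypothetical preset intention categories; the patent does not enumerate them.
PRESET_INTENTS = ["navigation", "music", "climate"]


def train_transitions(dialog_logs):
    """Count how often intent `nxt` directly follows intent `prev` in past sessions."""
    counts = defaultdict(Counter)
    for session in dialog_logs:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_from_history(history_intent, counts, alpha=1.0):
    """P(category | intent of the associated historical utterance), add-alpha smoothed."""
    row = counts.get(history_intent, Counter())
    total = sum(row[c] for c in PRESET_INTENTS) + alpha * len(PRESET_INTENTS)
    return {c: (row[c] + alpha) / total for c in PRESET_INTENTS}


def intent_from_history(history_intent, counts):
    probs = predict_from_history(history_intent, counts)
    return max(probs, key=probs.get)  # preset category with the highest probability
```

For example, suppose the logs contain `["music", "music"]`. A follow-up utterance whose associated history is `"music"` is then most likely to be classified as `"music"` again. Smoothing keeps unseen transitions from receiving zero probability.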

Optionally, after the step of judging whether there is historical voice information associated with the current voice information, the method further includes:

if not, performing text recognition on the current voice information to obtain text information;

identifying a plurality of domain keywords and a plurality of action keywords in the text information;

determining a plurality of pieces of intention information of the text information based on the domain keywords and the action keywords;

determining a probability score for each piece of intention information, and taking the intention information with the highest probability score as the user intention.
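Below is a minimal sketch of the keyword-based fallback. It assumes small hand-written lexicons. Each (domain, action) pair that co-occurs in the text is treated as one candidate piece of intention information, and its hit count is normalized into a probability score. The lexicons, the scoring scheme and the function names are hypothetical; the patent does not specify how the scores are computed.

```python
# Hypothetical keyword lexicons; a real system would use much larger ones.
DOMAIN_KEYWORDS = {
    "music": ["music", "song"],
    "climate": ["air conditioner", "temperature"],
}
ACTION_KEYWORDS = {
    "turn_on": ["play", "turn on", "open"],
    "turn_off": ["stop", "turn off", "close"],
}


def find_keywords(text, lexicon):
    """Return {label: number of keyword hits} for every label with at least one hit."""
    text = text.lower()
    return {
        label: sum(text.count(w) for w in words)
        for label, words in lexicon.items()
        if any(w in text for w in words)
    }


def keyword_intent(text):
    """Combine domain and action hits into candidate intents and pick the best one."""
    domains = find_keywords(text, DOMAIN_KEYWORDS)
    actions = find_keywords(text, ACTION_KEYWORDS)
    candidates = {
        (d, a): dn + an for d, dn in domains.items() for a, an in actions.items()
    }
    if not candidates:
        return None  # no intention information could be formed
    total = sum(candidates.values())
    scores = {k: v / total for k, v in candidates.items()}  # normalized scores
    return max(scores, key=scores.get), scores
```

Substring matching is deliberately crude here. A production system would match against segmented tokens.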

Optionally, the step of recognizing the received current voice information through the preset rule to obtain the recognition result includes:

extracting semantic information of the current voice information;

performing a semantic rejection judgment on the current voice information based on the semantic information;

if the current voice information is not judged to be semantically rejected, recognizing it through the target NLU model to obtain the recognition result.

In addition, to achieve the above object, the present invention further provides a speech recognition apparatus, comprising:

a speech recognition module, configured to recognize the received current voice information through a preset rule to obtain a recognition result, wherein each NLU model configured in the preset rule has a priority;

an intention determination module, configured to determine a user intention according to the current voice information;

a result matching module, configured to judge whether the recognition result matches the user intention;

a rule adjustment module, configured to, if not, adjust the preset rule until the obtained recognition result matches the user intention.

In addition, to achieve the above object, the present invention further provides a speech recognition device. The device comprises a memory, a processor, and a speech recognition program stored in the memory and executable on the processor. The speech recognition program is configured to implement the steps of the speech recognition method described above.

In addition, to achieve the above object, the present invention further provides a storage medium storing a speech recognition program. When executed by a processor, the program implements the steps of the speech recognition method described above.

The present invention provides a speech recognition method, apparatus, device and storage medium. The method proceeds as follows: when current voice information is received, it determines a target NLU model from multiple NLU models based on a preset rule; recognizes the current voice information through the target NLU model to obtain a recognition result; determines the user intention according to the current voice information; judges whether the recognition result matches the user intention; and, if not, adjusts the preset rule until the obtained recognition result matches the user intention. If the target NLU model's recognition result for the current user speech does not match the user's intention, the preset rule is adjusted automatically and promptly. The recognition result is thereby brought into line with the user's intention without manually retesting and adjusting the preset rule during idle time, which effectively improves the user experience.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a speech recognition device in the hardware operating environment involved in the embodiments of the present invention;

FIG. 2 is a schematic flowchart of a first embodiment of the speech recognition method of the present invention;

FIG. 3 is a schematic flowchart of a second embodiment of the speech recognition method of the present invention;

FIG. 4 is a schematic flowchart of a third embodiment of the speech recognition method of the present invention;

FIG. 5 is a structural block diagram of a first embodiment of the speech recognition apparatus of the present invention.

The realization of the objects, functional features and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.

Detailed Description of the Embodiments

It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.

Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a speech recognition device in the hardware operating environment involved in the embodiments of the present invention.

As shown in FIG. 1, the speech recognition device may include a processor 1001 such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface, such as a Wireless-Fidelity (Wi-Fi) interface. The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.

Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the speech recognition device. The device may include more or fewer components than shown, combine certain components, or arrange the components differently.

As shown in FIG. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module and a speech recognition program.

In the speech recognition device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be arranged in the speech recognition device. The device calls, through the processor 1001, the speech recognition program stored in the memory 1005 and executes the speech recognition method provided by the embodiments of the present invention.

An embodiment of the present invention provides a speech recognition method. Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the speech recognition method of the present invention.

In this embodiment, the speech recognition method includes the following steps:

Step S10: when current voice information is received, selecting a target NLU model from a plurality of NLU models based on a preset rule.

It should be noted that the method of this embodiment may be executed by a computing service device with speech recognition, network communication and program execution functions, such as a mobile phone, a tablet computer or a personal computer. It may also be executed by another electronic device with the same or similar functions. This and the following embodiments are described using the above speech recognition device (hereinafter "the adjustment device").

It can be understood that the preset rule may be a decision rule to be followed when multiple NLU models produce inconsistent or conflicting results for the same user voice information. For example, when the received user voice information can be processed by multiple NLU models, the NLU model specified in the preset rule is selected from them to process it. This avoids conflicts caused by several NLU models processing the same user voice information at once and helps ensure a correct recognition result.

It should be noted that an NLU model here is a model that recognizes the meaning and intent of voice input. It processes and analyzes human language to identify key information in text or speech, including the user's intent, emotion, named entities and keywords.

In a specific implementation, the adjustment device may be applied to an in-vehicle smart assistant and may obtain the preset rule already configured in the voice cloud. When current voice information input by the user is received, the device uses the preset rule to select the target NLU model from the NLU models that support processing that voice information.
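The patent does not define the data format of the preset rule. The sketch below assumes a hypothetical representation: a mapping from a voice type to an ordered list of candidate NLU model names, plus the index of the model currently selected for that type. Selecting the target NLU model in step S10 is then a lookup. `PresetRule`, `select` and the model names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class PresetRule:
    """Hypothetical preset rule: for each voice type, an ordered list of candidate
    NLU model names plus the index of the model currently selected."""
    candidates: dict[str, list[str]]
    current: dict[str, int] = field(default_factory=dict)

    def select(self, voice_type: str) -> str:
        """Step S10: return the name of the target NLU model for this voice type."""
        models = self.candidates[voice_type]
        return models[self.current.get(voice_type, 0)]
```

Example: with `PresetRule({"navigation": ["map_nlu", "general_nlu"]})`, the first lookup returns `"map_nlu"`. Adjusting the rule then only means changing the stored index.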

Step S20: recognizing the current voice information through the target NLU model to obtain a recognition result.

In a specific implementation, the adjustment device may call the target NLU model. The model recognizes the current voice information, analyzes its meaning and intent, and generates reply information responding to the user. It then returns the reply information to the adjustment device as the recognition result.

Further, in this embodiment, step S20 includes:

Step S201: extracting semantic information of the current voice information.

In a specific implementation, upon receiving the current user voice, the adjustment device may convert the current voice information into text. It then preprocesses the recognized text to extract features, using operations such as word segmentation, stop-word removal and stemming. Finally, it extracts the semantic information of the text using methods such as a bag-of-words model or TF-IDF.
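A minimal TF-IDF sketch of the extraction step, written in the standard library only, follows. It makes several assumptions. Whitespace splitting stands in for Chinese word segmentation, stemming is omitted, and the stop-word list and corpus are hypothetical. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, the same form scikit-learn's `TfidfVectorizer` uses with `smooth_idf=True`.

```python
import math
from collections import Counter

# Hypothetical stop-word list; real systems use a language-specific list.
STOP_WORDS = {"the", "a", "to", "please", "me"}


def preprocess(text):
    """Lowercase, tokenize on whitespace (stand-in for word segmentation), drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tfidf(doc_tokens, corpus_tokens):
    """Return {term: tf-idf weight} for one utterance against a reference corpus."""
    n_docs = len(corpus_tokens)
    tf = Counter(doc_tokens)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus_tokens if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        weights[term] = (count / len(doc_tokens)) * idf
    return weights
```

In the example below, "airport" is rarer in the corpus than "navigate", so it receives the larger weight. It therefore carries more of the utterance's semantic content.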

Step S202: performing a semantic rejection judgment on the current voice information based on the semantic information.

It should be noted that semantic rejection rules may be defined according to business requirements and data characteristics. For example, they may be based on the completeness, accuracy and consistency of the semantic information. An input may be judged as semantically rejected if an extracted entity does not match what is expected, or if the intent contradicts the context.

In a specific implementation, the adjustment device may check whether the semantic information matches the semantic rejection rules. This determines whether the information satisfies the conditions for rejecting recognition.

Step S203: if the current voice information is not judged to be semantically rejected, recognizing it through the target NLU model to obtain a recognition result.

In a specific implementation, the adjustment device may determine that the current voice information is not semantically rejected. It then treats the information as eligible for processing and calls the target NLU model to recognize it and obtain a recognition result. This prevents invalid voice input from being processed into an erroneous reply and improves the accuracy of voice information processing.

It should be understood that voice information judged to be semantically rejected is treated as ineligible for processing and is not processed. A prompt, for example one indicating that the voice input was invalid, may be output to the user so that the user can speak again.
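The sketch below shows how steps S202 and S203 could fit together. It assumes hypothetical semantic rejection rules, one for each criterion named above (completeness, accuracy of expected entities, consistency with context). If any rule fires, a re-prompt is returned. Otherwise the input is passed to the target NLU model, modeled here as a plain callable. `Semantics`, `REQUIRED_SLOTS`, the rule functions and the prompt text are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Semantics:
    """Hypothetical container for the semantic information extracted in step S201."""
    text: str
    intent: Optional[str]
    entities: dict = field(default_factory=dict)


# Hypothetical intent -> slot that must be present for the input to be usable.
REQUIRED_SLOTS = {"navigation": "destination"}


def incomplete(sem, context):
    return not sem.text.strip() or sem.intent is None


def missing_entity(sem, context):
    slot = REQUIRED_SLOTS.get(sem.intent)
    return slot is not None and slot not in sem.entities


def inconsistent(sem, context):
    # Example: asking for the next song while the context says no music is playing.
    return sem.intent == "music_next" and context.get("music_playing") is False


REJECTION_RULES = [incomplete, missing_entity, inconsistent]


def recognize_or_reject(sem, context, nlu_model: Callable[[str], str]):
    """Steps S202/S203: return (recognition_result, prompt); exactly one is None."""
    if any(rule(sem, context) for rule in REJECTION_RULES):
        return None, "Sorry, I didn't catch that. Please say it again."
    return nlu_model(sem.text), None
```

Because the rules are a plain list of predicates, they can be changed to suit business needs without touching the gating logic.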

Step S30: determining the user intention according to the current voice information.

In a specific implementation, the adjustment device may convert the current voice information into text and extract feature information from it. The features are fed into a pre-trained intent classification model to obtain an intent classification result. This result is matched against a predefined intent list, and the best-matching intent is taken as the user's true intention.
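The patent does not name the intent classification model. In the sketch below, a multinomial Naive Bayes classifier with add-one smoothing stands in for it. This is a simple supervised choice and an assumption, not the patented model. The predicted label is checked against a hypothetical predefined intent list. Features are whitespace tokens, and `NaiveBayesIntent`, `PREDEFINED_INTENTS` and `determine_user_intent` are illustrative names.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntent:
    """Multinomial Naive Bayes with add-one smoothing (stand-in intent classifier)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n)  # log prior
            for t in tokens:  # add-one smoothed log likelihoods
                score += math.log((self.word_counts[label][t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best


PREDEFINED_INTENTS = {"navigation", "music"}  # hypothetical predefined intent list


def determine_user_intent(classifier, text):
    label = classifier.predict(text)
    return label if label in PREDEFINED_INTENTS else "unknown"
```

Log probabilities are summed instead of multiplying raw probabilities. This avoids numeric underflow on longer utterances.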

The intent classification model may be trained by supervised or unsupervised learning to learn the mapping between text features and user intentions.

Step S40: judging whether the recognition result matches the user intention.

In a specific implementation, the adjustment device may judge whether the recognition result matches the user intention. It does so by checking whether the reply information in the recognition result conforms to that intention.

Step S50: if not, adjusting the preset rule until the obtained recognition result matches the user intention.

In a specific implementation, the reply information in the recognition result may not conform to the determined user intention. The adjustment device may then conclude that the NLU model selected under the preset rule is in error and that the rule needs adjusting. In that case, the currently selected NLU model may be switched to another NLU model and the above process repeated. Under the adjusted preset rule, a target NLU model is reselected from the NLU models to recognize the current voice information. This continues until the obtained recognition result matches the user intention. The preset rule is thus adjusted dynamically while replies are generated for the user's voice input, and the user receives a reply that matches their intention. Extensive manual testing offline during idle time is no longer needed, which effectively improves the user experience.
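Steps S10 to S50 can be sketched as a loop. Several assumptions apply. Each NLU model is a callable that returns a dict with a predicted `"intent"` and a `"reply"`. "Matches the user intention" is approximated as equality of intent labels. The preset rule's current choice comes first in `rule_order`, followed by the alternatives to switch to. All names are hypothetical.

```python
from typing import Callable, Optional


def recognize_with_adjustment(
    utterance: str,
    rule_order: list[str],
    models: dict[str, Callable[[str], dict]],
    determine_intent: Callable[[str], str],
) -> tuple[Optional[str], Optional[dict]]:
    """Try the rule's current model first, then switch to the others until the
    recognition result matches the user intention (steps S10 to S50)."""
    user_intent = determine_intent(utterance)      # S30
    for name in rule_order:                        # S10, then S50 switches model
        result = models[name](utterance)           # S20
        if result.get("intent") == user_intent:    # S40 (assumed matching criterion)
            return name, result
    return None, None                              # no configured model matched
```

The returned model name tells the caller which choice to write back into the preset rule. Applying that write-back to later requests is what makes the adjustment persistent.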

In this embodiment, when current voice information is received, the method proceeds as follows: it determines a target NLU model from multiple NLU models based on a preset rule; recognizes the current voice information through the target NLU model to obtain a recognition result; determines the user intention according to the current voice information; judges whether the recognition result matches the user intention; and, if not, adjusts the preset rule until the obtained recognition result matches the user intention. If the target NLU model's recognition result for the current user speech does not match the user's intention, this embodiment adjusts the preset rule automatically and promptly. The recognition result is thereby brought into line with the user's intention without manually retesting and adjusting the preset rule during idle time, which effectively improves the user experience.

Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the speech recognition method of the present invention.

Based on the first embodiment, in this embodiment the method further includes, before step S10:

Step S01: obtaining voice data samples.

It should be noted that the voice data samples may be sample data constructed with reference to user voices previously input by users.

In a specific implementation, the adjustment device may obtain previously received historical voice data and construct voice data samples from it. Alternatively, it may construct voice data samples whose structure and type are consistent with those of the historical voice data.

Step S02: calling each NLU model configured in the preset rule to recognize the voice data samples, and obtaining the recognition result of each NLU model.

In a specific implementation, the adjustment device may call all NLU models configured in the preset rule to recognize the voice data samples in turn. This yields each model's recognition results for the samples.

Step S03: determining the priority index of each NLU model according to the recognition results, and determining the priority of each NLU model through the priority index.

In a specific implementation, the adjustment device may assign each NLU model a priority index according to its recognition accuracy in the recognition results. An NLU model with higher recognition accuracy is assigned a higher priority index.

Accordingly, in this embodiment, step S50 includes: if not, adjusting the preset rule by switching the NLU model currently selected in the preset rule to the NLU model of the next priority, until the obtained recognition result matches the user intention.

In a specific implementation, the reply information in the response result may not conform to the determined user intention. The adjustment device may then switch the currently selected NLU model to the NLU model of the next priority and repeat the above process, recognizing the current voice information with that model. This continues until the obtained response result matches the user intention.

In addition, suppose the preset rule (the NLU arbitration rule) has been adjusted in this way from the highest-priority NLU model to an NLU model of some target priority. The target-priority NLU model may then be used as the first NLU model for the next round of received user voice information. If the response generated by the target-priority NLU model does not match the user's intention, the above procedure continues. If the lowest-priority NLU model is reached without a match, the procedure wraps around and continues from the highest-priority NLU model. If no NLU model produces a match after all of them have been traversed, a prompt may be generated so that technicians can retest and adjust the rule offline.
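The cross-round behavior just described can be sketched as a small stateful arbiter. It rests on hypothetical choices. Models are held in priority order, with index 0 the highest. The index of the model that last matched is remembered and becomes the first one tried in the next round. The search wraps from the lowest priority back to the highest. When a full pass finds no match, a `notify` callback stands in for the technician prompt. The matching criterion, an intent-label equality, is also an assumption.

```python
from typing import Callable, Optional


class PriorityArbiter:
    """Hypothetical sketch of the adjusted NLU arbitration rule across rounds."""

    def __init__(self, models: list[tuple[str, Callable[[str], dict]]],
                 notify: Callable[[str], None] = print):
        self.models = models  # (name, model) pairs, highest priority first
        self.start = 0        # index of the model tried first in the next round
        self.notify = notify

    def handle(self, utterance: str, user_intent: str) -> Optional[dict]:
        n = len(self.models)
        for step in range(n):
            idx = (self.start + step) % n        # wrap from lowest back to highest
            _, model = self.models[idx]
            result = model(utterance)
            if result["intent"] == user_intent:  # assumed matching criterion
                self.start = idx                 # adjusted rule persists to next round
                return result
        self.notify(f"No NLU model matched intent {user_intent!r}; offline retest needed")
        return None
```

Storing only a start index keeps the configured priority order intact. The "adjustment" is just a change of starting point, so it can be undone or reset cheaply.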

进一步地，本实施例中，所述语音数据样本包括正例样本和反例样本，所述优先级指标包括第一优先级指标和第二优先级指标，所述步骤S03包括：Further, in this embodiment, the speech data sample includes a positive sample and a negative sample, the priority index includes a first priority index and a second priority index, and the step S03 includes:

步骤S031：从所述识别结果中确定所述各NLU模型预测所述正例样本和所述反例样本为正例的第一样本数，并确定所述第一样本数中属于所述正例样本的第二样本数。Step S031: determining, from the recognition results, a first number of samples in which each NLU model predicts that the positive sample and the negative sample are positive samples, and determining a second number of samples in the first number of samples that are positive samples.

需要说明的是，上述正例样本可为期望NLU模型预测正确的样本。It should be noted that the above positive examples are samples that the expected NLU model predicts correctly.

相应地，上述反例样本可为不期望NLU模型将其误测为正例的样本，如与正例样本不同的文本或句子，或者是背景噪声等。Accordingly, the above-mentioned counterexample samples may be samples that are not expected to be misidentified as positive examples by the NLU model, such as text or sentences different from the positive example samples, or background noise, etc.

在具体实现中，上述调整识别可以从上述识别结果中确定各NLU模型正确预测出正例样本的数目和正确预测出反例样本(即未将反例样本误预测为正例样本)的数目，将正确预测出正例样本的数目和正确预测出反例样本的数目作为第一样本数，然后确定第一样本数中属于正例样本的第二样本数，即正确预测正例样本的数目。In a specific implementation, the above-mentioned adjustment identification can determine the number of positive samples correctly predicted by each NLU model and the number of negative samples correctly predicted (i.e., the negative samples are not mistakenly predicted as positive samples) from the above-mentioned recognition results, and use the number of correctly predicted positive samples and the number of correctly predicted negative samples as the first sample number, and then determine the second sample number belonging to the positive samples in the first sample number, that is, the number of correctly predicted positive samples.

步骤S032：根据所述第一样本数与所述正例样本和所述反例样本的样本总数确定所述各NLU模型的第一优先级指标。Step S032: determining a first priority index of each NLU model according to the first number of samples and the total number of samples of the positive example samples and the negative example samples.

在具体实现中，上述调整设备可以计算各NLU模型正确预测的第一样本数占正例样本和反例样本的样本总数的比例，并将计算出来的比例作为各NLU模型的预测准确率，该预测准确率即可为上述第一优先级指标。In a specific implementation, the adjustment device can calculate, for each NLU model, the ratio of the first sample count (correctly predicted samples) to the total number of positive and negative samples, and use this ratio as the model's prediction accuracy, which serves as the first priority index.

步骤S033：根据所述第二样本数与所述正例样本的数目确定所述各NLU模型的第二优先级指标。Step S033: Determine the second priority index of each NLU model according to the second sample number and the number of positive samples.

在具体实现中，上述调整设备可以计算准确预测正例样本的第二样本数占正例样本的数目的比例，并将计算出来的比例作为各NLU模型的预测召回率，该预测召回率即可为上述第二优先级指标。In a specific implementation, the adjustment device can calculate the ratio of the second sample count (correctly predicted positive samples) to the number of positive samples, and use this ratio as the recall of each NLU model, which serves as the second priority index.
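Steps S031 to S033 can be sketched for a single NLU model as follows. The sketch assumes binary ground-truth labels (`True` for a positive sample) and binary predictions (`True` if the model predicts the sample as positive), and follows the counting in the specific implementation above: the first sample count is correctly predicted positives plus correctly predicted negatives, and the second is correctly predicted positives. Under this counting the first index is an accuracy over all samples; the conventional precision would instead be TP / (TP + FP). The function name `priority_indices` is hypothetical.

```python
from typing import Sequence, Tuple


def priority_indices(labels: Sequence[bool], predictions: Sequence[bool]) -> Tuple[float, float]:
    """Return (first_index, second_index) for one NLU model.

    labels[i] is True for a positive sample and False for a negative sample;
    predictions[i] is True if the model predicted sample i as positive.
    """
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and of equal length")
    true_pos = sum(1 for y, p in zip(labels, predictions) if y and p)
    true_neg = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    num_pos = sum(1 for y in labels if y)

    first_count = true_pos + true_neg  # correctly predicted samples (S031)
    second_count = true_pos            # correctly predicted positive samples (S031)

    first_index = first_count / len(labels)                     # S032
    second_index = second_count / num_pos if num_pos else 0.0   # recall, S033
    return first_index, second_index
```

For example, five samples with labels `[True, True, True, False, False]` and predictions `[True, True, False, False, True]` give a first index of 3/5 = 0.6 and a second index of 2/3.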

步骤S04：通过所述优先级指标确定所述各NLU模型的优先级。Step S04: Determine the priority of each NLU model according to the priority index.

在具体实现中，上述调整设备可以结合上述第一优先级指标和上述第二优先级指标，即综合各NLU模型的预测准确率和预测召回率来为各NLU模型分配不同的优先级。In a specific implementation, the adjustment device may combine the first priority index and the second priority index, that is, comprehensively consider the prediction accuracy and prediction recall of each NLU model to assign different priorities to each NLU model.

进一步地，本实施例中，所述步骤S04的步骤，包括：Furthermore, in this embodiment, step S04 includes:

步骤S041：根据所述第一优先级指标和所述第二优先级指标确定所述各NLU模型的调和平均值。Step S041: Determine the harmonic mean of each NLU model according to the first priority index and the second priority index.

在具体实现中，上述调整设备可以通过预设调和平均值公式结合第一优先级指标和第二优先级指标计算各NLU模型的调和平均值。其中，预设调和平均值公式为：In a specific implementation, the adjustment device can calculate the harmonic mean of each NLU model from the first priority index and the second priority index using a preset harmonic mean formula:

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

式中，F1为调和平均值，precision为第一优先级指标，recall为第二优先级指标。In the formula, F1 is the harmonic mean, precision is the first priority index, and recall is the second priority index.

步骤S042：基于所述调和平均值的取值范围确定所述各NLU模型的优先级。Step S042: Determine the priority of each NLU model based on the value range of the harmonic mean.

在具体实现中，上述调和平均值同时兼顾了各NLU模型的第一优先级指标和第二优先级指标，是第一优先级指标和第二优先级指标的调和平均数，且调和平均值的取值范围在0到1之间。上述调整设备在确定各NLU模型的调和平均值后，可以为调和平均值靠近1的NLU模型分配更高的优先级指标，为调和平均值靠近0的NLU模型分配较低的优先级指标，从而有效提高了各NLU模型优先级的分配精度。In a specific implementation, the harmonic mean is taken over the first and second priority indices of each NLU model, so it accounts for both, and its value lies between 0 and 1. After determining the harmonic mean of each NLU model, the adjustment device can assign a higher priority to NLU models whose harmonic mean is close to 1 and a lower priority to NLU models whose harmonic mean is close to 0, which improves the accuracy of priority allocation among the NLU models.
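Steps S041 and S042 might be implemented along these lines. The model names and index values are hypothetical, and sorting by the harmonic mean is one straightforward way to turn "closer to 1 means higher priority" into an ordered priority list.

```python
from typing import Dict, List, Tuple


def harmonic_mean(precision: float, recall: float) -> float:
    """F1 = 2 * precision * recall / (precision + recall); defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rank_models(indices: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return model names ordered from highest to lowest priority (S042).

    `indices` maps a model name to (first_index, second_index);
    a harmonic mean closer to 1 yields a higher priority.
    """
    scores = {name: harmonic_mean(p, r) for name, (p, r) in indices.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high priority on one index alone: a hypothetical `nlu_a` with (0.9, 0.6) scores 0.72 and ranks below `nlu_b` with (0.7, 0.8), which scores about 0.747.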

本实施例通过获取语音数据样本；调用所述预设规则中配置的各NLU模型识别所述语音数据样本，获得所述各NLU模型的识别结果；根据所述识别结果确定所述各NLU模型的优先级指标；通过所述优先级指标确定所述各NLU模型的优先级。本实施例通过各NLU模型预测正例样本和反例样本，根据正例样本和反例样本的预测情况确定第一优先级指标和第二优先级指标，然后通过调和平均值综合第一优先级指标和第二优先级指标，能够更为准确地为各NLU模型分配不同的优先级，有效提高了NLU模型优先级分配的精度。This embodiment obtains a speech data sample; calls each NLU model configured in the preset rule to identify the speech data sample, and obtains the recognition result of each NLU model; determines the priority index of each NLU model according to the recognition result; and determines the priority of each NLU model according to the priority index. This embodiment predicts positive samples and negative samples through each NLU model, determines the first priority index and the second priority index according to the prediction of the positive samples and the negative samples, and then integrates the first priority index and the second priority index through the harmonic mean, so as to more accurately assign different priorities to each NLU model, and effectively improve the accuracy of the priority assignment of the NLU model.

参考图4，图4为本发明语音识别方法第三实施例的流程示意图。Refer to FIG. 4, which is a flow chart of the third embodiment of the speech recognition method according to the present invention.

基于上述各实施例，在本实施例中，所述步骤S30包括：Based on the above embodiments, in this embodiment, step S30 includes:

步骤S301：判断是否存在与所述当前语音信息存在关联的历史语音信息。Step S301: Determine whether there is any historical voice information associated with the current voice information.

需要说明的是，历史语音信息可为用户在当前输入语音信息之前所输入的语音。例如，用户输入语音“查询某某区域是否存在A餐馆”，车载智能助手回复“存在”，用户又输入语音“导航至A餐馆”，其中，“导航至A餐馆”这一语音即可作为当前语音信息，“查询某某区域是否存在A餐馆”这部分语音信息即可作为与当前语音信息存在关联的历史语音信息。It should be noted that historical voice information can be the voice information input by the user before the current voice information input. For example, the user inputs the voice "check whether there is restaurant A in a certain area", and the in-vehicle intelligent assistant replies "yes", and the user inputs the voice "navigate to restaurant A", in which the voice "navigate to restaurant A" can be regarded as the current voice information, and the voice information "check whether there is restaurant A in a certain area" can be regarded as the historical voice information associated with the current voice information.

在具体实现中，上述调整设备可在接收到用户输入的当前语音信息后，检测在接收当前语音信息之前，用户是否输入了与当前语音信息存在关联历史语音信息。In a specific implementation, after receiving the current voice information input by the user, the adjustment device may detect whether the user has input historical voice information associated with the current voice information before receiving the current voice information.

步骤S302：若存在，则根据所述历史语音信息所属的意图类别预测所述当前语音信息属于各预设意图类别的概率。Step S302: If yes, predict the probability that the current voice information belongs to each preset intention category according to the intention category to which the historical voice information belongs.

需要说明的是，上述预设意图类别可为预先构建的多个类别，并且各意图类别与不同关键词所绑定。It should be noted that the above-mentioned preset intent categories can be multiple pre-constructed categories, and each intent category is bound to different keywords.

在具体实现中，上述调整设备在检测到存在与当前语音信息存在关联的历史语音信息时，可以将历史语音信息转换为文本，然后提取历史语音信息文本中的关键词，并根据各关键词从预设意图类别中确定历史语音信息所属的多个意图类别。之后，调整设备可以将当前语音信息转换为文本，并提取当前语音信息文本的关键词，确定当前语音信息文本的关键词属于历史语音信息所属的多个意图类别的概率，将属于历史语音信息所属的多个意图类别的概率作为属于各预设意图类别的概率。In a specific implementation, when the adjustment device detects historical voice information associated with the current voice information, it can convert the historical voice information into text, extract keywords from that text, and, based on these keywords, determine from the preset intent categories the intent categories to which the historical voice information belongs. The adjustment device can then convert the current voice information into text, extract keywords from it, determine the probability that these keywords belong to each of the intent categories of the historical voice information, and use these probabilities as the probabilities that the current voice information belongs to the respective preset intent categories.

步骤S303：将所述各预设意图类别中概率最大的目标意图类别作为用户意图。Step S303: taking the target intent category with the highest probability among the preset intent categories as the user intent.

在具体实现中，上述调整设备可以将所属的多个意图类别的概率中概率最大的目标意图类别直接作为当前语音信息对应的用户意图。In a specific implementation, the above-mentioned adjustment device can directly use the target intention category with the highest probability among the probabilities of the multiple intention categories as the user intention corresponding to the current voice information.

应理解的是，由于用户前后输入的语音一般存在较强的关联性，因此，可以通过历史语音信息所述的用户意图来预测当前语音信息所述的用户意图，以提高用户意图的确定效率。It should be understood that, because consecutive utterances from the same user are generally strongly correlated, the user intent of the historical voice information can be used to predict the user intent of the current voice information, which improves the efficiency of determining the user intent.
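The history-based branch (steps S301 to S303) can be illustrated with a toy keyword lexicon. Everything here is an assumption for illustration: the `KEYWORD_CATEGORIES` mapping and its category names, the English keywords standing in for already-extracted keywords of segmented Chinese text, and the add-one smoothing that keeps every intent category of the historical voice information in play even when the current utterance contains none of its keywords.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set

# Hypothetical lexicon binding keywords to preset intent categories.
KEYWORD_CATEGORIES: Dict[str, Set[str]] = {
    "check": {"poi_search"},
    "restaurant": {"poi_search", "navigation"},
    "navigate": {"navigation"},
    "route": {"navigation"},
}


def categories_of(keywords: Iterable[str]) -> Set[str]:
    """Collect the preset intent categories bound to any of the keywords."""
    found: Set[str] = set()
    for kw in keywords:
        found |= KEYWORD_CATEGORIES.get(kw, set())
    return found


def predict_from_history(history_kws: Iterable[str], current_kws: Iterable[str]) -> Dict[str, float]:
    """Probability that the current utterance belongs to each category of the history (S302)."""
    candidates = categories_of(history_kws)
    # Add-one smoothing (assumption); sorted() makes tie-breaking deterministic.
    counts = Counter({c: 1 for c in sorted(candidates)})
    for kw in current_kws:
        for c in KEYWORD_CATEGORIES.get(kw, set()) & candidates:
            counts[c] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def intent_from_history(history_kws: Iterable[str], current_kws: Iterable[str]) -> Optional[str]:
    """Take the category with the highest probability as the user intent (S303)."""
    probs = predict_from_history(history_kws, current_kws)
    return max(probs, key=probs.get) if probs else None
```

With the restaurant example above, history keywords `["check", "restaurant"]` and current keywords `["navigate", "restaurant"]` yield probabilities of 0.4 for `poi_search` and 0.6 for `navigation`, so `navigation` is chosen.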

进一步地，本实施例中，所述步骤S301之后，还包括：Furthermore, in this embodiment, after step S301, the method further includes:

步骤S302'：若不存在，则对所述当前语音信息进行文本识别，获得文本信息。Step S302': If not, perform text recognition on the current voice information to obtain text information.

在具体实现中，上述调整设备在检测到不存在与当前语音信息存在关联的历史语音信息时，可以对当前语音信息进行文本识别，获得当前语音信息的文本信息。In a specific implementation, when the adjustment device detects that there is no historical voice information associated with the current voice information, it can perform text recognition on the current voice information to obtain text information of the current voice information.

步骤S303'：识别出所述文本信息中的多个领域关键词和多个动作关键词。Step S303': identifying a plurality of domain keywords and a plurality of action keywords in the text information.

需要说明的是，上述领域关键词可为与领域相关的关键词。相应地，上述动作关键词可为与动作相关的关键词。例如，当前语音信息的文本信息为“我想去看新出的电影XXX，帮我搜索附近有小吃的影院，看完电影顺便吃东西”，该文本信息中涉及的领域为影视领域和美食领域，领域关键词为“电影XXX”和“小吃”，涉及的动作关键词为“看”和“吃”。It should be noted that domain keywords are keywords related to a domain, and action keywords are keywords related to an action. For example, if the text of the current voice information is "I want to watch the new movie XXX, help me search for a theater with snacks nearby, and eat after watching the movie", the domains involved are the film and television domain and the food domain, the domain keywords are "movie XXX" and "snacks", and the action keywords are "watch" and "eat".

在具体实现中，上述调整设备可以识别出上述文本信息中涉及的全部领域关键词和动作关键词。In a specific implementation, the adjustment device can identify all domain keywords and action keywords contained in the text information.

步骤S304'：基于所述多个领域关键词和所述多个动作关键词确定所述文本信息的多个意图信息。Step S304': determining multiple pieces of intent information of the text information based on the multiple domain keywords and the multiple action keywords.

在具体实现中，上述调整设备可以依次从预设意图类别中确定各领域关键词的相关的意图作为领域关键词的意图信息，并依次从预设意图类别中确定各动作关键词相关的意图作为动作关键词的意图信息。In a specific implementation, the adjustment device can determine from the preset intent categories, for each domain keyword in turn, its related intents as the intent information of that domain keyword, and likewise, for each action keyword in turn, its related intents as the intent information of that action keyword.
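Steps S303' and S304' can be sketched with hypothetical keyword dictionaries. The lexicons, the intent labels such as `film.search`, and the regex tokenizer are illustrative assumptions only; real Chinese input would first need word segmentation, and a production system would use its own keyword dictionaries or entity recognizer.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicons mapping keywords to related preset intent categories.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "movie": ["film.search"],
    "snacks": ["food.search"],
}
ACTION_KEYWORDS: Dict[str, List[str]] = {
    "watch": ["film.search", "video.play"],
    "eat": ["food.search"],
}


def extract_keywords(text: str) -> Tuple[List[str], List[str]]:
    """Return (domain_keywords, action_keywords) in order of first appearance (S303')."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # dict.fromkeys removes duplicates while preserving order.
    domain = list(dict.fromkeys(t for t in tokens if t in DOMAIN_KEYWORDS))
    action = list(dict.fromkeys(t for t in tokens if t in ACTION_KEYWORDS))
    return domain, action


def candidate_intents(text: str) -> Dict[str, List[str]]:
    """Map each extracted keyword to its related preset intent categories (S304')."""
    domain, action = extract_keywords(text)
    intents = {kw: DOMAIN_KEYWORDS[kw] for kw in domain}
    intents.update({kw: ACTION_KEYWORDS[kw] for kw in action})
    return intents
```

Applied to the English rendering of the movie example, this extracts `movie` and `snacks` as domain keywords and `watch` and `eat` as action keywords; the exact-match tokenizer deliberately ignores inflected forms such as "watching".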

步骤S305'：确定所述多个意图信息的概率得分，并将概率得分最高的意图信息作为用户意图。Step S305': determining the probability scores of the multiple pieces of intent information, and taking the intent information with the highest probability score as the user intent.

在具体实现中，上述调整设备可以使用预先通过机器学习算法(如逻辑回归、支持向量机、神经网络等)训练的分类器预测各意图信息的概率得分，即各领域关键词和各动作关键词落入各自相关意图的概率得分，然后将概率得分最高的领域关键词的意图信息和动作关键词的意图得分进行结合，并将结合后的意图作为用户意图。In a specific implementation, the adjustment device can use a classifier pre-trained with a machine learning algorithm (such as logistic regression, a support vector machine, or a neural network) to predict the probability score of each piece of intent information, i.e., the probability that each domain keyword and each action keyword falls into its related intents. It then combines the highest-scoring intent information of the domain keywords with that of the action keywords and takes the combined intent as the user intent.

应理解的是，上述分类器可以使用预先构建的数据集对机器学习模型进行训练获得。该数据集包括文本样本和各自所对应的标签或意图。每个文本样本应包括关键词信息，以及它们属于的意图标签。上述机器学习模型可以为选择适用于分类的分类器模型，如朴素贝叶斯、支持向量机、决策树、深度学习模型等。It should be understood that the classifier can be obtained by training a machine learning model on a pre-constructed data set. The data set contains text samples and their corresponding labels or intents; each text sample should include keyword information and the intent labels to which the keywords belong. The machine learning model can be any model suitable for classification, such as naive Bayes, a support vector machine, a decision tree, or a deep learning model.
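Since naive Bayes is one of the classifier types named above, here is a minimal pure-Python multinomial naive Bayes with Laplace smoothing that scores intents for individual keywords, followed by one possible reading of the combination step in S305': take the highest-scoring intent among the domain keywords and the highest-scoring intent among the action keywords. The class and function names, the training samples, and the intent labels are hypothetical; a real system would more likely use an established library and a much larger labeled data set.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple


class KeywordNaiveBayes:
    """Multinomial naive Bayes over keyword lists with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()                        # samples per intent
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)   # keyword counts per intent
        self.vocab: Set[str] = set()

    def fit(self, samples: Sequence[Tuple[List[str], str]]) -> "KeywordNaiveBayes":
        for keywords, intent in samples:
            self.class_counts[intent] += 1
            self.word_counts[intent].update(keywords)
            self.vocab.update(keywords)
        return self

    def predict_proba(self, keywords: Sequence[str]) -> Dict[str, float]:
        """Return normalized posterior probabilities P(intent | keywords)."""
        n_samples = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores: Dict[str, float] = {}
        for intent, n_intent in self.class_counts.items():
            counts = self.word_counts[intent]
            denom = sum(counts.values()) + self.alpha * v
            score = math.log(n_intent / n_samples)  # log prior
            for kw in keywords:
                score += math.log((counts[kw] + self.alpha) / denom)  # log likelihood
            log_scores[intent] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        exps = {k: math.exp(s - top) for k, s in log_scores.items()}
        z = sum(exps.values())
        return {k: e / z for k, e in exps.items()}


def best_intent(model: KeywordNaiveBayes, keywords: Sequence[str]) -> Optional[str]:
    """Highest-probability intent over the given keywords, each scored on its own."""
    best, best_p = None, -1.0
    for kw in keywords:
        probs = model.predict_proba([kw])
        intent = max(probs, key=probs.get)
        if probs[intent] > best_p:
            best, best_p = intent, probs[intent]
    return best


def combine_intents(
    model: KeywordNaiveBayes, domain_kws: Sequence[str], action_kws: Sequence[str]
) -> Tuple[Optional[str], Optional[str]]:
    """One reading of S305': pair the best domain-keyword intent with the best action-keyword intent."""
    return best_intent(model, domain_kws), best_intent(model, action_kws)
```

Trained on the four hypothetical samples `(["movie", "watch"], "film.search")`, `(["movie", "cinema"], "film.search")`, `(["snacks", "eat"], "food.search")` and `(["restaurant", "eat"], "food.search")`, the keyword `movie` maps to `film.search` with probability 0.75 and `eat` to `food.search` with probability 0.75, so the movie example combines to `("film.search", "food.search")`.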

本实施例通过判断是否存在所述当前语音信息的上一语音信息；若存在，则根据所述上一语音信息所属的意图类别预测所述当前语音信息属于各预设意图类别的概率；将所述各预设意图类别中概率最大的目标意图类别作为用户意图。若不存在，则对所述当前语音信息进行文本识别，获得文本信息；识别出所述文本信息中的多个领域关键词和多个动作关键词；基于所述多个领域关键词和所述多个动作关键词确定所述文本信息的多个意图信息；确定所述多个意图信息的概率得分，并将概率得分最高的意图信息作为用户意图。本实施例通过在存在上一语音信息时，使用上一语音信息的意图确定当前语音信息对应的用户意图，在不存在上一语音信息时，使用领域关键词和动作关键词确定当前语音信息的用户意图，有效提高了用户意图确定的灵活性和精度，进而提高了NLU仲裁规则调整的精度。This embodiment determines whether there is a previous voice information of the current voice information; if so, predicts the probability that the current voice information belongs to each preset intent category according to the intent category to which the previous voice information belongs; and takes the target intent category with the highest probability among the preset intent categories as the user intent. If not, text recognition is performed on the current voice information to obtain text information; multiple domain keywords and multiple action keywords in the text information are identified; multiple intent information of the text information is determined based on the multiple domain keywords and the multiple action keywords; the probability scores of the multiple intent information are determined, and the intent information with the highest probability score is taken as the user intent. This embodiment effectively improves the flexibility and accuracy of user intent determination by using the intent of the previous voice information to determine the user intent corresponding to the current voice information when the previous voice information exists, and by using domain keywords and action keywords to determine the user intent of the current voice information when the previous voice information does not exist, thereby improving the accuracy of NLU arbitration rule adjustment.

此外，本发明实施例还提出一种存储介质，所述存储介质上存储有语音识别程序，所述语音识别程序被处理器执行时实现如上文所述的语音识别方法的步骤。In addition, an embodiment of the present invention further provides a storage medium, on which a speech recognition program is stored. When the speech recognition program is executed by a processor, the steps of the speech recognition method described above are implemented.

参照图5，图5为本发明语音识别装置第一实施例的结构框图。Refer to FIG. 5, which is a structural block diagram of a first embodiment of the speech recognition device according to the present invention.

如图5所示，本发明实施例提出的语音识别装置包括：As shown in FIG. 5, the speech recognition device provided in the embodiment of the present invention includes:

模型确定模块501，用于在接收到当前语音信息时，基于预设规则从多个NLU模型中确定目标NLU模型。The model determination module 501 is used to determine a target NLU model from multiple NLU models based on preset rules when receiving current voice information.

语音识别模块502，用于通过所述目标NLU模型对所述当前语音信息进行识别，获得识别结果。The speech recognition module 502 is used to recognize the current speech information through the target NLU model to obtain a recognition result.

意图确定模块503，用于根据所述当前语音信息确定用户意图。The intention determination module 503 is used to determine the user intention according to the current voice information.

结果匹配模块504，用于判断所述识别结果是否与所述用户意图匹配。The result matching module 504 is used to determine whether the recognition result matches the user intention.

规则调整模块505，用于若否，则对所述预设规则进行调整，直至获得的识别结果与所述用户意图匹配。The rule adjustment module 505 is used to adjust the preset rule if the recognition result does not match the user intention, until the obtained recognition result matches the user intention.

作为一种实施方式，所述语音识别模块502，还用于提取所述当前语音信息的语义信息；基于所述语义信息对所述当前语音信息进行语义拒识判断；若所述当前语音信息未被判定为语义拒识，则通过所述目标NLU模型对所述当前语音信息进行识别，获得识别结果。As an implementation mode, the speech recognition module 502 is also used to extract semantic information of the current speech information; perform semantic rejection judgment on the current speech information based on the semantic information; if the current speech information is not judged as semantic rejection, recognize the current speech information through the target NLU model to obtain a recognition result.

基于本发明上述语音识别装置第一实施例，提出本发明语音识别装置的第二实施例。Based on the above-mentioned first embodiment of the speech recognition device of the present invention, a second embodiment of the speech recognition device of the present invention is proposed.

在本实施例中，所述模型确定模块501，还用于获取语音数据样本；调用所述预设规则中配置的各NLU模型识别所述语音数据样本，获得所述各NLU模型的识别结果；根据所述识别结果确定所述各NLU模型的优先级指标，并通过所述优先级指标确定所述各NLU模型的优先级。In this embodiment, the model determination module 501 is also used to obtain speech data samples; call each NLU model configured in the preset rule to identify the speech data sample, and obtain the recognition result of each NLU model; determine the priority index of each NLU model according to the recognition result, and determine the priority of each NLU model through the priority index.

所述规则调整模块505，还用于若否，则对所述预设规则进行调整，将所述预设规则中当前选择的NLU模型切换为下一优先级的NLU模型，直至获得的识别结果与所述用户意图匹配。The rule adjustment module 505 is further used to, if the recognition result does not match the user intention, adjust the preset rule by switching the currently selected NLU model in the preset rule to the NLU model of the next priority, until the obtained recognition result matches the user intention.

作为一种实施方式，所述语音数据样本包括正例样本和反例样本，所述优先级指标包括第一优先级指标和第二优先级指标；所述模型确定模块501，还用于从所述识别结果中确定所述各NLU模型预测所述正例样本和所述反例样本为正例的第一样本数，并确定所述第一样本数中属于所述正例样本的第二样本数；根据所述第一样本数与所述正例样本和所述反例样本的样本总数确定所述各NLU模型的第一优先级指标；根据所述第二样本数与所述正例样本的数目确定所述各NLU模型的第二优先级指标。As an implementation, the speech data samples include positive samples and negative samples, and the priority indices include a first priority index and a second priority index; the model determination module 501 is further used to determine, from the recognition results, a first sample count for each NLU model, namely the number of positive and negative samples that the model predicts as positive, and a second sample count, namely the number of samples within the first sample count that are positive samples; to determine the first priority index of each NLU model according to the first sample count and the total number of positive and negative samples; and to determine the second priority index of each NLU model according to the second sample count and the number of positive samples.

作为一种实施方式，所述模型确定模块501，还用于根据所述第一优先级指标和所述第二优先级指标确定所述各NLU模型的调和平均值；基于所述调和平均值的取值范围确定所述各NLU模型的优先级。As an implementation, the model determination module 501 is further used to determine the harmonic mean of each NLU model according to the first priority index and the second priority index, and to determine the priority of each NLU model based on the value range of the harmonic mean.

基于本发明上述语音识别装置各实施例，提出本发明语音识别装置的第三实施例。Based on the above-mentioned embodiments of the speech recognition device of the present invention, a third embodiment of the speech recognition device of the present invention is proposed.

在本实施例中，所述意图确定模块503，还用于判断是否存在与所述当前语音信息存在关联的历史语音信息；若存在，则根据所述历史语音信息所属的意图类别预测所述当前语音信息属于各预设意图类别的概率；将所述各预设意图类别中概率最大的目标意图类别作为用户意图。In this embodiment, the intention determination module 503 is also used to determine whether there is historical voice information associated with the current voice information; if so, predict the probability that the current voice information belongs to each preset intention category based on the intention category to which the historical voice information belongs; and take the target intention category with the highest probability among the preset intention categories as the user intention.

作为一种实施方式，所述意图确定模块503，还用于若不存在，则对所述当前语音信息进行文本识别，获得文本信息；识别出所述文本信息中的多个领域关键词和多个动作关键词；基于所述多个领域关键词和所述多个动作关键词确定所述文本信息的多个意图信息；确定所述多个意图信息的概率得分，并将概率得分最高的意图信息作为用户意图。As an implementation mode, the intention determination module 503 is also used to perform text recognition on the current voice information to obtain text information if it does not exist; identify multiple domain keywords and multiple action keywords in the text information; determine multiple intention information of the text information based on the multiple domain keywords and the multiple action keywords; determine the probability scores of the multiple intention information, and use the intention information with the highest probability score as the user intention.

本发明语音识别装置的其他实施例或具体实现方式可参照上述各方法实施例，此处不再赘述。Other embodiments or specific implementations of the speech recognition device of the present invention can refer to the above-mentioned method embodiments, which will not be described in detail here.

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It should be noted that, in this article, the terms "include", "comprises" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or system. In the absence of further restrictions, an element defined by the sentence "comprises a ..." does not exclude the existence of other identical elements in the process, method, article or system including the element.

上述本发明实施例序号仅仅为了描述，不代表实施例的优劣。The serial numbers of the above embodiments of the present invention are only for description and do not represent the advantages or disadvantages of the embodiments.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质(如只读存储器/随机存取存储器、磁碟、光盘)中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，或者网络设备等)执行本发明各个实施例所述的方法。Through the description of the above implementation methods, those skilled in the art can clearly understand that the above-mentioned embodiment methods can be implemented by means of software plus a necessary general hardware platform, and of course by hardware, but in many cases the former is a better implementation method. Based on such an understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as a read-only memory/random access memory, a magnetic disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods described in each embodiment of the present invention.

以上仅为本发明的优选实施例，并非因此限制本发明的专利范围，凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本发明的专利保护范围内。The above are only preferred embodiments of the present invention, and are not intended to limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the present invention specification and drawings, or directly or indirectly applied in other related technical fields, are also included in the patent protection scope of the present invention.