CN112562684B - Voice recognition method and device and electronic equipment - Google Patents

Voice recognition method and device and electronic equipment

Publication number
CN112562684B
Authority
CN
China
Prior art keywords
text field
preset
target
word
word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011425798.6A
Other languages
Chinese (zh)
Other versions
CN112562684A (en)
Inventor
李倩倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202011425798.6A
Publication of CN112562684A
Application granted
Publication of CN112562684B
Legal status: Active
Anticipated expiration


Abstract

The application discloses a speech recognition method, a speech recognition apparatus and an electronic device, and belongs to the technical field of speech recognition. The method comprises: converting received audio data into a first text field; detecting each first word segment of a preset type in the first text field when the first text field meets a first preset condition; and processing a target word segment in the first text field to generate a target text field, wherein the processing of the target word segment comprises at least one of the following: deleting the target word segment, or replacing the single sentence to which the target word segment belongs with a target character string. In the embodiments of the application, deleting or replacing the first word segments of the preset type in the first text field clarifies the user's intention, allows the rewriting to be completed quickly, and effectively improves the execution result of speech recognition.

Description

Translated from Chinese
Speech recognition method, apparatus and electronic device

Technical field

This application belongs to the technical field of speech recognition, and specifically relates to a speech recognition method, apparatus and electronic device.

Background

Currently, human-computer interaction devices use automatic speech recognition (ASR) technology to analyze and understand user instructions and then carry out the corresponding operations, which greatly improves the convenience of human-computer interaction.

However, when sound pickup is abnormal, for example because of ambient noise, existing automatic speech recognition technology tends to distort the user's intention because the recognized on-screen text contains a large amount of redundant information. As a result, the execution result falls short of the user's expectation and the user experience suffers.

Summary of the invention

The purpose of the embodiments of the present application is to provide a speech recognition method that solves the problem that existing speech recognition technology tends to distort the user's intention when sound pickup is abnormal, so that the execution result falls short of the user's expectation.

To solve the above technical problem, the present application is implemented as follows:

In a first aspect, an embodiment of the present application provides a speech recognition method, the method including:

converting received audio data into a first text field;

when the first text field meets a first preset condition, detecting each first word segment of a preset type in the first text field; wherein the first preset condition includes at least one of the following: the total number of characters is greater than a preset character count threshold and the first text field contains a preset core word; the total number of characters is greater than the preset character count threshold and a preset utterance library contains a second text field whose similarity to the first text field is greater than a preset similarity threshold;

processing a target word segment in the first text field to generate a target text field; wherein the processing of the target word segment includes at least one of the following: deleting the target word segment, or replacing the single sentence to which the target word segment belongs with a target character string.

In a second aspect, an embodiment of the present application provides a speech recognition apparatus, the apparatus including:

a conversion module, configured to convert received audio data into a first text field;

a detection module, configured to detect each first word segment of a preset type in the first text field when the first text field meets a first preset condition; wherein the first preset condition includes at least one of the following: the total number of characters is greater than a preset character count threshold and the first text field contains a preset core word; the total number of characters is greater than the preset character count threshold and a preset utterance library contains a second text field whose similarity to the first text field is greater than a preset similarity threshold;

a processing module, configured to process a target word segment in the first text field to generate a target text field; wherein the processing of the target word segment includes at least one of the following: deleting the target word segment, or replacing the single sentence to which the target word segment belongs with a target character string.

In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor, a memory, and a program or instructions stored in the memory and executable on the processor; when executed by the processor, the program or instructions implement the steps of the method described in the first aspect.

In a fourth aspect, an embodiment of the present application provides a readable storage medium storing a program or instructions that, when executed by a processor, implement the steps of the method described in the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement the method described in the first aspect.

In the embodiments of the present application, speech recognition proceeds as follows: the received audio data is first converted into a first text field; then, when the first text field meets the first preset condition, that is, when it is judged to be an abnormal utterance, each first word segment of a preset type in the first text field is detected; finally, the target word segment in the first text field is processed to generate a target text field. Deleting or replacing the first word segments of the preset type in a first text field that has been judged abnormal clarifies the user's intention, allows the rewriting to be completed quickly, and effectively improves the execution result of speech recognition.

Brief description of the drawings

Figure 1 is a flow chart of the steps of the speech recognition method provided by an embodiment of the present application;

Figure 2 is a schematic diagram of the display of the first text field in an embodiment of the present application;

Figure 3 is a schematic diagram of the trimming operation on redundant words in an embodiment of the present application;

Figure 4 is a schematic diagram of the result after redundant words have been trimmed in an embodiment of the present application;

Figure 5 is a schematic diagram of the display of the first string recommendation list in an embodiment of the present application;

Figure 6 is a schematic diagram of the display after the first target string is tapped in an embodiment of the present application;

Figure 7 is a schematic diagram of the display of the first string recommendation list in an embodiment of the present application;

Figure 8 is a schematic diagram of the display after the second target string is tapped in an embodiment of the present application;

Figure 9 is one execution flow chart of the speech recognition method provided by an embodiment of the present application;

Figure 10 is a schematic diagram of the redundant word determination process provided by an embodiment of the present application;

Figure 11 is another execution flow chart of the speech recognition method provided by an embodiment of the present application;

Figure 12 is a schematic diagram of the generation process of the second string recommendation list provided by an embodiment of the present application;

Figure 13 is a schematic structural diagram of the speech recognition apparatus provided by an embodiment of the present application;

Figure 14 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.

The terms "first", "second" and so on in the description and claims of this application distinguish similar objects and do not describe a specific order or sequence. Data so described are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. In addition, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.

The speech recognition method provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings, through specific embodiments and their application scenarios.

Figure 1 shows a flow chart of the steps of a speech recognition method provided by an embodiment of the present application. The method may include steps S100 to S300.

In the embodiments of the present application, the speech recognition method is applied to a terminal device that can receive speech and display text, such as a mobile phone, a tablet computer, a television, an in-vehicle electronic device, a wearable device, a voice assistant, a smart speaker with a screen, or another human-computer interaction device.

Step S100: convert the received audio data into a first text field.

In step S100, speech recognition technology is used to recognize the received audio data and convert it into the corresponding text field, namely the first text field, for subsequent display, semantic analysis, word segmentation and so on.

Because the received audio data may contain ambient noise as well as the user's speech, the ASR service recognizes both when it processes the audio data, so the first text field contains text corresponding to both the user's speech and the noise. For example, the user's original speech is "天气咋样" ("How is the weather?"), but because of interference from ambient noise the resulting first text field may be "走哈哈啊别忘了啊呀闹钟,你你你天气咋样" ("Let's go haha, don't forget the alarm clock, you you you how is the weather?").

Step S200: when the first text field meets a first preset condition, detect each first word segment of a preset type in the first text field; wherein the first preset condition includes at least one of the following: the total number of characters is greater than a preset character count threshold and the first text field contains a preset core word; the total number of characters is greater than the preset character count threshold and a preset utterance library contains a second text field whose similarity to the first text field is greater than a preset similarity threshold.

In step S200, the first preset condition determines whether the first text field is an abnormal utterance that does not conform to conventional expression habits. It consists of at least one of two conditions: (1) the total number of characters is greater than the preset character count threshold and the text field contains a preset core word; (2) the total number of characters is greater than the preset character count threshold and the preset utterance library contains a second text field whose similarity to the first text field is greater than the preset similarity threshold.

In step S200, the preset core words are the words in a core word set built by analyzing current logs, based on term frequency-inverse document frequency (TF-IDF) and part of speech. For example, suppose a word is treated as a core word when its TF-IDF score exceeds a threshold of 0.03 and its usual part of speech is noun or verb. In a voice assistant service, suppose the full log for the day contains 10,000 utterances, which after word segmentation and part-of-speech tagging yield 60,000 words in total, and the word "天气" ("weather") appears 3,000 times across 1,000 of those utterances. The TF-IDF score of "天气" is then (3000/60000) × log(10000/(1000+1)) ≈ 0.05; the stated value corresponds to a base-10 logarithm. This score is greater than 0.03 and the usual part of speech of "天气" is noun, so "天气" is determined to be a core word.
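The arithmetic in the TF-IDF example can be checked with a few lines of Python. This is a minimal sketch, not the patent's implementation: the function names `tfidf_score` and `is_core_word` are hypothetical, and the base-10 logarithm is an assumption inferred from the stated result of 0.05.

```python
import math

def tfidf_score(term_count, total_terms, total_docs, docs_with_term):
    """TF-IDF as in the worked example: TF * log10(N / (df + 1)).

    The base-10 logarithm is an assumption that reproduces the stated 0.05.
    """
    tf = term_count / total_terms
    idf = math.log10(total_docs / (docs_with_term + 1))
    return tf * idf

def is_core_word(score, part_of_speech, threshold=0.03):
    """A word qualifies when its score exceeds the threshold and it is a noun or verb."""
    return score > threshold and part_of_speech in {"noun", "verb"}

# Figures from the example: 3,000 occurrences among 60,000 words,
# found in 1,000 of 10,000 utterances.
score = tfidf_score(term_count=3000, total_terms=60000,
                    total_docs=10000, docs_with_term=1000)
print(round(score, 4))              # 0.05
print(is_core_word(score, "noun"))  # True
```

With a natural logarithm the same inputs would give roughly 0.115, which also clears the 0.03 threshold, so the classification of "天气" does not depend on the choice of base.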

The preset character count threshold is the length condition that determines whether the current text field is likely to produce an execution result that deviates from the user's expectation. When a text field has few characters, the recognition result is less affected by noise and the execution result deviates little from what the user expects. Once the number of characters reaches a certain threshold, the recognition result is more likely to have been affected by noise, and the execution of speech recognition is more likely to deviate from the user's expectation; hence the preset character count threshold. Optionally, the preset character count threshold is 10.

In the embodiments of the present application, when the total number of characters in a text field is greater than the preset character count threshold, the recognition result is more likely to have been affected by noise and the execution is more likely to deviate from the user's expectation. However, if the text field contains no preset core word, that is, no action instruction or execution object, no corresponding action can be executed for it, so there is no need to rewrite or correct it. The first text field is therefore determined to be an abnormal utterance requiring subsequent correction and rewriting only when its total number of characters is greater than the preset character count threshold and it contains a preset core word. This is one sub-condition of the first preset condition.

For example, in a voice assistant service with the preset character count threshold set to 10, suppose the recognized first text field is "走哈哈啊别忘了啊呀闹钟,你你你天气咋样" ("Let's go haha, don't forget the alarm clock, you you you how is the weather?"), which has 19 characters. Because it contains "天气" ("weather"), a core word of the voice assistant service, it satisfies the condition that the total number of characters is greater than the preset character count threshold and a preset core word is present. The text field is therefore determined to meet the first preset condition: it is a potentially meaningful utterance and is worth rewriting before it is sent to the backend server to execute the corresponding action.

As another example, suppose the recognized first text field is "走哈哈啊别忘了啊呀,你你你咋样" ("Let's go haha, don't forget, you you you how are you?"). Although it has 15 characters, it contains no service-related core word, so it does not satisfy the condition that the total number of characters is greater than the preset character count threshold and a preset core word is present. The text field is therefore determined not to meet the first preset condition and not to be worth rewriting; it is sent directly to the backend server as a normal utterance so that the corresponding action can be determined.
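The two examples above can be reproduced with a short check. This is a sketch under stated assumptions: the core word set `CORE_WORDS` and the function name are hypothetical, core words are matched by plain substring search rather than on segmented words, and the character count includes the comma, which is what yields the stated lengths of 19 and 15.

```python
CHAR_THRESHOLD = 10            # preset character count threshold from the example
CORE_WORDS = {"天气", "闹钟"}   # hypothetical core word set for a voice assistant

def meets_core_word_condition(text, core_words=CORE_WORDS, threshold=CHAR_THRESHOLD):
    """True when the text is longer than the threshold and contains a core word.

    Substring matching is a simplification; a real system would presumably
    match against segmented words.
    """
    return len(text) > threshold and any(word in text for word in core_words)

noisy_with_core = "走哈哈啊别忘了啊呀闹钟,你你你天气咋样"
noisy_without_core = "走哈哈啊别忘了啊呀,你你你咋样"

print(len(noisy_with_core), meets_core_word_condition(noisy_with_core))        # 19 True
print(len(noisy_without_core), meets_core_word_condition(noisy_without_core))  # 15 False
print(meets_core_word_condition("天气咋样"))                                    # False (too short)
```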

In step S200, the preset utterance library is the domain-standard utterance library for the service scenarios supported by the electronic device, that is, a collection of utterances that frequently occur in the domains of those scenarios. For example, if the electronic device is a voice assistant whose services cover only the alarm clock and weather scenarios, its domain-standard utterance library could be: ["今天天气怎么样" ("How is the weather today?"), "天气好不好" ("Is the weather good?"), "今天天气适合户外么" ("Is today's weather suitable for going outdoors?"), "定个闹钟" ("Set an alarm"), "修改闹钟" ("Change the alarm")].

The preset similarity threshold determines whether the first text field is similar to a text field in the preset utterance library. When the similarity between the first text field and a second text field in the library is greater than this threshold, the two are determined to be similar. The preset similarity threshold may be set to 0.1.

In the embodiments of the present application, when the total number of characters in a text field is greater than the preset character count threshold and the text field contains no preset core word but does relate to the content of the domain-standard utterance library, the text field is still a potentially meaningful utterance and is judged to be an abnormal utterance. The second sub-condition of the first preset condition is therefore that the total number of characters is greater than the preset character count threshold and the preset utterance library contains a second text field whose similarity to the first text field is greater than the preset similarity threshold. Any existing similarity algorithm may be used.

For example, suppose the preset character count threshold is 10 and the similarity threshold is 0.1, and the recognized first text field is "走哈哈啊别忘了啊呀户外,你你你咋样" ("Let's go haha, don't forget, outdoors, you you you how are you?"). This text field has 17 characters and contains no service-related core word. However, when it is used to query the domain-standard utterance library and the results are scored with the default similarity algorithm of an open-source search engine such as Elasticsearch, the single most similar utterance is "今天天气适合户外么", with a similarity of 0.2 reported by the search engine. This is greater than the similarity threshold of 0.1, so the first text field is considered a potentially meaningful utterance that meets the first preset condition, and it is worth rewriting before it is sent to the backend server to execute the corresponding action.
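The second sub-condition can be sketched as a best-match lookup against the standard utterance library. The text refers to a search engine's default scoring, which is not reproduced here; as a labeled stand-in, this sketch uses character-set Jaccard similarity, so its scores are on a different scale from the 0.2 quoted above. The function names are hypothetical.

```python
STANDARD_UTTERANCES = [
    "今天天气怎么样", "天气好不好", "今天天气适合户外么", "定个闹钟", "修改闹钟",
]

def char_jaccard(a, b):
    """Jaccard similarity over character sets (a stand-in, not the search engine's score)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def best_match(text, library=STANDARD_UTTERANCES):
    """Return the (utterance, score) pair with the highest similarity to text."""
    return max(((u, char_jaccard(text, u)) for u in library), key=lambda pair: pair[1])

def meets_similarity_condition(text, char_threshold=10, sim_threshold=0.1):
    """True when the text is long enough and some standard utterance is similar enough."""
    if len(text) <= char_threshold:
        return False
    _, score = best_match(text)
    return score > sim_threshold

query = "走哈哈啊别忘了啊呀户外,你你你咋样"
utterance, score = best_match(query)
print(utterance)                          # 今天天气适合户外么
print(meets_similarity_condition(query))  # True
```

Even with this crude measure the top hit is the same utterance as in the example, because "户外" is the only multi-character overlap with the library. The threshold of 0.1 would need retuning for whichever similarity measure is actually used.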

In step S200, once the first text field is determined to meet the first preset condition, it is an abnormal utterance worth rewriting and correcting. Word segmentation is therefore performed on the first text field first, so that each first word segment of the preset type can be detected in it for subsequent rewriting and correction.

The forward maximum matching algorithm may be used to segment the first text field.
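Forward maximum matching scans the text from left to right and, at each position, takes the longest dictionary word that starts there, falling back to a single character when nothing matches. The following is a minimal sketch; the dictionary contents are hypothetical and the patent does not specify the dictionary or the maximum word length.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text greedily, preferring the longest dictionary word at each position."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                segments.append(candidate)
                i += size
                break
    return segments

toy_dictionary = {"今天", "天气", "怎么样", "闹钟"}  # hypothetical dictionary
print(forward_max_match("今天天气怎么样", toy_dictionary))
# ['今天', '天气', '怎么样']
print(forward_max_match("你你你天气咋样", toy_dictionary))
# ['你', '你', '你', '天气', '咋', '样']
```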

Step S300: process the target word segment in the first text field to generate a target text field; wherein the processing of the target word segment includes at least one of the following: deleting the target word segment, or replacing the single sentence to which the target word segment belongs with a target character string.

In step S300, the target word segment is a word segment that needs to be rewritten. Typically it either corresponds to ambient noise in the received audio data and needs to be deleted, or reflects a recognition deviation caused by the user's imprecise wording and needs to be corrected. Depending on its type, the target word segment is therefore deleted, or the single sentence to which it belongs is replaced with a target character string, and the target text field is generated from the first text field as processed.
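The two operations in step S300 can be sketched as small string utilities. Everything here is illustrative: the segmentation of the noisy utterance, the choice of target indices, the misrecognized clause "定个闹中" and the helper names are all hypothetical, and splitting on the full-width comma is only one possible way to delimit a single sentence.

```python
def delete_segments(segments, target_indices):
    """Rebuild the text, skipping the segments at the given positions."""
    return "".join(seg for i, seg in enumerate(segments) if i not in target_indices)

def replace_clause(text, target_segment, target_string, separator=","):
    """Replace each clause that contains the target segment with target_string."""
    clauses = text.split(separator)
    return separator.join(
        target_string if target_segment in clause else clause for clause in clauses
    )

# Deletion: drop the noise segments (positions 0-9) and keep the user's request.
segments = ["走", "哈哈", "啊", "别忘了", "啊呀", "闹钟", ",", "你", "你", "你", "天气", "咋样"]
print(delete_segments(segments, set(range(10))))          # 天气咋样

# Replacement: swap the whole clause containing a misrecognized segment.
print(replace_clause("定个闹中,天气咋样", "闹中", "定个闹钟"))  # 定个闹钟,天气咋样
```

Indices rather than segment values identify the targets in `delete_segments`, so that a repeated segment such as "你" can be removed in one position and kept in another.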

With the speech recognition method provided by the embodiments of the present application, speech recognition proceeds as follows: the received audio data is first converted into a first text field; then, when the first text field meets the first preset condition, that is, when it is judged to be an abnormal utterance, each first word segment of a preset type in the first text field is detected; finally, the target word segment in the first text field is processed to generate a target text field. Deleting or replacing the first word segments of the preset type in a first text field that has been judged abnormal clarifies the user's intention, allows the rewriting to be completed quickly, and effectively improves the execution result of speech recognition.

Optionally, in one implementation, step S200 specifically includes step S201.

Step S201: when the first text field meets the first preset condition, detect the redundant words, subject, predicate, object and preset core words in the first text field;

wherein a redundant word is a word segment in the first text field whose combination with the word segments within a preset number of positions before and after it does not conform to the preset manner of language expression, and which does not exist in a preset hot word library.

In step S201, the preset hot word library is a library of current hot words. It can be built by analyzing the logs of the most recent several days and collecting the words whose frequency of occurrence is greater than a preset frequency threshold. For example, in a voice assistant service, all utterances of one day are segmented and deduplicated, and words with a frequency greater than 1000 are defined as hot words, giving the preset hot word library: ["天气" ("weather"), "今天" ("today"), "紫外线" ("ultraviolet"), "闹钟" ("alarm clock"), "定个" ("set a")].
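Building the hot word library amounts to a frequency count with a cutoff. The sketch below assumes the log has already been segmented into lists of words and counts every occurrence; how the deduplication mentioned in the text interacts with the count is not specified, so that choice, the toy data and the function name are assumptions.

```python
from collections import Counter

def build_hot_words(segmented_utterances, min_count):
    """Return the set of words that occur more than min_count times in the log."""
    counts = Counter(word for utterance in segmented_utterances for word in utterance)
    return {word for word, count in counts.items() if count > min_count}

# Toy log; the example in the text uses a cutoff of 1000 on a full day's log.
toy_log = [
    ["今天", "天气"],
    ["天气", "咋样"],
    ["定个", "闹钟"],
    ["修改", "闹钟"],
]
print(sorted(build_hot_words(toy_log, min_count=1)))  # ['天气', '闹钟']
```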

As stated, a redundant word is a word segment in the first text field whose combination with the word segments within a preset number of positions before and after it does not conform to the preset manner of language expression, and which does not exist in the preset hot word library. In other words, after the first text field has been segmented, each resulting first word segment is combined with the second word segments located within the preset number of positions before and after it in the first text field to form a first combination. It is then determined whether the first combination conforms to the preset language expression habits and whether the first word segment exists in the preset hot word library. If the first combination does not conform to the preset manner of language expression and the first word segment does not exist in the preset hot word library, the first word segment is determined to be a redundant word.

A first word segment that is a newly popular word can cause its combination with the second word segments to violate the preset language expression habits. To avoid such false positives, the first word segment is determined to be a redundant word, that is, an unnecessary, repeated or superfluous word, only when the combination does not conform to the preset language expression habits and the first word segment does not exist in the preset hot word library.
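The redundant word rule combines a context check with a hot word exemption. The sketch below passes the naturalness check in as a function, because the patent leaves that judgment to a language model; the phrase set `NATURAL_PHRASES`, the window size of 1 and all names are hypothetical stand-ins.

```python
def find_redundant(segments, is_natural, hot_words, window=1):
    """Return positions of segments that fit neither their context nor the hot word library."""
    redundant = []
    for i, segment in enumerate(segments):
        # The first combination: the segment plus its neighbours within the window.
        context = segments[max(0, i - window): i + window + 1]
        if not is_natural(context) and segment not in hot_words:
            redundant.append(i)
    return redundant

# Hypothetical stand-in for a language model: a phrase is natural if it is known.
NATURAL_PHRASES = {"天气咋样"}

def toy_is_natural(context):
    return "".join(context) in NATURAL_PHRASES

segments = ["啊呀", "天气", "咋样"]
print(find_redundant(segments, toy_is_natural, hot_words={"天气"}))  # [0]
```

In this toy run "天气" sits in an unnatural three-word context but is protected by the hot word library, while "咋样" is kept because its context "天气咋样" is natural; only "啊呀" is flagged.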

Because the first text field is text converted from audio data, whether the first word segment is present in the preset hot word library is also determined on the basis of pinyin: if the pinyin of the first word segment is identical or similar to that of a third word segment in the preset hot word library, the first word segment is determined to be present in the preset hot word library, that is, the first word segment is a hot word.
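A pinyin-based lookup of this kind might look like the sketch below. The text does not say how pinyin is obtained or how similarity is measured, so everything here is an assumption: `TOY_PINYIN` is a hypothetical hand-written table standing in for a real pinyin converter, and similarity is taken to be the `difflib.SequenceMatcher` ratio with an illustrative cutoff of 0.8.

```python
from difflib import SequenceMatcher

# Hypothetical toy table; a real system would use a pinyin conversion library.
TOY_PINYIN = {
    "天气": "tian qi",
    "闹钟": "nao zhong",
    "脑中": "nao zhong",  # homophone of 闹钟, a plausible ASR confusion
    "别忘了": "bie wang le",
}


def matches_hot_word(word, hot_words, min_similarity=0.8):
    """Return True if the pinyin of `word` equals or resembles a hot word's pinyin."""
    pinyin = TOY_PINYIN.get(word)
    if pinyin is None:
        return False
    for hot in hot_words:
        hot_pinyin = TOY_PINYIN.get(hot)
        if hot_pinyin is None:
            continue
        if SequenceMatcher(None, pinyin, hot_pinyin).ratio() >= min_similarity:
            return True
    return False
```

Comparing pinyin rather than characters lets a misrecognized homophone such as "脑中" still count as the hot word "闹钟".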

In practice, to determine whether the combination of a word segment in the first text field with the word segments within the preset number of positions before and after it conforms to the preset language expression pattern, the combination can be input into a pre-trained language model, which scores it. When the score is greater than a first model threshold, the combination is considered not to conform to the preset language expression pattern. The first model threshold is the threshold for deciding whether a word segment fails to conform to the preset language expression pattern.

Optionally, when the first combination formed by the first word segment and the second word segments within the preset number of positions before and after it is input into the pre-trained language model for scoring, the first word segment is deleted directly if the score is greater than a second model threshold. The second model threshold is greater than the first model threshold and is the threshold for deciding whether a word segment seriously fails to conform to the preset language expression pattern; a score above it indicates that the corresponding word segment seriously fails to conform, so it can be deleted directly.
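The two-threshold decision can be written as a small function. This sketch follows the comparison direction exactly as stated in the two paragraphs above (a score above a threshold means non-conforming); the function name, the string labels, and the threshold values used in any call are illustrative assumptions, and the score itself would come from whatever language model is deployed.

```python
def classify_segment(score, in_hot_library, first_threshold, second_threshold):
    """Decide what to do with a word segment given its language model score.

    Returns "delete" (seriously non-conforming, removed directly),
    "redundant" (non-conforming and not a hot word), or "keep".
    """
    if second_threshold <= first_threshold:
        raise ValueError("second_threshold must be greater than first_threshold")
    if score > second_threshold:
        return "delete"
    if score > first_threshold and not in_hot_library:
        return "redundant"
    return "keep"
```

Note that the hot word check only guards the middle band; above the second threshold the segment is removed regardless.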

For example, suppose the first text field is "走哈哈啊别忘了啊呀闹钟,你你你天气咋样" (roughly, "go haha ah don't forget ah ya alarm clock, you you you how's the weather"), the core word set is ["闹钟" (alarm clock), "天气" (weather)], the hot word library is ["天气", "今天", "紫外线", "闹钟", "定个"], and the first model threshold is 0.01.

If the candidate word set formed from the word segments of the first text field is [走, 哈哈, 啊, 别忘了, 啊, 呀, 闹钟, 你, 你, 你, 天气, 咋样], each word is first checked from left to right against the preset core words. Since "闹钟" and "天气" are in the core word set, "闹钟", for example, is a preset core word.

After the preset core words have been identified, the remaining word segments are checked from left to right for redundancy. Take "别忘了" (don't forget): first obtain its context, that is, the K nearest words before and after it in the first text field. In [走, 哈哈, 啊, 别忘了, 啊, 呀, 闹钟, 你, 你, 你, 天气, 咋样], with K = 2 the context of "别忘了" is ["哈哈", "啊", "啊", "呀"]. The word "别忘了" and its context are then input into the pre-trained language model, which outputs a score of 0.002. With the first model threshold assumed to be 0.01, this indicates that the fragment "哈哈啊别忘了啊呀" does not conform to conventional expression, so "别忘了" is likely a redundant word. To avoid false positives, pinyin detection against the hot word library is then performed. The hot word library here is ["天气", "今天", "紫外线", "闹钟", "定个"], and the pinyin of "别忘了" differs from the pinyin of every word in it, so "别忘了" is finally confirmed to be a redundant word.
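The context lookup in this example is a simple window over the candidate word list. A minimal sketch, with the function name `context_window` chosen here for illustration:

```python
def context_window(tokens, index, k):
    """Return up to k tokens before and up to k tokens after tokens[index]."""
    before = tokens[max(0, index - k):index]
    after = tokens[index + 1:index + 1 + k]
    return before + after


tokens = ["走", "哈哈", "啊", "别忘了", "啊", "呀", "闹钟", "你", "你", "你", "天气", "咋样"]
context = context_window(tokens, tokens.index("别忘了"), k=2)
# context is ["哈哈", "啊", "啊", "呀"], matching the example in the text
```

Near the start or end of the list the window is simply shorter, since slicing clamps to the list bounds.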

In this embodiment, when the first text field satisfies the first preset condition, semantic and part-of-speech analysis is performed on each word segment in the first text field to determine the redundant words, subject, predicate, object, and preset core words it contains. This makes it easier to determine the target word segment later and to rewrite differently according to the type of the target word segment.

Optionally, after the redundant words, subject, predicate, object, and preset core words in the first text field are detected, the first text field is displayed with the redundant words and preset core words visually distinguished, so that the user can more easily determine the target word segment and rewrite the text field in a more targeted way. For example, redundant words are displayed in yellow and core words in red.

Optionally, in one embodiment, the speech recognition method provided by the embodiments of this application further includes step S202 before step S300:

S202: when a first input on a target word segment among the first word segments is received, perform step S300.

In step S202 above, the first input includes a selection input on the target word segment among the first word segments and an input confirming that the target word segment is to be processed. The target word segment is a word segment among the first word segments that needs to be processed; it is at least one of the first word segments, and it can be determined by the first input or determined in advance according to the preset type.

Optionally, step S202 specifically includes: when the first input on the target word segment among the first word segments is received within a preset duration, performing step S300. The preset duration is the time given to the user to confirm whether to rewrite the first text field. If no first input on a target word segment is received within the preset duration, the user is taken to have decided that the currently displayed first text field needs no rewriting or correction; no further rewriting action is performed, and the first text field is sent directly to the backend server for recognition and execution of the corresponding action.
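The timeout behavior can be sketched with a blocking queue. This is only one possible arrangement: the use of `queue.Queue` as the channel for user input, the function names, and the `rewrite` callback are all assumptions made for illustration.

```python
import queue


def wait_for_first_input(input_queue, timeout_seconds):
    """Return the user's selection, or None if none arrives within the preset duration."""
    try:
        return input_queue.get(timeout=timeout_seconds)
    except queue.Empty:
        return None


def resolve_text_field(first_text_field, input_queue, timeout_seconds, rewrite):
    """Return the text to send to the backend.

    rewrite: hypothetical callback (text, selection) -> rewritten text,
    standing in for step S300.
    """
    selection = wait_for_first_input(input_queue, timeout_seconds)
    if selection is None:
        # No input in time: the first text field is sent unchanged.
        return first_text_field
    return rewrite(first_text_field, selection)
```

In a real user interface the selection would be pushed onto the queue by a touch event handler running on another thread.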

Optionally, in a specific embodiment, when the target word segment is a redundant word, step S300 above includes step S301.

Step S301: delete the target word segment from the first text field to generate the target text field.

In step S301 above, the target word segment is a redundant word that the user wants rewritten. Because a redundant word is superfluous and meaningless, deleting it clarifies and highlights the user's real intent. The target text field is then generated from the first text field with the target word segment removed, so that the backend server can better recognize it and perform the corresponding action.

For example, if the first text field is "走哈哈啊别忘了啊呀闹钟,你你你天气咋样", it is displayed as shown in Figure 2. Word segment analysis determines that "走哈哈啊", "啊呀", and "你你你" are redundant words. The user then taps "走哈哈啊", "啊呀", and "你你你" to trim them, as shown in Figure 3, which yields the target text field "别忘闹钟,天气咋样" (don't forget the alarm clock, how's the weather), displayed as shown in Figure 4.

Optionally, in a specific embodiment, when the target word segment is a preset core word, step S300 above includes steps S302 to S305.

Step S302: receive a first input on the target word segment among the first word segments.

In step S302 above, the first input includes a selection input on the target word segment among the first word segments and an input confirming that the target word segment is to be processed. The target word segment is a word segment among the first word segments that needs to be processed; it is at least one of the first word segments, and it can be determined by the first input or determined in advance according to the preset type.

Step S303: in response to the first input, generate a first string recommendation list matching the target word segment.

In step S303 above, the target word segment is a preset core word; that is, a first input on a preset core word in the first text field has been received, which means the user wants to rewrite that preset core word. Because the preset core words in the first text field are key to expressing the meaning of the sentence, a user's wish to rewrite or correct one indicates either that the text corresponding to the audio data was not correctly recognized for that word, or that it was correctly recognized but does not accurately express the user's intent. A first string recommendation list matching the target word segment is therefore generated and displayed, so that the user can select a string that accurately expresses the real intent.

Step S304: receive a second input on a first target string in the first string recommendation list.

In step S304 above, the second input is a selection input on the first target string in the first string recommendation list; specifically, it may be a tap, touch, or similar operation on the screen area where the first target string is located.

Step S305: in response to the second input, replace the single sentence containing the target word segment in the first text field with the first target string to generate the target text field.

In step S305 above, the single sentence containing the target word segment is replaced with the first target string determined by the second input, and a target text field that accurately expresses the user's real intent is generated from the first text field after replacement. A single sentence is a sentence obtained by splitting the first text field at preset punctuation marks, which may include the comma, the enumeration comma (顿号), and the period.
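Splitting at preset punctuation and replacing the single sentence that contains the target word segment can be sketched as below. The function name, the exact punctuation set, and the choice to replace only the first matching single sentence are assumptions made for this illustration.

```python
import re

# Assumed preset punctuation: full-width comma, enumeration comma, full-width period.
PRESET_PUNCTUATION = ",、。"


def replace_single_sentence(text, target_word, target_string):
    """Replace the first single sentence containing target_word with target_string."""
    # The capturing group keeps the punctuation marks in the result, so
    # even indices hold single sentences and odd indices hold punctuation.
    parts = re.split(f"([{PRESET_PUNCTUATION}])", text)
    for i in range(0, len(parts), 2):
        if target_word in parts[i]:
            parts[i] = target_string
            break
    return "".join(parts)


result = replace_single_sentence("别忘闹钟,天气咋样", "天气", "今天天气怎么样")
# result is "别忘闹钟,今天天气怎么样"
```

If the target word does not occur in the text, the text is returned unchanged.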

For example, after the user trims the first text field "走哈哈啊别忘了啊呀闹钟,你你你天气咋样" by tapping the redundant words "走哈哈啊", "啊呀", and "你你你" to obtain "别忘闹钟,天气咋样", tapping the preset core word "闹钟" displays a string recommendation list based on "闹钟", and tapping the preset core word "天气" displays a string recommendation list based on "天气", as shown in Figure 5.

If, in Figure 5, the user taps "闹钟" and then selects "定个闹钟" (set an alarm) from the recommendation list based on "闹钟", "闹钟" is replaced with "定个闹钟". If the user taps "天气" and then selects "今天天气怎么样" (how's the weather today) from the list based on "天气", "天气咋样" is replaced with "今天天气怎么样". The result is displayed as shown in Figure 6.

In the specific embodiment above, when the target word segment to be rewritten is a preset core word, the first string recommendation list matching the target word segment is displayed for the user to select a target string, and the single sentence containing the target word segment is replaced with the target string selected by the user, generating a target text field that accurately expresses the user's real intent.

Optionally, in one embodiment, when the first text field includes a subject, predicate, object, and preset core word, step S300 above includes steps S306 to S308.

In this embodiment, the target word segments are the subject, predicate, object, and preset core words contained in the first text field.

This embodiment is suited to electronic devices with a single service scenario, such as a smart speaker with a screen, which the user only needs to play music or a radio station.

Step S306: generate a second string recommendation list according to the subject, predicate, object, and preset core words in the first text field and the user's usage log.

In step S306 above, the first text field includes a subject, predicate, object, and preset core word; that is, a first input on the target word segments in the first text field has been received, which means the user wants the single sentence containing the target word segments to be replaced. A second string recommendation list is therefore generated from the subject, predicate, object, and preset core words in the first text field and the user's usage log. The list is related to the current first text field and contains strings matching the user's usage habits, and it is displayed so that the user can select a string that accurately expresses the real intent.

In generating the second string recommendation list, a library of the user's commonly used utterances is first built from the user's usage log. The library is then searched using the subject, predicate, object, and preset core words in the first text field, the results are sorted by degree of match from high to low, and the utterances ranked ahead of a preset rank are output to form the second string recommendation list.

In practice, the second string recommendation list is generated as follows:

(1) First, analyze the user's usage log and tag each user along multiple dimensions. For example, if Zhang San often listens to songs by "XXX", he can be given the tag "XXX"; if Li Si often listens to gufeng (ancient-style) songs, he can be given the tag "古风" (gufeng).

(2) Then use frequency to decide whether a given utterance is one the user commonly uses, and tag each utterance along multiple dimensions as well. The result is denoted Set1 and can be recorded as (user, common utterance, tags), for example (user ID1, "播放稻香" (play Daoxiang), pop music, XXX).

(3) Based on user behavior data, use matrix factorization or embedding techniques to find related users and obtain the set of utterances commonly used by similar users. The result is denoted Set2 and can be recorded as (user, similar users, similar users' common utterances), for example (user ID1, [user ID2, user ID6], ["播放晴天" (play Sunny Day), "来一首Mojito" (play Mojito)]).

(4) Perform text analysis on the first text field, combining dependency parsing with preset core word extraction, to obtain the subject, predicate, object, and preset core words as search terms.

(5) Use these search terms to query Set1 and Set2 with BM25 scoring, recall a number of utterances from each, merge and sort them, and output the utterances ranked 1 to N as the second string recommendation list.
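Step (5) can be illustrated with a compact BM25 ranker. This is a sketch under several assumptions: utterances are already segmented into word lists, Set1 and Set2 are pooled into one candidate list before scoring (a simplification of "recall from each, then merge"), the parameters `k1 = 1.5` and `b = 0.75` are common defaults rather than values from the text, and the IDF uses the widely used smoothed form log(1 + (N - df + 0.5) / (df + 0.5)).

```python
import math
from collections import Counter


def bm25_rank(query_terms, documents, top_n, k1=1.5, b=0.75):
    """Return indices of the top_n documents (token lists) by BM25 score."""
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scored = []
    for idx, doc in enumerate(documents):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, idx))
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in scored[:top_n] if score > 0]


def recommend(query_terms, set1, set2, top_n):
    """Pool the user's own utterances (set1) with similar users' (set2) and rank them."""
    pool = []
    for utterance in set1 + set2:
        if utterance not in pool:  # drop exact duplicates
            pool.append(utterance)
    return ["".join(pool[i]) for i in bm25_rank(query_terms, pool, top_n)]


set1 = [["播放", "稻香"], ["播放", "电台"]]
set2 = [["播放", "晴天"], ["来一首", "Mojito"]]
suggestions = recommend(["播放", "稻香"], set1, set2, top_n=3)
```

Here the utterance containing both query terms ranks first, and utterances sharing no term with the query are left out of the list.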

Step S307: receive a third input on a second target string in the second string recommendation list.

In step S307 above, the third input is a selection input on the second target string in the second string recommendation list. The third input also serves as the first input described above, confirming that the target word segments are to be processed using the second target string. Specifically, it may be a tap, touch, or similar operation on the screen area where the second target string is located.

Step S308: determine the second target string as the target text field.

In step S308 above, because the second target string was selected by the user, is related to the first text field, and matches the user's usage habits, the second target string is directly determined as the target text field. In other words, the first text field is replaced with the second target string, which matches the user's real intent.

For example, if the first text field is "哈哈啊稻香播放闹钟,你你XXX啊", the second string recommendation list generated from the subject, predicate, object, and preset core words in the first text field and the user's usage log is displayed above the text field, as shown in Figure 7. When the user taps "播放稻香" (play Daoxiang) in the second string recommendation list, the displayed first text field is replaced with "播放稻香", as shown in Figure 8.

In the embodiment above, when the first text field satisfies the first preset condition, a second string recommendation list that is related to the current first text field and contains strings matching the user's usage habits is first generated automatically from the detected subject, predicate, object, and preset core words in the first text field and the user's usage log, and is displayed. Then, when the user's selection of a second target string is received, the second target string is taken as the target text field; that is, the step of processing the target word segments to generate the target text field is performed.

In the embodiment above, when the first text field satisfies the first preset condition, the second string recommendation list, which is related to the current first text field and contains strings matching the user's usage habits, is generated from the subject, predicate, object, and preset core words contained in the first text field and the user's usage log, so that the user can quickly select from it a target string that accurately expresses the real intent.

Referring to Figure 9, an execution flowchart of the speech recognition method provided by the embodiments of this application is shown. As shown in Figure 9, after audio data containing the user's speech and ambient noise is received, ASR is used to recognize the audio and obtain the original on-screen utterance, that is, the first text field. Whether the first text field satisfies the first preset condition is then checked to determine whether it is an abnormal utterance. If the first text field does not satisfy the first preset condition, it is a normal utterance and can be used directly as the utterance to be executed. If it satisfies the first preset condition, it is an abnormal utterance, so utterance trimming or whole-sentence replacement is performed, and the utterance to be executed, that is, the target text field, is obtained from the processed first text field.

Referring to Figure 10, a schematic diagram of the redundant word determination process provided by the embodiments of this application is shown. As shown in Figure 10, after ASR has recognized the audio and produced the original on-screen utterance, that is, the first text field, the first text field is segmented and the resulting words form a candidate word set. The words are selected one at a time as the current candidate for core word detection, which determines whether the current candidate is a preset core word. If the current candidate is not a preset core word, it is input together with its context into the pre-trained language model to determine whether it conforms to the preset language expression pattern, and hot word pinyin detection checks whether it is similar to a hot word in the preset hot word library. If the combination of the current candidate and its context does not conform to the preset language expression pattern and the preset hot word library contains no hot word similar to the current candidate, the current candidate is judged to be a redundant word; otherwise it is judged not to be a redundant word. If the current candidate is a preset core word, the next word segment is selected as the candidate. This continues until all words in the candidate word set have been checked.
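The loop in Figure 10 can be sketched as follows. The language model and the pinyin check are passed in as callables because the text does not specify them; `conforms` and `in_hot_library` are hypothetical stand-ins, as is the function name.

```python
def find_redundant_words(tokens, core_words, k, conforms, in_hot_library):
    """Return the indices of tokens judged to be redundant words.

    conforms(token, context) -> bool: stand-in for the language model check.
    in_hot_library(token) -> bool: stand-in for the hot word pinyin check.
    """
    redundant = []
    for i, token in enumerate(tokens):
        if token in core_words:
            continue  # preset core words are never treated as redundant
        context = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
        if not conforms(token, context) and not in_hot_library(token):
            redundant.append(i)
    return redundant
```

Returning indices rather than words keeps repeated tokens such as "啊" distinguishable, so each occurrence can be trimmed or kept independently.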

Referring to Figure 11, another execution flowchart of the speech recognition method provided by the embodiments of this application is shown. As shown in Figure 11, after audio data containing the user's speech and ambient noise is received, ASR is used to recognize the audio and obtain the original on-screen utterance, that is, the first text field. Whether the first text field satisfies the first preset condition is then checked to determine whether it is an abnormal utterance. If the first text field does not satisfy the first preset condition, it is a normal utterance and can be used directly as the utterance to be executed. If it satisfies the first preset condition, it is an abnormal utterance, so a personalized recommendation list, that is, the second string recommendation list, is generated from the first text field and the user's usage log, and whether to perform whole-sentence replacement is decided by the user's choice. If the user taps a target string in the recommendation list, the utterance to be executed, that is, the target text field, is generated from the selected target string. If the user does not tap any target string in the recommendation list, the first text field is used directly as the utterance to be executed, that is, the target text field.

Referring to Figure 12, a schematic diagram of the generation process of the second string recommendation list provided by the embodiments of this application is shown. As shown in Figure 12, when the original on-screen utterance of the received audio data is determined to satisfy the first preset condition, that is, when the first text field is determined to be an abnormal utterance, dependency parsing and core word extraction are used to obtain its subject, predicate, object, and preset core words. Based on these and the current user ID, the common utterance library built from the user's own common utterance library and similar users' common utterance libraries is searched, with the option set to generate the top N candidate utterances ranked 1 to N. The retrieved utterances are then scored and sorted with the BM25 scoring algorithm, and the candidate utterances ranked 1 to N are output and displayed, giving the second string recommendation list.

It should be noted that the speech recognition method provided by the embodiments of this application may be executed by a terminal device, or by a control module in the terminal device for executing the speech recognition method. The embodiments of this application describe the method taking a terminal device executing the speech recognition method as an example.

Referring to Figure 13, a schematic structural diagram of a speech recognition apparatus provided by the embodiments of this application is shown. As shown in Figure 13, the apparatus includes:

a conversion module 131, configured to convert received audio data into a first text field;

a detection module 132, configured to detect each first word segment of a preset type in the first text field when the first text field satisfies a first preset condition, where the first preset condition includes at least one of the following: the total number of characters is greater than a preset character count threshold and the text contains a preset core word; the total number of characters is greater than the preset character count threshold and a preset utterance library contains a second text field whose similarity to the first text field is greater than a preset similarity threshold;

a processing module 133, configured to process a target word segment in the first text field to generate a target text field, where the processing of the target word segment includes at least one of the following: deleting the target word segment, and replacing the single sentence to which the target word segment belongs with a target string.
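The first preset condition checked by the detection module can be sketched as a predicate. The text does not say how similarity between text fields is computed, so this sketch assumes the `difflib.SequenceMatcher` ratio; the function name and all threshold values in the example calls are likewise illustrative.

```python
from difflib import SequenceMatcher


def meets_first_preset_condition(text, core_words, utterance_library,
                                 max_chars, min_similarity):
    """Return True if the text field should be treated as an abnormal utterance."""
    # Both alternatives require the character count to exceed the threshold.
    if len(text) <= max_chars:
        return False
    # Alternative 1: the text contains a preset core word.
    if any(word in text for word in core_words):
        return True
    # Alternative 2: some library utterance is sufficiently similar.
    return any(
        SequenceMatcher(None, text, utterance).ratio() > min_similarity
        for utterance in utterance_library
    )


abnormal = meets_first_preset_condition(
    "走哈哈啊别忘了啊呀闹钟,你你你天气咋样", ["闹钟", "天气"], [], 10, 0.8
)
normal = meets_first_preset_condition("定个闹钟", ["闹钟", "天气"], [], 10, 0.8)
```

A short, clean command such as "定个闹钟" fails the length test and is passed through unchanged.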

Optionally, in the device, the detection module 132 is specifically configured to detect redundant words, the subject, the predicate, the object, and preset core words in the first text field when the first text field meets the first preset condition;

wherein a redundant word is a word segment in the first text field whose combination with a preset number of preceding and following word segments does not conform to a preset language expression pattern, and which is not present in a preset hot-word library.
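
The redundant-word definition can be sketched in code. According to the claims, the combination is scored by a pre-trained language model and is deemed non-conforming when the score exceeds a first model threshold, and the hot-word library is built from words whose frequency in logs exceeds a preset frequency threshold. In the sketch below, `score_fn` is a hypothetical stand-in for that language model, and the window size and thresholds are illustrative assumptions.

```python
from collections import Counter


def build_hot_words(log_segments, min_count=2):
    """Collect words whose frequency in the logs is greater than min_count."""
    counts = Counter(log_segments)
    return {word for word, count in counts.items() if count > min_count}


def find_redundant_words(segments, score_fn, hot_words, window=1, threshold=0.5):
    """Return the indices of word segments flagged as redundant.

    score_fn is a hypothetical callable standing in for the pre-trained
    language model; a score above threshold means the combination does not
    conform to the expected expression pattern.
    """
    redundant = []
    for i, segment in enumerate(segments):
        if segment in hot_words:
            continue  # words in the hot-word library are never redundant
        # Combine the segment with `window` neighbours on each side.
        start = max(0, i - window)
        combination = tuple(segments[start:i + window + 1])
        if score_fn(combination) > threshold:
            redundant.append(i)
    return redundant
```

A caller would pass the segmented first text field together with a real scoring model; the returned indices are candidates for deletion.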

Optionally, in the device, the processing module 133 includes:

a first processing unit, configured to delete the target word segment from the first text field to generate the target text field when the target word segment is a redundant word.

Optionally, in the device, the processing module 133 further includes:

a first receiving unit, configured to receive a first input on the target word segment among the first word segments when the target word segment is a preset core word;

a first generation unit, configured to generate, in response to the first input and when the target word segment is a preset core word, a first string recommendation list matching the target word segment;

a second receiving unit, configured to receive a second input on a first target string in the first string recommendation list;

a second processing unit, configured to replace, in response to the second input, the single sentence containing the target word segment in the first text field with the first target string to generate the target text field.
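
The core-word branch above can be sketched as two small helpers. This is an illustration under stated assumptions: "single sentence" is taken to mean a clause delimited by punctuation, the recommendation list is produced by simple substring matching against a hypothetical candidate pool, and the function names are not from the source.

```python
import re


def recommend(target_word, candidates):
    """Return candidate strings that mention the target word (assumed matching rule)."""
    return [c for c in candidates if target_word in c]


def replace_sentence(text, target_word, chosen):
    """Replace the first clause containing target_word with the chosen string."""
    # Split on common Western and Chinese punctuation, keeping the delimiters
    # so the text can be reassembled unchanged around the replaced clause.
    parts = re.split(r"([,.;!?，。；！？])", text)
    for i in range(0, len(parts), 2):  # even indices hold clause text
        clause = parts[i]
        if target_word in clause:
            leading = clause[:len(clause) - len(clause.lstrip())]
            parts[i] = leading + chosen  # preserve leading whitespace
            break
    return "".join(parts)
```

In the flow described above, the first input selects the target word segment, `recommend` supplies the list shown to the user, and the second input picks the string passed as `chosen`.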

Optionally, in the device, the processing module further includes:

a second generation unit, configured to generate a second string recommendation list according to the subject, predicate, object, and preset core words in the first text field and the user's usage log, when the first text field includes a subject, a predicate, an object, and a preset core word;

a third receiving unit, configured to receive a third input on a second target string in the second string recommendation list;

a third processing unit, configured to determine the second target string as the target text field.
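
One way the second string recommendation list could be built is sketched below. The text does not specify a ranking method, so the scoring rule here (keyword overlap first, then usage frequency) is purely an assumption, as are the function name and the `top_n` parameter.

```python
from collections import Counter


def recommend_from_log(keywords, usage_log, top_n=3):
    """Rank logged commands by keyword overlap, then by usage frequency.

    keywords holds the detected subject, predicate, object, and core words;
    usage_log is a list of commands the user issued before.
    """
    counts = Counter(usage_log)
    scored = []
    for command, frequency in counts.items():
        overlap = sum(1 for keyword in keywords if keyword in command)
        if overlap:  # skip commands that share no keyword with the text
            scored.append((overlap, frequency, command))
    # Highest overlap first, then most frequent, then alphabetical for stability.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [command for _, _, command in scored[:top_n]]
```

The string the user selects from this list (the third input) would then replace the whole first text field as the target text field.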

The speech recognition device in the embodiments of the present application may be a standalone device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine. The embodiments of the present application impose no specific limitation.

The speech recognition device in the embodiments of the present application may be a device with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system; the embodiments of the present application impose no specific limitation.

The speech recognition device provided by the embodiments of the present application can implement each process implemented by the speech recognition method in the method embodiments of Figures 1 to 12. To avoid repetition, the details are not described again here.

In the embodiments of the present application, during speech recognition, the conversion module 131 first converts the received audio data into a first text field; the detection module 132 then detects each first word segment of a preset type in the first text field when the first text field meets the first preset condition, that is, when it is judged to be an abnormal utterance; and the processing module 133 then processes the target word segment in the first text field to generate the target text field. Because first word segments of the preset type are deleted or replaced only when the first text field converted from the received audio data is judged to be an abnormal utterance, the user's intent can be clarified more effectively, the rewriting can be completed quickly, and the execution effect of speech recognition is improved.

Optionally, the embodiments of the present application further provide an electronic device, including a processor, a memory, and a program or instructions stored in the memory and executable on the processor. When executed by the processor, the program or instructions implement each process of the above speech recognition method embodiments and achieve the same technical effect. To avoid repetition, the details are not described again here.

It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.

Figure 14 is a schematic diagram of the hardware structure of an electronic device that implements an embodiment of the present application.

The electronic device 140 includes, but is not limited to, a radio frequency unit 1401, a network module 1402, an audio output unit 1403, an input unit 1404, a sensor 1405, a display unit 1406, a user input unit 1407, an interface unit 1408, a memory 1409, and a processor 1410.

Those skilled in the art will understand that the electronic device 140 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 1410 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The structure shown in Figure 14 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, and the details are not described again here.

The user input unit 1407 includes a display interface in the embodiments of the present application.

The processor 1410 is configured to convert received audio data into a first text field; to detect each first word segment of a preset type in the first text field when the first text field meets a first preset condition; and to process a target word segment in the first text field to generate a target text field. The first preset condition includes at least one of the following: the total number of characters is greater than a preset character-count threshold and the field contains a preset core word; the total number of characters is greater than the preset character-count threshold and a preset phrase library contains a second text field whose similarity to the first text field is greater than a preset similarity threshold. The processing of the target word segment includes at least one of the following: deleting the target word segment, and replacing the single sentence to which the target word segment belongs with a target string.

With the electronic device provided by the embodiments of the present application, first word segments of the preset type in the first text field are deleted or replaced when the first text field converted from the received audio data meets the first preset condition, that is, when the first text field is judged to be an abnormal utterance. This clarifies the user's intent more effectively, completes the rewriting quickly, and improves the execution effect of speech recognition.

Optionally, the processor 1410 is specifically configured to detect redundant words, the subject, the predicate, the object, and preset core words in the first text field when the first text field meets the first preset condition;

wherein a redundant word is a word segment in the first text field whose combination with a preset number of preceding and following word segments does not conform to a preset language expression pattern, and which is not present in a preset hot-word library.

Optionally, the processor 1410 is specifically configured to, when the target word segment is a redundant word and in response to the first input, delete the target word segment from the first text field to generate the target text field.

Optionally, the processor 1410 is further configured to, when the target word segment is a preset core word, receive a first input on the target word segment among the first word segments; generate, in response to the first input, a first string recommendation list matching the target word segment; receive a second input on a first target string in the first string recommendation list; and, in response to the second input, replace the single sentence containing the target word segment in the first text field with the first target string to generate the target text field.

Optionally, the processor 1410 is further configured to, when the first text field includes a subject, a predicate, an object, and a preset core word, generate a second string recommendation list according to the subject, predicate, object, and preset core words in the first text field and the user's usage log; receive a third input on a second target string in the second string recommendation list; and determine the second target string as the target text field.

The embodiments of the present application further provide a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of the above speech recognition method embodiments and achieve the same technical effect. To avoid repetition, the details are not described again here.

The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The embodiments of the present application further provide a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the above speech recognition method embodiments and achieve the same technical effect. To avoid repetition, the details are not described again here.

It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.

It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "includes a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. In addition, the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; depending on the functions involved, functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. Features described with reference to certain examples may also be combined in other examples.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described, which are illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (9)

1. A speech recognition method, characterized in that the method comprises:
converting received audio data into a first text field;
when the first text field meets a first preset condition, detecting each first word segment of a preset type in the first text field, wherein the first preset condition includes at least one of the following: the total number of characters is greater than a preset character-count threshold and the field contains a preset core word; the total number of characters is greater than the preset character-count threshold and a preset phrase library contains a second text field whose similarity to the first text field is greater than a preset similarity threshold;
processing a target word segment in the first text field to generate a target text field, wherein the processing of the target word segment includes at least one of the following: deleting the target word segment, and replacing the single sentence to which the target word segment belongs with a target string;
wherein the step of detecting each first word segment of a preset type in the first text field when the first text field meets the first preset condition comprises:
when the first text field meets the first preset condition, detecting redundant words, the subject, the predicate, the object, and preset core words in the first text field by combining dependency syntax parsing with a preset core-word extraction technique, wherein a redundant word is a word segment in the first text field whose combination with a preset number of preceding and following word segments does not conform to a preset language expression pattern and which is not present in a preset hot-word library; the combination is input into a pre-trained language model, the language model scores the combination, and when the score is greater than a first model threshold, the combination is deemed not to conform to the preset language expression pattern; and the preset hot-word library consists of words whose occurrence frequency, obtained by analyzing logs and counting word occurrences, is greater than a preset frequency threshold.

2. The speech recognition method according to claim 1, characterized in that, when the target word segment is a redundant word, the step of processing the target word segment in the first text field to generate a target text field comprises:
deleting the target word segment from the first text field to generate the target text field.

3. The speech recognition method according to claim 1, characterized in that, when the target word segment is a preset core word, the step of processing the target word segment in the first text field to generate a target text field comprises:
receiving a first input on the target word segment among the first word segments;
in response to the first input, generating a first string recommendation list matching the target word segment;
receiving a second input on a first target string in the first string recommendation list;
in response to the second input, replacing the single sentence containing the target word segment in the first text field with the first target string to generate the target text field.

4. The speech recognition method according to claim 1, characterized in that, when the first text field includes a subject, a predicate, an object, and a preset core word, the step of processing the target word segment in the first text field to generate a target text field comprises:
generating a second string recommendation list according to the subject, predicate, object, and preset core words in the first text field and the user's usage log;
receiving a third input on a second target string in the second string recommendation list;
determining the second target string as the target text field.

5. A speech recognition device, characterized in that the device comprises:
a conversion module, configured to convert received audio data into a first text field;
a detection module, configured to detect each first word segment of a preset type in the first text field when the first text field meets a first preset condition, wherein the first preset condition includes at least one of the following: the total number of characters is greater than a preset character-count threshold and the field contains a preset core word; the total number of characters is greater than the preset character-count threshold and a preset phrase library contains a second text field whose similarity to the first text field is greater than a preset similarity threshold;
a processing module, configured to process a target word segment in the first text field to generate a target text field, wherein the processing of the target word segment includes at least one of the following: deleting the target word segment, and replacing the single sentence to which the target word segment belongs with a target string;
wherein the detection module is specifically configured to, when the first text field meets the first preset condition, detect redundant words, the subject, the predicate, the object, and preset core words in the first text field by combining dependency syntax parsing with a preset core-word extraction technique; a redundant word is a word segment in the first text field whose combination with a preset number of preceding and following word segments does not conform to a preset language expression pattern and which is not present in a preset hot-word library; the combination is input into a pre-trained language model, the language model scores the combination, and when the score is greater than a first model threshold, the combination is deemed not to conform to the preset language expression pattern; and the preset hot-word library consists of words whose occurrence frequency, obtained by analyzing logs and counting word occurrences, is greater than a preset frequency threshold.

6. The speech recognition device according to claim 5, characterized in that the processing module comprises:
a first processing unit, configured to delete the target word segment from the first text field to generate the target text field when the target word segment is a redundant word.

7. The speech recognition device according to claim 5, characterized in that the processing module further comprises:
a first receiving unit, configured to receive a first input on the target word segment among the first word segments when the target word segment is a preset core word;
a first generation unit, configured to generate, in response to the first input, a first string recommendation list matching the target word segment;
a second receiving unit, configured to receive a second input on a first target string in the first string recommendation list;
a second processing unit, configured to replace, in response to the second input, the single sentence containing the target word segment in the first text field with the first target string to generate the target text field.

8. The speech recognition device according to claim 5, characterized in that the processing module further comprises:
a second generation unit, configured to generate a second string recommendation list according to the subject, predicate, object, and preset core words in the first text field and the user's usage log, when the first text field includes a subject, a predicate, an object, and a preset core word;
a third receiving unit, configured to receive a third input on a second target string in the second string recommendation list;
a third processing unit, configured to determine the second target string as the target text field.

9. An electronic device, characterized in that it comprises a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the speech recognition method according to any one of claims 1 to 4.
CN202011425798.6A | Priority date: 2020-12-08 | Filing date: 2020-12-08 | Voice recognition method and device and electronic equipment | Active | CN112562684B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011425798.6A (CN112562684B) | 2020-12-08 | 2020-12-08 | Voice recognition method and device and electronic equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011425798.6A (CN112562684B) | 2020-12-08 | 2020-12-08 | Voice recognition method and device and electronic equipment

Publications (2)

Publication Number | Publication Date
CN112562684A | 2021-03-26
CN112562684B | 2023-09-26

Family

ID=75059802

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202011425798.6A (Active, CN112562684B) | Voice recognition method and device and electronic equipment | 2020-12-08 | 2020-12-08

Country Status (1)

Country | Link
CN (1) | CN112562684B (en)


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
