技术领域Technical Field
本发明涉及语音处理技术领域,尤其涉及一种语音识别方法、装置、电子设备和存储介质。The present invention relates to the field of speech processing technology, and in particular to a speech recognition method, device, electronic equipment and storage medium.
背景技术Background Art
随着人工智能技术的迅速发展,语音识别技术在智能家居、智能机器人等交互领域得到了广泛应用。由于语音识别使用用户的不断增多,用户之间发音习惯的差异性亦趋明显,导致通用的语音识别方法无法对所有用户均取得较好的识别效果。With the rapid development of artificial intelligence technology, speech recognition technology has been widely used in interactive fields such as smart homes and smart robots. As the number of users of speech recognition continues to increase, the differences in pronunciation habits between users have become more obvious, resulting in the inability of general speech recognition methods to achieve good recognition results for all users.
现有的语音识别方法,为了实现针对各用户的个性化语音识别,从而提高语音识别准确性,通常会基于某一用户大量的历史语音数据构建针对该用户的个性化语音识别系统。然而,该方式的优化效果有限,且部署维护难度大,实用性较差。In order to achieve personalized speech recognition for each user and improve speech recognition accuracy, existing speech recognition methods usually build a personalized speech recognition system for a user based on a large amount of historical speech data of a certain user. However, this method has limited optimization effect, is difficult to deploy and maintain, and has poor practicality.
发明内容Summary of the invention
本发明提供一种语音识别方法、装置、电子设备和存储介质,用以解决现有技术中语音识别优化效果欠佳、实用性不足的缺陷。The present invention provides a speech recognition method, device, electronic device and storage medium, which are used to solve the defects of poor speech recognition optimization effect and insufficient practicality in the prior art.
本发明提供一种语音识别方法,包括:The present invention provides a speech recognition method, comprising:
确定用户的待识别语音;Determine the user's speech to be recognized;
基于预设状态转移路径,对所述待识别语音进行语音识别解码,得到语音识别结果;所述预设状态转移路径是基于所述用户的地域信息和/或历史输入信息扩充得到的。Based on a preset state transition path, the speech to be recognized is subjected to speech recognition decoding to obtain a speech recognition result; the preset state transition path is obtained by expanding the user's regional information and/or historical input information.
根据本发明提供的一种语音识别方法,所述基于预设状态转移路径,对所述待识别语音进行语音识别解码,包括:According to a speech recognition method provided by the present invention, performing speech recognition decoding on the speech to be recognized based on the preset state transition path includes:
确定所述待识别语音对应的音素序列;Determine the phoneme sequence corresponding to the speech to be recognized;
基于上一解码位置对应的预设状态转移路径,对当前解码位置处的音素序列进行解码,得到当前解码位置处的解码结果。Based on the preset state transition path corresponding to the previous decoding position, the phoneme sequence at the current decoding position is decoded to obtain a decoding result at the current decoding position.
根据本发明提供的一种语音识别方法,所述预设状态转移路径是基于如下步骤扩充的:According to a speech recognition method provided by the present invention, the preset state transition path is expanded based on the following steps:
确定与所述用户的地域信息相关联的地域名词;Determining a regional term associated with the regional information of the user;
基于各地域名词扩充解码网络中地名对应的预设状态转移路径。Expanding, based on each regional term, the preset state transition path corresponding to place names in the decoding network.
根据本发明提供的一种语音识别方法,所述预设状态转移路径是基于如下步骤扩充的:According to a speech recognition method provided by the present invention, the preset state transition path is expanded based on the following steps:
基于所述用户的历史输入信息,确定当前解码位置处的音素序列对应的相似热词;Determine, based on the historical input information of the user, a similar hot word corresponding to the phoneme sequence at the current decoding position;
基于所述相似热词,扩展上一解码位置对应的预设状态转移路径。Based on the similar hot words, the preset state transition path corresponding to the previous decoding position is expanded.
根据本发明提供的一种语音识别方法,所述基于所述用户的历史输入信息,确定当前解码位置处的音素序列对应的相似热词,包括:According to a speech recognition method provided by the present invention, the determining, based on the historical input information of the user, a similar hot word corresponding to a phoneme sequence at a current decoding position comprises:
基于当前解码位置处的音素序列以及预先构建的发音相似矩阵,确定当前解码位置处的音素序列对应的相似音素序列;Determine a similar phoneme sequence corresponding to the phoneme sequence at the current decoding position based on the phoneme sequence at the current decoding position and a pre-constructed pronunciation similarity matrix;
基于所述用户的各个热词,确定与当前解码位置处的音素序列和/或所述相似音素序列对应的相似热词;所述热词是基于所述历史输入信息确定的。Based on each hot word of the user, similar hot words corresponding to the phoneme sequence at the current decoding position and/or the similar phoneme sequence are determined; the hot words are determined based on the historical input information.
根据本发明提供的一种语音识别方法,所述基于预设状态转移路径,对所述待识别语音进行语音识别解码,包括:According to a speech recognition method provided by the present invention, the speech recognition decoding of the speech to be recognized based on a preset state transition path includes:
基于语言模型,结合所述预设状态转移路径,对所述待识别语音进行语音识别解码;Based on the language model and in combination with the preset state transition path, performing speech recognition decoding on the speech to be recognized;
其中,所述语言模型与所述用户当前使用的设备类型对应;任一设备类型对应的语言模型是基于所述任一设备类型的应用场景文本训练得到的。wherein the language model corresponds to the type of device currently used by the user, and the language model corresponding to any device type is trained on application scenario text of that device type.
根据本发明提供的一种语音识别方法,所述基于预设状态转移路径,对所述待识别语音进行语音识别解码,包括:According to a speech recognition method provided by the present invention, the speech recognition decoding of the speech to be recognized based on a preset state transition path includes:
确定所述用户的声纹特征;Determining a voiceprint feature of the user;
基于所述预设状态转移路径,结合所述待识别语音的音频特征和所述用户的声纹特征,对所述待识别语音进行语音识别解码。Based on the preset state transition path, combined with the audio features of the speech to be recognized and the voiceprint features of the user, speech recognition decoding is performed on the speech to be recognized.
本发明还提供一种语音识别装置,包括:The present invention also provides a speech recognition device, comprising:
语音数据确定单元,用于确定用户的待识别语音;A voice data determination unit, used to determine the user's voice to be recognized;
语音识别解码单元,用于基于预设状态转移路径,对所述待识别语音进行语音识别解码,得到语音识别结果;所述预设状态转移路径是基于所述用户的地域信息和/或历史输入信息扩充得到的。The speech recognition decoding unit is used to perform speech recognition decoding on the speech to be recognized based on a preset state transition path to obtain a speech recognition result; the preset state transition path is expanded based on the user's regional information and/or historical input information.
本发明还提供一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如上述任一种所述语音识别方法的步骤。The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the steps of any of the above-mentioned speech recognition methods are implemented.
本发明还提供一种非暂态计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现如上述任一种所述语音识别方法的步骤。The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of any of the above-mentioned speech recognition methods are implemented.
本发明提供的语音识别方法、装置、电子设备和存储介质,基于当前用户的地域信息和/或历史输入信息动态扩充解码网络中的预设状态转移路径,从而基于扩充后的预设状态转移路径对该用户的待识别语音进行语音识别解码,利用该用户的个性化信息,提升了个性化语音识别的准确性,且动态扩充预设状态转移路径的方式,增强了实用性。The speech recognition method, device, electronic device and storage medium provided by the present invention dynamically expand the preset state transition path in the decoding network based on the current user's regional information and/or historical input information, thereby performing speech recognition decoding on the user's to-be-recognized speech based on the expanded preset state transition path, utilizing the user's personalized information to improve the accuracy of personalized speech recognition, and the way of dynamically expanding the preset state transition path enhances practicality.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本发明或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。To illustrate the technical solutions of the present invention or the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
图1为本发明提供的语音识别方法的流程示意图;FIG. 1 is a schematic flowchart of the speech recognition method provided by the present invention;
图2为本发明提供的路径扩充方法的流程示意图之一;FIG. 2 is a first schematic flowchart of the path expansion method provided by the present invention;
图3为本发明提供的地域信息扩充路径的示意图;FIG. 3 is a schematic diagram of path expansion based on regional information provided by the present invention;
图4为本发明提供的路径扩充方法的流程示意图之二;FIG. 4 is a second schematic flowchart of the path expansion method provided by the present invention;
图5为本发明提供的相似热词扩展路径的示意图;FIG. 5 is a schematic diagram of path expansion based on similar hot words provided by the present invention;
图6为本发明提供的相似热词确定方法的流程示意图;FIG. 6 is a schematic flowchart of the method for determining similar hot words provided by the present invention;
图7为本发明提供的语言模型选取的示意图;FIG. 7 is a schematic diagram of language model selection provided by the present invention;
图8为本发明提供的语音识别系统的结构示意图;FIG. 8 is a schematic structural diagram of the speech recognition system provided by the present invention;
图9为本发明提供的语音识别装置的结构示意图;FIG. 9 is a schematic structural diagram of the speech recognition device provided by the present invention;
图10为本发明提供的电子设备的结构示意图。FIG. 10 is a schematic diagram of the structure of an electronic device provided by the present invention.
具体实施方式DETAILED DESCRIPTION
为使本发明的目的、技术方案和优点更加清楚,下面将结合本发明中的附图,对本发明中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
随着人工智能产业的迅猛发展,语音识别技术在智能家居、智能机器人等交互领域得到了广泛应用。近年来,很多有关语音识别的开发技术在不断创新,语音作为最方便、快捷的交互方式之一,其识别俨然已成为人机交互的重要环节。随着语音使用用户的不断增多,用户之间发音习惯的差异性变得越来越明显,在此情况下,传统的采用统一通用语音识别系统进行语音识别的方法,由于通用语音识别系统需要覆盖更多的用户和更多的场景,因此无法对所有用户都取得很好的识别准确率。With the rapid development of the artificial intelligence industry, speech recognition technology has been widely used in interactive fields such as smart homes and smart robots. In recent years, many speech recognition technologies have been continuously innovated. As one of the most convenient and quickest ways of interaction, speech recognition has become an important part of human-computer interaction. With the increasing number of voice users, the differences in pronunciation habits between users have become more and more obvious. In this case, the traditional method of using a unified general speech recognition system for speech recognition cannot achieve a good recognition accuracy for all users because the general speech recognition system needs to cover more users and more scenarios.
因此,如何利用每个用户的个性化信息,增强语音识别系统的针对性,从而提升每个用户语音识别准确率,成为了目前语音识别领域的重要研究方向。现有的个性化语音识别方法通常是基于大量的用户历史语音数据,构建针对各个用户的个性化语音识别系统。然而,这种方法对于新用户而言,由于缺乏该用户的历史数据,难以构建可靠的语音识别系统,导致该方法的个性化增强效果有限;而对于老用户而言,由于各用户的历史语音数量差异大且需要每个用户单独定制存储一套识别模型(例如传统基于隐马尔可夫模型识别系统中的声学模型,或是Encode-Decode模型),部署维护难度大,因此实用性较差。Therefore, how to use each user's personalized information to make the speech recognition system better targeted, and thereby improve recognition accuracy for each user, has become an important research direction in speech recognition. Existing personalized speech recognition methods usually build a personalized speech recognition system for each user from a large amount of that user's historical speech data. For a new user, however, no historical data is available, so a reliable speech recognition system is hard to build and the personalization gain of this approach is limited. For an existing user, the amount of historical speech varies widely from user to user, and a separate recognition model (for example, the acoustic model in a traditional hidden Markov model based recognition system, or an Encode-Decode model) must be customized and stored for each user, which makes deployment and maintenance difficult and the approach impractical.
对此,本发明实施例提供了一种语音识别方法,可以有效进行语音识别的个性化增强,提升语音识别的准确性。图1为本发明实施例提供的语音识别方法的流程示意图,如图1所示,该方法包括:In this regard, an embodiment of the present invention provides a speech recognition method, which can effectively perform personalized enhancement of speech recognition and improve the accuracy of speech recognition. FIG1 is a flow chart of the speech recognition method provided by an embodiment of the present invention. As shown in FIG1 , the method includes:
步骤110,确定用户的待识别语音;Step 110, determining the user's speech to be recognized;
步骤120,基于预设状态转移路径,对待识别语音进行语音识别解码,得到语音识别结果;预设状态转移路径是基于用户的地域信息和/或历史输入信息扩充得到的。Step 120, based on a preset state transition path, perform speech recognition decoding on the speech to be recognized to obtain a speech recognition result; the preset state transition path is obtained by expanding based on the user's regional information and/or historical input information.
具体地,获取用户的待识别语音。其中,待识别语音可以是用户通过电子设备实时录制的语音数据,也可以是已存储或接收到的语音数据,本发明实施例对此不作具体限定。Specifically, the user's voice to be recognized is obtained. The voice to be recognized may be voice data recorded in real time by the user through an electronic device, or may be stored or received voice data, which is not specifically limited in the embodiment of the present invention.
随后,利用预设状态转移路径对该待识别语音进行语音识别解码,得到语音识别结果。其中,预设状态转移路径可以为解码网络中任意两个相邻节点之间的路径。此处,解码网络可以作为一个搜索空间,从中寻找到一条从初始节点到终止节点的最优路径,实现待识别语音的解码。具体而言,可以利用声学模型将待识别语音中各语音帧转换为状态序列或音素序列后,基于解码网络将状态序列或音素序列映射到词序列;也可以结合端到端的语言识别模型,例如Encode-Decode模型,将待识别语音转换为字序列后,再基于解码网络将字序列映射到词序列。此外,解码网络可以基于声学模型、发音词典、语言模型等知识源构建得到,例如可以基于加权有限状态机(weighted finite-state transducers,WFST)的建立方式建立,本发明实施例对此不作具体限定。Subsequently, speech recognition decoding is performed on the speech to be recognized using the preset state transition paths to obtain a speech recognition result. A preset state transition path may be a path between any two adjacent nodes in the decoding network. Here, the decoding network serves as a search space in which an optimal path from the initial node to the terminal node is sought, thereby decoding the speech to be recognized. Specifically, an acoustic model may convert each speech frame of the speech to be recognized into a state sequence or a phoneme sequence, which is then mapped to a word sequence based on the decoding network; alternatively, an end-to-end speech recognition model, such as an Encode-Decode model, may convert the speech to be recognized into a character sequence, which is then mapped to a word sequence based on the decoding network. In addition, the decoding network may be constructed from knowledge sources such as an acoustic model, a pronunciation dictionary, and a language model, for example in the manner of a weighted finite-state transducer (WFST), which is not specifically limited in the embodiments of the present invention.
在对待识别语音进行语音识别解码时,会根据待识别语音的状态序列、音素序列或字序列,从初始节点开始逐步搜寻解码网络中的预设状态转移路径并计算得分,从而寻找到最优路径。因此,预设状态转移路径的构建是个性化语音识别过程中的重要一环,预设状态转移路径与当前用户越贴合,解码得到的语音识别结果的准确性越高。When performing speech recognition decoding on the speech to be recognized, the preset state transition path in the decoding network will be searched step by step from the initial node according to the state sequence, phoneme sequence or word sequence of the speech to be recognized, and the score will be calculated to find the optimal path. Therefore, the construction of the preset state transition path is an important part of the personalized speech recognition process. The closer the preset state transition path fits the current user, the higher the accuracy of the speech recognition result obtained by decoding.
对此,本发明实施例在构建解码网络时,除了利用声学模型、发音词典、语言模型等知识源之外,还会根据用户的地域信息和/或历史输入信息,对解码网络中的预设状态转移路径进行扩充。其中,可以在已有的解码网络基础上进行针对当前用户的路径扩充,使得本发明实施例提供的方法可以仅需针对当前用户存储较少的个性化信息,并对已有的解码网络进行少量改动,即可实现个性化语音识别,增强了语音识别方法的实用性。例如,可以根据用户的地域信息和/或历史输入信息,在解码网络中的对应节点之间新增若干条新的路径,并基于语言模型计算上述新增路径的得分。In this regard, when constructing a decoding network, the embodiments of the present invention, in addition to using knowledge sources such as acoustic models, pronunciation dictionaries, and language models, will also expand the preset state transition paths in the decoding network based on the user's geographical information and/or historical input information. Among them, the path expansion for the current user can be performed on the basis of the existing decoding network, so that the method provided by the embodiment of the present invention can only store less personalized information for the current user and make a small change to the existing decoding network to achieve personalized speech recognition, thereby enhancing the practicality of the speech recognition method. For example, several new paths can be added between corresponding nodes in the decoding network based on the user's geographical information and/or historical input information, and the scores of the above-mentioned new paths can be calculated based on the language model.
其中,用户的地域信息可以提供用户当前所在的位置信息,根据该用户的地域信息,可以推测该用户下一步可能前往的目的地,并将其扩充至解码网络中。当用户使用语音进行导航时,语音识别解码时可以搜索到扩充的预设状态转移路径,即使用户的发音不标准或导航目的地的名称较难识别时,也能准确识别出用户输入的语音内容。除此之外,用户的历史输入信息也可以额外提供该用户的表达习惯,从而帮助推导用户当前可能表达的词语。因此,还可以对用户的历史输入信息进行提炼并扩充至解码网络中,以提高语音识别的准确性。The user's regional information indicates the user's current location. From it, the destinations the user is likely to head to next can be inferred and added to the decoding network. When the user navigates by voice, the expanded preset state transition paths can be found during speech recognition decoding, so the spoken content can be recognized accurately even if the user's pronunciation is non-standard or the name of the destination is hard to recognize. In addition, the user's historical input information reflects the user's habits of expression and thus helps infer the words the user is likely to be saying. The user's historical input information can therefore also be distilled and added to the decoding network to improve recognition accuracy.
本发明实施例提供的方法,基于当前用户的地域信息和/或历史输入信息动态扩充解码网络中的预设状态转移路径,从而基于扩充后的预设状态转移路径对该用户的待识别语音进行语音识别解码,利用该用户的个性化信息,提升了个性化语音识别的准确性,且动态扩充预设状态转移路径的方式,增强了该方法的实用性。The method provided by the embodiment of the present invention dynamically expands the preset state transition path in the decoding network based on the current user's regional information and/or historical input information, thereby performing speech recognition decoding on the user's to-be-recognized speech based on the expanded preset state transition path, and utilizing the user's personalized information to improve the accuracy of personalized speech recognition. The way of dynamically expanding the preset state transition path enhances the practicality of the method.
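The path expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `DecodingNetwork` class, the `expand_for_user` helper, the integer node numbering and the toy language-model scores are all hypothetical names and values introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    word: str     # word emitted when this path is taken
    dst: int      # node the path leads to
    score: float  # log-probability-style score, higher is better


class DecodingNetwork:
    """Toy decoding network: each node maps to its outgoing paths."""

    def __init__(self):
        self.arcs = defaultdict(list)

    def add_path(self, src, dst, word, score):
        self.arcs[src].append(Arc(word, dst, score))


def expand_for_user(network, src, dst, user_words, lm_score):
    """Add one new path per personalized word between src and dst.

    lm_score is an assumed callable that returns a language-model score
    for a word; words that already have a path are left untouched.
    """
    existing = {arc.word for arc in network.arcs[src] if arc.dst == dst}
    for word in user_words:
        if word not in existing:
            network.add_path(src, dst, word, lm_score(word))


# Shared base network with a single generic path.
net = DecodingNetwork()
net.add_path(0, 1, "home", -1.2)

# Hypothetical per-user words and toy language-model scores.
toy_lm = {"office": -2.0, "gym": -2.5}
expand_for_user(net, 0, 1, ["home", "office", "gym"],
                lambda w: toy_lm.get(w, -10.0))
```

Only the short per-user word list is stored for each user; the base network is shared and receives a few extra paths, which mirrors the small-change property the paragraph above emphasizes.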
基于上述实施例,步骤120包括:Based on the above embodiment, step 120 includes:
确定待识别语音对应的音素序列;Determine the phoneme sequence corresponding to the speech to be recognized;
基于上一解码位置对应的预设状态转移路径,对当前解码位置处的音素序列进行解码,得到当前解码位置处的解码结果。Based on the preset state transition path corresponding to the previous decoding position, the phoneme sequence at the current decoding position is decoded to obtain a decoding result at the current decoding position.
具体地,对待识别语音的各个语音帧进行识别,得到待识别语音对应的音素序列。此处,可以对待识别语音的各个语音帧进行声学特征提取,得到各个语音帧的声学特征,再基于声学模型对各个语音帧的声学特征进行识别,确定各个语音帧所属的状态,进而将状态组合成音素,得到待识别语音的音素序列。Specifically, each speech frame of the speech to be recognized is recognized to obtain a phoneme sequence corresponding to the speech to be recognized. Here, acoustic features can be extracted from each speech frame of the speech to be recognized to obtain the acoustic features of each speech frame, and then the acoustic features of each speech frame are recognized based on the acoustic model to determine the state to which each speech frame belongs, and then the states are combined into phonemes to obtain the phoneme sequence of the speech to be recognized.
随后,对待识别语音的音素序列进行解码。解码过程中,会从解码网络的初始节点开始,根据待识别语音的音素序列,找寻合适的预设状态转移路径,从而到达下一节点,并重复上述步骤,直至到达终止节点。假设当前解码至待识别语音的当前解码位置,需要对当前解码位置的音素序列进行解码。对应地,在解码网络中,当前搜索至节点t,需要找到一条合适路径以行进至下一节点。此时,可以基于上一解码位置对应的预设状态转移路径,对当前解码位置处的音素序列进行解码,得到当前解码位置处的解码结果。其中,上一解码位置对应的预设状态转移路径为解码网络中当前所在节点与其下一节点之间的预设状态转移路径,即从节点t出发的预设状态转移路径。基于上一解码位置对应的预设状态转移路径的得分,对当前解码位置处的音素序列解码,选定其中一条预设状态转移路径,从而确定当前解码位置处的解码结果。Subsequently, the phoneme sequence of the speech to be recognized is decoded. Decoding starts from the initial node of the decoding network: according to the phoneme sequence of the speech to be recognized, a suitable preset state transition path is found to reach the next node, and this step is repeated until the terminal node is reached. Suppose decoding has reached the current decoding position of the speech to be recognized, so that the phoneme sequence at the current decoding position needs to be decoded. Correspondingly, the search in the decoding network has reached node t, and a suitable path must be found to advance to the next node. At this point, the phoneme sequence at the current decoding position can be decoded based on the preset state transition paths corresponding to the previous decoding position, to obtain the decoding result at the current decoding position. Here, the preset state transition paths corresponding to the previous decoding position are those between the current node and its next nodes in the decoding network, that is, the preset state transition paths starting from node t. Based on the scores of these paths, the phoneme sequence at the current decoding position is decoded and one of the preset state transition paths is selected, thereby determining the decoding result at the current decoding position.
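The node-by-node search described above can be sketched as a greedy walk over the network. This is a simplification under stated assumptions: practical decoders typically keep several hypotheses alive (for example with beam search) instead of committing to the single best path at each node, and the `decode` function, the tuple layout of a path and the toy phoneme symbols are hypothetical.

```python
def decode(network, phoneme_chunks, start=0):
    """Greedily decode a list of phoneme chunks.

    network maps a node to its outgoing paths, each a tuple
    (phonemes, word, score, next_node). At node t, only the paths that
    start from t and match the current chunk are considered, and the
    highest-scoring one is selected.
    """
    node, words = start, []
    for chunk in phoneme_chunks:
        candidates = [p for p in network.get(node, []) if p[0] == chunk]
        if not candidates:
            raise ValueError(f"no path from node {node} matches {chunk}")
        _, word, _, node = max(candidates, key=lambda p: p[2])
        words.append(word)
    return words


# Toy network: two homophone paths leave node 1 with different scores.
toy_network = {
    0: [(("g", "ow"), "go", -0.2, 1)],
    1: [(("t", "uw"), "to", -0.3, 2),
        (("t", "uw"), "two", -1.5, 2)],
}
result = decode(toy_network, [("g", "ow"), ("t", "uw")])
```

In this toy network `result` is `["go", "to"]`, because the path for "to" has the higher score among the paths leaving node 1.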
基于上述任一实施例,图2为本发明实施例提供的路径扩充方法的流程示意图之一,如图2所示,该方法包括:Based on any of the above embodiments, FIG. 2 is a flow chart of a path expansion method provided by an embodiment of the present invention. As shown in FIG. 2 , the method includes:
步骤210,确定与用户的地域信息相关联的地域名词;Step 210, determining a regional term associated with the user's regional information;
步骤220,基于各地域名词扩充解码网络中地名对应的预设状态转移路径。Step 220, expanding, based on each regional term, the preset state transition path corresponding to place names in the decoding network.
具体地,根据该用户的地域信息,确定与该地域信息相关联的地域名词。其中,可以以该地域信息为中心,获取周边活动范围内的其他地点的地域名词。例如,可以获取周边活动范围内的热门地点的名称,或是根据该用户的历史定位信息和/或历史导航数据,获取周边活动范围内用户曾前往的地点名称,作为相关联的地域名词。Specifically, based on the user's regional information, the regional terms associated with the regional information are determined. The regional information can be used as the center to obtain the regional terms of other places in the surrounding activity range. For example, the names of popular places in the surrounding activity range can be obtained, or the names of places that the user has visited in the surrounding activity range can be obtained based on the user's historical positioning information and/or historical navigation data as the associated regional terms.
基于各个地域名词,可以对解码网络中地名对应的预设状态转移路径进行扩充。图3为本发明实施例提供的地域信息扩充路径的示意图,如图3上部分所示,可以预先构建具备通用地名($地点)路径的基础解码网络。其中,除了通用地名路径以外,还可以扩充若干热门地名路径。在获取到与用户的地域信息相关联的地域名词(例如美亚光电、先研院)后,如图3中间部分所示,可以先构建各地域名词对应的预设状态转移路径,其中各地域名词对应的预设状态转移路径所连接的节点相同。随后,如图3下部分所示,将各地域名词对应的预设状态转移路径,扩充至基础解码网络中通用地名对应的预设状态转移路径处。Based on the regional terms, the preset state transition path corresponding to place names in the decoding network can be expanded. FIG. 3 is a schematic diagram of path expansion based on regional information provided by an embodiment of the present invention. As shown in the upper part of FIG. 3, a basic decoding network with a generic place-name ($place) path can be built in advance; besides the generic place-name path, several paths for popular place names may also be added. After the regional terms associated with the user's regional information (for example, Meiya Optoelectronics and Xianyan Institute) are obtained, a preset state transition path is first built for each regional term, as shown in the middle part of FIG. 3, with all of these paths connecting the same pair of nodes. Then, as shown in the lower part of FIG. 3, the paths for the regional terms are added to the basic decoding network at the position of the preset state transition path for the generic place name.
本发明实施例提供的方法,通过与用户的地域信息相关联的地域名词扩充解码网络中地名对应的预设状态转移路径,对预设状态转移路径进行了个性化扩充,有助于提高语音识别的准确性。The method provided by the embodiment of the present invention expands the preset state transition path corresponding to the place name in the decoding network through the geographical terms associated with the user's geographical information, and personalizes the expansion of the preset state transition path, which helps to improve the accuracy of speech recognition.
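The expansion of the generic place-name path in FIG. 3 might look like the sketch below. It assumes, for illustration only, that the network is a flat list of `(src, dst, label, score)` tuples, that the generic path is kept alongside the new ones, and that every regional term receives the same placeholder score; `expand_place_slot` and `PLACE_SLOT` are hypothetical names.

```python
PLACE_SLOT = "$place"  # label of the generic place-name path


def expand_place_slot(arcs, regional_terms, score=-1.0):
    """Return a new path list with regional terms added at the slot.

    Each regional term gets a path that connects the same two nodes as
    the generic $place path, so all of the new paths are parallel to it.
    """
    expanded = list(arcs)
    for src, dst, label, _ in arcs:
        if label == PLACE_SLOT:
            expanded.extend((src, dst, term, score) for term in regional_terms)
    return expanded


# Basic network: "navigate to" followed by the generic place-name slot.
base = [(0, 1, "navigate to", -0.2), (1, 2, PLACE_SLOT, -0.7)]
# Regional terms taken from the example in the description.
personalized = expand_place_slot(
    base, ["Meiya Optoelectronics", "Xianyan Institute"])
```

The function returns a new list instead of modifying `base`, so the shared basic network can be reused for other users.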
基于上述任一实施例,图4为本发明实施例提供的路径扩充方法的流程示意图之二,如图4所示,该方法包括:Based on any of the above embodiments, FIG. 4 is a second flow chart of a path expansion method provided by an embodiment of the present invention. As shown in FIG. 4 , the method includes:
步骤410,基于用户的历史输入信息,确定当前解码位置处的音素序列对应的相似热词;Step 410, based on the historical input information of the user, determine the similar hot words corresponding to the phoneme sequence at the current decoding position;
步骤420,基于相似热词,扩展上一解码位置对应的预设状态转移路径。Step 420: Based on similar hot words, expand the preset state transition path corresponding to the previous decoding position.
具体地,该用户的历史输入信息可以提供该用户的语言表达习惯,例如该用户常说的单词或词组等,当当前解码位置处的音素序列与该历史输入信息中的部分单词或词组的发音相同或相似时,表明该用户可能表达的是该单词或词组,因而可以将其扩充至解码网络中。因此,在解码过程中,可以基于用户的历史输入信息,确定当前解码位置处的音素序列对应的相似热词。其中,相似热词为该用户的历史输入信息中与当前解码位置处的音素序列发音相同或相似的热词。Specifically, the user's historical input information can provide the user's language expression habits, such as the words or phrases that the user often says. When the phoneme sequence at the current decoding position is the same or similar to the pronunciation of some words or phrases in the historical input information, it indicates that the user may be expressing the word or phrase, so it can be expanded into the decoding network. Therefore, in the decoding process, the similar hot words corresponding to the phoneme sequence at the current decoding position can be determined based on the user's historical input information. Among them, the similar hot words are the hot words in the user's historical input information that are pronounced the same or similar to the phoneme sequence at the current decoding position.
基于获取的相似热词,可以生成新的预设状态转移路径,并扩充至上一解码位置对应的预设状态转移路径的相应位置。图5为本发明实施例提供的相似热词扩展路径的示意图,如图5所示,当前解码位置处的音素序列对应的相似热词为“十里桃花”,生成该相似热词对应的新路径,并在上一解码位置对应的预设状态转移路径处插入该新路径,实现路径的动态扩充。Based on the acquired similar hot words, a new preset state transition path can be generated and expanded to the corresponding position of the preset state transition path corresponding to the previous decoding position. Figure 5 is a schematic diagram of the similar hot word expansion path provided by an embodiment of the present invention. As shown in Figure 5, the similar hot word corresponding to the phoneme sequence at the current decoding position is "十里桃花", and a new path corresponding to the similar hot word is generated, and the new path is inserted at the preset state transition path corresponding to the previous decoding position to achieve dynamic expansion of the path.
本发明实施例提供的方法,通过在解码过程中,利用当前解码位置处的音素序列对应的相似热词扩展上一解码位置对应的预设状态转移路径,对预设状态转移路径进行了个性化地动态扩充,有助于提高语音识别的准确性。The method provided by the embodiment of the present invention, during the decoding process, uses similar hot words corresponding to the phoneme sequence at the current decoding position to expand the preset state transition path corresponding to the previous decoding position, thereby performing personalized dynamic expansion of the preset state transition path, which helps to improve the accuracy of speech recognition.
基于上述任一实施例,图6为本发明实施例提供的相似热词确定方法的流程示意图,如图6所示,步骤410包括:Based on any of the above embodiments, FIG. 6 is a flow chart of a method for determining similar hot words provided by an embodiment of the present invention. As shown in FIG. 6 , step 410 includes:
步骤411,基于当前解码位置处的音素序列以及预先构建的发音相似矩阵,确定当前解码位置处的音素序列对应的相似音素序列;Step 411, based on the phoneme sequence at the current decoding position and the pre-constructed pronunciation similarity matrix, determine a similar phoneme sequence corresponding to the phoneme sequence at the current decoding position;
步骤412,基于用户的各个热词,确定与当前解码位置处的音素序列和/或相似音素序列对应的相似热词;热词是基于历史输入信息确定的。Step 412, based on each hot word of the user, determine similar hot words corresponding to the phoneme sequence and/or similar phoneme sequence at the current decoding position; the hot words are determined based on historical input information.
具体地,可以预先根据发音词典,构建发音相似矩阵。如图5所示,发音相似矩阵中可以存储发音相似的音素序列。根据当前解码位置处的音素序列,在发音相似矩阵中进行查找,找到与当前解码位置处的音素序列发音相似的相似音素序列。Specifically, a pronunciation similarity matrix can be constructed in advance based on a pronunciation dictionary. As shown in Figure 5, a pronunciation similarity matrix can store phoneme sequences with similar pronunciations. According to the phoneme sequence at the current decoding position, a search is performed in the pronunciation similarity matrix to find a similar phoneme sequence with a similar pronunciation to the phoneme sequence at the current decoding position.
基于该用户的各个热词,从中确定发音与当前解码位置处的音素序列和/或相似音素序列对应的热词,作为相似热词。其中,各个热词是根据该用户的历史输入信息确定的。例如,可以获取该用户历史手动输入的文字信息,根据用户输入的频率,从中筛选出频率较高的词语作为热词,构建该用户的热词列表。Based on each hot word of the user, a hot word whose pronunciation corresponds to the phoneme sequence and/or similar phoneme sequence at the current decoding position is determined as a similar hot word. Each hot word is determined based on the historical input information of the user. For example, the text information manually input by the user in the past can be obtained, and according to the frequency of the user input, words with higher frequency are selected as hot words to construct a hot word list of the user.
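Steps 411 and 412 can be sketched as two dictionary lookups. Several simplifications are assumed here: pinyin syllables stand in for phonemes, the pronunciation similarity matrix is represented as a dictionary from a sequence to its similar sequences, and the input history, the similarity entry and the function names are hypothetical.

```python
from collections import Counter


def build_hot_words(history, top_n=2):
    """Pick the user's most frequent historical inputs as hot words."""
    return [word for word, _ in Counter(history).most_common(top_n)]


def similar_hot_words(phonemes, similarity, hot_word_prons):
    """Return hot words pronounced like the given sequence.

    similarity maps a sequence to sequences that sound alike (step 411);
    hot_word_prons maps each hot word to its pronunciation (step 412).
    """
    candidates = {phonemes, *similarity.get(phonemes, ())}
    return [w for w, pron in hot_word_prons.items() if pron in candidates]


# Hypothetical manually typed history of one user.
history = ["十里桃花", "天气", "十里桃花", "你好", "十里桃花", "天气"]
hot_words = build_hot_words(history)

# Toy similarity entry: "si" is treated as confusable with "shi".
similarity = {("si", "li", "tao", "hua"): [("shi", "li", "tao", "hua")]}
prons = {"十里桃花": ("shi", "li", "tao", "hua"), "天气": ("tian", "qi")}
matches = similar_hot_words(("si", "li", "tao", "hua"), similarity, prons)
```

Here `matches` contains "十里桃花", the hot word of FIG. 5, even though the decoded sequence begins with "si" instead of "shi".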
基于上述任一实施例,步骤120包括:Based on any of the above embodiments, step 120 includes:
基于语言模型,结合预设状态转移路径,对待识别语音进行语音识别解码;Based on the language model and the preset state transition path, the speech recognition decoding is performed on the speech to be recognized;
其中,语言模型与用户当前使用的设备类型对应;任一设备类型对应的语言模型是基于该设备类型的应用场景文本训练得到的。Among them, the language model corresponds to the device type currently used by the user; the language model corresponding to any device type is trained based on the application scenario text of the device type.
Specifically, as electronic devices of all kinds become widespread, users typically use different devices in different application scenarios. For example, on large-screen devices such as TVs, users usually use voice interaction to switch TV channels and request online video; on smart speakers, they more often use it to check the weather and request songs; and on in-vehicle computers, they more often use it for address navigation.
Therefore, the application scenarios of the different device types may be determined in advance and application scenario text collected for each; a language model for each device type is then trained on that device type's application scenario text, ready for devices of that type to use during speech recognition. When the device's speech recognition system is based on a Hidden Markov Model (HMM), the language model is a conventional language model, such as an n-gram language model, and can directly replace the language model in the original speech recognition system. When the speech recognition system is based on an Encode-Decode (encoder-decoder) architecture, the language model may be a neural network language model, whose recognition result can be fused with that of the original speech recognition system by any of various fusion methods.
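The text leaves the fusion method open. As one possible instance, the sketch below applies a log-linear interpolation of scores (often called shallow fusion) to a list of candidate transcripts. The choice of this technique, the function names `fuse_scores` and `rescore`, the `lm_weight` value and the toy probabilities are all assumptions made for illustration, not features stated in the embodiment.

```python
import math

def fuse_scores(asr_logprob, lm_logprob, lm_weight=0.3):
    """Log-linear interpolation: ASR score plus a weighted language model score."""
    return asr_logprob + lm_weight * lm_logprob

def rescore(hypotheses, lm_logprob_fn, lm_weight=0.3):
    """Pick the hypothesis with the best fused score.

    `hypotheses` is a list of (text, asr_logprob) pairs and
    `lm_logprob_fn` maps a text to its language model log-probability.
    """
    best_text, _ = max(
        hypotheses,
        key=lambda h: fuse_scores(h[1], lm_logprob_fn(h[0]), lm_weight),
    )
    return best_text

# Toy candidates from the original recognizer (hypothetical scores).
hyps = [("play the news", math.log(0.40)), ("play the blues", math.log(0.35))]

# Toy scores from a language model trained on smart speaker scenario text.
speaker_lm = {"play the news": math.log(0.05), "play the blues": math.log(0.60)}

best = rescore(hyps, speaker_lm.__getitem__)
```

With `lm_weight=0.0` the original recognizer's ranking is kept unchanged, which makes the weight a convenient knob for checking how much the device-specific model contributes.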
The device that produced the speech to be recognized, for example a mobile phone, in-vehicle computer, TV or smart speaker, is identified, and the language model corresponding to that device is thereby determined. Speech recognition decoding can then be performed on the speech to be recognized based on that language model and the preset state transition path. FIG. 7 is a schematic diagram of language model selection provided by an embodiment of the present invention. As shown in FIG. 7, the language model for the current device is selected dynamically, according to the device ID of the input speech, from the language models for the various device types. The selected language model may be combined with an acoustic model, or with an Encode-Decode model, to perform speech recognition and obtain the recognition result for the input speech.
The method provided by the embodiment of the present invention dynamically selects the language model corresponding to the type of device the user is currently using and combines it with the preset state transition path to perform speech recognition decoding on the speech to be recognized, which further improves the accuracy of speech recognition.
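The selection step can be pictured with a minimal sketch in which each device type has a toy bigram count table trained on its scenario text, and the table is chosen by device ID. The device IDs, the scenario sentences, the fallback to a general model, and the names `train_bigram_counts` and `select_language_model` are hypothetical; a real system would use a full n-gram or neural language model rather than raw counts.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count adjacent word pairs, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical application scenario text collected per device type.
SCENARIO_TEXT = {
    "tv": ["switch to the sports channel", "play the latest episode"],
    "car": ["navigate to the airport", "navigate to the nearest gas station"],
}

# One language model per device type, trained ahead of time.
MODELS = {dtype: train_bigram_counts(text) for dtype, text in SCENARIO_TEXT.items()}

# Fallback model trained on all text (an assumption; the text names no fallback).
GENERAL_MODEL = train_bigram_counts(
    sentence for text in SCENARIO_TEXT.values() for sentence in text
)

# Hypothetical registry mapping device IDs to device types.
DEVICE_TYPES = {"dev-001": "tv", "dev-002": "car"}

def select_language_model(device_id):
    """Return the model for the device's type, or the general model if unknown."""
    return MODELS.get(DEVICE_TYPES.get(device_id), GENERAL_MODEL)

car_lm = select_language_model("dev-002")
```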
Based on any of the above embodiments, step 120 includes:
determining the voiceprint feature of the user;
performing speech recognition decoding on the speech to be recognized based on the preset state transition path, in combination with the audio features of the speech to be recognized and the voiceprint feature of the user.
Specifically, because users differ in accent and speaking style, speech recognition can be adapted to the pronunciation characteristics of the current user so that it fits different users' speech data, thereby improving recognition accuracy. To this end, the voiceprint feature of the current user is obtained; the voiceprint feature expresses the user's pronunciation characteristics and habits. Here, an existing i-vector extraction model, for example one based on a universal background model (UBM), may be used to extract the current user's identity vector (i-vector) as the voiceprint feature. A voiceprint feature extracted in this way contains speaker information, channel information and the like, and is highly stable. Alternatively, an x-vector extraction model under a deep learning framework may be used to extract the user's voiceprint feature; the embodiment of the present invention does not specifically limit this. Speech recognition decoding is then performed on the speech to be recognized based on the preset state transition path, in combination with the audio features of the speech data to be recognized and the voiceprint feature of the user. The audio features carry the semantic information of the speech data; combining them with the pronunciation characteristics contained in the user's voiceprint feature can improve recognition accuracy for that user.
The method provided by the embodiment of the present invention determines the voiceprint feature of the user and combines it with the audio features of the speech to be recognized to perform speech recognition decoding, which further improves the accuracy of speech recognition.
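The embodiment does not state how the audio features and the voiceprint feature are combined. One common approach in speaker-adaptive recognition, assumed here purely for illustration, is to append the same utterance-level speaker vector to every frame of audio features before they reach the recognizer. The function name `append_voiceprint` and the toy dimensions are hypothetical.

```python
def append_voiceprint(frames, voiceprint):
    """Concatenate the speaker vector onto each frame of audio features.

    `frames` is a list of per-frame feature vectors; `voiceprint` is one
    fixed-length vector for the whole utterance (for example an i-vector
    or x-vector produced by a separate extractor, not shown here).
    """
    return [list(frame) + list(voiceprint) for frame in frames]

# Two frames of 3-dimensional audio features (toy values).
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# A 2-dimensional voiceprint (real vectors have far more dimensions).
voiceprint = [0.9, -0.7]

augmented = append_voiceprint(frames, voiceprint)
```

Each augmented frame has the audio dimensions followed by the voiceprint dimensions, so the recognizer's input layer would need to be sized for their sum.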
Based on any of the above embodiments, FIG. 8 is a schematic structural diagram of the speech recognition system provided by an embodiment of the present invention. As shown in FIG. 8, the system can be built on an existing speech recognition model and enhanced with multi-dimensional personalized recognition enhancement modules. There are four such modules: a dynamic path expansion module, a dynamic hot word excitation module, a dynamic speech model selection module and a dynamic voiceprint enhancement module.
The dynamic path expansion module expands, based on the user's regional information, the preset state transition paths corresponding to place names in the decoding network. The expansion is performed as in the above embodiments and is not repeated here.
The dynamic hot word excitation module builds the user's hot word library from the user's historical input information and performs hot word excitation based on that library. If the system is built on an HMM-based recognition model, the module selects from the hot word library, during actual decoding, the similar hot words corresponding to the phoneme sequence at the current decoding position, and uses them to expand the preset state transition path corresponding to the previous decoding position in the decoding network; the expansion is performed as in the above embodiments and is not repeated here. If the system is built on an Encode-Decode recognition model, the module uses a hot word encoder (Bias Encoder) to represent each hot word as a fixed-dimension hot word encoding, then uses the state information output by the decoder (Decoder) at the previous decoding step to select, through an attention mechanism, the hot word encoding that matches the input speech. This encoding is fed to the decoder together with the audio features of the input speech for decoding, yielding the recognition result.
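For the Encode-Decode case, the attention step can be sketched with dot-product attention over the hot word encodings, using the previous decoder state as the query. Dot-product scoring, the soft (weighted) combination rather than a hard pick, the function names and the 2-dimensional toy vectors are assumptions for this sketch; the text only says that an attention mechanism selects the matching hot word encoding.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_hot_words(decoder_state, hot_word_encodings):
    """Weight each hot word encoding by its match with the decoder state.

    Returns the attention weights and the weighted sum of encodings,
    which would be passed to the decoder alongside the audio features.
    """
    scores = [
        sum(q * k for q, k in zip(decoder_state, encoding))
        for encoding in hot_word_encodings
    ]
    weights = softmax(scores)
    dim = len(hot_word_encodings[0])
    context = [
        sum(w * encoding[i] for w, encoding in zip(weights, hot_word_encodings))
        for i in range(dim)
    ]
    return weights, context

# Fixed-dimension encodings for two hot words (toy output of a bias encoder).
encodings = [[1.0, 0.0], [0.0, 1.0]]

# Decoder state from the previous decoding step (toy values).
state = [2.0, 0.0]

weights, context = attend_hot_words(state, encodings)
```

Here the first hot word receives the larger weight because its encoding is aligned with the decoder state.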
The dynamic speech model selection module dynamically selects, based on the user's device information, the language model corresponding to the type of device the user is currently using, so as to perform speech recognition decoding on the speech to be recognized.
The dynamic voiceprint enhancement module determines the user's voiceprint feature and combines it with the audio features of the speech to be recognized to perform speech recognition decoding on the speech to be recognized.
It should be noted that the personalized recognition enhancement modules of the speech recognition system may each be used alone, or several may be used together, to improve the accuracy of speech recognition.
Based on any of the above embodiments, FIG. 9 is a schematic structural diagram of a speech recognition device provided by an embodiment of the present invention. As shown in FIG. 9, the device includes a speech data determination unit 910 and a speech recognition decoding unit 920.
The speech data determination unit 910 is configured to determine the user's speech to be recognized.
The speech recognition decoding unit 920 is configured to perform speech recognition decoding on the speech to be recognized based on a preset state transition path to obtain a speech recognition result; the preset state transition path is obtained by expansion based on the user's regional information and/or historical input information.
The device provided by the embodiment of the present invention dynamically expands the preset state transition paths in the decoding network based on the current user's regional information and/or historical input information, and performs speech recognition decoding on that user's speech to be recognized based on the expanded paths. Using the user's personalized information improves the accuracy of personalized speech recognition, and expanding the preset state transition paths dynamically makes the device more practical.
Based on any of the above embodiments, the speech recognition decoding unit 920 is configured to:
determine the phoneme sequence corresponding to the speech to be recognized;
decode the phoneme sequence at the current decoding position based on the preset state transition path corresponding to the previous decoding position, to obtain the decoding result at the current decoding position.
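The position-by-position decoding above can be illustrated with a deliberately simplified sketch: the decoding network is a dictionary from a state to the paths leaving it, and each path maps a phoneme sequence to a word and a next state. Practical decoders keep many scored hypotheses at once (for example with beam search); this greedy, score-free walk, the state names and the toy network are assumptions made only to show how the path reached at the previous position constrains the current one.

```python
def decode(phoneme_chunks, network, start_state="start"):
    """Walk the decoding network one decoding position at a time."""
    state, words = start_state, []
    for chunk in phoneme_chunks:
        # Paths leaving the state reached at the previous decoding position.
        paths = network.get(state, {})
        if chunk not in paths:
            raise KeyError(f"no preset path for {chunk!r} from state {state!r}")
        word, state = paths[chunk]
        words.append(word)
    return words

# Toy decoding network: state -> {phoneme sequence: (word, next state)}.
network = {
    "start": {("d", "ao", "h", "ang", "d", "ao"): ("navigate to", "place")},
    "place": {("h", "e", "f", "ei"): ("Hefei", "end")},
}

result = decode(
    [("d", "ao", "h", "ang", "d", "ao"), ("h", "e", "f", "ei")],
    network,
)
```

The `KeyError` branch shows why expansion matters: a phoneme sequence with no path from the current state cannot be decoded until a matching path is added.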
Based on any of the above embodiments, the device further includes a first path expansion unit, configured to:
determine the regional terms associated with the user's regional information;
expand, based on the regional terms, the preset state transition paths corresponding to place names in the decoding network.
The device provided by the embodiment of the present invention uses the regional terms associated with the user's regional information to expand the preset state transition paths corresponding to place names in the decoding network. This personalized expansion of the preset state transition paths helps improve the accuracy of speech recognition.
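As a rough sketch of this expansion, assume the decoding network is a dictionary from a state to its outgoing paths and that the regional terms for each region are available with their pronunciations. The `REGIONAL_TERMS` table, the state names and the function `expand_place_paths` are hypothetical names introduced for the example.

```python
def expand_place_paths(network, place_state, regional_terms, next_state="end"):
    """Add one path per regional term to the place-name state.

    Existing paths are left untouched; only missing pronunciations are added.
    """
    paths = network.setdefault(place_state, {})
    for phonemes, word in regional_terms.items():
        paths.setdefault(phonemes, (word, next_state))
    return network

# Hypothetical regional terms keyed by region: pronunciation -> term.
REGIONAL_TERMS = {
    "Hefei": {
        ("sh", "u", "sh", "an"): "Shushan",
        ("b", "ao", "h", "e"): "Baohe",
    },
}

# Decoding network before expansion: only a general place name is known.
network = {"place": {("b", "ei", "j", "ing"): ("Beijing", "end")}}

# The user's regional information says Hefei, so Hefei terms are added.
expand_place_paths(network, "place", REGIONAL_TERMS.get("Hefei", {}))
```

Because only the terms for the user's own region are added, the network grows by a handful of paths per user rather than by every place name in the country.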
Based on any of the above embodiments, the device further includes a second path expansion unit, configured to:
determine, based on the user's historical input information, the similar hot words corresponding to the phoneme sequence at the current decoding position;
expand, based on the similar hot words, the preset state transition path corresponding to the previous decoding position.
The device provided by the embodiment of the present invention uses, during decoding, the similar hot words corresponding to the phoneme sequence at the current decoding position to expand the preset state transition path corresponding to the previous decoding position. This personalized, dynamic expansion of the preset state transition paths helps improve the accuracy of speech recognition.
Based on any of the above embodiments, determining, based on the user's historical input information, the similar hot words corresponding to the phoneme sequence at the current decoding position includes:
determining the similar phoneme sequence corresponding to the phoneme sequence at the current decoding position, based on that phoneme sequence and a pre-constructed pronunciation similarity matrix;
determining, from the user's hot words, the similar hot words corresponding to the phoneme sequence at the current decoding position and/or to the similar phoneme sequence, the hot words being determined from the historical input information.
Based on any of the above embodiments, the speech recognition decoding unit 920 is configured to:
perform speech recognition decoding on the speech to be recognized based on a language model in combination with the preset state transition path;
wherein the language model corresponds to the type of device the user is currently using, and the language model for any device type is trained on application scenario text for that device type.
The device provided by the embodiment of the present invention dynamically selects the language model corresponding to the type of device the user is currently using and combines it with the preset state transition path to perform speech recognition decoding on the speech to be recognized, which further improves the accuracy of speech recognition.
Based on any of the above embodiments, the speech recognition decoding unit 920 is configured to:
determine the voiceprint feature of the user;
perform speech recognition decoding on the speech to be recognized based on the preset state transition path, in combination with the audio features of the speech to be recognized and the voiceprint feature of the user.
The device provided by the embodiment of the present invention determines the voiceprint feature of the user and combines it with the audio features of the speech to be recognized to perform speech recognition decoding, which further improves the accuracy of speech recognition.
FIG. 10 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 10, the electronic device may include a processor 1010, a communications interface 1020, a memory 1030 and a communication bus 1040, where the processor 1010, the communications interface 1020 and the memory 1030 communicate with one another through the communication bus 1040. The processor 1010 may call logic instructions in the memory 1030 to execute a speech recognition method, the method comprising: determining a user's speech to be recognized; and performing speech recognition decoding on the speech to be recognized based on a preset state transition path to obtain a speech recognition result, the preset state transition path being obtained by expansion based on the user's regional information and/or historical input information.
In addition, the logic instructions in the memory 1030 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, enable the computer to perform the speech recognition method provided above, the method comprising: determining a user's speech to be recognized; and performing speech recognition decoding on the speech to be recognized based on a preset state transition path to obtain a speech recognition result, the preset state transition path being obtained by expansion based on the user's regional information and/or historical input information.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the speech recognition method provided above, the method comprising: determining a user's speech to be recognized; and performing speech recognition decoding on the speech to be recognized based on a preset state transition path to obtain a speech recognition result, the preset state transition path being obtained by expansion based on the user's regional information and/or historical input information.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, a person skilled in the art will clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solution, in essence, or the part of it that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110474762.5A (publication CN113113024B) | 2021-04-29 | 2021-04-29 | Speech recognition method, device, electronic equipment and storage medium |
| Publication Number | Publication Date |
|---|---|
| CN113113024A | 2021-07-13 |
| CN113113024B | 2024-08-23 |
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 20230508. Address after: 230026 No. 96, Jinzhai Road, Hefei, Anhui. Applicants after: University of Science and Technology of China; IFLYTEK Co., Ltd. Address before: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui. Applicant before: IFLYTEK Co., Ltd. |
| GR01 | Patent grant | |