CN118197318A - Speech recognition method and device, electronic equipment and storage medium - Google Patents

Speech recognition method and device, electronic equipment and storage medium

Info

Publication number
CN118197318A
Authority
CN
China
Prior art keywords
user
information
speech recognition
recognition model
voice recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211608793.6A
Other languages
Chinese (zh)
Inventor
王菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Co Wheels Technology Co Ltd
Original Assignee
Beijing Co Wheels Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Co Wheels Technology Co Ltd
Priority to CN202211608793.6A
Publication of CN118197318A
Legal status: Pending

Abstract

The present disclosure provides a speech recognition method and device, an electronic device, and a storage medium, and relates to the field of intelligent speech. The speech recognition method comprises the following steps: in response to a user address query request, acquiring identification information of the user; recognizing the query content of the user address query request with a first speech recognition model to obtain a first recognition result; acquiring a corresponding second speech recognition model according to the identification information of the user, and recognizing the query content of the user address query request with the second speech recognition model to obtain a second recognition result; and weighting the first recognition result and the second recognition result to compute scores, and taking the recognition result with the highest score as the speech recognition result corresponding to the query content. The method and device can improve the accuracy of speech recognition.

Description

Translated from Chinese
Speech recognition method and device, electronic device, and storage medium

Technical Field

The present disclosure relates to the field of intelligent speech, and in particular to a speech recognition method and device, an electronic device, and a storage medium.

Background

Current speech recognition technology can transcribe a speaker's utterances into text, but it does not gain a deeper understanding of the semantics during recognition, so misrecognition may occur when polyphonic characters or similar-sounding words are encountered. In addition, for speakers with different native languages or birthplaces, specific pronunciation habits in certain languages (such as liaison or letter-by-letter spelling) cause the same phoneme to be pronounced differently, which also leads to misrecognition and low recognition accuracy.

Summary of the Invention

The present disclosure provides a speech recognition method, device, electronic device, and storage medium.

According to a first aspect of the present disclosure, a speech recognition method is provided, comprising:

in response to a user address query request, acquiring identification information of the user;

recognizing the query content of the user address query request with a first speech recognition model to obtain a first recognition result, the first speech recognition model being a general speech recognition model trained on the full data set;

acquiring a corresponding second speech recognition model according to the identification information of the user, and recognizing the query content of the user address query request with the second speech recognition model to obtain a second recognition result, the second speech recognition model being a speech recognition model trained on a segment of the full data set;

weighting the first recognition result and the second recognition result to compute scores, and taking the recognition result with the highest score as the speech recognition result corresponding to the query content.

In some embodiments of the present disclosure, acquiring the identification information of the user includes:

acquiring characteristic information of the user and/or attribute information of the vehicle, where the characteristic information of the user is visual information and/or audio information and the attribute information of the vehicle is vehicle identification information;

acquiring the identification information of the user according to the characteristic information of the user and/or the attribute information of the vehicle.

In some embodiments of the present disclosure, acquiring the identification information of the user according to the characteristic information of the user includes:

acquiring visual information and/or audio information of the user;

determining at least one of the gender and age of the user based on the visual information and/or audio information, and using it as the identification information of the user.

In some embodiments of the present disclosure, determining at least one of the gender and age of the user based on the visual information and/or audio information includes:

comparing the visual information and/or audio information with pre-stored visual information and/or audio information to obtain at least one of the pre-stored gender and age of the user;

or comparing feature information of the visual information and/or audio information with feature information in a gender and age database to determine at least one of the gender and age of the user.

In some embodiments of the present disclosure, acquiring the identification information of the user according to the attribute information of the vehicle includes:

acquiring the vehicle identification information;

acquiring at least one of the current address and the resident address of the vehicle according to the vehicle identification information, and using it as the identification information of the user.

In some embodiments of the present disclosure, the method further includes:

training and constructing the first speech recognition model on the full data set, segmenting the full data set according to a predetermined rule, and training and constructing the second speech recognition model on the segmented data.

According to a second aspect of the present disclosure, a speech recognition device is provided, the device comprising:

a first acquisition unit, configured to acquire identification information of the user in response to a user address query request;

a recognition unit, configured to recognize the query content of the user address query request with a first speech recognition model to obtain a first recognition result, the first speech recognition model being a general speech recognition model trained on the full data set;

a second acquisition unit, configured to acquire a corresponding second speech recognition model according to the identification information of the user;

the recognition unit is further configured to recognize the query content of the user address query request with the second speech recognition model to obtain a second recognition result, the second speech recognition model being a speech recognition model trained on a segment of the full data set;

a calculation unit, configured to weight the first recognition result and the second recognition result to compute scores and take the recognition result with the highest score as the speech recognition result corresponding to the query content.

According to a third aspect of the present disclosure, an electronic device is provided, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method described in the first aspect.

According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method described in the first aspect.

According to a fifth aspect of the present disclosure, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the method described in the first aspect.

In the speech recognition method, device, and electronic device provided by the present disclosure, identification information of the user is acquired in response to a user address query request; the query content of the request is recognized with a first speech recognition model to obtain a first recognition result; a corresponding second speech recognition model is acquired according to the identification information of the user, and the query content is recognized with the second speech recognition model to obtain a second recognition result; the first and second recognition results are each weighted to compute scores, and the recognition result with the highest score is taken as the speech recognition result corresponding to the query content. The first speech recognition model is a general speech recognition model trained on the full data set, and the second speech recognition model is trained on a segment of the full data set. The final speech recognition result is therefore the better-scoring of the first recognition result produced by the general model and the second recognition result produced by the personalized model, and because the second model is trained on segmented data, recognition accuracy is improved considerably compared with the prior art, which uses only a general model.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become readily apparent from the following description.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the solution and do not constitute a limitation of the present disclosure. In the drawings:

FIG. 1 is a flow chart of a speech recognition method provided by an embodiment of the present disclosure;

FIG. 2 is a flow chart of a method for acquiring identification information of the user provided by an embodiment of the present disclosure;

FIG. 3 is a block diagram of a speech recognition device provided by an embodiment of the present disclosure;

FIG. 4 is a block diagram of another speech recognition device provided by an embodiment of the present disclosure;

FIG. 5 is a schematic block diagram of an example electronic device 400 provided by an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to facilitate understanding; they should be regarded as merely exemplary. Those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.

The speech recognition method and device, electronic device, and storage medium according to the embodiments of the present disclosure are described below with reference to the accompanying drawings.

Current speech recognition technology can transcribe a speaker's utterances into text, but it does not gain a deeper understanding of the semantics during recognition, so misrecognition may occur when polyphonic characters or similar-sounding words are encountered. In addition, for speakers with different native languages or birthplaces, specific pronunciation habits in certain languages (such as liaison or letter-by-letter spelling) cause the same phoneme to be pronounced differently, which also leads to misrecognition and low recognition accuracy.

To solve the above problem, FIG. 1 shows a flow chart of a speech recognition method provided by an embodiment of the present disclosure. The method may include the following steps:

Step 101: in response to a user address query request, acquire identification information of the user.

It should be noted that the identification information of the user may be, but is not limited to, at least one of the following: gender, age, the current address of the vehicle, and the resident address of the vehicle. It may also be another identity tag; the disclosed embodiments place no limitation on this.

It should also be noted that the user address query request may be a direct address query instruction, an address query instruction arising from multiple rounds of voice interaction, or a vague, referential address query instruction such as "a popular coffee shop nearby". The embodiments of the present disclosure place no limitation on the specific type of instruction.

Step 102: recognize the query content of the user address query request with the first speech recognition model to obtain a first recognition result. The first speech recognition model is a speech recognition model trained on the full data set.

In the embodiments of the present disclosure, the first speech recognition model is a general speech recognition model trained on the full data set. The full data set here is all training data, without distinguishing factors such as gender, age, or address. The first speech recognition model is constructed and trained on this full data set; it is a general speech recognition model that recognizes and transcribes the speech input by the user, matches the transcribed text against addresses, and takes the matched result as the first recognition result. Any existing method may be used to construct and train the first model on the full data set; the embodiments of the present disclosure place no limitation on this.
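As an illustration only, the following Python sketch shows what this first pass could look like: a stub general model produces a transcript, which is then fuzzy-matched against an address gazetteer to yield the first recognition result with a confidence score. The class and function names, the fixed stub transcript, and the use of difflib for matching are assumptions made for the example, not details from the patent.

```python
from difflib import SequenceMatcher

class GeneralASRModel:
    """Stand-in for the general ("first") model trained on the full data set."""
    def transcribe(self, audio: bytes) -> str:
        # A real model would decode the audio; this stub returns a fixed transcript.
        return "navigate to xinghua road"

def match_address(transcript: str, gazetteer: list[str]) -> tuple[str, float]:
    """Match the recognized text against known addresses/POIs and return the
    best candidate together with a rough similarity score used as confidence."""
    best, best_score = "", 0.0
    for address in gazetteer:
        score = SequenceMatcher(None, transcript, address).ratio()
        if score > best_score:
            best, best_score = address, score
    return best, best_score

def first_pass(audio: bytes, model: GeneralASRModel, gazetteer: list[str]) -> tuple[str, float]:
    """First recognition result: general-model transcript matched to an address."""
    return match_address(model.transcribe(audio), gazetteer)

# Example: first_pass(b"...", GeneralASRModel(), ["xinghua road", "xinhua road"])
```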

步骤103、根据所述用户的标识信息获取对应的第二语音识别模型,并根据所述第二语音识别模型对所述用户地址查询请求的查询内容进行识别,得到第二识别结果。所述第二语音识别模型为基于全量数据的细分数据训练得到的语音识别模型。Step 103: Obtain a corresponding second speech recognition model according to the identification information of the user, and recognize the query content of the user address query request according to the second speech recognition model to obtain a second recognition result. The second speech recognition model is a speech recognition model trained based on the segmented data of the full data.

本公开的实施例该处需要说明的是,所述第二语音识别模型为个性化的语音识别模型,其为基于全量数据按照预定规则进行细分,得到细分数据,比如按照性别,年龄,地区(车辆的当前地址、车辆的常驻地址)等维度对全量数据进行细分,将细分后的数据进行细分领域个性化语音识别模型的构建和训练,得到第二语音识别模型。该第二语音识别模型为一个单领域语音识别模型的总称,比如其可以是基于性别的语音识别模型,也可以是基于年龄的语音识别模型,也可以是基于地区的语音识别模型,还可以是其他类别的语音模型。具体的,本公开的实施例对此不进行限制。It should be noted here in the embodiments of the present disclosure that the second speech recognition model is a personalized speech recognition model, which is based on the full amount of data and is segmented according to predetermined rules to obtain segmented data. For example, the full amount of data is segmented according to dimensions such as gender, age, and region (the current address of the vehicle, the permanent address of the vehicle), and the segmented data is used to construct and train a personalized speech recognition model for the segmented field to obtain a second speech recognition model. The second speech recognition model is a general term for a single-field speech recognition model. For example, it can be a gender-based speech recognition model, an age-based speech recognition model, a region-based speech recognition model, or other types of speech models. Specifically, the embodiments of the present disclosure do not limit this.

另外,还需要说明的是,在进行第二语音识别模型进行训练的时候,其获取的数据尽量覆盖该类别的人群常去的地点、常听常看的歌曲与视频、特有的发音习惯等,具体的,本公开的实施例对此不进行限制。In addition, it should be noted that when training the second speech recognition model, the data obtained should try to cover the places frequented by people in this category, the songs and videos they often listen to and watch, the unique pronunciation habits, etc. Specifically, the embodiments of the present disclosure do not limit this.
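A minimal sketch of how such per-segment models might be organized is shown below: a registry keyed by the identification dimensions mentioned above (gender/age band, accent region, current city), from which the matching second model is looked up. The registry keys, the SegmentASRModel placeholder, and the lookup order are illustrative assumptions rather than details from the disclosure.

```python
class SegmentASRModel:
    """Stand-in for a personalized ("second") model trained on one data segment."""
    def __init__(self, name: str):
        self.name = name

# Hypothetical registry of per-segment models, keyed by the identification
# dimensions named in the disclosure: gender/age band, accent region, current city.
SECOND_MODEL_REGISTRY = {
    ("gender_age", "male", "35-40"): SegmentASRModel("male_35_40"),
    ("accent", "Sichuan-Chongqing"): SegmentASRModel("sichuan_chongqing_accent"),
    ("region", "Beijing"): SegmentASRModel("beijing_poi"),
}

def select_second_model(user_tags: dict) -> SegmentASRModel | None:
    """Pick the personalized model matching the user's identification info;
    return None when no segment model exists (only the general model is used)."""
    candidate_keys = [
        ("gender_age", user_tags.get("gender"), user_tags.get("age_band")),
        ("accent", user_tags.get("resident_region")),
        ("region", user_tags.get("current_city")),
    ]
    for key in candidate_keys:
        if key in SECOND_MODEL_REGISTRY:
            return SECOND_MODEL_REGISTRY[key]
    return None
```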

Step 104: weight the first recognition result and the second recognition result to compute their scores, and take the recognition result with the highest score as the speech recognition result corresponding to the query content.

It should be noted that after the first recognition result and the second recognition result are obtained, both results are placed in a speech recognition candidate list, each is weighted to compute a score, the candidates are ranked by final score, and the recognition result with the highest score is taken as the speech recognition result corresponding to the query content.
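The weighting and selection step could be sketched as follows, assuming each candidate arrives as a (text, confidence) pair; the concrete weight values are illustrative, since the disclosure does not specify them.

```python
def pick_final_result(first, second, w_general=0.4, w_segment=0.6):
    """Weight the confidence of each candidate and keep the higher-scoring one.
    `first` and `second` are (text, confidence) pairs from the general and
    personalized models; the weights here are placeholders, not fixed values."""
    candidates = [
        (first[0], w_general * first[1]),
        (second[0], w_segment * second[1]),
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)  # candidate list ranked by score
    return candidates[0][0]

# Example: the personalized model is slightly less confident but more heavily weighted,
# so its candidate wins (0.6 * 0.75 = 0.45 > 0.4 * 0.80 = 0.32).
result = pick_final_result(("Xinhua Road", 0.80), ("Xinghua Road", 0.75))
```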

In the speech recognition method, device, and electronic device provided by the present disclosure, identification information of the user is acquired in response to a user address query request; the query content of the request is recognized with a first speech recognition model to obtain a first recognition result; a corresponding second speech recognition model is acquired according to the identification information of the user, and the query content is recognized with the second speech recognition model to obtain a second recognition result; the first and second recognition results are each weighted to compute scores, and the recognition result with the highest score is taken as the speech recognition result corresponding to the query content. The first speech recognition model is a general speech recognition model trained on the full data set, and the second speech recognition model is trained on a segment of the full data set. The final speech recognition result is therefore the better-scoring of the first recognition result produced by the general model and the second recognition result produced by the personalized model, and because the second model is trained on segmented data, recognition accuracy is improved considerably compared with the prior art, which uses only a general model.

Based on the above method, before the speech recognition of the embodiments of the present invention is performed, the first speech recognition model and the second speech recognition model need to be trained and constructed. Specifically, the first speech recognition model is trained and constructed on the full data set, the full data set is segmented according to predetermined rules, and the second speech recognition model is trained and constructed on the segmented data. As noted above, "second speech recognition model" is a collective term for a class of speech recognition models. When it is a second speech recognition model built on gender and age, the collected data mainly consists of speech features of the corresponding gender and age group, for example a [male, 35-40] second speech recognition model, a [female, 20-25] second speech recognition model, and a [children] second speech recognition model, for which the corresponding "statistical" language model is obtained.
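A possible way to derive such segments from the full training set is sketched below: samples are bucketed by gender and a simple age-banding rule, and each bucket would then be used to train its own second model. The sample field names, the banding rule, and the bucket keys are assumptions made for the example.

```python
from collections import defaultdict

def age_band(age: int) -> str:
    """Illustrative banding rule; the disclosure only mentions bands such as 35-40."""
    if age < 14:
        return "child"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 5}"

def segment_training_data(samples):
    """Split the full training set into gender/age-band buckets.
    Each sample is assumed to carry `audio`, `text`, `gender`, and `age` fields."""
    buckets = defaultdict(list)
    for sample in samples:
        key = ("gender_age", sample["gender"], age_band(sample["age"]))
        buckets[key].append((sample["audio"], sample["text"]))
    return buckets

# One second model would then be trained per bucket, e.g. ("gender_age", "male", "35-40");
# the training call itself is a placeholder for whatever ASR toolkit is actually used.
```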

When it is a second speech recognition model built on address, the vehicle identification information (such as an identity ID) is obtained, the latitude and longitude of the place where the vehicle was sold and the resident address of the vehicle are obtained, and the "accent" acoustic model corresponding to the resulting province or region is obtained. The second speech recognition model here is an accent-based speech recognition model, such as the speech recognition models for the [Sichuan-Chongqing region], the [Guangdong region], or the [Northwest region].

When it is a second speech recognition model built on the current address, the city where the current coordinates are located is obtained from the positioning system, and the corresponding "regional" language model is obtained. The second speech recognition model here is a regional speech recognition model, which mainly covers the popular POIs of the area and a detailed list of POIs near the current coordinates.

Based on the above description, when a user address query request is received, the identification information of the user is obtained first. The identification information of the user may be, but is not limited to, any of the following: gender, age, the current address of the vehicle, the resident address of the vehicle, and so on. The identification information of the user may be obtained by, but is not limited to, the following method, which includes:

acquiring characteristic information of the user and/or attribute information of the vehicle, where the characteristic information of the user is visual information and/or audio information and the attribute information of the vehicle is vehicle identification information;

acquiring the identification information of the user according to the characteristic information of the user and/or the attribute information of the vehicle.

At least one of the gender and age of the user is obtained according to the characteristic information of the user; at least one of the current address of the vehicle and the resident address of the vehicle may also be obtained according to the attribute information of the vehicle.

Based on the construction of the second speech recognition model described above, an embodiment of the present invention provides a speech recognition method in which, when a user address query request is received, the identification information of the user is obtained first. When the identification information includes gender or age, the identification information of the user may be obtained by, but is not limited to, the following method, as shown in FIG. 2, which includes:

Step 201: acquire visual information and/or audio information of the user.

The corresponding second speech recognition model is obtained based on this identification information. For example, if the current cockpit has no camera and only a sound pickup device, audio information is acquired and the gender and age of the user are determined from the audio information; if the current cockpit is equipped with both a camera and a sound pickup device, the gender and age of the user are determined from the visual information captured by the camera together with the audio information captured by the sound pickup device.
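This sensor-dependent branching might look like the following sketch, where `estimate_from_audio` and `estimate_from_face` are hypothetical stand-ins for real speaker- and face-attribute estimators, which the disclosure does not specify.

```python
def estimate_from_audio(audio_clip: bytes) -> dict:
    # Stub: a real system would run a speaker-attribute model on the audio.
    return {"gender": "unknown", "age_band": "unknown"}

def estimate_from_face(frame: bytes) -> dict:
    # Stub: a real system would run a face-attribute model on the camera frame.
    return {"gender": "unknown", "age_band": "unknown"}

def identify_user(audio_clip: bytes, camera_frame: bytes | None = None) -> dict:
    """Determine gender/age tags from whatever cockpit sensors are present:
    microphone only, or microphone plus camera."""
    tags = estimate_from_audio(audio_clip)      # microphone is always available
    if camera_frame is not None:                # camera is optional equipment
        tags.update(estimate_from_face(camera_frame))
    return tags
```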

Step 202: determine at least one of the gender and age of the user based on the visual information and/or audio information, and use it as the identification information of the user.

When determining at least one of the gender and age of the user based on the visual information and/or audio information, the following methods may be used, but are not limited to:

Method A: compare the visual information and/or audio information with pre-stored visual information and/or audio information to obtain at least one of the pre-stored gender and age of the user.

In this method, before using the speech recognition function the user's information is pre-stored: the user's visual information and/or audio information is enrolled together with the corresponding gender and age. For example, the visual and/or audio information of a male owner is enrolled with gender male and age 40; the visual and/or audio information of a female owner is enrolled with gender female and age 39; and the visual and/or audio information of a child is enrolled with gender female and age 8. With this information enrolled, when the male owner makes an address query request, the captured visual and/or audio information is matched against the pre-stored entries to obtain his gender and age. The corresponding second speech recognition model is then obtained from the gender and age, and the male owner's query content is recognized with the second speech model to obtain the second recognition result.
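Method A could be sketched as a nearest-neighbour lookup over enrolled profiles, as below; representing the enrolled visual/audio information as embedding vectors, the cosine-similarity measure, and the threshold are illustrative assumptions rather than details from the patent.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 when either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Enrolled profiles: feature embedding -> (gender, age), entered when the user registers.
ENROLLED = [
    ([0.9, 0.1, 0.2], ("male", 40)),
    ([0.1, 0.8, 0.3], ("female", 39)),
    ([0.2, 0.3, 0.9], ("female", 8)),
]

def lookup_profile(query_embedding: list[float], threshold: float = 0.8):
    """Return the pre-stored gender/age of the closest enrolled voice/face
    embedding, or None if nothing is similar enough."""
    best = max(ENROLLED, key=lambda item: cosine(query_embedding, item[0]))
    return best[1] if cosine(query_embedding, best[0]) >= threshold else None
```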

Method B: compare feature information of the visual information and/or audio information with feature information in a gender and age database to determine at least one of the gender and age of the user.

In the embodiments of the present disclosure, user speech sharing at least one of the same gender and age attributes is collected and trained on the basis of big data to obtain a corresponding speech data set. When acquiring at least one of the gender and age of the user, the feature information of the visual information and/or audio information is compared with the feature information in the gender and age database to determine the user's gender and age.

In the embodiments of the present disclosure, the user's gender and age identification information can be obtained through either of the above two methods, which makes the acquired second speech recognition model more precise and guarantees the accuracy of the final speech recognition.

In some embodiments of the present disclosure, when the identification information is the resident address of the vehicle, the identification information of the user may be obtained by, but is not limited to, the following method, which includes:

acquiring the vehicle identification information, and acquiring the resident address of the vehicle according to the vehicle identification information.

When a vehicle is sold, it carries unique vehicle identification information, which generally records the region in which the vehicle was sold; obtaining the vehicle identification information therefore gives, to a certain extent, the place where the vehicle is used. In some cases, however, the vehicle is purchased in one place but used in another. To obtain the current place of use more precisely, the resident address of the vehicle is obtained, and the local accent information is determined from the region of that resident address. The corresponding accent-based second speech recognition model is obtained from this accent information, and the query content of the user address query request is recognized with that accent-based second speech recognition model to obtain the second recognition result.
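A minimal sketch of this lookup chain follows, assuming hypothetical tables that map a vehicle identifier to its resident region and a region to the key of an accent-specific second model; the table contents and key format are illustrative only.

```python
# Illustrative lookup tables; in a real system these would come from the
# vehicle back end and a region-to-accent mapping maintained offline.
VEHICLE_RESIDENT_REGION = {"VIN123": "Sichuan-Chongqing", "VIN456": "Guangdong"}
REGION_TO_ACCENT_MODEL = {
    "Sichuan-Chongqing": ("accent", "Sichuan-Chongqing"),
    "Guangdong": ("accent", "Guangdong"),
}

def accent_model_key(vehicle_id: str):
    """Map the vehicle identifier to its resident region and then to the key of
    the matching accent-specific second model (None when no mapping exists)."""
    region = VEHICLE_RESIDENT_REGION.get(vehicle_id)
    return REGION_TO_ACCENT_MODEL.get(region) if region else None

# Example: accent_model_key("VIN123") -> ("accent", "Sichuan-Chongqing")
```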

With this accent-based second speech recognition model, misrecognition caused by specific pronunciation habits in certain languages (such as liaison or letter-by-letter spelling) that make the same phoneme sound different can be avoided to a certain extent, improving the accuracy of speech recognition.

In some embodiments of the present disclosure, when the identification information is the current address information of the vehicle, the identification information of the user may be obtained by, but is not limited to, the following method, which includes:

obtaining, from the positioning system, the city where the current vehicle coordinates are located, and acquiring the corresponding regional second speech recognition model. This regional second speech recognition model includes the popular Points of Interest (POI) of the current address and a detailed list of POIs near the current coordinates, and the specific location is matched on the basis of these popular POIs and the detailed list of nearby POIs.
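Narrowing the POI candidates to the vicinity of the current coordinates could be done roughly as follows; the haversine distance, the 5 km radius, and the (name, lat, lon) POI format are assumptions made for the example, not details from the patent.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_pois(pois, lat, lon, radius_km=5.0):
    """Filter a city's POI list down to entries near the current coordinates,
    which is the candidate set the regional model matches the query against.
    Each POI is assumed to be a (name, lat, lon) tuple."""
    return [name for name, p_lat, p_lon in pois
            if haversine_km(lat, lon, p_lat, p_lon) <= radius_km]
```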

There may be one or more items of identification information; the embodiments of the present disclosure place no limitation on this. When there are several, the above embodiments can be combined, which will not be repeated here.

In the embodiments of the present disclosure, identification information of the user is acquired in response to a user address query request; the query content of the request is recognized with a first speech recognition model to obtain a first recognition result; a corresponding second speech recognition model is acquired according to the identification information of the user, and the query content is recognized with the second speech recognition model to obtain a second recognition result; the first and second recognition results are each weighted to compute scores, and the recognition result with the highest score is taken as the speech recognition result corresponding to the query content. The first speech recognition model is a general speech recognition model trained on the full data set, and the second speech recognition model is trained on a segment of the full data set. The final speech recognition result is therefore the better-scoring of the first recognition result produced by the general model and the second recognition result produced by the personalized model, and because the second model is trained on segmented data, recognition accuracy is improved considerably compared with the prior art, which uses only a general model.

Corresponding to the speech recognition method described above, the present invention also provides a speech recognition device. Since the device embodiments of the present invention correspond to the method embodiments described above, details not disclosed in the device embodiments can be found in the method embodiments and are not repeated here.

FIG. 3 is a block diagram of a speech recognition device provided by an embodiment of the present disclosure. As shown in FIG. 3, the device includes:

a first acquisition unit 301, configured to acquire identification information of the user in response to a user address query request;

a recognition unit 302, configured to recognize the query content of the user address query request with a first speech recognition model to obtain a first recognition result, the first speech recognition model being a general speech recognition model trained on the full data set;

a second acquisition unit 303, configured to acquire a corresponding second speech recognition model according to the identification information of the user, the second speech recognition model being a speech recognition model trained on a segment of the full data set;

the recognition unit 302 is further configured to recognize the query content of the user address query request with the second speech recognition model to obtain a second recognition result;

a calculation unit 304, configured to weight the first recognition result and the second recognition result to compute scores and take the recognition result with the highest score as the speech recognition result corresponding to the query content.

In some embodiments of the present disclosure, the identification information of the user is at least one of the following:

gender, age, the current address of the vehicle, and the resident address of the vehicle.

In some embodiments of the present disclosure, the first acquisition unit 301 is further configured to:

acquire characteristic information of the user and/or attribute information of the vehicle, where the characteristic information of the user is visual information and/or audio information and the attribute information of the vehicle is vehicle identification information;

acquire the identification information of the user according to the characteristic information of the user and/or the attribute information of the vehicle.

In some embodiments of the present disclosure, acquiring the identification information of the user according to the characteristic information of the user includes:

acquiring visual information and/or audio information of the user;

determining at least one of the gender and age of the user based on the visual information and/or audio information, and using it as the identification information of the user.

In some embodiments of the present disclosure, determining at least one of the gender and age of the user based on the visual information and/or audio information includes:

comparing the visual information and/or audio information with pre-stored visual information and/or audio information to obtain at least one of the pre-stored gender and age of the user;

or comparing feature information of the visual information and/or audio information with feature information in a gender and age database to determine at least one of the gender and age of the user.

In some embodiments of the present disclosure, the first acquisition unit 301 is further configured to:

acquire the vehicle identification information, acquire at least one of the current address and the resident address of the vehicle according to the vehicle identification information, and use it as the identification information of the user.

In some embodiments of the present disclosure, as shown in FIG. 4, the speech recognition device further includes a training and construction unit 405.

The training and construction unit 405 is configured to train and construct the first speech recognition model on the full data set, segment the full data set according to predetermined rules, and train and construct the second speech recognition model on the segmented data.

In the embodiments provided by the present disclosure, identification information of the user is acquired in response to a user address query request; the query content of the request is recognized with a first speech recognition model to obtain a first recognition result; a corresponding second speech recognition model is acquired according to the identification information of the user, and the query content is recognized with the second speech recognition model to obtain a second recognition result; the first and second recognition results are each weighted to compute scores, and the recognition result with the highest score is taken as the speech recognition result corresponding to the query content. The first speech recognition model is a general speech recognition model trained on the full data set, and the second speech recognition model is trained on a segment of the full data set. The final speech recognition result is therefore the better-scoring of the first recognition result produced by the general model and the second recognition result produced by the personalized model, and because the second model is trained on segmented data, recognition accuracy is improved considerably compared with the prior art, which uses only a general model.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.

FIG. 5 shows a schematic block diagram of an example electronic device 400 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 5, the device 400 includes a computing unit 401, which can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 402 or a computer program loaded from a storage unit 408 into a RAM (Random Access Memory) 403. The RAM 403 may also store various programs and data required for the operation of the device 400. The computing unit 401, the ROM 402, and the RAM 403 are connected to one another via a bus 404. An I/O (Input/Output) interface 405 is also connected to the bus 404.

Several components in the device 400 are connected to the I/O interface 405, including: an input unit 406, such as a keyboard or a mouse; an output unit 407, such as various types of displays and speakers; a storage unit 408, such as a magnetic disk or an optical disc; and a communication unit 409, such as a network card, a modem, or a wireless communication transceiver. The communication unit 409 allows the device 400 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The computing unit 401 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any appropriate processor, controller, or microcontroller. The computing unit 401 performs the various methods and processes described above, such as the speech recognition method. For example, in some embodiments the speech recognition method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the aforementioned speech recognition method in any other appropriate manner (for example, by means of firmware).

本文中以上描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、FPGA(Field Programmable Gate Array,现场可编程门阵列)、ASIC(Application-Specific Integrated Circuit,专用集成电路)、ASSP(Application Specific StandardProduct,专用标准产品)、SOC(System On Chip,芯片上系统的系统)、CPLD(ComplexProgrammable Logic Device,复杂可编程逻辑设备)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括:实施在一个或者多个计算机程序中,该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释,该可编程处理器可以是专用或者通用可编程处理器,可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令,并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various embodiments of the systems and techniques described above herein may be implemented in digital electronic circuit systems, integrated circuit systems, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application Specific Standard Products), SOCs (System On Chips), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor that may be a dedicated or general-purpose programmable processor that may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

用于实施本公开的方法的程序代码可以采用一个或多个编程语言的任何组合来编写。这些程序代码可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理器或控制器,使得程序代码当由处理器或控制器执行时使流程图和/或框图中所规定的功能/操作被实施。程序代码可以完全在机器上执行、部分地在机器上执行,作为独立软件包部分地在机器上执行且部分地在远程机器上执行或完全在远程机器或服务器上执行。The program code for implementing the method of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that the program code, when executed by the processor or controller, implements the functions/operations specified in the flow chart and/or block diagram. The program code may be executed entirely on the machine, partially on the machine, partially on the machine and partially on a remote machine as a stand-alone software package, or entirely on a remote machine or server.

在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、RAM、ROM、EPROM(Electrically Programmable Read-Only-Memory,可擦除可编程只读存储器)或快闪存储器、光纤、CD-ROM(Compact Disc Read-Only Memory,便捷式紧凑盘只读存储器)、光学储存设备、磁储存设备、或上述内容的任何合适组合。In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media may include electrical connections based on one or more lines, portable computer disks, hard disks, RAM, ROM, EPROM (Electrically Programmable Read-Only-Memory) or flash memory, optical fiber, CD-ROM (Compact Disc Read-Only Memory), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

为了提供与用户的交互,可以在计算机上实施此处描述的系统和技术,该计算机具有:用于向用户显示信息的显示装置(例如,CRT(Cathode-Ray Tube,阴极射线管)或者LCD(Liquid Crystal Display,液晶显示器)监视器);以及键盘和指向装置(例如,鼠标或者轨迹球),用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互;例如,提供给用户的反馈可以是任何形式的传感反馈(例如,视觉反馈、听觉反馈、或者触觉反馈);并且可以用任何形式(包括声输入、语音输入或者、触觉输入)来接收来自用户的输入。To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other types of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including acoustic input, voice input, or tactile input).

可以将此处描述的系统和技术实施在包括后台部件的计算系统(例如,作为数据服务器)、或者包括中间件部件的计算系统(例如,应用服务器)、或者包括前端部件的计算系统(例如,具有图形用户界面或者网络浏览器的用户计算机,用户可以通过该图形用户界面或者该网络浏览器来与此处描述的系统和技术的实施方式交互)、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算系统中。可以通过任何形式或者介质的数字数据通信(例如,通信网络)来将系统的部件相互连接。通信网络的示例包括:LAN(LocalArea Network,局域网)、WAN(Wide Area Network,广域网)、互联网和区块链网络。The systems and techniques described herein may be implemented in a computing system that includes backend components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes frontend components (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend components, middleware components, or frontend components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), the Internet, and blockchain networks.

计算机系统可以包括客户端和服务器。客户端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有客户端-服务器关系的计算机程序来产生客户端和服务器的关系。服务器可以是云服务器,又称为云计算服务器或云主机,是云计算服务体系中的一项主机产品,以解决传统物理主机与VPS服务("Virtual Private Server",或简称"VPS")中存在的管理难度大、业务扩展性弱的缺陷。服务器也可以为分布式系统的服务器,或者是结合了区块链的服务器。A computer system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system and addresses the drawbacks of traditional physical hosts and VPS ("Virtual Private Server") services, namely high management difficulty and weak business scalability. The server may also be a server of a distributed system, or a server combined with a blockchain.
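To make the client-server arrangement above concrete, here is a minimal sketch using only the Python standard library. The /recognize endpoint, the JSON payload shape, and the fake_recognize() helper are hypothetical placeholders for illustration, not the API of the disclosed system.

```python
# Minimal client-server sketch: a client posts a query over the network and the
# server returns a (placeholder) recognition result as JSON. Standard library only.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_recognize(payload: dict) -> dict:
    # Placeholder: a real backend would run the speech recognition model(s) here.
    return {"query": payload.get("query", ""), "result": "recognized text"}


class RecognizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(fake_recognize(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), RecognizeHandler).serve_forever()
```

A client (for example, an in-vehicle front end) could then POST a JSON body such as {"query": "..."} to http://127.0.0.1:8000/recognize with urllib.request and read the JSON response; any other form of digital data communication mentioned above would serve equally well.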

其中,需要说明的是,人工智能是研究使计算机来模拟人的某些思维过程和智能行为(如学习、推理、思考、规划等)的学科,既有硬件层面的技术也有软件层面的技术。人工智能硬件技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理等技术;人工智能软件技术主要包括计算机视觉技术、语音识别技术、自然语言处理技术以及机器学习/深度学习、大数据处理技术、知识图谱技术等几大方向。It should be noted that artificial intelligence is a discipline that studies how computers can simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, planning, etc.), and includes both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, as well as machine learning/deep learning, big data processing technology, knowledge graph technology, and other major directions.

应该理解,可以使用上面所示的各种形式的流程,重新排序、增加或删除步骤。例如,本公开中记载的各步骤可以并行地执行,也可以顺序地执行,也可以以不同的次序执行,只要能够实现本公开所公开的技术方案所期望的结果,本文在此不进行限制。It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved, which is not limited herein.

上述具体实施方式,并不构成对本公开保护范围的限制。本领域技术人员应该明白的是,根据设计要求和其他因素,可以进行各种修改、组合、子组合和替代。任何在本公开的精神和原则之内所作的修改、等同替换和改进等,均应包含在本公开保护范围之内。The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions can be made according to design requirements and other factors. Any modification, equivalent substitution and improvement made within the spirit and principle of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (10)

CN202211608793.6A | 2022-12-14 | 2022-12-14 | Speech recognition method and device, electronic equipment and storage medium | Pending | CN118197318A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211608793.6A | CN118197318A (en) | 2022-12-14 | 2022-12-14 | Speech recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211608793.6A | CN118197318A (en) | 2022-12-14 | 2022-12-14 | Speech recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN118197318A (en) | 2024-06-14

Family

ID=91410502

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211608793.6A | Pending | CN118197318A (en) | 2022-12-14 | 2022-12-14 | Speech recognition method and device, electronic equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN118197318A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119580740A (en)* | 2024-11-18 | 2025-03-07 | 深圳市华翌科技有限公司 | Smart life AI interaction system and method based on language recognition

Similar Documents

Publication | Publication Date | Title

US10977452B2 (en) | Multi-lingual virtual personal assistant
CN106683680B (en) | Speaker recognition method and apparatus, computer equipment and computer readable medium
WO2021232725A1 (en) | Voice interaction-based information verification method and apparatus, and device and computer storage medium
CN105354199B (en) | A kind of recognition methods of entity meaning and system based on scene information
US20230004798A1 (en) | Intent recognition model training and intent recognition method and apparatus
CN112818227B (en) | Content recommendation method, device, electronic device and storage medium
EP4075424B1 (en) | Speech recognition method and apparatus
CN113408273B (en) | Training method and device of text entity recognition model and text entity recognition method and device
CN111126084B (en) | Data processing method, device, electronic equipment and storage medium
CN110223134B (en) | Product recommendation method based on voice recognition and related equipment
CN113851106A (en) | Audio playback method, apparatus, electronic device and readable storage medium
CN113553421A (en) | Comment text generation method and device, electronic equipment and storage medium
CN114758649B (en) | A speech recognition method, device, equipment and medium
CN113850290B (en) | Text processing and model training method, device, equipment and storage medium
CN113850291B (en) | Text processing and model training method, device, equipment and storage medium
CN118197318A (en) | Speech recognition method and device, electronic equipment and storage medium
CN112860995B (en) | Interaction method, device, client, server and storage medium
KR20230026259A (en) | Method and device for determining broadcast style, equipment and computer storage medium
CN108305629B (en) | A scene learning content acquisition method, device, learning equipment and storage medium
WO2025066280A1 (en) | Voice interaction method and apparatus, device, medium, and product
CN111090769A (en) | Song recommendation method, device, equipment and computer storage medium
CN115206300A (en) | Hot word weight dynamic configuration method, device, equipment and medium
CN114049875A (en) | A kind of TTS broadcast method, apparatus, equipment and storage medium
CN113223500B (en) | Speech recognition method, method for training speech recognition model, and corresponding device
CN112767923B (en) | Voice recognition method and device

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
