CN116092492B - Hybrid multilingual navigation voice instruction processing method, device and electronic device - Google Patents

Hybrid multilingual navigation voice instruction processing method, device and electronic device

Info

Publication number
CN116092492B
CN116092492B · CN202310085223.1A
Authority
CN
China
Prior art keywords
place name
voice
navigation
user
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310085223.1A
Other languages
Chinese (zh)
Other versions
CN116092492A (en)
Inventor
张睿智
雷琴辉
刘俊峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Intelligent Voice Innovation Development Co ltd
iFlytek Co Ltd
Original Assignee
Hefei Intelligent Voice Innovation Development Co ltd
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Intelligent Voice Innovation Development Co Ltd and iFlytek Co Ltd
Priority to CN202310085223.1A
Publication of CN116092492A
Application granted
Publication of CN116092492B
Status: Active
Anticipated expiration

Links

Classifications

Landscapes

Abstract

The invention discloses a method, a device and an electronic device for processing mixed multilingual navigation voice instructions. The country or region where the user is currently located is determined in advance; the voice instruction input by the user is cut into a place name segment and a non-place name segment; a voice processing strategy matched with the country or region where the user is located is invoked to recognize the voice instruction corresponding to the place name segment; and finally navigation intention understanding is carried out by combining the recognition results of the place name segment and the non-place name segment. By judging the current country or region instead of relying on conventional point positioning, a voice processing strategy matched with the local language can be determined in advance, and by cutting the input voice, the mixed multilingual voice instruction can be recognized in a targeted manner, providing a more reliable and accurate navigation intention understanding result. The method and the device require neither costly dictionary construction nor extensive parameter tuning of existing models, and can handle mixed-language situations in navigation scenarios more economically and efficiently.

Description

Mixed multilingual navigation voice instruction processing method and device and electronic equipment
Technical Field
The invention relates to the technical field of voice interaction, and in particular to a mixed multilingual navigation voice instruction processing method and device, and to electronic equipment.
Background
At present, intelligent voice technology has been widely popularized, and voice products are broadly applied in the smart home, mobile device, and vehicle-mounted fields. During voice interaction in overseas navigation scenarios, the subject language often does not match the language of the place name: a user may, for example, say "navigate to XX place name" in English plus the local language, producing mixed multilingual navigation instructions such as "Navigate to (English) Красная площадь (Russian)". Existing voice assistant products cannot accurately recognize voice instructions in such scenarios, resulting in deviations in understanding the user's navigation intention.
Aiming at this specific problem in this specific scenario, the prior art in principle relies mainly on extending the original pronunciation dictionary while comprehensively considering the pronunciation habits of the several mixed languages. This process requires collecting a large amount of annotated audio data and also requires large-scale parameter tuning at the model training level.
However, owing to the differences between languages, the extent to which a pronunciation dictionary can be extended to multiple languages remains limited; in particular, the capability to support recognition of tens of millions of place names across many countries and regions cannot meet expectations.
Disclosure of Invention
In view of the above, the present invention aims to provide a method, an apparatus and an electronic device for processing mixed multilingual navigation voice instructions, so as to overcome the drawbacks of handling mixed multilingual navigation voice instructions by constructing pronunciation dictionaries and re-tuning models.
The technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a method for processing a mixed multilingual navigation voice command, including:
Under a navigation scene, the country or region where the user is currently located is predetermined;
Cutting a voice instruction input by a user into a place name section and a non-place name section;
Invoking a voice processing strategy matched with the country or region where the user is currently located, and identifying a voice instruction corresponding to the place name section;
and carrying out navigation intention understanding by combining the recognition results of the place name section and the non-place name section.
In at least one possible implementation manner, predetermining the country or region where the user is currently located includes acquiring, from a navigation map, electronic fence information corresponding to a national border line or regional boundary line.
In at least one possible implementation manner, the processing method further comprises the steps of updating the acquired electronic fence information by using the position information of the user, or dynamically adjusting the period of acquiring the electronic fence information according to the position information, the moving speed information and the distance information between the electronic fence and the user.
In at least one possible implementation manner, the step of cutting the voice command input by the user into the place name section and the non-place name section includes:
Cutting is performed by semantically understanding the speech recognition content, by classifying the speech recognition content, or in the audio dimension by exploiting acoustic differences within the voice instruction.
In at least one possible implementation manner, after the voice instruction corresponding to the place name segment is recognized, it is judged whether the recognition result meets a preset confidence requirement; if not, the voice instruction corresponding to the place name segment is recognized using the same voice processing algorithm as the non-place name segment.
In at least one possible implementation manner, after the voice instruction corresponding to the place name segment is recognized, when it is judged that the recognition result cannot correspond to a place name of the country or region where the user is currently located, the recognition result is corrected and several corrected place names close to the recognition result are provided for the user to confirm.
In at least one possible implementation manner, the processing method further comprises detecting whether the voice instruction input by the user contains a place name before cutting into place name and non-place name segments; if not, the complete input voice instruction is directly recognized and understood using the voice processing strategy corresponding to the subject language.
In a second aspect, the present invention provides a mixed multilingual navigation voice instruction processing apparatus, including:
the region determination module is used for determining the country or region where the user is currently located in advance in a navigation scene;
The instruction cutting module is used for cutting a voice instruction input by a user into a place name section and a non-place name section;
The place name recognition module is used for calling a voice processing strategy matched with the country or region where the user is currently located and recognizing a voice instruction corresponding to the place name section;
and the navigation intention understanding module is used for carrying out navigation intention understanding by combining the identification results of the place name section and the non-place name section.
In a third aspect, the present invention provides an electronic device, comprising:
One or more processors, a memory, and one or more computer programs, wherein the memory may employ a non-volatile storage medium, the one or more computer programs are stored in the memory, and the one or more computer programs comprise instructions which, when executed by the device, cause the device to perform the method of the first aspect or of any possible implementation of the first aspect.
The main conception of the invention is as follows: in a navigation scenario, the country or region where the user is currently located is determined in advance; after the user inputs a voice instruction, the instruction is cut into a place name segment and a non-place name segment; a voice processing strategy matched with the country or region where the user is currently located is invoked to recognize the voice instruction corresponding to the place name segment; and finally navigation intention understanding is carried out by combining the recognition results of the place name segment and the non-place name segment. By judging the current country or region instead of following the conventional positioning approach, a voice processing strategy matched with the local language can be determined in advance, and by cutting the input voice, the input mixed multilingual voice instruction can be recognized in a targeted manner, providing a more reliable and accurate navigation intention understanding result. The method and the device require neither costly dictionary construction nor extensive parameter tuning of existing models, and can handle mixed-language situations in navigation scenarios more economically and efficiently.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of an embodiment of the mixed multilingual navigation voice instruction processing method provided by the present invention;
FIG. 2 is a schematic diagram of an embodiment of the mixed multilingual navigation voice instruction processing device provided by the present invention;
FIG. 3 is a schematic diagram of an embodiment of the electronic device provided by the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Aiming at the situation in which navigation voice instructions mix in languages other than the subject language in a navigation scenario, the present invention provides at least one embodiment of a mixed multilingual navigation voice instruction processing method, as shown in FIG. 1, which specifically comprises the following steps:
Step S1, under a navigation scene, the country or region where the user is currently located is predetermined;
In practical operation, this process can be implemented by various means. For example, coordinate information of the user's position may be used: the longitude and latitude of the user's current position can be obtained through navigation technologies such as satellite positioning. This approach is conventional and poses a low implementation barrier, but it has certain disadvantages: a single coordinate fix may deviate, and in a real application scenario a moving user may not reliably obtain stable coordinate information. In particular, during vehicle driving the user moves relatively fast, and in areas where countries are densely distributed, such as Europe, the country or region where the user is currently located cannot be judged reliably and accurately.
Based on such considerations, the present invention proposes in some preferred embodiments that electronic fence information corresponding to border lines or regional boundaries in the navigation map preferably be used for the subsequent mixed-language processing. In other words, in the concept of the preferred embodiment, the country or region where the user is currently located is determined by the concept of "range" rather than by a single point position, which adapts more reliably to the navigation scenario, especially for specific scenarios such as driving along or across border lines/regional boundaries.
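To make the "range" idea concrete, here is a minimal sketch, assuming the electronic fence arrives as a polygon of (latitude, longitude) vertices and using a standard ray-casting containment test; the patent does not prescribe this particular representation or algorithm.

```python
# Minimal sketch (not the patent's mandated implementation): decide the current
# country/region by testing whether the user's coordinate lies inside an
# electronic-fence polygon fetched from the navigation map.

from typing import Optional

def point_in_fence(lat, lon, fence):
    """Ray-casting point-in-polygon test; fence is a list of (lat, lon) vertices."""
    inside = False
    n = len(fence)
    for i in range(n):
        y1, x1 = fence[i]
        y2, x2 = fence[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):               # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:                      # ray cast toward +longitude
                inside = not inside
    return inside

def current_region(lat, lon, fences) -> Optional[str]:
    """fences: {region_code: fence_polygon}; returns the containing region, if any."""
    for region, fence in fences.items():
        if point_in_fence(lat, lon, fence):
            return region
    return None
```

A real navigation map would supply far more detailed border geometry, but the containment decision, rather than a single point fix, is the essential difference from conventional positioning.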
It may be added that, building on the concept of determining the user's country or region by range, in other preferred embodiments of the invention the acquired electronic fence information can be updated together with the user's position information (which may come from longitude and latitude coordinates), and the period of acquiring the electronic fence information can be dynamically adjusted by further combining the user's moving speed and the distance to the electronic fence. Reliable and accurate switching of the user's country/region can thereby be achieved in certain scenarios, such as cross-border driving: for example, when the user is judged, based on this information, to be approaching a certain country, the electronic fence information may be fetched once every 15 seconds (adjustable as required).
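The dynamic adjustment of the fetch period can be illustrated with a small sketch; the 15-second base period comes from the example above, while the clamping bounds and the divide-by-four refresh rate are purely illustrative assumptions.

```python
# Illustrative sketch: poll the electronic fence more often as the user
# approaches a border, scaling with speed. Thresholds are assumptions.

def fence_poll_period(distance_to_border_m: float, speed_mps: float,
                      base_period_s: float = 15.0) -> float:
    """Shorten the fence-refresh period when the border is near or speed is high."""
    if speed_mps <= 0:
        return base_period_s
    eta_s = distance_to_border_m / speed_mps   # rough time until the border is reached
    # Refresh several times before the predicted crossing, but never slower
    # than the base period nor faster than once per second.
    return max(1.0, min(base_period_s, eta_s / 4.0))
```

For instance, at 30 m/s some 600 m from the border, the sketch polls every 5 seconds instead of every 15, so the region switch is detected promptly at crossing time.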
S2, cutting a voice instruction input by a user into a place name section and a non-place name section;
Those skilled in the art can understand at least two aspects. First, the scenario targeted by the present invention is navigation; accordingly, the user voice received by the system is navigation-related and will, with relatively high probability, include a voice instruction resembling a destination query. Second, the default language of the navigation system is normally the language used by the user (such as the user's native language), which the present invention calls the subject language, indicating that it is the dominant base language for the navigation voice processing.
Based on the above two aspects, in actual operation, after navigation-related voice from the user is received, the input voice can be recognized and transcribed using the default subject language, and the input voice instruction can be split into a place name segment and a non-place name segment according to a predetermined semantic understanding strategy. Because this is mature technology, the invention gives only a schematic introduction: for example, after preprocessing the audio of the input voice, transcription is performed by a speech recognition model of the subject language, word and sentence segmentation is completed, and then, combined with an intention understanding model, the part that is probably a place name and the part that is not are divided according to preset expression templates, semantic slots, and the like. Alternatively, the speech recognition content may be classified: a classifier trained on a set target can divide the recognition content into a place-name part and a non-place-name part. In some preferred embodiments of the invention, considering the two aspects mentioned above, the voice portion that differs from the subject language can preferably be judged directly as the place name segment through differences in the acoustic dimension; this can significantly reduce the overall processing time, and mature speech and acoustic processing schemes in the field are available for reference. The cutting method is not limited to the above, however, and the specific technical tool is not the focus of the present invention.
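As a toy illustration of the acoustic-dimension strategy just described, one might run a per-window language-identification scorer over the audio and take the portion whose dominant language differs from the subject language as the place name segment; `language_id_scores` below is a hypothetical stand-in for whatever language-ID model a real system would use.

```python
# Toy sketch of acoustic-dimension cutting: windows whose top-scoring language
# differs from the subject language are grouped into the place name segment.

from typing import Callable, Sequence

def cut_place_name_segment(
    windows: Sequence[bytes],                      # audio windows, e.g. 200 ms each
    subject_lang: str,                             # system default, e.g. "en"
    language_id_scores: Callable[[bytes], dict],   # hypothetical language-ID scorer
):
    """Return (non_place_name_window_indices, place_name_window_indices)."""
    place, non_place = [], []
    for i, win in enumerate(windows):
        scores = language_id_scores(win)           # e.g. {"en": 0.9, "ru": 0.1}
        best_lang = max(scores, key=scores.get)
        (non_place if best_lang == subject_lang else place).append(i)
    return non_place, place
```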
S3, invoking a voice processing strategy matched with the country or region where the user is currently located, and identifying a voice instruction corresponding to the place name section;
and S4, combining the recognition results of the place name section and the non-place name section to understand the navigation intention.
Regarding the above two steps, it may be pointed out that, as described above, recognition of the non-place name segment can be obtained through the speech processing model associated with the navigation system's default subject language, and in different embodiments the timing of this recognition is not limited. Recognition of the place name segment is preferentially performed using the speech processing model for the current country/region determined in the preceding step. The recognition results of the two are then considered together and may be input into a natural language processing algorithm mature in the art, so as to understand the navigation expectation input by the user and perform corresponding conventional navigation operations such as path planning, which are not repeated in the present invention.
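A compact sketch of steps S3 and S4 under these assumptions: each segment is recognized with its matched model, the texts are spliced in audio order, and the result goes to an intent-understanding component. The recognizer and NLU callables are placeholders, not a real API.

```python
# Sketch of steps S3-S4: per-segment recognition with the matched model,
# then joint intent understanding over the spliced transcript.

from typing import Callable

def understand_navigation(
    segments: list,                                 # [("place" | "non_place", audio bytes)], in time order
    subject_asr: Callable[[bytes], str],            # default (subject-language) recognizer
    region_asr: Callable[[bytes], str],             # recognizer matched to the current region
    nlu: Callable[[str], dict],                     # intent-understanding model
) -> dict:
    texts = []
    for kind, audio in segments:
        texts.append(region_asr(audio) if kind == "place" else subject_asr(audio))
    # e.g. returns {"intent": "navigate", "destination": "..."}
    return nlu(" ".join(texts))
```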
It may be further noted that, in some preferred embodiments of the present invention, considering special situations in real scenarios, a reliable place name recognition result may not be obtained even when the voice processing model of the local country/region is used directly; the following preferred schemes are proposed here to deal with this.
(1) After place name recognition is performed in step S3 using the voice processing strategy matched with the local country/region, it is judged whether the recognition result meets a preset confidence requirement (in actual operation, this can be compared under a scoring mechanism). If yes, step S4 is executed; if not, the voice instruction corresponding to the place name segment is recognized using the same voice processing algorithm as the non-place name segment, and step S4 is then executed. That is, the standpoint of this embodiment is that, if the voice processing model of the local country/region cannot produce a recognition of the place name segment whose confidence meets the requirement, the place name segment is still, with high probability, in the subject language, so the voice processing model used for the non-place name segment is invoked to recognize the place name segment.
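Scheme (1) reduces to a simple threshold check; the following sketch assumes the recognizers return a (text, confidence) pair and uses a 0.6 threshold, both of which are illustrative assumptions rather than values fixed by the patent.

```python
# Sketch of the confidence fallback in scheme (1): if the region-matched model
# scores the place name segment below a threshold, re-recognize it with the
# subject-language model used for non-place-name segments.

def recognize_place_segment(audio: bytes, region_asr, subject_asr,
                            min_confidence: float = 0.6) -> str:
    text, confidence = region_asr(audio)        # region model returns a scored hypothesis
    if confidence >= min_confidence:
        return text
    fallback_text, _ = subject_asr(audio)       # segment was likely in the subject language
    return fallback_text
```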
(2) After place name recognition is performed in step S3 using the voice processing strategy matched with the local country/region, if it is judged that the recognition result cannot correspond to any place name of that country/region, it is preferable to correct the recognition result and provide the corrected local place name (the one closest to the recognition result) or several local place names (the Top N closest to the recognition result) for the user to confirm, and then execute step S4. The corrected place names may be provided as voice and/or text output, and the correction of the recognition result may follow existing voice and text correction methods. That is, the standpoint of this embodiment is that, if the place name segment recognition result obtained by the voice processing model of the local country/region cannot be matched exactly to a local place name, possibly because the user pronounced the local language inaccurately, a strategy of assisting the user with correction is adopted. The idea of matching the recognition result against local place names is also mentioned here: in other embodiments of the present invention, though without limitation, a place name library for different countries/regions may be built in advance in the local language, including cities, scenic spots, roads, and the addresses of buildings; when the recognition result of the place name segment is obtained through the voice processing model of the local country/region, a matching operation can be performed against this place name library.
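Scheme (2) can be sketched as a Top-N lookup against such a place name library; difflib's string similarity is used here purely for illustration, whereas a production system would more plausibly match on phonetic or embedding distance.

```python
# Sketch of scheme (2): match the recognized place-name text against a local
# place-name library and, absent an exact hit, offer the Top-N closest entries
# for the user to confirm.

import difflib

def correct_place_name(recognized: str, place_library: list, top_n: int = 3) -> list:
    if recognized in place_library:
        return [recognized]                       # exact match, no correction needed
    # Closest local place names by string similarity, for user confirmation.
    return difflib.get_close_matches(recognized, place_library, n=top_n, cutoff=0.0)
```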
Finally, it may be added that step S2 above assumes the voice instruction input by the user relates to an address in the navigation scenario, but in practice there are navigation voice instructions that contain no place name. In other preferred schemes, therefore, the invention proposes detecting whether the voice instruction input by the user contains a place name before cutting into place name and non-place name segments; for example, whether the instruction contains place name content can be preliminarily determined by conventional, mature techniques such as language recognition, understanding, and classification. If the preliminary determination is that it does not, the voice processing strategy corresponding to the system's default subject language can be used directly to recognize the complete input voice instruction and understand its intent, skipping the segmentation step and the invocation of the country/region voice processing algorithm, so that the user's request can be answered more quickly.
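This fast path amounts to a single branch before the full pipeline; `contains_place_name` below is a hypothetical stand-in for the conventional detection/classification step the text mentions.

```python
# Sketch of the fast path: skip segmentation and the region model entirely
# when the instruction mentions no place name.

def handle_instruction(audio: bytes, contains_place_name, subject_pipeline, mixed_pipeline):
    if not contains_place_name(audio):
        return subject_pipeline(audio)   # recognize + understand in the subject language only
    return mixed_pipeline(audio)         # full flow: cut, region model, combine, understand
```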
Based on the above embodiments, and in conjunction with the examples mentioned earlier, a reference scenario is provided here to help describe the working process of the solution: a native English-speaking user is driving locally in Russia. When the navigation application is used, the navigation system, whose default subject language is English, determines from the electronic fence characterizing the Russian border line, provided by the high-precision map module in the current navigation, that the user is currently located in Russia (the core goal of this process is to have ready the corresponding language recognition model required later; in implementation, the Russian recognition model may be pre-selected for subsequent invocation). When the user inputs the mixed multilingual voice instruction "Navigate to (English) Красная площадь (Russian, meaning Red Square)", the navigation system distinguishes the English part from the non-English part at the acoustic level, determines the English part as the non-place name segment according to the predetermined policy, and determines the non-English part as the place name segment.
Then, based on the prior determination via the electronic fence that the user is currently in Russia, the pre-built Russian speech recognition model is invoked to perform speech recognition on the audio segment corresponding to the place name segment "Красная площадь"; the non-place name segment, of course, can be recognized directly with the navigation system's default-language model, i.e., the English speech recognition model.
Finally, the recognized texts output by the two speech recognition models can be spliced according to the audio time order of the input voice and sent to a pre-built natural language understanding algorithm for text-based semantic analysis. It may be added that the natural language understanding process can be language-independent: in actual operation, the intention understanding result can be obtained simply by collecting data sets of different languages, according to real requirements, to train existing intention understanding models. To simplify the training of the natural language understanding algorithm, the recognized text "Navigate to Красная площадь" can also be converted into "Navigate to Red Square" by a preset Russian translation service and then sent to an intention understanding model trained in advance on English samples only. Through this whole flow, the navigation system can determine that the user expects to drive from the current location to Red Square in Moscow, and then provides the corresponding planned route and other relevant navigation information.
In summary, the main concept of the present invention is to predetermine the country or region where the user is currently located in a navigation scenario; after the user inputs a voice instruction, to cut the instruction into a place name segment and a non-place name segment; to invoke a voice processing strategy matched with the country or region where the user is currently located and recognize the voice instruction corresponding to the place name segment; and finally to combine the recognition results of the place name segment and the non-place name segment to understand the navigation intention. By judging the current country or region instead of following the conventional positioning approach, a voice processing strategy matched with the local language can be determined in advance, and by cutting the input voice, the input mixed multilingual voice instruction can be recognized in a targeted manner, providing a more reliable and accurate navigation intention understanding result. The invention requires neither costly construction of a pronunciation dictionary for speech recognition nor extensive parameter tuning of existing speech recognition models, and can handle mixed-language situations in navigation scenarios more economically and efficiently.
Corresponding to the above embodiments and preferred solutions, the present invention further provides an embodiment of a mixed multilingual navigation voice instruction processing device, as shown in FIG. 2, which may specifically include the following components:
The region determination module 1 is used for predetermining, in a navigation scenario, the country or region where the user is currently located;
the instruction cutting module 2 is used for cutting a voice instruction input by a user into a place name section and a non-place name section;
The place name recognition module 3 is used for calling a voice processing strategy matched with the country or region where the user is currently located and recognizing a voice instruction corresponding to the place name section;
And the navigation intention understanding module 4 is used for carrying out navigation intention understanding by combining the identification results of the place name section and the non-place name section.
It should be understood that the division of components in the mixed multilingual navigation voice instruction processing apparatus shown in FIG. 2 is merely a division by logical function; in actual implementation, the components may be fully or partially integrated into one physical entity or kept physically separate. All of these components may be realized as software invoked through a processing element, all as hardware, or some as software invoked through a processing element and the rest as hardware. For example, some of the above modules may be separately established processing elements or may be integrated into a chip of the electronic device; the implementation of the other components is similar. In addition, all or part of the components can be integrated together or realized independently. In implementation, each step of the above method, or each component above, can be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above components may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs), and the like. For another example, these components may be integrated together and implemented in the form of a system-on-a-chip (SoC).
In view of the foregoing examples and preferred embodiments thereof, it will be appreciated by those skilled in the art that in actual operation, the technical concepts of the present invention may be applied to various embodiments, and the present invention is schematically illustrated by the following carriers:
(1) An electronic device. The apparatus may in particular comprise one or more processors, a memory and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the apparatus, cause the apparatus to perform the steps/functions of the foregoing embodiments or equivalent implementations.
The electronic device may be a computer-related device, such as, but not limited to, various interactive terminals, electronic products, and mobile terminals.
FIG. 3 is a schematic structural diagram of an embodiment of an electronic device according to the present invention. Specifically, the electronic device 900 includes a processor 910 and a memory 930, which may communicate with each other via an internal connection and transfer control and/or data signals. The memory 930 is configured to store a computer program, and the processor 910 is configured to call and execute the computer program from the memory 930. The processor 910 and the memory 930 may be combined into a single processing device or, more commonly, be separate components, with the processor 910 executing the program code stored in the memory 930 to implement the functions described above. In particular implementations, the memory 930 may also be integrated within the processor 910 or separate from it.
In addition, to further improve the functionality of the electronic device 900, the device 900 may further comprise one or more of an input unit 960, a display unit 970, audio circuitry 980, a camera 990, a sensor 901, and the like, where the audio circuitry may further comprise a speaker 982, a microphone 984, and so on. The display unit 970 may include a display screen.
Further, the apparatus 900 may also include a power supply 950 for providing electrical power to various devices or circuits in the apparatus 900.
It should be appreciated that the operation and/or function of the various components in the apparatus 900 may be found in particular in the foregoing description of embodiments of the method, system, etc., and detailed descriptions thereof are omitted here as appropriate to avoid redundancy.
It should be appreciated that the processor 910 in the electronic device 900 shown in FIG. 3 may be a system-on-a-chip (SoC), and the processor 910 may include a central processing unit (CPU) as well as other types of processors, such as a graphics processing unit (GPU), and so on.
In general, portions of the processors or processing units within the processor 910 may cooperate to implement the preceding method flows, and corresponding software programs for the portions of the processors or processing units may be stored in the memory 930.
(2) A computer data storage medium having stored thereon a computer program or the above-mentioned means which, when executed, causes a computer to perform the steps/functions of the foregoing embodiments or equivalent implementations.
In the several embodiments provided by the present invention, any of the functions, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer data storage medium. Based on this understanding, the essence of certain aspects of the present invention, or the part that contributes to the prior art, may be embodied in the form of a software product as described below.
It should especially be pointed out that the storage medium may be, in particular, a storage means in a server or similar computer device, in which the aforementioned computer program or the aforementioned apparatus is stored.
(3) A computer program product (which may comprise the apparatus described above) which, when run on a terminal device, causes the terminal device to perform the hybrid multilingual navigation voice instruction processing method of the foregoing embodiment or equivalent.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described methods may be implemented in software plus necessary general purpose hardware platforms. Based on such understanding, the above-described computer program product may include, but is not limited to, an APP.
The above device/terminal may be a computer device, whose hardware structure may further specifically include at least one processor, at least one communication interface, at least one memory, and at least one communication bus; the processor, the communication interface, and the memory all communicate with each other through the communication bus. The processor may be a central processing unit (CPU), a DSP, a microcontroller, or a digital signal processor, and may further include a GPU, an embedded neural-network processing unit (NPU), and an image signal processor (ISP); it may further include an ASIC, or one or more integrated circuits configured to implement the embodiments of the present invention, and may additionally be capable of running one or more software programs stored in a storage medium such as a memory. The aforementioned memory/storage medium may include non-volatile memory (such as a non-removable disk, a USB flash drive, a removable hard disk, or an optical disc), a read-only memory (ROM), a random access memory (RAM), and the like.
In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions mean any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, and c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or plural.
Those of skill in the art will appreciate that the various modules, units, and method steps described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, and combinations of electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The modules and units illustrated as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple places, such as nodes of a system network. Some or all of the modules and units may be selected according to actual needs to achieve the purpose of the embodiment's scheme. Those skilled in the art can understand and implement this without creative effort.
The construction, features, and effects of the present invention have been described in detail above with reference to the embodiments shown in the drawings. The above are only preferred embodiments of the present invention, and the scope of the invention is not limited by what is shown in the drawings: any change made according to the conception of the present invention, or any equivalent embodiment that does not depart from the design concept and technical effect of the invention and does not go beyond the spirit covered by the specification and drawings, falls within the protection scope of the present invention.

Claims (8)

Translated from Chinese

1. A mixed multilingual navigation voice instruction processing method, characterized by comprising:
in a navigation scenario, predetermining the country or region where the user is currently located, for selecting the voice processing strategy of that country or region for subsequent invocation;
cutting the voice instruction input by the user into a place name segment and a non-place name segment, including cutting by means of semantic understanding of the speech recognition content;
invoking a voice processing strategy matched with the country or region where the user is currently located, recognizing the voice instruction corresponding to the place name segment, and judging whether the recognition result meets a preset confidence requirement; if not, recognizing the voice instruction corresponding to the place name segment using the same voice processing algorithm as the non-place name segment;
carrying out navigation intention understanding by combining the recognition results of both the place name segment and the non-place name segment.

2. The mixed multilingual navigation voice instruction processing method according to claim 1, characterized in that predetermining the country or region where the user is currently located comprises: acquiring, from a navigation map, electronic fence information corresponding to a national border line or regional boundary line.

3. The mixed multilingual navigation voice instruction processing method according to claim 2, characterized in that the processing method further comprises: updating the acquired electronic fence information using the user's position information; or dynamically adjusting the period of acquiring the electronic fence information according to the user's position information, moving speed information, and distance information from the electronic fence.

4. The mixed multilingual navigation voice instruction processing method according to claim 1, characterized in that, after the voice instruction corresponding to the place name segment is recognized, when it is judged that the recognition result cannot correspond to a place name of the current country or region because the user pronounced the local language inaccurately, the recognition result is corrected and several corrected place names close to the recognition result are provided for the user to confirm.

5. The mixed multilingual navigation voice instruction processing method according to any one of claims 1 to 4, characterized in that the processing method further comprises: before the place name and non-place name segments are cut, detecting whether the voice instruction input by the user contains a place name; if not, directly adopting the voice processing strategy corresponding to the subject language to recognize and understand the intention of the complete input voice instruction.

6. A mixed multilingual navigation voice instruction processing apparatus, characterized by comprising:
a region determination module, used for predetermining, in a navigation scenario, the country or region where the user is currently located, for selecting the voice processing strategy of that country or region for subsequent invocation;
an instruction cutting module, used for cutting the voice instruction input by the user into a place name segment and a non-place name segment, including cutting by means of semantic understanding of the speech recognition content;
a place name recognition module, used for invoking a voice processing strategy matched with the country or region where the user is currently located, recognizing the voice instruction corresponding to the place name segment, and judging whether the recognition result meets a preset confidence requirement; if not, recognizing the voice instruction corresponding to the place name segment using the same voice processing algorithm as the non-place name segment;
a navigation intention understanding module, used for carrying out navigation intention understanding by combining the recognition results of both the place name segment and the non-place name segment.

7. An electronic device, characterized by comprising: one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions which, when executed by the electronic device, cause the electronic device to execute the mixed multilingual navigation voice instruction processing method according to any one of claims 1 to 5.

8. A computer data storage medium, characterized in that a computer program is stored in the computer data storage medium, and when the computer program runs on a computer, the computer is caused to execute the mixed multilingual navigation voice instruction processing method according to any one of claims 1 to 5.
CN202310085223.1A · 2023-01-17 (priority) · 2023-01-17 (filed) · Hybrid multilingual navigation voice instruction processing method, device and electronic device · Active · CN116092492B (en)

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN202310085223.1A (CN116092492B, en) · 2023-01-17 · 2023-01-17 · Hybrid multilingual navigation voice instruction processing method, device and electronic device

Applications Claiming Priority (1)

Application Number · Priority Date · Filing Date · Title
CN202310085223.1A (CN116092492B, en) · 2023-01-17 · 2023-01-17 · Hybrid multilingual navigation voice instruction processing method, device and electronic device

Publications (2)

Publication Number · Publication Date
CN116092492A (en) · 2023-05-09
CN116092492B · 2025-07-15 (granted)

Family

ID=86204240

Family Applications (1)

Application Number · Status · Publication · Priority Date · Filing Date · Title
CN202310085223.1A · Active · CN116092492B (en) · 2023-01-17 · 2023-01-17 · Hybrid multilingual navigation voice instruction processing method, device and electronic device

Country Status (1)

Country · Link
CN (1) · CN116092492B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication Number · Priority Date · Publication Date · Assignee · Title
CN117423260B (en) * · 2023-12-19 · 2024-03-12 · 杭州智慧耳朵科技有限公司 · Auxiliary teaching method based on classroom speech recognition and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication Number · Priority Date · Publication Date · Assignee · Title
CN104282302A (en) * · 2013-07-04 · 2015-01-14 · 三星电子株式会社 · Apparatus and method for recognizing speech and text
CN107093277A (en) * · 2017-03-28 · 2017-08-25 · 北京途自在物联科技有限公司 · A kind of new shared bicycle management method and its system
CN107112007A (en) * · 2014-12-24 · 2017-08-29 · 三菱电机株式会社 · Speech recognition equipment and audio recognition method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication Number · Priority Date · Publication Date · Assignee · Title
US8140335B2 (en) * · 2007-12-11 · 2012-03-20 · Voicebox Technologies, Inc. · System and method for providing a natural language voice user interface in an integrated voice navigation services environment
DE102014210716A1 (en) * · 2014-06-05 · 2015-12-17 · Continental Automotive Gmbh · Assistance system, which is controllable by means of voice inputs, with a functional device and a plurality of speech recognition modules
JP6385442B2 (en) * · 2014-08-21 · 2018-09-05 · 三菱電機株式会社 · Map information processing system and map information processing method
KR101619966B1 (en) * · 2014-09-24 · 2016-05-11 · 엠앤서비스 주식회사 · Voice analyzing apparatus, method and system for guiding route
JP6896335B2 (en) * · 2017-05-30 · 2021-06-30 · アルパイン株式会社 · Speech recognition device and speech recognition method
WO2021019775A1 (en) * · 2019-08-01 · 2021-02-04 · 三菱電機株式会社 · Multilingual voice recognition device and multilingual voice recognition method
CN113571040B (en) * · 2021-01-15 · 2025-09-09 · 腾讯科技(深圳)有限公司 · Voice data recognition method, device, equipment and storage medium
CN115223563B (en) * · 2021-09-16 · 2023-09-15 · 广州汽车集团股份有限公司 · Vehicle navigation voice interaction method, device and storage medium
CN114242059A (en) * · 2021-12-23 · 2022-03-25 · 广州小鹏汽车科技有限公司 · Voice interaction method, vehicle, server, voice system and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication Number · Priority Date · Publication Date · Assignee · Title
CN104282302A (en) * · 2013-07-04 · 2015-01-14 · 三星电子株式会社 · Apparatus and method for recognizing speech and text
CN107112007A (en) * · 2014-12-24 · 2017-08-29 · 三菱电机株式会社 · Speech recognition equipment and audio recognition method
CN107093277A (en) * · 2017-03-28 · 2017-08-25 · 北京途自在物联科技有限公司 · A kind of new shared bicycle management method and its system

Also Published As

Publication Number · Publication Date
CN116092492A (en) · 2023-05-09

Similar Documents

Publication · Title
US11328708B2 (en) · Speech error-correction method, device and storage medium
US7386440B2 (en) · Method, system, and apparatus for natural language mixed-initiative dialogue processing
CN106875949B (en) · Correction method and device for voice recognition
CN110276023B (en) · POI transition event discovery method, apparatus, computing device and medium
US20190087455A1 (en) · System and method for natural language processing
WO2020258502A1 (en) · Text analysis method and apparatus, computer apparatus and computer storage medium
JP2020505643A (en) · Voice recognition method, electronic device, and computer storage medium
KR102372069B1 (en) · Free dialogue system and method for language learning
JP2020004382A (en) · Method and device for voice interaction
CN113326702B (en) · Semantic recognition method, semantic recognition device, electronic equipment and storage medium
CN111128181B (en) · Recitation question evaluating method, recitation question evaluating device and recitation question evaluating equipment
WO2017177809A1 (en) · Word segmentation method and system for language text
US20190005950A1 (en) · Intention estimation device and intention estimation method
US10741178B2 (en) · Method for providing vehicle AI service and device using the same
CN111753524A (en) · Text sentence break position identification method and system, electronic device and storage medium
CN116092492B (en) · Hybrid multilingual navigation voice instruction processing method, device and electronic device
CN110926493A (en) · Navigation method, navigation device, vehicle and computer readable storage medium
KR102017229B1 (en) · A text sentence automatic generating system based deep learning for improving infinity of speech pattern
CN114038451A (en) · Quality inspection method and device for dialogue data, computer equipment and storage medium
CN110020429A (en) · Method for recognizing semantics and equipment
CN113035200A (en) · Voice recognition error correction method, device and equipment based on human-computer interaction scene
CN111368145A (en) · Knowledge graph creating method and system and terminal equipment
CN116311312A (en) · Training method of visual question-answering model and visual question-answering method
KR20130014473A (en) · Speech recognition system and method based on location information
CN114241279B (en) · Image-text joint error correction method, device, storage medium, and computer equipment

Legal Events

Code · Title
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant
