CN106293600A


Info

Publication number: CN106293600A
Application number: CN201610641425.XA
Authority: CN
Inventors: 张瀚林
Original assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Current assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Priority date: 2016-08-05
Filing date: 2016-08-05
Publication date: 2017-01-04

Abstract

Translated from Chinese

The invention discloses a voice control method for controlling an APP. The method comprises: A. in response to the user's operations on APP interface controls, intercepting the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs; B. establishing a uniquely corresponding speech recognition tag for the action corresponding to each operation and the coordinate position at which it occurs, forming a tag record; C. according to the content of the speech recognition tag read aloud by the user, looking up the action corresponding to that tag and the coordinate position on the APP interface at which the action occurs; D. executing the corresponding action at that coordinate position on the APP interface. The invention also discloses a voice control system. The invention makes it possible to control and operate every interface of a third-party program.

Description


Voice control method and system

Technical field

The invention relates to the field of computers, and in particular to a voice control method and system.

Background art

Voice assistants bring many conveniences. Through a voice assistant, a user can open the computer application programs (APPs) installed on the system by voice.

At present, the voice assistant software offered by most popular vendors does not support voice control of third-party software well: it can only perform simple operations such as opening a third-party application, and cannot control and operate each interface inside the third-party program. There are also inventions that extract interface element labels, save them to a runtime library, and match those labels during speech recognition to perform predefined actions. Such inventions need to extract interface element labels on the one hand and to predefine basic operations on the other. When interface elements are similar or identical, different elements can easily end up with the same label; when an interface element's label is absent or is not text, the label cannot be extracted at all. Moreover, because the basic operations must be predefined, such inventions can only perform predefined actions.

Summary of the invention

The purpose of the present invention is to provide a voice control method and system capable of controlling and operating each interface in a third-party program.

To achieve the above purpose, the present invention provides a voice control method for controlling a computer application program (APP), the method comprising:

A. in response to the user's operations on APP interface controls, intercepting the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs;

B. establishing a uniquely corresponding speech recognition tag for the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs, forming a tag record;

C. according to the content of the speech recognition tag read aloud by the user, looking up the action corresponding to that speech recognition tag and the coordinate position on the APP interface at which the action occurs;

D. executing the corresponding action at that coordinate position on the APP interface.
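Steps A to D can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `TagRecord`, `TagStore`, `execute` and the `inject` callback are hypothetical, the coordinates are placeholders, and a real system would inject the action through the platform's input facilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRecord:
    tag: str     # text of the user-defined speech recognition tag
    action: str  # intercepted action, e.g. "Click"
    x: int       # coordinate position on the APP interface
    y: int


class TagStore:
    """Hypothetical in-memory stand-in for the tag record database."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        # Step B: each tag must correspond to exactly one action and position.
        if record.tag in self._records:
            raise ValueError(f"tag {record.tag!r} is already bound")
        self._records[record.tag] = record

    def find(self, spoken_text):
        # Step C: look up the record by the recognized tag text.
        return self._records.get(spoken_text)


def execute(record, inject):
    # Step D: perform the stored action at the stored position.
    # `inject` is an assumed callback standing in for the system's input injection.
    inject(record.action, record.x, record.y)


performed = []
store = TagStore()
# Steps A and B: an intercepted click at placeholder coordinates gets a tag.
store.add(TagRecord("Button1", "Click", 120, 340))
hit = store.find("Button1")
if hit is not None:
    execute(hit, lambda action, x, y: performed.append((action, x, y)))
print(performed)
```

Rejecting a second binding for an existing tag is one way to enforce the "uniquely corresponding" requirement of step B; the source does not say how uniqueness is enforced.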

To achieve the above purpose, the present invention also provides a voice control system for controlling a computer application program (APP), the system comprising:

an interception module, which, in response to the user's operations on APP interface controls, intercepts the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs;

a tag recognition module, which establishes a uniquely corresponding speech recognition tag for the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs, forming a tag record, and which, according to the content of the speech recognition tag read aloud by the user, looks up the action corresponding to that tag and the coordinate position on the APP interface at which the action occurs;

an action control module, which executes the corresponding action at that coordinate position on the APP interface.

In summary, the voice control method and device provided by the embodiments of the present invention use speech recognition to define a custom speech recognition tag for each action at the moment the action is intercepted. In this way the system captures each of the user's operation actions and the screen position at which it occurs. Moreover, because the speech recognition tags are user-defined, the cases of identical tags and of tags that cannot be obtained are avoided entirely. In addition, the invention works by capturing actions rather than by recognizing operable areas in screenshots, so it needs neither to store a large number of pictures nor to apply image recognition to find the operable area of each picture. It therefore avoids the storage use, reduced execution efficiency and wasted power that such approaches incur.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a voice control method according to a preferred embodiment of the present invention.

FIG. 2 is a schematic structural diagram of a voice control system according to an embodiment of the present invention.

Detailed description

To make the purpose, technical solution and advantages of the present invention clearer, the solution is described in further detail below with reference to the accompanying drawings and embodiments.

The voice control method of the present invention mainly comprises two stages: the first is the generation of speech recognition tags, and the second is speech recognition control.

In the first stage, the user opens the voice assistant software and uses it to open a third-party APP. The user then operates the interface controls on the third-party APP's interface while the voice assistant software runs in the background, capturing, intercepting and recording the action of each step of the user's operation (for example, clicking a button) and the coordinate position (X, Y) on the screen at which the action occurs. The user then defines a custom speech recognition tag for the action, and the text obtained by speech recognition is stored in the database as the content of that tag. This completes the creation of one speech recognition tag.

In the second stage, the tag records stored in the database are used to display the content of each speech recognition tag at a suitable coordinate position near the corresponding operable control in the third-party APP view. When the user reads aloud the tag content shown for an interface control, speech recognition yields the corresponding text tag, which is matched against the database to obtain the action bound to the tag and the position on the screen at which the action occurs. With this information, the voice assistant software directs the system to perform the associated action automatically at that coordinate position on the screen, thereby achieving voice control of the third-party APP.

FIG. 1 is a schematic flowchart of a voice control method according to a preferred embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps:

A1. obtaining the APP name and the speech recognition tag page number of the current page on which the APP interface controls are located, and adding them to the tag record;

Here, an APP interface control is a visual graphical "component" that can be placed on a form, such as a button or a file edit box. Most controls either perform a function themselves or respond to an "event" by triggering code that runs and completes the response.

B1. calculating, from the coordinate position on the APP interface at which the action occurs, the coordinate position at which the tag record is displayed on the APP interface, and adding that display position to the tag record;

C1. according to the APP name and the speech recognition tag page number of the current page, finding all tag records that match that APP name and page number, and displaying each tag record at its corresponding coordinate position on the APP interface.

This completes the voice control method of the present invention. The first stage, the generation of speech recognition tags, comprises steps A1, A, B and B1; the second stage, speech recognition control, comprises steps C1, C and D. Note that the preferred embodiment adds a speech recognition tag page number to the speech recognition tag, and that each page number corresponds to one page of the APP interface. When two speech recognition tags are the same, the page number distinguishes the actions and action positions of the different tag records. Conversely, if the custom tags are defined so that every tag name is different and each name uniquely corresponds to one action and its position, the page number is not needed.
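The role of the page number can be illustrated with a small sketch. It assumes, purely for illustration, that tag records live in a dictionary; `bind` and `DuplicateTagError` are hypothetical names and the coordinates are placeholders. Keying on the tag name alone makes a repeated name clash, while keying on (APP name, page number, tag) keeps the two records apart.

```python
class DuplicateTagError(Exception):
    """Raised when a key is already bound to an action and position."""


def bind(records, key, action, position):
    # Refuse to bind the same key twice so each key maps to one action.
    if key in records:
        raise DuplicateTagError(key)
    records[key] = (action, position)


# Keyed by tag name only: reusing "Button1" on another page clashes.
by_name = {}
bind(by_name, "Button1", "Click", (40, 300))
try:
    bind(by_name, "Button1", "Click", (200, 120))
    clash = False
except DuplicateTagError:
    clash = True

# Keyed by (APP name, page number, tag): the same name is fine on two pages.
by_page = {}
bind(by_page, ("APP_XXX", 1, "Button1"), "Click", (40, 300))
bind(by_page, ("APP_XXX", 2, "Button1"), "Click", (200, 120))

print(clash, len(by_page))
```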

Further, after step B1 the method also comprises step B2: for the next page of the APP interface to which the user's operation on the APP interface control jumps, obtaining the next-page speech recognition tag page number, adding it to the current tag record, and also adding it to a new tag record; steps A1, A, B and B1 are then repeated to form all the tag records that match the next-page speech recognition tag page number.
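The bookkeeping in step B2 might look like the following sketch. The `TagRecord` fields and the `Recorder` class are assumptions made for illustration, and the coordinates are placeholders; the source only states that the next-page number is written into the current record and that new records are started for the next page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagRecord:
    app: str
    page: int                        # page number of the page the control is on
    tag: str
    action: str
    position: tuple
    next_page: Optional[int] = None  # filled in by step B2 if the action jumps


class Recorder:
    """Hypothetical recorder for the tag generation stage."""

    def __init__(self, app, first_page=1):
        self.app = app
        self.page = first_page
        self.records = []

    def record(self, tag, action, position):
        # Steps A1, A and B: bind the tag on the current page.
        entry = TagRecord(self.app, self.page, tag, action, position)
        self.records.append(entry)
        return entry

    def page_changed(self, entry, new_page):
        # Step B2: link the record that caused the jump to the next page,
        # then continue recording against that page.
        entry.next_page = new_page
        self.page = new_page


recorder = Recorder("APP_XXX")
first = recorder.record("Button1", "Click", (40, 300))
recorder.page_changed(first, 2)
recorder.record("Button1", "Click", (200, 120))
recorder.record("Button2", "Click", (200, 180))
print([(r.page, r.tag, r.next_page) for r in recorder.records])
```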

Step C also includes checking whether the current speech recognition tag contains a next-page speech recognition tag page number. If it does, then after step D is executed the method moves to that next page number and repeats steps C1, C and D, so that actions are executed on the APP interface corresponding to the next-page speech recognition tag page number.

Based on the same inventive concept, the present invention provides a voice control system for controlling an APP. As shown in FIG. 2, the system comprises:

an interception module 201, which, in response to the user's operations on APP interface controls, intercepts the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs;

a tag recognition module 202, which establishes a uniquely corresponding speech recognition tag for the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs, forming a tag record, and which, according to the content of the speech recognition tag read aloud by the user, looks up the action corresponding to that tag and the coordinate position on the APP interface at which the action occurs;

an action control module 203, which executes the corresponding action at that coordinate position on the APP interface.

The tag recognition module 202 is further configured, before the interception module intercepts the action corresponding to each operation and its coordinate position, to obtain the APP name and the speech recognition tag page number of the current page on which the APP interface controls are located, and to add them to the tag record.

The tag recognition module 202 is further configured, after the uniquely corresponding speech recognition tag has been established and the tag record formed, to calculate from the coordinate position at which the action occurs the coordinate position at which the tag record is displayed on the APP interface, and to add that display position to the tag record.

The tag recognition module 202 is further configured, before looking up the action and coordinate position corresponding to the speech recognition tag read aloud by the user, to find, according to the APP name and the speech recognition tag page number of the current page, all tag records that match that APP name and page number, and to display each tag record at its corresponding coordinate position on the APP interface.

The tag recognition module 202 is further configured, after the display position has been added to the tag record, to obtain the next-page speech recognition tag page number for the next page of the APP interface to which the user's operation on the APP interface control jumps, to add that page number to the current tag record, and to add it to a new tag record.

The tag recognition module 202 is further configured, when looking up the action and coordinate position corresponding to the speech recognition tag read aloud by the user, to check whether the current speech recognition tag contains a next-page speech recognition tag page number and, if it does, to move to that next page number after the action control module 203 has executed the corresponding action at the coordinate position on the APP interface.

The system further comprises a speech recognition module 204, which receives the speech recognition tag read aloud by the user, converts it into a text speech recognition tag, and sends it to the tag recognition module 202, which establishes the uniquely corresponding speech recognition tag for the action corresponding to each operation and the coordinate position on the APP interface at which the action occurs.

To explain the invention clearly, it is analysed below stage by stage. The aim of the voice control method is to control a third-party APP.

Stage 1: generation of speech recognition tags

(1) When APP_XXX needs to be opened, the user, with the voice control system already running, says "open APP_XXX";

(2) The speech recognition module recognizes the speech and opens APP_XXX. The page 1 interface of APP_XXX is opened by default;

(3) The tag recognition module obtains the APP name "APP_XXX" and the speech recognition tag page number 1 corresponding to the page 1 interface, and adds them to the tag record;

(4) The voice control system shows a pop-up asking whether the user wants to record a speech recognition tag, and the user chooses to record one.

(5) The user operates an APP interface control on the page 1 interface. Assuming the control is a button, the user clicks it; the button's click event is intercepted by the interception module, which obtains the click action (Click) and the coordinate position (X0, Y0) on the page 1 interface at which the click occurs, and sends them to the tag recognition module to be added to the tag record;

(6) At the same time, the speech recognition module is started and the user reads aloud a custom speech recognition tag, "Button1". After recognizing the spoken "Button1", the speech recognition module generates the text speech recognition tag "Button1" and sends it to the tag recognition module, which adds it to the tag record, establishing a unique correspondence between "Button1" and "Click" and (X0, Y0).

In addition, the tag recognition module calculates the display position (x0, y0) of the tag record from the click coordinates (X0, Y0) and adds it to the tag record. (x0, y0) is generally placed near (X0, Y0), so that the user can clearly match each speech recognition tag to its tag record.
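The source says only that the display position (x0, y0) is generally placed near the click position (X0, Y0); it gives no formula. One possible rule, shown here as an assumption, is to offset the label slightly above and to the right of the click point and clamp it to the screen. The function name `label_position`, the label size and the gap are all hypothetical.

```python
def label_position(click_x, click_y, screen_w, screen_h,
                   label_w=80, label_h=24, gap=8):
    """Return a display position near the click point that stays on screen."""
    # Place the label just right of and above the click point.
    x = click_x + gap
    y = click_y - label_h - gap
    # Clamp so the whole label remains inside the screen bounds.
    x = max(0, min(x, screen_w - label_w))
    y = max(0, min(y, screen_h - label_h))
    return x, y


print(label_position(100, 200, 320, 480))  # click in the middle of the screen
print(label_position(310, 5, 320, 480))    # click near the top-right corner
```

Clamping matters most on small screens such as the smart watches the description mentions, where a fixed offset would often push the label off the display.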

The tag record generated above is shown in Table 1:

Table 1

(7) After the tag record has been generated, the click event of "Button1" continues to execute, and the page jumps to the page 2 interface;

(8) The user says "page 2". After recognizing the spoken "page 2", the speech recognition module sends it to the tag recognition module, which obtains the speech recognition tag page number 2 corresponding to the page 2 interface and appends it to the tag record of Table 1 as the next-page speech recognition tag page number to which the current page jumps, as shown in Table 1'. A new tag record is also created, and page number 2 is added to it.

Table 1'

Next, the speech recognition tags of the page 2 interface are generated in the same way as the tag "Button1" of the page 1 interface.

(9) The voice control system shows a pop-up asking whether the user wants to record a speech recognition tag, and the user chooses to record one.

(10) When the user clicks a button on the page 2 interface, the button's click event is intercepted by the interception module, which obtains the click action (Click) and the coordinate position (X1, Y1) on the page 2 interface at which the click occurs, and sends them to the tag recognition module to be added to the new tag record;

At the same time, the speech recognition module is started and the user reads aloud a custom speech recognition tag, "Button1". After recognizing the spoken "Button1", the speech recognition module generates the text speech recognition tag "Button1" and sends it to the tag recognition module, which adds it to the tag record, establishing a unique correspondence between "Button1" and "Click" and (X1, Y1).

In addition, the tag recognition module calculates the display position (x1, y1) of the tag record from the click coordinates (X1, Y1) and adds it to the new tag record. (x1, y1) is generally placed near (X1, Y1), so that each speech recognition tag can be matched to its tag record.

The tag record generated above is shown in Table 2:

Table 2

(11) When the user clicks another button on the page 2 interface, the button's click event is intercepted by the interception module, which obtains the click action (Click) and the coordinate position (X2, Y2) on the page 2 interface at which the click occurs, and sends them to the tag recognition module to be added to a new tag record;

At the same time, the speech recognition module is started and the user reads aloud a custom speech recognition tag, "Button2". After recognizing the spoken "Button2", the speech recognition module generates the text speech recognition tag "Button2" and sends it to the tag recognition module, which adds it to the tag record, establishing a unique correspondence between "Button2" and "Click" and (X2, Y2).

In addition, the tag recognition module calculates the display position (x2, y2) of the tag record from the click coordinates (X2, Y2) and adds it to the new tag record. (x2, y2) is generally placed near (X2, Y2), so that each speech recognition tag can be matched to its tag record.

The tag record generated above is shown in Table 3:

Table 3
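The three tag records built up in steps (3) to (11) can also be written out as data. The sketch below is an illustration only: the field names are assumptions, the coordinates are kept symbolic as in the text, and `next_page` is left as `None` for the page 2 records because the walk-through records no further jump for them.

```python
# Tag records as described in the walk-through; field names are assumed.
records = [
    {"app": "APP_XXX", "page": 1, "tag": "Button1", "action": "Click",
     "position": ("X0", "Y0"), "display": ("x0", "y0"), "next_page": 2},
    {"app": "APP_XXX", "page": 2, "tag": "Button1", "action": "Click",
     "position": ("X1", "Y1"), "display": ("x1", "y1"), "next_page": None},
    {"app": "APP_XXX", "page": 2, "tag": "Button2", "action": "Click",
     "position": ("X2", "Y2"), "display": ("x2", "y2"), "next_page": None},
]


def records_for_page(app, page):
    """Step C1: all tag records matching the APP name and page number."""
    return [r for r in records if r["app"] == app and r["page"] == page]


for r in records_for_page("APP_XXX", 2):
    # Each matching tag would be drawn at its stored display position.
    print(r["tag"], "shown at", r["display"])
```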

Proceeding in the same way, the operation actions performed on each interface of the third-party APP are intercepted, and a tag record with its speech recognition tag is generated for each.

Stage 2: speech recognition control

(3) The tag recognition module obtains the APP name "APP_XXX" and the speech recognition tag page number 1 corresponding to the page 1 interface.

(4) Using the APP name "APP_XXX" and the current page's speech recognition tag page number 1, the tag recognition module finds all tag records that match "APP_XXX" and page number 1. According to Table 1', one tag record matches, so that tag record is displayed at the coordinate position (x0, y0) on the page 1 interface.

(5) The user reads aloud the speech recognition tag "Button1" shown for the tag record. After recognizing the spoken "Button1", the speech recognition module generates the text speech recognition tag "Button1" and sends it to the tag recognition module, which uses it to look up the corresponding action "Click" and the coordinate position (X0, Y0) on the page 1 interface at which the action occurs.

(6) The tag recognition module passes the action "Click" and the coordinate position (X0, Y0) on the page 1 interface to the action control module, which performs the click on the button "Button1" at position (X0, Y0).

(7) After the action control module has clicked the button "Button1", the page jumps to the page 2 interface.

(8) Because the tag recognition module finds, from the tag record in Table 1', that the next page is the page 2 interface, it retrieves the tag records corresponding to the page 2 interface, namely those of Table 2 and Table 3.

Next, the controls of the page 2 interface are controlled in the same way as those of the page 1 interface.

(9) As the tag records of Table 2 and Table 3 show, the page 2 interface has two speech recognition tags, "Button1" and "Button2". The user chooses to read aloud the tag "Button2". After recognizing the spoken "Button2", the speech recognition module generates the text speech recognition tag "Button2" and sends it to the tag recognition module, which uses it to look up the corresponding action "Click" and the coordinate position (X2, Y2) on the page 2 interface at which the action occurs.

The tag recognition module passes the action "Click" and the coordinate position (X2, Y2) on the page 2 interface to the action control module, which performs the click on the button "Button2" at position (X2, Y2).

Proceeding in the same way, the controls on every interface of the third-party APP are operated automatically by voice.
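The control stage above can be simulated end to end in a short sketch. Everything here is an assumption made for illustration: the `Session` class, the `inject` callback that stands in for the action control module, and the numeric placeholder coordinates. The recognized text is passed in directly rather than produced by a speech recognizer.

```python
# Tag records keyed by (APP name, page number, tag); coordinates are placeholders.
RECORDS = {
    ("APP_XXX", 1, "Button1"): {"action": "Click", "position": (40, 300), "next_page": 2},
    ("APP_XXX", 2, "Button1"): {"action": "Click", "position": (200, 120), "next_page": None},
    ("APP_XXX", 2, "Button2"): {"action": "Click", "position": (200, 180), "next_page": None},
}


class Session:
    """Hypothetical playback session for one APP."""

    def __init__(self, app, inject, page=1):
        self.app = app
        self.inject = inject  # assumed stand-in for the action control module
        self.page = page

    def visible_tags(self):
        # Step C1: tags to display on the current page.
        return sorted(tag for (app, page, tag) in RECORDS
                      if app == self.app and page == self.page)

    def speak(self, text):
        # Step C: match the recognized text against the current page's records.
        record = RECORDS.get((self.app, self.page, text))
        if record is None:
            return False
        # Step D: perform the action at the recorded position.
        self.inject(record["action"], *record["position"])
        # Follow the next-page number, if the record has one.
        if record["next_page"] is not None:
            self.page = record["next_page"]
        return True


log = []
session = Session("APP_XXX", lambda action, x, y: log.append((action, x, y)))
session.speak("Button1")          # click on page 1, then jump to page 2
tags_on_page_2 = session.visible_tags()
session.speak("Button2")          # click the second button on page 2
print(tags_on_page_2, log)
```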

The interface operations listed in the above embodiment are only one example of an application scenario, in which every step of the voice operation is spelled out one at a time. The process can of course be simplified. For example, the spoken input can follow a fixed grammar such as "Page xx, Button xx, Next pagexx", which merges several spoken steps into one. All of this can be user-defined. Likewise, whether a pop-up button or some other method is used to control the recording of voice tags can be customized.
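A compound utterance of that kind could be parsed with a regular expression once it has been converted to text. The exact grammar is not specified in the source, so the pattern below is an assumption: it accepts "Page N, Button ID" with an optional ", Next page M", and `parse_command` is a hypothetical helper.

```python
import re

# Assumed grammar: "Page <n>, Button <id>[, Next page <m>]".
PATTERN = re.compile(
    r"^\s*Page\s*(\d+)\s*,\s*Button\s*(\w+)\s*(?:,\s*Next\s*page\s*(\d+)\s*)?$",
    re.IGNORECASE,
)


def parse_command(text):
    """Split one recognized utterance into page, tag and optional next page."""
    match = PATTERN.match(text)
    if match is None:
        return None
    page, button, next_page = match.groups()
    return {
        "page": int(page),
        "tag": f"Button{button}",
        "next_page": int(next_page) if next_page is not None else None,
    }


print(parse_command("Page 1, Button 1, Next page 2"))
print(parse_command("Page 2, Button 2"))
print(parse_command("open APP_XXX"))
```

Inputs that do not fit the assumed grammar return `None`, so the caller can fall back to the step-by-step flow.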

The method of the present invention monitors touch screen and key events globally across the system. As soon as the user is found to be operating the terminal, that operation is intercepted and a custom tag is generated and bound to the action, then stored in the database. As long as the positions of the interface controls do not change, this recording of a custom tag needs to happen only once, and voice control can be used every time the APP is used afterwards. If the position of an interface control changes, the custom tag must be recorded again.
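The intercept, bind and continue pattern described here can be sketched as follows. This is not a real input hook: `RecordingInterceptor`, `recognize_tag` and `deliver` are hypothetical names, and an actual implementation would rely on whatever global event monitoring the platform provides. The sketch also shows re-recording a tag after a control has moved.

```python
class RecordingInterceptor:
    """Hypothetical hook that sits between input events and the APP."""

    def __init__(self, recognize_tag, deliver):
        self.recognize_tag = recognize_tag  # assumed: returns the spoken tag as text
        self.deliver = deliver              # assumed: forwards the event to the APP
        self.bindings = {}
        self.recording = True

    def on_input(self, action, x, y):
        if self.recording:
            # Hold the event, ask for a tag, and bind the tag to the action.
            tag = self.recognize_tag()
            # Recording the same tag again replaces a stale position.
            self.bindings[tag] = (action, x, y)
        # Let the original event continue so the APP still reacts to it.
        self.deliver(action, x, y)


spoken = iter(["Button1", "Button1"])
delivered = []
hook = RecordingInterceptor(lambda: next(spoken),
                            lambda action, x, y: delivered.append((action, x, y)))
hook.on_input("Click", 40, 300)
hook.on_input("Click", 60, 300)   # the control moved, so the tag is recorded again
hook.recording = False
hook.on_input("Click", 10, 10)    # not recorded, only delivered
print(hook.bindings, len(delivered))
```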

The voice control method and device of the present invention are applicable to a variety of mobile terminals and PCs, mainly in the areas of speech recognition and voice control. The invention can be used to operate an APP by voice, freeing the user's hands and making the device more intelligent. It is particularly suited to smart devices, such as smart watches, whose small operating interfaces make them inconvenient to operate.

The beneficial effects of the present invention are as follows.

1. All of the user's operations on the touch screen and keys, together with the screen coordinate positions at which they occur, can be intercepted, captured and recorded, and each is bound to a custom speech recognition tag. During voice control, the user reads the custom tag aloud, the operation bound to the tag is looked up in the database, and the system is then directed to perform that operation automatically at the recorded coordinate position on the screen, so the screen or keys need not be operated by hand. This achieves voice control.

2. Once a custom tag has been recorded, the action and the tag are bound. During voice control, reading the content of the custom tag aloud is therefore enough to look up the corresponding action. The action is then not performed by the user by hand or by any other physical means; instead, the system is notified and performs it automatically, for example by clicking a given position on the touch screen.

The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.