CN108766438A


Info

Publication number: CN108766438A
Application number: CN201810645687.2A
Authority: CN
Inventors: 陈彪
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2018-06-21
Filing date: 2018-06-21
Publication date: 2018-11-06
Anticipated expiration: 2038-06-21
Also published as: CN108766438B

Abstract

Translated from Chinese

The embodiments of the present application disclose a human-computer interaction method, device, storage medium and smart terminal. The method includes: when a first voice signal is detected, locating a first sound source corresponding to the first voice signal; if the positioning result of the first sound source meets a preset requirement, starting a camera and detecting through the camera whether human eyes are aimed at the terminal; and if it is detected that the human eyes are aimed at the terminal, starting a human-computer interaction mode and responding to the voice instruction corresponding to the first voice signal. With this technical solution, when the smart terminal detects the first voice signal, it uses the location of the first sound source to decide whether to start the camera and check the relationship between the human eyes and the terminal. Only when the human eyes are detected to be aimed at the terminal does it start the human-computer interaction mode and respond to the relevant voice instruction. This avoids the cumbersome operation caused by waking up the human-computer interaction mode with a keyword, simplifies human-computer interaction, and improves its efficiency.

Description


Human-computer interaction method, device, storage medium and intelligent terminal

Technical field

The embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a human-computer interaction method, device, storage medium, and smart terminal.

Background

With the development of artificial intelligence technology, human-computer interaction has gradually become a standard feature of most smart terminals, and users can control a smart terminal by interacting with it.

At present, when a user wants to use the human-computer interaction function of a smart terminal, the user must first wake up the smart terminal with a keyword before the human-computer interaction mode can be started. When the user interacts with the smart terminal continuously, the keyword must be entered before every interaction to wake up the smart terminal. This is cumbersome and makes the human-computer interaction process inefficient, so improvement is urgently needed.

Summary of the invention

Embodiments of the present invention provide a human-computer interaction method, device, storage medium, and smart terminal, which can automatically wake up the smart terminal and start the human-computer interaction mode without a keyword, thereby optimizing the human-computer interaction function of the smart terminal.

In a first aspect, an embodiment of the present invention provides a human-computer interaction method, including:

when a first voice signal is detected, locating a first sound source corresponding to the first voice signal;

if the positioning result of the first sound source meets a preset requirement, starting a camera, and detecting through the camera whether human eyes are aimed at the terminal;

if it is detected that the human eyes are aimed at the terminal, starting a human-computer interaction mode and responding to the voice instruction corresponding to the first voice signal.

In a second aspect, an embodiment of the present invention provides a human-computer interaction device, including:

a sound source localization module, configured to locate a first sound source corresponding to a first voice signal when the first voice signal is detected;

a human eye alignment detection module, configured to start a camera if the positioning result of the first sound source meets a preset requirement, and to detect through the camera whether human eyes are aimed at the terminal;

a human-computer interaction response module, configured to start a human-computer interaction mode and respond to the voice instruction corresponding to the first voice signal if it is detected that the human eyes are aimed at the terminal.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the human-computer interaction method described in the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention provides a smart terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the human-computer interaction method described in the embodiments of the present invention.

In the human-computer interaction solution provided in the embodiments of the present invention, when a first voice signal is detected, the first sound source corresponding to the first voice signal is located; if the positioning result of the first sound source meets the preset requirement, the camera is started and used to detect whether human eyes are aimed at the terminal; and if it is detected that the human eyes are aimed at the terminal, the human-computer interaction mode is started and the voice instruction corresponding to the first voice signal is responded to. With this technical solution, when the smart terminal detects the first voice signal, it uses the location of the first sound source to decide whether to start the camera and check the relationship between the human eyes and the terminal. Only when the human eyes are detected to be aimed at the terminal does it start the human-computer interaction mode and respond to the relevant voice instruction. This avoids the cumbersome operation caused by waking up the human-computer interaction mode with a keyword, simplifies the human-computer interaction process, and improves its efficiency.

Description of the drawings

FIG. 1 is a schematic flowchart of a human-computer interaction method provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another human-computer interaction method provided by an embodiment of the present invention;

FIG. 3 is a schematic flowchart of yet another human-computer interaction method provided by an embodiment of the present invention;

FIG. 4 is a schematic flowchart of yet another human-computer interaction method provided by an embodiment of the present invention;

FIG. 5 is a structural block diagram of a human-computer interaction device provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a smart terminal provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of yet another smart terminal provided by an embodiment of the present invention.

Detailed description

The technical solutions of the present invention are further described below with reference to the accompanying drawings and through specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than all structures.

Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps may be performed in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are complete, but may also have additional steps not included in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, or the like.

FIG. 1 is a schematic flowchart of a human-computer interaction method provided by an embodiment of the present invention. The method may be executed by a human-computer interaction device, which may be implemented in software and/or hardware and can generally be integrated into a smart terminal. As shown in FIG. 1, the method includes:

Step 101. When a first voice signal is detected, locate a first sound source corresponding to the first voice signal.

Exemplarily, the smart terminal in the embodiments of the present invention may include terminal devices with human-computer interaction functions, such as a smart speaker, a mobile phone, a tablet computer, and a media player.

Exemplarily, the smart terminal may detect sound signals through a detection device such as a sound sensor or a microphone and analyze them. When it determines that a sound signal contains a voice signal, the first voice signal may be considered detected. The first voice signal may be the sound signal corresponding to anything a user says, for example the audio signal corresponding to the user's voice when issuing an instruction to the smart terminal. Optionally, the detection device in the smart terminal may be normally on, so that it detects the first voice signal in the environment in real time and does not miss voice instructions that the user issues to the smart terminal. The first sound source corresponding to the first voice signal may be the object that produces the first voice signal. For example, if the first voice signal is produced by user A, the first sound source is user A, or more precisely user A's mouth.

Exemplarily, locating the first sound source corresponding to the first voice signal includes determining information such as the distance and direction of that sound source from the smart terminal, so as to pin down the origin of the first voice signal. Specifically, locating the first sound source corresponding to the first voice signal may include: determining, through a sound localization technology, the distance and direction of the first sound source corresponding to the first voice signal relative to the terminal.

Here, the sound localization technology may be a technology that determines the direction and distance of a sound source from the received sound. Specific methods may include, but are not limited to, beamforming, high-resolution spectrum estimation, and the time-difference method. In the embodiments of the present application, when the detection device detects a first voice signal in the environment, a microphone-array sound localization technology may be used to locate the first sound source corresponding to the first voice signal and determine its distance and direction relative to the smart terminal.
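As a non-limiting illustration, the time-difference method named above can be sketched for the simplest case of a single pair of microphones under a far-field assumption. The function name, the two-microphone geometry, and the speed-of-sound constant are assumptions made for this sketch and are not taken from the embodiment; a practical microphone array would combine several pairs and would also have to estimate distance.

```python
import math

# Assumed value: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def direction_from_time_difference(delay_s, mic_spacing_m):
    """Estimate the arrival angle (degrees) of a far-field source.

    delay_s is the difference in arrival time between two microphones
    and mic_spacing_m is the distance between them. The angle is measured
    from the direction perpendicular to the line joining the microphones.
    """
    if mic_spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source directly in front of the pair, and the largest physically possible delay (spacing divided by the speed of sound) corresponds to a source on the line through both microphones.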

Step 102. If the positioning result of the first sound source meets the preset requirement, start the camera, and detect through the camera whether human eyes are aimed at the terminal.

Exemplarily, the positioning result of the first sound source includes the position and direction, relative to the smart terminal, of the sound source corresponding to the first voice signal. The preset requirement may be that the distance between the user and the smart terminal satisfies a preset distance threshold and that the direction is within the shooting range of the camera. The preset distance threshold may be the farthest distance, set automatically by the system, at which complete sound information can still be reliably detected, or it may be the farthest distance between the user and the smart terminal for performing human-computer interaction, set by the user according to the user's own needs.
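The preset requirement described above can be sketched as a simple predicate. This is only an illustrative sketch: the function names, the use of degrees, and the representation of the camera's shooting range as a center direction plus an angular width are all assumptions, not details given by the embodiment.

```python
def angular_offset_deg(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def meets_preset_requirement(distance_m, direction_deg,
                             max_distance_m, fov_center_deg, fov_width_deg):
    """Return True if the located source is close enough and in view.

    max_distance_m plays the role of the preset distance threshold;
    fov_center_deg and fov_width_deg describe the assumed shooting range.
    """
    within_distance = distance_m < max_distance_m
    within_view = (angular_offset_deg(direction_deg, fov_center_deg)
                   <= fov_width_deg / 2.0)
    return within_distance and within_view
```

For instance, with a 3 m threshold and a 60° field of view centered on 0°, a source at 2 m and 350° passes, because 350° is only 10° away from the center once the wrap-around at 360° is taken into account.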

Exemplarily, detecting whether human eyes are aimed at the terminal may proceed as follows: first detect the face region in the picture captured by the camera, then detect the eye region within the face region, and finally further recognize and evaluate the detected eye region to determine whether the eyes are aimed at the smart terminal. Specifically, determining whether the eyes are aimed at the smart terminal may consist of checking whether the eye region in the captured image is complete and the eyeball is at the center of the eye region; if so, the human eyes are aimed at the smart terminal.
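The final check described above (eye region complete, eyeball at the center of the eye region) might be sketched as follows, assuming that an upstream detector has already produced an eye bounding box and a pupil center in pixel coordinates. The `Box` type, the function names, and the 20% centering tolerance are hypothetical choices for illustration; the embodiment does not specify how "complete" or "center" is measured.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle in image pixels (hypothetical helper type)."""
    x: float
    y: float
    w: float
    h: float


def eye_region_complete(eye, image_w, image_h):
    """Treat the eye region as complete if it lies fully inside the image."""
    return (eye.w > 0 and eye.h > 0 and eye.x >= 0 and eye.y >= 0
            and eye.x + eye.w <= image_w and eye.y + eye.h <= image_h)


def pupil_centered(eye, pupil_x, pupil_y, tolerance=0.2):
    """Check that the pupil is near the middle of the eye region.

    tolerance is the allowed offset as a fraction of the box size
    (an assumed value, to be tuned for a real detector).
    """
    dx = abs(pupil_x - (eye.x + eye.w / 2.0)) / eye.w
    dy = abs(pupil_y - (eye.y + eye.h / 2.0)) / eye.h
    return dx <= tolerance and dy <= tolerance


def eyes_aimed_at_terminal(eye, pupil_x, pupil_y, image_w, image_h):
    """Combine both conditions from the description."""
    return (eye_region_complete(eye, image_w, image_h)
            and pupil_centered(eye, pupil_x, pupil_y))
```

The completeness test runs first so that a clipped or empty box never reaches the division in `pupil_centered`.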

Exemplarily, the camera in the embodiments of the present application may be a camera with a fixed field of view, a camera rotatable through 360°, or a camera group composed of multiple fixed-field-of-view cameras; this is not limited here. In addition, the camera may be a component integrated in the smart terminal or an external component of the smart terminal, which is likewise not limited.

In the embodiments of the present application, starting the camera if the positioning result of the first sound source meets the preset requirement, and detecting through the camera whether human eyes are aimed at the terminal, may include: if the distance of the first sound source from the terminal is less than the preset distance threshold, starting the camera according to the direction of the first sound source relative to the terminal, and detecting through the camera whether human eyes are aimed at the terminal.

Specifically, when the camera of the smart terminal is a single camera with a fixed field of view, this step may be: when the distance between the source of the first voice signal and the terminal device is less than the preset distance, and the direction of that sound source is within the shooting range of the fixed-field-of-view camera, start the camera, capture an image of the sound source through it, and determine whether human eyes are aimed at the smart terminal.

When the camera of the smart terminal is a camera rotatable through 360°, this step may be: when the distance between the source of the first voice signal and the terminal device is less than the preset distance, start the rotatable camera according to the detected direction of the first sound source relative to the terminal, rotate it toward the first sound source, capture an image of the sound source in that direction, and determine whether human eyes are aimed at the smart terminal.

When the camera of the smart terminal is a camera group composed of multiple fixed-field-of-view cameras, this step may be: when the distance between the source of the first voice signal and the terminal device is less than the preset distance, determine whether the direction of the first sound source relative to the terminal is within the field of view of any camera in the group; if so, start the camera with the best viewing angle to capture an image of the sound source, and determine whether human eyes are aimed at the smart terminal.
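For the camera-group case, the selection step can be sketched as below. The embodiment does not define what makes a viewing angle "best", so this sketch assumes it means the camera whose optical axis is closest to the sound source direction; the tuple layout and function name are also assumptions.

```python
def select_camera(direction_deg, cameras):
    """Pick the camera that best covers the sound source direction.

    cameras is a list of (name, center_deg, fov_width_deg) tuples.
    Returns the chosen name, or None if no camera covers the direction.
    """
    best_name = None
    best_offset = None
    for name, center_deg, fov_width_deg in cameras:
        # Smallest absolute angle between the source and the camera axis.
        offset = abs((direction_deg - center_deg + 180.0) % 360.0 - 180.0)
        if offset > fov_width_deg / 2.0:
            continue  # source is outside this camera's field of view
        if best_offset is None or offset < best_offset:
            best_name, best_offset = name, offset
    return best_name
```

A single fixed camera is the special case of a one-element list, where a `None` result means the camera is not started. A rotatable camera needs no selection, since it is simply turned toward `direction_deg`.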

Step 103. If it is detected that the human eyes are aimed at the terminal, start the human-computer interaction mode and respond to the voice instruction corresponding to the first voice signal.

Exemplarily, the human-computer interaction mode may be a mode in which the user holds a dialogue with the smart terminal. For example, the user can speak to the smart terminal and issue a voice instruction; the smart terminal responds to the voice instruction and, if necessary, feeds the execution result back to the user. A voice instruction may be an instruction that the user issues to the smart terminal by voice. For example, if the smart terminal is a smart speaker, the voice instruction may be "turn on the smart speaker and play the music in the default list", spoken by the user to instruct the smart speaker to operate.

In the embodiments of the present application, if it is detected that the human eyes are aimed at the smart terminal, the user is speaking to the smart terminal, which indicates that the user wants to issue a voice instruction to the smart terminal through the human-computer interaction mode. The human-computer interaction mode of the smart terminal is therefore started, and the voice instruction corresponding to the first voice signal is responded to. Optionally, to avoid missing the user's voice instruction while the sound source is being located and eye alignment is being checked, the voice content corresponding to the first voice signal may be acquired at the same time as the first voice signal is detected and its first sound source is located. When responding to the voice instruction corresponding to the first voice signal, the voice instruction is generated from the voice content and then responded to.

Specifically, after the first voice signal is detected, the voice content corresponding to the first voice signal is acquired first, regardless of whether it is a voice instruction from the user. Once the voice signal is determined to be a voice instruction from the user, that is, when responding to the voice instruction corresponding to the first voice signal, the acquired voice content is recognized to obtain the corresponding voice instruction, and that voice instruction is responded to. If the voice signal is determined not to be a voice instruction from the user, the acquired voice content is deleted so that redundant data does not occupy memory.
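The acquire-first, decide-later handling of voice content described above can be sketched with a small buffer object. The class and method names are hypothetical, and `recognize` stands in for whatever speech recognizer the terminal uses; none of these are defined by the embodiment.

```python
class VoiceContentBuffer:
    """Holds captured voice content until the terminal decides what to do."""

    def __init__(self):
        self._content = None

    def capture(self, content):
        """Store the voice content as soon as the voice signal is detected."""
        self._content = content

    def resolve(self, is_user_instruction, recognize):
        """Recognize the content if it was an instruction, else discard it.

        In both cases the buffer is cleared afterwards, so content that is
        no longer needed does not keep occupying memory.
        """
        content, self._content = self._content, None
        if not is_user_instruction or content is None:
            return None
        return recognize(content)

    def is_empty(self):
        return self._content is None
```

In use, `capture` would be called in parallel with sound source localization, and `resolve` would be called once the eye-alignment check has finished, with `is_user_instruction` set to its result.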

It should be noted that, during human-computer interaction, the first sentence spoken by the user may not contain a specific control instruction. In that case, when recognition of the first voice signal finds that it contains no voice instruction, the terminal continues to detect a second voice signal from the first sound source, recognizes the voice instruction contained in the second voice signal, and responds to it.

In the human-computer interaction method provided in the embodiments of the present application, when a first voice signal is detected, the first sound source corresponding to the first voice signal is located; if the positioning result of the first sound source meets the preset requirement, the camera is started and used to detect whether human eyes are aimed at the terminal; and if it is detected that the human eyes are aimed at the terminal, the human-computer interaction mode is started and the voice instruction corresponding to the first voice signal is responded to. With this technical solution, when the smart terminal detects the first voice signal, it uses the location of the first sound source to decide whether to start the camera and check the relationship between the human eyes and the terminal. Only when the human eyes are detected to be aimed at the terminal does it start the human-computer interaction mode and respond to the relevant voice instruction. This avoids the cumbersome operation caused by waking up the human-computer interaction mode with a keyword, simplifies the human-computer interaction process, and improves its efficiency.
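Steps 101 to 103 can be summarized as a short control-flow sketch. The four callables passed in are placeholders (assumptions) for the terminal's localization, requirement check, camera-based eye detection, and instruction recognition; the sketch only shows the order in which they are consulted.

```python
def handle_first_voice_signal(signal, locate, meets_requirement,
                              eyes_aligned, recognize):
    """Return the recognized voice instruction, or None if not woken up."""
    # Step 101: locate the sound source of the detected voice signal.
    distance_m, direction_deg = locate(signal)
    # Step 102, first half: only start the camera if the result qualifies.
    if not meets_requirement(distance_m, direction_deg):
        return None
    # Step 102, second half: camera-based eye-alignment detection.
    if not eyes_aligned(direction_deg):
        return None
    # Step 103: enter interaction mode and respond to the instruction.
    return recognize(signal)
```

Because the camera check is only reached after the localization check passes, the camera stays off for voices that are too far away or outside its shooting range.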

In some embodiments, locating the first sound source corresponding to the first voice signal includes: determining, through a sound localization technology, the distance and direction of the first sound source corresponding to the first voice signal relative to the terminal. Correspondingly, starting the camera if the positioning result of the first sound source meets the preset requirement, and detecting through the camera whether human eyes are aimed at the terminal, includes: if the distance of the first sound source from the terminal is less than the preset distance threshold, starting the camera according to the direction of the first sound source relative to the terminal, and detecting through the camera whether human eyes are aimed at the terminal. The advantage of this arrangement is that the distance and direction of the sound source relative to the terminal can be used to decide whether and how to start the camera, which improves the accuracy of deciding whether to start the human-computer interaction mode.

In some embodiments, when the first voice signal is detected, the method further includes, while locating the first sound source corresponding to the first voice signal: acquiring the voice content corresponding to the first voice signal. Correspondingly, responding to the voice instruction corresponding to the first voice signal includes: generating a voice instruction from the voice content, and responding to the voice instruction. The advantage of this arrangement is that it prevents the user's voice instruction from being missed while sound source localization and eye-alignment detection are in progress. For example, suppose the user says only "play the default list" to the smart terminal. If the smart terminal used the received voice signal solely for localization and eye-alignment detection and did not record its content, then even after determining that the voice signal was a control instruction from the user, it could not know the content or respond to the instruction, and the user's voice instruction would be missed.

In some embodiments, after starting the human-computer interaction mode and responding to the voice instruction corresponding to the first voice signal, the method further includes: recording first voiceprint information corresponding to the first voice signal and turning off the camera; when a second voice signal corresponding to the first voiceprint information is detected, determining the moving speed of the first sound source corresponding to the first voiceprint information; and if the moving speed is less than a preset speed threshold, responding to the voice instruction corresponding to the second voice signal. The advantage of this arrangement is that, once interaction between the user and the smart terminal has been established, the camera can be turned off and the voice signal detection device alone is used to decide whether to remain in the human-computer interaction mode, which reduces the power consumption of the smart terminal and improves the efficiency of human-computer interaction.

In some embodiments, determining the moving speed of the first sound source corresponding to the first voiceprint information includes: acquiring the time interval between a first moment and a second moment, where the first moment includes the moment at which the first voice signal was detected and the second moment includes the moment at which the second voice signal was detected; acquiring the difference between the distances of the first sound source from the terminal at the first moment and at the second moment; and calculating the moving speed of the first sound source corresponding to the first voiceprint information from the time interval and the distance difference. The advantage of this arrangement is that the moving speed of the first sound source can be determined more accurately, so that it can be accurately decided whether the smart terminal needs to remain in the human-computer interaction mode.
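The speed calculation described above amounts to dividing the distance difference by the time interval. A minimal sketch follows; the function names and the use of meters and seconds are assumptions, and the threshold value is left to the caller because the embodiment does not state one.

```python
def moving_speed(distance_1_m, time_1_s, distance_2_m, time_2_s):
    """Speed of the sound source between two detections.

    distance_1_m and distance_2_m are the distances of the source from the
    terminal at the first and second moments; time_1_s and time_2_s are
    those moments.
    """
    interval_s = time_2_s - time_1_s
    if interval_s <= 0:
        raise ValueError("the second moment must be later than the first")
    return abs(distance_2_m - distance_1_m) / interval_s


def respond_without_camera(speed_m_s, speed_threshold_m_s):
    """True if the follow-up instruction may be answered directly."""
    return speed_m_s < speed_threshold_m_s
```

Note that a distance difference only captures motion toward or away from the terminal; a user walking around the terminal at constant distance would register as stationary under this measure.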

In some embodiments, after starting the human-computer interaction mode and responding to the voice instruction corresponding to the first voice signal, the method further includes: recording first voiceprint information corresponding to the first voice signal and turning off the camera; when a third voice signal corresponding to the first voiceprint information is detected, if it is determined that the time interval between the current moment and a third moment is greater than the valid duration of the first voiceprint information, locating the third voice signal, and if the positioning result meets the preset requirement, starting the camera and performing human eye detection again through the camera. Here, the third moment includes the moment at which a voice signal corresponding to the first voiceprint information was last detected, and the valid duration of the first voiceprint information includes the time interval between the two most recent detections of a voice signal corresponding to the first voiceprint information. The advantage of this arrangement is that, within the valid duration of the first voiceprint information, voice instructions corresponding to different voice signals of the first voiceprint information are responded to directly, while beyond the valid duration, localization and eye-alignment detection are performed again, which reduces the power consumption of the smart terminal while ensuring the timeliness and accuracy of voice instructions.
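The valid-duration decision can be sketched as a comparison of the elapsed time against the valid duration. In this sketch the valid duration is simply passed in as a number of seconds, which is an assumption: the embodiment relates it to the interval between the two most recent detections but does not give a formula. The function name and the returned labels are likewise hypothetical.

```python
def next_action(now_s, last_detected_s, valid_duration_s):
    """Decide how to treat a new voice signal with a known voiceprint.

    last_detected_s is the moment a voice signal with this voiceprint was
    last detected (the "third moment" in the description).
    """
    elapsed_s = now_s - last_detected_s
    if elapsed_s > valid_duration_s:
        # Too long since the last utterance: locate the source again and,
        # if the result qualifies, restart the camera for eye detection.
        return "relocalize"
    # Still within the valid duration: respond to the instruction directly.
    return "respond"
```

An elapsed time exactly equal to the valid duration is treated as still valid here, because the description only requires re-detection when the interval is greater than the valid duration.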

In some embodiments, after starting the human-computer interaction mode and responding to the voice instruction corresponding to the first voice signal, the method further includes: controlling the camera to capture the face image corresponding to the first voice signal and recording it as the target face image; if the voiceprint information of a detected fourth voice signal is second voiceprint information and the camera detects that human eyes are aimed at the terminal device, controlling the camera to capture the face image corresponding to the fourth voice signal and matching that face image against the recorded target face image; and, if they do not match, responding to the voice instruction corresponding to the fourth voice signal and taking the face image corresponding to the fourth voice signal as the target face information. The advantage of this arrangement is that voice signals from different users can be told apart, which avoids erroneous responses to voice instructions caused by a mismatch between the voice signal and the identified user.

Fig. 2 is a schematic flowchart of another human-computer interaction method provided by an embodiment of the present invention. The method includes the following steps:

Step 201: When a first voice signal is detected, locate the first sound source corresponding to the first voice signal.

Step 202: If the positioning result of the first sound source meets the preset requirement, start the camera and use the camera to detect whether human eyes are aimed at the terminal.

Step 203: If it is detected that human eyes are aimed at the terminal, start the human-computer interaction mode and respond to the voice instruction corresponding to the first voice signal.

Step 204: Record the first voiceprint information corresponding to the first voice signal, and turn off the camera.

For example, the voiceprint information may be sound-wave spectrum information carrying speech information, as detected by a device or model integrated in the smart terminal. Each person has unique voiceprint information, so voiceprint information can be used to distinguish the voices of different people or to determine whether two voices belong to the same person. The first voiceprint information may be the voiceprint information of the sound source corresponding to the first voice signal.

For example, when the first voice signal is detected to be a voice instruction issued by the user to the smart terminal, the terminal may, after responding to that instruction, record the first voiceprint information corresponding to the first voice signal. The first voiceprint information is then used to identify whether subsequent speech still comes from the same user, and thus to decide whether to respond to subsequent voice instructions. The operations of steps 201 to 204 no longer need to be performed for every subsequent voice signal, so the camera can be turned off at this point, which both reduces the power consumption of the smart terminal and improves the efficiency of human-computer interaction.

Step 205: When a second voice signal corresponding to the first voiceprint information is detected, determine the moving speed of the first sound source corresponding to the first voiceprint information.

For example, the second voice signal may be a voice signal with the first voiceprint information that is issued after the first voice signal. Suppose the voiceprint information of user A is the first voiceprint information, and the first sentence user A says, "Turn on the smart speaker and play the default playlist", is the first voice signal. After the smart speaker responds, user A says "Play the next song"; this sentence is the second voice signal corresponding to the first voiceprint information.

For example, by the time the second voice signal is detected, the user's position may already have changed: the user may have moved away from the smart terminal, and the second voice signal may be the user talking to someone else rather than a voice instruction to the smart terminal. Therefore, the terminal should not, as soon as it detects a second voice signal with the same first voiceprint information, recognize whether that signal contains a voice instruction and execute it if so. It must first determine whether the second voice signal meets the recognition requirement. This can be done by determining the moving speed of the first sound source corresponding to the first voiceprint information and checking whether that speed meets the recognition requirement; only if it does is the actual recognition performed.

Here, the moving speed of the first sound source may be its moving speed during the period after it issues the first voice signal and before it issues the second voice signal. For example, if user A issues the first voice signal, then moves backward and issues the second voice signal, the user's moving speed between the first voice signal and the second voice signal is the moving speed of the first sound source.

In this embodiment of the present application, determining the moving speed of the first sound source corresponding to the first voiceprint information may include: obtaining the time interval between a first moment and a second moment, where the first moment includes the moment at which the first voice signal is detected and the second moment includes the moment at which the second voice signal is detected; obtaining the difference between the distances of the first sound source from the terminal at the first moment and at the second moment; and calculating the moving speed of the first sound source corresponding to the first voiceprint information from the time interval and the distance difference.

Specifically, the first moment and the second moment at which the first and second voice signals are detected are obtained, giving the time interval between them. The difference between the distances of the first sound source from the smart terminal at the first moment and at the second moment is then obtained, and the moving speed of the first sound source follows from the distance difference and the time interval. For example, suppose the first sound source is user A, who issues the first voice signal at 9:00 and the second voice signal at 9:02, at a distance of 0.5 meters from the terminal for the first voice signal and 1.5 meters from the smart terminal for the second. The time interval between the first and second moments is then 2 minutes, the distance difference is 1 meter, and the moving speed of the first sound source is 0.5 meters per minute.
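The calculation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `moving_speed`, its parameters, and the choice of meters per minute as the unit are hypothetical, and the calendar date is arbitrary. The values reproduce the user A example.

```python
from datetime import datetime


def moving_speed(t1: datetime, t2: datetime, d1_m: float, d2_m: float) -> float:
    """Speed in meters per minute from two detection times and two distances.

    t1, t2: moments at which the first and second voice signals were detected.
    d1_m, d2_m: distance of the sound source from the terminal at t1 and t2.
    """
    interval_min = (t2 - t1).total_seconds() / 60.0
    if interval_min <= 0:
        raise ValueError("the second moment must be later than the first")
    # The distance difference is taken as a magnitude, so moving toward or
    # away from the terminal gives the same speed.
    return abs(d2_m - d1_m) / interval_min


# User A: first signal at 9:00 from 0.5 m, second signal at 9:02 from 1.5 m.
t1 = datetime(2018, 6, 21, 9, 0)
t2 = datetime(2018, 6, 21, 9, 2)
speed = moving_speed(t1, t2, 0.5, 1.5)
print(speed)  # 0.5 (meters per minute)
```

Because only the distance to the terminal is compared, this sketch measures movement toward or away from the terminal; movement at a constant distance would register as zero speed.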

Step 206: Determine whether the moving speed of the first sound source is less than a preset speed threshold. If so, execute step 207; if not, return to step 201.

For example, the preset speed threshold may be set in advance by the user according to their own needs. It may also be set automatically, taking into account the relationship between sound-source moving speed and speech recognition accuracy as well as the user's usage habits, provided that voice instructions can still be recognized accurately. Determining whether the moving speed of the first sound source is less than the preset speed threshold amounts to determining whether the second voice signal can be treated as a second voice instruction issued by the first sound source to the smart terminal. If so, step 207 is executed and the voice instruction corresponding to the second voice signal is responded to. If not, the method returns to step 201 and re-executes steps 201 to 203 for that voice signal.

Step 207: Respond to the voice instruction corresponding to the second voice signal.

For example, responding to the voice instruction corresponding to the second voice signal includes first recognizing whether the second voice signal contains a voice instruction. If it does, the voice instruction is responded to; if it does not, the second voice signal is ignored.

In this embodiment of the present invention, when the smart terminal detects that the positioning of the first sound source corresponding to the first voice signal meets the preset requirement and that human eyes are aimed at the terminal, it starts the human-computer interaction mode and responds to the relevant voice instruction, records the voiceprint information corresponding to the first voice signal, and turns off the camera. When a second voice signal corresponding to the first voiceprint information is subsequently detected, the terminal responds to the voice instruction corresponding to the second voice signal if the moving speed of the first sound source corresponding to the first voiceprint information is less than the preset speed threshold. Once human-computer interaction between the user and the smart terminal has been established, the camera can therefore be turned off, and the voice signal detection device alone is used to decide whether to remain in the human-computer interaction mode, which reduces the power consumption of the smart terminal and improves the efficiency of human-computer interaction.
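The decision made in steps 206 and 207 can be sketched as a small function. This is an illustration only: the `Action` enum, the function name, and the threshold value used in the usage lines are hypothetical, and the speech recognizer is reduced to a boolean supplied by the caller.

```python
from enum import Enum


class Action(Enum):
    RESPOND = "respond"        # step 207: execute the recognized instruction
    IGNORE = "ignore"          # step 207: no instruction found in the signal
    RELOCALIZE = "relocalize"  # step 206 "no" branch: go back to step 201


def handle_second_signal(speed: float, speed_threshold: float,
                         contains_instruction: bool) -> Action:
    """Decide what to do with a second signal from the recorded voiceprint."""
    if speed >= speed_threshold:
        # The speaker moved too fast: repeat localization and eye detection.
        return Action.RELOCALIZE
    return Action.RESPOND if contains_instruction else Action.IGNORE


# Assumed threshold of 1.0 m/min, purely for illustration.
print(handle_second_signal(0.5, 1.0, True))   # Action.RESPOND
print(handle_second_signal(2.0, 1.0, True))   # Action.RELOCALIZE
```

Keeping the camera off on the `RESPOND` and `IGNORE` paths is what yields the power saving described above; only `RELOCALIZE` leads back to the camera-based check.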

Fig. 3 is a schematic flowchart of yet another human-computer interaction method provided by an embodiment of the present invention. The method includes the following steps:

Step 301: When a first voice signal is detected, locate the first sound source corresponding to the first voice signal.

Step 302: If the positioning result of the first sound source meets the preset requirement, start the camera and use the camera to detect whether human eyes are aimed at the terminal.

Step 303: If it is detected that human eyes are aimed at the terminal, start the human-computer interaction mode and respond to the voice instruction corresponding to the first voice signal.

Step 304: Record the first voiceprint information corresponding to the first voice signal, and turn off the camera.

Step 305: When a third voice signal corresponding to the first voiceprint information is detected, determine whether the time interval between the current moment and a third moment is greater than the valid duration of the first voiceprint information. If so, return to step 301; if not, execute step 306.

Here, the third moment includes the moment at which a voice signal corresponding to the first voiceprint information was last detected, and the valid duration of the first voiceprint information includes the time interval between the two most recent detections of voice signals corresponding to the first voiceprint information. The third voice signal may be any voice signal with the first voiceprint information other than the first voice signal; it may be the same as the second voice signal or different from it.

For example, the valid duration of each piece of voiceprint information is not permanent but limited, namely to the time interval between the two most recent detections of voice signals corresponding to that voiceprint information. If the time interval between the current moment and the third moment exceeds the valid duration of the voiceprint information of the current voice signal, the method returns to step 301 and re-executes steps 301 to 303 for the third voice signal. If it does not exceed the valid duration, step 306 is executed and the voice instruction corresponding to the third voice signal is responded to. For example, suppose the first voiceprint information is that of user A, the third voice signal from user A is detected at the current moment, 9:00 in the morning, user A's previous voice signal (the third moment) was at 8:58, and the one before that was at 8:55. The valid duration of user A's first voiceprint information is then the interval between 8:55 and 8:58, that is, 3 minutes. The interval between the current moment and the third moment is 2 minutes, which does not exceed the 3-minute valid duration, so step 306 is executed and the voice instruction corresponding to the third voice signal is responded to.

It should be noted that an initial valid duration is needed when the third voice signal at the current moment is only the second voice signal with the first voiceprint information, because a voice signal with the first voiceprint information has then been detected only once before. In that case the valid duration of the first voiceprint information may be determined by the system from the historical valid durations of the first voiceprint information, for example as their mean. It may also be preset by the user according to their own needs.
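The validity check of step 305, including the fallback for the initial valid duration, can be sketched as follows. The function names, the `history_s` list of past valid durations in seconds, and the 60-second user default are assumptions made for illustration; the patent does not specify these details.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Sequence


def valid_duration(detections: Sequence[datetime],
                   history_s: Sequence[float] = (),
                   default_s: float = 60.0) -> timedelta:
    """Valid duration of a voiceprint.

    detections: earlier detection times for this voiceprint, oldest first,
    not including the signal currently being handled.
    """
    if len(detections) >= 2:
        # Interval between the two most recent detections.
        return detections[-1] - detections[-2]
    if history_s:
        # Only one earlier detection: fall back to the historical mean.
        return timedelta(seconds=mean(history_s))
    return timedelta(seconds=default_s)  # assumed user preset


def needs_recheck(now: datetime, detections: Sequence[datetime],
                  history_s: Sequence[float] = (),
                  default_s: float = 60.0) -> bool:
    """True if step 301 must be repeated, False if step 306 may run."""
    elapsed = now - detections[-1]  # current moment minus the third moment
    return elapsed > valid_duration(detections, history_s, default_s)


# User A: earlier signals at 8:55 and 8:58, third voice signal at 9:00.
day = datetime(2018, 6, 21)
earlier = [day.replace(hour=8, minute=55), day.replace(hour=8, minute=58)]
print(needs_recheck(day.replace(hour=9, minute=0), earlier))  # False
```

In the user A example the elapsed time is 2 minutes against a valid duration of 3 minutes, so the check returns `False` and the instruction is answered directly.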

Step 306: Respond to the voice instruction corresponding to the third voice signal.

In this embodiment of the present invention, when the smart terminal detects that the positioning of the first sound source corresponding to the first voice signal meets the preset requirement and that human eyes are aimed at the terminal, it starts the human-computer interaction mode and responds to the relevant voice instruction, records the voiceprint information corresponding to the first voice signal, and turns off the camera. When a third voice signal corresponding to the first voiceprint information is subsequently detected, the terminal responds to the voice instruction corresponding to the third voice signal if the time interval between the current moment and the third moment does not exceed the valid duration of the first voiceprint information. This reduces the power consumption of the smart terminal while ensuring the timeliness and accuracy of voice instructions.

Fig. 4 is a schematic flowchart of yet another human-computer interaction method provided by an embodiment of the present invention. The method includes the following steps:

Step 401: When a first voice signal is detected, locate the first sound source corresponding to the first voice signal.

Step 402: If the positioning result of the first sound source meets the preset requirement, start the camera and use the camera to detect whether human eyes are aimed at the terminal.

Step 403: If it is detected that human eyes are aimed at the terminal, start the human-computer interaction mode and respond to the voice instruction corresponding to the first voice signal.

Step 404: Control the camera to capture the face image corresponding to the first voice signal and record it as the target face image.

For example, to ensure the accuracy of the subsequent human-computer interaction, once the first voice signal has been determined to be a voice signal issued by the user to control the smart terminal, the camera is controlled to capture the face image corresponding to the first voice signal. This is the image of the face to which the eyes belong when the terminal checks whether human eyes are aimed at it. The face image is taken as the target face image and is used later to determine whether a voice signal comes from the same sound source.

It should be noted that this application does not limit when step 404 is executed. It may be executed in the order shown in this embodiment, before the human-computer interaction mode is started in step 403, or at the same time as starting the human-computer interaction mode and responding to the voice instruction corresponding to the first voice signal.

Step 405: If the voiceprint information of a detected fourth voice signal is second voiceprint information, and the camera detects that human eyes are aimed at the terminal device, control the camera to capture the face image corresponding to the fourth voice signal.

For example, the fourth voice signal may be another voice signal detected after the first voice signal. It may have the same first voiceprint information as the first voice signal, or it may differ from the first voice signal and have second voiceprint information.

When the voiceprint information of the fourth voice signal is detected to be the second voiceprint information and the camera detects that human eyes are aimed at the terminal device, the sound source of this voice signal differs from that of the first voice signal and a user's eyes are aimed at the smart terminal. The terminal must then further determine whether the user detected by the camera is the sound source of the fourth voice signal, so it controls the camera to capture the face image corresponding to the fourth voice signal.

It should be noted that if the voiceprint information of the fourth voice signal is still the first voiceprint information, the human-computer interaction proceeds according to the method shown in Fig. 2 and/or Fig. 3 above. If the voiceprint information of the fourth voice signal is the second voiceprint information but human eyes are not detected to be aimed at the terminal device at that time, the fourth voice signal is not a control instruction from the user to the terminal, and the voice instruction corresponding to the fourth voice signal is not responded to.

Step 406: Determine whether the face image corresponding to the fourth voice signal matches the recorded target face image. If they do not match, execute step 407; if they match, execute step 408 and do not respond to the voice instruction corresponding to the fourth voice signal.

For example, if the face image corresponding to the fourth voice signal matches the recorded target face image corresponding to the first voice signal, the user facing the terminal is still the target user of the first voice signal and not the user who issued the fourth voice signal, so the voice instruction corresponding to the fourth voice signal need not be responded to. If the face image corresponding to the fourth voice signal does not match the recorded target face image, the user facing the terminal is no longer the target user of the first voice signal but the user of the fourth voice signal, and that user's eyes are aimed at the terminal. In this case step 407 is executed and the voice instruction corresponding to the fourth voice signal is responded to.

For example, suppose the first voice signal is issued by user A, so the target face image is user A's face image, and the fourth voice signal is issued by user B. If the face image corresponding to the fourth voice signal matches the recorded target face image, user B is not aimed at the terminal device and it is still user A who is. The condition that the eyes of the user who issued the fourth voice signal are aimed at the terminal is therefore not met, and the voice instruction corresponding to the fourth voice signal need not be responded to. If the face image corresponding to the fourth voice signal does not match the recorded target face image, user B is now aimed at the terminal device, which is why the captured face image does not match user A's target face image. The eyes aimed at the terminal are determined to be user B's, and the voice instruction corresponding to the fourth voice signal can be responded to.

Step 407: Respond to the voice instruction corresponding to the fourth voice signal, and take the face image corresponding to the fourth voice signal as the target face information.

For example, when the face image corresponding to the fourth voice signal does not match the recorded target face image, the voice instruction corresponding to the fourth voice signal is responded to, and the face captured by the camera for the fourth voice signal replaces the previous target face information.

Step 408: Do not respond to the voice instruction corresponding to the fourth voice signal.

In this embodiment of the present invention, when the smart terminal detects that the positioning of the first sound source corresponding to the first voice signal meets the preset requirement and that human eyes are aimed at the terminal, it starts the human-computer interaction mode, responds to the relevant voice instruction, and captures the current face image as the target face image. When a fourth voice signal with second voiceprint information is subsequently detected and human eyes are detected to be aimed at the terminal device, the face image corresponding to the fourth voice signal is captured and matched against the target face image; if they do not match, the voice instruction corresponding to the fourth voice signal is responded to. Voice signals from different users can thus be told apart, which avoids erroneous responses to voice instructions caused by a mismatch between the voice signal and the identified user.
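Steps 405 to 408 can be sketched as a small state holder. This is a simplified illustration: the `Session` class and its return strings are hypothetical, and voiceprints and face images are stood in for by plain strings compared for equality, whereas a real system would use voiceprint and face matchers.

```python
from dataclasses import dataclass


@dataclass
class Session:
    first_voiceprint: str  # recorded for the first voice signal
    target_face: str       # target face image recorded in step 404

    def on_fourth_signal(self, voiceprint: str, eyes_aligned: bool,
                         captured_face: str) -> str:
        if voiceprint == self.first_voiceprint:
            # Same speaker as before: handled by the Fig. 2 / Fig. 3 methods.
            return "handle_as_fig2_or_fig3"
        if not eyes_aligned:
            # Second voiceprint but nobody is looking at the terminal.
            return "ignore"
        if captured_face == self.target_face:
            # Step 408: the person facing the terminal is still the first user.
            return "ignore"
        # Step 407: a different user is facing the terminal.
        self.target_face = captured_face
        return "respond"


session = Session(first_voiceprint="voiceprint_A", target_face="face_A")
print(session.on_fourth_signal("voiceprint_B", True, "face_A"))  # ignore
print(session.on_fourth_signal("voiceprint_B", True, "face_B"))  # respond
print(session.target_face)                                       # face_B
```

Updating `target_face` only on the responding branch mirrors step 407, where the newly captured face replaces the previous target face information.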

Fig. 5 is a structural block diagram of a human-computer interaction device provided by an embodiment of the present invention. The device may be implemented in software and/or hardware, is generally integrated in a terminal, and can respond to a user's voice instructions by executing the human-computer interaction method. As shown in Fig. 5, the device includes:

a sound source localization module 501, configured to locate the first sound source corresponding to a first voice signal when the first voice signal is detected;

a human eye alignment detection module 502, configured to start the camera if the positioning result of the first sound source meets the preset requirement, and to detect through the camera whether human eyes are aimed at the terminal;

a human-computer interaction response module 503, configured to start the human-computer interaction mode and respond to the voice instruction corresponding to the first voice signal if it is detected that human eyes are aimed at the terminal.

The human-computer interaction device provided in this embodiment of the present application locates the first sound source corresponding to the first voice signal when the first voice signal is detected; if the positioning result of the first sound source meets the preset requirement, it starts the camera and detects through the camera whether human eyes are aimed at the terminal; and if human eyes are detected to be aimed at the terminal, it starts the human-computer interaction mode and responds to the voice instruction corresponding to the first voice signal. With this technical solution, when the smart terminal detects the first voice signal it uses the positioning of the first sound source to decide whether to start the camera and check the relationship between the human eyes and the terminal, and only when the eyes are detected to be aimed at the terminal does it start the human-computer interaction mode and respond to the relevant voice instruction. This avoids the cumbersome operation of waking the human-computer interaction mode with a keyword, simplifies the human-computer interaction process, and improves the efficiency of human-computer interaction.
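The cooperation of modules 501 to 503 can be sketched as a pipeline of callables. This is a structural illustration only: the function `handle_first_signal` and its parameters are hypothetical stand-ins for the modules, and no real localization or camera API is implied.

```python
from typing import Any, Callable


def handle_first_signal(signal: Any,
                        localize: Callable[[Any], Any],
                        meets_requirement: Callable[[Any], bool],
                        eyes_on_terminal: Callable[[Any], bool],
                        respond: Callable[[Any], None]) -> bool:
    """Return True if the interaction mode was started and the signal answered."""
    location = localize(signal)          # module 501: sound source localization
    if not meets_requirement(location):
        return False                     # the camera is never started
    if not eyes_on_terminal(location):   # module 502: camera-based eye check
        return False
    respond(signal)                      # module 503: start mode and respond
    return True


# Stub modules, for illustration only.
handled = []
started = handle_first_signal(
    "play music",
    localize=lambda s: {"distance_m": 0.8, "direction_deg": 30.0},
    meets_requirement=lambda loc: loc["distance_m"] < 2.0,
    eyes_on_terminal=lambda loc: True,
    respond=handled.append,
)
print(started, handled)  # True ['play music']
```

Placing the localization check before the eye check reflects the ordering in the text: the camera is only started when the positioning result already meets the preset requirement.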

Optionally, locating the first sound source corresponding to the first voice signal includes:

determining, by means of a sound localization technique, the distance and direction of the first sound source corresponding to the first voice signal relative to the terminal;

correspondingly, starting the camera if the positioning result of the first sound source meets the preset requirement, and detecting through the camera whether human eyes are aimed at the terminal, includes:

if the distance of the first sound source from the terminal is less than a preset distance threshold, starting the camera according to the direction of the first sound source relative to the terminal, and detecting through the camera whether human eyes are aimed at the terminal.
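The distance gate described above can be sketched as follows. The `Location` type, the use of degrees for direction, and the 2-meter threshold in the usage lines are assumptions for illustration; the text only states that a distance and a direction are determined and that the distance is compared with a preset threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    distance_m: float     # distance of the sound source from the terminal
    direction_deg: float  # direction of the sound source relative to the terminal


def camera_direction(loc: Location, max_distance_m: float) -> Optional[float]:
    """Direction in which to start the camera, or None to keep it off."""
    if loc.distance_m < max_distance_m:
        return loc.direction_deg
    return None


print(camera_direction(Location(0.8, 30.0), max_distance_m=2.0))  # 30.0
print(camera_direction(Location(3.5, 30.0), max_distance_m=2.0))  # None
```

Returning `None` for distant sources keeps the camera off, so the eye-alignment check only runs for speakers close enough to be plausibly addressing the terminal.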

Optionally, the above device further includes:

a voice content acquisition module, configured to acquire the voice content corresponding to the first voice signal when the first voice signal is detected;

correspondingly, responding to the voice instruction corresponding to the first voice signal includes:

generating a voice instruction according to the voice content, and responding to the voice instruction.

an information recording module, configured to record the first voiceprint information corresponding to the first voice signal;

a camera control module, configured to turn off the camera after the first voiceprint information corresponding to the first voice signal has been recorded;

a speed determination module, configured to determine the moving speed of the first sound source corresponding to the first voiceprint information when a second voice signal corresponding to the first voiceprint information is detected;

the human-computer interaction response module 503 is further configured to respond to the voice instruction corresponding to the second voice signal if the moving speed is less than a preset speed threshold.

可选的，速度确定模块具体用于，获取第一时刻与第二时刻的时间间隔，其中，所述第一时刻包括检测到所述第一语音信号的时刻，所述第二时刻包括检测到所述第二语音信号的时刻；Optionally, the speed determination module is specifically configured to acquire the time interval between the first moment and the second moment, wherein the first moment includes the moment when the first voice signal is detected, and the second moment includes the moment when the first voice signal is detected. the moment of the second speech signal;

获取所述第一声源在所述第一时刻和所述第二时刻相对于所述终端的距离差；acquiring a distance difference between the first sound source and the terminal at the first moment and the second moment;

根据所述时间间隔和所述距离差计算所述第一声纹信息对应的第一声源的移动速度。Calculate the moving speed of the first sound source corresponding to the first voiceprint information according to the time interval and the distance difference.
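The three steps above amount to dividing the distance difference by the time interval. The sketch below shows that arithmetic together with the speed-threshold check performed by the response module. The function names, the units, and the 0.5 m/s default threshold are assumptions for illustration only.

```python
def moving_speed(t1_s: float, d1_m: float, t2_s: float, d2_m: float) -> float:
    """Speed of the sound source between the first and second voice signals.

    t1_s, t2_s: detection times of the first and second voice signals (seconds).
    d1_m, d2_m: distance of the source from the terminal at those times (meters).
    """
    interval = t2_s - t1_s
    if interval <= 0:
        raise ValueError("the second moment must be later than the first moment")
    # Distance difference relative to the terminal, divided by the time interval.
    return abs(d2_m - d1_m) / interval


def should_respond(speed_m_per_s: float, speed_threshold: float = 0.5) -> bool:
    """Respond to the second voice signal only if the source moved slowly enough."""
    return speed_m_per_s < speed_threshold
```

For example, a source that moves from 1.0 m to 2.0 m over 4 s has a speed of 0.25 m/s, which is below the assumed threshold, so the second voice command would be answered without restarting the camera.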

有效时长判断模块，用于当检测到所述第一声纹信息对应的第三语音信号时，若判断出当前时刻与第三时刻的时间间隔大于所述第一声纹信息的有效时长，则控制声源定位模块501对所述第三语音信号进行定位，若定位结果满足预设要求，则控制摄像头控制模块启动所述摄像头，并控制人眼对准检测模块502通过所述摄像头重新进行人眼检测；An effective duration judging module, configured to, when a third voice signal corresponding to the first voiceprint information is detected and the time interval between the current moment and a third moment is determined to be greater than the effective duration of the first voiceprint information, control the sound source localization module 501 to locate the third voice signal, and, if the localization result meets the preset requirement, control the camera control module to start the camera and control the human eye alignment detection module 502 to perform human eye detection again through the camera;

其中，所述第三时刻包括上一次检测到所述第一声纹信息对应的语音信号的时刻，所述第一声纹信息的有效时长包括最近两次检测到所述第一声纹信息对应的语音信号的时间间隔。Wherein, the third moment includes the moment at which a voice signal corresponding to the first voiceprint information was last detected, and the effective duration of the first voiceprint information includes the time interval between the two most recent detections of a voice signal corresponding to the first voiceprint information.
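The effective-duration check can be sketched as a comparison between the elapsed time since the voiceprint was last heard and its effective duration. The function names and the returned action labels are hypothetical. What happens when the voiceprint is still valid, or when localization fails, is not stated in this passage, so those two branches are assumptions.

```python
def needs_reverification(now_s: float, last_heard_s: float, valid_duration_s: float) -> bool:
    """True if the recorded voiceprint has expired and eye detection must be redone."""
    return (now_s - last_heard_s) > valid_duration_s


def next_action(now_s: float, last_heard_s: float,
                valid_duration_s: float, localization_ok: bool) -> str:
    """Decide what to do when a third voice signal with the first voiceprint arrives."""
    if not needs_reverification(now_s, last_heard_s, valid_duration_s):
        # Assumption: within the effective duration no new eye detection is needed.
        return "skip_recheck"
    if localization_ok:
        # Expired and the localization result meets the preset requirement.
        return "start_camera_and_recheck_eyes"
    # Assumption: expired and localization failed, so the signal is ignored.
    return "ignore"
```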

可选的，上述装置包括：Optionally, the above device includes:

人脸采集模块，用于控制摄像头采集第一语音信号对应的人脸图像作为目标人脸图像；The face acquisition module is used to control the camera to collect the corresponding face image of the first voice signal as the target face image;

人脸图像记录模块，用于记录目标人脸图像；Face image recording module, used to record the target face image;

人脸采集模块还用于若检测到的第四语音信号的声纹信息为第二声纹信息，且所述摄像头检测到人眼对准终端设备，则控制摄像头采集所述第四语音信号对应的人脸图像；The face acquisition module is further configured to, if the voiceprint information of a detected fourth voice signal is second voiceprint information and the camera detects that the human eye is aligned with the terminal device, control the camera to collect the face image corresponding to the fourth voice signal;

人脸图像匹配模块，用于将所述第四语音信号对应的人脸图像与记录的目标人脸图像进行匹配；A face image matching module, configured to match the face image corresponding to the fourth voice signal with the recorded target face image;

人机交互响应模块503，还用于若不匹配，则响应所述第四语音信号对应的语音指令；The human-computer interaction response module 503 is also configured to respond to the voice instruction corresponding to the fourth voice signal if there is no match;

人脸图像记录模块还用于，将所述第四语音信号对应的人脸图像作为目标人脸信息。The face image recording module is further configured to use the face image corresponding to the fourth voice signal as target face information.
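The face-matching flow above can be sketched as a single decision function. It follows the text literally: the face is captured only when the fourth voice signal carries the second voiceprint and the eyes are aligned, and the command is answered and the target face replaced only when the new face does not match the recorded one. The function name, the `faces_match` callable, and the behavior in the other branches (no response, target unchanged) are assumptions for illustration.

```python
from typing import Callable, Tuple, TypeVar

Face = TypeVar("Face")


def handle_fourth_signal(
    is_second_voiceprint: bool,
    eyes_aligned: bool,
    new_face: Face,
    target_face: Face,
    faces_match: Callable[[Face, Face], bool],
) -> Tuple[bool, Face]:
    """Return (respond, updated_target_face) for a fourth voice signal."""
    if not (is_second_voiceprint and eyes_aligned):
        # Assumption: outside the described case, do not respond and keep the target.
        return False, target_face
    if faces_match(new_face, target_face):
        # Assumption: the text only specifies the non-matching case.
        return False, target_face
    # No match: respond to the voice command and record the new face as the target.
    return True, new_face
```

Here `faces_match` stands in for whatever face comparison the terminal provides; in a test it can be as simple as an equality check on placeholder values.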

本申请实施例还提供一种包含计算机可执行指令的存储介质，所述计算机可执行指令在由计算机处理器执行时用于执行人机交互方法，该方法包括：The embodiment of the present application also provides a storage medium containing computer-executable instructions, the computer-executable instructions are used to execute a human-computer interaction method when executed by a computer processor, and the method includes:

存储介质——任何的各种类型的存储器设备或存储设备。术语“存储介质”旨在包括：安装介质，例如CD-ROM、软盘或磁带装置；计算机系统存储器或随机存取存储器，诸如DRAM、DDRRAM、SRAM、EDORAM，兰巴斯(Rambus)RAM等；非易失性存储器，诸如闪存、磁介质(例如硬盘或光存储)；寄存器或其它相似类型的存储器元件等。存储介质可以还包括其它类型的存储器或其组合。另外，存储介质可以位于程序在其中被执行的第一计算机系统中，或者可以位于不同的第二计算机系统中，第二计算机系统通过网络(诸如因特网)连接到第一计算机系统。第二计算机系统可以提供程序指令给第一计算机用于执行。术语“存储介质”可以包括可以驻留在不同位置中(例如在通过网络连接的不同计算机系统中)的两个或更多存储介质。存储介质可以存储可由一个或多个处理器执行的程序指令(例如具体实现为计算机程序)。Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media, such as CD-ROMs, floppy disks, or tape devices; computer system memory or random access memory, such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory, such as flash memory and magnetic media (e.g., hard disk or optical storage); and registers or other similar types of memory elements. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or in a different second computer system connected to the first computer system through a network such as the Internet. The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media that reside in different locations, for example in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as computer programs) executable by one or more processors.

当然，本申请实施例所提供的一种包含计算机可执行指令的存储介质，其计算机可执行指令不限于如上所述的人机交互操作，还可以执行本申请任意实施例所提供的人机交互方法中的相关操作。Of course, for the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the human-computer interaction operations described above, and can also perform related operations in the human-computer interaction method provided in any embodiment of the present application.

本申请实施例提供了一种智能终端，该智能终端中可集成本申请实施例提供的人机交互装置。图6为本申请实施例提供的一种智能终端的结构示意图。智能终端600可以包括：存储器601和处理器602，及存储在存储器上并可在处理器运行的计算机程序，所述处理器602执行所述计算机程序时实现如本申请实施例所述的人机交互方法。The embodiment of the present application provides a smart terminal into which the human-computer interaction device provided in the embodiment of the present application can be integrated. FIG. 6 is a schematic structural diagram of a smart terminal provided by an embodiment of the present application. The smart terminal 600 may include a memory 601, a processor 602, and a computer program stored in the memory and executable on the processor. When the processor 602 executes the computer program, it implements the human-computer interaction method described in the embodiments of the present application.

本申请实施例提供的智能终端，可以在检测到第一语音信号时，通过第一声源的定位判断是否启动摄像头检测人眼与终端的关系，当检测到人眼与终端对准时，方可启动人机交互模式并响应相关语音指令，避免因关键词唤醒人机交互模式导致操作繁琐的问题，简化了人机交互的操作过程，同时也提高了人机交互效率。When the first voice signal is detected, the smart terminal provided by the embodiment of the present application can use the localization of the first sound source to decide whether to start the camera to detect the relationship between the human eye and the terminal. Only when the human eye is detected to be aligned with the terminal is the human-computer interaction mode started and the relevant voice command answered. This avoids the cumbersome operation caused by waking up the human-computer interaction mode with keywords, simplifies the operation process of human-computer interaction, and improves its efficiency.

图7为本申请实施例提供的又一种智能终端的结构示意图，该智能终端可以包括：壳体(图中未示出)、存储器701、中央处理器(central processing unit，CPU)702(又称处理器，以下简称CPU)、电路板(图中未示出)和电源电路(图中未示出)。所述电路板安置在所述壳体围成的空间内部；所述CPU702和所述存储器701设置在所述电路板上；所述电源电路，用于为所述智能终端的各个电路或器件供电；所述存储器701，用于存储可执行程序代码；所述CPU702通过读取所述存储器701中存储的可执行程序代码来运行与所述可执行程序代码对应的计算机程序，以实现以下步骤：FIG. 7 is a schematic structural diagram of another smart terminal provided by an embodiment of the present application. The smart terminal may include: a housing (not shown in the figure), a memory 701, a central processing unit (CPU) 702 (also called the processor, hereinafter referred to as the CPU), a circuit board (not shown in the figure), and a power supply circuit (not shown in the figure). The circuit board is placed inside the space enclosed by the housing; the CPU 702 and the memory 701 are arranged on the circuit board; the power supply circuit is used to supply power to each circuit or device of the smart terminal; the memory 701 is used to store executable program code; the CPU 702 runs a computer program corresponding to the executable program code by reading the executable program code stored in the memory 701, so as to implement the following steps:

所述智能终端还包括：外设接口703、RF(Radio Frequency，射频)电路705、音频电路706、扬声器711、电源管理芯片708、输入/输出(I/O)子系统709、其他输入/控制设备710、触摸屏712以及外部端口704，这些部件通过一个或多个通信总线或信号线707来通信。The smart terminal also includes: a peripheral interface 703, an RF (Radio Frequency) circuit 705, an audio circuit 706, a speaker 711, a power management chip 708, an input/output (I/O) subsystem 709, other input/control devices 710, a touch screen 712, and an external port 704. These components communicate via one or more communication buses or signal lines 707.

应该理解的是，图示智能终端700仅仅是智能终端的一个范例，并且智能终端700可以具有比图中所示出的更多的或者更少的部件，可以组合两个或更多的部件，或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。It should be understood that the illustrated smart terminal 700 is only one example of a smart terminal, and that the smart terminal 700 may have more or fewer components than those shown in the figure, may combine two or more components, or may have a different component configuration. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application-specific integrated circuits.

下面就本实施例提供的用于人机交互的智能终端进行详细的描述，该智能终端以手机为例。The following describes in detail the intelligent terminal used for human-computer interaction provided by this embodiment, and the intelligent terminal uses a mobile phone as an example.

存储器701，所述存储器701可以被CPU702、外设接口703等访问，所述存储器701可以包括高速随机存取存储器，还可以包括非易失性存储器，例如一个或多个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。Memory 701: the memory 701 can be accessed by the CPU 702, the peripheral interface 703, and so on. The memory 701 may include high-speed random access memory, and may also include non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.

外设接口703，所述外设接口703可以将设备的输入和输出外设连接到CPU702和存储器701。Peripheral interface 703 , which can connect the input and output peripherals of the device to CPU 702 and memory 701 .

I/O子系统709，所述I/O子系统709可以将设备上的输入输出外设，例如触摸屏712和其他输入/控制设备710，连接到外设接口703。I/O子系统709可以包括显示控制器7091和用于控制其他输入/控制设备710的一个或多个输入控制器7092。其中，一个或多个输入控制器7092从其他输入/控制设备710接收电信号或者向其他输入/控制设备710发送电信号，其他输入/控制设备710可以包括物理按钮(按压按钮、摇臂按钮等)、拨号盘、滑动开关、操纵杆、点击滚轮。值得说明的是，输入控制器7092可以与以下任一个连接：键盘、红外端口、USB接口以及诸如鼠标的指示设备。The I/O subsystem 709 , the I/O subsystem 709 can connect input and output peripherals on the device, such as a touch screen 712 and other input/control devices 710 , to the peripheral interface 703 . I/O subsystem 709 may include a display controller 7091 and one or more input controllers 7092 for controlling other input/control devices 710 . Among them, one or more input controllers 7092 receive electrical signals from or send electrical signals to other input/control devices 710, which may include physical buttons (push buttons, rocker buttons, etc.) ), dials, slide switches, joysticks, click wheels. It is worth noting that the input controller 7092 can be connected to any of the following: a keyboard, an infrared port, a USB interface, and a pointing device such as a mouse.

触摸屏712，所述触摸屏712是用户智能终端与用户之间的输入接口和输出接口，将可视输出显示给用户，可视输出可以包括图形、文本、图标、视频等。A touch screen 712, the touch screen 712 is an input interface and an output interface between the user's smart terminal and the user, and displays visual output to the user. The visual output may include graphics, text, icons, videos, etc.

I/O子系统709中的显示控制器7091从触摸屏712接收电信号或者向触摸屏712发送电信号。触摸屏712检测触摸屏上的接触，显示控制器7091将检测到的接触转换为与显示在触摸屏712上的用户界面对象的交互，即实现人机交互，显示在触摸屏712上的用户界面对象可以是运行游戏的图标、联网到相应网络的图标等。值得说明的是，设备还可以包括光鼠，光鼠是不显示可视输出的触摸敏感表面，或者是由触摸屏形成的触摸敏感表面的延伸。The display controller 7091 in the I/O subsystem 709 receives electrical signals from the touch screen 712 or sends electrical signals to the touch screen 712. The touch screen 712 detects contact on the touch screen, and the display controller 7091 converts the detected contact into interaction with a user interface object displayed on the touch screen 712, thereby realizing human-computer interaction. The user interface object displayed on the touch screen 712 may be an icon for running a game, an icon for connecting to a corresponding network, and so on. It is worth noting that the device may also include an optical mouse, which is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen.

RF电路705，主要用于建立手机与无线网络(即网络侧)的通信，实现手机与无线网络的数据接收和发送。例如收发短信息、电子邮件等。具体地，RF电路705接收并发送RF信号，RF信号也称为电磁信号，RF电路705将电信号转换为电磁信号或将电磁信号转换为电信号，并且通过该电磁信号与通信网络以及其他设备进行通信。RF电路705可以包括用于执行这些功能的已知电路，其包括但不限于天线系统、RF收发机、一个或多个放大器、调谐器、一个或多个振荡器、数字信号处理器、CODEC(COder-DECoder，编译码器)芯片组、用户标识模块(Subscriber Identity Module，SIM)等等。The RF circuit 705 is mainly used to establish communication between the mobile phone and the wireless network (that is, the network side) and to receive and send data between the mobile phone and the wireless network, for example sending and receiving short messages and e-mails. Specifically, the RF circuit 705 receives and sends RF signals, which are also called electromagnetic signals. The RF circuit 705 converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices through the electromagnetic signals. The RF circuit 705 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC (COder-DECoder) chipset, a Subscriber Identity Module (SIM), and so on.

音频电路706，主要用于从外设接口703接收音频数据，将该音频数据转换为电信号，并且将该电信号发送给扬声器711。The audio circuit 706 is mainly used to receive audio data from the peripheral interface 703 , convert the audio data into electrical signals, and send the electrical signals to the speaker 711 .

扬声器711，用于将手机通过RF电路705从无线网络接收的语音信号，还原为声音并向用户播放该声音。The speaker 711 is used to restore the voice signal received by the mobile phone from the wireless network through the RF circuit 705 into sound and play the sound to the user.

电源管理芯片708，用于为CPU702、I/O子系统及外设接口所连接的硬件进行供电及电源管理。The power management chip 708 is used for power supply and power management for the hardware connected to the CPU 702 , the I/O subsystem and the peripheral interface.

上述实施例中提供的人机交互装置、存储介质及智能终端可执行本发明任意实施例所提供的人机交互方法，具备执行该方法相应的功能模块和有益效果。未在上述实施例中详尽描述的技术细节，可参见本发明任意实施例所提供的人机交互方法。The human-computer interaction device, storage medium, and smart terminal provided in the above embodiments can execute the human-computer interaction method provided in any embodiment of the present invention, and have corresponding functional modules and beneficial effects for executing the method. For technical details not exhaustively described in the foregoing embodiments, reference may be made to the human-computer interaction method provided in any embodiment of the present invention.

注意，上述仅为本发明的较佳实施例及所运用技术原理。本领域技术人员会理解，本发明不限于这里所述的特定实施例，对本领域技术人员来说能够进行各种明显的变化、重新调整和替代而不会脱离本发明的保护范围。因此，虽然通过以上实施例对本发明进行了较为详细的说明，但是本发明不仅仅限于以上实施例，在不脱离本发明构思的情况下，还可以包括更多其他等效实施例，而本发明的范围由所附的权利要求范围决定。Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.