WO2022156611A1

Info

Publication number: WO2022156611A1
Application number: PCT/CN2022/072203
Authority: WO
Inventors: 李泽华; 张涛; 禤小兵
Original assignee: Shenzhen Pudu Technology Co Ltd
Current assignee: Shenzhen Pudu Technology Co Ltd
Priority date: 2021-01-21
Filing date: 2022-01-15
Publication date: 2022-07-28
Anticipated expiration: 2023-07-21
Also published as: CN112925235A

Abstract

A sound source positioning method and device during interaction and a computer readable storage medium. The method comprises: a sound pickup array picking up target sound information around a robot (S101); determining azimuth information of a sound source of the target sound information according to the target sound information picked up by the sound pickup array (S102); sending the azimuth information of the sound source of the target sound information to a servo mechanism of the robot, such that the servo mechanism rotates the head of the robot to directly face the sound source sending the target sound information (S103).

Description

Sound source localization method and device during interaction, and computer-readable storage medium

This application claims priority to Chinese patent application No. 202110084524.3, filed with the China National Intellectual Property Administration on January 21, 2021 and entitled "Sound source localization method and device during interaction, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of robotics, and in particular to a sound source localization method and device during interaction, and a computer-readable storage medium.

Background

The rapid development of robotics has led to robots being widely used in a variety of scenarios. People expect robots in these scenarios to be able to interact with users. For robots in certain specific scenarios in particular, such as robots that accompany the elderly or people living alone, whether the robot can interact well with the user is an important criterion for whether it can give the user a good experience.

One way for a user to interact with a robot is voice interaction, for example, waking the robot up, talking with the robot after it has been woken up, and so on. As in interaction between people, the robot being able to directly face the user who makes a sound (whether an instruction or an emotional conversation) is a prerequisite for a good interactive experience. However, the robot does not face the user at every moment, so localizing the sound source is particularly important in this mode of interaction.

However, existing sound source localization methods used during interaction cannot make the robot accurately locate the direction of the sound source. As a result, when the user makes a sound, the robot cannot directly face the user, which degrades the user experience.

Technical Solution

According to various embodiments of the present application, a sound source localization method and device during interaction, and a computer-readable storage medium, are provided.

A sound source localization method during interaction, comprising:

picking up, by a pickup array, target sound information around a robot;

determining orientation information of the sound source of the target sound information according to the target sound information picked up by the pickup array; and

sending the orientation information of the sound source of the target sound information to a servo mechanism of the robot, so that the servo mechanism rotates the head of the robot to directly face the sound source that emits the target sound information.

A sound source localization apparatus during interaction, comprising:

a pickup array module, configured to pick up target sound information around a robot through a pickup array;

an orientation information determination module, configured to determine orientation information of the sound source of the target sound information according to the target sound information picked up by the pickup array; and

a driving module, configured to send the orientation information of the sound source of the target sound information to a servo mechanism of the robot, so that the servo mechanism rotates the head of the robot to directly face the sound source that emits the target sound information.

A device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the technical solution of the sound source localization method during interaction described above.

A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the technical solution of the sound source localization method during interaction described above.

The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.

Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain drawings of other embodiments from these drawings without creative effort.

FIG. 1 is a flowchart of a sound source localization method during interaction provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a sound source localization apparatus during interaction provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a sound source localization apparatus during interaction provided by another embodiment of the present application;

FIG. 4 is a schematic structural diagram of a device provided by an embodiment of the present application.

Best Mode for Carrying Out the Invention

To facilitate understanding of the present application, the application is described more fully below with reference to the relevant drawings, which show preferred embodiments of the application. The application may, however, be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure of the application will be understood more thoroughly and completely.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the invention. The terms used in the description of the invention are for the purpose of describing particular embodiments only and are not intended to limit the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The present application proposes a sound source localization method during interaction, which can be applied to a robot. The robot may be a robot working in a restaurant, for example a food delivery robot; a robot working in a medical setting, for example a medicine delivery robot in a hospital; or a household robot, for example an emotional companion robot for the elderly or for people living alone; and so on. As shown in FIG. 1, the method mainly includes steps S101 to S103, described in detail as follows:

Step S101: a pickup array picks up target sound information around the robot.

A pickup array is a combination of multiple pickups arranged according to a certain rule, and each pickup in the array can independently acquire sound signals or sound information. In the embodiments of the present application, the pickup array may consist of 4 to 6 pickups, which may be arranged at six positions on the robot's head, namely front, rear, left, right, top and bottom. A pickup may be a common sound acquisition device such as a microphone (including its internal audio amplifier circuit).

In the embodiments of the present application, the target sound information may be speech content with a specific meaning, including a wake-up word used to wake up the robot, an affectionate name for the robot, and phrases commonly used when interacting with the robot, for example "Xiao Ai, Xiao Ai, please wake up", "Xiao Ai, please turn your head around" and "Xiao Ai, please bring me the water cup". Since the user may be located in any direction relative to the robot, the pickup array needs to pick up the sound all around the robot in order to capture the target sound information.

Specific speech information necessarily contains specific acoustic features, such as the loudness, pitch, frequency and timbre of the speech, or even the voiceprint of a specific person. Therefore, as one embodiment of the present application, the pickup array picking up target sound information around the robot may be: performing acoustic feature extraction on the surrounding sound picked up by the pickup array to obtain sound source information containing acoustic features, comparing the acoustic features of the sound source information with pre-stored acoustic features, and, if they match, determining that the sound source information is the target sound information. Considering that the sound signals around the robot are complex and contain, besides the target sound information, interference such as noise, in this embodiment the surrounding sound may also be processed to remove interference, including eliminating or reducing noise, when acoustic features are extracted.

Specifically, one method of eliminating or reducing noise interference may be: determining the volume of the surrounding sound collected by each pickup in the pickup array, identifying the sound signals whose volume difference is less than a predetermined difference threshold, and, among these sound signals, determining as noise signals those whose frequency is higher than a first frequency threshold or lower than a second frequency threshold, and/or whose duration is longer than a first duration or shorter than a second duration. The sound signals whose volume difference is less than the predetermined difference threshold may be identified as follows: for a group of pickups, treat any one pickup as the main pickup and the others as secondary pickups; for the sound signals in the same frequency band collected by the group, calculate the average volume of the sound signals collected by the secondary pickups, then calculate the difference between this average volume and the volume of the sound signal collected by the main pickup; when the difference is less than the predetermined difference threshold, the corresponding sound signal is determined to be a sound signal whose volume difference is less than the predetermined difference threshold.
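The noise rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `BandSignal` container, the function name, the use of an absolute volume difference, and all threshold values are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class BandSignal:
    """One same-band sound signal as seen by a group of pickups (hypothetical container)."""
    main_volume: float                   # volume at the pickup treated as the main pickup
    secondary_volumes: Sequence[float]   # volumes at the other (secondary) pickups
    frequency_hz: float
    duration_s: float


def is_noise(sig, diff_threshold, f_high, f_low, t_long, t_short):
    """Return True if the signal would be classified as noise under the rule above."""
    # Average volume over the secondary pickups, compared with the main pickup.
    # Assumption: the "difference" is taken as an absolute value.
    diff = abs(mean(sig.secondary_volumes) - sig.main_volume)
    if diff >= diff_threshold:
        return False  # volume differs clearly between pickups: not a noise candidate
    # Frequency above the first threshold or below the second threshold.
    out_of_band = sig.frequency_hz > f_high or sig.frequency_hz < f_low
    # Duration longer than the first duration or shorter than the second duration.
    odd_duration = sig.duration_s > t_long or sig.duration_s < t_short
    # "and/or" in the text: either condition alone is enough.
    return out_of_band or odd_duration
```

For instance, with illustrative thresholds of 3 volume units, 8000 Hz, 80 Hz, 5 s and 0.05 s, a 9 kHz hiss heard almost equally by all pickups is classified as noise, while a voice that is much louder at the main pickup is not.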

Step S102: determine orientation information of the sound source of the target sound information according to the target sound information picked up by the pickup array.

In step S102, the acoustic feature extraction performed on the surrounding sound picked up by the pickup array includes extracting the sound intensity of that sound, which yields the sound intensity of the sound source information; the target sound information picked up by the pickup array therefore also includes its sound intensity. As one embodiment of the present application, determining the orientation information of the sound source of the target sound information according to the target sound information picked up by the pickup array may be: calculating the time at which the target sound information arrives at each pickup in the pickup array, determining the delay time with which the pickup array collects the target sound information, and determining the orientation information of the sound source according to this delay time and the sound intensity of the target sound information. This localization principle essentially imitates the way the human auditory system locates a sound by listening, combined with geometry, and is not described further here.
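The application does not spell out the geometry, so the following Python sketch shows only one common way a delay between two pickups can be turned into a bearing. It assumes a far-field source, a single pair of pickups, and a speed of sound of about 343 m/s; the function name and parameters are hypothetical, and the sketch does not cover how the sound intensity enters the calculation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def bearing_from_delay(t_first, t_second, spacing_m):
    """Bearing in degrees, measured from the broadside of a two-pickup pair.

    t_first, t_second: arrival times (seconds) at the two pickups.
    spacing_m: distance between the two pickups in metres.
    A positive result means the source lies toward the pickup with arrival time t_first.
    Far-field assumption: sin(theta) = c * delay / spacing.
    """
    delay = t_second - t_first
    ratio = SPEED_OF_SOUND_M_S * delay / spacing_m
    # Clamp against measurement error so that asin stays defined.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A single pair only resolves the angle within one plane and leaves a front/back ambiguity, which is one reason an array of several pickups is useful.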

In another embodiment of the present application, determining the orientation information of the sound source of the target sound information according to the target sound information picked up by the pickup array may be: dividing the space in which the robot is located into several spatial regions (airspaces); identifying the first pickup and the second pickup to receive the target sound information, so as to determine the spatial region in which the sound source is located; and calculating the orientation information of the sound source according to the two moments at which the first and second pickups received the target sound information and the spatial region in which the sound source is located. Identifying the first and second pickups to receive the target sound information may be done by setting a judgment threshold, the threshold being the average sound intensity of a segment of speech extracted earlier from the target sound information: if two pickups in the pickup array successively receive target sound information whose sound intensity is higher than this average value, these two pickups are determined to be the first and second pickups to receive the target sound information. The spatial region in which the sound source is located is in fact determined by the angle of the sound source between a pair of pickups. Therefore, calculating the orientation information from the two receiving moments and the spatial region may be: treating any two pickups as a group; for each group, calculating the azimuth angle β of the sound source between the two pickups in the group; estimating, from the azimuth angle β, the distance D between the sound source and the group of pickups; determining, from the distance D, the position of a hypothetical sound source in the spatial region in which the sound source is located; orthogonally decomposing, at the position of the hypothetical sound source, the hypothetical sound source determined by each group of pickups; calculating the horizontal angle and elevation angle of the hypothetical sound source; and thereby locating the orientation information of the sound source of the target sound information.
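Two pieces of this embodiment lend themselves to a short Python sketch: selecting the first and second pickups by the intensity threshold, and converting a hypothetical source position into a horizontal angle and an elevation angle. The function names, the tuple layout, and the coordinate frame (origin at the robot's head, x forward, y to the left, z up) are assumptions made for illustration; the application does not give formulas for β or D, and the sketch does not attempt them.

```python
import math


def first_two_pickups(arrivals, intensity_threshold):
    """Return the names of the first and second pickups to hear the target sound.

    arrivals: iterable of (pickup_name, arrival_time_s, intensity) tuples.
    Only pickups whose intensity exceeds the threshold are considered.
    Returns None if fewer than two pickups qualify.
    """
    loud = [a for a in arrivals if a[2] > intensity_threshold]
    loud.sort(key=lambda a: a[1])  # earliest arrival first
    if len(loud) < 2:
        return None
    return loud[0][0], loud[1][0]


def horizontal_and_elevation(x, y, z):
    """Horizontal angle and elevation angle (degrees) of a point relative to the head.

    Assumed frame: x forward, y left, z up, origin at the robot's head.
    """
    horizontal = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return horizontal, elevation
```

The pair returned by `first_two_pickups` identifies the spatial region (for example, between the left and front pickups), and `horizontal_and_elevation` expresses a position inside that region as the two angles a head with pan and tilt axes would need.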

In another embodiment of the present application, the method may further include: capturing images in at least one sound source direction of the target sound information picked up by the pickup array, and identifying, from the captured images, morphological features of the sound-producing part. Since each pickup in the array can pick up the target sound information, the target sound information picked up by the array includes multiple sound source directions. Images can be captured in at least one of these directions, and the morphological features of the sound-producing part in the captured images, for example the mouth shape while speaking, can be identified.

Building on the technique of capturing images in at least one sound source direction of the target sound information and identifying the morphological features of the sound-producing part from the captured images, in one embodiment of the present application, determining the orientation information of the sound source according to the target sound information picked up by the pickup array may be: determining the final orientation information of the sound source from the at least one sound source direction according to the degree of match between the morphological features of the sound-producing part in the captured images and the target sound information. Specifically, this may be: obtaining a predicted direction probability value for each sound source direction among the at least one sound source direction; determining a sound source direction value for each sound source direction according to the predicted direction probability value and the degree of match between the morphological features of the speech-producing part and the target sound information; and selecting the sound source direction with the largest sound source direction value as the final orientation information of the sound source, where the sound source direction value characterizes the probability that the obtained sound source direction is the final orientation information of the target sound information. This embodiment in effect combines auditory and visual localization, using the morphological features of the sound-producing part in the image to assist in locating the sound source direction. Compared with determining the sound source direction from sound alone, it can therefore improve the accuracy of sound source localization.
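A minimal Python sketch of this audio-visual selection is given below. The application does not state how the predicted direction probability and the degree of match are combined into the sound source direction value; multiplying them is an assumption made here, and the dictionary keys and function names are hypothetical.

```python
def direction_value(predicted_prob, lip_match):
    """Combine the audio-based probability with the visual match degree.

    Assumption: a simple product; the application does not specify the formula.
    """
    return predicted_prob * lip_match


def pick_final_direction(candidates):
    """Return the candidate direction with the largest sound source direction value.

    candidates: list of dicts with keys 'azimuth_deg', 'predicted_prob', 'lip_match'.
    """
    return max(
        candidates,
        key=lambda c: direction_value(c["predicted_prob"], c["lip_match"]),
    )
```

With this combination, a direction that is slightly less likely from audio alone can still be chosen if the image captured in that direction shows a mouth shape that matches the target sound well.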

Step S103: send the orientation information of the sound source of the target sound information to the servo mechanism of the robot, so that the servo mechanism rotates the head of the robot to directly face the sound source that emits the target sound information.

In the embodiments of the present application, a servo mechanism is a system that controls the position, velocity, or acceleration of a mechanical system by closed-loop control. It usually includes a controlled body, an actuator, a sensor, and a controller, where the controller is connected to the central processing unit of the robot. After the orientation information of the sound source of the target sound information has been determined in step S102, the central processing unit sends it to the controller of the servo mechanism, and the controller drives the actuator (usually a motor) to rotate the robot's head to directly face the sound source that emits the target sound information.

Specifically, a positioning sensor (for example a gyroscope) may be provided on the robot to sense the direction the robot is currently facing. Based on the orientation information of the sound source and the direction the robot is currently facing, the servo mechanism calculates the direction and angle through which the robot needs to rotate, then rotates the robot's head by that angle in that direction so that it finally faces the sound source directly. It can be understood that, since the orientation information determined above is orientation information in three-dimensional space, the rotation of the robot's head falls into two cases: the sound source is at the same height as the robot's head, or it is not. Correspondingly, sending the orientation information to the servo mechanism so that it rotates the robot's head to face the sound source also falls into two cases. When the sound source is at the same height as the robot's head, a first plane angle of the sound source relative to the robot's head is sent to the servo mechanism, so that the servo mechanism rotates the robot's head to the left or right by the first plane angle to directly face the sound source. When the sound source is not at the same height as the robot's head, either the pitch angle of the robot's head relative to the sound source, or that pitch angle together with a second plane angle of the sound source relative to the robot's head, is sent to the servo mechanism, so that the servo mechanism either rotates the robot's head up or down by the pitch angle, or rotates the head up or down by the pitch angle and then to the left or right by the second plane angle, to directly face the sound source. In the above embodiment, the first plane angle or the second plane angle refers to the angle when the sound source and the robot's head are in the same plane, and the pitch angle includes the downward-looking and upward-looking angles of the robot's head relative to the sound source.
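The two cases can be sketched in Python as follows. The function names, the sign convention (positive yaw means turning left), and the tolerance used to decide that the source is "at the same height" are assumptions for illustration only. For the unequal-height case the sketch always issues a pitch command followed by a yaw command, which corresponds to the second of the two variants described above.

```python
def yaw_command(current_heading_deg, target_azimuth_deg):
    """Signed shortest rotation in degrees, in the range [-180, 180).

    Assumed convention: positive means turn left, negative means turn right.
    """
    return (target_azimuth_deg - current_heading_deg + 180.0) % 360.0 - 180.0


def head_commands(current_heading_deg, target_azimuth_deg, elevation_deg,
                  same_height_tol_deg=1.0):
    """Return the ordered list of (axis, angle_deg) commands for the servo mechanism."""
    yaw = yaw_command(current_heading_deg, target_azimuth_deg)
    if abs(elevation_deg) <= same_height_tol_deg:
        # Source at (roughly) the same height as the head: rotate left or right only.
        return [("yaw", yaw)]
    # Source higher or lower than the head: pitch first, then rotate left or right.
    return [("pitch", elevation_deg), ("yaw", yaw)]
```

Wrapping the yaw difference into [-180, 180) keeps the head from turning the long way round, for example when the gyroscope reports 350° and the source is at 10°.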

It should be further noted that, in the above embodiments, whether the orientation information of the sound source is determined from the target sound information picked up by the pickup array alone or from the captured images combined with that target sound information, having the servo mechanism rotate the robot's head to face the sound source is not real-time localization in the strict sense. This is because there is a time difference, or ordering, between determining the orientation information and rotating the head to face the sound source (although this process is relatively short). To improve real-time performance, in one embodiment of the present application, after the orientation information has been determined, the sound source emitting the target sound information may be continuously tracked with the aid of an image recognition algorithm. Specifically, within the sounding range of the sound source, portrait recognition may be used to detect whether a human face is present; or face recognition may be used to detect whether there is a user whose facial features match a pre-stored face template; or lip motion detection may be used to detect whether there is a user whose lips are moving. If so, the sound source emitting the target sound information is locked so that it can be continuously tracked.

As can be seen from the sound source localization method during interaction illustrated in FIG. 1, because the target sound information around the robot is picked up by a pickup array, it can be picked up from multiple directions. The orientation information determined from the target sound information picked up by the array is therefore relatively accurate, which allows the servo mechanism to rotate the robot's head to accurately face the sound source that emits the target sound information. This improves the efficiency of interaction between the robot and the user and also improves the user experience.

Referring to FIG. 2, a sound source localization apparatus during interaction provided by an embodiment of the present application may include a pickup array module 201, an orientation information determination module 202, and a driving module 203, described in detail as follows:

The pickup array module 201 is configured to pick up target sound information around the robot;

The orientation information determination module 202 is configured to determine orientation information of the sound source of the target sound information according to the target sound information picked up by the pickup array module 201;

The driving module 203 is configured to send the orientation information of the sound source of the target sound information to the servo mechanism of the robot, so that the servo mechanism rotates the head of the robot to directly face the sound source that emits the target sound information.
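The division into three modules can be pictured as a small pipeline. The Python sketch below is purely illustrative: the class name, the field names, and the use of plain callables for each module are assumptions, not the structure of the apparatus in FIG. 2.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class SoundSourceLocalizer:
    """Chains three callables that stand in for modules 201, 202 and 203."""
    pick_up: Callable[[], Optional[Any]]          # stands in for pickup array module 201
    locate: Callable[[Any], Tuple[float, float]]  # stands in for determination module 202
    drive: Callable[[Tuple[float, float]], None]  # stands in for driving module 203

    def run_once(self):
        """Pick up sound, locate its source, and forward the result to the servo."""
        target = self.pick_up()
        if target is None:
            return None  # no target sound information was picked up
        direction = self.locate(target)
        self.drive(direction)
        return direction
```

For example, stub callables can be passed in to exercise the flow without any hardware: a `pick_up` that returns a recognized wake-up word, a `locate` that returns a fixed (horizontal, elevation) pair, and a `drive` that records what it was sent.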

可选地，附图2示例的拾音器阵列模块201可以包括特征提取单元和目标信息确定单元，其中：Optionally, the pickup array module 201 shown in FIG. 2 may include a feature extraction unit and a target information determination unit, wherein:

特征提取单元，用于通过对拾音器阵列拾取的周边声音进行声学特征提取，得到包含声学特征的声源信息；A feature extraction unit, configured to obtain sound source information including acoustic features by performing acoustic feature extraction on the surrounding sounds picked up by the pickup array;

The target information determination unit is configured to compare the acoustic features of the sound source information with pre-stored acoustic features and, if they match, to determine that the sound source information is the target sound information.
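The comparison against pre-stored acoustic features can be sketched as follows. This is a minimal illustration only: the text does not say how features are represented or what counts as a match, so the feature vectors, the cosine-similarity measure, the 0.9 threshold, and the function names `cosine_similarity` and `is_target_sound` are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed representation)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector cannot match anything
    return dot / (norm_a * norm_b)


def is_target_sound(features, stored_features, threshold=0.9):
    """Treat the sound as target sound information when the extracted features
    are close enough to the pre-stored ones. The threshold is hypothetical."""
    return cosine_similarity(features, stored_features) >= threshold
```

Any other similarity measure could be substituted; the point is only that picked-up sound is kept as target sound information when it matches a stored template.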

Optionally, in the apparatus illustrated in FIG. 2, the acoustic features include the sound intensity of the sound source information, and the azimuth information determination module 202 may include a calculation unit and a first azimuth information determination unit, wherein:

The calculation unit is configured to calculate the time at which the target sound information arrives at each pickup in the pickup array, and to determine the delay time with which the pickup array collects the target sound information;

The first azimuth information determination unit is configured to determine the azimuth information of the sound source of the target sound information according to the delay time of the target sound information and the sound intensity of the target sound information.
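To show how an arrival delay can translate into an angle, the sketch below applies the standard far-field relation sin(θ) = c·τ/d for a single pair of pickups. It is an illustration under stated assumptions, not the method of the application: the two-pickup geometry, the speed of sound of 343 m/s, and the name `azimuth_from_delay` are hypothetical, and the sketch leaves out sound intensity because this passage does not detail how intensity enters the calculation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed value)


def azimuth_from_delay(delay_s, mic_spacing_m, speed=SPEED_OF_SOUND):
    """Far-field angle of arrival, in degrees from the broadside direction,
    for two pickups spaced mic_spacing_m apart.

    delay_s is the arrival-time difference between the two pickups.
    """
    ratio = speed * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay places the source straight ahead of the pair, and the largest possible delay (spacing divided by the speed of sound) places it along the line joining the two pickups.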

Optionally, the azimuth information determination module 202 illustrated in FIG. 2 may include a spatial region division unit, a judging unit, and a second azimuth information determination unit, wherein:

The spatial region division unit is configured to divide the space in which the robot is located into several spatial regions;

The judging unit is configured to identify the first pickup and the second pickup to receive the target sound information, so as to determine the spatial region in which the sound source of the target sound information is located;

The second azimuth information determination unit is configured to calculate the azimuth information of the sound source of the target sound information according to the two moments at which the target sound information is first and second received and the spatial region in which the sound source is located.
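The region-based approach can be illustrated with a coarse-to-fine heuristic. Everything specific here is an assumption made for the example: the pickups are taken to be evenly spaced on a circle with pickup i at angle 360·i/n degrees, the region is taken to lie between the first and second pickups to receive the sound, and the refinement is a simple linear interpolation on the time difference. The application does not state its actual formula in this passage, and `locate_by_region` and `max_pair_delay` are hypothetical names.

```python
def locate_by_region(arrival_times, max_pair_delay):
    """Estimate the source azimuth in degrees from per-pickup arrival times.

    arrival_times[i] is the arrival time at pickup i (seconds).
    max_pair_delay is the largest delay expected between adjacent pickups.
    Returns ((first, second), azimuth_deg).
    """
    n = len(arrival_times)
    order = sorted(range(n), key=lambda i: arrival_times[i])
    first, second = order[0], order[1]

    step = 360.0 / n
    first_angle = first * step
    # Signed angular offset from the first pickup toward the second, in [-180, 180).
    offset = ((second - first) * step + 180.0) % 360.0 - 180.0

    # Equal arrival times put the source midway between the two pickups;
    # a large time difference puts it close to the first pickup's direction.
    dt = arrival_times[second] - arrival_times[first]
    closeness = 1.0 - min(dt / max_pair_delay, 1.0)
    azimuth = (first_angle + 0.5 * offset * closeness) % 360.0
    return (first, second), azimuth
```

Identifying the first two pickups narrows the search to one region, and the two arrival moments then refine the angle within that region.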

Optionally, the apparatus illustrated in FIG. 2 may further include an image acquisition module 301 and an identification module 302, as in the sound source localization apparatus illustrated in FIG. 3, wherein:

The image acquisition module 301 is configured to acquire images in the direction of at least one sound source azimuth of the target sound information picked up by the pickup array;

The identification module 302 is configured to identify, from the images acquired by the image acquisition module 301, the morphological features of the sound-producing part.

Optionally, the azimuth information determination module 202 illustrated in FIG. 3 may include a third azimuth information determination unit, configured to determine the final azimuth information of the sound source of the target sound information, from among the at least one sound source azimuth of the target sound information picked up by the pickup array, according to the matching degree between the morphological features of the sound-producing part and the target sound information.

Optionally, the third azimuth information determination unit may include a predicted azimuth probability value acquisition unit, a sound source azimuth value determination unit, and a selection unit, wherein:

The predicted azimuth probability value acquisition unit is configured to acquire the predicted azimuth probability value of each sound source azimuth among the at least one sound source azimuth;

The sound source azimuth value determination unit is configured to determine, according to the predicted azimuth probability value and the matching degree, the sound source azimuth value corresponding to each sound source azimuth, where the sound source azimuth value represents the probability that the acquired sound source azimuth is the final azimuth information of the target sound information;

The selection unit is configured to select the sound source azimuth with the largest sound source azimuth value as the final azimuth information of the sound source of the target sound information.
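The selection among candidate azimuths can be sketched as a small scoring step. The text says only that the sound source azimuth value is determined from the predicted azimuth probability value and the matching degree; combining them by multiplication is an assumption made for this example, as are the tuple layout and the name `select_final_azimuth`.

```python
def select_final_azimuth(candidates):
    """Pick the candidate azimuth with the largest sound source azimuth value.

    candidates is a list of (azimuth_deg, predicted_probability, matching_degree).
    The product of probability and matching degree is a hypothetical scoring rule.
    Returns (best_azimuth_deg, best_value).
    """
    if not candidates:
        raise ValueError("at least one candidate azimuth is required")
    scored = [(probability * matching, azimuth)
              for azimuth, probability, matching in candidates]
    best_value, best_azimuth = max(scored, key=lambda item: item[0])
    return best_azimuth, best_value
```

For instance, a candidate that the audio prediction ranks lower can still win when the image shows a clearly matching sound-producing part in that direction.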

Optionally, the driving module 203 illustrated in FIG. 2 may include a first rotation unit and a second rotation unit, wherein:

The first rotation unit is configured to, when the sound source of the target sound information is level with the robot's head, send the first planar angle of the sound source relative to the robot's head to the servo mechanism of the robot, so that the servo mechanism rotates the robot's head left or right by the first planar angle to face the sound source emitting the target sound information;

The second rotation unit is configured to, when the sound source of the target sound information is not level with the robot's head, send to the servo mechanism of the robot either the pitch angle of the robot's head relative to the sound source, or that pitch angle together with the second planar angle of the sound source relative to the robot's head. The servo mechanism then either rotates the robot's head up or down by the pitch angle, or rotates it up or down by the pitch angle and afterwards left or right by the second planar angle, so that the head faces the sound source emitting the target sound information.
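The two rotation cases can be summarized in a short planning routine. This is a sketch under assumptions: the command format, the 1 degree tolerance used to decide whether the source is level with the head, the sign conventions, and the name `plan_head_rotation` are hypothetical and are not the servo interface of the application.

```python
def plan_head_rotation(yaw_deg, pitch_deg, tolerance_deg=1.0):
    """Return the ordered list of (axis, angle_deg) commands for the servo.

    yaw_deg is the planar angle of the source relative to the head
    (sign selects left or right); pitch_deg is the pitch angle
    (sign selects up or down). Both conventions are assumed.
    """
    commands = []
    # Source not level with the head: pitch first, as in the second rotation unit.
    if abs(pitch_deg) > tolerance_deg:
        commands.append(("pitch", pitch_deg))
    # Then rotate left or right if a planar angle remains.
    if abs(yaw_deg) > tolerance_deg:
        commands.append(("yaw", yaw_deg))
    return commands
```

When the source is level with the head only a yaw command is issued; otherwise the pitch command comes first and the yaw command, if any, follows.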

As the above description of the technical solutions shows, the target sound information around the robot is picked up by a pickup array, which can receive sound from multiple directions. Determining the azimuth information of the sound source from the target sound information picked up by the array therefore yields a relatively accurate localization result. This allows the servo mechanism to rotate the robot's head so that it accurately faces the sound source emitting the target sound information, which improves the efficiency of interaction between the robot and the user and improves the user experience.

FIG. 4 is a schematic structural diagram of a device provided by an embodiment of the present application. As shown in FIG. 4, the device 4 of this embodiment mainly includes a processor 40, a memory 41, and a computer program 42 that is stored in the memory 41 and can run on the processor 40, for example a program implementing the sound source localization method during interaction. When the processor 40 executes the computer program 42, it implements the steps of the above method embodiment, for example steps S101 to S103 shown in FIG. 1. Alternatively, when the processor 40 executes the computer program 42, it implements the functions of the modules/units of the above apparatus embodiments, for example the functions of the pickup array module 201, the azimuth information determination module 202, and the driving module 203 shown in FIG. 2.

By way of example, the computer program 42 of the sound source localization method during interaction mainly comprises: picking up, by a pickup array, target sound information around the robot; determining the azimuth information of the sound source of the target sound information according to the target sound information picked up by the pickup array; and sending the azimuth information of the sound source to the servo mechanism of the robot, so that the servo mechanism rotates the robot's head to face the sound source emitting the target sound information. The computer program 42 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer program 42 in the device 4. For example, the computer program 42 may be divided into the functions of the pickup array module 201, the azimuth information determination module 202, and the driving module 203 (modules in a virtual apparatus), whose specific functions are as follows: the pickup array module 201 is configured to pick up target sound information around the robot; the azimuth information determination module 202 is configured to determine the azimuth information of the sound source of the target sound information according to the target sound information picked up by the pickup array module 201; and the driving module 203 is configured to send the azimuth information of the sound source to the servo mechanism of the robot, so that the servo mechanism rotates the robot's head to face the sound source emitting the target sound information.
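The division of the program into three cooperating modules can be pictured as a simple pipeline. The sketch below is illustrative only: representing each module as a callable, the parameter names, and the function `run_localization` are assumptions, and the real modules 201 to 203 are not specified at this level of detail.

```python
from typing import Callable, Sequence


def run_localization(
    pick_up: Callable[[], Sequence[float]],
    determine_azimuth: Callable[[Sequence[float]], float],
    send_to_servo: Callable[[float], None],
) -> float:
    """Chain the three steps S101 to S103 in order."""
    target_sound = pick_up()                   # role of pickup array module 201
    azimuth = determine_azimuth(target_sound)  # role of azimuth information determination module 202
    send_to_servo(azimuth)                     # role of driving module 203
    return azimuth
```

Passing the modules in as callables keeps each step replaceable, for example to swap a delay-based estimator for a region-based one without touching the other two steps.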

The device 4 may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will understand that FIG. 4 is merely an example of the device 4 and does not limit it. The device may include more or fewer components than shown, may combine certain components, or may use different components; for example, the computing device may also include input/output devices, network access devices, buses, and the like.

The processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 41 may be an internal storage unit of the device 4, such as a hard disk or internal memory of the device 4. The memory 41 may also be an external storage device of the device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device 4. Further, the memory 41 may include both an internal storage unit of the device 4 and an external storage device. The memory 41 is used to store the computer program and other programs and data required by the device. The memory 41 may also be used to temporarily store data that has been output or is about to be output.

The technical features of the above embodiments may be combined in any manner. For brevity, not every possible combination of these technical features is described. However, any combination of these technical features that involves no contradiction should be regarded as falling within the scope of this specification.

The above embodiments represent only several implementations of the present application. Although they are described in specific detail, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make a number of variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.

Claims

1. A sound source localization method during interaction, the method comprising: