CN117197858A - Language switching method, device, equipment and medium for sweeping robot - Google Patents

Language switching method, device, equipment and medium for sweeping robot

Info

Publication number
CN117197858A
Authority
CN
China
Prior art keywords
language
sweeping robot
face
detection
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310526612.3A
Other languages
Chinese (zh)
Inventor
刘丽娴
王为举
赵传涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Proscenic Technology Co Ltd
Original Assignee
Shenzhen Proscenic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Proscenic Technology Co Ltd
Priority to CN202310526612.3A
Publication of CN117197858A
Legal status: Pending

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a language switching method, device, equipment and medium for a sweeping robot. The method comprises: performing face detection on an image collected by the sweeping robot to obtain a detection result; if the detection result is a face image, recognizing the face image to obtain a face recognition result, searching a preset language pack set for a target language that matches the face recognition result, obtaining the current language of the sweeping robot, and if the current language differs from the target language, switching the current language to the target language and using the target language for voice interaction with the user; if the detection result is not a face image, collecting the user's voice, performing voiceprint recognition on the voice, executing a corresponding task according to the voiceprint recognition result, collecting an image at the position of the sweeping robot, and re-executing the step of performing face detection on the image collected by the sweeping robot to obtain a detection result. According to the application, face recognition and voiceprint recognition are used to switch languages automatically, which improves the flexibility of language switching for the robot.

Description

Translated from Chinese
A language switching method, device, equipment and medium for a sweeping robot

Technical field

The present invention relates to the field of artificial intelligence technology, and in particular to a language switching method, device, equipment and medium for a sweeping robot.

Background

At present, intelligent voice control for sweeping robots is generally implemented by programming commands on a speech recognition chip (also called a speech recognition IC). Speech recognition chips fall roughly into two types: offline and online. An offline speech recognition chip requires no network connection or other external device and works as soon as it is powered on, with recognition performed on the local device. Offline recognition can be regarded as a simplified version of online voice technology: it narrows the scenario and reduces the number of objects that need to be recognized, thereby minimizing hardware cost. An online speech recognition chip must be connected to the Internet to work, because after being woken up it collects front-end audio, decompresses it, and then transmits the audio signal to a back-end cloud server for recognition and processing. Currently, both the sweeping robot's built-in announcements and third-party intelligent voice assistants switch languages through a preset language that must be configured manually, which makes language switching inflexible. Therefore, how to improve the flexibility of language switching during the sweeping robot's voice announcements is an urgent problem to be solved.

Summary of the invention

In view of this, embodiments of the present application provide a language switching method, device, equipment and medium for a sweeping robot, to solve the problem of poor language switching flexibility during the sweeping robot's voice announcements.

In a first aspect, embodiments of the present application provide a language switching method for a sweeping robot, the method comprising:

performing face detection on an image collected by the sweeping robot to obtain a detection result, and if the detection result is a face image, recognizing the face image to obtain a face recognition result;

searching, according to the face recognition result, a preset language pack set for a target language that matches the face recognition result;

obtaining the current language of the sweeping robot, and if the current language is different from the target language, switching the current language to the target language and using the target language for voice interaction with the user;

if the detection result is not a face image, collecting the user's voice, performing voiceprint recognition on the voice to obtain a voiceprint recognition result, executing a corresponding task according to the voiceprint recognition result, and determining the position of the sweeping robot;

collecting an image at the position of the sweeping robot, and re-executing the step of performing face detection on the image collected by the sweeping robot to obtain a detection result.

In a second aspect, embodiments of the present application provide a language switching device for a sweeping robot, the device comprising:

a detection module, configured to perform face detection on an image collected by the sweeping robot to obtain a detection result, and if the detection result is a face image, recognize the face image to obtain a face recognition result;

a matching module, configured to search, according to the face recognition result, a preset language pack set for a target language that matches the face recognition result;

a switching module, configured to obtain the current language of the sweeping robot, and if the current language is different from the target language, switch the current language to the target language and use the target language for voice interaction with the user;

a collection module, configured to, if the detection result is not a face image, collect the user's voice, perform voiceprint recognition on the voice to obtain a voiceprint recognition result, execute a corresponding task according to the voiceprint recognition result, and determine the position of the sweeping robot;

a re-execution module, configured to collect an image at the position of the sweeping robot and re-execute the step of performing face detection on the image collected by the sweeping robot to obtain a detection result.

In a third aspect, embodiments of the present application provide a terminal device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the language switching method for a sweeping robot described in the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the language switching method for a sweeping robot described in the first aspect.

Compared with the prior art, the present invention has the following beneficial effects:

Face detection is performed on an image collected by the sweeping robot to obtain a detection result. If the detection result is a face image, the face image is recognized to obtain a face recognition result; according to the face recognition result, a target language matching the face recognition result is searched for in a preset language pack set; the current language of the sweeping robot is obtained, and if the current language differs from the target language, the current language is switched to the target language and the target language is used for voice interaction with the user. If the detection result is not a face image, the user's voice is collected, voiceprint recognition is performed on the voice to obtain a voiceprint recognition result, a corresponding task is executed according to the voiceprint recognition result, the position of the sweeping robot is determined, an image is collected at the position of the sweeping robot, and the step of performing face detection on the image collected by the sweeping robot to obtain a detection result is re-executed. In the present application, face recognition and voiceprint recognition are used to switch languages automatically, which improves the flexibility of language switching for the sweeping robot.
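
For illustration, a minimal Python sketch of this overall control flow is given below. The robot interface used here (capture_image, detect_face, recognize_face, record_voice, identify_speaker, current_language, set_language, execute_task) is a hypothetical API assumed only for the example and is not defined by this application.

```python
def language_switch_cycle(robot, language_packs, registered_tasks):
    """One pass of the language-switching flow sketched above.

    `robot` is assumed to expose a small hypothetical interface:
    capture_image(), detect_face(image), recognize_face(face), record_voice(),
    identify_speaker(audio), current_language, set_language(lang), execute_task(task).
    """
    while True:
        image = robot.capture_image()
        face = robot.detect_face(image)           # face detection on the collected image
        if face is not None:
            user_id = robot.recognize_face(face)  # face recognition result
            target = language_packs.get(user_id)  # look up the matching language pack
            if target is not None and robot.current_language != target:
                robot.set_language(target)        # switch to the target language
            return target
        # No face detected: fall back to voiceprint recognition.
        audio = robot.record_voice()
        speaker = robot.identify_speaker(audio)   # voiceprint recognition result
        task = registered_tasks.get(speaker)
        if task is not None:
            robot.execute_task(task)              # move while executing the task,
                                                  # then retry face detection
```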

Brief description of the drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic diagram of an application environment of a language switching method for a sweeping robot provided in Embodiment 1 of the present application;

Figure 2 is a schematic flow chart of a language switching method for a sweeping robot provided in Embodiment 1 of the present application;

Figure 3 is a schematic flow chart of a face recognition method provided in Embodiment 2 of the present application;

Figure 4 is a schematic flow chart of a voiceprint recognition method provided in Embodiment 3 of the present application;

Figure 5 is a schematic flow chart of a method for executing a corresponding task according to a voiceprint recognition result, provided in Embodiment 4 of the present application;

Figure 6 is a schematic structural diagram of a language switching device for a sweeping robot provided in Embodiment 5 of the present application;

Figure 7 is a schematic structural diagram of a terminal device provided in Embodiment 6 of the present application.

Detailed description of embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

It should also be understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".

In addition, in the description of this application and the appended claims, the terms "first", "second", "third", etc. are used only to distinguish the descriptions and cannot be understood as indicating or implying relative importance.

Reference in this specification to "one embodiment" or "some embodiments" means that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", etc. appearing in different places in this specification do not necessarily refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having" and their variants all mean "including but not limited to", unless specifically emphasized otherwise.

Embodiments of the present invention can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers or digital-computer-controlled machines to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and other technologies. Artificial intelligence software technology mainly includes computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.

It should be understood that the numbering of the steps in the following embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

In order to illustrate the technical solution of the present application, specific embodiments are described below.

The language switching method for a sweeping robot provided in Embodiment 1 of the present application can be applied in the application environment shown in Figure 1, in which a client communicates with a server. The client includes, but is not limited to, smart TVs, handheld computers, desktop computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, cloud terminal devices, personal digital assistants (PDA), and other devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.

Referring to Figure 2, which is a schematic flow chart of a language switching method for a sweeping robot provided in Embodiment 1 of the present application, the method is applied to the above sweeping robot terminal. As shown in Figure 2, the method may include the following steps:

S201: Perform face detection on an image collected by the sweeping robot to obtain a detection result; if the detection result is a face image, recognize the face image to obtain a face recognition result.

In step S201, the sweeping robot terminal is equipped with a camera device that can collect images of the robot's surroundings. After the surrounding images are collected, face detection is performed on them to obtain a detection result; a preset face detection model can be used to detect whether an image is a face image. If the detection result is a face image, the face image is recognized to obtain a face recognition result.

It should be noted that the sweeping robot has a built-in WiFi module and a face recognition system. Through network communication, a language is first preset based on IP information and the corresponding language pack is downloaded online. After the download is completed, the language pack is applied to the system, after which user information is entered and the camera is turned on for face recognition and voiceprint recognition authentication.

In this embodiment, when the sweeping robot starts working, it turns on its camera, scans its surroundings to obtain the collected images, and performs face detection on the images with the preset face detection model to obtain a detection result. Before the preset face detection model is used for detection, it needs to be trained. The training process of the face detection model is as follows:

Sample training data is obtained by collecting a face data set under natural conditions (including interference such as various poses, lighting, occlusion and expressions, against backgrounds of working scenes such as mines, gullies and roads), and the data is labeled and normalized.

A two-stage cascade convolutional neural network is built. The first stage has four layers and, in order to accept images of different sizes, is a fully convolutional network; the second stage has five layers, the last of which is a fully connected layer.

The first-stage training data set is constructed by cropping and saving face regions according to their intersection over union (IOU) with the ground-truth bounding box, dividing the crops into face regions, partial face regions and non-face regions: boxes whose IOU with the ground-truth box is greater than 0.7 are labeled as face regions, those between 0.3 and 0.7 as partial face regions, and those below 0.3 as non-face regions. The obtained images are normalized to the interval (-1, 1) and resized to (15, 15). For first-stage training, the cropped data set is fed into the first-stage network for forward propagation and the output is trained, with cross entropy as the classification loss function.
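
A minimal Python sketch of the first-stage sample construction described above (IOU thresholds of 0.7 and 0.3, normalization to (-1, 1), resizing to 15×15). The (x1, y1, x2, y2) box format and the use of OpenCV for resizing are assumptions of the example.

```python
import cv2
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def label_crop(image, crop_box, gt_box, size=15):
    """Label a candidate crop by its IOU with the ground-truth face box and
    return the normalized first-stage training sample."""
    overlap = iou(crop_box, gt_box)
    if overlap > 0.7:
        label = "face"
    elif overlap > 0.3:
        label = "partial_face"
    else:
        label = "non_face"
    x1, y1, x2, y2 = crop_box
    crop = cv2.resize(image[y1:y2, x1:x2], (size, size))
    crop = crop.astype(np.float32) / 127.5 - 1.0   # normalize pixels to (-1, 1)
    return crop, label
```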

The second-stage training data set is constructed from two kinds of samples: those misclassified by the first-stage network, and samples obtained by random re-cropping. The data is normalized and the image size is set to (30, 30). In the second stage, the network is trained using the Adam gradient descent algorithm, with the same loss function as in the first stage. Face detection is then performed with the trained first-stage and second-stage networks to obtain the detection result.

Figure 3 is a schematic flow chart of a face recognition method provided in Embodiment 2 of the present application.

Optionally, if the detection result is a face image, recognizing the face image to obtain a face recognition result includes:

extracting facial features from the face image, and matching the facial features with preset facial features in a preset face detection library to obtain a matching result;

determining the face recognition result of the face image according to the matching result.

In this embodiment, in the face recognition system of the sweeping robot, a convolutional neural network can be used to extract facial features from the face image. A convolutional neural network is a deep neural network with a convolutional structure. It contains a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. A convolutional layer is a layer of neurons that convolves the input signal; in it, a neuron may be connected to only some of the neurons in the neighboring layer. A convolutional layer usually contains several feature maps, each of which may be composed of rectangularly arranged neural units. Neural units in the same feature map share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as extracting image information in a position-independent way. The convolution kernel can be initialized as a matrix of random size, and reasonable weights are learned for it during training. In addition, a direct benefit of weight sharing is that it reduces the connections between layers of the convolutional neural network while also reducing the risk of overfitting.

In a convolutional neural network, the convolutional layer is the core module: it extracts features from the raw data. A convolutional layer convolves the features of the previous layer with its convolution kernels and passes the result through an activation function to produce the features of the current layer. Shallow convolutional layers can only extract relatively low-level features, while deeper layers extract more complex features from the low-level ones. The features output by a convolutional layer depend on the convolution of several features of the previous layer, and each feature can be convolved with a different kernel.

It should be noted that when a convolutional neural network is used to extract image features, it must first be trained, and the trained network is then used for feature extraction. Training may proceed as follows: a data set is extracted from a large public data set; median filtering is applied as an image-smoothing preprocessing operation to obtain a preprocessed data set; the preprocessed data set is then processed with a grayscale-stretching image enhancement algorithm to obtain an enhanced data set; and the initial convolutional neural network is trained on the enhanced data set to obtain the trained convolutional neural network.

It should be noted that the convolutional neural network may include multiple convolutional layers, for example a first to a sixth convolutional layer, where the input of the first convolutional layer is the face image to be recognized, the input of the second convolutional layer is the output of the first, the input of the third is the output of the second, the input of the fourth includes the output of the third, the input of the fifth is the output of the fourth, and the input of the sixth is the output of the fifth.
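
A possible PyTorch sketch of such a six-layer chained convolutional feature extractor follows; the channel widths, kernel size, pooling and embedding dimension are illustrative assumptions, since the text only specifies how the six layers are chained.

```python
import torch
import torch.nn as nn

class FaceFeatureExtractor(nn.Module):
    """Six chained convolutional layers, each feeding the next, ending in a
    facial feature vector.  Layer widths and kernel sizes are assumptions."""
    def __init__(self, embedding_dim=128):
        super().__init__()
        channels = [3, 16, 32, 64, 64, 128, 128]   # six conv layers
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels[-1], embedding_dim)

    def forward(self, x):                  # x: (batch, 3, H, W) face crops
        x = self.convs(x)                  # output of each layer is the next layer's input
        x = self.pool(x).flatten(1)
        return self.fc(x)                  # facial feature vector
```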

The facial features of the face image are extracted by the convolutional neural network and matched against the preset facial features in the preset face detection library to obtain a matching result, and the face recognition result of the face image is determined according to the matching result. The preset facial features in the preset face detection library are the corresponding face features entered in advance.
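
The embodiment does not specify the matching metric; the following sketch assumes cosine similarity against the preset features with a fixed acceptance threshold, both of which are assumptions of the example.

```python
import numpy as np

def match_face(feature, face_library, threshold=0.6):
    """Compare an extracted facial feature with every preset feature in the
    library and return the best-matching user ID, or None if no entry passes
    the (assumed) cosine-similarity threshold."""
    best_id, best_score = None, -1.0
    for user_id, preset in face_library.items():       # features entered in advance
        score = float(np.dot(feature, preset) /
                      (np.linalg.norm(feature) * np.linalg.norm(preset) + 1e-9))
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None
```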

It should be noted that, before the facial features are extracted from the face image, face region detection may be performed on the face image, and the detected face region is then used for feature extraction. For face region detection, Haar feature detection based on the Viola-Jones detection framework can be used to analyze the input grayscale image. The Viola-Jones framework first computes the integral image of the face image and uses three-rectangle Haar feature templates to extract face features. A trained AdaBoost classifier feature library is then used, and a cascade method is applied to reduce the size of the classifier. The classifier feature library used consists of 22 cascaded strong classifiers, each of which is composed of several weak classifiers. All 80×80 sub-windows of the whole face image are extracted, and each sub-window passes through the cascade classifier in turn, eliminating non-face sub-windows stage by stage. If only one sub-window passes all 22 stages, that window is determined to be the face sub-window; if multiple sub-windows pass all 22 stages, the candidate face sub-windows are merged and filtered over adjacent 6×6 sub-windows to select the best face sub-window.
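
A hedged OpenCV sketch of Haar-cascade face region detection before feature extraction: OpenCV's pretrained frontal-face cascade is used here only as a stand-in for the 22-stage AdaBoost cascade described above, and the detection parameters are assumptions.

```python
import cv2

def detect_face_region(image_bgr):
    """Detect a face region with a Haar cascade and return the cropped region,
    or None if no face sub-window survives."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(80, 80))   # 80x80 sub-windows, as above
    if len(faces) == 0:
        return None
    # Keep the largest candidate as the "best" face sub-window.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return image_bgr[y:y + h, x:x + w]
```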

Optionally, before matching the facial features with the preset facial features in the preset face detection library to obtain a matching result, the method further includes:

entering preset facial features into the preset face detection library according to a received face recognition entry instruction.

In this embodiment, the face recognition lock of the sweeping robot has a dedicated menu. By operating the menu interface, the user enters the face recognition entry page and clicks the entry button, which sends a face recognition entry instruction to the server, turns on the face recognition entry function and records the preset face recognition data; multiple face images can be entered as needed. During entry, the WiFi module can be connected to home WiFi or public WiFi.

S202: According to the face recognition result, search the preset language pack set for a target language that matches the face recognition result.

In step S202, the preset language pack set consists of pre-downloaded language packs, including different languages. Different face recognition results correspond to different language packs, and each face recognition result corresponds to at least one language pack.

In this embodiment, the corresponding language pack is downloaded according to the entered preset facial features. For example, if the entered preset facial features belong to user A, whose language is English, the English language pack is downloaded and matched to user A; if the entered preset facial features belong to user B, whose language is Chinese, the Chinese language pack is downloaded and matched to user B. According to this correspondence and the face recognition result, the target language matching the face recognition result is found in the preset language pack set. For example, when the face recognition result corresponds to user A's facial features, the target language is English.

S203: Obtain the current language of the sweeping robot; if the current language is different from the target language, switch the current language to the target language and use the target language for voice interaction with the user.

In step S203, in order to give the user a better experience, the sweeping robot uses the language corresponding to the user when interacting with them. When the robot starts interacting with the user, it randomly selects one of the language packs as the current language; if the current language is different from the target language, the current language is switched to the target language, and the target language is used for voice interaction with the user.

In this embodiment, the current language of the sweeping robot is obtained. The current language may be a language randomly selected from the preset language pack set, or the language used in the previous interaction. The current language is compared with the target language: if they are the same, the robot continues to interact with the user in the current language; if they are different, the current language is switched to the target language and the robot interacts with the user in the target language.
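
A minimal sketch of the language-pack lookup in S202 and the switch check in S203; the user identifiers, language codes and the robot's current_language/set_language interface are assumptions of the example.

```python
# Hypothetical per-user language mapping built when preset facial features and
# language packs are entered (user IDs and language codes are examples).
language_packs = {"user_a": "en-US", "user_b": "zh-CN"}

def select_and_switch(robot, face_recognition_result):
    """Look up the target language for the recognized user (S202) and switch the
    robot's current language if it differs (S203)."""
    target = language_packs.get(face_recognition_result)
    if target is None:
        return robot.current_language      # no matching pack: keep the current language
    if robot.current_language != target:
        robot.set_language(target)         # switch, then interact in the target language
    return target
```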

It should be noted that when interacting with the user, the sweeping robot may answer the user's questions, broadcast the cleaning status in real time, or play preset voice messages.

S204: If the detection result is not a face image, collect the user's voice, perform voiceprint recognition on the voice to obtain a voiceprint recognition result, execute a corresponding task according to the voiceprint recognition result, and determine the position of the sweeping robot.

In step S204, if the detection result is not a face image, the user's voice is collected, voiceprint recognition is performed on it to obtain a voiceprint recognition result, and a corresponding task is executed according to the voiceprint recognition result. Since the sweeping robot moves while executing the task, it scans and takes pictures at different positions as its position changes, so that an image of the user's face can be captured.

In this embodiment, if the detection result is not a face image, the sweeping robot cannot determine the corresponding target language. It can therefore execute the corresponding task according to the voiceprint recognition result, so that the robot keeps moving and can eventually scan and capture the corresponding face image to obtain the corresponding target language.

Figure 4 is a schematic flow chart of a voiceprint recognition method provided in Embodiment 3 of the present application.

Optionally, performing voiceprint recognition on the voice to obtain a voiceprint recognition result includes:

performing a convolution operation on the voice using a convolutional neural network to obtain convolutional coding features;

processing the convolutional coding features with a neural-network-based voiceprint recognition model to obtain the voiceprint features corresponding to the voice;

performing voiceprint recognition on the voice using the voiceprint features to obtain a voiceprint recognition result.

In this embodiment, when a convolutional neural network is used to perform the convolution operation on the voice, the Mel spectrogram of the voice is first extracted and the convolution is applied to it. A convolutional neural network based on residual gated convolution can be built: several one-dimensional residual gated convolution blocks combined with a channel attention mechanism are stacked for convolutional coding, pooling operations are used for feature dimensionality reduction, and the feature map output by the final convolutional layer contains the high-level abstract features of the Mel spectrogram.

The convolutional neural network based on residual gated convolution mainly consists of two residual gated convolution units, a squeeze-and-excitation layer and a max pooling layer. The receptive field of the one-dimensional convolution covers the entire frequency range of the spectral processing result, and it is combined with a residual connection. The squeeze-and-excitation layer performs the one-dimensional (channel-wise) processing, and the max pooling layer is used to reduce the size of the convolutional features.

It should be noted that the squeeze-and-excitation layer mainly consists of two modules: squeeze and excitation. The squeeze module corresponds to global average pooling: each channel is converted into a single real number, and the statistics of each channel are computed by the pooling operation to summarize the temporal features. The main role of the excitation module is to capture the relationships between channels, realizing the effect of a gating mechanism: two fully connected layers learn from the per-channel statistics obtained by the squeeze operation, and a rectified linear unit between the two fully connected layers can capture potential nonlinear relationships between channels.
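
A PyTorch sketch of the squeeze-and-excitation layer described above, operating on one-dimensional (channel, time) features; the reduction ratio is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SqueezeExcitation1d(nn.Module):
    """Squeeze-and-excitation over (batch, channels, time) features."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool1d(1)          # one statistic per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),                      # nonlinearity between the two FC layers
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                               # per-channel gating weights
        )

    def forward(self, x):                 # x: (batch, channels, time)
        w = self.squeeze(x).squeeze(-1)   # (batch, channels)
        w = self.excite(w).unsqueeze(-1)  # (batch, channels, 1)
        return x * w                      # rescale channels (gating effect)
```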

In another embodiment, when the Mel spectrogram is convolutionally encoded, a corresponding encoder is used. The encoder may consist of two convolutional layers and two bidirectional long short-term memory (LSTM) layers. The kernel sizes of the two convolutional layers are 7×7 and 20×7 respectively, and each convolutional layer is followed in turn by a batch normalization layer, a ReLU nonlinear activation layer and a max pooling layer, with pooling kernel sizes of 2×2 and 1×5 respectively. The convolution operations yield a 74×128-dimensional intermediate spectral representation sequence M = [m1, m2, ..., mn, ..., mN], where mn is the feature vector at the n-th position. The two convolutional layers extract the spectrum-related features from the FBank acoustic features, which are then used as the input features of the LSTM layers to output the corresponding convolutional coding features. In the encoder, the two bidirectional LSTM layers model the temporal relationships of the intermediate sequence feature M. The hidden-vector representations of the bidirectional LSTM come from the forward and backward LSTMs respectively, each LSTM layer has 128 hidden nodes, and a nonlinear activation is applied to obtain the final hidden vectors, which form the convolutional coding features.

It should be noted that a first preset function in the librosa library can be used to convert the audio into the corresponding Fourier-transform spectrum, and a second preset function can then convert it into a Mel spectrogram that better matches human hearing. In this way, a one-dimensional time-series signal that is difficult to process is converted into two-dimensional frequency-domain data that is easier to process and carries richer information.
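
The two librosa functions are not named in the text; the sketch below assumes librosa.stft and librosa.feature.melspectrogram play those roles, and the file name, sample rate, FFT size and number of Mel bins are illustrative assumptions.

```python
import numpy as np
import librosa

# Load the collected voice (file name and sample rate are assumptions).
y, sr = librosa.load("user_voice.wav", sr=16000)

# "First preset function": short-time Fourier transform spectrum.
spectrum = np.abs(librosa.stft(y, n_fft=512, hop_length=160)) ** 2

# "Second preset function": convert the power spectrum to a Mel spectrogram.
mel = librosa.feature.melspectrogram(S=spectrum, sr=sr, n_mels=80)
log_mel = librosa.power_to_db(mel)   # 2-D frequency-domain input for the encoder
```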

The convolutional coding features are processed by a neural-network-based voiceprint recognition model to obtain the voiceprint features corresponding to the voice. In some embodiments, the voiceprint recognition model is a classic neural network with its last fully connected layer removed; classic neural networks include, but are not limited to, ResNet (Residual Neural Network), x-vector or ECAPA-TDNN.

Performing voiceprint recognition on the voice using the voiceprint features includes: in the voiceprint verification stage, obtaining speaker identifiers and their corresponding registered voiceprint features, computing the similarity between the voiceprint features and the registered voiceprint features, and determining the speaker identifier corresponding to the voice according to the similarity result. For example, the speaker identifier is a speaker ID.
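
A minimal sketch of the voiceprint verification stage; cosine similarity and the acceptance threshold are assumptions, since the embodiment only states that a similarity between the voiceprint feature and the registered voiceprint features is computed.

```python
import numpy as np

def verify_speaker(voiceprint, speaker_ids, registered_matrix, threshold=0.7):
    """Score the extracted voiceprint against all registered voiceprints at once
    and return the matching speaker ID, or None if no similarity passes the
    (assumed) threshold.  `registered_matrix` is one registered feature per row."""
    v = voiceprint / (np.linalg.norm(voiceprint) + 1e-9)
    refs = registered_matrix / (np.linalg.norm(registered_matrix, axis=1, keepdims=True) + 1e-9)
    scores = refs @ v                       # one similarity per registered speaker
    best = int(np.argmax(scores))
    return speaker_ids[best] if scores[best] >= threshold else None
```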

Figure 5 is a schematic flow chart of a method for executing a corresponding task according to a voiceprint recognition result, provided in Embodiment 4 of the present application.

Optionally, executing a corresponding task according to the voiceprint recognition result includes:

obtaining preset featured instructions, and generating, according to the correspondence between voiceprint recognition results and preset featured instructions, a target featured instruction that matches the voiceprint recognition result;

executing the corresponding task according to the target featured instruction.

In this embodiment, preset featured instructions are obtained, where the preset featured instructions correspond one-to-one to voiceprint recognition results: different voiceprint recognition results correspond to different preset featured instructions. For example, the featured instruction set for user A is an instruction for the sweeping robot to clean the living room, the featured instruction set for user B is an instruction for the robot to clean the dining room, and the featured instruction set for user C is an instruction for the robot to vacuum. A different featured instruction is set for each entered user, so that when the corresponding user is determined from the voiceprint recognition result, a target featured instruction matching the voiceprint recognition result can be generated according to the preset correspondence, and the corresponding task is executed according to the target featured instruction.
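
A minimal sketch of the one-to-one mapping from voiceprint recognition results to preset featured instructions, following the example users above; the identifiers, task names and robot.execute_task() interface are assumptions.

```python
# Hypothetical mapping from voiceprint recognition results (speaker IDs) to
# preset featured instructions.
featured_instructions = {
    "user_a": "clean_living_room",
    "user_b": "clean_dining_room",
    "user_c": "vacuum",
}

def execute_featured_task(robot, speaker_id):
    """Generate the target featured instruction for the recognized speaker and
    have the robot execute it."""
    instruction = featured_instructions.get(speaker_id)
    if instruction is not None:
        robot.execute_task(instruction)
    return instruction
```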

S205: Collect an image at the position of the sweeping robot, and re-execute the step of performing face detection on the image collected by the sweeping robot to obtain a detection result.

In step S205, while the sweeping robot executes the corresponding task and keeps moving, it scans its surroundings, collects the corresponding images, and re-executes the step of performing face detection on the collected images to obtain a detection result. If the detection result is a face image, the face image is recognized to obtain a face recognition result; according to the face recognition result, the target language matching the face recognition result is searched for in the preset language pack set; the current language of the sweeping robot is obtained; and if the current language differs from the target language, the current language is switched to the target language and the target language is used for voice interaction with the user.

In this embodiment, when the image collected by the sweeping robot is not a face image, the robot scans its surroundings and collects images while executing the corresponding task, until a face can be detected in the collected images. The face image is then recognized to obtain a face recognition result; according to the face recognition result, the target language matching the face recognition result is found in the preset language pack set; the current language of the sweeping robot is obtained; and if the current language differs from the target language, the current language is switched to the target language and used for voice interaction with the user. Because the sweeping robot scans in real time and determines the target language corresponding to each collected face image, it can quickly switch to the corresponding target language, which improves the flexibility of language switching.

Face detection is performed on an image collected by the sweeping robot to obtain a detection result. If the detection result is a face image, the face image is recognized to obtain a face recognition result; according to the face recognition result, a target language matching the face recognition result is searched for in a preset language pack set; the current language of the sweeping robot is obtained, and if the current language differs from the target language, the current language is switched to the target language and the target language is used for voice interaction with the user. If the detection result is not a face image, the user's voice is collected, voiceprint recognition is performed on the voice to obtain a voiceprint recognition result, a corresponding task is executed according to the voiceprint recognition result, the position of the sweeping robot is determined, an image is collected at the position of the sweeping robot, and the step of performing face detection on the image collected by the sweeping robot to obtain a detection result is re-executed. In the present application, face recognition and voiceprint recognition are used to switch languages automatically, which improves the flexibility of language switching for the sweeping robot.

Corresponding to the language switching method for a sweeping robot in the above embodiments, Figure 6 shows a structural block diagram of a language switching device for a sweeping robot provided in Embodiment 5 of the present application. For ease of description, only the parts related to the embodiments of the present application are shown.

Referring to Figure 6, the language switching device 60 for a sweeping robot includes:

a detection module 61, configured to perform face detection on an image collected by the sweeping robot to obtain a detection result, and if the detection result is a face image, recognize the face image to obtain a face recognition result;

a matching module 62, configured to search, according to the face recognition result, the preset language pack set for a target language that matches the face recognition result;

a switching module 63, configured to obtain the current language of the sweeping robot, and if the current language is different from the target language, switch the current language to the target language and use the target language for voice interaction with the user;

a collection module 64, configured to, if the detection result is not a face image, collect the user's voice, perform voiceprint recognition on the voice to obtain a voiceprint recognition result, execute a corresponding task according to the voiceprint recognition result, and determine the position of the sweeping robot;

a re-execution module 65, configured to collect an image at the position of the sweeping robot and re-execute the step of performing face detection on the image collected by the sweeping robot to obtain a detection result.

Optionally, the detection module 61 includes:

an extraction unit, configured to extract facial features from the face image, and match the facial features with the preset facial features in the preset face detection library to obtain a matching result;

a recognition unit, configured to determine the face recognition result of the face image according to the matching result.

Optionally, the detection module 61 further includes:

an entry unit, configured to enter preset facial features into the preset face detection library according to a received face recognition entry instruction.

Optionally, the collection module 64 includes:

a convolution unit, configured to perform a convolution operation on the voice using a convolutional neural network to obtain convolutional coding features;

a feature extraction unit, configured to process the convolutional coding features with a neural-network-based voiceprint recognition model to obtain the voiceprint features corresponding to the voice;

an obtaining unit, configured to perform voiceprint recognition on the voice using the voiceprint features to obtain a voiceprint recognition result.

Optionally, the collection module 64 further includes:

a generation unit, configured to obtain preset featured instructions and generate, according to the correspondence between voiceprint recognition results and preset featured instructions, a target featured instruction that matches the voiceprint recognition result;

an execution unit, configured to execute the corresponding task according to the target featured instruction.

It should be noted that the information exchange and execution processes between the above modules are based on the same concept as the method embodiments of the present application. For their specific functions and technical effects, reference may be made to the method embodiments, which will not be repeated here.

Figure 7 is a schematic structural diagram of a terminal device provided in Embodiment 6 of the present application. As shown in Figure 7, the terminal device of this embodiment includes: at least one processor (only one is shown in Figure 7), a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, the steps in any of the above embodiments of the language switching method for a sweeping robot are implemented.

The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that Figure 7 is only an example of a terminal device and does not constitute a limitation thereof; the terminal device may include more or fewer components than shown, or combine certain components, or use different components, and may also include, for example, a network interface, a display screen and an input device.

The processor may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory includes a readable storage medium, an internal memory, and so on, where the internal memory may be the memory of the terminal device and provides an environment for running the operating system and the computer-readable instructions in the readable storage medium. The readable storage medium may be the hard disk of the terminal device; in other embodiments it may also be an external storage device of the terminal device, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device. Further, the memory may include both an internal storage unit of the terminal device and an external storage device. The memory is used to store the operating system, application programs, a boot loader, data and other programs, such as the program code of a computer program, and may also be used to temporarily store data that has been output or is to be output.

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only intended to distinguish them from one another and are not used to limit the scope of protection of this application. For the specific working processes of the units and modules in the above device, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, this application may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disc. In some jurisdictions, according to legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunication signal.

This application may also implement all or part of the processes in the methods of the above embodiments by means of a computer program product; when the computer program product runs on the terminal device, the terminal device, when executing it, implements the steps of the above method embodiments.

In the above embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and there may be other ways of dividing them in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent substitutions for some of their technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the scope of protection of this application.

Claims (10)

Translated from Chinese
1. A language switching method for a sweeping robot, characterized in that the language switching method for a sweeping robot comprises:
performing face detection on an image collected by the sweeping robot to obtain a detection result, and, if the detection result is a face image, recognizing the face image to obtain a face recognition result;
according to the face recognition result, searching a preset set of language packages for a target language that matches the face recognition result;
obtaining the current language of the sweeping robot, and, if the current language is different from the target language, switching the current language to the target language and using the target language for voice interaction with the user;
if the detection result is not a face image, collecting the user's voice, performing voiceprint recognition on the voice to obtain a voiceprint recognition result, executing a corresponding task according to the voiceprint recognition result, and determining the position of the sweeping robot;
collecting an image at the position of the sweeping robot, and re-executing the step of performing face detection on the image collected by the sweeping robot to obtain a detection result.

2. The language switching method for a sweeping robot according to claim 1, characterized in that recognizing the face image to obtain a face recognition result if the detection result is a face image comprises:
extracting facial features from the face image, and matching the facial features with at least one preset facial feature in a preset face detection library to obtain a matching result;
determining the face recognition result of the face image according to the matching result.

3. The language switching method for a sweeping robot according to claim 2, characterized in that, before matching the facial features with the preset facial features in the preset face detection library to obtain a matching result, the method further comprises:
entering preset facial features into the preset face detection library according to a received face recognition entry instruction.

4. The language switching method for a sweeping robot according to claim 1, characterized in that performing voiceprint recognition on the voice to obtain a voiceprint recognition result comprises:
performing a convolution operation on the voice using a convolutional neural network to obtain convolutional coding features;
processing the convolutional coding features with a neural-network-based voiceprint recognition model to obtain voiceprint features corresponding to the voice;
performing voiceprint recognition on the voice using the voiceprint features to obtain a voiceprint recognition result.

5. The language switching method for a sweeping robot according to claim 1, characterized in that executing a corresponding task according to the voiceprint recognition result comprises:
obtaining preset characteristic instructions, and generating a target characteristic instruction that matches the voiceprint recognition result according to the correspondence between the voiceprint recognition result and the preset characteristic instructions;
executing the corresponding task according to the target characteristic instruction.

6. A language switching device for a sweeping robot, characterized in that the language switching device for a sweeping robot comprises:
a detection module, configured to perform face detection on an image collected by the sweeping robot to obtain a detection result, and, if the detection result is a face image, recognize the face image to obtain a face recognition result;
a matching module, configured to search a preset set of language packages for a target language that matches the face recognition result according to the face recognition result;
a switching module, configured to obtain the current language of the sweeping robot, switch the current language to the target language if the current language is different from the target language, and use the target language for voice interaction with the user;
a collection module, configured to collect the user's voice if the detection result is not a face image, perform voiceprint recognition on the voice to obtain a voiceprint recognition result, execute a corresponding task according to the voiceprint recognition result, and determine the position of the sweeping robot;
a re-execution module, configured to collect an image at the position of the sweeping robot and re-execute the step of performing face detection on the image collected by the sweeping robot to obtain a detection result.

7. The language switching device for a sweeping robot according to claim 6, characterized in that the detection module comprises:
an extraction unit, configured to extract facial features from the face image and match the facial features with at least one preset facial feature in a preset face detection library to obtain a matching result;
a recognition unit, configured to determine the face recognition result of the face image according to the matching result.

8. The language switching device for a sweeping robot according to claim 6, characterized in that the collection module comprises:
a convolution unit, configured to perform a convolution operation on the voice using a convolutional neural network to obtain convolutional coding features;
a feature extraction unit, configured to process the convolutional coding features with a neural-network-based voiceprint recognition model to obtain voiceprint features corresponding to the voice;
an obtaining unit, configured to perform voiceprint recognition on the voice using the voiceprint features to obtain a voiceprint recognition result.

9. A terminal device, characterized in that the terminal device comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the language switching method for a sweeping robot according to any one of claims 1 to 5.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the language switching method for a sweeping robot according to any one of claims 1 to 5 is implemented.
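For readability, a minimal sketch of the control flow described in claim 1 is given below. All helper methods on the `robot` object (detect_face, recognize_face, find_target_language, capture_image, and so on) are hypothetical placeholders standing in for the modules described in the embodiments, and the loop is only one possible realization of the "re-execute" step.

```python
# Minimal sketch of the language-switching flow of claim 1. The robot object and
# all of its methods are hypothetical placeholders, not an API defined by the patent.
def language_switch_loop(robot) -> None:
    image = robot.capture_image()
    while True:
        face = robot.detect_face(image)                  # face detection on the collected image
        if face is not None:
            person = robot.recognize_face(face)          # face recognition result
            target = robot.find_target_language(person)  # look up the matching language package
            if target is not None and target != robot.current_language:
                robot.current_language = target          # switch to the target language
            robot.voice_interact(language=robot.current_language)
            return
        # Not a face image: fall back to voiceprint recognition
        voice = robot.record_voice()
        speaker = robot.recognize_voiceprint(voice)
        robot.execute_task_for(speaker)                  # perform the matching preset task
        position = robot.locate_self()                   # determine the robot's position
        image = robot.capture_image(at=position)         # collect a new image and re-run detection
```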

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310526612.3A (CN117197858A) | 2023-05-10 | 2023-05-10 | Language switching method, device, equipment and medium for sweeping robot

Publications (1)

Publication Number | Publication Date
CN117197858A | 2023-12-08

Family

ID=88993078

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202310526612.3A (CN117197858A) | Language switching method, device, equipment and medium for sweeping robot | 2023-05-10 | 2023-05-10 | Pending

Country Status (1)

Country | Link
CN | CN117197858A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108161933A * | 2017-12-07 | 2018-06-15 | 北京康力优蓝机器人科技有限公司 | Interactive mode selection method, system and reception robot
CN109040723A * | 2018-07-25 | 2018-12-18 | 深圳市京华信息技术有限公司 | A kind of control method of conference scenario
CN109949795A * | 2019-03-18 | 2019-06-28 | 北京猎户星空科技有限公司 | A kind of method and device of control smart machine interaction
CN111428666A * | 2020-03-31 | 2020-07-17 | 齐鲁工业大学 | Intelligent home companion robot system and method based on fast face detection
CN111738078A * | 2020-05-19 | 2020-10-02 | 云知声智能科技股份有限公司 | Face recognition method and device
CN112562150A * | 2020-11-23 | 2021-03-26 | 深圳华颐智能系统有限公司 | Student apartment management method, device, system and medium based on face recognition
CN115359606A * | 2022-08-16 | 2022-11-18 | 中国银行股份有限公司 | System language switching method and device, electronic equipment and computer storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
