CN108174141B - A method of video communication and a mobile device - Google Patents

A method of video communication and a mobile device

Info

Publication number
CN108174141B
Authority
CN
China
Prior art keywords
image
video communication
user
local
modeling data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711241079.7A
Other languages
Chinese (zh)
Other versions
CN108174141A (en)
Inventor
张恒莉
金鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN201711241079.7A
Publication of CN108174141A
Application granted
Publication of CN108174141B
Legal status: Active
Anticipated expiration


Abstract

Translated from Chinese

Embodiments of the present invention provide a video communication method and a mobile device. The method includes: when the local device starts video communication, the local device captures a first video communication image, which includes the local user's current facial image; the local device generates a confirmation request for the first video communication image, the confirmation request asking whether the peer device stores avatar modeling data for the local user, where the user avatar modeling database includes the user's original facial image and a first facial feature value; the local device sends the local user's current facial image and the confirmation request to the peer device; upon a first confirmation result, the local device extracts the local user's second facial feature value from the local user's current facial image; and the local device sends the second facial feature value to the peer device. Embodiments of the present invention enable high-quality video communication even when network quality is poor or network bandwidth is insufficient.

Description

Translated from Chinese
A method of video communication and a mobile device

Technical Field

The present invention relates to the field of electronic communication technology, and in particular to a video communication method and a mobile device for video communication.

Background Art

The development of science and technology has brought ever more convenience to people's lives. For example, people once could meet only face to face, but they can now hold remote video calls through smart terminals such as mobile phones and computers, "meeting" even when they are apart.

Video communication, however, depends on the network: generally, the better the network quality, the better the video call and the clearer the video images the users receive.

In practice, when a user needs a smart terminal for video communication but unexpectedly encounters poor network quality or insufficient bandwidth, high-quality video communication becomes impossible.

Summary of the Invention

In view of the above problems, embodiments of the present invention propose a video communication method and a corresponding mobile device for video communication, to solve the problem of poor video communication quality caused by poor network quality or insufficient network bandwidth.

To solve the above problems, an embodiment of the present invention discloses a video communication method applied to mobile devices, the mobile devices comprising a local device and a peer device, the method being applied between the local device and the peer device. The method includes:

when the local device starts video communication, the local device captures a first video communication image, the first video communication image including the local user's current facial image;

the local device generates a confirmation request for the first video communication image, the confirmation request being a request to confirm whether the peer device stores avatar modeling data for the local user, where the user avatar modeling database includes the user's original facial image and a first facial feature value;

the local device sends the local user's current facial image and the confirmation request to the peer device, the peer device being configured to return a confirmation result for the confirmation request, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored;

upon the first confirmation result, the local device extracts the local user's second facial feature value from the local user's current facial image; and

the local device sends the second facial feature value to the peer device, the peer device being configured to update the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generate and play a second video communication image.

An embodiment of the present invention further discloses a video communication method applied to mobile devices, the mobile devices comprising a local device and a peer device, the method being applied between the local device and the peer device. The method includes:

when the peer device enters the video communication state, receiving the first video communication image sent by the local device, together with a confirmation request asking whether the peer device stores the local user's avatar modeling data, the user avatar modeling data including the local user's original facial image and first facial feature value;

the peer device performing matching in a user avatar modeling database based on the confirmation request, the user avatar modeling database including the user avatar modeling data;

if the matching succeeds, the peer device returning a confirmation result to the local device, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored;

receiving the second facial feature value sent by the local device; and

updating the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generating and playing a second video communication image.

Correspondingly, an embodiment of the present invention discloses a mobile device for video communication, the mobile devices comprising a local device and a peer device, the video communication being applied between the local device and the peer device. The mobile device includes:

a capture module, configured to capture a first video communication image when the local device starts video communication, the first video communication image including the local user's current facial image;

a confirmation request generation module, configured to generate a confirmation request for the first video communication image, the confirmation request being a request to confirm whether the peer device stores the local user's avatar modeling data, where the user avatar modeling database includes the user's original facial image and a first facial feature value;

a first sending module, configured to send the local user's current facial image and the confirmation request to the peer device, the peer device being configured to return a confirmation result for the confirmation request, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored;

a first extraction module, configured to extract, upon the first confirmation result, the local user's second facial feature value from the local user's current facial image; and

a second sending module, configured to send the second facial feature value to the peer device, the peer device being configured to update the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generate and play a second video communication image.

Correspondingly, an embodiment of the present invention further discloses a mobile device for video communication, the mobile devices comprising a local device and a peer device, the video communication being applied between the local device and the peer device. The mobile device includes:

a first receiving module, configured to receive, when the peer device enters the video communication state, the first video communication image sent by the local device, together with a confirmation request asking whether the peer device stores the local user's avatar modeling data, the user avatar modeling data including the local user's original facial image and first facial feature value;

a matching module, configured to perform matching in a user avatar modeling database based on the confirmation request, the user avatar modeling database including the user avatar modeling data;

a confirmation module, configured to return a confirmation result to the local device if the matching succeeds, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored;

a second receiving module, configured to receive the second facial feature value sent by the local device; and

a first playing module, configured to update the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generate and play a second video communication image.

Embodiments of the present invention have the following advantages:

In the embodiments of the present invention, video communication takes place between a local device and a peer device, the peer device holding a user avatar modeling database that includes the user's facial image and a first facial feature value. When the local device starts video communication, it captures a first video communication image that includes the local user's current facial image. The local device then generates a confirmation request for the first video communication image, asking whether the peer device's user avatar modeling database stores the local user's avatar modeling data, the avatar modeling data including the local user's original facial image and first facial feature value. Next, the local device sends the local user's current facial image and the confirmation request to the peer device, which returns a confirmation result for the request; the confirmation result includes a first confirmation result indicating that the local user's avatar modeling data is stored. Upon the first confirmation result, the local device extracts the local user's second facial feature value from the local user's current facial image, and finally sends it to the peer device, which updates the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generates and plays a second video communication image. In this way, during a video call the local device need not send the complete video images it captures to the peer device; instead it sends the feature values of the local user's face, and the peer device, using those feature values together with the stored avatar modeling data, simulates the video images captured by the local device. High-quality video communication thus remains possible even when network quality is poor or network bandwidth is insufficient.

Brief Description of the Drawings

FIG. 1 is a first flowchart of the steps of an embodiment of a video communication method of the present invention;

FIG. 2 is a second flowchart of the steps of an embodiment of a video communication method of the present invention;

FIG. 3 is a first structural block diagram of an embodiment of a mobile device for video communication of the present invention;

FIG. 4 is a second structural block diagram of an embodiment of a mobile device for video communication of the present invention.

Detailed Description

To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below in conjunction with the accompanying drawings and specific embodiments.

Referring to FIG. 1, there is shown a first flowchart of the steps of an embodiment of a video communication method of the present invention, the method being applied between a local device and a peer device.

In the embodiments of the present invention, the local device and the peer device may have the following characteristics:

(1) In terms of hardware, the device has a central processing unit, memory, input components and output components; that is, the device is typically a microcomputer with communication functions. It may also offer multiple input methods, such as a keyboard, mouse, touch screen, microphone and camera, which can be adjusted as needed. Likewise, the device often offers multiple output methods, such as a receiver and a display screen, which can also be adjusted as needed;

(2) In terms of software, the device must have an operating system, such as Windows Mobile, Symbian, Palm, Android or iOS. These operating systems are increasingly open, and personalized applications developed on these open platforms emerge endlessly, such as address books, calendars, notepads, calculators and all kinds of games, largely satisfying the needs of individual users;

(3) In terms of communication capability, the device has flexible access methods and high-bandwidth communication performance, and can automatically adjust the selected communication method according to the chosen service and the current environment, making it convenient to use. The device may support GSM, WCDMA, CDMA2000, TD-SCDMA, Wi-Fi, WiMAX and so on, adapting to networks of multiple standards and supporting not only voice services but also various wireless data services;

(4) In terms of function, the device places more emphasis on humanization, personalization and multi-functionality. With the development of computer technology, devices have moved from a "device-centered" model to a "human-centered" model, integrating embedded computing, control technology, artificial intelligence and biometric authentication, fully embodying a people-oriented philosophy. Thanks to advances in software, the device can be configured to individual needs and is more personalized. Meanwhile, the device itself integrates abundant software and hardware, and its functions grow ever more powerful.

Specifically, the method may include the following steps:

Step 101: when the local device starts video communication, the local device captures a first video communication image, the first video communication image including the local user's current facial image.

Specifically, the local device may conduct video communication through third-party software, for example instant-messaging software such as QQ and WeChat, or through video communication software built into the device, such as FaceTime, the video-calling software built into Apple's iOS and macOS. Of course, video communication may also be conducted in other ways, which this application does not restrict.

Taking QQ as an example: when the local user taps to start video in a chat window with a friend, QQ sends a video communication request to the peer device and, at the same time, raises an interrupt to the local device's CPU, the interrupt signaling that QQ needs to invoke the camera to capture video images. On receiving the interrupt, the CPU calls the camera interface to enable the camera. Once the peer user accepts the video communication request on the peer device, the camera of the local device starts capturing the first video communication image. The first video communication image includes the local user's current facial image, that is, the user's facial image at the moment video communication is started.
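
As a rough illustration of this capture step (not part of the patent), the following Python sketch opens the local camera and grabs a first frame, assuming OpenCV is available; driver-level details such as the interrupt handling described above are abstracted away, and `collect_first_image` is an illustrative name.

```python
# Minimal sketch of the capture in step 101, assuming OpenCV (cv2).
import cv2

def collect_first_image(camera_index=0):
    """Open the local camera and grab the first video-communication frame."""
    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError("camera not available")
    ok, frame = cap.read()  # first frame; contains the local user's current face
    cap.release()
    if not ok:
        raise RuntimeError("failed to read a frame")
    return frame
```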

It should be noted that, in this application, "local" and "peer" are relative. For example, given device A and device B, from the perspective of device A, device A is the local device and device B is the peer device; from the perspective of device B, device B is the local device and device A is the peer device. Because video communication requires both devices to turn on their cameras simultaneously, when the local device starts video communication and captures video communication images, the peer device has in fact also started video communication and is capturing images. "Local" and "peer" are used only for convenience in describing the technical solution and do not limit this application.

Step 102: the local device generates a confirmation request for the first video communication image, the confirmation request being a request to confirm whether the peer device's user avatar modeling database stores the local user's avatar modeling data, the avatar modeling data including the local user's original facial image and first facial feature value.

A video image is, in essence, animation: the technique of shooting a subject frame by frame and playing the frames back continuously so that motion appears. Animation decomposes a character's expressions, actions and changes into many instantaneous frames, which a camera then shoots as a continuous series of pictures, producing visually continuous change. Its basic principle is the same as that of film and television: persistence of vision. Medicine has shown that humans exhibit "persistence of vision": after the eye sees a picture or an object, the image does not fade for about 0.34 seconds. Exploiting this, playing the next picture before the previous one fades creates a smooth impression of visual change.

This application uses exactly this principle to obtain the user's facial image and first facial feature value from the video image. Specifically, one second of animation typically comprises 24 frames, so the user's facial image and facial feature values can be extracted from each of the 24 frames, and one of them chosen as the user's original facial image and first facial feature value. Moreover, because a user's body movements during a video call are usually slight and one second is very short, the position of the user's face barely changes across the 24 frames; what changes is essentially the expression, that is, the positions of facial features such as the mouth and eyes. Exploiting this, the 24 facial images extracted from the 24 frames can also be merged into a single image used as the user's original facial image, from which the user's first facial feature value is then extracted.
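
The following sketch illustrates one plausible way to harvest the first second of frames and crop a face from each, using OpenCV's bundled Haar cascade; the 24 fps figure and the face-detection choice are illustrative assumptions, not requirements of the patent.

```python
# Sketch: collect face crops from the ~24 frames of the first second of video.
import cv2

def face_crops_from_first_second(video_path, fps=24):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    crops = []
    for _ in range(fps):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        for (x, y, w, h) in faces[:1]:            # keep one face per frame
            crops.append(frame[y:y + h, x:x + w])
    cap.release()
    return crops  # pick one crop, or merge them, as the original face image
```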

Extracting facial feature values relies on face recognition technology. Face recognition is essentially the problem of matching two-dimensional projection images of a three-dimensional plastic object, and its difficulty lies in: (1) the uncertainty of facial plastic deformation (e.g. expressions); (2) the diversity of facial patterns (e.g. beards, hairstyles, glasses, makeup); and (3) uncertainty in the image acquisition process (e.g. lighting intensity, light-source direction). Recognizing a face relies mainly on facial features, that is, on measures that differ greatly between individuals yet remain relatively stable for the same person. Because faces vary in complex ways, feature representation and feature extraction are very difficult.

Before feature extraction and classification of a face image, geometric normalization and grayscale normalization are generally needed. Geometric normalization means transforming the face in the image to the same position and the same size according to the face-localization result; grayscale normalization means applying processing such as illumination compensation to the image, which can overcome the influence of lighting variation to some extent and improve the recognition rate.
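
A minimal sketch of the two normalizations just described, assuming OpenCV: geometric normalization via resizing to a canonical size, and grayscale normalization via histogram equalization as a simple form of illumination compensation. The 128x128 canonical size is an illustrative choice.

```python
# Sketch of geometric + grayscale normalization of a face crop.
import cv2

def normalize_face(crop, size=(128, 128)):
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    geo = cv2.resize(gray, size)       # same position/scale for every face
    return cv2.equalizeHist(geo)       # compensate illumination differences
```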

Facial feature values can be extracted by methods including the following:

(1) Methods based on geometric features

A face is composed of parts such as the eyes, nose, mouth and chin, and it is precisely the differences in the shape, size and structure of these parts that make every face in the world distinct; a geometric description of the shapes of these parts and their structural relationships can therefore serve as an important feature for face recognition. Geometric features were first used to describe and recognize the profile of the face: a number of salient points are determined from the profile curve, and a set of feature measures for recognition, such as distances and angles, is derived from them. Jia et al.'s simulation of the profile from the integral projection near the midline of a frontal grayscale image is a quite novel method.

Frontal face recognition with geometric features generally extracts the positions of important feature points such as the eyes, mouth and nose, together with the geometric shapes of important organs such as the eyes, as classification features. However, Roder studied the accuracy of geometric feature extraction experimentally, and the results were not encouraging. The deformable template method can be viewed as an improvement of the geometric feature approach; its basic idea is to design an organ model with adjustable parameters, define an energy function, and minimize the energy function by adjusting the model parameters, which then serve as the geometric features of that organ. The idea is sound, but two problems remain: first, the weighting coefficients of the various costs in the energy function can only be determined empirically, making the method hard to generalize; second, optimizing the energy function is very time-consuming, making it hard to apply in practice.

Parameter-based face representation can describe the salient features of a face efficiently, but it requires extensive preprocessing and fine parameter selection. Moreover, general geometric features describe only the basic shapes of the parts and their structural relationships, ignoring fine local features; part of the information is lost, making the approach better suited to coarse classification. In addition, existing feature-point detection techniques fall far short of the required accuracy, and the computational load is considerable.

(2) Methods based on eigenfaces

Turk and Pentland proposed the eigenface method, which constructs a principal-component subspace from a set of training face images; because the principal components have face-like shapes, they are also called eigenfaces. At recognition time, the test image is projected onto the principal-component subspace to obtain a set of projection coefficients, which are compared with those of the known face images. Pentland et al. reported quite good results: a 95% correct recognition rate on 3,000 images of 200 people, and only one misidentification among 150 frontal face images in the FERET database. The system, however, requires extensive preprocessing, such as normalization, before the eigenface method can be applied.

Building on traditional eigenfaces, researchers noticed that the eigenvectors with large eigenvalues (i.e. the eigenfaces) are not necessarily the directions with the best classification performance, and accordingly developed various feature (subspace) selection methods, such as Peng's dual-subspace method, Weng's linear ambiguity analysis method, and Belhumeur's FisherFace method. In fact, the eigenface method is explicit principal component analysis for face modeling, while some linear auto-associative and linear compression BP networks are implicit principal component analysis methods. Both represent a face as a weighted sum of vectors that are the principal eigenvectors of the training set's cross-product matrix, as Valentin discusses in detail. In short, the eigenface method is a simple, fast and practical algorithm based on transform-coefficient features, but because it essentially depends on the grayscale correlation between training and test images, it still has significant limitations.
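
For concreteness, here is a minimal numpy sketch of the eigenface pipeline described above: build the principal-component subspace from training faces, project a test face, and compare projection coefficients. The value k=50 and the nearest-neighbor rule are illustrative assumptions, and real systems add the preprocessing the text mentions.

```python
# Minimal eigenface sketch (explicit PCA over flattened face images).
import numpy as np

def train_eigenfaces(faces, k=50):
    # faces: (n, h*w) matrix of flattened, normalized training face images
    mean = faces.mean(axis=0)
    centered = faces - mean
    # principal directions via SVD; rows of vt span the face subspace
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]                       # the k "eigenfaces"

def project(face, mean, eigenfaces):
    return eigenfaces @ (face - mean)         # projection coefficients

def identify(test_face, gallery_coeffs, mean, eigenfaces):
    c = project(test_face, mean, eigenfaces)
    dists = [np.linalg.norm(c - g) for g in gallery_coeffs]
    return int(np.argmin(dists))              # index of the closest known face
```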

(3) The local feature analysis (LFA) method

The principal-component subspace representation is compact and greatly reduces feature dimensionality, but it is non-localized (the support of its kernel functions extends over the whole coordinate space) and non-topological (the proximity of points after projection onto an axis bears no relation to their proximity in the original image space), whereas locality and topology are ideal properties for pattern analysis and segmentation and seem closer to the mechanisms of neural information processing; finding representations with these properties is therefore important. Based on this consideration, Atick proposed a face feature extraction and recognition method based on local features. The method has performed well in practice and forms the basis of the FaceIt software.

Local Feature Analysis (LFA) is a face recognition technique based on feature representation, derived from a principle of local statistics akin to building with blocks. LFA rests on the premise that all face images (including complex variations) can be synthesized from many irreducible subsets of structural units. These units are formed using sophisticated statistical techniques; they represent the entire face, usually span multiple pixels (within local regions) and represent general facial shapes, but they are not facial features in the usual sense. In fact, there are far more facial structural units than there are facial parts.

However, synthesizing a precise, lifelike face image requires only a small subset of the units in the entire available set (12 to 40 feature units). Identity is determined not only by the characteristic units but also by their geometric structure (e.g. their relative positions). In this way, LFA maps an individual's characteristics into a complex digital representation that can be compared and identified. The "faceprint" encoding works from the essential features and shape of the face; it withstands changes in lighting, skin tone, facial hair, hairstyle, glasses, expression and pose, and is robust enough to identify one person accurately among millions. The Yinchen face recognition system uses this method.

(4) Methods based on elastic models

Lades et al. proposed the Dynamic Link Architecture (DLA) for distortion-invariant object recognition, describing an object as a sparse graph whose vertices are labeled with multi-scale descriptions of local energy and whose edges represent topological connections labeled with geometric distances, then applying elastic graph matching to find the closest known graph. Wiskott et al. improved on this and experimented with image libraries such as FERET, comparing 300 face images against another 300 images with an accuracy of 97.3%; the drawback of this method is its enormous computational cost.

Nastar modeled the face image I(x, y) as a deformable 3D mesh surface (x, y, I(x, y)), thereby turning face matching into an elastic matching problem of deformable surfaces. Finite element analysis is used to deform the surface, and whether two pictures show the same person is judged from the deformation. The distinctive feature of this method is that it considers space (x, y) and grayscale I(x, y) together in one 3D space; experiments show its recognition results clearly surpass the eigenface method.

Lanitis et al. proposed the flexible appearance model method, which automatically locates the salient features of a face, encodes the face into 83 model parameters, and performs shape-based face recognition by discriminant analysis.

(5) Neural network methods

Research on neural network methods for face recognition is currently flourishing. Valentin proposed a method that first extracts 50 principal components of the face, maps them into a 5-dimensional space with an auto-associative neural network, and then discriminates with an ordinary multilayer perceptron; it works well on some simple test images. Intrator et al. proposed a hybrid neural network for face recognition, with an unsupervised network for feature extraction and a supervised network for classification. Lee et al. described the characteristics of a face with six rules, located the facial features according to those rules, and fed the geometric distances between the features into a fuzzy neural network for recognition, improving considerably on ordinary methods based on Euclidean distance. Laurence et al. used a convolutional neural network for face recognition; because a convolutional network integrates knowledge of the correlations between neighboring pixels, it acquires a degree of invariance to image translation, rotation and local deformation, yielding very good recognition results. Lin et al. proposed the probabilistic decision-based neural network (PDBNN), whose main idea is to use virtual (positive and negative) samples for reinforcement and anti-reinforcement learning to obtain good probability estimates, and to adopt a modular "one class, one network" (OCON) structure to speed up learning; this method has been applied successfully to each step of face detection, face localization and face recognition. Other studies include: Dai et al.'s use of Hopfield networks for low-resolution face association and recognition; Gutta et al.'s hybrid classifier model combining RBF networks with tree classifiers for face recognition; Phillips et al.'s application of matching pursuit filters to face recognition; and the use of support vector machines (SVM) from statistical learning theory for face classification.

Neural network methods have certain advantages over the preceding approaches for face recognition: many of the laws and rules of face recognition are quite difficult to describe explicitly, whereas a neural network can acquire an implicit representation of them through learning, making it more adaptable and generally easier to implement.

(6) Other methods

Brunelli et al. ran extensive experiments on template matching; the results show that when scale, lighting, rotation angle and other conditions are stable, template matching outperforms other methods, but its sensitivity to changes in lighting, rotation and expression limits its direct use. Goudail et al. used local autocorrelation as the basis for face recognition; it is translation-invariant and relatively stable under changes in facial expression.

Of course, the above methods are merely examples; besides them, any other method capable of extracting a user's facial feature values is applicable to this application, which places no restriction on this.

In the embodiments of the present invention, the peer device has a user avatar modeling database that includes the local user's original facial image and first facial feature value. For example, suppose local device A holds video calls with peer devices B and C; devices B and C each have a user avatar modeling database storing local user A's original facial image and first facial feature value. Of course, from the perspective of device B, the local device is B, and the avatar modeling databases of peer devices A and C each hold local user B's original facial image and first facial feature value; device C follows by analogy. Because a local device does not merely hold one-to-one video calls with a single peer device, both local and peer devices need to store the original facial images and first facial feature values of multiple users, and the place they are stored is the user avatar modeling database.
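
A minimal sketch of what such a user avatar modeling database might look like; the in-memory dict and the field names are illustrative stand-ins for whatever persistent storage the device actually uses.

```python
# Sketch: per-user avatar modeling data keyed by user id.
from dataclasses import dataclass
import numpy as np

@dataclass
class AvatarModel:
    original_face: np.ndarray    # the stored original facial image
    first_features: np.ndarray   # the first facial feature value

avatar_db: dict[str, AvatarModel] = {}   # user id -> modeling data

def store_model(user_id: str, face: np.ndarray, features: np.ndarray) -> None:
    avatar_db[user_id] = AvatarModel(face, features)
```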

Here, the user's original facial image is the local user's facial image that the peer device stored during a previous video call between the local device and the peer device. For example, suppose the current date is April 10, 2017, and the first video call between local user A and peer user B took place on April 9, 2017. During that call, local device A stored peer user B's facial image and peer device B stored local user A's facial image; thus the facial image of peer user B held in local device A is peer user B's original facial image, and the facial image of local user A held in peer device B is local user A's original facial image. At the current time, local user A holds another video call with peer user B; local device A then generates a confirmation request for the currently captured first video communication image, the confirmation request being a request to confirm whether the peer device's user avatar modeling database stores the local user's avatar modeling data.

It should be noted that, although the peer device stored the local user's avatar modeling data during a previous video call, situations such as manual cleanup or data corruption may render the stored data unusable. Therefore, each time the local device holds a video call with the peer device, it can first confirm whether the peer device stores the local user's avatar modeling data.

Step 103: the local device sends the local user's current facial image and the confirmation request to the peer device, the peer device being configured to return a confirmation result for the confirmation request, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored.

The local device may obtain the local user's current facial image from the first video communication image after video communication between the local device and the peer device has been successfully established. Specifically, the current facial image sent by the local device is extracted from the first video communication image captured by the local device: the first frame may be taken directly as the local user's current facial image, or several frames may be extracted and one of them chosen, or several frames may be extracted and merged into one image, which is then sent to the peer device together with the confirmation request.

The local device may also obtain the local user's current facial image from the first video communication image before video communication with the peer device is established. Specifically, when the local user initiates a video call with the peer device, the local device's camera is already running, and before the peer user accepts the connection request the local user can already see his or her own video image on the local device; at that point the local device can likewise obtain the local user's current facial image, so that once the peer user accepts the connection request, the local device can directly send the obtained current facial image together with the confirmation request to the peer device.

On receiving the local user's current facial image and the confirmation request, the peer device matches the current facial image against the users' original facial images in its avatar database. Specifically, the current facial image is compared one by one with the users' original facial images in the peer device's user avatar modeling database and a similarity is computed for each; when a similarity exceeds a certain threshold, the peer device's user avatar modeling database is deemed to store the local user's original facial image, and a confirmation result is generated and returned to the local device.
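
A sketch of that matching step, assuming the faces are compared as feature vectors and the database has the shape of the AvatarModel sketch above; cosine similarity and the 0.8 threshold are illustrative choices, not values specified by the patent.

```python
# Sketch: peer-side handling of the confirmation request.
import numpy as np

def confirm_request(current_features: np.ndarray, db: dict, threshold: float = 0.8):
    """db maps user id -> stored model with a .first_features vector."""
    best_id, best_sim = None, -1.0
    for user_id, model in db.items():
        a, b = current_features, model.first_features
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    if best_id is not None and best_sim >= threshold:
        return "FIRST_CONFIRMATION", best_id    # modeling data is stored
    return "SECOND_CONFIRMATION", None          # no modeling data stored
```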

Step 104: upon the first confirmation result, the local device extracts the local user's second facial feature value from the local user's current facial image.

When the local device receives the confirmation result returned by the peer device and learns that the peer device's user avatar modeling database stores the local user's original facial image and first facial feature value, the local device can extract the local user's second facial feature value, that is, the local user's facial feature value at the current moment, from the video captured by the camera. Specifically, the current second facial feature value of the local user may be extracted from every frame of the first video communication image.

Step 105: the local device sends the second facial feature value to the peer device, the peer device being configured to update the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generate and play a second video communication image.

After obtaining the local user's second facial feature value, the local device sends it to the peer device. The second facial feature value received by the peer device is the user's current facial feature value; the peer device meanwhile retrieves the user's original facial image and first facial feature value from the user avatar modeling database, updates the first facial feature value with the second, and, combined with the user's original facial image, generates and plays the second video communication image.

Because the peer's user avatar modeling database stores one original facial image of the local user, together with the user's first facial feature value in that image, whenever the peer device updates the first facial feature value in the local user's original facial image with each incoming current facial feature value, it can generate the local user's current facial image on the basis of the original facial image combined with the second facial feature value, and then play each generated image in succession to form a video image.

Of course, because the local user's current facial image is extremely similar to the local user's original facial image, in the embodiments of the present invention the peer device may, besides updating the first facial feature value with the second facial feature value and combining the result with the local user's original facial image to generate the second video communication image, also update the first facial feature value with the second facial feature value and combine it with the local user's current facial image sent by the local device to generate the second video communication image. In this way, the local device need not send the complete video communication image, and the peer device can still simulate the local user's current facial image from the current facial feature values of the local user.
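
Purely for illustration, the following sketch shows one way the peer-side reconstruction could work if the "facial feature values" were 2-D landmark coordinates (an assumption the patent does not state): the stored original face is piecewise-affine-warped from the stored first landmarks toward the received second landmarks. Real avatar rendering would be considerably more involved.

```python
# Heavily simplified sketch of peer-side frame synthesis, assuming
# feature values are (k, 2) landmark coordinates. Uses scikit-image.
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def render_frame(original_face: np.ndarray,
                 first_landmarks: np.ndarray,     # (k, 2) stored landmarks
                 second_landmarks: np.ndarray):   # (k, 2) received landmarks
    tform = PiecewiseAffineTransform()
    # warp() expects the output->input mapping, so estimate from new to old
    tform.estimate(second_landmarks, first_landmarks)
    return warp(original_face, tform)             # simulated current frame
```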

In a preferred embodiment of the present invention, the confirmation result further includes a second confirmation result indicating that the local user's avatar modeling data is not stored;

for the second confirmation result, the method further includes:

the local device generating the local user's avatar modeling data based on the video communication images of the first n seconds of the first video communication image, where n is an integer not less than 1;

the local device sending the video communication images of the first n seconds, together with the generated avatar modeling data of the local user, to the peer device, the peer device being configured to play the first n seconds of video communication images and store the local user's avatar modeling data in the user avatar modeling database;

the local device extracting the local user's second facial feature value starting from second n+1 of the first video communication image; and

the local device sending the second facial feature value to the peer device, the peer device being configured to update the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generate and play a second video communication image.

Specifically, when the peer device's user avatar modeling database does not store the local user's avatar modeling data, the local user's avatar modeling data must be generated on the fly. The local device may extract the local user's current facial image and facial feature values from the complete video communication images of the first n seconds of the first video communication image, and send them, together with those complete images, to the peer device. Because the peer device holds no facial image or feature values for the local user, on receiving the first n seconds of complete video communication images, the local user's current facial image and the facial feature values sent by the local device, it plays the received images while storing the local user's current facial image and facial feature values in the user avatar modeling database. Then, from second n+1 onward, the local device no longer sends complete video communication images; instead it directly extracts the second facial feature values from the video images and sends them to the peer device. Having received the local user's current facial image and first facial feature value within the first n seconds, the peer device can update the first facial feature value with the second, combine it with the local user's current facial image, and generate and play the second video communication image.
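
A sender-side sketch of this switch, under the assumption of simple message dicts; send, extract_features and build_model are hypothetical stand-ins for the transport and vision layers.

```python
# Sketch: full frames (plus modeling data) for the first n seconds,
# then per-frame feature values only.
def stream_video(frame_source, fps, n, send, extract_features, build_model):
    head = []                                  # full frames of the first n s
    for t, frame in enumerate(frame_source):
        if t < n * fps:
            head.append(frame)
            send({"type": "frame", "data": frame})        # played as-is on peer
            if t == n * fps - 1:                          # head is complete
                send({"type": "model", "data": build_model(head)})
        else:
            send({"type": "features",                     # lightweight payload
                  "data": extract_features(frame)})
```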

In a preferred embodiment of the present invention, for the second confirmation result, the method further includes:

the local device sending the video communication images of the first n seconds to the peer device, the peer device being configured to play the first n seconds of video communication images, generate the local user's avatar modeling data based on those images, and store the generated avatar modeling data in the user avatar modeling database; and

The local device sends the second facial feature value to the peer device; the peer device is configured to update the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generate and play a second video communication image.

Specifically, besides generating the local user's current facial image and first facial feature value on the local device and then sending them to the peer device, they can also be generated on the peer device.

After the local device extracts the complete video images of the first n seconds from the first video communication image, it can send those complete images directly to the peer device. Upon receiving them, the peer device can itself extract the local user's current facial image and first facial feature value from the first-n-seconds images and store them in the user avatar modeling database. From the (n+1)-th second onward, the peer device receives the second facial feature value sent by the local device, updates the first facial feature value with it, and, combined with the generated current facial image of the local user, generates and plays the second video communication image.
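This variant, in which the peer device derives the avatar modeling data itself, can be sketched in the same spirit. Again this is only an illustration; the message kinds and the extract_face, extract_features, and render_frame helpers are assumptions, not APIs from the specification.

```python
# Illustrative receiver for the variant above, in which the peer itself builds
# the avatar modeling data; message kinds and helper names are assumptions.
def run_peer_receiver(channel, display, avatar_db, user_id,
                      extract_face, extract_features, render_frame):
    face, features = None, None
    for msg in channel:
        if msg.kind == "full_frame":
            display.show(msg.frame)                  # play the first n seconds as-is
            face = extract_face(msg.frame)           # derive the avatar data locally
            features = extract_features(face)
            avatar_db[user_id] = (face, features)    # store in the modeling database
        elif msg.kind == "features" and face is not None:
            features = msg.features                  # update first value with second
            display.show(render_frame(face, features))
```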

In a preferred embodiment of the present invention, the step of generating the local user's avatar modeling data includes:

obtaining each frame of the first n seconds of the video image, where n is an integer not less than 1;

generating, based on each frame, the local user's current facial image and first facial feature value.

For example, one second of video typically comprises 24 still images. With n set to 1, the local user's facial image and facial feature value can be extracted from each of the 24 still images, yielding 24 facial images of the local user and 24 facial feature values; one facial image and one facial feature value are then selected from these to serve as the local user's current facial image and first facial feature value. Alternatively, the 24 facial images of the local user can be merged into a single image that serves as the current facial image, with that image's facial feature value serving as the first facial feature value.
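As an aside, one plausible selection rule for picking a single representative facial image from the roughly 24 frames of the first second is to keep the sharpest face crop. The Laplacian-variance sharpness measure and the detect_face helper below are illustrative assumptions; the specification does not prescribe a particular rule.

```python
# One plausible selection rule: keep the least-blurred face crop of the first
# second, scored by Laplacian variance. detect_face is a hypothetical helper.
import cv2  # OpenCV, assumed available

def sharpness(bgr_crop):
    gray = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()   # higher variance = sharper

def pick_representative_face(frames, detect_face):
    crops = [detect_face(f) for f in frames]       # one crop per still image
    crops = [c for c in crops if c is not None]
    return max(crops, key=sharpness) if crops else None
```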

In practice, because face recognition algorithms are already very mature, the local user's first facial feature value can be extracted from just a few or a dozen frames, i.e., within a few tenths of a second. Therefore, besides being an integer not less than 1, n may also take fractional values such as 0.5 or 0.8 seconds; likewise, instead of n+1, values such as n+0.1 or n+0.5 may be used. The values n and n+1 serve only to describe the technical solution of this application clearly; the specific values can be set according to actual needs.

In addition to the second facial feature value, the local device may also send fill-light parameters to the peer device. Specifically, when the local device is in a poorly lit environment, it may send fill-light parameters along with the second facial feature value. The fill-light parameters may include whether fill light is needed, the degree of fill light, and so on. Upon receiving the second facial feature value and the fill-light parameters, the peer device can algorithmically apply simulated fill light to the second video communication image according to those parameters. In this way, even when the local device is in a poorly lit environment, the peer device can still reconstruct a well-lit video communication image.
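A hedged sketch of how the peer device might apply such simulated fill light follows. The gamma-correction approach and the parameter names are assumptions; the specification says only that the adjustment is performed through an algorithm according to the fill-light parameters.

```python
# Sketch of simulated fill light on the peer side via a gamma lookup table;
# the parameter names and the gamma mapping are illustrative assumptions.
import numpy as np

def apply_fill_light(frame, need_fill, strength):
    """frame: HxWx3 uint8 image; strength in (0, 1], larger = brighter."""
    if not need_fill:
        return frame
    gamma = 1.0 - 0.5 * strength                       # gamma < 1 brightens mid-tones
    lut = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return lut[frame]                                  # per-pixel lookup remap
```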

It should be noted that, for simplicity of description, the method embodiments are expressed as series of action combinations; however, those skilled in the art will appreciate that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.

Referring to FIG. 2, there is shown a second flow chart of the steps of an embodiment of a video communication method of the present invention; the method is applied between a local device and a peer device.

Specifically, the method may include the following steps:

Step 201: when the peer device enters the video communication state, receiving the first video communication image sent by the local device, together with a confirmation request as to whether the peer device stores the local user's avatar modeling data, the avatar modeling data including the local user's original facial image and first facial feature value;

Step 202: the peer device performs matching in a user avatar modeling database based on the confirmation request, the user avatar modeling database including the user avatar modeling data;

Step 203: if the matching succeeds, the peer device returns a confirmation result to the local device, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored;

Step 204: receiving the second facial feature value sent by the local device;

Step 205: updating the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generating and playing a second video communication image.
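A minimal sketch of this peer-side flow (steps 202 through 205) under stated assumptions: avatar_db is a plain mapping from user identity to (original facial image, first facial feature value), and render_frame is a hypothetical synthesis helper.

```python
# Hedged sketch of steps 202-205 on the peer side; avatar_db, channel, display
# and render_frame are hypothetical stand-ins for the components described above.
def handle_video_call(request, avatar_db, channel, display, render_frame):
    entry = avatar_db.get(request.user_id)           # step 202: match in the database
    if entry is None:
        channel.send_confirmation(stored=False)      # second confirmation result
        return
    channel.send_confirmation(stored=True)           # step 203: first confirmation result
    face, features = entry                           # original facial image + first value
    for msg in channel:                              # step 204: receive second value
        features = msg.features                      # step 205: update the first value
        display.show(render_frame(face, features))   # synthesize and play the frame
```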

In a preferred embodiment of the present invention, the confirmation result further includes a second confirmation result indicating that the local user's avatar modeling data is not stored.

For the second confirmation result, after the peer device returns the confirmation result to the local device, the method further includes:

receiving the first n seconds of the first video communication image sent by the local device, together with the local user's avatar modeling data generated based on those first n seconds, where n is an integer not less than 1;

playing the first n seconds of the video communication image and storing the local user's avatar modeling data in the user avatar modeling database;

the peer device, from the (n+1)-th second onward, receiving the second facial feature value sent by the local device;

updating the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generating and playing a second video communication image.
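The cold-start branch of this embodiment, where the avatar modeling data arrives ready-made from the local device together with the first n seconds of video, might look as follows; the message kinds and the render_frame helper are assumptions for illustration.

```python
# Sketch of the cold-start branch: play the first n seconds as-is, store the
# avatar modeling data that arrives with them, then switch to feature-driven
# rendering from second n+1. Message kinds and render_frame are assumptions.
def handle_cold_start(channel, display, avatar_db, user_id, render_frame):
    face, features = None, None
    for msg in channel:
        if msg.kind == "full_frame":
            display.show(msg.frame)                   # play the first n seconds
        elif msg.kind == "avatar_data":
            face, features = msg.face, msg.features
            avatar_db[user_id] = (face, features)     # store in the modeling database
        elif msg.kind == "features" and face is not None:
            features = msg.features                   # second value replaces the first
            display.show(render_frame(face, features))
```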

In a preferred embodiment of the present invention, for the second confirmation result, after the peer device returns the confirmation result to the local device, the method further includes:

receiving the first n seconds of the first video communication image sent by the local device;

generating the local user's avatar modeling data based on the first n seconds of the video communication image, where n is an integer not less than 1;

playing the first n seconds of the video communication image and storing the generated avatar modeling data of the local user in the user avatar modeling database;

the peer device, from the (n+1)-th second onward, receiving the second facial feature value sent by the local device;

updating the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generating and playing a second video communication image.

In a preferred embodiment of the present invention, the step of generating the local user's avatar modeling data includes:

obtaining each frame of the first n seconds of the video image, where n is an integer not less than 1;

generating, based on each frame, the local user's current facial image and first facial feature value.

In addition to the second facial feature value, the local device may also send fill-light parameters to the peer device. Specifically, when the local device is in a poorly lit environment, it may send fill-light parameters along with the second facial feature value. The fill-light parameters may include whether fill light is needed, the degree of fill light, and so on. Upon receiving the second facial feature value and the fill-light parameters, the peer device can algorithmically apply simulated fill light to the second video communication image according to those parameters. In this way, even when the local device is in a poorly lit environment, the peer device can still reconstruct a well-lit video communication image.

As the method embodiment on the peer device side is substantially similar to the method embodiment on the local device side, its description is relatively brief; for relevant details, reference may be made to the corresponding parts of the method embodiment.

Referring to FIG. 3, there is shown a first structural block diagram of an embodiment of a mobile device for video communication of the present invention, which may specifically include the following modules:

a collection module 301, configured to collect a first video communication image when the local device starts video communication, the first video communication image including the local user's current facial image;

a confirmation request generation module 302, configured to generate a confirmation request for the first video communication image, the confirmation request being a request to confirm whether the peer device stores the local user's avatar modeling data; the user avatar modeling database includes the user's original facial image and first facial feature value;

a first sending module 303, configured to send the local user's current facial image and the confirmation request to the peer device; the peer device is configured to return a confirmation result for the confirmation request, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored;

a first extraction module 304, configured to extract, for the first confirmation result, the local user's second facial feature value from the local user's current facial image;

a second sending module 305, configured to send the second facial feature value to the peer device; the peer device is configured to update the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generate and play a second video communication image.
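Purely as an orientation aid, the modules of FIG. 3 can be mapped onto a small class; the method and attribute names below paraphrase the module descriptions above and are not an implementation from the specification.

```python
# Illustrative mapping of the FIG. 3 modules onto one class; all names are
# paraphrases of the text above, not actual code from the specification.
class LocalVideoDevice:
    def __init__(self, camera, channel, face_tools):
        self.camera = camera        # source of video frames
        self.channel = channel      # link to the peer device
        self.face = face_tools      # face detection / feature extraction

    def start_call(self):
        frames = iter(self.camera)                         # collection module 301
        first = next(frames)
        request = {"confirm_avatar": True}                 # confirmation request module 302
        self.channel.send(self.face.crop(first), request)  # first sending module 303
        if self.channel.wait_confirmation().stored:        # first confirmation result
            for frame in frames:
                values = self.face.features(frame)         # first extraction module 304
                self.channel.send_features(values)         # second sending module 305
```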

In a preferred embodiment of the present invention, the confirmation result further includes a second confirmation result indicating that the local user's avatar modeling data is not stored.

For the second confirmation result, the device further includes:

an avatar modeling data generation module, configured to generate the local user's avatar modeling data based on the first n seconds of the first video communication image, where n is an integer not less than 1;

a third sending module, configured to send the first n seconds of the video communication image, together with the generated avatar modeling data of the local user, to the peer device; the peer device is configured to play the first n seconds of the video communication image and store the local user's avatar modeling data in the user avatar modeling database;

a second extraction module, configured to extract the local user's second facial feature value from the (n+1)-th second of the first video communication image;

a fourth sending module, configured to send the second facial feature value to the peer device; the peer device is configured to update the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generate and play a second video communication image.

In a preferred embodiment of the present invention, for the second confirmation result, the device further includes:

a fifth sending module, configured to send the first n seconds of the video communication image to the peer device; the peer device is configured to play the first n seconds of the video communication image, generate the local user's avatar modeling data based on those first n seconds, and store the generated avatar modeling data in the user avatar modeling database;

a sixth sending module, configured to send the second facial feature value to the peer device; the peer device is configured to update the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generate and play a second video communication image.

In a preferred embodiment of the present invention, the avatar modeling data generation module includes:

a first acquisition submodule, configured to obtain each frame of the first n seconds of the video image, where n is an integer not less than 1;

a first data generation submodule, configured to generate, based on each frame, the local user's current facial image and first facial feature value.

As device embodiment 1 is substantially similar to method embodiment 1, its description is relatively brief; for relevant details, reference may be made to the corresponding parts of the method embodiment.

Referring to FIG. 4, there is shown a second structural block diagram of an embodiment of a video communication device of the present invention, which may specifically include the following modules:

a first receiving module 401, configured to receive, when the peer device enters the video communication state, the first video communication image sent by the local device, together with a confirmation request as to whether the peer device stores the local user's avatar modeling data; the user avatar modeling data includes the local user's original facial image and first facial feature value;

a matching module 402, configured to perform matching in a user avatar modeling database based on the confirmation request, the user avatar modeling database including the user avatar modeling data;

a confirmation module 403, configured to return, if the matching succeeds, a confirmation result to the local device, the confirmation result including a first confirmation result indicating that the local user's avatar modeling data is stored;

a second receiving module 404, configured to receive the second facial feature value sent by the local device;

a first playing module 405, configured to update the first facial feature value with the second facial feature value and, combined with the local user's original facial image, generate and play a second video communication image.

In a preferred embodiment of the present invention, the confirmation result further includes a second confirmation result indicating that the local user's avatar modeling data is not stored.

For the second confirmation result, the device further includes:

a third receiving module, configured to receive the first n seconds of the first video communication image sent by the local device, together with the local user's avatar modeling data generated based on those first n seconds, where n is an integer not less than 1;

a second playing module, configured to play the first n seconds of the video communication image and store the local user's avatar modeling data in the user avatar modeling database;

a fourth receiving module, configured to receive, from the (n+1)-th second onward, the second facial feature value sent by the local device;

a third playing module, configured to update the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generate and play a second video communication image.

In a preferred embodiment of the present invention, for the second confirmation result, the device further includes:

a fifth receiving module, configured to receive the first n seconds of the first video communication image sent by the local device;

a generation module, configured to generate the local user's avatar modeling data based on the first n seconds of the video communication image, where n is an integer not less than 1;

a fourth playing module, configured to play the first n seconds of the video communication image and store the generated avatar modeling data of the local user in the user avatar modeling database;

a sixth receiving module, configured to receive, from the (n+1)-th second onward, the second facial feature value sent by the local device;

a fifth playing module, configured to update the first facial feature value with the second facial feature value and, combined with the local user's current facial image, generate and play a second video communication image.

In a preferred embodiment of the present invention, the generation module includes:

a second acquisition submodule, configured to obtain each frame of the first n seconds of the video image, where n is an integer not less than 1;

a second data generation submodule, configured to generate, based on each frame, the local user's current facial image and first facial feature value.

As device embodiment 2 is substantially similar to method embodiment 2, its description is relatively brief; for relevant details, reference may be made to the corresponding parts of the method embodiment.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar across embodiments, reference may be made from one to another.

Those skilled in the art will appreciate that the embodiments of the present invention may be provided as a method, a device, or a computer program product. Accordingly, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, causing a series of operational steps to be performed on the computer or other programmable terminal device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.

Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.

The method of video communication and the mobile device for video communication provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. At the same time, for those of ordinary skill in the art, changes may be made to the specific implementations and scope of application according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (12)

CN201711241079.7A | Priority date: 2017-11-30 | Filing date: 2017-11-30 | A method of video communication and a mobile device | Active | CN108174141B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201711241079.7A (CN108174141B) | 2017-11-30 | 2017-11-30 | A method of video communication and a mobile device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201711241079.7A (CN108174141B) | 2017-11-30 | 2017-11-30 | A method of video communication and a mobile device

Publications (2)

Publication Number | Publication Date
CN108174141A (en) | 2018-06-15
CN108174141B | 2019-12-31

Family

Family ID: 62524802

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201711241079.7A (CN108174141B, Active) | A method of video communication and a mobile device | 2017-11-30 | 2017-11-30

Country Status (1)

Country | Link
CN (1) | CN108174141B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110266994B (en)* | 2019-06-26 | 2021-03-26 | 广东小天才科技有限公司 | A video call method, video call device and terminal
CN110769186A (en) | 2019-10-28 | 2020-02-07 | 维沃移动通信有限公司 | A video call method, first electronic device and second electronic device
US20210329306A1 (en)* | 2020-04-15 | 2021-10-21 | Nvidia Corporation | Video compression using neural networks
CN112218034A (en) | 2020-10-13 | 2021-01-12 | 北京字节跳动网络技术有限公司 | Video processing method, system, terminal and storage medium
CN114143445A (en) | 2021-10-15 | 2022-03-04 | 深圳市智此一游科技服务有限公司 | A video generation method
CN113727062A (en) | 2021-11-01 | 2021-11-30 | 深圳云集智能信息有限公司 | Video conference system and method for processing image data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102970510A (en)* | 2012-11-23 | 2013-03-13 | 清华大学 | Method for transmitting human face video
CN103647922A (en)* | 2013-12-20 | 2014-03-19 | 百度在线网络技术(北京)有限公司 | Virtual video call method and terminals
CN104782120A (en)* | 2012-12-17 | 2015-07-15 | 英特尔公司 | Facial movement based avatar animation
CN105704419A (en)* | 2014-11-27 | 2016-06-22 | 程超 | Method for human-human interaction based on adjustable template profile photos

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
KR101647305B1 (en)* | 2009-11-23 | 2016-08-10 | 삼성전자주식회사 | Apparatus and method for video call in mobile communication terminal

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102970510A (en)* | 2012-11-23 | 2013-03-13 | 清华大学 | Method for transmitting human face video
CN104782120A (en)* | 2012-12-17 | 2015-07-15 | 英特尔公司 | Facial movement based avatar animation
CN103647922A (en)* | 2013-12-20 | 2014-03-19 | 百度在线网络技术(北京)有限公司 | Virtual video call method and terminals
CN105704419A (en)* | 2014-11-27 | 2016-06-22 | 程超 | Method for human-human interaction based on adjustable template profile photos

Also Published As

Publication number | Publication date
CN108174141A (en) | 2018-06-15

Similar Documents

Publication | Title
CN108174141B (en) | A method of video communication and a mobile device
Zhang et al. | Facial: Synthesizing dynamic talking face with implicit attribute learning
US11600013B2 (en) | Facial features tracker with advanced training for natural rendering of human faces in real-time
Zakharov et al. | Few-shot adversarial learning of realistic neural talking head models
CN112102154B (en) | Image processing method, device, electronic device and storage medium
CN107911643B (en) | A method and device for displaying scene special effects in video communication
CN113570684B (en) | Image processing method, device, computer equipment and storage medium
WO2020103700A1 (en) | Image recognition method based on micro facial expressions, apparatus and related device
CN107610209A (en) | Human face countenance synthesis method, device, storage medium and computer equipment
JP2024506170A (en) | Methods, electronic devices, and programs for forming personalized 3D head and face models
CN111108508B (en) | Face emotion recognition method, intelligent device and computer readable storage medium
US10650564B1 (en) | Method of generating 3D facial model for an avatar and related device
CN107025678A (en) | A kind of driving method and device of 3D dummy models
CN111815768B (en) | Three-dimensional face reconstruction method and device
US9747695B2 (en) | System and method of tracking an object
WO2023155533A1 (en) | Image driving method and apparatus, device and medium
US20250265674A1 (en) | Method and Apparatus for Generating Landmark
EP4588019A1 (en) | User authentication based on three-dimensional face modeling using partial face images
US20250316112A1 (en) | Method and Apparatus for Generating Reenacted Image
CN112990047B (en) | Multi-pose face verification method combining face angle information
Zeng et al. | Video-driven state-aware facial animation
CN119091013A (en) | A method for intervention and correction of posture of digital human generated by AIGC
Kumar et al. | Multi modal adaptive normalization for audio to video generation
KR20200134623A (en) | Apparatus and Method for providing facial motion retargeting of 3 dimensional virtual character
Anisetti et al. | Facial identification problem: A tracking based approach

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
