CN106557730A

Movatterモバイル変換

Info

Publication number: CN106557730A
Application number: CN201510641456.0A
Authority: CN
Inventors: 王务志; 王军
Original assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Priority date: 2015-09-30
Filing date: 2015-09-30
Publication date: 2017-04-05

Abstract

Translated fromChinese

本发明提供一种视频通话过程中的人脸纠正方法，步骤为：响应于视频通话而启动摄像头采集视频图像；对所述采集的视频图像进行人脸检测以获取多角度人脸图像；基于获取的多角度人脸图像对人脸进行三维建模；基于视频中的人脸的位置变化，调整三维人脸模型，以纠正三维人脸图像；传输纠正后的三维人脸图像。同时，还提供一种视频通话过程中的人脸纠正装置。本发明所述方法及装置使得视频通话过程中实现了通话双方的眼神交流，克服了手机、笔记本等设备将摄像头设计在屏幕顶端的设计缺陷。

The invention provides a face correction method in the process of video call, the steps are: in response to the video call, start a camera to collect video images; perform face detection on the collected video images to obtain multi-angle face images; The multi-angle face image is used to perform three-dimensional modeling of the face; based on the position change of the face in the video, the three-dimensional face model is adjusted to correct the three-dimensional face image; the corrected three-dimensional face image is transmitted. At the same time, a face correction device in the video call process is also provided. The method and device of the present invention enable eye contact between the two parties during the video call, and overcome the design defect that the camera is designed at the top of the screen in mobile phones, notebooks and other equipment.

Description

Translated fromChinese

视频通话过程中的人脸纠正方法及装置Face correction method and device during video call

技术领域technical field

本发明涉及图像处理领域，具体而言，本发明涉及一种视频通话过程中的人脸纠正方法及相应装置。The present invention relates to the field of image processing, in particular, the present invention relates to a face correction method and a corresponding device during a video call.

背景技术Background technique

随着通信技术的发展，网速的提升，越来越多的用户不满足于仅仅通过语音进行通话，更多的通信软件提供了语音和视频两种通信方式，使得用户可以方便的通过视频进行通话。智能手机由于其携带方便，而且集成的功能越来越丰富，越来越多的移动APP被开发，用户可以很方便通过手机随时随地进行视频通话。但由于手机终端的体积有限，又为了扩充显示屏幕的面积，几乎所有的手机厂商都将摄像头设计到手机的顶端，这种设计缺陷使得用户在通过摄像头进行视频通话过程中，眼神聚焦摄像头时无法看屏幕上显示的对方人脸，看屏幕时又无法眼神聚焦到摄像头，导致双方在进行视频通话时，无法进行眼神交流。并且，由于用户在使用手机时，通常习惯俯视镜头，使得镜头总是处于仰视角度拍摄人脸，导致显示在手机屏幕上的人脸图像都是仰视状态下的，使得用户在进行视频通话时得不到优良的视觉体验。With the development of communication technology and the improvement of network speed, more and more users are not satisfied with just talking through voice. More communication software provides two communication methods of voice and video, so that users can conveniently communicate through video call. Smartphones are easy to carry and have more and more integrated functions. More and more mobile APPs have been developed. Users can easily make video calls through mobile phones anytime and anywhere. However, due to the limited volume of mobile phone terminals and in order to expand the area of the display screen, almost all mobile phone manufacturers design the camera on the top of the mobile phone. This design defect makes it difficult for users to focus on the camera when making video calls through the camera. Looking at the other person's face displayed on the screen, and unable to focus on the camera while looking at the screen, the two parties cannot make eye contact when making a video call. And, because the user is usually accustomed to looking down at the camera when using a mobile phone, the camera is always at an angle of looking up to shoot a face, causing the face images displayed on the screen of the mobile phone to be in a state of looking up, which makes the user feel uncomfortable when making a video call. Less than a good visual experience.

发明内容Contents of the invention

本发明的目的旨在解决上述至少一个问题，提供一种视频通话过程中的人脸纠正方法及相应装置。The object of the present invention is to solve at least one of the above problems, and provide a face correction method and corresponding device during a video call.

为了实现上述目的，本发明提供一种视频通话过程中的人脸纠正方法，包括如下步骤：In order to achieve the above object, the present invention provides a face correction method in the video call process, comprising the following steps:

响应于视频通话而启动摄像头采集视频图像；Starting the camera to capture video images in response to the video call;

对所述采集的视频图像进行人脸检测以获取多角度的人脸图像；Perform face detection on the collected video images to obtain multi-angle face images;

基于获取的多角度人脸图像对人脸进行三维建模；Carry out three-dimensional modeling of the face based on the obtained multi-angle face image;

基于视频中的人脸的位置变化，调整三维人脸模型，以纠正三维人脸图像；Adjusting the 3D face model based on the position change of the face in the video to correct the 3D face image;

传输纠正后的三维人脸图像。Transmit the rectified 3D face image.

具体的，还包括采用Adaboost算法训练分类器以进行人脸检测。Specifically, it also includes using the Adaboost algorithm to train a classifier for face detection.

具体的，所述采用Adaboost算法训练分类器的具体步骤如下：Specifically, the specific steps of using the Adaboost algorithm to train the classifier are as follows:

采用Harr-like特征描述视频图像；Using Harr-like features to describe video images;

利用Adaboost算法筛选出最能代表人脸的特征，并将该些特征进行加权构造为一个强分类器；Use the Adaboost algorithm to screen out the features that best represent the face, and weight these features to construct a strong classifier;

训练多个强分类器并串联组成层叠分类器。Train multiple strong classifiers and form cascaded classifiers in series.

具体的，采用层叠分类器进行人脸检测，按照固定步长对检测的图像进行缩放，以获取检测图像中不同大小的人脸图像。Specifically, a stacked classifier is used for face detection, and the detected image is scaled according to a fixed step size to obtain face images of different sizes in the detected image.

进一步的，所述对人脸进行三维建模包括如下步骤：Further, the three-dimensional modeling of the human face includes the following steps:

对获取的多角度人脸图像进行预处理；Preprocessing the acquired multi-angle face image;

采用ASM算法提取人脸特征点；Use the ASM algorithm to extract face feature points;

计算人脸特征点的三维空间点坐标，建立人脸几何模型；Calculate the three-dimensional space point coordinates of the feature points of the face, and establish the geometric model of the face;

基于人脸几何模型将多角度人脸图像合成人脸的纹理图像并进行纹理映射，从而生成三维人脸模型。Based on the geometric model of the face, the multi-angle face image is synthesized into a texture image of the face and texture mapped to generate a 3D face model.

具体的，所述纠正后的三维人脸图像具体指正脸图像中眼睛部分与摄像头的焦点在同一直线上。Specifically, the corrected three-dimensional face image specifically refers to that the eyes in the frontal face image are on the same straight line as the focal point of the camera.

优选的，通过判断采集的三维人脸图像的人眼虹膜图像的清晰度是否达到设定的阈值，确定眼睛部分是否与摄像头的焦点处于同一直线上。Preferably, by judging whether the sharpness of the iris image of the human eye in the collected three-dimensional face image reaches a set threshold, it is determined whether the eye part is on the same line as the focal point of the camera.

具体的，所述多角度人脸图像包括正面人脸图像、左侧人脸图像、右侧人脸图像，其中所述左侧或右侧人脸图像偏转角度不超过30度。Specifically, the multi-angle face image includes a frontal face image, a left face image, and a right face image, wherein the deflection angle of the left or right face image does not exceed 30 degrees.

优选的，仅对采集的视频流中的单数帧图像进行纠正，以保证视频通话的流畅。Preferably, only odd-numbered frame images in the collected video stream are corrected to ensure the smoothness of the video call.

优选的，对采集的视频流中的人脸图像进行逐帧纠正，以保证视频通话的画面自然。Preferably, the face images in the collected video stream are corrected frame by frame, so as to ensure a natural picture of the video call.

一种视频通话过程中的人脸纠正装置，包括：A face correction device during a video call, comprising:

采集模块：用于响应于视频通话而启动摄像头采集视频图像；Acquisition module: used to start the camera to collect video images in response to the video call;

检测单元：用于对所述采集的视频图像进行人脸检测以获取多角度的人脸图像；Detection unit: used for performing face detection on the collected video images to obtain multi-angle face images;

建模单元：用于基于获取的多角度人脸图像对人脸进行三维建模；Modeling unit: used to perform three-dimensional modeling of the face based on the obtained multi-angle face image;

调整单元：用于基于视频中的人脸的位置变化，调整三维人脸模型，以纠正三维人脸图像；Adjustment unit: used to adjust the three-dimensional face model based on the position change of the face in the video to correct the three-dimensional face image;

传输单元：用于传输纠正后的三维人脸图像。Transmission unit: used to transmit the corrected 3D face image.

进一步，还包括分类器训练单元，用于采用Adaboost算法训练分类器以进行人脸检测。Further, it also includes a classifier training unit for using the Adaboost algorithm to train the classifier for face detection.

具体的，所述分类器训练单元包括：Specifically, the classifier training unit includes:

特征提取模块：用于采用Harr-like特征描述视频图像；Feature extraction module: used to describe video images using Harr-like features;

构造模块：用于利用adaboost算法筛选出最能代表人脸的特征，并将该些特征进行加权构造为一个强分类器；Construction module: used to use the adaboost algorithm to filter out the features that best represent the face, and weight these features to construct a strong classifier;

串联模块：用于将通过训练获取的多个强分类器串联组成层叠分类器。Concatenation module: used to concatenate multiple strong classifiers obtained through training to form a cascade classifier.

进一步，所述建模单元包括：Further, the modeling unit includes:

预处理模块：用于对获取的多角度人脸图像进行预处理；Preprocessing module: used to preprocess the acquired multi-angle face image;

特征提取模块：用于采用ASM算法提取人脸特征点；Feature extraction module: used to extract face feature points using ASM algorithm;

几何模型生成模块：用于计算人脸特征点的三维空间点坐标，建立人脸几何模型；Geometric model generation module: used to calculate the three-dimensional space point coordinates of facial feature points, and establish a facial geometric model;

纹理映射模块：用于基于人脸几何模型将多角度人脸图像合成人脸的纹理图像并进行纹理映射，从而生成三维人脸模型。Texture mapping module: used for synthesizing multi-angle face images into a texture image of the face based on the geometric model of the face and performing texture mapping to generate a three-dimensional face model.

进一步，还包括判断模块，用于判断采集的三维人脸图像的人眼虹膜图像的清晰度是否达到设定的阈值，以确定眼睛部分是否与摄像头的焦点处于同一直线上。Further, it also includes a judging module for judging whether the sharpness of the iris image of the human eye in the collected 3D face image reaches a set threshold, so as to determine whether the eye part is on the same line as the focal point of the camera.

具体的，所述多角度人脸图像包括正面人脸图像、左侧人脸图像、右侧人脸图像，其中所述左侧或右侧人脸图像角度不超过30度。Specifically, the multi-angle face image includes a frontal face image, a left face image, and a right face image, wherein the angle of the left or right face image does not exceed 30 degrees.

相比现有技术，本发明的方案具有以下优点：Compared with the prior art, the solution of the present invention has the following advantages:

本发明通过对视频图像中的人脸进行三维建模，获取三维人脸图像，并在视频通话过程中基于人脸的位置变化，变化三维人脸图像的角度，使得人脸图像中的眼睛部分与摄像头的焦点处于一条直线，从而在视频通话中传输眼睛看摄像头的图像，以实现通话双方的眼神交流，克服手机、笔记本等设备将摄像头设计在屏幕顶端的设计缺陷。The present invention obtains a three-dimensional human face image by performing three-dimensional modeling on the human face in the video image, and changes the angle of the three-dimensional human face image based on the position change of the human face during the video call process, so that the eyes in the human face image The focal point of the camera is in a straight line, so that the image of the eyes looking at the camera is transmitted during the video call, so as to realize the eye contact between the two parties in the call, and overcome the design defect that the camera is designed on the top of the screen in mobile phones, notebooks and other devices.

本发明附加的方面和优点将在下面的描述中部分给出，这些将从下面的描述中变得明显，或通过本发明的实践了解到。Additional aspects and advantages of the invention will be set forth in part in the description which follows, and will become apparent from the description, or may be learned by practice of the invention.

附图说明Description of drawings

本发明上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解，其中：The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, wherein:

图1为本发明所述视频通话过程中的人脸纠正方法流程图；Fig. 1 is the flow chart of the face correction method in the video call process of the present invention;

图2为本发明实施例中优选的人脸检测流程示意图；Fig. 2 is a schematic diagram of a preferred face detection process in an embodiment of the present invention;

图3为本发明实施例中优选的三维人脸建模的流程示意图；FIG. 3 is a schematic flow diagram of a preferred three-dimensional face modeling in an embodiment of the present invention;

图4为本发明所述视频通话过程中的人脸纠正装置示意框图。Fig. 4 is a schematic block diagram of a face correction device during a video call according to the present invention.

具体实施方式detailed description

下面详细描述本发明的实施例，所述实施例的示例在附图中示出，其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的，仅用于解释本发明，而不能解释为对本发明的限制。Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

本技术领域技术人员可以理解，除非特意声明，这里使用的单数形式“一”、“一个”、“所述”和“该”也可包括复数形式。应该进一步理解的是，本发明的说明书中使用的措辞“包括”是指存在所述特征、整数、步骤、操作、元件和/或组件，但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元件、组件和/或它们的组。应该理解，当我们称元件被“连接”或“耦接”到另一元件时，它可以直接连接或耦接到其他元件，或者也可以存在中间元件。此外，这里使用的“连接”或“耦接”可以包括无线连接或无线耦接。这里使用的措辞“和/或”包括一个或更多个相关联的列出项的全部或任一单元和全部组合。Those skilled in the art will understand that unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the description of the present invention refers to the presence of said features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, Integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Additionally, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The expression "and/or" used herein includes all or any elements and all combinations of one or more associated listed items.

本技术领域技术人员可以理解，除非另外定义，这里使用的所有术语(包括技术术语和科学术语)，具有与本发明所属领域中的普通技术人员的一般理解相同的意义。还应该理解的是，诸如通用字典中定义的那些术语，应该被理解为具有与现有技术的上下文中的意义一致的意义，并且除非像这里一样被特定定义，否则不会用理想化或过于正式的含义来解释。Those skilled in the art can understand that, unless otherwise defined, all terms (including technical terms and scientific terms) used herein have the same meaning as commonly understood by those of ordinary skill in the art to which this invention belongs. It should also be understood that terms, such as those defined in commonly used dictionaries, should be understood to have meanings consistent with their meaning in the context of the prior art, and unless specifically defined as herein, are not intended to be idealized or overly Formal meaning to explain.

本技术领域技术人员可以理解，这里所使用的“终端”、“终端设备”既包括无线信号接收器的设备，其仅具备无发射能力的无线信号接收器的设备，又包括接收和发射硬件的设备，其具有能够在双向通信链路上，执行双向通信的接收和发射硬件的设备。这种设备可以包括：蜂窝或其他通信设备，其具有单线路显示器或多线路显示器或没有多线路显示器的蜂窝或其他通信设备；PCS(Personal Communications Service，个人通信系统)，其可以组合语音、数据处理、传真和/或数据通信能力；PDA(PersonalDigital Assistant_，个人数字助理)，其可以包括射频接收器、寻呼机、互联网/内联网访问、网络浏览器、记事本、日历和/或GPS(Global PositioningSystem，全球定位系统)接收器；常规膝上型和/或掌上型计算机或其他设备，其具有和/或包括射频接收器的常规膝上型和/或掌上型计算机或其他设备。这里所使用的“终端”、“终端设备”可以是便携式、可运输、安装在交通工具(航空、海运和/或陆地)中的，或者适合于和/或配置为在本地运行，和/或以分布形式，运行在地球和/或空间的任何其他位置运行。这里所使用的“终端”、“终端设备”还可以是通信终端、上网终端、音乐/视频播放终端，例如可以是PDA、MID(Mobile Internet Device，移动互联网设备)和/或具有音乐/视频播放功能的移动电话，也可以是智能电视、机顶盒等设备。Those skilled in the art can understand that the "terminal" and "terminal equipment" used here not only include wireless signal receiver equipment, which only has wireless signal receiver equipment without transmission capabilities, but also include receiving and transmitting hardware. A device having receiving and transmitting hardware capable of performing bi-directional communication over a bi-directional communication link. Such equipment may include: cellular or other communication equipment, which has a single-line display or a multi-line display or a cellular or other communication equipment without a multi-line display; PCS (Personal Communications Service, personal communication system), which can combine voice, data Processing, fax and/or data communication capabilities; PDA (Personal Digital Assistant_, Personal Digital Assistant), which may include radio frequency receiver, pager, Internet/Intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System , Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "terminal", "terminal device" may be portable, transportable, installed in a vehicle (air, sea, and/or land), or adapted and/or configured to operate locally, and/or In distributed form, the operation operates at any other location on Earth and/or in space. The "terminal" and "terminal equipment" used here can also be communication terminals, Internet terminals, music/video playback terminals, such as PDAs, MIDs (Mobile Internet Devices, mobile Internet devices) and/or with music/video playback terminals. Functional mobile phones, smart TVs, set-top boxes and other devices.

本技术领域技术人员可以理解，这里所使用的远端网络设备，其包括但不限于计算机、网络主机、单个网络服务器、多个网络服务器集或多个服务器构成的云。在此，云由基于云计算(Cloud Computing)的大量计算机或网络服务器构成，其中，云计算是分布式计算的一种，由一群松散耦合的计算机集组成的一个超级虚拟计算机。本发明的实施例中，远端网络设备、终端设备与WNS服务器之间可通过任何通信方式实现通信，包括但不限于，基于3GPP、LTE、WIMAX的移动通信、基于TCP/IP、UDP协议的计算机网络通信以及基于蓝牙、红外传输标准的近距无线传输方式。Those skilled in the art can understand that the remote network device used here includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, the cloud is composed of a large number of computers or network servers based on cloud computing (Cloud Computing), wherein cloud computing is a kind of distributed computing, a super virtual computer composed of a group of loosely coupled computer sets. In the embodiment of the present invention, the communication between the remote network equipment, the terminal equipment and the WNS server can be realized through any communication method, including but not limited to, mobile communication based on 3GPP, LTE, WIMAX, based on TCP/IP, UDP protocol Computer network communication and short-distance wireless transmission methods based on Bluetooth and infrared transmission standards.

由于本发明涉及利用Adaboost算法进行人脸检测的技术，有必要对该算法做先导性介绍。Adaboost算法是一种迭代算法，其核心思想是针对同一个训练集训练不同的分类器，即弱分类器，然后将这些弱分类器集合起来，构造一个强分类器。算法通过改变数据分布，根据每次训练集中的每个样本的分类是否正确，以及上次的总体分类的准确率，确定每个样本的权值，将修改权值的新数据作为下一层分类器的输入进行训练，然后将每次训练得到的分类器结合成为最后的决策分类器。Since the present invention relates to the technology of face detection using the Adaboost algorithm, it is necessary to make a preliminary introduction to the algorithm. The Adaboost algorithm is an iterative algorithm. Its core idea is to train different classifiers for the same training set, that is, weak classifiers, and then combine these weak classifiers to construct a strong classifier. The algorithm changes the data distribution, determines the weight of each sample according to whether the classification of each sample in each training set is correct, and the accuracy of the last overall classification, and uses the new data with modified weight as the next layer of classification The input of the classifier is trained, and then the classifiers obtained from each training are combined into the final decision classifier.

需要注意的是，上述介绍主要用于更便捷地理解本发明，并非用于限制本发明的实施，理论上采用何种算法进行人脸检测，均不影响本发明的实施，本领域技术人员对此应当知晓。It should be noted that the above introduction is mainly used to understand the present invention more conveniently, and is not used to limit the implementation of the present invention. Which algorithm is used for face detection in theory does not affect the implementation of the present invention. This should be known.

参考图1所示,本发明提供一种视频通话过程中的人脸纠正方法，以下对该方法的原理进行详细说明，所述方法包括如下步骤:With reference to shown in Fig. 1, the present invention provides a kind of face correcting method in video call process, the principle of this method is described in detail below, described method comprises the steps:

S101、响应于视频通话而启动摄像头采集视频图像；S101. Start the camera to collect video images in response to the video call;

本发明所述视频通话具体指通过摄像头设备采集的图像进行信息传输的过程，所述摄像头设备不限于手机终端、笔记本等自带的摄像头，也不限于普通摄像设备、智能摄像设备等进行图像数据采集的设备，适用于一切进行视频采集的摄像设备。The video call in the present invention specifically refers to the process of transmitting information through the images collected by the camera equipment. The collection equipment is suitable for all camera equipment for video collection.

当用户采用摄像设备进行视频通话时，后台服务对其进行响应，调用摄像头相关函数，开启摄像头采集视频图像。以Android系统为例，其中的Camera类函数控制摄像头，如开启摄像头函数Camera.open()，拍照函数takePicture(shutterCallback,rawCallback,jpegCallback)等。Android系统设备采用Camera类函数获取摄像头采集的图像。When the user uses the camera device to make a video call, the background service responds to it, calls the camera-related functions, and starts the camera to collect video images. Taking the Android system as an example, the Camera class function controls the camera, such as the camera opening function Camera.open(), the camera function takePicture(shutterCallback, rawCallback, jpegCallback), etc. The Android system device uses the Camera class function to obtain the image collected by the camera.

S102、对所述采集的视频图像进行人脸检测以获取多角度人脸图像；S102. Perform face detection on the collected video images to obtain multi-angle face images;

由于人脸偏转角度超过30度时，现有算法的人脸检测率降低，误检率提高，对后续的三维建模也造成困难，而且在视频通话过程中，通常人脸的偏转不超过30度，故而本发明仅对正面人脸图像、偏转角度不超过30度的左侧人脸图像和右侧人脸图像进行人脸检测。所述获取多角度人脸图像，具体指对视频流数据中的每帧图像或单数帧图像进行人脸检测，即确定图像帧中的人脸在图像中的具体位置，并截取相应位置的人脸图像。When the face deflection angle exceeds 30 degrees, the face detection rate of the existing algorithm decreases and the false detection rate increases, which also causes difficulties for the subsequent 3D modeling, and in the process of video calls, the face deflection usually does not exceed 30 degrees. degree, so the present invention only performs face detection on the front face image, the left face image and the right face image whose deflection angle is no more than 30 degrees. The acquisition of a multi-angle face image specifically refers to performing face detection on each frame of image or an odd number of frame images in the video stream data, that is, determining the specific position of the face in the image frame in the image, and intercepting the person in the corresponding position. face image.

人脸检测属于计算机视觉领域，具体是指在输入图像中判断是否存在人脸区域，并进一步确定人脸的位置、大小等信息。在具体实施例中，参考图2所示，首先基于样本集训练得到人脸检测分类器，所述训练人脸检测分类器的步骤具体包括：Face detection belongs to the field of computer vision, which specifically refers to judging whether there is a face area in the input image, and further determining the position, size and other information of the face. In a specific embodiment, with reference to shown in Figure 2, at first obtain face detection classifier based on sample set training, the step of described training face detection classifier specifically comprises:

步骤1、采用Harr-like特征描述视频图像；Step 1, using Harr-like features to describe video images;

Harr-like特征是计算机视觉领域一种常用的特征描述算子，通常分为三类：线性特征、边缘特征、点特征、对角线特征。将视频图像由RGB格式转换为灰度图像，基于该灰度图像提取Harr-like特征，基于Harr-like特征描述视频图像。在具体实施例中，为了快速计算Harr-like特征值，采用积分图进行计算，即图像中任一点的像素值(x,y)，都表示成像素值(1,1)到像素值(x,y)组成的矩形区域内的灰度值之和。Harr-like feature is a commonly used feature description operator in the field of computer vision, which is usually divided into three categories: linear feature, edge feature, point feature, and diagonal feature. The video image is converted from RGB format to a grayscale image, Harr-like features are extracted based on the grayscale image, and the video image is described based on the Harr-like features. In a specific embodiment, in order to quickly calculate the Harr-like feature value, the integral map is used for calculation, that is, the pixel value (x, y) at any point in the image is expressed as pixel value (1, 1) to pixel value (x ,y) is the sum of the gray values in the rectangular area.

步骤2、利用Adaboost算法筛选出最能代表人脸的特征，并将该些特征进行加权构造为一个强分类器；Step 2, use the Adaboost algorithm to screen out the features that best represent the face, and weight these features to construct a strong classifier;

利用Adaboost算法训练分类器，具体过程简述如下：Using the Adaboost algorithm to train the classifier, the specific process is briefly described as follows:

预设训练样本集，包括正样本，即人脸图像，负样本，即非人脸图像。分别初始化正负样本的权重，对每个特征的基于所有样本训练一个弱分类器，计算所有弱分类器的加权错误率，错误率最小的分类器为最佳弱分类器，则该特征为最能代表人脸的特征。更新权重，重新对样本进行分类，最终分类正确率最高且错误率最低的分类器为强分类器。The preset training sample set includes positive samples, that is, face images, and negative samples, that is, non-face images. Initialize the weights of positive and negative samples respectively, train a weak classifier based on all samples for each feature, and calculate the weighted error rate of all weak classifiers, the classifier with the smallest error rate is the best weak classifier, then the feature is the most Features that can represent a human face. Update the weights and reclassify the samples, and finally the classifier with the highest classification accuracy and the lowest error rate is the strong classifier.

步骤3、训练多个强分类器并串联组成层叠分类器。Step 3. Train multiple strong classifiers and form a cascaded classifier in series.

按照上述方式训练多个强分类器，将该些分类器按照树状结构进行串联，组成最终的强分类器，即层叠分类器。Train multiple strong classifiers according to the above method, and connect these classifiers in series according to the tree structure to form the final strong classifier, that is, a cascaded classifier.

将得到的层叠分类器作为人脸检测器对每一帧视频图像进行检测，检测过程中，对待检测的每帧图像按照固定步长进行缩放，从而使得该人脸检测器可以检测到视频图像中不同大小的人脸图像。Use the obtained stacked classifier as a face detector to detect each frame of video image. During the detection process, each frame of image to be detected is scaled according to a fixed step size, so that the face detector can detect the Face images of different sizes.

将检测到的人脸图像按照位置信息截图，保存截图信息，该截图即为获取的人脸图像，依此方式获取不同角度的人脸图像并保存。Take a screenshot of the detected face image according to the location information, save the screenshot information, the screenshot is the acquired face image, and obtain and save the face images from different angles in this way.

S103、基于获取的多角度人脸图像对人脸进行三维建模；S103. Perform three-dimensional modeling of the face based on the acquired multi-angle face image;

将上述获取的多角度人脸图像做如下处理，以对人脸进行三维建模，参考图3所示，具体步骤如下：The multi-angle face image obtained above is processed as follows to carry out three-dimensional modeling of the face, as shown in Figure 3, the specific steps are as follows:

步骤1、对获取的多角度人脸图像进行预处理；Step 1, preprocessing the acquired multi-angle face image;

对检测获取的多角度人脸图像进行预处理，将RGB图像转换为灰度图。对每帧人脸图像进行归一化处理，使其大小相同，构成训练集。Preprocess the multi-angle face image obtained by detection, and convert the RGB image into a grayscale image. Normalize each frame of face images to make them the same size to form a training set.

步骤2、采用ASM算法提取人脸特征点；Step 2, using the ASM algorithm to extract face feature points;

ASM算法是一种基于点分布模型的算法，对于如人脸、人手等几何形状可以通过若干个关键特征点的坐标依次串联形成一个形状向量表示，具体包括训练和搜索两部分。The ASM algorithm is an algorithm based on a point distribution model. For geometric shapes such as faces and hands, the coordinates of several key feature points can be concatenated to form a shape vector representation, which specifically includes two parts: training and searching.

根据预设的样本集训练ASM模型，然后通过ASM模型对人脸图像进行搜索，获取每个人脸图像中的特征点。The ASM model is trained according to the preset sample set, and then the face images are searched through the ASM model to obtain the feature points in each face image.

步骤3、计算人脸特征点的三维空间点坐标，建立人脸几何模型；Step 3, calculating the three-dimensional space point coordinates of the feature points of the human face, and establishing the geometric model of the human face;

通过人脸特征点的二维空间坐标计算三维空间点坐标，从而建立人脸几何模型。本领域技术人员通常知晓该些坐标之间的变换，故在此不再赘述。The coordinates of the three-dimensional space points are calculated through the two-dimensional space coordinates of the feature points of the face, so as to establish the geometric model of the face. Those skilled in the art are generally aware of the transformation between these coordinates, so details will not be repeated here.

步骤4、基于人脸几何模型将多角度人脸图像合成人脸的纹理图像并进行纹理映射，从而生成三维人脸模型。Step 4: Synthesize the multi-angle face image into a texture image of the face based on the geometric model of the face and perform texture mapping to generate a three-dimensional face model.

由此获取视频通话中特定人脸的三维模型，以便后续基于该三维人脸模型进行人脸纠正。In this way, the 3D model of the specific face in the video call is obtained, so that subsequent face correction can be performed based on the 3D face model.

S104、基于视频中的人脸的位置变化，调整三维人脸模型，以纠正三维人脸图像；S104. Based on the position change of the face in the video, adjust the 3D face model to correct the 3D face image;

通过手机或笔记本等设备进行视频通话时，由于几乎所有的手厂商都将摄像头设计到手机或笔记本的顶端，这种设计缺陷使得用户在通过摄像头进行视频通话过程中，眼神聚焦摄像头时无法看屏幕上显示的对方人脸，看屏幕时又无法眼神聚焦到摄像头，导致双方在进行视频通话时，无法进行眼神交流。并且，由于用户在使用手机时，通常习惯俯视镜头，使得镜头总是处于仰视角度拍摄人脸，导致显示在手机屏幕上的人脸图像都是仰视状态下的，使得用户在进行视频通话时得不到优良的视觉体验。When making video calls through devices such as mobile phones or notebooks, because almost all mobile phone manufacturers design the camera on the top of the mobile phone or notebook, this design defect makes it impossible for users to look at the screen when focusing on the camera during a video call through the camera. The face of the other party displayed on the screen cannot focus on the camera when looking at the screen, which makes it impossible for the two parties to make eye contact when making a video call. And, because the user is usually accustomed to looking down at the camera when using a mobile phone, the camera is always at an angle of looking up to shoot a face, causing the face images displayed on the screen of the mobile phone to be in a state of looking up, which makes the user feel uncomfortable when making a video call. Less than a good visual experience.

故而，通过在视频通话过程中调整人脸图像的位置，以使人眼与摄像头焦点在一条直线上。由于侧脸不存在眼神交流的问题，因此具体指正脸图像中眼睛部分与摄像头的焦点在同一直线上。为了确定人脸图像中的人眼部分是否与摄像头的焦点处于同一直线上，本发明实施例优选采用计算人眼虹膜图像的清晰度，当清晰度达到设定的阈值时，判定为人眼部分与摄像头焦点处于同一直线，否则不处于同一直线。当判定为处于同一直线时，该方位的人脸图像即为纠正后的三维人脸图像。Therefore, by adjusting the position of the face image during the video call, the focus of the human eye and the camera is on a straight line. Since there is no eye contact problem in side faces, it specifically means that the eyes in the frontal face image are on the same straight line as the focus of the camera. In order to determine whether the human eye part in the face image is on the same line as the focal point of the camera, the embodiment of the present invention preferably uses the calculation of the sharpness of the human iris image, and when the sharpness reaches the set threshold, it is determined that the human eye part is in line with The camera focus is on the same straight line, otherwise it is not on the same straight line. When it is determined to be on the same straight line, the face image in this orientation is the corrected 3D face image.

其中，优选地，对视频通话过程中采集的单数帧图像进行人脸图像纠正，从而减少人脸纠正带来的耗时，使得通话视频更加流畅。Wherein, preferably, face image correction is performed on odd-numbered frame images collected during the video call, thereby reducing the time consumption caused by face correction and making the call video smoother.

其他实施例中，对视频通话过程中的每帧图像进行人脸图像纠正，使得通话视频的画面更加生动自然。In other embodiments, face image correction is performed on each frame of image during the video call, so that the picture of the call video is more vivid and natural.

S105、传输纠正后的三维人脸图像。S105. Transmitting the corrected 3D face image.

对视频通话过程中的视频图像进行上述处理后，将得到的纠正后的三维人脸图像进行编码传输，即编码传输包括三维人脸图像的视频。其中，所谓视频编码是指通过特定的压缩技术，将某种视频格式的文件转换为另一种视频格式的文件，通过压缩编码去除视频图像数据中的冗余信息。H264编码技术由于其性能更高，故而目前使用最广泛。H264编码技术对原始图像在4*4的倍数大小的宏块上进行帧内编码和帧间编码，帧内编码消除帧内空间冗余，帧间编码对视频数据的运动补偿进行预测。将编码后的包括纠正后的三维人脸图像通过视频通话过程中建立的通信隧道进行传输，从而使得视频通话双方可以看到彼此眼睛对着摄像头的视频，实现通话过程中的眼神交流。After the above-mentioned processing is performed on the video image during the video call, the obtained corrected 3D face image is coded and transmitted, that is, the video including the 3D face image is coded and transmitted. Among them, the so-called video coding refers to converting a file of a certain video format into a file of another video format through a specific compression technology, and removing redundant information in video image data through compression coding. H264 encoding technology is currently the most widely used due to its higher performance. The H264 coding technology performs intra-frame coding and inter-frame coding on the original image on a macroblock with a multiple size of 4*4. Intra-frame coding eliminates intra-frame space redundancy, and inter-frame coding predicts motion compensation of video data. The encoded three-dimensional face image including correction is transmitted through the communication tunnel established during the video call, so that both parties in the video call can see the video of each other's eyes facing the camera, and realize eye contact during the call.

为了进一步对本发明所述方法以模块化方式进行表述，参考图4所示，本发明还提供一种视频通话过程中的人脸纠正装置，包括：采集单元11、检测单元12、建模单元13、调整单元14、传输单元15以及分类器训练单元16，其中，In order to further describe the method of the present invention in a modular manner, as shown in FIG. 4 , the present invention also provides a face correction device during a video call, including: an acquisition unit 11, a detection unit 12, and a modeling unit 13 , adjustment unit 14, transmission unit 15 and classifier training unit 16, wherein,

所述采集单元11用于响应于视频通话而启动摄像头采集视频图像；The collection unit 11 is used to start the camera to collect video images in response to the video call;

当用户采用摄像设备进行视频通话时，由采集单元11对其进行响应，调用摄像头相关函数，开启摄像头采集视频图像。以Android系统为例，其中的Camera类函数控制摄像头，如开启摄像头函数Camera.open()，拍照函数takePicture(shutterCallback,rawCallback,jpegCallback)等。Android系统设备采用Camera类函数获取摄像头采集的图像。When the user uses the camera device to make a video call, the acquisition unit 11 responds to it, calls the camera-related functions, and starts the camera to collect video images. Taking the Android system as an example, the Camera class function controls the camera, such as the camera opening function Camera.open(), the camera function takePicture(shutterCallback, rawCallback, jpegCallback), etc. The Android system device uses the Camera class function to obtain the image collected by the camera.

检测单元12用于对所述采集的视频图像进行人脸检测以获取多角度人脸图像；The detection unit 12 is used to perform face detection on the collected video image to obtain a multi-angle face image;

人脸检测属于计算机视觉领域，具体是指在输入图像中判断是否存在人脸区域，并进一步确定人脸的位置、大小等信息。参考图2所示，在具体实施例中，首先通过分类器训练单元15训练得到人脸检测分类器，所述训练人脸分类器的步骤具体包括：Face detection belongs to the field of computer vision, which specifically refers to judging whether there is a face area in the input image, and further determining the position, size and other information of the face. Shown in Fig. 2 with reference to, in specific embodiment, at first obtain face detection classifier by classifier training unit 15 training, the step of described training face classifier specifically comprises:

检测单元12将得到的层叠分类器作为人脸检测器对每一帧视频图像进行检测，检测过程中，对待检测的每帧图像按照固定步长进行缩放，从而使得该人脸检测器可以检测到视频图像中不同大小的人脸图像。Detection unit 12 detects each frame of video image using the stacked classifier obtained as a face detector. During the detection process, each frame of image to be detected is scaled according to a fixed step size, so that the face detector can detect Face images of different sizes in video images.

建模单元13用于基于获取的多角度人脸图像对人脸进行三维建模；Modeling unit 13 is used for carrying out three-dimensional modeling to human face based on the obtained multi-angle human face image;

将上述获取的多角度人脸图像做如下处理，以对人脸进行三维建模，参考图3所示，建模单元13具体执行步骤如下：The multi-angle face image obtained above is processed as follows to carry out three-dimensional modeling of the face. Referring to FIG. 3, the specific execution steps of the modeling unit 13 are as follows:

调整单元14用于基于视频中的人脸的位置变化，调整三维人脸模型，以纠正三维人脸图像；The adjustment unit 14 is used to adjust the three-dimensional human face model based on the position change of the human face in the video to correct the three-dimensional human face image;

故而，由调整单元14在视频通话过程中调整人脸图像的位置，以使人眼与摄像头焦点在一条直线上。由于侧脸不存在眼神交流的问题，因此具体指正脸图像中眼睛部分与摄像头的焦点在同一直线上。为了确定人脸图像中的人眼部分是否与摄像头的焦点处于同一直线上，本发明实施例优选采用计算人眼虹膜图像的清晰度，当清晰度达到设定的阈值时，判定为人眼部分与摄像头焦点处于同一直线，否则不处于同一直线。当判定为处于同一直线时，该方位的人脸图像即为纠正后的三维人脸图像。Therefore, the adjustment unit 14 adjusts the position of the face image during the video call, so that the focus of the human eye and the camera is on a straight line. Since there is no eye contact problem in side faces, it specifically means that the eyes in the frontal face image are on the same straight line as the focus of the camera. In order to determine whether the human eye part in the face image is on the same line as the focal point of the camera, the embodiment of the present invention preferably uses the calculation of the sharpness of the human iris image, and when the sharpness reaches the set threshold, it is determined that the human eye part is in line with The camera focus is on the same straight line, otherwise it is not on the same straight line. When it is determined to be on the same straight line, the face image in this orientation is the corrected 3D face image.

传输单元15用于传输纠正后的三维人脸图像。The transmission unit 15 is used for transmitting the corrected 3D face image.

对视频通话过程中的视频图像进行上述处理后，传输单元15将得到的纠正后的三维人脸图像进行编码传输，即编码传输包括三维人脸图像的视频。其中，所谓视频编码是指通过特定的压缩技术，将某种视频格式的文件转换为另一种视频格式的文件，通过压缩编码去除视频图像数据中的冗余信息。H264编码技术由于其性能更高，故而目前使用最广泛。H264编码技术对原始图像在4*4的倍数大小的宏块上进行帧内编码和帧间编码，帧内编码消除帧内空间冗余，帧间编码对视频数据的运动补偿进行预测。将编码后的包括纠正后的三维人脸图像通过视频通话过程中建立的通信隧道进行传输，从而使得视频通话双方可以看到彼此眼睛对着摄像头的视频，实现通话过程中的眼神交流。After performing the above processing on the video image during the video call, the transmission unit 15 encodes and transmits the obtained corrected 3D face image, that is, encodes and transmits the video including the 3D face image. Among them, the so-called video coding refers to converting a file of a certain video format into a file of another video format through a specific compression technology, and removing redundant information in video image data through compression coding. H264 encoding technology is currently the most widely used due to its higher performance. The H264 coding technology performs intra-frame coding and inter-frame coding on the original image on a macroblock with a multiple size of 4*4. Intra-frame coding eliminates intra-frame space redundancy, and inter-frame coding predicts motion compensation of video data. The encoded three-dimensional face image including correction is transmitted through the communication tunnel established during the video call, so that both parties in the video call can see the video of each other's eyes facing the camera, and realize eye contact during the call.

A1、一种视频通话过程中的人脸纠正方法，其特征在于，包括以下步骤：A1, a face correction method in a video call process, is characterized in that, comprises the following steps:

对所述采集的视频图像进行人脸检测以获取多角度人脸图像；Perform face detection on the collected video images to obtain multi-angle face images;

传输纠正后的三维人脸图像。Transmit the rectified 3D face image.

A2、根据A1所述的方法，其特征在于，还包括采用Adaboost算法训练分类器以进行人脸检测。A2, the method according to A1, is characterized in that, also comprises adopting Adaboost algorithm training classifier to carry out face detection.

A3、根据A2所述的方法，其特征在于，所述采用Adaboost算法训练分类器的具体步骤如下：A3, according to the method described in A2, it is characterized in that, the concrete steps of described adopting Adaboost algorithm training classifier are as follows:

A4、根据A3所述的方法，其特征在于，采用层叠分类器进行人脸检测，按照固定步长对检测的图像进行缩放，以获取检测图像中不同大小的人脸图像。A4, according to the method described in A3, it is characterized in that, adopt stacked classifier to carry out face detection, according to fixed step size, the image of detection is scaled, to obtain the face images of different sizes in the detection image.

A5、根据A1所述的方法，其特征在于，所述对人脸进行三维建模包括如下步骤：A5, according to the method described in A1, it is characterized in that, described carrying out three-dimensional modeling to human face comprises the steps:

A6、根据A1所述的方法，其特征在于，所述纠正后的三维人脸图像具体指正脸图像中眼睛部分与摄像头的焦点在同一直线上。A6. The method according to A1, wherein the corrected three-dimensional face image specifically refers to that the eyes in the frontal face image are on the same straight line as the focus of the camera.

A7、根据A6所述的方法，其特征在于，通过判断采集的三维人脸图像的人眼虹膜图像的清晰度是否达到设定的阈值，确定眼睛部分是否与摄像头的焦点处于同一直线上。A7, according to the method described in A6, it is characterized in that, by judging whether the sharpness of the iris image of the human eye of the three-dimensional face image collected reaches a set threshold, it is determined whether the eye part is on the same line as the focal point of the camera.

A8、根据A1所述的方法，其特征在于，所述多角度人脸图像包括正面人脸图像、左侧人脸图像、右侧人脸图像，其中所述左侧或右侧人脸图像偏转角度不超过45度。A8. The method according to A1, wherein the multi-angle face image includes a frontal face image, a left face image, and a right face image, wherein the left or right face image is deflected The angle does not exceed 45 degrees.

A9、根据A1所述的方法，其特征在于，仅对采集的视频流中的单数帧人脸图像进行纠正，以保证视频通话的流畅。A9, according to the method described in A1, it is characterized in that only the odd-numbered frames of face images in the collected video stream are corrected to ensure the smoothness of the video call.

A10、根据A1所述的方法，其特征在于，对采集的视频流中的人脸图像进行逐帧纠正，以保证视频通话的画面自然。A10, according to the method described in A1, it is characterized in that, the face image in the collected video stream is corrected frame by frame to ensure that the picture of the video call is natural.

B1、一种视频通话过程中的人脸纠正装置，其特征在于，包括：B1, a face correction device during a video call, characterized in that it comprises:

采集单元：用于响应于视频通话而启动摄像头采集视频图像；Acquisition unit: used to start the camera to collect video images in response to the video call;

B2、根据B1所述的装置，其特征在于，还包括分类器训练单元，用于采用Adaboost算法训练分类器进行人脸检测以获取多角度人脸图像。B2, according to the described device of B1, it is characterized in that, also comprise classifier training unit, for adopting Adaboost algorithm training classifier to carry out face detection to obtain multi-angle face image.

B3、根据B2所述的方法，其特征在于，所述分类器训练单元包括：B3, according to the method described in B2, it is characterized in that, described classifier training unit comprises:

B4、根据B3所述的方法，其特征在于，采用层叠分类器进行人脸检测，按照固定步长对检测的图像进行缩放，以获取检测图像中不同大小的人脸图像。B4, according to the method described in B3, it is characterized in that, adopt cascade classifier to carry out face detection, according to fixed step size, the image of detection is scaled, to obtain the face images of different sizes in the detection image.

B5、根据B1所述的装置，其特征在于，所述建模单元包括：B5, according to the device described in B1, it is characterized in that, described modeling unit comprises:

B6、根据B1所述的装置，其特征在于，所述纠正后的三维人脸图像具体指正脸图像中眼睛部分与摄像头的焦点在同一直线上。B6. The device according to B1, wherein the corrected three-dimensional face image specifically refers to that the eyes in the frontal face image are on the same straight line as the focus of the camera.

B7、根据B6所述的装置，其特征在于，还包括判断单元，用于判断采集的三维人脸图像的人眼虹膜图像的清晰度是否达到设定的阈值，以确定眼睛部分是否与摄像头的焦点处于同一直线上。B7, according to the described device of B6, it is characterized in that, also comprises judging unit, is used for judging whether the sharpness of the iris image of the human eye of the three-dimensional face image of collection reaches the threshold value of setting, to determine whether the eye part is consistent with the camera's The focus is on the same line.

B8、根据B1所述的装置，其特征在于，所述多角度人脸图像包括正面人脸图像、左侧人脸图像、右侧人脸图像，其中所述左侧或右侧人脸图像角度不超过45度。B8, according to the device described in B1, it is characterized in that, described multi-angle human face image comprises frontal human face image, left side human face image, right side human face image, wherein said left side or right side human face image angle No more than 45 degrees.

B9、根据B1所述的装置，其特征在于，仅对采集的视频流中的单数帧图像进行纠正，以保证视频通话的流畅。B9. The device according to B1, characterized in that only odd-numbered frame images in the collected video stream are corrected to ensure the smoothness of the video call.

B10、根据B1所述的装置，其特征在于，对采集的视频流中的人脸图像进行逐帧纠正，以保证视频通话的画面自然。B10, according to the device described in B1, it is characterized in that, the facial image in the collected video stream is corrected frame by frame, to ensure that the picture of the video call is natural.

综上所述，本发明所述方法通过对视频通话中的人脸进行三维建模，获取三维人脸图像，并基于在通话过程中根据人脸的位置变化变换三维人脸图像，使得人脸图像中的眼睛部分与摄像头的焦点处于一条直线，从而规避手机、笔记本等设备将摄像头设计在屏幕顶端的缺陷，实现视频通话过程中的人眼交流。In summary, the method of the present invention obtains a three-dimensional face image by performing three-dimensional modeling on the face in the video call, and transforms the three-dimensional face image based on the position change of the face during the call, so that the face The eye part in the image is in a straight line with the focal point of the camera, so as to avoid the defect that the camera is designed on the top of the screen in mobile phones, notebooks and other devices, and realize human-eye communication during video calls.

以上所述仅是本发明的部分实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本发明的保护范围。The above descriptions are only part of the embodiments of the present invention. It should be pointed out that those skilled in the art can make some improvements and modifications without departing from the principles of the present invention. It should be regarded as the protection scope of the present invention.