Background
Face detection refers to the process of determining the position, size, and pose of all faces present in an input image, and is a key technology in face information processing. A face image contains abundant pattern features, including color features (skin color, hair color, etc.), contour features, histogram features, mosaic features, structural features, transform-domain features, template features, heuristic features, and the like. Which of these pattern features are most useful, and how to utilize them, are key issues in face detection research. Because face patterns exhibit complex and subtle variations, multiple pattern features generally need to be combined, for example by simple combination, statistical inference, fuzzy decision, or machine learning. In summary, according to the color attribute of the pattern features used, face detection methods can be classified into two types: methods based on skin color features and methods based on gray-scale features. Methods based on skin color features are suitable for constructing fast face detection and face tracking algorithms; methods based on gray-scale features exploit more essential characteristics that distinguish faces from other objects, and are the focus of research in the face detection field. According to the model employed to combine the pattern features, gray-scale-feature-based methods can be further divided into two broad categories: methods based on heuristic (knowledge) models and methods based on statistical models.
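As an illustration of the skin-color-based approach mentioned above, the following sketch classifies pixels with a classic RGB skin rule of thumb and returns a face bounding box. It is not part of the invention; the thresholds are commonly cited heuristic values assumed here purely for demonstration, and a real detector would also analyze connected components rather than taking a global bounding box.

```python
import numpy as np

def skin_mask(img):
    """Classify pixels as skin using a classic RGB rule of thumb.

    img: H x W x 3 uint8 array in RGB order. Returns a boolean H x W mask.
    """
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    spread = img.max(axis=-1).astype(int) - img.min(axis=-1).astype(int)
    return ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))

def detect_face_box(img):
    """Return the bounding box (top, left, bottom, right) of all skin
    pixels -- adequate for a picture containing a single face."""
    ys, xs = np.nonzero(skin_mask(img))
    if len(ys) == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

# Synthetic test frame: gray background with a skin-colored rectangle.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
frame[:] = (60, 60, 60)                  # background
frame[30:90, 50:110] = (200, 140, 110)   # "face" region
print(detect_face_box(frame))            # -> (30, 50, 89, 109)
```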
Face tracking, generally built on face detection, tracks the position of a moving face through a video sequence. Face tracking techniques fall into motion-based methods and model-based methods. Motion-based methods employ techniques such as motion segmentation, optical flow, and stereoscopic vision, and use spatio-temporal gradients, Kalman filters, and the like to follow the motion of the face. Model-based methods first acquire prior knowledge of the target and construct a target model, then perform model matching on each input frame through a sliding window. In practice, the two approaches are often used in combination.
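The Kalman filter mentioned above can be illustrated with a minimal constant-velocity sketch that tracks a face center (x, y, vx, vy) from noisy per-frame detections. All matrices and noise values below are assumptions chosen for demonstration, not taken from the patent text.

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only position is measured
Q = np.eye(4) * 1e-2                        # process noise (assumed)
R = np.eye(2) * 1.0                         # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle: state x, covariance P, measurement z."""
    x = F @ x                               # predict state
    P = F @ P @ F.T + Q                     # predict covariance
    y = z - H @ x                           # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ y                           # corrected state
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Track a face center moving right at ~2 px/frame with noisy detections.
rng = np.random.default_rng(0)
x, P = np.array([10.0, 50.0, 0.0, 0.0]), np.eye(4)
for t in range(1, 30):
    z = np.array([10.0 + 2.0 * t, 50.0]) + rng.normal(0, 1.0, 2)
    x, P = kalman_step(x, P, z)
print(x[:3])   # position estimate near (68, 50), velocity near 2
```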
Face synthesis refers to the process of generating face images in other poses from a known face image in a given pose; it is an image synthesis problem. A face image synthesis system is based on mathematical modeling and uses a mathematical model to realize deformation, transition, and aging rendering of images. Face synthesis techniques can be applied to the face detection and face tracking techniques described above.
Existing face synthesis technology is mainly applied as follows: a photo or video sequence containing a face image is taken as input, and after processing, a transformed virtual face picture (for example, an aged appearance or a childhood appearance) or a cartoon picture (a face caricature) is output. However, face synthesis technology has not been directly combined with video communication.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for transmitting a face synthesis video, which give users an opportunity to modify their own video.
To achieve this purpose, the technical scheme of the present invention is realized as follows:
An apparatus for face synthesis video transmission comprises a face synthesis unit, a video segmentation unit, and a video communication unit. The video segmentation unit is used for performing segmentation processing on combined face position, texture information, and background information to obtain unmodified, separated face position, texture information, and background information. The face synthesis unit is used for processing the unmodified face position and texture information according to an established mathematical model of the face to obtain modified face position and texture information. The video communication unit is used for sending the modified face position, texture information, and background information outwards.
The apparatus further comprises a preprocessing unit and a face modeling unit. The preprocessing unit is used for determining the feature pattern desired by the user. The face modeling unit is used for calculating image change schemes for the user's face at various acquisition angles according to the feature pattern provided by the preprocessing unit, establishing a mathematical model of the face, and providing it to the face synthesis unit.
The apparatus further comprises a video synthesis unit, which is used for synthesizing the modified face position and texture information with the background information and providing the result to the video communication unit.
The apparatus further comprises a face detection and tracking unit, which is used for detecting and tracking the facial features of the face in the video and their changes to obtain combined information of face position, texture information, and background information, and supplying this combined information to the video segmentation unit.
A method for transmitting a face synthesis video comprises the following steps:
A. detecting and tracking the facial features of a face in a video and their changes to obtain unmodified face position and texture information, and processing the unmodified face position and texture information according to an established mathematical model of the face to obtain modified face position and texture information;
B. transmitting the modified face position and texture information together with the background information.
Step A comprises the following steps:
A1. calculating image change schemes for the user's face at various acquisition angles according to the determined feature pattern, and establishing a mathematical model of the face;
A2. detecting and tracking the facial features of the face and their changes to obtain the unmodified face position and texture information;
A3. processing the unmodified face position and texture information according to the mathematical model of the face to obtain the modified face position and texture information.
Step A2 comprises: detecting and tracking the facial features of the face and their changes to obtain combined information of face position, texture information, and background information, and performing segmentation processing on this combined information to obtain unmodified, separated face position, texture information, and background information.
Step B specifically comprises: transmitting the modified face position and texture information separately from the background information; or synthesizing the modified face position and texture information with the background information and then transmitting the result.
Step A2 comprises: extracting features from an input video sequence; if the current frame is the first frame, or no face was detected in the frames preceding the current frame, performing a face detection operation on the current frame; and if a face was detected in the frames preceding the current frame, performing a face tracking operation on the current frame.
Before the step of detecting the facial features of the face in the video and their changes, the method further comprises: setting agreed conditions. The step of detecting the facial features of the face in the video and their changes then comprises: performing the face detection operation according to the agreed conditions.
In the present invention, the unmodified face position and texture information are processed according to the feature pattern determined to be desired by the user to obtain modified face position and texture information, and the resulting video information is then transmitted. This gives the user an opportunity for self-modification, better satisfies the self-presentation desire of particular users, and yields higher user satisfaction and thus a better user experience; it also brings value-added potential to services that use the corresponding video communication.
Detailed Description
With the popularization of instant messaging technology and the continuous improvement of network bandwidth, more and more relatives and friends are using cameras for video chat to enhance interactivity. However, current video communication systems simply feed the video captured by the camera into a video encoder without any change, encode it, and transmit it to the receiving end. Yet few users are completely satisfied with their own appearance; if users could be given an opportunity to modify their own video, their satisfaction should improve. The present invention therefore provides a video communication system combined with face synthesis technology, giving users a self-modification opportunity and thus a better user experience.
In the present invention, the unmodified face position and texture information are processed according to the feature pattern determined to be desired by the user to obtain modified face position and texture information, and the resulting video information is then transmitted.
Fig. 1 is a schematic structural diagram of an apparatus for implementing face synthesis video transmission in the present invention. As shown in Fig. 1, the apparatus comprises a preprocessing unit, a face modeling unit, a face synthesis unit, a face detection and tracking unit, a video segmentation unit, a video synthesis unit, and a video communication unit.
The preprocessing unit determines the feature pattern desired by the user and provides it to the face modeling unit. One implementation is as follows: the preprocessing unit collects pictures and/or video sequences desired by the user together with pictures and/or video sequences of the user, establishes a correspondence between them, namely the feature pattern desired by the user, and provides this feature pattern to the face modeling unit. Before video communication, the user provides to the preprocessing unit a picture or video sequence containing certain pattern features that the user wishes to show to the far end during video communication; these pattern features may include face shape, skin color, the distribution of facial features, and the like. Another implementation is for the user to set the desired feature pattern directly in the preprocessing unit. A pattern feature is a specific feature possessed by a picture or video sequence, and a specific feature pattern is composed of one or more pattern features.
The face modeling unit calculates image change schemes for the user's face at various acquisition angles according to the feature pattern provided by the preprocessing unit, establishes a mathematical model of the face, and provides it to the face synthesis unit. The mathematical model of the face can be established by any known face modeling method.
The face detection and tracking unit detects and tracks the facial features of the face in the video and their changes to obtain combined information of face position, texture information, and background information; a specific implementation is shown in Fig. 2. First, the face detection and tracking unit extracts features from the input video sequence; the extracted features include skin color, contour, histogram, motion vectors, and the like, and these feature data are stored in a database of model parameters. If the current frame is the first frame, or no face was detected in the frames preceding the current frame, a face detection operation is performed on the current frame. A fast face detection algorithm, such as one based on a skin color model, may be used here. When performing face detection on the current frame, certain default conditions may be set, for example detecting only the largest face or the face in the most prominent position in the picture, or requiring that the face size be no smaller than a set value. If a face was detected in the frames preceding the current frame, a face tracking operation is performed on the current frame. Face tracking is carried out by jointly analyzing the feature data provided by the model parameter database and the picture content of the current frame. Finally, the obtained combined information of face position, texture information, and background information is provided to the video segmentation unit.
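The per-frame decision described above (detect on the first frame or after a miss, otherwise track) can be sketched as follows. The function and variable names are hypothetical; detection and tracking themselves are abstracted away.

```python
def choose_operation(frame_index, face_found_in_preceding):
    """Decide whether to run detection or tracking on the current frame.

    Detection runs on the first frame, or whenever no face was found in
    the preceding frames; otherwise tracking is used.
    """
    if frame_index == 0 or not face_found_in_preceding:
        return "detect"
    return "track"

# Short sequence: detection repeats until a face is found, then tracking.
found = False
ops = []
for i, detected_this_frame in enumerate([False, False, True, True, True]):
    ops.append(choose_operation(i, found))
    if detected_this_frame:
        found = True
print(ops)   # ['detect', 'detect', 'detect', 'track', 'track']
```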
The video segmentation unit segments the combined information of face position, texture information, and background information provided by the face detection and tracking unit, then provides the unmodified face position and texture information to the face synthesis unit and the background information to the video synthesis unit. Because the face detection and tracking unit provides the face position, texture information, and background information in the form of a complete video, these must be separated by the video segmentation unit.
The face synthesis unit processes the unmodified face position and texture information provided by the video segmentation unit according to the mathematical model of the face provided by the face modeling unit to obtain the modified face position and texture information, and provides them to the video synthesis unit. Using the image change schemes and any known face synthesis technique, the face synthesis unit converts the original facial features into the facial features desired by the user, based on the face position and texture information.
The video synthesis unit synthesizes the modified face position and texture information provided by the face synthesis unit with the background information provided by the video segmentation unit, and provides the result to the video communication unit.
The video communication unit is used for sending out the video information provided by the video synthesis unit.
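The dataflow among the units described above can be sketched as follows. The unit names mirror the description; the bodies are stubs standing in for the real algorithms, and all data values are illustrative assumptions.

```python
def video_segmentation(frame_info):
    """Split combined frame info into face (position + texture) and background."""
    return frame_info["face"], frame_info["background"]

def face_synthesis(face, face_model):
    """Apply the face model's image change scheme to the unmodified face."""
    return face_model(face)

def video_synthesis(face, background):
    """Recombine the modified face with the untouched background."""
    return {"face": face, "background": background}

def video_communication(payload):
    """Stand-in for encoding and sending the result to the far end."""
    return ("sent", payload)

# Wire the units together for one frame.
face_model = lambda face: {**face, "texture": "modified-" + face["texture"]}
frame_info = {"face": {"pos": (30, 50), "texture": "orig"},
              "background": "room"}
face, bg = video_segmentation(frame_info)
face = face_synthesis(face, face_model)
status, payload = video_communication(video_synthesis(face, bg))
print(status, payload["face"]["texture"])   # sent modified-orig
```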
In the above apparatus, the preprocessing unit, the face modeling unit, and the face detection and tracking unit serve as front-end processing and can be omitted from the main structure; that is, the main structure of the apparatus for implementing face synthesis video transmission comprises the face synthesis unit, the video segmentation unit, and the video communication unit.
In addition, the video synthesis unit can also be omitted; in this case, the video segmentation unit provides the background information directly to the video communication unit, and the face synthesis unit provides the modified face position and texture information directly to the video communication unit. The video communication unit can select any video coding algorithm to transmit the resulting video information, which may be either the modified face position and texture information synthesized with the background information, or the modified face position and texture information kept separate from the background information.
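The two transmission options just described can be sketched as a trivial packaging choice; the field names are hypothetical and the actual encoding is omitted.

```python
def package_for_transmission(face, background, combine):
    """Either synthesize the modified face with the background into one
    frame, or keep them as separate streams for the receiver to recombine."""
    if combine:
        return {"type": "combined", "frame": {"face": face, "bg": background}}
    return {"type": "separate", "face": face, "bg": background}

print(package_for_transmission("face-data", "bg-data", combine=True)["type"])
print(package_for_transmission("face-data", "bg-data", combine=False)["type"])
```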
Fig. 3 is a flow chart of implementing transmission of a face synthesis video in the present invention. As shown in Fig. 3, the processing flow comprises the following steps:
step 301: a user desired characteristic pattern is determined. Before the user performs video communication, a picture or a video sequence containing certain mode characteristics, which are expected to be shown to the opposite terminal in the video communication, is set, wherein the mode characteristics can comprise face shape, skin color, five sense organs distribution and the like. The user may also preset a desired feature pattern. The mode feature is a specific feature possessed by a picture or a video sequence, and a specific feature mode is composed of one or more mode features. The feature pattern refers to a specific pattern composed of related features desired by the user.
Step 302: image change schemes for the user's face at various acquisition angles are calculated according to the determined feature pattern, and a mathematical model of the face is established. The mathematical model of the face can be established by any known face modeling method.
Step 303: the facial features of the face in the video and their changes are detected and tracked to obtain combined information of face position, texture information, and background information; a specific implementation is shown in Fig. 2. First, features are extracted from the input video sequence; the extracted features include skin color, contour, histogram, motion vectors, and the like, and these feature data are stored in a database of model parameters. If the current frame is the first frame, or no face was detected in the frames preceding the current frame, a face detection operation is performed on the current frame. A fast face detection algorithm, such as one based on a skin color model, may be used here. When performing face detection on the current frame, certain default conditions may be set, for example detecting only the largest face or the face in the most prominent position in the picture, or requiring that the face size be no smaller than a set value. If a face was detected in the frames preceding the current frame, a face tracking operation is performed on the current frame. Face tracking is carried out by jointly analyzing the feature data provided by the model parameter database and the picture content of the current frame. Finally, the combined information of face position, texture information, and background information is obtained.
Step 304: the combined information of face position, texture information, and background information is segmented to obtain unmodified, separated face position, texture information, and background information. Since the face position, texture information, and background information appear in the form of complete video information, they must be separated by segmentation.
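A minimal sketch of this segmentation step, assuming a hard bounding box from the detection stage: the face patch is cut out, and the corresponding region of the background is zeroed. A real system would use a soft matte along the face boundary instead of a hard cut.

```python
import numpy as np

def segment(frame, box):
    """Split a frame into face and background given a face bounding box.

    frame: H x W x 3 array; box: (top, left, bottom, right), inclusive.
    Returns (face_patch, background) with the face region zeroed in the
    background copy.
    """
    t, l, b, r = box
    face = frame[t:b + 1, l:r + 1].copy()
    background = frame.copy()
    background[t:b + 1, l:r + 1] = 0
    return face, background

frame = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
face, background = segment(frame, (1, 2, 2, 4))
print(face.shape)                    # (2, 3, 3)
print(int(background[1, 2].sum()))   # 0: face pixels removed from background
```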
Steps 305 to 306: the unmodified face position and texture information are processed according to the mathematical model of the face to obtain the modified face position and texture information. Using the image change schemes and any known face synthesis technique, the original facial features are converted into the facial features desired by the user, based on the face position and texture information.
For example, if the feature pattern desired by the user is the face shape of a film star, a mathematical model of the face is created based on that feature pattern. After the unmodified face position and texture information are obtained, the user's face is modified according to the star's face shape so as to match it, yielding the modified face position and texture information.
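One simple way to realize such a shape modification, assumed here for illustration, is to interpolate the user's facial landmark positions toward the target shape; the patent leaves the concrete synthesis technique open, so this linear blend is only a sketch.

```python
import numpy as np

def modify_landmarks(user_pts, target_pts, alpha=0.6):
    """Move the user's facial landmarks part of the way toward a target
    (e.g. a star's) face shape. alpha is the modification strength:
    0 keeps the user's shape, 1 copies the target shape outright.
    Both inputs are N x 2 arrays of (x, y) landmark positions."""
    user_pts = np.asarray(user_pts, dtype=float)
    target_pts = np.asarray(target_pts, dtype=float)
    return (1 - alpha) * user_pts + alpha * target_pts

user = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])   # wide jaw
star = np.array([[2.0, 0.0], [8.0, 0.0], [5.0, 10.0]])   # narrower, longer
blended = modify_landmarks(user, star, alpha=0.5)
print(blended.tolist())   # [[1.0, 0.0], [9.0, 0.0], [5.0, 9.0]]
```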
Steps 307 to 308: the modified face position and texture information are synthesized with the background information, and the resulting video information is transmitted.
Alternatively, the modified face position and texture information may be transmitted separately from the background information, without synthesizing them. Any video coding algorithm can be selected for transmitting the video information.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.