Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a rendering method for an augmented reality special effect, a rendering device for an augmented reality special effect, a computer-readable medium, and a terminal device, which can improve the accuracy and efficiency of augmented reality rendering.
A first aspect of the embodiments of the present disclosure provides a rendering method for an augmented reality special effect, including:
acquiring a live broadcast video stream of an anchor, identifying and obtaining human body feature data of the anchor according to the live broadcast video stream, and extracting video stream picture parameters;
obtaining triggering scene data based on the human body feature data and special effect data generated when the augmented reality special effect is triggered, wherein the special effect data at least comprises a special effect type and special effect triggering time;
predicting the rendering position of the augmented reality special effect in the live video stream according to the trigger scene data and the human body feature data;
and rendering the augmented reality special effect on the display interface according to the human body feature data, the video stream picture parameters and the rendering position.
In an exemplary embodiment of the present disclosure, predicting a rendering position of the augmented reality special effect in the live video stream according to the trigger scene data and the human body feature data includes:
and inputting the trigger scene data and the human body feature data into a preset algorithm to obtain a predicted rendering position weight of the augmented reality special effect, and determining a rendering position in the live video stream according to the weight.
In an exemplary embodiment of the present disclosure, after the step of inputting the trigger scene data and the human body feature data into a preset algorithm to obtain a predicted rendering position weight of the augmented reality special effect, the method further includes:
and adjusting the weight value of the predicted rendering position according to the special effect type.
In an exemplary embodiment of the present disclosure, acquiring a live broadcast video stream of an anchor, identifying and obtaining human body feature data of the anchor according to the live broadcast video stream, and extracting video stream picture parameters at the same time includes:
and performing face recognition and human body recognition on the live video stream picture to obtain human body feature data including face data and mask data, and reading the live video stream to obtain video stream picture parameters.
In an exemplary embodiment of the present disclosure, inputting the trigger scene data and the human body feature data into a preset algorithm to obtain a predicted rendering position weight of the augmented reality special effect, includes:
inputting the trigger scene data and the human body feature data generated when the augmented reality special effect is triggered into an evaluation function, and calculating a corresponding predicted rendering position weight;
and updating the predicted rendering position weight through a reinforcement learning algorithm, and outputting the updated predicted rendering position weight.
In an exemplary embodiment of the present disclosure, after the step of rendering an augmented reality special effect on the display interface according to the human body feature data, the video stream picture parameter, and the rendering position, the method includes:
taking the human body feature data, the video stream picture parameters and the rendering position as special effect rendering data, taking the special effect rendering data as a data frame, and writing the data frame, the audio frame and the video frame in the live video stream into a streaming media file;
and sending the streaming media file to a server, and playing a live video and rendering an augmented reality special effect after the audience end receives the streaming media file through the server.
In an exemplary embodiment of the present disclosure, the audience end, after receiving the streaming media file through the server, plays a live video and renders an augmented reality special effect, including:
decoding the streaming media file to obtain an audio frame, a video frame and a data frame;
acquiring an anchor live video picture according to the video frame, and identifying the anchor live video picture to obtain live scene data;
predicting a rendering position of the augmented reality special effect on the audience end based on the live scene data and the special effect type, wherein the special effect type is extracted from the data frame;
and playing the live video on the display interface according to the audio frame and the video frame, and rendering the augmented reality special effect according to the rendering position weight.
In an exemplary embodiment of the present disclosure, after the step of playing a live video on the display interface according to the audio frame and the video frame and rendering the augmented reality special effect at the rendering position on the audience end, the method further includes:
and generating a new view and a projection matrix according to the playing setting parameters of the audience end and the video stream picture parameters, and correcting the special effect rendering position by applying the new view and the projection matrix, wherein the video stream picture parameters are extracted from the data frame.
According to a second aspect of the embodiments of the present disclosure, there is provided an augmented reality special effect rendering apparatus, including:
the video analysis module is configured to acquire a live broadcast video stream of an anchor, identify and obtain human body feature data of the anchor according to the live broadcast video stream, and extract video stream picture parameters;
the data acquisition module is configured to obtain trigger scene data based on the human body feature data and special effect data generated when the augmented reality special effect is triggered, wherein the special effect data at least comprises a special effect type and a special effect trigger time;
a rendering position prediction module, configured to predict a rendering position of the augmented reality special effect in the live video stream according to the trigger scene data and the human body feature data;
and the interface rendering module is used for rendering the augmented reality special effect on the display interface according to the human body characteristic data, the video stream picture parameters and the rendering position.
In an exemplary embodiment of the present disclosure, the rendering position prediction module is configured to input the trigger scene data and the human body feature data into a preset algorithm to obtain a predicted rendering position weight of the augmented reality special effect, and determine a rendering position in the live video stream according to the weight.
In an exemplary embodiment of the present disclosure, the apparatus for rendering an augmented reality special effect further includes: a weight adjusting module, configured to input the trigger scene data and the human body feature data into a preset algorithm and, after obtaining the predicted rendering position weight of the augmented reality special effect, adjust the predicted rendering position weight according to the special effect type.
In an exemplary embodiment of the present disclosure, the video analysis module is configured to perform face recognition and human body recognition on a live video stream picture, obtain human body feature data including face data and mask data, and read the live video stream to obtain video stream picture parameters.
In an exemplary embodiment of the disclosure, the data acquisition module is configured to obtain the human body feature data and the picture parameters by analyzing the video stream data, which includes: performing face recognition and human body recognition on the picture of the video stream to obtain human body feature data including face data and mask data; and reading the live video stream to obtain video stream picture parameters.
In an exemplary embodiment of the present disclosure, the rendering position prediction module is configured to input the trigger scene data and the human body feature data generated when the augmented reality special effect is triggered into an evaluation function, and calculate a corresponding predicted rendering position weight;
and updating the predicted rendering position weight through a reinforcement learning algorithm, and outputting the updated predicted rendering position weight.
In an exemplary embodiment of the present disclosure, the apparatus for rendering an augmented reality special effect further includes:
a streaming media generation module, configured to, after the augmented reality special effect is rendered on the display interface according to the human body feature data, the video stream picture parameters, and the rendering position, take the human body feature data, the video stream picture parameters, and the rendering position as special effect rendering data, take the special effect rendering data as a data frame, and write the data frame, the audio frame, and the video frame in the live video stream into a streaming media file;
and sending the streaming media file to a server, and playing a live video and rendering an augmented reality special effect after the audience end receives the streaming media file through the server.
In an exemplary embodiment of the present disclosure, the interface rendering module is further configured to decode the streaming media file, and obtain an audio frame, a video frame, and a data frame;
acquiring an anchor live video picture according to the video frame, and identifying the anchor live video picture to obtain live scene data;
predicting a rendering position of the augmented reality special effect on the audience end based on the live scene data and the special effect type, wherein the special effect type is extracted from the data frame;
and playing a live video on the display interface according to the audio frame and the video frame, and rendering the augmented reality special effect at the rendering position of the audience.
In an exemplary embodiment of the present disclosure, the apparatus for rendering an augmented reality special effect further includes: a rendering correction module, configured to, after playing a live video on the display interface according to the audio frame and the video frame and rendering the augmented reality special effect according to the rendering position weight, generate a new view and a projection matrix according to the playing setting parameters of the audience end and the video stream picture parameters in the special effect rendering data of the data frame, and correct the special effect rendering position by applying the new view and the projection matrix, wherein the video stream picture parameters are extracted from the data frame.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements the method for rendering an augmented reality effect as described in the first aspect of the embodiments above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a terminal device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement the method for rendering an augmented reality special effect as described in the first aspect of the embodiments above.
According to a fifth aspect of the present disclosure, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the technical solutions provided by some embodiments of the disclosure, after the anchor live video stream is acquired, the human body feature data and the video stream picture parameters can be obtained from the live video stream. Trigger scene data is then acquired, and the rendering position of the augmented reality special effect in the live video stream is predicted according to the trigger scene data and the human body feature data. Finally, the augmented reality special effect is rendered on the display interface according to the human body feature data, the video stream picture parameters and the rendering position. By implementing the embodiments of the disclosure, on one hand, the special effect rendering position is predicted in advance by analyzing the anchor's human body features and the trigger scene of the augmented reality special effect, so that the high delay caused by first determining the rendering position in the current video picture through visual recognition and only then rendering can be avoided, and the efficiency of augmented reality rendering is improved. On the other hand, the rendering position determined based on the anchor's human body features and the trigger scene is more accurate, so that the display effect of augmented reality is optimized and the display quality is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a rendering method for an augmented reality special effect and a rendering apparatus for an augmented reality special effect according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include one or more of terminal devices 101, 102, a network 103, and a server 104. The network 103 serves as a medium for providing communication links between the terminal devices 101, 102 and the server 104. Network 103 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 104 may be a server cluster comprised of multiple servers, or the like. The terminal device is used for executing: acquiring a live broadcast video stream of an anchor, and obtaining human body feature data and video stream picture parameters of the anchor according to the live broadcast video stream; obtaining trigger scene data based on the human body feature data and special effect data generated when the augmented reality special effect is triggered, wherein the special effect data at least comprises a special effect type and a special effect trigger time; predicting the rendering position of the augmented reality special effect in the live video stream according to the trigger scene data and the human body feature data; and rendering the augmented reality special effect on the display interface according to the human body feature data, the video stream picture parameters and the rendering position.
FIG. 2 illustrates a computer system architecture diagram of a terminal device suitable for use in implementing embodiments of the present disclosure.
It should be noted that the terminal device 200 shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 2, the terminal device 200 includes a Central Processing Unit (CPU) 201 that can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other by a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, or the like. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as necessary. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 210 as necessary, so that a computer program read out therefrom is installed into the storage section 208 as necessary.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by the Central Processing Unit (CPU) 201, performs various functions defined in the methods and apparatus of the present disclosure.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the terminal device described in the above embodiments; or may exist separately without being assembled into the terminal device. The computer-readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to implement the method as described in the embodiments below. For example, the terminal device may implement the steps shown in fig. 3, and the like.
To facilitate an understanding of the embodiments of the present disclosure, the following terms used in the present disclosure are first explained:
augmented reality: Augmented Reality (AR) promotes the integration of real-world information and virtual-world information content. Entity information that is difficult to experience within the space of the real world is simulated and processed on the basis of computers and other scientific technologies, and the virtual information content is superimposed on the real world, so that after the real environment and the virtual object overlap, they can exist in the same picture and space at the same time and can be perceived by human senses, thereby achieving a sensory experience beyond reality.
Referring to fig. 3, fig. 3 schematically illustrates a flowchart of a rendering method of an augmented reality effect according to an embodiment of the present disclosure. As shown in fig. 3, the method for rendering an augmented reality special effect is applied to a terminal device having a display interface, and may include:
step S310: acquiring a live broadcast video stream of a main broadcast, and acquiring human body characteristic data and video stream picture parameters of the main broadcast according to the live broadcast video stream.
Step S320: and obtaining triggering scene data based on the human body characteristic data and special effect data generated when the augmented reality special effect is triggered, wherein the special effect data at least comprises a special effect type and special effect triggering time.
Step S330: and predicting the rendering position of the augmented reality special effect in the live video stream according to the trigger scene data and the human body characteristic data.
Step S340: and rendering the augmented reality special effect on the display interface according to the human body feature data, the video stream picture parameters and the rendering position.
By implementing the method for rendering the augmented reality special effect shown in fig. 3, the augmented reality special effect can be rendered on the display interface according to the human body feature data, the video stream picture parameters and the rendering position. According to the embodiments of the disclosure, on one hand, the rendering position is predicted by analyzing the anchor's human body features and the trigger scene of the augmented reality special effect, so that the high delay of determining the rendering position in the current picture through visual recognition can be avoided, and the efficiency of augmented reality rendering is improved. On the other hand, the prediction is carried out based on the anchor's human body features and the trigger scene, so that the rendering position is more accurate, the display effect of the augmented reality special effect is further optimized, and the special effect rendering quality and efficiency are improved.
The above steps of the present exemplary embodiment will be described in more detail below.
In step S310, a live broadcast video stream of an anchor is obtained, human body feature data of the anchor is obtained according to the live broadcast video stream, and video stream picture parameters are extracted at the same time.
In the embodiments of the disclosure, the augmented reality special effect is jointly displayed on a plurality of terminal devices, and the augmented reality special effect can be triggered by any one of the terminal devices. For example, in a live scene, both the anchor and the viewers can trigger an augmented reality special effect through augmented reality entries preset in the live application program on their respective terminal devices; an augmented reality entry may be set within the display interface range of the application program, or outside it. When the anchor selects an augmented reality template, the augmented reality special effect corresponding to that template can be triggered. Optionally, the anchor may select the augmented reality template in multiple ways: for example, on an intelligent mobile terminal device, by touching, clicking, double-clicking, long-pressing or otherwise operating the augmented reality template on the display interface, or by operating with a mouse on a computer terminal device; for example, the augmented reality special effect to be selected can be determined by recognizing the anchor's voice; for example, the selection operation may be completed through a physical key of the terminal device. The embodiments of the disclosure do not specifically limit how augmented reality is triggered.
The anchor live video stream is used to provide the picture of the whole live broadcast room, and the live video stream can include at least two parts: one part is the camera picture, and the other part is the picture showing other functions of the live broadcast room, such as the leaderboard area of the anchor's fans and the anchor's personal information area. The camera picture is captured by a camera owned by or connected to the anchor terminal device, and the video stream records the live pictures. The live broadcast pictures in the video stream data can be identified to obtain human body feature data. The human body feature data may include data obtained by performing motion detection, face detection and bone feature point detection on the human body, or other data for expressing human body features. The subject of the human body feature data can be not only the anchor, but also other people appearing in the live broadcast picture. The video stream picture parameters are parameters describing the anchor live video picture, and may include the picture proportion, the relative position of the camera picture in the live video picture, the relative size of the camera picture and the live video picture, the picture frame rate, and the like. The embodiments of the present disclosure do not specifically limit the human body feature data and the video stream picture parameters.
In step S320, based on the human body feature data and the special effect data generated when the augmented reality special effect is triggered, trigger scene data is obtained, where the special effect data at least includes a special effect type and a special effect trigger time.
The trigger scene data is data generated during the live broadcast and when augmented reality is triggered, and may include, for example, the real live broadcast environment in which augmented reality is triggered, the time point of the trigger, the live broadcast type, and the like. In the embodiments of the present disclosure, the trigger scene data may include the live broadcast type, the type of the augmented reality special effect, and the special effect trigger time; the data content included in the trigger scene data describes, from multiple angles, the specific situation in which the special effect was triggered. Through the live broadcast type, the type of the augmented reality special effect and the special effect trigger time, it can be described which special effect is triggered at which time point under which live broadcast type, for example, in an outdoor running live broadcast, a "first place" special effect is triggered at two in the afternoon.
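As a purely illustrative sketch (the Python structure, field names and values below are assumptions for exposition, not part of the disclosure), the trigger scene data may be grouped as follows:

```python
from dataclasses import dataclass
import time

@dataclass
class TriggerSceneData:
    live_type: str       # e.g. "outdoor_running", inferred from the human body feature data
    effect_type: str     # taken from the special effect data, e.g. "first_place_badge"
    trigger_time: float  # time point at which the special effect was triggered

scene = TriggerSceneData(live_type="outdoor_running",
                         effect_type="first_place_badge",
                         trigger_time=time.time())
```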
The live broadcast type can be divided according to different classification standards, and the live broadcast type is judged from the human body feature data obtained from the anchor live video stream. If it is judged from the human body feature data that the anchor usually sits during the live broadcast, the live broadcast can be regarded as a static live broadcast; if it is judged from the human body feature data that the anchor moves frequently, the live broadcast can be regarded as a sports live broadcast. Live broadcasts may also be categorized according to the anchor's position in the picture; for example, an anchor who usually appears on the right side of the picture can be categorized as a right-side type. The embodiments of the present disclosure do not specifically limit the live broadcast type, and different classification criteria may be set for specific cases.
Augmented reality special effects can likewise be classified into types under different systems based on their different expression modes and characteristics, for example by color, or by the object represented by the special effect, such as a flower-type augmented reality special effect, a digital-type augmented reality special effect, a vehicle-type augmented reality special effect, an aircraft-type augmented reality special effect, or augmented reality special effect types generated based on other classification criteria. The embodiments of the present disclosure do not specifically limit the type of the augmented reality special effect.
In step S330, the rendering position of the augmented reality special effect in the live video stream is predicted according to the trigger scene data and the human body feature data.
in the embodiment of the present disclosure, since the special effect of the augmented reality is often from objects actually existing in life, people have fixed cognition on the objects. For example, an airplane or a necklace, if the augmented reality special effect is not rendered to a proper position, the effect of the augmented reality cannot be achieved, and people feel incoherence. For example, rendering an airplane below a display interface, the airplane being below and common sense being violated; for example, rendering necklace effects above the display, such as above the head of the anchor, rather than in the middle of the display, such as at the neck, may also be contrary to a person's consistent knowledge of necklaces. In addition, the position of the special effect rendering should not conflict with the main body elements displayed on the display interface. For example, a special effect is rendered on the face of the anchor, and a special effect is rendered on the introduced object of the introduction class live. Therefore, the rendering position of the special effect directly affects the effect that the augmented reality can exhibit.
It can be understood that the trigger scene data is related to the scene data at the moment a special effect is triggered in the live broadcast and can describe the characteristics of the live broadcast scene, while the human body feature data is used to describe the characteristics of the anchor or other people related to the live broadcast scene. Based on the trigger scene data and the human body feature data, the factors that have an impact on the rendering position of the augmented reality special effect can be covered. For example, in a game live broadcast, it is judged from the trigger scene data that the anchor is sitting and broadcasting statically and that the special effect type is a sports car; combining the anchor's human body feature data, which shows that the anchor is positioned in the middle of the picture, the rendering position of the special effect is predicted to be below the middle of the live video picture.
In step S340, an augmented reality special effect is rendered on the display interface according to the human body feature data, the video stream picture parameter and the rendering position.
In the embodiments of the present disclosure, one or more items of the human body feature data may be adopted. The data may be face feature point data, where the number of feature points may be 21, 68, 77, or another number; the more feature points there are, the higher the accuracy. The face offset coordinates (x, y, z) can be estimated from the face pose, and the human body mask data can be only the data extracted from the contour, thereby reducing the data volume. The selection of the specific human body feature data and picture parameters is not limited in this disclosure. Furthermore, because the face feature data is obtained by identifying the anchor or other people in the camera picture, the face feature data can also be position-corrected according to the video stream picture parameters, such as the relative position of the camera picture in the live video picture and the relative size of the camera picture and the live video picture.
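The position correction described above can be illustrated with a minimal sketch; the normalized-coordinate convention and the function name are assumptions, not prescribed by the disclosure:

```python
def map_camera_point_to_live_frame(point, camera_rect):
    """Map a face feature point, given as (x, y) fractions of the camera picture,
    into pixel coordinates of the full live video picture. camera_rect is the
    camera picture's position and size inside the live picture: (x, y, w, h) in pixels."""
    px, py = point
    cx, cy, cw, ch = camera_rect
    return cx + px * cw, cy + py * ch

# e.g. a landmark at the centre of a 640x360 camera picture placed at offset (1280, 0)
# inside a 1920x1080 live picture:
print(map_camera_point_to_live_frame((0.5, 0.5), (1280, 0, 640, 360)))  # -> (1600.0, 180.0)
```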
In addition, a Uniform Resource Locator (URL) may be used to indicate a download path of the augmented reality effect, so as to obtain the effect file corresponding to the effect type.
In the embodiments of the present disclosure, as shown in fig. 4, the content of the terminal display interface is divided into regions, generally as follows: a software main window 401, which is the window range of the whole live application program, is a part of the terminal display interface and can be zoomed and adjusted in range; the displayable region of the augmented reality special effect, which is used for displaying the augmented reality special effect, is consistent with the range of the whole software main window; and a video display area 402 for displaying the live broadcast picture, which is a part of the software main window and can also be zoomed and adjusted in range. A part of the video display area 402 is a camera screen area 403. The embodiments of the present disclosure do not specifically limit how the content of the terminal display interface is divided.
In an embodiment of the present disclosure, rendering the augmented reality special effect on the display interface may be done by creating an augmented reality window at the rendering position on the display interface, where the augmented reality window may be a semi-transparent window within the window range of the live application. A rendering position is determined by prediction, and an augmented reality window is created at that position for rendering. The augmented reality special effect is rendered by an augmented reality player using the rendering data. The augmented reality player is a special player capable of playing both the video and the augmented reality special effects outside the video; with a common video player one can only see the video picture and hear the sound, but cannot see the augmented reality special effects outside the video. In addition, before the augmented reality special effect is rendered, countdown prompt information can be set on the display interface to prompt the anchor and the viewers that the augmented reality special effect will be rendered after a certain period of time.
In an embodiment of the present disclosure, an implementation manner of predicting a rendering position of the augmented reality special effect in the live video stream according to the trigger scene data and the human body feature data is further provided, and includes:
and inputting the trigger scene data and the human body feature data into a preset algorithm to obtain a predicted rendering position weight of the augmented reality special effect, and determining the rendering position in the live video stream according to the weight.
In the embodiments of the present disclosure, the preset algorithm is set in advance; the trigger scene data and the human body feature data of each triggered augmented reality special effect are input into the algorithm, so that the triggered augmented reality special effect gradually adapts to the current live broadcast scene. Every time the special effect is triggered, the trigger scene data and the human body feature data are acquired and used as input to calculate a predicted rendering position weight. The preset algorithm may come from one of three categories: selecting a behavior based on value, selecting a behavior directly (policy-based), or imagining an environment and learning from it (model-based). For example, a value-based algorithm may be selected from the Q-learning algorithm, the Sarsa algorithm, and the Deep Q Network (DQN) algorithm. The disclosed embodiments do not specifically limit the preset algorithm.
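For illustration only, a tabular value-based variant might keep a weight per candidate screen region for each discretized trigger state; the region names and state fields below are assumptions:

```python
import collections

# Candidate rendering regions of the display interface and a table of predicted
# rendering position weights, keyed by a discretized trigger state.
CANDIDATE_REGIONS = ["top", "middle", "bottom", "left", "right"]
position_weights = collections.defaultdict(lambda: {r: 0.0 for r in CANDIDATE_REGIONS})

def predict_rendering_position(live_type, effect_type, anchor_region):
    """Look up the weight of every candidate region for the current trigger state
    and pick the highest-weighted region as the predicted rendering position."""
    state = (live_type, effect_type, anchor_region)
    weights = position_weights[state]
    return max(weights, key=weights.get), weights

# e.g. a sports-car effect triggered in a static game live broadcast with the
# anchor in the middle of the picture:
region, weights = predict_rendering_position("static_game", "sports_car", "middle")
```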
Further, in an embodiment of the present disclosure, after the step of inputting the trigger scene data and the human body feature data into a preset algorithm to obtain a predicted rendering position weight of the augmented reality special effect, the method further includes:
and adjusting the weight value of the predicted rendering position according to the special effect type.
It will be appreciated that many of the effects rendered by augmented reality come from real-life objects, such as airplanes, flowers and yachts. Based on the characteristics of these real-life objects, the rendering position of the corresponding augmented reality special effect should be within a reasonable range; for example, an airplane appears above the picture and flowers appear below it. For example, a mapping relation may be established between the types of augmented reality special effects and different display interface regions of the screen, and a greater weight may be given to the region of the display interface corresponding to the type of the augmented reality special effect, so that the special effect is more likely to be displayed in the corresponding region; in this way the predicted rendering position weight can be adjusted differently according to the type of the augmented reality special effect.
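One possible form of such a mapping and the corresponding weight adjustment is sketched below; the mapping entries and the boost factor are assumptions:

```python
# Assumed mapping from effect type to the display-interface region where such an
# object is usually expected (an aircraft overhead, flowers near the bottom, ...).
EFFECT_TYPE_REGION = {"aircraft": "top", "flower": "bottom", "necklace": "middle"}

def adjust_weights_by_effect_type(weights, effect_type, boost=1.5):
    """Give a larger weight to the region that matches the effect type, so the
    effect is more likely to be displayed in that region."""
    preferred = EFFECT_TYPE_REGION.get(effect_type)
    adjusted = dict(weights)
    if preferred in adjusted:
        adjusted[preferred] *= boost
    return adjusted

print(adjust_weights_by_effect_type({"top": 0.2, "middle": 0.5, "bottom": 0.3}, "flower"))
```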
In the embodiment of the disclosure, the output predicted rendering position weight is further adjusted according to the type of the augmented reality special effect, so that the accuracy of the final output predicted rendering position weight can be improved, and the rendering quality of the augmented reality special effect is improved.
In an embodiment of the present disclosure, an implementation manner is further provided for acquiring a live broadcast video stream of an anchor and obtaining human body feature data and video stream picture parameters of the anchor according to the live broadcast video stream, including:
and performing face recognition and human body recognition on the live video stream picture to obtain human body feature data including face data and mask data, and reading the live video stream to obtain video stream picture parameters.
In the disclosed embodiments, the face data may include face feature points and a face pose estimate. Optionally, the process of obtaining the face feature points is performed on the basis of face detection and locates feature points on the face, such as the mouth corners, the eye corners, and the like. Cascade regression CNN face feature point detection, Dlib face feature point detection, libfacedetect face feature point detection, or Seetaface face feature point detection methods can be adopted. For example, when dlib is used to implement face feature point detection, a model trained with dlib is required; 68 points are then marked using the model, image processing is performed with OpenCV, the 68 points are drawn on the face, and their serial numbers are marked. Other face feature point detection methods may also be adopted, and the embodiments of the present disclosure do not specifically limit how the face feature points are obtained.
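A minimal sketch of this dlib-plus-OpenCV landmark step might look as follows, assuming the standard pre-trained 68-point model file is available locally:

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local model file

def detect_face_landmarks(frame_bgr):
    """Detect faces, mark the 68 dlib landmarks on each one with OpenCV,
    and return the landmark coordinates."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    all_points = []
    for face in detector(gray, 1):
        shape = predictor(gray, face)
        points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
        all_points.append(points)
        for idx, (x, y) in enumerate(points):  # draw each point and its serial number
            cv2.circle(frame_bgr, (x, y), 2, (0, 255, 0), -1)
            cv2.putText(frame_bgr, str(idx), (x, y - 3),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 0, 255), 1)
    return all_points
```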
Face pose estimation includes determining the face pose through the three Euler angles pitch, yaw and roll, where pitch denotes the pitch angle (rotation about the x-axis), yaw denotes the yaw angle (rotation about the y-axis), and roll denotes the roll angle (rotation about the z-axis), representing the angles of up-down turning, left-right turning and in-plane rotation, respectively. Face pose estimation can be performed by a model-based method, an appearance-based method, or a classification-based method. For example, using a model-based approach, a 3D face model with n keypoints is first defined, where n can be chosen according to the required accuracy: the more keypoints, the higher the accuracy. For example, a 3D face model with 6 keypoints (left eye corner, right eye corner, nose tip, left mouth corner, right mouth corner, chin) is selected; 2D face keypoints corresponding to the 3D face model are obtained by face detection and face keypoint detection; a rotation vector is solved using the solvePnP function of OpenCV; and the rotation vector is converted into Euler angles. Other face pose estimation methods may also be used, and the embodiment is not limited in this respect.
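A hedged sketch of the model-based approach with 6 keypoints is given below; the 3D model coordinates and the camera intrinsics are illustrative assumptions, and the RQ decomposition is one common way to recover the Euler angles:

```python
import cv2
import numpy as np

# Generic 3D model of 6 facial keypoints (nose tip, chin, eye corners, mouth corners),
# in arbitrary model units; the values are illustrative, not taken from the disclosure.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye left corner
    (225.0, 170.0, -135.0),    # right eye right corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
], dtype=np.float64)

def estimate_pose(image_points, frame_size):
    """image_points: the 6 detected 2D keypoints matching MODEL_POINTS order."""
    h, w = frame_size
    focal = w  # rough focal-length assumption
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS,
                                  np.asarray(image_points, dtype=np.float64),
                                  camera_matrix, dist_coeffs)
    rot, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    angles, *_ = cv2.RQDecomp3x3(rot)   # -> (pitch, yaw, roll) in degrees
    return angles
```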
For the recognition of the human body, a threshold-based segmentation method, a region-based segmentation method, or an edge-based segmentation method may be used. For example, the threshold segmentation method sets a threshold T for the transformation of an input image f into an output image g: for a pixel belonging to the object, g(i, j) = 1, and for a pixel belonging to the background, g(i, j) = 0. The key of the threshold segmentation algorithm is to determine a suitable threshold; the threshold is compared with the gray value of each pixel, the comparison can be carried out on all pixels in parallel, and the segmentation result directly yields the image regions. Based on the segmented image region result, the human body mask data is acquired, the contour of the mask data is extracted, and the contour coordinates are marked. The embodiment does not specifically limit how the human body mask data is acquired.
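An illustrative OpenCV sketch of threshold segmentation followed by contour extraction (the parameter values are assumptions) might be:

```python
import cv2

def extract_body_mask_contour(frame_bgr, threshold=128):
    """Threshold segmentation: pixels whose gray value exceeds the threshold are
    treated as the object (g = 1), the rest as background (g = 0); the largest
    external contour of the resulting mask is returned as the body mask outline."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    # OpenCV 4.x: findContours returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return mask, None
    largest = max(contours, key=cv2.contourArea)
    return mask, largest.reshape(-1, 2)  # contour as (x, y) coordinate pairs
```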
In the disclosed embodiments, the video stream picture parameters may include the original size of the camera picture (camera_size). Since the camera picture is actually only a portion of the complete live video picture, the video stream picture parameters also include the actual position of the camera picture in the video picture. The video stream picture parameters may also include the resolution of the video.
Optionally, the obtained face data and video stream picture parameters are encapsulated in json; the contour coordinate points of the human body mask data are stored in binary form and then base64 encoded, and are put into the json structure to facilitate data transmission.
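A minimal packing sketch, assuming illustrative json field names, is shown below:

```python
import base64
import json
import numpy as np

def pack_effect_rendering_data(face_data, picture_params, mask_contour):
    """Pack the face data and picture parameters into json; the mask contour points
    are first serialized to binary and then base64-encoded so that they travel
    safely inside the json data frame."""
    contour_bytes = np.asarray(mask_contour, dtype=np.int32).tobytes()
    payload = {
        "face_data": face_data,            # e.g. landmark list and pose angles
        "picture_params": picture_params,  # e.g. camera_size, camera rect, resolution
        "mask_contour": base64.b64encode(contour_bytes).decode("ascii"),
    }
    return json.dumps(payload)
```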
Fig. 5 schematically illustrates a flowchart of a rendering method of an augmented reality special effect according to an embodiment of the present disclosure. As shown in fig. 5, in an embodiment of the present disclosure, inputting the trigger scene data and the human body feature data into a preset algorithm to obtain the predicted rendering position weight of the augmented reality special effect includes:
step S510, inputting the trigger scene data and the human body feature data generated when the augmented reality special effect is triggered into an evaluation function, and calculating the predicted rendering position weight corresponding to the trigger scene data and the human body feature data at the time the augmented reality special effect is triggered.
The main task of the evaluation function is to estimate the importance of each rendering position in order to determine its priority. Various evaluation indexes may be used in the evaluation function, such as the Root Mean Square Error (RMSE), R-square (R2), Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Theil Inequality Coefficient (TIC), and the like. The trigger scene data and the human body feature data are the factors influencing the rendering position; after each augmented reality trigger, the specific trigger scene data and human body feature data at the time of triggering are input into the evaluation function to obtain a predicted rendering position weight for the live broadcast situation at that trigger.
And step S520, updating the weight of the predicted rendering position through a reinforcement learning algorithm, and outputting the updated weight of the predicted rendering position.
In the embodiments of the present disclosure, the predicted rendering position weight is updated. Action value functions (value-based), directly computed policy functions (policy-based), estimated environment models (model-based) or other reinforcement learning algorithms may be employed. For example, the Bellman formula NewQ = Q + α[R + γ·maxQ' − Q] is used to iteratively update the weight, where NewQ represents the weight after the iterative update and Q represents the previous weight for a certain trigger scene; α is a manually set parameter that biases the weight; R represents the previously calculated reward value; γ is the weight of the reward/penalty value of each step; and γ·maxQ' is the optimal weight that can be predicted for the next step. Finally, an iteratively updated weight of the predicted rendering position is obtained. The calculation result is continuously updated through the reinforcement learning algorithm, so that the method can adapt to the trigger scenes of multiple types of augmented reality special effects.
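The iterative update itself reduces to a one-line function; the numeric values in the usage example are assumptions:

```python
def update_position_weight(q, reward, max_next_q, alpha=0.1, gamma=0.9):
    """One iterative update of a predicted rendering position weight,
    following NewQ = Q + alpha * (R + gamma * maxQ' - Q)."""
    return q + alpha * (reward + gamma * max_next_q - q)

# e.g. the previous weight of this trigger scene was 0.4, the observed reward 1.0,
# and the best weight predicted for the next step 0.6:
new_q = update_position_weight(0.4, 1.0, 0.6)  # -> 0.514
```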
In the embodiment of the disclosure, the predicted rendering position weight of each augmented reality is calculated through the evaluation function, and then iterative updating is performed through the reinforcement learning algorithm, so that the prediction of the rendering position is more accurate, the process of determining the rendering position is more efficient, and the rendering quality of the augmented reality special effect is ensured.
Fig. 6 schematically illustrates a flowchart of a rendering method of an augmented reality special effect according to an embodiment of the present disclosure. As shown in fig. 6, in an embodiment of the present disclosure, after the step of rendering the augmented reality special effect on the display interface according to the human body feature data, the video stream picture parameters, and the rendering position, the method includes:
step S610, writing the human body feature data, the video stream picture parameters, and the rendering position as special effect rendering data, with the special effect rendering data serving as data frames, together with the audio frames and video frames in the live video stream, into a streaming media file.
In the embodiments of the present disclosure, for example in a live broadcast service, the audience end is a terminal device that receives the live video sent by the live broadcast server. The audience end can perform augmented reality rendering only by acquiring complete data including the audio data, the video data and the special effect rendering data; that is, the real live broadcast picture is constructed from the audio data and the video data, and the augmented reality special effect is further overlaid and rendered on the basis of that real picture. The special effect rendering data can therefore be written as data frames, together with the audio and video frames, into a streaming media file.
Optionally, a Real Time Messaging Protocol (RTMP), a Real Time Streaming Protocol (RTSP), or other protocols may be used to transmit the Streaming media file. For example, an RTMP protocol may be used, the RTMP stream of which consists of video frames, audio frames, data frames. The video frames, audio frames, and data frames are written into the RTMP stream with the same time stamp, i.e. each frame of video, each frame of audio, and each frame of special effect rendering data has the same time stamp. Therefore, when the video is played and the augmented reality special effect is rendered, the time dislocation can not occur, and the problem that the special effect rendering and the video picture are not synchronous can be avoided.
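The synchronization idea can be sketched as follows; this is not a real RTMP muxer, and the dictionary layout is purely an assumption used to show the shared timestamp:

```python
import time

def build_synchronized_frames(video_frame, audio_frame, effect_rendering_json):
    """Give the video frame, audio frame and special effect data frame of the same
    instant an identical timestamp before they are written into the stream, so that
    playback and effect rendering stay synchronized. Actual RTMP muxing would be done
    by a streaming library; the dictionary layout here is only an assumption."""
    ts = int(time.time() * 1000)
    return [
        {"type": "video", "timestamp": ts, "payload": video_frame},
        {"type": "audio", "timestamp": ts, "payload": audio_frame},
        {"type": "data",  "timestamp": ts, "payload": effect_rendering_json},
    ]
```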
Step S620, the streaming media file is sent to a server, and the audience end plays the live video and renders the augmented reality special effect after receiving the streaming media file through the server.
In many cases, augmented reality special effects need to be rendered on multiple devices, such as in a live broadcast service or in a video call. For example, in a live broadcast, if a viewer sends a gift to the anchor, the viewer's terminal device needs to render the gift special effect, the anchor's terminal needs to render the gift special effect, and the terminal device of every viewer watching the live broadcast needs to render the gift special effect. In the live broadcast, the live video is generated at the anchor end, the anchor end pushes the live video to the server end, and the audience end pulls the live video from the server end to its own terminal for display. Because the live video frame is pushed to the live server by the anchor terminal, the anchor terminal does not need to pull the stream from the server to obtain the video stream data again; there is no transmission of the video stream and thus no time delay caused by it, and the rendering data used by the anchor terminal can be uncompressed special effect rendering data.
When the audience end receives the streaming media file, it detects whether the streaming media file contains a data frame, that is, whether the streaming media file contains special effect rendering data for rendering the augmented reality special effect. When the special effect rendering data exists, the audience end creates an augmented reality player on the display interface while playing the video, and the player renders the augmented reality special effect using the special effect rendering data.
Fig. 7 schematically shows a flowchart of a rendering method of an augmented reality special effect according to an embodiment of the present disclosure. As shown in fig. 7, in an embodiment of the present disclosure, the playing of a live video and the rendering of an augmented reality special effect after the audience end receives the streaming media file through the server include:
step S710, decoding the streaming media file, and acquiring an audio frame, a video frame, and a data frame.
Because the live broadcast is low-delay and real-time picture transmission, the received RTMP stream needs to be decoded continuously, and video frames, audio frames and data frames are obtained by decoding according to the standard format of RTMP.
And step S720, acquiring an anchor live video picture according to the video frame, and identifying the anchor live video picture to obtain live scene data.
The live broadcast picture displayed by the video frame is identified to obtain the live broadcast scene data of the currently triggered augmented reality special effect, where the live broadcast scene data includes the live broadcast scene and the live broadcast type. The live broadcast scene is used to identify the real environment in which the anchor is located.
And step S730, inputting the live scene data and the special effect type into a preset algorithm, and predicting the rendering position of the augmented reality special effect on the audience end.
Details of the live broadcast scene data and the augmented reality special effect type are not repeated here; a predicted rendering position weight on the audience end is output through the same calculation process, and the rendering position is determined according to the weight.
And step S740, playing the live video on the display interface according to the audio frames and the video frames, and rendering the augmented reality special effect at the rendering position on the audience end.
In an embodiment of the present disclosure, after the step of playing a live video on the display interface according to the audio frame and the video frame, and performing rendering of an augmented reality special effect according to the rendering position weight, the method further includes:
and generating a new view and a projection matrix according to the playing setting parameters of the audience and the video stream picture parameters of the special effect rendering data in the data frame, and correcting the special effect rendering position by applying the new view and the projection matrix.
The new view V and the projection matrix P are generated according to the audience end's playing setting parameters, which include the size of the augmented reality player window (canvas_size) and the position Rect1 of the video picture in the video player with respect to the main window, together with the original video size (video_size), the face pose estimation in the rendering data, and the position Rect0 of the camera picture with respect to the original video. Then, at the time point corresponding to the timestamp of the augmented reality special effect, the new V and P matrices are applied when playing the augmented reality special effect, so as to correct the preset rendering position of the augmented reality special effect.
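A simplified sketch of this viewer-side correction, which composes the two rectangles instead of building full view/projection matrices, might be (the coordinate conventions are assumptions):

```python
def correct_effect_position(point_in_camera, video_size, rect1, rect0):
    """Map an effect anchor point from camera-picture coordinates into viewer-side
    player-window coordinates. rect0 = camera picture inside the original video,
    rect1 = video picture inside the player main window, both given as (x, y, w, h).
    Composing the two rectangles plays the role of the new view/projection transform
    described above; a full V/P matrix pipeline is omitted for brevity."""
    px, py = point_in_camera  # normalized 0..1 inside the camera picture
    # camera picture -> original video coordinates
    vx = rect0[0] + px * rect0[2]
    vy = rect0[1] + py * rect0[3]
    # original video -> player window coordinates
    sx, sy = rect1[2] / video_size[0], rect1[3] / video_size[1]
    return rect1[0] + vx * sx, rect1[1] + vy * sy
```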
In the embodiment of the present disclosure, after the rendering position at the audience end is predicted, the rendering position of the augmented reality special effect is further adjusted based on the size of the augmented reality player window, the size of the original video, and the several relative position relationships described above, so that the accuracy of the rendering position can be improved.
With reference to a specific scenario, the main functions of the anchor end and the audience end can be illustrated by fig. 8. As shown in fig. 8, the anchor end has an audio/video writing unit, which writes the audio frames and the video frames into the streaming media file; a human body data recognition module, which performs face recognition and human body recognition to obtain human body characteristic data including face data and mask data; a special effect rendering data compression unit, which compresses the special effect rendering data generated at the anchor end to reduce the transmission pressure of the streaming media file and improve transmission efficiency; and a rendering position learning and prediction module, which predicts the rendering position of the augmented reality special effect in advance to improve rendering accuracy and efficiency.
The streaming media file is forwarded between the anchor end and the audience end through the server, so that the audience end can acquire the live video. The streaming media file includes three kinds of data: audio frames, video frames, and special effect rendering data frames.
The audience end is provided with an audio/video playing unit, which extracts the audio frames and the video frames in the streaming media file to play the ordinary live video; a special effect rendering data extraction unit, which decompresses the special effect rendering data; a special effect rendering position processing module, a live broadcast scene data processing module, and a special effect type processing module, where the special effect rendering position processing module predicts the rendering position of the augmented reality special effect at the audience end according to the live scene data and the special effect type; and finally a special effect display module, which completes the rendering of the augmented reality special effect at the rendering position on the audience end.
Further, in this example embodiment, an apparatus for rendering an augmented reality special effect is also provided. Referring to fig. 9, the apparatus 900 for rendering an augmented reality special effect may include:
a video analysis module 901, configured to acquire a live broadcast video stream of an anchor, and obtain human body characteristic data of the anchor and video stream picture parameters according to the live broadcast video stream;
a data obtaining module 902, configured to obtain trigger scene data based on the human body feature data and special effect data generated when the augmented reality special effect is triggered, where the special effect data at least includes a special effect type and a special effect trigger time;
a rendering position prediction module 903, configured to predict a rendering position of the augmented reality special effect in the live video stream according to the trigger scene data and the human body feature data;
and an interface rendering module 904, configured to render the augmented reality special effect on the display interface according to the human body feature data, the video stream picture parameters, and the rendering position.
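As a structural illustration only, apparatus 900 can be sketched as a composition of these four modules; the class name, callables, and method below are hypothetical and merely mirror the data flow described above.

```python
class RenderingApparatus:
    """Sketch of apparatus 900 wiring modules 901-904 together."""

    def __init__(self, video_analysis, data_obtaining, position_prediction, interface_rendering):
        self.video_analysis = video_analysis             # module 901
        self.data_obtaining = data_obtaining             # module 902
        self.position_prediction = position_prediction   # module 903
        self.interface_rendering = interface_rendering   # module 904

    def render_effect(self, live_stream):
        features, picture_params = self.video_analysis(live_stream)
        trigger_scene = self.data_obtaining(features)
        position = self.position_prediction(trigger_scene, features)
        self.interface_rendering(features, picture_params, position)
```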
In an exemplary embodiment of the present disclosure, the rendering position prediction module 903 is configured to input the trigger scene data and the human body feature data into a preset algorithm, obtain a predicted rendering position weight of the augmented reality special effect, and determine a rendering position in the live video stream according to the weight.
In an exemplary embodiment of the present disclosure, the apparatus for rendering an augmented reality special effect further includes: a weight adjustment module, configured to adjust the predicted rendering position weight according to the special effect type after the trigger scene data and the human body feature data are input into the preset algorithm and the predicted rendering position weight of the augmented reality special effect is obtained.
In an exemplary embodiment of the present disclosure, the video analysis module 901 is configured to perform face recognition and human body recognition on the live video stream picture to obtain human body feature data including face data and mask data, and read the live video stream to obtain video stream picture parameters.
In an exemplary embodiment of the disclosure, the data obtaining module 902 is configured to obtain the human body characteristic data and the picture parameters by analyzing the video stream data, including: performing face recognition and human body recognition on the picture of the video stream to obtain human body characteristic data including face data and mask data; and analyzing the video stream data to read picture parameters including shooting picture parameters and video stream picture parameters.
In an exemplary embodiment of the present disclosure, the rendering position prediction module 903 is configured to input the trigger scene data and the human body feature data at the time the augmented reality special effect is triggered into an evaluation function, and calculate the corresponding predicted rendering position weight;
and iteratively updating the predicted rendering position weight value through a reinforcement learning algorithm, and outputting the iteratively updated predicted rendering position weight value.
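To make the evaluation-and-update loop concrete, the sketch below uses a toy evaluation function and a simple incremental (reinforcement-learning-style) update rule; the feature names, reward signal, and learning rate are illustrative assumptions, as the disclosure does not specify the exact evaluation function or learning algorithm.

```python
def evaluate(trigger_scene: dict, features: dict) -> float:
    """Toy evaluation function: favor positions that leave the detected face uncovered."""
    face_area = features.get("face_area", 0.1)          # fraction of the picture covered by the face
    scene_bonus = 0.2 if trigger_scene.get("scene") == "indoor" else 0.0
    return (1.0 - face_area) + scene_bonus

def update_weight(weight: float, reward: float, alpha: float = 0.1) -> float:
    """Incremental update: move the predicted weight toward the observed reward."""
    return weight + alpha * (reward - weight)

weight = evaluate({"scene": "indoor"}, {"face_area": 0.25})
for reward in (0.8, 0.9, 0.85):    # e.g. feedback collected after each rendering
    weight = update_weight(weight, reward)
print(round(weight, 3))            # the iteratively updated predicted rendering position weight
```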
In an exemplary embodiment of the present disclosure, the apparatus for rendering an augmented reality special effect further includes:
a streaming media generation module, configured to, after the augmented reality special effect is rendered on the display interface according to the human body feature data, the video stream picture parameters, and the rendering position, take the human body feature data, the video stream picture parameters, and the rendering position as special effect rendering data, take the special effect rendering data as a data frame, and write the data frame, the audio frames, and the video frames in the live video stream into a streaming media file;
and sending the streaming media file to a server, and playing a live video and rendering an augmented reality special effect after the audience end receives the streaming media file through the server.
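For illustration, the following sketch packs the special effect rendering data into a compressed data frame and writes it into a simple container alongside the audio and video frames; the length-prefixed record layout is a made-up stand-in for the real streaming media format, and sending the file to the server is left out.

```python
import json
import zlib

def build_data_frame(features: dict, picture_params: dict, position) -> bytes:
    """Serialize and compress the special effect rendering data into one data frame."""
    effect_rendering_data = {"features": features,
                             "picture_params": picture_params,
                             "position": position}
    return zlib.compress(json.dumps(effect_rendering_data).encode("utf-8"))

def write_stream_file(path: str, audio_frames, video_frames, data_frames) -> None:
    """Write typed, length-prefixed records: 'A' audio, 'V' video, 'D' data."""
    with open(path, "wb") as f:
        for kind, frames in (("A", audio_frames), ("V", video_frames), ("D", data_frames)):
            for payload in frames:
                f.write(kind.encode() + len(payload).to_bytes(4, "big") + payload)

frame = build_data_frame({"face_box": [0.4, 0.2, 0.2, 0.3]},
                         {"width": 1920, "height": 1080},
                         [0.5, 0.15])
write_stream_file("live_effect.stream", [b"aac"], [b"h264"], [frame])
```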
In an exemplary embodiment of the disclosure, the interface rendering module 904 is further configured to decode the streaming media file, and obtain an audio frame, a video frame, and a data frame;
acquiring the anchor live video picture according to the video frame, and identifying the anchor live video picture to obtain live scene data;
inputting the live scene data and the special effect type into a preset algorithm, and predicting the rendering position of the augmented reality special effect at the audience end;
and playing live video on the display interface according to the audio frame and the video frame, and rendering the augmented reality special effect according to the rendering position weight.
In an exemplary embodiment of the present disclosure, the apparatus for rendering an augmented reality special effect further includes: a rendering correction module, configured to, after the live video is played on the display interface according to the audio frame and the video frame and the augmented reality special effect is rendered according to the rendering position weight, generate a new view and projection matrix according to the playback setting parameters of the audience end and the video stream picture parameters of the special effect rendering data in the data frame, and correct the special effect rendering position by applying the new view and projection matrix.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
These modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As yet another example, the modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
Optionally, the invention also provides a program product, for example a computer-readable storage medium, comprising a program which, when executed by a processor, is adapted to carry out the above-mentioned method embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.