AR technology-based instant audio and video communication method and device
Technical Field
The invention relates to the technical field of instant messaging, in particular to an instant audio and video communication method and device based on an AR technology.
Background
With the advancement of science and technology, terminal devices such as smart phones, notebook computers and computers have become indispensable important components in work and life, and Augmented Reality (AR) technology brings people a better experience by combining a virtual world and a real world.
Currently, cross-platform, cross-terminal instant messaging systems generally support only conventional human-computer interaction modes, such as text, pictures, animations, audio, video, and voice calls, which makes interaction rather uniform and limited. How to apply AR technology to instant messaging systems so as to enrich the modes of interaction is therefore an important question.
Disclosure of Invention
The embodiment of the invention provides an AR technology-based instant audio and video communication method and device, which are used for enriching an interactive mode of instant communication.
On one hand, the embodiment of the invention provides an instant audio and video communication method based on an AR technology, which comprises the following steps:
establishing audio and video call connection with at least one call opposite terminal;
determining a target object according to the operation of a local end user;
determining a virtual 3D image of the target object according to a preset mapping relation between objects and 3D image model files, adding the virtual 3D image to an audio and video call picture to obtain an AR call picture, and sending the target object to the at least one call opposite terminal, so that the at least one call opposite terminal determines the virtual 3D image of the target object and obtains the AR call picture.
On the other hand, the embodiment of the invention also provides an instant audio and video communication device based on the AR technology, which comprises the following components:
the call connection module is used for establishing audio and video call connection with at least one call opposite terminal;
the target object determining module is used for determining a target object according to the operation of a local end user;
and the call picture generation module is used for determining the virtual 3D image of the target object according to a preset mapping relation between objects and 3D image model files, adding the virtual 3D image to the audio and video call picture to obtain an AR call picture, and sending the target object to the at least one call opposite terminal, so that the at least one call opposite terminal determines the virtual 3D image of the target object and obtains the AR call picture.
According to the technical scheme provided by the embodiment of the invention, the local call terminal establishes an audio and video call connection with at least one call opposite terminal, determines a target object according to the operation of the local terminal user, determines a virtual 3D image of the target object according to the mapping relation, stored in advance at the local terminal, between objects and 3D image model files, and adds the virtual 3D image to the audio and video call interface to obtain the AR call picture of the local terminal. The local terminal also sends the target object to the at least one call opposite terminal, so that each call opposite terminal determines the virtual 3D image of the target object according to its own preset mapping relation between objects and 3D image model files and adds the virtual 3D image to its audio and video call interface to obtain the AR call picture of the call opposite terminal. In this way, AR call pictures containing the virtual 3D image are generated synchronously at the local terminal and the call opposite terminals, the interaction modes of the instant messaging system are enriched, and user experience is improved.
Drawings
Fig. 1 is a flowchart of an instant audio/video communication method based on AR technology according to a first embodiment of the present invention;
fig. 2 is a flowchart of an instant audio/video communication method based on AR technology according to a second embodiment of the present invention;
fig. 3 is a flowchart of an instant audio/video communication method based on AR technology according to a third embodiment of the present invention;
fig. 4 is a communication architecture diagram of an AR audio/video instant messaging system provided in the third embodiment of the present invention;
FIG. 5 is a schematic diagram of rendering a model of a virtual 3D avatar according to a third embodiment of the present invention;
FIG. 6 is a schematic diagram of mapping and texture processing of a virtual 3D image according to a third embodiment of the present invention;
fig. 7 is a structural diagram of an instant audio/video communication device based on AR technology according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an instant audio/video communication method based on an AR technology according to an embodiment of the present invention. The method of the present embodiment may be performed by an instant audio-video communication device based on AR technology, which may be implemented by means of hardware and/or software. Referring to fig. 1, the instant audio/video communication method based on the AR technology provided in this embodiment may specifically include the following steps:
and step 11, establishing audio and video call connection with at least one call opposite terminal.
Specifically, the two or more parties that need to communicate each log in to the AR audio/video communication system on their terminals using a pre-registered or pre-assigned account and password. Any calling party then selects the friend to call from the client's friend list and initiates an audio and video call request; if the selected party accepts the request, the audio and video call connection between the two or more parties is established.
Each AR audio and video communication client pre-stores a mapping relation between objects and 3D image model files, so that virtual 3D images can be generated according to this mapping relation. An object may refer to a character identity such as a kitten, a puppy, a girl, a boy, or a cartoon character such as a sprite. It should be noted that the mapping relation between 3D image model files and map and texture files is also stored in the client, to facilitate the subsequent rendering of the virtual 3D image.
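As a non-authoritative sketch, the two mapping relations stored in the client can be pictured as simple lookup tables. All object names and file paths below are hypothetical placeholders, not actual assets of the system:

```python
# Sketch of the client-side mapping tables described above.
# Every name and path here is an illustrative placeholder.

# Object (character identity) -> 3D image model file
OBJECT_TO_MODEL = {
    "kitten": "models/kitten.obj",
    "puppy": "models/puppy.obj",
    "sprite": "models/sprite.obj",
}

# 3D image model file -> map and texture files used for rendering
MODEL_TO_TEXTURES = {
    "models/kitten.obj": ["textures/kitten_diffuse.png"],
    "models/puppy.obj": ["textures/puppy_diffuse.png"],
    "models/sprite.obj": ["textures/sprite_diffuse.png"],
}

def resolve_avatar(target_object):
    """Return (model_file, texture_files) for a recognized target object."""
    model = OBJECT_TO_MODEL.get(target_object)
    if model is None:
        raise KeyError(f"no 3D model registered for object {target_object!r}")
    return model, MODEL_TO_TEXTURES.get(model, [])
```

Because both the local terminal and each call opposite terminal hold the same tables, only the target object itself needs to be transmitted; each side resolves the model and textures locally.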
Step 12: determining a target object according to the operation of the local end user.
The local end user refers to whichever party actively wants to add an AR virtual object. Specifically, when the local end user taps the add-AR-virtual-object option provided by the communication client on the screen, the local end determines the target object according to the user's operation.
Step 13: determining the virtual 3D image of the target object according to the preset mapping relation between objects and 3D image model files, adding the virtual 3D image to the audio and video call picture to obtain an AR call picture, and sending the target object to the at least one call opposite terminal, so that the at least one call opposite terminal determines the virtual 3D image of the target object and obtains the AR call picture.
Specifically, the local terminal determines a target 3D image model file associated with the target object according to a mapping relationship between a preset object and the 3D image model file, generates a virtual 3D image of the target object based on the target 3D image model file, and adds the generated virtual 3D image to an audio/video call picture displayed on a screen of the local terminal to obtain an AR call picture of the local terminal. It should be noted that the virtual 3D avatar needs to be rendered based on the map and texture files associated with the target 3D avatar model file.
The local terminal may also send the target object to the at least one call opposite terminal through a server of the audio and video call application. Each call opposite terminal that receives the target object determines the target 3D image model file associated with the target object according to the mapping relation, preset in its own client, between objects and 3D image model files, generates the virtual 3D image of the target object based on that file, and adds the generated virtual 3D image to the audio and video call picture displayed on its own screen to obtain the AR call picture of the call opposite terminal.
According to the technical scheme provided by this embodiment, the local call terminal establishes an audio and video call connection with at least one call opposite terminal, determines a target object according to the operation of the local terminal user, determines a virtual 3D image of the target object according to the mapping relation, stored in advance at the local terminal, between objects and 3D image model files, and adds the virtual 3D image to the audio and video call interface to obtain the AR call picture of the local terminal. The local terminal also sends the target object to the at least one call opposite terminal, so that each call opposite terminal determines the virtual 3D image of the target object according to its own preset mapping relation between objects and 3D image model files and adds the virtual 3D image to its audio and video call interface to obtain the AR call picture of the call opposite terminal. In this way, AR call pictures containing the virtual 3D image are generated synchronously at the local terminal and the call opposite terminals, the interaction modes of the instant messaging system are enriched, and user experience is improved.
Illustratively, determining the target object according to the operation of the local end user may include:
A. Acquiring a target image drawn by the local end user on the screen, or acquiring a target image captured by the local camera.
The instant audio and video communication client provides an object acquisition function, such as acquiring the target image by drawing. Specifically, if the local user chooses to obtain the target image by drawing, the client provides an image drawing area similar to an electronic whiteboard for the user to draw in, and the drawn content is used as the target image.
In addition, the client can also acquire the target image by shooting. Specifically, if the local user chooses to obtain the target image by shooting, the client controls the local terminal to start the camera, and the user points the camera at an object to shoot the target image. Further, so that acquiring the target image does not interfere with the user's video communication, the target image may be captured with the rear camera while the video call content is captured with the front camera.
B. Determining, by an image recognition technology, the target object to which the target image belongs.
Specifically, an image recognition technology is used to recognize the target image, and the role identity of the target image is determined from the recognition result, i.e., the target object to which the target image belongs is determined.
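The recognition step can be sketched as choosing the highest-confidence label from a classifier's output. The document does not name a specific recognition technique, so the interface below is a hypothetical stand-in (e.g. for a CNN classifier returning label/confidence pairs):

```python
# Hypothetical sketch of the recognition step: pick the role identity
# from classifier output, or report "unsure" below a confidence floor.
# The classifier itself is not specified by the source document.

def determine_target_object(candidates, threshold=0.6):
    """Pick the role identity from recognition results, or None if unsure.

    `candidates` is a list of (label, confidence) pairs as a classifier
    might return for the drawn or photographed target image.
    """
    if not candidates:
        return None
    label, confidence = max(candidates, key=lambda c: c[1])
    return label if confidence >= threshold else None
```

The returned label is then used as the target object for the model-file lookup; an unrecognized image (low confidence) would prompt the user to redraw or reshoot.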
In this embodiment, the client can acquire the target image either by drawing or by shooting, which enriches the ways of acquiring the target image, adds interest, and makes it convenient both to determine the target object according to individual user needs and to subsequently generate a personalized virtual 3D image.
In addition, after the target object is determined, the method may further include determining size and/or scale parameters of the target image. Accordingly, to further satisfy the user's need for a personalized virtual 3D character, determining the virtual 3D image of the target object according to the preset mapping relation between objects and 3D image model files may include: determining the size and/or scale of the target object; determining the virtual 3D image of the target object according to the preset mapping relation between objects and 3D image model files; and correcting and displaying the virtual 3D image of the target object according to the size and/or scale of the target object. Correcting the virtual 3D image according to the size and/or scale of the target object further improves how well the virtual 3D image matches the target object, thereby improving user experience.
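The correction step can be sketched as deriving per-axis scale factors from the measured target size. Base sizes, units, and axis count below are illustrative assumptions, not the system's actual parameters:

```python
# Illustrative sketch of size/scale correction: scale the model's base
# dimensions so the rendered virtual 3D image matches the measured
# size of the target image the user drew or photographed.

def correct_avatar_scale(model_base_size, target_size):
    """Return per-axis scale factors (width, height) for the 3D image.

    `model_base_size` and `target_size` are (width, height) pairs in
    the same (assumed) units, e.g. pixels of the call picture.
    """
    base_w, base_h = model_base_size
    target_w, target_h = target_size
    return (target_w / base_w, target_h / base_h)
```

A renderer would multiply the model's vertex coordinates by these factors before compositing the virtual 3D image into the call picture.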
Example two
Fig. 2 is a flowchart of an instant audio/video communication method based on the AR technology provided in the second embodiment of the present invention. Referring to fig. 2, the instant audio/video communication method based on the AR technology provided in this embodiment may specifically include the following steps:
and step 21, establishing audio and video call connection with at least one call opposite terminal.
And step 22, determining a target object according to the operation of the local end user.
Step 23, according to the mapping relation between the preset object and the 3D image model file, determining the virtual 3D image of the target object, adding the virtual 3D image to the audio and video call picture to obtain an AR call picture, and sending the target object to the at least one call opposite terminal, so that the at least one call opposite terminal determines the virtual 3D image of the target object and obtains the AR call picture.
And 24, acquiring a target action instruction.
During a two-party or multi-party call, the target action instruction may be generated from an instruction action performed by any party's user on the virtual 3D image. Specifically, any user may perform an instruction action on the virtual 3D image displayed on the terminal screen; the terminal that detects the instruction action generates the target action instruction and sends it to the other call terminals through the server.
Illustratively, step 24 may include: and acquiring a target action instruction according to the operation of the local end user on the virtual 3D image of the target object.
Illustratively, step 24 may also include: and receiving a target action instruction sent by any call opposite end through a server, wherein the target action instruction is generated according to the operation of a user of the call opposite end on the virtual 3D image of the target object.
Step 25: determining the target interactive animation corresponding to the target action instruction according to the preset mapping relation between action instructions and interactive animations.
The mapping relation between action instructions and interactive animations is also stored in advance in each instant messaging client. In addition, the mapping relation between interactive animations and map and texture files may be stored in the client.
Specifically, any call terminal that receives the action instruction determines the target interactive animation corresponding to the target action instruction according to its own mapping relation between action instructions and interactive animations.
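Because every client holds the same instruction-to-animation table, an instruction received from any party resolves to the same animation everywhere. A minimal sketch, where all instruction names and animation file names are hypothetical placeholders:

```python
# Sketch of the per-client action-instruction dispatch. The instruction
# names and animation paths are illustrative placeholders, not actual
# assets of the system.

ACTION_TO_ANIMATION = {
    "pat_head": "animations/nod.anim",
    "drag": "animations/walk.anim",
    "double_tap": "animations/jump.anim",
}

def resolve_animation(action_instruction):
    """Return the interactive animation file for a target action instruction."""
    animation = ACTION_TO_ANIMATION.get(action_instruction)
    if animation is None:
        raise KeyError(f"unknown action instruction: {action_instruction!r}")
    return animation
```

Only the short instruction key travels over the network; the (potentially large) animation, map, and texture files never need to be transmitted during the call.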
Step 26: controlling the virtual 3D image of the target object to execute the action corresponding to the target action instruction, based on the target interactive animation.
When controlling the 3D image to execute the action corresponding to the target action instruction, the map and texture files associated with the target interactive animation are also needed.
In addition, during a multi-party video call, when any party clicks the exit button, that party's window closes while the other parties continue the call; the whole session ends only when a single party remains.
According to the technical scheme provided by this embodiment, after the virtual 3D image of the target object has been generated and displayed at each call end, a target action instruction can be generated from any call end's instruction action on the virtual 3D image and sent to the other call ends, so that the virtual 3D image displayed at every call end executes the action corresponding to the target action instruction. Controlling the virtual 3D image to execute actions further enriches the interaction modes and improves user satisfaction.
EXAMPLE III
Fig. 3 is a flowchart of an instant audio/video communication method based on the AR technology provided in the third embodiment of the present invention. Referring to fig. 3, the instant audio/video communication method based on the AR technology provided in this embodiment may specifically include the following steps:
and step 31, pre-storing the mapping relation between the object and the 3D image model file and the mapping relation between the action instruction and the interactive animation in the AR audio and video call client.
Specifically, designers first design 3D objects common in daily life together with their interactive animations, modelers then build the 3D image models with a 3D modeling tool (such as 3ds Max or Blender), and the 3D image model files, interactive animations, and associated map and texture files are stored in the client of the AR audio and video instant messaging application.
The AR audio and video instant messaging system is mainly based on SIP (Session Initiation Protocol), a protocol standard established by the IETF (Internet Engineering Task Force) and described in multiple RFC (Request For Comments) documents. SIP is an application-layer protocol for establishing and terminating sessions or multimedia calls. Audio and video are transmitted over RTP (Real-time Transport Protocol), an IETF standard whose corresponding RFC document is RFC 3550. RFC 3550 defines not only RTP but also its companion protocol RTCP (Real-time Transport Control Protocol). RTP provides an end-to-end real-time transmission service over IP networks for multimedia data that requires real-time delivery, such as voice, images, and fax.
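As an illustration of the transport layer, the fixed 12-byte RTP header defined in RFC 3550 can be packed and parsed as follows. This is a minimal sketch; the field values used in the test are arbitrary:

```python
import struct

# Sketch of the RFC 3550 fixed RTP header (12 bytes) that carries each
# audio/video packet: V/P/X/CC byte, M/PT byte, sequence number,
# timestamp, and SSRC. CSRC lists and header extensions are omitted.

def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Pack an RTP v2 header with no padding, extension, or CSRCs."""
    byte0 = 2 << 6                          # version=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

def parse_rtp_header(data):
    """Unpack the 12-byte fixed header into its fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The sequence number and timestamp are what allow receivers to reorder packets and keep audio and video in sync during the call.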
The AR audio and video instant messaging system supports P2P point-to-point communication. Based on NAT (Network Address Translation) traversal and forwarding by a TurnServer (an open-source traversal server), full P2P peer-to-peer communication can be realized. The TurnServer implements the STUN (Simple Traversal of UDP through NATs) protocol as well as a server-side forwarding function. STUN is a lightweight protocol and a complete UDP (User Datagram Protocol)-based NAT traversal solution. It allows applications to discover the NATs and firewalls that exist between them and the public Internet, and to determine the public network IP address and port number that the NAT assigns to them. STUN is a Client/Server protocol and also a Request/Response protocol.
The AR audio and video instant messaging system also supports multi-party calls. Multi-party calls rely on the TurnServer forwarding function; this function can forward video for at most two participants, while the call audio of all parties is mixed by a mixer and forwarded to every call terminal for playback.
Step 32: establishing a two-party or multi-party audio and video call connection.
Fig. 4 is a communication architecture diagram of the AR audio/video instant messaging system provided in the third embodiment of the present invention. Referring to fig. 4, the AR audio/video system employs the SIP protocol and the HTTP (HyperText Transfer Protocol) protocol, which are mainstream in audio/video calling. The main functions of the SIP protocol include: 1. the client logs in to the Opensips server using the SIP protocol and updates online and offline states; 2. the client performs audio and video calling and answering and the media negotiation of multimedia data through the SIP protocol; 3. the client transmits AR interactive communication information through the SUBSCRIBE and NOTIFY methods of the SIP protocol; 4. the client is notified of friends' online and offline states through the SUBSCRIBE and NOTIFY methods of the SIP protocol. The main functions of the HTTP protocol are: 1. registering the user name from the client; 2. managing the client's friends: adding friends, deleting friends, and synchronizing friends.
In addition, referring to fig. 4, the functions of the background services are as follows. JSP (Java Server Pages) web: the web background of the AR audio/video system, used for background management of the whole system, monitoring of the audio/video communication system, client management, and other functions. J2EE Spring: implements the background HTTP service for the client's friend management and stores friend data. Redis: serves as the data bridge between the Opensips service and the J2EE service, storing the real-time state of current clients and the state information of Opensips and TurnServer. MySQL: the database used by the J2EE service, storing system data. Opensips: implements the proxy and forwarding functions of the SIP protocol.
Specifically, the client implements initiating (INVITE), answering, and hanging up SIP sessions based on PJSIP (an open-source SIP encapsulation library). The data channel of the audio and video call is transmitted over the RTP protocol: for P2P point-to-point transmission, the TurnServer resolves and traverses the intranet to obtain the other party's public network IP and port, after which data transmission starts directly; otherwise, the client receives the audio and video data forwarded by the TurnServer.
Based on the NAT traversal and relay functions of the TurnServer, both P2P point-to-point communication and multi-party forwarding can be realized. After the client logs in and connects to the TurnServer, the client and server automatically detect the intranet NAT type and whether NAT traversal (based on the STUN protocol) is possible; if the client's intranet cannot be traversed (the NAT type is symmetric), traffic is relayed through the TurnServer.
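The traversal decision described above can be sketched as a simple rule. The NAT type names follow the usual STUN classification; this is an illustration, not the system's actual code:

```python
# Sketch of the transport decision: direct P2P when both NAT types
# permit traversal, relay through the TurnServer otherwise. As noted
# above, a symmetric NAT cannot be traversed with STUN alone.

def choose_transport(local_nat, remote_nat):
    """Return "p2p" or "relay" for a pair of detected NAT types."""
    if local_nat == "symmetric" or remote_nat == "symmetric":
        return "relay"
    return "p2p"
```

In practice the detection step (sending Binding Requests and comparing mapped addresses) runs once at login, so each call can pick its transport immediately.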
The principle of multi-party communication is the same as server-based forwarding: each party's audio data is uploaded to the server, mixed, and then forwarded to all call clients for decoding and playback. The video of a multi-party call can broadcast at most two clients' video pictures.
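The server-side mixing step can be sketched as summing and clamping PCM samples. This is an illustrative sketch assuming signed 16-bit audio frames of equal length:

```python
# Sketch of server-side audio mixing for a multi-party call: sum the
# PCM samples from all parties and clamp to the 16-bit range before
# the mixed stream is forwarded to every client for playback.

def mix_audio(frames):
    """Mix equal-length lists of signed 16-bit PCM samples."""
    mixed = []
    for samples in zip(*frames):
        s = sum(samples)
        mixed.append(max(-32768, min(32767, s)))  # clamp to int16 range
    return mixed
```

Clamping (rather than wrapping) keeps loud overlapping speech from turning into harsh integer-overflow noise at the receiving clients.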
Step 33: a user at any call end draws the target image on an electronic whiteboard, or any call end shoots a real object with the rear camera to obtain the target image.
Step 34: the call end recognizes the target image using an image recognition technology to obtain the target object to which the target image belongs.
Step 35: the call end determines the virtual 3D image of the target object according to the locally preset mapping relation between objects and 3D image model files, and adds the virtual 3D image to the local video call picture to obtain an AR call picture.
Specifically, the call end parses the locally stored target 3D image model file corresponding to the target object through the OpenGL ES (OpenGL for Embedded Systems, a subset of the OpenGL three-dimensional graphics interface) graphics interface, and renders the virtual 3D model on the screen of the call end.
Specifically, the rendering process of the virtual 3D model includes:
1) Setting drawing parameters. Referring to fig. 5, the data of the locally stored 3D model is loaded by code and sent through the Central Processing Unit (CPU) to Video Random Access Memory (VRAM); under the control of the Graphics Processing Unit (GPU), rendering of the graphic is completed using the data and commands in VRAM, and the result is stored in the frame buffer, a frame of which is finally sent to the display.
2) Mapping and texturing. Referring to fig. 6, a locally stored map is loaded and parsed by code into two-dimensional array data, which is loaded into main memory. The texture data is then sent to video memory by calling glTexImage2D() or gluBuild2DMipmaps(), according to the pixel storage mode and the pixel transfer operations.
3) Rendering. The two-dimensional array data in video memory is fetched and texture-rendered by the GPU, mapping the whole texture into three-dimensional space.
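Steps 2) and 3) can be illustrated with a pure-Python sketch of how map pixels are flattened into the row-major byte layout handed to glTexImage2D(), and how a texel is later addressed in that layout. The 2x2 RGB "image" is illustrative; this is not the system's actual rendering code:

```python
# Sketch of the texture path: flatten rows of RGB pixels into the
# contiguous layout a glTexImage2D(..., GL_RGB, GL_UNSIGNED_BYTE, data)
# upload expects, then index back into it as a sampler conceptually does.

def flatten_texture(pixels):
    """Flatten rows of (r, g, b) tuples into row-major RGB bytes."""
    data = bytearray()
    for row in pixels:
        for r, g, b in row:
            data.extend((r, g, b))
    return bytes(data)

def sample_texel(data, width, x, y):
    """Read back the (r, g, b) texel at column x, row y."""
    offset = (y * width + x) * 3          # 3 bytes per RGB texel
    return tuple(data[offset:offset + 3])
```

Getting this row-major layout and stride right is exactly what the "storage mode of pixels" in the upload call governs; a mismatch shows up as skewed or color-shifted textures on the model.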
Step 36: the call end sends the target object to the other call ends, so that the other call ends determine the virtual 3D image of the target object and obtain the AR call picture.
Step 37: if the user at any call end performs an action instruction on the virtual 3D image, the virtual 3D image at every call end executes the action corresponding to the action instruction.
Specifically, the 3D object parses different animation files according to different instructions. The original 3D object is manipulated through the OpenGL ES graphics interface, changing its spatial coordinates and associated map textures to achieve the special effect of a moving 3D object.
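The coordinate change behind the movement effect can be sketched as a per-vertex translation. This is an illustrative sketch; the actual client applies such transforms through the OpenGL ES interface rather than in Python:

```python
# Sketch of the spatial-coordinate change for a movement animation:
# shift every vertex of the model by a displacement before the object
# is re-rendered in its new position.

def translate_vertices(vertices, dx, dy, dz):
    """Return the model's (x, y, z) vertices shifted by (dx, dy, dz)."""
    return [(x + dx, y + dy, z + dz) for (x, y, z) in vertices]
```

An animation file would supply a sequence of such displacements (and texture swaps) per frame, which each call end replays locally after receiving the action instruction.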
Example four
Fig. 7 is a structural diagram of an instant audio/video communication device based on AR technology according to a fourth embodiment of the present invention. As shown in fig. 7, the instant audio/video communication device based on AR technology may include:
a call connection module 41, configured to establish an audio/video call connection with at least one call opposite terminal;
a target object determining module 42, configured to determine a target object according to an operation of a local end user;
and the call picture generation module 43 is configured to determine the virtual 3D image of the target object according to a mapping relationship between a preset object and a 3D image model file, add the virtual 3D image to an audio/video call picture to obtain an AR call picture, and send the target object to the at least one call opposite terminal, so that the at least one call opposite terminal determines the virtual 3D image of the target object and obtains the AR call picture.
Illustratively, the call screen generating module 43 may be specifically configured to:
determining a size and/or scale of the target object;
determining a virtual 3D image of the target object according to a mapping relation between a preset object and a 3D image model file;
and correcting and displaying the virtual 3D image of the target object according to the size and/or the proportion of the target object.
Illustratively, the apparatus may further include:
the instruction acquisition module is used for acquiring a target action instruction after determining and displaying the virtual 3D image of the target object and sending the target object to the at least one opposite call terminal;
the target animation determining module is used for determining a target interactive animation corresponding to a target action instruction according to a mapping relation between a preset action instruction and the interactive animation;
and the action execution module is used for controlling the virtual 3D image of the target object to execute the action corresponding to the target action instruction based on the target interactive animation.
For example, the instruction obtaining module may be specifically configured to:
acquiring a target action instruction according to the operation of a local end user on the virtual 3D image of the target object; or,
and receiving a target action instruction sent by any call opposite end through a server, wherein the target action instruction is generated according to the operation of a user of the call opposite end on the virtual 3D image of the target object.
For example, the target object determination module 42 may include:
the target image acquisition unit is used for acquiring a target image drawn by the local end user through a screen; or, the system is used for acquiring a target image acquired by a local camera;
and the target object determining unit is used for determining the target object to which the target image belongs by adopting an image recognition technology.
The AR technology-based instant audio/video communication device provided by this embodiment belongs to the same inventive concept as the AR technology-based instant audio/video communication method provided by any embodiment of the present invention, can execute the AR technology-based instant audio/video communication method provided by any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution of the AR technology-based instant audio/video communication method. For details of the technology that are not described in detail in this embodiment, reference may be made to the AR technology-based instant audio/video communication method provided in any embodiment of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.