Communication method and communication system
Technical Field
The present invention relates to the field of video networking technologies, and in particular, to a communication method, a communication system, a video networking conference system, and a computer-readable storage medium.
Background
Video networking conferences use a real-time high-definition data exchange technology. Participants in a video networking conference generally do not use audio-only terminals. However, when a video networking conference is interconnected with a third-party conference, audio-only terminals are often accessed, such as VoIP (Voice over Internet Protocol) audio-only terminals, PSTN (Public Switched Telephone Network) fixed telephones, mobile phones, mobile command equipment and the like.
In a video networking conference, the video networking core server sends audio and video data to all participating terminals. An audio-only terminal cannot process the video data, so the video data sent to it consumes extra bandwidth to no purpose.
Further, if a third-party audio-only terminal acts as the speaking party, the XMCU (video networking conference gateway) server may continuously receive I-frame request messages from the other conference terminals. This is because the terminals interconnected in a video networking conference are generally audio and video terminals: the speaking party is expected to send both audio data and video data to the other participants, and the participants can parse and display the data normally only once they have received I-frame data of the video.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a communication method and a corresponding communication system, a video networking conferencing system and a computer readable storage medium that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a communication method, which is applied to a video network, where the video network includes a video network core server and a video network conference gateway, and the method includes:
when a video networking conference gateway detects that a conference is accessed to an internet audio-only terminal, determining a communication protocol of the internet audio-only terminal;
when receiving video and audio data sent by a video networking core server, converting the video and audio data into audio data according to the communication protocol, and sending the audio data to the internet pure audio terminal;
and when receiving the audio data sent by the internet pure audio terminal, converting the audio data according to a preset video networking protocol, sending the converted audio data to a video networking core server, and sending a preset video stream to the video networking core server.
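The two media directions in the steps above can be illustrated with a minimal sketch. The frame dictionaries, the function names and the `PRESET_VIDEO` placeholder are all hypothetical; real protocol conversion (re-packetizing, transcoding) is omitted, and the point where the preset video stream is attached is an assumption.

```python
# Hypothetical placeholder for the preset video stream sent on behalf of
# an audio-only speaker; the source does not specify its content.
PRESET_VIDEO = b"preset-video-frame"


def downstream(frames):
    """Core server -> audio-only terminal: forward only the audio frames."""
    return [frame for frame in frames if frame["kind"] == "audio"]


def upstream(audio_frames, preset_video=PRESET_VIDEO):
    """Audio-only terminal -> core server: forward audio plus a preset video frame."""
    out = [{"kind": "audio", "data": frame["data"]} for frame in audio_frames]
    # Assumption: one preset video frame is appended per batch of audio frames.
    out.append({"kind": "video", "data": preset_video})
    return out
```

In this sketch the audio-only terminal never receives video, and the core server always receives some video from the gateway, which is what would spare the other participants from repeatedly requesting I-frames.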
Preferably, the step of determining the communication protocol of the internet audio-only terminal when the video networking conference gateway detects that the conference accesses the internet audio-only terminal includes:
receiving a video networking conference communication request sent by a video networking core server, wherein the video networking conference communication request comprises a video networking number;
searching the media type of the terminal corresponding to the video networking number from a preset mapping relation table;
and if the media type is a pure audio type, judging that the terminal is an internet pure audio terminal, and acquiring a communication protocol corresponding to the video networking number from the mapping relation table.
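The lookup described in these steps might be sketched as follows. The table contents, the field names and the number format are illustrative assumptions; the source only states that the mapping relation table yields a media type and a communication protocol for a video networking number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TerminalBinding:
    media_type: str  # assumed values: "audio" or "audio_video"
    protocol: str    # assumed values: "SIP" or "H.323"


# Hypothetical preset mapping relation table keyed by video networking number.
MAPPING_TABLE = {
    "10001": TerminalBinding("audio", "SIP"),
    "10002": TerminalBinding("audio_video", "H.323"),
}


def resolve_audio_only_protocol(vn_number, table=MAPPING_TABLE) -> Optional[str]:
    """Return the protocol if the number maps to an audio-only terminal, else None."""
    binding = table.get(vn_number)
    if binding is None or binding.media_type != "audio":
        return None
    return binding.protocol
```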
Preferably, the method further comprises:
adopting the communication protocol to perform protocol conversion on the video networking conference communication request to obtain a call request;
and sending the call request to a corresponding internet pure audio terminal, wherein the call request carries a first audio coding format and a first audio decoding format which can be identified by the video networking terminal.
Preferably, the video networking conference gateway comprises a virtual video networking terminal and a virtual audio-only terminal; the method further comprises the following steps:
receiving a call response message returned by the internet pure audio terminal, wherein the call response message carries a second audio coding format and a second audio decoding format which can be identified by the internet pure audio terminal;
judging whether the second audio coding format and the second audio decoding format are the specified coding mode and decoding mode;
if not, judging that the session establishment fails;
if so, establishing a communication session between the virtual pure audio terminal and the internet pure audio terminal, generating a call confirmation message, sending the call confirmation message to the internet pure audio terminal through the virtual pure audio terminal, generating a conference communication response message, and sending the conference communication response message to a video networking core server through the virtual video networking terminal.
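A minimal sketch of the format check and the two resulting branches follows. The codec name in `SPECIFIED_FORMATS` and the message labels are hypothetical; the source does not say which coding and decoding modes are the specified ones.

```python
# Assumption: a single specified codec, chosen here only for illustration.
SPECIFIED_FORMATS = frozenset({"G.711A"})


def handle_call_response(encode_format, decode_format, specified=SPECIFIED_FORMATS):
    """Decide whether the session is established and which messages to send."""
    if encode_format not in specified or decode_format not in specified:
        # Formats are not the specified ones: session establishment fails.
        return {"session": False, "messages": []}
    return {
        "session": True,
        "messages": [
            # (sending virtual terminal, message, recipient)
            ("virtual_audio_terminal", "call_confirmation", "internet_audio_terminal"),
            ("virtual_vn_terminal", "conference_response", "core_server"),
        ],
    }
```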
Preferably, before the step of receiving a video networking conference communication request sent by a video networking core server, the request including a video networking number, the method further includes:
and establishing a mapping relation table.
Preferably, the step of establishing a mapping relation table includes:
determining a video networking number corresponding to the registered virtual video networking terminal;
respectively distributing a virtual pure audio terminal which carries out point-to-point communication with the registered virtual video network terminal for the registered virtual video network terminal;
and generating a binding relationship between the video networking number and the terminal information of the virtual pure audio terminal through a configuration file.
Preferably, the step of establishing a mapping relation table includes:
receiving a resource acquisition request sent by a video networking core server, wherein the resource acquisition request comprises terminal information of an internet pure audio terminal;
if the terminal corresponding to the terminal information is judged to be a pure audio terminal, inquiring an idle virtual video network terminal;
acquiring a video networking number corresponding to the idle virtual video networking terminal;
generating a binding relationship between the video networking number and the terminal information;
and storing the binding relationship in a mapping relationship table.
Preferably, the method further comprises:
and after the conference is finished, removing the binding relationship, and setting the state of the virtual video network terminal corresponding to the binding relationship as an idle state.
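The dynamic binding and its release after the conference can be sketched as a small pool of idle virtual video networking terminals. The class name, the media-type string and the first-in-first-out choice of idle terminal are assumptions made for illustration.

```python
class BindingTable:
    """Hypothetical mapping relation table backed by a pool of idle virtual terminals."""

    def __init__(self, vn_numbers):
        self._idle = list(vn_numbers)  # numbers of idle virtual video networking terminals
        self._bindings = {}            # video networking number -> terminal information

    def bind(self, terminal_info, media_type):
        """Bind an idle number to an audio-only terminal; return None if not possible."""
        if media_type != "audio" or not self._idle:
            return None
        number = self._idle.pop(0)
        self._bindings[number] = terminal_info
        return number

    def lookup(self, number):
        return self._bindings.get(number)

    def release(self, number):
        """Remove the binding after the conference and mark the terminal idle again."""
        if self._bindings.pop(number, None) is not None:
            self._idle.append(number)
```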
The embodiment of the invention also discloses a communication system applied to the video network, wherein the video network comprises a video network core server and a video network conference gateway, and the system comprises:
a communication protocol determining module, used for determining the communication protocol of an internet pure audio terminal when a video networking conference gateway detects that a conference accesses the internet pure audio terminal;
the video-audio conversion module is used for converting the video-audio data into audio data according to the communication protocol and sending the audio data to the internet pure audio terminal when receiving the video-audio data sent by the video networking core server;
and the audio conversion module is used for converting the audio data according to a preset video networking protocol and then sending the converted audio data to a video networking core server when receiving the audio data sent by the internet pure audio terminal, and sending a preset video stream to the video networking core server.
Preferably, the communication protocol determination module includes:
the video networking conference communication request receiving submodule is used for receiving a video networking conference communication request sent by a video networking core server, and the video networking conference communication request comprises a video networking number;
the media type determination submodule is used for searching the media type of the terminal corresponding to the video networking number from a preset mapping relation table;
and the communication protocol acquisition submodule is used for judging that the terminal is an internet pure audio terminal if the media type is a pure audio type, and acquiring a communication protocol corresponding to the video networking number from the mapping relation table.
Preferably, the system further comprises:
the call request acquisition module is used for carrying out protocol conversion on the video networking conference communication request by adopting the communication protocol to obtain a call request;
and the call request sending module is used for sending the call request to a corresponding internet pure audio terminal, wherein the call request carries a first audio coding format and a first audio decoding format which can be identified by the video networking terminal.
Preferably, the video networking conference gateway comprises a virtual video networking terminal and a virtual audio-only terminal; the system further comprises:
a call response message receiving module, configured to receive a call response message returned by the internet audio only terminal, where the call response message carries a second audio encoding format and a second audio decoding format that can be identified by the internet audio only terminal;
the coding and decoding format judging module is used for judging whether the second audio coding format and the second audio decoding format are the specified coding mode and decoding mode; if not, calling a session failure module, and if so, calling a session success module;
the session failure module is used for judging that the session establishment fails;
and the session success module is used for establishing a communication session between the virtual pure audio terminal and the internet pure audio terminal, generating a call confirmation message, sending the call confirmation message to the internet pure audio terminal through the virtual pure audio terminal, generating a conference communication response message, and sending the conference communication response message to the video networking core server through the virtual video networking terminal.
Preferably, the system further comprises:
and the mapping relation table establishing module is used for establishing a mapping relation table.
Preferably, the mapping relationship table establishing module includes:
the video networking number determining submodule is used for determining the video networking number corresponding to the registered virtual video networking terminal;
the virtual pure audio terminal distribution submodule is used for respectively distributing a virtual pure audio terminal which carries out point-to-point communication with the registered virtual video network terminal to the registered virtual video network terminal;
and the configuration file writing submodule is used for generating the binding relationship between the video networking number and the terminal information of the virtual pure audio terminal through a configuration file.
Preferably, the mapping relationship table establishing module includes:
the resource acquisition request receiving submodule is used for receiving a resource acquisition request sent by a video networking core server, and the resource acquisition request comprises terminal information of an internet pure audio terminal;
the idle virtual video network terminal query submodule is used for querying an idle virtual video network terminal if the terminal corresponding to the terminal information is judged to be a pure audio terminal;
the idle video network number acquisition sub-module is used for acquiring the video network number corresponding to the idle virtual video network terminal;
and the binding relation generation submodule is used for generating the binding relation between the video networking number and the terminal information and storing the binding relation in a mapping relation table.
Preferably, the system further comprises:
and the binding relation removing module is used for removing the binding relation after the conference is finished and setting the state of the virtual video network terminal corresponding to the binding relation as an idle state.
The embodiment of the invention also discloses a video networking conference system, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the steps of the method when executing the program.
The embodiment of the invention also discloses a computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above method.
The embodiment of the invention has the following advantages:
In the video networking conference system provided by the embodiment of the invention, the XMCU determines the type of the interconnected terminal when an internet pure audio terminal is accessed. This addresses the extra network load and frequent communication logic processing caused by meaningless I-frame request messages and video data. When the video networking conference is interconnected with an internet pure audio terminal, unnecessary network bandwidth load and extra communication signaling processing are reduced, the logic of the flow stays streamlined, and seamless fusion of the video networking conference and the internet conference is ensured.
Drawings
FIG. 1 is a schematic networking diagram of a video network of the present invention;
FIG. 2 is a schematic diagram of a hardware structure of a node server of the present invention;
FIG. 3 is a schematic diagram of a hardware structure of an access switch of the present invention;
FIG. 4 is a schematic diagram of a hardware structure of an Ethernet protocol conversion gateway of the present invention;
FIG. 5 is a schematic structural diagram of a video networking conference system according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating steps of a first embodiment of a communication method according to the present invention;
FIG. 7 is a flowchart illustrating steps of a second embodiment of a communication method according to the present invention;
FIG. 8 is an interaction flow diagram of an embodiment of the present invention;
FIG. 9 is a schematic view of a configuration file according to an embodiment of the present invention;
FIG. 10 is a block diagram of a communication system according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The video networking is an important milestone in network development. It is a real-time network that can transmit high-definition video in real time, moving many internet applications toward high-definition video and enabling high-definition face-to-face communication.
The video networking adopts a real-time high-definition video exchange technology and can integrate dozens of required services, covering video, voice, pictures, text, communication, data and the like, on a single network platform. Examples include high-definition video conferencing, video monitoring, intelligent monitoring analysis, emergency command, digital broadcast television, delayed television, network teaching, live broadcast, VOD on demand, television mail, Personal Video Recorder (PVR), intranet (self-office) channels, intelligent video broadcast control, information distribution and the like, with high-definition quality video playback through a television or a computer.
To better understand the embodiments of the present invention, the video networking is described below:
some of the technologies applied in the video networking are as follows:
Network Technology (Network Technology)
The network technology innovation of the video networking improves on traditional Ethernet in order to cope with the potentially enormous video traffic on the network. Unlike pure network packet switching (Packet Switching) or network circuit switching (Circuit Switching), the video networking technology uses packet switching while meeting the requirements of streaming. It has the flexibility, simplicity and low cost of packet switching together with the quality and security guarantees of circuit switching, thereby realizing seamless connection of whole-network switched virtual circuits and data formats.
Switching Technology (Switching Technology)
The video network adopts two advantages of asynchronism and packet switching of the Ethernet, eliminates the defects of the Ethernet on the premise of full compatibility, has end-to-end seamless connection of the whole network, is directly communicated with a user terminal, and directly bears an IP data packet. The user data does not require any format conversion across the entire network. The video networking is a higher-level form of the Ethernet, is a real-time exchange platform, can realize the real-time transmission of the whole-network large-scale high-definition video which cannot be realized by the existing Internet, and pushes a plurality of network video applications to high-definition and unification.
Server Technology (Server Technology)
The server technology on the video networking and unified video platform is different from the traditional server, the streaming media transmission of the video networking and unified video platform is established on the basis of connection orientation, the data processing capacity of the video networking and unified video platform is independent of flow and communication time, and a single network layer can contain signaling and data transmission. For voice and video services, the complexity of video networking and unified video platform streaming media processing is much simpler than that of data processing, and the efficiency is greatly improved by more than one hundred times compared with that of a traditional server.
Storage Technology (Storage Technology)
The super-high speed storage technology of the unified video platform adopts the most advanced real-time operating system in order to adapt to the media content with super-large capacity and super-large flow, the program information in the server instruction is mapped to the specific hard disk space, the media content is not passed through the server any more, and is directly sent to the user terminal instantly, and the general waiting time of the user is less than 0.2 second. The optimized sector distribution greatly reduces the mechanical motion of the magnetic head track seeking of the hard disk, the resource consumption only accounts for 20% of that of the IP internet of the same grade, but concurrent flow which is 3 times larger than that of the traditional hard disk array is generated, and the comprehensive efficiency is improved by more than 10 times.
Network Security Technology (Network Security Technology)
The structural design of the video network completely eliminates the network security problem troubling the internet structurally by the modes of independent service permission control each time, complete isolation of equipment and user data and the like, generally does not need antivirus programs and firewalls, avoids the attack of hackers and viruses, and provides a structural carefree security network for users.
Service Innovation Technology (Service Innovation Technology)
The unified video platform integrates services and transmission. Whether for a single user, a private network user or a network aggregate, only one automatic connection is needed. The user terminal, the set-top box or the PC is directly connected to the unified video platform to obtain multimedia video services in various forms. The unified video platform uses a menu-style configuration table in place of traditional complex application programming, so complex applications can be realized with very little code, enabling open-ended service innovation.
Networking of the video network is as follows:
The video network has a centrally controlled network structure. The network can be a tree network, a star network, a ring network and the like, but in every case the whole network is controlled by a centralized control node in the network.
As shown in fig. 1, the video network is divided into an access network and a metropolitan network.
The devices of the access network part can be mainly classified into 3 types: node server, access switch, terminal (including various set-top boxes, coding boards, memories, etc.). The node server is connected to an access switch, which may be connected to a plurality of terminals and may be connected to an ethernet network.
The node server is a node which plays a centralized control function in the access network and can control the access switch and the terminal. The node server can be directly connected with the access switch or directly connected with the terminal.
Similarly, devices of the metropolitan network portion may also be classified into 3 types: a metropolitan area server, a node switch and a node server. The metro server is connected to a node switch, which may be connected to a plurality of node servers.
The node server is a node server of the access network part, namely the node server belongs to both the access network part and the metropolitan area network part.
The metropolitan area server is a node which plays a centralized control function in the metropolitan area network and can control a node switch and a node server. The metropolitan area server can be directly connected with the node switch or directly connected with the node server.
Therefore, the whole video network is a network structure with layered centralized control, and the network controlled by the node server and the metropolitan area server can be in various structures such as tree, star and ring.
The access network part can form a unified video platform (the part in the dotted circle), and a plurality of unified video platforms can form a video network; each unified video platform may be interconnected via metropolitan area and wide area video networking.
1. Video networking device classification
1.1 The devices in the video network of the embodiment of the present invention can be mainly classified into 3 types: server, switch (including Ethernet protocol conversion gateway), terminal (including various set-top boxes, coding boards, memories, etc.). The video network as a whole can be divided into a metropolitan area network (or national network, global network, etc.) and an access network.
1.2 The devices of the access network part can be mainly classified into 3 types: node server, access switch (including Ethernet protocol conversion gateway), terminal (including various set-top boxes, coding boards, memories, etc.).
The specific hardware structure of each access network device is as follows:
a node server:
As shown in FIG. 2, the node server mainly includes a network interface module 201, a switching engine module 202, a CPU module 203, and a disk array module 204.
Packets coming from the network interface module 201, the CPU module 203, and the disk array module 204 all enter the switching engine module 202. The switching engine module 202 looks up the address table 205 for each incoming packet to obtain the direction information of the packet, and stores the packet in the queue of the corresponding packet buffer 206 according to that direction information; if the queue of the packet buffer 206 is nearly full, the packet is discarded. The switching engine module 202 polls all packet buffer queues and forwards from a queue if the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero. The disk array module 204 mainly implements control over the hard disk, including initialization, read-write, and other operations on the hard disk. The CPU module 203 is mainly responsible for protocol processing with the access switch and the terminal (not shown in the figure), for configuring the address table 205 (including a downlink protocol packet address table, an uplink protocol packet address table, and a data packet address table), and for configuring the disk array module 204.
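The lookup, enqueue and polling behaviour of the switching engine can be sketched roughly as below. The class, the fixed `queue_limit` standing in for "nearly full", and the dropping of packets with an unknown destination are assumptions; the source does not define them.

```python
from collections import deque


class SwitchingEngine:
    """Illustrative model of the address-table lookup and queue polling."""

    def __init__(self, address_table, queue_limit=4):
        self.address_table = dict(address_table)  # destination address -> output port
        self.queue_limit = queue_limit            # assumed threshold for "nearly full"
        ports = sorted(set(self.address_table.values()))
        self.queues = {port: deque() for port in ports}

    def enqueue(self, destination, packet):
        """Store the packet in the queue for its port; return False if it is dropped."""
        port = self.address_table.get(destination)
        if port is None:
            return False  # assumption: packets with no table entry are dropped
        queue = self.queues[port]
        if len(queue) >= self.queue_limit:
            return False  # queue nearly full: discard
        queue.append(packet)
        return True

    def poll(self, send_buffer_full):
        """Forward one packet per queue whose send buffer is not full and counter > 0."""
        forwarded = []
        for port, queue in self.queues.items():
            if not send_buffer_full.get(port, False) and len(queue) > 0:
                forwarded.append((port, queue.popleft()))
        return forwarded
```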
The access switch:
As shown in FIG. 3, the access switch mainly includes a network interface module (a downlink network interface module 301 and an uplink network interface module 302), a switching engine module 303 and a CPU module 304.
Packets (uplink data) coming from the downlink network interface module 301 enter the packet detection module 305. The packet detection module 305 checks whether the Destination Address (DA), the Source Address (SA), the packet type, and the packet length of the packet meet the requirements; if so, it allocates a corresponding stream identifier (stream-id) and the packet enters the switching engine module 303; otherwise, the packet is discarded. Packets (downlink data) coming from the uplink network interface module 302 enter the switching engine module 303, as do data packets coming from the CPU module 304. The switching engine module 303 looks up the address table 306 for each incoming packet to obtain the direction information of the packet. If a packet entering the switching engine module 303 is going from a downlink network interface to an uplink network interface, it is stored in the queue of the corresponding packet buffer 307 in association with its stream-id; otherwise, it is stored in the queue of the corresponding packet buffer 307 according to its direction information. In either case, if the queue of the packet buffer 307 is nearly full, the packet is discarded.
The switching engine module 303 polls all packet buffer queues and may include two cases:
if the queue is from the downlink network interface to the uplink network interface, the following conditions are met for forwarding: 1) the port send buffer is not full; 2) the queued packet counter is greater than zero; 3) obtaining a token generated by a code rate control module;
if the queue is not from the downlink network interface to the uplink network interface, the following conditions are met for forwarding: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero.
The code rate control module 308 is configured by the CPU module 304, and generates tokens at programmable intervals for all packet buffer queues going from downlink network interfaces to uplink network interfaces, in order to control the rate of uplink forwarding.
The CPU module 304 is mainly responsible for protocol processing with the node server, configuration of the address table 306, and configuration of the code rate control module 308.
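The three forwarding conditions for an uplink queue, including the token from the code rate control module, can be sketched as follows. The class and method names are hypothetical, and the assumption that each forwarded packet consumes exactly one token is not stated in the source.

```python
from collections import deque


class UplinkQueue:
    """Illustrative uplink packet buffer queue gated by rate-control tokens."""

    def __init__(self):
        self.packets = deque()
        self.tokens = 0

    def add_token(self):
        """Called by the rate control logic once per programmable interval."""
        self.tokens += 1

    def try_forward(self, send_buffer_full):
        """Return the next packet if all three conditions hold, else None."""
        if send_buffer_full:      # 1) the port send buffer must not be full
            return None
        if not self.packets:      # 2) the queue packet counter must be greater than zero
            return None
        if self.tokens == 0:      # 3) a token must have been obtained
            return None
        self.tokens -= 1          # assumption: one token per forwarded packet
        return self.packets.popleft()
```

Lengthening the interval between `add_token` calls lowers the uplink forwarding rate without touching the downlink path, which does not check tokens.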
Ethernet protocol conversion gateway:
As shown in fig. 4, the apparatus mainly includes a network interface module (a downlink network interface module 401 and an uplink network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409, and a MAC deleting module 410.
Data packets coming from the downlink network interface module 401 enter the packet detection module 405. The packet detection module 405 checks whether the Ethernet MAC DA, the Ethernet MAC SA, the Ethernet length or frame type, the video network destination address DA, the video network source address SA, the video network packet type, and the packet length of the packet meet the requirements; if so, it allocates a corresponding stream identifier (stream-id), after which the MAC deletion module 410 strips the MAC DA, the MAC SA, and the length or frame type (2 bytes), and the packet enters the corresponding receiving buffer; otherwise, the packet is discarded.
The downlink network interface module 401 checks the send buffer of the port. If there is a packet, it obtains the Ethernet MAC DA of the corresponding terminal according to the video network destination address DA of the packet, adds the Ethernet MAC DA of the terminal, the MAC SA of the Ethernet protocol conversion gateway, and the Ethernet length or frame type, and sends the packet.
The other modules in the Ethernet protocol conversion gateway function similarly to those of the access switch.
A terminal:
A terminal mainly comprises a network interface module, a service processing module and a CPU module. For example, the set-top box mainly comprises a network interface module, a video and audio coding and decoding engine module and a CPU module; the coding board mainly comprises a network interface module, a video and audio coding engine module and a CPU module; the memory mainly comprises a network interface module, a CPU module and a disk array module.
1.3 The devices of the metropolitan area network part can be mainly classified into 3 types: node server, node switch, metropolitan area server. The node switch mainly comprises a network interface module, a switching engine module and a CPU module; the metropolitan area server mainly comprises a network interface module, a switching engine module and a CPU module.
2. Video networking packet definition
2.1 Access network packet definition
The data packet of the access network mainly comprises the following parts: destination Address (DA), Source Address (SA), reserved bytes, payload (pdu), CRC.
As shown in the following table, the data packet of the access network has the following layout:
| DA | SA | Reserved | Payload | CRC |
wherein:
the Destination Address (DA) is composed of 8 bytes (byte), the first byte represents the type of the data packet (such as various protocol packets, multicast data packets, unicast data packets, etc.), there are 256 possibilities at most, the second byte to the sixth byte are metropolitan area network addresses, and the seventh byte and the eighth byte are access network addresses;
the Source Address (SA) is also composed of 8 bytes (byte), defined as the same as the Destination Address (DA);
the reserved byte consists of 2 bytes;
the payload part has a different length depending on the type of data packet: it is 64 bytes for the various types of protocol packets, and 32 + 1024 = 1056 bytes for a unicast data packet; of course, the length is not limited to the above 2 cases;
the CRC consists of 4 bytes and is calculated in accordance with the standard ethernet CRC algorithm.
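The field layout above can be illustrated with a short packing and parsing sketch. `zlib.crc32` uses the same CRC-32 polynomial as Ethernet, but the byte order of the CRC field, the bytes covered by the CRC (here, everything before it), and the helper names are assumptions, not details from the source.

```python
import struct
import zlib


def make_address(packet_type, metro_addr, access_addr):
    """Build an 8-byte address: 1 type byte, 5 metro bytes, 2 access-network bytes."""
    if len(metro_addr) != 5:
        raise ValueError("metropolitan area network address must be 5 bytes")
    return struct.pack("!B5sH", packet_type, metro_addr, access_addr)


def build_access_packet(da, sa, payload):
    """Concatenate DA, SA, 2 reserved bytes and payload, then append a 4-byte CRC."""
    body = da + sa + b"\x00\x00" + payload
    return body + struct.pack("!I", zlib.crc32(body))


def parse_access_packet(packet):
    """Verify the CRC and split the packet back into its fields."""
    body = packet[:-4]
    (crc,) = struct.unpack("!I", packet[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    return {
        "da": body[:8],
        "sa": body[8:16],
        "reserved": body[16:18],
        "payload": body[18:],
    }
```

With this layout a protocol packet carrying a 64-byte payload is 8 + 8 + 2 + 64 + 4 = 86 bytes long, and a unicast data packet with a 1056-byte payload is 1078 bytes long.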
2.2 metropolitan area network packet definition
The topology of a metropolitan area network is a graph, and there may be 2, or even more than 2, connections between two devices; that is, there may be more than 2 connections between a node switch and a node server, or between two node switches. However, the metropolitan area network address of a metropolitan area network device is unique, so in order to accurately describe the connection relationship between metropolitan area network devices, a parameter is introduced in the embodiment of the present invention: a label, to uniquely describe a metropolitan area network device.
In this specification, the definition of the label is similar to that of the label of MPLS (Multi-Protocol Label Switching). Assuming that there are two connections between device A and device B, there are 2 labels for packets from device A to device B, and 2 labels for packets from device B to device A. Labels are classified into incoming labels and outgoing labels: assuming that the label (incoming label) of a packet entering device A is 0x0000, the label (outgoing label) of the packet leaving device A may become 0x0001. The network access process of the metropolitan area network is a network access process under centralized control; that is, address allocation and label allocation of the metropolitan area network are both directed by the metropolitan area server, and the node switch and the node server only execute them passively. This differs from label allocation in MPLS, which is the result of mutual negotiation between the switch and the server.
As shown in the following table, the data packet of the metro network mainly includes the following parts:
| DA | SA | Reserved | label (R) | Payload | CRC |
Namely Destination Address (DA), Source Address (SA), reserved bytes (Reserved), label, payload (PDU), CRC. The format of the label may be defined as follows: the label is 32 bits, with the upper 16 bits reserved and only the lower 16 bits used, and it is located between the reserved bytes and the payload of the packet.
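As a rough illustration of the label field and of the incoming/outgoing label swap described above, the following sketch encodes a label into the 32-bit field and rewrites it in a packet body. The function names are hypothetical, big-endian byte order is an assumption, and the body is handled without its CRC, which a real device would have to recompute.

```python
import struct
from typing import Tuple

LABEL_OFFSET = 8 + 8 + 2   # DA + SA + reserved bytes precede the label
LABEL_LEN = 4              # the label field is 32 bits


def encode_label(label: int) -> bytes:
    """Encode a label: upper 16 bits reserved (zero), lower 16 bits used."""
    if not 0 <= label <= 0xFFFF:
        raise ValueError("only the lower 16 bits of the label are used")
    return struct.pack(">I", label)  # assumption: big-endian on the wire


def decode_label(field: bytes) -> int:
    """Read the label, ignoring the reserved upper 16 bits."""
    (value,) = struct.unpack(">I", field)
    return value & 0xFFFF


def swap_label(body: bytes, out_label: int) -> Tuple[int, bytes]:
    """Return the incoming label and the body rewritten with the outgoing label."""
    end = LABEL_OFFSET + LABEL_LEN
    if len(body) < end:
        raise ValueError("body too short to contain a label")
    in_label = decode_label(body[LABEL_OFFSET:end])
    return in_label, body[:LABEL_OFFSET] + encode_label(out_label) + body[end:]
```

For example, a packet entering a device with label 0x0000 can leave it with label 0x0001 by calling swap_label(body, 0x0001).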
Based on the characteristics of the video network, the core concept of the embodiment of the invention is provided, and through an interface provided by a specified application program installed on an internet terminal, when a user inputs a video network number, the corresponding video network terminal is dialed according to the video network number to obtain the multimedia data of the video network terminal, and the multimedia data is played in the specified application program.
The embodiment of the present invention can be applied to a video networking conference system. Referring to the schematic structural diagram of the video networking conference system shown in fig. 5, the video networking conference system can include a video networking core server (not shown in the figure) and an XMCU server. As shown in fig. 5, the XMCU server may dock with H.323 (an audio and video protocol)/SIP (Session Initiation Protocol) terminals of a plurality of third parties, may support the participation of a plurality of video networking terminals (set-top boxes (STB), including a chairman STB terminal and participant STB terminals), may support access by a plurality of third-party terminals in one conference, and may also support a plurality of groups of conferences.
In a particular implementation, the XMCU can be considered a special device of the video network that communicates with the video network core server and/or other video network devices via the video network V2V protocol.
The XMCU and the audio-only terminal may be connected by a protocol supported by the audio-only terminal, and as an example, there are two connection methods: the H.323 protocol or the SIP protocol.
The embodiment of the invention docks conference terminals through the XMCU. Because the video networking core server always sends combined audio and video data, the docking optimization for audio-only terminals needs to be completed by the XMCU.
Based on the video networking conference system shown in fig. 5, referring to fig. 6, a flowchart of the steps of a first communication method embodiment of the present invention is shown, which may specifically include the following steps:
step 601, when a video networking conference gateway detects that an internet audio-only terminal accesses a conference, determining a communication protocol of the internet audio-only terminal;
step 602, when receiving video and audio data sent by a video networking core server, converting the video and audio data into audio data according to the communication protocol, and sending the audio data to the internet pure audio terminal;
step 603, when receiving the audio data sent by the internet pure audio terminal, converting the audio data according to a preset video networking protocol, sending the converted audio data to a video networking core server, and sending a preset video stream to the video networking core server.
In the embodiment of the present invention, the video network may further include a video networking conference gateway (XMCU) in communication with the video networking core server. The XMCU is responsible for docking and merging the video networking conference with the internet conference, so as to ensure that the two conferences can be interconnected and intercommunicated.
The XMCU in the embodiment of the invention can judge the type of the third party docking terminal, and can further acquire the communication protocol of the Internet pure audio terminal corresponding to the pure audio type if the type is detected to be the pure audio type.
When the internet pure audio terminal serves as a participant and the XMCU receives the video and audio data sent by the video networking core server, the XMCU can convert the video and audio data, according to the communication protocol, into audio data that the internet pure audio terminal can recognize, and send the audio data to the internet pure audio terminal. This avoids the docking failure of the third-party terminal that sending video data could cause (some pure audio terminals may crash when receiving video data, especially when audio and video data arrive on the same receiving port), and also avoids the extra bandwidth load caused by sending the video data.
On the other hand, if the internet pure audio terminal is used as a speaking party, when the XMCU receives audio data sent by the internet pure audio terminal, the audio data can be converted according to a preset video networking protocol and then sent to the video networking core server, and meanwhile, a preset video stream can be sent to the video networking core server.
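The asymmetry between the two directions (steps 602 and 603) can be sketched as below. This is a deliberately simplified sketch: MediaPacket, PRESET_VIDEO and both function names are hypothetical, and the actual protocol conversion and audio transcoding performed by the XMCU are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MediaPacket:
    kind: str    # "audio" or "video" (hypothetical tagging)
    data: bytes


# Placeholder for the preset video stream; its real content is not modeled here.
PRESET_VIDEO = MediaPacket("video", b"preset-video-placeholder")


def to_audio_only_terminal(packets: List[MediaPacket]) -> List[MediaPacket]:
    """Step 602 direction: keep audio only, so no video reaches the terminal."""
    return [p for p in packets if p.kind == "audio"]


def to_core_server(audio_packet: MediaPacket) -> List[MediaPacket]:
    """Step 603 direction: forward the audio together with the preset video stream."""
    return [audio_packet, PRESET_VIDEO]
```

Downstream, video never reaches the audio-only terminal; upstream, the core server still receives a video stream alongside the speaker's audio.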
In the video networking conference system provided by the embodiment of the invention, when an internet pure audio terminal is accessed, meaningless I frame request messages and video data would otherwise cause extra network load and frequent communication logic processing. To address this, the XMCU judges the type of the docked terminal, so that when the video networking conference is docked with the internet pure audio terminal, unnecessary network bandwidth load and extra communication signaling processing are reduced, the flow logic remains optimized, and seamless fusion of the video networking conference and the internet conference is ensured.
Referring to fig. 7, a flowchart illustrating steps of a second communication method embodiment according to the embodiment of the present invention is shown, and the embodiment of the present invention is specifically described with reference to the interaction flowchart illustrated in fig. 8 on the basis of fig. 7.
As shown in fig. 7, the embodiment of the present invention may specifically include the following steps:
step 701, establishing a mapping relation table;
as shown in FIG. 8, in the embodiment of the present invention, a virtual video networking terminal and a virtual audio-only terminal can be included in the XMCU. Wherein,
the virtual video network terminal is a video network terminal instantiated in the XMCU, has all functions of the video network terminal, and can be used as the video network terminal to participate in a video network conference. The identity of each virtual video network terminal, including its video network number, MAC address information and the like, is allocated by the video network core server; each virtual video network terminal acts like a communication number and can be registered on the video network core server.
The virtual audio-only terminal is an instantiated H.323/SIP terminal, which has the conference function of the H.323/SIP terminal. When the conference is started, the video networking terminal and the virtual video networking terminal enter the conference through the video networking, the virtual pure audio terminal participates in the conference with the internet pure audio terminal through the internet H.323/SIP protocol, and the XMCU performs conversion between the virtual video networking terminal and the virtual pure audio terminal, so that the pure audio terminal is added into the video networking conference.
In the embodiment of the present invention, the mapping relationship table may include mapping relationships between the numbers and Mac addresses of the virtual video network terminals and information such as IP addresses, communication protocols, media types, and the like of the internet audio only terminals.
In a preferred embodiment of the present invention, the mapping relationship may be established in a static manner, and then step 701 may include the following sub-steps:
substep S11, determining the video network number corresponding to the registered virtual video network terminal;
substep S12, respectively allocating a virtual pure audio terminal for peer-to-peer communication with the registered virtual video network terminal;
and a substep S13, generating a binding relationship between the video networking number and the terminal information of the virtual audio-only terminal through a configuration file.
In a specific implementation, the static manner means that the number of a pure audio terminal is bound to the video networking number of each virtual video networking terminal in a one-to-one correspondence. In practice, this approach can be used for the video networking videophone service, i.e., point-to-point calls, where the XMCU can establish the binding between the video networking terminal and the audio-only terminal by modifying a configuration file that indicates that the access terminal is an audio-only terminal.
Specifically, for the video networking number corresponding to each registered virtual video networking terminal, a virtual pure audio terminal performing point-to-point communication with that virtual video networking terminal may be allocated, terminal information of the virtual pure audio terminal may be acquired, and then a binding relationship between the video networking number and the terminal information may be established through a configuration file.
In an embodiment, as shown in the configuration file diagram of fig. 9, the configuration file may include information such as the ID number of the virtual video networking terminal (typically, a video networking number), the communication protocol (Protocol) used to connect the audio-only terminal, the IP address of the audio-only terminal, the ID of the audio-only terminal, the media type media_attribute of the terminal, the video stream bitrate, the preferred video resolution preferred_video_resolution, and the MAC address vmmac of the virtual video networking terminal, and a binding relationship between these items may be generated through the configuration file.
As shown in fig. 9, for the H.323 protocol, the corresponding IP address needs to be written in the configuration file, and the sipid may be set to 0. For the SIP protocol, the corresponding sipid needs to be written in the configuration file, and the IP address may be set to 0.
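A minimal sketch of such a static binding table is given below. The field names sipid, media_attribute and vmmac come from the description of fig. 9; the remaining field names, the value spellings ("h323", "sip", "audio") and the function names are assumptions made for illustration, and the real configuration file format is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BindingEntry:
    number: str            # video networking number of the virtual terminal
    protocol: str          # "h323" or "sip" (spelling is an assumption)
    ip: str                # "0" when unused (SIP case)
    sipid: str             # "0" when unused (H.323 case)
    media_attribute: str   # e.g. "audio" for an audio-only terminal
    vmmac: str             # MAC address of the virtual video networking terminal


def validate(entry: BindingEntry) -> None:
    """Apply the rule above: H.323 needs an IP address, SIP needs a sipid."""
    if entry.protocol == "h323":
        if entry.ip == "0":
            raise ValueError("H.323 entry requires an IP address")
    elif entry.protocol == "sip":
        if entry.sipid == "0":
            raise ValueError("SIP entry requires a sipid")
    else:
        raise ValueError("unsupported protocol: " + entry.protocol)


def build_mapping_table(entries: Iterable[BindingEntry]) -> Dict[str, BindingEntry]:
    """Build a one-to-one table keyed by video networking number."""
    table: Dict[str, BindingEntry] = {}
    for entry in entries:
        validate(entry)
        if entry.number in table:
            raise ValueError("video networking number bound twice: " + entry.number)
        table[entry.number] = entry
    return table
```

Keying the table by video networking number keeps the later lookup in step 703 a single dictionary access.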
In another preferred embodiment of the present invention, the mapping relationship may be established in a dynamic manner, and then step 701 may include the following sub-steps:
a substep S21, receiving a resource obtaining request sent by a video network core server, wherein the resource obtaining request comprises terminal information of an internet pure audio terminal;
substep S22, if the terminal corresponding to the terminal information is judged to be a pure audio terminal, then inquiring an idle virtual video network terminal;
substep S23, acquiring the video network number corresponding to the idle virtual video network terminal;
substep S24, generating a binding relationship between the video networking number and the terminal information;
and a substep S25, storing the binding relationship in a mapping relationship table.
In a specific implementation, the dynamic mode may be used for dynamic binding in a Pamir conference service, and after the conference is finished, resources of the virtual video networking terminal are released.
Specifically, when a conference is set up through the video networking conference control software, the type of the pure audio terminal and its docking protocol need to be added. The video networking core server then, through the conference control software, sends the XMCU a resource acquisition request applying for virtual terminal resources, where the resource acquisition request can include terminal information of the internet pure audio terminal.
After receiving the resource acquisition request, the XMCU may search for the received terminal information in a terminal information database of pre-registered pure audio terminals, and if the search is successful, may determine that the terminal corresponding to the terminal information is a pure audio terminal. At this time, the XMCU may query the state information of all the virtual video network terminals, determine an idle virtual video network terminal whose state information is the idle state, acquire the video network number of the idle virtual video network terminal, generate a binding relationship between the video network number and the received terminal information, and store the binding relationship in the mapping relationship table, thereby allocating an available virtual video network terminal to the audio-only terminal. After the conference is ended, the XMCU may remove the binding relationship and set the state of the virtual video network terminal corresponding to the binding relationship to the idle state, so as to release the resource of the virtual video network terminal.
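The dynamic allocation and release described above can be sketched as a small pool. The class and method names are hypothetical, and returning None when the terminal is not a registered audio-only terminal or when no virtual terminal is idle is an assumption, since the text does not specify those cases.

```python
from typing import Dict, Iterable, List, Optional, Set


class VirtualTerminalPool:
    """Tracks idle virtual video networking terminals and their bindings."""

    def __init__(self, numbers: Iterable[str], registered_audio_terminals: Iterable[str]):
        self._idle: List[str] = list(numbers)                 # idle video networking numbers
        self._registered: Set[str] = set(registered_audio_terminals)
        self.bindings: Dict[str, str] = {}                    # number -> terminal information

    def acquire(self, terminal_info: str) -> Optional[str]:
        """Substeps S21 to S25: bind an idle virtual terminal to an audio-only terminal."""
        if terminal_info not in self._registered:
            return None   # not a pre-registered audio-only terminal (assumed handling)
        if not self._idle:
            return None   # no idle virtual terminal available (assumed handling)
        number = self._idle.pop(0)
        self.bindings[number] = terminal_info
        return number

    def release(self, number: str) -> None:
        """After the conference: remove the binding and mark the terminal idle."""
        if self.bindings.pop(number, None) is not None:
            self._idle.append(number)
```

Releasing a number that is not currently bound is a no-op, so a repeated conference-end notification does not duplicate entries in the idle list.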
Step 702, receiving a video networking conference communication request sent by a video networking core server, wherein the video networking conference communication request comprises a video networking number;
in a specific implementation, as shown in fig. 8, when the video networking terminal needs to access the internet audio only terminal for conference, the video networking terminal may send a video networking conference communication request to the video networking core server, and the video networking core server sends the video networking conference communication request to the XMCU. Wherein, the video network conference communication request can comprise the video network number of the virtual video network terminal.
Step 703, searching the media type of the terminal corresponding to the video network number from a preset mapping relation table;
after receiving the video networking conference communication request, the XMCU extracts the video networking number from the request and looks it up in the mapping relation table; if the number is found, the XMCU obtains, from the corresponding binding relation, the media type media_attribute of the terminal connecting to the video networking terminal.
Step 704, if the media type is a pure audio type, determining that the terminal is an internet pure audio terminal, and acquiring a communication protocol corresponding to the video networking number from the mapping relation table;
in one embodiment, if the media type is the audio type, the terminal connecting to the video networking terminal is a pure audio terminal; if the media type is the audio + video type, the terminal connecting to the video networking terminal is an audio and video terminal.
If the accessed terminal is a pure audio terminal, the communication protocol corresponding to the pure audio terminal can be further obtained from the corresponding binding relationship.
For an audio-only terminal, the communication protocol may include the H.323 protocol and/or the SIP protocol.
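Steps 703 and 704 amount to a table lookup followed by a media type check, which can be sketched as follows. The dictionary layout, the value "audio" for the pure audio type, and the function name are assumptions for illustration only.

```python
from typing import Dict, Optional

AUDIO_ONLY = "audio"  # assumed value of media_attribute for a pure audio terminal


def protocol_for_audio_only(table: Dict[str, Dict[str, str]], number: str) -> Optional[str]:
    """Return the communication protocol bound to the number, or None.

    None is returned when the number is not in the table or when the bound
    terminal is not a pure audio terminal.
    """
    binding = table.get(number)
    if binding is None:
        return None
    if binding.get("media_attribute") != AUDIO_ONLY:
        return None
    return binding.get("protocol")
```

A caller would continue to step 705 only when a protocol is returned.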
Step 705, performing protocol conversion on the video networking conference communication request by using the communication protocol to obtain a call request; sending the call request to a corresponding internet pure audio terminal;
in a specific implementation, the XMCU is a protocol conversion gateway that can convert the V2V protocol for video networking to the H.323/SIP protocol supported by audio-only terminals, so that after the communication protocol of the audio-only terminals is obtained by the XMCU, the video networking conference communication request can be converted to an Internet H.323/SIP call request using the communication protocol.
The H.323/SIP protocol can carry the audio and video coding and decoding formats supported by a terminal, so that the two parties of the conference can negotiate and reach an agreement. Therefore, the Internet H.323/SIP call request carries the first audio coding format and the first audio decoding format which can be identified by the video networking terminal, so that the pure audio terminal can learn the coding and decoding formats of the video networking terminal at the opposite end.
After the XMCU converts the video networking conference communication request into a call request by adopting a communication protocol, the call request can be sent to the audio-only terminal so as to call the audio-only terminal of the Internet for meeting.
Step 706, receiving a call response message returned by the internet pure audio terminal;
after receiving the call request, the audio-only terminal may return an internet H.323/SIP call response message to the XMCU, where the call response message may carry a second audio encoding format and a second audio decoding format that are recognizable by the audio-only terminal.
Step 707, determining whether the second audio encoding format and the second audio decoding format are the specified encoding mode and decoding mode; if not, go to step 708; if yes, go to step 709;
step 708, determining that the session establishment fails;
step 709, establishing a communication session between the virtual audio-only terminal and the internet audio-only terminal, generating a call confirmation message, sending the call confirmation message to the internet audio-only terminal through the virtual audio-only terminal, generating a conference communication response message, and sending the conference communication response message to a video networking core server through the virtual video networking terminal; continuing to execute step 710;
after receiving the Internet H.323/SIP call response message, the XMCU can further judge whether the audio coding and decoding formats of the video networking terminal and the internet terminal can interoperate, either directly or after transcoding. If they can, the session is successfully established; if they cannot, the session establishment fails.
In a specific implementation, the XMCU can make this judgment by determining whether the second audio encoding format and the second audio decoding format are the specified encoding mode and decoding mode.
Specifically, the audio coding of the video networking terminal and the virtual video networking terminal only supports the AAC (Advanced Audio Coding) format, and the video coding only supports the H.264 format. Most internet terminals support the G.711 format (an audio coding method established by the ITU-T, the telecommunication standardization sector of the International Telecommunication Union) for audio and the H.264 format for video. Therefore, whenever a virtual audio-only terminal calls an internet audio-only terminal, only G.711 is offered for audio in the H.323/SIP negotiation. If the call response message returned by the audio-only terminal shows that it does not support G.711, session establishment fails and the video networking terminal cannot pull the audio-only terminal into the conference. If G.711 is supported, session establishment succeeds, and the XMCU needs to transcode between AAC and G.711 audio, thereby ensuring audio interworking between both parties.
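The decision in steps 707 to 709 can be sketched as a simple check on the formats carried in the call response. The function names and the string identifiers for messages are hypothetical, and representing the formats as lists of codec names is an assumption; real H.323/SIP capability parsing is not modeled.

```python
from typing import Iterable, List

SPECIFIED_AUDIO_CODEC = "G.711"  # the only audio format offered by the virtual terminal


def formats_are_specified(encoding_formats: Iterable[str],
                          decoding_formats: Iterable[str]) -> bool:
    """Step 707: both directions must include the specified codec."""
    return (SPECIFIED_AUDIO_CODEC in set(encoding_formats)
            and SPECIFIED_AUDIO_CODEC in set(decoding_formats))


def handle_call_response(encoding_formats: Iterable[str],
                         decoding_formats: Iterable[str]) -> List[str]:
    """Return the (hypothetically named) messages the gateway would emit."""
    if not formats_are_specified(encoding_formats, decoding_formats):
        return ["session_failed"]                       # step 708
    # Step 709: confirm toward the internet terminal, respond toward the core server.
    return ["call_confirmation", "conference_communication_response"]
```

When the check succeeds, the gateway would go on to transcode between AAC and G.711 for the duration of the session.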
When the session between the virtual pure audio terminal and the internet pure audio terminal is successfully established, the XMCU can generate an internet H.323/SIP call confirmation message, and the internet H.323/SIP call confirmation message is sent to the internet pure audio terminal through the virtual pure audio terminal to inform the internet pure audio terminal that the session is successfully established and the internet pure audio terminal has already entered a meeting.
And meanwhile, the XMCU can also generate a conference communication response message, the conference communication response message is sent to the video networking core server through the virtual video networking terminal, and the video networking core server sends the conference communication response message to the video networking terminal so as to inform the video networking terminal that the session is successfully established and the Internet audio-only terminal has already entered a meeting.
Step 710, when receiving video and audio data sent by a video networking core server, converting the video and audio data into audio data according to the communication protocol, and sending the audio data to the internet pure audio terminal;
and 711, when receiving the audio data sent by the internet pure audio terminal, converting the audio data according to a preset video networking protocol, sending the converted audio data to a video networking core server, and sending a preset video stream to the video networking core server.
After the session is successfully established, the audio and video data protocol conversion process of the conference process can be started. As shown in fig. 8, video networking audio and video data packets are transmitted between the video networking terminal and the video networking core server, video networking audio and video data packets are also transmitted between the video networking core server and the virtual video networking terminal of the XMCU, and pure audio data packets are transmitted between the virtual pure audio terminal of the XMCU and the internet pure audio terminal.
Specifically, when the XMCU receives the video and audio data sent by the video networking core server, the video and audio data may be converted into audio data according to a communication protocol that can be recognized by the audio-only terminal, and the audio data may be sent to the internet audio-only terminal.
When the XMCU receives the audio data sent by the internet audio-only terminal, the audio data can be converted into a video networking data packet according to a preset video networking protocol (such as a V2V protocol) and sent to the video networking core server, and the video networking data packet is sent to the video networking terminal by the video networking core server.
Meanwhile, in order to avoid I frame request messages from other conference terminals, the virtual video networking terminal on the XMCU can send a preset video stream to the video networking core server, and the video networking core server forwards the video stream to the video networking terminals, so as to inform the participant terminals that the docked terminal is a pure audio terminal and prevent them from continuously sending I frame request messages. In addition, the video stream may contain I frame information, so that a participant terminal can decode and display normally based on the I frame information after receiving the video stream. The participant terminal therefore does not send an I frame request message, which prevents the XMCU from continuously receiving and processing I frame request messages.
In one embodiment, the video stream may be generated as follows: the specified image data is packetized into a video stream in H.264 format.
In particular implementations, the video stream may include video networking V2V 2002 instructions.
As an example, the specified image data may be a picture of a video networking LOGO.
In the embodiment of the invention, the session between the video networking terminal and the pure audio terminal can be established through the XMCU, so that when the video networking conference is docked with the internet pure audio terminal, unnecessary network bandwidth load and extra communication signaling processing are reduced, unnecessary I frame requests are reduced, the flow remains optimized, and seamless docking with third-party terminals is ensured.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily required by the embodiments of the present invention.
Referring to fig. 10, a block diagram of a communication system according to an embodiment of the present invention is shown, where the communication system may be applied to a video network, where the video network may include a video network core server and a video network conference gateway, and the system may specifically include the following modules:
a communication protocol determining module 1001, configured to determine a communication protocol of an internet audio-only terminal when a video networking conference gateway detects that the internet audio-only terminal accesses a conference;
the video-audio conversion module 1002 is configured to, when receiving video-audio data sent by a video networking core server, convert the video-audio data into audio data according to the communication protocol, and send the audio data to the internet pure audio terminal;
and the audio conversion module 1003 is configured to, when receiving audio data sent by the internet pure audio terminal, convert the audio data according to a preset video networking protocol and send the converted audio data to a video networking core server, and send a preset video stream to the video networking core server.
In a preferred embodiment of the present invention, the communication protocol determining module 1001 may include the following sub-modules:
the video networking conference communication request receiving submodule is used for receiving a video networking conference communication request sent by a video networking core server, and the video networking conference communication request comprises a video networking number;
the media type determination submodule is used for searching the media type of the terminal corresponding to the video networking number from a preset mapping relation table;
and the communication protocol acquisition submodule is used for judging that the terminal is an internet pure audio terminal if the media type is a pure audio type, and acquiring a communication protocol corresponding to the video networking number from the mapping relation table.
In a preferred embodiment of the present invention, the system may further include the following modules:
the call request acquisition module is used for carrying out protocol conversion on the video networking conference communication request by adopting the communication protocol to obtain a call request;
and the call request sending module is used for sending the call request to a corresponding internet pure audio terminal, wherein the call request carries a first audio coding format and a first audio decoding format which can be identified by the video networking terminal.
In a preferred embodiment of the present invention, the video networking conference gateway includes a virtual video networking terminal and a virtual audio-only terminal; the system may further include the following modules:
a call response message receiving module, configured to receive a call response message returned by the internet audio only terminal, where the call response message carries a second audio encoding format and a second audio decoding format that can be identified by the internet audio only terminal;
the coding and decoding format judging module is used for judging whether the second audio coding format and the second audio decoding format are the specified coding mode and decoding mode; if not, calling a session failure module, and if so, calling a session success module;
the session failure module is used for judging that the session establishment fails;
and the session success module is used for establishing a communication session between the virtual pure audio terminal and the internet pure audio terminal, generating a call confirmation message, sending the call confirmation message to the internet pure audio terminal through the virtual pure audio terminal, generating a conference communication response message, and sending the conference communication response message to the video networking core server through the virtual video networking terminal.
In a preferred embodiment of the present invention, the system may further include the following modules:
and the mapping relation table establishing module is used for establishing a mapping relation table.
In a preferred embodiment of the present invention, the mapping relationship table establishing module may include the following sub-modules:
the video networking number determining submodule is used for determining the video networking number corresponding to the registered virtual video networking terminal;
the virtual pure audio terminal distribution submodule is used for respectively distributing a virtual pure audio terminal which carries out point-to-point communication with the registered virtual video network terminal to the registered virtual video network terminal;
and the configuration file writing submodule is used for generating the binding relationship between the video networking number and the terminal information of the virtual pure audio terminal through a configuration file.
In another preferred embodiment of the present invention, the mapping relationship table establishing module may include the following sub-modules:
the resource acquisition request receiving submodule is used for receiving a resource acquisition request sent by a video networking core server, and the resource acquisition request comprises terminal information of an internet pure audio terminal;
the idle virtual video network terminal query submodule is used for querying an idle virtual video network terminal if the terminal corresponding to the terminal information is judged to be a pure audio terminal;
the idle video network number acquisition sub-module is used for acquiring the video network number corresponding to the idle virtual video network terminal;
and the binding relation generation submodule is used for generating the binding relation between the video networking number and the terminal information and storing the binding relation in a mapping relation table.
In a preferred embodiment of the present invention, the system may further include the following modules:
and the binding relation removing module is used for removing the binding relationship after the conference ends and setting the state of the virtual video networking terminal corresponding to the binding relationship to an idle state.
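The release step can be sketched as follows. The function name, the argument names, and the "idle" and "busy" state labels are hypothetical and serve only to illustrate removing a binding relationship and returning the corresponding virtual terminal to the idle state.

```python
def release_bindings(mapping_table, terminal_state, conference_numbers):
    """Remove the bindings used by a finished conference.

    mapping_table: video networking number -> terminal information.
    terminal_state: video networking number -> "idle" or "busy".
    conference_numbers: numbers that took part in the finished conference.
    Returns the numbers whose bindings were actually removed.
    """
    released = []
    for number in conference_numbers:
        # Numbers without a binding are skipped and left unchanged.
        if mapping_table.pop(number, None) is not None:
            terminal_state[number] = "idle"
            released.append(number)
    return released
```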
Since the system embodiment of fig. 10 is basically similar to the method embodiment described above, it is described only briefly; for relevant points, reference may be made to the corresponding parts of the description of the method embodiment.
In addition, an embodiment of the invention also discloses a video networking conference system, which comprises a memory, a processor, and a computer program that is stored on the memory and can run on the processor, wherein the processor implements the steps of the method of the above embodiments when executing the program.
In addition, an embodiment of the invention also discloses a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of the above embodiments.
The embodiments in the present specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The communication method and the communication system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the descriptions of the above examples are only intended to help in understanding the method and the core idea of the present invention. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the scope of application according to the idea of the present invention. In summary, the content of the present specification should not be construed as a limitation on the present invention.