
Playing data processing method and device, computer equipment and storage medium

Info

Publication number
CN115883855A
Authority
CN
China
Prior art keywords
data
data stream
frame
stream
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111123072.1A
Other languages
Chinese (zh)
Other versions
CN115883855B (en)
Inventor
吴昊
张亮
肖志宏
李玉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111123072.1A
Publication of CN115883855A
Application granted
Publication of CN115883855B
Legal status: Active
Anticipated expiration

Abstract

The application relates to a playing data processing method and apparatus, a computer device, and a storage medium. The method comprises the following steps: receiving a first data stream corresponding to a target playing object from a first transmission channel, and caching the first data stream through a first data cache region; receiving a second data stream from a second transmission channel, and caching the second data stream through a second data cache region; storing the data stream in the first data cache region into an aggregation cache region; when it is determined that the first data stream has data loss, switching the data stream cache source corresponding to the aggregation cache region from the first data cache region to the second data cache region, and determining an aggregation starting position corresponding to the second data stream according to the data loss position; storing the data stream in the second data cache region into the aggregation cache region starting from the aggregation starting position; and obtaining playing data corresponding to the target playing object based on the aggregated data stream in the aggregation cache region. By adopting the method, the fluency of video playing can be improved.

Description

Playing data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing playing data, a computer device, and a storage medium.
Background
With the development of internet technology, streaming media technology has emerged. Streaming media refers to a media format, such as audio or video, that is played continuously and in real time over a network using streaming technology. Streaming media technology can be applied in many scenarios, for example, live broadcasting. In streaming media technology, data loss often occurs during transmission, which causes playback anomalies, for example, stalls in playing a live video.
In the conventional technology, playback is rolled back or jumped forward when data is lost, which results in low fluency of data playing.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a playing data processing method, apparatus, computer device, and storage medium capable of improving the fluency of data playing.
A method of processing playback data, the method comprising: receiving a first data stream corresponding to a target playing object from a first transmission channel, and caching the first data stream through a first data cache region; receiving a second data stream corresponding to the target playing object from a second transmission channel, and caching the second data stream through a second data cache region; taking the first data cache region as a data stream cache source corresponding to an aggregation cache region, and storing the data stream in the first data cache region into the aggregation cache region; when it is determined that the first data stream has data loss, switching the data stream cache source corresponding to the aggregation cache region from the first data cache region to the second data cache region, determining a data loss position corresponding to the first data stream, and determining an aggregation starting position corresponding to the second data stream according to the data loss position; storing the data stream in the second data cache region into the aggregation cache region, starting from the aggregation starting position corresponding to the second data stream; and obtaining the playing data corresponding to the target playing object based on the aggregated data stream in the aggregation cache region.
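To make the claimed flow concrete, here is a minimal Python sketch of the two channel buffers and the aggregation step. All names (Frame, Aggregator, and so on) are hypothetical, the streams are modeled as dictionaries keyed by play time, and gap detection is reduced to a missing-key check rather than the full position-determination strategy described later.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Frame:
    pts: int           # play (display) time stamp
    is_keyframe: bool  # True for an intra-coded (I) frame
    payload: bytes = b""

class Aggregator:
    """Toy model of the first/second data cache regions and the aggregation cache region."""

    def __init__(self, primary: Dict[int, Frame], backup: Dict[int, Frame], interval: int):
        self.primary = primary    # first data cache region, keyed by pts
        self.backup = backup      # second data cache region, keyed by pts
        self.interval = interval  # standard interval between adjacent frames
        self.merged: List[Frame] = []  # aggregation cache region

    def run(self, start_pts: int, end_pts: int) -> List[Frame]:
        source, pts = self.primary, start_pts
        while pts <= end_pts:
            frame = source.get(pts)
            if frame is None and source is self.primary:
                # Data loss in the first stream: switch the cache source to the
                # backup buffer and restart from an aggregation starting position
                # derived from the data loss position.
                source = self.backup
                pts = self._aggregation_start(pts)
                continue
            if frame is None:
                break  # missing in both streams; stop aggregating
            self.merged.append(frame)
            pts += self.interval
        return self.merged

    def _aggregation_start(self, missing_pts: int) -> int:
        # Start at the loss position if the backup frame there is a keyframe,
        # otherwise advance to the next keyframe (see the S208 discussion below).
        pts = missing_pts
        while pts in self.backup and not self.backup[pts].is_keyframe:
            pts += self.interval
        return pts
```

The keyframe rule in `_aggregation_start` corresponds to the different-encoder case described in the embodiments below; when both streams come from the same encoder, the loss position itself can be used directly.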
A playback data processing apparatus, the apparatus comprising: a first data stream receiving module, configured to receive a first data stream corresponding to a target playing object from a first transmission channel and cache the first data stream through a first data cache region; a second data stream receiving module, configured to receive a second data stream corresponding to the target playing object from a second transmission channel and cache the second data stream through a second data cache region; a first data stream aggregation module, configured to take the first data cache region as a data stream cache source corresponding to an aggregation cache region and store the data stream in the first data cache region into the aggregation cache region; an aggregation start position determining module, configured to switch the data stream cache source corresponding to the aggregation cache region from the first data cache region to the second data cache region when it is determined that the first data stream has data loss, determine a data loss position corresponding to the first data stream, and determine an aggregation starting position corresponding to the second data stream according to the data loss position; a second data stream aggregation module, configured to store the data stream in the second data cache region into the aggregation cache region starting from the aggregation starting position corresponding to the second data stream; and a playing data obtaining module, configured to obtain playing data corresponding to the target playing object based on the aggregated data stream in the aggregation cache region.
In some embodiments, the first data stream and the second data stream are encoded data streams; the aggregation start position determining module includes: a target data position determining unit, configured to obtain a target data position corresponding to the data loss position in the second data stream; a frame coding type obtaining unit, configured to obtain a frame coding type of a target data frame corresponding to the target data position in the second data stream; and an aggregation start position determining unit, configured to determine a position determination strategy according to the frame coding type, and determine the aggregation starting position corresponding to the second data stream according to the target data position and the position determination strategy.
In some embodiments, the aggregation start position determining unit is further configured to: when the frame coding type is not a coding reference frame, determine that the position determination strategy is a coded data group skipping strategy; and, based on the coded data group skipping strategy, skip the target coded data group corresponding to the target data position, and take the position of the coding reference frame in the coded data group following the target coded data group as the aggregation starting position corresponding to the second data stream.
In some embodiments, the aggregation start position determining unit is further configured to: when the frame coding type is a coding reference frame, determine that the position determination strategy is a position keeping strategy; and, based on the position keeping strategy, take the target data position as the aggregation starting position corresponding to the second data stream.
In some embodiments, the playing data obtaining module includes: a target data length acquisition unit, configured to acquire a target data length corresponding to a transcoding data group; an updated aggregated data stream obtaining unit, configured to insert reference indication frames into the aggregated data stream in the aggregation cache region according to the target data length to obtain an updated aggregated data stream; a transcoding data stream obtaining unit, configured to determine a transcoding reference frame based on the reference indication frame in the transcoding process, and transcode, based on the transcoding reference frame, the transcoding data group in the updated aggregated data stream where the transcoding reference frame is located to obtain a transcoded data stream; and a playing data obtaining unit, configured to obtain the playing data corresponding to the target playing object according to the transcoded data stream.
In some embodiments, the transcoding data stream obtaining unit is further configured to: decode the updated aggregated data stream to obtain a decoded data stream; in the process of encoding the decoded data stream, when a reference indication frame is detected, take the backward adjacent data frame of the reference indication frame as a transcoding reference frame, and perform intra-frame coding on the transcoding reference frame to obtain an intra-coded frame; and transcode the transcoding data group where the transcoding reference frame is located based on the intra-coded frame to obtain the transcoded data in the transcoded data stream, where the transcoding data group where the transcoding reference frame is located comprises data frames of the target data length.
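As a loose illustration of this transcoding step, the sketch below inserts a marker object every `group_len` frames and, in a mock encode pass, forces the frame immediately after each marker to be intra-coded. The marker object, the group length, and the "I"/"P" labels are stand-ins, not the patent's actual reference indication frame format.

```python
REF_MARKER = object()  # stand-in for the reference indication frame

def insert_ref_markers(frames: list, group_len: int) -> list:
    """Insert a marker before every group of `group_len` frames (the target data length)."""
    out = []
    for i, frame in enumerate(frames):
        if i > 0 and i % group_len == 0:
            out.append(REF_MARKER)
        out.append(frame)
    return out

def mock_encode(frames: list) -> list:
    """Markers are consumed; the frame right after a marker becomes an I frame."""
    encoded, force_intra = [], True  # the first frame of a stream is intra-coded
    for frame in frames:
        if frame is REF_MARKER:
            force_intra = True   # the backward adjacent frame is the transcoding reference frame
            continue             # the marker itself is not emitted into the output
        encoded.append(("I" if force_intra else "P", frame))
        force_intra = False
    return encoded
```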
In some embodiments, the first data stream and the second data stream are encoded data streams, and the second data stream aggregation module includes: an information switching indication frame generating unit, configured to acquire a second coding parameter corresponding to the second data stream, and generate an information switching indication frame based on the second coding parameter; the information switching indication frame is used for indicating that in the decoding process, if the information switching indication frame is detected, the second coding parameter is used as a new coding parameter, and the backward data stream of the information switching indication frame is decoded based on the second coding parameter; and the data stream aggregation unit is used for inserting the information switching indication frame into the tail end position of the first data stream in the aggregation buffer area, storing the data stream in the second data buffer area into the aggregation buffer area from the aggregation initial position corresponding to the second data stream, and using the data stream as the backward data stream of the information switching indication frame.
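A minimal sketch of the splice this unit performs, with streams as plain Python lists and the indication frame as a dict; in a real codec the switch frame might be carried as, say, an SEI message with new sequence parameters, which is an assumption here rather than something this paragraph specifies.

```python
def splice_streams(first_part: list, second_stream: list,
                   start_idx: int, second_params: dict) -> list:
    """Append an information switching indication frame carrying the second
    stream's coding parameters at the tail of the first stream's data in the
    aggregation buffer, then append the second stream from the aggregation
    start position onward."""
    switch_frame = {"type": "param_switch", "params": second_params}
    return first_part + [switch_frame] + second_stream[start_idx:]
```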
In some embodiments, the playing data obtaining module includes: a first decoding unit, configured to decode based on a first encoding parameter corresponding to the first data stream in a process of decoding the aggregated data stream; the second decoding unit is used for extracting a second coding parameter from the information switching indication frame when the information switching indication frame is detected, switching the coding parameter referred to by decoding from the first coding parameter to the second coding parameter, and decoding based on the second coding parameter; and the coding unit is used for uniformly coding the decoded data stream obtained by decoding to obtain the playing data corresponding to the target playing object.
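And the matching decode side, continuing the same toy representation: when the indication frame is encountered, the decoder swaps in the embedded coding parameters for everything after it. `decode_frame` is a placeholder for a real decoder call.

```python
def decode_frame(frame, params):
    # Placeholder for a real decoder invocation; here it just tags the frame
    # with the parameters it was decoded under.
    return (frame, params)

def decode_stream(frames: list, first_params: dict) -> list:
    params, decoded = first_params, []
    for frame in frames:
        if isinstance(frame, dict) and frame.get("type") == "param_switch":
            params = frame["params"]  # switch coding parameters mid-stream
            continue
        decoded.append(decode_frame(frame, params))
    return decoded
```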
In some embodiments, the apparatus further comprises a live broadcast view angle set acquisition module, configured to acquire a live broadcast view angle set corresponding to a target live broadcast scene and establish a cache group corresponding to each live broadcast view angle in the set. The cache group comprises a data cache region corresponding to each live broadcast device under that view angle and an aggregation cache region; each live broadcast view angle corresponds to a plurality of live broadcast devices. Taking the data cache region corresponding to any live broadcast device in the cache group as the first data cache region, the aggregation start position determining module is further configured to, when it is determined that the first data stream has data loss, select a data cache region other than the first data cache region from the cache group to which the first data cache region belongs as the second data cache region, and switch the data stream cache source corresponding to the aggregation cache region from the first data cache region to the second data cache region.
In some embodiments, the target playing object is a target live video corresponding to a target live scene, and the first data stream receiving module is further configured to establish a first transmission channel with a first shooting device corresponding to the target live scene, and receive a first video stream transmitted by the first shooting device through the first transmission channel and a target scene identifier corresponding to the target live scene; taking the first video stream as a main video stream corresponding to the target live video based on the target scene identification; the second data stream receiving module is further configured to establish a second transmission channel with a second shooting device corresponding to the target live broadcast scene, and receive a second video stream transmitted by the second shooting device through the second transmission channel and a target scene identifier corresponding to the target live broadcast scene; and taking the second video stream as a backup video stream corresponding to the target live video based on the target scene identification.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the above-described play data processing method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned play data processing method.
In some embodiments, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
According to the above playing data processing method and apparatus, computer device, and storage medium, the first data stream of the target playing object is cached through the first data cache region, the second data stream of the target playing object is cached through the second data cache region, and the data stream in the first data cache region is stored into the aggregation cache region. When it is determined that the first data stream has data loss, the data stream cache source corresponding to the aggregation cache region is switched from the first data cache region to the second data cache region, the aggregation starting position corresponding to the second data stream is determined according to the data loss position corresponding to the first data stream, and the data stream in the second data cache region is stored into the aggregation cache region starting from that position. The data streams corresponding to the same target playing object are transmitted through the first transmission channel and the second transmission channel respectively and cached in the cache spaces corresponding to the respective channels. The aggregation cache region obtains the data stream with the first data cache region as its data stream cache source; when the data to be cached from that source is abnormal, the source can be switched instantly and accurately to the second data cache region, the aggregation starting position corresponding to the second data stream is determined based on the data missing position, and caching proceeds from that position, reducing missing data in the aggregation cache region. The playing data corresponding to the target playing object is obtained based on the aggregated data stream in the aggregation cache region, which improves the integrity of the playing data and the fluency of data playing.
Drawings
FIG. 1 is a diagram of an application environment of a method for processing broadcast data in some embodiments;
FIG. 2 is a flow diagram illustrating a method for processing broadcast data in some embodiments;
FIG. 3 is a graph of the relationship between the time stamps of data frames and the order in which the data frames are arranged in some embodiments;
FIG. 4 is a schematic diagram of obtaining an aggregated data stream in some embodiments;
FIG. 5 is a block diagram of a live streaming framework in some embodiments;
FIG. 6 is a schematic diagram of main-stream/standby-stream switching in live streaming media technology in some embodiments;
FIG. 7 is a schematic diagram of main-stream/standby-stream switching in live streaming media technology in some embodiments;
FIG. 8 is a schematic diagram of main-stream/standby-stream switching in live streaming media technology in some embodiments;
FIG. 9 is a schematic diagram of main-stream/standby-stream switching in live streaming media technology in some embodiments;
FIG. 10 is a schematic diagram of main-stream/standby-stream switching in some embodiments;
FIG. 11A is a schematic diagram of the frame format of an SEI frame in some embodiments;
FIG. 11B is a schematic diagram of obtaining a transcoded data stream in some embodiments;
FIG. 12 is a block diagram of a playback data processing apparatus in some embodiments;
FIG. 13 is a diagram of the internal structure of a computer device in some embodiments;
FIG. 14 is a block diagram that illustrates the internal components of a computing device in some embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The play data processing method provided by the present application can be applied to the application environment shown in fig. 1, where the application environment includes the first terminal 102, the server 104, and the second terminal 106. The first terminal 102, the server 104, and the second terminal 106 communicate with each other via a network. Various applications may be installed on the first terminal 102 and the second terminal 106, for example, an application for live broadcast. The first terminal 102 may send the collected live broadcast data to the server 104, and may transmit the collected live broadcast data to the server 104 through different transmission channels, for example, through 2 or more transmission channels. The server 104 may also receive live broadcast data transmitted by another terminal, where the other terminal and the first terminal 102 may be shooting devices in the same live broadcast scene, for example, multimedia capturing devices in a live broadcast scene of the same concert; a multimedia capturing device may have at least one of a video capturing function and an audio capturing function. The second terminal 106 may be a terminal that watches the live broadcast. The server 104 may receive live data transmitted by the same first terminal 102 through different transmission channels, process the live data from the different transmission channels to obtain processed live data, and send the processed live data to the second terminal 106 that watches the live broadcast.
Specifically, the server 104 may receive, from the first transmission channel, a first data stream corresponding to the target playback object sent by the first terminal 102, and receive, through the second transmission channel, a second data stream corresponding to the target playback object sent by the first terminal 102. The server 104 may cache the first data stream through the first data cache region, cache the second data stream through the second data cache region, and aggregate the data streams into the aggregation cache region based on the first data stream and the second data stream. For example, the server 104 may take the first data cache region as the data stream cache source corresponding to the aggregation cache region and store the data stream in the first data cache region into the aggregation cache region; when it is determined that the first data stream has data loss, switch the data stream cache source corresponding to the aggregation cache region from the first data cache region to the second data cache region, determine a data loss position corresponding to the first data stream, determine an aggregation starting position corresponding to the second data stream according to the data loss position, and store the data stream in the second data cache region into the aggregation cache region starting from the aggregation starting position corresponding to the second data stream. The server 104 may transcode the aggregated data stream in the aggregation buffer to obtain a transcoded data stream, and may send the transcoded data stream to the second terminal 106. There may be a plurality of second terminals 106, and the server 104 may distribute the processed data streams to the respective second terminals 106. The server 104 may be, for example, the server on which the streaming media background is hosted.
In some embodiments, the first terminal 102 may be a terminal corresponding to a live broadcast user, which may also be referred to as an anchor terminal or an audio/video data source; the second terminal 106 may be a terminal corresponding to a user watching the live broadcast, which may also be referred to as a viewer terminal. During a user's live broadcast, the first terminal 102 may collect audio data or video data, encode the collected audio or video data, and send the encoded data to the server 104; the server 104 may transcode the received encoded audio or video data and send the transcoded data to the second terminal 106.
The terminal 102 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, and the like, but is not limited thereto. The server 104 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In some embodiments, as shown in fig. 2, a playing data processing method is provided, which is described by taking the method as an example applied to the server 104 in fig. 1, and includes the following steps:
s202, receiving a first data stream corresponding to the target playing object from the first transmission channel, and caching the first data stream through the first data cache region.
The playing object refers to an object to be played, and may be video data or audio data to be played, for example, video data or audio data in a live broadcast. When the playing object is video data, it may include a plurality of video frames, which may also be referred to as image frames; when the playing object is audio data, it may include a plurality of audio frames. Audio frames and image frames may be collectively referred to as data frames, that is, a data frame may be at least one of an audio frame or an image frame. A data frame in the playing object may also be an encoded data frame, that is, data obtained through encoding: an encoded video frame obtained by encoding a collected video frame, or an encoded audio frame obtained by encoding a collected audio frame. Each encoded data frame in the playing data may correspond to a playing time, which represents the order of the playing data; the earlier the playing time, the earlier the frame is played. When the data frame is a video frame, the playing time may be a display time, for example a Presentation Time Stamp (PTS), and the encoded data frames in the playing object may be arranged in chronological order: the earlier the presentation time, the earlier the position of the encoded data frame in the playing object. Each encoded data frame in the playing data may also have a decoding time, for example a Decoding Time Stamp (DTS). In the playing data, different data frames have different display times and different decoding times. The decoding time stamp indicates the decoding order of a data frame and the display time stamp indicates its display order; the earlier the time stamp, the earlier the frame in the corresponding order. As shown in fig. 3, a GOP including 15 video frames is shown; it can be seen that the display order coincides with the order of PTS and the decoding order coincides with the order of DTS. The target play object may be an arbitrary play object. Data frames may also correspond to sequence numbers, for example, sequence numbers in RTP.
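To illustrate the PTS/DTS relationship described above, the toy values below show a four-frame group "I P B B" in which the P frame is decoded before the two B frames that display ahead of it; the numbers are invented for illustration only.

```python
# Decode order (DTS) vs display order (PTS) for a short group "I P B B",
# where both B frames reference the following P frame.
frames = [
    {"type": "I", "dts": 0, "pts": 0},
    {"type": "P", "dts": 1, "pts": 3},  # decoded early because the B frames need it
    {"type": "B", "dts": 2, "pts": 1},
    {"type": "B", "dts": 3, "pts": 2},
]
decode_order = [f["type"] for f in sorted(frames, key=lambda f: f["dts"])]   # ['I', 'P', 'B', 'B']
display_order = [f["type"] for f in sorted(frames, key=lambda f: f["pts"])]  # ['I', 'B', 'B', 'P']
```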
The data stream refers to the form of the target playing object in the transmission process, and the data stream includes the data in the target playing object. The first data stream refers to data received through the first transmission channel.
The data buffer is a storage space for buffering data, and may be, for example, a block of storage area in the memory. The first data buffer area is used for buffering the data stream transmitted from the first transmission channel. The data stored in the data buffer may be updated continuously, for example, during a first time period, the data stream received during the first time period is stored in the data buffer, and during a second time period, the data stream received during the second time period is stored in the data buffer. The data stream received in the target time duration may be stored in the data buffer, and the target time duration may be preset or set as needed, and may be, for example, the data stream received in 1 minute. The first data buffer may be created when the server determines that the first transmission channel is established, or the first data buffer may be created when the first transmission channel is determined to have data transmission. The server may establish a correspondence between the first transmission channel and the first data cache. The data frames in the first data buffer may be arranged according to a playing time, and the earlier the playing time, the earlier the position in the first data buffer.
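A minimal sketch of such a rolling buffer, assuming frames arrive in play-time order and the target duration is one minute; the dict-based frame shape and the window constant are illustrative, not from the source.

```python
from collections import deque

class RollingBuffer:
    """Keeps only the frames received within the target duration (e.g. one minute)."""

    def __init__(self, window_ms: int = 60_000):
        self.window_ms = window_ms
        self.frames = deque()  # frames ordered by play time

    def push(self, frame: dict) -> None:
        self.frames.append(frame)
        # Evict frames whose play time falls outside the target duration.
        while self.frames and frame["pts"] - self.frames[0]["pts"] > self.window_ms:
            self.frames.popleft()
```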
Specifically, the first transmission channel may be a channel for transmitting data, which is established between the first terminal and the server, the target playback object is a playback object obtained by the first terminal, the first terminal may transmit the target playback object to the server through the first transmission channel, and the server may receive a first data stream corresponding to the target playback object transmitted by the first transmission channel, and store the first data stream in the first data cache region. For example, a data stream receiving module corresponding to the first transmission channel may be provided in the server, and the data stream receiving module corresponding to the first transmission channel is referred to as a first data stream receiving module. When the server acquires a first data stream corresponding to the target playing object, the server may receive the data stream transmitted by the first transmission channel through the first data stream receiving module, and store the received data stream in the first data cache region.
In some embodiments, the first terminal is a first shooting device in a target live broadcast scene. The first terminal may capture the target live broadcast scene to obtain first scene data, and encode the first scene data to obtain the target playing object. The shooting device may have a function of acquiring audio data or video data. For example, for video data, the first terminal may encode the video data into the H.264 format or the H.265 format. H.264 and H.265 are video coding standards: highly compressed digital video codec standards proposed by the Joint Video Team (JVT), formed by the ITU-T (ITU Telecommunication Standardization Sector) Video Coding Experts Group (VCEG) together with the ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission) Moving Picture Experts Group (MPEG). For audio data, the first terminal may encode the audio data into the AAC (Advanced Audio Coding) format or the MP3 (MPEG Audio Layer III) format. AAC is an audio coding technique based on MPEG-2 and is a mainstream audio coding format in current live streaming media.
In some embodiments, the target playing object may be data obtained by encoding and encapsulating the data acquired by the first terminal. Specifically, the first terminal may encode the acquired data and encapsulate the encoded data to obtain the target playing object. For example, the first terminal may encapsulate the encoded data into the FLV format. FLV (Flash Video) is a streaming media format developed along with the promotion of Flash MX and can be applied to live streaming media technology; an FLV file consists of a series of tags encapsulating audio, video, media description information, and the like. The server may decapsulate, decode, re-encode, and re-encapsulate the first data stream and send it to the second terminals; for example, the server may repackage the data into RTMP/FLV/HLS or DASH and distribute it to the second terminals for playing.
In some embodiments, the first terminal may transmit the first data stream to the server through a preset transport protocol, which may be at least one of RTMP, RTP, SRT, HLS, or DASH. RTMP (Real Time Messaging Protocol) is a network protocol for real-time data communication, mainly used for audio/video and data communication between a Flash/AIR platform (a cross-operating-system runtime) and a streaming media or interactive server supporting the RTMP protocol. RTP (Real-time Transport Protocol) is used in streaming media systems in cooperation with the RTSP (Real Time Streaming Protocol) protocol, or directly carries TS (Transport Stream) streams; it is also used in video conference systems and is a technical basis of the IP telephone industry. The RTP protocol is used together with the RTP Control Protocol, RTCP (Real-time Transport Control Protocol), and is built on the UDP (User Datagram Protocol) protocol. SRT (Secure Reliable Transport) is a royalty-free, open-source, UDP-based transport protocol formulated by Haivision together with Wowza; it aims to safely and reliably address the high latency and poor jitter resistance of TCP (Transmission Control Protocol) on long-distance links and to optimize live streaming media scenarios. HLS (HTTP Live Streaming, an HTTP-based adaptive bitrate streaming protocol) is Apple's dynamic bitrate adaptation technology, mainly used for audio and video services on PCs (personal computers) and Apple terminals, and consists of an m3u8 index file, TS media segment files, and a key file. HTTP (Hypertext Transfer Protocol) refers to the hypertext transfer protocol. DASH (Dynamic Adaptive Streaming over HTTP) is mainly used to effectively distribute content in an adaptive, progressive, download or streaming manner over the HTTP protocol. For example, the first terminal may encode at least one of audio data or video data, encapsulate the encoded data, and transmit the encapsulated data to the server.
And S204, receiving a second data stream corresponding to the target playing object from the second transmission channel, and caching the second data stream through a second data cache region.
The second data stream is the data stream corresponding to the target playing object received through the second transmission channel. The target playing object may include data generated by a plurality of data stream pushing devices; data generated by the same data stream pushing device may be output by the same encoder, and data generated by different data stream pushing devices may be output by different encoders. A data stream pushing device may have functions of data acquisition, data processing, and pushing encoded data to the server in the form of a data stream. The data stream pushing device may be, for example, the first terminal; for example, when the target playing object is a target live video in a target live scene, the target live scene includes multiple data stream pushing devices (for example, shooting devices), each of which may generate live data to be sent to the server, so the target playing object may include data generated by each of the multiple data stream pushing devices. The first terminal may be a data stream pushing device in the target live scene. The first data stream and the second data stream may be two data streams pushed by the same data stream pushing device, for example, two data streams pushed by the first terminal. They may also be two data streams pushed by different data stream pushing devices, for example, the first data stream is pushed by the first terminal and the second data stream is pushed by a third terminal, where the third terminal is different from the first terminal and may be, for example, another data stream pushing device in the target live scene. The second data buffer region is used for buffering the data stream transmitted through the second transmission channel. The data frames in the second data buffer may be arranged according to playing time; the earlier the playing time, the earlier the position in the second data buffer. The second data buffer is different from the first data buffer. Since data loss may occur during transmission, data in either buffer may be missing: the first data buffer may contain data that is missing from the second data buffer, and the second data buffer may contain data that is missing from the first data buffer. The second data buffer may be set up when the server determines that the second transmission channel is established, or when the server determines that the second transmission channel has data transmission. The server may establish a correspondence between the second transmission channel and the second data cache.
Specifically, the server may receive a second data stream corresponding to the target playing object transmitted by the second transmission channel, and store the second data stream in the second data buffer. The server may be provided with a data stream receiving module corresponding to the second transmission channel, and the data stream receiving module corresponding to the second transmission channel is referred to as a second data stream receiving module. The server may receive the data stream transmitted by the second transmission channel through the second data stream receiving module, and store the received data stream in the second data buffer.
And S206, taking the first data cache region as a data stream cache source corresponding to the convergence cache region, and storing the data stream in the first data cache region into the convergence cache region.
The data stream cache source refers to the source of the data in the aggregation cache region. At different times, the data stream cache source corresponding to the aggregation cache region may be the same or different; at any one time, the aggregation cache region corresponds to one data stream cache source. The data stream cache source corresponding to the aggregation cache region may change over time; for example, when the data in the first data buffer is discontinuous, the cache source may be switched to another data buffer. The aggregation cache may be preset, or created when the server receives an aggregation cache creation instruction. A data stream may correspond to a data stream type, which may include at least one of a main data stream or a backup data stream; the main data stream may also be referred to as the main stream, and the backup data stream as the standby stream. When determining the data stream cache source, the main data stream may be used preferentially; for example, when the first data stream is the main stream and the second data stream is the standby stream, the first data stream may be preferentially used as the data stream cache source. Of course, one of the main stream and the standby stream may also be selected at random as the data stream cache source. The server can switch the data stream cache source in real time according to the stream state and the stream quality.
Specifically, the server may be provided with a convergence module, the convergence module may include a first data cache region, a second data cache region, and a convergence cache region, and the server may converge data of the first data cache region and the second data cache region into the convergence cache region.
In some embodiments, when data is stored in the aggregation buffer, the server may store the data frames according to playing times corresponding to the data frames, where the earlier the playing time is, the earlier the storage order is, that is, the data frames with the earlier playing times are preferentially stored, and in the storage process, when it is determined that a time interval between the playing times of the data frames in the first data buffer is greater than a standard time interval, it is determined that data in the first data buffer is missing, and when it is determined that the data is missing, the server may obtain the missing data frames from other data buffers and store the missing data frames into the aggregation buffer, so that the data in the aggregation buffer is complete. The standard time interval refers to a time interval between two adjacent frames of data frames in a data stream without loss.
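The interval check above can be expressed in a few lines. The sketch below returns the play time of the first missing frame, assuming play times are in milliseconds and the standard interval is the per-frame spacing of a loss-free stream.

```python
from typing import List, Optional

def find_gap(play_times: List[int], standard_interval: int) -> Optional[int]:
    """Return the play time of the first missing frame, or None if nothing is missing."""
    for prev, cur in zip(play_times, play_times[1:]):
        if cur - prev > standard_interval:
            return prev + standard_interval  # the data loss position
    return None

# With a 40 ms frame interval: find_gap([0, 40, 80, 200], 40) -> 120
```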
In some embodiments, the server may count the number of transmission channels corresponding to the target play object, as the number of target channels, and when it is determined that the number of target channels is greater than the threshold of the number of channels, trigger the aggregation buffer creation instruction, and create the aggregation buffer corresponding to the target play object. Wherein the threshold number of channels may be preset or set as desired, for example, may be 2.
And S208, when it is determined that the first data stream has data loss, switching the data stream cache source corresponding to the aggregation cache region from the first data cache region to a second data cache region, determining a data loss position corresponding to the first data stream, and determining an aggregation starting position corresponding to the second data stream according to the data loss position.
Wherein, the data missing position refers to the position of the missing data frame in the first data stream. Missing data frames refer to data frames missing in the first data stream. The playing time corresponding to different data frames in the data stream is different, and the position of the data frame can be identified by the playing time, that is, the data missing position can be identified by the playing time, for example, when the data frame is a video frame, the data missing position can be identified by the display time of the missing data frame. The data missing locations may include one or more data missing locations corresponding to missing data frames.
Specifically, the server may obtain first data frames from the first data buffer according to playing time and store them in the aggregation buffer. For example, the server obtains a current first data frame from the first data buffer, determines, based on the playing time corresponding to the current first data frame, the playing time corresponding to its backward adjacent data frame as the backward adjacent playing time, and obtains, from the first data buffer, the data frame whose playing time is the backward adjacent playing time. When the obtaining succeeds, the server stores the obtained data frame in the aggregation buffer; when the obtaining fails, the server determines that the backward adjacent data frame is missing in the first data buffer, that is, that the first data stream has data loss. The current first data frame may be any data frame in the first data buffer. The backward adjacent data frame corresponding to the current first data frame refers to the data frame that, when no data is missing and the data frames are arranged from front to back by playing time, is arranged after and adjacent to the current first data frame.
In some embodiments, the data frames in the second data cache may be arranged according to play time, the data missing position may be determined by the play time, the play time is different, and the data missing position is different, when the server acquires data from the second data cache, the data frame that is acquired according to the play time corresponding to the data frame may be acquired preferentially, the server may use the data missing position as an aggregation start position, for example, when the first data stream and the second data stream are both data streams pushed to the server by the first terminal, that is, the first data stream and the second data stream are from the same encoder, the server may use the data missing position as the aggregation start position, or the server may calculate the aggregation start position based on the data missing position, for example, a position in the second data cache, where a difference between the data missing position and the data missing position is smaller than a position difference threshold, may be used as the aggregation start position. The position difference threshold may be preset or set as desired, and may be, for example, 1 second. For example, when the first data stream and the second data stream are data streams pushed to the server by different devices, for example, the first data stream is pushed to the server by the first terminal, the second data stream is pushed to the server by the second terminal, the server may obtain data cached in the second data cache area as second cache data, obtain a position of a data frame in the second cache data as a comparison position, calculate a position difference between a data missing position and the comparison position, take the comparison position with the position difference smaller than a position difference threshold as a target data position, and determine a convergence starting position corresponding to the second data stream based on the target data position. For example, the target data position may be used as the aggregation start position or the aggregation start position may be determined based on a frame coding type of the data frame at the target data position, for example, when the frame coding type of the data frame at the target data position is an intra-coded frame, the server may use the target data position as the aggregation start position, and when the frame coding type corresponding to the data frame at the target data position is a frame coding type other than the intra-coded frame, the server may obtain, as the aggregation start position, a position in the second data stream where the data frame whose position is after the target data position and whose frame coding type is the intra-coded frame is located.
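A sketch of this position-matching logic under the different-encoder case: find the backup-stream frame closest to the data missing position within a position difference threshold, then pick the aggregation start by frame coding type. The list-of-dicts stream shape and the closest-candidate tiebreak are assumptions made for illustration.

```python
from typing import List, Optional

def aggregation_start(backup: List[dict], missing_pts: int, max_diff_ms: int) -> Optional[int]:
    """backup: frames with "pts" and "type" ("I", "P" or "B"), sorted by pts."""
    candidates = [f for f in backup if abs(f["pts"] - missing_pts) < max_diff_ms]
    if not candidates:
        return None
    target = min(candidates, key=lambda f: abs(f["pts"] - missing_pts))
    if target["type"] == "I":
        return target["pts"]  # intra-coded frame: start at the target data position
    for f in backup:          # otherwise: first intra-coded frame after the target position
        if f["pts"] > target["pts"] and f["type"] == "I":
            return f["pts"]
    return None
```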
When the data frame is a video frame, the frame coding type may include at least one of an intra-coded frame, a forward predictive coded frame, or a bidirectional predictive interpolation coded frame. The intra-coded frame may also be referred to as a key frame or an I frame, the forward predictive coded frame as a P frame, and the bidirectional predictive interpolation coded frame as a B frame. I frames, P frames, and B frames are data frames obtained by encoding. An I frame contains a complete image and can be decoded using only its own data, without reference to other frames, at the cost of a larger data volume. Decoding a P frame requires reference to frames before it, and decoding a B frame requires reference to frames both before and after it; the data volumes of P frames and B frames are smaller. I frames include normal I frames and IDR (Instantaneous Decoding Refresh) frames. An IDR frame is the beginning of a coded sequence. A coded sequence, also called a group of pictures (GOP), is a set of video frame data in a video stream; in video coding, the length of a group of pictures is the frame interval between two IDR frames. When reading an IDR frame, the decoder refreshes the coding-related parameter information, and frames after the IDR frame are decoded without reference to frames before it.
In some embodiments, the server may obtain a playing time corresponding to a data frame in the first cache data and obtain a playing time corresponding to a data frame in the second cache data, and calibrate the playing time corresponding to the data frame in the second cache data based on the playing time corresponding to the data frame in the first cache data to obtain calibrated second cache data, for example, the server may calibrate the display time stamp corresponding to the data frame in the second cache data based on the display time stamp corresponding to the data frame in the first cache data so that the data frame in the second cache data is unified with the data frame in the first cache data in terms of display time. The first cache data refers to data cached in the first data cache region, and the second cache data refers to data cached in the second data cache region. The first data frame refers to a data frame in the first buffer data, and the second data frame refers to a data frame in the second buffer data.
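A simplified calibration sketch: shift the second stream's display time stamps by a constant offset measured from one pair of frames known to correspond across the two streams. Treating the offset as constant is an assumption; it holds when both sources share a common clock.

```python
from typing import List

def calibrate(second_frames: List[dict], main_ref_pts: int, backup_ref_pts: int) -> List[dict]:
    """Rebase the second stream's pts onto the first stream's timeline."""
    offset = main_ref_pts - backup_ref_pts
    return [{**f, "pts": f["pts"] + offset} for f in second_frames]
```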
In some embodiments, when the server determines that the first data stream has data missing, the data stream cache source corresponding to the aggregation cache region is switched from the first data cache region to the second data cache region, a data missing position corresponding to the first data stream is determined, and an aggregation start position corresponding to the calibrated second cache data is determined according to the data missing position. Specifically, the server may use the data missing position as the aggregation start position, or determine, from the second cache data, a frame coding type of the data frame corresponding to the data missing position, and when the frame coding type is an intra-frame coding frame, use the data missing position as the aggregation start position, and when the frame coding type is not an intra-frame coding frame, determine, from the second cache data, a data frame corresponding to the data missing position as a missing data frame, determine, from the second cache data, an intra-frame coding frame that is located after the missing data frame and has the closest distance to the missing data frame, and use a position corresponding to the intra-frame coding frame as the aggregation start position.
In this embodiment, by using a timestamp-based consistency algorithm, it can be ensured that, with multiple paths, the audio and video data acquired by the stream processing module neither repeats nor goes missing due to jitter of the upstream streams, so that the player does not roll back, jump, or stall during playing.
And S210, starting from the aggregation initial position corresponding to the second data stream, storing the data stream in the second data buffer area into the aggregation buffer area.
Specifically, the server may obtain a second data frame from the data in the second data buffer, that is, the second buffer data, from the aggregation start position, store the obtained second data frame into the aggregation buffer, arrange the second data frame in the aggregation buffer according to the sequence of storing the second data frame into the aggregation buffer, arrange the data stream stored into the aggregation buffer after the data stream stored into the aggregation buffer is arranged in advance, and obtain the play data corresponding to the target play object based on the aggregation data stream in the aggregation buffer. As shown in fig. 4, each rectangular box in the figure represents a data frame, the data frame represented by the rectangular box with an "I" inside is an intra-coded frame, the 13 th frame to the 16 th frame of the data frame in the first data stream are missing, and the 13 th frame to the 16 th frame exist in the second data stream, so that the 1 st frame to the 12 th frame in the first data stream can be stored in the aggregation buffer first, and then the 13 th frame to the 16 th frame in the second data stream are aggregated in the aggregation buffer to obtain an aggregated data stream, where the data frames in the aggregated data stream are consecutive and have no missing.
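The FIG. 4 scenario can be reproduced with toy data: frames 13 to 16 are absent from the first stream and filled in from the second. The per-frame fallback below is a simplification of the source switch the method actually performs, shown only to make the resulting continuous sequence visible.

```python
# Frames 13-16 are missing from the first stream but present in the second.
first = {i: f"A{i}" for i in range(1, 17) if not 13 <= i <= 16}
second = {i: f"B{i}" for i in range(1, 17)}

merged = [first.get(i, second[i]) for i in range(1, 17)]

assert merged[:12] == [f"A{i}" for i in range(1, 13)]   # frames 1-12 from the first stream
assert merged[12:] == ["B13", "B14", "B15", "B16"]      # frames 13-16 from the second stream
```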
In some embodiments, when it is determined that the data frames in the second data buffer have data loss, the server may switch the data stream cache source corresponding to the aggregation buffer to a third data buffer, where the third data buffer may be the first data buffer, or may be a buffer other than the first data buffer and the second data buffer. The third data buffer stores a data stream corresponding to the target playing object. For example, the server may receive a third data stream corresponding to the target playing object from a third transmission channel, and cache the third data stream through the third data cache region.
And S212, obtaining playing data corresponding to the target playing object based on the aggregated data stream in the aggregation cache region.
Specifically, the playback data is data for playback. The playback data may be data obtained after a decoding process. The playing data may be data obtained after the second terminal decodes the data, for example, the server may send the aggregated data stream to the second terminal, and the second terminal may decode the aggregated data stream to obtain data that can be played and play the data. Or, the server may transcode the aggregated data stream to obtain a transcoded data stream, and send the transcoded data stream to the second terminal, for example, the server may include a stream processing module, where the stream processing module has a transcoding function, and the server may transcode the aggregated data stream by using the stream processing module to obtain the transcoded data stream.
In some embodiments, the server may divide the transcoded data stream into a plurality of sub-streams and transmit each sub-stream to the second terminal. The second terminal can decode the received encoded data stream to obtain playable data for playing. When there are a plurality of second terminals, the server can distribute the transcoded data stream to each second terminal.
In some embodiments, the target playback object is a target live video in a target live scene. The first data stream is from a first terminal, the first terminal is an anchor terminal in a live streaming media technology, the server may be a server where a streaming media background in the live streaming media technology is located, the anchor terminal is an audio/video data source, that is, a device for acquiring audio/video data, the anchor terminal may encode the acquired audio/video data, encapsulate the encoded audio data and video data and transmit the encapsulated audio data and video data to the streaming media background through a transmission protocol, and the background performs decapsulation, decoding, encoding, and encapsulation to distribute the encapsulated data to a player of a viewer terminal to play, for example, the encapsulated data may be distributed to the viewer terminal by using a Content Delivery Network (CDN). As shown in fig. 5, an architecture diagram of a live streaming media technology is shown, where a streaming node is used to receive audio and video data sent by an anchor terminal.
In some embodiments, the first terminal is an anchor terminal in the live streaming media technology, the anchor terminal belongs to a data stream push device, the anchor terminal can push multiple streams when pushing streams, where multiple streams refer to at least two streams, and the first data stream and the second data stream may be two audio streams or video streams pushed by the same anchor terminal. The stream pushing means that the anchor terminal pushes locally acquired audio and video streams to a server where a streaming media background is located, and each pushed audio and video stream is received through a different stream receiving node, so that a plurality of stream receiving nodes may be provided, each pushed audio and video stream may include a main stream and a standby stream, the stream receiving node that receives the main stream may be referred to as a main stream access node, the stream receiving node that receives the standby stream may be referred to as a standby stream access node, and the stream processing module may switch between the main stream and the standby stream, for example, may switch in real time according to a stream state and quality. The first data stream receiving module receiving the first data stream may be an access node, for example, a primary flow access node, and the second data stream receiving module receiving the second data stream may also be an access node, for example, a standby flow access node.
As shown in fig. 6, the anchor terminal pushes 2 streams. The stream access nodes include uplink access point A and uplink access point B: uplink access point A receives the main stream and uplink access point B receives the standby stream. The stream processing module obtains the main stream from uplink access point A, and when the main stream fails or stalls severely, it can automatically switch to the standby stream; for example, in combination with scheduling, it may disconnect from the main stream access node (uplink access point A), establish a connection with the standby stream access node (uplink access point B), and obtain the audio and video data from uplink access point B.
In some embodiments, an aggregation module may be disposed in the server between the stream access nodes and the stream processing module. The aggregation module acquires real-time audio and video data from the main stream access node and the standby stream access node and aggregates the data acquired from the two. As shown in fig. 7, the aggregation module (labeled aggregation/switching module in the figure) sits between the uplink access points and the stream processing module. In fig. 7, by unifying the timestamps and GOP sequences of the main stream and the standby stream, switching between them can be made imperceptible and smooth for the downlink.
In this playing data processing method, a first data stream of a target playing object is cached in a first data buffer and a second data stream of the same object in a second data buffer; the data stream in the first data buffer is stored into the aggregation buffer, and when data loss is determined in the first data stream, the data stream cache source of the aggregation buffer is switched from the first data buffer to the second data buffer, the aggregation start position in the second data stream is determined from the data-missing position in the first data stream, and the data stream in the second data buffer is stored into the aggregation buffer from that start position. Because the data streams of the same target playing object are transmitted through the first and second transmission channels and cached in per-channel buffer spaces, the aggregation buffer can take the first data buffer as its cache source and, when the data to be cached from that source becomes abnormal, switch instantly and accurately to the second data buffer; determining the aggregation start position from the data-missing position and caching from there reduces data loss in the aggregation buffer. Obtaining the playing data of the target playing object from the aggregated data stream therefore improves both the integrity of the playing data and the fluency of playback.
At present, in live streaming media technology, when there is only one upstream stream, multi-path forwarding may be adopted during internal transmission of the cloud system (the streaming media background system) to improve its availability, combined with geographically separate deployment to improve disaster tolerance; as shown in fig. 8, multi-path disaster tolerance is adopted inside the system. In some scenes, such as a live sports match or a live concert, the live scene is provided with a plurality of cameras (camera positions), data acquisition systems, and encoding/push devices, and the uplink access points receive audio and video data pushed by different push ends: as shown in fig. 9, uplink access point A receives the audio and video data sent by push end A, and uplink access point B receives the audio and video data sent by push end B; the stream processing module acquires the data pushed by push end A from uplink access point A, and when that data fails, switches to acquiring the data pushed by push end B from uplink access point B. However, whether the push end performs multi-path pushing or the streaming media background system performs multi-path disaster tolerance internally, when switching between the main stream and the standby stream, the data before and after the switch cannot be aligned accurately because of the switching moment, so the picture seen by the user at the playing end rolls back or jumps forward, which affects the viewing experience at the viewer end. As shown in fig. 10, when switching from the main stream to the standby stream at time 102 seconds (s), the data acquired from the standby stream may start from 101 seconds, either because the standby stream still lags one frame behind or because of the video GOP cache mechanism, so a rollback occurs. In either case, frames jump, roll back, or stall, affecting the viewing experience at the viewer end.
The playing data processing method provided by this application supports a plurality of identically encoded input streams. When the main stream is switched to the standby stream because of a fault, the switch can be made smoothly at the stream receiving end with frame-level alignment, so downlink playing users are unaffected: no picture rollback, jumping, or stalling occurs, and the main stream and the standby stream are switched smoothly and losslessly.
In some embodiments, the first data stream and the second data stream are encoded data streams, and determining the data-missing position in the first data stream and determining the aggregation start position in the second data stream from the data-missing position includes: acquiring the target data position in the second data stream that corresponds to the data-missing position; acquiring the frame coding type of the target data frame at the target data position in the second data stream; and determining a position determination policy according to the frame coding type, and determining the aggregation start position in the second data stream according to the target data position and the position determination policy.
The target data position is a position of a second data frame in the second data stream; it may be, for example, the playing time of that frame, expressed as a display timestamp. For example, suppose the complete video stream, that is, the video stream without data loss, is [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6], while the first data stream is [encoded video frame 1, encoded video frame 2, encoded video frame 4, encoded video frame 5, encoded video frame 6]. The first cache data obtained by storing the first data stream in the first data buffer is then [encoded video frame 1, encoded video frame 2, encoded video frame 4, encoded video frame 5, encoded video frame 6], so it can be determined that one video frame is missing between video frame 2 and video frame 4. If each video frame is displayed for 0.2 second, the display timestamp of encoded video frame 1 is 0.2 second, that of encoded video frame 2 is 0.4 second, and that of encoded video frame 3 is 0.6 second, so the display timestamp of the data-missing position is determined to be 0.6 second.
The position determination policy may include at least one of an encoded data group skipping policy or a position keeping policy. The encoded data group skipping policy indicates skipping the encoded data group containing the target data position and determining the aggregation start position from a position after that group in the second cache data. The position keeping policy indicates taking the target data position itself as the aggregation start position. The target data frame is the data frame at the target data position in the second cache data.
A data group is a sequence of data frames arranged from front to back according to playing time.
An encoded data group is a data group in an encoded data stream, that is, a segment of the stream. Encoded data groups may have the same or different lengths; that is, each group may contain the same or a different number of data frames. Decoding one data group does not use any other data group, while data frames within the same group may reference other frames of that group for decoding. When the data frames are image frames encoded with H.264, an encoded data group may be called a group of pictures (GOP). The data frames in an encoded data group are arranged by playing time; the starting data frame of the group is the encoded reference frame, which can be decoded from its own data alone, and the other frames of the group depend directly or indirectly on the encoded reference frame for decoding. Decoding the data frames of one encoded data group is independent of decoding the data frames of other groups. The encoded reference frame may be an intra-coded frame, that is, an I-frame; a non-encoded-reference frame may be a P-frame or a B-frame.
For example, suppose the original video data, that is, the unencoded (uncompressed) video data, is [video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, video frame 6]. The first terminal encodes the original video data; if video frames 1 to 3 are encoded as data group 1, encoded data group 1 = [encoded video frame 1, encoded video frame 2, encoded video frame 3] is obtained, and if video frames 4 to 6 are encoded as data group 2, encoded data group 2 = [encoded video frame 4, encoded video frame 5, encoded video frame 6] is obtained, so encoding the original video data yields [encoded data group 1, encoded data group 2]. Specifically, the server may obtain the display timestamps of the second data frames in the second cache data and the display timestamp of the data-missing position; when one of the second data frames has the same display timestamp as the data-missing position, the server may take that timestamp as the display timestamp of the target data position.
In some embodiments, when none of the display timestamps of the second data frames equals the display timestamp of the data-missing position, the server may take, from the display timestamps of the second data frames, the one with the smallest difference from the display timestamp of the data-missing position as the display timestamp of the target data position.
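For illustration, a minimal Python sketch of this timestamp matching follows; the list-based stream representation and the 0.2-second frame interval in the example are assumptions made for readability, not details prescribed by the method:

def find_target_position(missing_pts, second_stream_pts):
    """Return the display timestamp in the second stream matching the
    data-missing position: an exact match if one exists, otherwise the
    timestamp with the smallest difference."""
    if missing_pts in second_stream_pts:
        return missing_pts
    return min(second_stream_pts, key=lambda pts: abs(pts - missing_pts))

# Example: frames are 0.2 s apart and frame 3 (pts 0.6) is missing upstream.
second_pts = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
print(find_target_position(0.6, second_pts))   # 0.6 (exact match)
print(find_target_position(0.65, second_pts))  # 0.6 (smallest difference)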
In some embodiments, the server may determine the target data frame at the target data position from the second cache data, obtain its frame coding type, and determine the position determination policy accordingly: when the frame coding type of the target data frame is an encoded reference frame, the server may choose the position keeping policy, and when it is a non-encoded-reference frame, the server may choose the encoded data group skipping policy. In this embodiment, determining the position determination policy from the frame coding type, and determining the aggregation start position in the second data stream from the target data position and the policy, improves the accuracy of the aggregation start position.
In some embodiments, determining the position determination policy according to the frame coding type, and determining the aggregation start position in the second data stream according to the target data position and the policy, includes: when the frame coding type is a non-encoded-reference frame, determining the position determination policy to be the encoded data group skipping policy; and, based on that policy, skipping the target encoded data group containing the target data position and taking the position of the encoded reference frame in the backward encoded data group of the target encoded data group as the aggregation start position in the second data stream.
The target encoded data group is the encoded data group to which the data frame at the target data position in the second cache data belongs. The backward encoded data group of the target encoded data group is the encoded data group in the second cache data that is located after the target encoded data group and closest to it. For example, if the second cache data is [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6], where [encoded video frame 1, encoded video frame 2, encoded video frame 3] is encoded data group 1 and [encoded video frame 4, encoded video frame 5, encoded video frame 6] is encoded data group 2, and the data frame at the target data position is encoded video frame 2, then since that frame belongs to encoded data group 1, the target encoded data group is encoded data group 1 and its backward encoded data group is encoded data group 2.
Specifically, when the server determines that the frame coding type of the target data frame is a non-encoded-reference frame, that is, the target data frame is not the starting data frame of its encoded data group, the server may determine the position determination policy to be the encoded data group skipping policy, skip the target encoded data group containing the target data position, and determine the aggregation start position for the second cache data from a position after the target encoded data group: for example, take the position of the encoded reference frame in the backward encoded data group, that is, the position of that group's starting data frame, as the aggregation start position.
In this embodiment, when the frame coding type is a non-encoded-reference frame, the target encoded data group containing the target data position is skipped and the position of the encoded reference frame in its backward encoded data group is taken as the aggregation start position in the second data stream. The data frame at the aggregation start position is therefore an encoded reference frame, so the data of the second data stream aggregated into the aggregation buffer can be transcoded starting from an encoded reference frame, which improves the quality of the data stream in the aggregation buffer and the transcoding effect.
In some embodiments, determining the position determination policy according to the frame coding type, and determining the aggregation start position in the second data stream according to the target data position and the policy, includes: when the frame coding type is an encoded reference frame, determining the position determination policy to be the position keeping policy; and taking the target data position as the aggregation start position in the second data stream based on the position keeping policy.
Specifically, when the server determines that the frame coding type of the target data frame is an encoded reference frame, that is, the target data frame is the starting data frame of its encoded data group, the server may determine the position determination policy to be the position keeping policy and take the target data position as the aggregation start position for the second cache data.
In this embodiment, when the frame coding type is an encoded reference frame, the target data position itself is taken as the aggregation start position in the second data stream, so the data frame at the aggregation start position is again an encoded reference frame; the data of the second data stream aggregated into the aggregation buffer can thus be transcoded starting from an encoded reference frame, which improves the quality of the data stream in the aggregation buffer and the transcoding effect.
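Putting the two policies together, a minimal sketch follows; the Frame structure and the GOP layout in the example are illustrative assumptions:

from dataclasses import dataclass

@dataclass
class Frame:
    pts: float        # display timestamp
    is_i_frame: bool  # True for an encoded reference frame (I-frame)

def aggregation_start(frames, target_index):
    """Return the index to start aggregating from.

    Position keeping policy: if the target frame is an encoded reference
    frame, aggregate from the target position itself. Encoded data group
    skipping policy: otherwise skip the rest of the current group and start
    at the reference frame that opens the next group."""
    if frames[target_index].is_i_frame:
        return target_index                    # position keeping policy
    for i in range(target_index + 1, len(frames)):
        if frames[i].is_i_frame:               # next group's reference frame
            return i
    return None  # no later group yet: wait for more buffered data

frames = [Frame(0.2, True), Frame(0.4, False), Frame(0.6, False),
          Frame(0.8, True), Frame(1.0, False), Frame(1.2, False)]
print(aggregation_start(frames, 1))  # 3: skip to the I-frame at pts 0.8
print(aggregation_start(frames, 3))  # 3: already an I-frame, keep position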
In some embodiments, obtaining the playing data of the target playing object based on the aggregated data stream in the aggregation buffer includes: acquiring the target data length of the transcoded data groups; inserting reference indication frames into the aggregated data stream in the aggregation buffer according to the target data length to obtain an updated aggregated data stream; in the transcoding process, determining transcoding reference frames based on the reference indication frames, and transcoding the transcoded data group containing each transcoding reference frame in the updated aggregated data stream based on that frame to obtain a transcoded data stream; and obtaining the playing data of the target playing object from the transcoded data stream.
A transcoded data group is a data group of the aggregated data stream that is to be transcoded, with its data frames arranged from front to back by playing time. The data frames of a transcoded data group may or may not coincide with those of an encoded data group: for example, if the aggregated data stream is [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6] and encoded data group 1 is [encoded video frame 1, encoded video frame 2, encoded video frame 3], transcoded data group 1 may be [encoded video frame 2, encoded video frame 3, encoded video frame 4].
The target data length is the length of a transcoded data group and may be expressed as a number of data frames; for example, if a transcoded data group contains 10 data frames, the target data length may be 10 frames. It may also be expressed as the display duration of the group, that is, the interval between the start display timestamp (of the group's starting data frame) and the end display timestamp (of the group's ending data frame); for example, if the display duration of the group is 10 seconds, the target data length may be 10 seconds. The transcoded data stream is obtained by transcoding the data frames of the aggregated data stream. Transcoding consists of decoding followed by encoding: the data frames of the aggregated data stream are decoded, and the decoded frames are re-encoded to obtain the frames of the transcoded data stream.
The reference indication frame indicates the position of a transcoding reference frame. The transcoding reference frame is the starting data frame of a transcoded data group; in the transcoding process, it is the data frame that must be transcoded into an intra-coded frame. The updated aggregated data stream is the data stream obtained by inserting the reference indication frames into the aggregated data stream.
The reference indication frame may be a custom indication frame. For example, it may be implemented as an SEI (Supplemental Enhancement Information) frame or as script data. The frame format of the SEI frame corresponding to the reference indication frame is shown in fig. 11A.
In this format, the SEI payload type is 5, the length field is a variable number of bytes conforming to the H.264 or H.265 SEI standard, and the length does not include the 0x80 trailing byte but does include the content bytes. userBusinessID is 16 bytes, and content is a custom string: {"iframe":1}. The iframe field marks this SEI frame as key-frame control information.
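As an illustration of this layout, a minimal sketch of building such an SEI frame follows. The 16-byte userBusinessID value is a zero-filled placeholder, and the single-byte size field is an assumption valid only while the payload stays under 255 bytes (payload type 5, user data unregistered, conventionally opens with a 16-byte identifier, which matches the format above):

def build_reference_indication_sei(user_business_id: bytes) -> bytes:
    assert len(user_business_id) == 16
    content = b'{"iframe":1}'          # marks the frame as key-frame control info
    payload = user_business_id + content
    sei = bytes([0x06])                # H.264 NAL unit type 6: SEI
    sei += bytes([0x05])               # payload type 5: user data unregistered
    sei += bytes([len(payload)])       # payload size (single byte while < 255)
    sei += payload
    sei += bytes([0x80])               # trailing stop byte, not counted in the length
    return sei

sei_frame = build_reference_indication_sei(b"\x00" * 16)
print(sei_frame.hex())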
Specifically, the server may insert a reference indication frame into the aggregated data stream every target data length to obtain the updated aggregated data stream, in which the number of data frames between consecutive reference indication frames equals the target data length. For example, if the target data length is 3 and the aggregated data stream is [encoded video frame 1, encoded video frame 2, encoded video frame 3, encoded video frame 4, encoded video frame 5, encoded video frame 6, encoded video frame 7, encoded video frame 8, encoded video frame 9], where encoded video frames 1 to 6 come from the first cache data and encoded video frames 7 to 9 come from the second cache data, the updated aggregated data stream may be [reference indication frame, encoded video frame 1, encoded video frame 2, encoded video frame 3, reference indication frame, encoded video frame 4, encoded video frame 5, encoded video frame 6, reference indication frame, encoded video frame 7, encoded video frame 8, encoded video frame 9]. The server may preferentially treat the reference indication frame as the forward adjacent frame of a key frame, that is, the frame located before and adjacent to the key frame, where the key frame is an intra-coded frame.
In some embodiments, the server may determine the target data length according to at least one of the first data group length or the second data group length: for example, it may take the length of the data groups in the first data stream, or the length of the data groups in the second data stream, or the smaller or larger of the two, or a weighted result of the two, for example their average. The first data group length is the length of the encoded data groups in the first data stream, and the second data group length is that of the encoded data groups in the second data stream. In some embodiments, the server may send the transcoded data stream to the second terminal; since the transcoded data stream was obtained by decoding and re-encoding, its data frames are encoded, so on receiving it the second terminal decodes the transcoded data stream and plays the decoded data frames.
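As an illustration of choosing the target data length and inserting reference indication frames at that interval, a minimal sketch follows; the string marker standing in for the reference indication frame and the mode options are assumptions mirroring the choices named above:

def choose_target_length(first_group_len, second_group_len, mode="min"):
    if mode == "min":
        return min(first_group_len, second_group_len)
    if mode == "max":
        return max(first_group_len, second_group_len)
    return round((first_group_len + second_group_len) / 2)  # average as one weighting

def insert_reference_frames(frames, target_len, marker="SEI"):
    """Insert a reference indication frame before every target_len frames."""
    updated = []
    for i, frame in enumerate(frames):
        if i % target_len == 0:
            updated.append(marker)
        updated.append(frame)
    return updated

frames = [f"frame{i}" for i in range(1, 10)]
print(insert_reference_frames(frames, choose_target_length(3, 3)))
# ['SEI', 'frame1', 'frame2', 'frame3', 'SEI', 'frame4', ...]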
In some embodiments, the server may determine a backward neighboring frame of the reference indication frame from the updated aggregated data stream, and use the backward neighboring frame of the reference indication frame as the transcoding reference frame, where the backward neighboring frame of the reference indication frame refers to a frame that is located after and adjacent to the reference indication frame.
In some embodiments, the server may determine a current transcoding reference frame from the updated aggregated data stream, which may be any transcoding reference frame in the stream, and take the sequence of data frames between the current transcoding reference frame and its backward transcoding reference frame (including the former but excluding the latter) as the transcoded data group containing the current transcoding reference frame. The backward transcoding reference frame of the current transcoding reference frame is the transcoding reference frame located after it and closest to it.
In some embodiments, the server may transcode the transcoded data group: during transcoding it may decode the transcoding reference frame of the group to obtain its decoded data frame, decode the other data frames of the group based on that decoded data frame to obtain their decoded data frames, and encode each decoded data frame to obtain the data frames of the transcoded data stream.
In some embodiments, the transcoded data stream does not include the reference indication frames. Fig. 11B is a schematic diagram of the transcoded data stream: the rectangular boxes containing an S denote reference indication frames. The figure shows that the updated aggregated data stream includes reference indication frames with 6 data frames between them, so the target data length of the transcoded data groups is 6 frames; the transcoded data stream in the figure likewise has data groups 6 frames long and contains no reference indication frames.
In this embodiment, reference indication frames are inserted into the aggregated data stream in the aggregation buffer according to the target data length of the transcoded data groups to obtain the updated aggregated data stream; transcoding reference frames are determined based on the reference indication frames, the transcoded data group containing each transcoding reference frame is transcoded based on that frame to obtain the transcoded data stream, and the playing data of the target playing object is obtained from the transcoded data stream. The lengths of the data groups in the transcoded data stream are therefore consistent, improving the uniformity of group lengths in the transcoded data stream.
In some embodiments, determining transcoding reference frames based on the reference indication frames in the transcoding process, and transcoding the transcoded data group containing each transcoding reference frame in the updated aggregated data stream based on that frame to obtain the transcoded data stream, includes: decoding the updated aggregated data stream to obtain a decoded data stream; in the process of encoding the decoded data stream, when a reference indication frame is detected, taking the backward adjacent data frame of the reference indication frame as the transcoding reference frame and intra-coding it to obtain an intra-coded frame; and transcoding the transcoded data group containing the transcoding reference frame based on the intra-coded frame to obtain the transcoded data of the transcoded data stream, the transcoded data group containing the transcoding reference frame comprising data frames of the target data length.
The decoded data stream is obtained by decoding the data frames of the updated aggregated data stream. The backward adjacent data frame of the reference indication frame is the data frame located after and adjacent to it.
Specifically, to implement transcoding, the server may encode the decoded data stream. In the encoding process, the server may determine, according to the reference indication frames in the updated aggregated data stream, which data frames are to be encoded as intra-coded frames; the transcoding reference frame is exactly the data frame that must be encoded as an intra-coded frame. For example, the backward adjacent data frame of a reference indication frame may be taken as the transcoding reference frame and intra-coded to obtain an intra-coded frame.
In some embodiments, the server may obtain the transcoded data group containing the transcoding reference frame, encode the other data frames of the group based on the intra-coded frame obtained by intra-coding the transcoding reference frame to obtain the other encoded frames, and take the intra-coded frame and the other encoded frames together as the transcoded data of the transcoded data stream.
In some embodiments, when the server stores the data of the first data buffer into the aggregation buffer, it may also store the first coding parameters of the first data stream there, and likewise store the second coding parameters of the second data stream when caching data from the second data buffer. When decoding the updated aggregated data stream, the server may decode the data originating from the first data buffer based on the first coding parameters and the data originating from the second data buffer based on the second coding parameters. Coding parameters are parameters related to encoding and may be at least one of a PPS (Picture Parameter Set) or an SPS (Sequence Parameter Set); since the first and second data streams are encoded data, the first coding parameters are those used to encode the data of the first data stream and the second coding parameters are those used to encode the data of the second data stream.
In some embodiments, when a new reference indication frame is detected, the server may update the transcoding reference frame, returning to the step of taking the backward adjacent data frame of the reference indication frame as the transcoding reference frame and intra-coding it to obtain an intra-coded frame.
In this embodiment, in the process of encoding the decoded data stream, when a reference indication frame is detected, its backward adjacent data frame is taken as the transcoding reference frame and intra-coded to obtain an intra-coded frame, and the transcoded data group containing the transcoding reference frame is transcoded based on the intra-coded frame to obtain the transcoded data of the transcoded data stream. Since each such transcoded data group contains data frames of the target data length, the lengths of the transcoded data groups in the transcoded data stream are unified.
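A minimal sketch of this re-encoding pass follows; encode_intra and encode_inter are hypothetical stand-ins for a real encoder, and the stream is modeled as a flat list for readability:

def encode_intra(frame):
    return ("I", frame)          # intra-coded frame: needs no reference

def encode_inter(frame, ref):
    return ("P", frame)          # inter-coded frame, predicted from ref

def transcode(decoded_stream):
    """decoded_stream mixes 'SEI' markers with decoded frames; the markers
    are consumed and do not appear in the transcoded stream."""
    output, force_intra, last_ref = [], False, None
    for item in decoded_stream:
        if item == "SEI":
            force_intra = True   # next frame becomes the transcoding reference frame
            continue
        if force_intra or last_ref is None:
            coded = encode_intra(item)
            last_ref, force_intra = coded, False
        else:
            coded = encode_inter(item, last_ref)
        output.append(coded)
    return output

stream = ["SEI", "f1", "f2", "f3", "SEI", "f4", "f5", "f6"]
print(transcode(stream))
# [('I', 'f1'), ('P', 'f2'), ('P', 'f3'), ('I', 'f4'), ('P', 'f5'), ('P', 'f6')]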
In some embodiments, the first data stream and the second data stream are encoded data streams, and storing the data stream in the second data buffer into the aggregation buffer from the aggregation start position in the second data stream includes: acquiring the second coding parameters of the second data stream and generating an information switching indication frame based on them, the frame indicating that when it is detected during decoding, the second coding parameters are to be taken as the new coding parameters so that the backward data stream of the information switching indication frame is decoded based on them; and inserting the information switching indication frame at the end position of the first data stream in the aggregation buffer, then, starting from the aggregation start position in the second data stream, storing the data stream of the second data buffer into the aggregation buffer as the backward data stream of the information switching indication frame.
The information switching indication frame generated from the second coding parameters may contain those parameters; it indicates a switch of the coding parameters used for decoding, for example switching them to the second coding parameters. The information switching indication frame may be generated by the server from the coding parameters or carried in the data stream itself. The frame generated from the first coding parameters may be called the first information switching indication frame, and the frame generated from the second coding parameters the second information switching indication frame.
Specifically, when switching the data stream cache source of the aggregation buffer, the server may obtain the information switching indication frame of the new cache source, insert it after the data already in the aggregation buffer, and then store data from the new cache source into the aggregation buffer starting from the corresponding aggregation start position. For example, if the server first uses the first data buffer as the cache source, then switches to the second data buffer, and later switches back to the first data buffer, the data in the aggregation buffer may be [first information switching indication frame, data from the first data buffer, second information switching indication frame, data from the second data buffer, first information switching indication frame, data from the first data buffer].
In some embodiments, the step of storing the data stream of the second data buffer into the aggregation buffer from the aggregation start position in the second data stream as the backward data stream of the information switching indication frame includes: starting from the aggregation start position in the second data stream, storing the data stream of the second data buffer into the aggregation buffer after the information switching indication frame.
In this embodiment, the second coding parameters of the second data stream are acquired, the information switching indication frame is generated based on them and inserted at the end position of the first data stream in the aggregation buffer, and the data stream of the second data buffer is stored into the aggregation buffer as the backward data stream of the information switching indication frame; thus, when the data originating from the second data buffer in the aggregated data stream is decoded, the second coding parameters can be used, improving the decoding success rate.
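A minimal sketch of the aggregation-side bookkeeping follows; SwitchFrame and AggregationBuffer are illustrative structures assumed for the example, not structures defined by the method:

from dataclasses import dataclass

@dataclass
class SwitchFrame:
    sps_pps: dict   # coding parameters the downstream decoder must switch to

class AggregationBuffer:
    def __init__(self):
        self.frames = []

    def append_from(self, source_frames, start_index, coding_params):
        # Insert the information switching indication frame at the end of the
        # data already aggregated, then append from the aggregation start position.
        self.frames.append(SwitchFrame(coding_params))
        self.frames.extend(source_frames[start_index:])

agg = AggregationBuffer()
agg.append_from(["a1", "a2", "a3"], 0, {"sps": "sps1", "pps": "pps1"})       # first stream
agg.append_from(["b1", "b2", "b3", "b4"], 2, {"sps": "sps2", "pps": "pps2"}) # switch: start at index 2
print(agg.frames)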
In some embodiments, obtaining the playing data of the target playing object based on the aggregated data stream in the aggregation buffer includes: in the process of decoding the aggregated data stream, decoding based on the first coding parameters of the first data stream; when the information switching indication frame is detected, extracting the second coding parameters from it, switching the coding parameters referenced by decoding from the first to the second coding parameters, and decoding based on the second coding parameters; and uniformly encoding the resulting decoded data stream to obtain the playing data of the target playing object.
Specifically, when the data of the first data buffer is stored into the aggregated data stream first and the data of the second data buffer afterwards, the first information switching indication frame in the aggregated data stream is the one generated from the first coding parameters and the second is the one generated from the second coding parameters, the first preceding the second. In the process of decoding the aggregated data stream, the server first encounters the first information switching indication frame and decodes the data after it based on that frame; when the data before the second information switching indication frame has been decoded, it decodes the data after the second frame based on that frame, obtaining the decoded data stream. The server then encodes the decoded data stream to obtain the transcoded data stream and sends it to the second terminal, which decodes it to obtain playable data.
In this embodiment, when the information switching indication frame is detected, the second coding parameters are extracted from it, the coding parameters referenced by decoding are switched from the first to the second coding parameters, and decoding proceeds based on the second coding parameters; the data originating from the second data buffer in the aggregated data stream can therefore be decoded with the correct parameters, improving the decoding success rate.
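The decode side can be sketched as follows; decode_frame is a hypothetical stand-in for a real decoder call, and SwitchFrame mirrors the illustrative structure assumed in the previous sketch:

from dataclasses import dataclass

@dataclass
class SwitchFrame:
    sps_pps: dict

def decode_frame(frame, params):
    return f"decoded {frame} using {params['sps']}"   # stand-in for a real decoder

def decode_aggregated(frames, first_params):
    """Decode with the first stream's parameters until an information
    switching indication frame swaps in the second stream's parameters."""
    params, decoded = first_params, []
    for item in frames:
        if isinstance(item, SwitchFrame):
            params = item.sps_pps    # switch the coding parameters referenced by decoding
            continue
        decoded.append(decode_frame(item, params))
    return decoded

aggregated = ["a1", "a2", SwitchFrame({"sps": "sps2", "pps": "pps2"}), "b3", "b4"]
print(decode_aggregated(aggregated, {"sps": "sps1", "pps": "pps1"}))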
In some embodiments, the method further comprises: acquiring the set of live view angles of the target live scene and establishing a cache group for each live view angle in the set, each cache group comprising a data buffer for every live device of that view angle plus an aggregation buffer, one live view angle corresponding to a plurality of live devices. In that case, switching the data stream cache source of the aggregation buffer from the first data buffer to the second data buffer when data loss is determined in the first data stream includes: when data loss is determined in the first data stream, selecting a data buffer other than the first data buffer from the cache group containing the first data buffer as the second data buffer, and switching the data stream cache source of the aggregation buffer from the first data buffer to the second data buffer.
The target playing object may be a target live video of a target live scene, and the first and second data streams are data streams acquired from the target live scene by live devices.
The target live scene may be provided with a plurality of live devices, which acquire data of the live scene, encode the acquired data, and send the encoded data to the server of the live background so that the server can deliver the live scene's data to the viewer ends. A live view angle can be determined from the shooting angle of a live device: different shooting angles correspond to different live view angles, and one or more live devices may share the same shooting angle.
The set of live view angles includes a plurality of live view angles. A cache group includes a plurality of data buffers: the cache group of a live view angle contains one data buffer for each live device of that view angle, used to cache the data stream from that device, and may also contain an aggregation buffer whose data comes from the data buffers of the same view angle.
Specifically, the server may aggregate the data in each data buffer of a live view angle into that view angle's aggregation buffer, thereby obtaining an aggregated stream for each live view angle. Taking one live view angle as an example: its cache group contains the data buffers of several live devices; the server receives the data streams sent by those devices and caches each in its corresponding data buffer; a first data buffer is chosen from among them (any data buffer may serve), and its data is stored into the aggregation buffer. During storage, when data loss is determined in the first data buffer, a second data buffer is selected from the other data buffers of the cache group and the cache source of the aggregation buffer is switched from the first data buffer to the second. In this embodiment, because the second data buffer is chosen from the same cache group as the first, both buffers correspond to the same live view angle, so data streams of the same view angle are aggregated; this improves the integrity of that view angle's data stream, reduces data loss, and improves the fluency of the live data for that view angle.
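A minimal sketch of these per-view cache groups follows; the class layout and device naming are assumptions for illustration, and the failover simply picks another buffer from the same group so the view angle never changes:

class CacheGroup:
    def __init__(self, view_id, device_ids):
        self.view_id = view_id
        self.device_buffers = {dev: [] for dev in device_ids}  # one buffer per live device
        self.aggregation_buffer = []

    def pick_backup(self, failed_device):
        """Choose a second data buffer from this group, excluding the failed one."""
        for dev in self.device_buffers:
            if dev != failed_device:
                return dev
        return None

groups = {view: CacheGroup(view, devices)
          for view, devices in {"view-front": ["cam1", "cam2"],
                                "view-side": ["cam3", "cam4"]}.items()}
print(groups["view-front"].pick_backup("cam1"))  # cam2: same view angle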
In some embodiments, the target playback object is a target live video of a target live scene, and receiving the first data stream of the target playback object from the first transmission channel and caching it in the first data buffer includes:
establishing a first transmission channel with the first shooting device of the target live scene, and receiving the first video stream transmitted by the first shooting device through the first transmission channel together with the target scene identifier of the target live scene; and, based on the target scene identifier, taking the first video stream as the main video stream of the target live video. Receiving the second data stream of the target playback object from the second transmission channel and caching it in the second data buffer includes: establishing a second transmission channel with the second shooting device of the target live scene, and receiving the second video stream transmitted by the second shooting device through the second transmission channel together with the target scene identifier; and, based on the target scene identifier, taking the second video stream as the backup video stream of the target live video.
The target live scene may be any live scene, and the target playback object is its target live video. The target live scene may be, for example, the live site of a concert performed by star A in stadium L, and the target live video the live video of that concert.
A shooting device may be used for at least one of capturing video data or recording audio data. The target live scene may include a plurality of shooting devices, that is, at least two, for example a first shooting device and a second shooting device. The first terminal may be the first shooting device in the target live scene, and the third terminal the second shooting device. The first transmission channel may be a data transmission channel established between the first shooting device and the server, and the second transmission channel one established between the second shooting device and the server. A scene identifier uniquely identifies a scene, and the target scene identifier is the scene identifier of the target live scene; it may be preset or set as needed, and may be determined, for example, from the location of the scene or the identity of a person in it.
Main video stream and backup video stream are relative concepts: when the server receives a plurality of video streams from the target live scene, at least one of them may serve as the main video stream and at least one as a backup video stream; for example, there may be exactly one main video stream.
The first data stream may be a first video stream and the second data stream may be a second video stream.
Specifically, the server may establish the first transmission channel with the first shooting device and the second transmission channel with the second shooting device; the first shooting device transmits its acquired video data of the target live scene to the server through the first transmission channel, yielding the first video stream, and the second shooting device transmits its acquired video data through the second transmission channel, yielding the second video stream.
In some embodiments, the first shooting device may send a first data receiving request to the server through the first transmission channel, where the first data receiving request may include the target scene identification and the first video stream. The second shooting device may send a second data receiving request to the server through the second transmission channel, where the second data receiving request may include the target scene identifier and the second video stream.
In some embodiments, the server may determine the main video stream from the plurality of video streams according to their quality, for example taking the stream with the best quality as the main video stream. For example, if the server receives 3 video streams from the target live scene, namely a first, second, and third video stream, and the quality of the first video stream is better than that of the second and third, the server may take the first video stream as the main video stream.
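For illustration, a minimal sketch of grouping incoming streams by scene identifier and picking the main stream by quality follows; the numeric quality score is a placeholder assumption (a real system might combine bitrate, stall counts, and so on):

def register_streams(requests):
    """Each request carries (scene_id, stream_id, quality_score)."""
    scenes = {}
    for scene_id, stream_id, quality in requests:
        scenes.setdefault(scene_id, []).append((stream_id, quality))
    return scenes

def split_main_backup(streams):
    ranked = sorted(streams, key=lambda s: s[1], reverse=True)
    return ranked[0][0], [s[0] for s in ranked[1:]]   # main stream, backup streams

scenes = register_streams([("concert-L", "stream-1", 0.95),
                           ("concert-L", "stream-2", 0.90),
                           ("concert-L", "stream-3", 0.85)])
print(split_main_backup(scenes["concert-L"]))  # ('stream-1', ['stream-2', 'stream-3'])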
In this embodiment, a first transmission channel is established with the first shooting device of the target live scene, the first video stream and the target scene identifier are received through it, and the first video stream is taken as the main video stream of the target live video based on the identifier; a second transmission channel is established with the second shooting device, the second video stream and the target scene identifier are received through it, and the second video stream is taken as the backup video stream. Video streams acquired by different devices of the target live scene are thereby obtained, so when the main video stream becomes abnormal the backup video stream can carry the video, improving disaster tolerance in the live scene, reducing video data loss, and improving the fluency of the live broadcast.
This application further provides an application scenario to which the above playing data processing method is applied. Specifically, the method is applied in this scenario as follows:
1. Establish a first transmission channel and a second transmission channel with the first terminal.
The first terminal captures video of a target live scene and encodes the captured video data to obtain a target live video, then transmits the target live video to the server through the first transmission channel and the second transmission channel respectively.
2. Receive, from the first transmission channel, a first video stream of the target live video sent by the first terminal, and receive, from the second transmission channel, a second video stream of the target live video sent by the first terminal.
3. Cache the first video stream in the first data buffer and the second video stream in the second data buffer.
The server may create the first data buffer for the first transmission channel and the second data buffer for the second transmission channel, storing the data transmitted by each channel in its buffer. The server may also be provided with an aggregation buffer.
4. Take the first data buffer as the data stream cache source of the aggregation buffer and store the data stream of the first data buffer into the aggregation buffer.
5. When data loss is determined in the first data stream, switch the data stream cache source of the aggregation buffer from the first data buffer to the second data buffer, determine the data-missing position in the first data stream, and take it as the aggregation start position in the second data stream.
6. Store the data stream of the second data buffer into the aggregation buffer starting from the aggregation start position in the second data stream.
7. Obtain the playing data of the target playing object based on the aggregated data stream in the aggregation buffer.
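Steps 1 to 7 above can be condensed into the following minimal sketch; frames are modeled as integer display timestamps in milliseconds and data loss is detected by a simple timestamp-gap test, both assumptions made for readability rather than details of the method:

def aggregate(first_buffer, second_buffer, frame_interval=200):
    """Feed the aggregation buffer from the first buffer; on a gap, switch
    to the second buffer at the data-missing position."""
    aggregated, expected = [], None
    for pts in first_buffer:
        if expected is not None and pts > expected:
            # Data loss detected: switch the cache source and resume from
            # the aggregation start position (the missing timestamp).
            aggregated += [p for p in second_buffer if p >= expected]
            return aggregated
        aggregated.append(pts)
        expected = pts + frame_interval
    return aggregated

first = [200, 400, 1000, 1200]              # frames 600 and 800 lost on channel 1
second = [200, 400, 600, 800, 1000, 1200]   # intact copy from channel 2
print(aggregate(first, second))             # [200, 400, 600, 800, 1000, 1200]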
The application also provides an application scene, and the application scene applies the play data processing method. Specifically, the application of the play data processing method in the application scenario is as follows:
1. and establishing a first transmission channel with the first terminal and establishing a second transmission channel with the third terminal.
The first terminal carries out video acquisition on a target live broadcast scene and encodes acquired video data to obtain a first target live broadcast video, the second terminal carries out video acquisition on the target live broadcast scene and encodes the acquired video data to obtain a second target live broadcast video, the first terminal transmits the first target live broadcast video to the server through a first transmission channel, and the second terminal transmits the second target live broadcast video to the server through a second transmission channel.
2. And receiving a first video stream corresponding to a first target live video sent by a first terminal from a first transmission channel, and receiving a second video stream corresponding to a second target live video sent by a second terminal from a second transmission channel.
3. The first video stream is buffered through the first data buffer area, and the second video stream is buffered through the second data buffer area.
4. And taking the first data cache region as a data stream cache source corresponding to the convergence cache region, and storing the data stream in the first data cache region into the convergence cache region.
5. And when the first data stream is determined to have data loss, switching the data stream cache source corresponding to the aggregation cache region from the first data cache region to a second data cache region, and determining the data loss position corresponding to the first data stream.
6. And acquiring a target data position corresponding to the data missing position in the second data stream.
7. And acquiring the frame coding type of the target data frame corresponding to the target data position in the second data stream.
8. And when the frame coding type is a non-coding reference frame, skipping a target coding data group corresponding to the target data position by using a skipping strategy, and taking the position of the coding reference frame in a backward coding data group corresponding to the target coding data group as a convergence initial position corresponding to the second data stream.
9. And when the frame coding type is a coding reference frame, taking the target data position as a convergence initial position corresponding to the second data stream.
10. Obtain the second coding parameters of the second data stream and generate an information switching indication frame based on them. The information switching indication frame indicates that, when it is detected during decoding, the second coding parameters become the new coding parameters, so that the data stream following the frame is decoded with the second coding parameters.
11. Insert the information switching indication frame at the end of the first data stream in the aggregation buffer, and then, starting from the aggregation start position, store the data stream in the second data buffer into the aggregation buffer as the data stream following the indication frame.
12. Obtain the target data length of a transcoding data group and insert reference indication frames into the aggregated data stream in the aggregation buffer at intervals of the target data length, obtaining an updated aggregated data stream.
13. Decode the updated aggregated data stream to obtain a decoded data stream. While decoding, use the first coding parameters of the first data stream; when the information switching indication frame is detected, extract the second coding parameters from it, switch the coding parameters referenced by the decoder from the first to the second, and continue decoding with the second coding parameters.
14. While encoding the decoded data stream, when a reference indication frame is detected, take the data frame immediately following it as the transcoding reference frame and intra-code that frame to obtain an intra-coded frame.
15. Transcode the transcoding data group containing the transcoding reference frame based on the intra-coded frame, obtaining the transcoded data of the transcoded data stream.
The transcoding data group containing the transcoding reference frame comprises data frames of the target data length.
16. Segment the encoded data stream and distribute the segmented data stream to each live-viewing terminal.
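Steps 6 through 9 above reduce to a search over the backup stream's frames. The sketch below assumes frames expose a timestamp `ts` and an `is_keyframe` flag, and treats the coded reference frame as the keyframe that opens a group of pictures (GOP); these representational choices are assumptions for illustration, not requirements of the method.

```python
def aggregation_start_position(backup_frames, loss_ts):
    """Return the index in backup_frames from which aggregation resumes."""
    # Step 6: locate the target data position, i.e. the first frame of the
    # backup stream at or after the primary stream's data loss position.
    i = 0
    while i < len(backup_frames) and backup_frames[i].ts < loss_ts:
        i += 1
    if i == len(backup_frames):
        raise ValueError("backup stream has no data at the loss position")

    # Step 9: a coded reference frame (keyframe) decodes on its own,
    # so aggregation can start right here.
    if backup_frames[i].is_keyframe:
        return i

    # Step 8: a non-reference frame cannot open the spliced stream, so
    # skip the rest of this coded data group and start at the keyframe
    # that opens the following coded data group.
    while i < len(backup_frames) and not backup_frames[i].is_keyframe:
        i += 1
    if i == len(backup_frames):
        raise ValueError("no following coded data group buffered yet")
    return i
```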
The present application also provides a second application scenario that applies the play data processing method. Specifically, the method is applied in this scenario as follows:
1. Establish a first transmission channel and a second transmission channel with the first terminal.
The first terminal captures audio of a target live scene, encodes the captured audio data to obtain a target live audio, and transmits the target live audio to the server over the first transmission channel and the second transmission channel respectively.
2. Receive, from the first transmission channel, a first audio stream corresponding to the target live audio sent by the first terminal, and receive, from the second transmission channel, a second audio stream corresponding to the same target live audio.
3. Cache the first audio stream in the first data buffer and the second audio stream in the second data buffer.
The server may create a first data buffer for the first transmission channel and a second data buffer for the second transmission channel, store the data arriving on the first channel in the first data buffer, and store the data arriving on the second channel in the second data buffer. The server may also be provided with an aggregation buffer.
4. Use the first data buffer as the data stream cache source of the aggregation buffer and store the data stream in the first data buffer into the aggregation buffer.
5. When data loss is detected in the first data stream, switch the data stream cache source of the aggregation buffer from the first data buffer to the second data buffer, determine the data loss position in the first data stream, and use that position as the aggregation start position for the second data stream. (The method does not fix how loss is detected; a hypothetical test is sketched after this list.)
6. Starting from the aggregation start position, store the data stream in the second data buffer into the aggregation buffer.
7. Obtain the play data for the target play object from the aggregated data stream in the aggregation buffer.
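Both scenarios hinge on deciding that the first data stream has data loss, but the text leaves the test itself open. A plausible server-side test, offered purely as an assumption, combines a timestamp gap with a channel-silence timeout:

```python
import time

def has_data_loss(next_frame, last_ts, last_arrival_time, timeout_s=0.5):
    """Hypothetical loss test for the active stream.

    next_frame        -- the next buffered frame, or None if the buffer is empty
    last_ts           -- timestamp of the last frame already aggregated
    last_arrival_time -- time.monotonic() when data last arrived on the channel
    Both criteria below are illustrative assumptions, not the patent's rule.
    """
    if next_frame is not None:
        # Frames are missing if timestamps do not advance contiguously.
        return next_frame.ts > last_ts + 1
    # No buffered data: declare loss if the channel has been silent too long.
    return (time.monotonic() - last_arrival_time) > timeout_s
```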
More and more live streaming platforms are appearing, and because customers tend to choose a stable platform with a low stutter frequency, stability and playback quality are two important measures of a live platform's quality. Since the revenue a live-service vendor (for example, a cloud vendor) receives from a customer depends on that customer's downstream traffic and related value-added services, improving the platform's stability increases the likelihood that users choose the platform and thus increases the vendor's revenue.
With the play data processing method provided by the present application, multiple identically encoded input streams can be ingested, and when a fault forces a switch from the primary stream to the backup stream, the switch at the stream-receiving end is smooth and aligned at the frame level, so downstream viewers are unaffected: no picture rollback, jumping, or stuttering occurs. Applied on the server side of a live platform, the method achieves global consistency of the audio and video data through an algorithmic mechanism, reduces downstream stuttering, and provides stronger disaster tolerance and robustness, thereby improving the user's live-viewing experience.
It should be understood that, although the steps in the flowcharts of FIGS. 2-11B are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-11B may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In some embodiments, as shown in FIG. 12, there is provided a play data processing apparatus, which may be implemented as part of a computer device as a software module, a hardware module, or a combination of the two. The apparatus specifically includes: a first data stream receiving module 1202, a second data stream receiving module 1204, a first data stream aggregation module 1206, an aggregation start position determining module 1208, a second data stream aggregation module 1210, and a play data obtaining module 1212, where:
the first data stream receiving module 1202 is configured to receive a first data stream corresponding to a target play object from a first transmission channel and cache the first data stream in a first data buffer;
the second data stream receiving module 1204 is configured to receive a second data stream corresponding to the target play object from a second transmission channel and cache the second data stream in a second data buffer;
the first data stream aggregation module 1206 is configured to use the first data buffer as the data stream cache source of the aggregation buffer and store the data stream in the first data buffer into the aggregation buffer;
the aggregation start position determining module 1208 is configured to, when it is determined that the first data stream has data loss, switch the data stream cache source of the aggregation buffer from the first data buffer to the second data buffer, determine the data loss position in the first data stream, and determine the aggregation start position for the second data stream from that data loss position;
the second data stream aggregation module 1210 is configured to store the data stream in the second data buffer into the aggregation buffer, starting from the aggregation start position for the second data stream;
the play data obtaining module 1212 is configured to obtain the play data corresponding to the target play object from the aggregated data stream in the aggregation buffer.
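As an orientation aid only, the module layout of FIG. 12 can be mirrored by a class skeleton; the class and method names below are illustrative, not taken from the patent, and the bodies are deliberately omitted.

```python
class PlayDataProcessingApparatus:
    """Skeleton mirroring the modules of FIG. 12 (names are hypothetical)."""

    def receive_first_stream(self, channel):        # module 1202
        ...

    def receive_second_stream(self, channel):       # module 1204
        ...

    def aggregate_first_stream(self):               # module 1206
        ...

    def determine_start_position(self, loss_pos):   # module 1208
        ...

    def aggregate_second_stream(self, start_pos):   # module 1210
        ...

    def obtain_play_data(self):                     # module 1212
        ...
```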
In some embodiments, the first data stream and the second data stream are encoded data streams, and the aggregation start position determining module includes: a target data position determining unit, configured to obtain the target data position in the second data stream corresponding to the data loss position; a frame coding type obtaining unit, configured to obtain the frame coding type of the target data frame at the target data position in the second data stream; and an aggregation start position determining unit, configured to determine a position determination strategy from the frame coding type and to determine the aggregation start position for the second data stream from the target data position and the position determination strategy.
In some embodiments, the aggregation start position determining unit is further configured to: when the frame coding type is a non-reference coded frame, determine the position determination strategy to be a coded data group skipping strategy; and, based on that strategy, skip the target coded data group containing the target data position and use the position of the coded reference frame in the following coded data group as the aggregation start position for the second data stream.
In some embodiments, the aggregation start position determining unit is further configured to determine the position determination strategy to be a position keeping strategy when the frame coding type is a coded reference frame, and, based on that strategy, use the target data position as the aggregation start position for the second data stream.
In some embodiments, the play data obtaining module includes: a target data length obtaining unit, configured to obtain the target data length of a transcoding data group; an updated aggregated data stream obtaining unit, configured to insert reference indication frames into the aggregated data stream in the aggregation buffer at intervals of the target data length, obtaining an updated aggregated data stream; a transcoded data stream obtaining unit, configured to determine the transcoding reference frame from the reference indication frame during transcoding and to transcode the transcoding data group containing that frame in the updated aggregated data stream, obtaining a transcoded data stream; and a play data obtaining unit, configured to obtain the play data corresponding to the target play object from the transcoded data stream.
In some embodiments, the transcoded data stream obtaining unit is further configured to decode the updated aggregated data stream to obtain a decoded data stream; while encoding the decoded data stream, when a reference indication frame is detected, take the data frame immediately following it as the transcoding reference frame and intra-code that frame to obtain an intra-coded frame; and transcode the transcoding data group containing the transcoding reference frame based on the intra-coded frame to obtain the transcoded data of the transcoded data stream, where the transcoding data group containing the transcoding reference frame comprises data frames of the target data length.
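The marker-driven re-encoding can be pictured as two passes: insert a reference indication frame every target data length, then force an intra-coded frame whenever a marker is met during encoding. The sketch below is a simplification under stated assumptions: `make_marker`, the `is_reference_indication` flag, and the `encoder.encode_intra` / `encoder.encode_inter` methods are hypothetical names, the markers are assumed to pass through the decoder untouched, and real encoders expose forced keyframes differently.

```python
def insert_markers(frames, target_len, make_marker):
    """Insert a reference indication frame before every target_len frames."""
    out = []
    for i, frame in enumerate(frames):
        if i % target_len == 0:
            out.append(make_marker())     # marker carries no picture data
        out.append(frame)
    return out

def encode_with_markers(decoded_frames, encoder):
    """Re-encode so every transcoding data group opens with an intra frame."""
    out, force_intra = [], False
    for frame in decoded_frames:
        if frame.is_reference_indication:
            force_intra = True            # next real frame heads a new group
            continue                      # markers are not emitted downstream
        if force_intra:
            out.append(encoder.encode_intra(frame))   # intra-coded reference
            force_intra = False
        else:
            out.append(encoder.encode_inter(frame))   # predicted frame
    return out
```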
In some embodiments, the first data stream and the second data stream are encoded data streams, and the second data stream aggregation module includes: an information switching indication frame generating unit, configured to obtain the second coding parameters of the second data stream and generate an information switching indication frame based on them, the frame indicating that, if it is detected during decoding, the second coding parameters become the new coding parameters and the data stream following the frame is decoded with them; and a data stream aggregation unit, configured to insert the information switching indication frame at the end of the first data stream in the aggregation buffer and then, starting from the aggregation start position for the second data stream, store the data stream in the second data buffer into the aggregation buffer as the data stream following the indication frame.
In some embodiments, the play data obtaining module includes: a first decoding unit, configured to decode with the first coding parameters of the first data stream while decoding the aggregated data stream; a second decoding unit, configured to, when the information switching indication frame is detected, extract the second coding parameters from it, switch the coding parameters referenced by the decoder from the first to the second, and decode with the second coding parameters; and an encoding unit, configured to uniformly encode the decoded data stream to obtain the play data corresponding to the target play object.
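On the decoding side, the indication frame acts as an in-band signal to rebuild the decoder. A compact sketch under assumptions follows: `decoder_factory`, the `is_info_switch` flag, and the `coding_params` field carried by the marker are illustrative names, since the concrete parameter sets (for example, SPS/PPS in H.264) depend on the codec.

```python
def decode_aggregated_stream(aggregated_frames, decoder_factory, first_params):
    """Decode a spliced stream, switching coding parameters at the marker."""
    decoder = decoder_factory(first_params)  # start with the first stream's params
    decoded = []
    for frame in aggregated_frames:
        if frame.is_info_switch:
            # The frames that follow were produced by a different encoder:
            # rebuild the decoder with the parameters carried in the marker.
            decoder = decoder_factory(frame.coding_params)
            continue
        decoded.append(decoder.decode(frame))
    return decoded
```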
In some embodiments, the apparatus further includes a live view angle set obtaining module, configured to obtain the set of live view angles for a target live scene and to establish a cache group for each live view angle in the set. Each cache group includes one data buffer per live device serving that view angle, plus an aggregation buffer; a live view angle corresponds to multiple live devices. Taking the data buffer of any live device in a cache group as the first data buffer, the aggregation start position determining module is further configured to, when it is determined that the first data stream has data loss, select from the same cache group a data buffer other than the first data buffer as the second data buffer and switch the data stream cache source of the aggregation buffer from the first data buffer to the second data buffer. (A sketch of this cache-group layout appears below.)
In some embodiments, the target play object is the target live video of a target live scene. The first data stream receiving module is further configured to establish a first transmission channel with a first shooting device for the target live scene, to receive over that channel the first video stream transmitted by the first shooting device together with the target scene identifier of the scene, and, based on the target scene identifier, to treat the first video stream as the primary video stream of the target live video. The second data stream receiving module is further configured to establish a second transmission channel with a second shooting device for the same scene, to receive over it the second video stream together with the target scene identifier, and, based on that identifier, to treat the second video stream as the backup video stream of the target live video.
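The per-view-angle cache groups can be held in a plain mapping. The dictionary layout below is an illustrative assumption; the patent prescribes the grouping, not a data structure.

```python
from collections import deque

def build_cache_groups(view_angles):
    """One cache group per live view angle: a data buffer per capture
    device serving that angle, plus a shared aggregation buffer."""
    groups = {}
    for angle in view_angles:                 # angle.id, angle.devices assumed
        groups[angle.id] = {
            "device_buffers": {dev.id: deque() for dev in angle.devices},
            "aggregation_buffer": deque(),
        }
    return groups
```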
For specific limitations of the play data processing apparatus, reference may be made to the limitations of the play data processing method above, which are not repeated here. Each module in the play data processing apparatus may be implemented wholly or partly by software, hardware, or a combination of the two. The modules may be embedded, in hardware form, in or independent of a processor in the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In some embodiments, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 13. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through Wi-Fi, a carrier network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a play data processing method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 14. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the data involved in the play data processing method. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a play data processing method.
Those skilled in the art will appreciate that the structures shown in FIGS. 13 and 14 are merely block diagrams of partial structures relevant to the solution of the present application and do not limit the computer devices to which the solution may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
In some embodiments, there is further provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above method embodiments when executing the computer program.
In some embodiments, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In some embodiments, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, all such combinations should be considered within the scope of this specification as long as they are not contradictory.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that several variations and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (15)

CN202111123072.1A | filed 2021-09-24 | priority 2021-09-24 | Playing data processing method, device, computer equipment and storage medium | Active | granted as CN115883855B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111123072.1A | 2021-09-24 | 2021-09-24 | Playing data processing method, device, computer equipment and storage medium


Publications (2)

Publication Number | Publication Date
CN115883855A | 2023-03-31
CN115883855B (en) | 2024-02-23

Family

ID: 85762395

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111123072.1A (Active; granted as CN115883855B (en)) | Playing data processing method, device, computer equipment and storage medium | 2021-09-24 | 2021-09-24

Country Status (1)

Country | Link
CN (1) | CN115883855B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103716573A* | 2013-12-13 | 2014-04-09 | 乐视致新电子科技(天津)有限公司 | Video playback method and device
CN107247676A* | 2017-05-18 | 2017-10-13 | 深圳市小牛在线互联网信息咨询有限公司 | Dynamic graph playing method, device, storage medium and computer equipment
CN109672893A* | 2018-11-30 | 2019-04-23 | 广州市百果园信息技术有限公司 | Video encoding/decoding method, device, equipment and storage medium


Also Published As

Publication number | Publication date
CN115883855B (en) | 2024-02-23

Similar Documents

Publication | Title
US8914835B2 | Streaming encoded video data
US8918533B2 | Video switching for streaming video data
CN103765914B (en) | Network streaming of decoded video data
JP5788101B2 | Network streaming of media data
US8806050B2 | Manifest file updates for network streaming of coded multimedia data
US10771821B2 | Overcoming lost IP packets in streaming video in IP networks
CN111372145B | Viewpoint switching method and system for multi-viewpoint video
US20170070756A1 | Fragment server directed device fragment caching
US10863218B2 | Method for synchronizing GOPs and IDR-frames on multiple encoders without communication
WO2015148519A1 | Processing continuous multi-period content
CN111770390B | Data processing method, device, server and storage medium
CN101809962B | Method and apparatus relating to media structure
WO2009103343A1 | Method and apparatus for distributing media over a communications network
CN115883855B | Playing data processing method, device, computer equipment and storage medium
HK40084134A | Playing data processing method, device, computer equipment and storage medium
HK40030718A | Data processing method and device, server and storage medium
WO2009080116A1 | Method and apparatus for distributing media over a communications network

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
REG | Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40084134; country of ref document: HK)
GR01 | Patent grant
