Audio processing method and system in video communication
Technical Field
The present invention relates to the field of audio and video processing technologies, and in particular, to an audio processing method and system in video communication.
Background
With the development of multimedia information processing technology and the growth of computer data processing capability, audio processing technology has attracted attention and come into wide use, for example: dubbing and scoring of video images; commentary and background music for still images; voice in video telephony and video conferencing; sound effects in games; sound simulation in virtual reality; and voice control of the Web and of the sound output of electronic reading material.
In recent years, video communication has developed rapidly and is widely used in people's life and work. However, owing to the network environment and the communication hardware, video and audio can fall out of synchronization during video communication, which degrades the user experience.
Chinese patent publication No. CN202011068363.0 discloses an audio processing method and system in video communication. In the disclosed technical scheme, first, a target audio processing mode is determined, from among a plurality of audio processing modes included in a preset audio processing method set, based on audio attribute information of at least one second processed audio data packet sent by a second video communication terminal before a first time; second, the first voice information to be processed in a first audio data packet to be processed is processed based on the target audio processing mode to obtain first processed voice information; then, based on the first processed voice information and first timestamp information, a first processed audio data packet corresponding to the first audio data packet to be processed is obtained and sent to the second video communication terminal. This method addresses the problem that audio data is processed unreasonably in existing video communication.
However, it does not effectively solve the problem in the related art of low processing efficiency when synchronizing video and audio in video communication.
Disclosure of Invention
Therefore, the invention provides an audio processing method and system in video communication, which are used for solving the problem of low processing efficiency of synchronizing video and audio in the video communication in the prior art.
In one aspect, the present invention provides a method for processing audio in video communication, including:
step S1, a data acquisition unit acquires a segment of video communication image data at the receiving end;
step S2, a data processing unit extracts the video data and the audio data from the video communication image data acquired in step S1, divides the video into frames, and timestamps the audio in units of bytes;
step S3, a central control unit checks whether the timestamp of the video frame corresponding to each byte corresponds one-to-one with the timestamp of the audio corresponding to that byte; if so, the method returns to step S1 to analyze the next segment of video communication image data, and if not, the method proceeds to step S4;
step S4, the central control unit calculates the qualification rate of the audio data corresponding to a plurality of bytes of the video and the corresponding audio, and compares it with a preset qualification rate to determine whether the correspondence between the video and the audio is faulty; if so, the method proceeds to step S5, and if not, the central control unit adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one;
step S5, the central control unit analyzes whether network delay exists on the downlink of the receiving end's overall network; if so, it adjusts the audio timestamps as a whole so that the video and the audio correspond one-to-one, and if not, the method proceeds to step S6;
step S6, the central control unit analyzes whether network delay exists on the uplink of the transmitting end's overall network; if so, it adjusts the audio timestamps as a whole using the method of step S5 so that the video and the audio correspond one-to-one, and if not, the method proceeds to step S7;
step S7, the central control unit cleans the caches of the hardware devices at the transmitting end and the receiving end.
Further, when the data processing unit finishes processing the video data and the audio data, the central control unit checks whether the timestamp of the video frame corresponding to each byte corresponds one-to-one with the timestamp of the audio corresponding to that byte,
if so, the central control unit determines that the segment of video communication image data is qualified;
if not, the central control unit determines that the segment of video communication image data is unqualified.
Further, when the central control unit determines that the segment of video communication image data is unqualified, the central control unit calculates the qualification rate P of the audio data corresponding to a plurality of bytes of the video and the corresponding audio, setting P = Y/Yz, where Y is the number of audio data items corresponding to bytes whose timestamps are qualified and Yz is the total number of audio data items corresponding to the plurality of bytes; the central control unit is provided with a preset qualification rate P0 and compares P with P0,
if P ≥ P0, the central control unit determines that the correspondence between the video and the audio of the segment is not faulty, and adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one;
if P < P0, the central control unit determines that the correspondence between the video and the audio of the segment is faulty.
Further, when the central control unit determines that the correspondence between the video and the audio is faulty, the central control unit analyzes whether network delay exists on the downlink of the receiving end's overall network,
if so, the central control unit adjusts the audio timestamps as a whole so that the video and the audio correspond one-to-one;
if not, the central control unit analyzes whether network delay exists on the uplink of the transmitting end's overall network.
Further, when the central control unit determines that network delay exists on the downlink of the receiving end's overall network, the central control unit adjusts the audio timestamps as a whole so that the video and the audio correspond one-to-one: the central control unit extracts the video timestamp Ts and the audio timestamp Ty corresponding to a single byte and calculates their difference ΔT, setting ΔT = |Ts - Ty|; the central control unit is provided with a first timestamp difference ΔT1, a second timestamp difference ΔT2, a first timestamp adjustment coefficient α1, a second timestamp adjustment coefficient α2 and a third timestamp adjustment coefficient α3, where ΔT1 < ΔT2 and 0.1 < α2 < α3 < 0.3,
if ΔT ≤ ΔT1, the central control unit adjusts the audio timestamp using α1;
if ΔT1 < ΔT ≤ ΔT2, the central control unit adjusts the audio timestamp using α2;
if ΔT > ΔT2, the central control unit adjusts the audio timestamp using α3.
Further, when the central control unit adjusts the audio timestamp using αn, where n = 1, 2, 3, the central control unit denotes the adjusted audio timestamp as Ty1,
if the video timestamp Ts is earlier than the audio timestamp Ty, Ty1 = Ty × (1 - αn) is set;
if the video timestamp Ts is later than the audio timestamp Ty, Ty1 = Ty × (1 + αn) is set.
Further, when the central control unit determines that no network delay exists on the downlink of the receiving end's overall network, the central control unit analyzes whether network delay exists on the uplink of the transmitting end's overall network,
if so, the central control unit adjusts the audio timestamps as a whole using the method of step S5 so that the video and the audio correspond one-to-one;
if not, the central control unit cleans the caches of the hardware devices at the transmitting end and the receiving end.
Further, when the central control unit determines that the qualification rate P of the audio data corresponding to a plurality of bytes of the video and the corresponding audio satisfies P ≥ P0, the central control unit adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one: the central control unit extracts the video timestamp Ts' and the audio timestamp Ty' of those bytes,
if the video timestamp Ts' is earlier than the audio timestamp Ty', the central control unit deletes blank space of the corresponding number of bytes before the earliest timestamp of the audio;
if the video timestamp Ts' is later than the audio timestamp Ty', the central control unit adds blank space of the corresponding number of bytes before the earliest timestamp of the audio.
Further, when the central control unit adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one, the number L of bytes to be adjusted is calculated as L = |Ts' - Ty'| / Tz, where Tz is the timestamp corresponding to a single byte.
In another aspect, the present invention also provides an audio processing system in video communication, including:
a data acquisition unit, configured to acquire video communication image data;
a data processing unit, connected to the data acquisition unit and configured to process the video communication image data;
a central control unit, connected to the data processing unit and configured to determine the correspondence between the video and the audio, to detect the network environment when it determines that the video and the audio do not correspond one-to-one, and, if the network environment has no network delay, to clean the caches of the hardware devices at the transmitting end and the receiving end while adjusting the correspondence of the video and the audio that do not correspond one-to-one.
Compared with the prior art, the system has the following beneficial effect: the central control unit determines the correspondence between the video and the audio and detects the network environment when it determines that they do not correspond one-to-one; if the network environment has no network delay, the central control unit cleans the caches of the hardware devices at the transmitting end and the receiving end while adjusting the correspondence of the video and the audio that do not correspond one-to-one, thereby improving the processing efficiency of the system in synchronizing video and audio in video communication.
Furthermore, the central control unit checks whether the timestamp of the video frame corresponding to each byte corresponds one-to-one with the timestamp of the audio corresponding to that byte, so as to determine whether the segment of video communication image data is qualified, which improves the accuracy of the system in synchronizing video and audio in video communication.
Furthermore, when the central control unit determines that the segment of video communication image data is unqualified, it calculates the qualification rate of the audio data corresponding to a plurality of bytes of the video and the corresponding audio to determine whether the correspondence between the video and the audio of the segment is faulty; when the correspondence is not faulty, it adjusts only the bytes whose timestamps do not correspond one-to-one, which reduces the workload, saves time, and further improves the processing efficiency of the system in synchronizing video and audio in video communication.
Furthermore, the central control unit detects the network environment when it determines that the correspondence between the video and the audio is faulty, and adjusts the audio timestamps as a whole when it determines that network delay exists, so that the video and the audio correspond one-to-one, which further improves the processing efficiency of the system in synchronizing video and audio in video communication.
Furthermore, when the central control unit determines that the network environment has no network delay, it cleans the caches of the hardware devices at the transmitting end and the receiving end, which further improves the processing efficiency of the system in synchronizing video and audio in video communication.
Drawings
FIG. 1 is a flow chart of an audio processing method in video communication according to an embodiment of the present invention;
FIG. 2 is a block diagram of an audio processing system in video communication according to an embodiment of the present invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be internal communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to FIG. 2, which is a block diagram of an audio processing system in video communication according to an embodiment of the present invention, the system includes:
a data acquisition unit, configured to acquire video communication image data;
a data processing unit, connected to the data acquisition unit and configured to process the video communication image data;
a central control unit, connected to the data processing unit and configured to determine the correspondence between the video and the audio, to detect the network environment when it determines that the video and the audio do not correspond one-to-one, and, if the network environment has no network delay, to clean the caches of the hardware devices at the transmitting end and the receiving end while adjusting the correspondence of the video and the audio that do not correspond one-to-one.
Referring to FIG. 1, which is a flowchart of an audio processing method in video communication according to an embodiment of the present invention, the method includes:
step S1, a data acquisition unit acquires a segment of video communication image data at the receiving end;
step S2, a data processing unit extracts the video data and the audio data from the video communication image data acquired in step S1, divides the video into frames, and timestamps the audio in units of bytes;
step S3, a central control unit checks whether the timestamp of the video frame corresponding to each byte corresponds one-to-one with the timestamp of the audio corresponding to that byte; if so, the method returns to step S1 to analyze the next segment of video communication image data, and if not, the method proceeds to step S4;
step S4, the central control unit calculates the qualification rate of the audio data corresponding to a plurality of bytes of the video and the corresponding audio, and compares it with a preset qualification rate to determine whether the correspondence between the video and the audio is faulty; if so, the method proceeds to step S5, and if not, the central control unit adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one;
step S5, the central control unit analyzes whether network delay exists on the downlink of the receiving end's overall network; if so, it adjusts the audio timestamps as a whole so that the video and the audio correspond one-to-one, and if not, the method proceeds to step S6;
step S6, the central control unit analyzes whether network delay exists on the uplink of the transmitting end's overall network; if so, it adjusts the audio timestamps as a whole using the method of step S5 so that the video and the audio correspond one-to-one, and if not, the method proceeds to step S7;
step S7, the central control unit cleans the caches of the hardware devices at the transmitting end and the receiving end.
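The branching of steps S3 to S7 can be sketched as a single decision function. This is only an illustrative sketch, not the claimed implementation: the names `Action` and `choose_action` and the boolean inputs `downlink_delayed` and `uplink_delayed` are hypothetical, and the description does not specify how network delay is detected.

```python
from enum import Enum


class Action(Enum):
    NEXT_SEGMENT = "analyze next segment (back to S1)"
    ADJUST_MISMATCHED_BYTES = "adjust only the mismatched bytes (S4)"
    SHIFT_AUDIO_TIMESTAMPS = "adjust audio timestamps as a whole (S5/S6)"
    CLEAN_CACHES = "clean transmitting-end and receiving-end caches (S7)"


def choose_action(all_match, pass_rate, preset_rate, downlink_delayed, uplink_delayed):
    """Return the action selected by steps S3 to S7 for one segment."""
    # S3: every timestamp pair corresponds, so the segment is qualified.
    if all_match:
        return Action.NEXT_SEGMENT
    # S4: a rate at or above the preset means only local fixes are needed.
    if pass_rate >= preset_rate:
        return Action.ADJUST_MISMATCHED_BYTES
    # S5 (receiver downlink) and S6 (sender uplink) both lead to the same
    # whole-stream timestamp adjustment.
    if downlink_delayed or uplink_delayed:
        return Action.SHIFT_AUDIO_TIMESTAMPS
    # S7: no network delay was found on either side.
    return Action.CLEAN_CACHES
```

The preset rate passed as `preset_rate` plays the role of P0; its value is not given in the description and has to be chosen by the implementer.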
In the invention, the central control unit is provided to determine the correspondence between the video and the audio and to detect the network environment when it determines that they do not correspond one-to-one; if the network environment has no network delay, the central control unit cleans the caches of the hardware devices at the transmitting end and the receiving end while adjusting the correspondence of the video and the audio that do not correspond one-to-one, thereby improving the processing efficiency of the system in synchronizing video and audio in video communication.
Specifically, when the data processing unit finishes processing the video data and the audio data, the central control unit checks whether the timestamp of the video frame corresponding to each byte corresponds one-to-one with the timestamp of the audio corresponding to that byte,
if so, the central control unit determines that the segment of video communication image data is qualified;
if not, the central control unit determines that the segment of video communication image data is unqualified.
By checking whether the timestamp of the video frame corresponding to each byte corresponds one-to-one with the timestamp of the audio corresponding to that byte, the central control unit determines whether the segment of video communication image data is qualified, which improves the accuracy of the system in synchronizing video and audio in video communication.
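The per-byte comparison described above can be illustrated with a short sketch. It assumes, purely for illustration, that the video and audio timestamps are held in two equal-length lists with one entry per byte position; the function names and the optional `tolerance` parameter are hypothetical and do not appear in the description.

```python
def mismatched_indices(video_ts, audio_ts, tolerance=0.0):
    """Return the byte positions whose video and audio timestamps differ."""
    if len(video_ts) != len(audio_ts):
        raise ValueError("video and audio timestamp lists must have equal length")
    return [
        i
        for i, (ts, ty) in enumerate(zip(video_ts, audio_ts))
        if abs(ts - ty) > tolerance
    ]


def segment_is_qualified(video_ts, audio_ts, tolerance=0.0):
    """A segment is qualified only when every timestamp pair corresponds."""
    return not mismatched_indices(video_ts, audio_ts, tolerance)
```

For instance, `mismatched_indices([0, 10, 20], [0, 12, 20])` returns `[1]`, so that segment would be judged unqualified and passed on to the qualification-rate check.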
Specifically, when the central control unit determines that the segment of video communication image data is unqualified, the central control unit calculates the qualification rate P of the audio data corresponding to a plurality of bytes of the video and the corresponding audio, setting P = Y/Yz, where Y is the number of audio data items corresponding to bytes whose timestamps are qualified and Yz is the total number of audio data items corresponding to the plurality of bytes; the central control unit is provided with a preset qualification rate P0 and compares P with P0,
if P ≥ P0, the central control unit determines that the correspondence between the video and the audio of the segment is not faulty, and adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one;
if P < P0, the central control unit determines that the correspondence between the video and the audio of the segment is faulty.
When the central control unit determines that the segment of video communication image data is unqualified, it calculates the qualification rate of the audio data corresponding to a plurality of bytes of the video and the corresponding audio to determine whether the correspondence between the video and the audio is faulty; when the correspondence is not faulty, it adjusts only the bytes whose timestamps do not correspond one-to-one, which reduces the workload, saves time, and further improves the processing efficiency of the system in synchronizing video and audio in video communication.
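As a worked illustration of P = Y/Yz, the sketch below counts the byte positions whose timestamps match and divides by the total. The list-based representation, the function names, and the P0 values used in the example are assumptions made for illustration; the description does not give a value for P0.

```python
def qualification_rate(video_ts, audio_ts):
    """Compute P = Y / Yz over paired per-byte timestamps."""
    total = len(audio_ts)  # Yz: total number of audio data items
    if total == 0:
        raise ValueError("segment contains no audio data")
    # Y: number of items whose video and audio timestamps correspond
    qualified = sum(1 for ts, ty in zip(video_ts, audio_ts) if ts == ty)
    return qualified / total


def has_correspondence_problem(p, p0):
    """P < P0 means the correspondence of the segment as a whole is faulty."""
    return p < p0
```

With video timestamps `[0, 10, 20, 30]` and audio timestamps `[0, 10, 25, 30]`, three of four items match, so P = 0.75. Under a hypothetical P0 of 0.7 the segment would only need byte-level adjustment, while under a hypothetical P0 of 0.8 it would go on to the network delay analysis.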
Specifically, when the central control unit determines that the correspondence between the video and the audio is faulty, the central control unit analyzes whether network delay exists on the downlink of the receiving end's overall network,
if so, the central control unit adjusts the audio timestamps as a whole so that the video and the audio correspond one-to-one;
if not, the central control unit analyzes whether network delay exists on the uplink of the transmitting end's overall network.
Specifically, when the central control unit determines that network delay exists on the downlink of the receiving end's overall network, the central control unit adjusts the audio timestamps as a whole so that the video and the audio correspond one-to-one: the central control unit extracts the video timestamp Ts and the audio timestamp Ty corresponding to a single byte and calculates their difference ΔT, setting ΔT = |Ts - Ty|; the central control unit is provided with a first timestamp difference ΔT1, a second timestamp difference ΔT2, a first timestamp adjustment coefficient α1, a second timestamp adjustment coefficient α2 and a third timestamp adjustment coefficient α3, where ΔT1 < ΔT2 and 0.1 < α2 < α3 < 0.3,
if ΔT ≤ ΔT1, the central control unit adjusts the audio timestamp using α1;
if ΔT1 < ΔT ≤ ΔT2, the central control unit adjusts the audio timestamp using α2;
if ΔT > ΔT2, the central control unit adjusts the audio timestamp using α3.
Specifically, when the central control unit adjusts the audio timestamp using αn, where n = 1, 2, 3, the central control unit denotes the adjusted audio timestamp as Ty1,
if the video timestamp Ts is earlier than the audio timestamp Ty, Ty1 = Ty × (1 - αn) is set;
if the video timestamp Ts is later than the audio timestamp Ty, Ty1 = Ty × (1 + αn) is set.
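The tiered coefficient selection and the formulas for Ty1 can be sketched as follows. The function names are hypothetical, and the threshold and coefficient values in the usage note are illustrative assumptions only; the description gives no concrete values beyond the stated inequalities, and it does not cover the case Ts = Ty, which this sketch leaves unchanged.

```python
def select_coefficient(delta_t, delta_t1, delta_t2, alphas):
    """Pick alpha1, alpha2 or alpha3 according to the size of delta_t."""
    alpha1, alpha2, alpha3 = alphas
    if delta_t <= delta_t1:
        return alpha1
    if delta_t <= delta_t2:
        return alpha2
    return alpha3


def adjust_audio_timestamp(ts, ty, delta_t1, delta_t2, alphas):
    """Return Ty1, the adjusted audio timestamp, for one byte."""
    alpha = select_coefficient(abs(ts - ty), delta_t1, delta_t2, alphas)
    if ts < ty:
        # Video earlier than audio: Ty1 = Ty * (1 - alpha)
        return ty * (1 - alpha)
    if ts > ty:
        # Video later than audio: Ty1 = Ty * (1 + alpha)
        return ty * (1 + alpha)
    # Timestamps already equal: nothing to adjust (assumption).
    return ty
```

Taking hypothetical values ΔT1 = 10, ΔT2 = 20 and coefficients (0.125, 0.1875, 0.25), a byte with Ts = 96 and Ty = 100 has ΔT = 4, selects α1, and yields Ty1 = 100 × (1 - 0.125) = 87.5; a byte with Ts = 130 and Ty = 100 has ΔT = 30, selects α3, and yields Ty1 = 125.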
The central control unit detects the network environment when it determines that the correspondence between the video and the audio is faulty, and adjusts the audio timestamps as a whole when it determines that network delay exists, so that the video and the audio correspond one-to-one, which further improves the processing efficiency of the system in synchronizing video and audio in video communication.
Specifically, when the central control unit determines that no network delay exists on the downlink of the receiving end's overall network, the central control unit analyzes whether network delay exists on the uplink of the transmitting end's overall network,
if so, the central control unit adjusts the audio timestamps as a whole using the method of step S5 so that the video and the audio correspond one-to-one;
if not, the central control unit cleans the caches of the hardware devices at the transmitting end and the receiving end.
When the central control unit determines that the network environment has no network delay, it cleans the caches of the hardware devices at the transmitting end and the receiving end, which further improves the processing efficiency of the system in synchronizing video and audio in video communication.
Specifically, when the central control unit determines that the qualification rate P of the audio data corresponding to a plurality of bytes of the video and the corresponding audio satisfies P ≥ P0, the central control unit adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one: the central control unit extracts the video timestamp Ts' and the audio timestamp Ty' of those bytes,
if the video timestamp Ts' is earlier than the audio timestamp Ty', the central control unit deletes blank space of the corresponding number of bytes before the earliest timestamp of the audio;
if the video timestamp Ts' is later than the audio timestamp Ty', the central control unit adds blank space of the corresponding number of bytes before the earliest timestamp of the audio.
Specifically, when the central control unit adjusts the correspondence of the video and the audio for the bytes whose timestamps do not correspond one-to-one, the number L of bytes to be adjusted is calculated as L = |Ts' - Ty'| / Tz, where Tz is the timestamp corresponding to a single byte.
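The byte-level adjustment can be sketched under several assumptions that go beyond the description: the audio is held as a Python `bytes` object, a blank is represented by a zero byte, Tz is treated as the time span covered by one byte, and L is rounded to a whole number of bytes. The function names are hypothetical.

```python
def adjusted_byte_count(ts, ty, tz):
    """Compute L = |Ts' - Ty'| / Tz, rounded to a whole number of bytes."""
    if tz <= 0:
        raise ValueError("tz must be positive")
    return round(abs(ts - ty) / tz)


def realign_audio(audio, ts, ty, tz, blank=b"\x00"):
    """Delete or prepend L blank bytes at the start of the audio data."""
    count = adjusted_byte_count(ts, ty, tz)
    if ts < ty:
        # Video is earlier: drop up to L leading blank bytes, never real data.
        removable = len(audio) - len(audio.lstrip(blank))
        return audio[min(count, removable):]
    if ts > ty:
        # Video is later: prepend L blank bytes.
        return blank * count + audio
    return audio
```

For example, with Ts' = 100, Ty' = 108 and Tz = 2, L is 4, so four leading blank bytes are removed if that many are present; with the two timestamps swapped, four blank bytes are prepended instead.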
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.