CN118042238A - Data alignment method, device, equipment and storage medium - Google Patents

Data alignment method, device, equipment and storage medium

Publication number
CN118042238A
CN118042238A (application CN202410290871.5A)
Authority
CN
China
Prior art keywords
text
audio
time interval
video
rewritten
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410290871.5A
Other languages
Chinese (zh)
Inventor
周晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202410290871.5A
Publication of CN118042238A
Legal status: Pending

Abstract

The invention provides a data alignment method, apparatus, device, and storage medium. The method includes the following steps: acquiring an original text; rewriting the original text to obtain a rewritten text; matching the original text and the rewritten text respectively against a preset audio/video to obtain the audio/video time intervals respectively corresponding to the original text and the rewritten text; and, in the case that the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets a preset intersection condition, taking that intersection as the target audio/video time interval corresponding to the original text. The accuracy of data alignment is thereby improved.

Description

Data alignment method, device, equipment and storage medium
Technical Field
The present invention relates to the field of multimedia applications, and in particular, to a data alignment method, apparatus, device, and storage medium.
Background
Text matching is an important task in audio-visual media, covering fields such as movies, television shows, animation, and short videos. Text matching may also be called data alignment; it can be understood as matching the content of a text against an audio/video, for example determining, through algorithmic processing, the time interval corresponding to the audio/video segment whose content matches the text. How to improve the accuracy of data alignment is an urgent problem to be solved.
Disclosure of Invention
The embodiments of the present invention aim to provide a data alignment method, apparatus, device, and storage medium, so as to improve the accuracy of data alignment. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a data alignment method, including:
acquiring an original text;
Rewriting the original text to obtain a rewritten text, wherein the rewritten text contains the same or synonymous information as the original text but differs from the original text in expression;
Respectively matching the original text and the rewritten text with a preset audio/video to obtain audio/video time intervals respectively corresponding to the original text and the rewritten text, wherein the audio/video time interval corresponding to the original text represents a time interval corresponding to an audio/video fragment of which the original text is matched with the content in the preset audio/video, and the audio/video time interval corresponding to the rewritten text represents a time interval corresponding to an audio/video fragment of which the rewritten text is matched with the content in the preset audio/video;
And under the condition that the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets a preset intersection condition, taking the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text as a target audio/video time interval corresponding to the original text.
Optionally, the rewriting the original text to obtain a rewritten text includes:
And obtaining the rewritten text corresponding to the original text through a text rewriting model.
Optionally, the text rewrite model is a chat robot model ChatGPT, where the ChatGPT is trained according to a plurality of preset samples and rewrite samples corresponding to the plurality of preset samples.
Optionally, the method further comprises:
And taking the audio/video time interval corresponding to the original text as a target audio/video time interval under the condition that the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text does not meet a preset intersection condition.
Optionally, the condition that the intersection between the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets a preset intersection condition includes: there is an intersection between the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text; or
And the ratio of the intersection interval of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text to the audio/video time interval corresponding to the original text is not lower than a preset intersection ratio.
Optionally, the original text is an original scenario text, the preset audio/video is a video, and the audio/video time interval is a video time interval; the video time interval corresponding to the original scenario text represents the time interval corresponding to the video segment in which the original scenario text matches the video content in the video, and the video time interval corresponding to the rewritten text represents the time interval corresponding to the video segment in which the rewritten text matches the video content in the video.
In a second aspect, an embodiment of the present invention provides a data alignment apparatus, including:
The acquisition module is used for acquiring the original text;
The rewriting module is used for rewriting the original text to obtain a rewritten text, wherein the rewritten text contains the same or synonymous information as the original text and is different from the expression of the original text;
The matching module is used for respectively matching the original text and the rewritten text with a preset audio/video to obtain audio/video time intervals respectively corresponding to the original text and the rewritten text, wherein the audio/video time interval corresponding to the original text represents the time interval corresponding to the audio/video fragment in which the original text matches the content of the preset audio/video, and the audio/video time interval corresponding to the rewritten text represents the time interval corresponding to the audio/video fragment in which the rewritten text matches the content of the preset audio/video;
and the correction module is used for taking the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text as a target audio/video time interval corresponding to the original text when the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets a preset intersection condition.
Optionally, the rewrite module is specifically configured to input the original text into a text rewrite model, so as to obtain a rewritten text corresponding to the original text.
In a third aspect, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
A memory for storing a computer program;
a processor for implementing the method steps of any of the first aspects when executing a program stored on a memory.
In still another aspect of the implementation of the present invention, there is further provided a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the method for matching scenario text and video time described in any one of the above.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the scenario text-to-video time matching method of any one of the above.
In the embodiment of the invention, the rewritten text can be obtained by rewriting the original text, wherein the rewritten text contains the same or synonymous information as the original text and is different from the expression of the original text; respectively matching the original text and the rewritten text with a preset audio/video to obtain audio/video time intervals respectively corresponding to the original text and the rewritten text, wherein the audio/video time interval corresponding to the original text represents a time interval corresponding to an audio/video fragment of which the original text is matched with the content in the preset audio/video, and the audio/video time interval corresponding to the rewritten text represents a time interval corresponding to an audio/video fragment of which the rewritten text is matched with the content in the preset audio/video; and under the condition that the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets the preset intersection condition, taking the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text as the target audio/video time interval corresponding to the original text. Therefore, the accuracy of the matching result can be improved by using time intersection verification, so that the matching error is reduced, the accuracy of text matching can be improved, namely the accuracy of data alignment is improved, and the accuracy of matching the text with the corresponding audio/video time interval is also improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a data alignment method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an embodiment of the present invention using a data alignment method;
FIG. 3 is a schematic diagram of a data alignment device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
An embodiment of the present invention provides a data alignment method, as shown in fig. 1, may include:
s101, acquiring an original text;
S102, rewriting an original text to obtain a rewritten text, wherein the rewritten text contains the same or synonymous information as the original text and is different from the expression of the original text;
S103, respectively matching the original text and the rewritten text with a preset audio/video to obtain audio/video time intervals respectively corresponding to the original text and the rewritten text, wherein the audio/video time interval corresponding to the original text represents a time interval corresponding to an audio/video segment of the original text matched with the content in the preset audio/video, and the audio/video time interval corresponding to the rewritten text represents a time interval corresponding to an audio/video segment of the rewritten text matched with the content in the preset audio/video;
And S104, taking the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text as a target audio/video time interval corresponding to the original text when the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets a preset intersection condition.
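As an illustration only, steps S101 to S104 can be sketched in code. Here `rewrite_text` and `match_interval` are hypothetical placeholders standing in for the text rewrite model of S102 and the matching system of S103; only the intersection-based correction of S104 is concrete:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)

def rewrite_text(original: str, n: int = 3) -> List[str]:
    """S102 placeholder: a text rewrite model would return n rewrites here."""
    return [f"rewrite {i} of: {original}" for i in range(n)]

def match_interval(text: str) -> Interval:
    """S103 placeholder: a matching system would return the time interval
    of the audio/video segment whose content matches `text`."""
    return (0.0, 10.0)

def data_align(original: str) -> Interval:
    """S101-S104: rewrite, match, then correct by time-interval intersection."""
    orig_iv = match_interval(original)                       # original match
    rewrite_ivs = [match_interval(t) for t in rewrite_text(original)]
    start = max([orig_iv[0]] + [iv[0] for iv in rewrite_ivs])
    end = min([orig_iv[1]] + [iv[1] for iv in rewrite_ivs])
    # S104: use the intersection if one exists, else keep the original result
    return (start, end) if start < end else orig_iv
```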
In the embodiment of the invention, the original text is rewritten to obtain the rewritten text, the original text and the rewritten text are respectively matched with the preset audio/video to obtain the audio/video time intervals corresponding to the original text and the rewritten text, and when the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets the preset intersection condition, the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text is taken as the target audio/video time interval corresponding to the original text. Therefore, the accuracy of the matching result can be improved by using the time intersection verification, so that the matching error is reduced, the accuracy of text matching can be improved, namely the accuracy of data alignment is improved, and the accuracy of matching the text with the corresponding audio/video time interval can be understood as being improved.
The original text in S101 may be text of any form. For example, it may be scenario text, such as dialogue or scene text. A scenario text may include a plurality of sentences, and a single sentence may itself be understood as a scenario text.
The purpose of overwriting the original text in S102 is to expand and enrich the expression of the original sentence as much as possible. The rewritten text is a representation that contains the same or synonymous information as the original text and is different from the original text.
The different expression modes may differ in, for example, sentence order or language (e.g., Chinese versus English).
There may be a plurality of rewritten texts; in some cases there may be only one.
For ease of understanding, the original text may be regarded as a parent sentence (the original sentence), and the rewritten texts corresponding to it as child sentences.
In one implementation, S102 may include: and obtaining the rewritten text corresponding to the original text through the text rewriting model.
That is, the original text is rewritten by the text rewrite model. The purpose of the text rewrite model is to generate rewritten sentences that are synonymous with the original, containing the same information but expressed differently (there may be n such rewritten sentences), while ensuring that each rewritten sentence is grammatically and logically correct.
The text rewrite model can be trained in advance, so that when the original text is rewritten, the original text can be input into the text rewrite model obtained in advance, and the rewritten text corresponding to the original text can be output through the text rewrite model.
For example, the text rewrite model may be a chat robot model (Chat Generative Pre-trained Transformer, ChatGPT).
ChatGPT may be trained from a plurality of preset samples and rewritten samples corresponding to the plurality of preset samples, respectively.
For example, the rewritten samples corresponding to the preset samples may be understood as sample truth values. For each preset sample, the preset sample is input into an initial ChatGPT model, the output value of the model is compared with the rewritten sample corresponding to that preset sample (i.e., the sample truth value), and the model parameters are adjusted using the difference between the output value and the sample truth value; this constitutes one iteration. The process is repeated until the difference between the model output and the sample truth value converges, or until the number of iterations reaches a preset number, at which point training is complete and the trained ChatGPT model is obtained.
In one implementation, the original text may be input into a text rewrite model to obtain a rewritten text corresponding to the original text.
In another implementation manner, a guiding prompt may be obtained according to the original text, and then the prompt is input into ChatGPT, i.e., the prompt is used to drive ChatGPT to obtain the rewritten text.
A prompt may be understood as a formatted parametric description used to guide ChatGPT in understanding the user's needs. For example, the prompt may be: tell me the rewritten sentence of "original sentence", or tell me synonyms of "original sentence", etc., where the original sentence is the original text described above; the prompt may be determined according to actual requirements.
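As a minimal sketch, such a guiding prompt could be assembled from the original sentence as follows. The template wording is a hypothetical example, and `call_chatgpt` is a placeholder, not an actual API:

```python
def build_prompt(original_sentence: str, n: int = 3) -> str:
    """Assemble a guiding prompt asking for n synonymous rewrites."""
    return (f'Tell me {n} rewritten sentences of "{original_sentence}" '
            f'that contain the same information but use different expressions.')

# Hypothetical usage with a stand-in for the actual ChatGPT API call:
# rewrites = call_chatgpt(build_prompt("He quietly left the room"))
```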
When the original text is a scenario text and the scenario text is matched with the video, the data alignment method provided by the embodiment of the invention can be understood as a scenario text and video time matching method, and can be understood as a scenario matching correction method based on a text rewriting model (such as a large language model ChatGPT).
In S103, the original text and the rewritten text are respectively matched with the preset audio/video, so as to obtain audio/video time intervals respectively corresponding to the original text and the rewritten text.
May include: and respectively matching the original text and the rewritten text with the preset audio, or respectively matching the original text and the rewritten text with the preset video and the preset audio. The matching is specifically content matching.
It is simply understood that text may be matched to audio, or text may be matched to video.
The audio/video time interval corresponding to the original text represents a time interval corresponding to an audio/video segment of the original text, which is matched with the content in the preset audio/video, and the audio/video time interval corresponding to the rewritten text represents a time interval corresponding to an audio/video segment of the rewritten text, which is matched with the content in the preset audio/video.
When the original text is a scenario text and the preset audio/video is a video, the original scenario text and the rewritten text are each content-matched against the preset video, to obtain the video time interval corresponding to the original scenario text (i.e., the time interval of the video segment in which the original scenario text matches the video content) and the video time interval corresponding to the rewritten text (i.e., the time interval of the video segment in which the rewritten text matches the video content).
And respectively carrying out content matching on the original scenario text and the rewritten text with a preset video, and also can be understood as carrying out video time matching on the rewritten text and the original scenario text to obtain video time intervals respectively corresponding to the rewritten text and the original scenario text, namely carrying out scenario text and video time interval alignment.
The video time interval corresponding to the original scenario text obtained here can be also understood as a scenario matching result before correction, and can also be called an original scenario matching result.
Each generated rewritten sentence and the original sentence are matched to video time intervals. This can be done by an existing scenario matching system, and the video time interval corresponding to each rewritten sentence is recorded. An example of an existing scenario matching system is HERO (Hierarchical Encoder for Video+Language Omni-representation Pre-training).
In one implementation, a video segment corresponding to a text may be determined first, and then a video time period of the video segment is taken as a video time interval corresponding to the text. For example:
The n+1 sentences (1 original scenario text and its n corresponding rewritten texts) are input, together with the original video, into the existing scenario matching model to obtain n+1 video clips, i.e., the video clips respectively corresponding to the n+1 sentences. Since each video clip determines its own video time interval (the time span of the clip), the video time intervals respectively corresponding to the n+1 sentences are thereby obtained.
The inputs of the scenario matching model are the scenario text to be matched and the original video, and the output is the video clip or video time interval corresponding to the scenario text.
Multiple sets of training samples can be collected in advance, and scenario matching models can be obtained by utilizing the multiple sets of training samples. Each set of training samples may include text and video clips or video time intervals corresponding to the annotated text. The video segment or the video time interval corresponding to the marked text can be understood as a sample true value, and when the sample true value is the video segment corresponding to the marked text, the output of the scenario matching model obtained by training is the video segment corresponding to the scenario text; and when the true value of the sample is the video time interval corresponding to the marked text, outputting the scenario matching model obtained through training is the video time interval corresponding to the scenario text.
In S104, when the intersection between the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets the preset intersection condition, the intersection between the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text is taken as the target audio/video time interval corresponding to the original text. That is, the intersection of the rewritten text and the audio/video time interval corresponding to the original text is used to correct the audio/video time interval corresponding to the original text.
And taking the audio/video time interval corresponding to the original text as the target audio/video time interval under the condition that the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text does not meet the preset intersection condition. I.e. the audio/video time interval corresponding to the original text is maintained unchanged.
Wherein, the condition that the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets the preset intersection condition includes: there is an intersection between the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text; or
The ratio of the intersection interval of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text to the audio/video time interval corresponding to the original text is not lower than the preset intersection ratio.
The preset intersection proportion may be determined empirically or in actual need, such as 65%, 70%, 80%, 90%, etc.
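The two intersection conditions above can be expressed as a small check. This is a sketch under the assumption that time intervals are (start, end) pairs in seconds; the function name is illustrative:

```python
def meets_intersection_condition(rewrite: tuple, original: tuple,
                                 min_ratio: float = 0.0) -> bool:
    """True if the overlap of the two intervals, taken as a fraction of the
    original text's interval length, is at least min_ratio.

    min_ratio = 0.0 reduces to 'any intersection exists'; values such as
    0.65, 0.7, 0.8, or 0.9 implement the preset intersection proportion.
    """
    overlap = min(rewrite[1], original[1]) - max(rewrite[0], original[0])
    if overlap <= 0:                      # disjoint intervals
        return False
    ratio = overlap / (original[1] - original[0])
    return ratio >= min_ratio
```

For example, a rewrite matched to seconds 8-20 against an original matched to seconds 1-12 overlaps for 4 of 11 seconds, which passes the plain-intersection test but fails a 65% proportion threshold.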
When the original text is a scenario text and the preset audio/video is video, that is, the text and the video are matched, in the embodiment of the invention, when the intersection of the video time interval corresponding to the rewritten text and the video time interval corresponding to the original scenario text meets the preset intersection condition, the intersection of the video time interval corresponding to the rewritten text and the video time interval corresponding to the original scenario text is taken as the target video time interval corresponding to the original text. That is, the video time interval corresponding to the original scenario text is corrected by using intersections of the rewritten text and the video time interval corresponding to the original scenario text, respectively.
In one implementation manner, when the rewritten text has only 1 text, whether the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text have intersection or not can be directly judged;
When there are a plurality of rewritten texts, it may first be judged whether the audio/video time intervals corresponding to the rewritten texts have an intersection among themselves; if so, that intersection is compared with the audio/video time interval corresponding to the original text to judge whether the two intersect, and if they do, the rewritten texts are considered to intersect the audio/video time interval corresponding to the original text. Alternatively, the audio/video time intervals corresponding to the rewritten texts and to the original text may simply be considered together to judge whether a common intersection exists.
There is now a set of rewritten sentences and their corresponding time intervals. These time intervals can be checked against the original sentence's interval to see whether an intersection exists; if so, the intersection interval can be considered a more accurate matching result and applied as the final time interval.
Specifically, it can be understood that: the multiple rewritten texts corresponding to one original text can be regarded as a group, and a group of audio/video time intervals corresponding to the rewritten texts and the original texts respectively are obtained. And judging whether an intersection exists between the audio/video time intervals corresponding to the rewritten text and the original text, if so, considering the intersection as a more accurate matching result, and applying the result as a final time interval. And taking the intersection of the audio/video time intervals corresponding to the rewritten texts and the original texts as the target audio/video time intervals corresponding to the original texts.
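The group-wise check described above can be sketched as follows; the helper names are illustrative, and intervals are assumed to be (start, end) pairs in seconds:

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]

def common_intersection(intervals: List[Interval]) -> Optional[Interval]:
    """Intersect a group of intervals (original + all rewrites) at once.

    Returns the shared interval, or None if the group has no common overlap.
    """
    start = max(i[0] for i in intervals)
    end = min(i[1] for i in intervals)
    return (start, end) if start < end else None

def correct_interval(original: Interval, rewrites: List[Interval]) -> Interval:
    """Use the common intersection as the corrected (target) interval;
    keep the original matching result when there is none."""
    common = common_intersection([original] + rewrites)
    return common if common is not None else original
```

For instance, an original match of 1-12 seconds with rewrite matches of 3-15 and 5-20 seconds is corrected to 5-12 seconds, while a rewrite match of 30-45 seconds would leave the original 1-12 seconds unchanged.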
Taking the intersection of each rewritten text and the audio/video time interval corresponding to the original text as the target audio/video time interval corresponding to the original text can be also understood as: and correcting the audio/video time interval corresponding to the original text into the intersection of each rewritten text and the audio/video time interval corresponding to the original text.
And under the condition that the rewritten text and the audio/video time interval corresponding to the original text do not have intersection, the audio/video time interval corresponding to the original text is not changed.
That is, if the rewritten text and the audio/video time interval corresponding to the original text do not intersect, the original matching result is maintained unchanged, that is, the determined audio/video time interval corresponding to the original text is not changed.
In the embodiment of the invention, the audio/video time intervals obtained for the rewritten sentences (i.e., the rewritten texts) and the audio/video time interval corresponding to the original text are summarized. If a time intersection exists, i.e., a plurality of sentences match the same or similar time interval, the time intersection is used as the final correction result, i.e., as the target audio/video time interval corresponding to the original scenario text.
If there is no time intersection, for example, the audio/video time interval corresponding to the original sentence is 1 second-12 seconds, and the audio/video time interval corresponding to the rewritten sentence is 30 seconds-45 seconds, that is, the rewritten sentence has no intersection with the original sentence time interval, then the original time interval is kept unchanged.
In this way, the accuracy of matching between text and audio/video time intervals can be effectively improved, errors of traditional matching algorithms can be reduced, and more accurate data support is provided for downstream tasks.
In an optional embodiment, in the embodiment of the present invention, an audio/video time interval corresponding to an original text and a rewritten text may be put together, whether an intersection exists between the audio/video time interval corresponding to the original text and the rewritten text is determined, and when the audio/video time interval corresponding to the original text and the rewritten text has an intersection, the intersection of the audio/video time interval corresponding to the rewritten text is taken as a target audio/video time interval corresponding to the original text; and under the condition that the audio/video time interval corresponding to the original text and the rewritten text has no intersection, maintaining the original matching result unchanged.
In an alternative embodiment, the original text is an original scenario text, the preset audio/video is a video, and the audio/video time interval is a video time interval; the video time interval corresponding to the original scenario text represents the time interval corresponding to the video segment in which the original scenario text matches the video content in the video, and the video time interval corresponding to the rewritten text represents the time interval corresponding to the video segment in which the rewritten text matches the video content in the video.
In this case, the data alignment method provided by the embodiment of the present invention may be also understood as a scenario text and video time matching method, and may also be understood as a scenario matching correction method based on a text rewrite model (e.g., a large language model ChatGPT).
Namely, acquiring an original scenario text; rewriting an original scenario text to obtain a rewritten text, wherein the rewritten text contains the same or synonymous information of the original scenario text and is different from the expression of the original text; respectively matching the original scenario text and the rewritten text with a preset video to obtain video time intervals respectively corresponding to the original scenario text and the rewritten text, wherein the video time interval corresponding to the original scenario text represents a time interval corresponding to a video segment of the original scenario text, which is matched with the content in the preset video, and the video time interval corresponding to the rewritten text represents a time interval corresponding to a video segment of the rewritten text, which is matched with the content in the preset video; and under the condition that the intersection of the video time interval corresponding to the rewritten text and the video time interval corresponding to the original scenario text meets the preset intersection condition, taking the intersection of the video time interval corresponding to the rewritten text and the video time interval corresponding to the original scenario text as the target video time interval corresponding to the original scenario text.
Scenario matching is an important task in audio-visual media, covering fields such as movies, television shows, animation, and short videos. Scenario matching can be understood as matching scenario text with video time: through algorithmic processing, each piece of scenario text is matched with its corresponding video time interval, which is vital for tasks such as scenario understanding, automatic subtitle generation, and subsequent video editing.
However, existing scenario matching algorithms may deviate for various reasons, such as ambiguity in dialogue content, variability in actor performance, or delay or advance between sound and picture. These deviations not only hinder the viewer's understanding but also degrade downstream tasks such as automatic subtitle generation, episode extraction, and clip production. Therefore, improving the accuracy of scenario matching is an urgent problem.
In the embodiment of the invention, the rewritten text corresponding to the original scenario text is obtained by rewriting the original scenario text; that is, rewriting adds matching samples. Video time matching is then performed for both the rewritten text and the original scenario text, yielding a video time interval for each. When the intersection of the video time interval corresponding to the rewritten text and the video time interval corresponding to the original scenario text meets the preset intersection condition, that intersection is taken as the target video time interval corresponding to the original scenario text. Verifying the matching result with the time intersection in this way improves its accuracy and reduces errors of the scenario matching algorithm, i.e., it improves the accuracy with which scenario text is matched to its corresponding video time interval.
The original scenario text is rewritten to obtain the rewritten text, which can be produced by a text rewrite model. Refer to the description of S102 above; details are not repeated here.
The aim of rewriting the original scenario text is to expand and enrich the expression of the original sentence as much as possible so as to improve the accuracy of matching the original scenario text with the video time interval.
The rewritten text is synonymous with the original scenario text: it contains the same information expressed differently, while remaining grammatically and logically correct. One original scenario text may have n rewrites, i.e., n rewritten texts are obtained for one original scenario text.
For ease of understanding, the original scenario text may be understood as the parent sentence (the original sentence), and each rewritten text corresponding to it as a clause (child sentence).
After the rewritten sentences are obtained, the original scenario text and the rewritten text are respectively matched with the preset video to obtain the video time interval corresponding to each. The matching is specifically content matching.
When the intersection of the video time interval corresponding to the rewritten text and the video time interval corresponding to the original scenario text meets the preset intersection condition, that intersection is taken as the target video time interval corresponding to the original scenario text; when the intersection condition is not met, the video time interval corresponding to the original scenario text is taken as the target video time interval, i.e., the original scenario matching result is kept unchanged.
The intersection of the video time interval corresponding to the rewritten text and the video time interval corresponding to the original scenario text meets the preset intersection condition in either of the following cases: the video time interval corresponding to the rewritten text intersects the video time interval corresponding to the original scenario text; or the ratio of their intersection interval to the video time interval corresponding to the original scenario text is not lower than a preset intersection ratio.
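Both branches of the preset intersection condition above (any overlap at all, or an overlap covering at least a preset share of the original interval) can be sketched as follows. The function names and the example threshold of 0.5 are illustrative assumptions of this sketch.

```python
def overlap_length(a, b):
    """Length of the overlap between two (start, end) intervals, 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def meets_intersection_condition(original, rewritten, min_ratio=0.0):
    """True if the overlap is non-empty and its share of the original
    interval is at least min_ratio (min_ratio=0.0 means 'any overlap')."""
    overlap = overlap_length(original, rewritten)
    if overlap <= 0:
        return False
    return overlap / (original[1] - original[0]) >= min_ratio

# 6 s of overlap on a 10 s original interval gives a ratio of 0.6.
print(meets_intersection_condition((0, 10), (4, 16), min_ratio=0.5))  # True
print(meets_intersection_condition((0, 10), (4, 16), min_ratio=0.7))  # False
```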
In one implementation, when there is only one rewritten text, it can be directly judged whether the video time interval corresponding to the rewritten text intersects the video time interval corresponding to the original scenario text;
When there are a plurality of rewritten texts, it can first be judged whether the video time intervals corresponding to the rewritten texts intersect one another; if so, that intersection is compared with the video time interval corresponding to the original scenario text to judge whether the two in turn intersect, and if they do, the rewritten texts intersect the video time interval corresponding to the original scenario text. Alternatively, the video time intervals corresponding to the rewritten texts and to the original scenario text can be put together directly to judge whether a common intersection exists.
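The group-wise check described above, reducing the original interval and several rewritten intervals to their common overlap, can be sketched as below; the (start, end) representation and function names are assumptions of this sketch.

```python
from functools import reduce

def intersect(a, b):
    """Intersection of two (start, end) intervals, or None if disjoint."""
    if a is None or b is None:
        return None
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def common_intersection(intervals):
    """Reduce a group of intervals (original text plus n rewrites) to their
    common overlap; None means at least one interval is disjoint from the rest."""
    return reduce(intersect, intervals)

# Original 2 s-14 s plus two rewritten matches yields a common overlap of 5 s-12 s.
print(common_intersection([(2, 14), (5, 16), (0, 12)]))  # (5, 12)
```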
There is now a set of rewritten sentences and their corresponding time intervals. These intervals can be checked against the time interval of the original sentence: if an intersection exists, the intersection interval can be considered a more accurate matching result and is applied as the final time interval.
Specifically, this can be understood as follows: the plurality of rewritten texts corresponding to the original scenario text are regarded as a group, yielding a group of video time intervals for the rewritten texts and the original scenario text. Whether this group of intervals has a common intersection is then judged; if so, the intersection interval is considered a more accurate matching result and applied as the final time interval, i.e., the intersection of the video time intervals of each rewritten text and the original scenario text is taken as the target video time interval corresponding to the original scenario text.
Taking this intersection as the target video time interval corresponding to the original scenario text can also be understood as: correcting the original scenario matching result (i.e., the previously determined video time interval of the original scenario text) to the intersection of the video time intervals of each rewritten text and the original scenario text.
When the rewritten text does not intersect the video time interval corresponding to the original scenario text, the video time interval corresponding to the original scenario text is not changed. That is, if there is no intersection, the original scenario matching result is maintained, i.e., the previously determined video time interval of the original scenario text is left unchanged.
In the embodiment of the invention, the video time intervals obtained for the rewritten sentences (the rewritten texts) and the video time interval corresponding to the original scenario text are summarized. If a time intersection exists, i.e., multiple sentences match consistent or similar time intervals, the time intersection is used as the final correction result, i.e., as the target video time interval corresponding to the original scenario text.
If there is no time intersection, for example, if the video time interval corresponding to the original sentence is 1 second to 12 seconds and the video time interval corresponding to the rewritten sentence is 30 seconds to 45 seconds, then the two intervals do not intersect and the original time interval is kept unchanged.
In this way, the accuracy of matching scenario text to video time intervals can be effectively improved, errors of the traditional scenario matching algorithm can be reduced, and more accurate data support is provided for downstream tasks.
In an optional embodiment, the video time intervals corresponding to the original scenario text and the rewritten text may be put together, and whether an intersection exists among them is judged. When an intersection exists, the intersection of the video time intervals corresponding to the rewritten text and the original scenario text is taken as the target video time interval corresponding to the original scenario text; when no intersection exists, the original scenario matching result is maintained unchanged.
The embodiment of the invention prepares multiple rewritten versions of the scenario text and uses them for matching video time intervals. This rests on the following assumption: once the text is rewritten more broadly or differently, the likelihood of it matching the video content increases, so the error may decrease. It also rests on the following observation: if the parent sentence and a clause (a rewritten sentence) match the same or similar time interval, the result is more likely to be accurate; if the matching results have no intersection, there is no sufficient reason to replace the original matching result, so the original time interval is preserved. After rewriting, the embodiment therefore uses the rewritten sentences to match video time intervals and observes whether a time intersection exists between the sentences.
In summary, the embodiment of the invention reduces scenario matching algorithm errors and enhances matching accuracy by adding matching samples and verifying the matching result with the time intersection. That is, by correcting the scenario matching result, scenario matching errors are reduced and the accuracy of scenario matching is improved. Furthermore, more accurate data support can be provided for subsequent tasks, improving the quality of the audio-visual media product as a whole and the user experience.
Referring to fig. 2, the specific implementation steps of the scenario matching correction method provided by the embodiment of the present invention are described in detail. The embodiment can greatly reduce the error rate of scenario matching and improve the effect and performance of downstream tasks. As shown in fig. 2:
Text input: a scenario text (e.g., 1 sentence) is input into a text rewrite model such as ChatGPT, which rewrites it to obtain n rewritten texts, i.e., n sentences, for n+1 sentences in total;
Then, the n+1 sentences, together with the original video, are input into the existing scenario matching model to obtain n+1 video clips, i.e., the video clip corresponding to each of the n+1 sentences.
After that, time interval calculation and correction can be performed.
The video clips corresponding to the n+1 sentences determine the video time intervals corresponding to the n+1 sentences, i.e., the time intervals of those video clips, so the video time intervals corresponding to the n+1 sentences are obtained. That is, the video time intervals corresponding to the n rewritten texts and the video time interval corresponding to the original scenario text are obtained.
At this point, it can be judged whether an intersection exists among the video time intervals corresponding to the n rewritten texts and the original scenario text. If so, that intersection is taken as the target video time interval corresponding to the original scenario text. That is, the previously obtained video time interval of the original scenario text (which may also be called the original scenario matching result) is corrected to the intersection of the video time intervals of each rewritten text and the original scenario text. A corrected video time interval is thereby obtained.
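The fig. 2 pipeline above can be sketched end to end. In this sketch, `rewrite_text` and `match_to_video` are hypothetical placeholders standing in for the text rewrite model (e.g., ChatGPT) and the existing scenario matching model; the embodiment does not prescribe these signatures.

```python
from functools import reduce

def intersect(a, b):
    """Intersection of two (start, end) intervals, or None if disjoint."""
    if a is None or b is None:
        return None
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def correct_scenario_match(original_text, video, rewrite_text, match_to_video, n=3):
    """Rewrite the scenario text n times, match all n+1 sentences against the
    video, and correct the original interval with their common overlap."""
    sentences = [original_text] + rewrite_text(original_text, n)  # n+1 sentences
    intervals = [match_to_video(s, video) for s in sentences]     # n+1 intervals
    overlap = reduce(intersect, intervals)
    # Keep the original matching result when the rewrites do not agree with it.
    return overlap if overlap is not None else intervals[0]
```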
As shown in fig. 3, an embodiment of the present invention provides a data alignment device, including:
An obtaining module 301, configured to obtain an original text;
The rewriting module 302 is configured to rewrite the original text to obtain a rewritten text, where the rewritten text includes information that is the same as or synonymous with the original text and is different from a representation of the original text;
The matching module 303 is configured to match the original text and the rewritten text with a preset audio/video respectively, so as to obtain the audio/video time interval corresponding to each, where the audio/video time interval corresponding to the original text represents the time interval of the audio/video segment in the preset audio/video whose content matches the original text, and the audio/video time interval corresponding to the rewritten text represents the time interval of the audio/video segment whose content matches the rewritten text;
The correction module 304 is configured to, when the intersection between the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets a preset intersection condition, take the intersection between the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text as a target audio/video time interval corresponding to the original text.
Optionally, the rewrite module 302 is specifically configured to obtain, through a text rewrite model, a rewritten text corresponding to the original text.
Optionally, the text rewrite model is a chat bot model ChatGPT.
Optionally, the apparatus further comprises:
And the maintaining module is used for taking the audio/video time interval corresponding to the original text as the target audio/video time interval under the condition that the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text does not meet the preset intersection condition.
Optionally, the intersection of the audio/video time interval corresponding to the rewritten text and the audio/video time interval corresponding to the original text meets the preset intersection condition in either of the following cases: the audio/video time interval corresponding to the rewritten text intersects the audio/video time interval corresponding to the original text; or
the ratio of their intersection interval to the audio/video time interval corresponding to the original text is not lower than a preset intersection ratio.
Optionally, the original text is an original scenario text, the preset audio/video is a video, and the audio/video time interval is a video time interval, where the video time interval corresponding to the original scenario text represents the time interval of the video segment whose content matches the original scenario text, and the video time interval corresponding to the rewritten text represents the time interval of the video segment whose content matches the rewritten text.
The embodiment of the invention also provides an electronic device, as shown in fig. 4, which comprises a processor 401, a communication interface 402, a memory 403 and a communication bus 404, wherein the processor 401, the communication interface 402 and the memory 403 complete communication with each other through the communication bus 404,
A memory 403 for storing a computer program;
the processor 401 is configured to implement the method steps of the data alignment method when executing the program stored in the memory 403.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figures, but this does not mean there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the data alignment method according to any of the above embodiments.
In yet another embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the data alignment method of any of the above embodiments is also provided.
In the above embodiments, the methods may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, the instructions produce, in whole or in part, a flow or function in accordance with embodiments of the present invention. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, electronic device, computer readable storage medium, and computer program product embodiments, the description is relatively simple, as relevant to the method embodiments being referred to in the section of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

CN202410290871.5A (filed 2024-03-14): Data alignment method, device, equipment and storage medium. Status: Pending. Publication: CN118042238A (en)

Publications (1)

Publication Number: CN118042238A. Publication Date: 2024-05-14.

Family ID: 90989033


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
