CN113378000B - Video title generation method and device - Google Patents

Video title generation method and device

Info

Publication number
CN113378000B
Authority
CN
China
Prior art keywords
target
scenario description
video
sub
sentences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110760887.4A
Other languages
Chinese (zh)
Other versions
CN113378000A (en)
Inventor
王亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110760887.4A
Publication of CN113378000A
Application granted
Publication of CN113378000B
Legal status: Active (current)
Anticipated expiration


Abstract

The embodiment of the invention provides a video title generation method and device, which relate to the technical field of video processing. The method comprises the following steps: determining a sub-video in a target video; determining, from the scenario description text of the target video, a scenario description sentence associated with the sub-video as a target scenario description sentence, the scenario description text being used for describing the video content of the target video; and generating a video title of the sub-video based on the target scenario description sentence. Based on the above processing, the generation efficiency of video titles can be improved.

Description

Video title generation method and device
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and apparatus for generating a video title.
Background
With the rapid development of internet technology, when a user views a video using a video application program, the user can select a segment of interest to obtain a sub-video (also called a split video) and upload the sub-video to a server of a video website. To manage these sub-videos better, an appropriate title needs to be set for each sub-video.
In the related art, each sub-video may be viewed by a technician and a title may be set for the sub-video according to the content of the sub-video. The above process is cumbersome, resulting in low efficiency in the generation of video titles of sub-videos.
Disclosure of Invention
The embodiment of the invention aims to provide a video title generation method and device so as to improve the generation efficiency of video titles. The specific technical scheme is as follows:
In a first aspect of the present invention, there is provided a video title generation method, the method comprising:
determining sub-videos in the target video;
determining scenario description sentences associated with the sub-video from the scenario description text of the target video as target scenario description sentences; the scenario description text is used for describing video content of the target video;
and generating a video title of the sub-video based on the target scenario description sentence.
Optionally, the determining, from scenario description text of the target video, scenario description sentences associated with the sub video as target scenario description sentences includes:
determining a speech segment associated with the sub video from the speech segments contained in the target video as a target speech segment;
determining a scenario description sentence corresponding to the target speech segment based on a target corresponding relation between the speech segment and the scenario description sentence of the target video, and taking the scenario description sentence as a target scenario description sentence; the target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
Optionally, an intersection exists between a time period corresponding to the target speech segment and a time period of the sub video.
Optionally, the target correspondence is obtained by:
segmenting the speech of the target video to obtain a plurality of speech fragments;
dividing sentences of the scenario description text of the target video to obtain a plurality of scenario description sentences;
extracting keywords in each speech segment and each scenario description sentence to obtain a keyword set of each speech segment and a keyword set of each scenario description sentence;
and determining target corresponding relations between the line segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the line segments and the keyword sets of the scenario description sentences.
Optionally, the determining, based on the similarity between the keyword sets of the plurality of speech segments and the keyword sets of the plurality of scenario description sentences, the target correspondence between the speech segments and the scenario description sentences in the target video includes:
based on the similarity between the keyword sets of the plurality of line segments and the keyword sets of the plurality of scenario description sentences and a preset constraint condition, determining a plurality of corresponding relations between the line segments and the scenario description sentences in the target video as a first corresponding relation;
Wherein, the preset constraint condition comprises: a scenario description sentence corresponding to a speech segment with an earlier time period is positioned before a scenario description sentence corresponding to a speech segment with a later time period in the scenario description text;
determining the corresponding relation with the maximum total similarity in the first corresponding relations as a target corresponding relation;
the total similarity of one first corresponding relation is as follows: in the first corresponding relation, the sum value of the similarity between the keyword set of each line segment and the keyword set of the corresponding scenario description sentence.
Optionally, the generating, based on the target scenario description sentence, a video title of the sub-video includes:
if the target scenario description sentence is one, the target scenario description sentence is used as a video title of the sub-video;
and if there are a plurality of target scenario description sentences, selecting one from the plurality of target scenario description sentences as the video title of the sub-video.
Optionally, the selecting one from the plurality of target scenario description sentences as the video title of the sub-video includes:
selecting, based on the target correspondence and according to the sequence in the scenario description text, the first target scenario description sentence whose corresponding speech segment has a time period belonging to the time period of the sub-video from the plurality of target scenario description sentences, and taking the target scenario description sentence as a video title of the sub-video;
Or,
and selecting a target scenario description sentence with the largest intersection between the time period of the corresponding speech segment and the time period of the sub-video from the plurality of target scenario description sentences based on the target correspondence, and taking the target scenario description sentence as a video title of the sub-video.
A method for generating a video title, the method comprising:
when a video splitting request aiming at a target video is received, determining a sub video which is required to be split currently;
determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences;
and generating a video title of the sub-video based on the target scenario description sentence.
In a second aspect of the implementation of the present invention, there is also provided a video title generating apparatus, including:
the sub-video determining module is used for determining sub-videos in the target video;
the target scenario description sentence determining module is used for determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos to serve as target scenario description sentences; the scenario description text is used for describing video content of the target video;
And the title generation module is used for generating the video title of the sub-video based on the target scenario description sentence.
Optionally, the target scenario description sentence determining module includes:
the target speech segment determining submodule is used for determining speech segments associated with the sub video from all speech segments contained in the target video to serve as target speech segments;
the target scenario description sentence determining submodule is used for determining scenario description sentences corresponding to the target speech fragments based on target corresponding relations between the speech fragments of the target video and the scenario description sentences, and taking the scenario description sentences as target scenario description sentences; the target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
Optionally, an intersection exists between a time period corresponding to the target speech segment and a time period of the sub video.
Optionally, the apparatus further includes:
the segmentation module is used for segmenting the speech of the target video to obtain a plurality of speech fragments;
the clause module is used for carrying out clause on the scenario description text of the target video to obtain a plurality of scenario description sentences;
the keyword set acquisition module is used for extracting keywords in each speech segment and each scenario description sentence to obtain a keyword set of each speech segment and a keyword set of each scenario description sentence;
The target corresponding relation determining module is used for determining the target corresponding relation between the speech segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences.
Optionally, the target correspondence determining module includes:
the first correspondence determining sub-module is used for determining a plurality of correspondences between the speech segments and the scenario description sentences in the target video as a first correspondence based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences and a preset constraint condition;
wherein, the preset constraint condition comprises: a scenario description sentence corresponding to a speech segment with an earlier time period is positioned before a scenario description sentence corresponding to a speech segment with a later time period in the scenario description text;
the target corresponding relation determining sub-module is used for determining the corresponding relation with the maximum total similarity in each first corresponding relation as a target corresponding relation;
the total similarity of one first corresponding relation is as follows: in the first corresponding relation, the sum value of the similarity between the keyword set of each line segment and the keyword set of the corresponding scenario description sentence.
Optionally, the title generating module includes:
the first title generation submodule is used for taking the target scenario description sentence as a video title of the sub-video if the target scenario description sentence is one;
and the second title generation sub-module is used for selecting one from the plurality of target scenario description sentences as the video title of the sub-video if the target scenario description sentences are a plurality of.
Optionally, the second title generating sub-module is specifically configured to select, based on the target correspondence, according to the sequence in the scenario description text, from a plurality of target scenario description sentences, a target scenario description sentence in which a time period of the first corresponding speech segment belongs to a time period of the sub-video, as a video title of the sub-video;
or,
and selecting a target scenario description sentence with the largest intersection between the time period of the corresponding speech segment and the time period of the sub-video from the plurality of target scenario description sentences based on the target correspondence, and taking the target scenario description sentence as a video title of the sub-video.
In yet another aspect of the present invention, there is also provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory perform communication with each other through the communication bus;
A memory for storing a computer program;
and a processor, configured to implement the video title generation method according to any one of the first aspect when executing the program stored in the memory.
In yet another aspect of the implementation of the present invention, there is also provided a computer readable storage medium, in which a computer program is stored, the computer program implementing any one of the above video title generation methods when executed by a processor.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the video title generation methods described above.
The video title generation method provided by the embodiment of the invention determines the sub-video in the target video; determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences; the scenario description text is used for describing video content of the target video; and generating a video title of the sub-video based on the target scenario description sentence.
The scenario description text of the target video can embody the video content of the target video, and correspondingly, the target scenario description sentence associated with the sub video can embody the video content of the sub video. Therefore, the video title of the sub-video generated based on the target scenario description sentence can embody the video content of the sub-video, and compared with the situation that a technician sets the title for the sub-video after watching the sub-video, the generation efficiency of the video title can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a video title generation method provided in an embodiment of the present invention;
FIG. 2 is a flowchart of another video title generation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for generating a target correspondence according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for generating a target correspondence according to an embodiment of the present invention;
FIG. 5 is a flowchart of another video title generation method according to an embodiment of the present invention;
fig. 6 is a schematic diagram of generating a target correspondence according to an embodiment of the present invention;
fig. 7 is a block diagram of a video title generating apparatus according to an embodiment of the present invention;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In the related art, a technician views each sub-video and sets a title for the sub-video according to the contents of the sub-video. The above process is cumbersome, resulting in low efficiency in the generation of video titles of sub-videos.
In order to solve the above-mentioned problems, the embodiment of the present invention provides a video title generation method, which can be applied to an electronic device, where the electronic device may be a terminal or may be a server.
For example, the electronic device may be a terminal. A user views a certain video through the terminal and, while viewing, may perform a video splitting operation, i.e., instruct the terminal to extract a video clip (i.e., a sub-video) from the video. The terminal can then determine the sub-video that currently needs to be split, and generate the video title of the sub-video based on the video title generation method provided by the embodiment of the invention.
For another example, the electronic device may be a server. When a user watches a certain video through a terminal corresponding to the server and performs a video splitting operation, the terminal can send a video splitting request for the video to the server. The server can then determine the sub-video that currently needs to be split, and generate the video title of the sub-video based on the video title generation method provided by the embodiment of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a video title generation method according to an embodiment of the present invention, where the method may include the following steps:
S101: a sub-video in the target video is determined.
S102: and determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences.
Wherein, scenario description text is used for describing the video content of the target video.
S103: and generating a video title of the sub-video based on the target scenario description sentence.
According to the video title generation method provided by the embodiment of the invention, the scenario description text of the target video can embody the video content of the target video, and correspondingly, the target scenario description sentence associated with the sub video can embody the video content of the sub video. Therefore, the video title of the sub-video generated based on the target scenario description sentence can embody the video content of the sub-video, and compared with the situation that a technician sets the title for the sub-video after watching the sub-video, the generation efficiency of the video title can be improved.
For step S101, in one implementation, when the user views the target video through the terminal, a video splitting operation may be performed. For example, the user may select a start time and an end time from the target video played by the terminal, and determine to extract the video clip (i.e., the sub-video) between the start time and the end time; accordingly, the terminal may determine the sub-video selected by the user.
In another implementation, the user may also upload the extracted sub-video directly to the terminal.
For step S102, the scenario description text of the target video can embody video content of the target video. For example, if the target video is a movie, the scenario description text may be a scenario profile of the movie. For another example, if the target video is a certain episode of a television series, the scenario description text may be a scenario introduction of the episode of the television series.
It can be understood that the scenario description text of the target video generally includes a plurality of sentences (i.e., scenario description sentences), so that the scenario description sentence associated with the sub-video can be determined from the plurality of scenario description sentences included in the scenario description text, and the determined scenario description sentence can also embody the video content of the sub-video.
In one embodiment, the corresponding time period of each scenario description sentence in the target video may be predetermined. Further, the scenario description sentence associated with the sub-video may be determined according to the time period of the sub-video in the target video (i.e., the period between the start time and the end time described above) and the corresponding time period of each scenario description sentence in the target video.
For example, a scenario description sentence in which a corresponding period of time intersects with a period of time of a sub-video may be determined as a scenario description sentence associated with the sub-video.
In one embodiment, the corresponding time period for each scenario description sentence in the target video may be determined in different manners.
In one implementation manner, the total duration of the target video may be divided according to the total number of scenario description sentences to obtain a plurality of time periods, and each scenario description sentence may then be mapped to a time period according to the order of the scenario description sentences and the order of the time periods, as sketched below.
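A minimal sketch of this even division (the function name and the uniform-length assumption are illustrative, not taken from the embodiment; times are in seconds):

def assign_time_periods(total_duration, num_sentences):
    # Evenly divide the total duration of the target video into one time
    # period per scenario description sentence, in sentence order.
    step = total_duration / num_sentences
    return [(k * step, (k + 1) * step) for k in range(num_sentences)]

# e.g. a 3600-second video with 12 scenario description sentences
periods = assign_time_periods(3600, 12)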
In another implementation manner, the corresponding lines of each scenario description sentence can be determined according to the similarity between the lines of the target video and the scenario description text. The lines generally correspond to the time periods in the target video, and further, the corresponding time periods of the scenario description sentences in the target video can be determined.
In one embodiment, referring to fig. 2, the step S102 may include the following steps based on fig. 1:
S1021: And determining the speech segments associated with the sub-video from the speech segments contained in the target video as target speech segments.
S1022: and determining the scenario description sentence corresponding to the target speech segment based on the target corresponding relation between the speech segment and the scenario description sentence of the target video, and taking the scenario description sentence as the target scenario description sentence.
The target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
In the embodiment of the invention, the target corresponding relation between the speech segments and the scenario description sentences of the target video can be predetermined, and further, after the target speech segments associated with the sub video are determined, the scenario description sentences corresponding to the target speech segments in the target corresponding relation can be determined as scenario description sentences (namely, target scenario description sentences) associated with the sub video.
In one embodiment, there is an intersection of a time period corresponding to the target speech segment with a time period of the sub-video.
In the embodiment of the invention, the lines (subtitles) of the target video carry corresponding times, so the lines can be segmented based on those times to obtain speech segments. Correspondingly, each speech segment also corresponds to a time period of the target video.
If the time period corresponding to a speech segment intersects the time period of the sub-video, the speech segment contains part or all of the lines in the sub-video, and the speech segment can be determined to be associated with the sub-video, as sketched below.
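A minimal sketch of this association test (the helper name is illustrative; time periods are (start, end) pairs in seconds):

def overlaps(segment_period, sub_video_period):
    # True if the speech segment's time period intersects the sub-video's
    # time period, i.e. the segment is associated with the sub-video.
    seg_start, seg_end = segment_period
    sub_start, sub_end = sub_video_period
    return max(seg_start, sub_start) < min(seg_end, sub_end)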
In one embodiment, referring to fig. 3, fig. 3 is a flowchart of a method for generating a target correspondence according to an embodiment of the present invention, where the method may include the following steps:
S301: segmenting the speech of the target video to obtain a plurality of speech fragments.
S302: and dividing sentences of the scenario description text of the target video to obtain a plurality of scenario description sentences.
S303: extracting keywords in each keyword segment and each scenario description sentence to obtain a keyword set of each keyword segment and a keyword set of each scenario description sentence.
S304: and determining the target corresponding relation between the speech segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences.
In one implementation, the speech may be segmented with target time periods as segmentation boundaries, where a target time period is a period in which no line exists and whose duration is a preset duration (for example, 10 seconds). For example, if there are 3 target time periods in the target video, the speech of the target video may be divided into 4 speech segments; the time interval between every two adjacent speech segments is a target time period. A sketch of this grouping is given below.
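A rough sketch of grouping timed subtitle lines into speech segments (the input format and the helper name are assumptions; a new segment starts whenever the silent gap between consecutive lines reaches the preset duration):

def split_subtitles(lines, gap_seconds=10):
    # lines: list of (start_time, end_time, text) tuples sorted by start_time.
    # A new speech segment starts whenever the silent gap between two
    # consecutive lines reaches gap_seconds (a target time period).
    segments, current = [], []
    for line in lines:
        if current and line[0] - current[-1][1] >= gap_seconds:
            segments.append(current)
            current = []
        current.append(line)
    if current:
        segments.append(current)
    return segments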
In one implementation, the scenario description text may be split into sentences based on preset punctuation marks. The preset punctuation marks may include at least one of the following: periods, question marks, semicolons, and exclamation marks.
The extracted keywords may include at least one of: nouns and verbs. In particular, nouns may include place names and person names.
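A minimal sketch of the sentence splitting and keyword extraction steps; jieba part-of-speech tagging is one possible choice for picking out nouns and verbs from Chinese text and is an assumption here, not something mandated by the embodiment:

import re
import jieba.posseg as pseg  # assumption: jieba is used for part-of-speech tagging

def split_sentences(plot_text):
    # Split the scenario description text on periods, question marks,
    # semicolons and exclamation marks (Chinese and ASCII forms).
    parts = re.split(r"[。？；！.?;!]", plot_text)
    return [p.strip() for p in parts if p.strip()]

def extract_keywords(text):
    # Keep nouns (POS flags starting with 'n', which cover person and place
    # names) and verbs (flags starting with 'v') as the keyword set.
    return {pair.word for pair in pseg.cut(text) if pair.flag and pair.flag[0] in ("n", "v")}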
In one implementation manner, for a speech segment and a scenario description sentence, the ratio of the number of keywords included in the intersection of two keyword sets to the number of keywords included in the union of the two keyword sets may be calculated as the similarity of the two keyword sets, that is, the similarity of the speech segment and the scenario description sentence.
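The similarity described above is the Jaccard coefficient of the two keyword sets; a minimal sketch:

def keyword_similarity(set_a, set_b):
    # Similarity of a speech segment and a scenario description sentence:
    # |intersection| / |union| of their keyword sets (0.0 when both are empty).
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0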
For step S304, in one implementation manner, for each speech segment, the scenario description sentence whose keyword set has the maximum similarity with the keyword set of the speech segment may be determined from the scenario description sentences, and used as the scenario description sentence corresponding to that speech segment in the target correspondence.
In another implementation, referring to fig. 4, on the basis of fig. 3, the step S304 may include the following steps:
S3041: Based on the similarity between the keyword sets of the plurality of speech segments and the keyword sets of the plurality of scenario description sentences and preset constraint conditions, a plurality of correspondences between the speech segments and the scenario description sentences in the target video are determined and used as first correspondences.
The preset constraint conditions comprise: the scenario description sentence corresponding to the speech segment with the earlier time period is positioned, in the scenario description text, before the scenario description sentence corresponding to the speech segment with the later time period.
S3042: and determining the corresponding relation with the maximum total similarity in the first corresponding relations as a target corresponding relation.
The total similarity of one first correspondence is: in the first correspondence, the sum of the similarities between the keyword set of each speech segment and the keyword set of the corresponding scenario description sentence.
In the embodiment of the invention, since the scenario description text can describe the video content of the target video, each scenario description sentence in the scenario description text also has a time sequence.
That is, in the scenario description text, the period corresponding to the scenario description sentence at the earlier position is earlier than the period corresponding to the scenario description sentence at the later position; equivalently, the period of the speech segment corresponding to the scenario description sentence at the earlier position is earlier than the period of the speech segment corresponding to the scenario description sentence at the later position.
Each first correspondence determined based on the preset constraint condition ensures that the scenario description sentence corresponding to the speech segment with the earlier time period is positioned, in the scenario description text, before the scenario description sentence corresponding to the speech segment with the later time period. Accordingly, the accuracy of the correspondence between speech segments and scenario description sentences in the first correspondences can be improved.
In one embodiment, the speech segments may be ordered in chronological order, and the scenario description sentences may be ordered in chronological order in the scenario description text.
Then, for the first speech segment, one scenario description sentence (which may be referred to as a first scenario description sentence) may be selected from the scenario description sentences as a scenario description sentence corresponding to the first speech segment.
For the second speech segment, one scenario description sentence (may be referred to as a second scenario description sentence) may be selected from the first scenario description sentence and each scenario description sentence located after the first scenario description sentence as a scenario description sentence corresponding to the second speech segment.
Similarly, for the third speech segment, one scenario description sentence (which may be referred to as a third scenario description sentence) may be selected from the second scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the third speech segment. By analogy, the scenario description sentence corresponding to each speech segment can be determined, and a first correspondence is obtained.
Then, for the first speech segment, one scenario description sentence (which may be referred to as a fourth scenario description sentence) may be selected from the scenario description sentences other than the first scenario description sentence, as the scenario description sentence corresponding to the first speech segment.
Then, for the second speech segment, one scenario description sentence (which may be referred to as a fifth scenario description sentence) may be selected from the fourth scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the second speech segment.
Similarly, for the third speech segment, one scenario description sentence (which may be referred to as a sixth scenario description sentence) may be selected from the fifth scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the third speech segment. By analogy, the scenario description sentence corresponding to each speech segment can be determined, and another first correspondence is obtained.
The above processing is repeated until M first correspondences are determined, where M represents the number of scenario description sentences.
In addition, after the first scenario description sentence is used as the scenario description sentence corresponding to the first speech segment, for the second speech segment, one scenario description sentence (which may be referred to as a seventh scenario description sentence) may be selected from the first scenario description sentence and the scenario description sentences located after it, excluding the second scenario description sentence, as the scenario description sentence corresponding to the second speech segment.
Similarly, for the third speech segment, one scenario description sentence (which may be referred to as an eighth scenario description sentence) may be selected from the seventh scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the third speech segment. By analogy, the scenario description sentence corresponding to each speech segment can be determined, and a further first correspondence is obtained.
Based on the processing, a plurality of first corresponding relations can be determined, and each first corresponding relation meets the preset constraint condition.
Correspondingly, in the first correspondence with the maximum total similarity (namely the target correspondence), the overall similarity between the speech segments and the scenario description sentences is the highest, i.e., the accuracy of the target correspondence is the highest. Furthermore, the matching degree between the target scenario description sentence determined based on the target correspondence and the sub-video is high, so that the video title of the sub-video generated based on the target scenario description sentence can more accurately represent the video content of the sub-video.
In one embodiment, the target correspondence may be determined based on a dynamic programming algorithm according to the preset constraint condition.
Specifically, the target correspondence (i.e., the scenario description sentence corresponding to each speech segment) may be obtained by the following code, in which the scenario description sentences are denoted plots and the speech segments are denoted subtitles:
def align_plot_subtitle(plots, subtitles, match_score):
    m = len(subtitles)   # m is the number of speech segments
    n = len(plots)       # n is the number of scenario description sentences

    # Matching score matrix D[m+1][n+1]: D[i][j] is the best matching score of the
    # first i speech segments and the first j scenario description sentences.
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    # Best path matrix T[m+1][n+1]: T[i][j] stores the coordinates of the previous
    # node on the best matching path that D[i][j] lies on.
    T = [[(0, 0)] * (n + 1) for _ in range(m + 1)]
    # Best match matrix M[m+1][n+1]: M[i][j] stores the best matching scenario
    # description sentence of the i-th speech segment among the first j sentences.
    M = [[-1] * (n + 1) for _ in range(m + 1)]

    # Initialization: D is already all 0, T all (0, 0) and M all -1; only T[0][0] differs.
    T[0][0] = (-1, -1)

    # Iterate the optimum values.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match_score(subtitles[i - 1], plots[j - 1])
            # The i-th speech segment matches no scenario description sentence; the
            # score is consistent with the best matching score of the first i-1 segments.
            score1 = D[i - 1][j]
            # The j-th scenario description sentence matches no speech segment; the
            # score is consistent with the best match of the first j-1 sentences.
            score2 = D[i][j - 1]
            # The i-th speech segment matches the j-th scenario description sentence,
            # which is not matched by any previous speech segment.
            score3 = D[i - 1][j - 1] + s
            # The i-th speech segment matches the j-th scenario description sentence,
            # which is also matched by a previous speech segment.
            score4 = D[i - 1][j] + s

            # Compare the four cases and take the maximum as the best matching score
            # of the first i speech segments and the first j scenario description sentences.
            best = max(score1, score2, score3, score4)
            D[i][j] = best
            if best == score1:
                T[i][j] = (i - 1, j)      # previous node on the best path is (i-1, j)
                M[i][j] = -1              # the i-th speech segment has no match
            elif best == score2:
                T[i][j] = (i, j - 1)      # previous node on the best path is (i, j-1)
                M[i][j] = M[i][j - 1]     # best match of the i-th segment among the first j
                                          # sentences is the same as among the first j-1
            elif best == score3:
                T[i][j] = (i - 1, j - 1)  # previous node on the best path is (i-1, j-1)
                M[i][j] = j               # best match of the i-th segment is the j-th sentence
            else:                         # score4
                T[i][j] = (i - 1, j)      # previous node on the best path is (i-1, j)
                M[i][j] = j               # best match of the i-th segment is the j-th sentence

    # Reconstruct the best matching path: starting from the last node, follow the
    # stored previous node of each node on the best matching path.
    P = [-1] * (m + 1)   # P[i]: scenario description sentence matched to the i-th speech segment
    cur_i, cur_j = m, n
    while cur_i > 0 and cur_j > 0:
        P[cur_i] = M[cur_i][cur_j]
        cur_i, cur_j = T[cur_i][cur_j]
    return P
Here, match_score(subtitles[i], plots[j]) represents the similarity between the i-th speech segment and the j-th scenario description sentence.
Based on the above code, the optimal path can be determined according to the dynamic programming algorithm. The optimal path includes a plurality of positions, each position corresponding to an overall-similarity entry in the matching score matrix D. For each position on the optimal path, the row coordinate represents the sequence number of a speech segment and the column coordinate represents the sequence number of a scenario description sentence; that is, the speech segment is matched with the scenario description sentence, i.e., in the target correspondence, the speech segment corresponds to the scenario description sentence.
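A hedged usage sketch tying the earlier pieces together (the helpers extract_keywords, keyword_similarity, and the segments and sentences variables come from the sketches above and are assumptions, not part of the embodiment):

# keyword set per speech segment and per scenario description sentence
subtitle_keys = [extract_keywords(" ".join(text for _, _, text in seg)) for seg in segments]
plot_keys = [extract_keywords(sentence) for sentence in sentences]

# P[i] is the 1-based index of the scenario description sentence matched to the
# i-th speech segment, or -1 if the segment matched nothing
P = align_plot_subtitle(plot_keys, subtitle_keys, keyword_similarity)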
In one embodiment, referring to fig. 5, the step S103 may include the following steps, based on fig. 2:
S1031: And if the target scenario description sentence is one, taking the target scenario description sentence as a video title of the sub-video.
S1032: if the target scenario description sentences are multiple, selecting one from the multiple target scenario description sentences as the video title of the sub-video.
In the embodiment of the invention, a target scenario description sentence can be selected as the video title of the sub-video based on different modes.
In one implementation manner, based on the target correspondence and according to the sequence in the scenario description text, the first target scenario description sentence whose corresponding speech segment has a time period belonging to the time period of the sub-video is selected from the plurality of target scenario description sentences and taken as the video title of the sub-video.
In the embodiment of the invention, if there are a plurality of target scenario description sentences, the first scenario description sentence completely covered by the sub-video can be determined according to the order in the scenario description text, that is, according to the order of the corresponding time periods.
A scenario description sentence completely covered by the sub-video is one whose corresponding speech segment, in the target correspondence, has a time period belonging to the time period of the sub-video.
In another implementation manner, based on the target correspondence, a target scenario description sentence with the largest intersection of a time period of the corresponding speech segment and a time period of the sub-video is selected from a plurality of target scenario description sentences as a video title of the sub-video.
If there are a plurality of target scenario description sentences, the speech segment corresponding to each target scenario description sentence is determined from the target correspondence, and the intersection of the time period of that speech segment with the time period of the sub-video is obtained. The target scenario description sentence whose intersection is the largest can then be determined and used as the video title of the sub-video, as sketched below.
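A minimal sketch of this second selection strategy (names are illustrative; each candidate pairs a target scenario description sentence with the time period of its corresponding speech segment):

def pick_title(candidates, sub_video_period):
    # candidates: list of (scenario_sentence, segment_period) pairs taken from
    # the target correspondence; pick the sentence whose speech segment's time
    # period overlaps the sub-video's time period the most.
    def overlap_len(period):
        start = max(period[0], sub_video_period[0])
        end = min(period[1], sub_video_period[1])
        return max(0.0, end - start)
    return max(candidates, key=lambda c: overlap_len(c[1]))[0]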
Referring to fig. 6, fig. 6 is a schematic diagram of generating a target correspondence according to an embodiment of the present invention.
In fig. 6, for each video in the video library, the scenario description text of the video and the lines contained in the video may be acquired. The scenario description text can then be split into sentences and the lines can be segmented, to obtain scenario description sentences and speech segments respectively. Then, based on the dynamic programming algorithm, the scenario description sentences are matched with the speech segments according to the preset constraint condition to obtain the target correspondence, and the time period corresponding to each scenario description sentence is determined.
Based on the same inventive concept, the embodiment of the present invention further provides a video title generating apparatus, referring to fig. 7, fig. 7 is a structural diagram of the video title generating apparatus provided in the embodiment of the present invention, where the apparatus may include:
a sub-video determining module 701, configured to determine a sub-video in the target video;
a target scenario description sentence determining module 702, configured to determine, from scenario description texts of the target video, scenario description sentences associated with the sub-video as target scenario description sentences; the scenario description text is used for describing video content of the target video;
the title generation module 703 is configured to generate a video title of the sub-video based on the target scenario description sentence.
Optionally, the target scenario description sentence determining module 702 includes:
the target speech segment determining submodule is used for determining speech segments associated with the sub video from all speech segments contained in the target video to serve as target speech segments;
the target scenario description sentence determining submodule is used for determining scenario description sentences corresponding to the target speech fragments based on target corresponding relations between the speech fragments of the target video and the scenario description sentences, and taking the scenario description sentences as target scenario description sentences; the target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
Optionally, an intersection exists between a time period corresponding to the target speech segment and a time period of the sub video.
Optionally, the apparatus further includes:
the segmentation module is used for segmenting the speech of the target video to obtain a plurality of speech fragments;
the clause module is used for carrying out clause on the scenario description text of the target video to obtain a plurality of scenario description sentences;
the keyword set acquisition module is used for extracting keywords in each speech segment and each scenario description sentence to obtain a keyword set of each speech segment and a keyword set of each scenario description sentence;
The target corresponding relation determining module is used for determining the target corresponding relation between the speech segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences.
Optionally, the target correspondence determining module includes:
the first correspondence determining sub-module is used for determining a plurality of correspondences between the speech segments and the scenario description sentences in the target video as a first correspondence based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences and a preset constraint condition;
wherein, the preset constraint condition comprises: a scenario description sentence corresponding to a speech segment with an earlier time period is positioned before a scenario description sentence corresponding to a speech segment with a later time period in the scenario description text;
the target corresponding relation determining sub-module is used for determining the corresponding relation with the maximum total similarity in each first corresponding relation as a target corresponding relation;
the total similarity of one first corresponding relation is as follows: in the first corresponding relation, the sum value of the similarity between the keyword set of each line segment and the keyword set of the corresponding scenario description sentence.
Optionally, the title generating module 703 includes:
the first title generation submodule is used for taking the target scenario description sentence as a video title of the sub-video if the target scenario description sentence is one;
and the second title generation sub-module is used for selecting one from the plurality of target scenario description sentences as the video title of the sub-video if the target scenario description sentences are a plurality of.
Optionally, the second title generating sub-module is specifically configured to select, based on the target correspondence, according to the sequence in the scenario description text, from a plurality of target scenario description sentences, a target scenario description sentence in which a time period of the first corresponding speech segment belongs to a time period of the sub-video, as a video title of the sub-video;
or,
selecting a target scenario description sentence with the largest intersection between a time period of a corresponding speech segment and a time period of the sub-video from the plurality of target scenario description sentences based on the target correspondence, and taking the target scenario description sentence as a video title of the sub-video.
The embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 complete communication with each other through the communication bus 804,
A memory 803 for storing a computer program;
the processor 801, when executing the program stored in the memory 803, implements the following steps:
determining sub-videos in the target video;
determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences; the scenario description text is used for describing video content of the target video;
and generating a video title of the sub-video based on the target scenario description sentence.
The communication bus mentioned for the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, abbreviated as PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, abbreviated as EISA) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, there is also provided a computer readable storage medium having a computer program stored therein, which when executed by a processor, implements the video title generation method of any of the above embodiments.
In yet another embodiment of the present invention, a computer program product containing instructions that, when run on a computer, cause the computer to perform the video title generation method of any of the above embodiments is also provided.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, electronic device, computer readable storage medium, and computer program product embodiments, the description is relatively simple, as relevant to the method embodiments being referred to in the section of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (8)

CN202110760887.4A (priority date 2021-07-06, filing date 2021-07-06): Video title generation method and device, Active, granted as CN113378000B (en)

Priority Applications (1)

Application Number / Priority Date / Filing Date / Title
CN202110760887.4A / 2021-07-06 / 2021-07-06 / Video title generation method and device (CN113378000B, en)

Applications Claiming Priority (1)

Application Number / Priority Date / Filing Date / Title
CN202110760887.4A / 2021-07-06 / 2021-07-06 / Video title generation method and device (CN113378000B, en)

Publications (2)

Publication Number / Publication Date
CN113378000A (en) / 2021-09-10
CN113378000B (en, granted) / 2023-09-05

Family

ID=77581078

Family Applications (1)

Application Number / Title / Priority Date / Filing Date
CN202110760887.4A / Video title generation method and device (Active, CN113378000B, en) / 2021-07-06 / 2021-07-06

Country Status (1)

Country / Link
CN / CN113378000B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN114860994B (en)* / 2022-04-21 / 2024-11-22 / 北京奇艺世纪科技有限公司 / Method, device, equipment and storage medium for aligning video and plot text
CN114860992A (en)* / 2022-04-21 / 2022-08-05 / 北京奇艺世纪科技有限公司 / Video title generation method, device, equipment and storage medium
CN115408565A (en)* / 2022-09-22 / 2022-11-29 / 北京奇艺世纪科技有限公司 / Text processing method, video processing method, device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN108495185A (en)* / 2018-03-14 / 2018-09-04 / 北京奇艺世纪科技有限公司 / A kind of video title generation method and device
CN108763369A (en)* / 2018-05-17 / 2018-11-06 / 北京奇艺世纪科技有限公司 / A kind of video searching method and device
KR101916874B1 (en)* / 2017-10-19 / 2018-11-08 / 충남대학교산학협력단 / Apparatus, method for auto generating a title of video contents, and computer readable recording medium
CN108829881A (en)* / 2018-06-27 / 2018-11-16 / 深圳市腾讯网络信息技术有限公司 / Video title generation method and device
CN109508406A (en)* / 2018-12-12 / 2019-03-22 / 北京奇艺世纪科技有限公司 / A kind of information processing method, device and computer readable storage medium
CN110929094A (en)* / 2019-11-20 / 2020-03-27 / 北京香侬慧语科技有限责任公司 / Video title processing method and device
CN112541095A (en)* / 2020-11-30 / 2021-03-23 / 北京奇艺世纪科技有限公司 / Video title generation method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
US8191092B2 (en)* / 2001-06-19 / 2012-05-29 / Jlb Ventures Llc / Method and system for replacing/obscuring titles and descriptions of recorded content
US10650009B2 (en)* / 2016-11-22 / 2020-05-12 / Facebook, Inc. / Generating news headlines on online social networks
US10795932B2 (en)* / 2017-09-28 / 2020-10-06 / Electronics And Telecommunications Research Institute / Method and apparatus for generating title and keyframe of video


Also Published As

Publication number / Publication date
CN113378000A (en) / 2021-09-10


Legal Events

Date / Code / Title / Description
PB01 / Publication
SE01 / Entry into force of request for substantive examination
GR01 / Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp