CN113378000B - Video title generation method and device - Google Patents

Video title generation method and device

Info

Publication number
CN113378000B
Authority
CN
China
Prior art keywords
target
scenario description
video
sub
sentences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110760887.4A
Other languages
Chinese (zh)
Other versions
CN113378000A (en)
Inventor
王亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110760887.4A
Publication of CN113378000A
Application granted
Publication of CN113378000B
Legal status: Active (current)
Anticipated expiration


Abstract

The embodiment of the invention provides a video title generation method and device, which relate to the technical field of video processing. The method comprises the following steps: determining a sub-video in a target video; determining, from the scenario description text of the target video, a scenario description sentence associated with the sub-video as a target scenario description sentence, the scenario description text being used for describing the video content of the target video; and generating a video title of the sub-video based on the target scenario description sentence. Based on the above processing, the generation efficiency of video titles can be improved.

Description

Video title generation method and device
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and apparatus for generating a video title.
Background
With the rapid development of internet technology, when a user views a video using a video application program, the user can select a segment of interest to obtain a sub-video (also called a split video) and upload the sub-video to a server of a video website. To manage these sub-videos better, an appropriate title needs to be set for each sub-video.
In the related art, each sub-video may be viewed by a technician and a title may be set for the sub-video according to the content of the sub-video. The above process is cumbersome, resulting in low efficiency in the generation of video titles of sub-videos.
Disclosure of Invention
The embodiment of the invention aims to provide a video title generation method and device so as to improve the generation efficiency of video titles. The specific technical scheme is as follows:
In a first aspect of the present invention, there is provided a video title generation method, the method comprising:
determining sub-videos in the target video;
determining scenario description sentences associated with the sub-video from the scenario description text of the target video as target scenario description sentences; the scenario description text is used for describing video content of the target video;
and generating a video title of the sub-video based on the target scenario description sentence.
Optionally, the determining, from scenario description text of the target video, scenario description sentences associated with the sub video as target scenario description sentences includes:
determining a speech segment associated with the sub video from the speech segments contained in the target video as a target speech segment;
determining a scenario description sentence corresponding to the target speech segment based on a target corresponding relation between the speech segment and the scenario description sentence of the target video, and taking the scenario description sentence as a target scenario description sentence; the target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
Optionally, an intersection exists between a time period corresponding to the target speech segment and a time period of the sub video.
Optionally, the target correspondence is obtained by:
segmenting the speech of the target video to obtain a plurality of speech fragments;
dividing sentences of the scenario description text of the target video to obtain a plurality of scenario description sentences;
extracting keywords in each speech segment and each scenario description sentence to obtain a keyword set of each speech segment and a keyword set of each scenario description sentence;
and determining target corresponding relations between the line segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the line segments and the keyword sets of the scenario description sentences.
Optionally, the determining, based on the similarity between the keyword sets of the plurality of speech segments and the keyword sets of the plurality of scenario description sentences, the target correspondence between the speech segments and the scenario description sentences in the target video includes:
based on the similarity between the keyword sets of the plurality of line segments and the keyword sets of the plurality of scenario description sentences and a preset constraint condition, determining a plurality of corresponding relations between the line segments and the scenario description sentences in the target video as a first corresponding relation;
Wherein, the preset constraint condition comprises: a scenario description sentence corresponding to a speech segment with an earlier time period is positioned before a scenario description sentence corresponding to a speech segment with a later time period in the scenario description text;
determining the corresponding relation with the maximum total similarity in the first corresponding relations as a target corresponding relation;
the total similarity of one first corresponding relation is as follows: in the first corresponding relation, the sum value of the similarity between the keyword set of each line segment and the keyword set of the corresponding scenario description sentence.
Optionally, the generating, based on the target scenario description sentence, a video title of the sub-video includes:
if the target scenario description sentence is one, the target scenario description sentence is used as a video title of the sub-video;
and if there are a plurality of target scenario description sentences, selecting one from the plurality of target scenario description sentences as the video title of the sub-video.
Optionally, the selecting one from the plurality of target scenario description sentences as the video title of the sub-video includes:
selecting, based on the target correspondence and according to the sequence in the scenario description text, the first target scenario description sentence whose corresponding speech segment has a time period belonging to the time period of the sub-video from the plurality of target scenario description sentences, and taking the target scenario description sentence as a video title of the sub-video;
Or,
and selecting a target scenario description sentence with the largest intersection between the time period of the corresponding speech segment and the time period of the sub-video from the plurality of target scenario description sentences based on the target correspondence, and taking the target scenario description sentence as a video title of the sub-video.
A method for generating a video title, the method comprising:
when a video splitting request aiming at a target video is received, determining a sub video which is required to be split currently;
determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences;
and generating a video title of the sub-video based on the target scenario description sentence.
In a second aspect of the implementation of the present invention, there is also provided a video title generating apparatus, including:
the sub-video determining module is used for determining sub-videos in the target video;
the target scenario description sentence determining module is used for determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos to serve as target scenario description sentences; the scenario description text is used for describing video content of the target video;
And the title generation module is used for generating the video title of the sub-video based on the target scenario description sentence.
Optionally, the target scenario description sentence determining module includes:
the target speech segment determining submodule is used for determining speech segments associated with the sub video from all speech segments contained in the target video to serve as target speech segments;
the target scenario description sentence determining submodule is used for determining scenario description sentences corresponding to the target speech fragments based on target corresponding relations between the speech fragments of the target video and the scenario description sentences, and taking the scenario description sentences as target scenario description sentences; the target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
Optionally, an intersection exists between a time period corresponding to the target speech segment and a time period of the sub video.
Optionally, the apparatus further includes:
the segmentation module is used for segmenting the speech of the target video to obtain a plurality of speech fragments;
the clause module is used for carrying out clause on the scenario description text of the target video to obtain a plurality of scenario description sentences;
the keyword set acquisition module is used for extracting keywords in each speech segment and each scenario description sentence to obtain a keyword set of each speech segment and a keyword set of each scenario description sentence;
The target corresponding relation determining module is used for determining the target corresponding relation between the speech segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences.
Optionally, the target correspondence determining module includes:
the first correspondence determining sub-module is used for determining a plurality of correspondences between the speech segments and the scenario description sentences in the target video as a first correspondence based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences and a preset constraint condition;
wherein, the preset constraint condition comprises: a scenario description sentence corresponding to a speech segment with an earlier time period is positioned before a scenario description sentence corresponding to a speech segment with a later time period in the scenario description text;
the target corresponding relation determining sub-module is used for determining the corresponding relation with the maximum total similarity in each first corresponding relation as a target corresponding relation;
the total similarity of one first corresponding relation is as follows: in the first corresponding relation, the sum value of the similarity between the keyword set of each line segment and the keyword set of the corresponding scenario description sentence.
Optionally, the title generating module includes:
the first title generation submodule is used for taking the target scenario description sentence as a video title of the sub-video if the target scenario description sentence is one;
and the second title generation sub-module is used for selecting one from the plurality of target scenario description sentences as the video title of the sub-video if the target scenario description sentences are a plurality of.
Optionally, the second title generating sub-module is specifically configured to select, based on the target correspondence, according to the sequence in the scenario description text, from a plurality of target scenario description sentences, a target scenario description sentence in which a time period of the first corresponding speech segment belongs to a time period of the sub-video, as a video title of the sub-video;
or,
and selecting a target scenario description sentence with the largest intersection between the time period of the corresponding speech segment and the time period of the sub-video from the plurality of target scenario description sentences based on the target correspondence, and taking the target scenario description sentence as a video title of the sub-video.
In yet another aspect of the present invention, there is also provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory perform communication with each other through the communication bus;
A memory for storing a computer program;
and a processor, configured to implement the video title generation method according to any one of the first aspect when executing the program stored in the memory.
In yet another aspect of the implementation of the present invention, there is also provided a computer readable storage medium, in which a computer program is stored, the computer program implementing any one of the above video title generation methods when executed by a processor.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the video title generation methods described above.
The video title generation method provided by the embodiment of the invention determines the sub-video in the target video; determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences; the scenario description text is used for describing video content of the target video; and generating a video title of the sub-video based on the target scenario description sentence.
The scenario description text of the target video can embody the video content of the target video, and correspondingly, the target scenario description sentence associated with the sub video can embody the video content of the sub video. Therefore, the video title of the sub-video generated based on the target scenario description sentence can embody the video content of the sub-video, and compared with the situation that a technician sets the title for the sub-video after watching the sub-video, the generation efficiency of the video title can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a video title generation method provided in an embodiment of the present invention;
FIG. 2 is a flowchart of another video title generation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for generating a target correspondence according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for generating a target correspondence according to an embodiment of the present invention;
FIG. 5 is a flowchart of another video title generation method according to an embodiment of the present invention;
fig. 6 is a schematic diagram of generating a target correspondence according to an embodiment of the present invention;
fig. 7 is a block diagram of a video title generating apparatus according to an embodiment of the present invention;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In the related art, a technician views each sub-video and sets a title for the sub-video according to the contents of the sub-video. The above process is cumbersome, resulting in low efficiency in the generation of video titles of sub-videos.
In order to solve the above-mentioned problems, the embodiment of the present invention provides a video title generation method, which can be applied to an electronic device, where the electronic device may be a terminal or may be a server.
For example, the electronic device may be a terminal. A user views a certain video through the terminal and, while viewing, may perform a video splitting operation, i.e., instruct the terminal to extract a video clip (i.e., a sub-video) from the video. The terminal can then determine the sub-video that currently needs to be split, and generate the video title of the sub-video based on the video title generation method provided by the embodiment of the invention.
For another example, the electronic device may be a server. When a user watches a certain video through a terminal corresponding to the server and performs a video splitting operation, the terminal can send a video splitting request for the video to the server. The server can then determine the sub-video that currently needs to be split, and generate the video title of the sub-video based on the video title generation method provided by the embodiment of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a video title generation method according to an embodiment of the present invention, where the method may include the following steps:
S101: a sub-video in the target video is determined.
S102: and determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences.
Wherein, scenario description text is used for describing the video content of the target video.
S103: and generating a video title of the sub-video based on the target scenario description sentence.
According to the video title generation method provided by the embodiment of the invention, the scenario description text of the target video can embody the video content of the target video, and correspondingly, the target scenario description sentence associated with the sub video can embody the video content of the sub video. Therefore, the video title of the sub-video generated based on the target scenario description sentence can embody the video content of the sub-video, and compared with the situation that a technician sets the title for the sub-video after watching the sub-video, the generation efficiency of the video title can be improved.
For step S101, in one implementation, when the user views the target video through the terminal, a video splitting operation may be performed. For example, the user may select a start time and an end time from the target video played by the terminal, and determine to extract the video clip (i.e., the sub-video) between the start time and the end time; accordingly, the terminal may determine the sub-video selected by the user.
In another implementation, the user may also upload the extracted sub-video directly to the terminal.
For step S102, the scenario description text of the target video can embody video content of the target video. For example, if the target video is a movie, the scenario description text may be a scenario profile of the movie. For another example, if the target video is a certain episode of a television series, the scenario description text may be a scenario introduction of the episode of the television series.
It can be understood that the scenario description text of the target video generally includes a plurality of sentences (i.e., scenario description sentences), so that the scenario description sentence associated with the sub-video can be determined from the plurality of scenario description sentences included in the scenario description text, and the determined scenario description sentence can also embody the video content of the sub-video.
In one embodiment, the corresponding time period of each scenario description sentence in the target video may be predetermined. Further, the scenario description sentence associated with the sub-video may be determined according to the time period of the sub-video in the target video (i.e., the period between the start time and the end time described above) and the corresponding time period of each scenario description sentence in the target video.
For example, a scenario description sentence in which a corresponding period of time intersects with a period of time of a sub-video may be determined as a scenario description sentence associated with the sub-video.
In one embodiment, the corresponding time period for each scenario description sentence in the target video may be determined in different manners.
In one implementation manner, the total duration of the target video may be divided according to the total number of scenario description sentences to obtain a plurality of time periods, and each scenario description sentence may then be mapped to a time period according to the order of the scenario description sentences and the order of the time periods, as sketched below.
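A minimal sketch of this even division (the function name and the uniform-length assumption are illustrative, not taken from the embodiment; times are in seconds):

def assign_time_periods(total_duration, num_sentences):
    # Evenly divide the total duration of the target video into one time
    # period per scenario description sentence, in sentence order.
    step = total_duration / num_sentences
    return [(k * step, (k + 1) * step) for k in range(num_sentences)]

# e.g. a 3600-second video with 12 scenario description sentences
periods = assign_time_periods(3600, 12)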
In another implementation manner, the corresponding lines of each scenario description sentence can be determined according to the similarity between the lines of the target video and the scenario description text. The lines generally correspond to the time periods in the target video, and further, the corresponding time periods of the scenario description sentences in the target video can be determined.
In one embodiment, referring to fig. 2, the step S102 may include the following steps based on fig. 1:
S1021: And determining the speech segments associated with the sub-video from the speech segments contained in the target video as target speech segments.
S1022: and determining the scenario description sentence corresponding to the target speech segment based on the target corresponding relation between the speech segment and the scenario description sentence of the target video, and taking the scenario description sentence as the target scenario description sentence.
The target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
In the embodiment of the invention, the target corresponding relation between the speech segments and the scenario description sentences of the target video can be predetermined, and further, after the target speech segments associated with the sub video are determined, the scenario description sentences corresponding to the target speech segments in the target corresponding relation can be determined as scenario description sentences (namely, target scenario description sentences) associated with the sub video.
In one embodiment, there is an intersection of a time period corresponding to the target speech segment with a time period of the sub-video.
In the embodiment of the invention, the lines (subtitles) of the target video carry corresponding times, so the lines can be segmented based on those times to obtain speech segments. Correspondingly, each speech segment also corresponds to a time period of the target video.
If the time period corresponding to a speech segment intersects the time period of the sub-video, the speech segment contains part or all of the lines in the sub-video, and the speech segment can be determined to be associated with the sub-video, as sketched below.
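A minimal sketch of this association test (the helper name is illustrative; time periods are (start, end) pairs in seconds):

def overlaps(segment_period, sub_video_period):
    # True if the speech segment's time period intersects the sub-video's
    # time period, i.e. the segment is associated with the sub-video.
    seg_start, seg_end = segment_period
    sub_start, sub_end = sub_video_period
    return max(seg_start, sub_start) < min(seg_end, sub_end)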
In one embodiment, referring to fig. 3, fig. 3 is a flowchart of a method for generating a target correspondence according to an embodiment of the present invention, where the method may include the following steps:
S301: segmenting the speech of the target video to obtain a plurality of speech fragments.
S302: and dividing sentences of the scenario description text of the target video to obtain a plurality of scenario description sentences.
S303: extracting keywords in each keyword segment and each scenario description sentence to obtain a keyword set of each keyword segment and a keyword set of each scenario description sentence.
S304: and determining the target corresponding relation between the speech segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences.
In one implementation, the speech may be segmented with target time periods as segmentation boundaries, where a target time period is a period in which no line exists and whose duration is a preset duration (for example, 10 seconds). For example, if there are 3 target time periods in the target video, the speech of the target video may be divided into 4 speech segments; the time interval between every two adjacent speech segments is a target time period. A sketch of this grouping is given below.
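A rough sketch of grouping timed subtitle lines into speech segments (the input format and the helper name are assumptions; a new segment starts whenever the silent gap between consecutive lines reaches the preset duration):

def split_subtitles(lines, gap_seconds=10):
    # lines: list of (start_time, end_time, text) tuples sorted by start_time.
    # A new speech segment starts whenever the silent gap between two
    # consecutive lines reaches gap_seconds (a target time period).
    segments, current = [], []
    for line in lines:
        if current and line[0] - current[-1][1] >= gap_seconds:
            segments.append(current)
            current = []
        current.append(line)
    if current:
        segments.append(current)
    return segments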
In one implementation, the scenario description text may be split into sentences based on preset punctuation marks. The preset punctuation marks may include at least one of the following: periods, question marks, semicolons, and exclamation marks.
The extracted keywords may include at least one of: nouns and verbs. In particular, nouns may include place names and person names.
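A minimal sketch of the sentence splitting and keyword extraction steps; jieba part-of-speech tagging is one possible choice for picking out nouns and verbs from Chinese text and is an assumption here, not something mandated by the embodiment:

import re
import jieba.posseg as pseg  # assumption: jieba is used for part-of-speech tagging

def split_sentences(plot_text):
    # Split the scenario description text on periods, question marks,
    # semicolons and exclamation marks (Chinese and ASCII forms).
    parts = re.split(r"[。？；！.?;!]", plot_text)
    return [p.strip() for p in parts if p.strip()]

def extract_keywords(text):
    # Keep nouns (POS flags starting with 'n', which cover person and place
    # names) and verbs (flags starting with 'v') as the keyword set.
    return {pair.word for pair in pseg.cut(text) if pair.flag and pair.flag[0] in ("n", "v")}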
In one implementation manner, for a speech segment and a scenario description sentence, the ratio of the number of keywords included in the intersection of two keyword sets to the number of keywords included in the union of the two keyword sets may be calculated as the similarity of the two keyword sets, that is, the similarity of the speech segment and the scenario description sentence.
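The similarity described above is the Jaccard coefficient of the two keyword sets; a minimal sketch:

def keyword_similarity(set_a, set_b):
    # Similarity of a speech segment and a scenario description sentence:
    # |intersection| / |union| of their keyword sets (0.0 when both are empty).
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0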
For step S304, in one implementation manner, for each speech segment, the scenario description sentence whose keyword set has the maximum similarity with the keyword set of the speech segment may be determined from the scenario description sentences, and used as the scenario description sentence corresponding to that speech segment in the target correspondence.
In another implementation, referring to fig. 4, on the basis of fig. 3, the step S304 may include the following steps:
S3041: Based on the similarity between the keyword sets of the plurality of speech segments and the keyword sets of the plurality of scenario description sentences and preset constraint conditions, a plurality of correspondences between the speech segments and the scenario description sentences in the target video are determined and used as first correspondences.
The preset constraint conditions comprise: the scenario description sentence corresponding to the speech segment with the earlier time period is positioned, in the scenario description text, before the scenario description sentence corresponding to the speech segment with the later time period.
S3042: and determining the corresponding relation with the maximum total similarity in the first corresponding relations as a target corresponding relation.
The total similarity of one first correspondence is: in the first correspondence, the sum of the similarities between the keyword set of each speech segment and the keyword set of the corresponding scenario description sentence.
In the embodiment of the invention, since the scenario description text can describe the video content of the target video, each scenario description sentence in the scenario description text also has a time sequence.
That is, in the scenario description text, the period corresponding to the scenario description sentence at the earlier position is earlier than the period corresponding to the scenario description sentence at the later position; equivalently, the period of the speech segment corresponding to the scenario description sentence at the earlier position is earlier than the period of the speech segment corresponding to the scenario description sentence at the later position.
Each first correspondence determined based on the preset constraint condition ensures that the scenario description sentence corresponding to the speech segment with the earlier time period is positioned, in the scenario description text, before the scenario description sentence corresponding to the speech segment with the later time period. Accordingly, the accuracy of the correspondence between speech segments and scenario description sentences in the first correspondences can be improved.
In one embodiment, the speech segments may be ordered in chronological order, and the scenario description sentences may be ordered in chronological order in the scenario description text.
Then, for the first speech segment, one scenario description sentence (which may be referred to as a first scenario description sentence) may be selected from the scenario description sentences as a scenario description sentence corresponding to the first speech segment.
For the second speech segment, one scenario description sentence (may be referred to as a second scenario description sentence) may be selected from the first scenario description sentence and each scenario description sentence located after the first scenario description sentence as a scenario description sentence corresponding to the second speech segment.
Similarly, for the third speech segment, one scenario description sentence (which may be referred to as a third scenario description sentence) may be selected from the second scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the third speech segment. By analogy, the scenario description sentence corresponding to each speech segment can be determined, and a first correspondence is obtained.
Then, for the first speech segment, one scenario description sentence (which may be referred to as a fourth scenario description sentence) may be selected from the scenario description sentences other than the first scenario description sentence, as the scenario description sentence corresponding to the first speech segment.
Then, for the second speech segment, one scenario description sentence (which may be referred to as a fifth scenario description sentence) may be selected from the fourth scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the second speech segment.
Similarly, for the third speech segment, one scenario description sentence (which may be referred to as a sixth scenario description sentence) may be selected from the fifth scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the third speech segment. By analogy, the scenario description sentence corresponding to each speech segment can be determined, and another first correspondence is obtained.
The above processing is repeated until M first correspondences are determined, where M represents the number of scenario description sentences.
In addition, after the first scenario description sentence is used as the scenario description sentence corresponding to the first speech segment, for the second speech segment, one scenario description sentence (which may be referred to as a seventh scenario description sentence) may be selected from the first scenario description sentence and the scenario description sentences located after it, excluding the second scenario description sentence, as the scenario description sentence corresponding to the second speech segment.
Similarly, for the third speech segment, one scenario description sentence (which may be referred to as an eighth scenario description sentence) may be selected from the seventh scenario description sentence and the scenario description sentences located after it, as the scenario description sentence corresponding to the third speech segment. By analogy, the scenario description sentence corresponding to each speech segment can be determined, and a further first correspondence is obtained.
Based on the processing, a plurality of first corresponding relations can be determined, and each first corresponding relation meets the preset constraint condition.
Correspondingly, in the first correspondence with the maximum total similarity (namely the target correspondence), the overall similarity between the speech segments and the scenario description sentences is the highest, i.e., the accuracy of the target correspondence is the highest. Furthermore, the matching degree between the target scenario description sentence determined based on the target correspondence and the sub-video is high, so that the video title of the sub-video generated based on the target scenario description sentence can more accurately represent the video content of the sub-video.
In one embodiment, the target correspondence may be determined based on a dynamic programming algorithm according to the preset constraint condition.
Specifically, the target correspondence (i.e., the scenario description sentence corresponding to each speech segment) may be obtained by the following code, in which the scenario description sentences are denoted plots and the speech segments are denoted subtitles:
def align_plot_subtitle(plots, subtitles, match_score):
    m = len(subtitles)   # m is the number of speech segments
    n = len(plots)       # n is the number of scenario description sentences

    # Matching score matrix D[m+1][n+1]: D[i][j] is the best matching score of the
    # first i speech segments and the first j scenario description sentences.
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    # Best path matrix T[m+1][n+1]: T[i][j] stores the coordinates of the previous
    # node on the best matching path that D[i][j] lies on.
    T = [[(0, 0)] * (n + 1) for _ in range(m + 1)]
    # Best match matrix M[m+1][n+1]: M[i][j] stores the best matching scenario
    # description sentence of the i-th speech segment among the first j sentences.
    M = [[-1] * (n + 1) for _ in range(m + 1)]

    # Initialization: D is already all 0, T all (0, 0) and M all -1; only T[0][0] differs.
    T[0][0] = (-1, -1)

    # Iterate the optimum values.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match_score(subtitles[i - 1], plots[j - 1])
            # The i-th speech segment matches no scenario description sentence; the
            # score is consistent with the best matching score of the first i-1 segments.
            score1 = D[i - 1][j]
            # The j-th scenario description sentence matches no speech segment; the
            # score is consistent with the best match of the first j-1 sentences.
            score2 = D[i][j - 1]
            # The i-th speech segment matches the j-th scenario description sentence,
            # which is not matched by any previous speech segment.
            score3 = D[i - 1][j - 1] + s
            # The i-th speech segment matches the j-th scenario description sentence,
            # which is also matched by a previous speech segment.
            score4 = D[i - 1][j] + s

            # Compare the four cases and take the maximum as the best matching score
            # of the first i speech segments and the first j scenario description sentences.
            best = max(score1, score2, score3, score4)
            D[i][j] = best
            if best == score1:
                T[i][j] = (i - 1, j)      # previous node on the best path is (i-1, j)
                M[i][j] = -1              # the i-th speech segment has no match
            elif best == score2:
                T[i][j] = (i, j - 1)      # previous node on the best path is (i, j-1)
                M[i][j] = M[i][j - 1]     # best match of the i-th segment among the first j
                                          # sentences is the same as among the first j-1
            elif best == score3:
                T[i][j] = (i - 1, j - 1)  # previous node on the best path is (i-1, j-1)
                M[i][j] = j               # best match of the i-th segment is the j-th sentence
            else:                         # score4
                T[i][j] = (i - 1, j)      # previous node on the best path is (i-1, j)
                M[i][j] = j               # best match of the i-th segment is the j-th sentence

    # Reconstruct the best matching path: starting from the last node, follow the
    # stored previous node of each node on the best matching path.
    P = [-1] * (m + 1)   # P[i]: scenario description sentence matched to the i-th speech segment
    cur_i, cur_j = m, n
    while cur_i > 0 and cur_j > 0:
        P[cur_i] = M[cur_i][cur_j]
        cur_i, cur_j = T[cur_i][cur_j]
    return P
Here, match_score(subtitles[i], plots[j]) represents the similarity between the i-th speech segment and the j-th scenario description sentence.
Based on the above code, the optimal path can be determined according to the dynamic programming algorithm. The optimal path includes a plurality of positions, each position corresponding to an overall-similarity entry in the matching score matrix D. For each position on the optimal path, the row coordinate represents the sequence number of a speech segment and the column coordinate represents the sequence number of a scenario description sentence; that is, the speech segment is matched with the scenario description sentence, i.e., in the target correspondence, the speech segment corresponds to the scenario description sentence.
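A hedged usage sketch tying the earlier pieces together (the helpers extract_keywords, keyword_similarity, and the segments and sentences variables come from the sketches above and are assumptions, not part of the embodiment):

# keyword set per speech segment and per scenario description sentence
subtitle_keys = [extract_keywords(" ".join(text for _, _, text in seg)) for seg in segments]
plot_keys = [extract_keywords(sentence) for sentence in sentences]

# P[i] is the 1-based index of the scenario description sentence matched to the
# i-th speech segment, or -1 if the segment matched nothing
P = align_plot_subtitle(plot_keys, subtitle_keys, keyword_similarity)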
In one embodiment, referring to fig. 5, the step S103 may include the following steps, based on fig. 2:
S1031: And if the target scenario description sentence is one, taking the target scenario description sentence as a video title of the sub-video.
S1032: if the target scenario description sentences are multiple, selecting one from the multiple target scenario description sentences as the video title of the sub-video.
In the embodiment of the invention, a target scenario description sentence can be selected as the video title of the sub-video based on different modes.
In one implementation manner, based on the target correspondence and according to the sequence in the scenario description text, the first target scenario description sentence whose corresponding speech segment has a time period belonging to the time period of the sub-video is selected from the plurality of target scenario description sentences and taken as the video title of the sub-video.
In the embodiment of the invention, if there are a plurality of target scenario description sentences, the first scenario description sentence completely covered by the sub-video can be determined according to the order in the scenario description text, that is, according to the order of the corresponding time periods.
A scenario description sentence completely covered by the sub-video is one whose corresponding speech segment, in the target correspondence, has a time period belonging to the time period of the sub-video.
In another implementation manner, based on the target correspondence, a target scenario description sentence with the largest intersection of a time period of the corresponding speech segment and a time period of the sub-video is selected from a plurality of target scenario description sentences as a video title of the sub-video.
If there are a plurality of target scenario description sentences, the speech segment corresponding to each target scenario description sentence is determined from the target correspondence, and the intersection of the time period of that speech segment with the time period of the sub-video is obtained. The target scenario description sentence whose intersection is the largest can then be determined and used as the video title of the sub-video, as sketched below.
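A minimal sketch of this second selection strategy (names are illustrative; each candidate pairs a target scenario description sentence with the time period of its corresponding speech segment):

def pick_title(candidates, sub_video_period):
    # candidates: list of (scenario_sentence, segment_period) pairs taken from
    # the target correspondence; pick the sentence whose speech segment's time
    # period overlaps the sub-video's time period the most.
    def overlap_len(period):
        start = max(period[0], sub_video_period[0])
        end = min(period[1], sub_video_period[1])
        return max(0.0, end - start)
    return max(candidates, key=lambda c: overlap_len(c[1]))[0]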
Referring to fig. 6, fig. 6 is a schematic diagram of generating a target correspondence according to an embodiment of the present invention.
In fig. 6, for each video in the video library, the scenario description text of the video and the lines contained in the video may be acquired. The scenario description text can then be split into sentences and the lines can be segmented, to obtain scenario description sentences and speech segments respectively. Then, based on the dynamic programming algorithm, the scenario description sentences are matched with the speech segments according to the preset constraint condition to obtain the target correspondence, and the time period corresponding to each scenario description sentence is determined.
Based on the same inventive concept, the embodiment of the present invention further provides a video title generating apparatus, referring to fig. 7, fig. 7 is a structural diagram of the video title generating apparatus provided in the embodiment of the present invention, where the apparatus may include:
a sub-video determining module 701, configured to determine a sub-video in the target video;
a target scenario description sentence determining module 702, configured to determine, from scenario description texts of the target video, scenario description sentences associated with the sub-video as target scenario description sentences; the scenario description text is used for describing video content of the target video;
the title generation module 703 is configured to generate a video title of the sub-video based on the target scenario description sentence.
Optionally, the target scenario description sentence determining module 702 includes:
the target speech segment determining submodule is used for determining speech segments associated with the sub video from all speech segments contained in the target video to serve as target speech segments;
the target scenario description sentence determining submodule is used for determining scenario description sentences corresponding to the target speech fragments based on target corresponding relations between the speech fragments of the target video and the scenario description sentences, and taking the scenario description sentences as target scenario description sentences; the target corresponding relation is determined based on the similarity between the speech segment and the scenario description sentence of the target video.
Optionally, an intersection exists between a time period corresponding to the target speech segment and a time period of the sub video.
Optionally, the apparatus further includes:
the segmentation module is used for segmenting the speech of the target video to obtain a plurality of speech fragments;
the clause module is used for carrying out clause on the scenario description text of the target video to obtain a plurality of scenario description sentences;
the keyword set acquisition module is used for extracting keywords in each speech segment and each scenario description sentence to obtain a keyword set of each speech segment and a keyword set of each scenario description sentence;
The target corresponding relation determining module is used for determining the target corresponding relation between the speech segments and the scenario description sentences in the target video based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences.
Optionally, the target correspondence determining module includes:
the first correspondence determining sub-module is used for determining a plurality of correspondences between the speech segments and the scenario description sentences in the target video as a first correspondence based on the similarity between the keyword sets of the speech segments and the keyword sets of the scenario description sentences and a preset constraint condition;
wherein, the preset constraint condition comprises: a scenario description sentence corresponding to a speech segment with an earlier time period is positioned before a scenario description sentence corresponding to a speech segment with a later time period in the scenario description text;
the target corresponding relation determining sub-module is used for determining the corresponding relation with the maximum total similarity in each first corresponding relation as a target corresponding relation;
the total similarity of one first corresponding relation is as follows: in the first corresponding relation, the sum value of the similarity between the keyword set of each line segment and the keyword set of the corresponding scenario description sentence.
Optionally, the title generating module 703 includes:
the first title generation submodule is used for taking the target scenario description sentence as a video title of the sub-video if the target scenario description sentence is one;
and the second title generation sub-module is used for selecting one from the plurality of target scenario description sentences as the video title of the sub-video if the target scenario description sentences are a plurality of.
Optionally, the second title generating sub-module is specifically configured to select, based on the target correspondence, according to the sequence in the scenario description text, from a plurality of target scenario description sentences, a target scenario description sentence in which a time period of the first corresponding speech segment belongs to a time period of the sub-video, as a video title of the sub-video;
or,
selecting a target scenario description sentence with the largest intersection between a time period of a corresponding speech segment and a time period of the sub-video from the plurality of target scenario description sentences based on the target correspondence, and taking the target scenario description sentence as a video title of the sub-video.
The embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 complete communication with each other through the communication bus 804,
A memory 803 for storing a computer program;
the processor 801, when executing the program stored in the memory 803, implements the following steps:
determining sub-videos in the target video;
determining scenario description sentences associated with the sub-videos from scenario description texts of the target videos as target scenario description sentences; the scenario description text is used for describing video content of the target video;
and generating a video title of the sub-video based on the target scenario description sentence.
The communication bus mentioned for the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, abbreviated as PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, abbreviated as EISA) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, there is also provided a computer readable storage medium having a computer program stored therein, which when executed by a processor, implements the video title generation method of any of the above embodiments.
In yet another embodiment of the present invention, a computer program product containing instructions that, when run on a computer, cause the computer to perform the video title generation method of any of the above embodiments is also provided.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, electronic device, computer readable storage medium, and computer program product embodiments, the description is relatively simple, as relevant to the method embodiments being referred to in the section of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (8)

CN202110760887.4A (priority date 2021-07-06, filing date 2021-07-06): Video title generation method and device, Active, granted as CN113378000B (en)

Priority Applications (1)

Application Number / Priority Date / Filing Date / Title
CN202110760887.4A / 2021-07-06 / 2021-07-06 / Video title generation method and device (CN113378000B, en)

Applications Claiming Priority (1)

Application Number / Priority Date / Filing Date / Title
CN202110760887.4A / 2021-07-06 / 2021-07-06 / Video title generation method and device (CN113378000B, en)

Publications (2)

Publication Number / Publication Date
CN113378000A (en) / 2021-09-10
CN113378000B (en, granted) / 2023-09-05

Family

ID=77581078

Family Applications (1)

Application Number / Title / Priority Date / Filing Date
CN202110760887.4A / Video title generation method and device (Active, CN113378000B, en) / 2021-07-06 / 2021-07-06

Country Status (1)

Country / Link
CN / CN113378000B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN114860994B (en)* / 2022-04-21 / 2024-11-22 / 北京奇艺世纪科技有限公司 / Method, device, equipment and storage medium for aligning video and plot text
CN114860992A (en)* / 2022-04-21 / 2022-08-05 / 北京奇艺世纪科技有限公司 / Video title generation method, device, equipment and storage medium
CN115408565A (en)* / 2022-09-22 / 2022-11-29 / 北京奇艺世纪科技有限公司 / Text processing method, video processing method, device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN108495185A (en)* / 2018-03-14 / 2018-09-04 / 北京奇艺世纪科技有限公司 / A kind of video title generation method and device
CN108763369A (en)* / 2018-05-17 / 2018-11-06 / 北京奇艺世纪科技有限公司 / A kind of video searching method and device
KR101916874B1 (en)* / 2017-10-19 / 2018-11-08 / 충남대학교산학협력단 / Apparatus, method for auto generating a title of video contents, and computer readable recording medium
CN108829881A (en)* / 2018-06-27 / 2018-11-16 / 深圳市腾讯网络信息技术有限公司 / Video title generation method and device
CN109508406A (en)* / 2018-12-12 / 2019-03-22 / 北京奇艺世纪科技有限公司 / A kind of information processing method, device and computer readable storage medium
CN110929094A (en)* / 2019-11-20 / 2020-03-27 / 北京香侬慧语科技有限责任公司 / Video title processing method and device
CN112541095A (en)* / 2020-11-30 / 2021-03-23 / 北京奇艺世纪科技有限公司 / Video title generation method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
US8191092B2 (en)* / 2001-06-19 / 2012-05-29 / Jlb Ventures Llc / Method and system for replacing/obscuring titles and descriptions of recorded content
US10650009B2 (en)* / 2016-11-22 / 2020-05-12 / Facebook, Inc. / Generating news headlines on online social networks
US10795932B2 (en)* / 2017-09-28 / 2020-10-06 / Electronics And Telecommunications Research Institute / Method and apparatus for generating title and keyframe of video


Also Published As

Publication number / Publication date
CN113378000A (en) / 2021-09-10


Legal Events

Date / Code / Title / Description
PB01 / Publication
SE01 / Entry into force of request for substantive examination
GR01 / Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp