CN115935004A


Info

Publication number: CN115935004A
Application number: CN202211560228.7A
Authority: CN
Inventors: 白雪
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-12-06
Filing date: 2022-12-06
Publication date: 2023-04-07

Abstract

The present disclosure provides a method for determining video tags, which relates to the field of artificial intelligence, and in particular to image processing and computer vision. The specific implementation is as follows: dividing a video sequence into a plurality of sub-videos by shot; determining at least one video frame of a sub-video as a key frame of the sub-video, determining an initial label sequence of the key frame, and determining a valid tag of the sub-video according to the initial label sequence; and determining the tags of the video sequence according to the valid tags of the sub-videos. The present disclosure also provides a video recommendation method, a video query method, corresponding apparatuses, an electronic device, and a storage medium.

Description


Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular to image processing and computer vision. More specifically, the present disclosure provides a method for determining video tags, a video recommendation method, a video query method, apparatuses, an electronic device, and a storage medium.

Background

Video tag extraction is one of the core functions of media asset management and is a prerequisite operator for downstream needs such as video ingestion, retrieval, and recommendation. In practice, the time needed to extract video tags grows linearly with video length, so long videos are processed very slowly, which greatly limits the scenarios in which related algorithms can be applied.

Summary

The present disclosure provides a method for determining video tags, a video recommendation method, a video query method, apparatuses, a device, and a storage medium.

According to a first aspect, a method for determining video tags is provided. The method includes: dividing a video sequence into a plurality of sub-videos by shot; determining at least one video frame of a sub-video as a key frame of the sub-video, determining an initial label sequence of the key frame, and determining a valid tag of the sub-video according to the initial label sequence; and determining the tags of the video sequence according to the valid tags of the sub-videos.

According to a second aspect, a video recommendation method is provided. The method includes: obtaining user tags; and determining, from a video library, a first target video to be recommended to the user according to a first similarity between the user tags and the tags of each video in the library, where the tags of the videos are obtained by the above method for determining video tags.
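The second aspect leaves the first similarity measure open. As one possibility, the sketch below scores each video by the Jaccard similarity between the user tags and the video's tags. Jaccard, the `recommend` helper, and the `top_n` cutoff are assumptions for illustration, not the disclosed method.

```python
def jaccard(a, b):
    """Jaccard similarity between two tag collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(user_tags, video_library, top_n=3):
    """Rank videos by tag similarity to the user (hypothetical helper).

    video_library maps a video id to its list of tags.
    Videos with zero similarity are never recommended.
    """
    scored = [(jaccard(user_tags, tags), vid) for vid, tags in video_library.items()]
    # Highest similarity first; ties broken by video id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [vid for score, vid in scored[:top_n] if score > 0]
```

Any other set or embedding similarity could be substituted without changing the ranking logic.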

According to a third aspect, a video query method is provided. The method includes: receiving a request to query videos, the request including a query term; and determining, from a video library, a second target video corresponding to the query term according to a second similarity between the query term and the tags of each video in the library, where the tags of the videos are obtained by the above method for determining video tags.
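The second similarity used for querying is likewise not specified. A minimal sketch, assuming the similarity is the fraction of query terms found among a video's tags (case-insensitive) and a hypothetical threshold `min_score`:

```python
def query_videos(query_terms, video_library, min_score=0.5):
    """Return ids of videos whose tags cover enough of the query terms.

    video_library maps a video id to its list of tags; results are ordered
    by descending score, then by video id.
    """
    terms = {t.lower() for t in query_terms}
    results = []
    for vid, tags in video_library.items():
        tag_set = {t.lower() for t in tags}
        # Share of query terms that appear among this video's tags.
        score = len(terms & tag_set) / len(terms) if terms else 0.0
        if score >= min_score:
            results.append((score, vid))
    results.sort(key=lambda item: (-item[0], item[1]))
    return [vid for _, vid in results]
```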

According to a fourth aspect, an apparatus for determining video tags is provided. The apparatus includes: a division module configured to divide a video sequence into a plurality of sub-videos by shot; a first determination module configured to determine at least one video frame of a sub-video as a key frame of the sub-video, determine an initial label sequence of the key frame, and determine a valid tag of the sub-video according to the initial label sequence; and a second determination module configured to determine the tags of the video sequence according to the valid tags of the sub-videos.

According to a fifth aspect, a video recommendation apparatus is provided. The apparatus includes: an acquisition module configured to obtain user tags; and a fifth determination module configured to determine, from a video library, a first target video to be recommended to the user according to a first similarity between the user tags and the tags of each video in the library, where the tags of the videos are obtained by the above apparatus for determining video tags.

According to a sixth aspect, a video query apparatus is provided. The apparatus includes: a receiving module configured to receive a request to query videos, the request including a query term; and a sixth determination module configured to determine, from a video library, a second target video corresponding to the query term according to a second similarity between the query term and the tags of each video in the library, where the tags of the videos are obtained by the above apparatus for determining video tags.

According to a seventh aspect, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method provided by the present disclosure.

According to an eighth aspect, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions cause a computer to perform the method provided by the present disclosure.

According to a ninth aspect, a computer program product is provided, including a computer program stored on at least one of a readable storage medium and an electronic device, where the computer program, when executed by a processor, implements the method provided by the present disclosure.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

The drawings are provided for a better understanding of the solution and do not limit the present disclosure. In the drawings:

FIG. 1 is a schematic diagram of an exemplary system architecture to which the method for determining video tags, the video recommendation method, and the video query method can be applied according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of a method for determining video tags according to an embodiment of the present disclosure;

FIG. 3 is a block diagram of a method for determining video tags according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of a method for determining video tags according to another embodiment of the present disclosure;

FIG. 5 is a flowchart of a video recommendation method according to an embodiment of the present disclosure;

FIG. 6 is a flowchart of a video query method according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of an apparatus for determining video tags according to an embodiment of the present disclosure;

FIG. 8 is a block diagram of a video recommendation apparatus according to an embodiment of the present disclosure;

FIG. 9 is a block diagram of a video query apparatus according to an embodiment of the present disclosure;

FIG. 10 is a block diagram of an electronic device for implementing at least one of the method for determining video tags, the video recommendation method, and the video query method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. They include various details of the embodiments to facilitate understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

The mainstream approach to video tag extraction first classifies the video frame by frame to obtain per-frame labels, and then fuses the per-frame labels through a series of strategies to generate video-level tags.

However, a video contains a great deal of redundant information: nearly every frame within a shot contains the same elements. Moreover, some briefly appearing content is not the main subject of the video, so the output tags may contain irrelevant noise. Frame-by-frame processing therefore handles a large amount of redundant information, consumes substantial resources on long videos, and is inefficient.

In addition, frame-by-frame processing leads to a low speed ratio (video duration / processing time). Real-time processing is achieved only when the speed ratio exceeds a certain value (for example, 1), so frame-by-frame processing makes real-time video processing difficult.
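The speed-ratio criterion is simple arithmetic. The sketch below restates it, with the threshold of 1 taken from the example in the text; the function names are hypothetical.

```python
def speed_ratio(video_seconds, processing_seconds):
    """Speed ratio = video duration / processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return video_seconds / processing_seconds


def is_real_time(video_seconds, processing_seconds, threshold=1.0):
    """True when the pipeline processes video faster than the threshold ratio."""
    return speed_ratio(video_seconds, processing_seconds) > threshold
```

For instance, a 600 s video that takes 1200 s to process has a ratio of 0.5 and is not real time, whereas processing it in 150 s gives a ratio of 4.0.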

Another method for determining video tags extracts one or more key frames from a video, classifies the key frames to obtain their labels, and then fuses the key-frame labels to obtain video-level tags.

This method improves processing efficiency, but key-frame selection becomes the difficulty. Random or equal-interval sampling cannot guarantee that the extracted key frames are important, which leads to low accuracy and poor stability of the resulting tags. Selecting key frames according to the importance of each frame's content, on the other hand, consumes a large amount of computing resources and is therefore also inefficient.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of any user personal information involved comply with relevant laws and regulations and do not violate public order and good customs.

In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is acquired or collected.

FIG. 1 is a schematic diagram of an exemplary system architecture to which the method for determining video tags, the video recommendation method, and the video query method can be applied according to an embodiment of the present disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure can be applied, provided to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments cannot be used in other devices, systems, environments, or scenarios.

As shown in FIG. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.

Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be various electronic devices, including but not limited to smartphones, tablet computers, and laptop computers.

At least one of the method for determining video tags, the video recommendation method, and the video query method provided by the embodiments of the present disclosure may generally be executed by the server 105. Correspondingly, at least one of the apparatus for determining video tags, the video recommendation apparatus, and the video query apparatus may generally be deployed in the server 105. These methods may also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105; correspondingly, the apparatuses may also be deployed in such a server or server cluster.

FIG. 2 is a flowchart of a method for determining video tags according to an embodiment of the present disclosure.

As shown in FIG. 2, the method 200 for determining video tags may include operations S210 to S240.

In operation S210, the video sequence is divided into a plurality of sub-videos by shot.

For example, a video sequence may consist of multiple shots, each containing multiple frames. Dividing the video sequence at the granularity of a single shot yields multiple sub-videos. Each sub-video thus corresponds to exactly one shot: all of its frames belong to that shot, and no shot transition occurs within a sub-video.
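The text does not say how shot boundaries are detected. A common, simple technique is to compare the color histograms of consecutive frames and declare a cut where the difference jumps. The sketch below assumes precomputed normalized histograms and an illustrative threshold of 0.5; it is a stand-in, not the disclosed detector, and `split_into_shots` is a hypothetical name.

```python
def l1_distance(h1, h2):
    """Sum of absolute differences between two normalized histograms."""
    return sum(abs(x - y) for x, y in zip(h1, h2))


def split_into_shots(frame_hists, threshold=0.5):
    """Split frame indices into shots at large histogram jumps.

    frame_hists: one normalized color histogram (list of floats summing to 1)
    per frame. Returns (start, end) index pairs, end exclusive.
    """
    if not frame_hists:
        return []
    shots, start = [], 0
    for i in range(1, len(frame_hists)):
        # A large change between consecutive frames is treated as a hard cut.
        if l1_distance(frame_hists[i - 1], frame_hists[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frame_hists)))
    return shots
```

Each returned pair is one sub-video; gradual transitions such as fades would need a more robust detector.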

In operation S220, at least one video frame of a sub-video is determined as a key frame of the sub-video, an initial label sequence of the key frame is determined, and a valid tag of the sub-video is determined according to the initial label sequence.

For example, depending on actual needs, key frames may be selected from only one of the sub-videos, from any subset of the sub-videos (for example, any two or any three sub-videos), or from every sub-video.

Exemplarily, for each sub-video, a preset number of video frames (n frames, where n is an integer greater than 1, for example n = 3) may be selected from the sub-video as key frames according to an equal-division principle.
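One way to read the equal-division principle is to cut the shot into n equal segments and take one frame from each. The sketch below takes the middle frame of each segment; that choice and the `select_keyframes_equal` name are assumptions.

```python
def select_keyframes_equal(start, end, n=3):
    """Pick n frame indices from the shot [start, end) by equal division.

    The shot is cut into n equal segments and the middle frame of each
    segment is chosen. Short shots yield at most one frame per frame.
    """
    length = end - start
    if length <= 0:
        return []
    n = min(n, length)
    return [start + int((2 * i + 1) * length / (2 * n)) for i in range(n)]
```

For a 90-frame shot with n = 3 this picks frames 15, 45, and 75.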

Exemplarily, for each sub-video, the number of key frames selected may depend on the duration of the sub-video. For example, 2 key frames are selected from a 10 s sub-video, 4 key frames from a 20 s sub-video, and so on.
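The duration example (10 s gives 2 key frames, 20 s gives 4) is consistent with one key frame per 5 s. A minimal sketch under that assumption, rounding up and always keeping at least one key frame; the 5 s rate and the function name are illustrative.

```python
import math


def keyframe_count_by_duration(duration_s, seconds_per_keyframe=5.0, minimum=1):
    """Number of key frames proportional to sub-video duration (assumed rate)."""
    return max(minimum, math.ceil(duration_s / seconds_per_keyframe))
```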

Exemplarily, key frames may also be selected according to other indicators, such as sharpness: the sharper the sub-video, the more key frames are selected from it, and the less sharp, the fewer.
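The text only says that sharper sub-videos receive more key frames. A sketch assuming a sharpness score already normalized to [0, 1] and a linear mapping onto a hypothetical range of 1 to 5 key frames:

```python
def keyframe_count_by_sharpness(sharpness, min_count=1, max_count=5):
    """Map a sharpness score in [0, 1] to a key-frame count (monotone, linear)."""
    s = min(max(sharpness, 0.0), 1.0)  # clamp to the assumed score range
    return min_count + round(s * (max_count - min_count))
```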

Each sub-video corresponds to one shot, and nearly every frame within a shot contains the same elements. Selecting only a preset number of key frames from the sub-video for subsequent image classification and label determination therefore reduces redundancy.

For example, image classification may be performed on the key frames of a sub-video to obtain their labels, whether on every key frame, on a single key frame, or on any subset of the key frames. A key frame may have multiple labels, each with a confidence score produced by the deep learning model (image classification model) when classifying the key frame. The deep learning model may be, for example, a CNN (Convolutional Neural Network).

Thus, a key frame in a sub-video may have an initial label sequence containing multiple labels arranged in descending order of confidence. For example, if the initial label sequence of a key frame is {sports, football, match, World Cup, star}, the confidence scores of "sports", "football", "match", "World Cup", and "star" decrease in that order.
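Building the initial label sequence amounts to sorting the classifier's labels by confidence. The sketch assumes the classifier output is a plain mapping from label to confidence; the scores in the check below are illustrative values chosen to reproduce the example ordering.

```python
def initial_label_sequence(scores):
    """Order a key frame's labels by descending confidence.

    scores: mapping label -> confidence from an image classifier
    (for example a CNN's softmax output).
    """
    return sorted(scores, key=scores.get, reverse=True)
```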

Exemplarily, every key frame in a sub-video may have its own initial label sequence.

Exemplarily, for each sub-video, the valid tags of the sub-video may be determined from the initial label sequences of its key frames, for example by taking the highest-confidence label of each initial label sequence as a valid tag of the sub-video.
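The simple strategy mentioned here, taking the top-1 label of each key frame, can be sketched as follows. Dropping duplicates and ordering by first appearance are assumptions.

```python
def valid_tags_from_top1(initial_sequences):
    """Shot-level valid tags: the top-1 label of each key frame's sequence.

    initial_sequences: one label list per key frame, already sorted by
    descending confidence. Duplicates are dropped, first occurrence kept.
    """
    tags = []
    for seq in initial_sequences:
        if seq and seq[0] not in tags:
            tags.append(seq[0])
    return tags
```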

It can be understood that the valid tags of a sub-video are shot-level tags. Compared with per-frame labels, shot-level tags contain less redundant information, while the shots together still cover the entire video sequence.

In operation S230, the tags of the video sequence are determined according to the valid tags of the sub-videos.

For example, the tags of the video sequence may be determined from the valid tags of all sub-videos, of a single sub-video, or of any subset of the sub-videos (for example, any two or any three sub-videos).

Exemplarily, the valid tags of the sub-videos may be added to a valid tag list, and the tags in the list may then be filtered against a list of blocked words to obtain the tags of the video sequence, that is, the video-level tags.
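The merge-and-filter step could look like the sketch below. The blocked-word list is assumed to be supplied externally, and deduplication that keeps the first occurrence is an assumption.

```python
def video_level_tags(shot_valid_tags, blocked_words):
    """Merge per-shot valid tags into one list and drop blocked words.

    shot_valid_tags: one list of valid tags per sub-video (shot).
    blocked_words: tags that must never appear at video level.
    """
    blocked = set(blocked_words)
    merged = []
    for tags in shot_valid_tags:
        for tag in tags:
            if tag not in blocked and tag not in merged:
                merged.append(tag)
    return merged
```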

In the embodiments of the present disclosure, the video sequence is divided into multiple sub-videos by shot; at least one key frame is selected from each sub-video for image classification to obtain its initial label sequence; the valid tags of the sub-video, that is, shot-level tags, are determined from the initial label sequences of its key frames; and the video-level tags are determined from the shot-level tags. This reduces information redundancy and greatly improves processing efficiency.

FIG. 3 is a block diagram of a method for determining video tags according to an embodiment of the present disclosure.

As shown in FIG. 3, a video sequence 300 may be divided by shot into sub-videos 310, ..., 320.

For the sub-video 310, key frames 311 and 312 are selected. Image classification of the key frame 311 yields an initial label sequence 313, and image classification of the key frame 312 yields an initial label sequence 314. From each of the initial label sequences 313 and 314, the k labels with the highest confidence (top-k, for example k = 4) are selected, giving target label sequences 315 and 316 respectively. A shot-level valid tag 317 can then be determined from the target label sequences 315 and 316.

Similarly, for the sub-video 320, key frames 321 and 322 are selected. Image classification of the key frame 321 yields an initial label sequence 323, and image classification of the key frame 322 yields an initial label sequence 324. From each of the initial label sequences 323 and 324, the top-k labels (for example k = 4) are selected, giving target label sequences 325 and 326 respectively. A shot-level valid tag 327 can then be determined from the target label sequences 325 and 326.

Video-level tags 330 can be determined from the shot-level valid tags 317 and 327, for example by taking the union of the shot-level valid tags 317 and 327 as the video-level tags 330.

In this embodiment, the video is segmented by shot, shot-level valid tags are determined, and video-level tags are determined from the shot-level valid tags. This reduces redundant information within each shot while, because the shots together span the whole video, avoiding the loss of important information; processing efficiency and tag-prediction accuracy are therefore both improved.

FIG. 4 is a flowchart of a method for determining video tags according to another embodiment of the present disclosure.

As shown in FIG. 4, the method includes operations S410 to S480. Exemplarily, operations S420 to S450 may be performed for every sub-video, for a single sub-video, or for any subset of the sub-videos. Operation S440 may be performed for every key frame, for a single key frame, or for any subset of the key frames.

In operation S410, the video sequence is divided into a plurality of sub-videos by shot. Operation S410 is similar to operation S210 and is not described again here.

In operation S420, at least one key frame is determined from the sub-video, and the target label sequence of each key frame is determined.

For example, at least one key frame may be determined from the sub-video according to the equal-division principle, the sub-video duration, or a sharpness indicator; the key-frame selection is the same as in operation S220 and is not described again here.

For example, image classification may be performed on a key frame to obtain its initial label sequence, and the k labels with the highest confidence (top-k, for example k = 4) are selected from the initial label sequence as the target label sequence of the key frame.
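Truncating to the top-k labels is a sort followed by a slice. A minimal sketch, assuming the classifier output is a mapping from label to confidence; the function name is hypothetical.

```python
def target_label_sequence(scores, k=4):
    """Keep the k most confident labels of a key frame (top-k, k = 4 by default)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]
```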

Exemplarily, image classification and determination of the initial and target label sequences may be performed on every key frame, yielding a target label sequence for each key frame.

Exemplarily, if a sub-video includes n key frames (n is an integer greater than 1, for example n = 3) and each key frame has a target label sequence, the sub-video has n target label sequences.

In operation S430, it is determined whether the highest-confidence (top-1) labels in the target label sequences of the n key frames of the sub-video are identical. If they are, that top-1 label is determined as the valid tag of the sub-video and operation S450 is performed; otherwise, operation S440 is performed.
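Operation S430 checks whether all key frames agree on their top-1 label. A minimal sketch, checked against the two examples that follow; `consistent_top1` is a hypothetical name.

```python
def consistent_top1(target_sequences):
    """Operation S430: return the shared top-1 label, or None if they differ."""
    top1 = {seq[0] for seq in target_sequences if seq}
    # Every key frame must contribute a label and all of them must agree.
    if len(top1) == 1 and all(target_sequences):
        return top1.pop()
    return None
```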

In one example, the current sub-video includes 3 key frames. The target label sequence of the first key frame is {a, b, c, d}, that of the second is {a, c, e, d}, and that of the third is {a, d, e, f}. Since the top-1 label of all three key frames is "a", "a" can be determined as the valid tag of the current sub-video; this valid tag is a shot-level tag.

In another example, the current sub-video includes 3 key frames. The target label sequence of the first key frame is {b, c, d, e}, that of the second is {a, c, e, d}, and that of the third is {e, d, f, a}. The top-1 labels of the three key frames differ, so operation S440 is performed for these three key frames.

In operation S440, it is determined whether the top-1 label in the target label sequence of the current frame appears among the top-k labels of any other key frame. If so, the top-1 label of the current frame is determined as a valid tag and operation S450 is performed; otherwise, the current frame is determined to contain no valid tag.

In one example, the target label sequence of the first key frame is {b, c, d, e}, that of the second is {a, c, e, d}, and that of the third is {e, d, f, a}. The top-1 label "b" of frame 1 appears in no other frame's target label sequence and is discarded. The top-1 label "a" of frame 2 appears in the target label sequence of frame 3, and the top-1 label "e" of frame 3 appears in the target label sequences of frames 1 and 2. Therefore, labels "a" and "e" can be determined as valid tags of the current sub-video.
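Operation S440 can be sketched as follows; run on the target label sequences of the example above, it returns "a" and "e" as the valid tags. The function name is hypothetical.

```python
def cross_frame_valid_tags(target_sequences):
    """Operation S440: keep a frame's top-1 label if it appears in the
    top-k target label sequence of at least one other key frame of the shot.
    """
    valid = []
    for i, seq in enumerate(target_sequences):
        if not seq:
            continue  # a frame without labels cannot contribute a tag
        top1 = seq[0]
        seen_elsewhere = any(
            top1 in other for j, other in enumerate(target_sequences) if j != i
        )
        if seen_elsewhere and top1 not in valid:
            valid.append(top1)
    return valid
```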

本实施例以镜头切分为标签提取的前置算子，在与镜头对应的子视频中选取至少一个关键帧进行标签提取，减少了信息冗余，大幅提高了算子运行的倍速比。同时由于选取关键帧的数量大幅减少，并且也加入了关键帧标签的一致性判断，使得输出噪声的几率大大降低，提高了结果的的准确率和稳定性。In this embodiment, the shot is divided into a pre-operator for label extraction, and at least one key frame is selected from the sub-video corresponding to the shot for label extraction, which reduces information redundancy and greatly improves the speed ratio of the operator. At the same time, since the number of selected key frames is greatly reduced, and the consistency judgment of key frame labels is also added, the probability of output noise is greatly reduced, and the accuracy and stability of the results are improved.

In operation S450, the valid labels of the sub-video are added to a valid label list.

In one example, after labels "a" and "e" are obtained, they can be added to the valid label list.

In another example, if operation S440 yields a negative result for every key frame, no valid label can be obtained for the current sub-video. If no valid label is obtained for any sub-video, the valid label list is empty.

In operation S460, it is determined whether the valid label list is empty. If so, operation S470 is performed; otherwise, operation S480 is performed.
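The branching across operations S450 to S480 can be sketched as a small driver. The function `finalize_tags` is hypothetical: it receives the label recall strategy and the blocked-word filter as callables. Two details are assumptions the text does not specify: the valid label list is de-duplicated while preserving order, and a recalled candidate label is returned without passing through the blocked-word filter.

```python
from typing import Callable, List, Optional


def finalize_tags(
    valid_labels: List[str],
    recall: Callable[[], Optional[str]],
    filter_blocked: Callable[[List[str]], List[str]],
) -> List[str]:
    """Hypothetical driver for operations S460 to S480."""
    # De-duplicate while keeping first-seen order (an assumption).
    unique = list(dict.fromkeys(valid_labels))
    if not unique:
        # S470: no shot-level valid label, so fall back to label recall.
        candidate = recall()
        return [candidate] if candidate is not None else []
    # S480: drop labels matching blocked words.
    return filter_blocked(unique)


print(finalize_tags([], lambda: "cat", lambda tags: tags))  # ['cat']
print(finalize_tags(["a", "e", "a"], lambda: None,
                    lambda tags: [t for t in tags if t != "e"]))  # ['a']
```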

In operation S470, a candidate label is determined from the at least one target label sequence of the sub-videos.

For example, if the valid label list is empty, no shot-level valid label has been obtained, and some labels need to be recalled from the target label sequences of the sub-videos as candidate labels.

In one label recall strategy, the target label sequences of all sub-videos are combined into a target label sequence set, and the label with the highest confidence in that set is determined as the candidate label.

For example, suppose the target label sequence set formed from all sub-videos contains N (e.g., N = 10) target label sequences, and the top-1 label of each sequence has a confidence evaluation value: the top-1 label of the first sequence has a confidence of 90%, that of the second sequence 75%, and so on. The top-1 label with the highest confidence evaluation value can be selected as the candidate label.
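A minimal sketch of this first recall strategy follows. It assumes each target label sequence is stored as `(label, confidence)` pairs sorted highest first; the type alias `LabelSeq` and the function `recall_from_all` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple

# One target label sequence: (label, confidence) pairs, highest first.
LabelSeq = List[Tuple[str, float]]


def recall_from_all(subvideo_seqs: List[List[LabelSeq]]) -> Optional[str]:
    """Strategy 1 (hypothetical helper): pool every sub-video's sequences."""
    pooled = [seq for seqs in subvideo_seqs for seq in seqs if seq]
    if not pooled:
        return None
    # Compare the top-1 entry of each pooled sequence by confidence.
    label, _ = max((seq[0] for seq in pooled), key=lambda pair: pair[1])
    return label


print(recall_from_all([
    [[("cat", 0.90), ("dog", 0.50)]],       # sub-video 1
    [[("car", 0.75)], [("bus", 0.60)]],     # sub-video 2
]))  # cat
```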

In another label recall strategy, the sub-video with the longest duration is selected from the plurality of sub-videos as a candidate sub-video, and the label with the highest confidence evaluation value among the target label sequences of the candidate sub-video is determined as the candidate label.

For example, if sub-video A has the longest duration among the sub-videos, sub-video A can be determined as the candidate sub-video. If the candidate sub-video has n (e.g., n = 3) target label sequences, each with a top-1 label carrying a confidence evaluation value, the top-1 label with the highest confidence evaluation value can be selected as the candidate label.
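The second recall strategy can be sketched in the same style. The `SubVideo` record and its `duration_s` field are hypothetical; the text only says that the longest sub-video is chosen and its most confident top-1 label is kept.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SubVideo:
    duration_s: float                           # hypothetical: shot length in seconds
    target_seqs: List[List[Tuple[str, float]]]  # (label, confidence), highest first


def recall_from_longest(subvideos: List[SubVideo]) -> Optional[str]:
    """Strategy 2 (hypothetical helper): use only the longest sub-video."""
    if not subvideos:
        return None
    longest = max(subvideos, key=lambda sv: sv.duration_s)
    tops = [seq[0] for seq in longest.target_seqs if seq]
    if not tops:
        return None
    return max(tops, key=lambda pair: pair[1])[0]


print(recall_from_longest([
    SubVideo(5.0, [[("a", 0.99)]]),
    SubVideo(12.0, [[("b", 0.60)], [("c", 0.80)], [("d", 0.70)]]),
]))  # c
```

Note that label "a" is ignored despite its higher confidence, because only the longest sub-video is consulted.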

In the embodiments of the present disclosure, a label recall strategy is provided: when no shot-level valid label is obtained, candidate labels are recalled according to the strategy and the video-level tags are determined from them, which prevents the video from ending up with no tags. In one example, the candidate labels are recalled from the target label sequences of multiple sub-videos, which together cover the entire video sequence. This avoids losing important information and helps ensure the accuracy of the candidate labels and, in turn, of the video-level tags.

In operation S480, the labels in the valid label list are filtered using blocked words to obtain the tags of the video sequence.

For example, valid labels in the list that match a blocked word (e.g., whose similarity to the blocked word exceeds 95%) can be identified and removed from the valid label list; the resulting list can serve as the tag list of the video sequence.
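The blocked-word filter can be sketched as below. The text does not say how similarity is measured, so this sketch assumes Python's `difflib.SequenceMatcher.ratio()` as a stand-in score in [0, 1]; a production system might use a different string or embedding similarity. `filter_blocked` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def filter_blocked(labels: List[str], blocked_words: Iterable[str],
                   threshold: float = 0.95) -> List[str]:
    """Drop labels whose similarity to any blocked word exceeds threshold."""
    blocked = list(blocked_words)

    def is_blocked(label: str) -> bool:
        # SequenceMatcher.ratio() is a stand-in similarity in [0, 1].
        return any(SequenceMatcher(None, label, word).ratio() > threshold
                   for word in blocked)

    return [label for label in labels if not is_blocked(label)]


print(filter_blocked(["sports", "advert", "news"], ["advert"]))  # ['sports', 'news']
```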

FIG. 5 is a flowchart of a video recommendation method according to the present disclosure.

As shown in FIG. 5, the video recommendation method 500 includes operations S510 to S520.

In operation S510, user tags are acquired.

In operation S520, a first target video to be recommended to the user is determined from a video library according to a first similarity between the user tags and the tags of each video in the video library.

For example, the tags of the videos are determined by the method for determining video tags described above. The user tags can be determined from the user's historical browsing records and historical operation records (e.g., bookmarking and forwarding). The first similarity between the user tags and the tags of each video in the video library is computed, and a first target video whose first similarity exceeds a threshold (e.g., 80%) can be recommended to the user.
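The thresholded matching in operation S520 can be sketched as follows. The text leaves the first similarity undefined; this sketch assumes Jaccard similarity between the user's tag set and each video's tag set, and the names `jaccard`, `recommend`, and the sample library are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_tags: Set[str], library: Dict[str, Set[str]],
              threshold: float = 0.8) -> List[str]:
    """Return IDs of videos whose first similarity exceeds threshold, best first."""
    scored = [(vid, jaccard(user_tags, tags)) for vid, tags in library.items()]
    hits = [(vid, s) for vid, s in scored if s > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [vid for vid, _ in hits]


library = {
    "v1": {"football", "sports"},
    "v2": {"football", "sports", "news"},
    "v3": {"cooking"},
}
print(recommend({"football", "sports"}, library))  # ['v1']
```

Here "v2" scores 2/3 and falls below the 80% example threshold; a looser threshold or a weighted similarity would admit it.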

For example, videos in the video library can also be stored by tag: videos with "sports" tags are stored together, videos with "education" tags are stored together, and so on, which makes it easier to recommend videos with similar tags.

FIG. 6 is a flowchart of a video query method according to the present disclosure.

As shown in FIG. 6, the video query method 600 includes operations S610 to S620.

In operation S610, a request to query videos is received, the request including a query word.

In operation S620, a second target video corresponding to the query word is determined from the video library according to a second similarity between the query word and the tags of each video in the video library.

For example, the tags of the videos are determined by the method for determining video tags described above. A user can find a desired video by entering a query word in a client. For example, if the user enters "World Cup", a query request containing "World Cup" is generated and sent by the client to a server. In response, the server can compute the second similarity between the query word "World Cup" and the tags of each video in the video library, take the videos whose second similarity exceeds a threshold (e.g., 90%) as the second target videos corresponding to "World Cup", and send them to the user's client.
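A server-side sketch of operation S620 follows. The second similarity is not defined in the text; this sketch assumes it is the best `difflib.SequenceMatcher` ratio between the lower-cased query word and any one of a video's tags. `query_videos` and the sample library are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def query_videos(query: str, library: Dict[str, List[str]],
                 threshold: float = 0.9) -> List[str]:
    """Return IDs of videos whose best tag matches the query above threshold."""
    q = query.lower()
    hits = []
    for vid, tags in library.items():
        # Second similarity: best match between the query and any tag.
        score = max((SequenceMatcher(None, q, t.lower()).ratio() for t in tags),
                    default=0.0)
        if score > threshold:
            hits.append((vid, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [vid for vid, _ in hits]


library = {"v1": ["World Cup", "football"], "v2": ["cooking"], "v3": []}
print(query_videos("world cup", library))  # ['v1']
```

A character-level score like this cannot relate "World Cup" to a video tagged only "football"; a semantic similarity would be needed for that, which the text neither requires nor rules out.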

For example, videos in the video library can also be stored by tag: videos with "sports" tags are stored together, videos with "education" tags are stored together, and so on. A query word can then be used to quickly locate the video collection of the corresponding tag category; for example, the query word "World Cup" leads directly to "sports", which improves query efficiency.
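Storing videos by tag category amounts to a category-to-videos index. In the sketch below, the taxonomy `CATEGORY_OF` that maps tags and query words such as "World Cup" to "sports" is entirely hypothetical, since the text does not say how that mapping is obtained; `build_category_index` and `candidates_for_query` are likewise illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical taxonomy mapping tags and query words to a category.
CATEGORY_OF = {"world cup": "sports", "football": "sports", "calculus": "education"}


def build_category_index(library: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Group video IDs by the category of their tags."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for vid, tags in library.items():
        for tag in tags:
            category = CATEGORY_OF.get(tag.lower())
            if category is not None:
                index[category].add(vid)
    return dict(index)


def candidates_for_query(query: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Jump straight to the video set of the query word's category."""
    category = CATEGORY_OF.get(query.lower())
    return set(index.get(category, set())) if category else set()


index = build_category_index({"v1": ["football"], "v2": ["calculus"], "v3": ["World Cup"]})
print(sorted(candidates_for_query("World Cup", index)))  # ['v1', 'v3']
```

Only the videos in the matched category then need to be scored, instead of the whole library.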

FIG. 7 is a block diagram of an apparatus for determining video tags according to an embodiment of the present disclosure.

As shown in FIG. 7, the apparatus 700 for determining video tags includes a dividing module 701, a first determining module 702, and a second determining module 703.

The dividing module 701 is configured to divide a video sequence into a plurality of sub-videos by shot.

The first determining module 702 is configured to determine at least one video frame of a sub-video as a key frame of the sub-video, determine an initial label sequence of the key frame, and determine the valid labels of the sub-video according to the initial label sequence.

The second determining module 703 is configured to determine the tags of the video sequence according to the valid labels of the sub-videos.

The first determining module includes a classification submodule, a selection submodule, and a first determining submodule.

The classification submodule is configured to classify the key frame to obtain its initial label sequence.

The selection submodule is configured to select the k labels with the highest confidence from the initial label sequence of the key frame as the target label sequence of the key frame, where k is an integer greater than 1.
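The selection submodule's top-k step can be sketched with `heapq.nlargest`. The sketch assumes the classifier's initial label sequence is available as a mapping from label to confidence; the function name `target_label_sequence` is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def target_label_sequence(initial: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Keep the k most confident (label, confidence) pairs, highest first."""
    if k <= 1:
        raise ValueError("k must be an integer greater than 1")
    return heapq.nlargest(k, initial.items(), key=lambda pair: pair[1])


scores = {"a": 0.10, "b": 0.70, "c": 0.50, "d": 0.30, "e": 0.20}
print(target_label_sequence(scores, k=3))  # [('b', 0.7), ('c', 0.5), ('d', 0.3)]
```

`heapq.nlargest` avoids sorting the full label vocabulary, which helps when the classifier outputs many classes.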

The first determining submodule is configured to determine the valid labels of the sub-video according to the target label sequences of the key frames.

The first determining submodule includes a first determining unit and a second determining unit.

The first determining unit is configured to, when the highest-confidence labels in the target label sequences of the at least one key frame are all identical, determine that label as the valid label of the sub-video.

The second determining unit is configured to, when the highest-confidence labels in the target label sequences of the at least one key frame are not all identical, determine the valid labels of the sub-video according to the relationship between the highest-confidence label of each target label sequence and the other target label sequences.

Specifically, for each target label sequence, if the highest-confidence label in that sequence also appears in another target label sequence, the second determining unit determines that label as the valid label of the target label sequence, and then determines the valid labels of the sub-video according to the valid labels of the at least one target label sequence.

The second determining module includes an adding submodule and a filtering submodule.

The adding submodule is configured to add the valid labels of the sub-videos to a valid label list.

The filtering submodule is configured to filter the labels in the valid label list using blocked words to obtain the tags of the video sequence.

The apparatus 700 for determining video tags further includes a third determining module and a fourth determining module.

The third determining module is configured to determine a candidate label from the at least one target label sequence of the sub-videos when the valid label list is empty.

The fourth determining module is configured to determine the tags of the video sequence according to the candidate label.

According to an embodiment of the present disclosure, the labels in a target label sequence have confidence evaluation values. The third determining module includes a combining submodule and a second determining submodule.

The combining submodule is configured to combine the at least one target label sequence of each of the plurality of sub-videos into a target label sequence set.

The second determining submodule is configured to determine the label with the highest confidence evaluation value in the target label sequence set as the candidate label.

According to an embodiment of the present disclosure, the labels in a target label sequence have confidence evaluation values. The third determining module includes a third determining submodule and a fourth determining submodule.

The third determining submodule is configured to determine the sub-video with the longest duration among the plurality of sub-videos as a candidate sub-video.

The fourth determining submodule is configured to determine the label with the highest confidence evaluation value in the at least one target label sequence of the candidate sub-video as the candidate label.

FIG. 8 is a block diagram of a video recommendation apparatus according to an embodiment of the present disclosure.

As shown in FIG. 8, the video recommendation apparatus 800 includes an acquiring module 801 and a fifth determining module 802.

The acquiring module 801 is configured to acquire user tags.

The fifth determining module 802 is configured to determine, from a video library, a first target video to be recommended to the user according to a first similarity between the user tags and the tags of each video in the video library.

The tags of the videos are obtained by the apparatus 700 for determining video tags described above.

FIG. 9 is a block diagram of a video query apparatus according to an embodiment of the present disclosure.

As shown in FIG. 9, the video query apparatus 900 includes a receiving module 901 and a sixth determining module 902.

The receiving module 901 is configured to receive a request to query videos, the request including a query word.

The sixth determining module 902 is configured to determine, from the video library, a second target video corresponding to the query word according to a second similarity between the query word and the tags of each video in the video library.

According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.

FIG. 10 shows a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 10, the device 1000 includes a computing unit 1001, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store various programs and data required for the operation of the device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to one another via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

A plurality of components in the device 1000 are connected to the I/O interface 1005, including: an input unit 1006, such as a keyboard or a mouse; an output unit 1007, such as various types of displays and speakers; a storage unit 1008, such as a magnetic disk or an optical disc; and a communication unit 1009, such as a network card, a modem, or a wireless communication transceiver. The communication unit 1009 allows the device 1000 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.

The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The computing unit 1001 performs the methods and processing described above, such as at least one of the method for determining video tags, the video recommendation method, and the video query method. For example, in some embodiments, at least one of these methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of at least one of the methods described above can be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured in any other appropriate manner (e.g., by means of firmware) to perform at least one of the method for determining video tags, the video recommendation method, and the video query method.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that, when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, speech, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), a computing system that includes a middleware component (e.g., an application server), a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include a client and a server. The client and server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed herein.

The specific implementations described above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.