Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the examples provided herein are merely illustrative of the present invention and are not intended to limit the present invention. In addition, the following embodiments are provided as partial embodiments for implementing the present invention, not all embodiments for implementing the present invention, and the technical solutions described in the embodiments of the present invention may be implemented in any combination without conflict.
It should be noted that, in the embodiments of the present invention, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional related elements in the method or apparatus that comprises the element (e.g., steps in a method or units in an apparatus, such as units that may be part of a circuit, part of a processor, part of a program or software, etc.).
For example, the method for acquiring a target media file according to the embodiment of the present invention includes a series of steps, but the method for acquiring a target media file according to the embodiment of the present invention is not limited to the described steps, and similarly, the apparatus for acquiring a target media file according to the embodiment of the present invention includes a series of units, but the apparatus according to the embodiment of the present invention is not limited to include the explicitly described units, and may also include units that are required to be configured to acquire relevant information or perform processing based on the information.
In the description that follows, references to the terms "first", "second", and the like, are intended only to distinguish similar objects and not to indicate a particular ordering for the objects, it being understood that "first", "second", and the like may be interchanged under certain circumstances or sequences of events to enable embodiments of the invention described herein to be practiced in other than the order illustrated or described herein.
Before further detailed description of the present invention, terms and expressions referred to in the embodiments of the present invention are described, and the terms and expressions referred to in the embodiments of the present invention are applicable to the following explanations.
1) User portrait (user representation): a virtual representation of a real user, i.e., a model of the target user built on a series of attribute data. Herein, it refers to a hierarchical interest model of the corresponding user abstracted from the user's historical behavior data and used to indicate the user's interest classification; fig. 2 is a schematic diagram of the user portrait provided by the embodiment of the present invention.
2) Media files: media available on the Internet in various forms (e.g., video, audio, image-text, etc.), such as video files presented in a client, articles in image-text form, etc.
3) Time freshness: a reference standard used for measuring the timeliness of a media file; for example, the time freshness of video news released on the current day is higher than that of video news released a week ago.
4) "In response to": indicates the condition or state on which a performed operation depends; when the dependent condition or state is satisfied, the one or more performed operations may be executed in real time or with a set delay, and unless otherwise specified, there is no restriction on the order in which the operations are performed.
In some embodiments, the acquisition of the target media file may be performed by: evaluating the quality of the media files using the attributes of the media files, scoring the media files based on those attributes, ranking the media files based on the scores, and selecting the target media files based on the ranking result. The attributes of a media file include, for example, the rank of the publishing media, the rank of the media file, and the number of tags of the media file, where the rank of the publishing media is a composite score of the media outlet based on the quality, data performance, and the like of the media files it has published historically; similarly, the media file rank is a composite score of the article based on the title length of the media file, the number of title keywords, text information, picture information, length, and the like. This way of acquiring the target media file is built on strong prior knowledge: the quality of a media file is evaluated using subjectively formulated rules (for example, all media files published by high-quality media outlets are considered superior to media files published by other outlets), so it is highly subjective.
In some embodiments, the acquisition of the target media file may also be performed by: evaluating the quality of the media files based on their posterior statistical information and performing greedy ranking based on that information, using, for example, the ε-greedy algorithm or Thompson sampling. The ε-greedy algorithm ranks according to the current click rate of each media file, treating the currently optimal media file as globally optimal; Thompson sampling performs gamma-distribution modeling based on the current click count and display count of each media file, and obtains the current profit value of the media file by sampling each time. This way of acquiring the target media file strongly depends on the posterior statistical information of the article: it models only the posterior statistics and completely discards the inherent attribute information of the media file, and a greedy algorithm is generally locally optimal rather than globally optimal.
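As an illustrative sketch (not the claimed method), the two greedy strategies mentioned above can be written as follows. Note one assumption: the Thompson sampling sketch uses the common Beta posterior for a Bernoulli click model, whereas the embodiment describes gamma-distribution modeling.

```python
import random

def epsilon_greedy_pick(files, epsilon=0.1):
    """Usually pick the file with the highest observed click-through
    rate; with probability epsilon pick a random one for exploration."""
    if random.random() < epsilon:
        return random.choice(files)
    return max(files, key=lambda f: f["clicks"] / max(f["impressions"], 1))

def thompson_pick(files):
    """Sample a plausible CTR for each file from a Beta posterior built
    from its click and impression counts, and pick the best sample."""
    def sample_ctr(f):
        # Beta(clicks + 1, impressions - clicks + 1): standard conjugate
        # posterior for a Bernoulli click model (assumption, see lead-in).
        return random.betavariate(f["clicks"] + 1,
                                  f["impressions"] - f["clicks"] + 1)
    return max(files, key=sample_ctr)

files = [
    {"id": "a", "clicks": 90, "impressions": 1000},   # CTR 0.09
    {"id": "b", "clicks": 30, "impressions": 200},    # CTR 0.15
]
```

With `epsilon=0` the ε-greedy pick is purely exploitative and always returns the file with the best current CTR, which is exactly the "current optimum treated as global optimum" behavior criticized above.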
Fig. 3 is an alternative architecture diagram of the system 100 for acquiring a target media file according to an embodiment of the present invention. Referring to fig. 3, in order to support an exemplary application, a terminal 400 (exemplary terminals 400-1 and 400-2 are shown) is connected to a server 200 through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two, and data transmission is implemented using wireless links.
In some embodiments, the terminal 400 is configured to send an acquisition request of a target media file to the server when a user triggers an acquisition instruction of the media file through the client (for example, opens a viewpoint page of a mobile phone QQ);
the server 200 is configured to receive the acquisition request of the target media file sent by the terminal, load the media data of a plurality of media files, and perform feature extraction on each media file based on the media data to obtain a plurality of feature values of each media file, where the plurality of feature values include the feature values of the attribute features of the media file and the feature values of the statistical features of the media file; the server then obtains the score of each media file based on the plurality of feature values of each media file and the mapping relation between feature values and media file scores, and selects a preset number of the media files as target media files based on the scores of the media files;
the server 200 is further configured to send the selected target media file to the terminal 400;
the terminal 400 is further configured to display the received target media file on a graphical interface 410 (graphical interface 410-1 and graphical interface 410-2 are shown as examples).
Next, a device for acquiring a target media file according to an embodiment of the present invention will be described. The apparatus for acquiring a target media file according to the embodiment of the present invention may be implemented as hardware or a combination of hardware and software, and various exemplary implementations of the apparatus according to the embodiment of the present invention are described below.
The hardware structure of the apparatus for acquiring a target media file according to the embodiment of the present invention is described in detail below. It should be understood that fig. 4 shows only an exemplary structure of the apparatus, not its entire structure, and that part or all of the structure shown in fig. 4 may be implemented as needed.
The device 20 for acquiring a target media file provided by the embodiment of the invention comprises: at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204. The various components in the device 20 for acquiring a target media file are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among these components. In addition to a data bus, the bus system 205 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 4.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can also include both volatile and nonvolatile memory.
The memory 202 in embodiments of the present invention is used to store various types of data to support the operation of the apparatus 20 for acquiring a target media file. Examples of such data include any executable instructions for operating on the apparatus 20 for acquiring a target media file, and the executable instructions may include a program implementing the method for acquiring a target media file according to the embodiment of the present invention.
The method for acquiring the target media file disclosed by the embodiment of the invention may be applied to the processor 201, or implemented by the processor 201. The processor 201 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method for acquiring the target media file may be performed by integrated logic circuits of hardware in the processor 201 or by instructions in the form of software. The processor 201 may be a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 201 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the method disclosed by the embodiment of the invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium in the memory 202; the processor 201 reads the information in the memory 202 and, in combination with its hardware, performs the steps of the method for acquiring the target media file provided by the embodiment of the present invention.
Next, a method for acquiring a target media file according to an embodiment of the present invention is described. In some embodiments, referring to fig. 5, fig. 5 is a schematic flowchart of a method for acquiring a target media file according to an embodiment of the present invention, where the method for acquiring a target media file according to an embodiment of the present invention includes:
step 301: the server loads media data for a plurality of media files.
In some embodiments, the server may perform the loading of the media data by:
the server acquires historical behavior data of a target user; determining a user representation indicative of a classification of interest of a target user based on historical behavior data; media data corresponding to a plurality of media files of the user representation is loaded.
Here, in actual implementation, the server may obtain historical behavior data of the target user based on the target user identifier, such as the viewed/clicked media files, the corresponding media file types, the number of views/clicks, and the like, and compute the user portrait of the target user from this historical behavior data to determine the target user's interest classification. Fig. 6 is a schematic diagram of loading media data based on the user portrait provided in the embodiment of the present invention. Referring to fig. 6, the user portrait includes a certain tag; the server loads the video data related to that tag, the loaded video files serve as candidate video files, and a target video file (a video file to be recommended) is further selected from the candidate video files, so that a high-quality media file best fitting the user's interest is recommended to the user.
In practical applications, the media data of a media file loaded by the server may include attribute information (forward-index data) and posterior statistical data of the media file. Taking a video as an example, the attribute information of the video file may be its first-level classification, second-level classification, third-level classification, duration, tag, source, topic, cover score, quality score, burst score, time freshness, whether it is a picture group, whether it is a large picture, video rank, and the like; the posterior statistical data of the video file may be its number of clicks, number of plays, click rate, play duration, number of likes, number of comments, number of double clicks, number of collections, number of shares, and the like. The posterior statistical data can be obtained from click logs, duration logs, behavior logs, and the like reported offline.
Step 302: based on the media data, perform feature extraction on each media file to obtain a plurality of feature values of each media file, where the plurality of feature values include the feature values of the attribute features of the media file and the feature values of the statistical features of the media file.
In some embodiments, the server performs the following operations on each media file separately to achieve feature extraction on each media file:
the method comprises the steps that a server obtains original values of at least two characteristics of a media file, wherein the at least two characteristics comprise attribute characteristics and statistical characteristics of the media file; and respectively obtaining the characteristic value of each characteristic based on the original value of each characteristic and the corresponding characteristic name.
Here, in practical implementation, the attribute characteristics of the media file may include at least one of: the first-level classification of the media file, the second-level classification of the media file, the third-level classification of the media file, the duration of the media file, the source of the media file, the tag of the media file, the title tag of the media file, whether the media file is a picture group, whether the media file is a large picture, the regional classification of the media file, the content classification of the media file, the time freshness of the media file, the number of bytes of the title of the media file, the mark of the media file, the picture attribute of the media file, the release time of the media file, the regional information of the media file, the subject of the media file, the title of the media file, the number of words in the title of the media file, the source quality score of the media file, the cover score of the media file, the burst score of the media file, the intersection of the cover score of the media file with the classification of the media file, the intersection of the source of the media file with the release time of the media file, the intersection of the source of the media file with the classification of the media file, and the like.
In actual implementation, the statistical characteristics of the media files may include at least one of: the number of clicks of the media file, the click rate of the media file, the number of likes of the media file, the number of comments of the media file, the number of collections of the media file, the number of shares of the media file, the number of forwards of the media file, the heat of the media file, the playing duration of the media file, the viewing completion degree of the media file, the like rate of the media file, the dislike rate of the media file, the channel click rate of the media file, the subject click rate of the media file, the media click rate of the media file, the source click rate of the media file, the intersection of the click rate and the number of clicks of the media file, the intersection of the channel and the heat of the media file, the intersection of the subject and the heat of the media file, the intersection of the source click rate and the click rate of the media file, and the like.
In order to distinguish the statistical performance of a media file in different time periods, some statistical characteristics may be further defined over different time windows, such as the four time windows of hour, day, week, and month; for example, the number of clicks of the media file per day/week/month, the click rate of the media file per day/week/month, the number of likes of the media file per day/week/month, the number of comments of the media file per day/week/month, the number of collections of the media file per day/week/month, the number of shares of the media file per day/week/month, the number of forwards of the media file per day/week/month, and the like.
In the embodiment of the present invention, a feature intersection, also called a feature combination, is a composite feature formed by combining individual features; taking the intersection of the click rate and the number of clicks of a media file as an example, the composite feature is formed by combining the single feature "click rate" with the single feature "number of clicks".
In some embodiments, after obtaining the original values of the attribute features and the statistical features of the media file, the server may further obtain feature values of the respective features by:
hashing the original value of each feature to obtain a first hash value of each feature, and hashing the feature name character string of each feature to obtain a second hash value of each feature; and obtaining the characteristic value of each characteristic based on the first hash value and the second hash value of each characteristic.
Here, in actual implementation, the server may specifically obtain the first hash value of a feature as follows: the server maps the original value of the feature into a 64-bit hash space, and the obtained 64-bit hash value is the first hash value of the feature. The original values of single features generally come in three types: the uint64 type, the float type, and the string type; for example, features such as the number of clicks and the number of likes are generally of the uint64 type, features such as the click rate and the like rate are generally of the float type, and features such as the media source are generally of the string type.
In practical implementation, the server may specifically obtain the second hash value of the feature by: the server maps the feature name character string to a 64-bit hash space, and the obtained 64-bit hash value is the second hash value of the feature.
In practical implementation, the server may specifically obtain the feature value of each feature from its first and second hash values as follows: the server takes the lower 16 bits of the second hash value to indicate the feature type and the lower 48 bits of the first hash value to indicate the feature index (i.e., the offset of the feature within its feature class), and combines them into a 64-bit feature value, in which the first 16 bits represent the feature type and the last 48 bits represent the feature index. Compared with continuous features, hashing the features in this way can reduce collisions among features and increase their distinguishability; an example of the resulting feature value is shown in fig. 7.
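The packing of a feature type and a feature index into one 64-bit feature value can be sketched as follows. The concrete 64-bit string hash is an assumption; an MD5 truncation stands in here for whatever stable hash the embodiment uses.

```python
import hashlib

def hash64(s: str) -> int:
    """Map a string into a 64-bit hash space (MD5 truncated to 8 bytes,
    used here purely for illustration)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")

def feature_value(feature_name: str, original_value) -> int:
    """Pack a feature into one 64-bit value: the lower 16 bits of the
    feature-name hash become the feature type (the first 16 bits of the
    result), and the lower 48 bits of the original-value hash become
    the feature index (the last 48 bits of the result)."""
    ftype = hash64(feature_name) & 0xFFFF
    findex = hash64(str(original_value)) & 0xFFFFFFFFFFFF
    return (ftype << 48) | findex

v = feature_value("click_rate", 0.37)
assert 0 <= v < 1 << 64
```

Two features with the same original value but different names still land in different feature classes, because their type bits come from different name hashes.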
Step 303: obtain the score of each media file based on the plurality of feature values of the media file and the mapping relation between feature values and media file scores.
In some embodiments, the server may obtain the score of each media file through a pre-trained machine learning model. In practical application, the machine learning model may be selected according to actual needs, for example: a Logistic Regression (LR) model, a Factorization Machine (FM) model, a Field-aware Factorization Machine (FFM) model, a Deep Factorization Machine (DeepFM) model, or a Wide & Deep model. Taking the LR model as an example, the server inputs the plurality of feature values of each media file into a logistic regression model (classification model) obtained by pre-training to obtain the score of each media file, and the formula adopted may be: y = w0 + w1*x1 + w2*x2 + w3*x3 + ... + wn*xn, where xn is the nth feature value of the media file, wn is the weight corresponding to xn, and y is the score of the media file, with y ∈ [0,1]. In practical application, after obtaining the n feature values corresponding to the n features of a media file, the server inputs them into the trained LR model and obtains a score between 0 and 1 for the media file.
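LR scoring can be sketched as follows. The weights, bias, and feature values below are hypothetical, and the sigmoid squashing of the weighted sum into [0, 1] is the standard logistic regression choice rather than something the embodiment spells out.

```python
import math

def lr_score(weights, bias, features):
    """Score a media file with a logistic regression model: a weighted
    sum of feature values squashed into [0, 1] by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.8, -0.3, 0.5]   # hypothetical learned weights w1..wn
features = [1.0, 0.2, 0.6]   # hypothetical feature values x1..xn
score = lr_score(weights, bias=-0.4, features=features)
assert 0.0 <= score <= 1.0
```

Here z = -0.4 + 0.8 - 0.06 + 0.3 = 0.64, so the score is sigmoid(0.64), a value strictly between 0 and 1 as required for ranking.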
Next, training is described taking the LR model as an example. In practical implementation, the LR model is trained on positive and negative sample data: in one display of media files, the sample data corresponding to the displayed media files that were requested serve as positive sample data, and the sample data corresponding to the displayed media files that were not requested serve as negative sample data.
Here, take n = 63, with 34 attribute features and 29 statistical features extracted per media file, as an example: a positive sample takes the feature values of the 34 attribute features and the 29 statistical features of a media file as input and a score of 1 as output, while a negative sample takes the corresponding feature values as input and a score of 0 as output; the LR model is trained to predict the score of the corresponding media file from the n input feature values.
In practical implementation, the LR model may be trained with the online machine learning FTRL (Follow-the-Regularized-Leader) algorithm, which supports real-time training of large-scale sparse logistic regression models.
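A minimal per-coordinate FTRL-Proximal sketch for such a sparse LR model might look as follows; this is an illustration of the general algorithm, not the embodiment's implementation, and the hyperparameters and toy samples are arbitrary.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for sparse logistic regression
    (a minimal sketch of the McMahan et al. formulation)."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=0.01, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated adjusted gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def _weight(self, i):
        # L1 regularization keeps small coordinates at exactly zero,
        # which is what makes the trained model sparse.
        if abs(self.z[i]) <= self.l1:
            return 0.0
        sign = 1.0 if self.z[i] > 0 else -1.0
        return -(self.z[i] - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, xs):
        """xs: sparse feature vector as (index, value) pairs."""
        s = sum(self._weight(i) * v for i, v in xs)
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, xs, label):
        p = self.predict(xs)
        for i, v in xs:
            g = (p - label) * v  # gradient of the log-loss
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g

# Toy online training: feature 0 co-occurs with clicks, feature 1 with skips.
model = FTRLProximal(dim=2)
for _ in range(200):
    model.update([(0, 1.0)], 1)  # positive sample (requested/clicked)
    model.update([(1, 1.0)], 0)  # negative sample (displayed, not clicked)
```

After training, the model scores the click-associated feature above 0.5 and the skip-associated feature below 0.5, which is the behavior the online updates in the embodiment rely on.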
Step 304: select a preset number of media files from the plurality of media files as target media files based on the scores of the media files.
In some embodiments, the server may select the target media file by:
the server ranks the plurality of media files according to the scores of the media files to obtain a media file sequence, and selects the media files from the first media file in the media file sequence until a preset number of media files are selected as target media files.
For example, suppose the server obtains scores for 10 media files and the preset number is 3. The scores of the 10 media files are all between 0 and 1; the server sorts them in descending order, e.g., 0.9, 0.85, 0.83, 0.8, 0.75, 0.7, 0.66, 0.64, 0.6, and 0.58, and selects the media files corresponding to the scores 0.9, 0.85, and 0.83 as the target media files.
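The sort-and-truncate selection described above can be sketched as follows; the file identifiers are hypothetical.

```python
def select_targets(scored_files, k=3):
    """Sort media files by score in descending order and keep the
    first k as the target media files."""
    ranked = sorted(scored_files, key=lambda f: f[1], reverse=True)
    return [file_id for file_id, _ in ranked[:k]]

scores = [("f1", 0.9), ("f2", 0.85), ("f3", 0.83), ("f4", 0.8),
          ("f5", 0.75), ("f6", 0.7), ("f7", 0.66), ("f8", 0.64),
          ("f9", 0.6), ("f10", 0.58)]
targets = select_targets(scores, k=3)
# targets == ["f1", "f2", "f3"]
```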
In some embodiments, the server may push the media file based on the selected target media file, and the server sends the target media file to the client of the target user for presentation. The pushed target media file is a high-quality media file determined based on the user portrait, so that the demand of the target user is met, and the click rate and the play rate of the media file are improved.
Next, the method for acquiring a target media file according to an embodiment of the present invention is described taking a video file as the media file. Fig. 8 is a schematic flowchart of the method for acquiring a target media file according to the embodiment of the present invention; referring to fig. 8, the method includes:
step 401: and responding to the acquisition instruction of the target video file, and sending an acquisition request of the target video file to the server by the target user client.
Here, in practical applications, the target user client may be any client with a video file pushing (recommending) function, for example, a watching point module in the QQ client, and when the target user clicks a watching point page through the QQ client, the QQ client sends an acquisition request of the target video file (recommended video) to the server to perform video recommendation on the watching point page.
Step 402: the server determines a user representation of the target user based on the received acquisition request.
In actual implementation, the server analyzes the received acquisition request of the target video file to obtain a target user identifier, acquires historical behavior data of the target user, such as video watching data, video comment data, video collection data and the like, based on the target user identifier, and then calculates a user portrait of the target user based on the historical behavior data of the target user to determine interest classification of the target user.
Step 403: the server loads video data of a corresponding plurality of videos based on the user representation of the target user.
Here, the user portrait of the target user is a tagged user model abstracted from the target user's historical behavior data; that is, the target user is tagged based on that data, and a tag is a mark capable of representing a characteristic of the target user in a certain dimension. In actual implementation, the video data (video forward-index data) related to a tag in the abstracted user model can therefore be acquired based on that tag, where the video data includes the video title, video description, classification, tag, number of comments, and the like.
Step 404: the server performs feature extraction on each video based on the loaded video data to obtain n feature values of each video.
Here, in actual implementation, for each video file, the server performs feature extraction as follows: the server obtains the original values of the n features, including attribute features and statistical features, of the video; maps the original value of each feature into a 64-bit hash space and takes the lower 48 bits of the obtained 64-bit hash value to indicate the feature index; maps the feature name string of each feature into the 64-bit hash space and takes the lower 16 bits of the obtained 64-bit hash value to indicate the feature type; and then combines the 16-bit value indicating the feature type and the 48-bit value indicating the feature index into the 64-bit feature value of the feature, whose first 16 bits represent the feature type and whose last 48 bits represent the feature index. The server obtains the n feature values of each video in the same way; in practical application, the value of n can be set according to actual requirements.
Specifically, the server may calculate the feature value using the following formula:
Y = ((hash(feature_name) & 0xFFFF) << 48) + (hash(feature_value) & 0xFFFFFFFFFFFF)
Step 405: the server inputs the n feature values of each video into the trained LR model to obtain the score of each video.
Step 406: the server selects a preset number of target videos from the plurality of videos based on the obtained scores and sends the corresponding video data to the target user client.
Here, when selecting video files, the server may select based on the scores: specifically, the video files are sorted by score to obtain a video file sequence, and selection starts from the first video file in the sequence until a preset number of video files have been selected as the target video files.
In practical implementation, the server may send video data (e.g., video cover data) of the selected video to the target user client by streaming.
Step 407: the target user client displays the video data sent by the server.
In practical application, the target user watches videos based on the video data displayed by the client. If the target user clicks a video he or she wants to watch, the client requests the video data of that video from the server and records the corresponding user operation data at the same time. New positive sample data (sample data corresponding to the videos requested by the user in the current video display) and new negative sample data (sample data corresponding to the videos not requested by the user in the current video display) are constructed based on the currently displayed videos, and the trained LR model is then updated with the constructed positive and negative sample data; the LR model with updated parameters is used for subsequent score acquisition of videos.
Here, one video display by the client is described. In practical application, a user's request for (click on) a video is delayed relative to the video display, so a time window is set, and requests (clicks) occurring within one time window (which may be set according to actual conditions, e.g., 15 minutes) are all considered video requests of the corresponding video display. Taking a 15-minute time window as an example: within 15 minutes of the videos being displayed, the sample data corresponding to the displayed videos requested by the user are all positive sample data, and the sample data corresponding to the displayed videos not requested by the user within those 15 minutes are all negative sample data.
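The window-based sample labeling described above can be sketched as follows; the video identifiers and timestamps are hypothetical, and times are in seconds.

```python
def label_samples(displayed, clicks, display_time, window_minutes=15):
    """Split one display batch into positive/negative samples: a video
    clicked within the window after display is positive, the rest of
    the displayed videos are negative."""
    window = window_minutes * 60
    clicked = {video for video, t in clicks
               if 0 <= t - display_time <= window}
    positives = [v for v in displayed if v in clicked]
    negatives = [v for v in displayed if v not in clicked]
    return positives, negatives

displayed = ["v1", "v2", "v3", "v4"]
# v2 clicked 5 minutes after display; v4 clicked 20 minutes after,
# i.e. outside the 15-minute window, so it counts as not requested.
clicks = [("v2", 1000 + 300), ("v4", 1000 + 20 * 60)]
pos, neg = label_samples(displayed, clicks, display_time=1000)
```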
Next, the method for acquiring a target media file according to an embodiment of the present invention will be described by taking the media file as a video file, the client as a QQ client, and the video recommendation engine (viewpoint) provided in the QQ client as an example. Fig. 9 is a schematic diagram of an architecture for acquiring a target media file according to an embodiment of the present invention. Referring to fig. 9, the method mainly includes an offline portion and an online portion. The offline portion mainly computes user portraits from users' historical behavior data, covering portraits of different dimensions such as tags and channels; the online portion mainly covers recall of candidate videos, ranking and scoring of videos, diversified display of videos, and the like. These parts are explained separately below.
For the offline portion, the user portrait is a long-term accumulation of user interests and is a hierarchical interest model; as shown in fig. 2, from the top layer to the bottom layer there are, in order, a first-level classification, a second-level classification, and tags. Taking the tag "Kobe" as an example, the videos related to "Kobe" in the recall video library serve as candidate videos. In actual implementation, when the user starts the recommendation service, the server may perform user portrait calculation for the user based on the user identifier carried in the request sent by the client, so as to recall the relevant videos.
For the online part, fig. 10 is a schematic diagram of an architecture of video ranking according to an embodiment of the present invention, and referring to fig. 10, the architecture includes a video ranking part and a model training part; next, each part will be explained.
The video ranking part comprises video data loading, feature extraction, and scoring and ranking.
First, loading of video data is explained.
Here, in actual implementation (i.e., when the recommendation service is started), the server loads forward data of the corresponding videos based on the user portrait, where the forward data includes the attribute feature information of a video, such as its first-level classification, second-level classification, third-level classification, duration, tag, source, topic, cover score, quality score, breaking score, time-freshness, whether it is a picture group, whether it is a large picture, and video level. Meanwhile, the server loads the statistical feature data of the corresponding videos (a series of data expressing the performance of a video after exposure), such as the number of clicks, the number of plays, the click rate, the viewing duration, the number of likes, the number of comments, the number of double clicks, the number of collections, and the number of shares. The statistical feature data can be obtained from statistics of click logs, duration logs, behavior logs, and the like reported offline, pushed online in the form of offline statistical files, and loaded together with the forward data, thereby providing data support for subsequent feature extraction.
Next, feature extraction will be explained. The feature extraction comprises three parts: feature engineering, feature indexing, and feature encoding.
The feature engineering mainly extracts attribute features and statistical features of the video from the loaded video data to obtain original values of the features.
In some embodiments, from the perspective of video content, a video has important attributes such as a first-level classification, a second-level classification, a third-level classification, a duration, a tag, a source, a topic, a cover score, a quality score, a breaking score, a time-freshness, whether it is a picture group, whether it is a large picture, and a video level. Single attribute features are extracted for these in a targeted manner, and for the more important single attribute features such as the topic, duration, cover score, quality score, and breaking score, crossings between these attribute features and the basic video categories (the first-level, second-level, and third-level classifications) are added, finally yielding 26 types of single attribute features and 8 types of cross attribute features, for a total of 34 types of video attribute features.
Taking video news as an example, the following attribute features can be extracted: video news first-level classification, second-level classification, third-level classification, duration, source, tag, title tag, whether the video news is a picture group, whether it is a large picture, regional news grade, event grade, time-freshness, title byte count, ID, picture attribute, release duration, delivery time, region, topic, title word count, source quality score, cover score, quality score, and breaking score; together with the crossings of cover with video classification, cover with video duration, quality with video classification, quality with video duration, breaking score with video classification, breaking score with video duration, topic with release duration, and classification with region.
From the perspective of the recommendation service, after exposure a video has a series of data expressing its performance, such as the number of clicks, the number of plays, the click rate, the viewing duration, the number of likes, the number of comments, the number of double clicks, the number of collections, and the number of shares; these serve as statistical features.
Taking video news as an example, the following statistical features can be extracted: video news hour click rate, day click rate, week click rate, popularity, day/week/month play count, day/week/month share count, day/week/month forward count, day/week/month collection count, day/week/month BIU count, source click rate, source click count, reading duration, comment count, comment rate, reading completion count, user like count, like rate, user dislike count, dislike rate, hour click count, channel click count, topic click rate, and media click rate; together with the crossings of click rate with click count, week click rate with click rate, hour-level click rate with click count, channel with hot click count, and topic with hot topic click rate.
When the server performs the feature engineering operation, it acquires, based on the loaded video data, the original values of the 34 types of video attribute features and the original values of the 29 types of posterior statistical features of the videos.
The feature index is the offset of a feature within its feature class. When a feature is indexed, there are generally one or more input values (i.e., original values), and the feature index is calculated from these input values; in practical implementation, the feature index can be obtained by mapping the original values of the feature into a 64-bit hash space.
Taking single features as an example, there are generally three types of input: a uint64 type, a float type, and a string type. Features such as the click count and like count are generally of the uint64 type, and the feature index is then the input value x itself. Features such as the click rate and like rate are generally float features, and the feature index is then x × 10000. Features such as the distribution medium are generally string features, and the feature index is then hash(x), a value computed by hashing the string.
Taking cross features as an example, a plurality of parameters are input. For the cross feature of channel and click count, the two single features correspond to two uint64 values, x1 and x2 respectively, and the embodiment of the present invention joins them by prime multiplication, specifically in the manner of x1 × 13131 + x2. This method can be extended: features whose input values are in any format can obtain their respective index values according to the single-feature index calculation described above and then be joined by prime multiplication. Similarly, the method can be extended from 2 input features to any number of input features.
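The single-feature and cross-feature index calculations above can be sketched as follows. This is an illustrative Python sketch only: the multiplier 13131 is taken from the text, while the hash function, function names, and type dispatch are assumptions (the original system's hash is not specified).

```python
import hashlib

PRIME = 13131  # multiplier named in the text; exact constant is an assumption

def string_hash(s):
    # Stable 64-bit hash of a string (stand-in for the system's actual hash).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def single_index(value):
    """Map one raw feature value to an index, by input type."""
    if isinstance(value, int):        # uint64-style features, e.g. click count
        return value
    if isinstance(value, float):      # ratio features, e.g. click rate
        return int(value * 10000)
    return string_hash(value)         # string features, e.g. distribution medium

def cross_index(*values):
    """Join several single-feature indices by repeated prime multiplication."""
    idx = single_index(values[0])
    for v in values[1:]:
        idx = idx * PRIME + single_index(v)
    return idx
```

For two uint64 inputs x1 and x2 this reduces to x1 × 13131 + x2, and extending the loop handles any number of inputs, mirroring the extension described in the text.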
The feature encoding is mainly the encoding calculation of feature values. To increase feature distinctiveness while taking online performance into account, the embodiment of the present invention maps each feature value into a 64-bit hash space: the first 16 bits of the 64-bit space represent the feature type, obtained by hashing the feature name string and taking the lower 16 bits; the last 48 bits represent the feature index, obtained by hashing the feature value and taking the lower 48 bits, that is:
Y = ((hash(feature_name) & 0xFFFF) << 48) | (hash(feature_value) & 0xFFFFFFFFFFFF);
Compared with continuous features, hashing the features in this way can reduce conflicts among features and increase their distinguishability.
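The 16-bit/48-bit packing described above can be sketched as follows. This is a hedged illustration: the helper names and the use of MD5 as the 64-bit hash are assumptions, since the original hash function is not specified.

```python
import hashlib

def h64(x):
    # Assumed stand-in 64-bit hash; the original hash function is unspecified.
    return int.from_bytes(hashlib.md5(str(x).encode()).digest()[:8], "big")

def encode_feature(feature_name, feature_index):
    """Pack a feature into 64 bits: high 16 bits = feature type (from the
    feature name), low 48 bits = feature index (from the feature value)."""
    feature_type = h64(feature_name) & 0xFFFF          # lower 16 bits of name hash
    index_bits = h64(feature_index) & 0xFFFFFFFFFFFF   # lower 48 bits of value hash
    return (feature_type << 48) | index_bits
```

Features sharing a name collide only in the type bits, while their index bits keep them apart, which is the distinctiveness property the text claims.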
Next, the scoring and ranking will be explained.
The server inputs the plurality of feature values of a video, obtained by feature extraction, into the trained LR model to obtain the score corresponding to the video. The forward calculation of the LR model is as follows:
y = 1 / (1 + e^-(w0 + w1*x1 + w2*x2 + w3*x3 + … + wn*xn));
where xn is the n-th feature value of the video, wn is the weight corresponding to xn, and y is the score of the video, y ∈ [0,1].
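The LR forward calculation can be sketched as follows. This is a minimal Python illustration of the formula above (the production system accesses its parameters from C++ hash-map containers, as noted below); the dict-based parameter layout is an assumption.

```python
import math

def lr_score(weights, bias, features):
    """Forward pass of the logistic regression ranker.

    weights:  dict feature_code -> weight (w1..wn)
    bias:     w0
    features: dict feature_code -> feature value (x1..xn)
    Returns a score y in [0, 1].
    """
    z = bias + sum(weights.get(f, 0.0) * x for f, x in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Features absent from the weight table contribute nothing, which mirrors a sparse-parameter lookup.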
In practical implementation, the STL unordered_map container can be used to access the model parameters, but its lookup time is high; the Google dense_map container can be used instead, trading space for time and reducing the lookup time by about 2/3.
After obtaining the scores of the videos, the server sorts the videos based on the scores and, based on the sorting result, selects a preset number of videos with high scores for recommendation.
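The sort-and-select step can be sketched as follows; the function name and the (video, score) pair layout are illustrative assumptions.

```python
def select_top(videos_with_scores, n):
    """Sort videos by score (descending) and take the first n as targets."""
    ranked = sorted(videos_with_scores, key=lambda vs: vs[1], reverse=True)
    return [video for video, _ in ranked[:n]]
```

This corresponds to building the scored video sequence and selecting from its first element until the preset number is reached.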
The model training section will be explained next. Fig. 11 is a schematic diagram of an architecture of model training provided in an embodiment of the present invention, and referring to fig. 11, the model training mainly includes three parts, namely log merging, feature extraction, and model training, which are described below.
The log merging mainly aggregates all information of one request from the click log, the display log, and the feature log. Because a click is generally delayed relative to the corresponding display, there is a time window problem; in the embodiment of the present invention, a 15-minute time window is adopted, and a click on a displayed item is considered to occur within 15 minutes of the display. For each displayed article of each request, whether the article was clicked and the corresponding feature data are looked up, and the merged log data is written to Kafka.
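The per-request join of display, click, and feature logs can be sketched as follows. This is an in-memory illustration only; the real pipeline operates on log streams and writes to Kafka, and the record shapes here are assumptions.

```python
WINDOW = 15 * 60  # 15-minute click window, as in the text

def merge_logs(display_log, click_log, feature_log, window=WINDOW):
    """Join display, click, and feature logs per (request, item).

    display_log: list of (request_id, item_id, display_ts)
    click_log:   dict (request_id, item_id) -> click_ts
    feature_log: dict (request_id, item_id) -> feature payload
    Emits one merged record per displayed item, labeled clicked=True
    only when the click fell inside the time window.
    """
    merged = []
    for request_id, item_id, ts in display_log:
        key = (request_id, item_id)
        click_ts = click_log.get(key)
        clicked = click_ts is not None and 0 <= click_ts - ts <= window
        merged.append({
            "request_id": request_id,
            "item_id": item_id,
            "clicked": clicked,
            "features": feature_log.get(key),
        })
    return merged
```

Each merged record carries the click label and feature data needed to build one training sample downstream.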
The feature extraction runs on the Spark framework: the corresponding feature data are extracted from the merged log data, and the positive and negative samples for model training are constructed, where sample data corresponding to clicked videos is positive sample data and sample data corresponding to non-clicked videos is negative sample data. During feature extraction, core indicators such as the number of training samples, the number of test samples, the average sample length, and the positive sample rate are counted at the same time, for monitoring the running state of the model. In practical application, the open-source framework MXNet can be adopted for the model training.
Fig. 12 is a schematic structural diagram of an apparatus for acquiring a target media file according to an embodiment of the present invention, where the apparatus is located on a server side, and referring to fig. 12, the apparatus for acquiring a target media file according to an embodiment of the present invention includes:
a loading unit 121 for loading media data of a plurality of media files;
an extracting unit 122, configured to perform feature extraction on each media file based on the media data, to obtain a plurality of feature values of each media file, where the plurality of feature values include: the characteristic value of the attribute characteristic of the media file and the characteristic value of the statistical characteristic of the media file;
a mapping unit 123, configured to obtain a score of each media file based on a plurality of feature values of each media file and a mapping relationship between the feature values and the scores of the media files;
a selecting unit 124, configured to select a preset number of media files from the multiple media files as target media files based on the scores of the media files.
In some embodiments, the loading unit is specifically configured to obtain historical behavior data of a target user;
determine, based on the historical behavior data, a user representation indicative of an interest classification of the target user;
and load media data of a plurality of media files corresponding to the user representation.
In some embodiments, the extracting unit is specifically configured to perform the following operations on each of the media files:
acquiring original values of at least two characteristics of the media file, wherein the at least two characteristics comprise attribute characteristics and statistical characteristics of the media file;
and respectively obtaining the characteristic value of each characteristic based on the original value of each characteristic and the corresponding characteristic name.
In some embodiments, the extraction unit is specifically configured to hash an original value of each feature to obtain a first hash value of each feature, and hash a feature name string of each feature to obtain a second hash value of each feature;
and respectively obtaining the characteristic value of each characteristic based on the first hash value and the second hash value of each characteristic.
In some embodiments, the mapping unit is specifically configured to input a plurality of feature values of each of the media files into a logistic regression model, so as to obtain a score of each of the media files;
the logistic regression model is obtained by training according to positive sample data and negative sample data;
in the process of displaying the media files once, sample data corresponding to the requested media files in the displayed media files is used as positive sample data, and sample data corresponding to the media files which are not requested in the displayed media files is used as negative sample data.
In some embodiments, the selecting unit is specifically configured to sort the multiple media files according to scores of the media files to obtain a media file sequence;
and starting to select media files from the first media file in the media file sequence until a preset number of media files are selected as target media files.
In some embodiments, the apparatus further comprises:
and the pushing unit is used for sending the target media file to a client of a target user for presentation.
Fig. 13 is a schematic structural diagram of an apparatus for obtaining a target media file according to an embodiment of the present invention, and referring to fig. 13, the apparatus is located at a client side, and the apparatus for obtaining a target media file according to an embodiment of the present invention includes:
a sending unit 131, configured to send an acquisition request of a target media file in response to an acquisition instruction of the target media file;
a receiving unit 132, configured to receive the returned target media file, where the target media file is selected from multiple media files based on scores of the media files, the scores of the media files are calculated based on multiple feature values of the media files and a mapping relationship between the feature values and the scores of the media files, and the multiple feature values include: the characteristic value of the attribute characteristic of the media file and the characteristic value of the statistical characteristic of the media file;
the presentation unit 133 is configured to present the target media file through a user interface.
The embodiment of the invention also provides a device for acquiring the target media file, which comprises:
a memory for storing an executable program;
and the processor is used for realizing the method for acquiring the target media file provided by the embodiment of the invention when executing the executable program stored in the memory.
The embodiment of the invention also provides a storage medium which stores an executable program, and when the executable program is executed by a processor, the method for acquiring the target media file provided by the embodiment of the invention is realized.
Here, it should be noted that: the above description related to the apparatus for obtaining a target media file is similar to the above description of the method, and for the technical details not disclosed in the apparatus for obtaining a target media file according to the embodiment of the present invention, please refer to the description of the embodiment of the method of the present invention.
All or part of the steps of the embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Random Access Memory (RAM), a Read-Only Memory (ROM), a magnetic disk, and an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a RAM, a ROM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.