CN115114509B

Info

Publication number: CN115114509B
Application number: CN202110290204.3A
Authority: CN
Inventors: 黄剑辉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2025-10-03
Anticipated expiration: 2041-03-18
Also published as: CN115114509A

Abstract

The embodiments of the application disclose a content pushing method, a feature vector determining method and a related device, which at least relate to machine learning in artificial intelligence. Target content to be processed is acquired, the target content comprising first information from a first modality and second information from a second modality. A first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information are generated. A first activation vector is determined according to the first undetermined vector, the first activation vector comprising a first weight parameter determined by the first undetermined vector. The second undetermined vector is adjusted through the first activation vector to obtain a second modal vector, the first weight parameter being used to enhance key information in the second undetermined vector and suppress non-key information in the second undetermined vector. A feature vector corresponding to the target content is generated according to the first undetermined vector and the second modal vector, so that the undetermined vectors of different modalities are fused better and the accuracy of the feature vector is improved.

Description

Content pushing method, feature vector determining method and related device

Technical Field

The present application relates to the field of data processing, and in particular, to a content pushing method, a feature vector determining method, and a related device.

Background

Various kinds of content can be transmitted on the Internet, and a vectorized representation of a piece of content, namely a feature vector, can be generated from that content. The feature vector quantitatively expresses information about the corresponding content, so it can be applied in a large number of content-processing scenarios, for example classifying content or pushing content according to its feature vector.

It follows that whether the feature vector is accurate directly affects the subsequent content-processing scenarios. However, much content currently carries related information from multiple dimensions. For example, the related information of a video may come from two dimensions, the video title and the video frames: a text vector is extracted from the video title, an image vector is extracted from the video frames, and the feature vector that expresses the video needs to be constructed from the text vector and the image vector together.

In the above examples, the vectors generated based on different sources (such as text and video frames) belong to multi-modal vectors, and how to reasonably construct a feature vector corresponding to a content through the multi-modal vectors is a technical problem to be solved at present.

Disclosure of Invention

In order to solve the technical problems, the application provides a content pushing method, a feature vector determining method and a related device, which are used for reasonably constructing a feature vector corresponding to content through multi-modal vectors.

The embodiment of the application discloses the following technical scheme:

In one aspect, the present application provides a feature vector determining method, the method comprising:

acquiring target content to be processed, wherein the target content comprises first information from a first modality and second information from a second modality;

generating a first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information;

determining a first activation vector according to the first undetermined vector, wherein the first activation vector comprises a first weight parameter determined by the first undetermined vector;

adjusting the second undetermined vector through the first activation vector to obtain a second modal vector, wherein the first weight parameter is used for enhancing key information in the second undetermined vector and suppressing non-key information in the second undetermined vector; and

generating a feature vector corresponding to the target content according to the first undetermined vector and the second modal vector.

In another aspect, the present application provides a feature vector determining apparatus, which includes an acquisition unit, a first generation unit, a first determination unit, an adjustment unit, and a second generation unit;

the acquisition unit is configured to acquire target content to be processed, wherein the target content comprises first information from a first modality and second information from a second modality;

the first generating unit is configured to generate a first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information;

the first determining unit is configured to determine a first activation vector according to the first undetermined vector, wherein the first activation vector comprises a first weight parameter determined by the first undetermined vector;

the adjusting unit is configured to adjust the second undetermined vector through the first activation vector to obtain a second modal vector, wherein the first weight parameter is used for enhancing key information in the second undetermined vector and suppressing non-key information in the second undetermined vector;

the second generating unit is configured to generate a feature vector corresponding to the target content according to the first undetermined vector and the second modal vector.

In another aspect, the present application provides a content pushing method, including:

determining object features of a target object;

determining, from undetermined content and according to feature vectors of the undetermined content, target content associated with the object features, wherein the target content comprises first information from a first modality and second information from a second modality, the feature vector of the target content is generated according to a first undetermined vector and a second modal vector, the first undetermined vector corresponds to the first information, a second undetermined vector corresponds to the second information, the second modal vector is obtained by adjusting the second undetermined vector through a first activation vector, the first activation vector comprises a first weight parameter determined by the first undetermined vector, and the first weight parameter is used for enhancing key information in the second undetermined vector and suppressing non-key information in the second undetermined vector; and

returning the target content based on the target object.

In another aspect, the present application provides a content pushing apparatus including a determining unit and a returning unit;

The determining unit is configured to determine object features of a target object, and to determine, from undetermined content and according to feature vectors of the undetermined content, target content associated with the object features, wherein the target content comprises first information from a first modality and second information from a second modality, the feature vector of the target content is generated according to a first undetermined vector and a second modal vector, the first undetermined vector corresponds to the first information, a second undetermined vector corresponds to the second information, the second modal vector is obtained by adjusting the second undetermined vector through a first activation vector, the first activation vector comprises a first weight parameter determined by the first undetermined vector, and the first weight parameter is used for enhancing key information in the second undetermined vector and suppressing non-key information in the second undetermined vector;

The return unit is used for returning the target content based on the target object.

In another aspect, the application provides a computer device comprising a processor and a memory:

The memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to perform the method of the above aspect according to instructions in the program code.

In another aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for executing the method described in the above aspect.

In another aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method described in the above aspect.

According to the above technical scheme, object features of a target object are determined, and target content is determined from undetermined content according to the association between the feature vectors of the undetermined content and the object features. The target content for which a feature vector is to be generated has first information from a first modality and second information from a second modality, so its feature vector needs to be generated by integrating the first information and the second information. A first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information are generated, and the feature vector of the target content is generated based on the two vectors. Information from different modalities is expressed in the manner of its own modality, and different expression manners differ in expression characteristics and emphasis. To avoid the adverse effect that dispersed expression emphasis would have on the generated feature vector, the undetermined vectors from the different modalities need forward fusion. A first activation vector may be determined according to the first undetermined vector. The first activation vector includes a first weight parameter determined by the first undetermined vector; because the first weight parameter is determined by the criticality of the information in the first undetermined vector, it reflects the emphasis of that information. The information in the second undetermined vector can therefore be adjusted in a targeted manner based on the emphasis of the first undetermined vector, enhancing key information and suppressing non-key information in the second undetermined vector.
The expression emphasis of the adjusted second modal vector and that of the first undetermined vector tend toward unity, so that the undetermined vectors of different modalities are fused better, the key information related to the subject of the target content is enhanced in the generated feature vector, and the accuracy of the feature vector is improved. The target content corresponding to the feature vector is therefore more likely to meet the needs of the target object, and returning that target content to the target object helps ensure the accuracy of content pushing.
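
As an illustration of the pushing step summarized above, the following minimal sketch ranks undetermined content against an object feature. It rests on assumptions that the text does not state: the object feature is assumed to be a vector in the same space as the content feature vectors, cosine similarity is assumed as the measure of association, and the names `cosine`, `select_target_content`, `candidates` and the sample values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target_content(object_feature, candidates, top_k=1):
    """Rank undetermined content by similarity to the object feature.

    candidates maps a content id to its (already fused) feature vector.
    Returns the ids of the top_k most similar items.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(object_feature, item[1]),
        reverse=True,
    )
    return [content_id for content_id, _ in ranked[:top_k]]


# Hypothetical object feature and feature vectors of undetermined content.
object_feature = [1.0, 0.0, 1.0]
candidates = {
    "video_a": [0.9, 0.1, 0.8],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.5, 0.5, 0.5],
}

pushed = select_target_content(object_feature, candidates, top_k=2)
```

With these sample values, `video_a` and `video_c` are the two items closest to the object feature, so they would be the content returned to the target object.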

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of an application scenario of a feature vector determining method according to an embodiment of the present application;

Fig. 2 is a flowchart of a feature vector determining method according to an embodiment of the present application;

Fig. 3 is a diagram of a multi-modal vector fusion approach;

Fig. 4 is a schematic diagram of an application scenario of a feature vector determining method according to an embodiment of the present application;

Fig. 5 is a flowchart of a content pushing method according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a feature vector determining apparatus according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a content pushing device according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings.

In the related art, the problem of constructing a feature vector for a piece of content from multi-modal vectors is generally handled by simply splicing the multi-modal vectors together (concatenation). Simple splicing, however, produces little interaction between the vectors, so accuracy is poor, and this way of constructing the feature vector of a piece of content from multi-modal vectors is unreasonable.
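
The limitation of simple splicing can be seen in a small sketch. The vectors and the helper name `concat_fusion` below are hypothetical and serve only to illustrate the baseline; they are not taken from the related art itself.

```python
def concat_fusion(text_vec, image_vec):
    """Baseline fusion: splice the two modality vectors end to end."""
    return list(text_vec) + list(image_vec)


text_vec = [0.9, 0.1, 0.4]        # hypothetical text (title) vector
image_vec = [0.2, 0.7, 0.5, 0.3]  # hypothetical image (frame) vector

fused = concat_fusion(text_vec, image_vec)

# Replacing the text vector leaves the image slice of the result untouched:
# under plain concatenation the two modalities never influence each other.
fused_other_text = concat_fusion([0.0, 0.0, 1.0], image_vec)
image_slice_unchanged = fused[3:] == fused_other_text[3:]
```

Because the image part of the spliced vector is identical no matter what the text says, nothing in the text can strengthen or weaken any image dimension, which is the lack of interaction described above.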

Based on the above, the embodiment of the application provides a content pushing method, a feature vector determining method and a related device, which are used for improving the accuracy of feature vectors generated based on multi-modal vectors.

The feature vector determining method provided by the embodiments of the application is implemented based on artificial intelligence. Artificial Intelligence (AI) is the theory, methods, technology and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

The embodiments of the application mainly involve directions such as natural language processing and machine learning. For example, they may involve Image Semantic Understanding (ISU) in Computer Vision (CV), Semantic Understanding in Natural Language Processing (NLP), and Deep Learning in Machine Learning (ML).

The feature vector determining method provided by the application can be applied to a feature vector determining device with data processing capability, and the content pushing method provided by the application can be applied to a content pushing device with data processing capability. The feature vector determining device or the content pushing device may be a terminal device or a server. The terminal device may be, but is not limited to, a smart phone, a desktop computer, a notebook computer, a tablet computer, a smart speaker, a smart watch, and the like. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.

The feature vector determining device or the content pushing device may have the capability to implement computer vision technology. Computer vision is the science of how to make machines "see": cameras and computers are used in place of human eyes to identify, detect and measure targets, and the resulting images are further processed into images that are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.

The feature vector determining device or the content pushing device may have the capability to implement Natural Language Processing (NLP), an important direction in the fields of computer science and artificial intelligence. Natural language processing studies the theories and methods that enable effective communication between people and computers in natural language. It is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like. In the embodiments of the application, text may be processed through techniques such as text preprocessing and semantic understanding in natural language processing.

The feature vector determining device or the content pushing device may have machine learning capability. Machine learning is an interdisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically involve techniques such as artificial neural networks.

The artificial intelligence model used in the feature vector determining method provided by the embodiments of the application mainly involves machine learning. The first undetermined vector and the second undetermined vector are extracted through deep learning, and the second undetermined vector is then adjusted through the first activation vector determined from the first undetermined vector, which improves the accuracy of the generated feature vector without increasing the amount of computation.

In order to facilitate understanding of the technical scheme of the present application, the feature vector determining method provided by the embodiment of the present application is described below in conjunction with an actual application scenario.

Fig. 1 is a schematic diagram of an application scenario of the feature vector determining method provided by an embodiment of the application. In the application scenario shown in Fig. 1, the feature vector determining device is a server 100, which is configured to generate a feature vector corresponding to target content.

The server 100 obtains target content to be processed. The target content includes first information from a first modality and second information from a second modality; that is, the first information and the second information come from different aspects of the target content, and both need to be integrated to generate the feature vector corresponding to the target content. The target content to be processed is content for which a feature vector needs to be generated, and may be video, audio, a user relationship based on a knowledge graph, and the like. In the application scenario shown in Fig. 1, the target content is a video, the first information may be text information from the video title, and the second information may be image information from the video frames.

The server 100 generates a first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information, so that the feature vector of the target content can be generated based on the two vectors. In the scenario shown in Fig. 1, the first undetermined vector is a text undetermined vector and the second undetermined vector is an image undetermined vector.

Information from different modalities is expressed in different manners, and different expression manners differ in expression characteristics and emphasis. Because the first undetermined vector and the second undetermined vector correspond to information from different modalities of the target content, they emphasize different aspects of the target content. For example, the text undetermined vector emphasizes the title of the video, while the image undetermined vector emphasizes the content of the video frames.

To avoid the adverse effect that dispersed expression emphasis would have on the generated feature vector, the undetermined vectors from different modalities need forward fusion. The emphasis of the first undetermined vector can therefore be used to guide the adjustment of the second undetermined vector, so that the expression emphasis of the two vectors tends toward unity.

The server 100 determines a first activation vector according to the first undetermined vector. The first activation vector includes a first weight parameter determined by the first undetermined vector. Because the first weight parameter is determined by the criticality of the information in the first undetermined vector, it reflects the emphasis of that information. When the first activation vector is used to adjust the second undetermined vector, the first weight parameter guides the adjustment, so that the information in the second undetermined vector is adjusted in a targeted manner based on the emphasis of the first undetermined vector: key information in the second undetermined vector is enhanced, non-key information is suppressed, and the second modal vector is obtained. In the application scenario shown in Fig. 1, the first activation vector may be determined from the text undetermined vector, and the image undetermined vector may be adjusted through the first activation vector to obtain the second modal vector.

The expression emphasis of the adjusted second modal vector and that of the first undetermined vector tend toward unity. This reduces the adverse effect that dispersed expression emphasis, caused by different expression manners, has on the feature vector generated for the target content, so that the feature vector, drawing on both expression manners, represents the related information of the target content more accurately.

Therefore, in the process of generating the feature vector corresponding to the target content, better fusion of the undetermined vectors of different modes is realized, key information related to the subject of the target content in the generated feature vector is enhanced, and the accuracy is improved.

The following describes a feature vector determining method provided by the embodiment of the present application by using a server as a feature vector determining device with reference to the accompanying drawings.

Fig. 2 is a flowchart of a feature vector determining method according to an embodiment of the present application. As shown in Fig. 2, the feature vector determining method includes the following steps:

S201, obtaining target content to be processed.

The target content is content that needs to be processed such as sorting, pushing, etc., wherein the content is information or experience that the creator presents to the user, and for example, the content may be video, audio, user relationship based on a knowledge graph, text, etc.

The target content typically has information from different modalities and may include first information from a first modality and second information from a second modality. The first mode and the second mode are different modes, the information of the different modes has different expression modes, and the expression characteristics and the key points of the different expression modes are different.

For example, if the target content is video content, the first information may be text information from the video title, and the second information may be image information from the video frames. The text information centers on the title of the video content, serves to attract users to click and watch, and is generally expressed in characters. The image information centers on the video frames that display the video content, serves to present the video content comprehensively, and is generally expressed as images. The text information and the image information come from different modalities, and their expression characteristics and emphasis differ markedly.

S202, generating a first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information.

The first information and the second information come from different modalities of the target content, and both need to be integrated to generate the feature vector of the target content. A first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information can therefore be generated, and the feature vector corresponding to the target content is generated based on the two vectors.

The manner in which the first undetermined vector and the second undetermined vector are generated is not particularly limited in the present application. For example, the first information may be input into a first feature module to generate the first undetermined vector corresponding to the first information, and the second information may be input into a second feature module to generate the second undetermined vector corresponding to the second information. The first feature module may be the same as or different from the second feature module, and each may be set according to the requirements of the first information and the second information.

Taking video content as the target content as an example, the text information can be input into a trained Bidirectional Encoder Representations from Transformers (BERT) model to generate the text undetermined vector corresponding to the text information, and the image information can be input into a trained residual neural network (ResNet) to generate the image undetermined vector corresponding to the image information.

And S203, determining a first activation vector according to the first quantity to be oriented.

Because the first information and the second information come from different modes, the expression modes of the first information and the second information are different, and the information carried by the first to-be-positioned vector and the second to-be-positioned vector correspondingly reflect target contents through different expression modes.

However, the above manner can make the interaction between the first to-be-positioned vector and the second to-be-positioned vector less, so that the generation of the feature vector corresponding to the target content based on the to-be-positioned vectors of different expression modes can bring adverse effects of the dispersion of the expression key points, and therefore, the to-be-positioned vectors from different modes need to be fused in the forward direction.

The forward fusion refers to that along with the high degree of fusion of the first to-be-determined vector and the second to-be-determined vector, the accuracy of the feature vector corresponding to the generated target content is high, in other words, the high degree of fusion of the first to-be-determined vector and the second to-be-determined vector has a positive effect on the feature vector, so that the information carried in the feature vector is more attached to the key information related to the subject of the target content, namely the information with higher degree of correlation with the central thought expressed by the target content.

Therefore, in order to improve the fusion degree of the first to-be-determined vector and the second to-be-determined vector, the first to-be-determined vector and the second to-be-determined vector can reflect target contents towards the same expression mode, so that adverse effects on dispersion of expression key points caused by subsequently generated feature vectors are reduced. Based on the unified purpose of enabling the feature vector to more accurately embody the key information of the target content, the first to-be-positioned vector and the second to-be-positioned vector can be enabled to embody the target content towards the same expression mode, and the second to-be-positioned feature vector can be adjusted according to the first to-be-positioned vector.

Specifically, a first activation vector is determined according to the first undetermined vector. The first activation vector includes a first weight parameter determined by the first undetermined vector. The first weight parameter is determined by how critical the information carried by the first undetermined vector is, and can reflect the emphasis of the information in the first undetermined vector, in other words, the focus with which the first undetermined vector expresses the subject of the target content in the first modality.

For example, the more critical the information carried by the first undetermined vector is, that is, the better it represents the subject of the target content, the larger the corresponding value of the first weight parameter; similarly, the less critical the information, the smaller the value. The first weight parameter in the first activation vector can therefore be used to guide the adjustment of the second undetermined vector, so that after adjustment the second undetermined vector reflects, in the second modality, the focus of expression of the subject of the target content.

S204, adjusting the second undetermined vector through the first activation vector to obtain a second modal vector.

As described above, the first weight parameter included in the first activation vector reflects the emphasis of the information carried by the first undetermined vector. The information carried by the second undetermined vector can therefore be adjusted in a targeted manner based on that emphasis: the key information in the second undetermined vector is enhanced and the non-key information is suppressed based on the first weight parameter, yielding the second modal vector.

For example, the larger the value of the first weight parameter, the more critical the corresponding information carried by the first undetermined vector, and the better it reflects the subject of the target content. Enhancing the information in the second undetermined vector according to the emphasis of the first undetermined vector, as reflected by the first weight parameter, yields a second modal vector whose mode of expression is closer to that of the first undetermined vector in reflecting the subject of the target content.

As a possible implementation manner, the first activation vector may be obtained through a threshold function (gate function). A threshold function outputs a threshold vector, and the state of each corresponding output neuron is adjusted by the threshold vector: the lower the threshold vector value, the more strongly the neuron is suppressed and the weaker its influence, which suppresses non-key information in the second undetermined vector. Similarly, the higher the threshold vector value, the more the neuron is enhanced and the greater its influence, which enhances key information in the second undetermined vector.

The embodiment of the application does not specifically limit the threshold function. For example, the first activation vector may be obtained through a sigmoid function (a logistic function), which constrains the threshold vector values to the interval [0,1]. The closer a threshold vector value is to 0, the more strongly the neuron is suppressed and the weaker its influence, which suppresses non-key information in the second undetermined vector. The closer the value is to 1, the more important the information characterized by the neuron. It will be appreciated that suppressing only the non-key information of the second undetermined vector also makes its key information stand out, which is equivalent to enhancement. As another example, the first activation vector may also be obtained through a tanh function (a hyperbolic tangent function).
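The gating behaviour described above can be illustrated with a minimal NumPy sketch. The variable names and the numeric values are hypothetical, and the sketch assumes that the scores derived from the first undetermined vector already have the same length as the second undetermined vector; it is not the patented implementation.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps every real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation scores derived from the first undetermined vector
# (assumed to have the same length as the second undetermined vector).
scores = np.array([-4.0, 0.0, 4.0])
second_undetermined = np.array([2.0, 2.0, 2.0])

gate = sigmoid(scores)                      # one threshold value per dimension
second_modal = gate * second_undetermined   # element-wise adjustment

print(np.round(gate, 3))          # values near 0 suppress, values near 1 keep
print(np.round(second_modal, 3))
```

Dimensions whose threshold value is close to 0 are almost removed from the second modal vector, while dimensions whose threshold value is close to 1 pass through nearly unchanged.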

Therefore, although the first undetermined vector and the second undetermined vector have different modes of expression, the second undetermined vector can be adjusted through the first weight parameter included in the first activation vector, so that the focus of expression of the resulting second modal vector and that of the first undetermined vector gradually become unified and better reflect the subject of the target content. Moreover, because the focus of expression of the second modal vector is not completely the same as that of the first undetermined vector, the two can complement each other, so that the feature vector generated according to the second modal vector and the first undetermined vector represents the relevant information of the target content more accurately.

S205, generating a feature vector corresponding to the target content according to the first undetermined vector and the second modal vector.

The feature vector corresponding to the target content is not generated by simply concatenating the first undetermined vector and the second undetermined vector. Instead, the second undetermined vector is adjusted through the first weight parameter included in the first activation vector, which is determined according to the first undetermined vector, so that the second undetermined vector interacts with the first undetermined vector before fusion and the second modal vector is obtained. Through this adjustment, the focus of expression of the second modal vector and that of the first undetermined vector gradually become unified, which reduces the dispersion of the focus of expression in the feature vector. The feature vector generated according to the second modal vector and the first undetermined vector can therefore represent the relevant information of the target content more accurately.

After the feature vector is obtained, the target content can be processed according to it, for example, by determining the content category of the target content. The embodiment of the application does not specifically limit how the target content is classified; for example, the feature vector corresponding to the target content may be input into a pre-trained classification model to obtain the corresponding content category, which supports recommendation, search, and the like of the target content.

For example, in a recommendation scenario, the user features of a user to be recommended can be obtained. If the matching degree between the feature vector of the target content and the user features of the user to be recommended meets a first preset matching condition, the target content is more likely to meet the needs of that user, so the target content is recommended to the user, which improves the user's experience.

For another example, in a search scenario, the semantic features of a question to be answered can be obtained. If the matching degree between the feature vector of the target content and the semantic features of the question meets a second preset matching condition, the target content is more likely to be an answer to the question, so the target content is included in the search result and returned as an answer, which improves the accuracy of the search result.

As can be seen from the above technical solution, the target content for which a feature vector is to be generated has first information from a first modality and second information from a second modality, and its feature vector needs to be generated by combining the two. A first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information are generated, and the feature vector of the target content is generated based on them. Because information from each modality has its own mode of expression, and different modes of expression differ in their characteristics and emphasis, the undetermined vectors from different modalities need to be fused positively to prevent the differing modes of expression from dispersing the focus of the generated feature vector.

The first activation vector may be determined according to the first undetermined vector. The first activation vector includes a first weight parameter determined by the first undetermined vector; the first weight parameter is determined by how critical the information in the first undetermined vector is and can reflect the emphasis of that information. The information included in the second undetermined vector can thus be adjusted in a targeted manner based on the emphasis of the first undetermined vector, enhancing the key information and suppressing the non-key information in the second undetermined vector. The focus of expression of the resulting second modal vector and that of the first undetermined vector gradually become unified, so that the undetermined vectors of different modalities are fused better, the key information related to the subject of the target content is enhanced in the generated feature vector, and accuracy is improved.

As a possible implementation manner, in addition to adjusting the second undetermined vector according to the first undetermined vector, the first undetermined vector can also be adjusted according to the second undetermined vector, so that the two vectors adjust each other.

Specifically, a second activation vector is determined according to the second undetermined vector. The second activation vector includes a second weight parameter determined by the second undetermined vector. The second weight parameter is determined by how critical the information carried by the second undetermined vector is, and can reflect the emphasis of the information in the second undetermined vector, in other words, the focus with which the second undetermined vector expresses the subject of the target content in the second modality.

The information carried by the first undetermined vector can then be adjusted in a targeted manner based on the emphasis of the second undetermined vector: the key information in the first undetermined vector is enhanced and the non-key information is suppressed through the second weight parameter included in the second activation vector, yielding the first modal vector.

Therefore, even though the first undetermined vector and the second undetermined vector have different modes of expression, the second undetermined vector can be adjusted according to the first weight parameter included in the first activation vector, and the first undetermined vector can be adjusted according to the second weight parameter included in the second activation vector. The focus of expression of the resulting first modal vector and second modal vector gradually becomes unified, that is, each reflects the focus of the target subject in the other's mode of expression. The first modal vector and the second modal vector are thus more highly fused and better reflect the subject of the target content, and the feature vector generated from them is more accurate.

As a possible implementation manner, the second activation vector may also be obtained through a threshold function (gate function), so as to enhance the key information and suppress the non-key information in the first undetermined vector.

As a possible implementation manner, the first activation vector and the second activation vector may be obtained through the same threshold function, which makes the feature vector generated based on the first modal vector and the second modal vector more accurate.

For example, consider fusion by tensor multiplication. If the first undetermined vector V₁ has M dimensions and the second undetermined vector V₂ has N dimensions, a three-dimensional tensor W of size M×T×N, that is, W ∈ R^(M×T×N), is introduced. Fusing V₁ and V₂ through W yields a feature vector with T dimensions, which may be expressed as FusionV = V₁ × W × V₂. As its size M×T×N shows, the tensor W contains a very large number of parameters, which greatly increases the amount of computation.
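To make the cost of this tensor-based fusion concrete, the following NumPy sketch evaluates FusionV = V₁ × W × V₂ with `numpy.einsum`. The sizes M, T, and N and the random values are illustrative assumptions only.

```python
import numpy as np

M, T, N = 4, 5, 3                      # illustrative sizes, not from the source
rng = np.random.default_rng(0)
v1 = rng.normal(size=M)                # stand-in for the first undetermined vector
v2 = rng.normal(size=N)                # stand-in for the second undetermined vector
W = rng.normal(size=(M, T, N))         # three-dimensional fusion tensor

# fusion[t] = sum over m and n of v1[m] * W[m, t, n] * v2[n]
fusion = np.einsum("m,mtn,n->t", v1, W, v2)

print(fusion.shape)   # (5,)
print(W.size)         # 60 parameters, i.e. M * T * N
```

Even at these toy sizes the tensor holds M·T·N parameters, and the count grows multiplicatively with each of the three dimensions.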

In view of this, a method that reduces the amount of fusion computation while improving the degree of subsequent fusion is described below.

If the dimension of the first undetermined vector is a first dimension number and the dimension of the second undetermined vector is a second dimension number, a first mapping matrix can be introduced when the first activation vector is determined according to the first undetermined vector. The first mapping matrix maps the first undetermined vector from the first dimension number to the second dimension number, yielding a first activation vector whose dimension equals that of the second undetermined vector.

For example, if the first dimension number is M, the second dimension number is N, and the first mapping matrix is W₁ with W₁ ∈ R^(M×N), the first undetermined vector can be mapped from M dimensions to N dimensions to obtain a first activation vector with N dimensions.

Similarly, a second mapping matrix can be introduced when the second activation vector is determined according to the second undetermined vector. For example, if the first dimension number is M, the second dimension number is N, and the second mapping matrix is W₂ with W₂ ∈ R^(N×M), the second undetermined vector can be mapped from N dimensions to M dimensions to obtain a second activation vector with M dimensions.

It should be noted that the first dimension number and the second dimension number may be equal or unequal, which is not particularly limited in the present application.

Therefore, introducing the first mapping matrix and/or the second mapping matrix allows the dimensions of the first activation vector and/or the second activation vector to be adjusted, so that the second undetermined vector and/or the first undetermined vector can be adjusted more effectively, the degree of subsequent fusion is improved, and a more accurate feature vector is obtained. In addition, the first mapping matrix and the second mapping matrix are two-dimensional matrices. Compared with fusion through a three-dimensional tensor, fusion through two-dimensional matrices reduces the amount of computation, especially when the dimensions of the first undetermined vector and the second undetermined vector differ greatly, which saves computation cost.
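The dimension mapping and the saving in parameters can be sketched as follows. The sizes M, N, and T are assumed purely for illustration, and the sketch shows only the linear mapping step, before any threshold function is applied.

```python
import numpy as np

M, N, T = 768, 2048, 256   # illustrative sizes only (assumptions, not from the source)

rng = np.random.default_rng(0)
x1 = rng.normal(size=M)                 # first undetermined vector (M dimensions)
x2 = rng.normal(size=N)                 # second undetermined vector (N dimensions)
W1 = rng.normal(size=(M, N)) * 0.01     # first mapping matrix, maps M -> N
W2 = rng.normal(size=(N, M)) * 0.01     # second mapping matrix, maps N -> M

a1 = x1 @ W1    # N-dimensional, same length as x2
a2 = x2 @ W2    # M-dimensional, same length as x1

tensor_params = M * T * N               # parameters of a three-dimensional fusion tensor
matrix_params = W1.size + W2.size       # parameters of the two mapping matrices

print(a1.shape, a2.shape)
print(tensor_params, matrix_params, tensor_params // matrix_params)
```

With these assumed sizes the two mapping matrices together need a factor of T/2 fewer parameters than the M×T×N tensor.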

As a possible implementation manner, if the target content is used for content classification, the first feature module, the second feature module, and the first mapping matrix may form a vector generation model. The training of the vector generation model is described below and includes the following steps:

S1, acquiring training samples associated with target content.

The training samples are associated with the target content; for example, if the target content is a video, the training samples are also videos. Like the target content, a training sample has information from different modalities, including information from the first modality and information from the second modality. Each training sample also has a sample label identifying the content category to which it belongs, for example, game video or food video.

S2, determining a first undetermined sample vector and a second undetermined sample vector of the training sample through an initial vector generation model.

The initial vector generation model comprises a first feature module, a second feature module, and a first mapping matrix. The first undetermined sample vector of the training sample can be determined through the first feature module, and the second undetermined sample vector of the training sample can be determined through the second feature module.

S3, determining a first undetermined activation vector according to the first mapping matrix in the initial vector generation model and the first undetermined sample vector, and obtaining a second modal sample vector based on the first undetermined activation vector and the second undetermined sample vector.

The first undetermined sample vector is mapped from its current dimension to the dimension of the second undetermined sample vector through the first mapping matrix to obtain the first undetermined activation vector, whose dimension equals that of the second undetermined sample vector.

The second undetermined sample vector is then adjusted by the first undetermined activation vector to obtain the second modal sample vector; for details, see S204.

S4, taking the first undetermined sample vector and the second modal sample vector as the sample vector of the training sample, and obtaining the corresponding classification result through a classifier.

A sample vector corresponding to the training sample is generated according to the first undetermined sample vector and the second modal sample vector, and the sample vector is input into the classifier to obtain the classification result corresponding to the training sample.

The embodiment of the application does not specifically limit the classifier, which may be a linear classifier (linear regression), a logistic regression classifier (logistic regression), a support vector machine (SVM), or the like.

S5, adjusting the parameters of the first feature module and the first mapping matrix according to the difference between the sample label and the classification result, so as to obtain the vector generation model.

The difference between the sample label and the classification result is determined, and the relevant parameters of the first feature module and the first mapping matrix are adjusted based on this difference so that the classification result output through the initial vector generation model approaches the sample label. When training of the initial vector generation model is completed, that is, when its relevant parameters have been adjusted, the initial vector generation model with the adjusted parameters serves as the vector generation model.
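Steps S1 to S5 can be sketched as a small NumPy training loop. This is a simplified illustration under several assumptions that are not in the source: the outputs of the two feature modules are replaced by fixed random vectors (so only the mapping matrix, an assumed gate bias `b1`, and the classifier are trained), the gate is a sigmoid, the classifier is logistic regression, the labels are synthetic, and plain gradient descent on the cross-entropy loss is used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(p, X1, X2):
    gate = sigmoid(X1 @ p["W1"] + p["b1"])          # S3: first undetermined activation vector
    modal2 = gate * X2                              # S3: second modal sample vector
    feats = np.concatenate([X1, modal2], axis=1)    # S4: sample vector
    prob = sigmoid(feats @ p["w"] + p["c"])         # S4: classification result
    return gate, feats, prob

def bce(prob, y, eps=1e-12):
    """Mean binary cross-entropy between predictions and sample labels."""
    return float(-np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps)))

rng = np.random.default_rng(0)
M, N, n = 4, 3, 200
X1 = rng.normal(size=(n, M))        # stand-in for first feature module outputs (S2)
X2 = rng.normal(size=(n, N))        # stand-in for second feature module outputs (S2)
y = (X2[:, 0] > 0).astype(float)    # synthetic sample labels (S1), hypothetical rule

# W1: first mapping matrix; b1: assumed gate bias; w, c: logistic-regression classifier.
p = {"W1": np.zeros((M, N)), "b1": np.zeros(N), "w": np.zeros(M + N), "c": 0.0}
lr = 0.1
initial_loss = bce(forward(p, X1, X2)[2], y)

for _ in range(300):
    gate, feats, prob = forward(p, X1, X2)
    d_logit = (prob - y) / n                           # gradient of the mean loss w.r.t. logits
    d_feats = np.outer(d_logit, p["w"])                # gradient w.r.t. the sample vectors
    d_pre = d_feats[:, M:] * X2 * gate * (1.0 - gate)  # back through the sigmoid gate
    # S5: adjust parameters according to the label/prediction difference.
    p["w"] -= lr * (feats.T @ d_logit)
    p["c"] -= lr * d_logit.sum()
    p["W1"] -= lr * (X1.T @ d_pre)
    p["b1"] -= lr * d_pre.sum(axis=0)

final_loss = bce(forward(p, X1, X2)[2], y)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In a fuller implementation the feature modules would also be trainable, and an automatic-differentiation framework would normally replace the hand-written gradients.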

As a possible implementation, the vector generation model may include not only the first feature module, the second feature module, and the first mapping matrix, but also the second mapping matrix. The specific training process is as follows:

S11, acquiring training samples associated with target content.

S22, determining a first undetermined sample vector and a second undetermined sample vector of the training sample through an initial vector generation model.

S33, determining a first undetermined activation vector according to the first mapping matrix in the initial vector generation model and the first undetermined sample vector, and obtaining a second modal sample vector based on the first undetermined activation vector and the second undetermined sample vector; and determining a second undetermined activation vector according to the second mapping matrix in the initial vector generation model and the second undetermined sample vector, and obtaining a first modal sample vector based on the second undetermined activation vector and the first undetermined sample vector.

S44, taking the first modal sample vector and the second modal sample vector as the sample vector of the training sample, and obtaining the corresponding classification result through a classifier.

S55, adjusting the parameters of the first feature module, the first mapping matrix, and the second mapping matrix according to the difference between the sample label and the classification result, so as to obtain the vector generation model.

For related details, see S1-S5; they are not repeated here.

Next, the feature vector determining method provided in the embodiment of the present application is described with reference to fig. 4, taking a target video as an example of the target content to be processed. Fig. 4 shows an application scenario of the feature vector determining method provided by the embodiment of the present application.

In the application scenario shown in fig. 4, the vector generation model is obtained through the training in S11-S55 and includes a first feature module, a second feature module, a first mapping matrix, and a second mapping matrix. The first feature module is a BERT model, the second feature module is a ResNet model, the first mapping matrix is ω₁, and the second mapping matrix is ω₂.

The target video comprises text information from its caption, "home-style braised pork recipe, learn it in one go", and image information from its video frames; the image information can be obtained by key-frame extraction, taking one of the key frames.

The text information "home-style braised pork recipe, learn it in one go" is input into the BERT model to obtain a text undetermined vector x₁, and the extracted key frame is input into the ResNet model to obtain an image undetermined vector x₂. The text undetermined vector x₁ has 4 dimensions, and the image undetermined vector x₂ has 3 dimensions.

The text undetermined vector x₁ is mapped from 4 dimensions to 3 dimensions according to the first mapping matrix ω₁ to obtain a first activation vector with 3 dimensions; b₁ is an adjustable parameter used in computing the first activation vector. Similarly, the image undetermined vector x₂ is mapped from 3 dimensions to 4 dimensions according to the second mapping matrix ω₂ to obtain a second activation vector with 4 dimensions; b₂ is an adjustable parameter used in computing the second activation vector.

The obtained first activation vector is multiplied element-wise with the image undetermined vector x₂ to obtain a second modal vector, namely a new expression of the image side. The obtained second activation vector is multiplied element-wise with the text undetermined vector x₁ to obtain a first modal vector, namely a new expression of the text side.

The first modal vector and the second modal vector are concatenated to generate the feature vector corresponding to the target video. The feature vector is input into an SVM model to obtain the classification result of the target video, namely, food video.
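The data flow of this example can be sketched in NumPy as follows. The random vectors stand in for the BERT and ResNet outputs, and the gate form sigmoid(x·ω + b) is an assumption consistent with the sigmoid option mentioned earlier; the source does not give the exact expression, and the final SVM classification step is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x1 = rng.normal(size=4)            # stand-in for the 4-dimensional text undetermined vector
x2 = rng.normal(size=3)            # stand-in for the 3-dimensional image undetermined vector

omega1 = rng.normal(size=(4, 3))   # first mapping matrix, 4 -> 3
b1 = np.zeros(3)                   # adjustable parameter b1 (assumed to act as a bias)
omega2 = rng.normal(size=(3, 4))   # second mapping matrix, 3 -> 4
b2 = np.zeros(4)                   # adjustable parameter b2 (assumed to act as a bias)

a1 = sigmoid(x1 @ omega1 + b1)     # first activation vector, 3 dimensions
a2 = sigmoid(x2 @ omega2 + b2)     # second activation vector, 4 dimensions

second_modal = a1 * x2             # new expression of the image side
first_modal = a2 * x1              # new expression of the text side

feature = np.concatenate([first_modal, second_modal])
print(feature.shape)               # (7,)
```

The concatenated vector has 4 + 3 = 7 dimensions and would then be passed to the classifier.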

Therefore, the first activation vector and the second activation vector strengthen the interaction between the modal features within the model: the text undetermined vector is enhanced and suppressed through the activation vector derived from the image side, and the image undetermined vector is enhanced and suppressed through the activation vector derived from the text side. Each modality is thus interpreted with the help of the other, so that vector feature dimensions irrelevant to the subject of the target video are suppressed and feature information more relevant to that subject is enhanced.

Meanwhile, the first mapping matrix and the second mapping matrix are two-dimensional matrices. Compared with the three-dimensional tensor introduced in tensor multiplication, fusion through two-dimensional matrices reduces the amount of computation, especially when the dimensions of the text undetermined vector and the image undetermined vector differ greatly, which saves computation cost.

In addition to the feature vector determining method provided in the above embodiment, the embodiment of the present application further provides a content pushing method. The content pushing method provided by the embodiment of the present application is described below with reference to fig. 5, taking a server as the content pushing device.

Fig. 5 is a flowchart of a content pushing method according to an embodiment of the present application. As shown in fig. 5, the content pushing method includes the following steps:

S501, determining object characteristics of a target object.

The target object is the object to which target content is to be pushed, and may be a target user, a target keyword, or the like. To push suitable content to the target object, a vectorized expression of the target object, namely its object feature, can be determined; the object feature may be a user feature corresponding to the target user, a word feature corresponding to the target keyword, or the like. The needs of the target object can be made clear through its object feature.

S502, determining target content associated with the object feature from the undetermined content according to the feature vector of the undetermined content.

After the needs of the target object are clarified, content that meets those needs can be selected from the undetermined content as the target content. To improve the accuracy of determining the target content, the feature vector of the undetermined content can be determined, and the target content is determined according to the degree of association between the feature vector of the undetermined content and the object feature. For example, a higher degree of association between the feature vector of a piece of undetermined content and the object feature indicates that the undetermined content meets the needs of the target object, so that undetermined content can be taken as the target content.

The manner of determining the feature vector of the undetermined content may be referred to the feature vector determining method described in the foregoing S201-S205, and will not be described herein.
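The selection in S502 can be sketched as follows. The source does not specify how the degree of association is measured; this sketch assumes cosine similarity with a fixed threshold, and the function names, content identifiers, and vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed association measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target_content(object_feature, candidates, threshold=0.8):
    """Return ids of undetermined content whose association degree reaches
    the threshold, ordered from most to least associated."""
    scored = [(cid, cosine_similarity(object_feature, vec))
              for cid, vec in candidates.items()]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return [cid for cid, _ in sorted(hits, key=lambda pair: pair[1], reverse=True)]

# Hypothetical object feature and feature vectors of undetermined content.
object_feature = np.array([1.0, 0.0, 1.0])
candidates = {
    "video_a": np.array([0.9, 0.1, 1.1]),
    "video_b": np.array([-1.0, 0.5, 0.0]),
    "video_c": np.array([1.0, 1.0, 1.0]),
}

print(select_target_content(object_feature, candidates))  # ['video_a', 'video_c']
```

Other association measures, such as an inner product or a learned matching model, could be substituted without changing the overall flow.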

S503, returning the target content based on the target object.

The target content is more likely to meet the requirement of the target object, so the target content can be returned to the target object, and the pushing accuracy of the target content is ensured.

For example, if the target object is a target user, and the object feature is a user feature corresponding to the target user, the target content may be pushed to the target user, so that the target content better meets the requirement of the target user, and the experience of the target user is improved.

For another example, if the target object is a target keyword and the object feature is a word feature corresponding to the target keyword, a corresponding search result is returned based on the target keyword. The search result includes the target content, so that the target content better matches the target keyword and the accuracy of the search result is improved.

According to the content pushing method provided by the embodiment of the application, the object feature of the target object is determined, and the target content is determined from the undetermined content according to the association between the feature vector of the undetermined content and the object feature. The target content for which a feature vector is to be generated has first information from a first modality and second information from a second modality, and its feature vector is generated by combining the two. A first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information are generated, and the feature vector of the target content is generated based on them. Because information from each modality has its own mode of expression, and different modes of expression differ in their characteristics and emphasis, the undetermined vectors from different modalities need to be fused positively to prevent the differing modes of expression from dispersing the focus of the generated feature vector.

The first activation vector may be determined according to the first undetermined vector. The first activation vector includes a first weight parameter determined by the first undetermined vector; the first weight parameter is determined by how critical the information in the first undetermined vector is and can reflect the emphasis of that information. The information included in the second undetermined vector can thus be adjusted in a targeted manner based on the emphasis of the first undetermined vector, enhancing the key information and suppressing the non-key information in the second undetermined vector. The focus of expression of the resulting second modal vector and that of the first undetermined vector gradually become unified, so that the undetermined vectors of different modalities are fused better, the key information related to the subject of the target content is enhanced in the generated feature vector, and the accuracy of the feature vector is improved. The target content corresponding to the feature vector is therefore more likely to meet the needs of the target object, and returning it to the target object ensures the accuracy of content pushing.

For the feature vector determining method provided by the above embodiment, the embodiment of the present application further provides a feature vector determining apparatus.

Fig. 6 is a schematic diagram of a feature vector determining apparatus according to an embodiment of the present application. As shown in fig. 6, the feature vector determining apparatus 600 includes an acquiring unit 601, a first generating unit 602, a first determining unit 603, an adjusting unit 604, and a second generating unit 605;

the acquiring unit 601 is configured to acquire target content to be processed, where the target content includes first information from a first modality and second information from a second modality;

the first generating unit 602 is configured to generate a first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information;

the first determining unit 603 is configured to determine a first activation vector according to the first undetermined vector, where the first activation vector includes a first weight parameter determined by the first undetermined vector;

the adjusting unit 604 is configured to adjust the second undetermined vector by using the first activation vector to obtain a second modal vector, where the first weight parameter is used to enhance key information in the second undetermined vector and suppress non-key information in the second undetermined vector;

the second generating unit 605 is configured to generate a feature vector corresponding to the target content according to the first undetermined vector and the second modal vector.

In a possible implementation, the first determining unit 603 is further configured to:

determine a second activation vector according to the second undetermined vector, where the second activation vector includes a second weight parameter determined by the second undetermined vector;

the second generating unit 605 is configured to:

adjust the first undetermined vector by using the second activation vector to obtain a first modal vector, where the second weight parameter is used to enhance key information in the first undetermined vector and suppress non-key information in the first undetermined vector; and

generate the feature vector corresponding to the target content according to the first modal vector and the second modal vector.

In a possible implementation, the first undetermined vector has a first number of dimensions, the second undetermined vector has a second number of dimensions, and the first determining unit 603 is configured to:

map, according to a first mapping matrix, the first undetermined vector from the first number of dimensions to the second number of dimensions to obtain the first activation vector having the second number of dimensions; and

map, according to a second mapping matrix, the second undetermined vector from the second number of dimensions to the first number of dimensions to obtain the second activation vector having the first number of dimensions.
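
The dimension mapping and mutual adjustment described above can be illustrated with a minimal sketch. This passage states only that each mapping matrix changes the number of dimensions and that each activation vector carries weight parameters; the sigmoid squashing, the element-wise multiplication used for the adjustment, the concatenation used for fusion, and all names and sizes below are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def activation_vector(mapping, vec):
    """Map `vec` through `mapping` (one row per output dimension) and
    squash each entry into a weight between 0 and 1 (assumed sigmoid)."""
    return [sigmoid(sum(w * x for w, x in zip(row, vec))) for row in mapping]


def adjust(activation, vec):
    """Assumed adjustment: element-wise reweighting. Weights near 1 keep an
    entry (enhance), weights near 0 damp it (suppress)."""
    return [a * x for a, x in zip(activation, vec)]


random.seed(0)
d1, d2 = 4, 3  # first and second number of dimensions (illustrative sizes)
first_undetermined = [random.gauss(0, 1) for _ in range(d1)]
second_undetermined = [random.gauss(0, 1) for _ in range(d2)]

# First mapping matrix: d1 -> d2. Second mapping matrix: d2 -> d1.
first_mapping = [[random.gauss(0, 1) for _ in range(d1)] for _ in range(d2)]
second_mapping = [[random.gauss(0, 1) for _ in range(d2)] for _ in range(d1)]

first_activation = activation_vector(first_mapping, first_undetermined)     # length d2
second_activation = activation_vector(second_mapping, second_undetermined)  # length d1

second_modal = adjust(first_activation, second_undetermined)
first_modal = adjust(second_activation, first_undetermined)

# Assumed fusion: concatenate the two modal vectors into one feature vector.
feature_vector = first_modal + second_modal
```

Mapping each undetermined vector into the other modality's number of dimensions is what allows the resulting activation vector to be applied entry by entry to the other undetermined vector.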

In a possible implementation, a vector generation model includes a first feature module, a second feature module, and a first mapping matrix, and the first generating unit 602 is configured to:

generate, through the first feature module, the first undetermined vector corresponding to the first information, and generate, through the second feature module, the second undetermined vector corresponding to the second information;

the apparatus further includes a training unit configured to:

obtain a training sample associated with the target content, where the training sample includes information from the first modality and information from the second modality, and has a sample label identifying the content category to which the training sample belongs;

determine a first undetermined sample vector and a second undetermined sample vector of the training sample through an initial vector generation model;

determine a first undetermined activation vector according to the first mapping matrix in the initial vector generation model and the first undetermined sample vector, and obtain a second modal sample vector based on the first undetermined activation vector and the second undetermined sample vector;

use the first undetermined sample vector and the second modal sample vector as the sample vectors of the training sample, and obtain a corresponding classification result through a classifier; and

adjust parameters of the first feature module and the first mapping matrix according to the difference between the sample label and the classification result to obtain the vector generation model.
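
The training procedure above can be sketched as a single-sample gradient loop. This is a simplified illustration under stated assumptions, not the model of the application: the feature modules are omitted (the two undetermined sample vectors are treated as fixed inputs), so only the first mapping matrix and a linear classifier are updated; the sigmoid gate, softmax cross-entropy loss, gradient-descent update, learning rate, and all names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(mapping, classifier, v1, v2):
    act = [sigmoid(z) for z in matvec(mapping, v1)]  # first undetermined activation vector
    modal2 = [a * x for a, x in zip(act, v2)]        # second modal sample vector
    sample = v1 + modal2                             # sample vector given to the classifier
    return act, sample, softmax(matvec(classifier, sample))


def train_step(mapping, classifier, v1, v2, label, lr=0.05):
    """One update driven by the gap between the sample label and the
    classification result; returns the loss measured before the update."""
    act, sample, probs = forward(mapping, classifier, v1, v2)
    loss = -math.log(probs[label])
    d_logits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Back-propagate through the classifier to the sample vector.
    d_sample = [sum(classifier[k][j] * d_logits[k] for k in range(len(classifier)))
                for j in range(len(sample))]
    d_modal2 = d_sample[len(v1):]  # part that flows through the gate
    d_pre = [g * x * a * (1.0 - a) for g, x, a in zip(d_modal2, v2, act)]
    for k, row in enumerate(classifier):
        for j in range(len(row)):
            row[j] -= lr * d_logits[k] * sample[j]
    for i, row in enumerate(mapping):
        for j in range(len(row)):
            row[j] -= lr * d_pre[i] * v1[j]
    return loss


random.seed(0)
d1, d2, classes = 4, 3, 3
v1 = [random.gauss(0, 1) for _ in range(d1)]  # first undetermined sample vector
v2 = [random.gauss(0, 1) for _ in range(d2)]  # second undetermined sample vector
mapping = [[random.gauss(0, 0.1) for _ in range(d1)] for _ in range(d2)]
classifier = [[random.gauss(0, 0.1) for _ in range(d1 + d2)] for _ in range(classes)]

losses = [train_step(mapping, classifier, v1, v2, label=1) for _ in range(100)]
```

In a fuller implementation the gradient would also flow into the first feature module, as the passage above describes; that module is left out here to keep the sketch short.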

In a possible implementation, the apparatus further includes a second determining unit configured to:

determine the content category of the target content according to the feature vector.

In a possible implementation, the target content is video content, the video content includes text information from a caption and image information from a video frame, and the first generating unit 602 is configured to:

generate a text undetermined vector corresponding to the text information and an image undetermined vector corresponding to the image information.

An embodiment of the present application provides a feature vector determining apparatus. The target content for which a feature vector is to be generated has first information from a first modality and second information from a second modality, and the feature vector of the target content is generated by combining the first information and the second information. A first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information are generated, and the feature vector of the target content is generated based on the first undetermined vector and the second undetermined vector. Information from different modalities is expressed in the manner of its own modality, and different manners of expression differ in their characteristics and emphasis; the information carried in the first and second undetermined vectors reflects these differences. To prevent the differing manners of expression from scattering the expression emphasis of the generated feature vector, the undetermined vectors from different modalities need to be positively fused. To this end, a first activation vector may be determined according to the first undetermined vector. The first activation vector includes a first weight parameter determined by the first undetermined vector; the first weight parameter is determined by the criticality of the information in the first undetermined vector and reflects the emphasis of that information. When the second undetermined vector is adjusted through the first activation vector, the first weight parameter guides the adjustment, so that the information included in the second undetermined vector is adjusted in a targeted manner based on the emphasis of the first undetermined vector, enhancing key information and suppressing non-key information in the second undetermined vector. The expression emphasis of the resulting second modal vector and that of the first undetermined vector gradually converge, so the feature vector generated according to the second modal vector and the first undetermined vector represents the relevant information of the target content more accurately. Thus, in generating the feature vector corresponding to the target content, undetermined vectors of different modalities are fused better, key information related to the subject of the target content is enhanced in the generated feature vector, and accuracy is improved.

Corresponding to the content pushing method provided in the foregoing embodiments, an embodiment of the present application further provides a content pushing apparatus.

Referring to fig. 7, a schematic diagram of a content pushing apparatus according to an embodiment of the present application is shown. As shown in fig. 7, the content pushing apparatus 700 includes a determining unit 701 and a returning unit 702;

The determining unit 701 is configured to determine an object feature of a target object and to determine, from undetermined content according to the feature vectors of the undetermined content, target content associated with the object feature, where the target content includes first information from a first modality and second information from a second modality; the feature vector of the target content is generated according to a first undetermined vector and a second modal vector; the first undetermined vector corresponds to the first information; a second undetermined vector corresponds to the second information; the second modal vector is obtained by adjusting the second undetermined vector through a first activation vector; the first activation vector includes a first weight parameter determined by the first undetermined vector; and the first weight parameter is used to enhance key information in the second undetermined vector and suppress non-key information in the second undetermined vector;

The returning unit 702 is configured to return the target content based on the target object.

In a possible implementation, if the target object is a target user, the object feature is a user feature corresponding to the target user, and the returning unit 702 is configured to:

push the target content to the target user;

if the target object is a target keyword, the object feature is a word feature corresponding to the target keyword, and the returning unit 702 is configured to:

return corresponding search results based on the target keyword, where the search results include the target content.
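
Selecting target content by its association with an object feature can be sketched as a similarity ranking. This passage does not specify how the association is computed; cosine similarity, the assumption that the object feature (user feature or word feature) lies in the same vector space as the content feature vectors, and every identifier and value below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity, used here as an assumed association measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target_content(object_feature, candidates, top_k=1):
    """Rank undetermined content by association with the object feature
    and return the identifiers of the top_k items."""
    ranked = sorted(candidates,
                    key=lambda item: cosine(object_feature, item[1]),
                    reverse=True)
    return [content_id for content_id, _ in ranked[:top_k]]


# Hypothetical undetermined content: (identifier, feature vector).
candidates = [
    ("video_a", [1.0, 0.0, 0.0]),
    ("video_b", [0.0, 1.0, 0.0]),
    ("video_c", [0.7, 0.7, 0.0]),
]

user_feature = [0.9, 0.1, 0.0]  # object feature of a target user
print(select_target_content(user_feature, candidates, top_k=2))
# prints ['video_a', 'video_c']
```

The same function serves the keyword case if the word feature of the target keyword is passed in place of the user feature; the returned identifiers then form the search results.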

With the content pushing apparatus provided by the embodiment of the present application, the object feature of the target object is determined, and the target content is determined from the undetermined content according to the association between the feature vectors of the undetermined content and the object feature. The target content for which a feature vector is to be generated has first information from a first modality and second information from a second modality, and the feature vector of the target content is generated by combining the first information and the second information. A first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information are generated, and the feature vector of the target content is generated based on the first undetermined vector and the second undetermined vector. Information from different modalities is expressed in the manner of its own modality, and different manners of expression differ in their characteristics and emphasis. To prevent the differing manners of expression from scattering the expression emphasis of the generated feature vector, the undetermined vectors from different modalities need to be positively fused. To this end, a first activation vector may be determined according to the first undetermined vector. The first activation vector includes a first weight parameter determined by the first undetermined vector; because the first weight parameter is determined by the criticality of the information in the first undetermined vector, it reflects the emphasis of that information. The information included in the second undetermined vector can therefore be adjusted in a targeted manner based on the emphasis of the first undetermined vector, enhancing key information and suppressing non-key information in the second undetermined vector. The expression emphasis of the adjusted second modal vector and that of the first undetermined vector gradually converge, so that undetermined vectors of different modalities are fused better, key information related to the subject of the target content is enhanced in the generated feature vector, and the accuracy of the feature vector is improved. As a result, the target content corresponding to the feature vector is more likely to meet the needs of the target object, and returning that target content for the target object ensures the accuracy of content pushing.

The foregoing feature vector determining apparatus and content pushing apparatus may also be implemented as a computer device, which may be a server or a terminal device. The computer device provided by the embodiments of the present application is described below from the perspective of hardware implementation. Fig. 8 is a schematic structural diagram of a server, and fig. 9 is a schematic structural diagram of a terminal device.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1422 (e.g., one or more processors), a memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may provide transitory or persistent storage. The program stored in the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1422 may be configured to communicate with the storage medium 1430 and to execute, on the server 1400, the series of instruction operations in the storage medium 1430.

The server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 8.

Wherein, the CPU 1422 is configured to perform the following steps:

Or the following steps are executed:

Determining object characteristics of a target object;

and returning the target content based on the target object.

Optionally, the CPU 1422 may further perform method steps of any specific implementation of the feature vector determining method or the content pushing method in the embodiment of the present application.

Referring to fig. 9, fig. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present application. Fig. 9 is a block diagram of part of the structure of a smartphone serving as the terminal device. The smartphone includes a radio frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580, and a power supply 1590. Those skilled in the art will appreciate that the structure shown in fig. 9 does not limit the smartphone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

The following describes each component of the smart phone in detail with reference to fig. 9:

The RF circuit 1510 may be used to receive and transmit signals while sending and receiving messages or during a call. Specifically, it receives downlink information from a base station and passes it to the processor 1580 for processing, and it transmits uplink data to the base station. Generally, the RF circuit 1510 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1510 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.

The memory 1520 may be used to store software programs and modules, and the processor 1580 implements the various functional applications and data processing of the smartphone by running the software programs and modules stored in the memory 1520. The memory 1520 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data created through use of the smartphone (such as audio data or a phonebook). In addition, the memory 1520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The input unit 1530 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the smartphone. Specifically, the input unit 1530 may include a touch panel 1531 and other input devices 1532. The touch panel 1531, also referred to as a touch screen, may collect touch operations performed by a user on or near it (for example, operations performed on or near the touch panel 1531 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 1531 may include two parts, a touch detection device and a touch controller. The touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 1580, and can receive and execute commands sent by the processor 1580. The touch panel 1531 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1531, the input unit 1530 may include other input devices 1532, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.

The display unit 1540 may be used to display information input by the user, information provided to the user, and the various menus of the smartphone. The display unit 1540 may include a display panel 1541; optionally, the display panel 1541 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 1531 may cover the display panel 1541. When the touch panel 1531 detects a touch operation on or near it, the operation is transferred to the processor 1580 to determine the type of touch event, and the processor 1580 then provides a corresponding visual output on the display panel 1541 according to the type of touch event. Although in fig. 9 the touch panel 1531 and the display panel 1541 are two separate components that implement the input and output functions of the smartphone, in some embodiments the touch panel 1531 may be integrated with the display panel 1541 to implement the input and output functions of the smartphone.

The smartphone may also include at least one sensor 1550, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 1541 according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel 1541 and/or the backlight when the smartphone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the posture of the smartphone (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors with which the smartphone may also be equipped, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described in detail here.

The audio circuit 1560, a speaker 1561, and a microphone 1562 may provide an audio interface between the user and the smartphone. The audio circuit 1560 may transmit an electrical signal converted from received audio data to the speaker 1561, which converts it into a sound signal for output. Conversely, the microphone 1562 converts a collected sound signal into an electrical signal, which the audio circuit 1560 receives and converts into audio data; the audio data is then output to the processor 1580 for processing and sent, for example, to another smartphone via the RF circuit 1510, or output to the memory 1520 for further processing.

WiFi belongs to a short-distance wireless transmission technology, and a smart phone can help a user to send and receive emails, browse webpages, access streaming media and the like through a WiFi module 1570, so that wireless broadband Internet access is provided for the user. Although fig. 9 shows a WiFi module 1570, it is understood that it does not belong to the essential constitution of a smartphone, and can be omitted entirely as desired within the scope of not changing the essence of the invention.

The processor 1580 is the control center of the smartphone. It connects the various parts of the entire smartphone through various interfaces and lines, and performs the various functions of the smartphone and processes data by running or executing the software programs and/or modules stored in the memory 1520 and invoking the data stored in the memory 1520. Optionally, the processor 1580 may include one or more processing units. Preferably, the processor 1580 may integrate an application processor, which primarily handles the operating system, user interface, and application programs, and a modem processor, which primarily handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor 1580.

The smartphone also includes a power supply 1590 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically connected to the processor 1580 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.

Although not shown, the smart phone may further include a camera, a bluetooth module, etc., which will not be described herein.

In an embodiment of the present application, the memory 1520 included in the smart phone may store program codes and transmit the program codes to the processor.

The processor 1580 included in the smart phone may execute the feature vector determining method or the content pushing method provided in the foregoing embodiment according to the instruction in the program code.

The embodiment of the application also provides a computer readable storage medium for storing a computer program for executing the feature vector determining method or the content pushing method provided in the above embodiment.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the feature vector determination method or the content push method provided in the various alternative implementations of the above aspect.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium may be at least one of the following media capable of storing program code: a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.

It should be noted that the embodiments in this specification are described in a progressive manner: identical and similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiments may be consulted. The apparatus and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

The foregoing is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims

1. A content pushing method, the method comprising:

Determining object characteristics of a target object;

Determining, from undetermined content, target content associated with the object feature according to the degree of association between the feature vector of the undetermined content and the object feature, wherein the target content comprises first information from a first modality and second information from a second modality; the feature vector of the target content is generated according to a first modal vector and a second modal vector; a first undetermined vector corresponds to the first information, and a second undetermined vector corresponds to the second information; the second modal vector is obtained by adjusting the second undetermined vector through a first activation vector; the first activation vector comprises a first weight parameter determined by the criticality of information carried by the first undetermined vector; the first weight parameter is used to adjust, in a targeted manner based on the emphasis of the first undetermined vector, the information included in the second undetermined vector, so as to enhance key information in the second undetermined vector and suppress non-key information in the second undetermined vector; and the first modal vector is obtained by adjusting the first undetermined vector through a second activation vector;

and returning the target content based on the target object.

2. The method of claim 1, wherein if the target object is a target user, the object feature is a user feature corresponding to the target user, and the returning the target content based on the target object comprises:

Pushing the target content to the target user;

if the target object is a target keyword, the object feature is a word feature corresponding to the target keyword, and the returning the target content based on the target object includes:

3. A method of feature vector determination, the method comprising:

Determining a first activation vector according to the first undetermined vector, wherein the first activation vector comprises a first weight parameter determined by the criticality of information carried by the first undetermined vector;

Adjusting the second undetermined vector through the first activation vector to obtain a second modal vector, wherein the first weight parameter is used to adjust, in a targeted manner based on the emphasis of the first undetermined vector, the information included in the second undetermined vector, so as to enhance key information in the second undetermined vector and suppress non-key information in the second undetermined vector;

Determining a second activation vector according to the second undetermined vector, wherein the second activation vector comprises a second weight parameter determined by the criticality of information carried by the second undetermined vector;

Adjusting the first undetermined vector through the second activation vector to obtain a first modal vector, wherein the second weight parameter is used to adjust, in a targeted manner based on the emphasis of the second undetermined vector, the information included in the first undetermined vector, so as to enhance key information in the first undetermined vector and suppress non-key information in the first undetermined vector;

4. The method according to claim 3, wherein the first undetermined vector has a first number of dimensions and the second undetermined vector has a second number of dimensions, and the determining a first activation vector according to the first undetermined vector comprises:

5. The method according to claim 3, wherein the first undetermined vector has a first number of dimensions and the second undetermined vector has a second number of dimensions, and the determining a second activation vector according to the second undetermined vector comprises:

6. The method according to any one of claims 3-5, wherein a vector generation model includes a first feature module, a second feature module, and a first mapping matrix, and the generating a first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information comprises:

The vector generation model is trained by:

7. The method according to any one of claims 3-5, further comprising:

8. The method according to claim 3, wherein the target content is video content, the video content includes text information from a caption and image information from a video frame, and the generating a first undetermined vector corresponding to the first information and a second undetermined vector corresponding to the second information comprises:

9. A content pushing apparatus, comprising a determining unit and a returning unit;

The determining unit is configured to determine an object feature of a target object and to determine, from undetermined content, target content associated with the object feature according to the degree of association between the feature vector of the undetermined content and the object feature, wherein the target content comprises first information from a first modality and second information from a second modality; the feature vector of the target content is generated according to a first modal vector and a second modal vector; a first undetermined vector corresponds to the first information, and a second undetermined vector corresponds to the second information; the second modal vector is obtained by adjusting the second undetermined vector through a first activation vector; the first activation vector comprises a first weight parameter determined by the criticality of information carried by the first undetermined vector; the first weight parameter is used to adjust, in a targeted manner based on the emphasis of the first undetermined vector, the information included in the second undetermined vector, so as to enhance key information in the second undetermined vector and suppress non-key information in the second undetermined vector; and the first modal vector is obtained by adjusting the first undetermined vector through a second activation vector;

10. The apparatus according to claim 9, wherein if the target object is a target user, the object feature is a user feature corresponding to the target user, and the return unit is configured to:

pushing the target content to the target user;

if the target object is a target keyword, the object feature is a word feature corresponding to the target keyword, and the return unit is configured to:

11. A feature vector determining apparatus, characterized by comprising an acquisition unit, a first generation unit, a first determination unit, an adjustment unit and a second generation unit;

The first determining unit is configured to determine a first activation vector according to the first to-be-determined vector, where the first activation vector includes a first weight parameter determined by a criticality of information carried by the first to-be-determined vector;

the adjusting unit is configured to adjust the second to-be-determined vector through the first activation vector to obtain a second modal vector, where the first weight parameter is configured to purposefully adjust information included in the second to-be-determined vector based on an emphasis point of the first to-be-determined vector so as to enhance key information in the second to-be-determined vector and suppress non-key information in the second to-be-determined vector;

The first determining unit is further configured to determine a second activation vector according to the second to-be-determined vector, where the second activation vector includes a second weight parameter determined by a criticality of information carried by the second to-be-determined vector;

The second generating unit is configured to adjust the first to-be-determined vector through the second activation vector to obtain a first modal vector, where the second weight parameter is configured to purposefully adjust information included in the first to-be-determined vector based on an emphasis point of the second to-be-determined vector so as to enhance key information in the first to-be-determined vector and suppress non-key information in the first to-be-determined vector;

The second generating unit is configured to generate a feature vector corresponding to the target content according to the first modal vector and the second modal vector.
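The mutual adjustment recited in claim 11 can be sketched as a pair of cross-modal gates. The claim does not fix the functional form, so the following is only one possible reading: the activation vector is assumed to be a sigmoid of a linear projection of the other modality's vector, the adjustment is assumed to be element-wise multiplication, and the final feature vector is assumed to be a concatenation. All function names and the projection matrices `proj_1_to_2` and `proj_2_to_1` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def activation_vector(source, projection):
    """Weights derived from `source`, one per dimension of the other vector.

    Assumption: the projection maps the source dimension to the dimension of
    the vector being adjusted, so the two dimensions may differ.
    """
    return [sigmoid(z) for z in matvec(projection, source)]


def adjust(target, activation):
    """Scale each component of `target` by its weight (assumed gating form)."""
    return [a * t for a, t in zip(activation, target)]


def feature_vector(first_tbd, second_tbd, proj_1_to_2, proj_2_to_1):
    first_activation = activation_vector(first_tbd, proj_1_to_2)
    second_activation = activation_vector(second_tbd, proj_2_to_1)
    second_modal = adjust(second_tbd, first_activation)   # first gates second
    first_modal = adjust(first_tbd, second_activation)    # second gates first
    return first_modal + second_modal  # assumed fusion: concatenation


# Illustrative values: a 2-dimensional first vector, a 3-dimensional second one.
first_tbd = [1.0, -1.0]
second_tbd = [2.0, 4.0, 6.0]
zero_1_to_2 = [[0.0, 0.0] for _ in range(3)]      # 3 x 2
zero_2_to_1 = [[0.0, 0.0, 0.0] for _ in range(2)]  # 2 x 3
fused = feature_vector(first_tbd, second_tbd, zero_1_to_2, zero_2_to_1)
```

With all-zero projections every weight equals 0.5, so each component is simply halved; with learned projections, weights near 1 would keep a component (key information) and weights near 0 would suppress it (non-key information).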

12. The apparatus of claim 11, wherein the first pending vector has a first dimension and the second pending vector has a second dimension, the first determining unit configured to:

13. The apparatus of claim 11, wherein the first pending vector has a first dimension and the second pending vector has a second dimension, the first determining unit configured to:

14. The apparatus according to any of the claims 11-13, wherein the vector generation model comprises a first feature module, a second feature module and a first mapping matrix, the first generation unit being configured to:

the apparatus further comprises a training unit configured to:

15. The apparatus according to any one of claims 11-13, further comprising a second determining unit for:

16. The apparatus of claim 11, wherein the target content is video content including text information from a caption and image information from a video frame, the first generation unit configured to:

17. A computer device, the device comprising a processor and a memory:

The processor is configured to perform the method of claim 1 or 2, or to perform the method of any of claims 3-8, according to instructions in the program code.

18. A computer readable storage medium for storing a computer program for performing the method of claim 1 or 2 or for performing the method of any of claims 3-8.

19. A computer program product, characterized in that the computer program product comprises computer instructions which, when executed by a processor of a computer device, cause the computer device to perform the method of claim 1 or 2, or the method of any one of claims 3-8.