CN106959971A

Info

Publication number: CN106959971A
Application number: CN201610018733.7A
Authority: CN
Inventors: 周强
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-01-12
Filing date: 2016-01-12
Publication date: 2017-07-18
Anticipated expiration: 2036-01-12
Also published as: CN106959971B; WO2017121272A1

Abstract

The invention discloses a method and a device for processing user behavior data. The method includes: acquiring user behavior data; determining preference scores of the retrieval items contained in the data set corresponding to the user in each dimension; after a search word to be positioned is obtained, querying according to the search word to obtain a plurality of positioning retrieval items that have a corresponding relation with the search word, and obtaining the weight value of the data set in each dimension corresponding to each positioning retrieval item; calculating a behavior weight value determined by the coupling relation between each user and the search word according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of the data set in each dimension corresponding to each positioning retrieval item; and determining the user group positioned by the search word to be positioned according to the behavior weight value determined by the coupling relation between each user and the search word. The invention solves the technical problem that crowd targeting implemented solely through structured data yields positioning results that are not accurate enough.

Description

User behavior data processing method and device

Technical Field

The invention relates to the field of computers, in particular to a method and a device for processing user behavior data.

Background

At present, a user generates a large amount of structured data when using an internet product (for example, when shopping on a web portal), and merchants often use this structured data for crowd targeting in order to analyze users' interests. For example, the tag-based crowd targeting technology of a DMP uses users' basic information and basic behaviors to tag and target people around the world, and then pushes advertisements or applications to the targeted user group.

It should be noted that a user also generates a large amount of unstructured data (e.g., text data) when using an internet product. Compared with structured data, the comments and titles in this text data can reflect the user's interests and preferences at a finer granularity, and the business information mined from text data may be more valuable. Crowd targeting in the related art, which relies solely on structured data, therefore yields positioning results that are not accurate enough.

No effective solution has yet been proposed for the problem that crowd targeting implemented through structured data yields positioning results that are not accurate enough.

Disclosure of Invention

The embodiment of the invention provides a method and a device for processing user behavior data, which are used for at least solving the technical problems that crowd orientation is realized through structured data and the positioning result is not accurate enough.

According to an aspect of the embodiments of the present invention, there is provided a method for processing user behavior data, including: acquiring user behavior data, wherein the user behavior data comprises access data sets generated after a plurality of users access a target object, and the access data sets at least comprise data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set; determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item; after the search terms to be positioned are obtained, a plurality of positioning search terms which have a corresponding relation with the search terms are obtained according to search term query, and the weight value of each positioning search term corresponding to the data set on each dimension is obtained; calculating to obtain a behavior weight value determined by a coupling relation between each user and a search word according to preference scores of retrieval items contained in the data sets in each dimension and a weight value of the data sets in each dimension corresponding to each positioning retrieval item; and determining a user group in which the search word to be positioned is positioned according to the behavior weight value determined by the coupling relation between each user and the search word.

According to another aspect of the embodiments of the present invention, there is also provided a device for processing user behavior data, including: the first acquisition unit is used for acquiring user behavior data, wherein the user behavior data comprise access data sets generated after a plurality of users access a target object, and the access data sets at least comprise data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set; the first determining unit is used for determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item; the second acquisition unit is used for acquiring a plurality of positioning retrieval items which have a corresponding relation with the search terms according to the search term query after acquiring the search terms to be positioned, and acquiring the weight value of each positioning retrieval item corresponding to the data set on each dimension; the third acquisition unit is used for calculating a behavior weight value determined by the coupling relation between each user and the search term according to the preference score of the search term contained in the data set of each user in each dimension and the weight value of the data set of each positioning search term in each dimension; and the second determining unit is used for determining the user group in which the search word to be positioned is positioned according to the behavior weight value determined by the coupling relation between each user and the search word.

In the embodiment of the invention, user behavior data are acquired, wherein the user behavior data comprise access data sets generated after a plurality of users access a target object, and the access data sets at least comprise data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set; determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item; after the search terms to be positioned are obtained, a plurality of positioning search terms which have a corresponding relation with the search terms are obtained according to search term query, and the weight value of each positioning search term corresponding to the data set on each dimension is obtained; calculating to obtain a behavior weight value determined by a coupling relation between each user and a search word according to preference scores of retrieval items contained in the data sets in each dimension and a weight value of the data sets in each dimension corresponding to each positioning retrieval item; the user group positioned by the search word to be positioned is determined according to the behavior weighted value determined by the coupling relation between each user and the search word, and the technical problems that crowd orientation is achieved through structured data and the positioning result is not accurate enough are solved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of a hardware configuration of a computer terminal of a method for processing user behavior data according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for processing user behavior data according to an embodiment of the invention;

FIG. 3 is a schematic diagram of an alternative method of processing user behavior data according to an embodiment of the invention;

FIG. 4 is a schematic diagram of an alternative method of processing user behavior data according to an embodiment of the invention;

FIG. 5 is a schematic structural diagram of a device for processing user behavior data according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an alternative apparatus for processing user behavior data according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an alternative apparatus for processing user behavior data according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an alternative apparatus for processing user behavior data according to an embodiment of the present invention; and

FIG. 9 is a block diagram of a hardware configuration of a computer terminal of a method for processing user behavior data according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The terms of art in this application are explained below:

ETL: an abbreviation of Extract-Transform-Load, describing the process of extracting, transforming, and loading data from a source to a destination. The term is most commonly used in connection with data warehouses, but is not limited to them. ETL is an important link in building a data warehouse: the user extracts the required data from a data source, cleans it, and finally loads it into the data warehouse according to a predefined data warehouse model.

LR: an abbreviation of logistic regression, a commonly used linear classifier.

SVM: Support Vector Machine, a supervised learning model that is commonly used for pattern recognition, classification, and regression analysis.

Lucene: a sub-project of the Apache Software Foundation's Jakarta project group. It is an open-source full-text search engine toolkit; it is not a complete full-text search engine but a full-text search engine architecture that provides a complete query engine, an index engine, and a partial text analysis engine (for two Western languages, English and German).

Example 1

There is also provided, in accordance with an embodiment of the present invention, an embodiment of a method for processing user behavior data, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

The method provided by the first embodiment of the present application may be executed in a computer terminal or a similar computing device. Taking the example of the method running on the computer terminal, fig. 1 is a hardware structure block diagram of the computer terminal of the method for processing user behavior data according to the embodiment of the present invention. As shown in fig. 1, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for processing user behavior data in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, it implements the above method for processing user behavior data. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of such a network may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the internet wirelessly.

Under the operating environment, the application provides a processing method of user behavior data as shown in fig. 2. Fig. 2 is a flowchart of a method for processing user behavior data according to a first embodiment of the present invention, where the method may include:

Step S22, obtaining user behavior data, where the user behavior data includes access data sets generated after a plurality of users access the target object, and the access data sets at least include data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set.

In step S22, the USER may be a visitor to a portal site (e.g., a shopping site), and the target object may be a product ITEM on the portal site, such as a commodity, a video, or music. After the visiting USER clicks on a product ITEM, issues search queries, posts comments, adds pages to favorites, and so on, a large number of access data sets (e.g., text data) are generated, and the website server can obtain the access data sets generated when the USER accesses the target object. Each access data set acquired by the website server can be described using three dimensions: CATEGORY (the classification information above) describes the classification of the product ITEM; PROPERTY describes the product ITEM's own attributes; and KEYWORD describes the name of the product ITEM, where each KEYWORD may carry a weight such as its term frequency or TF-IDF. Among these three dimensions, each product ITEM can have only one CATEGORY but may have multiple PROPERTY attributes.

It should be noted that, in this solution, the USER's raw behavior data may be statistically summarized using a targeted supervised learning algorithm (e.g., LR or SVM), and the USER's behavior on product ITEMs is then decomposed into the three dimensions above. Optionally, the data specification of the product ITEM may be as shown in Table one below, and the data specification of USER behavior may be as shown in Table two below.

Table one:

Column name	Field description
item_id	Article ID
category	Category
keywords	Keywords
description	Description
properties	Properties

Table two:

Column name	Field description
user_id	User ID
item_id	Article ID
bhv_type	Type of behavior
count	Count
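As an illustration only, the two data specifications above could be modeled as simple record types. The column names come from Table one and Table two; the Python types, the list-valued `keywords` and `properties`, and the reading of `count` as the number of times a behavior occurred are assumptions made for this sketch and are not stated in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One row of Table one (product ITEM). Field types are assumed."""
    item_id: str
    category: str  # each ITEM has exactly one CATEGORY
    keywords: List[str] = field(default_factory=list)
    description: str = ""
    properties: List[str] = field(default_factory=list)  # an ITEM may have several PROPERTY values


@dataclass
class Behavior:
    """One row of Table two (USER behavior). Field types are assumed."""
    user_id: str
    item_id: str
    bhv_type: str  # e.g. "click", "collect", "comment"
    count: int     # assumed: how many times the behavior occurred


# Hypothetical sample rows
item = Item("i1", "movie", ["comedy"], "a comedy film", ["video", "high definition"])
bhv = Behavior("u1", "i1", "click", 3)
```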

Take the example of a USER accessing the shopping website TB. The website carries many products, which may be classified into categories such as makeup, mother and baby, food, video, and songs, and the USER can operate on specific products under a category. For example, the USER may click the "sunstar drive movie" index button under the movie category on a TB page; the target object selected and operated on by the USER is then the "sunstar drive movie" product. This product can be expressed in three dimensions (category, attribute, and keyword): its category is movie, its attribute is video, and its keyword is "sunstar drive movie".

Step S24, determining preference scores of the retrieval items contained in the data set corresponding to the user in each dimension, where the data set in each dimension contains at least one retrieval item.

In step S24, each of the three dimensions used to express the product ITEM may include a plurality of retrieval items, which may be a plurality of attributes of that dimension. The user may operate on a specific retrieval item in each dimension, and the scheme may then determine the user's preference score for each retrieval item according to the user's specific operations on it.

Continuing the example of the USER accessing the shopping website TB: among the three dimensions of the target object selected on the TB page, the "sunstar drive movie" product, the CATEGORY of the product is "movie", and the CATEGORY "movie" may include a first retrieval item "domestic movie", a second retrieval item "comedy movie", and so on; the PROPERTY of the product is "video", and the PROPERTY "video" may include a third retrieval item "high definition video" and a fourth retrieval item "standard definition video". It should be noted that the keyword of a product may itself be an attribute of the product. The USER may perform any operation on the first to fourth retrieval items, and the USER's preference score for each of them can be determined from the USER's specific operation behavior on it (for example, the number of operations).

Step S26, after the search word to be positioned is obtained, querying according to the search word to obtain a plurality of positioning retrieval items that have a corresponding relation with the search word, and obtaining the weight value of the data set in each dimension corresponding to each positioning retrieval item.

In step S26, the operator of the website may wish to implement crowd targeting using a search word, that is, to circle the users who are interested in a search word A. In other words, a group of users is located according to the search word so that applications such as data pushing and analysis can then be carried out for the located user group. For example, after the interests and hobbies of different consumer groups are located using a certain word as the search word, advertisement information related to the search word may be pushed to the users located in the same group. In an optional example, the operator of the website may input the search word to be positioned directly to the server, or may provide a text to the server, from which the server obtains the search word to be positioned by word segmentation.

It should be noted that the search word input by the operator may also be described using three dimensions, and each dimension may also include a plurality of positioning retrieval items. Note the distinction: the items in each of the three dimensions describing the search word to be positioned are "positioning retrieval items", whereas the items in each of the three dimensions of a product accessed by a visiting user are "retrieval items". After receiving a search word input by an operator, the scheme can expand, through a query, a plurality of positioning retrieval items TERM corresponding to the search word, and these positioning retrieval items TERM may be contained in the three dimensions describing the search word. The weight value of each dimension corresponding to each positioning retrieval item TERM can be obtained through a preset algorithm. It should be noted that the operator may wish to circle the users who are interested in the search word.

Continuing the example of the USER accessing the shopping website TB: after the website server has collected a large amount of USER behavior data, the operator of the shopping website TB may input a text TXT to the website server. The data processing terminal may perform word segmentation and screening on the text TXT to generate the search word "sunstar drive movie". The three dimensions used to express "sunstar drive movie" are pre-stored in the data processing terminal, with a plurality of positioning retrieval items TERM pre-stored in each dimension. After querying the plurality of positioning retrieval items TERM corresponding to "sunstar drive movie", the data processing terminal may obtain the weight value of each dimension corresponding to each positioning retrieval item TERM through a preset algorithm. It should be noted that the text TXT input by the website operator may be text content describing a product related to the website, and the scheme obtains the search word by performing word segmentation and screening on that text content.

Step S28, calculating a behavior weight value determined by the coupling relation between each user and the search word according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of the data set in each dimension corresponding to each positioning retrieval item.

In step S28, the scheme may calculate the behavior weight value determined by the coupling relation between each user and the search word according to the preference scores of the retrieval items contained in the data set in each dimension obtained in step S24 and the weight value of the data set in each dimension corresponding to each positioning retrieval item obtained in step S26. The behavior weight value may be used to represent each user's degree of interest in the search word to be positioned that was input by the website operator.

It should be noted that, when a user accesses a web portal, a coupling relationship between the user and a search term may be generated through an operation (clicking, browsing, downloading, etc.) on the search term in the web portal, for example, when the user performs a clicking operation on the search term, a first coupling relationship may be generated between a behavior of the user and the search term, and the first coupling relationship may be used to represent a degree of interest of the user in the search term, where the larger the number of times the user clicks, the larger the first coupling relationship is, the larger a weight value of the behavior determined according to the first coupling relationship is, and the larger the degree of interest of the user in the search term is also indicated.

Continuing the example of the USER accessing the shopping website TB: the data processing terminal of the website server may query a plurality of positioning retrieval items corresponding to the search word to be positioned, "sunstar drive movie", input by the website operator, and calculate a first weight value of each positioning retrieval item for the dimension to which it belongs. It then obtains the USER's preference score for each retrieval item of the product "sunstar drive movie" on the TB website, and calculates the USER's behavior weight value for "sunstar drive movie" from the first weight values and the preference scores. The behavior weight value may be used to represent the USER's degree of interest in "sunstar drive movie".
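The text states that the behavior weight value is computed from the preference scores and the first weight values, but does not give the combining rule here. The following is a minimal sketch under the assumption that the two are combined as a sum of products over matching items in each dimension; the function name `behavior_weight`, the nested-dictionary layout, and all sample values are hypothetical.

```python
def behavior_weight(user_prefs, term_weights):
    """Combine a user's preference scores with positioning-item weights.

    user_prefs:   dimension -> retrieval item -> preference score of the user
    term_weights: dimension -> positioning retrieval item -> weight value
    Assumed rule (not from the source): sum of weight * preference over
    items that appear in both structures within the same dimension.
    """
    total = 0.0
    for dim, weights in term_weights.items():
        prefs = user_prefs.get(dim, {})
        for term, weight in weights.items():
            # Items the user never touched contribute nothing.
            total += weight * prefs.get(term, 0.0)
    return total


# Hypothetical values for one user and one search word
user_prefs = {
    "CATEGORY": {"movie": 2.0},
    "PROPERTY": {"video": 1.5},
    "KEYWORD": {"comedy": 3.0},
}
term_weights = {
    "CATEGORY": {"movie": 0.5},
    "KEYWORD": {"comedy": 1.0, "drama": 1.0},
}
score = behavior_weight(user_prefs, term_weights)  # 0.5*2.0 + 1.0*3.0 = 4.0
```

Under this assumed rule, a user with no overlap between visited retrieval items and the positioning retrieval items gets a behavior weight value of 0.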

Step S30, determining the user group positioned by the search word to be positioned according to the behavior weight value determined by the coupling relation between each user and the search word.

In step S30, the scheme may select a plurality of users meeting a predetermined condition according to the magnitude of the behavior weight value determined by the coupling relation between each user and the search word, and then determine those users as the user group related to the search word. Preferably, this embodiment may also determine, as the user group, the users whose behavior weight value determined by the coupling relation is greater than 0. It should be noted that, after the user group of the search word is determined, the operator may push relevant advertisement information to each user in the user group.
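A minimal sketch of the selection in step S30, assuming the behavior weight values have already been computed per user. The "greater than 0" condition comes from the text; the function name `select_user_group` and the optional `top_n` cut-off are illustrative additions standing in for an unspecified "predetermined condition".

```python
def select_user_group(behavior_weights, threshold=0.0, top_n=None):
    """Return user IDs whose behavior weight exceeds `threshold`,
    ordered from most to least interested.

    behavior_weights: user_id -> behavior weight value
    top_n: optional cap on group size (an assumed example of a
           predetermined condition, not taken from the source).
    """
    selected = [(user, w) for user, w in behavior_weights.items() if w > threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    if top_n is not None:
        selected = selected[:top_n]
    return [user for user, _ in selected]


# Hypothetical behavior weight values
weights = {"u1": 4.0, "u2": 0.0, "u3": 1.5}
group = select_user_group(weights)
top_one = select_user_group(weights, top_n=1)
```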

In the solution disclosed in the first embodiment of the present application, if people who are interested in a product want to be located, first, the solution may obtain user behavior data, where the user behavior data includes access data sets generated after a plurality of users access a target object, and the access data sets at least include data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set; then, determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item; then, after a search term to be positioned is obtained, a plurality of positioning search terms which have a corresponding relation with the search term are obtained according to search term query, and a weight value of a data set on each dimension corresponding to each positioning search term is obtained; then, according to preference scores of retrieval items contained in the data sets in each dimension and weight values of the data sets in each dimension corresponding to each positioning retrieval item, behavior weight values determined by the coupling relation between each user and the search terms are obtained through calculation; finally, the user group in which the search word to be positioned is positioned can be determined according to the behavior weight value determined by the coupling relation between each user and the search word. 
It is easy to notice that the scheme can obtain the user behavior data from the website server, generate the preference score of the user for the search item of the product according to the user behavior data, then generate the first weight value of each positioning search item in the search term for the corresponding dimension according to the search term input by the operator, and finally generate the behavior weight value of the user according to the preference score and the first weight value. Therefore, the technical problems that crowd orientation is achieved through structured data and the positioning result is not accurate enough are solved through the scheme of the first embodiment provided by the application.

In an alternative embodiment provided by the present application, in step S24, the step of determining the preference score of the search term included in the data set corresponding to each dimension by the user may include:

Step S241, respectively obtaining at least one first retrieval item included in the keyword set, at least one second retrieval item included in the attribute information set, and at least one third retrieval item included in the classification information set.

Step S242, respectively counting the per-person number of accesses to the retrieval items in the data set in each dimension, and the number of accesses by the user to the retrieval items in the data set in each dimension.

Step S243, calculating the preference scores of the retrieval items contained in the data set corresponding to the user in each dimension according to the per-person number of accesses to the retrieval items in the data set in each dimension and the number of accesses by the user to the retrieval items in the data set in each dimension.

In steps S241 to S243, the scheme may obtain each retrieval item in each of the three dimensions of the product, and then calculate the user's preference score for each retrieval item in each dimension according to the per-person number of accesses to the retrieval item and the number of accesses by the user to the retrieval item. A document (Document) is then formed which, similar to a search engine document, may include three fields (Field): CATEGORY, PROPERTY, and KEYWORD. Each field contains several retrieval items (Term), and the user's preference score for each retrieval item can be described in the document. Because the real-time requirement on the result of crowd positioning (circling people) is generally not high, and the data volume (millions to billions) is far smaller than that of a text search system (billions to billions), the documents do not need to maintain inverted indexes, and the technical implementation is simpler than that of a text search system.
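The per-user document with three fields could be sketched as a nested mapping, built in a single pass over the behavior records and without any inverted index. The field names CATEGORY, PROPERTY, and KEYWORD come from the text; the function name `build_user_documents`, the input layout, and the choice to store accumulated access counts (from which preference scores would be derived afterwards) are assumptions for illustration.

```python
from collections import defaultdict

FIELDS = ("CATEGORY", "PROPERTY", "KEYWORD")


def build_user_documents(items, behaviors):
    """Aggregate behavior counts into one document per user.

    items:     item_id -> {"CATEGORY": [...], "PROPERTY": [...], "KEYWORD": [...]}
    behaviors: iterable of (user_id, item_id, count)
    Returns:   user_id -> field -> retrieval item -> accumulated access count
    """
    docs = defaultdict(lambda: {f: defaultdict(int) for f in FIELDS})
    for user_id, item_id, count in behaviors:
        for f in FIELDS:
            for term in items[item_id][f]:
                # Every access to an item counts toward each of its terms.
                docs[user_id][f][term] += count
    return docs


# Hypothetical catalogue and behavior log
items = {
    "i1": {"CATEGORY": ["movie"], "PROPERTY": ["video", "high definition"], "KEYWORD": ["comedy"]},
    "i2": {"CATEGORY": ["movie"], "PROPERTY": ["video"], "KEYWORD": ["drama"]},
}
behaviors = [("u1", "i1", 3), ("u1", "i2", 1), ("u2", "i2", 2)]
docs = build_user_documents(items, behaviors)
```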

In an optional embodiment provided by the present application, in step S243, calculating the preference scores of the retrieval items contained in the data set corresponding to the user in each dimension, according to the per-person number of accesses to the retrieval items in the data set in each dimension and the number of accesses by the user to those retrieval items, may include: calculating the preference score tf(t, d) of a retrieval item contained in the data set in any one dimension corresponding to the user through the following calculation formula:

preference score tf(t, d), wherein

w_i is the weight value for the access behavior occurring in the data set in the i-th dimension; N_i is the number of accesses counted after the user performs the access behavior on the retrieval item t in the data set in the i-th dimension; and n_i is the per-person number of accesses to the retrieval item t in the data set in the i-th dimension. The retrieval item t is any retrieval item in the data set, and the access behavior comprises any one of the following types: click, collect, and comment.
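As a minimal sketch of how such a preference score could be computed from the quantities defined above, the Python below assumes the score is a weighted sum, over access-behavior types, of the user's access count N_i divided by the per-person access count n_i. That combination is an assumption made for illustration, as is the reading of the index i as ranging over behavior types; the function name `preference_score` and the sample weights are hypothetical.

```python
from typing import Dict


def preference_score(
    user_counts: Dict[str, int],
    per_person_counts: Dict[str, float],
    behavior_weights: Dict[str, float],
) -> float:
    """Assumed form: sum over behaviors i of w_i * N_i / n_i."""
    score = 0.0
    for behavior, w_i in behavior_weights.items():
        big_n = user_counts.get(behavior, 0)          # N_i: this user's access count
        small_n = per_person_counts.get(behavior, 0)  # n_i: per-person access count
        if small_n > 0:                               # skip behaviors nobody performed
            score += w_i * big_n / small_n
    return score


# Hypothetical weights and counts for one retrieval item t.
weights = {"click": 1.0, "collect": 2.0, "comment": 3.0}
user = {"click": 4, "collect": 1}
average = {"click": 2.0, "collect": 0.5, "comment": 0.25}
example_score = preference_score(user, average, weights)
```

Dividing by the per-person count means a user only scores highly on a retrieval item when accessing it more often than the typical user does.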

In an optional embodiment provided by the present application, in step S26, after obtaining the search term to be located, obtaining, according to the search term query, a plurality of location search terms having a corresponding relationship with the search term, and obtaining a weight value of the data set in each dimension corresponding to each location search term may include:

Step S261, obtaining a search term to be positioned, and querying according to the search term to obtain a plurality of positioning search terms having a corresponding relationship with the search term.

Step S262, determining a dimensional relationship of the data set on each dimension corresponding to the search term according to the plurality of positioning search terms obtained by the query.

Step S263, calculating a weight value of the data set in each dimension corresponding to each positioning retrieval item according to the dimension relationship of the data set in each dimension corresponding to the search term.

In the above steps S261 to S263, the present scheme may query according to the to-be-positioned search term input by the operator to obtain a plurality of positioning search terms corresponding to it. It should be noted that these positioning search terms lie in the three dimensions used to describe the to-be-positioned search term. The scheme may first determine the dimension relationship of the search term with respect to the data set in each dimension, and then calculate, according to that dimension relationship, the weight value of each positioning search term with respect to the data set in each dimension.

In an optional embodiment provided by the present application, in step S262, the dimension relationship of the data set corresponding to each dimension of the search term may be determined by the following calculation formula:

wherein A represents the data set containing any search term among the data sets in the three dimensions, and B represents the data set containing any positioning search term t among the data sets in the three dimensions.

Through the above formula, the scheme can generate the relations from the search word to the three dimensions of the ITEM. When an operator inputs a search word to perform crowd orientation, the scheme generates these relations through query expansion, namely WORD-CATEGORY, WORD-PROPERTY and WORD-KEYWORD. The scheme can use the Jaccard distance algorithm (Jaccard Distance) to measure the co-occurrence relation, on the ITEM, between the search word and the other dimensions.
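The co-occurrence measure can be sketched as follows. The example assumes that A is the set of ITEMs in which the search word occurs and B is the set of ITEMs carrying a given positioning term, and that the relation is taken as the Jaccard similarity |A ∩ B| / |A ∪ B|, with the Jaccard distance being its complement. Which of the two the scheme uses as the relation value is an assumption here, and the function names and item identifiers are hypothetical.

```python
from typing import Set


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Complement of the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# ITEMs mentioning the search word vs. ITEMs carrying the candidate term.
items_with_word = {"item1", "item2", "item3"}
items_with_term = {"item2", "item3", "item4"}
relation = jaccard_similarity(items_with_word, items_with_term)
```

A term that appears on almost exactly the same ITEMs as the search word gets a relation close to 1, while a term that rarely co-occurs with it gets a relation close to 0.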

In an optional embodiment provided by the present application, in step S263, the present solution may obtain a weight value of the data set corresponding to each dimension for each positioning search term through the following calculation formula:

wherein r(w, t) is the dimension relationship of the search word with respect to the data set in each dimension, that is, the correlation between the search word w and the retrieval item t, and i(w) is the word frequency of the search word w in the text.

It should be noted that, in the above formula, the weight calculation may simply use weighted summation to finally obtain the label definition after query expansion, and in this scheme each field (domain) in the above document may be assigned a weight value.
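The weighted summation can be illustrated with a short sketch. It assumes that the weight of a positioning term is the sum, over the input search words w, of the word frequency i(w) multiplied by the relation r(w, t); this specific combination is an assumption consistent with the phrase "weighted summation", and `expand_query` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (search word, field, positioning term) -> relation r(w, t)
Relations = Dict[Tuple[str, str, str], float]


def expand_query(
    word_freq: Dict[str, int], relations: Relations
) -> Dict[Tuple[str, str], float]:
    """Return (field, term) -> weight, assumed to be sum_w i(w) * r(w, t)."""
    boosts: Dict[Tuple[str, str], float] = defaultdict(float)
    for (word, field, term), r in relations.items():
        boosts[(field, term)] += word_freq.get(word, 0) * r
    return dict(boosts)


# Hypothetical input: two search words with their frequencies in the text.
word_freq = {"guitar": 2, "strings": 1}
relations = {
    ("guitar", "CATEGORY", "instruments"): 0.5,
    ("strings", "CATEGORY", "instruments"): 0.25,
    ("guitar", "KEYWORD", "acoustic"): 0.25,
}
label_definition = expand_query(word_freq, relations)
```

The resulting mapping plays the role of the expanded query: each positioning term carries one weight within the field it belongs to.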

In an optional embodiment provided by the present application, the step of obtaining the search term to be located in step S261 includes:

Step S2611, after receiving a keyword input by the querying user, determining the input keyword to be the search term to be positioned.

In the step S2611, the querying user may be an operator who wants to locate the crowd, and after the operator inputs the keyword, the scheme may directly determine that the keyword input by the operator is a search term to be located.

Step S2612, after receiving a text input by the querying user, performing word segmentation on the text, wherein at least one keyword obtained through the word segmentation is the search term to be positioned.

In the step S2612, if the operator inputs a text TXT, the scheme may perform word segmentation and screening on the text TXT, and then determine at least one keyword obtained through word segmentation as a search word to be located.

Steps S2611 and S2612 are two parallel alternatives: in this scheme, the operator may input either a keyword or a text.
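The two input paths can be sketched as below. The segmentation shown is a deliberately simple stand-in (lower-casing, splitting on word characters, and dropping a small stop-word list); a real deployment would use a proper word segmenter, particularly for Chinese text. The function name, the `is_text` flag, and the stop-word list are hypothetical.

```python
import re
from typing import List

# Hypothetical screening list; real systems would use a fuller one.
STOP_WORDS = {"a", "an", "the", "for", "and", "of", "with"}


def search_terms_from_input(user_input: str, is_text: bool) -> List[str]:
    """Keyword input is used as-is; text input is segmented and screened."""
    if not is_text:
        return [user_input.strip()]
    tokens = re.findall(r"\w+", user_input.lower())
    seen = set()
    terms = []
    for token in tokens:
        if token in STOP_WORDS or token in seen:
            continue  # screen out stop words and duplicates
        seen.add(token)
        terms.append(token)
    return terms


keyword_terms = search_terms_from_input("Acoustic guitar", is_text=False)
text_terms = search_terms_from_input("Strings for the acoustic guitar", is_text=True)
```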

In an optional embodiment provided by the present application, in step S28, the step of calculating a behavior weight value determined by a coupling relationship between each user and a search term according to a preference score of a retrieval item included in a data set in each dimension and a weight value of the data set in each dimension obtained for each positioning retrieval item includes:

In step S281, the IDF value idf(t) of the positioning search item in the user behavior data is obtained.

In step S282, the highest weight value coord (q, d) of the positioning search term in the plurality of documents is obtained.

In step S283, the search terms queried in the same document are normalized to obtain a normalized search term score queryNorm(q, d).

In step S284, the weight values of the positioning search term in the plurality of documents are normalized to obtain the normalized scores norm (t.field) of the plurality of documents.

In step S285, a behavior weight value Score (q, d) determined by the coupling relationship between each user and the search term is obtained through the following calculation formula.

Score(q, d) = coord(q, d) * queryNorm(q, d) * ∑_{t∈q} [ tf(t, d) * idf²(t) * t.boost * norm(t.field) ]

wherein tf(t, d) is the preference score of the user for the retrieval items contained in the data set in each dimension, t.boost is the weight value of each positioning retrieval item with respect to the data set in each dimension, and f.boost is the weight value of the data set in each dimension.

In an optional embodiment provided by the present application, the IDF value IDF (t) of the positioning search term in the user behavior data may be calculated by the following calculation formula:

In an optional embodiment provided by the present application, the present solution may obtain the highest weight value coord(q, d) of the positioning search term in the plurality of documents by calculating according to the following calculation formula:

In an alternative embodiment provided by the present application, the normalized search term score queryNorm(q, d) can be obtained by calculating according to the following calculation formula:

In an alternative embodiment provided by the present application, the present solution may obtain the normalized scores norm(t.field) of the above documents by calculating according to the following formula:

wherein the field (domain) is a data set in any dimension of the access data set.

It should be noted that, unlike the standard search scoring algorithm, the algorithm used in the present solution ignores the weight d.boost of the Document and the overall weight q.boost of the Query, and each TERM corresponds to only one f.boost, that is, each TERM corresponds to only one field (domain).
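A minimal sketch of this kind of scoring is given below. It assumes the individual factors follow Lucene's classic TF-IDF similarity: idf(t) = 1 + ln(numDocs / (docFreq + 1)), coord as the fraction of query terms matched by the document, and queryNorm as 1 / sqrt of the sum of squared (idf * t.boost) values. It further assumes that norm(t.field) reduces to the field weight f.boost, with d.boost and q.boost omitted as described above. These definitions are assumptions rather than the scheme's own formulas, and all names and sample values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Term = Tuple[str, str]        # (field, term), e.g. ("CATEGORY", "instruments")
Document = Dict[Term, float]  # term -> preference score tf(t, d); one per user
Query = Dict[Term, float]     # term -> t.boost from query expansion


def idf(term: Term, docs: List[Document]) -> float:
    """Assumed Lucene-classic idf: 1 + ln(numDocs / (docFreq + 1))."""
    doc_freq = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))


def score(query: Query, doc: Document, docs: List[Document],
          field_boost: Dict[str, float]) -> float:
    matched = [t for t in query if t in doc]
    if not matched:
        return 0.0
    coord = len(matched) / len(query)  # fraction of query terms matched
    sum_sq = sum((idf(t, docs) * boost) ** 2 for t, boost in query.items())
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq > 0 else 0.0
    total = sum(
        doc[t] * idf(t, docs) ** 2 * query[t] * field_boost.get(t[0], 1.0)
        for t in matched  # norm(t.field) is taken to be f.boost only
    )
    return coord * query_norm * total


users: Dict[str, Document] = {
    "u1": {("CATEGORY", "instruments"): 0.9, ("KEYWORD", "acoustic"): 0.4},
    "u2": {("CATEGORY", "instruments"): 0.2},
    "u3": {("KEYWORD", "novel"): 0.7},
}
query: Query = {("CATEGORY", "instruments"): 1.25, ("KEYWORD", "acoustic"): 0.5}
field_boost = {"CATEGORY": 1.0, "PROPERTY": 1.0, "KEYWORD": 1.0}
all_docs = list(users.values())
scores = {u: score(query, d, all_docs, field_boost) for u, d in users.items()}
```

Because there are far fewer user documents than pages in a text search system, the sketch scans every document directly instead of consulting an inverted index.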

An alternative embodiment of the present application is described below with reference to fig. 3 to 4, and this embodiment may include the following steps:

Step A, a data extraction and abstraction module imports user behavior data into a data warehouse (such as ODPS or Hadoop), performs an ETL process, and outputs offline data that meets the data specification.

In the above step A, the present embodiment needs to abstract two subjects. USER represents the subject of crowd selection; the final crowd is a subset of all USERs, and a USER may have TAG attributes describing demographic characteristics such as gender and age. ITEM represents an object on which a user acts, including but not limited to merchandise, video, and music. Each ITEM is described by three dimensions. CATEGORY represents the classification of the ITEM and is a many-to-one relationship, that is, each ITEM has one and only one CATEGORY. PROPERTY represents the properties of the ITEM and is a many-to-many relationship; for example, music as an ITEM may have properties such as composer, lyricist, singer, and style. KEYWORD represents the description information of the ITEM, and each KEYWORD may carry a weight such as word frequency or TF-IDF. It should be noted that, of the three dimensions, only KEYWORD is mandatory; the others may be absent from the data (CATEGORY or PROPERTY may be null).
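The two abstractions can be sketched as simple records. The field names below are hypothetical; the sketch only mirrors the constraints stated above: KEYWORD is mandatory, an ITEM has at most one CATEGORY, and PROPERTY is many-to-many and may be empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    item_id: str
    keywords: Dict[str, float]      # KEYWORD -> weight (word frequency or TF-IDF)
    category: Optional[str] = None  # a single CATEGORY, possibly missing in the data
    properties: Dict[str, List[str]] = field(default_factory=dict)  # PROPERTY name -> values


@dataclass
class User:
    user_id: str
    tags: Dict[str, str] = field(default_factory=dict)  # TAG attributes, e.g. gender, age


song = Item(
    item_id="song-1",
    keywords={"guitar": 0.6, "ballad": 0.4},
    category="music",
    properties={"singer": ["Singer A"], "style": ["folk"]},
)
keyword_only = Item(item_id="video-7", keywords={"tutorial": 1.0})
listener = User(user_id="u1", tags={"gender": "female", "age": "25-34"})
```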

Step B, the USER document generation module decomposes the behavior of the USER toward the ITEM into preference scores of the USER for the three dimensions of the ITEM, namely USER-CATEGORY, USER-PROPERTY, and USER-KEYWORD. The scheme can adopt a targeted supervised learning algorithm (such as LR or SVM) to statistically summarize the data, and then normalize the result to the range 0 to 1. Aggregating all preferences generates a preference Document for each user which, like a search engine document, includes three fields (Field), with reference to FIG. 4: CATEGORY, PROPERTY, KEYWORD. Each field contains several retrieval items (term) describing the user's preference score for a certain category or a certain word. Because the real-time requirement on crowd-selection results is generally not high, and the data volume (millions to billions) is far smaller than that of a text search system (billions to billions), the documents do not need to maintain inverted indexes, and the technical implementation is simpler than that of a text search system.
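The document generation in step B can be sketched as follows. Instead of a supervised learning algorithm, this example substitutes plain count aggregation followed by max-normalization within each field, purely to show the shape of the resulting three-field document; the function name and the dictionary layout of items and events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_user_document(
    events: List[Tuple[str, int]], items: Dict[str, dict]
) -> Dict[str, Dict[str, float]]:
    """Aggregate (item_id, access_count) events into per-field scores in [0, 1]."""
    raw = {name: defaultdict(float) for name in ("CATEGORY", "PROPERTY", "KEYWORD")}
    for item_id, count in events:
        item = items[item_id]
        if item.get("category"):
            raw["CATEGORY"][item["category"]] += count
        for prop in item.get("properties", []):
            raw["PROPERTY"][prop] += count
        for keyword in item.get("keywords", []):
            raw["KEYWORD"][keyword] += count
    document = {}
    for name, scores in raw.items():
        top = max(scores.values(), default=0.0)
        # Max-normalization: the strongest preference in each field becomes 1.0.
        document[name] = {t: s / top for t, s in scores.items()} if top > 0 else {}
    return document


items = {
    "i1": {"category": "instruments", "properties": ["style:folk"],
           "keywords": ["guitar", "acoustic"]},
    "i2": {"category": "instruments", "keywords": ["guitar"]},
}
doc = build_user_document([("i1", 3), ("i2", 1)], items)
```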

Step C, the keyword correlation calculation module calculates the relations from the search word to the three dimensions of the ITEM, namely WORD-CATEGORY, WORD-PROPERTY and WORD-KEYWORD, and thereby provides a query expansion function when keywords are input to select a crowd.

Step D, the label definition generation module receives the text or keywords provided by the user; a provided text must first undergo word segmentation and screening to obtain keywords. The module then queries and expands the corresponding positioning retrieval items (term). According to the relations from the search words to the three dimensions of the ITEM, the label definition generation module finally generates the weight of each positioning retrieval item in each dimension; the weight calculation can simply use weighted summation. The result is the label definition after query expansion, which is equivalent to the Query in a search system.

Step E, the scoring module generates a user behavior weight value, following the search scoring algorithm of Lucene, according to the weight of each positioning retrieval item in each dimension and the preference of the USER for the three dimensions of the ITEM; the user behavior weight value can be used to represent the degree of interest in the ITEM. It should be noted that the scoring algorithm may also be the BM25 algorithm.

In conclusion, the invention provides a set of universal solution, operators can finish the definition of a specific crowd only by providing keywords, and can provide interpretable crowd definitions, thereby improving the iteration efficiency of products and reducing the development cost, further finishing more accurate crowd orientation and improving the advertising service effect of the operators.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

According to an embodiment of the present invention, there is further provided a device for processing user behavior data, where the device is used to implement the method for processing user behavior data, and as shown in fig. 5, the device may include:

a first obtaining unit 50, configured to obtain user behavior data, where the user behavior data includes access data sets generated after a plurality of users access a target object, and the access data sets include data sets in at least the following three dimensions: a keyword set, an attribute information set and a classification information set.

The USER may be an accessing USER of a portal website (such as a shopping website), the target object may be a product ITEM in the portal website, the product ITEM may be a commodity, a video, music, or the like, a large number of accessing data sets (such as text data) may be generated after the accessing USER clicks, searches, queries, reviews, collects a webpage, or the like on the product ITEM of the portal website, and the website server may obtain the accessing data sets generated by the accessing target object of the USER. It should be noted that each access data set acquired by the website server can be described by using three dimensions: CATEGORY, i.e. the above-mentioned classification information, is used to describe the classification of the product ITEM, attribute PROPERTY is used to describe the own attribute of the product ITEM, KEYWORD is used to describe the name of the product ITEM, and each KEYWORD may have a weight of word frequency or TFIDF. It should be noted that, in the three dimensions used to describe a product ITEM, each product ITEM can only have one CATEGORY, and each product ITEM can have multiple attributes, PROPERTY.

The first determining unit 52 is configured to determine preference scores of the user for search terms included in data sets in each dimension, where the data sets in each dimension include at least one search term.

In three dimensions for expressing the product ITEM, each dimension can comprise a plurality of search ITEMs, the plurality of search ITEMs can be a plurality of attributes of each dimension, a user can operate on specific search ITEMs in each dimension, and then the scheme can determine the preference score of the user for each search ITEM according to the specific operation of the user on each search ITEM.

The second obtaining unit 54 is configured to, after obtaining the search term to be located, obtain, according to the search term query, a plurality of positioning search terms having a corresponding relationship with the search term, and obtain a weight value of the data set in each dimension corresponding to each positioning search term.

An operator of the website may wish to implement crowd targeting by using a search term, that is, to identify the one or more users interested in a search term A and thereby locate a group of users according to the search term, so that applications such as data pushing and analysis can then be performed on the located user group. For example, after the interests and hobbies of different consumer groups are located by using a certain vocabulary item as the search term, advertisement information related to the search term can be pushed to the users located in the same group. In an optional example, the operator of the website can input the search term to be positioned directly to the server, or can provide a text to the server, in which case the server obtains the search term to be positioned from the text through word segmentation and screening.

The third obtaining unit 56 is configured to calculate the behavior weight value determined by the coupling relationship between each user and the search term, according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of each positioning retrieval item with respect to the data set in each dimension.

When a user accesses a portal website, a coupling relationship between the user and a search word can be generated through operations (clicking, browsing, downloading and the like) on the search word in the website, for example, when the user clicks the search word, a first coupling relationship can be generated between the behavior of the user and the search word, the first coupling relationship can be used for representing the degree of interest of the user on the search word, the more times the user clicks, the larger the first coupling relationship is, the larger the behavior weight value determined according to the first coupling relationship is, and the larger the degree of interest of the user on the search word is.

The second determining unit 58 determines the user group in which the search term to be positioned is positioned according to the behavior weight value determined by the coupling relationship between each user and the search term.

According to the scheme, a plurality of users meeting the preset condition can be selected according to the behavior weight value determined by the coupling relation between each user and the search word, and then the plurality of users meeting the preset condition are determined as the user group related to the search word. Preferably, this embodiment may also determine, as the user group, a user whose weighted value determined by the coupling relationship is greater than 0. It should be noted that after determining the user group of the search term, the operator may push relevant advertisement information to each user in the user group.
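The selection step can be sketched as a simple filter over the per-user behavior weight values. The default threshold of 0 follows the preferred condition stated above; the function name, the `threshold` parameter, and the ordering by descending score are illustrative assumptions.

```python
from typing import Dict, List


def select_user_group(scores: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Return the users whose behavior weight value exceeds the threshold, best first."""
    selected = [(user, s) for user, s in scores.items() if s > threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return [user for user, _ in selected]


behavior_weights = {"u1": 0.8, "u2": 0.0, "u3": 0.3}
user_group = select_user_group(behavior_weights)
```

Raising the threshold narrows the group to the users with the strongest coupling to the search term, which an operator might prefer when the audience for a push is limited.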

In the solution disclosed in the second embodiment of the present application, if people interested in a product want to be located, first, the solution may obtain user behavior data, where the user behavior data includes access data sets generated after a plurality of users access a target object, and the access data sets at least include data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set; then, determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item; then, after a search term to be positioned is obtained, a plurality of positioning search terms which have a corresponding relation with the search term are obtained according to search term query, and a weight value of a data set on each dimension corresponding to each positioning search term is obtained; then, according to preference scores of retrieval items contained in the data sets in each dimension and weight values of the data sets in each dimension corresponding to each positioning retrieval item, behavior weight values determined by the coupling relation between each user and the search terms are obtained through calculation; finally, the user group in which the search word to be positioned is positioned can be determined according to the behavior weight value determined by the coupling relation between each user and the search word. 
It is easy to notice that the scheme can obtain the user behavior data from the website server, generate the preference score of the user for the search item of the product according to the user behavior data, then generate the first weight value of each positioning search item in the search term for the corresponding dimension according to the search term input by the operator, and finally generate the behavior weight value of the user according to the preference score and the first weight value. Therefore, the technical problems that crowd orientation is achieved through structured data and the positioning result is not accurate enough are solved through the scheme of the second embodiment provided by the application.

In an alternative embodiment provided by the present application, as shown in fig. 6, the first determining unit 52 includes: a first obtaining module 521, configured to obtain at least one first retrieval item included in the keyword set, at least one second retrieval item included in the attribute information set, and at least one third retrieval item included in the classification information set, respectively; a counting module 523, configured to respectively count the per-person number of accesses to the retrieval items in the data set in each dimension and the number of times the user accesses the retrieval items in the data set in each dimension; and a first calculating module 524, configured to calculate, according to the per-person number of accesses to the retrieval items in the data set in each dimension and the number of times the user accesses the retrieval items in the data set in each dimension, the preference score of the user for the retrieval items included in the data set in each dimension.

In an optional embodiment provided herein, the first calculating module 524 includes: a sub-calculating module 5241, configured to calculate the preference score tf(t, d) of the user for a retrieval item contained in the data set in any one dimension by the following calculation formula for the preference score, wherein w_i is the weight value for the access behavior occurring in the data set in the i-th dimension; N_i is the number of accesses counted after the user performs the access behavior on the retrieval item t in the data set in the i-th dimension; and n_i is the per-person number of accesses to the retrieval item t in the data set in the i-th dimension. The retrieval item t is any retrieval item in the data set, and the access behavior comprises any one of the following types: click, collect, and comment.

In an alternative embodiment provided by the present application, as shown in fig. 7, the second obtaining unit 54 includes: a second obtaining module 541, configured to obtain a search term to be located, and obtain, according to a search term query, a plurality of location search terms having a corresponding relationship with the search term; a first determining module 542, configured to determine, according to the multiple positioning search terms obtained through the query, a dimension relationship of the data set in each dimension corresponding to the search term; the second calculating module 543 is configured to calculate, according to the dimension relationship of the data set in each dimension corresponding to the search term, a weight value of the data set in each dimension corresponding to each positioning retrieval item.

In an optional embodiment provided by the present application, the apparatus further includes: a first calculating unit, configured to determine the dimension relationship of the search term with respect to the data set in each dimension by the following calculation formula, wherein A represents the data set containing any search term among the data sets in the three dimensions, and B represents the data set containing any positioning retrieval item t among the data sets in the three dimensions.

In an optional embodiment provided by the present application, the apparatus further includes: a second calculating unit, configured to calculate the weight value of each positioning retrieval item with respect to the data set in each dimension by the following calculation formula, wherein r(w, t) is the dimension relationship of the search word with respect to the data set in each dimension, that is, the correlation between the search word w and the retrieval item t, and i(w) is the word frequency of the search word w in the text.

In an optional embodiment provided by the present application, the second obtaining module 541 includes: a second determining module 5411, configured to determine, after receiving a keyword input by a querying user, that the input keyword is a search term to be located; or, the first processing module 5412 is configured to perform word segmentation on the text after receiving the text input by the query user, where at least one keyword obtained through the word segmentation is a search term to be located.

In an alternative embodiment provided by the present application, as shown in fig. 8, the second determining unit 58 includes: a third obtaining module 581, configured to obtain the IDF value idf(t) of the positioning retrieval item in the user behavior data; a fourth obtaining module 582, configured to obtain the highest weight value coord(q, d) of the positioning retrieval item in the plurality of documents; a second processing module 583, configured to normalize the search terms queried in the same document to obtain a normalized search term score queryNorm(q, d); a third processing module 584, configured to normalize the weight values of the positioning retrieval item in the plurality of documents to obtain the normalized score norm(t.field) of the plurality of documents; and a third calculating module 585, configured to obtain the behavior weight value Score(q, d) determined by the coupling relationship between each user and the search term through the following calculation formula: Score(q, d) = coord(q, d) * queryNorm(q, d) * ∑_{t∈q} [ tf(t, d) * idf²(t) * t.boost * norm(t.field) ], wherein tf(t, d) is the preference score of the user for the retrieval items contained in the data set in each dimension, t.boost is the weight value of each positioning retrieval item with respect to the data set in each dimension, and f.boost is the weight value of the data set in each dimension.

In an optional embodiment provided by the present application, the apparatus further includes: a third calculating unit, configured to calculate an IDF value IDF (t) of the positioning search term in the user behavior data by using the following calculation formula:

In an optional embodiment provided by the present application, the apparatus further includes: a fourth calculating unit, configured to calculate the highest weight value coord(q, d) of the positioning retrieval item in the plurality of documents by using the following calculation formula:

In an optional embodiment provided by the present application, the apparatus further includes: a fifth calculating unit, configured to calculate the normalized search term score queryNorm(q, d) by using the following calculation formula:

In an optional embodiment provided by the present application, the apparatus further includes: a sixth calculating unit, configured to calculate the normalized score norm(t.field) of the plurality of documents by the following calculation formula, wherein the field (domain) is a data set in any dimension of the access data set.

Example 3

The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group.

Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.

In this embodiment, the computer terminal may execute the program code of the following steps in the method for processing user behavior data: acquiring user behavior data, wherein the user behavior data comprises access data sets generated after a plurality of users access a target object, and the access data sets at least comprise data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set; determining preference scores of the user for the retrieval items contained in the data set in each dimension, wherein the data set in each dimension contains at least one retrieval item; after the search term to be positioned is obtained, querying according to the search term to obtain a plurality of positioning retrieval items which have a corresponding relation with the search term, and obtaining the weight value of each positioning retrieval item with respect to the data set in each dimension; calculating a behavior weight value determined by the coupling relation between each user and the search term according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of each positioning retrieval item with respect to the data set in each dimension; and determining the user group positioned by the search term to be positioned according to the behavior weight value determined by the coupling relation between each user and the search term.

Optionally, fig. 9 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 9, the computer terminal A may include one or more processors (only one is shown) and a memory.

The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and device for processing user behavior data in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implements the above-mentioned method for processing user behavior data. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to the terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring user behavior data, wherein the user behavior data comprises access data sets generated after a plurality of users access a target object, and the access data sets at least comprise data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set; determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item; after the search terms to be positioned are obtained, a plurality of positioning search terms which have a corresponding relation with the search terms are obtained according to search term query, and the weight value of each positioning search term corresponding to the data set on each dimension is obtained; calculating to obtain a behavior weight value determined by a coupling relation between each user and a search word according to preference scores of retrieval items contained in the data sets in each dimension and a weight value of the data sets in each dimension corresponding to each positioning retrieval item; and determining a user group in which the search word to be positioned is positioned according to the behavior weight value determined by the coupling relation between each user and the search word.

Optionally, the processor may further execute the program code of the following steps: respectively acquiring at least one first retrieval item contained in the keyword set, at least one second retrieval item contained in the attribute information set and at least one third retrieval item contained in the classification information set; respectively counting the per-person access count of the retrieval items in the data set in each dimension and the number of times the user accessed the retrieval items in the data set in each dimension; and calculating the preference scores of the retrieval items contained in the data set in each dimension for the user according to the per-person access count of the retrieval items in the data set in each dimension and the number of times the user accessed those retrieval items.
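The counting step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the event layout `(user, dimension, item)`, the function name `count_accesses`, and the reading of "per-person access count" as total accesses divided by the number of distinct accessing users are all assumptions made for the example.

```python
from collections import defaultdict


def count_accesses(events):
    """Count accesses from an iterable of (user, dimension, item) tuples.

    Returns two dicts:
      user_counts[(user, dimension, item)] -> accesses by that user
      per_person[(dimension, item)]        -> average accesses per accessing user
    The averaging rule is an assumption; the source only names the quantity.
    """
    user_counts = defaultdict(int)
    totals = defaultdict(int)
    visitors = defaultdict(set)
    for user, dimension, item in events:
        user_counts[(user, dimension, item)] += 1
        totals[(dimension, item)] += 1
        visitors[(dimension, item)].add(user)
    per_person = {key: totals[key] / len(visitors[key]) for key in totals}
    return dict(user_counts), per_person


# Hypothetical access log covering two of the three dimensions.
events = [
    ("u1", "keyword", "dress"),
    ("u1", "keyword", "dress"),
    ("u2", "keyword", "dress"),
    ("u2", "category", "women"),
]
user_counts, per_person = count_accesses(events)
```

Here `user_counts` plays the role of the user's own access count and `per_person` the role of the per-person access count used in the preference score.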

Optionally, the processor may further execute the program code of the following steps: calculating the preference score tf(t, d) of a retrieval item contained in the data set in any dimension for the user according to the following calculation formula: $tf(t, d) = \sum_{i} w_i \cdot \frac{N_i}{n_i}$, wherein $w_i$ is the weight value of the access behavior occurring in the data set in the i-th dimension; $N_i$ is the number of accesses counted after the user performs the access behavior on retrieval item t in the data set in the i-th dimension; and $n_i$ is the per-person access count of retrieval item t in the data set in the i-th dimension, wherein retrieval item t is any retrieval item in the data set, and the access behavior comprises any one of the following types: click, collect, and comment.
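As a hedged illustration of the preference score, the sketch below evaluates tf(t, d) as the sum over i of w_i * N_i / n_i. The function name `preference_score`, the triple layout, and the choice to skip terms whose per-person count is zero are assumptions for the example and are not stated in the source.

```python
def preference_score(terms):
    """Evaluate tf(t, d) = sum_i w_i * N_i / n_i.

    terms: iterable of (w_i, N_i, n_i) triples, where w_i is the behavior
    weight, N_i the user's access count, and n_i the per-person access count.
    Terms with n_i <= 0 are skipped (an assumption, to avoid division by zero).
    """
    score = 0.0
    for weight, user_count, per_person_count in terms:
        if per_person_count <= 0:
            continue
        score += weight * user_count / per_person_count
    return score


# Hypothetical values: two terms with weights 1.0 and 2.0.
example_score = preference_score([(1.0, 4, 2.0), (2.0, 3, 1.5)])
```

A user who accesses an item more often than the average person gets a ratio above 1 for that term, so the score rises with above-average interest.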

Optionally, the processor may further execute the program code of the following steps: acquiring a search word to be positioned, and querying according to the search word to obtain a plurality of positioning retrieval items corresponding to the search word; determining, according to the plurality of positioning retrieval items obtained through the query, the dimensional relation of the search word with respect to the data set in each dimension; and calculating, according to that dimensional relation, the weight value of each positioning retrieval item with respect to the data set in each dimension.

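Optionally, the processor may further execute the program code of the following steps: determining the dimensional relation of the search word with respect to the data set in each dimension by a calculation formula in which A represents the data set containing any search word among the data sets in the three dimensions, and B represents the data set containing any positioning retrieval item t among the data sets in the three dimensions.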

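Optionally, the processor may further execute the program code of the following steps: calculating the weight value of each positioning retrieval item with respect to the data set in each dimension by the following calculation formula: $t.boost = \frac{\sum_{w \in T} I(w) \cdot r(w, t)}{\sum_{w \in T} I(w)}$, wherein r(w, t) is the dimensional relation of the search word with respect to the data set in each dimension, that is, the correlation between the search word w and the retrieval item t, and I(w) is the word frequency of the search word w in the text.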
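The weight t.boost given in the claims is a frequency-weighted average of r(w, t) over the search words w, namely the sum of I(w) * r(w, t) divided by the sum of I(w). A minimal sketch, assuming the word frequencies and relation values are already available as dictionaries (the function name `term_boost` and the zero default for a missing relation are assumptions):

```python
def term_boost(word_freq, relation):
    """Evaluate t.boost = sum_w I(w) * r(w, t) / sum_w I(w).

    word_freq: {search word w: I(w)}, the word frequency in the text.
    relation:  {search word w: r(w, t)} for one fixed retrieval item t.
    A missing r(w, t) is treated as 0.0 (an assumption for this sketch).
    """
    total_freq = sum(word_freq.values())
    if total_freq == 0:
        return 0.0
    weighted = sum(freq * relation.get(word, 0.0) for word, freq in word_freq.items())
    return weighted / total_freq


# Hypothetical values: "red" appears twice, "dress" once.
boost = term_boost({"red": 2, "dress": 1}, {"red": 0.5, "dress": 0.2})
```

Because it is an average weighted by I(w), frequent search words pull t.boost toward their own correlation with t.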

Optionally, the processor may further execute the program code of the following steps: after receiving a keyword input by a query user, determining the input keyword as a search term to be positioned; or after receiving the text input by the query user, performing word segmentation on the text, wherein at least one keyword obtained by the word segmentation is a search word to be positioned.
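The two input paths (a single keyword, or free text that is segmented into keywords) can be sketched with one function. This is only a stand-in: the regular-expression split below is an assumption that works for space-delimited text, whereas a production system for Chinese text would need a real word segmenter, which the source does not name. The function name `search_words` is hypothetical.

```python
import re
from collections import Counter


def search_words(query_input):
    """Return {search word: frequency} for a keyword or a free-text query.

    A single keyword yields one entry; longer text is split into word tokens.
    The frequencies can serve as I(w), the word frequency in the text.
    """
    tokens = re.findall(r"\w+", query_input.lower())
    return Counter(tokens)


single = search_words("dress")
segmented = search_words("Red red dress")
```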

Optionally, the processor may further execute the program code of the following steps: obtaining the IDF value idf(t) of a positioning retrieval item in the user behavior data; obtaining the highest weight value coord(q, d) of the positioning retrieval item in a plurality of documents; normalizing the search words queried in the same document to obtain the normalized search word score queryNorm(q, d); normalizing the weight values of the positioning retrieval item in the plurality of documents to obtain the normalization score norm(t.field) of the plurality of documents; and obtaining the behavior weight value Score(q, d) determined by the coupling relation between each user and the search word through the following calculation formula: $Score(q, d) = coord(q, d) \cdot queryNorm(q, d) \cdot \sum_{t \in q} tf(t, d) \cdot idf^2(t) \cdot t.boost \cdot norm(t.field)$, wherein tf(t, d) is the preference score of the user for a retrieval item contained in the data set in each dimension, t.boost is the weight value of each positioning retrieval item with respect to the data set in each dimension, and f.boost is the weight value of the data set in each dimension.
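To make the combination concrete, the sketch below evaluates Score(q, d) = coord(q, d) * queryNorm(q, d) * sum over t in q of tf(t, d) * idf(t)^2 * t.boost * norm(t.field), taking every factor as a precomputed input. The function name `behavior_score`, the dictionary layout, and the treatment of a missing tf as 0.0 are assumptions for illustration.

```python
def behavior_score(query_items, tf, idf, boost, norm, coord, query_norm):
    """Combine precomputed factors into Score(q, d) for one user d.

    query_items: positioning retrieval items t in the query q.
    tf, idf, boost, norm: dicts keyed by retrieval item t.
    coord, query_norm: scalar factors coord(q, d) and queryNorm(q, d).
    An item the user never accessed contributes 0 (assumed tf of 0.0).
    """
    total = 0.0
    for item in query_items:
        total += tf.get(item, 0.0) * idf[item] ** 2 * boost[item] * norm[item]
    return coord * query_norm * total


# Hypothetical factors for a query with two positioning retrieval items.
score = behavior_score(
    query_items=["a", "b"],
    tf={"a": 2.0},
    idf={"a": 1.0, "b": 2.0},
    boost={"a": 0.5, "b": 1.0},
    norm={"a": 1.0, "b": 1.0},
    coord=0.5,
    query_norm=2.0,
)
```

In this example only item "a" contributes, because the user has no preference score for "b".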

Optionally, the processor may further execute the program code of the following steps: calculating an IDF value IDF (t) of the positioning retrieval item in the user behavior data by the following calculation formula:

Optionally, the processor may further execute the program code of the following steps: calculating the highest weight value coord(q, d) of the positioning retrieval item in the plurality of documents by the following calculation formula:

Optionally, the processor may further execute the program code of the following steps: calculating the normalized search word score queryNorm(q, d) by the following calculation formula: $queryNorm(q, d) = \frac{1}{\sqrt{\sum_{t \in q} (idf(t) \cdot t.boost)^2}}$.
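The claims give queryNorm(q, d) as the reciprocal of the square root of the sum, over the retrieval items t in q, of (idf(t) * t.boost) squared. A minimal sketch under the assumption that idf and t.boost are supplied as dictionaries keyed by retrieval item (the name `query_norm` and the 0.0 result for an empty sum are choices made for this example):

```python
import math


def query_norm(idf, boost):
    """Evaluate queryNorm = 1 / sqrt(sum_t (idf(t) * t.boost)^2).

    idf, boost: dicts keyed by the retrieval items t of the query q.
    Returns 0.0 when the sum of squares is zero (an assumption, to avoid
    division by zero).
    """
    sum_of_squares = sum((idf[item] * boost[item]) ** 2 for item in idf)
    if sum_of_squares == 0:
        return 0.0
    return 1.0 / math.sqrt(sum_of_squares)


# Hypothetical values chosen so the sum of squares is 9 + 16 = 25.
qn = query_norm({"a": 3.0, "b": 4.0}, {"a": 1.0, "b": 1.0})
```

This factor is the same for every user under a given query, so it rescales scores without changing the ranking of users within that query.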

Optionally, the processor may further execute the program code of the following steps: calculating the normalization score norm(t.field) of the plurality of documents by the following calculation formula: wherein the field is a data set in any dimension in the access data set.

The embodiment of the invention provides a method for processing user behavior data, which comprises: acquiring user behavior data, wherein the user behavior data comprises access data sets generated after a plurality of users access a target object, and the access data sets comprise data sets in at least the following three dimensions: a keyword set, an attribute information set and a classification information set; determining, for each user, preference scores of the retrieval items contained in the data set in each dimension, wherein the data set in each dimension contains at least one retrieval item; after a search word to be positioned is obtained, querying according to the search word to obtain a plurality of positioning retrieval items that correspond to the search word, and obtaining the weight value of each positioning retrieval item with respect to the data set in each dimension; calculating, according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of each positioning retrieval item with respect to the data set in each dimension, a behavior weight value determined by the coupling relation between each user and the search word; and determining, according to the behavior weight value determined by the coupling relation between each user and the search word, the user group targeted by the search word to be positioned.

This solves the technical problem that crowd targeting implemented purely through structured data yields positioning results that are not accurate enough.
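The final step, determining the user group from the behavior weight values, is not specified further in this passage. One plausible reading, shown purely as an assumption, is to keep users whose score exceeds a threshold and optionally cap the group at the top k; the function name `select_user_group` and both parameters are hypothetical.

```python
def select_user_group(scores, threshold=0.0, top_k=None):
    """Pick the targeted users from {user: behavior weight value}.

    Keeps users whose score is strictly above `threshold`, ordered by
    descending score (ties broken by user id), optionally truncated to
    `top_k` users. Both selection rules are assumptions for this sketch.
    """
    ranked = sorted(
        ((user, score) for user, score in scores.items() if score > threshold),
        key=lambda pair: (-pair[1], pair[0]),
    )
    if top_k is not None:
        ranked = ranked[:top_k]
    return [user for user, _ in ranked]


# Hypothetical behavior weight values for three users.
scores = {"u1": 0.9, "u2": 0.1, "u3": 0.5}
group = select_user_group(scores, threshold=0.2)
top_one = select_user_group(scores, top_k=1)
```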

It can be understood by those skilled in the art that the structure shown in the figures is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 10 is a diagram illustrating a structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 10, or have a different configuration than shown in FIG. 10.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

Example 4

The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the processing method of the user behavior data provided in the first embodiment.

Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring user behavior data, wherein the user behavior data comprises access data sets generated after a plurality of users access a target object, and the access data sets comprise data sets in at least the following three dimensions: a keyword set, an attribute information set and a classification information set; determining, for each user, preference scores of the retrieval items contained in the data set in each dimension, wherein the data set in each dimension contains at least one retrieval item; after a search word to be positioned is obtained, querying according to the search word to obtain a plurality of positioning retrieval items that correspond to the search word, and obtaining the weight value of each positioning retrieval item with respect to the data set in each dimension; calculating, according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of each positioning retrieval item with respect to the data set in each dimension, a behavior weight value determined by the coupling relation between each user and the search word; and determining, according to the behavior weight value determined by the coupling relation between each user and the search word, the user group targeted by the search word to be positioned.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims

1. A method for processing user behavior data is characterized by comprising the following steps:

acquiring user behavior data, wherein the user behavior data comprises access data sets generated after a plurality of users access a target object, and the access data sets at least comprise data sets in the following three dimensions: a keyword set, an attribute information set and a classification information set;

determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item;

after a search word to be positioned is obtained, querying according to the search word to obtain a plurality of positioning retrieval items that correspond to the search word, and obtaining a weight value of each positioning retrieval item with respect to the data set in each dimension;

calculating a behavior weight value determined by the coupling relation between each user and the search word according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of each positioning retrieval item with respect to the data set in each dimension;

and determining, according to the behavior weight value determined by the coupling relation between each user and the search word, the user group targeted by the search word to be positioned.

2. The method of claim 1, wherein determining the preference scores of the retrieval items contained in the data set in each dimension for the user comprises:

respectively acquiring at least one first retrieval item contained in the keyword set, at least one second retrieval item contained in the attribute information set and at least one third retrieval item contained in the classification information set;

respectively counting the number of times of per-person access of retrieval items in the data set on each dimension and the number of times of access of the user to the retrieval items in the data set on each dimension;

and calculating preference scores of the retrieval items contained in the data sets corresponding to each dimension by the user according to the per-person access times of the retrieval items in the data sets on each dimension and the access times of the user to the retrieval items in the data sets on each dimension.

3. The method according to claim 2, wherein calculating the preference scores of the retrieval items contained in the data set in each dimension for the user, according to the per-person access count of the retrieval items in the data set in each dimension and the number of times the user accessed those retrieval items, comprises:

calculating the preference score tf(t, d) of a retrieval item contained in the data set in any dimension for the user according to the following calculation formula:

$$tf(t, d) = \sum_{i} w_i \cdot \frac{N_i}{n_i},$$

wherein $w_i$ is the weight value of the access behavior occurring in the data set in the i-th dimension; $N_i$ is the number of times the user performed the access behavior on retrieval item t in the data set in the i-th dimension; and $n_i$ is the per-person access count of retrieval item t in the data set in the i-th dimension, wherein retrieval item t is any retrieval item in the data set, and the access behavior comprises any one of the following types: click, collect, and comment.

4. The method of claim 3, wherein, after the search word to be positioned is obtained, querying according to the search word to obtain the plurality of positioning retrieval items that correspond to the search word, and obtaining the weight value of each positioning retrieval item with respect to the data set in each dimension, comprises:

acquiring the search word to be positioned, and querying according to the search word to obtain a plurality of positioning retrieval items corresponding to the search word;

determining the dimensional relation of the search word with respect to the data set in each dimension according to the plurality of positioning retrieval items obtained through the query;

and calculating the weight value of each positioning retrieval item with respect to the data set in each dimension according to the dimensional relation of the search word with respect to the data set in each dimension.

5. The method of claim 4, wherein the dimensional relation of the search word with respect to the data set in each dimension is determined by the following calculation formula:

wherein,

A represents the data set containing any search word among the data sets in the three dimensions, B represents the data set containing any positioning retrieval item t among the data sets in the three dimensions, and r(w, t) is the correlation between the search word w and the retrieval item t.

6. The method of claim 5, wherein the weight value of each positioning retrieval item with respect to the data set in each dimension is calculated by the following calculation formula:

$$t.boost = \frac{\sum_{w \in T} I(w) \cdot r(w, t)}{\sum_{w \in T} I(w)},$$

wherein r(w, t) is the dimensional relation of the search word with respect to the data set in each dimension, that is, the correlation between the search word w and the retrieval item t, and I(w) is the word frequency of the search word w in the text.

7. The method of claim 6, wherein the step of obtaining the search term to be located comprises:

after receiving a keyword input by a query user, determining the input keyword as the search word to be positioned; or,

after receiving the text input by the query user, performing word segmentation on the text, wherein at least one keyword obtained by the word segmentation is the search word to be positioned.

8. The method of claim 7, wherein calculating the behavior weight value determined by the coupling relation between each user and the search word, according to the preference scores of the retrieval items contained in the data set in each dimension and the weight value of each positioning retrieval item with respect to the data set in each dimension, comprises:

obtaining the IDF value idf(t) of the positioning retrieval item in the user behavior data;

obtaining the highest weight value coord(q, d) of the positioning retrieval item in a plurality of documents;

normalizing the search words queried in the same document to obtain the normalized search word score queryNorm(q, d);

normalizing the weight values of the positioning retrieval item in the plurality of documents to obtain the normalization score norm(t.field) of the plurality of documents;

obtaining the behavior weight value Score(q, d) determined by the coupling relation between each user and the search word through the following calculation formula:

$$Score(q, d) = coord(q, d) \cdot queryNorm(q, d) \cdot \sum_{t \in q} tf(t, d) \cdot idf^2(t) \cdot t.boost \cdot norm(t.field),$$

wherein tf(t, d) is the preference score of the user for a retrieval item contained in the data set in each dimension, and t.boost is the weight value of each positioning retrieval item with respect to the data set in each dimension.

9. The method according to claim 8, wherein the IDF value IDF (t) of the positioning search term in the user behavior data is calculated by the following calculation formula:

10. The method according to claim 8, wherein the highest weight value coord(q, d) of the positioning retrieval item in the plurality of documents is calculated by the following calculation formula:

11. The method according to claim 8, wherein the normalized search word score queryNorm(q, d) is calculated by the following calculation formula:

$$queryNorm(q, d) = \frac{1}{\sqrt{\sum_{t \in q} (idf(t) \cdot t.boost)^2}}.$$

12. The method of claim 8, wherein the normalization score norm(t.field) of the plurality of documents is calculated by the following calculation formula:

the field is a data set in any dimension in the access data set, and the f.

13. An apparatus for processing user behavior data, comprising:

a first acquisition unit, configured to acquire user behavior data, wherein the user behavior data comprises access data sets generated after a plurality of users access a target object, and the access data sets comprise data sets in at least the following three dimensions: a keyword set, an attribute information set and a classification information set;

the first determining unit is used for determining preference scores of retrieval items contained in a data set corresponding to each dimension by a user, wherein the data set on each dimension contains at least one retrieval item;

the second acquisition unit is used for acquiring a search term to be positioned, inquiring a plurality of positioning search items corresponding to the search term according to the search term, and acquiring a weight value of each positioning search item corresponding to the data set in each dimension;

a third obtaining unit, configured to calculate a behavior weight value determined by the coupling relation between each user and the search word according to the preference scores of the retrieval items contained in the data set in each dimension and the obtained weight value of each positioning retrieval item with respect to the data set in each dimension;

and the second determining unit is used for determining the user group in which the search word to be positioned is positioned according to the behavior weight value determined by the coupling relation between each user and the search word.

14. The apparatus of claim 13, wherein the first determining unit comprises:

a first obtaining module, configured to obtain at least one first search term included in the keyword set, at least one second search term included in the attribute information set, and at least one third search term included in the classification information set, respectively;

a statistical module, configured to respectively count the per-person access count of the retrieval items in the data set in each dimension and the number of times the user accessed the retrieval items in the data set in each dimension;

and the first calculation module is used for calculating and obtaining preference scores of the retrieval items contained in the data sets corresponding to the dimensions of the user according to the per-person access times of the retrieval items in the data sets on the dimensions and the access times of the user to the retrieval items in the data sets on the dimensions.

15. The apparatus of claim 14, wherein the first computing module comprises:

a sub-calculation module, configured to calculate, by using the following calculation formula, a preference score tf (t, d) of a search term included in a data set corresponding to any one dimension by the user:

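$$tf(t, d) = \sum_{i} w_i \cdot \frac{N_i}{n_i},$$

wherein $w_i$ is the weight value of the access behavior occurring in the data set in the i-th dimension; $N_i$ is the number of times the user performed the access behavior on retrieval item t in the data set in the i-th dimension; and $n_i$ is the per-person access count of retrieval item t in the data set in the i-th dimension.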

16. The apparatus of claim 15, wherein the second obtaining unit comprises:

the second acquisition module is used for acquiring the search terms to be positioned and inquiring to obtain a plurality of positioning retrieval items corresponding to the search terms according to the search terms;

the first determining module is used for determining the dimension relation of the search terms corresponding to the data set on each dimension according to the plurality of positioning retrieval items obtained by query;

and the second calculation module is used for calculating the weight value of the data set corresponding to each dimension of each positioning retrieval item according to the dimension relation of the data set corresponding to each dimension of the search term.

17. The apparatus of claim 16, further comprising:

a first calculating unit, configured to determine, by using the following calculation formula, a dimensional relationship of the search term with respect to the data set in each dimension:

wherein,

18. The apparatus of claim 17, further comprising:

a second calculating unit, configured to calculate a weight value of the data set in each dimension corresponding to each positioning search term by using the following calculation formula:

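$$t.boost = \frac{\sum_{w \in T} I(w) \cdot r(w, t)}{\sum_{w \in T} I(w)},$$

wherein r(w, t) is the dimensional relation of the search word with respect to the data set in each dimension, and I(w) is the word frequency of the search word w in the text.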

19. The apparatus of claim 18, wherein the second obtaining module comprises:

the second determination module is used for determining the input keyword as the search word to be positioned after receiving the keyword input by the query user; or,

the first processing module is used for performing word segmentation processing on the text after receiving the text input by the query user, wherein at least one keyword obtained by the word segmentation processing is the search word to be positioned.

20. The apparatus of claim 19, wherein the second determining unit comprises:

a third obtaining module, configured to obtain an IDF value IDF (t) of the positioning search item in the user behavior data;

a fourth obtaining module, configured to obtain a highest weight value coord (q, d) of the positioning search term in multiple documents;

the second processing module is used for carrying out normalization processing on the search terms queried in the same document to obtain normalized search term scores queryNorm (q, d);

the third processing module is used for carrying out normalization processing on the weight values of the plurality of documents by the positioning retrieval item to obtain the normalized values norm (t.field) of the plurality of documents;

a third calculating module, configured to obtain a behavior weight value Score (q, d) determined by a coupling relationship between each user and the search term through the following calculation formula:

$$Score(q, d) = coord(q, d) \cdot queryNorm(q, d) \cdot \sum_{t \in q} tf(t, d) \cdot idf^2(t) \cdot t.boost \cdot norm(t.field),$$

wherein t.boost is the weight value of each positioning retrieval item with respect to the data set in each dimension.

21. The apparatus of claim 20, further comprising:

a third calculating unit, configured to calculate an IDF value IDF (t) of the positioning search term in the user behavior data by using the following calculation formula:

22. The apparatus of claim 20, further comprising:

a fourth calculating unit, configured to calculate a highest weight value coord (q, d) of the positioning search term in the plurality of documents by using the following calculation formula:

23. The apparatus of claim 20, further comprising:

a fifth calculating unit, configured to calculate the normalized search word score queryNorm(q, d) by the following calculation formula:

$$queryNorm(q, d) = \frac{1}{\sqrt{\sum_{t \in q} (idf(t) \cdot t.boost)^2}}.$$

24. The apparatus of claim 20, further comprising:

a sixth calculating unit, configured to calculate the normalization score norm(t.field) of the plurality of documents by the following calculation formula:

the field is a data set in any dimension in the access data set, and the f.