Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method and a device for acquiring information keywords, which at least partially solve the problem of how to acquire information keywords automatically.
To solve the above technical problem, the invention adopts the following technical solution:
the invention provides a method for acquiring information keywords, which comprises the following steps:
determining a second keyword set, wherein keywords in the second keyword set are keywords of current hotspot information;
determining a third keyword set, wherein keywords in the third keyword set are common keywords of all tracked objects;
calculating the union of the second keyword set, the third keyword set and a preset first keyword set to determine the information keywords, wherein the keywords in the first keyword set are individual keywords of each tracked object.
Preferably, determining the third keyword set specifically includes:
calculating the coverage of each keyword to be selected;
determining a first temporary set according to the coverage of each keyword to be selected, a preset threshold, the first keyword set and the second keyword set;
and determining the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set.
Preferably, calculating the coverage of each candidate keyword specifically includes:
acquiring the number of tracked objects related to each candidate keyword and the total number of tracked objects;
and respectively calculating the ratio of the number of the tracked objects related to each keyword to be selected to the total number of the tracked objects to obtain the coverage of each keyword to be selected.
Preferably, determining the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set specifically includes:
comparing the number of keywords in the first temporary set with the preset number of keywords in the third keyword set;
if the former is greater than or equal to the latter, sorting the keywords in the first temporary set in descending order of coverage and selecting the preset number of top-ranked keywords as elements of the third keyword set, wherein the preset number is the number of keywords in the third keyword set;
if the former is smaller than the latter, taking the first temporary set as the third keyword set.
Preferably, determining the first temporary set according to the coverage of each candidate keyword, the preset threshold, the first keyword set and the second keyword set specifically includes:
comparing the coverage of each candidate keyword with the preset threshold, and if the coverage of a candidate keyword is greater than the preset threshold, taking that candidate keyword as an element of a second temporary set;
and intersecting the second temporary set with the complements of the first keyword set and the second keyword set to obtain the first temporary set.
The present invention also provides a keyword management apparatus, comprising a first processing module, a second processing module and a third processing module;
the first processing module is used for determining a second keyword set, wherein keywords in the second keyword set are keywords of current hotspot information;
the second processing module is used for determining a third keyword set, wherein keywords in the third keyword set are common keywords of all tracked objects;
the third processing module is used for calculating the union of the second keyword set, the third keyword set and a preset first keyword set to determine the information keywords, wherein the keywords in the first keyword set are individual keywords of each tracked object.
Preferably, the second processing module is specifically configured to calculate the coverage of each candidate keyword; determine a first temporary set according to the coverage of each candidate keyword, a preset threshold, the first keyword set and the second keyword set; and determine the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set.
Preferably, the second processing module is specifically configured to obtain the number of tracked objects related to each candidate keyword and the total number of tracked objects, and to calculate, for each candidate keyword, the ratio of the former to the latter as the coverage of that candidate keyword.
Preferably, the second processing module is configured to compare the number of keywords in the first temporary set with the preset number of keywords in the third keyword set; when the former is greater than or equal to the latter, sort the keywords in the first temporary set in descending order of coverage and select the preset number of top-ranked keywords as elements of the third keyword set, wherein the preset number is the number of keywords in the third keyword set; and when the former is smaller than the latter, take the first temporary set as the third keyword set.
Preferably, the second processing module is configured to compare the coverage of each candidate keyword with a preset threshold; when the coverage of a candidate keyword is greater than the preset threshold, take that candidate keyword as an element of a second temporary set; and intersect the second temporary set with the complements of the first keyword set and the second keyword set to obtain the first temporary set.
According to the invention, the information keywords can be determined quickly by calculating the union of the current hotspot information keyword set, the common keyword set of the tracked objects and the individual keyword set of each tracked object. The resulting information keywords not only cover current hotspot information but are also targeted: they meet the individual requirements of each user (i.e., each tracked object) and are multi-dimensional with wide coverage.
Detailed Description
The technical solution of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The invention provides a method for acquiring information keywords, applied to an information resource system. The information resource system comprises a material collecting device, a resource pool establishing device, an information resource pool, a tracked object database, a tracked object management device, a keyword database and a keyword management device. The tracked object management device updates the tracked object database, and the keyword management device updates the keyword database. The material collecting device obtains tracked object information material and keyword information material through the tracked object management device and the keyword management device, respectively, and the resource pool establishing device encodes this material into information and stores the information in the information resource pool.
In the embodiment of the invention, the tracked objects mainly comprise major global operators and large Internet companies, and are stored in the tracked object database.
Information keywords include individual keywords and common keywords, both extracted from the strategies and key work of the tracked companies. Keywords that are searched when retrieving information on every tracked object are common keywords, such as 5G, cloud computing, big data and the Internet of things; common keywords are stored in the common keyword module of the keyword database. Keywords that are sorted out and refined from the current hotspot information of a tracked object are that tracked object's individual keywords; individual keywords are stored in the individual keyword module of the keyword database.
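As an illustration only, the two modules of the keyword database can be pictured as one shared set of common keywords plus a per-object map of individual keywords. The sketch below is a hypothetical in-memory layout; the class name `KeywordDatabase`, its methods, and the sample tracked object "Operator A" with its keyword "network slicing" are assumptions, not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordDatabase:
    """Hypothetical in-memory layout of the keyword database."""

    # Common keyword module: keywords searched for every tracked object.
    common: set[str] = field(default_factory=set)
    # Individual keyword module: tracked object name -> its own keywords.
    individual: dict[str, set[str]] = field(default_factory=dict)

    def add_individual(self, tracked_object: str, keyword: str) -> None:
        """Record an individual keyword for one tracked object."""
        self.individual.setdefault(tracked_object, set()).add(keyword)

    def keywords_for(self, tracked_object: str) -> set[str]:
        """Common keywords plus this tracked object's individual keywords."""
        return self.common | self.individual.get(tracked_object, set())


db = KeywordDatabase(common={"5G", "cloud computing", "big data", "Internet of things"})
db.add_individual("Operator A", "network slicing")
print(sorted(db.keywords_for("Operator A")))
```

Keeping the common keywords in a single shared set means an update to that module applies to every tracked object at once, while individual keywords stay isolated per object.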
As shown in fig. 1, the method for acquiring information keywords comprises the following steps:
step 101, determining a second keyword set, wherein keywords in the second keyword set are keywords of current hotspot information.
Specifically, the second keyword set kw2 contains n2 keywords, where n2 is a preset value and $n2 \ge 0$: $kw2 = \{kw2_1, kw2_2, kw2_3, \ldots, kw2_{n2}\}$. Each keyword in kw2 is a keyword of current hotspot information and can be obtained with the most active selection algorithm, which is an existing algorithm and is not described here.
Step 102, determining a third keyword set, wherein the keywords in the third keyword set are common keywords of all tracked objects.
Specifically, the third keyword set kw3 contains n3 keywords, where n3 is a preset value and $n3 \ge 0$: $kw3 = \{kw3_1, kw3_2, kw3_3, \ldots, kw3_{n3}\}$. Each keyword in kw3 is a common keyword of the tracked objects, i.e., a hotspot term that the tracked objects pay attention to, for example the Internet of things, blockchain and big data.
A specific implementation of determining the third set of keywords kw3 is described in more detail later in connection with fig. 2.
Step 103, calculating the union of the second keyword set, the third keyword set and the preset first keyword set to determine the information keywords.
Specifically, the first keyword set kw1 contains n1 keywords, where n1 is a preset value and $n1 \ge 0$: $kw1 = \{kw1_1, kw1_2, kw1_3, \ldots, kw1_{n1}\}$. The keywords in kw1 are the individual keywords of each tracked object, covering its long-term focus areas and hot terms, and each keyword in kw1 can be set by the corresponding tracked object. The total number of information keywords is n, where $n = n1 + n2 + n3$ and $0 \le n1 \le n$, $0 \le n2 \le n$, $0 \le n3 \le n$.
The finally determined set of information keywords is $kw = kw1 \cup kw2 \cup kw3$.
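Step 103 reduces to a set union. The following sketch assumes the three keyword sets are held as Python `set` objects; the function name `information_keywords` and the sample keywords are hypothetical.

```python
def information_keywords(kw1: set[str], kw2: set[str], kw3: set[str]) -> set[str]:
    """Step 103: the information keywords are the union kw1 | kw2 | kw3."""
    return kw1 | kw2 | kw3


kw1 = {"network slicing", "edge computing"}  # individual keywords, n1 = 2
kw2 = {"satellite internet"}                 # current hotspot keywords, n2 = 1
kw3 = {"big data", "blockchain"}             # common keywords, n3 = 2
kw = information_keywords(kw1, kw2, kw3)
print(len(kw))  # 5
```

The size of the union equals n1 + n2 + n3 only when the three sets share no keyword. Excluding kw1 and kw2 when forming the first temporary set keeps kw3 disjoint from both; the sketch itself does not check whether kw1 and kw2 overlap.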
As can be seen from steps 101 to 103, by calculating the union of the current hotspot information keyword set, the common keyword set of the tracked objects and the individual keyword set of each tracked object, the invention can quickly determine the information keywords. The resulting information keywords not only cover current hotspot information but are also targeted: they meet the individual requirements of each user (i.e., each tracked object) and are multi-dimensional with wide coverage.
Further, as shown in fig. 2, the determining the third keyword set (i.e. step 102) specifically includes the following steps:
step 201, calculating the coverage of each keyword to be selected.
Specifically, the number $T1_i$ of tracked objects related to each candidate keyword and the total number T of tracked objects are obtained, and for each candidate keyword the ratio of the former to the latter is calculated to obtain its coverage $q_i$, i.e., $q_i = T1_i / T$, where i denotes a candidate keyword.
The tracked objects related to a candidate keyword are the tracked objects that pay attention to it, i.e., the tracked objects that have selected it as a common keyword and/or an individual keyword.
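A minimal sketch of step 201, assuming each tracked object's selected keywords (common and/or individual) are available as a set. The source does not state where the candidate keywords come from, so they are passed in explicitly; the function name `coverage`, the `selections` mapping and the sample data are hypothetical.

```python
def coverage(selections: dict[str, set[str]], candidates: set[str]) -> dict[str, float]:
    """Step 201: return q_i = T1_i / T for every candidate keyword i.

    selections (hypothetical input) maps each tracked object to the keywords it
    has selected as common and/or individual keywords; T = len(selections).
    """
    total = len(selections)  # T, the total number of tracked objects
    if total == 0:
        # Assumption: with no tracked objects, every coverage is taken as 0.
        return {kw: 0.0 for kw in candidates}
    return {
        # T1_i: number of tracked objects that selected keyword kw
        kw: sum(1 for chosen in selections.values() if kw in chosen) / total
        for kw in candidates
    }


selections = {
    "Operator A": {"5G", "big data"},
    "Operator B": {"5G", "blockchain"},
    "Company C": {"5G", "big data", "cloud computing"},
    "Company D": {"cloud computing"},
}
q = coverage(selections, {"5G", "big data", "blockchain", "cloud computing"})
print(q["5G"], q["big data"], q["blockchain"])  # 0.75 0.5 0.25
```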
Step 202, determining the first temporary set temp1kw according to the coverage $q_i$ of each candidate keyword, a preset threshold Q, the first keyword set kw1 and the second keyword set kw2.
Specifically, the process of determining the first temporary set temp1kw is described in detail later with reference to fig. 4.
Step 203, determining the third keyword set kw3 according to the preset number n3 of keywords in the third keyword set kw3 and the first temporary set temp1kw.
Specifically, all keywords in the first temporary set temp1kw may become final information keywords, i.e., the third keyword set kw3 is the same as temp1kw. Alternatively, temp1kw may contain more keywords than kw3 requires, so that only some of them become final information keywords, i.e., the range of temp1kw is larger than that of kw3.
The scheme of how to determine the third set of keywords kw3 is described in detail later with reference to fig. 3.
As can be seen from steps 201 to 203, using the coverage $q_i$ of each candidate keyword as the criterion for selecting the keywords of the third keyword set kw3 favors keywords that are relevant to many tracked objects, so that the different requirements of the tracked objects can be covered.
The process of determining the third set of keywords kw3 (i.e., step 203) is described in detail below with reference to fig. 3. As shown in fig. 3, the process of determining the third keyword set kw3 includes the following steps:
step 301, comparing the number of keywords in the first temporary set with the preset number of keywords in the third keyword set; if the former is greater than the latter, executing step 302; otherwise, executing step 304.
Specifically, let n' be the number of keywords in the first temporary set temp1kw, and compare n' with the number n3 of keywords in the third keyword set kw3. If $n' > n3$, temp1kw contains more keywords than kw3 requires, so the more suitable keywords must be further selected from temp1kw and placed in kw3 (i.e., steps 302 and 303 are executed). If $n' \le n3$, temp1kw contains no more keywords than kw3 requires, so all keywords in temp1kw are placed in kw3 (i.e., step 304 is executed).
Step 302, sorting the keywords in the first temporary set in descending order of coverage.
Specifically, the n' keywords in temp1kw are sorted in descending order of coverage $q_i$, where the coverage $q_i$ of each keyword was calculated in step 201.
Step 303, selecting the preset number of top-ranked keywords as elements of the third keyword set.
Specifically, the preset number is the number n3 of keywords in the third keyword set; that is, the first n3 keywords in the coverage ranking form the third keyword set kw3.
Step 304, the third set of keywords is the first temporary set.
Specifically, if the number n' of keywords in temp1kw does not exceed the number n3 required by kw3, the whole first temporary set temp1kw is used as the third keyword set kw3.
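Steps 301 to 304 can be sketched as follows, assuming temp1kw is a Python `set` and the coverages from step 201 are held in a dictionary `q`. The function name `select_third_set` is hypothetical, and the tie-breaking rule (alphabetical order among keywords with equal coverage) is an assumption, since the source does not say how ties are ordered.

```python
def select_third_set(temp1kw: set[str], q: dict[str, float], n3: int) -> set[str]:
    """Steps 301-304: derive the third keyword set kw3 from temp1kw."""
    if len(temp1kw) > n3:  # step 301: n' > n3
        # Step 302: descending coverage; ties broken alphabetically (assumption).
        ranked = sorted(temp1kw, key=lambda kw: (-q[kw], kw))
        return set(ranked[:n3])  # step 303: keep the first n3 keywords
    return set(temp1kw)  # step 304: n' <= n3, temp1kw becomes kw3


temp1kw = {"blockchain", "cloud computing", "Internet of things"}
q = {"blockchain": 0.6, "cloud computing": 0.9, "Internet of things": 0.7}
print(sorted(select_third_set(temp1kw, q, n3=2)))  # ['Internet of things', 'cloud computing']
```

A deterministic tie-break keeps the result reproducible, because iteration order over a Python set is not guaranteed to be stable across runs.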
As can be seen from steps 301 to 303, filtering the keywords in temp1kw by coverage ensures that the selected information keywords (i.e., the third keyword set kw3) are those with the widest coverage.
The process of determining the first temporary set temp1kw (i.e., step 202) is described in detail below with reference to fig. 4. As shown in fig. 4, the process of determining the first temporary set temp1kw includes the following steps:
step 401, comparing the coverage of each candidate keyword with the preset threshold; if the coverage of a candidate keyword is greater than the preset threshold, executing step 402; otherwise, discarding that candidate keyword.
Specifically, a threshold Q is preset and the coverage $q_i$ of each candidate keyword is compared with Q. If $q_i > Q$, the candidate keyword qualifies and is put into the second temporary set temp2kw (i.e., step 402 is executed); if $q_i \le Q$, the candidate keyword does not qualify and is discarded.
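Steps 401 and 402 amount to a strict threshold filter. A minimal sketch, with the hypothetical function name `second_temporary_set`; a keyword whose coverage equals Q is discarded, matching the strict comparison $q_i > Q$.

```python
def second_temporary_set(q: dict[str, float], threshold: float) -> set[str]:
    """Steps 401-402: keep each candidate keyword whose coverage q_i exceeds Q."""
    # Candidates with q_i <= Q are discarded (strict comparison, as in step 401).
    return {kw for kw, qi in q.items() if qi > threshold}


q = {"5G": 0.75, "big data": 0.5, "blockchain": 0.25}
print(second_temporary_set(q, threshold=0.5))  # {'5G'}
```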
Step 402, using the corresponding candidate keyword as an element of the second temporary set.
Step 403, intersecting the second temporary set with the complements of the first keyword set and the second keyword set to obtain the first temporary set.
In particular, the first temporary set is $temp1kw = temp2kw \cap \overline{kw1} \cap \overline{kw2} = temp2kw \setminus (kw1 \cup kw2)$.
In this way, keywords of the second temporary set temp2kw that also appear in the first keyword set kw1 or the second keyword set kw2 are excluded, so that keyword duplication is avoided when the third keyword set kw3 is subsequently determined from the first temporary set temp1kw.
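Step 403 can be expressed with Python set operations, assuming all sets hold keyword strings; the function name `first_temporary_set` is hypothetical. Subtracting the union of kw1 and kw2 is equivalent to intersecting temp2kw with the complements of kw1 and kw2.

```python
def first_temporary_set(temp2kw: set[str], kw1: set[str], kw2: set[str]) -> set[str]:
    """Step 403: temp1kw = temp2kw minus every keyword already in kw1 or kw2."""
    return temp2kw - (kw1 | kw2)


temp2kw = {"5G", "big data", "blockchain"}
print(first_temporary_set(temp2kw, kw1={"blockchain"}, kw2={"5G"}))  # {'big data'}
```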
Based on the same technical concept, an embodiment of the present invention further provides a keyword management apparatus. As shown in fig. 5, the keyword management apparatus includes a first processing module 51, a second processing module 52 and a third processing module 53.
The first processing module 51 is configured to determine a second keyword set, where the keywords in the second keyword set are keywords of current hotspot information.
The second processing module 52 is configured to determine a third keyword set, where the keywords in the third keyword set are common keywords of the tracked objects.
The third processing module 53 is configured to calculate the union of the second keyword set, the third keyword set and a preset first keyword set to determine the information keywords, wherein the keywords in the first keyword set are individual keywords of each tracked object.
Preferably, the second processing module 52 is specifically configured to calculate the coverage of each candidate keyword; determine a first temporary set according to the coverage of each candidate keyword, a preset threshold, the first keyword set and the second keyword set; and determine the third keyword set according to the preset number of keywords in the third keyword set and the first temporary set.
Preferably, the second processing module 52 is specifically configured to obtain the number of tracked objects related to each candidate keyword and the total number of tracked objects, and to calculate, for each candidate keyword, the ratio of the former to the latter as the coverage of that candidate keyword.
Preferably, the second processing module 52 is configured to compare the number of keywords in the first temporary set with the preset number of keywords in the third keyword set; when the former is greater than or equal to the latter, sort the keywords in the first temporary set in descending order of coverage and select the preset number of top-ranked keywords as elements of the third keyword set, wherein the preset number is the number of keywords in the third keyword set; and when the former is smaller than the latter, take the first temporary set as the third keyword set.
Preferably, the second processing module 52 is configured to compare the coverage of each candidate keyword with a preset threshold; when the coverage of a candidate keyword is greater than the preset threshold, take that candidate keyword as an element of the second temporary set; and intersect the second temporary set with the complements of the first keyword set and the second keyword set to obtain the first temporary set.
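To show how the three modules could cooperate, the following sketch wires steps 101 to 103 together. It is a hypothetical illustration, not the apparatus itself: the class and method names are assumptions, the most active selection algorithm is replaced by a caller-supplied function `hotspot_selector` because the source does not describe it, and the coverage, threshold and ranking steps sit in the second processing module, which is the module that determines the third keyword set.

```python
from typing import Callable


class KeywordManagementApparatus:
    """Hypothetical sketch of the keyword management apparatus of fig. 5."""

    def __init__(self, hotspot_selector: Callable[[], set[str]], threshold: float, n3: int) -> None:
        # Stand-in for the most active selection algorithm (not described in the source).
        self.hotspot_selector = hotspot_selector
        self.threshold = threshold  # preset threshold Q
        self.n3 = n3                # preset number of keywords in kw3

    def first_module(self) -> set[str]:
        """Module 51: determine kw2, the keywords of current hotspot information."""
        return set(self.hotspot_selector())

    def second_module(
        self,
        selections: dict[str, set[str]],
        candidates: set[str],
        kw1: set[str],
        kw2: set[str],
    ) -> set[str]:
        """Module 52: determine kw3, the common keywords of the tracked objects."""
        if not selections:
            return set()  # assumption: no tracked objects means no common keywords
        total = len(selections)  # T
        # Step 201: coverage q_i = T1_i / T
        q = {kw: sum(kw in chosen for chosen in selections.values()) / total for kw in candidates}
        # Steps 401-402: keep candidates with q_i > Q
        temp2kw = {kw for kw, qi in q.items() if qi > self.threshold}
        # Step 403: drop keywords already in kw1 or kw2
        temp1kw = temp2kw - (kw1 | kw2)
        # Steps 301-304: keep at most n3 keywords, highest coverage first
        if len(temp1kw) > self.n3:
            ranked = sorted(temp1kw, key=lambda kw: (-q[kw], kw))
            return set(ranked[: self.n3])
        return temp1kw

    def third_module(self, kw1: set[str], kw2: set[str], kw3: set[str]) -> set[str]:
        """Module 53: information keywords kw = kw1 | kw2 | kw3."""
        return kw1 | kw2 | kw3

    def run(self, selections: dict[str, set[str]], candidates: set[str], kw1: set[str]) -> set[str]:
        kw2 = self.first_module()
        kw3 = self.second_module(selections, candidates, kw1, kw2)
        return self.third_module(kw1, kw2, kw3)


selections = {
    "Operator A": {"5G", "big data", "blockchain"},
    "Operator B": {"5G", "big data"},
    "Company C": {"5G", "big data", "cloud computing"},
    "Company D": {"big data", "cloud computing"},
}
apparatus = KeywordManagementApparatus(lambda: {"5G"}, threshold=0.4, n3=1)
kw = apparatus.run(selections, {"5G", "big data", "blockchain", "cloud computing"}, {"network slicing"})
print(sorted(kw))  # ['5G', 'big data', 'network slicing']
```

In this example "5G" passes the threshold but is excluded from kw3 because it is already a hotspot keyword in kw2, so it appears in the result only once.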
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.