Summary of the invention
In view of this, embodiments of the present invention provide a method and a device for acquiring event features, to solve the problem that personalized information about an event of interest is not obtained accurately enough when massive data is processed.
In a first aspect, a method for acquiring event features is provided, comprising:
obtaining a feature word set for describing a target event, wherein the feature word set comprises multiple feature words;
determining, from the obtained feature word set, at least one feature word that describes an attribute of the target event;
for each determined feature word, extracting, from the remaining feature words in the feature word set other than the feature words that describe attributes of the target event, at least one feature word that gives the specific content of the attribute identified by the determined feature word, and establishing a correspondence between the determined feature word and the at least one extracted feature word; and
obtaining the features of the target event according to the at least one group of correspondences obtained.
With reference to the first aspect, in a first possible implementation, the method further comprises:
establishing a mapping between the features of the target event and the at least one group of correspondences obtained.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, determining, from the obtained feature word set, at least one feature word that describes an attribute of the target event comprises:
performing the following operations on the obtained feature word set until all feature words in the set that describe attributes of the target event have been determined:
selecting any one feature word;
determining the context of the selected feature word in the original document, and judging, according to the context, whether the feature word describes an attribute of the target event;
if the selected feature word is judged to describe an attribute of the target event, marking it as a feature word that describes an attribute of the target event, selecting the next feature word, and continuing to perform the foregoing operations;
if the selected feature word is judged not to describe an attribute of the target event, selecting the next feature word and continuing to perform the foregoing operations.
With reference to the second possible implementation of the first aspect, in a third possible implementation, judging, according to the context, whether the feature word describes an attribute of the target event comprises:
determining, according to the context and by means of grammatical analysis and syntactic analysis, whether the feature word is the head word of the context;
if the feature word is determined to be the head word of the context, determining that the feature word describes an attribute of the target event;
if the feature word is determined not to be the head word of the context, determining that the feature word does not describe an attribute of the target event.
With reference to the second or the third possible implementation of the first aspect, in a fourth possible implementation, after all feature words in the feature word set that describe attributes of the target event have been determined, the method further comprises:
judging whether synonyms exist among the determined feature words that describe attributes of the target event;
when the judgment result is that synonyms exist, selecting one feature word from the multiple attribute-describing feature words that satisfy the synonym condition, to serve as the feature word for the attribute of the target event that those feature words describe.
With reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation, extracting, from the remaining feature words in the feature word set other than the feature words that describe attributes of the target event, at least one feature word that gives the specific content of the attribute identified by the determined feature word comprises:
selecting one feature word from the remaining feature words in the feature word set other than the feature words that describe attributes of the target event;
for one determined feature word that describes an attribute of the target event, judging, according to semantic rules, whether the selected feature word is a hyponym of the determined feature word;
if it is a hyponym, determining that the selected feature word is specific content of the attribute of the target event described by the determined feature word.
With reference to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation, obtaining the feature word set for describing the target event comprises:
when massive data is processed, obtaining, from the massive data and by means of cluster analysis, multiple feature words for describing the target event;
combining the obtained feature words into the feature word set for describing the target event.
With reference to the first aspect or any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation, after the features of the target event are obtained according to the at least one group of correspondences, the method further comprises:
comparing the obtained features of the target event with preset features of the target event;
determining, according to the comparison result, those attributes of the target event that are included in the obtained features but differ from the attributes included in the preset features;
taking the differing attributes so determined as newly added attributes of the target event.
In a second aspect, a device for acquiring event features is provided, comprising:
an acquisition module, configured to obtain a feature word set for describing a target event, wherein the feature word set comprises multiple feature words;
a determination module, configured to determine, from the obtained feature word set, at least one feature word that describes an attribute of the target event;
an extraction module, configured to: for each determined feature word, extract, from the remaining feature words in the feature word set other than the feature words that describe attributes of the target event, at least one feature word that gives the specific content of the attribute identified by the determined feature word, and establish a correspondence between the determined feature word and the at least one extracted feature word; and obtain the features of the target event according to the at least one group of correspondences obtained.
With reference to the second aspect, in a first possible implementation, the device further comprises:
an establishing module, configured to establish a mapping between the features of the target event and the at least one group of correspondences obtained.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the determination module is specifically configured to perform the following operations on the obtained feature word set until all feature words in the set that describe attributes of the target event have been determined:
selecting any one feature word;
determining the context of the selected feature word in the original document, and judging, according to the context, whether the feature word describes an attribute of the target event;
if the selected feature word is judged to describe an attribute of the target event, marking it as a feature word that describes an attribute of the target event, selecting the next feature word, and continuing to perform the foregoing operations;
if the selected feature word is judged not to describe an attribute of the target event, selecting the next feature word and continuing to perform the foregoing operations.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the determination module is specifically configured to determine, according to the context and by means of grammatical analysis and syntactic analysis, whether the feature word is the head word of the context;
if the feature word is determined to be the head word of the context, determine that the feature word describes an attribute of the target event;
if the feature word is determined not to be the head word of the context, determine that the feature word does not describe an attribute of the target event.
With reference to the second or the third possible implementation of the second aspect, in a fourth possible implementation, the device further comprises a judging module, wherein:
the judging module is configured to judge, after all feature words in the feature word set that describe attributes of the target event have been determined, whether synonyms exist among those feature words;
and, when the judgment result is that synonyms exist, to select one feature word from the multiple attribute-describing feature words that satisfy the synonym condition, to serve as the feature word for the attribute of the target event that those feature words describe.
With reference to the second aspect or any one of the first to fourth possible implementations of the second aspect, in a fifth possible implementation, the extraction module is specifically configured to select one feature word from the remaining feature words in the feature word set other than the feature words that describe attributes of the target event;
for one determined feature word that describes an attribute of the target event, judge, according to semantic rules, whether the selected feature word is a hyponym of the determined feature word;
if it is a hyponym, determine that the selected feature word is specific content of the attribute of the target event described by the determined feature word.
With reference to the second aspect or any one of the first to fifth possible implementations of the second aspect, in a sixth possible implementation, the acquisition module is specifically configured to obtain, when massive data is processed, multiple feature words for describing the target event from the massive data by means of cluster analysis;
and to combine the obtained feature words into the feature word set for describing the target event.
With reference to the second aspect or any one of the first to sixth possible implementations of the second aspect, in a seventh possible implementation, the device further comprises a comparison module, wherein:
the comparison module is configured to compare, after the features of the target event are obtained according to the at least one group of correspondences, the obtained features of the target event with preset features of the target event;
determine, according to the comparison result, those attributes of the target event that are included in the obtained features but differ from the attributes included in the preset features;
and take the differing attributes so determined as newly added attributes of the target event.
The beneficial effects of the present invention are as follows:
In the embodiments of the present invention, a feature word set for describing a target event is obtained, the feature word set comprising multiple feature words; at least one feature word that describes an attribute of the target event is determined from the obtained feature word set; for each determined feature word, at least one feature word giving the specific content of the attribute identified by that feature word is extracted from the remaining feature words in the feature word set other than the feature words describing attributes of the target event, and a correspondence is established between the determined feature word and the at least one extracted feature word; and the features of the target event are obtained according to the at least one group of correspondences obtained. In this way, for a massive number of feature words describing any event, the feature words that describe the attributes of the event and the feature words that describe the specific content of those attributes are determined dynamically, and correspondences are established between them. Determining the features of the target event from the multiple groups of correspondences obtained helps to understand the target event fully, improves the precision with which personalized information about the target event is obtained, and lays a foundation for quickly locating the target event later.
Detailed description of the embodiments
To achieve the object of the present invention, embodiments of the present invention provide a method and a device for acquiring event features. A feature word set for describing a target event is obtained, the feature word set comprising multiple feature words; at least one feature word that describes an attribute of the target event is determined from the obtained feature word set; for each determined feature word, at least one feature word giving the specific content of the attribute identified by that feature word is extracted from the remaining feature words in the feature word set other than the feature words describing attributes of the target event, and a correspondence is established between the determined feature word and the at least one extracted feature word; and the features of the target event are obtained according to the at least one group of correspondences obtained. In this way, for a massive number of feature words describing any event, the feature words that describe the attributes of the event and the feature words that describe the specific content of those attributes are determined dynamically, and correspondences are established between them. Determining the features of the target event from the multiple groups of correspondences obtained helps to understand the target event fully, improves the precision with which personalized information about the target event is obtained, and lays a foundation for quickly locating the target event later.
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic flowchart of a method for acquiring event features according to an embodiment of the present invention. The method may be as follows.
Step 101: obtain a feature word set for describing a target event.
The feature word set comprises multiple feature words.
In step 101, when massive data is processed, multiple feature words for describing the target event are obtained from the massive data by means of cluster analysis, and the obtained feature words are combined into the feature word set for describing the target event.
The cluster analysis uses at least one of the following: a distance-based clustering algorithm (for example, the k-means algorithm) and latent Dirichlet allocation (LDA).
Alternatively, multiple feature words describing an event are compiled from written material that describes the event, and the compiled feature words are combined into the feature word set for describing the event.
It should be noted that a feature word in the feature word set may be a segmented word, obtained by word segmentation software (for example, the Institute of Computing Technology Chinese Lexical Analysis System, ICTCLAS); may be a phrase, obtained by shallow semantic parsing or chunk parsing of the text; or may be a named entity, which includes not only the entities covered by traditional named entity recognition, such as organization names, place names, and person names, but also named entities in a restricted domain, such as song titles, singers, and concert names.
For example, the feature word set obtained for describing a singing event is: { music, program, China, Liu Dehua, song, film, performance, next life edge, singer, performance, art, model, performer, participate in, young, satellite TV, dancing, contest, party, idol, sing, hold, Beijing, international, age, creation, the lyrics, concert, the U.S., birthday, great master, represent, hold, theme, Hong Kong, welcome guest, broadcast, artist, attract, glamour, national, broadcast, classical, guitar, sing, moulding, the popular feeling, make, epoch }.
The feature word set obtained for describing a mobile phone is: { perfection, water-proof function, IP degree of protection, IP55, brand, operating system, screen, battery, continuation of the journey, price, outward appearance, performance, take pictures, tonequality, lovely, small and exquisite, very comfortable, beautiful, comfortable, clear, durable, economical, external form, fast, very cheap, simple, intact, very large, fine and smooth, bright, ultra-thin, not all right, heating, sharp }.
Step 102: determine, from the obtained feature word set, at least one feature word that describes an attribute of the target event.
In step 102, the following operations are performed on the obtained feature word set until all feature words in the set that describe attributes of the target event have been determined:
select any one feature word;
determine the context of the selected feature word in the original document, and judge, according to the context, whether the feature word describes an attribute of the target event;
if the selected feature word is judged to describe an attribute of the target event, mark it as a feature word that describes an attribute of the target event, select the next feature word, and continue to perform the foregoing operations;
if the selected feature word is judged not to describe an attribute of the target event, select the next feature word and continue to perform the foregoing operations.
Specifically, judging, according to the context, whether the feature word describes an attribute of the target event comprises:
determining, according to the context and by means of grammatical analysis and syntactic analysis, whether the feature word is the head word of the context;
if the feature word is determined to be the head word of the context, determining that the feature word describes an attribute of the target event;
if the feature word is determined not to be the head word of the context, determining that the feature word does not describe an attribute of the target event.
That is, the context in which the word or phrase occurs is subjected to grammatical analysis and syntactic analysis to judge whether the word or phrase is a head word; if it is, the word or phrase is determined to be a feature word that describes an attribute of the event.
For example, take the feature word "singer" in the feature word set for describing the singing event. In the context from the original document, "At the Beijing concert there was a singer from Hong Kong, and among the songs was one that everybody likes very much: Next Life Edge", the phrase "singer from Hong Kong" is analyzed as a noun phrase in which "Hong Kong" modifies "singer", so "singer" is the head word of the phrase and is therefore a feature word that describes an attribute of the singing event. Likewise, "song", which the modifier "likes" qualifies, is analyzed as a head word, so "song" is also a feature word that describes an attribute of the singing event.
In the same way, for the feature word set for describing the mobile phone, the feature words obtained for describing attributes of the mobile phone include at least: price, outward appearance, screen, battery, and so on.
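The loop of step 102 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the context of each feature word is assumed to be already available as phrases whose head token has been identified by some syntactic parser, and the names `is_head_word`, `select_attribute_words`, and the `contexts` table are hypothetical, not part of the described method.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of a context phrase: (tokens, index of head token).
# In practice the head index would come from grammatical and syntactic analysis.
Phrase = Tuple[List[str], int]


def is_head_word(word: str, phrases: Iterable[Phrase]) -> bool:
    """Return True if `word` is the head of at least one of its context phrases."""
    return any(tokens[head] == word for tokens, head in phrases)


def select_attribute_words(
    feature_words: List[str], contexts: Dict[str, List[Phrase]]
) -> List[str]:
    """Visit every feature word once and keep those judged to be head words."""
    attribute_words = []
    for word in feature_words:               # select a feature word
        phrases = contexts.get(word, [])     # its context in the original document
        if is_head_word(word, phrases):      # head word: mark as attribute word
            attribute_words.append(word)
        # otherwise simply move on to the next feature word
    return attribute_words


# Toy context table (assumed parser output) for the singing-event example.
contexts = {
    "singer": [(["Hong Kong", "singer"], 1)],
    "Hong Kong": [(["Hong Kong", "singer"], 1)],
    "song": [(["likes", "song"], 1)],
}
print(select_attribute_words(["singer", "Hong Kong", "song", "Beijing"], contexts))
# ['singer', 'song']
```

A feature word with no recorded context, such as "Beijing" here, is treated as not being a head word; a real system might instead defer the decision.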
Optionally, after all feature words in the feature word set that describe attributes of the target event have been determined, the method further comprises:
judging whether synonyms exist among the determined feature words that describe attributes of the target event;
when the judgment result is that synonyms exist among those feature words, selecting one feature word from the multiple attribute-describing feature words that satisfy the synonym condition, to serve as the feature word for the attribute of the target event that those feature words describe.
For example, for the feature word set for describing the singing event, the feature words obtained for describing attributes of the singing event include at least: program, song, performer, artist, singer.
Here "performer", "artist", and "singer" satisfy the synonym condition, so one feature word is selected from among them, for example "performer". The feature words describing attributes of the singing event then include at least: program, song, performer.
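The synonym merging described above can be sketched as follows. The synonym groups are assumed to be supplied from outside (for example, by a thesaurus); `merge_synonyms` and `synonym_groups` are hypothetical names, and choosing the first-seen member as the representative is only one possible selection rule.

```python
from typing import Dict, List, Set


def merge_synonyms(attribute_words: List[str], synonym_groups: List[Set[str]]) -> List[str]:
    """Keep one representative per synonym group, preserving first-seen order."""
    representative: Dict[str, str] = {}
    for group in synonym_groups:
        members = [w for w in attribute_words if w in group]
        for w in members:
            # Assumption: the first member encountered stands for the whole group.
            representative[w] = members[0]

    merged: List[str] = []
    for w in attribute_words:
        rep = representative.get(w, w)  # words without synonyms represent themselves
        if rep not in merged:
            merged.append(rep)
    return merged


attrs = ["program", "song", "performer", "artist", "singer"]
groups = [{"performer", "artist", "singer"}]
print(merge_synonyms(attrs, groups))  # ['program', 'song', 'performer']
```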
Step 103: for each determined feature word, extract, from the remaining feature words in the feature word set other than the feature words that describe attributes of the target event, at least one feature word that gives the specific content of the attribute identified by the determined feature word, and establish a correspondence between the determined feature word and the at least one extracted feature word.
In step 103, one feature word is selected from the remaining feature words in the feature word set other than the feature words that describe attributes of the target event;
for one determined feature word that describes an attribute of the target event, it is judged, according to semantic rules, whether the selected feature word is a hyponym of the determined feature word;
if it is a hyponym, the selected feature word is determined to be specific content of the attribute of the target event described by the determined feature word.
Specifically, whether the selected feature word is a subordinate concept of the feature word that describes an attribute of the target event may be judged by means of a semantic knowledge base (for example, WordNet or HowNet). For example, in the music domain, the title of a song is a subordinate concept of "song".
For example, among the remaining feature words in the feature word set for describing the singing event, other than the feature words that describe attributes of the singing event: the feature word giving the specific content of the attribute "song" is "Next Life Edge"; the feature word giving the specific content of the attribute "program" is "concert"; and the feature word giving the specific content of the attribute "performer" is "XXX".
As another example, among the remaining feature words in the feature word set for describing the mobile phone, other than the feature words that describe attributes of the mobile phone: the feature word giving the specific content of the attribute "price" is "very cheap"; the feature words giving the specific content of the attribute "outward appearance" are "beautiful" and "ultra-thin"; the feature word giving the specific content of the attribute "screen" is "bright"; the feature word giving the specific content of the attribute "battery" is "heating"; the feature word giving the specific content of the attribute "water-proof function" is "IP55"; and the feature word giving the specific content of the attribute "take pictures" is "sharp".
Once at least one feature word giving the specific content has been obtained for each feature word that describes an attribute of the target event, a correspondence is established between the feature word that describes the attribute and the at least one feature word that gives the specific content of that attribute.
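Step 103 can be sketched as follows. A small hand-written hypernym table stands in for a semantic knowledge base such as WordNet or HowNet; the table contents, the intermediate concept "IP degree of protection" placed above "IP55", and the names `HYPERNYM_OF`, `is_hyponym`, and `build_correspondences` are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical toy table: word -> its immediate superordinate concept.
HYPERNYM_OF: Dict[str, str] = {
    "IP55": "IP degree of protection",
    "IP degree of protection": "water-proof function",
    "beautiful": "outward appearance",
    "ultra-thin": "outward appearance",
}


def is_hyponym(word: str, attribute: str, hypernym_of: Dict[str, str]) -> bool:
    """Walk upward from `word`; True if `attribute` is reached on the way."""
    seen = set()
    current = word
    while current in hypernym_of and current not in seen:
        seen.add(current)                 # guard against cycles in the table
        current = hypernym_of[current]
        if current == attribute:
            return True
    return False


def build_correspondences(
    feature_words: List[str], attribute_words: List[str], hypernym_of: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each attribute word to the remaining feature words that are its hyponyms."""
    remaining = [w for w in feature_words if w not in attribute_words]
    return {
        attr: [w for w in remaining if is_hyponym(w, attr, hypernym_of)]
        for attr in attribute_words
    }


words = ["water-proof function", "outward appearance", "IP55", "ultra-thin", "beautiful", "brand"]
attrs = ["water-proof function", "outward appearance"]
print(build_correspondences(words, attrs, HYPERNYM_OF))
# {'water-proof function': ['IP55'], 'outward appearance': ['ultra-thin', 'beautiful']}
```

The resulting dictionary is one possible representation of the groups of correspondences from which the features of the target event are obtained.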
Step 104: obtain the features of the target event according to the at least one group of correspondences obtained.
In step 104, when the features of the target event are obtained, a mapping is established between the features of the target event and the at least one group of correspondences obtained.
Optionally, after the features of the target event are obtained according to the at least one group of correspondences, the method further comprises:
comparing the obtained features of the target event with preset features of the target event;
determining, according to the comparison result, those attributes of the target event that are included in the obtained features but differ from the attributes included in the preset features;
taking the differing attributes so determined as newly added attributes of the target event.
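The comparison above amounts to finding attributes that appear in the obtained features but not in the preset features. A minimal sketch, assuming both are represented as dictionaries from attribute to specific content (the function name `newly_added_attributes` and this representation are hypothetical):

```python
from typing import Dict, List


def newly_added_attributes(
    obtained: Dict[str, List[str]], preset: Dict[str, List[str]]
) -> List[str]:
    """Return attributes present in the obtained features but absent from the preset ones."""
    return [attr for attr in obtained if attr not in preset]


obtained = {"price": ["very cheap"], "screen": ["bright"], "water-proof function": ["IP55"]}
preset = {"price": [], "screen": []}
print(newly_added_attributes(obtained, preset))  # ['water-proof function']
```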
After the correspondences between the attributes of the target event and the specific content of those attributes have been obtained, when a search request sent by a user is received, the specific content of the attribute of the event to be searched is determined according to the identifier of the event to be searched and the attribute of that event, both carried in the search request, by using the stored correspondences between event attributes and their specific content. The specific content so determined is sent to the user, so that the user can quickly understand the event.
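The lookup performed on a search request can be sketched as follows, assuming the stored correspondences are kept in a nested dictionary keyed by event identifier and then by attribute. The store layout and the name `handle_search` are hypothetical; the text does not specify how the correspondences are stored.

```python
from typing import Dict, List

# Hypothetical store: event identifier -> attribute -> specific content.
FeatureStore = Dict[str, Dict[str, List[str]]]


def handle_search(store: FeatureStore, event_id: str, attribute: str) -> List[str]:
    """Return the specific content of `attribute` for the event, or [] if unknown."""
    return store.get(event_id, {}).get(attribute, [])


store: FeatureStore = {
    "mobile phone": {
        "water-proof function": ["IP55"],
        "outward appearance": ["beautiful", "ultra-thin"],
    }
}
print(handle_search(store, "mobile phone", "outward appearance"))
# ['beautiful', 'ultra-thin']
```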
With the solution of this embodiment of the present invention, a feature word set for describing a target event is obtained, the feature word set comprising multiple feature words; at least one feature word that describes an attribute of the target event is determined from the obtained feature word set; for each determined feature word, at least one feature word giving the specific content of the attribute identified by that feature word is extracted from the remaining feature words in the feature word set other than the feature words describing attributes of the target event, and a correspondence is established between the determined feature word and the at least one extracted feature word; and the features of the target event are obtained according to the at least one group of correspondences obtained. In this way, for a massive number of feature words describing any event, the feature words that describe the attributes of the event and the feature words that describe the specific content of those attributes are determined dynamically, and correspondences are established between them. Determining the features of the target event from the multiple groups of correspondences obtained helps to understand the target event fully, improves the precision with which personalized information about the target event is obtained, and lays a foundation for quickly locating the target event later.
Fig. 2 is a schematic structural diagram of an acquisition device for an event feature provided by an embodiment of the present invention. The acquisition device comprises an acquisition module 21, a determination module 22 and an extraction module 23, wherein:
The acquisition module 21 is configured to obtain a feature word set for describing a target event, wherein the feature word set contains multiple feature words;
The determination module 22 is configured to determine, from the obtained feature word set, at least one feature word describing an attribute of the target event;
The extraction module 23 is configured to, for each determined feature word, extract, from the remaining feature words in the feature word set other than the feature words describing attributes of the target event, at least one feature word describing the specific content of the attribute identified by that feature word, and establish a correspondence between the determined feature word and the at least one extracted feature word; and to obtain the feature of the target event from the at least one group of correspondences obtained.
Optionally, the acquisition device further comprises an establishing module 24, wherein:
The establishing module 24 is configured to establish a mapping relation between the feature of the target event and the at least one group of correspondences obtained.
Specifically, the determination module 22 is configured to perform the following operations on the obtained feature word set until all the feature words in the set that describe attributes of the target event have been determined:
Select any one feature word;
Determine the context of the selected feature word in the original document, and judge, according to the context, whether the feature word is a feature word describing an attribute of the target event;
If the judgment result is that the selected feature word is a feature word describing an attribute of the target event, mark it as such, select the next feature word, and continue to perform the above operations;
If the judgment result is that the selected feature word is not a feature word describing an attribute of the target event, select the next feature word and continue to perform the above operations.
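The select-judge-mark loop above can be sketched as follows. This is a sketch under assumptions: the way a context is retrieved and the judgment itself are passed in as callables, and the names `mark_attribute_words`, `get_context` and `describes_attribute`, as well as the toy contexts, are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def mark_attribute_words(
    feature_words: Iterable[str],
    get_context: Callable[[str], str],
    describes_attribute: Callable[[str, str], bool],
) -> List[str]:
    """Visit every feature word once and keep those judged to describe an attribute."""
    marked: List[str] = []
    for word in feature_words:
        context = get_context(word)           # context in the original document
        if describes_attribute(word, context):
            marked.append(word)               # mark as an attribute feature word
        # otherwise simply move on to the next feature word
    return marked


# Toy stand-ins for the context lookup and the judgment (assumptions only).
contexts: Dict[str, str] = {
    "location": "the location of the fire was Beijing",
    "Beijing": "the location of the fire was Beijing",
    "time": "the time of the fire was midnight",
}
attribute_words = mark_attribute_words(
    ["location", "Beijing", "time"],
    get_context=lambda w: contexts[w],
    describes_attribute=lambda w, c: f"the {w} of" in c,
)
print(attribute_words)
```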
Specifically, the determination module 22 is configured to determine, according to the context and by means of grammatical analysis and syntactic analysis, whether the feature word is the head word of the context;
If the feature word is determined to be the head word of the context, the feature word is determined to be a feature word describing an attribute of the target event;
If the feature word is determined not to be the head word of the context, the feature word is determined not to be a feature word describing an attribute of the target event.
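One possible reading of the head-word test is sketched below. It assumes that a dependency parse of the context has already been produced by some parser and that the head word is the root of that parse. Both the parse representation and the function name `is_head_word` are hypothetical; the embodiment names no particular parser or representation.

```python
from typing import List, Tuple

# Hypothetical parse format: one (token, head_index) pair per token,
# where head_index is the position of the token's syntactic head and -1 marks the root.
Parse = List[Tuple[str, int]]


def is_head_word(word: str, parse: Parse) -> bool:
    """Return True if `word` occurs as the root (head word) of the parsed context."""
    return any(token == word and head == -1 for token, head in parse)


# Hand-written parse of the phrase "location of the fire" (illustrative only):
# "location" is the root, "of" attaches to "location",
# "the" attaches to "fire", and "fire" attaches to "of".
parse: Parse = [("location", -1), ("of", 0), ("the", 3), ("fire", 1)]
print(is_head_word("location", parse), is_head_word("fire", parse))
```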
Specifically, the acquisition device further comprises a judgment module 25, wherein:
The judgment module 25 is configured to, after all the feature words in the feature word set that describe attributes of the target event have been determined, judge whether synonyms exist among those feature words;
When the judgment result is that synonyms exist, one feature word is selected from the multiple attribute-describing feature words that satisfy the synonym condition, and is used as the feature word for the attribute of the target event described by those synonymous feature words.
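The synonym handling can be sketched as follows. It assumes that synonym groups are supplied by some external resource and that the first synonym encountered is kept as the representative. The embodiment only requires that one feature word be selected, so this selection rule and the name `merge_synonyms` are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def merge_synonyms(
    attribute_words: List[str],
    synonym_groups: Iterable[FrozenSet[str]],
) -> List[str]:
    """Keep one representative per synonym group, preserving the input order."""
    group_of: Dict[str, FrozenSet[str]] = {}
    for group in synonym_groups:
        for word in group:
            group_of[word] = group

    seen_groups: Set[FrozenSet[str]] = set()
    result: List[str] = []
    for word in attribute_words:
        group = group_of.get(word)
        if group is None:
            result.append(word)        # no known synonyms: keep as-is
        elif group not in seen_groups:
            seen_groups.add(group)     # first member of this group: keep it
            result.append(word)
        # later members of an already-seen group are dropped
    return result


# Hypothetical synonym group for illustration.
groups = [frozenset({"location", "place", "venue"})]
print(merge_synonyms(["location", "place", "time", "venue"], groups))
```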
Specifically, the extraction module 23 is configured to select one feature word from the remaining feature words in the feature word set other than the feature words describing attributes of the target event;
For one determined feature word describing an attribute of the target event, judge, according to semantic rules, whether the selected feature word is a hyponym of the determined feature word;
If it is a hyponym, determine that the selected feature word is the specific content of the attribute of the target event described by the determined feature word.
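The hyponym-based extraction can be sketched as follows. It assumes the semantic rules are available as a simple word-to-hypernym table; the table, the sample words and the name `extract_contents` are hypothetical, and a real system might consult a thesaurus or an ontology instead.

```python
from typing import Dict, List


def extract_contents(
    attribute_words: List[str],
    remaining_words: List[str],
    hypernym_of: Dict[str, str],
) -> Dict[str, List[str]]:
    """Build attribute -> specific-content correspondences from hyponym relations."""
    correspondences: Dict[str, List[str]] = {attr: [] for attr in attribute_words}
    for word in remaining_words:
        parent = hypernym_of.get(word)       # None when no rule applies
        if parent in correspondences:        # word is a hyponym of an attribute word
            correspondences[parent].append(word)
    return correspondences


# Hypothetical semantic rules for illustration.
hypernyms = {"Beijing": "location", "midnight": "time", "Shanghai": "location"}
result = extract_contents(["location", "time"], ["Beijing", "midnight", "smoke"], hypernyms)
print(result)
```

An attribute whose list stays empty has no extracted content in this sketch; how such attributes are handled is left open here.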
Specifically, the acquisition module 21 is configured to, when mass data is processed, obtain multiple feature words for describing the target event from the mass data by means of cluster analysis;
The multiple feature words obtained are combined into the feature word set for describing the target event.
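The embodiment does not say which cluster analysis is used. Purely as an illustration, the sketch below groups documents with a greedy single-pass clustering on Jaccard similarity and takes the union of the words in one cluster as the feature word set. The algorithm choice, the threshold value and all names are assumptions.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs: List[Set[str]], threshold: float = 0.3) -> List[List[int]]:
    """Greedy clustering: a document joins the first cluster whose first member is similar enough."""
    clusters: List[List[int]] = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if jaccard(doc, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])             # no similar cluster: start a new one
    return clusters


def feature_word_set(docs: List[Set[str]], cluster: List[int]) -> Set[str]:
    """Combine the words of all documents in one cluster into a feature word set."""
    words: Set[str] = set()
    for i in cluster:
        words |= docs[i]
    return words


# Toy documents, already reduced to word sets (illustrative only).
docs = [
    {"fire", "beijing", "location"},
    {"fire", "beijing", "time"},
    {"football", "match", "score"},
]
clusters = cluster_documents(docs)
print(clusters, feature_word_set(docs, clusters[0]))
```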
Optionally, the acquisition device further comprises a comparison module 26, wherein:
The comparison module 26 is configured to, after the feature of the target event has been obtained from the at least one group of correspondences, compare the obtained feature of the target event with a preset feature of the target event;
According to the comparison result, determine which of the attributes describing the target event contained in the obtained feature differ from the attributes describing the target event contained in the preset feature;
Take the determined differing attributes as newly added attributes of the target event.
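The comparison step amounts to finding attributes that are present in the obtained feature but absent from the preset feature. A minimal sketch follows, assuming both features are represented as attribute-to-content dictionaries; that representation and the name `newly_added_attributes` are hypothetical.

```python
from typing import Dict, List


def newly_added_attributes(
    obtained: Dict[str, List[str]],
    preset: Dict[str, List[str]],
) -> List[str]:
    """Return the attributes of the obtained feature that the preset feature lacks."""
    return [attribute for attribute in obtained if attribute not in preset]


# Illustrative toy features.
obtained_feature = {"location": ["Beijing"], "time": ["midnight"], "cause": ["short circuit"]}
preset_feature = {"location": [], "time": []}
print(newly_added_attributes(obtained_feature, preset_feature))
```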
The acquisition device described in the embodiments of the present invention may be implemented in hardware or in software. For the massive number of feature words describing any one event, the feature words describing the attributes of the event and the feature words describing the specific content of those attributes are determined dynamically, and correspondences between them are established. Determining the feature of the target event from the resulting groups of correspondences helps in fully understanding the target event, improves the precision with which personalized information about the target event is obtained, and lays a foundation for quickly locating the target event later.
Fig. 3 is a schematic structural diagram of an acquisition device for an event feature provided by an embodiment of the present invention. The acquisition device has the functions described above and may adopt a general-purpose computer architecture. The acquisition device comprises a processor 31, an interface 32 and a memory 33. The processor 31 is connected to the interface 32 and to the memory 33. For example, the processor 31, the interface 32 and the memory 33 may be coupled by a bus. Wherein:
The processor 31 may be a central processing unit (CPU), or a combination of a CPU and a hardware chip.
The interface 32 may be one or more of the following: a network interface controller (NIC) providing a wired interface, for example an Ethernet NIC, which may provide a copper and/or optical fiber interface; or a NIC providing a wireless interface, for example a wireless local area network (WLAN) NIC.
The memory 33 is configured to store program code. The processor 31 obtains the stored program code from the memory 33 and performs the corresponding processing according to the obtained program code.
The memory 33 may be a volatile memory, for example a random-access memory (RAM); or a non-volatile memory, for example a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above kinds of memory. The memory 33 may also comprise a content-addressable memory (CAM).
Specifically, the processor 31 executes the program stored in the memory 33 to perform the following operations:
Obtain a feature word set for describing a target event, wherein the feature word set contains multiple feature words;
Determine, from the obtained feature word set, at least one feature word describing an attribute of the target event;
For each determined feature word, extract, from the remaining feature words in the feature word set other than the feature words describing attributes of the target event, at least one feature word describing the specific content of the attribute identified by that feature word, and establish a correspondence between the determined feature word and the at least one extracted feature word;
Obtain the feature of the target event from the at least one group of correspondences obtained.
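The four operations above can be strung together as in the following sketch. It is an illustration under assumptions: the two judgments are supplied as callables, and the feature is represented as an attribute-to-content dictionary. The name `acquire_event_feature` and the toy predicates are hypothetical.

```python
from typing import Callable, Dict, List


def acquire_event_feature(
    feature_words: List[str],
    is_attribute: Callable[[str], bool],
    is_hyponym: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Split feature words into attributes and contents, then pair them up."""
    # Determine the feature words that describe attributes of the event.
    attributes = [w for w in feature_words if is_attribute(w)]
    # Everything else is a candidate for attribute content.
    remaining = [w for w in feature_words if w not in attributes]

    feature: Dict[str, List[str]] = {}
    for attribute in attributes:
        contents = [w for w in remaining if is_hyponym(w, attribute)]
        if contents:                          # keep only attributes with extracted content
            feature[attribute] = contents     # one group of correspondences
    return feature


# Toy judgments for illustration only.
known_attributes = {"location", "time"}
hypernym_table = {"Beijing": "location", "midnight": "time"}
feature = acquire_event_feature(
    ["location", "Beijing", "time", "midnight", "smoke"],
    is_attribute=lambda w: w in known_attributes,
    is_hyponym=lambda w, attr: hypernym_table.get(w) == attr,
)
print(feature)
```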
Optionally, the processor 31 is further configured to perform:
Establish a mapping relation between the feature of the target event and the at least one group of correspondences obtained.
Specifically, the determining, by the processor 31, from the obtained multiple feature words, of at least one feature word describing an attribute of the target event comprises:
Performing the following operations on the obtained feature word set until all the feature words in the set that describe attributes of the target event have been determined:
Select any one feature word;
Determine the context of the selected feature word in the original document, and judge, according to the context, whether the feature word is a feature word describing an attribute of the target event;
If the judgment result is that the selected feature word is a feature word describing an attribute of the target event, mark it as such, select the next feature word, and continue to perform the above operations;
If the judgment result is that the selected feature word is not a feature word describing an attribute of the target event, select the next feature word and continue to perform the above operations.
Specifically, the judging, by the processor 31, according to the context, of whether the feature word is a feature word describing an attribute of the target event comprises:
Determining, according to the context and by means of grammatical analysis and syntactic analysis, whether the feature word is the head word of the context;
If the feature word is determined to be the head word of the context, determining that the feature word is a feature word describing an attribute of the target event;
If the feature word is determined not to be the head word of the context, determining that the feature word is not a feature word describing an attribute of the target event.
Specifically, after determining all the feature words in the feature word set that describe attributes of the target event, the processor 31 is further configured to perform:
Judge whether synonyms exist among all the determined feature words in the feature word set that describe attributes of the target event;
When the judgment result is that synonyms exist, select one feature word from the multiple attribute-describing feature words that satisfy the synonym condition, and use it as the feature word for the attribute of the target event described by those synonymous feature words.
Specifically, the extracting, by the processor 31, from the remaining feature words in the feature word set other than the feature words describing attributes of the target event, of at least one feature word describing the specific content of the attribute identified by the determined feature word comprises:
Selecting one feature word from the remaining feature words in the feature word set other than the feature words describing attributes of the target event;
For one determined feature word describing an attribute of the target event, judging, according to semantic rules, whether the selected feature word is a hyponym of the determined feature word;
If it is a hyponym, determining that the selected feature word is the specific content of the attribute of the target event described by the determined feature word.
Specifically, the obtaining, by the processor 31, of the feature word set for describing the target event comprises:
When mass data is processed, obtaining multiple feature words for describing the target event from the mass data by means of cluster analysis;
Combining the multiple feature words obtained into the feature word set for describing the target event.
Specifically, after obtaining the feature of the target event from the at least one group of correspondences, the processor 31 is further configured to perform:
Compare the obtained feature of the target event with a preset feature of the target event;
According to the comparison result, determine which of the attributes describing the target event contained in the obtained feature differ from the attributes describing the target event contained in the preset feature;
Take the determined differing attributes as newly added attributes of the target event.
Those skilled in the art will understand that embodiments of the present invention may be provided as a method, a device (equipment) or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk memory, CD-ROM and optical memory) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (equipment) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these changes and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.