Detailed description of embodiments
The embodiments of the present application are described in further detail below with reference to the accompanying drawings.
In the technical solution provided by the embodiments of the present application, an extension dictionary is built in advance. After the initial labels of a piece of content are generated, extension tags for the content are generated by combining the initial labels with the extension dictionary. This expands the number of labels, so that the labels of the content are richer.
In the embodiments of the present application, a "label" is a word that reflects a feature of the content. The "content" referred to in the embodiments may be a media resource such as a video, music, or a novel. Taking video as an example, the content may include films, TV series, variety shows, sports programs, animation, and so on. The embodiments mainly take video content as an example to introduce and explain the technical solution. The technical solution is equally applicable to other types of content for expanding the number of labels.
Please refer to FIG. 1, which shows a schematic diagram of an implementation environment provided by an embodiment of the present application. The implementation environment may include a terminal 10 and a server 20.
The terminal 10 may be an electronic device such as a mobile phone, a tablet computer, an e-book reader, a multimedia playback device, a wearable device, or a PC (Personal Computer). A browser or an application client may be installed on the terminal 10, and the terminal 10 obtains content from the server 20 through the browser or the application client and displays it.
The server 20 is used to provide content to the terminal 10. For example, the server 20 may be the background server of a website or an application that provides content. The server 20 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center.
The terminal 10 and the server 20 can communicate with each other through a network 30. The network 30 may be a wired network or a wireless network.
In one possible application scenario, the terminal 10 displays the labels of a piece of content together with the content, so that the user can grasp the core points of the content from the labels. In another possible application scenario, the terminal 10 supports a label-based content search function: after the user enters a search keyword on the terminal 10, the terminal 10 provides content whose labels match the search keyword to the user as search results. These are only two typical application scenarios for content labels; other possible application scenarios are not described in detail in the embodiments of the present application.
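The label-based search scenario can be illustrated with a minimal sketch. It assumes that "matching" means an exact match between the search keyword and a label, and the names `build_label_index`, `search_by_label`, and the sample catalog are hypothetical; the embodiments do not prescribe any particular index structure.

```python
from collections import defaultdict


def build_label_index(content_labels):
    """Map each label to the set of content IDs that carry it."""
    index = defaultdict(set)
    for content_id, labels in content_labels.items():
        for label in labels:
            index[label].add(content_id)
    return index


def search_by_label(index, keyword):
    """Return the IDs of contents that have a label equal to the keyword."""
    return sorted(index.get(keyword, set()))


# Hypothetical catalog: content ID -> labels of that content.
catalog = {
    "video-1": ["Marvel", "superhero"],
    "video-2": ["superhero", "comics"],
}
label_index = build_label_index(catalog)
```

With this index, the more labels a piece of content carries, the more search keywords can reach it, which is the motivation for expanding the number of labels.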
Please refer to FIG. 2, which shows a flowchart of a method for generating content labels provided by an embodiment of the present application. The method can be applied to the server 20 of the implementation environment shown in FIG. 1. The method may include the following steps:
Step 201: obtain n initial labels of the target content, where n is a positive integer.
An initial label is a label extracted from information related to the target content. The related information can be any information related to the target content, such as its title, its description information, the target content itself, or comment information. In one example, taking the film "Avengers 3" as the target content, the initial labels may include: Avengers, Marvel, superhero, universe.
The embodiments of the present application do not limit the manner of obtaining the initial labels of the target content; one possible implementation is introduced in an embodiment below.
Step 202: for the i-th initial label among the n initial labels, detect whether the extension dictionary includes a target initial word corresponding to the i-th initial label, where i is a positive integer less than or equal to n.
The i-th initial label can be any one of the n initial labels. The target initial word corresponding to the i-th initial label may be the i-th initial label itself or a synonym of the i-th initial label.
In the embodiments of the present application, the extension dictionary is built in advance. The extension dictionary includes at least one group of correspondences between initial words and expansion words. One initial word can correspond to one or more expansion words, and one expansion word can correspond to one or more initial words. For any corresponding pair of initial word and expansion word, the expansion word is a word that is strongly correlated with the initial word.
Optionally, the extension dictionary includes an upper dictionary and/or a representative dictionary.
The upper dictionary includes at least one group of correspondences between initial words and hypernyms. One initial word can correspond to one or more hypernyms, and one hypernym can correspond to one or more initial words. For any corresponding pair of initial word and hypernym, the hypernym is a topic word whose concept has a broader extension than that of the initial word. For example, "flower" is a hypernym of "fresh flower", "plant" is a hypernym of "flower", and "music" is a hypernym of "mp3". Any attribute of the concept expressed by an initial word, or any way of classifying that concept, can serve as a hypernym of the initial word. For example, the hypernyms of "fresh flower express delivery" can be "fresh flower", "express delivery", "online shopping", "fresh flower ceremony", "flower shop", "gift company", and so on; as another example, the hypernyms of "Wang Fei" can be "singer", "woman", "mother", "daughter", "Hong Kong", "Leo", and so on.
Referring to FIG. 3, a schematic diagram of the relationship between initial words and hypernyms is illustrated. The hypernym of "chrysanthemum" and "peony" is "flower", the hypernym of "apple tree" and "peach tree" is "tree", and the hypernym of "flower" and "tree" is "plant".
In the upper dictionary, each corresponding pair of initial word and hypernym can be stored in the following format: {key: "initial word"; relation: "hypernym"; value: "hypernym"}. For example, {key: "James"; relation: "hypernym"; value: "NBA player"}.
The representative dictionary includes at least one group of correspondences between initial words and representative words. One initial word can correspond to one or more representative words, and one representative word can correspond to one or more initial words. For any corresponding pair of initial word and representative word, the representative word is a word that can represent the initial word. For example, as shown in FIG. 4, the representative words of "Avengers" may include "Robert Downey Jr.", "Captain America", and so on; as another example, the representative words of "Iron Man" may include "Robert Downey Jr.", "Shane Black", and so on.
In the representative dictionary, each corresponding pair of initial word and representative word can be stored in the following format: {key: "initial word"; relation: "expand"; value: "representative word"}. For example, {key: "Avengers"; relation: "expand"; value: "Robert Downey Jr."}.
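The {key, relation, value} storage format described above can be sketched as a list of records plus a lookup index. This is only an illustration under stated assumptions: the record list `RECORDS`, the functions `build_index` and `lookup`, and the choice of an in-memory dictionary are hypothetical and not part of the embodiments.

```python
from collections import defaultdict

# Sample records in the {key, relation, value} format.
# relation "hypernym" -> upper dictionary entry,
# relation "expand"   -> representative dictionary entry.
RECORDS = [
    {"key": "James", "relation": "hypernym", "value": "NBA player"},
    {"key": "Avengers", "relation": "expand", "value": "Robert Downey Jr."},
    {"key": "Avengers", "relation": "expand", "value": "Captain America"},
    {"key": "Iron Man", "relation": "expand", "value": "Robert Downey Jr."},
]


def build_index(records):
    """Group values by (initial word, relation) so lookups are direct."""
    index = defaultdict(list)
    for record in records:
        index[(record["key"], record["relation"])].append(record["value"])
    return index


def lookup(index, initial_word, relation):
    """Return all expansion words stored for an initial word and relation."""
    return index.get((initial_word, relation), [])


dictionary_index = build_index(RECORDS)
```

Keeping one record per pair makes the many-to-many relationship explicit: "Avengers" has two representative words, and "Robert Downey Jr." represents two initial words.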
Step 203: if the extension dictionary includes the target initial word, obtain the target expansion word corresponding to the target initial word from the extension dictionary, and determine the target expansion word as an extension tag of the target content.
Optionally, if the upper dictionary includes the target initial word, the target hypernym corresponding to the target initial word is obtained from the upper dictionary and determined as an extension tag of the target content; if the representative dictionary includes the target initial word, the target representative word corresponding to the target initial word is obtained from the representative dictionary and determined as an extension tag of the target content.
Optionally, the server performs steps 202 and 203 for each of the n initial labels, so as to obtain the extension tags corresponding to each initial label.
In addition, if the extension dictionary does not include a target initial word corresponding to the i-th initial label, the server may not obtain an extension tag for the i-th initial label.
Step 204: generate a label set of the target content, where the label set includes the initial labels and the extension tags.
After the server obtains the extension tags of the target content, it merges the initial labels and the extension tags of the target content to obtain the label set of the target content. In one example, still taking the film "Avengers 3" as the target content, the initial labels may include: Avengers, Marvel, superhero, universe. Assuming the server finds that the representative words of "Avengers" include "Robert Downey Jr." and "Captain America", and that the hypernyms of "Marvel" include "comics", the label set of the film "Avengers 3" may include the following labels: Avengers, Marvel, superhero, universe, Robert Downey Jr., Captain America, comics.
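Steps 201 to 204 can be sketched as follows. The sketch assumes that each dictionary is a plain mapping from an initial word to a list of expansion words, that synonyms are supplied as an optional mapping, and that duplicate labels are dropped; the function name `generate_label_set` and these choices are hypothetical illustrations, not the definitive implementation.

```python
def generate_label_set(initial_labels, upper_dict, representative_dict, synonyms=None):
    """Merge initial labels with the extension tags found for them."""
    synonyms = synonyms or {}
    label_set = list(initial_labels)  # step 204 keeps every initial label
    for label in initial_labels:  # steps 202 and 203, once per initial label
        # The target initial word may be the label itself or one of its synonyms.
        target_words = [label] + synonyms.get(label, [])
        for word in target_words:
            for dictionary in (upper_dict, representative_dict):
                # A missing word yields no extension tag for this label.
                for expansion in dictionary.get(word, []):
                    if expansion not in label_set:
                        label_set.append(expansion)
    return label_set


# Sample data mirroring the film example.
example_initial = ["Avengers", "Marvel", "superhero", "universe"]
example_upper = {"Marvel": ["comics"]}
example_representative = {"Avengers": ["Robert Downey Jr.", "Captain America"]}
example_result = generate_label_set(example_initial, example_upper, example_representative)
```

Labels such as "superhero" that have no entry in either dictionary simply contribute no extension tag, which matches the behavior described for a missing target initial word.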
Optionally, the server provides the label set of the target content to reviewers, who select suitable labels from it as the final labels of the target content.
In summary, in the technical solution provided by the embodiments of the present application, an extension dictionary is built; after the initial labels of the content are generated, extension tags for the content are generated by combining the initial labels with the extension dictionary. This expands the number of labels, so that the labels of the content are richer.
In an optional embodiment based on the embodiment of FIG. 2, the extension dictionary is generated as follows:
1, obtain an entity dictionary;
The entity dictionary includes at least one entity word. An entity word is a word that denotes a person or thing, and is usually a noun. Optionally, entity words are crawled from encyclopedia websites by web crawler technology to build the entity dictionary. An encyclopedia website is a website that provides overviews of many different fields, such as art, science, nature, culture, geography, life, society, people, economy, sports, and history. Encyclopedia websites classify and define the people and things of each field fairly authoritatively, so crawling entity words from them is feasible and yields more accurate and reliable results.
2, filter out entity words that meet a preset condition from the entity dictionary as initial words, to obtain an initial dictionary;
Optionally, an entity word that meets the preset condition is one that has a distinct representational meaning and is not polysemous. The process of filtering the initial dictionary out of the entity dictionary can be carried out by manual screening.
3, generate corresponding expansion words for each initial word in the initial dictionary, to obtain the extension dictionary.
As described above, expansion words include hypernyms and/or representative words. When expansion words include both hypernyms and representative words, the correspondences between initial words and expansion words can be recorded in a single dictionary (referred to as the extension dictionary), or in two dictionaries (the upper dictionary and the representative dictionary) that respectively record the correspondences between initial words and hypernyms and the correspondences between initial words and representative words.
Optionally, methods for generating hypernyms include but are not limited to the following:
1, prefix/suffix method
The prefix or suffix of an initial word is taken as a hypernym of that initial word. For example, the Chinese words for "peony" (牡丹花) and "chrysanthemum" (菊花) both have the suffix "flower" (花), so "flower" can be used as a hypernym of "peony" and "chrysanthemum".
2, co-occurrence word method
A co-occurrence word of an initial word is taken as a hypernym of that initial word. For example, if the co-occurrence words of "James" include "NBA", then "NBA" can be used as a hypernym of "James". A co-occurrence word of an initial word is a word whose frequency of appearing together with the initial word is higher than a preset threshold. Optionally, a corpus containing the initial word is collected and analyzed, and the co-occurrence words of the initial word are extracted from it.
3, rule template method
A rule template is used to extract hypernyms of an initial word from sentences that contain the initial word and match a specific sentence pattern. For example, given the sentence "James is an NBA player", "James" is likely to have the hypernym "NBA player". Sentence patterns of this kind are searched for in an Internet corpus to obtain the hypernyms of some initial words.
The methods for generating hypernyms described above are only exemplary and explanatory; the embodiments of the present application do not exclude the use of other methods for generating hypernyms. In addition, hypernyms can be generated with a single method or with a combination of methods. For example, for a given initial word, several different methods are each used to generate hypernyms of the initial word, the generated hypernyms are merged, and the hypernyms whose number of occurrences is greater than a threshold are finally determined as the hypernyms of the initial word.
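The three hypernym generation methods and their combination can be sketched as below. Everything here is an assumption made for illustration: the function names, the list of known suffixes, substring matching in place of real tokenization, the English "X is a/an Y" pattern standing in for a rule template, and the thresholds. The suffix example uses the original Chinese strings because suffix matching depends on the language of the words.

```python
import re
from collections import Counter


def suffix_hypernyms(word, known_suffixes):
    """Prefix/suffix method (suffix case): a known suffix of the word is a hypernym."""
    return [s for s in known_suffixes if word != s and word.endswith(s)]


def cooccurrence_hypernyms(word, sentences, vocabulary, min_count):
    """Co-occurrence method: words seen with `word` in more than min_count sentences."""
    counts = Counter()
    for sentence in sentences:
        if word in sentence:  # substring test; a real system would tokenize
            for other in vocabulary:
                if other != word and other in sentence:
                    counts[other] += 1
    return [w for w, c in counts.items() if c > min_count]


# One hypothetical rule template: "<word> is a/an <hypernym>".
IS_A_PATTERN = re.compile(r"^(?P<word>.+?) is an? (?P<hypernym>.+?)\.?$")


def template_hypernyms(word, sentences):
    """Rule template method: extract hypernyms from sentences matching the pattern."""
    found = []
    for sentence in sentences:
        match = IS_A_PATTERN.match(sentence.strip())
        if match and match.group("word") == word:
            found.append(match.group("hypernym"))
    return found


def vote(candidate_lists, min_votes):
    """Combine methods: keep hypernyms proposed by more than min_votes methods."""
    counts = Counter()
    for candidates in candidate_lists:
        counts.update(set(candidates))  # each method votes once per hypernym
    return sorted(w for w, c in counts.items() if c > min_votes)
```

For instance, `suffix_hypernyms("牡丹花", ["花", "树"])` keeps only the suffix 花 ("flower"), and `vote` retains a hypernym only when enough independent methods agree on it, which trades recall for precision.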
Optionally, representative words can be generated with a rule-based method. For example, generation rules for representative words are built on the representative relationship between a team and its captain, between a film and its leading actors, and between a variety show and its host. Based on these rules, information such as team captains, film leading actors, and variety show hosts is crawled from relevant websites by web crawler technology to obtain the representative words of initial words.
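The rule-based generation of representative words can be sketched as a mapping from entity type to the attribute that represents it. The sketch assumes the crawled information is already available as simple records; the rule table `REPRESENTATIVE_RULES`, the field names, and the sample entities are hypothetical, and the crawling itself is not shown.

```python
# Hypothetical generation rules: entity type -> attribute holding its representatives.
REPRESENTATIVE_RULES = {
    "team": "captain",
    "film": "lead_actors",
    "variety_show": "host",
}


def representative_records(entities):
    """Turn crawled entity records into {key, relation, value} dictionary entries."""
    records = []
    for entity in entities:
        field = REPRESENTATIVE_RULES.get(entity["type"])
        if field is None:
            continue  # no rule covers this entity type
        values = entity.get(field, [])
        if isinstance(values, str):
            values = [values]  # allow a single name as well as a list
        for value in values:
            records.append({"key": entity["name"], "relation": "expand", "value": value})
    return records


# Placeholder input standing in for information crawled from relevant websites.
sample_entities = [
    {"name": "Avengers", "type": "film", "lead_actors": ["Robert Downey Jr."]},
    {"name": "Team X", "type": "team", "captain": "Captain X"},
    {"name": "Book Y", "type": "novel"},
]
sample_records = representative_records(sample_entities)
```

Adding a new kind of representative relationship then only requires a new entry in the rule table.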
Referring to FIG. 5, a flowchart of the process of building the extension dictionary is illustrated. First, entity words are crawled from encyclopedia websites by web crawler technology to build an entity dictionary 51; then, entity words that meet the preset condition are filtered out of the entity dictionary as initial words to obtain an initial dictionary 52; finally, hypernym generation and representative word generation are performed respectively to obtain an upper dictionary 53 and a representative dictionary 54.
In summary, in the technical solution provided by the embodiments of the present application, an entity dictionary is obtained, entity words that meet the preset condition are filtered out of the entity dictionary to build an initial dictionary, and the extension dictionary is then generated based on hypernym generation rules and/or representative word generation rules. This builds a knowledge graph for label extension and provides data support for label extension.
In another optional embodiment based on the embodiment of FIG. 2 or the above optional embodiment, the n initial labels of the target content are obtained as follows:
1, obtain the description information of the target content;
The description information includes information that introduces and explains the target content. Optionally, the description information of the target content is crawled from relevant websites by web crawler technology. Taking a film as an example, the description information of the film, such as its plot synopsis, can be crawled from encyclopedia websites or film and television websites.
2, perform word segmentation on the description information to generate candidate words;
The embodiments of the present application do not limit the algorithm used for word segmentation. For example, for Chinese, the open-source word segmentation software jieba can be used.
Optionally, this step includes the following sub-steps:
(1), perform word segmentation on the description information to obtain at least two words;
(2), select words of a target part of speech from the at least two words as candidate words.
Because the candidate words to be extracted are descriptive words that can reflect the features of the content, after word segmentation some words can be selected as candidate words according to their part of speech. For example, the target part of speech includes at least one of the following: noun, adjective, verb. Words of non-target parts of speech are screened out and not used as candidate words.
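The part-of-speech filtering in sub-step (2) can be sketched as follows. The sketch assumes the segmenter has already produced (word, part-of-speech) pairs; the tag names "noun", "adjective", and "verb", the function `select_candidates`, and the removal of repeated words are hypothetical choices, and a real segmenter would use its own tag set.

```python
# Assumed target parts of speech, named for readability.
TARGET_POS = {"noun", "adjective", "verb"}


def select_candidates(tagged_words, target_pos=TARGET_POS):
    """Keep words whose part of speech is a target one, dropping repeats."""
    seen = set()
    candidates = []
    for word, pos in tagged_words:
        if pos in target_pos and word not in seen:
            seen.add(word)
            candidates.append(word)
    return candidates


# Placeholder segmenter output for a short plot synopsis.
tagged = [
    ("heroes", "noun"),
    ("the", "determiner"),
    ("rescue", "verb"),
    ("universe", "noun"),
    ("of", "preposition"),
    ("rescue", "verb"),
]
candidate_words = select_candidates(tagged)
```

Function words such as determiners and prepositions are removed at this stage, so the later clustering step only sees words that can describe the content.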
3, perform clustering on the candidate words to obtain at least one class, each class including at least one candidate word;
The candidate words obtained after word segmentation are not associated with one another. In the embodiments of the present application, clustering is performed on the candidate words according to the semantic similarity between them, to obtain at least one class. Candidate words belonging to the same class have the same or similar semantics.
Optionally, this step includes the following sub-steps:
(1), extract the word vector of each candidate word;
(2), calculate the similarity between every two candidate words according to their word vectors;
(3), perform clustering on the candidate words according to the similarity between every two candidate words, to obtain at least one class.
In the embodiments of the present application, the similarity between two candidate words can be obtained by calculating the similarity between their word vectors. That is, the problem of judging whether two candidate words are semantically similar is converted into calculating the similarity of word vectors. Optionally, the open-source word2vec tool is used to train word vectors for the candidate words; as a result of training, each candidate word is represented as a k-dimensional vector, where k is a positive integer. After the word vector of each candidate word is extracted, candidate words with similar word vectors need to be gathered into one class by clustering. The reason is that different words may express the same or similar meanings, so candidate words with the same or similar semantics need to be clustered together. The embodiments of the present application do not limit the clustering algorithm used; for example, the K-Means algorithm can be used.
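The similarity calculation and clustering sub-steps can be sketched as below. Two assumptions should be noted: cosine similarity is used as the similarity measure, and a simple greedy threshold clustering is used in place of K-Means to keep the example short and deterministic. The toy two-dimensional vectors, the threshold, and the function names are hypothetical; real word vectors would come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_similarity(vectors, threshold):
    """Greedy clustering: a word joins the first class whose first member is similar enough."""
    classes = []
    for word, vector in vectors.items():
        for members in classes:
            if cosine_similarity(vector, vectors[members[0]]) >= threshold:
                members.append(word)
                break
        else:
            classes.append([word])  # no similar class found, start a new one
    return classes


# Toy word vectors (k = 2) standing in for trained word vectors.
toy_vectors = {
    "rescue": [1.0, 0.1],
    "save": [0.9, 0.2],
    "escape": [0.8, 0.3],
    "universe": [0.0, 1.0],
}
toy_classes = cluster_by_similarity(toy_vectors, threshold=0.9)
```

Unlike K-Means, this greedy variant does not need the number of classes in advance, but its result depends on the order of the words and on the chosen threshold.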
4, obtain the topic word of each class as an initial label of the target content.
After the candidate words are clustered, the topic word of each class is obtained; a topic word is a word used to represent the candidate words in the class. In one example, topic words are annotated manually for each class. In another example, one candidate word is selected from the candidate words included in each class as the topic word of that class; for example, the first candidate word in the class or a randomly selected candidate word can be chosen as the topic word.
For example, if a class includes the following candidate words: rescue back, send back, rescue, escape, save, then "save" can be used as the topic word of the class.
After the server obtains the topic word of each class, it determines each obtained topic word as an initial label of the target content.
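Selecting a topic word for each class can be sketched as follows. The function `topic_words` and its `pick` parameter are hypothetical; by default the first candidate word of each class is chosen, and any other selection strategy, such as a random choice, can be passed in.

```python
import random


def topic_words(classes, pick=None):
    """Return one topic word per non-empty class, using `pick` to choose it."""
    if pick is None:
        pick = lambda members: members[0]  # default: first candidate word in the class
    return [pick(members) for members in classes if members]


# Classes as produced by a clustering step.
sample_classes = [["rescue", "save", "escape"], ["universe"]]

first_word_labels = topic_words(sample_classes)

# Random selection, seeded here only so repeated runs give the same result.
rng = random.Random(0)
random_labels = topic_words(sample_classes, pick=rng.choice)
```

The returned topic words are then used directly as the initial labels of the target content.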
In summary, the technical solution provided by the embodiments of the present application provides a way to extract the initial labels of content from the description information of the content. Of course, the initial labels of content can also be extracted in other ways. Taking video as an example, initial labels can be extracted from the title of the video; the text corresponding to the speech in the video can be recognized by speech recognition technology and initial labels extracted from that text; or initial labels can be extracted from the video content based on deep learning technology, and so on. The embodiments of the present application do not specifically limit the way in which the initial labels of content are extracted.
The following are device embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the device embodiments, please refer to the method embodiments of the present application.
Please refer to FIG. 6, which shows a block diagram of a device for generating content labels provided by an embodiment of the present application. The device has the function of implementing the above method examples; the function can be implemented by hardware, or by hardware executing corresponding software. The device 600 may include: a label acquisition module 610, a detection module 620, a tag extension module 630, and a tag generation module 640.
The label acquisition module 610 is configured to obtain n initial labels of the target content, where n is a positive integer.
The detection module 620 is configured to detect, for the i-th initial label among the n initial labels, whether the extension dictionary includes a target initial word corresponding to the i-th initial label; the extension dictionary includes at least one group of correspondences between initial words and expansion words, and i is a positive integer less than or equal to n.
The tag extension module 630 is configured to, when the extension dictionary includes the target initial word, obtain the target expansion word corresponding to the target initial word from the extension dictionary, and determine the target expansion word as an extension tag of the target content.
The tag generation module 640 is configured to generate a label set of the target content, where the label set includes the initial labels and the extension tags.
In summary, in the technical solution provided by the embodiments of the present application, an extension dictionary is built; after the initial labels of the content are generated, extension tags for the content are generated by combining the initial labels with the extension dictionary. This expands the number of labels, so that the labels of the content are richer.
In an optional embodiment based on the embodiment of FIG. 6, the extension dictionary includes an upper dictionary, and the upper dictionary includes at least one group of correspondences between initial words and hypernyms.
Correspondingly, as shown in FIG. 7, the tag extension module 630 includes a hypernym extension unit 630a.
The hypernym extension unit 630a is configured to, when the upper dictionary includes the target initial word, obtain the target hypernym corresponding to the target initial word from the upper dictionary, and determine the target hypernym as an extension tag of the target content.
In another optional embodiment based on the embodiment of FIG. 6 or the above optional embodiment, the extension dictionary includes a representative dictionary, and the representative dictionary includes at least one group of correspondences between initial words and representative words.
Correspondingly, as shown in FIG. 7, the tag extension module 630 includes a representative word extension unit 630b.
The representative word extension unit 630b is configured to, when the representative dictionary includes the target initial word, obtain the target representative word corresponding to the target initial word from the representative dictionary, and determine the target representative word as an extension tag of the target content.
In another optional embodiment based on the embodiment of FIG. 6 or the above optional embodiment, as shown in FIG. 7, the device 600 further includes: a dictionary acquisition module 650, a screening module 660, and a dictionary generation module 670.
The dictionary acquisition module 650 is configured to obtain an entity dictionary, where the entity dictionary includes at least one entity word.
The screening module 660 is configured to filter out entity words that meet a preset condition from the entity dictionary as the initial words, to obtain an initial dictionary.
The dictionary generation module 670 is configured to generate corresponding expansion words for each initial word in the initial dictionary, to obtain the extension dictionary.
In another optional embodiment based on the embodiment of FIG. 6 or the above optional embodiment, the label acquisition module 610 includes: an information acquisition unit, a word segmentation unit, a clustering unit, and a label acquisition unit (not shown).
The information acquisition unit is configured to obtain the description information of the target content, where the description information includes information that introduces and explains the target content.
The word segmentation unit is configured to perform word segmentation on the description information to generate candidate words.
The clustering unit is configured to perform clustering on the candidate words to obtain at least one class, each class including at least one candidate word.
The label acquisition unit is configured to obtain the topic word of each class as an initial label of the target content.
Optionally, the clustering unit is configured to: extract the word vector of each candidate word; calculate the similarity between every two candidate words according to their word vectors; and perform clustering on the candidate words according to the similarity between every two candidate words, to obtain the at least one class.
It should be noted that, when the device provided by the above embodiments implements its functions, the division into the above functional modules is used only as an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Please refer to FIG. 8, which shows a structural block diagram of a computer device provided by an embodiment of the present application. The computer device can be used to implement the method for generating content labels provided in the above embodiments. The computer device may be a PC, a server, or another device with data processing and storage capabilities. Specifically:
The computer device 800 includes a central processing unit (CPU) 801, a system memory 804 including a random access memory (RAM) 802 and a read-only memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The computer device 800 further includes a basic input/output system (I/O system) 806 that helps transfer information between the devices in the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and an input device 809, such as a mouse or a keyboard, for the user to input information. The display 808 and the input device 809 are both connected to the central processing unit 801 through an input/output controller 810 connected to the system bus 805. The basic input/output system 806 may also include the input/output controller 810 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 810 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable medium provide non-volatile storage for the computer device 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, and tape cassettes, magnetic tape, disk storage or other magnetic storage devices. Of course, those skilled in the art will know that computer storage media are not limited to the above. The system memory 804 and the mass storage device 807 may be collectively referred to as the memory.
According to various embodiments of the present application, the computer device 800 may also be run by a remote computer connected to a network such as the Internet. That is, the computer device 800 can be connected to a network 812 through a network interface unit 811 connected to the system bus 805; in other words, the network interface unit 811 can also be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, which are stored in the memory and configured to be executed by one or more processors. The one or more programs include instructions for performing the above method for generating content labels.
In an exemplary embodiment, a computer device is also provided. The computer device includes a processor and a memory, and the memory stores at least one instruction, at least one program, a code set, or an instruction set. The at least one instruction, the at least one program, the code set, or the instruction set is configured to be executed by one or more processors to implement the above method for generating content labels.
In an exemplary embodiment, a computer-readable storage medium is also provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set implements the above method for generating content labels when executed by a processor of a computer device.
Optionally, the computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided. When the computer program product is executed, it is used to implement the above method for generating content labels.
It should be understood that "multiple" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The above are only exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.