CN109213988A - Barrage subject distillation method, medium, equipment and system based on N-gram model - Google Patents

Barrage subject distillation method, medium, equipment and system based on N-gram model

Info

Publication number
CN109213988A
CN109213988A · CN201710514238.XA · CN201710514238A · CN 109213988 A
Authority
CN
China
Prior art keywords
word
barrage
gram model
indicates
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710514238.XA
Other languages
Chinese (zh)
Other versions
CN109213988B (en)
Inventor
龚灿
陈少杰
张文明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201710514238.XA
Publication of CN109213988A/en
Application granted
Publication of CN109213988B/en
Legal status: Active

Abstract

The invention discloses a barrage (bullet-screen comment) topic extraction method, medium, device and system based on an N-gram model, and relates to the field of live streaming. The method comprises the following steps: extracting barrage data; extracting the features corresponding to words that express a specific intent and adding them to a custom dictionary; adding words without practical meaning to a custom stop-word dictionary; data preprocessing: removing records whose "barrage content" field is empty and removing punctuation marks from the "barrage content" field; representing the preprocessed barrage content with an N-gram model, in which the probability that a given word occurs in a sentence depends on the N-1 preceding words, N being a positive integer; and cutting each barrage into a group of word vectors, segmenting each barrage according to the word-formation rules in the custom dictionary and filtering useless words according to the custom stop-word dictionary. The present invention can accurately extract barrage topics.

Description

Barrage subject distillation method, medium, equipment and system based on N-gram model
Technical field
The present invention relates to the field of live streaming, and in particular to a barrage topic extraction method, medium, device and system based on an N-gram model.
Background technique
The main textual content of a live-streaming platform normally takes the form of barrages (bullet-screen comments). In order to analyze barrage content statistically, the barrage text of the live-streaming platform needs to be extracted. At present, most traditional barrage text extraction schemes in the live-streaming industry rely on manual labeling, which consumes considerable human and material resources; when facing an entire platform's users, streamers and hundreds of millions of barrages, manual processing is obviously inefficient. Moreover, existing barrage text is represented purely with a bag-of-words model, which ignores the relationship between individual words and their context and makes barrage topic extraction inaccurate.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the above background art by providing a barrage topic extraction method, medium, device and system based on an N-gram model that can accurately extract barrage topics.
The present invention provides a barrage topic extraction method based on an N-gram model, comprising the following steps:
Data preparation: extracting barrage data;
Constructing barrage features: extracting the features corresponding to words that express a specific intent and adding them to a custom dictionary; adding words without practical meaning to a custom stop-word dictionary;
Data preprocessing: removing records whose "barrage content" field is empty; removing punctuation marks from the "barrage content" field;
Representing barrage content as word vectors with the N-gram model: representing the preprocessed barrage content with an N-gram model, in which the probability that a given word occurs in a sentence depends on the N-1 preceding words, N being a positive integer; cutting each barrage into a group of word vectors, segmenting each barrage according to the word-formation rules in the custom dictionary, and filtering useless words according to the custom stop-word dictionary.
On the basis of the above technical solution, the value of N is 2, i.e., each word is related only to the word immediately before it.
On the basis of the above technical solution, in the N-gram model the probability that a sentence occurs is: p = p(w1) · ∏_{i=2}^{m} p(wi wi-1), where p denotes the probability that the sentence occurs, w1 denotes the word in the 1st position of the sentence, ∏ denotes the product symbol, m denotes the number of words in the sentence, wi denotes the i-th word (m and i are positive integers), and p(wi wi-1) denotes the probability that the words in the i-th and (i-1)-th positions occur together. Existing phrases are recombined according to this formula, and adjacent pairs of words are merged to generate new phrases.
On the basis of the above technical solution, the dimension of the word vector is 600,000, i.e., each barrage content is represented as a 600,000-dimensional vector in which each position corresponds to one word, yielding the final barrage topic representation.
The present invention also provides a storage medium on which a computer program is stored, the computer program implementing the above method when executed by a processor.
The present invention also provides an electronic device comprising a memory and a processor, the memory storing a computer program that runs on the processor, the processor implementing the above method when executing the computer program.
The present invention also provides a barrage topic extraction system based on an N-gram model, the system comprising a data preparation unit, a barrage feature construction unit, a data preprocessing unit, an N-gram model representation unit and a cutting unit, wherein:
The data preparation unit is used for extracting barrage data;
The barrage feature construction unit is used for extracting the features corresponding to words that express a specific intent and adding them to a custom dictionary, and adding words without practical meaning to a custom stop-word dictionary;
The data preprocessing unit is used for removing records whose "barrage content" field is empty and removing punctuation marks from the "barrage content" field;
The N-gram model representation unit is used for representing the preprocessed barrage content with an N-gram model, in which the probability that a given word occurs in a sentence depends on the N-1 preceding words, N being a positive integer;
The cutting unit is used for cutting each barrage into a group of word vectors, segmenting each barrage according to the word-formation rules in the custom dictionary, and filtering useless words according to the custom stop-word dictionary.
On the basis of the above technical solution, the value of N is 2, i.e., each word is related only to the word immediately before it.
On the basis of the above technical solution, in the N-gram model the probability that a sentence occurs is: p = p(w1) · ∏_{i=2}^{m} p(wi wi-1), where p denotes the probability that the sentence occurs, w1 denotes the word in the 1st position of the sentence, ∏ denotes the product symbol, m denotes the number of words in the sentence, wi denotes the i-th word (m and i are positive integers), and p(wi wi-1) denotes the probability that the words in the i-th and (i-1)-th positions occur together. Existing phrases are recombined according to this formula, and adjacent pairs of words are merged to generate new phrases.
On the basis of the above technical solution, the dimension of the word vector is 600,000, i.e., each barrage content is represented as a 600,000-dimensional vector in which each position corresponds to one word, yielding the final barrage topic representation.
Compared with the prior art, the advantages of the present invention are as follows:
(1) The present invention extracts barrage data; extracts the features corresponding to words that express a specific intent and adds them to a custom dictionary; adds words without practical meaning to a custom stop-word dictionary; performs data preprocessing by removing records whose "barrage content" field is empty and removing punctuation marks from the "barrage content" field; represents the preprocessed barrage content with an N-gram model, in which the probability that a given word occurs in a sentence depends on the N-1 preceding words, N being a positive integer; and cuts each barrage into a group of word vectors, segmenting each barrage according to the word-formation rules in the custom dictionary and filtering useless words according to the custom stop-word dictionary. The present invention represents barrage content as word vectors with the N-gram model; this representation overcomes the disadvantage that the existing bag-of-words model ignores context, makes the barrage representation more accurate, and therefore allows barrage topics to be extracted accurately.
(2) In the N-gram model of the present invention, the value of N is 2, i.e., each word is related only to the word immediately before it. The present invention improves on the original 2-gram model and can reduce computational complexity.
(3) The present invention fuses the N-gram model with a manually constructed feature representation; it can accurately extract the main information of a single barrage or a single room, and realizes a vectorized representation of barrage topics.
(4) With the continuous development of live-streaming platform services, platforms have accumulated not only many active users but also large amounts of textual data. By mining the text content of a live-streaming platform in depth, the similarity between users and rooms and between rooms themselves can be learned, improving the effectiveness of the platform's personalized recommendations.
Detailed description of the invention
Fig. 1 is the flow chart of the barrage subject distillation method in the embodiment of the present invention based on N-gram model.
Fig. 2 is the structural block diagram of electronic equipment in the embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, an embodiment of the present invention provides a barrage topic extraction method based on an N-gram model, comprising the following steps:
S1, data preparation: extracting barrage data;
S2, constructing barrage features: extracting the features corresponding to words that express a specific intent and adding them to a custom dictionary; adding words without practical meaning to a custom stop-word dictionary;
S3, data preprocessing: removing records whose "barrage content" field is empty; removing punctuation marks from the "barrage content" field;
S4, representing barrage content as word vectors with the N-gram model: representing the preprocessed barrage content with an N-gram model, in which the probability that a given word occurs in a sentence depends on the N-1 preceding words, N being a positive integer; cutting each barrage into a group of word vectors, segmenting each barrage according to the word-formation rules in the custom dictionary, and filtering useless words according to the custom stop-word dictionary.
An embodiment of the present invention also provides a storage medium on which a computer program is stored, the computer program implementing the above barrage topic extraction method based on the N-gram model when executed by a processor. It should be noted that the storage medium includes USB flash disks, removable hard disks, ROM (Read-Only Memory), RAM (Random Access Memory), magnetic disks, optical disks and other media that can store program code.
Referring to Fig. 2, an embodiment of the present invention also provides an electronic device comprising a memory and a processor, the memory storing a computer program that runs on the processor, the processor implementing the above barrage topic extraction method based on the N-gram model when executing the computer program.
An embodiment of the present invention also provides a barrage topic extraction system based on an N-gram model, the system comprising a data preparation unit, a barrage feature construction unit, a data preprocessing unit, an N-gram model representation unit and a cutting unit, wherein:
The data preparation unit is used for extracting barrage data;
The barrage feature construction unit is used for extracting the features corresponding to words that express a specific intent and adding them to a custom dictionary, and adding words without practical meaning to a custom stop-word dictionary;
The data preprocessing unit is used for removing records whose "barrage content" field is empty and removing punctuation marks from the "barrage content" field;
The N-gram model representation unit is used for representing the preprocessed barrage content with an N-gram model, in which the probability that a given word occurs in a sentence depends on the N-1 preceding words, N being a positive integer;
The cutting unit is used for cutting each barrage into a group of word vectors, segmenting each barrage according to the word-formation rules in the custom dictionary, and filtering useless words according to the custom stop-word dictionary.
The N-gram model is a language model commonly used in large-vocabulary continuous speech recognition; applied to Chinese, it is called the Chinese Language Model (CLM). A Chinese language model uses the collocation information between adjacent words in the context: when a phonetic string without spaces (or a stroke sequence, or a string of digits representing letters or strokes) needs to be converted into a Chinese character string (i.e., a sentence), the sentence with the maximum probability can be computed, thereby achieving automatic conversion to Chinese characters without manual selection by the user and avoiding the homophone problem in which many Chinese characters share the same phonetic (or stroke or numeric) string.
The N-gram model is based on the assumption that the occurrence of a word is related only to the N-1 preceding words and is unrelated to all other words, so that the probability of a whole sentence is the product of the occurrence probabilities of its words. These probabilities can be obtained by directly counting, in a corpus, the number of times each sequence of N words occurs together.
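As a concrete illustration of this counting approach (a sketch, not code from the patent), the joint bigram probabilities p(wi wi-1) can be estimated from a tokenized corpus by counting adjacent word pairs; the sample corpus below is invented for the example:

```python
from collections import Counter

def bigram_probabilities(corpus):
    """Estimate joint bigram probabilities p(w_{i-1}, w_i) by counting
    adjacent word pairs in a tokenized corpus (a list of token lists)."""
    pair_counts = Counter()
    total_pairs = 0
    for sentence in corpus:
        for prev, cur in zip(sentence, sentence[1:]):
            pair_counts[(prev, cur)] += 1
            total_pairs += 1
    # Relative frequency of each adjacent pair over all observed pairs.
    return {pair: n / total_pairs for pair, n in pair_counts.items()}

corpus = [["主播", "裸狼", "玩", "得好"],
          ["主播", "裸狼", "厉害"]]
probs = bigram_probabilities(corpus)  # ("主播", "裸狼") occurs in 2 of 5 pairs
```
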
In this embodiment of the present invention, the value of N is 2, i.e., each word is related only to the word immediately before it.
In the N-gram model, the probability that a sentence occurs is: p = p(w1) · ∏_{i=2}^{m} p(wi wi-1), where p denotes the probability that the sentence occurs, w1 denotes the word in the 1st position of the sentence, ∏ denotes the product symbol, m denotes the number of words in the sentence, wi denotes the i-th word (m and i are positive integers), and p(wi wi-1) denotes the probability that the words in the i-th and (i-1)-th positions occur together. Existing phrases are recombined according to this formula, and adjacent pairs of words are merged to generate new phrases.
The dimension of the word vector is 600,000, i.e., each barrage content is represented as a 600,000-dimensional vector in which each position corresponds to one word, yielding the final barrage topic representation.
A text representation based purely on a bag-of-words model ignores the relationship between individual words and their context; in contrast, this embodiment of the present invention takes the influence of context into account through the N-gram model, making barrage topic extraction more accurate. On the other hand, using the N-gram model naively makes the text representation overly complex, so this embodiment improves upon the N-gram model.
An example is given below.
Modeling data source: the barrage data of the platform from the most recent month is used as the data source.
Modeling procedure:
(1) Data preparation: extract the barrage data of the most recent month; the data mainly contains the "barrage content" field, and the data format is [barrage content];
(2) Barrage feature construction: barrages contain many proprietary terms specific to the platform, such as "water friend" (水友), which refers to a streamer's fans.
Custom proper nouns and verbs are defined and added to the custom dictionary. For example, after the word "water friend" is added to the custom dictionary, subsequent word segmentation cuts "water friend" as a single word instead of splitting it into the two words "water" and "friend".
Feature extraction: cheering vocabulary such as "666" is replaced with the feature "cheer"; a numeric string that matches the pattern of a mobile-phone number, such as "13617258349", is replaced with the feature "mobile-phone contact"; a character string such as "QQ324567865" is replaced with the feature "QQ contact". In this way, every platform-specific word that expresses a specific intent, as accumulated by the platform, is converted into a corresponding feature.
This kind of feature extraction can effectively reduce the feature dimension; for example, all QQ numbers are converted into the single feature "QQ contact".
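A minimal sketch of this normalization step. The regular expressions and the Chinese feature names ("QQ联系方式" for QQ contact, "手机联系方式" for mobile-phone contact, "欢呼" for cheer) are assumptions for illustration, not the patent's actual rules:

```python
import re

def normalize_features(text):
    """Replace platform-specific surface strings with canonical features.
    Patterns and feature names are illustrative assumptions."""
    text = re.sub(r"QQ\d{5,11}", "QQ联系方式", text)    # QQ-number strings
    text = re.sub(r"1[3-9]\d{9}", "手机联系方式", text)  # 11-digit CN mobile numbers
    text = re.sub(r"6{3,}", "欢呼", text)               # runs of 6 = cheering
    return text

out = normalize_features("加QQ324567865或13617258349 666666")
```

Applying the rules in this order avoids the cheer rule firing inside a phone number, since the number has already been replaced.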
(3) Data preprocessing:
Data preprocessing is performed on the basis of step (2): records whose "barrage content" field is empty are removed, and the punctuation marks contained in the "barrage content" field are removed.
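The preprocessing step can be sketched as follows; the field name "barrage content" follows the description above, while the dictionary-based record layout is an assumption:

```python
import re

def preprocess(records):
    """Drop records whose 'barrage content' field is empty and strip
    punctuation from the remaining content (record layout assumed)."""
    cleaned = []
    for rec in records:
        content = rec.get("barrage content", "")
        if not content:
            continue  # empty field: remove the record
        # \w matches CJK characters in Python 3, so only punctuation is removed.
        content = re.sub(r"[^\w\s]", "", content)
        cleaned.append({"barrage content": content})
    return cleaned

rows = [{"barrage content": ""},
        {"barrage content": "主播666!!!"}]
out = preprocess(rows)
```
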
(4) Representing barrage content as word vectors with the N-gram model:
Custom dictionary: based on the content of the platform, a custom dictionary containing all platform-specific vocabulary is compiled manually; the accuracy of the custom dictionary affects the accuracy of barrage topic extraction.
Custom stop-word dictionary: stop words are words that carry little practical meaning compared with other words; such words can be removed before content analysis. Whether a particular word counts as a stop word varies from person to person and from scene to scene; this embodiment of the present invention uses the stop-word dictionary accumulated by the live-streaming platform itself.
Word cutting and the bag-of-words representation of a barrage: each preprocessed barrage is cut into a group of word vectors; each barrage is first segmented according to the word-formation rules in the custom dictionary, while useless words are filtered according to the custom stop-word dictionary.
For example, the barrage content is "the streamer played Naked Wolf really well today", and the stop-word list contains "today" and other function words. After word cutting, the barrage becomes ["streamer", "naked wolf", "play", "so good", "so good"]. In order to record the word order after cutting, the barrage is expressed as key-value pairs ["streamer": 1, "naked wolf": 2, "play": 3, "so good": 4, "so good": 5], where the key is the word and the value is its position in the sentence.
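A simplified stand-in for this segmentation step. A real system would more likely use a segmenter such as jieba with a user dictionary; the greedy longest-match below, the sample dictionary and the sample sentence are illustrative assumptions only:

```python
def segment(text, dictionary, stopwords):
    """Greedy longest-match segmentation against a custom dictionary,
    followed by stop-word filtering and a word -> position map that
    records word order (positions are 1-based, as in the example above)."""
    words, i = [], 0
    max_len = max(map(len, dictionary))
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + l]
            if l == 1 or cand in dictionary:   # fall back to single chars
                if cand not in stopwords:
                    words.append(cand)
                i += l
                break
    order = {w: k + 1 for k, w in enumerate(words)}
    return words, order

dictionary = {"主播", "裸狼", "玩", "得好", "今天"}
stopwords = {"今天", "的", "了"}
words, order = segment("今天主播裸狼玩得好", dictionary, stopwords)
```
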
The meaning of the N-gram model is that the occurrence probability of a word in a sentence is related to the N-1 words before it; here the value of N is 2, i.e., each word is related only to the word immediately before it.
In the N-gram model, the probability that a sentence occurs is expressed as: p = p(w1) · ∏_{i=2}^{m} p(wi | wi-1),
where p(wi | wi-1) denotes the conditional probability that the i-th word occurs given that the word in the (i-1)-th position has occurred.
To calculate the probability of a sentence according to the N-gram model, the conditional probability p(wi | wi-1) of each word given its preceding word must therefore be computed in turn.
From the definition of conditional probability (the naive Bayes formula): p(wi | wi-1) = p(wi wi-1) / p(wi-1).
The concrete calculation of p(wi | wi-1) is quite complex, so this embodiment of the present invention innovatively simplifies the original formula to: p = p(w1) · ∏_{i=2}^{m} p(wi wi-1),
where p(wi wi-1) denotes the probability that the words in the i-th and (i-1)-th positions occur together, p denotes the probability that the sentence occurs, w1 denotes the word in the 1st position of the sentence, ∏ denotes the product symbol, m denotes the number of words in the sentence, wi denotes the i-th word, and m and i are positive integers.
This embodiment thus replaces calculating the conditional probability of the i-th word given the word in the (i-1)-th position with calculating the joint probability p(wi wi-1) that the two words occur together, which noticeably reduces the workload and complexity of the calculation.
On the basis of step (3), existing phrases are recombined according to the simplified N-gram rule: adjacent pairs of words are merged to generate new phrases, so ["streamer", "naked wolf", "play", "so good", "so good"] becomes ["streamer naked-wolf", "naked-wolf play", "play so-good", "so-good so-good"].
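The adjacent-word merging rule can be sketched directly (the Chinese sample tokens are illustrative):

```python
def merge_bigrams(words):
    """Merge each pair of adjacent words into a new phrase,
    following the simplified 2-gram rule described above."""
    return [a + b for a, b in zip(words, words[1:])]

# 4 words yield 3 merged phrases; n words always yield n - 1 phrases.
phrases = merge_bigrams(["主播", "裸狼", "玩", "得好"])
```
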
The barrage representation of this embodiment, based on the 2-gram model, overcomes the disadvantage that the bag-of-words model ignores context, making the barrage representation more accurate; in addition, the improvement made on the original 2-gram model reduces the computational complexity in practice.
Hash mapping of words: the dimension of the word vector is set to 600,000 here (based on heuristic experience), i.e., each barrage content is represented as a 600,000-dimensional vector V in which each position corresponds to one word. For example, "streamer naked-wolf" is mapped to position V(0), "naked-wolf play" to position V(1), "play so-good" to position V(2) and "so-good so-good" to position V(3) (in practice the mapping assigns words randomly among the 600,000 positions; for convenience of description the words are mapped to the first 4 positions here). The value at each position indicates the number of times the word occurs, so ["streamer naked-wolf": 1, "naked-wolf play": 1, "play so-good": 1, "so-good so-good": 1] is converted into the word vector (1, 1, 1, 1, 0, 0, 0, 0, 0, ...), where the remaining 599,996 zeros are omitted (since the fixed vector length is 600,000), yielding the final barrage topic representation. Once the topic representation of a barrage is obtained, topic extraction on individual barrages can lay the technical groundwork for spam-barrage detection.
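A sketch of this hash mapping, storing the 600,000-dimensional vector sparsely as a position-to-count dictionary; the choice of MD5 as the stable hash function is an assumption, not the patent's specified mapping:

```python
import hashlib
from collections import Counter

DIM = 600_000  # fixed vector length used in the embodiment

def hash_vector(phrases):
    """Map each phrase to one of DIM positions with a stable hash and
    count occurrences. Returned as a sparse {position: count} dict
    rather than a dense 600,000-element list."""
    vec = Counter()
    for p in phrases:
        # md5 gives the same position across runs, unlike built-in hash().
        pos = int(hashlib.md5(p.encode("utf-8")).hexdigest(), 16) % DIM
        vec[pos] += 1
    return dict(vec)

vec = hash_vector(["主播裸狼", "裸狼玩", "玩得好", "玩得好"])
```
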
It should be understood that the division into the above functional modules is only an example used to describe the system provided by the embodiment of the present invention; in practical applications, the above functions may be assigned to different functional modules as required, i.e., the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above.
Further, the present invention is not limited to the above embodiments; those skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications are also considered to fall within the protection scope of the present invention. Content not described in detail in this specification belongs to the prior art well known to those skilled in the field.

Claims (10)

CN201710514238.XA · 2017-06-29 · 2017-06-29 · Barrage theme extraction method, medium, equipment and system based on N-gram model · Active · CN109213988B (en)

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN201710514238.XA (CN109213988B, en) · 2017-06-29 · 2017-06-29 · Barrage theme extraction method, medium, equipment and system based on N-gram model

Applications Claiming Priority (1)

Application Number · Priority Date · Filing Date · Title
CN201710514238.XA (CN109213988B, en) · 2017-06-29 · 2017-06-29 · Barrage theme extraction method, medium, equipment and system based on N-gram model

Publications (2)

Publication Number · Publication Date
CN109213988A (en) · 2019-01-15
CN109213988B (en) · 2022-06-21

Family

ID=64976355

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
CN201710514238.XA · Active · CN109213988B (en) · 2017-06-29 · 2017-06-29

Country Status (1)

Country · Link
CN · CN109213988B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN109948152A (en) * · 2019-03-06 · 2019-06-28 · 北京工商大学 · A Chinese text grammar error correction model method based on LSTM
CN110430448A (en) * · 2019-07-31 · 2019-11-08 · 北京奇艺世纪科技有限公司 · Barrage processing method, device and electronic equipment
CN113948085A (en) * · 2021-12-22 · 2022-01-18 · 中国科学院自动化研究所 · Speech recognition method, system, electronic device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
WO2008098507A1 (en) * · 2007-02-13 · 2008-08-21 · Beijing Sogou Technology Development Co., Ltd. · An input method of combining words intelligently, input method system and renewing method
CN101930561A (en) * · 2010-05-21 · 2010-12-29 · 电子科技大学 · A Reverse Neural Network Spam Filtering Device Based on N-Gram Word Segmentation Model
CN103207921A (en) * · 2013-04-28 · 2013-07-17 · 福州大学 · Method for automatically extracting terms from Chinese electronic document
CN103246644A (en) * · 2013-04-02 · 2013-08-14 · 亿赞普(北京)科技有限公司 · Method and device for processing Internet public opinion information
CN105435453A (en) * · 2015-12-22 · 2016-03-30 · 网易(杭州)网络有限公司 · Bullet screen information processing method, device and system
CN105516820A (en) * · 2015-12-10 · 2016-04-20 · 腾讯科技(深圳)有限公司 · Barrage interaction method and device
CN105760507A (en) * · 2016-02-23 · 2016-07-13 · 复旦大学 · Cross-modal subject correlation modeling method based on deep learning
CN106055538A (en) * · 2016-05-26 · 2016-10-26 · 达而观信息科技(上海)有限公司 · Automatic extraction method for text labels in combination with theme model and semantic analyses

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
WO2008098507A1 (en) * · 2007-02-13 · 2008-08-21 · Beijing Sogou Technology Development Co., Ltd. · An input method of combining words intelligently, input method system and renewing method
CN101930561A (en) * · 2010-05-21 · 2010-12-29 · 电子科技大学 · A Reverse Neural Network Spam Filtering Device Based on N-Gram Word Segmentation Model
CN103246644A (en) * · 2013-04-02 · 2013-08-14 · 亿赞普(北京)科技有限公司 · Method and device for processing Internet public opinion information
CN103207921A (en) * · 2013-04-28 · 2013-07-17 · 福州大学 · Method for automatically extracting terms from Chinese electronic document
CN105516820A (en) * · 2015-12-10 · 2016-04-20 · 腾讯科技(深圳)有限公司 · Barrage interaction method and device
CN105435453A (en) * · 2015-12-22 · 2016-03-30 · 网易(杭州)网络有限公司 · Bullet screen information processing method, device and system
CN105760507A (en) * · 2016-02-23 · 2016-07-13 · 复旦大学 · Cross-modal subject correlation modeling method based on deep learning
CN106055538A (en) * · 2016-05-26 · 2016-10-26 · 达而观信息科技(上海)有限公司 · Automatic extraction method for text labels in combination with theme model and semantic analyses

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李金兰: "Effective barrage management on live-streaming platforms" (有效进行直播平台的弹幕管理), 《有线电视技术》 (Cable TV Technology) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN109948152A (en) * · 2019-03-06 · 2019-06-28 · 北京工商大学 · A Chinese text grammar error correction model method based on LSTM
CN110430448A (en) * · 2019-07-31 · 2019-11-08 · 北京奇艺世纪科技有限公司 · Barrage processing method, device and electronic equipment
CN110430448B (en) * · 2019-07-31 · 2021-09-03 · 北京奇艺世纪科技有限公司 · Bullet screen processing method and device and electronic equipment
CN113948085A (en) * · 2021-12-22 · 2022-01-18 · 中国科学院自动化研究所 · Speech recognition method, system, electronic device and storage medium
CN113948085B (en) * · 2021-12-22 · 2022-03-25 · 中国科学院自动化研究所 · Speech recognition method, system, electronic device and storage medium
US11501759B1 · 2021-12-22 · 2022-11-15 · Institute Of Automation, Chinese Academy Of Sciences · Method, system for speech recognition, electronic device and storage medium

Also Published As

Publication number · Publication date
CN109213988B (en) · 2022-06-21

Similar Documents

Publication · Publication Date · Title
CN106844346B (en) · Short text semantic similarity discrimination method and system based on deep learning model Word2Vec
CN110362819B (en) · Text emotion analysis method based on convolutional neural network
CN109710929A (en) · Correction method, device, computer equipment and storage medium for speech recognition text
CN105261358A (en) · N-gram grammar model constructing method for voice identification and voice identification system
CN105930362B (en) · Search target identification method, device and terminal
CN107885721A (en) · Named entity recognition method based on LSTM
CN103324626B (en) · Method for establishing a multi-granularity dictionary, word segmentation method and device
CN107885874A (en) · Data query method and apparatus, computer equipment and computer-readable recording medium
CN107943911A (en) · Data extraction method, device, computer equipment and readable storage medium
CN107480143A (en) · Dialogue topic segmentation method and system based on context dependence
CN109192225B (en) · Method and device for speech emotion recognition and labeling
CN104008166A (en) · Dialogue short text clustering method based on form and semantic similarity
CN110297880B (en) · Corpus product recommendation method, apparatus, device and storage medium
CN106708798B (en) · Character string segmentation method and device
CN109740164B (en) · Power defect level recognition method based on deep semantic matching
CN106934005A (en) · Text clustering method based on density
CN110674378A (en) · Chinese semantic recognition method based on cosine similarity and minimum edit distance
CN108664465A (en) · Method and related apparatus for automatically generating text
CN112860896A (en) · Corpus generalization method and man-machine conversation emotion analysis method for industrial field
CN106227714A (en) · Method and apparatus for obtaining keywords for poem generation based on artificial intelligence
CN102609500A (en) · Question push method, question answering system and search engine using the same
CN111192572A (en) · Semantic recognition method, device and system
CN113361260B (en) · Text processing method, device, equipment and storage medium
CN109213988A (en) · Barrage subject distillation method, medium, equipment and system based on N-gram model
CN102999533A (en) · Textspeak identification method and system

Legal Events

Date · Code · Title · Description
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant
