CN110309265B - A method to decide whether a video pushes relevant legal knowledge - Google Patents


Publication number: CN110309265B
Authority: CN (China)
Prior art keywords: legal, words, push, barrage, video
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201910581969.5A
Other languages: Chinese (zh)
Other versions: CN110309265A (en)
Inventor: 沈泳龙
Current Assignee: CHENGDU HUALV NETWORKING Co.,Ltd. (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Chengdu Hualv Networking Co ltd
Priority date: 2019-06-30 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2019-06-30
Application filed by Chengdu Hualv Networking Co ltd
Priority to CN201910581969.5A
Publication of CN110309265A
Application granted
Publication of CN110309265B
Status: Active


Abstract

The invention discloses a method for deciding whether a video pushes relevant legal knowledge. The method comprises: constructing a legal provision library, a legal case library, and a legal knowledge lexicon; obtaining, from the subtitles and bullet-screen (barrage) comments corresponding to the video content being played, a subtitle-matched legal feature word set, a bullet-screen-matched legal feature word set, the number of question words matched in the bullet screen, the number of negative emotion words matched in the bullet screen, a similarity value, a heat (search popularity) analysis value, and the video playback progress percentage; and feeding these features into a machine learning classifier, which finally decides whether the corresponding legal provision or case is pushed at the current node. By combining bullet-screen comments, subtitles, and the search popularity of legal provisions, the invention decides whether to push a provision or case at the current moment, which makes pushes more targeted and solves the technical problem of excessive knowledge pushing.

Description

Method for deciding whether a video pushes relevant legal knowledge
Technical Field
The invention relates to the technical field of communications, and in particular to a method for deciding whether a video pushes relevant legal knowledge.
Background
Online video content is abundant and varied, and some of it touches on the law: news, finance, legal-affairs, and everyday-life programs, as well as crime films. Because teenagers are immature and lack legal knowledge, some of them have no concept of the laws and regulations that appear in such programs and are easily misled by the illegal plots shown in law-related videos, so parents are uneasy about letting their children browse the Internet freely. To popularize the law among teenagers, give them a healthy network environment, and make parental supervision easier, the web client can push the relevant statutes and legal cases to them as knowledge items to read and learn from whenever the video being played involves illegal or law-related content. This greatly helps the online supervision and education of children: parents can feel more at ease and can support their children in browsing and learning independently.
However, videos are numerous and vary in length, and pushing knowledge too frequently harms the user experience.
A method for deciding at which playback time nodes relevant legal knowledge should be pushed is therefore both meaningful and useful.
Disclosure of Invention
The invention aims to overcome the above drawbacks of the prior art and provides a method for deciding whether a video pushes relevant legal knowledge.
This purpose can be achieved by the following technical solution:
a method for deciding whether a video pushes relevant legal knowledge, comprising:
constructing a legal provision library and a legal case library;
when the push waiting time reaches a certain threshold, acquiring at least one subtitle and one bullet-screen comment corresponding to the content played within a preset time period of the video;
extracting words that satisfy a first preset condition from the at least one subtitle to form a subtitle-matched legal feature word set;
extracting words that satisfy the first preset condition from the at least one bullet-screen comment to form a bullet-screen-matched legal feature word set;
extracting words that satisfy a second preset condition from the at least one bullet-screen comment to obtain the number of question words matched in the bullet screen;
extracting words that satisfy a third preset condition from the at least one bullet-screen comment to obtain the number of negative emotion words matched in the bullet screen;
computing, with the subtitle-matched legal feature word set, the similarity to each text in the legal provision library and the legal case library, to obtain the legal name and the similarity value of the text with the highest similarity value;
performing a search-engine-based heat (popularity) analysis on the legal name to obtain a heat analysis value;
acquiring the video playback progress percentage;
taking the subtitle-matched legal feature word set, the bullet-screen-matched legal feature word set, the number of matched question words, the number of matched negative emotion words, the similarity value, the heat analysis value, and the video playback progress percentage as feature items, and performing binary classification with a machine learning model to obtain a classification result;
using the classification result to determine whether relevant legal information is pushed to the user at the current moment;
recalculating the word counts and the push waiting time for the next period, and making a recommendation again when the push waiting time again reaches the threshold;
judging whether the classification result is correct according to whether the user clicks on or browses the pushed provision, and taking the judgment result together with its feature items as machine learning training data;
and retraining the machine learning model under a preset condition.
Preferably, constructing the legal case library comprises: automatically summarizing the mined legal case texts with an automatic text summarization technique, and storing each resulting case summary and its corresponding original case text in the same document of the legal case library, but in different fields.
Preferably, acquiring at least one subtitle and one bullet-screen comment corresponding to the content played within the preset time period of the video comprises: if the video has a subtitle file, extracting the text of the subtitle file as the subtitle set to be processed; if the video has no subtitle file, converting the audio of the audio/video file into text with speech recognition and using that text as the subtitle set to be processed.
Preferably, the words of the first preset condition comprise: a preset legal-domain lexicon obtained from the legal provision library and the legal case library with a keyword extraction technique.
Obtaining the legal-domain lexicon further comprises:
expanding the existing preset words with a deep-learning word-vector tool.
Preferably, the words of the second preset condition comprise: a preset set of question words, wherein the question word set carries a degree of questioning.
Preferably, the words of the third preset condition comprise: sentiment words of negative polarity obtained from public sentiment dictionaries, from which a negative emotion lexicon is built.
Preferably, performing binary classification with the machine learning model comprises: training the machine learning model with the support vector machine (SVM) algorithm.
Preferably, recalculating the word counts and the push waiting time for the next period, and making a recommendation again when the push waiting time again reaches the threshold, comprises: when the machine learning classification result is "do not recommend", resetting to empty the push waiting time, the subtitle-matched legal feature word set, the bullet-screen-matched legal feature word set, the number of matched question words, the number of matched negative emotion words, the similarity value, and the heat analysis value; then restarting the push waiting time and making a recommendation again when it again reaches the threshold.
Preferably, judging whether the classification result is correct according to whether the user clicks on or browses the pushed provision comprises: obtaining the click-and-browse logs of user clients for the recommendation; when the recommendation result is "push" and more than a certain threshold percentage of users click on the recommendation and browse it for longer than a certain time threshold, the push is judged correct; otherwise it is judged wrong.
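The correctness rule in the last preferred feature can be illustrated with a minimal Python sketch. The function name, its parameters, and the representation of the click-and-browse log as a list of per-user viewing times are assumptions made for illustration; the source does not fix the threshold values or the log format.

```python
def push_was_correct(dwell_times, min_dwell_s, min_user_fraction):
    """Judge one push from aggregated client logs.

    dwell_times: seconds each user who received the push spent viewing it
                 (0 for users who never clicked). Hypothetical log format.
    """
    if not dwell_times:
        return False
    # Count users who clicked and browsed longer than the time threshold.
    engaged = sum(1 for d in dwell_times if d > min_dwell_s)
    # The push is correct only if enough of the users engaged with it.
    return engaged / len(dwell_times) > min_user_fraction
```

Whether the comparison should be strict and how users who never clicked are counted are design choices the source leaves open; this sketch counts them as not engaged.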
The technical solution provided by the invention has the following beneficial effects:
The method constructs a legal provision library, a legal case library, and a legal-domain lexicon, and captures the subtitles and bullet-screen comments corresponding to the video content being played. By matching against the lexicons, it obtains the subtitle-matched legal feature word set, the bullet-screen-matched legal feature word set, the number of question words matched in the bullet screen, and the number of negative emotion words matched in the bullet screen. It then finds the legal text in the legal provision library and legal case library that is most similar to the subtitle-matched legal feature word set, obtains that text's search-engine popularity, and acquires the video playback progress percentage. These features are fed into a machine learning classifier that decides whether the corresponding legal provision or case is pushed at the current node. Pushes thus become more targeted, the technical problem of excessive knowledge pushing is solved, and the user experience is improved.
Drawings
Fig. 1 is a flowchart illustrating a method for determining whether a video pushes legal related knowledge according to an embodiment of the present invention.
Detailed Description
Embodiments of the invention are described in detail below with reference to the accompanying drawing and examples, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented.
The word "exemplary" is used herein exclusively to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, numerous specific details are set forth in the following detailed description to provide a better understanding of the present disclosure.
Those skilled in the art will understand that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits that are well known to those skilled in the art are not described in detail, so as not to obscure the present disclosure.
As shown in Fig. 1, the method for deciding whether a video pushes relevant legal knowledge according to this embodiment includes the following steps:
1) Construct a legal provision library and a legal case library, specifically:
1.1) Mine a large number of legal provisions from the Internet or from legal documents using techniques such as crawlers or regular expressions, and store each provision in detail in an Elasticsearch server, with one article as one document unit, to form the legal provision library.
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-user full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and was released as open source under the terms of the Apache License; it is a popular enterprise search engine. It supports near-real-time search and is stable, reliable, fast, and easy to install and use.
The format of a single legal provision record is as follows:
Crime of picking quarrels and provoking trouble
Article 293
Whoever commits any of the following acts of picking quarrels and provoking trouble, thereby disrupting social order, shall be sentenced to fixed-term imprisonment of not more than five years, criminal detention, or public surveillance:
(I) beating another person at will, where the circumstances are flagrant;
(II) chasing, intercepting, insulting, or intimidating another person, where the circumstances are flagrant;
(III) forcibly taking or demanding, or wilfully damaging or occupying, public or private property, where the circumstances are serious;
(IV) creating a disturbance in a public place, causing serious disorder in that place.
Whoever gathers others to commit the acts in the preceding paragraph repeatedly, seriously disrupting social order, shall be sentenced to fixed-term imprisonment of not less than five years but not more than ten years, and may in addition be fined.
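Step 1.1) mentions regular expressions and "one article as one document unit". The following Python sketch shows one way that split could look, assuming the raw statute text is Chinese and each article begins on its own line with a heading such as 第二百九十三条. The field names `law_name`, `article`, and `content` are hypothetical, and the Elasticsearch indexing call itself is omitted.

```python
import re

# Article headings such as "第二百九十三条" at the start of a line.
ARTICLE_HEADING = re.compile(r"^第[零一二三四五六七八九十百千]+条", re.MULTILINE)

def split_into_articles(law_name, text):
    """Split raw statute text into one document (dict) per article."""
    matches = list(ARTICLE_HEADING.finditer(text))
    docs = []
    for i, m in enumerate(matches):
        # An article runs from its heading to the next heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        docs.append({
            "law_name": law_name,          # hypothetical field name
            "article": m.group(0),         # hypothetical field name
            "content": text[m.end():end].strip(),
        })
    return docs
```

Each returned dict could then be indexed as one document in the provision library.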
1.2) Mine a large number of legal cases from the Internet or from legal documents using techniques such as crawlers or regular expressions, and summarize each case with the automatic text summarization function of the HanLP natural language processing package. Store each resulting case summary and its corresponding original case text in the same document of the Elasticsearch server, but in different fields, namely an original legal case field and a legal case summary field, to form the legal case library.
An automatic summary is a short, concise, coherent text that a computer extracts automatically from an original document and that accurately reflects the document's central content. The HanLP natural language processing package is characterized by complete functionality, high performance, a clear architecture, up-to-date corpora, and customizability. While offering rich functionality, HanLP keeps its internal modules loosely coupled, loads models lazily, exposes services statically, and publishes dictionaries in plain text, which makes it very convenient to use; it also ships with corpus-processing tools that help users train on their own corpora.
The format of a single legal case record is as follows:
Basic case information
At about 23:00 on the night of March 11, 2014, a group of four men and two women led by a certain Sun got into a fight at a KTV venue, injuring four people and damaging the facilities of the venue. Forensic examination found that two of the injured, a certain Sun and a certain Han, had suffered slight injuries. The suspect Sun, whose conduct under the Criminal Law constitutes the crime of picking quarrels and provoking trouble, was placed under criminal detention by the local public security sub-bureau on October 23, 2014; his arrest was approved on November 6, 2014, and he is being held at the local detention house. If the four injured persons are found to have suffered minor injuries, Article 234 of the Criminal Law applies and the suspect's conduct is suspected of constituting the crime of intentional injury; the case is currently at the investigation stage.
Case results
In this case, Sun was a first-time offender who voluntarily pleaded guilty, with little subjective malice and limited social harm. Through mediation by the lawyers, the three persons involved voluntarily paid compensation of 80,000 yuan to the injured Sun and Han and obtained their forgiveness. On April 23, 2015, the court's first-instance criminal judgment of 2015 found Sun guilty of the crime of picking quarrels and provoking trouble and imposed a sentence of 6 months.
The format of a single legal case summary record is as follows:
Basic case information
Under the Criminal Law, Sun violated Article 234; the group of four men and two women led by Sun fought with others at the KTV venue; four people were slightly injured; Sun is suspected of the crime of intentional injury and of the crime of picking quarrels and provoking trouble.
Case results
Three persons voluntarily paid compensation of 80,000 yuan. In this case, Sun was a first-time offender, was found guilty of the crime of picking quarrels and provoking trouble, voluntarily pleaded guilty, and received a sentence of 6 months.
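The two-field storage described in step 1.2) can be sketched as below. The embodiment uses HanLP's summarizer; the `summarize` function here is only a simple frequency-based extractive stand-in written with the Python standard library, and the field names `original_case` and `case_summary` are assumptions rather than the patent's actual field names.

```python
import re
from collections import Counter

def _tokens(sentence):
    """Lowercased word tokens of one sentence."""
    return re.findall(r"\w+", sentence.lower())

def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in _tokens(s))
    # Rank sentence indices by summed word frequency, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -sum(freq[w] for w in _tokens(sentences[i])))
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)

def build_case_document(case_text):
    """One document, two fields: the original text and its summary."""
    return {"original_case": case_text, "case_summary": summarize(case_text)}
```

Keeping both fields in one document is what later lets the retrieval step search the summary field and return the sibling original text.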
2) Construct a legal knowledge lexicon, specifically:
2.1) From the legal provision library and legal case library obtained in step 1), extract a large number of keywords from their texts using the TF-IDF implementation in the open-source machine learning package scikit-learn, and manually screen them for words with high legal relevance, such as "turning", "coercion", and "fighting", to build an initial seed lexicon. Look the seed words up in the synonym thesaurus Tongyici Cilin (or in the HowNet ontology) to obtain other words of the same category. Similar words are selected mainly on semantic grounds: beyond literal form, words with similar meanings and concepts are also recorded, so that semantically related words are adequately covered. Once enough similar words have been gathered, a second round of word expansion is carried out.
The second round of lexicon expansion uses word2vec, Google's open-source tool based on deep learning. Because the word2vec model is trained on up-to-date corpora, it compensates for the fact that Tongyici Cilin and HowNet cannot always keep up with the newest words and expressions. The lexicon is continuously maintained and optimized through multiple iterations of this procedure.
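The two lexicon-building stages in step 2.1) can be sketched with the standard library alone. This is a stand-in, not the embodiment's tooling: `tfidf_keywords` uses the plain textbook weighting tf × log(N/df) rather than scikit-learn's smoothed variant, and `expand_seeds` takes a hypothetical dict of pre-trained word vectors in place of a word2vec model. The similarity threshold of 0.8 is an arbitrary illustrative value.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k TF-IDF terms of each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        result.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_k])
    return result

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def expand_seeds(seeds, vectors, threshold=0.8):
    """Add every word whose vector is close to the vector of some seed word."""
    expanded = set(seeds)
    for word, vec in vectors.items():
        if any(_cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors):
            expanded.add(word)
    return expanded
```

The manual screening between the two stages is not modeled here; in the embodiment a person picks the legally relevant keywords before expansion.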
3) Capture the subtitle information corresponding to the content played within the preset time period of the video, match it against the legal knowledge lexicon obtained in step 2), and add the matched keywords to the subtitle legal word set to form the subtitle-matched legal feature word set.
3.1) Acquire the subtitle text file corresponding to the content played within the preset time period of the video; if no subtitle text file can be extracted, convert the audio of the currently played content into text and use it as the subtitle information.
3.2) Match the subtitle text obtained in step 3.1) against the legal knowledge lexicon obtained in step 2). For example, suppose the subtitle text for the preset time period says that at noon, on the way to school, a group beat up a person surnamed Wang and caused serious injury to Wang's brain. Matching this text against the legal knowledge lexicon yields the keyword set "group beating, strike, serious injury", which is added to the subtitle legal word set to obtain the subtitle-matched legal feature word set.
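The lexicon matching in step 3.2) can be sketched as plain substring matching, which also works for unsegmented Chinese text. The function name and the choice to return matches in order of first appearance are assumptions; the source only says that matched keywords form a set.

```python
def match_keywords(text, lexicon):
    """Return the lexicon entries that occur in text, by first appearance."""
    hits = [(text.find(word), word) for word in lexicon if word in text]
    return [word for _, word in sorted(hits)]
```

For example, `match_keywords("a group beating left the victim with a serious injury", {"serious injury", "group beating", "fraud"})` yields the two matching entries and ignores `"fraud"`.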
4) Manually review sentences that express questions in everyday use and extract their main word and symbol features, including but not limited to "why" and "what", to build a question word lexicon.
5) From the HowNet sentiment dictionary, the NTU sentiment dictionary, and similar resources, obtain sentiment words of negative polarity and build a negative emotion lexicon. The words are extracted mainly by linguistic analysis and include, but are not limited to, words associated with negative emotion such as "angry", "sad", and "disgusting".
6) Capture the bullet-screen comments corresponding to the content played within the preset time period of the video and match them against the legal knowledge lexicon obtained in step 2), the question word lexicon obtained in step 4), and the negative emotion lexicon obtained in step 5). Keywords matched from the legal knowledge lexicon are added to the bullet-screen legal word set to obtain the bullet-screen-matched legal feature word set; keywords matched from the question word lexicon are added to the question trigger word set to obtain the number of question words matched in the bullet screen; keywords matched from the negative emotion lexicon are added to the bullet-screen negative emotion word set to obtain the number of negative emotion words matched in the bullet screen.
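Step 6) produces three features from one batch of bullet-screen comments. A minimal Python sketch follows; the sample lexicon entries are hypothetical, and counting every occurrence (rather than distinct words) is an assumption, since the source does not say which is meant.

```python
QUESTION_WORDS = {"why", "what", "how", "?"}      # hypothetical sample entries
NEGATIVE_WORDS = {"angry", "sad", "disgusting"}   # hypothetical sample entries

def bullet_screen_features(comments, legal_lexicon):
    """Return (matched legal words, question word count, negative word count)."""
    legal_hits = set()
    question_count = 0
    negative_count = 0
    for comment in comments:
        lowered = comment.lower()
        legal_hits.update(w for w in legal_lexicon if w in lowered)
        # Count every occurrence of every lexicon entry in this comment.
        question_count += sum(lowered.count(w) for w in QUESTION_WORDS)
        negative_count += sum(lowered.count(w) for w in NEGATIVE_WORDS)
    return legal_hits, question_count, negative_count
```

Substring counting is crude for English (a lexicon entry can match inside a longer word); for Chinese comments a word segmenter would normally run first.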
7) Use the subtitle-matched legal feature word set obtained in step 3) as the retrieval keyword set, for example "intercept, abuse, disorder, order", and use the Okapi BM25 algorithm as the similarity measure. With the retrieval keyword set, run a similarity search separately over each text in the legal provision library and over each text in the legal case summary field of the legal case library, to obtain the single legal provision or legal case summary most similar to the subtitle-matched legal feature word set. If the result is a legal case summary, return the sibling field of the same document, the original legal case text, as the content. A single search returns a result such as the following:
"Article 293: Whoever commits any of the following acts of picking quarrels and provoking trouble, thereby disrupting social order, shall be sentenced to fixed-term imprisonment of not more than five years, criminal detention, or public surveillance ……" together with the similarity score of the retrieved legal text, for example "0.6325".
8) For the legal provision or legal case summary obtained in step 7) as most similar to the subtitle-matched legal feature word set, submit the legal title it represents, such as "crime of picking quarrels and provoking trouble", to the Baidu search engine using crawler technology to obtain its Baidu Index for the past month, for example 1578. This value represents the search popularity of the provision, that is, how much ordinary users need it, and serves as the retrieval heat value of the provision to be pushed.
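The BM25 retrieval in step 7) is performed by Elasticsearch in the embodiment. To show what the score measures, here is a self-contained Python sketch of the classic Okapi BM25 formula over tokenized documents. The parameter values `k1=1.2` and `b=0.75` are common defaults and are an assumption here; the scores this sketch produces are not expected to equal the "0.6325" example in the text.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

The document with the highest score is the "most similar" text whose title is then sent to the heat analysis in step 8).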
9) Acquire the video playback progress percentage.
10) Take the subtitle-matched legal feature word set obtained in step 3); the bullet-screen-matched legal feature word set, the number of matched question words, and the number of matched negative emotion words obtained in step 6); the similarity score of the legal text obtained in step 7); the retrieval heat value of the provision to be pushed obtained in step 8); and the video playback progress percentage obtained in step 9) as the feature items required for machine learning. Manually label 10,000 records as a training set and, from it, train a binary classifier with the classes "push" and "do not push" using the support vector machine (SVM) algorithm.
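The training in step 10) can be sketched without external libraries. The source names the SVM algorithm but not the kernel, the library, or how the two set-valued features are encoded; this sketch assumes a linear SVM trained by subgradient descent on the hinge loss, and assumes each word set is reduced to its size. All function and key names are hypothetical.

```python
def to_vector(sample):
    """Flatten one feature record; word sets are reduced to their sizes."""
    return [
        len(sample["subtitle_legal_words"]),
        len(sample["bullet_legal_words"]),
        sample["question_count"],
        sample["negative_count"],
        sample["similarity"],
        sample["heat"],
        sample["progress"],
    ]

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=50):
    """Linear SVM via hinge-loss subgradient descent; labels are +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1 - lr * lam) * wj for wj in w]   # L2 regularization shrink
            if margin < 1:                          # sample violates the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """+1 means "push", -1 means "do not push"."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In practice the features would need scaling before training, because a heat value such as 1578 and a similarity score such as 0.6325 differ by several orders of magnitude and SVMs are sensitive to feature scale.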
11) During video playback, every 10 seconds, take the subtitle-matched legal feature word set obtained in step 3); the bullet-screen-matched legal feature word set, the number of matched question words, and the number of matched negative emotion words obtained in step 6); the similarity score of the legal text obtained in step 7); the retrieval heat value of the provision to be pushed obtained in step 8); and the video playback progress percentage obtained in step 9) as the data required for prediction, and feed them into the classifier obtained in step 10). The classifier's predicted value, "push" or "do not push", determines whether the corresponding legal provision or case is pushed at the current video node.
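The periodic decision in step 11), combined with the reset described in the summary, can be sketched as a small state machine. `WindowState`, `tick`, and the `classify` callback are hypothetical names. The source describes the reset explicitly only for the "do not push" result; resetting after a "push" result as well is an assumption of this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class WindowState:
    """Features accumulated since the last decision."""
    waited: float = 0.0
    subtitle_words: set = field(default_factory=set)
    bullet_words: set = field(default_factory=set)
    question_count: int = 0
    negative_count: int = 0

def tick(state, seconds, threshold, classify):
    """Advance the wait time; decide and reset once the threshold is reached.

    Returns None while still waiting, otherwise the classifier's decision.
    """
    state.waited += seconds
    if state.waited < threshold:
        return None
    decision = classify(state)
    # Start a fresh window (assumed for both outcomes, see lead-in).
    state.waited = 0.0
    state.subtitle_words.clear()
    state.bullet_words.clear()
    state.question_count = 0
    state.negative_count = 0
    return decision
```

With `threshold=10` this reproduces the embodiment's 10-second cadence; `classify` would wrap the trained SVM.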
12) Based on whether the user clicks on and reads the pushed legal provision or legal case, turn each push into a new machine learning sample and add it to the training set of the SVM classifier obtained in step 10); every time 1,000 samples have been added, re-optimize and retrain the classifier.
When the user clicks on the pushed legal provision or legal case in the client, the client program starts timing how long the user browses it. When the browsing time exceeds 15 seconds, the push is judged correct. The subtitle-matched legal feature word set obtained in step 3); the bullet-screen-matched legal feature word set, the number of matched question words, and the number of matched negative emotion words obtained in step 6); the similarity score of the legal text obtained in step 7); the retrieval heat value of the pushed provision obtained in step 8); and the video playback progress percentage obtained in step 9) are taken as the feature items of the new sample, which is added to the training set of the SVM classifier obtained in step 10). Every time 1,000 samples have been added, the SVM classifier obtained in step 10) is re-optimized and retrained.
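The feedback loop in step 12) can be sketched as follows. The 15-second and 1,000-sample values come from the embodiment; the function and class names, the +1 / -1 label encoding, and the treatment of a push that was not clicked as a wrong push are assumptions.

```python
DWELL_THRESHOLD_S = 15   # browsing time from the embodiment
RETRAIN_EVERY = 1000     # sample count from the embodiment

def label_from_feedback(pushed, clicked, dwell_seconds):
    """Return +1 (push was correct), -1 (push was wrong), or None (no push)."""
    if not pushed:
        return None      # nothing was shown, so there is no feedback
    return 1 if clicked and dwell_seconds > DWELL_THRESHOLD_S else -1

class FeedbackBuffer:
    """Collects labeled samples and signals when retraining is due."""

    def __init__(self, retrain_every=RETRAIN_EVERY):
        self.samples = []
        self.retrain_every = retrain_every

    def add(self, features, label):
        """Store one sample; return True when a retraining round is due."""
        self.samples.append((features, label))
        return len(self.samples) % self.retrain_every == 0
```

A caller would retrain the classifier on `buffer.samples` plus the original training set whenever `add` returns `True`.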

Claims (9)

Translated fromChinese
1.一种决定视频是否推送相关法律知识的方法,其特征在于,包括:1. a method for deciding whether video pushes relevant legal knowledge, is characterized in that, comprises:构建法律条文库及法律案例库;Construct a database of legal provisions and legal cases;当推送等待时间达到一定阈值时,获取视频预设时间段内播放内容对应的至少一个字When the push wait time reaches a certain threshold, obtain at least one word corresponding to the content played within the preset time period of the video幕和一个弹幕信息;screen and a barrage message;所述至少一个字幕中提取出满足第一预设条件的词,组成字幕匹配法律特征词集合;Extracting words that meet the first preset condition from the at least one subtitle to form a subtitle matching legal feature word set;所述至少一个弹幕中提取出满足第一预设条件的词,组成弹幕匹配法律特征词集合;Extracting words that meet the first preset condition from the at least one bullet screen to form a bullet screen matching legal feature word set;所述至少一个弹幕中提取出满足第二预设条件的词,得到弹幕匹配疑问词个数;Extracting words that satisfy the second preset condition from the at least one bullet screen, and obtaining the number of interrogative words matched by the bullet screen;所述至少一个弹幕中提取出满足第三预设条件的词,得到弹幕匹配消极情感词个数;Extracting words that satisfy the third preset condition from the at least one barrage, and obtaining the number of negative emotion words matched by the barrage;将字幕匹配法律特征词集合对法律条文库及法律案例库中每条文本进行相似度计算,Match the subtitles to the legal feature word set to calculate the similarity of each text in the legal article library and the legal case library.得到相似度值最高那条文本的法律名称及其相似度值;Get the legal name of the text with the highest similarity value and its similarity value;对所述法律名称进行基于搜索引擎的热度分析,得到热度分析值;Perform a search engine-based heat analysis on the legal name to obtain a heat analysis value;获取所述视频播放进度百分比;Get the video playback progress percentage;以字幕匹配法律特征词集合、弹幕匹配法律特征词集合、弹幕匹配疑惑词个数、弹幕匹Match the legal feature word set with subtitles, the barrage matches the legal feature word set, the barrage matches the number of doubtful words, the barrage match配消极情感词个数、相似度值、热度分析值、视频播放进度百分比作为特征项,采用机器学The number of negative emotional words, similarity value, heat analysis 
value, and video playback progress percentage are used as feature items and input into a machine learning model for binary classification to obtain a classification result;
the classification result is used to determine whether to push relevant legal information to the user at the current moment;
recalculating the word counts and the push waiting time for the next time period, and making a recommendation again when the push waiting time again reaches a certain threshold;
judging whether the classification result is correct according to whether users click on or browse the pushed provisions, and using the judgment result together with its feature items as machine learning training data;
retraining the machine learning model under preset conditions.
2. The method according to claim 1, wherein building the legal case library comprises: applying automatic text summarization to the mined legal case texts, and storing each resulting legal case summary text and the corresponding original legal case text in different fields of the same document in the legal case library.
3. The method according to claim 1, wherein acquiring at least one subtitle and one barrage (bullet-screen comment) message corresponding to the content played within a preset time period of the video comprises: if the video has a subtitle file, extracting the text in the subtitle file as the subtitle set to be processed; if the video has no subtitle file, converting the audio played in the audio/video file into text by speech recognition, and using that text as the subtitle set to be processed.
4. The method according to claim 1, wherein the words of the first preset condition comprise: a preset legal-domain lexicon obtained from the legal provision library and the legal case library by keyword extraction;
obtaining the legal-domain words further comprises: expanding the existing preset words with a deep-learning word-vector tool.
5. The method according to claim 1, wherein the words of the second preset condition comprise: a preset set of interrogative words, wherein the set of interrogative words carries a degree of doubt.
6. The method according to claim 1, wherein the words of the third preset condition comprise: negative sentiment words obtained from a public sentiment dictionary and used to build a negative sentiment lexicon.
7. The method according to claim 1, wherein inputting into the machine learning model for binary classification comprises: training the machine learning model with the support vector machine (SVM) algorithm.
8. The method according to claim 1, wherein recalculating the word counts and the push waiting time for the next time period, and making a recommendation again when the push waiting time again reaches a certain threshold, comprises: when the machine learning classification result is "do not recommend", resetting the push waiting time to empty, resetting the subtitle-matched legal feature word set to empty, resetting the barrage-matched legal feature word set to empty, resetting the barrage-matched doubt word count to empty, resetting the barrage-matched negative sentiment word count to empty, resetting the similarity value to empty, and resetting the heat analysis value to empty; restarting the calculation of the push waiting time, and making a recommendation again when the push waiting time again reaches a certain threshold.
9. The method according to claim 1, wherein judging whether the classification result is correct according to whether users click on or browse the pushed provisions comprises: obtaining the click and browse logs of user clients for the recommendation; when the recommendation result is to push, and more than a certain threshold percentage of users click on and browse the recommended result for longer than a certain time threshold, judging the push result to be correct; otherwise judging the push result to be wrong.
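Claims 8 and 9 describe a reset-and-feedback loop around the classifier. The following is a minimal Python sketch of that loop, not the patented implementation. It rests on several assumptions: the names `WindowState`, `decide_if_due`, and `push_was_correct` are hypothetical; the two word sets are reduced to their sizes to form a numeric vector; "empty" is represented as zero or an empty set; and `classify` stands in for the trained model (claim 7 names an SVM, but any callable returning a boolean works here). The claims shown here do not say what happens to the accumulated state after a positive push decision, so the sketch leaves it untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class WindowState:
    """Feature items accumulated since the last decision (hypothetical names)."""
    subtitle_legal_words: Set[str] = field(default_factory=set)
    barrage_legal_words: Set[str] = field(default_factory=set)
    doubt_word_count: int = 0
    negative_word_count: int = 0
    similarity: float = 0.0
    heat: float = 0.0
    wait_seconds: float = 0.0

    def as_vector(self, progress_pct: float) -> List[float]:
        """Flatten the state into numbers; set sizes are an assumed encoding."""
        return [
            float(len(self.subtitle_legal_words)),
            float(len(self.barrage_legal_words)),
            float(self.doubt_word_count),
            float(self.negative_word_count),
            self.similarity,
            self.heat,
            progress_pct,
        ]

    def reset(self) -> None:
        """Clear every accumulated item and restart the wait timer (claim 8)."""
        self.subtitle_legal_words.clear()
        self.barrage_legal_words.clear()
        self.doubt_word_count = 0
        self.negative_word_count = 0
        self.similarity = 0.0
        self.heat = 0.0
        self.wait_seconds = 0.0


def decide_if_due(
    state: WindowState,
    progress_pct: float,
    classify: Callable[[List[float]], bool],
    wait_threshold: float,
) -> Optional[bool]:
    """Return None while waiting, otherwise the push / do-not-push decision."""
    if state.wait_seconds < wait_threshold:
        return None  # threshold not reached yet, keep accumulating
    push = classify(state.as_vector(progress_pct))
    if not push:
        state.reset()  # claim 8: a "do not recommend" result clears the state
    return push


def push_was_correct(
    dwell_seconds: List[float],
    min_dwell_seconds: float,
    min_user_fraction: float,
) -> bool:
    """Label a push as correct per claim 9.

    dwell_seconds holds one entry per user who received the push
    (0.0 if the user never clicked); this log shape is an assumption.
    """
    if not dwell_seconds:
        return False
    engaged = sum(1 for d in dwell_seconds if d > min_dwell_seconds)
    return engaged / len(dwell_seconds) > min_user_fraction
```

Each label returned by `push_was_correct`, stored together with the feature vector that produced the decision, gives one training example of the kind the final steps of claim 1 feed back into retraining.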
CN201910581969.5A | 2019-06-30 | 2019-06-30 | A method to decide whether a video pushes relevant legal knowledge | Active | CN110309265B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910581969.5A, CN110309265B (en) | 2019-06-30 | 2019-06-30 | A method to decide whether a video pushes relevant legal knowledge

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910581969.5A, CN110309265B (en) | 2019-06-30 | 2019-06-30 | A method to decide whether a video pushes relevant legal knowledge

Publications (2)

Publication Number | Publication Date
CN110309265A (en) | 2019-10-08
CN110309265B (en) | 2021-07-06

Family

ID=68078517

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910581969.5A, Active, CN110309265B (en) | A method to decide whether a video pushes relevant legal knowledge | 2019-06-30 | 2019-06-30

Country Status (1)

Country | Link
CN (1) | CN110309265B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109558513B (en) * | 2018-11-30 | 2021-09-24 | 百度在线网络技术(北京)有限公司 | Content recommendation method, device, terminal and storage medium
CN110942070B (en) * | 2019-11-29 | 2023-09-19 | 北京奇艺世纪科技有限公司 | Content display method, device, electronic equipment and computer readable storage medium
CN111263195B (en) * | 2020-01-08 | 2022-04-15 | 上海米哈游天命科技有限公司 | Barrage processing method and device, server equipment and storage medium
CN112699186B (en) * | 2020-11-19 | 2025-01-07 | 深圳季连科技有限公司 | A method and system for constructing a causal graph based on code words
CN113468377A (en) * | 2021-07-01 | 2021-10-01 | 同方知网(北京)技术有限公司 | Video and literature association and integration method
CN116758947B (en) * | 2023-08-14 | 2023-10-20 | 北京分音塔科技有限公司 | Audio emotion-based auxiliary trial methods, devices, equipment and storage media
CN118193719B (en) * | 2024-05-13 | 2024-08-06 | 西昌学院 | A legal knowledge intelligent query method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105022821A (en) * | 2015-07-20 | 2015-11-04 | 广东欧珀移动通信有限公司 | Content filtering method and terminal
JP2017169133A (en) * | 2016-03-17 | 2017-09-21 | シャープ株式会社 | Reception device and broadcast system
CN109474847A (en) * | 2018-10-30 | 2019-03-15 | 百度在线网络技术(北京)有限公司 | Searching method, device, equipment and storage medium based on video barrage content
US10244203B1 (en) * | 2013-03-15 | 2019-03-26 | Amazon Technologies, Inc. | Adaptable captioning in a video broadcast

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10244203B1 (en) * | 2013-03-15 | 2019-03-26 | Amazon Technologies, Inc. | Adaptable captioning in a video broadcast
CN105022821A (en) * | 2015-07-20 | 2015-11-04 | 广东欧珀移动通信有限公司 | Content filtering method and terminal
JP2017169133A (en) * | 2016-03-17 | 2017-09-21 | シャープ株式会社 | Reception device and broadcast system
CN109474847A (en) * | 2018-10-30 | 2019-03-15 | 百度在线网络技术(北京)有限公司 | Searching method, device, equipment and storage medium based on video barrage content

Also Published As

Publication number | Publication date
CN110309265A (en) | 2019-10-08

Similar Documents

Publication | Publication Date | Title
CN110309265B (en) | | A method to decide whether a video pushes relevant legal knowledge
CN110309393B (en) | | Data processing method, device, equipment and readable storage medium
Haidar et al. | | Multilingual cyberbullying detection system: Detecting cyberbullying in Arabic content
Agirre et al. | | Enriching WordNet concepts with topic signatures
US20070011154A1 (en) | | System and method for searching for a query
Bowden et al. | | Slugnerds: A named entity recognition tool for open domain dialogue systems
McKeown et al. | | “Got You!”: Automatic vandalism detection in wikipedia with web-based shallow syntactic-semantic modeling
Shamim Khan et al. | | Enhanced web document retrieval using automatic query expansion
Mukherjee et al. | | Wikisent: Weakly supervised sentiment analysis through extractive summarization with wikipedia
Eroğul | | Sentiment analysis in Turkish
JP2006134183A (en) | | Information classification method and apparatus, program, and storage medium storing program
Azarafza et al. | | Textrank-based microblogs keyword extraction method for Persian language
Choi et al. | | Music subject classification based on lyrics and user interpretations
Chardonnens et al. | | Mining user queries with information extraction methods and linked data
Torres-Moreno et al. | | Automatic summarization system coupled with a question-answering system (qaas)
Mizzaro et al. | | Short text categorization exploiting contextual enrichment and external knowledge
Esra’M | | Lexicon-based detection of violence on social media
Canning | | Forensic stylistics
Pasca | | Open-domain fine-grained class extraction from web search queries
Okumura et al. | | Automatic labelling of documents based on ontology
Ojokoh et al. | | Online question answering system
Bettencourt et al. | | “Who are you?”: Identifying Young Users from a Single Search Query
Acosta | | Laff-o-tron: Laugh prediction in ted talks
Sterckx | | Topic detection in a million songs
Pradhan et al. | | Building a Foundation System for Producing Short Answers to Factual Questions.

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
TA01 | Transfer of patent application right

Effective date of registration: 2021-06-22

Address after: 611730 No. 1, 1st Floor, Building 3, No. 99, West District Avenue, High-tech Zone, Chengdu, Sichuan

Applicant after: CHENGDU HUALV NETWORKING Co., Ltd.

Address before: Room F101-12, No. 1 Incubation and Production Building, Guanshao Shuangchuang (Equipment) Center, Huake City, 42 Baiwang Avenue, Wujiang District, Shaoguan City, Guangdong Province, 512026

Applicant before: Shaoguan Qizhi Information Technology Co., Ltd.

GR01 | Patent grant
PE01 | Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method for determining whether to push relevant legal knowledge for videos

Effective date of registration: 2023-03-10

Granted publication date: 2021-07-06

Pledgee: Bank of Chengdu science and technology branch of Limited by Share Ltd.

Pledgor: CHENGDU HUALV NETWORKING CO., LTD.

Registration number: Y2023980034584

PC01 | Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 2021-07-06

Pledgee: Bank of Chengdu science and technology branch of Limited by Share Ltd.

Pledgor: CHENGDU HUALV NETWORKING CO., LTD.

Registration number: Y2023980034584
