CN108804443A - A judicial similar-case search method based on multi-feature fusion

Info

Publication number
CN108804443A
Authority
CN
China
Prior art keywords
fusion
query
word
words
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710289597.XA
Other languages
Chinese (zh)
Inventor
耿伟
司华建
贾真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Fu Chi Information Technology Co Ltd
Original Assignee
Anhui Fu Chi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-04-27
Filing date
2017-04-27
Publication date
2018-11-13
Application filed by Anhui Fu Chi Information Technology Co Ltd
Priority to CN201710289597.XA
Publication of CN108804443A

Abstract

The invention discloses a judicial similar-case search method based on multi-feature fusion, with the following steps: the user enters a query request; the query request is preprocessed and segmented into words, and stop words are removed to obtain a set of query keywords; the query word set is traversed in order, and each query word is semantically expanded through a semantic dictionary to obtain the expanded list of query keywords; documents are filtered using information points, the feature inverted indexes are searched to obtain the different feature vectors of the keyword list, and multi-feature fusion is then performed; the fused similarity value between each document and the query statement is computed to obtain the final similarity score; and the final search results are sorted and output. The invention offers advantages such as high accuracy.

Description

A judicial similar-case search method based on multi-feature fusion
Technical field
The present invention relates to the field of judicial similar-case search, and specifically to a judicial similar-case search method based on multi-feature fusion.
Background art
Law is a product of the state. It refers to the basic laws and general laws that the ruling class (the ruling group, that is, the governing party, including kings and monarchs) promulgates through a defined legislative procedure in order to rule and administer the country. Law embodies the will of the whole people and is an instrument of state governance.
As society becomes more open with information, the outcomes of legal cases draw growing public attention. Recommending similar judgment documents promptly as references during a trial can effectively improve the quality of the trial. At present, the systems in general use are keyword-based text retrieval systems that compare two cases only by simple word matching, so it is difficult to obtain ideal search results accurately. The reasons fall into three groups: keyword features describe document information incompletely, which makes the similarity calculation inaccurate; keywords located in different blocks of a document influence the final similarity judgment differently; and the constraint that contextual information places on keyword semantics is not considered in detail, so differences caused by changes in context cannot be distinguished effectively. Developing a highly accurate search method has therefore become an important task.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the low retrieval efficiency and limited accuracy of the prior art by providing a judicial similar-case search method based on multi-feature fusion.
The technical solution provided by the present invention to solve the above technical problem is a judicial similar-case search method based on multi-feature fusion, with the following steps:
(1) The user enters a query request;
(2) The query request is preprocessed and segmented into words, and stop words are removed, yielding a set of query keywords;
(3) The query word set is traversed in order; each query word in the set is semantically expanded through a semantic dictionary, yielding the expanded list of query keywords;
(4) Documents are filtered using information points, and the feature inverted indexes are searched to obtain the different feature vectors of the keyword list, after which multi-feature fusion is performed;
(5) The fused similarity value between each document and the query statement is computed to obtain the final similarity score;
(6) The final search results are sorted and output.
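Steps (2) and (3) above can be sketched as follows. This is a minimal illustration only: the stop-word list, the semantic dictionary and the function names are hypothetical, and a regular-expression tokenizer stands in for the Chinese word segmenter that a real system would need.

```python
import re

# Hypothetical stop-word list and semantic dictionary (not specified in the source).
STOP_WORDS = {"the", "a", "of", "and", "in"}
SEMANTIC_DICT = {
    "theft": ["larceny", "stealing"],
    "contract": ["agreement"],
}

def preprocess(query):
    """Step (2): lowercase the query, split it into word tokens, drop stop words."""
    tokens = re.findall(r"\w+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def expand(keywords, dictionary):
    """Step (3): add dictionary entries for each keyword, keeping order, skipping repeats."""
    expanded = []
    for word in keywords:
        for candidate in [word] + dictionary.get(word, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

keywords = preprocess("Theft of the contract")
expanded_keywords = expand(keywords, SEMANTIC_DICT)
```

Keeping the original keyword ahead of its expansions preserves the user's wording as the first entry for each concept, which is one simple way to let later scoring distinguish original terms from expanded ones.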
Preferably, in step (4), the feature vectors include a block-weighted keyword feature vector, a language model feature vector, and a topic word set feature vector.
Preferably, the block-weighted keyword feature vector is obtained by computing the tf-idf information of the terms in each block and then weighting by block.
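One possible reading of the block-weighted keyword feature is sketched below: tf-idf is computed for the terms of each block, and each block's contribution is scaled by a block weight. The block names, the block weights and the smoothed idf form are assumptions made for illustration; the source gives none of them.

```python
import math
from collections import Counter

# Hypothetical block names and weights (not given in the source).
BLOCK_WEIGHTS = {"facts": 0.5, "reasoning": 0.3, "verdict": 0.2}

def block_weighted_tfidf(doc_blocks, doc_freq, n_docs, block_weights):
    """Return {term: weight}: per-block tf-idf, scaled by block weight and summed."""
    vector = Counter()
    for block, tokens in doc_blocks.items():
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            # Smoothed idf, an assumed variant chosen to avoid division by zero.
            idf = math.log(n_docs / (1 + doc_freq.get(term, 0))) + 1.0
            vector[term] += block_weights.get(block, 0.0) * tf * idf
    return dict(vector)

doc = {"facts": ["theft", "theft", "night"], "verdict": ["theft", "guilty"]}
doc_freq = {"theft": 4, "night": 1, "guilty": 9}  # documents containing each term
vec = block_weighted_tfidf(doc, doc_freq, n_docs=10, block_weights=BLOCK_WEIGHTS)
```

A term that appears in a heavily weighted block contributes more to the vector than the same term in a lightly weighted block, which matches the observation in the background section that keywords in different blocks influence similarity differently.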
Preferably, the language model feature vector is obtained by sliding a window of size N over the text to form a sequence of word fragments of length N, each fragment being called a gram; the occurrence frequencies of all grams are counted and filtered against a preset threshold to form the key gram list.
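The sliding-window construction of the key gram list can be sketched as follows. The function names and the example threshold are illustrative, and the documents are assumed to be already segmented into tokens.

```python
from collections import Counter

def ngrams(tokens, n):
    """Slide a window of size n over the tokens; each window is one gram."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def key_grams(documents, n, min_count):
    """Count every gram across the corpus and keep those meeting the threshold."""
    counts = Counter()
    for tokens in documents:
        counts.update(ngrams(tokens, n))
    return sorted(gram for gram, count in counts.items() if count >= min_count)

corpus = [
    ["intentional", "injury", "case"],
    ["intentional", "injury", "occurred"],
]
key_list = key_grams(corpus, n=2, min_count=2)
```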
Preferably, the topic word set feature vector uses a topic to represent a concept or an aspect; a topic appears as a series of related key topic words and is expressed as the conditional probability of those words.
Preferably, in step (5), the similarity scoring formula after multi-feature fusion is as follows:
score(q, d) = a * weightword(q, d) + b * gramScore(q, d) + c * Simcapte(q, d)
where a + b + c = 1. The objective is to find a suitable parameter combination {a, b, c}: through the description and solution of a mathematical model and through training data, the combination is adjusted adaptively until it is optimal. Specifically, the value range of the three parameters a, b and c is first limited to (0, 1), and appropriate values are then chosen based on experience.
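The fusion formula and the final ranking of step (6) can be sketched as follows. The three component scores are taken as given inputs, `sim_topic` stands for the value of Simcapte(q, d), and the candidate scores in the example are invented purely to show the call pattern.

```python
def fused_score(weightword, gram_score, sim_topic, a, b, c):
    """score(q, d) = a*weightword + b*gramScore + c*Simcapte, with a + b + c = 1."""
    if not all(0.0 < w < 1.0 for w in (a, b, c)):
        raise ValueError("each weight must lie in the open interval (0, 1)")
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return a * weightword + b * gram_score + c * sim_topic

def rank(candidates, a, b, c):
    """Sort document ids by fused score, highest first."""
    scored = {doc_id: fused_score(*parts, a, b, c) for doc_id, parts in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Invented component scores: (weightword, gramScore, Simcapte) per document.
candidates = {
    "doc1": (0.8, 0.5, 0.2),
    "doc2": (0.3, 0.9, 0.4),
    "doc3": (0.1, 0.1, 0.9),
}
ranking = rank(candidates, a=0.5, b=0.3, c=0.2)
```

Validating the weights inside `fused_score` keeps every caller inside the constraints stated in the text, so a tuning loop cannot silently produce scores on a different scale.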
Compared with the prior art, the present invention has the following advantages:
The present invention first performs query semantic expansion through a semantic dictionary, so that the relationships between query keywords and words are described more fully and a comprehensive, accurate keyword description is built. It then constructs a similarity model from multiple features, namely block-wise term weighting, the language model and the topic word set, and ranks the search results on the combined score, which greatly improves the precision and recall of similar-case retrieval.
Brief description of the drawings
Fig. 1 is a schematic diagram of the offline construction of the multi-feature model in Embodiment 1 of the present invention;
Fig. 2 is a flow diagram of the judicial similar-case search method based on multi-feature fusion in Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of multi-feature fusion in Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the principle of the vector space model in Embodiment 1 of the present invention.
Detailed description of the embodiments
Referring to Figs. 1 to 4, the invention discloses a judicial similar-case search method based on multi-feature fusion, whose specific steps are as follows:
Embodiment 1
The invention discloses a judicial similar-case search method based on multi-feature fusion, with the following steps:
(1) The user enters a query request;
(2) The query request is preprocessed and segmented into words, and stop words are removed, yielding a set of query keywords;
(3) The query word set is traversed in order; each query word in the set is semantically expanded through a semantic dictionary, yielding the expanded list of query keywords;
(4) Documents are filtered using information points, and the feature inverted indexes are searched to obtain the different feature vectors of the keyword list: the block-weighted keyword feature vector, the language model feature vector and the topic word set feature vector.
The block-weighted keyword feature vector is obtained by computing the tf-idf information of the terms in each block and then weighting by block.
The language model feature vector is obtained by sliding a window of size N over the text to form a sequence of word fragments of length N, each fragment being called a gram; the occurrence frequencies of all grams are counted and filtered against a preset threshold to form the key gram list. Taking the 2-gram model as an example, the adjacent-word similarity score is calculated as follows:
(formula missing from the source text)
where gramScore(q, d) denotes the adjacent-word similarity score between query string q and document d, 2-gram(q) denotes the 2-gram set of the query string, and 2-gram(d) denotes the 2-gram set of the document.
The algorithm is described as follows:
Input: the preprocessed query string q and document d
Output: the adjacent-word similarity score between q and d
A. Compute the 2-gram set 2-gram(q) of q;
B. Compute the 2-gram set 2-gram(d) of d;
C. Compute the adjacent-word similarity score gramScore(q, d) of q and d from 2-gram(q) and 2-gram(d).
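Steps A to C can be sketched as follows. The source text does not show how gramScore(q, d) combines the two sets, so the sketch substitutes an assumed measure, the fraction of query 2-grams that also occur in the document. It should not be read as the patent's own formula.

```python
def bigrams(tokens):
    """Steps A and B: the set of adjacent word pairs (2-grams) in a token list."""
    return {tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)}

def gram_score(q_tokens, d_tokens):
    """Step C under an ASSUMED measure: share of query 2-grams found in the document."""
    q2 = bigrams(q_tokens)
    d2 = bigrams(d_tokens)
    if not q2:
        return 0.0
    return len(q2 & d2) / len(q2)

query = ["intentional", "injury", "case"]
document = ["the", "intentional", "injury", "occurred", "at", "night"]
score = gram_score(query, document)
```

Because the score is normalised by the size of the query set, it stays in [0, 1] regardless of document length, which keeps it on a scale that can be combined with the other features.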
The topic word set feature vector uses a topic to represent a concept or an aspect; a topic appears as a series of related key topic words and is expressed as the conditional probability of those words.
Multi-feature fusion is then performed on the above feature vectors;
(5) The fused similarity value between each document and the query statement is computed to obtain the final similarity score. The specific steps are as follows.
The model treats a document as a vector of t feature dimensions. Features are usually represented by words, and the weight of each feature can be calculated according to some criterion; these t weighted features together constitute a text.
To calculate the score, both the document and the query are expressed as vectors. A document is regarded as a series of words (terms), each with a weight (term weight); different terms influence the relevance score of the document according to their own weights in the document.
The weights of all terms in a document are then regarded as a vector:
Document = {term1, term2, ..., termN}
Document Vector = {weight1, weight2, ..., weightN}
Likewise, the query statement is regarded as a simple document and is also represented as a vector:
Query = {term1, term2, ..., termN}
Query Vector = {weight1, weight2, ..., weightN}
All retrieved document vectors and the query vector are placed in one N-dimensional space in which each term is one dimension. The principle of the vector space model is shown in Fig. 4.
The similarity value between the document and the query statement is then obtained by the following formula:
(formula missing from the source text)
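The vector space comparison can be sketched with cosine similarity, the measure most commonly used with this model: the dot product of the two weight vectors divided by the product of their lengths. Whether the patent uses exactly this measure is an assumption, since the formula is absent from the text. Vectors are stored as term-to-weight dictionaries, and the weights in the example are invented.

```python
import math

def cosine_similarity(query_vec, doc_vec):
    """Cosine of the angle between two sparse term-weight vectors."""
    shared = set(query_vec) & set(doc_vec)
    dot = sum(query_vec[t] * doc_vec[t] for t in shared)
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)

# Invented term weights for illustration.
query_vector = {"theft": 1.0, "night": 1.0}
doc_vector = {"theft": 2.0, "night": 2.0}
similarity = cosine_similarity(query_vector, doc_vector)
```

Only terms present in both vectors contribute to the dot product, so the sparse dictionary form gives the same result as placing both vectors in the full N-dimensional term space.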
Query semantic expansion makes the description of the relationships between query keywords and words more complete. The block-weighted keyword feature reflects how keywords are distributed; the language-model-based keyword feature reflects keyword dependencies and the constraint that context places on keyword semantics; and the topic-word-set-based keyword feature introduces the correlation between query terms and topic words, reflecting the likelihood between the query and a document block. The goal is to combine the block-weighted keyword feature, the language model feature and the topic word feature so that they offset one another's weaknesses, complement one another and jointly describe a document, and then to calculate the similarity between the query and the document from these features.
The similarity scoring formula after multi-feature fusion is as follows:
score(q, d) = a * weightword(q, d) + b * gramScore(q, d) + c * Simcapte(q, d)
where a + b + c = 1. The objective is to find a suitable parameter combination {a, b, c}: through the description and solution of a mathematical model and through training data, the combination is adjusted adaptively until it is optimal. Specifically, the value range of the three parameters a, b and c is first limited to (0, 1), and appropriate values are then chosen based on experience;
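The source does not say how the parameter combination is optimised. One simple possibility, shown here only as an assumed stand-in, is a grid search over all (a, b, c) in (0, 1) that sum to 1, keeping the combination with the smallest squared error against labelled training pairs. The training samples below are invented.

```python
def weight_grid(steps):
    """Yield every (a, b, c) on a grid with a, b, c in (0, 1) and a + b + c = 1."""
    for i in range(1, steps):
        for j in range(1, steps - i):
            k = steps - i - j
            yield i / steps, j / steps, k / steps

def tune_weights(samples, steps=10):
    """samples: list of ((weightword, gramScore, Simcapte), label), label in {0, 1}.
    Returns the grid point minimising squared error between fused score and label."""
    best, best_error = None, float("inf")
    for a, b, c in weight_grid(steps):
        error = sum((a * w + b * g + c * s - y) ** 2 for (w, g, s), y in samples)
        if error < best_error:
            best, best_error = (a, b, c), error
    return best

# Invented training data in which only the first feature separates the labels.
training = [((1.0, 0.5, 0.5), 1), ((0.0, 0.5, 0.5), 0)]
best_weights = tune_weights(training)
```

A grid keeps the constraints a + b + c = 1 and 0 < a, b, c < 1 satisfied by construction; a finer grid or a gradient-based or evolutionary search could replace it without changing the interface.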
(6) The final search results are sorted and output.
The above embodiment merely illustrates the principles and effects of the present invention and is not intended to limit it. Anyone skilled in the art may modify or change the above embodiment without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by a person of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present invention shall be covered by the claims of the present invention.

Claims (6)

Application number: CN201710289597.XA
Publication number: CN108804443A (en)
Title: A judicial similar-case search method based on multi-feature fusion
Priority date: 2017-04-27
Filing date: 2017-04-27
Publication date: 2018-11-13
Status: Pending
Family ID: 64070316
Country status: CN (1), CN108804443A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 2018-11-13)
