CN109344254A - A kind of address information classification method and device - Google Patents

A kind of address information classification method and device

Info

Publication number
CN109344254A
Authority
CN
China
Prior art keywords
address information
searching algorithm
processed
sorted
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811102935.5A
Other languages
Chinese (zh)
Other versions
CN109344254B (en)
Inventor
李胜
单培
李士勇
张瑞飞
李广刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Science and Technology (Beijing) Co., Ltd.
Original Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201811102935.5A
Publication of CN109344254A
Application granted
Publication of CN109344254B
Legal status: Active
Anticipated expiration

Abstract

This application provides an address information classification method and device. The method extracts all address information to be processed from a text; determines the completeness type of each piece of address information to be processed; obtains, according to the completeness type of each piece of address information to be processed and its position in the text, the corresponding address information to be classified by using a forward search algorithm and a backward search algorithm, where the address information to be classified is complete address information; and classifies each piece of address information to be classified by using its context information, obtaining the category corresponding to each piece. Therefore, regardless of whether the extracted address information is complete, complete address information can ultimately be obtained and accurately classified, which improves the accuracy of the classification results.

Description

A kind of address information classification method and device
Technical field
This application relates to the field of text processing, and in particular to an address information classification method and device.
Background art
In the future, more and more human-machine interaction data will involve address information. The Internet has become a continuously updated data warehouse of address information and has accumulated a large amount of address information in both standard and non-standard formats. Industries that deal with address information have an ever-growing demand for address data to support the analysis, research, and decision-making of their various businesses. Effectively extracting address descriptions from text context and classifying them accurately is therefore necessary and highly practical work.
The existing processing method first extracts address information with an extraction method based on biLSTM and then classifies the extracted address information. However, biLSTM requires a large amount of accurately labeled data. Manual labeling increases labor costs and is not portable, while machine labeling can be inaccurate or incomplete, which makes the extraction results inaccurate and ultimately produces wrong classification results.
Summary of the invention
This application provides an address information classification method and device, to solve the problem that existing address classification methods are prone to producing wrong classification results.
In a first aspect, this application provides an address information classification method, the method comprising:
Extracting all address information to be processed from a text;
Determining, according to each piece of address information to be processed, the completeness type of each piece of address information to be processed, where the completeness type is either positive address information or negative address information, positive address information includes complete or partial address information, and negative address information includes address information that contains other words;
Obtaining, according to the completeness type of each piece of address information to be processed and the position of the address information to be processed in the text, the address information to be classified corresponding to each piece of address information to be processed by using a forward search algorithm and a backward search algorithm;
Classifying each piece of address information to be classified by using its context information, to obtain the category corresponding to each piece of address information to be classified;
Outputting each piece of address information to be classified and its corresponding category.
In a second aspect, this application provides an address information classification device, the device comprising:
An extraction module, configured to extract all address information to be processed from a text;
A determining module, configured to determine, according to each piece of address information to be processed, the completeness type of each piece of address information to be processed, where the completeness type is either positive address information or negative address information, positive address information includes complete or partial address information, and negative address information includes address information that contains other words;
A to-be-classified address determination module, configured to obtain, according to the completeness type of each piece of address information to be processed and the position of the address information to be processed in the text, the address information to be classified corresponding to each piece of address information to be processed by using a forward search algorithm and a backward search algorithm;
A classification module, configured to classify each piece of address information to be classified by using its context information, to obtain the category corresponding to each piece of address information to be classified;
An output module, configured to output each piece of address information to be classified and its corresponding category.
As can be seen from the above technical solutions, this application provides an address information classification method and device. The method first extracts the address information in a text as address information to be processed. Then, according to the completeness of the address information to be processed and its position in the text, it uses the forward search algorithm and the backward search algorithm to obtain a complete address to be classified. Finally, it classifies the address to be classified by using the context information of that address. Therefore, regardless of whether the extracted address information is complete, complete address information can ultimately be obtained and accurately classified, which improves the accuracy of the classification results.
Description of the drawings
To explain the technical solutions of this application more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of one embodiment of the address information classification method provided by this application;
Fig. 2 is a flowchart of another embodiment of the address information classification method provided by this application;
Fig. 3 is a schematic structural diagram of the address information classification device provided by this application;
Fig. 4 is a schematic structural diagram of one embodiment of the to-be-classified address determination module;
Fig. 5 is a schematic structural diagram of another embodiment of the to-be-classified address determination module;
Fig. 6 is a schematic structural diagram of the first search algorithm unit.
Detailed description
In a first aspect, referring to Fig. 1, an embodiment of this application provides an address information classification method, which includes the following steps:
Step 101: Extract all address information to be processed from the text.
The address information to be processed can be extracted from the text with an address information extraction model. Specifically, a Chinese word segmentation system performs word segmentation and part-of-speech tagging on a sufficient number of training texts one by one, and a biLSTM model is then trained on the training texts to generate the address extraction model. Staff use this model to extract the address information from the text.
Step 102: According to each piece of address information to be processed, determine the completeness type of each piece of address information to be processed. The completeness type is either positive address information or negative address information; positive address information includes complete or partial address information, and negative address information includes address information that contains other words.
Step 103: According to the completeness type of each piece of address information to be processed and the position of the address information to be processed in the text, use the forward search algorithm and the backward search algorithm to obtain the address information to be classified corresponding to each piece of address information to be processed. The address information to be classified is complete address information.
Combining the forward search algorithm with the backward search algorithm allows the boundaries of the address information to be processed to be marked accurately, which improves the accuracy and completeness of subsequent data processing.
Step 104: Classify each piece of address information to be classified by using its context information, to obtain the category corresponding to each piece of address information to be classified.
Step 105: Output each piece of address information to be classified and its corresponding category.
As can be seen from the above technical solutions, this application provides an address information classification method. The method first extracts the address information in a text as address information to be processed. Then, according to the completeness of the address information to be processed and its position in the text, it uses the forward search algorithm and the backward search algorithm to obtain a complete address to be classified. Finally, it classifies the address to be classified by using the context information of that address. Therefore, regardless of whether the extracted address information is complete, complete address information can ultimately be obtained and accurately classified, which improves the accuracy of the classification results.
Referring to Fig. 2, an address information classification method provided in another embodiment of this application includes the following steps:
Step 201: Extract all address information to be processed from the text.
The address information to be processed can be extracted from the text with an address information extraction model. Specifically, a Chinese word segmentation system performs word segmentation and part-of-speech tagging on a sufficient number of training texts one by one, and a biLSTM model is then trained on the training texts to generate the address extraction model. Staff use this model to extract the address information from the text.
Step 202: According to each piece of address information to be processed, determine the completeness type of each piece of address information to be processed. The completeness type is either positive address information or negative address information; positive address information includes complete or partial address information, and negative address information includes address information that contains other words.
The address information extracted from the text by the address information extraction model may be complete address information, partial address information, or address information that contains other words. For example, suppose the text (rendered word for word, in the word order of the Chinese original) is "So-and-so (household registration: aa Province bb City cc District dd Street e Community x Unit x No. x, ID number: xxxxxxxxxxxxx) reported claiming in AA Province BB City CC District G Town H Community x Unit x No. x stolen, door lock intact, safe in the home pried open". Assume the results extracted by the address information model are "aa Province bb City cc District dd Street e Community x Unit x No. x" and "CC District G Town H Community". Then "aa Province bb City cc District dd Street e Community x Unit x No. x" is complete address information, that is, positive address information. "CC District G Town H Community" is partial address information, which also belongs to positive address information. If the extraction result is "Unit x No. x stolen", it contains the word "stolen" and is therefore negative address information.
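The distinction between positive and negative address information can be illustrated with a minimal sketch. The patent does not state how the determination is made, so the vocabulary `ADDRESS_WORDS` and the functions `is_address_token` and `completeness_type` below are hypothetical stand-ins chosen only to reproduce the example above.

```python
# Hypothetical vocabulary of address components. This is an assumption made
# for illustration; the patent does not specify the judgement rule.
ADDRESS_WORDS = {"Province", "City", "District", "Street", "Town",
                 "Community", "Unit", "No.", "x"}


def is_address_token(token):
    """Toy check: a token counts as an address component if any of its
    space-separated parts is in the hypothetical vocabulary."""
    return any(part in ADDRESS_WORDS for part in token.split())


def completeness_type(tokens):
    """Return 'positive' when every token is an address component
    (complete or partial address), otherwise 'negative'."""
    if tokens and all(is_address_token(t) for t in tokens):
        return "positive"
    return "negative"


partial = ["CC District", "G Town", "H Community"]
polluted = ["Unit x", "No. x", "stolen"]
print(completeness_type(partial))   # partial address, still positive
print(completeness_type(polluted))  # contains another word, negative
```

A real system would likely replace the vocabulary lookup with a gazetteer or a trained model; the sketch only shows where that judgement plugs in.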
Step 203: If the address information to be processed is positive address information, search from the position of the address information to be processed in the text in the first direction corresponding to the first search algorithm, and merge one adjacent word with the address information to be processed to obtain merged address information. When the first search algorithm is the forward search algorithm, the first direction is the forward direction; when the first search algorithm is the backward search algorithm, the first direction is the backward direction.
Step 204: If the merged address information is positive address information, take the merged address information as the address information to be processed and go to step 203, until the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 205: If the merged address information is negative address information, record the number of consecutive times the result has been determined to be negative address information, take the merged address information as the address information to be processed, and go to step 203, until the number of consecutive negative determinations equals the preset consecutive count, or the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 206: Determine the address information to be processed that was most recently determined to be positive address information as the first target address information.
The preset stop symbols can be set by staff, for example a comma or a semicolon. Continuing with the text in the example above, assume the results extracted by the address information model are "aa Province bb City cc District dd Street e Community x Unit x No. x" and "CC District G Town H Community". The address information to be processed "aa Province bb City cc District dd Street e Community x Unit x No. x" is determined to be positive address information. When the first search algorithm is the forward search algorithm, the search proceeds forward from the position of this address information in the text. The adjacent symbol found is a comma, so the search loop stops, and the first target address information obtained is "aa Province bb City cc District dd Street e Community x Unit x No. x".
The address information to be processed "CC District G Town H Community" is also determined to be positive address information. When the first search algorithm is the forward search algorithm, the search proceeds forward from the position of this address information in the text. The adjacent word is "BB City", so "BB City" is merged with "CC District G Town H Community", giving the merged address information "BB City CC District G Town H Community", which is still determined to be positive address information. The search continues forward: the adjacent word is "AA Province", and merging gives "AA Province BB City CC District G Town H Community", which is still positive address information. The search continues forward: the adjacent word is "in", and merging gives "in AA Province BB City CC District G Town H Community", which is determined to be negative address information, so the number of consecutive negative determinations is recorded as 1. The search continues forward: the adjacent word is "claiming", and merging gives "claiming in AA Province BB City CC District G Town H Community", which is still negative address information, so the count becomes 2. The search continues forward: the adjacent word is "reported", and merging gives "reported claiming in AA Province BB City CC District G Town H Community", which is still negative address information, so the count becomes 3. If the preset consecutive count is 3, the forward search stops, and the address information most recently determined to be positive, "AA Province BB City CC District G Town H Community", is determined as the first target address information.
When the first search algorithm is the backward search algorithm, only the search direction differs from the example above; everything else is the same and is not repeated here.
Step 207: Search from the position of the first target address information in the text in the second direction corresponding to the second search algorithm, and merge one adjacent word with the address information to be processed to obtain merged address information. When the second search algorithm is the forward search algorithm, the second direction is the forward direction; when the second search algorithm is the backward search algorithm, the second direction is the backward direction.
Step 208: If the merged address information is positive address information, take the merged address information as the first target address information and go to step 206, until the search in the second direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 209: If the merged address information is negative address information, record the number of consecutive times the result has been determined to be negative address information, take the merged address information as the first target address information, and go to step 206, until the number of consecutive negative determinations equals the preset consecutive count, or the search in the second direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 210: Determine the first target address information that was most recently determined to be positive address information as the address information to be classified.
Continuing with the example above, the first target address information is "aa Province bb City cc District dd Street e Community x Unit x No. x" and "AA Province BB City CC District G Town H Community". The address information "aa Province bb City cc District dd Street e Community x Unit x No. x" is determined to be positive address information. With the backward search algorithm as the second search algorithm, the search proceeds backward from the position of this address information in the text. The adjacent symbol found is a comma, so the search loop stops, and the address information to be classified is "aa Province bb City cc District dd Street e Community x Unit x No. x".
The address information "AA Province BB City CC District G Town H Community" is also determined to be positive address information. With the backward search algorithm as the second search algorithm, the search proceeds backward from the position of this address information in the text. The adjacent word is "x", so "x" is merged with "AA Province BB City CC District G Town H Community", giving "AA Province BB City CC District G Town H Community x", which is still determined to be positive address information. The search continues backward: the adjacent word is "Unit x", and merging gives "AA Province BB City CC District G Town H Community x Unit x", which is still positive address information. The search continues backward: the adjacent word is "No. x", and merging gives "AA Province BB City CC District G Town H Community x Unit x No. x", which is still positive address information. The search continues backward: the adjacent word is "stolen", and merging gives "AA Province BB City CC District G Town H Community x Unit x No. x stolen", which is determined to be negative address information, so the number of consecutive negative determinations is recorded as 1. The search continues backward: the adjacent symbol is a comma, so the backward search stops, and the address information most recently determined to be positive, "AA Province BB City CC District G Town H Community x Unit x No. x", is determined as the address information to be classified.
When the second search algorithm is the forward search algorithm, only the search direction differs from the example above; everything else is the same and is not repeated here.
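The two search passes walked through above can be sketched as a single span-expansion routine applied twice. This is an illustrative sketch under stated assumptions, not the patented implementation: `is_positive` is a hypothetical toy judgement based on a small vocabulary, the token list is a simplified rendering of the example text, and resetting the negative counter after a positive result is an assumption about what "consecutive" means.

```python
# Hypothetical vocabulary standing in for the positive/negative judgement;
# the patent does not say how that judgement is implemented.
ADDRESS_WORDS = {"Province", "City", "District", "Street", "Town",
                 "Community", "Unit", "No.", "x"}
STOP_SYMBOLS = {",", ";"}  # preset stop symbols (examples from the text)


def is_positive(tokens):
    """Toy judgement: positive if every token contains an address word."""
    return bool(tokens) and all(
        any(part in ADDRESS_WORDS for part in tok.split()) for tok in tokens
    )


def expand(tokens, start, end, step, max_negative=3):
    """Grow the span tokens[start:end] one word at a time.

    step=-1 searches forward (toward the start of the text),
    step=+1 searches backward (toward the end of the text).
    Returns the last span that was judged positive.
    """
    best = (start, end)
    lo, hi = start, end
    negatives = 0
    while negatives < max_negative:
        idx = lo - 1 if step < 0 else hi
        if idx < 0 or idx >= len(tokens) or tokens[idx] in STOP_SYMBOLS:
            break  # reached the text boundary or a preset stop symbol
        lo, hi = (idx, hi) if step < 0 else (lo, idx + 1)
        if is_positive(tokens[lo:hi]):
            best, negatives = (lo, hi), 0
        else:
            negatives += 1  # consecutive negative determinations
    return best


tokens = ["reported", "claiming", "in", "AA Province", "BB City",
          "CC District", "G Town", "H Community", "x", "Unit x", "No. x",
          "stolen", ",", "door lock intact"]
# Seed span: "CC District G Town H Community" (tokens 5..7).
first_target = expand(tokens, 5, 8, step=-1)          # forward pass
to_classify = expand(tokens, *first_target, step=+1)  # backward pass
print(" ".join(tokens[slice(*to_classify)]))
```

Keeping the last positive span separately from the current span is what lets the routine discard trailing words such as "in" or "stolen" once the negative limit or a stop symbol is reached.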
Step 211: If the address information to be processed is negative address information, perform word segmentation on the address information to be processed to obtain multiple tokens.
Suppose the address information to be processed extracted in the example above includes "Unit x No. x stolen". Because this address information is negative address information, word segmentation is applied to it, yielding "Unit x", "No. x", and "stolen".
Step 212: Extract any one address token from the multiple tokens, and determine that address token as the address information to be processed.
Because the address tokens in the segmentation result are "Unit x" and "No. x", either of them can be extracted as the address information to be processed.
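Steps 211 and 212 can be sketched as picking a seed token from the segmented negative candidate. The text allows any address token to be chosen; taking the first one is an assumption made here for determinism, and `ADDRESS_WORDS`, `is_address_token`, and `pick_seed` are hypothetical names.

```python
# Hypothetical vocabulary; the patent does not define what makes a token an
# "address token".
ADDRESS_WORDS = {"Province", "City", "District", "Street", "Town",
                 "Community", "Unit", "No."}


def is_address_token(token):
    """Toy check: any space-separated part is a known address word."""
    return any(part in ADDRESS_WORDS for part in token.split())


def pick_seed(tokens):
    """Return the index of the first address token, or None if none exists.

    The source says any address token may be used; 'first' is an assumption.
    """
    return next((i for i, t in enumerate(tokens) if is_address_token(t)), None)


segments = ["Unit x", "No. x", "stolen"]  # segmentation of the negative candidate
seed = pick_seed(segments)
print(segments[seed])
```

The selected token then serves as the address information to be processed for the subsequent search steps.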
Step 213: Search from the position of the address information to be processed in the text in the first direction corresponding to the first search algorithm, and merge one adjacent word with the address information to be processed to obtain merged address information. When the first search algorithm is the forward search algorithm, the first direction is the forward direction; when the first search algorithm is the backward search algorithm, the first direction is the backward direction.
Step 214: If the merged address information is positive address information, take the merged address information as the address information to be processed and go to step 212, until the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 215: If the merged address information is negative address information, record the number of consecutive times the result has been determined to be negative address information, take the merged address information as the address information to be processed, and go to step 212, until the number of consecutive negative determinations equals the preset consecutive count, or the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 216: Determine the address information to be processed that was most recently determined to be positive address information as the first target address information.
Step 217: Search from the position of the first target address information in the text in the second direction corresponding to the second search algorithm, and merge one adjacent word with the address information to be processed to obtain merged address information. When the second search algorithm is the forward search algorithm, the second direction is the forward direction; when the second search algorithm is the backward search algorithm, the second direction is the backward direction.
Step 218: If the merged address information is positive address information, take the merged address information as the first target address information and go to step 216, until the search in the second direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 219: If the merged address information is negative address information, record the number of consecutive times the result has been determined to be negative address information, take the merged address information as the first target address information, and go to step 214, until the number of consecutive negative determinations equals the preset consecutive count, or the search in the second direction reaches a preset stop symbol adjacent to the address information to be processed.
Step 220: Determine the first target address information that was most recently determined to be positive address information as the address information to be classified.
The processing of step 211 to step 221 is the same as that of step 203 to step 210 and is not repeated here. It can be seen that combining the forward search algorithm with the backward search algorithm marks off the boundaries of the complete address information without including other words, which improves the accuracy and completeness of subsequent processing results.
Step 221: Obtain the context information of each piece of address information to be classified, and obtain the target text information to which each piece of address information to be classified belongs.
The context information of the address information to be classified is a preset number of words before and after the position of the address to be classified in the text. If these words include a preset punctuation mark, such as a comma, period, or semicolon, only the words up to that punctuation mark are used. This yields the target text information that contains the address information to be classified. For example, suppose the address information to be classified is "aa Province bb City cc District dd Street e Community x Unit x No. x" and the preset number of words in each direction is 3. Because a comma immediately follows this address information and only two words precede it, the target text information to which it belongs is "So-and-so (household registration: aa Province bb City cc District dd Street e Community x Unit x No. x".
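The context window described above can be sketched as follows. The function name `context_window`, the token list, and the choice to treat the whole address as one token are assumptions made for illustration; the stop punctuation set follows the examples named in the text.

```python
STOP_PUNCT = {",", ".", ";"}  # preset punctuation marks named in the text


def context_window(tokens, start, end, n=3):
    """Return (lo, hi) so that tokens[lo:hi] is the address span
    tokens[start:end] plus up to n words on each side, cut short at the
    text boundary or at a preset punctuation mark."""
    lo = start
    while lo > 0 and start - lo < n and tokens[lo - 1] not in STOP_PUNCT:
        lo -= 1
    hi = end
    while hi < len(tokens) and hi - end < n and tokens[hi] not in STOP_PUNCT:
        hi += 1
    return lo, hi


ADDRESS = "aa Province bb City cc District dd Street e Community x Unit x No. x"
tokens = ["So-and-so", "(household registration:", ADDRESS, ",",
          "ID number:", "xxxxxxxxxxxxx"]
lo, hi = context_window(tokens, 2, 3, n=3)
print(" ".join(tokens[lo:hi]))
```

In this example only two words precede the address and a comma follows it directly, so the window is narrower than the preset size on both sides.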
Step 222: Replace the address information to be classified in each piece of target text information with preset characters.
The preset characters are not limited in the embodiments of this application and may be letters, digits, and so on. For example, replacing the address information to be classified in "So-and-so (household registration: aa Province bb City cc District dd Street e Community x Unit x No. x" with the character string aaaaaa gives "So-and-so (household registration: aaaaaa". Replacing the address information to be classified with preset characters prevents the address itself from interfering with the subsequent semantic analysis and improves classification accuracy.
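The replacement step, together with the later step of restoring the original address after classification, can be sketched with plain string operations. The helper names `mask_address` and `unmask_address` are hypothetical; the placeholder string is the one used in the example above.

```python
PLACEHOLDER = "aaaaaa"  # preset characters from the example in the text


def mask_address(target_text, address, placeholder=PLACEHOLDER):
    """Replace the first occurrence of the address with the placeholder."""
    return target_text.replace(address, placeholder, 1)


def unmask_address(masked_text, address, placeholder=PLACEHOLDER):
    """Put the original address back in place of the placeholder."""
    return masked_text.replace(placeholder, address, 1)


address = "aa Province bb City cc District dd Street e Community x Unit x No. x"
target = "So-and-so (household registration: " + address
masked = mask_address(target, address)
print(masked)
```

A placeholder that cannot occur in ordinary text is the safer choice in practice, since a collision would make the restore step ambiguous.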
Step 223: Using a semantic classification model, classify the address information to be classified in each piece of replaced target text information according to the semantics of that replaced target text information, to obtain the category of each piece of address information to be classified.
The semantic classification model is obtained by training TextCNN on training samples. TextCNN achieves high accuracy when applied to Chinese text processing. Its common usage scenario is single-label classification, in which the convolutional layer, pooling layer, and fully connected layer are followed by a Softmax layer. The Softmax layer outputs a probability distribution over the categories, and the category with the highest probability is the final output of the classification model. In business scenarios, a single-label classification model can even reach 97% accuracy.
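The final Softmax step described above can be illustrated in isolation. The sketch below assumes the network has already produced one raw score (logit) per category; the label names and the score values are hypothetical, and the TextCNN layers themselves are not reproduced.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution.
    Subtracting the maximum first keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels):
    """Return the label with the highest probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical categories and scores, for illustration only.
LABELS = ["household registration address", "incident address"]
label, prob = predict([2.0, 0.5], LABELS)
print(label)
```

Because Softmax is monotonic, the predicted category is the one with the largest logit; the probabilities matter when a confidence threshold is applied.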
Semantic analysis is carried out to replaced target text information using semantic classification model, then classifies, can obtainTo the classification of address information to be sorted.For example, being carried out to replaced target text information " so-and-so (household register: aaaaaa " semanticAnalysis, and after classification, obtaining aaaaaa is household register address.Aaaaaa is converted into corresponding address information to be sorted again, finallyObtaining result is household register address: the city bb of the aa province area the cc street dd e cell x x unit x.
Step 224: Output each piece of address information to be classified together with its corresponding category.
As the above technical solution shows, the present application provides an address information classification method. The method first extracts the address information in a text as address information to be processed. Based on the completeness of the address information to be processed and its position in the text, it applies a forward search algorithm and a backward search algorithm to obtain a complete address to be classified, and then classifies that address using its contextual information. Therefore, regardless of whether the extracted address information is complete, the present application ultimately obtains complete address information and classifies it accurately, improving the accuracy of the classification results.
In a second aspect, referring to Fig. 3, the present application provides an address information classification device, the device comprising:
Extraction module 301, configured to extract all address information to be processed from a text;
Determining module 302, configured to determine the completeness type of each piece of address information to be processed from that address information, where the completeness type is either positive address information or negative address information, positive address information being complete or partial address information and negative address information being address information that contains other words;
To-be-classified address determining module 303, configured to obtain, according to the completeness type of each piece of address information to be processed and its position in the text, the address information to be classified that corresponds to each piece of address information to be processed by using a forward search algorithm and a backward search algorithm, where the address information to be classified is complete address information;
Classification module 304, configured to classify each piece of address information to be classified by using its contextual information, obtaining the category corresponding to each piece of address information to be classified;
Output module 305, configured to output each piece of address information to be classified together with its corresponding category.
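One way to picture how modules 301 to 305 fit together is as a small pipeline whose stages are injected as callables. The class name, field names, and the stub stages in the usage example are all assumptions made for illustration; the application does not prescribe any particular code structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AddressClassificationDevice:
    # Each field stands in for one module of Fig. 3 (names are illustrative).
    extract: Callable[[str], List[str]]        # extraction module 301
    determine_type: Callable[[str], str]       # determining module 302: "positive" or "negative"
    complete: Callable[[str, str, str], str]   # module 303: (text, address, type) -> complete address
    classify: Callable[[str, str], str]        # classification module 304: (text, address) -> category

    def run(self, text: str) -> List[Tuple[str, str]]:
        """Return (complete address, category) pairs, the role of output module 305."""
        output = []
        for candidate in self.extract(text):
            kind = self.determine_type(candidate)
            full_address = self.complete(text, candidate, kind)
            output.append((full_address, self.classify(text, full_address)))
        return output


# Usage with stub stages that return fixed values.
device = AddressClassificationDevice(
    extract=lambda text: ["bb City"],
    determine_type=lambda address: "positive",
    complete=lambda text, address, kind: "aa Province, bb City",
    classify=lambda text, address: "household registration address",
)
result = device.run("so-and-so (household registration: aa Province, bb City")
print(result)
```

Passing the stages in as callables keeps each module testable on its own, which mirrors the modular split in the device description.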
Further, referring to Fig. 4, the to-be-classified address determining module 303 includes:
First search algorithm unit 401, configured to, if the address information to be processed is positive address information, start from the position of the address information to be processed in the text and use a first search algorithm to obtain first target address information, where the first search algorithm is the forward search algorithm or the backward search algorithm;
Second search algorithm unit 402, configured to start from the position of the first target address information in the text and use a second search algorithm to obtain the address information to be classified, where the address information to be classified is complete address information; when the first search algorithm is the forward search algorithm, the second search algorithm is the backward search algorithm, and when the first search algorithm is the backward search algorithm, the second search algorithm is the forward search algorithm.
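The pairing of the two search algorithms can be sketched as follows. The sketch assumes an address is tracked as a `(start, end)` span in the text and that "forward" means toward the beginning of the text; both are interpretations, not statements from the application, and `toy_search` is a placeholder for the real single-direction search.

```python
from typing import Callable, Tuple

Span = Tuple[int, int]  # (start, end) positions of an address in the text, end exclusive


def opposite(direction: str) -> str:
    """Return the direction the second search algorithm must use."""
    if direction not in ("forward", "backward"):
        raise ValueError(f"unknown direction: {direction}")
    return "backward" if direction == "forward" else "forward"


def two_pass_search(seed: Span, first: str,
                    search: Callable[[Span, str], Span]) -> Span:
    """Run the first search from the seed, then the opposite search from its result."""
    first_target = search(seed, first)            # first search algorithm unit 401
    return search(first_target, opposite(first))  # second search algorithm unit 402


def toy_search(span: Span, direction: str) -> Span:
    """Placeholder search that grows the span by two positions in one direction."""
    start, end = span
    return (start - 2, end) if direction == "forward" else (start, end + 2)


print(two_pass_search((4, 6), "forward", toy_search))
print(two_pass_search((4, 6), "backward", toy_search))
```

With this toy search both orders end on the same span, which illustrates why the application allows either algorithm to run first.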
Further, referring to Fig. 5, the to-be-classified address determining module 303 further includes:
Word segmentation unit 501, configured to, if the address information to be processed is negative address information, perform word segmentation on the address information to be processed to obtain multiple tokens;
Extraction unit 502, configured to extract any one address token from the multiple tokens and take that address token as the address information to be processed;
First search algorithm unit 401, further configured to start from the position of the address information to be processed in the text and use the first search algorithm to obtain first target address information, where the first search algorithm is the forward search algorithm or the backward search algorithm;
Second search algorithm unit 402, further configured to start from the position of the first target address information in the text and use the second search algorithm to obtain the address information to be classified, where the address information to be classified is complete address information; when the first search algorithm is the forward search algorithm, the second search algorithm is the backward search algorithm, and when the first search algorithm is the backward search algorithm, the second search algorithm is the forward search algorithm.
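Handling a negative address starts by segmenting it and picking one address token as the new seed. The sketch below uses whitespace splitting as a stand-in for a real Chinese word segmenter and a small hypothetical suffix list to recognise address tokens; it also picks the first address token, although the text allows any one of them.

```python
from typing import List, Optional, Tuple

# Hypothetical suffixes that mark a token as an address token.
ADDRESS_SUFFIXES = ("Province", "City", "District", "Street")


def tokenize(text: str) -> List[str]:
    """Stand-in for word segmentation (unit 501); real systems would use a segmenter."""
    return text.split()


def looks_like_address(token: str) -> bool:
    """Crude illustrative test for an address token."""
    return token.endswith(ADDRESS_SUFFIXES)


def seed_from_negative(candidate: str) -> Optional[Tuple[int, str]]:
    """Return (token index, token) of the first address token, or None (unit 502)."""
    for index, token in enumerate(tokenize(candidate)):
        if looks_like_address(token):
            return index, token
    return None


print(seed_from_negative("lives bbCity nearby"))  # (1, 'bbCity')
```

The returned token then plays the role of the address information to be processed, and the forward and backward searches proceed from its position as in the positive case.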
Further, referring to Fig. 6, the first search algorithm unit 401 includes:
First-direction search subunit 601, configured to search, starting from the position of the address information to be processed in the text, in the first direction corresponding to the first search algorithm, and to merge one adjacent word with the address information to be processed to obtain merged address information, where the first direction is the forward direction when the first search algorithm is the forward search algorithm and the backward direction when the first search algorithm is the backward search algorithm;
Loop determination subunit 602, configured to: if the merged address information is positive address information, take the merged address information as the address information to be processed and repeat the above step of searching in the first direction, until the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed; and if the merged address information is negative address information, record the number of consecutive times negative address information has been determined, take the merged address information as the address information to be processed, and repeat the above step of searching in the first direction, until the number of consecutive negative determinations equals a preset number of consecutive times or the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed;
Determination subunit 603, configured to take the address information to be processed that was most recently determined to be positive address information as the first target address information.
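The loop carried out by subunits 601 to 603 can be sketched over a token list. Several points are assumptions rather than statements from the application: the text is pre-split into tokens, "forward" is taken to mean toward the start of the text (`step=-1`), the consecutive-negative counter resets after a positive determination, and `is_positive` is a toy vocabulary check standing in for the real completeness test.

```python
from typing import Callable, List, Sequence, Set, Tuple


def search_one_direction(
    tokens: Sequence[str],
    start: int,
    end: int,
    step: int,
    is_positive: Callable[[List[str]], bool],
    stop_symbols: Set[str],
    max_negative: int = 2,
) -> Tuple[int, int]:
    """Grow tokens[start:end] one word at a time; step=-1 forward, step=+1 backward.

    Returns the last span judged positive (the role of subunit 603).
    """
    best = (start, end)
    negatives = 0
    while True:
        nxt = start - 1 if step < 0 else end
        # Stop at the text boundary or at a preset stop symbol.
        if nxt < 0 or nxt >= len(tokens) or tokens[nxt] in stop_symbols:
            break
        # Merge the adjacent word into the current address (subunit 601).
        if step < 0:
            start = nxt
        else:
            end = nxt + 1
        # Judge the merged address and track consecutive negatives (subunit 602).
        if is_positive(list(tokens[start:end])):
            best = (start, end)
            negatives = 0
        else:
            negatives += 1
            if negatives == max_negative:
                break
    return best


VOCAB = {"aa", "Province", "bb", "City", "cc", "District"}  # toy address vocabulary


def toy_is_positive(span: List[str]) -> bool:
    return all(token in VOCAB for token in span)


tokens = ["lives", "in", "aa", "Province", "bb", "City",
          "cc", "District", "and", "works", "."]
stops = {"."}
first = search_one_direction(tokens, 4, 6, +1, toy_is_positive, stops)
final = search_one_direction(tokens, first[0], first[1], -1, toy_is_positive, stops)
print(" ".join(tokens[final[0]:final[1]]))  # aa Province bb City cc District
```

Tolerating a few consecutive negative results lets the search step over a stray word inside an address before giving up, while the stop symbol gives a hard boundary.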
Those skilled in the art can clearly understand that the techniques in the embodiments of the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application or in certain parts of those embodiments.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be cross-referenced, and each embodiment focuses on how it differs from the others. The device embodiment in particular is described relatively briefly because it is substantially similar to the method embodiment; for the relevant details, refer to the corresponding description of the method embodiment.

Claims (10)

If the merged address information is positive address information, the merged address information is taken as the address information to be processed and the above step of searching in the first direction is repeated, until the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed; if the merged address information is negative address information, the number of consecutive times negative address information has been determined is recorded, the merged address information is taken as the address information to be processed, and the above step of searching in the first direction is repeated, until the number of consecutive negative determinations equals a preset number of consecutive times or the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed;
If the merged address information is positive address information, the merged address information is taken as the first target address information and the above step of searching in the second direction is repeated, until the search in the second direction reaches a preset stop symbol adjacent to the address information to be processed; if the merged address information is negative address information, the number of consecutive times negative address information has been determined is recorded, the merged address information is taken as the first target address information, and the step of searching in the second direction is repeated, until the number of consecutive negative determinations equals a preset number of consecutive times or the search in the second direction reaches a preset stop symbol adjacent to the address information to be processed;
A loop determination subunit, configured to: if the merged address information is positive address information, take the merged address information as the address information to be processed and repeat the above step of searching in the first direction, until the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed; and if the merged address information is negative address information, record the number of consecutive times negative address information has been determined, take the merged address information as the address information to be processed, and repeat the above step of searching in the first direction, until the number of consecutive negative determinations equals a preset number of consecutive times or the search in the first direction reaches a preset stop symbol adjacent to the address information to be processed;
CN201811102935.5A | 2018-09-20 | 2018-09-20 | Address information classification method and device | Active | CN109344254B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811102935.5A, CN109344254B (en) | 2018-09-20 | 2018-09-20 | Address information classification method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811102935.5A, CN109344254B (en) | 2018-09-20 | 2018-09-20 | Address information classification method and device

Publications (2)

Publication Number | Publication Date
CN109344254A (en) | 2019-02-15
CN109344254B (en) | 2020-12-18

Family

ID=65306508

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811102935.5A (Active, CN109344254B (en)) | Address information classification method and device | 2018-09-20 | 2018-09-20

Country Status (1)

Country | Link
CN (1) | CN109344254B (en)

Cited By (1)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN110738305A (en)* | 2019-08-27 | 2020-01-31 | 深圳市跨越新科技有限公司 | method and system for analyzing logistics waybill address

Citations (7)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
US20090063551A1 (en)* | 2000-05-16 | 2009-03-05 | Brian Mark Shuster | Addressee-defined mail addressing system and method
CN101980208A (en)* | 2010-11-10 | 2011-02-23 | 百度在线网络技术(北京)有限公司 | Address query method and system
CN103440312A (en)* | 2013-08-27 | 2013-12-11 | 深圳市华傲数据技术有限公司 | System and terminal for inquiring zip code for mailing address
US20150331847A1 (en)* | 2014-05-13 | 2015-11-19 | Lg Cns Co., Ltd. | Apparatus and method for classifying and analyzing documents including text
CN107305540A (en)* | 2016-04-20 | 2017-10-31 | 顺丰科技有限公司 | Address cutting recognition methods
CN107368470A (en)* | 2017-06-27 | 2017-11-21 | 北京神州泰岳软件股份有限公司 | A kind of method and apparatus for extracting enterprises organizational structure information
CN108509441A (en)* | 2017-02-24 | 2018-09-07 | 菜鸟智能物流控股有限公司 | Training of address validity classifier, verification method thereof and related device

Also Published As

Publication number | Publication date
CN109344254B (en) | 2020-12-18

Similar Documents

Publication | Title
US8108413B2 (en) | Method and apparatus for automatically discovering features in free form heterogeneous data
CN106446526B (en) | Method and device for extracting entity relationship from electronic medical records
US7971150B2 (en) | Document categorisation system
CN112163424B (en) | Data labeling method, device, equipment and medium
US20170316066A1 (en) | Concept-based analysis of structured and unstructured data using concept inheritance
CN110287314B (en) | Method and system for long text credibility assessment based on unsupervised clustering
CN114265931B (en) | Consumer policy perception analysis method and system based on big data text mining
Babhulgaonkar et al. | Language identification for multilingual machine translation
CN114491034B (en) | Text classification method and intelligent device
CN110909542A (en) | Intelligent semantic series-parallel analysis method and system
CN112257444A (en) | Financial information negative entity discovery method and device, electronic equipment and storage medium
CN112579781B (en) | Text classification method, device, electronic equipment and medium
US20220050884A1 (en) | Utilizing machine learning models to automatically generate a summary or visualization of data
CN116049376A (en) | Method, device and system for retrieving and replying information and creating knowledge
CN111782601A (en) | Electronic file processing method and device, electronic equipment and machine readable medium
CN108615124B (en) | Enterprise evaluation method and system based on word frequency analysis
Lou et al. | S2abEL: a dataset for entity linking from scientific tables
US20220253728A1 (en) | Method and System for Determining and Reclassifying Valuable Words
CN119202249A (en) | A text element extraction method based on natural language processing
CN109344254A (en) | A kind of address information classification method and device
CN102165443A (en) | Computer-readable recording medium containing a sentence extraction program, sentence extraction method, and sentence extraction device
CN116361457A (en) | Training method of intention recognition model, and method and device for analyzing text intention
CN112686055B (en) | Semantic recognition method and device, electronic equipment and storage medium
CN109542766A (en) | Extensive program similitude based on code mapping and morphological analysis quickly detects and evidence generation method
JP7330691B2 (en) | Vocabulary Extraction Support System and Vocabulary Extraction Support Method

Legal Events

Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
TA01 | Transfer of patent application right | Effective date of registration: 20190906. Address after: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing. Applicant after: China Science and Technology (Beijing) Co., Ltd. Address before: 100089 Beijing city Haidian District wanquanzhuang Road No. 28 Wanliu new building block A Room 601. Applicant before: Beijing Shenzhou Taiyue Software Co., Ltd.
CB02 | Change of applicant information | Address after: 230000 zone B, 19th floor, building A1, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province. Applicant after: Dingfu Intelligent Technology Co., Ltd. Address before: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing. Applicant before: DINFO (BEIJING) SCIENCE DEVELOPMENT Co.,Ltd.
GR01 | Patent grant
