CN110569322A - Address information analysis method, device and system and data acquisition method - Google Patents

Address information analysis method, device and system and data acquisition method

Info

Publication number
CN110569322A
Authority
CN
China
Prior art keywords
address information
geographic
analyzed
data
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910684395.4A
Other languages
Chinese (zh)
Inventor
李男一
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd
Priority to CN201910684395.4A
Publication of CN110569322A
Priority to CA3145918A (CA3145918A1)
Priority to PCT/CN2020/096989 (WO2021017679A1)
Legal status: Pending

Abstract

The embodiment of the application discloses an address information analysis method, device and system and a data acquisition method. The address information analysis method comprises the following steps: acquiring address information to be analyzed in original data; extracting features of the address information to be analyzed by using a natural language processing technology, selecting the extracted features, and vectorizing the selected features to obtain feature vectors; inputting the feature vector into a preset model to obtain an initial array comprising a geographic entity and an administrative division level corresponding to the geographic entity; sorting and de-duplicating the geographic entities in the initial array according to the administrative division level to obtain a standard array; and coding the standard array to obtain a geocoding result. Because the geographic entities and administrative divisions of the address information are identified by a model, no rule base needs to be established and few resources are occupied. In addition, the prediction model is optimized by a feature selection algorithm, which improves the accuracy and speed of prediction.

Description

Address information analysis method, device and system and data acquisition method
Technical Field
The present application relates to the field of address resolution, and in particular to an address information analysis method, device and system and a data acquisition method.
Background
Modern retail enterprises generate massive sales data every day and analyze it as a basis for decision making or decision support. Address data in particular is the basic data for intelligent retail analysis and decision making. For example, store site selection, logistics resource allocation and sales analysis by geographic dimension all depend on analysis of the address data in sales data, so the efficiency and accuracy of address data analysis are very important.
At present, a rule-based cleaning technique is used to resolve mass address data into standard geocodes. Specifically, all standard administrative geographic data are built into a dictionary base containing rules; geographic data are then extracted from the original data by regular expressions; the extracted geographic data are matched against the dictionary base to obtain geographic data in a standard form; and finally the geographic data are converted into geocodes locally and provided to the various upper-layer retail decision applications.
However, in the above method all the standard administrative geographic data must be built into a dictionary base containing rules, which consumes a lot of hardware resources. Meanwhile, because the volume of sales data is huge, analysis takes a long time.
In addition, most address information in sales data is filled in manually by users and is often irregular, so part of the data cannot be converted into codes and the accuracy of the analysis result is low.
The above problems also occur in address data resolution in other business fields.
Disclosure of Invention
The application provides an address information analysis method, device and system and a data acquisition method, and solves the problems of the prior art that address analysis occupies many resources and takes a long time.
The application provides the following scheme:
In one aspect, an address information parsing method is provided, where the method includes:
acquiring address information to be analyzed in original data;
extracting features of the address information to be analyzed by using a natural language processing technology, selecting the extracted features, and vectorizing the selected features to obtain a feature vector to be identified;
inputting the feature vector to be identified into a preset model to obtain an initial array comprising a geographic entity and an administrative division level corresponding to the geographic entity;
sorting and de-duplicating the geographic entities in the initial array according to the administrative division level to obtain a standard array;
and coding the standard array to obtain a geocoding result.
Preferably, before the feature extraction is performed on the address information to be analyzed by using a natural language processing technology, the method further includes:
judging, according to a prestored historical address information analysis record, whether the address information to be analyzed has already been analyzed; the historical address information analysis record comprises historical address information and corresponding historical geocoding data;
if it has been analyzed, acquiring the corresponding historical geocoding data as the geocoding result;
the feature extraction of the address information to be analyzed by using a natural language processing technology comprises: if the address information has not been analyzed, extracting features of the address information to be analyzed by using a natural language processing technology.
Preferably, before the standard array is encoded to obtain the geocoding result, the method further includes:
matching the standard array with a pre-stored geographic position tree dictionary, and judging whether geographic information is missing from the standard array; the geographic position tree dictionary is built by dividing administrative regions level by level;
if information is missing, completing the standard array according to the geographic position tree dictionary;
and the coding of the standard array to obtain the geocoding result comprises coding the completed standard array to obtain the geocoding result.
Preferably, the encoding the standard array to obtain the geocoding result includes:
And calling a coding interface of an external server, and coding the standard array to obtain a geocoding result.
Preferably, the method further comprises the step of pre-constructing the preset model:
Performing corpus labeling on address data in the sample set to obtain a sample array labeled with a sample geographic entity and an administrative division corresponding to the sample geographic entity;
extracting primary features of address data in the sample set by using a natural language processing technology, determining the primary features meeting certain conditions as target features, and vectorizing the target features to obtain sample feature vectors;
taking the sample feature vector as input and the corresponding sample array as output, and training with a neural network and a conditional random field algorithm to obtain the preset model.
Preferably, the extracting of primary features of the address data in the sample set by using a natural language processing technology, the determining of primary features meeting a certain condition as target features, and the vectorizing of the target features to obtain a sample feature vector comprise:
calculating the frequency with which each extracted primary feature appears in the address text;
calculating, according to the frequency, the correlation degree between each primary feature and each administrative division level as a feature weight;
selecting the primary features whose correlation degree and/or frequency meets a preset condition as the target features;
calculating the correlation degree between each selected target feature and each administrative division level, taking the average of the correlation degrees of each target feature as the weight of that target feature, and constructing a weighting matrix according to the weights;
and vectorizing the target features according to the weighting matrix to obtain the sample feature vector.
Preferably, the method further comprises: storing the geocoding result in association with the original data.
Preferably, the prediction model is deployed in a Spark computing engine, and the geocoding result is stored in an Elasticsearch engine in association with the original data.
In another aspect of the present application, a data acquisition method is further provided, where the method includes:
receiving candidate address information;
Analyzing the candidate address information according to the method to obtain analyzed candidate geocoding data;
And calculating in an association table of pre-stored geocoding results and original data according to the candidate geocoding data and the preset geographic range, and acquiring the geocoding results and the corresponding original data in the preset geographic range.
In another aspect, the present application further provides an address information analyzing apparatus, including:
The to-be-analyzed address information acquisition unit is used for acquiring the address information to be analyzed in the original data;
The characteristic extraction unit is used for extracting characteristics of the address information to be analyzed by utilizing a natural language processing technology, selecting the extracted characteristics and vectorizing the selected characteristics to obtain a characteristic vector;
the model prediction unit is used for inputting the feature vector into a preset model to obtain an initial array comprising a geographic entity and an administrative division level corresponding to the geographic entity; the preset model is obtained by training based on the combination of a cyclic neural network and a conditional random field algorithm;
The sorting unit is used for sorting and de-duplicating the geographic entities in the initial array according to the administrative division level to obtain a standard array;
And the geocoding unit is used for coding the standard array to obtain a geocoding result.
In yet another aspect, the present application provides a computer system comprising:
one or more processors; and
A memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
Acquiring address information to be analyzed in original data;
Extracting the characteristics of the address information to be analyzed by using a natural language processing technology, selecting the extracted characteristics, and vectorizing the selected characteristics to obtain a characteristic vector to be identified;
inputting the feature vector to be identified into a preset model to obtain an initial array comprising a geographic entity and an administrative division level corresponding to the geographic entity;
Sorting and de-duplicating the geographic entities in the initial array according to the administrative division level to obtain a standard array;
coding the standard array to obtain a geocoding result.
According to the specific embodiments provided herein, the present application discloses the following technical effects:
According to the technical scheme, features of the address information are extracted and selected through a natural language processing technology and vectorized to obtain a feature vector to be recognized. The feature vector is then used as model input to predict an initial array comprising the geographic entities and their administrative division levels, which is sorted, de-duplicated and geocoded to obtain the analysis result. The process does not need a full dictionary base containing rules, which reduces the occupation of hardware resources and lowers the requirements on the deployment environment. Because standard geographic data are extracted from massive address information by model prediction, the extraction is not affected by the input format of the address information, adapts to various data changes and needs no manual maintenance, while the efficiency of geographic data extraction is improved. Furthermore, the prediction model optimized by the feature selection algorithm discards cluttered features that have a low correlation degree with the administrative division level, so geographic information is extracted more accurately than with traditional rule matching and the model computes faster.
Furthermore, the address information coding function can be packaged into a batch analysis interface and placed in an external independent server, the computing resource of geographic data analysis and extraction is not occupied, the coding efficiency is improved, and the data processing is more real-time. In addition, the scheme can also complement the missing administrative geographic information of the address information, so that the analysis result is more accurate.
Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a system provided in an embodiment of the present application;
FIG. 2 is a flowchart of the resolution of a specific piece of address information provided in an embodiment of the present application;
FIG. 3 is a flowchart of an address resolution method provided in an embodiment of the present application;
FIG. 4 is a block diagram of an apparatus according to an embodiment of the present disclosure;
FIG. 5 is a diagram of a computer system architecture provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments that a person of ordinary skill in the art can derive from the embodiments given herein fall within the scope of protection of the present application.
The present application extracts features of address information through a natural language processing technology, selects the features with a high correlation degree and vectorizes them to obtain feature vectors, predicts the geographic entities and corresponding administrative division levels using a pre-constructed model and the feature vectors, sorts and de-duplicates them to obtain geographic data in a standard form, and then geocodes the data to obtain coordinates, completing the analysis of the address information. Because the address information undergoes feature extraction, selection and vectorization, and only features with a high correlation degree with the administrative division level are kept, the subsequent model predicts faster and more accurately. Meanwhile, model prediction requires no full dictionary base containing rules, which reduces the occupation of hardware resources.
Example one
As shown in fig. 1, the system architecture diagram of the present application includes a raw data system, an address information processing system and an encoding system that can exist independently of each other in hardware. The raw data system is used for providing raw data, such as an external system or an OMS (order management) system. The address information processing system is used for obtaining original data such as order information from an original data system and carrying out a series of processing on the address information of the original data to obtain geographic data in a standard form. The encoding system is used for encoding the geographic data in the standard form to obtain a geographic encoding result (generally coordinates). The coding system is internally packaged with a batch analysis interface, and the address information processing system can finish coding the geographic data in the standard form by calling the batch analysis interface of the coding system.
The address information processing system can also associate the geocoding result obtained from the encoding system with the corresponding original data and store them in an Elasticsearch engine for subsequent searches of the associated data.
As shown in fig. 1, the address information processing system may further associate each analyzed piece of address information with its geocoding result and store the pair as a historical analysis record in the address resolution history table. When the address information processing system acquires address information, it first looks for a match in the address resolution history table. If the same address information is found, the corresponding geocoding result is obtained directly, no subsequent processing is executed, and the result does not need to be stored in the history table again. If no match is found, the address information is considered to be analyzed for the first time: the address information processing system, together with the encoding system, analyzes and encodes it according to the normal processing flow and stores the resulting geocoding result in the address resolution history table.
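The history-table lookup described above can be sketched as a simple cache in front of the normal processing flow. This is a minimal illustration only: the in-memory dictionary `history`, the function name `resolve_with_history` and the callback `parse_and_encode` are hypothetical stand-ins for the address resolution history table and the model-plus-encoding flow, which the source does not specify at this level.

```python
from typing import Callable, Dict, Tuple

Coordinates = Tuple[float, float]


def resolve_with_history(
    address: str,
    history: Dict[str, Coordinates],
    parse_and_encode: Callable[[str], Coordinates],
) -> Coordinates:
    """Return the geocoding result for `address`, reusing the history table when possible."""
    if address in history:
        # Already analyzed: reuse the stored result and do not store it again.
        return history[address]
    # First-time address: run the normal processing flow, then record the result.
    result = parse_and_encode(address)
    history[address] = result
    return result
```

With this shape, a repeated address never reaches the model or the encoding interface a second time, which is the saving the history table is meant to provide.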
In the system configuration of another embodiment, the raw data system may share a server with the address information processing system, and the encoding system may also share a server with the address information processing system. By comparison, placing the encoding system on an independent server and completing the encoding task through the packaged batch analysis interface does not occupy the computing resources that the address information processing system uses to analyze and extract address information, which improves encoding efficiency and makes data processing closer to real time.
In the following embodiments of the present application, the encoding system and the address information processing system are deployed on different servers, and the original data is order data.
Order data contains fields representing different attributes, such as the person who placed the order, the price and the address, and the address information can be located quickly through these fields. Since the address information in the original data is mostly filled in manually, it contains various errors and irregularities, so the address information processing system first needs to convert the address information into geographic data in a standard form. For example, the address information "Tianjin, No. 18 Xingang No. 2 Road, Binhai New Area, Mr. Li" contains non-geographic data and needs to be converted into geographic data in the standard form "Tianjin City | Binhai New Area | Tanggu Street | No. 18 Xingang No. 2 Road".
In order to convert unprocessed address information into geographic data in a standard form, the geographic entities in the address information and the administrative division level of each geographic entity are extracted first. The geographic entities are, for example, Tianjin, Binhai and Tanggu; the administrative division levels are country, province, city, county and so on. In the prior art, regular expressions extract the geographic entities and their administrative division levels from character strings that satisfy certain rules, so a rule base must be constructed and the address strings themselves must follow the rules; strings that do not follow the rules cannot be extracted. To solve this problem, the application provides an administrative geographic entity relationship recognition model optimized by a feature selection algorithm: features of the address information are selected using natural language processing (NLP) and a feature vector is computed. With the feature vector as input, the trained administrative geographic entity relationship recognition model outputs a prediction result, namely a binary geographic entity relationship array (written "political relation" below) made up of geographic entities and their administrative division levels, in the following form:
political relation=[(e1,t1),(e2,t2),...(en,tn)]
Here e1 … en represent the identified geographic entities and t1 … tn represent the administrative division levels. The level classification is shown in table 1, and the administrative division level in the binary array can be replaced by the sign in table 1; for example, a city may be represented by CI. Non-geographic entities and information with no administrative division level are classed as redundant information. Repeated geographic information can also be classed as redundant information.
TABLE 1
Sign | Original word of abbreviation | Administrative region meaning
CO   | country                       | Country
PR   | province                      | Province
CI   | city                          | City
AR   | area                          | District
ST   | street                        | Street
RO   | road                          | Road
BU   | building                      | Building
OT   | other                         | Redundant information
as shown in fig. 2, taking the address information "mr. jun:
[ ('Tianjin', 'CI'), ('No. 18 Xingang No. 2 Road', 'RO'), ('Binhai New Area', 'AR'), ('Limon', 'OT'), ('metabolic', 'OT'), ('Cooperation', 'OT') ]
Obviously, the binary array obtained as described above has several problems:
1. Part of the geographic entities is missing. For example, street information is lacking between Binhai New Area and No. 18 Xingang No. 2 Road.
2. There is much redundant information. Note that if the same geographic information appears in the address several times, only one occurrence is retained and the repeats are treated as redundant information.
To solve these two problems, the national administrative geographic information is compiled into a tree dictionary in order of administrative level, with each administrative level and each geographic entity of that level as a node.
The binary array predicted by the model is de-duplicated, stripped of redundancy and sorted by administrative division level to obtain a new binary array, namely a standard address. Specifically, the levels are coded with reference to the administrative level standard CO > PR > CI > AR > ST > RO > BU and arranged in ascending order of code, and information with no corresponding administrative division level, as well as repeated geographic information, is eliminated as redundant information. After the binary array is sorted and de-duplicated in this way, the following array is obtained, as shown in fig. 2:
[ ('Tianjin', 'CI'), ('Binhai New Area', 'AR'), ('No. 18 Xingang No. 2 Road', 'RO') ]
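The sorting and de-duplication step can be sketched as follows. The level order comes from the text (CO > PR > CI > AR > ST > RO > BU); the function name `to_standard_array` and the choice to drop any pair whose level is not in that order (which covers `OT`) are illustrative assumptions, not the patent's implementation.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (geographic entity, administrative division level)

# Level order taken from the text: CO > PR > CI > AR > ST > RO > BU.
LEVEL_ORDER = ["CO", "PR", "CI", "AR", "ST", "RO", "BU"]
LEVEL_RANK = {level: rank for rank, level in enumerate(LEVEL_ORDER)}


def to_standard_array(initial: List[Pair]) -> List[Pair]:
    """Drop redundant pairs, then sort the rest by administrative division level."""
    seen = set()
    kept = []
    for entity, level in initial:
        if level not in LEVEL_RANK:
            # 'OT' (or any unknown level) is redundant information.
            continue
        if (entity, level) in seen:
            # Repeated geographic information: keep only the first occurrence.
            continue
        seen.add((entity, level))
        kept.append((entity, level))
    # sorted() is stable, so entities at the same level keep their input order.
    return sorted(kept, key=lambda pair: LEVEL_RANK[pair[1]])


initial = [
    ("Tianjin", "CI"),
    ("No. 18 Xingang No. 2 Road", "RO"),
    ("Binhai New Area", "AR"),
    ("Limon", "OT"),
    ("Tianjin", "CI"),
]
standard = to_standard_array(initial)
# standard == [("Tianjin", "CI"), ("Binhai New Area", "AR"),
#              ("No. 18 Xingang No. 2 Road", "RO")]
```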
The sorted and de-duplicated binary array is then matched against the tree dictionary to determine whether geographic information is missing from it. In particular, a recursive method can be used to find and fill the gaps. For example, the binary array above lacks the street level between Binhai New Area and No. 18 Xingang No. 2 Road.
If geographic information is missing, the binary array is completed according to the tree dictionary. Geographic data in a standard form is then obtained, as shown in fig. 2:
[ ('Tianjin', 'CI'), ('Binhai New Area', 'AR'), ('Tanggu Street', 'ST'), ('No. 18 Xingang No. 2 Road', 'RO') ]
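One way to picture the recursive completion is a depth-first search of the tree dictionary for a path that contains the known entities in order; every node on that path that is absent from the standard array is a filled-in level. The sketch below is an assumption about how such a lookup could work: the nested-dictionary layout, the toy `TREE` contents and the function name `complete` are all hypothetical, and a real tree would cover the national administrative divisions.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (geographic entity, administrative division level)
# Hypothetical node layout: entity name -> (level, children keyed by entity name).
Tree = Dict[str, Tuple[str, dict]]


def complete(standard: List[Pair], tree: Tree) -> Optional[List[Pair]]:
    """Return the tree path that contains `standard` in order, or None if there is none."""
    if not standard:
        return []  # every known entity has been matched
    for name, (level, children) in tree.items():
        if (name, level) == standard[0]:
            rest = complete(standard[1:], children)
        else:
            # Node is not in the array: it is a candidate for a missing level.
            rest = complete(standard, children)
        if rest is not None:
            return [(name, level)] + rest
    return None


# Toy tree for illustration only.
TREE: Tree = {
    "Tianjin": ("CI", {
        "Heping District": ("AR", {}),
        "Binhai New Area": ("AR", {
            "Hangu Street": ("ST", {}),
            "Tanggu Street": ("ST", {
                "No. 18 Xingang No. 2 Road": ("RO", {}),
            }),
        }),
    }),
}

completed = complete(
    [("Tianjin", "CI"), ("Binhai New Area", "AR"), ("No. 18 Xingang No. 2 Road", "RO")],
    TREE,
)
```

In this toy tree the search skips the branches that cannot reach the road and returns the path through `Tanggu Street`, so the street level is supplied from the dictionary rather than from the address text.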
After the geographic data in the standard form is obtained, the geographic data can be encoded by adopting the encoding technology to obtain a geographic encoding result.
As mentioned above, the application provides an administrative geographic entity relationship recognition model optimized by a feature selection algorithm. The construction and training process of the model is as follows:
First, natural language processing (NLP) is used to extract and select features from sample address information, and a sample feature vector is computed. The specific steps are:
1. A sample set of address information corpora is constructed, which may be obtained from the raw data system in fig. 1. To further improve accuracy, the original address information corpus obtained from the raw data system can be divided into three classes: data for which the coordinate analysis program cannot obtain a coordinate code, data for which incorrect coordinates are obtained, and data for which correct coordinates are obtained. An equal share of each class is then selected from the original address information corpus as the basic corpus. Next, the selected corpus is segmented into words, and each word is labeled with its sample geographic entity and the administrative division (administrative geographic identifier) corresponding to that entity. A certain proportion of the labeled data is randomly selected for model training, and a certain proportion is reserved for model verification.
2. Feature extraction and selection:
2.1 Features are extracted from the labeled address data used for model training, and the feature frequency FC of each extracted feature is computed for each geographic administrative division level as in formula (1), where $N_{ik}$ is the number of times the feature appears in the address information text and $N_i$ is the total number of features present in the address information.
2.2 The relevance between each feature pw and each geographic administrative division level t is calculated to obtain the feature weight W, as in formula (2),
where $EX_{ik}$ is the number of texts in which the feature pw appears in levels other than the geographic administrative division level t; $UN_{ik}$ is the number of texts in level t in which the feature pw does not appear; and S is the total number of texts of geographic entities over all administrative entity classes.
2.3 The weight average $W_{avg}$ and the feature frequency average $FC_{avg}$ are calculated as in formulas (3) and (4), where FN is the total number of feature types. A feature is selected as a target feature when its weight satisfies $W > W_{avg}$, or $W < W_{avg}$ and $FC > FC_{avg}$.
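The selection rule in step 2.3 can be sketched as below. Formulas (1) to (4) are not reproduced in this text, so the sketch takes the per-feature weights W and frequencies FC as given inputs and assumes that $W_{avg}$ and $FC_{avg}$ are plain arithmetic means over the FN feature types. The function name `select_target_features` and the sample numbers are hypothetical.

```python
from typing import Dict, List


def select_target_features(weights: Dict[str, float], freqs: Dict[str, float]) -> List[str]:
    """Keep features with W > W_avg, or with W < W_avg and FC > FC_avg."""
    fn = len(weights)  # FN: total number of feature types
    # Assumption: both averages are simple means over the FN features.
    w_avg = sum(weights.values()) / fn
    fc_avg = sum(freqs.values()) / fn
    return [
        feature
        for feature, w in weights.items()
        if w > w_avg or (w < w_avg and freqs[feature] > fc_avg)
    ]


# Hypothetical weights and frequencies for four features.
weights = {"city": 0.9, "road": 0.5, "mr": 0.1, "no": 0.3}   # W_avg = 0.45
freqs = {"city": 0.2, "road": 0.3, "mr": 0.1, "no": 0.4}     # FC_avg = 0.25
selected = select_target_features(weights, freqs)
# selected == ["city", "road", "no"]
```

The second branch of the rule is what keeps `no` here: its weight is below average, but it occurs often enough to be retained, while `mr` is both weakly correlated and infrequent and is dropped.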
3. Calculating a sample feature vector of the target feature:
3.1 If there are x geographic administrative division levels, each selected target feature obtains x correlation degrees, and the average of these x correlation degrees is taken as the weight of each word. The weighting matrix $A_{rc}$ is obtained from the feature weights:
$A_{rc} = (W_{ij} a_{ij})_{r \times c}$  (5)
3.2 Feature vector calculation. Let $Y \in R^{n \times n}$ have n independent eigenvectors, with the principal eigenvalue $m_1$ satisfying $|m_1| > |m_2| \ge \dots \ge |m_n|$. For any administrative geographic entity feature vector $v_0 = c_0$, the vector sequences $\{c_k\}$, $\{v_k\}$ are constructed by formulas (6) and (7), and formula (8) then holds.
The weighted, normalized sample feature vector constructed from expressions (2), (5), (6), (7) and (8) is given by expression (9).
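Formulas (6) to (8) are not reproduced in this text. The stated conditions (n independent eigenvectors, a strictly dominant eigenvalue $m_1$, and sequences $\{c_k\}$, $\{v_k\}$ started from $v_0 = c_0$) match those of the classical power method, so the sketch below shows that standard iteration as an assumption about what step 3.2 computes, not as the patent's exact formulas. The function name `power_iteration` and the normalization by the largest-magnitude component are choices made for this illustration.

```python
from typing import List, Tuple


def power_iteration(
    y: List[List[float]], v0: List[float], steps: int = 100
) -> Tuple[float, List[float]]:
    """Standard power method: v_k = Y c_{k-1}, c_k = v_k / mu_k.

    Returns (mu, c), where mu approximates the dominant eigenvalue and c the
    corresponding eigenvector, scaled so its largest-magnitude component is 1.
    """
    c = list(v0)
    mu = 0.0
    for _ in range(steps):
        # v_k = Y c_{k-1}
        v = [sum(row[j] * c[j] for j in range(len(c))) for row in y]
        # mu_k: component of v_k with the largest absolute value
        mu = max(v, key=abs)
        # c_k = v_k / mu_k (normalization keeps the iterates bounded)
        c = [x / mu for x in v]
    return mu, c


# Small symmetric example with eigenvalues 3 and 1.
mu, c = power_iteration([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0])
```

For this matrix the iterates approach the eigenvalue 3 with eigenvector direction (1, 1); the convergence rate depends on the ratio $|m_2| / |m_1|$, which is why the strict inequality $|m_1| > |m_2|$ appears among the conditions.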
Then, the obtained sample feature vector v is used as the vectorized input of model training, and the vectorized training corpus is trained with a neural network and a conditional random field algorithm, such as an RNN (recurrent neural network) and a CRF (conditional random field), to obtain the administrative geographic entity relationship recognition model. The final output of the model is a binary geographic entity relationship set as follows:
political relation=[(e1,t1),(e2,t2),...(en,tn)]
In constructing the model, the selected target features are highly correlated with the administrative division level, while cluttered features with a low correlation are discarded, which reduces their adverse effect on the result and reduces the volume of model input. Because feature selection is used for algorithm optimization, the input to the model is not raw, disordered address information but selected and optimized feature vectors whose correlation with the geographic entities and their administrative divisions is higher, so the model computes faster and the recognition result is more accurate.
The prior art reads all standard geographic information and address rules into memory to construct a dictionary tree. Taking one server as an example, the full rule dictionary tree needs 4 GB of memory. The scheme of the application replaces the full geographic information rule dictionary tree with the administrative geographic entity recognition model, which needs only 200 MB of memory. Memory use is therefore only 4.88% of that of the prior art (200 MB out of 4096 MB), which reduces the cost of use.
In addition, compared with the prior art, the method addresses the problem of low-quality geographic data, increases the number of addresses that are effectively analyzed, and provides a more accurate data basis for upper-layer decision making.
The prior-art address analysis scheme, which combines a standard geographic dictionary database with regular-expression extraction, has many limitations in processing address data. For dirty data in which human factors have introduced extra information into the address, it essentially cannot obtain correct geographic information. Here, evaluation indexes are defined for the address resolution scenario: accuracy, resolution rate and effective resolution rate.
In the following, R denotes the set of records for which address resolution obtained correct coordinates; G(wr)_i denotes the result set for resolution errors of type i, where the main error type is a deviation in the resolved coordinates; T denotes the total number of addresses to be resolved; S denotes the set of records for which the address was successfully resolved to coordinates; and E denotes the set of failed records for which no coordinates were obtained. The accuracy of the final address resolution is given by formula (10), the resolution rate by formula (11), and the effective resolution rate by formula (12).
Correct result set: R; error result set of type i: G(wr)_i; total number of samples: T; successful result set: S; failed result set: E
Test results on 10,000 address records were used for comparative evaluation. The accuracy of the dictionary-and-regular-expression matching technique is 86.41%; the 13.59% of incorrect results are caused by data quality problems in the address information, such as redundant information and disordered word order. These data quality problems also cause some records to fail to yield coordinates, so the resolution rate of that technique is only 81%. Under the same conditions, the resolution rate of the present scheme reaches 98%, which is 17 percentage points higher than the prior art, and the effective resolution rate rises from 70% to 93%, as shown in Table 2.
Table 2. Improvement in technical indexes
In addition, because the administrative geographic entity relationship recognition model is optimized with a feature selection algorithm, it extracts geographic information more accurately than traditional rule matching, so the extracted geographic data is more accurate.
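The three evaluation indexes can be sketched as simple ratios. Formulas (10) to (12) are not reproduced in this text, so the definitions below are an assumed reading: accuracy = |R|/|S|, resolution rate = |S|/T, and effective resolution rate = |R|/T. This reading is at least consistent with the reported baseline figures, since 86.41% × 81% ≈ 70%. The count of 6,999 correct records is back-computed from those percentages and is not a figure reported in the application.

```python
def resolution_metrics(correct: int, succeeded: int, total: int) -> dict:
    """Assumed definitions (not quoted from the application):
        accuracy                  = |R| / |S|
        resolution rate           = |S| / T
        effective resolution rate = |R| / T
    """
    if total <= 0 or succeeded <= 0:
        raise ValueError("total and succeeded must be positive")
    return {
        "accuracy": correct / succeeded,
        "resolution_rate": succeeded / total,
        "effective_resolution_rate": correct / total,
    }

# Baseline on 10,000 addresses: 81% resolved, about 86.41% of those correct.
# correct=6999 is derived from the percentages, not a reported count.
baseline = resolution_metrics(correct=6999, succeeded=8100, total=10000)
```

Under these definitions the effective resolution rate is the product of the other two, which is why a gain in either accuracy or resolution rate raises it.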
The following is a specific implementation of the first embodiment of the present application:
A bottom-layer data synchronization task is constructed to store the originally entered address information from the original data system in the HDFS of the analysis task cluster. The analysis task cluster is based on Spark, the data processing tasks are developed in Java, and task scheduling and distribution are implemented. A pre-trained administrative geographic entity relationship recognition model is deployed in the analysis task cluster to recognize the administrative division levels and geographic entity relationships in low-quality address information and to extract the effective information. The model is trained with an RNN (recurrent neural network) and a CRF (conditional random field) algorithm, with an embedded administrative geographic entity feature optimization algorithm that reduces the noise of human-introduced interference. The administrative geographic entities are then sorted and recombined by an administrative level sorting algorithm, and the data is checked and completed against the constructed tree dictionary to obtain standard geographic data, providing quality-improved address information for subsequent coding.
The geocoding function can be scheduled concurrently on the Spark task cluster. It uses a RESTful HTTP batch address analysis interface developed in Java to code the address information extracted by the model and obtain standard geocoding information. To improve efficiency, in addition to concurrent task scheduling, the data can be submitted in batches, one batch per request, which raises coding throughput without increasing the load on the cluster.
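Batch submission can be sketched as follows. The sketch is written in Python for brevity, although the application describes a Java implementation. The batch size, the function names and `fake_encode_batch`, which stands in for the HTTP batch interface, are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch

def encode_all(addresses: Iterable[str],
               encode_batch: Callable[[List[str]], List[str]],
               batch_size: int = 100) -> List[str]:
    """Submit addresses one batch per call; `encode_batch` stands in for the HTTP interface."""
    results: List[str] = []
    for batch in batched(addresses, batch_size):
        results.extend(encode_batch(batch))
    return results

# Hypothetical stand-in for the batch geocoding service
def fake_encode_batch(batch: List[str]) -> List[str]:
    return [f"code:{address}" for address in batch]

codes = encode_all([f"addr{i}" for i in range(250)], fake_encode_batch, batch_size=100)
```

With 250 addresses and a batch size of 100, the service is called three times instead of 250, which is the throughput gain the paragraph describes.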
Because an independent batch coding service is used, coding does not compete for resources with extraction and computation, and the analysis time is significantly shortened. Analyzing the original 1 million records used to take 15 days; with the administrative geographic entity relationship model embedded in the Spark computing engine and the scheme of this patent adopted, only 10 hours are needed, a 36-fold speedup.
Example two
Based on the above description, a second embodiment of the present application provides an address information analysis method. As shown in Fig. 3, the method includes:
S31, acquiring address information to be analyzed from the original data;
S32, extracting and selecting features of the address information to be analyzed by using a natural language processing technology, and vectorizing the selected features to obtain a feature vector to be recognized; for the specific procedure, refer to the feature extraction, selection and vectorization steps in model training;
S33, inputting the feature vector to be recognized into a preset model to obtain an initial array comprising a geographic entity and an administrative division level corresponding to the geographic entity;
S34, sorting and de-duplicating the geographic entities in the initial array according to the administrative division level to obtain a standard array;
S35, coding the standard array to obtain a geocoding result. Specifically, a coding interface of an external server may be called to code the standard array and obtain the geocoding result.
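Step S34 can be sketched as a de-duplication followed by a stable sort on the administrative level. The level names and their order are assumptions made for illustration. The application does not list them here.

```python
from typing import List, Tuple

# Assumed level names, ordered from highest to lowest administrative division
LEVEL_ORDER = ["province", "city", "district", "street"]
RANK = {level: i for i, level in enumerate(LEVEL_ORDER)}

def to_standard_array(initial: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop repeated (entity, level) pairs, then sort by administrative level."""
    seen = set()
    unique = []
    for pair in initial:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    # sorted() is stable, so entities on the same level keep their input order
    return sorted(unique, key=lambda pair: RANK[pair[1]])

initial = [("Xuanwu", "district"), ("Nanjing", "city"),
           ("Jiangsu", "province"), ("Nanjing", "city")]
standard = to_standard_array(initial)
```

Here the out-of-order input with a repeated city becomes province, city, district, each appearing once.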
Preferably, before the feature extraction is performed on the address information to be analyzed by using a natural language processing technology, the method further includes:
Judging, according to a pre-stored historical address information analysis record, whether the address information to be analyzed has already been analyzed; the historical address information analysis record comprises historical address information and the corresponding historical geocoding data;
If it has already been analyzed, acquiring the corresponding historical geocoding data as the geocoding result;
If it has not been analyzed, extracting the features of the address information to be analyzed by using a natural language processing technology.
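The history check behaves like a cache placed in front of the model. The sketch below uses an in-memory dictionary as a stand-in for the pre-stored historical record. The class name, the exact-match key and `fake_analyze` are hypothetical.

```python
from typing import Callable, Dict

class AnalysisHistory:
    """In-memory stand-in for the pre-stored historical analysis records."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}
        self.model_calls = 0  # counts how often the full pipeline actually ran

    def geocode(self, address: str, analyze: Callable[[str], str]) -> str:
        key = address.strip()
        if key in self._records:   # already analyzed: reuse the stored result
            return self._records[key]
        self.model_calls += 1      # not analyzed yet: run feature extraction and the model
        result = analyze(key)
        self._records[key] = result
        return result

# Hypothetical analyzer standing in for steps S32 to S35
def fake_analyze(address: str) -> str:
    return f"geocode({address})"

history = AnalysisHistory()
first = history.geocode("Jiangsu Nanjing Xuanwu", fake_analyze)
second = history.geocode("Jiangsu Nanjing Xuanwu", fake_analyze)
```

The second call returns the stored result without invoking the analyzer again, so `history.model_calls` stays at 1.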
To avoid incomplete information in the array, before encoding the standard array to obtain the geocoding result, the method further includes:
Matching the standard array against a pre-stored geographic position tree dictionary, and judging whether any entry is missing from the standard array; the geographic position tree dictionary is built by dividing administrative regions level by level;
If an entry is missing, completing the standard array according to the geographic position tree dictionary;
In this case, the step of coding the standard array to obtain the geocoding result comprises coding the completed standard array to obtain the geocoding result.
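Completion against the tree dictionary can be sketched by walking from the lowest known entity up to the root and filling in any level that is absent. The child-to-parent table below is a hypothetical, flattened form of the tree dictionary, and the level names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

LEVELS = ["province", "city", "district"]  # assumed level names, highest first

# Hypothetical tree dictionary, flattened to child -> parent links
PARENT: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("Xuanwu", "district"): ("Nanjing", "city"),
    ("Nanjing", "city"): ("Jiangsu", "province"),
}

def complete(standard: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Fill in missing higher levels by walking up from the lowest known entity."""
    if not standard:
        return []
    by_level = {level: name for name, level in standard}
    # Start from the lowest-level entity present and climb towards the root
    lowest = max(standard, key=lambda pair: LEVELS.index(pair[1]))
    node: Optional[Tuple[str, str]] = lowest
    while node is not None:
        name, level = node
        by_level.setdefault(level, name)  # only fills gaps, never overwrites
        node = PARENT.get(node)
    return [(by_level[lvl], lvl) for lvl in LEVELS if lvl in by_level]

completed = complete([("Jiangsu", "province"), ("Xuanwu", "district")])
```

In this example the missing city level is recovered from the district's parent link, so the completed array contains all three levels.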
The method of the present application further comprises the step of pre-constructing the preset model:
Performing corpus labeling on address data in the sample set to obtain a sample array labeled with a sample geographic entity and an administrative division corresponding to the sample geographic entity;
Extracting primary features of address data in the sample set by using a natural language processing technology, determining the primary features meeting certain conditions as target features, and vectorizing the target features to obtain sample feature vectors;
Taking the sample feature vector as input and the corresponding sample array as output, and training with a neural network and a conditional random field algorithm to obtain the preset model.
Preferably, the extracting, by using a natural language processing technique, the primary features of the address data in the sample set and determining the primary features meeting a certain condition as target features, and vectorizing the target features to obtain a sample feature vector includes:
Calculating the frequency of appearance of each extracted primary feature in the address text;
Calculating, according to the frequency, the correlation between each primary feature and each administrative division level, to be used as a feature weight;
Selecting the primary features whose correlation and/or frequency meet a preset condition as the target features;
Calculating the correlation between each selected target feature and each administrative division level, taking the average of the correlations of each target feature as the weight of that target feature, and constructing a weighting matrix from the weights;
And vectorizing the target characteristics according to the weighting matrix to obtain a sample characteristic vector.
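The selection and weighting steps can be sketched with simple counting statistics. The expressions the application uses for frequency and correlation are not reproduced in this passage, so the sketch substitutes simplified stand-ins: frequency is the number of occurrences of a feature, and the correlation between a feature and a level is the share of that level's samples that carry the feature. The thresholds, the romanized suffix tokens and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def select_and_weight(samples: List[Tuple[str, str]],
                      min_freq: int = 2,
                      min_corr: float = 0.5) -> Dict[str, float]:
    """samples: (primary feature, administrative level) pairs from labelled text.

    Returns {target feature: weight}. Simplified stand-in statistics:
      frequency   = number of occurrences of the feature
      correlation = share of a level's samples that carry the feature
      weight      = mean correlation over all levels
    """
    freq = Counter(feature for feature, _ in samples)
    level_totals = Counter(level for _, level in samples)
    joint = Counter(samples)
    levels = sorted(level_totals)

    weights: Dict[str, float] = {}
    for feature, count in freq.items():
        corr = [joint[(feature, level)] / level_totals[level] for level in levels]
        if count >= min_freq and max(corr) >= min_corr:
            weights[feature] = sum(corr) / len(corr)
    return weights

def vectorize(features: List[str], weights: Dict[str, float]) -> List[float]:
    """Weighted bag-of-features vector over the selected target features."""
    vocab = sorted(weights)
    counts = Counter(f for f in features if f in weights)
    return [counts[f] * weights[f] for f in vocab]

# Romanized suffix tokens used as toy features (sheng, shi, qu, fujin)
samples = [("sheng", "province"), ("sheng", "province"),
           ("shi", "city"), ("shi", "city"), ("shi", "district"),
           ("qu", "district"), ("qu", "district"), ("fujin", "district")]
weights = select_and_weight(samples)
vector = vectorize(["sheng", "shi", "fujin"], weights)
```

The rare token `fujin` falls below the frequency threshold and is discarded, so it contributes nothing to the vector, which mirrors the stated aim of dropping cluttered, weakly correlated features.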
The more specific steps of constructing the preset model in advance can be referred to the process of training the model.
The geocoding result can be combined with other data to provide a data basis for subsequent application decisions. For this reason, the geocoding result and the original data corresponding to it can be stored in association with each other in the present application.
Taking sales data as an example of the original data, after the address information of the original data is analyzed to obtain an accurate geocoding result, the geocoding result and the corresponding original data can be stored in association, so that the commodity sales of a given geographic position can be obtained. This associated information may be stored in Elasticsearch for ease of subsequent retrieval.
Example three
Based on the above-mentioned association storage, taking an example of requesting to obtain related data in a certain region range, a third embodiment of the present application provides a data obtaining method, including:
Receiving candidate address information;
Analyzing the candidate address information according to the address analysis method above to obtain candidate geocoding data;
Computing over a pre-stored association table of geocoding results and original data, according to the candidate geocoding data and a preset geographic range, to acquire the geocoding results within the preset geographic range and the corresponding original data.
By the method, the original data in a certain geographic range can be obtained by utilizing the geocoding result, and a data basis is provided for subsequent decisions such as sale, popularization and the like.
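A range query over the association table can be sketched as a distance filter. The sketch assumes that the geocoding result is a latitude and longitude pair and that the preset geographic range is a radius in kilometres. The table layout and field names are hypothetical. The haversine formula gives the great-circle distance between two points.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees; an assumed encoding

def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def records_in_range(center: Coord, radius_km: float,
                     table: List[Dict]) -> List[Dict]:
    """Return rows of the association table whose geocode lies within radius_km."""
    return [row for row in table
            if haversine_km(center, row["geocode"]) <= radius_km]

# Hypothetical association table: geocoding result stored with its original record
table = [
    {"geocode": (32.06, 118.80), "order": "A-1"},
    {"geocode": (32.05, 118.78), "order": "A-2"},
    {"geocode": (31.23, 121.47), "order": "B-1"},
]
nearby = records_in_range((32.06, 118.80), 50.0, table)
```

A linear scan is enough for a sketch. A production store would normally answer the same question with a spatial index rather than by computing every distance.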
Example four
Corresponding to the method of the second embodiment, a fourth embodiment of the present invention provides an address information analyzing apparatus, as shown in fig. 4, the apparatus including:
A to-be-analyzed address information acquiring unit 41, configured to acquire address information to be analyzed from original data;
A first feature vectorization unit 42, configured to perform feature extraction, selection and vectorization on the address information to be analyzed by using a natural language processing technology to obtain a feature vector;
A model prediction unit 43, configured to input the feature vector into a preset model to obtain an initial array including a geographic entity and an administrative division level corresponding to the geographic entity; the preset model is trained with a recurrent neural network combined with a conditional random field algorithm;
A sorting unit 44, configured to sort and deduplicate the geographic entities in the initial array according to the administrative division level to obtain a standard array;
A geocoding unit 45, configured to code the standard array to obtain a geocoding result.
Preferably, the apparatus further comprises:
An analysis record judging unit 46, connected with the to-be-analyzed address information acquiring unit 41 and configured to judge, according to the pre-stored historical address information analysis record, whether the address information to be analyzed has already been analyzed; the historical address information analysis record comprises historical address information and the corresponding historical geocoding data;
An analysis record obtaining unit 47, connected to the analysis record judging unit 46 and configured to obtain the corresponding historical geocoding data as the geocoding result when the address information to be analyzed is judged to have been analyzed.
The first feature vectorization unit 42 is specifically configured to perform feature extraction on the address information to be analyzed by using a natural language processing technology when it is determined that the address information has not been analyzed.
To avoid incomplete information in the array, the apparatus further comprises:
A completion unit 48, configured to match the standard array obtained by the sorting unit 44 against a pre-stored geographic position tree dictionary, determine whether any entry is missing from the standard array, and complete the standard array according to the geographic position tree dictionary when an entry is missing; the geographic position tree dictionary is built by dividing administrative regions level by level;
The geocoding unit 45 is specifically configured to code the completed standard array to obtain a geocoding result.
The apparatus also comprises units for constructing the preset model in advance, specifically:
A second feature vectorization unit, configured to extract features of the address data in the sample set by using a natural language processing technology, select among the features, and vectorize the selected features to obtain sample feature vectors; for the specific process of this step, see the related description in the first embodiment. The second feature vectorization unit may be the same as or different from the first feature vectorization unit.
A sample administrative entity relation unit, configured to perform corpus labeling on the address data in the sample set to obtain a sample array consisting of sample geographic entities and the sample administrative division levels corresponding to them;
A model training unit, configured to take the sample feature vector as input and the sample array as output, and to train with an RNN (recurrent neural network) and a CRF (conditional random field) algorithm to construct the preset model.
The geocoding result can be combined with other data to provide a data basis for subsequent application decisions. For this reason, the apparatus further comprises an associated storage unit for storing the geocoding result in association with the original data corresponding to it.
Taking sales data as an example of the original data, after the address information of the original data is analyzed to obtain an accurate geocoding result, the geocoding result and the corresponding original data can be stored in association, so that the commodity sales of a given geographic position can be obtained. This associated information may be stored in Elasticsearch for ease of subsequent retrieval.
Example five
Corresponding to the above method and apparatus, a fifth embodiment of the present application provides a computer system, including:
One or more processors; and
A memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
Acquiring address information to be analyzed in original data;
Extracting and selecting the characteristics of the address information to be analyzed by using a natural language processing technology, and vectorizing the selected characteristics to obtain characteristic vectors;
Inputting the feature vector into a preset model to obtain an initial array comprising a geographic entity and an administrative division level corresponding to the geographic entity;
Sorting and de-duplicating the geographic entities in the initial array according to the administrative division level to obtain a standard array;
Coding the standard array to obtain a geocoding result.
Fig. 5 illustrates an architecture of a computer system, which may include, in particular, a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513, network interface 1514, and memory 1520 may be communicatively coupled via a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solution provided by the present Application.
The memory 1520 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the computer system 1500 and a Basic Input Output System (BIOS) for controlling low-level operations of the computer system 1500. In addition, a web browser 1523, a data storage management system 1524, an icon font processing system 1525, and the like can also be stored. The icon font processing system 1525 may be an application program that implements the operations of the foregoing steps in this embodiment of the application. In summary, when the technical solution provided by the present application is implemented by software or firmware, the relevant program code is stored in the memory 1520 and called for execution by the processor 1510.
The input/output interface 1513 is used for connecting an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The network interface 1514 is used to connect a communication module (not shown) to enable the device to communicate with other devices. The communication module can communicate in a wired mode (such as USB or a network cable) or in a wireless mode (such as a mobile network, Wi-Fi or Bluetooth).
The bus 1530 includes a path to transfer information between the various components of the device, such as the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520.
In addition, the computer system 1500 may also obtain information of specific extraction conditions from the virtual resource object extraction condition information database 1541 for performing condition judgment, and the like.
It should be noted that although the above devices only show the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, etc., in a specific implementation, the devices may also include other components necessary for proper operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a cloud server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus and system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, refer to the descriptions of the method embodiments. The above-described apparatus and system embodiments are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The address information analysis method, device and system and the data acquisition method provided by the present application have been described in detail above. Specific examples are used to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, change the specific embodiments and the scope of application. In view of the above, this description should not be taken as limiting the application.

Claims (10)

Application Number | Priority Date | Filing Date | Title | Status | Publication
CN201910684395.4A | 2019-07-26 | 2019-07-26 | Address information analysis method, device and system and data acquisition method | Pending | CN110569322A (en)

Priority Applications (3)

Application Number | Publication | Priority Date | Filing Date | Title
CN201910684395.4A | CN110569322A (en) | 2019-07-26 | 2019-07-26 | Address information analysis method, device and system and data acquisition method
CA3145918A | CA3145918A1 (en) | 2019-07-26 | 2020-06-19 | Address information parsing method and apparatus, system and data acquisition method
PCT/CN2020/096989 | WO2021017679A1 (en) | 2019-07-26 | 2020-06-19 | Address information parsing method and apparatus, system and data acquisition method

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201910684395.4A | CN110569322A (en) | 2019-07-26 | 2019-07-26 | Address information analysis method, device and system and data acquisition method

Publications (1)

Publication Number | Publication Date
CN110569322A (en) | 2019-12-13

Family

ID=68773824

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201910684395.4A | Pending | CN110569322A (en) | 2019-07-26 | 2019-07-26 | Address information analysis method, device and system and data acquisition method

Country Status (3)

Country | Link
CN (1) | CN110569322A (en)
CA (1) | CA3145918A1 (en)
WO (1) | WO2021017679A1 (en)

Cited By (39)

* Cited by examiner, † Cited by third party
| Publication number | Priority date | Publication date | Assignee | Title |
| WO2021017679A1 (en)* | 2019-07-26 | 2021-02-04 | 苏宁易购集团股份有限公司 | Address information parsing method and apparatus, system and data acquisition method |
| CN113076746A (en)* | 2020-01-06 | 2021-07-06 | 阿里巴巴集团控股有限公司 | Data processing method and system, storage medium and computing device |
| CN113076746B (en)* | 2020-01-06 | 2024-05-31 | 阿里巴巴集团控股有限公司 | Data processing method and system, storage medium and computing device |
| CN113111229B (en)* | 2020-02-13 | 2024-04-12 | 北京明亿科技有限公司 | Regular expression-based alarm receiving text track address extraction method and device |
| CN113111229A (en)* | 2020-02-13 | 2021-07-13 | 北京明亿科技有限公司 | Regular expression-based method and device for extracting track-to-ground address of alarm receiving and processing text |
| CN113111230A (en)* | 2020-02-13 | 2021-07-13 | 北京明亿科技有限公司 | Regular expression-based alarm receiving and processing text household address extraction method and device |
| CN113111230B (en)* | 2020-02-13 | 2024-04-12 | 北京明亿科技有限公司 | Regular expression-based alarm receiving text home address extraction method and device |
| CN113553847A (en)* | 2020-04-24 | 2021-10-26 | 中国电信股份有限公司 | Method, apparatus, system and storage medium for parsing address text |
| CN111523647B (en)* | 2020-04-26 | 2023-11-14 | 南开大学 | Network model training method and device, feature selection model, method and device |
| CN111523647A (en)* | 2020-04-26 | 2020-08-11 | 南开大学 | Network model training method and device, and feature selection model, method and device |
| US20210350375A1 (en)* | 2020-05-11 | 2021-11-11 | Paypal, Inc. | Determination of geographic coordinates using machine learning techniques |
| CN111901450B (en)* | 2020-07-15 | 2023-04-18 | 安徽淘云科技股份有限公司 | Entity address determination method, device, equipment and storage medium |
| CN111901450A (en)* | 2020-07-15 | 2020-11-06 | 安徽淘云科技有限公司 | Entity address determination method, device, equipment and storage medium |
| CN112148819A (en)* | 2020-08-17 | 2020-12-29 | 北京来也网络科技有限公司 | Address recognition method and device combining RPA and AI |
| CN112269861A (en)* | 2020-10-09 | 2021-01-26 | 和美(深圳)信息技术股份有限公司 | Corpus generation method and system of intelligent robot |
| CN112257413B (en)* | 2020-10-30 | 2022-05-17 | 深圳壹账通智能科技有限公司 | Address parameter processing method and related equipment |
| CN112257413A (en)* | 2020-10-30 | 2021-01-22 | 深圳壹账通智能科技有限公司 | Address parameter processing method and related equipment |
| CN112488200A (en)* | 2020-11-30 | 2021-03-12 | 上海寻梦信息技术有限公司 | Logistics address feature extraction method, system, equipment and storage medium |
| CN112559661A (en)* | 2020-12-09 | 2021-03-26 | 北京百度网讯科技有限公司 | Method and device for retrieving address type and electronic equipment |
| CN112559661B (en)* | 2020-12-09 | 2024-03-01 | 北京百度网讯科技有限公司 | Method and device for retrieving address type and electronic equipment |
| CN112801155A (en)* | 2021-01-20 | 2021-05-14 | 廖彩红 | Business big data analysis method based on artificial intelligence and server |
| CN112801155B (en)* | 2021-01-20 | 2021-10-26 | 贵州江南航天信息网络通信有限公司 | Business big data analysis method based on artificial intelligence and server |
| CN112818685A (en)* | 2021-01-29 | 2021-05-18 | 上海寻梦信息技术有限公司 | Address matching method and device, electronic equipment and storage medium |
| CN113138985A (en)* | 2021-04-22 | 2021-07-20 | 重庆长安汽车股份有限公司 | GPS data analysis method and system |
| CN113255346A (en)* | 2021-07-01 | 2021-08-13 | 湖南工商大学 | Address element identification method based on graph embedding and CRF knowledge integration |
| CN113592037B (en)* | 2021-08-26 | 2023-11-24 | 吉奥时空信息技术股份有限公司 | Address matching method based on natural language inference |
| CN113592037A (en)* | 2021-08-26 | 2021-11-02 | 武大吉奥信息技术有限公司 | Address matching method based on natural language inference |
| CN113642313B (en)* | 2021-09-02 | 2024-03-29 | 阿里巴巴达摩院(杭州)科技有限公司 | Address text processing method, device, equipment, storage medium and program product |
| CN113642313A (en)* | 2021-09-02 | 2021-11-12 | 阿里巴巴达摩院(杭州)科技有限公司 | Address text processing method, device, equipment, storage medium and program product |
| CN113837699A (en)* | 2021-09-29 | 2021-12-24 | 深圳云路信息科技有限责任公司 | A three-segment code parsing and processing method and device based on deep learning |
| CN114301629A (en)* | 2021-11-26 | 2022-04-08 | 北京六方云信息技术有限公司 | IP detection method, device, terminal device and storage medium |
| CN114328886A (en)* | 2021-12-14 | 2022-04-12 | 上海捷晓信息技术有限公司 | Intelligent logistics address entity recognition system based on deep learning |
| CN114463053A (en)* | 2022-01-21 | 2022-05-10 | 浪潮卓数大数据产业发展有限公司 | Enterprise attribution classification method and system |
| CN114756639A (en)* | 2022-04-19 | 2022-07-15 | 城云科技(中国)有限公司 | Address standardization model group, construction method and application thereof |
| CN114780660A (en)* | 2022-04-27 | 2022-07-22 | 深圳依时货拉拉科技有限公司 | Door address duplicate removal method, device, equipment and storage medium |
| CN115248837B (en)* | 2022-09-21 | 2022-12-23 | 中科雨辰科技有限公司 | Data processing system for obtaining geographic entity of text |
| CN115248837A (en)* | 2022-09-21 | 2022-10-28 | 中科雨辰科技有限公司 | Data processing system for obtaining geographic entity of text |
| CN115577065A (en)* | 2022-12-09 | 2023-01-06 | 中信证券股份有限公司 | Address resolution method and device |
| CN119130610A (en)* | 2024-11-08 | 2024-12-13 | 乐麦信息技术(杭州)有限公司 | A method for processing address database of e-commerce platform |

Also Published As

| Publication number | Publication date |
| CA3145918A1 (en) | 2021-02-04 |
| WO2021017679A1 (en) | 2021-02-04 |

Similar Documents

| Publication | Publication Date | Title |
| CN110569322A (en) | | Address information analysis method, device and system and data acquisition method |
| CN109376222B (en) | | Question-answer matching degree calculation method, question-answer automatic matching method and device |
| CN112328909B (en) | | Information recommendation method and device, computer equipment and medium |
| CN118093962A (en) | | Data retrieval method, device, system, electronic equipment and readable storage medium |
| CN110674636B (en) | | Power consumption behavior analysis method |
| CN113591881B (en) | | Intention recognition method and device based on model fusion, electronic equipment and medium |
| CN113887930A (en) | | Question-answering robot health degree evaluation method, device, equipment and storage medium |
| CN116431813A (en) | | Intelligent customer service problem classification method and device, electronic equipment and storage medium |
| CN113204662B (en) | | Method, device and computer equipment for predicting user group based on photo-search behavior |
| CN115630221A (en) | | Terminal application interface display data processing method and device and computer equipment |
| CN114036921B (en) | | Policy information matching method and device |
| CN114756654A (en) | | Dynamic place name and address matching method and device, computer equipment and storage medium |
| CN114218354A (en) | | Text analysis method, device, computer equipment and storage medium |
| CN117633004A (en) | | Query method, query device, electronic equipment and storage medium |
| CN116304851A (en) | | Data standard determining method, apparatus, device, medium and computer program product |
| CN116795978A (en) | | Complaint information processing method and device, electronic equipment and medium |
| JP7272846B2 (en) | | Document analysis device and document analysis method |
| CN115617790A (en) | | Data warehouse creation method, electronic device and storage medium |
| CN113515383B (en) | | System resource data distribution method and device |
| CN116484230B (en) | | Method for identifying abnormal business data and training method of AI digital person |
| CN114943234B (en) | | Enterprise name linking method, enterprise name linking device, computer equipment and storage medium |
| CN114817526B (en) | | Text classification method and device, storage medium and terminal |
| CN119167316B (en) | | Intelligent combination method and device for multi-channel hotel data |
| CN117036008B (en) | | Automatic modeling method and system for multi-source data |
| CN118626687B (en) | | Address information automatic identification completion method and device and electronic equipment |

Legal Events

| Date | Code | Title | Description |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20191213 |
