Specific Embodiments
To explain the technical content, objectives, and effects of the present invention in detail, the invention is described below with reference to the embodiments and the accompanying drawings.
Referring to Fig. 1, the present invention provides a method for assessing the development status of county-level e-commerce based on web crawler technology. The method is as follows:
Crawling the county-level e-commerce data of each e-commerce platform using web crawler technology;
Performing data cleansing on the crawled county-level e-commerce data, and parsing it into structured data and unstructured data;
Performing data feature extraction on the cleansed county-level e-commerce data according to sales performance, and extracting the leading e-commerce shops of the county;
Substituting the cleansed county-level e-commerce data into the county-level e-commerce development assessment model for calculation, to obtain the development status assessment result of the county's leading e-commerce shops, wherein the county-level e-commerce development assessment model is: Y = α1x1 + α2x2 + α3x3 + α4x4 + α5x5 + c, where the independent variable x1 is the training system, x2 is the brand, x3 is the talent, x4 is the training service, and x5 is the logistics system; α1, α2, α3, α4, and α5 are adjustment variables with α1 + α2 + α3 + α4 + α5 = 1; c is the adjustment factor; and the dependent variable Y is the shop development score.
Crawler technology makes it convenient to obtain county-level e-commerce data from each e-commerce platform. After data cleansing, the crawled data becomes structured data and unstructured data that are easy to classify and organize, and the leading e-commerce shops with good sales performance in each county are extracted. These leading shops are assessed by analyzing five dimensions. On the one hand, this identifies the strengths and weaknesses of each county; on the other hand, once the strengths and weaknesses are known, government staff and the e-commerce shops can more easily guide and improve future development. The influence of the five dimension variables on the shop development score is linear, which makes it easy to see how each dimension affects shop development.
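The linear model above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `development_score` and the sample inputs are assumptions introduced here, and the only rules taken from the text are the weighted sum and the constraint that the adjustment variables sum to 1.

```python
from math import isclose


def development_score(x, alpha, c=0.0):
    """Return Y = alpha1*x1 + ... + alpha5*x5 + c for the five dimensions."""
    if len(x) != 5 or len(alpha) != 5:
        raise ValueError("expected five dimension scores and five weights")
    # The text requires the adjustment variables to sum to 1.
    if not isclose(sum(alpha), 1.0):
        raise ValueError("adjustment variables must sum to 1")
    return sum(a * xi for a, xi in zip(alpha, x)) + c


# Illustrative inputs only (hypothetical, not taken from the embodiments):
# x1..x5 = training system, brand, talent, training service, logistics system.
y = development_score([80, 70, 90, 60, 100], [0.2] * 5)
print(round(y, 1))
```

Because the model is linear, each dimension contributes `alpha[i] * x[i]` to the score independently of the others, which is what makes the per-dimension effect easy to read off.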
Further, crawling the county-level e-commerce data of each e-commerce platform using web crawler technology is specifically:
The crawler ends distributed across the e-commerce platforms register their service interfaces with the dubbo service centre;
The control terminal sends a start-data-acquisition instruction to the dubbo service centre;
The dubbo service centre receives the start-data-acquisition instruction sent by the control terminal, and dispatches the services registered by each crawler end to start working;
Each crawler end sends the county-level e-commerce data crawled from its e-commerce platform to the dubbo service centre;
The dubbo service centre sends the received county-level e-commerce data to the control terminal;
The control terminal receives the county-level e-commerce data sent by the dubbo service centre;
The control terminal sends a stop-data-acquisition instruction to the dubbo service centre;
The dubbo service centre receives the stop-data-acquisition instruction sent by the control terminal, and dispatches each crawler end to stop crawling data.
With the dubbo service framework, the system has a strong capability for crawling data remotely and a low error rate.
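The register, start, collect, and stop sequence described above can be sketched as follows. This is an in-process stand-in written for illustration only: it does not use the Dubbo API, and the class names `CrawlerEnd` and `ServiceCentre`, the instruction strings, and the canned records are all assumptions.

```python
class CrawlerEnd:
    """Stand-in for a crawler deployed against one e-commerce platform."""

    def __init__(self, platform, records):
        self.platform = platform
        self._records = records  # canned data in place of real crawling
        self.running = False

    def crawl(self):
        # Data is only produced while the registered service is running.
        return list(self._records) if self.running else []


class ServiceCentre:
    """In-process stand-in for the service centre (not the Dubbo API)."""

    def __init__(self):
        self._services = {}

    def register(self, crawler):
        self._services[crawler.platform] = crawler

    def dispatch(self, instruction):
        if instruction not in ("start", "stop"):
            raise ValueError(f"unknown instruction: {instruction}")
        for crawler in self._services.values():
            crawler.running = instruction == "start"

    def collect(self):
        # Gather data from every crawler end and hand it to the caller,
        # which plays the role of the control terminal.
        data = []
        for crawler in self._services.values():
            data.extend(crawler.crawl())
        return data


centre = ServiceCentre()
centre.register(CrawlerEnd("platform_a", [{"shop": "s1", "county": "c1"}]))
centre.register(CrawlerEnd("platform_b", [{"shop": "s2", "county": "c1"}]))

centre.dispatch("start")
received = centre.collect()    # control terminal receives the crawled data
centre.dispatch("stop")
after_stop = centre.collect()  # nothing is crawled once stopped
```

In a real deployment the registration and dispatch calls would be remote invocations through the service framework; the sketch only shows the order of the messages.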
Further, performing data cleansing on the crawled county-level e-commerce data and parsing it into structured data and unstructured data is specifically: the crawled county-level e-commerce data includes web page data, pictures, XML data, and JSON data; these data are parsed into structured data and unstructured data, wherein the structured data includes the text information of the products and the unstructured data includes the picture information of the products.
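A minimal sketch of this cleansing step is given below, using only the Python standard library. The `(kind, payload)` input shape, the field names, and the sample records are assumptions made for illustration; web page parsing is omitted to keep the example short.

```python
import json
import xml.etree.ElementTree as ET


def clean(raw_items):
    """Split crawled items into structured (text) and unstructured (picture) data."""
    structured, unstructured = [], []
    for kind, payload in raw_items:
        if kind == "json":
            structured.append(json.loads(payload))
        elif kind == "xml":
            root = ET.fromstring(payload)
            # Flatten the child elements of the record into a dict of strings.
            structured.append({child.tag: (child.text or "").strip() for child in root})
        elif kind == "picture":
            unstructured.append(payload)  # raw bytes are kept as-is
        else:
            raise ValueError(f"unsupported kind: {kind}")
    return structured, unstructured


# Hypothetical crawled items, for illustration only.
raw = [
    ("json", '{"product": "fruit box", "sales_volume": 120}'),
    ("xml", "<item><product>tea</product><sales_volume>35</sales_volume></item>"),
    ("picture", b"\x89PNG..."),
]
structured, unstructured = clean(raw)
```

Note that values parsed from XML remain strings, so a later step would need to convert numeric fields before they are used in calculations.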
Further, performing data feature extraction on the cleansed county-level e-commerce data according to sales performance and extracting the leading e-commerce shops of the county is specifically:
Establishing a resource data table, a fact table, and a dimension table, and loading the cleansed county-level e-commerce data into the resource data table, the fact table, and the dimension table respectively, wherein the resource data table includes the e-commerce platform, the shipping county, and the receiving county; the fact table includes the shop name, brand name, product name, sales volume, sales revenue, and product details; and the dimension table includes the training system, brand, talent, training service, and logistics system;
Based on sales volume and sales revenue, screening and extracting the leading e-commerce shops of the county according to the configured screening conditions.
The resource data table shows where the crawled data came from; the fact table shows the actual product and sales situation of the county's e-commerce; and the dimension table shows the factors that affect the county's e-commerce shops. Establishing the three tables makes the data easier to use and process in subsequent steps.
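The three tables and the screening step can be sketched with an in-memory SQLite database. The column names, the sample rows, and the two thresholds below are hypothetical; the text only specifies which kinds of information each table holds and that screening is based on sales volume and sales revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (id INTEGER PRIMARY KEY, platform TEXT,
                       ship_county TEXT, recv_county TEXT);
CREATE TABLE fact (id INTEGER PRIMARY KEY,
                   resource_id INTEGER REFERENCES resource(id),
                   shop TEXT, brand TEXT, product TEXT,
                   sales_volume INTEGER, sales_revenue REAL, details TEXT);
CREATE TABLE dimension (shop TEXT PRIMARY KEY, training_system REAL, brand REAL,
                        talent REAL, training_service REAL, logistics_system REAL);
""")
conn.execute("INSERT INTO resource VALUES (1, 'platform_a', 'county_1', 'county_2')")
conn.executemany(
    "INSERT INTO fact (resource_id, shop, brand, product, "
    "sales_volume, sales_revenue, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "shop_a", "brand_x", "fruit box", 500, 25000.0, ""),
        (1, "shop_b", "brand_y", "tea", 40, 1800.0, ""),
        (1, "shop_a", "brand_x", "jam", 300, 9000.0, ""),
    ],
)


def leading_shops(conn, min_volume, min_revenue):
    """Return shops whose total sales volume and revenue meet both thresholds."""
    rows = conn.execute(
        "SELECT shop FROM fact GROUP BY shop "
        "HAVING SUM(sales_volume) >= ? AND SUM(sales_revenue) >= ? "
        "ORDER BY shop",
        (min_volume, min_revenue),
    )
    return [shop for (shop,) in rows]


print(leading_shops(conn, min_volume=100, min_revenue=10000.0))  # ['shop_a']
```

Keeping the source, the sales facts, and the dimension scores in separate tables means a screening rule only has to touch the fact table, while the assessment model reads the dimension table.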
Further, for the county-level e-commerce development assessment model, specifically:
The index of the training system x1 consists of whether value-added services are provided x11, objective reviews x12, reply status x13, and process duration x14, with x1 = x11 + x12 + x13 + x14;
The index of the brand x2 consists of the number of brand categories x21, the number of brand occurrences x22, category sales volume x23, continuously growing brands x24, and brand sales volume x25, with x2 = x21 + x22 + x23 + x24 + x25;
The index of the talent x3 consists of the number of talents x31, the increase in the number of talents x32, the talent education index x33, the talent training duration x34, and the distribution of talents across townships x35, with x3 = x31 + x32 + x33 + x34 + x35;
The index of the training service x4 consists of the number of trainees x41, the number of training sessions x42, and the training pass rate x43, with x4 = x41 + x42 + x43;
The index of the logistics system x5 consists of the logistics duration x51, the logistics destinations x52, and the logistics rating x53, with x5 = x51 + x52 + x53.
By refining each of the five dimensions (training system, brand, talent, training service, and logistics system), the factors that influence the development of the leading e-commerce shops in each county can be analyzed. The data is comprehensive, which makes causes easy to find, and by changing any independent variable or adjustment variable the development status of county-level e-commerce can be estimated, which provides guidance for the subsequent e-commerce development of the government and of each enterprise.
Referring to Fig. 2, the present invention provides a system for assessing the development status of county-level e-commerce based on web crawler technology, comprising a crawler module, a data cleansing module, a data feature extraction module, and a county-level e-commerce assessment module, wherein:
The crawler module is configured to crawl the county-level e-commerce data of each e-commerce platform using web crawler technology;
The data cleansing module is configured to perform data cleansing on the crawled county-level e-commerce data and parse it into structured data and unstructured data;
The data feature extraction module is configured to perform data feature extraction on the cleansed county-level e-commerce data according to sales performance, and to extract the leading e-commerce shops of the county;
The county-level e-commerce assessment module is configured to substitute the cleansed county-level e-commerce data into the county-level e-commerce development assessment model for calculation, to obtain the development status assessment result of the county's leading e-commerce shops, wherein the county-level e-commerce development assessment model is: Y = α1x1 + α2x2 + α3x3 + α4x4 + α5x5 + c, where the independent variable x1 is the training system, x2 is the brand, x3 is the talent, x4 is the training service, and x5 is the logistics system; α1, α2, α3, α4, and α5 are adjustment variables with α1 + α2 + α3 + α4 + α5 = 1; c is the adjustment factor; and the dependent variable Y is the shop development score.
Specifically, the crawler module, the data cleansing module, and the data feature extraction module are each connected to the county-level e-commerce assessment module, and the crawler module is also connected to each e-commerce platform, so that data can be retrieved and used conveniently. The development status of multiple leading e-commerce shops in each county is analyzed and assessed across five dimensions, which makes it easy to find, through these leading shops, the strengths and weaknesses of e-commerce shop development in that county, and thereby to guide the future development of the e-commerce shops in that county.
Further, the crawler module specifically further comprises crawler ends, a dubbo service centre, and a control terminal.
The crawler ends are distributed across the e-commerce platforms and are configured to register their service interfaces with the dubbo service centre, and to send the county-level e-commerce data crawled from each e-commerce platform to the dubbo service centre;
The dubbo service centre is configured to receive the start-data-acquisition instruction sent by the control terminal and dispatch the services registered by each crawler end to start working; to send the received county-level e-commerce data to the control terminal; and to receive the stop-data-acquisition instruction sent by the control terminal and dispatch each crawler end to stop crawling data;
The control terminal is configured to send the start-data-acquisition instruction to the dubbo service centre; to receive the county-level e-commerce data sent by the dubbo service centre; and to send the stop-data-acquisition instruction to the dubbo service centre.
The control terminal is connected to the county-level e-commerce assessment module, and the crawler ends are connected to the e-commerce platforms. The control terminal issues the start and stop instructions that control whether data is crawled, and the dubbo service centre relays the instructions issued by the control terminal to the remote crawler ends, so that the system has a strong capability for acquiring data remotely and a low error rate.
Further, in the crawler module, the crawled county-level e-commerce data includes web page data, pictures, XML data, and JSON data; in the data cleansing module, the structured data includes the text information of the products, and the unstructured data includes the picture information of the products.
Further, the data feature extraction module further comprises a table-building module and a screening and extraction module, wherein:
The table-building module is configured to establish a resource data table, a fact table, and a dimension table, and to load the cleansed county-level e-commerce data into the resource data table, the fact table, and the dimension table respectively, wherein the resource data table includes the e-commerce platform, the shipping county, and the receiving county; the fact table includes the shop name, brand name, product name, sales volume, sales revenue, and product details; and the dimension table includes the training system, brand, talent, training service, and logistics system;
The screening and extraction module is configured to screen and extract the leading e-commerce shops of the county, based on sales volume and sales revenue and according to the configured screening conditions.
The table-building module classifies the various types of data and builds tables from them, which makes the data easy to view and retrieve. The screening and extraction module extracts the leading e-commerce shops of each county by screening the table data in the table-building module, and these leading shops serve as typical cases for analyzing the strengths and weaknesses of e-commerce shop development in each county.
Further, in the county-level e-commerce assessment module, for the county-level e-commerce development assessment model, specifically:
The index of the training system x1 consists of whether value-added services are provided x11, objective reviews x12, reply status x13, and process duration x14, with x1 = x11 + x12 + x13 + x14;
The index of the brand x2 consists of the number of brand categories x21, the number of brand occurrences x22, category sales volume x23, continuously growing brands x24, and brand sales volume x25, with x2 = x21 + x22 + x23 + x24 + x25;
The index of the talent x3 consists of the number of talents x31, the increase in the number of talents x32, the talent education index x33, the talent training duration x34, and the distribution of talents across townships x35, with x3 = x31 + x32 + x33 + x34 + x35;
The index of the training service x4 consists of the number of trainees x41, the number of training sessions x42, and the training pass rate x43, with x4 = x41 + x42 + x43;
The index of the logistics system x5 consists of the logistics duration x51, the logistics destinations x52, and the logistics rating x53, with x5 = x51 + x52 + x53.
The system further subdivides the five dimensions (training system, brand, talent, training service, and logistics system) into sub-items in order to score and assess the leading e-commerce shops of each county. The data is comprehensive, which makes it easy to find the strengths and weaknesses of each county and each e-commerce shop. By changing any independent variable or adjustment variable, the development status of county-level e-commerce can be estimated, which provides guidance both for the government's future work and for the future improvement of each e-commerce shop.
Embodiment 1 provided by the present invention is:
A method for assessing the development status of county-level e-commerce based on web crawler technology, the method being as follows:
Crawling the county-level e-commerce data of each e-commerce platform using web crawler technology through the dubbo service framework;
Performing data cleansing on the crawled county-level e-commerce data, and parsing it into structured data and unstructured data;
Performing data feature extraction on the cleansed county-level e-commerce data according to sales performance, and extracting the leading e-commerce shops of the county; for example, the screening conditions for the leading e-commerce shops of each county may be one or more of the 15 conditions listed below:
1) the number of shops that have grown for three consecutive months;
2) shops whose sales volume has increased for three consecutive months;
3) shops whose sales volume has decreased for three consecutive months;
4) brands whose sales volume has increased for three consecutive months;
5) brands whose sales volume has decreased for three consecutive months;
6) categories whose sales volume has increased for three consecutive months;
7) brands whose throughput has increased for three consecutive months;
8) shops whose monthly sales volume jumps by 30%;
9) categories whose monthly sales volume jumps by 50%;
10) brands whose monthly sales volume jumps by 30%;
11) the average length of time for which shops have been established;
12) shop increased suddenly every month;
13) shop registered place statistical number;
14) favorable comment number is more than 97% shop;
15) favorable comment number is more than 97% category.
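Two of the screening conditions above (growth over three consecutive months, and a monthly jump of 30%) can be sketched as simple predicates over a list of monthly sales figures. The text does not say exactly how these conditions are computed, so the definitions below are assumptions: "three consecutive months" is read as three successive month-over-month increases, and a "jump" as the latest month-over-month growth rate. The shop names and figures are hypothetical.

```python
def grew_for_consecutive_months(monthly_sales, months=3):
    """True if the last `months` month-over-month changes are all increases."""
    if len(monthly_sales) < months + 1:
        return False
    recent = monthly_sales[-(months + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def monthly_jump(monthly_sales, threshold=0.30):
    """True if the latest month rose by at least `threshold` over the previous one."""
    if len(monthly_sales) < 2 or monthly_sales[-2] <= 0:
        return False
    growth = (monthly_sales[-1] - monthly_sales[-2]) / monthly_sales[-2]
    return growth >= threshold


# Hypothetical monthly sales volumes, oldest first.
shops = {"shop_a": [100, 120, 150, 210], "shop_b": [200, 180, 190, 185]}
growing = [name for name, sales in shops.items() if grew_for_consecutive_months(sales)]
jumping = [name for name, sales in shops.items() if monthly_jump(sales)]
print(growing, jumping)  # ['shop_a'] ['shop_a']
```

The same predicates apply unchanged to brand-level or category-level series, since they only depend on the sequence of monthly totals.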
Substituting the cleansed county-level e-commerce data into the county-level e-commerce development assessment model for calculation, to obtain the development status assessment result of the county's leading e-commerce shops, wherein the county-level e-commerce development assessment model is: Y = α1x1 + α2x2 + α3x3 + α4x4 + α5x5 + c, where the independent variable x1 is the training system, x2 is the brand, x3 is the talent, x4 is the training service, and x5 is the logistics system; α1, α2, α3, α4, and α5 are adjustment variables with α1 + α2 + α3 + α4 + α5 = 1; c is the adjustment factor; and the dependent variable Y is the shop development score, wherein:
The index of the training system x1 consists of whether value-added services are provided x11, objective reviews x12, reply status x13, and process duration x14, with x1 = x11 + x12 + x13 + x14;
The index of the brand x2 consists of the number of brand categories x21, the number of brand occurrences x22, category sales volume x23, continuously growing brands x24, and brand sales volume x25, with x2 = x21 + x22 + x23 + x24 + x25;
The index of the talent x3 consists of the number of talents x31, the increase in the number of talents x32, the talent education index x33, the talent training duration x34, and the distribution of talents across townships x35, with x3 = x31 + x32 + x33 + x34 + x35;
The index of the training service x4 consists of the number of trainees x41, the number of training sessions x42, and the training pass rate x43, with x4 = x41 + x42 + x43;
The index of the logistics system x5 consists of the logistics duration x51, the logistics destinations x52, and the logistics rating x53, with x5 = x51 + x52 + x53.
Further, the county-level e-commerce development assessment model also needs to be trained, with the following steps:
Six months of raw crawled data are input into the training set; data cleansing is performed using a data integration tool; ETL data extraction is performed on the cleansed training set data; data scripts are written; the resource data table, fact table, and dimension table are established; and data feature extraction is performed to obtain the leading e-commerce shops of each county.
Further, the value range of each independent variable in the county-level e-commerce development assessment model is defined. For example, the value range of each of the independent variables x1, x2, x3, x4, and x5 is defined as 0 to 100 points; x11, x12, x13, x14, x21, x22, x23, x24, x25, x31, x32, x33, x34, x35, x41, x42, x43, x51, x52, and x53 may then be set to the value ranges in Table 1.
Table 1
Reasonable values are assigned to the current independent variables of the leading shops in each county;
Values are assigned to each adjustment variable and to the adjustment factor, for example α1 = α2 = α3 = α4 = α5 = 0.2 and c = 0;
The values of α1, x1, α2, x2, α3, x3, α4, x4, α5, x5, and c are substituted into the county-level e-commerce development assessment model to obtain the value of the shop development score Y;
The values of α1, x1, α2, x2, α3, x3, α4, x4, α5, x5, c, and Y are output visually on a large screen; the resource data table, fact table, and dimension table may also be output visually on the large screen, so that government personnel and each enterprise can view and analyze them;
Changing the value of any one of α1, x1, α2, x2, α3, x3, α4, x4, α5, x5, and c changes the value of Y, which makes it easy to estimate the development score Y of each shop. For example, with c acting as the adjustment factor, setting c to 10 to represent an improvement in the county's logistics system x5 raises the e-commerce development score Y of each shop in that county by 10.
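The what-if estimate described above can be sketched as a comparison of the score before and after a change. The function names and the baseline inputs are assumptions for illustration; the sketch only uses the linear model from the text. It shows that raising the adjustment factor c by some amount shifts Y by the same amount, whereas raising a single dimension score shifts Y by that amount multiplied by its adjustment variable.

```python
def development_score(x, alpha, c=0.0):
    """Y = alpha1*x1 + ... + alpha5*x5 + c."""
    return sum(a * xi for a, xi in zip(alpha, x)) + c


def what_if(x, alpha, c=0.0, *, dim=None, delta_x=0.0, delta_c=0.0):
    """Return the change in Y when x[dim] rises by delta_x and/or c rises by delta_c."""
    new_x = list(x)
    if dim is not None:
        new_x[dim] += delta_x
    return development_score(new_x, alpha, c + delta_c) - development_score(x, alpha, c)


# Hypothetical baseline with the equal weights used as an example in the text.
x = [80, 80, 80, 80, 80]
alpha = [0.2] * 5

via_factor = what_if(x, alpha, delta_c=10)           # adjust c by 10
via_logistics = what_if(x, alpha, dim=4, delta_x=10)  # adjust x5 by 10
print(round(via_factor, 1), round(via_logistics, 1))  # 10.0 2.0
```

Under these equal weights, expressing an improvement through c therefore has five times the effect on Y of the same numerical change applied to x5 directly.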
Embodiment 2 provided by the present invention is:
On the basis of Embodiment 1, an assessment is carried out taking Zhangpu County, Fujian Province as an example, specifically:
The e-commerce data of Zhangpu County for the past two years is crawled from each e-commerce platform. After data cleansing is performed on the Zhangpu County e-commerce data, the leading e-commerce shops of Zhangpu County are extracted; for example, since Zhangpu County is relatively rich in fruit, specialty agricultural products are extracted. Taking into account the geographic location and traffic conditions of Zhangpu County, the training system, brand, talent, training service, and logistics system of Zhangpu County are assessed, specifically:
In terms of the logistics system, Zhangpu County adjoins Xiamen to the east and Shantou to the south, faces Taiwan across the strait, and has 6 bays and 5 harbours. The Zhangzhou-Zhao'an Expressway, National Highway 324, the Xiamen-Shenzhen Railway, and the Zhangzhou coastal corridor pass through it; water transport is served by the Xiazhai and Jiuzhen ports; and the Shenyang-Haikou Expressway has 3 interchanges in Zhangpu. The logistics advantage is very clear. The sub-items of the Zhangpu logistics system x5 are therefore scored as follows: logistics duration x51, logistics destinations x52, and logistics rating x53 are scored 32 points, 28 points, and 32 points respectively, so x5 = x51 + x52 + x53 = 92;
In terms of the brand, the specialty agricultural products of Zhangpu County belong to Fujian Province. The sub-items of the Zhangpu County brand x2 are scored as follows: the number of brand categories x21, the number of brand occurrences x22, category sales volume x23, continuously growing brands x24, and brand sales volume x25 are scored 16 points, 10 points, 16 points, 18 points, and 16 points respectively, so x2 = x21 + x22 + x23 + x24 + x25 = 76;
In terms of the training service, training system, and talent, Zhangpu County provides professional talent training services and has relatively complete training system and talent system construction. The assessment scores are as follows:
For the Zhangpu County training service x4, the number of trainees x41, the number of training sessions x42, and the training pass rate x43 are scored 28 points, 28 points, and 38 points respectively, so x4 = x41 + x42 + x43 = 94;
For the Zhangpu County training system x1, whether value-added services are provided x11, objective reviews x12, reply status x13, and process duration x14 are scored 23 points, 22 points, 24 points, and 24 points respectively, so x1 = x11 + x12 + x13 + x14 = 93;
For the Zhangpu County talent x3, the number of talents x31, the increase in the number of talents x32, the talent education index x33, the talent training duration x34, and the distribution of talents across townships x35 are scored 18 points, 19 points, 18 points, 17 points, and 16 points respectively, so x3 = x31 + x32 + x33 + x34 + x35 = 88;
The values of the above independent variables are substituted into the county-level e-commerce development assessment model for the regression calculation. Taking α1 = α2 = α3 = α4 = α5 = 0.2 and c = 0 as in the example of Embodiment 1, the Zhangpu County shop development score is: Y = α1x1 + α2x2 + α3x3 + α4x4 + α5x5 + c = 0.2 × (93 + 76 + 88 + 94 + 92) + 0 = 88.6.
If further effort is made on the brand, so that the adjustment factor c = 1, the Zhangpu County shop development score Y rises by a further 1 point to 89.6.
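The Zhangpu County figures above can be reproduced with a short script. The sub-item scores are those given in this embodiment; the equal weights of 0.2 and c = 0 are the example values from Embodiment 1, and the dictionary layout and function name are assumptions made for illustration.

```python
# Sub-item scores for Zhangpu County as listed in this embodiment.
zhangpu = {
    "x1": [23, 22, 24, 24],      # training system: x11..x14
    "x2": [16, 10, 16, 18, 16],  # brand: x21..x25
    "x3": [18, 19, 18, 17, 16],  # talent: x31..x35
    "x4": [28, 28, 38],          # training service: x41..x43
    "x5": [32, 28, 32],          # logistics system: x51..x53
}
alpha = {name: 0.2 for name in zhangpu}  # example weights from Embodiment 1


def shop_score(sub_items, alpha, c=0.0):
    """Sum each dimension's sub-items, then apply the weighted linear model."""
    dims = {name: sum(scores) for name, scores in sub_items.items()}
    y = sum(alpha[name] * value for name, value in dims.items()) + c
    return dims, y


dims, y = shop_score(zhangpu, alpha)
print(dims)         # {'x1': 93, 'x2': 76, 'x3': 88, 'x4': 94, 'x5': 92}
print(round(y, 1))  # 88.6

_, y_adjusted = shop_score(zhangpu, alpha, c=1)
print(round(y_adjusted, 1))  # 89.6
```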
Of course, it is also possible to take the leading e-commerce shops of Zhangpu County whose turnover has grown by more than 30% over the past two years, and analyze the strengths and weaknesses that have driven the development of these shops.
Embodiment 3 provided by the present invention is:
On the basis of Embodiment 1, an assessment is carried out taking Nanjing County, Fujian Province as an example, specifically:
The e-commerce data of Nanjing County for the past two years is crawled from each e-commerce platform. After data cleansing is performed on the Nanjing County e-commerce data, the leading e-commerce shops of Nanjing County are extracted; for example, since Nanjing County mainly develops tourism, tourism products are extracted. Taking into account the geographic location and traffic conditions of Nanjing County, the training system, brand, talent, training service, and logistics system of Nanjing County are assessed, specifically:
In terms of the logistics system, Nanjing County is a county under the jurisdiction of Zhangzhou City, Fujian Province, located on the upper reaches of the West Stream of the Jiulong River, more than 90 kilometres from Xiamen. National Highway 319, the Zhangzhou-Longyan Expressway, the parallel line of the Shenyang-Haikou Expressway, the Jinghai Expressway, and the Longyan-Xiamen Railway pass through it. It adjoins Xiangcheng District and Longhai City to the southeast, Hua'an County to the east, Yongding District to the west, Longyan and Zhangping City to the north, and Pinghe County to the south. The logistics advantage is very clear. The sub-items of the Nanjing logistics system x5 are therefore scored as follows: logistics duration x51, logistics destinations x52, and logistics rating x53 are scored 30 points, 27 points, and 30 points respectively, so x5 = x51 + x52 + x53 = 87;
In terms of the brand, the tourism products of Nanjing County are relatively well known. The sub-items of the Nanjing County brand x2 are scored as follows: the number of brand categories x21, the number of brand occurrences x22, category sales volume x23, continuously growing brands x24, and brand sales volume x25 are scored 13 points, 17 points, 18 points, 16 points, and 16 points respectively, so x2 = x21 + x22 + x23 + x24 + x25 = 80;
In terms of the training service, training system, and talent, Nanjing County provides professional talent training services and has relatively complete training system and talent system construction. The assessment scores are as follows:
For the Nanjing County training service x4, the number of trainees x41, the number of training sessions x42, and the training pass rate x43 are scored 27 points, 27 points, and 36 points respectively, so x4 = x41 + x42 + x43 = 90;
For the Nanjing County training system x1, whether value-added services are provided x11, objective reviews x12, reply status x13, and process duration x14 are scored 24 points, 23 points, 24 points, and 23 points respectively, so x1 = x11 + x12 + x13 + x14 = 94;
For the Nanjing County talent x3, the number of talents x31, the increase in the number of talents x32, the talent education index x33, the talent training duration x34, and the distribution of talents across townships x35 are scored 17 points, 18 points, 18 points, 17 points, and 18 points respectively, so x3 = x31 + x32 + x33 + x34 + x35 = 88;
The values of the above independent variables are substituted into the county-level e-commerce development assessment model for the regression calculation. Taking α1 = α2 = α3 = α4 = α5 = 0.2 and c = 0 as in the example of Embodiment 1, the Nanjing County shop development score is: Y = α1x1 + α2x2 + α3x3 + α4x4 + α5x5 + c = 0.2 × (94 + 80 + 88 + 90 + 87) + 0 = 87.8.
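One way to read strengths and weaknesses off the Nanjing County result is to rank the five dimension scores. The ranking step below is an illustrative assumption and is not described in the text; the dimension totals are those of this embodiment, and the equal weights of 0.2 with c = 0 are the example values from Embodiment 1.

```python
# Dimension totals for Nanjing County from this embodiment.
nanjing = {
    "training system": 94,
    "brand": 80,
    "talent": 88,
    "training service": 90,
    "logistics system": 87,
}
alpha = 0.2  # equal example weight for every dimension
c = 0        # example adjustment factor

y = sum(alpha * score for score in nanjing.values()) + c
ranked = sorted(nanjing, key=nanjing.get)  # weakest dimension first

print(round(y, 1))            # 87.8
print(ranked[0], ranked[-1])  # brand training system
```

With equal weights, the lowest-scoring dimension is also the one where an improvement of a given size is most readily available, so a ranking like this offers a simple starting point for the analysis of causes.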
Of course, it is also possible to take the top 30 leading e-commerce shops in Nanjing County, analyze the reasons why the development scores of these shops are low, and guide the government and enterprises in making improvements.
In conclusion, the method and system provided by the present invention for assessing the development status of county-level e-commerce based on web crawler technology make it easy to obtain e-commerce data and to analyze and assess the development status of county-level e-commerce from multiple dimensions, so that the strengths and weaknesses of e-commerce development in each county can be seen more clearly. This provides guidance for the subsequent work of county governments and for the development of e-commerce enterprises.
The above are only embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in a related technical field, is likewise included within the scope of patent protection of the present invention.