CN110457556A - Distributed crawler system architecture, method for crawling data, and computer equipment - Google Patents

Distributed crawler system architecture, method for crawling data, and computer equipment

Info

Publication number
CN110457556A
Authority
CN
China
Prior art keywords
crawler
module
data
task
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910601110.6A
Other languages
Chinese (zh)
Other versions
CN110457556B (en)
Inventor
车驰
李钢
权佳成
谭瑞
张瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Tianying Information Technology Co ltd
Original Assignee
Chongqing Financial Assets Exchange LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Financial Assets Exchange LLC
Priority to CN201910601110.6A
Publication of CN110457556A
Application granted
Publication of CN110457556B
Active
Anticipated expiration

Abstract

This application discloses a distributed crawler system architecture, a method for crawling data, and computer equipment. The method includes: obtaining a crawler task using a task release module and sending the crawler task to a crawler module; after the crawler module obtains the crawler task, calling from a crawler service module the target crawler service corresponding to the crawl requirement, and using the target crawler service to crawl raw crawler data from the target website; and storing the crawled raw crawler data in a preset first storage module. In the distributed crawler system architecture, the method for crawling data, and the computer equipment of this application, a crawler service module is provided that encapsulates the low-level requirements of the entire crawler system as modular services. This reduces developers' workload, does not restrict the development language developers use, and lowers the skill required; the architecture design improves the stability and scalability of the crawler system.

Description

Distributed crawler system architecture, method for crawling data, and computer equipment
Technical field
This application relates to the field of data collection, and in particular to a distributed crawler system architecture, a method for crawling data, and computer equipment.
Background
Current crawler platforms are mostly custom-built for a single business scenario, and each crawler has to implement its required modules independently. As a result, most crawler systems do not take the stability and generality of the system as a whole into account, and developers' development and maintenance efficiency is low.
Summary of the invention
The main purpose of this application is to provide a distributed crawler system architecture, a method for crawling data, and computer equipment, intended to solve the problems of the prior art that distributed crawler systems are built with poor stability and generality and that developers' development and maintenance efficiency is low.
To achieve the above purpose, this application proposes a distributed crawler system architecture. The architecture is designed using HTTP service registration to isolate the different modules from one another, and the modules access one another through message queues. The architecture includes:
A task release module, for publishing crawler tasks;
A crawler service module, for storing different crawler services that exist in the form of services, where different crawler services complete different crawler tasks;
A crawler module, for receiving the crawler task published by the task release module, calling from the crawler service module the crawler service corresponding to that crawler task, and using the crawler service to crawl the target website to obtain the corresponding raw crawler data;
A first data storage module, for storing the raw crawler data;
A data cleaning module, for cleaning the raw crawler data in the first data storage module to obtain the screened first crawler data;
A second data storage module, for storing the first crawler data;
A back-end management module, for providing a visual interface on which human-computer interaction takes place;
A log and error handling module, for obtaining the logs generated by the other modules in the system architecture, extracting the error logs from those logs, and handling the events corresponding to the error logs according to preset rules.
This application also provides a method for crawling data with a distributed crawler, based on the distributed crawler system architecture described above, comprising:
Obtaining a crawler task using the task release module and sending the crawler task to the crawler module, where the crawler task includes a target website and a crawl requirement;
After the crawler module obtains the crawler task, calling from the crawler service module the target crawler service corresponding to the crawl requirement, and using the target crawler service to crawl raw crawler data from the target website, where the crawler service module contains at least one crawler service encapsulated in the form of a service;
Storing the crawled raw crawler data in a preset first storage module.
Further, the step of sending the crawler task to the crawler module comprises:
The task release module sends the crawler task to the crawler module in the form of a message queue.
Further, after the step of storing the crawled raw crawler data in the preset first storage module, the method comprises:
Cleaning the raw crawler data in the first storage module using the data cleaning module to obtain the cleaned first crawler data, and storing the first crawler data in a preset second storage module.
Further, the method also comprises:
Obtaining the log data of the other modules in the distributed crawler system architecture using the log and error handling module, and extracting the error logs from the log data;
Handling the events corresponding to the error logs according to preset rules.
Further, after the step of handling the events corresponding to the error logs according to preset rules, the method comprises:
Generating an error report for the corresponding event using the log and error handling module, and sending the error report to a preset mailbox.
Further, the step of handling the events corresponding to the error logs according to preset rules comprises:
Determining, using the log and error handling module, whether the event is a crawl failure;
If the event is a crawl failure, publishing the crawler task corresponding to the event again.
Further, the method also comprises:
Determining whether a management command from the back-end management module has been received;
If so, processing the management command with priority.
This application also provides computer equipment comprising a memory and a processor, where the memory stores a computer program and the processor implements the steps of any of the methods above when executing the computer program.
This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the methods above.
In the distributed crawler system architecture, the method for crawling data with a distributed crawler, the computer equipment, and the computer-readable storage medium of this application, the architecture is designed using HTTP service registration to isolate the different modules, and the modules access one another through message queues. This design reduces the coupling between system modules, and the asynchronous message handling of the message queue improves the system's capacity for parallel data processing, making it easy to scale the system horizontally when more processing capacity is needed. A crawler service module is provided to store crawler services; it encapsulates the low-level requirements of the entire crawler system as modular services, which reduces developers' workload, does not restrict the development language developers use, and lowers the skill required. The architecture design improves the stability and scalability of the crawler system, making it suitable for developing large-scale, multi-task crawler systems. The visual back-end management module makes operating and managing the whole system more reliable and efficient.
Brief description of the drawings
Fig. 1 is a schematic structural block diagram of the distributed crawler system architecture of an embodiment of this application;
Fig. 2 is a schematic flowchart of the method for crawling data with a distributed crawler of an embodiment of this application;
Fig. 3 is a schematic structural block diagram of the computer equipment of an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described below in connection with the embodiments and with reference to the drawings.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
Referring to Fig. 1, this application first proposes a distributed crawler system architecture. The architecture is designed using HTTP service registration to isolate the different modules from one another, and the modules access one another through message queues. The architecture includes:
A task release module 10, for publishing crawler tasks;
A crawler service module 20, for storing different crawler services that exist in the form of services, where different crawler services complete different crawler tasks;
A crawler module 30, for receiving the crawler task published by the task release module, calling from the crawler service module the crawler service corresponding to that crawler task, and using the crawler service to crawl the target website to obtain the corresponding raw crawler data;
A first data storage module 40, for storing the raw crawler data;
A data cleaning module 50, for cleaning the raw crawler data in the first data storage module to obtain the screened first crawler data;
A second data storage module 60, for storing the first crawler data;
A back-end management module 70, for providing a visual interface on which human-computer interaction takes place;
A log and error handling module 80, for obtaining the logs generated by the other modules in the system architecture, extracting the error logs from those logs, and handling the events corresponding to the error logs according to preset rules.
In this embodiment, the architecture is designed using HTTP service registration to isolate the different modules, and the modules access one another through message queues. This design reduces the coupling between system modules, and the asynchronous message handling of the message queue improves the system's capacity for parallel data processing, making it easy to scale the system horizontally when more processing capacity is needed. In the architecture, the system environment, module services, and storage system are packaged and integrated using Docker containers, and scripts can be used to deploy and start the system with one click. When the system needs to be deployed to a new environment, migrating the container files completes the migration, and running the startup script completes the deployment. In the architecture, crawler programs are not constrained by the language the underlying system was developed in and so are not limited to a single common language; the system can provide base software support for using the modules from different development languages. Crawler developers therefore only need to write the code that implements the business logic to form the corresponding crawler service and pass it into the crawler service module in order to complete the development, maintenance, and so on of the entire crawler program.
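The decoupling described in this embodiment can be illustrated with a minimal in-process sketch. It is not the patented implementation: `ServiceRegistry`, `publish_task`, `crawler_module_poll`, and the endpoint URL are hypothetical names, the registry only records HTTP endpoints rather than serving them, and Python's standard `queue.Queue` stands in for a real message queue.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Hypothetical registry: each module registers an HTTP endpoint under a name."""
    endpoints: dict = field(default_factory=dict)

    def register(self, name: str, url: str) -> None:
        self.endpoints[name] = url

    def lookup(self, name: str) -> str:
        return self.endpoints[name]


# Stand-in for the message queue between the task release module and the crawler module.
task_queue: "queue.Queue[dict]" = queue.Queue()


def publish_task(target_site: str, requirements: list) -> None:
    """Task release module: it only knows about the queue, not about the crawler module."""
    task_queue.put({"target_site": target_site, "requirements": list(requirements)})


def crawler_module_poll(registry: ServiceRegistry) -> list:
    """Crawler module: take one task and resolve each requirement to a registered endpoint."""
    task = task_queue.get()
    return [(req, registry.lookup(req)) for req in task["requirements"]]
```

Because the publisher and the consumer share only the queue and the registry, either side could be replaced or scaled out without changing the other, which is the property the embodiment attributes to this design.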
Referring to Fig. 2, an embodiment of this application also provides a method for crawling data with a distributed crawler, based on the distributed crawler system architecture of the embodiment above, comprising the steps of:
S1, obtaining a crawler task using the task release module and sending the crawler task to the crawler module, where the crawler task includes a target website and a crawl requirement;
S2, after the crawler module obtains the crawler task, calling from the crawler service module the target crawler service corresponding to the crawl requirement, and using the target crawler service to crawl raw crawler data from the target website, where the crawler service module contains at least one crawler service encapsulated in the form of a service;
S3, storing the crawled raw crawler data in a preset first storage module.
As described in step S1, the crawler task includes the target website and the crawl requirement of the task. The target website is the data source for this crawl. The crawl requirement specifies the kind of data to be crawled, for example specified data or the data of a specified function on the target website. Crawler tasks can be obtained in several ways, for example by receiving a crawler task entered directly by a user or by receiving a crawler task generated by the system. A single crawler task may include multiple crawl requirements, for example a requirement to crawl login data together with a requirement to crawl CAPTCHA image recognition data.
As described in step S2, a crawler service is a service that can complete the corresponding crawl task. One or more preset crawler services are provided in the crawler service module. The services in the crawler service module are usually common services corresponding to particular crawl requirements, such as a simulated login service, a CAPTCHA image recognition service, and an IP proxy pool maintenance service. In a specific embodiment, the crawler service module holds an invocation list that stores crawl requirements and crawler services in a one-to-one mapping. After the crawl requirement of a crawler task is obtained, the identical crawl requirement is first looked up in the invocation list, the target crawler service is then obtained from the mapping, and finally the target crawler service is called. When the crawler task includes multiple crawl requirements, the corresponding services are called at the same time. The target crawler services are then used to crawl data from the target website.
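The invocation-list lookup in step S2 can be sketched as a dictionary from crawl requirements to callables. The requirement keys, the two stub services, and `run_crawler_task` are illustrative assumptions; the stubs return placeholder strings instead of crawling anything, and this sketch calls the services one after another rather than simultaneously.

```python
from typing import Callable, Dict, List


def simulated_login(site: str) -> str:
    """Stub for a simulated login service (placeholder output only)."""
    return f"login-data:{site}"


def captcha_recognition(site: str) -> str:
    """Stub for a CAPTCHA image recognition service (placeholder output only)."""
    return f"captcha-data:{site}"


# Hypothetical invocation list: one crawl requirement maps to exactly one crawler service.
INVOCATION_LIST: Dict[str, Callable[[str], str]] = {
    "login": simulated_login,
    "captcha": captcha_recognition,
}


def run_crawler_task(task: dict) -> List[str]:
    """Look up each crawl requirement in the invocation list and call the mapped service."""
    results = []
    for requirement in task["requirements"]:
        service = INVOCATION_LIST.get(requirement)
        if service is None:
            raise KeyError(f"no crawler service registered for {requirement!r}")
        results.append(service(task["target_site"]))
    return results
```

A new crawl requirement is supported by adding one entry to the mapping, without touching the dispatch code.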
As described in step S3, the crawled data is stored in the first data storage module. The first storage module is generally a document storage system, which is relatively inexpensive and saves on storage costs.
In one embodiment, the step of sending the crawler task to the crawler module comprises:
S101, the task release module sends the crawler task to the crawler module in the form of a message queue.
As described in step S101, a message queue is a container. Sending crawler tasks through a message queue allows fast horizontal, distributed scaling for large-scale crawler tasks and improves the capacity to process crawler tasks.
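The scaling benefit of step S101 can be shown with a small sketch in which several workers drain one shared queue. This is an assumption-laden stand-in: threads play the role of separate crawler module instances, `queue.Queue` plays the role of the message queue, and `run_workers` is a hypothetical helper name.

```python
import queue
import threading


def run_workers(tasks, worker_count=3):
    """Drain one shared queue with several workers; return (worker_id, task) pairs."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    handled = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker stops
            with lock:
                handled.append((worker_id, task))
            task_queue.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

Raising `worker_count` adds consumers without any change on the publishing side; which worker handles which task is not deterministic.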
In one embodiment, after step S3 of storing the crawled raw crawler data in the preset first storage module, the method comprises:
S4, cleaning the raw crawler data in the first storage module using the data cleaning module to obtain the cleaned first crawler data, and storing the first crawler data in a preset second storage module.
As described in step S4, the data cleaning module has multiple cleaning rules, for example removing duplicate data and removing incomplete data; it can also first filter out the data that is needed and then remove duplicates and so on. The second storage module may be a sub-database set up within the first storage module, for example a folder in the first storage module. In a specific embodiment, the second storage module is a database independent of the first storage module; it costs more than the first storage module but makes managing the data more convenient. Because the volume of raw crawler data is large, the low-cost first storage module is used for it; the cleaned first crawler data is comparatively small in volume, so it goes to the second storage module, which is easier to manage but more expensive.
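Two of the cleaning rules named in step S4, removing incomplete data and removing duplicates, can be sketched as follows. The record shape (dictionaries with `url` and `title` fields) and the function name `clean_records` are assumptions made for illustration; the source does not specify a data format.

```python
def clean_records(raw_records, required_fields=("url", "title")):
    """Drop records with missing or empty required fields, then drop duplicates."""
    seen = set()
    cleaned = []
    for record in raw_records:
        # Rule 1: remove incomplete data.
        if any(not record.get(name) for name in required_fields):
            continue
        # Rule 2: remove duplicates, keyed on the required fields.
        key = tuple(record[name] for name in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned
```

The output of such a function corresponds to the first crawler data that the method then writes to the second storage module.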
In one embodiment, the method for crawling data with a distributed crawler also comprises:
S5, obtaining the log data of the other modules in the distributed crawler system architecture using the log and error handling module, and extracting the error logs from the log data;
S6, handling the events corresponding to the error logs according to preset rules.
In this embodiment, the method for crawling data with a distributed crawler relies on the distributed crawler system architecture described above. Executing steps such as cleaning the raw crawler data and crawling data generates corresponding log data. This application collects that log data, uses an existing log analysis method to filter out the error logs in each set of log data, and then finds the corresponding event from each error log and handles it automatically, for example by automatically repeating the step that generated the error log.
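Step S5 can be sketched as a filter over collected log lines. The line format assumed here (`LEVEL module message`) is hypothetical, as are `LOG_PATTERN` and `extract_error_logs`; the source only says that an existing log analysis method is used.

```python
import re

# Assumed log line format: "<LEVEL> <module> <message>", e.g. "ERROR crawler task 7 failed".
LOG_PATTERN = re.compile(r"^(?P<level>[A-Z]+) (?P<module>\S+) (?P<message>.*)$")


def extract_error_logs(lines):
    """Return the ERROR entries as dictionaries; lines that do not parse are skipped."""
    errors = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append(
                {"module": match.group("module"), "message": match.group("message")}
            )
    return errors
```

Each returned entry identifies the module and the message, which is the information the later handling step needs to find the corresponding event.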
In one embodiment, after step S6 of handling the events corresponding to the error logs according to preset rules, the method comprises:
S7, generating an error report for the corresponding event using the log and error handling module, and sending the error report to a preset mailbox.
As described in step S7, the error log, the result of handling the event, and so on are used to generate the mail content according to preset requirements, and the mail is then sent to the preset mailbox. The mailbox may be that of a designated developer. There may also be several different mailboxes, one per developer, so that developers learn of errors promptly. Further, read receipts are received from the mailboxes; as soon as one receipt arrives, the mail sent to the other mailboxes, which do not correspond to that receipt, is recalled, to prevent several developers from handling the same problem at once after seeing the mail.
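Building the report mail of step S7 can be sketched with Python's standard `email.message.EmailMessage`. The subject format, the sender address, and the helper name `build_error_report` are assumptions; the sketch only constructs the message and does not send it, and the read-receipt recall described above is not modeled.

```python
from email.message import EmailMessage


def build_error_report(error, outcome, recipients, sender="crawler-alerts@example.com"):
    """Build (but do not send) a report mail from an error log entry and its handling result."""
    message = EmailMessage()
    message["Subject"] = f"[crawler] error in {error['module']}"
    message["From"] = sender
    message["To"] = ", ".join(recipients)  # one mailbox per developer
    message.set_content(
        f"Error log: {error['message']}\nHandling result: {outcome}\n"
    )
    return message
```

In a deployment the returned message could be handed to `smtplib.SMTP.send_message`; the mail server and credentials are outside the scope of this sketch.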
In one embodiment, step S6 of handling the events corresponding to the error logs according to preset rules comprises:
S71, determining, using the log and error handling module, whether the event is a crawl failure;
S72, if the event is a crawl failure, publishing the crawler task corresponding to the event again.
As described in steps S71 and S72, when a crawl fails, the developer can be notified promptly by mail, the cause of the error is recorded, and the error handling logic puts the crawler task back into the message queue to be crawled again. This improves the stability of the process and provides automated operations and maintenance.
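The republish logic of steps S71 and S72 can be sketched as below. The event shape, the `crawl_failed` type string, and `handle_error_event` are hypothetical; the retry cap `MAX_RETRIES` is an added assumption that the source does not mention, included only so that a permanently failing task cannot loop forever.

```python
import queue

MAX_RETRIES = 3  # assumption: the source does not specify any retry limit


def handle_error_event(event, task_queue):
    """Put the failed crawler task back on the queue; return True if it was republished."""
    if event.get("type") != "crawl_failed":
        return False  # S71: not a crawl failure, nothing to republish
    task = dict(event["task"])
    task["retries"] = task.get("retries", 0) + 1
    if task["retries"] > MAX_RETRIES:
        return False  # give up after the assumed retry cap
    task_queue.put(task)  # S72: publish the crawler task again
    return True
```

Copying the task before updating its retry counter keeps the original event record unchanged for logging.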
In one embodiment, the method for crawling data with a distributed crawler also comprises:
S8, determining whether a management command from the back-end management module has been received;
S9, if so, processing the management command with priority.
In this embodiment, the back-end management module monitors and manages the entire crawler system through a web-based management interface. Through the back-end management module, crawler processes can be started by uploading scripts and configuration; crawler tasks with performance bottlenecks can be observed and the crawler module scaled in real time; and all crawler tasks in the system can be monitored and their data analyzed.
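One way to give management commands priority, as in steps S8 and S9, is a priority queue in which commands always sort ahead of ordinary crawler tasks. This is only one possible reading: the `WorkQueue` class, its method names, and the two priority levels are assumptions, built on Python's standard `queue.PriorityQueue`.

```python
import itertools
import queue

ADMIN_PRIORITY = 0  # lower number is served first
TASK_PRIORITY = 1


class WorkQueue:
    """Hypothetical queue in which management commands overtake crawler tasks."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The counter breaks ties so items of equal priority keep their arrival order.
        self._counter = itertools.count()

    def put_admin_command(self, command):
        self._queue.put((ADMIN_PRIORITY, next(self._counter), command))

    def put_crawler_task(self, task):
        self._queue.put((TASK_PRIORITY, next(self._counter), task))

    def get_next(self):
        _, _, item = self._queue.get()
        return item
```

A command submitted after several tasks is still returned first, while the tasks themselves remain in first-in, first-out order.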
The method for crawling data with a distributed crawler of this embodiment is based on the distributed crawler system architecture described above. The architecture is designed using HTTP service registration to isolate the different modules, and the modules access one another through message queues. This design reduces the coupling between system modules, and the asynchronous message handling of the message queue improves the system's capacity for parallel data processing, making it easy to scale the system horizontally when more processing capacity is needed. A crawler service module is provided to store crawler services; it encapsulates the low-level requirements of the entire crawler system as modular services, which reduces developers' workload, does not restrict the development language developers use, and lowers the skill required. The architecture design improves the stability and scalability of the crawler system, making it suitable for developing large-scale, multi-task crawler systems. The visual back-end management module makes operating and managing the whole system more reliable and efficient.
Referring to Fig. 3, an embodiment of this application also provides computer equipment. The computer equipment may be a management server or the server corresponding to a management node, and its internal structure may be as shown in Fig. 3. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer equipment provides computing and control capability. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program held in the non-volatile storage medium. The database of the computer equipment stores data such as that of the modules of the distributed crawler system architecture. The network interface of the computer equipment communicates with external terminals over a network connection. When executed by the processor, the computer program implements a method for crawling data with a distributed crawler.
The processor executes the method for crawling data with a distributed crawler, based on the distributed crawler system architecture of the embodiment above, comprising: obtaining a crawler task using the task release module and sending the crawler task to the crawler module, where the crawler task includes a target website and a crawl requirement; after the crawler module obtains the crawler task, calling from the crawler service module the target crawler service corresponding to the crawl requirement, and using the target crawler service to crawl raw crawler data from the target website, where the crawler service module contains at least one crawler service encapsulated in the form of a service; and storing the crawled raw crawler data in a preset first storage module.
In one embodiment, the step of sending the crawler task to the crawler module comprises: the task release module sends the crawler task to the crawler module in the form of a message queue.
In one embodiment, after the step of storing the crawled raw crawler data in the preset first storage module, the method comprises: cleaning the raw crawler data in the first storage module using the data cleaning module to obtain the cleaned first crawler data, and storing the first crawler data in a preset second storage module.
In one embodiment, the method for crawling data with a distributed crawler also comprises: obtaining the log data of the other modules in the distributed crawler system architecture using the log and error handling module, and extracting the error logs from the log data; and handling the events corresponding to the error logs according to preset rules.
In one embodiment, after the step of handling the events corresponding to the error logs according to preset rules, the method comprises: generating an error report for the corresponding event using the log and error handling module, and sending the error report to a preset mailbox.
In one embodiment, the step of handling the events corresponding to the error logs according to preset rules comprises: determining, using the log and error handling module, whether the event is a crawl failure; and if the event is a crawl failure, publishing the crawler task corresponding to the event again.
In one embodiment, the method for crawling data with a distributed crawler also comprises: determining whether a management command from the back-end management module has been received; and if so, processing the management command with priority.
Those skilled in the art will understand that the structure shown in Fig. 3 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer equipment to which the solution is applied.
The computer equipment of this embodiment is based on the distributed crawler system architecture described above. The architecture is designed using HTTP service registration to isolate the different modules, and the modules access one another through message queues. This design reduces the coupling between system modules, and the asynchronous message handling of the message queue improves the system's capacity for parallel data processing, making it easy to scale the system horizontally when more processing capacity is needed. A crawler service module is provided to store crawler services; it encapsulates the low-level requirements of the entire crawler system as modular services, which reduces developers' workload, does not restrict the development language developers use, and lowers the skill required. The architecture design improves the stability and scalability of the crawler system, making it suitable for developing large-scale, multi-task crawler systems. The visual back-end management module makes operating and managing the whole system more reliable and efficient.
An embodiment of this application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method for crawling data with a distributed crawler described above, based on the distributed crawler system architecture of the embodiment above, comprising: obtaining a crawler task using the task release module and sending the crawler task to the crawler module, where the crawler task includes a target website and a crawl requirement; after the crawler module obtains the crawler task, calling from the crawler service module the target crawler service corresponding to the crawl requirement, and using the target crawler service to crawl raw crawler data from the target website, where the crawler service module contains at least one crawler service encapsulated in the form of a service; and storing the crawled raw crawler data in a preset first storage module.
The method for crawling data with a distributed crawler is based on the distributed crawler system architecture described above. The architecture is designed using HTTP service registration to isolate the different modules, and the modules access one another through message queues. This design reduces the coupling between system modules, and the asynchronous message handling of the message queue improves the system's capacity for parallel data processing, making it easy to scale the system horizontally when more processing capacity is needed. A crawler service module is provided to store crawler services; it encapsulates the low-level requirements of the entire crawler system as modular services, which reduces developers' workload, does not restrict the development language developers use, and lowers the skill required. The architecture design improves the stability and scalability of the crawler system, making it suitable for developing large-scale, multi-task crawler systems. The visual back-end management module makes operating and managing the whole system more reliable and efficient.
In one embodiment, the above-mentioned the step of crawler task is sent to the crawler module, comprising: describedBusiness release module sends the crawler task to the crawler module in the form of message queue.
In one embodiment, the above-mentioned original crawler data that will be crawled are stored to preset first memory moduleAfter step, which comprises carried out using data cleansing module to the original crawler data in first memory moduleCleaning, the first crawler data after being cleaned, and the first crawler data are stored to preset second memory module.
In one embodiment, the method that above-mentioned distributed reptile crawls data further include: utilize the log and mistakeProcessing module obtains the daily record data of other modules in the distributed reptile system architecture, and obtains in the daily record dataError log;The corresponding event of the error log is handled according to preset rules.
In one embodiment, after the step of above-mentioned event corresponding according to the preset rules processing error log,It include: the error reporting of the corresponding event to be generated using the log and error handling module, and the false alarm is accusedGive preset mailbox.
In one embodiment, the step of handling the event corresponding to the error log according to the preset rules comprises: using the log and error handling module to judge whether the event is a crawl failure; and, if the event is a crawl failure, reissuing the crawler task corresponding to the event.
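The reissue rule can be sketched as follows. The event layout (`kind`, `task`), the function name `handle_error_event`, and in particular the retry limit are assumptions added for the example; the application only states that a failed crawler task is issued again.

```python
import queue


def handle_error_event(event, task_queue, max_retries=3):
    """Reissue the task of a crawl-failure event; return True if reissued."""
    if event.get("kind") != "crawl_failure":
        return False  # other kinds of events are handled by other rules
    task = dict(event["task"])  # copy so the original event is not modified
    task["retries"] = task.get("retries", 0) + 1
    if task["retries"] > max_retries:
        return False  # assumed safeguard against endless reissuing
    task_queue.put(task)
    return True


# Hypothetical crawl-failure event.
task_queue = queue.Queue()
failure = {"kind": "crawl_failure", "task": {"url": "https://example.com/a"}}
reissued = handle_error_event(failure, task_queue)
```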
In one embodiment, the method for crawling data with a distributed crawler further comprises: judging whether a management command issued by the back-end management module has been received; and, if so, processing the management command with priority.
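One possible way to give management commands priority is a priority queue, sketched below with Python's `queue.PriorityQueue`. The priority values, the function name `submit`, and the command strings are hypothetical; the application does not say how priority is implemented.

```python
import itertools
import queue

MANAGEMENT, TASK = 0, 1     # a lower number is served first
_sequence = itertools.count()  # tie-breaker that keeps arrival order
inbox = queue.PriorityQueue()


def submit(kind, payload):
    """Enqueue a crawler task or a management command."""
    inbox.put((kind, next(_sequence), payload))


submit(TASK, "crawl https://example.com/a")
submit(TASK, "crawl https://example.com/b")
submit(MANAGEMENT, "pause crawler-2")  # arrives last, handled first

order = []
while not inbox.empty():
    _, _, payload = inbox.get()
    order.append(payload)
```

The sequence counter ensures that items of the same kind are handled in the order in which they were submitted.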
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The foregoing describes only preferred embodiments of the present application and does not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (10)

CN201910601110.6A | 2019-07-04 | 2019-07-04 | Distributed crawler system architecture, method for crawling data and computer equipment | Active | CN110457556B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910601110.6A, CN110457556B (en) | 2019-07-04 | 2019-07-04 | Distributed crawler system architecture, method for crawling data and computer equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910601110.6A, CN110457556B (en) | 2019-07-04 | 2019-07-04 | Distributed crawler system architecture, method for crawling data and computer equipment

Publications (2)

Publication Number | Publication Date
CN110457556A (en) | 2019-11-15
CN110457556B (en) | 2023-11-14

Family

ID=68482277

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910601110.6A (Active, CN110457556B (en)) | Distributed crawler system architecture, method for crawling data and computer equipment | 2019-07-04 | 2019-07-04

Country Status (1)

Country | Link
CN (1) | CN110457556B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110929127A (en) * | 2019-12-05 | 2020-03-27 | 广州市原象信息科技有限公司 | Method for analyzing Taobao live broadcast putting effect and computer equipment
CN110929128A (en) * | 2019-12-11 | 2020-03-27 | 北京启迪区块链科技发展有限公司 | Data crawling method, device, equipment and medium
CN111143336A (en) * | 2019-11-27 | 2020-05-12 | 三盟科技股份有限公司 | College scientific research data management-oriented web crawler management method and platform
CN111192155A (en) * | 2019-12-25 | 2020-05-22 | 杭州龙席网络科技股份有限公司 | Social media inquiry plate identification and recommendation method based on SAAS
CN111241373A (en) * | 2020-02-20 | 2020-06-05 | 山东爱城市网信息技术有限公司 | Webpage crawler system based on micro-service and implementation method
CN111241366A (en) * | 2019-12-25 | 2020-06-05 | 杭州龙席网络科技股份有限公司 | Client social media monitoring method based on SAAS
CN111708931A (en) * | 2020-06-06 | 2020-09-25 | 谢国柱 | Big data collection method and artificial intelligence cloud service platform based on mobile Internet
CN112597367A (en) * | 2020-11-30 | 2021-04-02 | 国网北京市电力公司 | Data information fusion system and target decision generation method
CN112650908A (en) * | 2020-12-25 | 2021-04-13 | 百果园技术(新加坡)有限公司 | Data processing method, system and device based on network theme crawler

Citations (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20100269168A1 (en) * | 2009-04-21 | 2010-10-21 | Brightcloud Inc. | System And Method For Developing A Risk Profile For An Internet Service
US20110208848A1 (en) * | 2008-08-05 | 2011-08-25 | Zhiyong Feng | Network system of web services based on semantics and relationships
CN102932448A (en) * | 2012-10-30 | 2013-02-13 | 工业和信息化部电信传输研究所 | Distributed network crawler URL (uniform resource locator) duplicate removal system and method
CN105243159A (en) * | 2015-10-28 | 2016-01-13 | 福建亿榕信息技术有限公司 | Visual script editor-based distributed web crawler system
CN105447088A (en) * | 2015-11-06 | 2016-03-30 | 杭州掘数科技有限公司 | Volunteer computing based multi-tenant professional cloud crawler
CN105677918A (en) * | 2016-03-03 | 2016-06-15 | 浪潮软件股份有限公司 | Distributed crawler architecture based on Kafka and Quartz and implementation method thereof
CN106484886A (en) * | 2016-10-17 | 2017-03-08 | 金蝶软件(中国)有限公司 | A kind of method of data acquisition and its relevant device
CN106874487A (en) * | 2017-02-21 | 2017-06-20 | 国信优易数据有限公司 | A kind of distributed reptile management system and its method
CN107135092A (en) * | 2017-03-15 | 2017-09-05 | 浙江工业大学 | A Web Service Clustering Method Oriented to Global Social Service Network
CN107943991A (en) * | 2017-12-01 | 2018-04-20 | 成都嗨翻屋文化传播有限公司 | A kind of distributed reptile frame and implementation method based on memory database
CN108170551A (en) * | 2018-01-03 | 2018-06-15 | 深圳壹账通智能科技有限公司 | Front and back end error handling method, server and storage medium based on crawler system
CN109492149A (en) * | 2018-11-29 | 2019-03-19 | 深圳墨世科技有限公司 | Crawler task processing method and device
CN109508422A (en) * | 2018-12-05 | 2019-03-22 | 南京邮电大学 | The height of multithreading intelligent scheduling is hidden crawler system
CN109815384A (en) * | 2019-01-29 | 2019-05-28 | 携程旅游信息技术(上海)有限公司 | Method, system, equipment and the storage medium that crawler is realized

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
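Dong Yulong et al. (董禹龙等): "Research on an active-acquisition distributed web crawler cluster method" (主动获取式的分布式网络爬虫集群方法研究), Computer Science (《计算机科学》) *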

Also Published As

Publication number | Publication date
CN110457556B (en) | 2023-11-14

Similar Documents

Publication | Publication Date | Title
CN110457556A (en) | | Distributed reptile system architecture, the method and computer equipment for crawling data
CN105243159B (en) | | A kind of distributed network crawler system based on visualization script editing machine
Jain et al. | | Cloud to edge: distributed deployment of process-aware IoT applications
US10956013B2 (en) | | User interface for automated flows within a cloud based developmental platform
CN106326219B (en) | | Method, device and system for checking business system data
US20220277075A1 (en) | | Using orchestrators for false positive detection and root cause analysis
CN111404759A (en) | | Service detection method, rule configuration method, related device and medium
US20160098661A1 (en) | | Business process framework
CN111143167B (en) | | Alarm merging method, device, equipment and storage medium for multiple platforms
CN114217981A (en) | | Communication method, device, computer equipment and storage medium for direct connection between banks and enterprises
CN114513542A (en) | | Production equipment control method and device, computer equipment and storage medium
CN112788112A (en) | | Automatic publishing method, device and platform for equipment health management micro-service
CN116048467A (en) | | Micro-service development platform and business system development method
CN112787999B (en) | | Cross-chain calling method, device, system and computer-readable storage medium
CN102508773A (en) | | Method and device for monitoring WEB service system simulation based on Internet explorer (IE) kernel
CN113312148A (en) | | Big data service deployment method, device, equipment and medium
CN103118248B (en) | | Monitoring method, monitoring agent, monitoring server and system
Platenius-Mohr et al. | | An analysis of use cases for the asset administration shell in the context of edge computing
CN116136801A (en) | | Data processing method, device, electronic device and storage medium of cloud platform
CN113452725B (en) | | Message filtering information generation method and device
Prist et al. | | Cyber-physical manufacturing systems: An architecture for sensor integration, production line simulation and cloud services
CN114296880A (en) | | Service request processing method, device, equipment and medium based on large-scale cluster
CN105262845B (en) | | A kind of document transmission processing method and system
CN111447273A (en) | | Cloud processing system and data processing method based on cloud processing system
CN117075875A (en) | | Task development method, system, construction method and electronic equipment based on camera

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
TR01 | Transfer of patent right

Effective date of registration:20240326

Address after:Room 101-1, Building 2, No. 95, Daguan Middle Road, Tianhe District, Guangzhou, Guangdong 510000 (office only)

Patentee after:Guangzhou Zhongtian Technology Consulting Co.,Ltd.

Country or region after:China

Address before:400010 38 / F, 39 / F, unit 1, 99 Wuyi Road, Yuzhong District, Chongqing

Patentee before:CHONGQING FINANCIAL ASSETS EXCHANGE Co.,Ltd.

Country or region before:China

TR01 | Transfer of patent right

Effective date of registration:20240710

Address after:Building 1, 6th Floor, No. 13 Gualv Road, Licheng Street, Zengcheng District, Guangzhou City, Guangdong Province 510000, China. Self designed, Room 4, Room 602

Patentee after:Guangzhou Tianying Information Technology Co.,Ltd.

Country or region after:China

Address before:Room 101-1, Building 2, No. 95, Daguan Middle Road, Tianhe District, Guangzhou, Guangdong 510000 (office only)

Patentee before:Guangzhou Zhongtian Technology Consulting Co.,Ltd.

Country or region before:China
