Summary of the Invention
Embodiments of the present invention provide a blacklist library creation method, apparatus, device, and medium for preventing traffic hijacking, to solve the problem that current anti-hijacking blacklists cannot comprehensively identify web advertisement resource information.
In a first aspect, an embodiment of the present invention provides a blacklist library creation method for preventing traffic hijacking, including:
obtaining an HTTP access request sent by a client, the HTTP access request including a URL to be accessed;
obtaining the corresponding original web page based on the URL to be accessed, the original web page corresponding to an original DOM tree;
scanning the original DOM tree with an anti-hijacking software development kit to judge whether a suspected advertisement URL exists in the original DOM tree;
if a suspected advertisement URL exists in the original DOM tree, storing the suspected advertisement URL in a cache library;
determining a blacklist domain name based on the suspected advertisement URLs in the cache library, and storing the blacklist domain name in a blacklist library.
In a second aspect, an embodiment of the present invention provides a blacklist library creation apparatus for preventing traffic hijacking, including:
an access request acquisition module, configured to obtain an HTTP access request sent by a client, the HTTP access request including a URL to be accessed;
an original web page acquisition module, configured to obtain the corresponding original web page based on the URL to be accessed, the original web page corresponding to an original DOM tree;
a suspected advertisement URL judgment module, configured to scan the original DOM tree with an anti-hijacking software development kit and judge whether a suspected advertisement URL exists in the original DOM tree;
a cache library storage module, configured to store the suspected advertisement URL in a cache library when a suspected advertisement URL exists in the original DOM tree;
a blacklist domain name acquisition module, configured to determine a blacklist domain name based on the suspected advertisement URLs in the cache library and store the blacklist domain name in a blacklist library.
In a third aspect, an embodiment of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the blacklist library creation method for preventing traffic hijacking.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the blacklist library creation method for preventing traffic hijacking.
With the blacklist library creation method, apparatus, device, and medium for preventing traffic hijacking provided by the embodiments of the present invention, the URL to be accessed is obtained from the HTTP access request sent by the client, and the original DOM tree of the corresponding original web page is obtained based on that URL. The original DOM tree is then scanned with the anti-hijacking software development kit, and the suspected advertisement URLs found in it are stored in the cache library, which helps improve the efficiency of subsequent blacklist domain name extraction. Domain name extraction is performed on the suspected advertisement URLs in the cache library to obtain blacklist domain names, which helps improve the accuracy of blacklist domain name confirmation. The blacklist domain names are stored in the blacklist library, which helps improve the accuracy of subsequent blacklist domain name identification for the original web page of a URL to be accessed, speeds up the confirmation of web advertisement resource information in that page, and makes the identification of web advertisement resource information more comprehensive.
Embodiment 1
Fig. 1 shows a flowchart of the blacklist library creation method for preventing traffic hijacking in this embodiment. The method is applied in a server that exchanges information with a client over a network. It can prevent advertisement operators from inserting web advertisement resource information into normal web page resource information, thereby preventing traffic hijacking by advertisement operators. As shown in Fig. 1, the method includes the following steps:
S10: Obtain the HTTP access request sent by the client, where the HTTP access request includes the URL to be accessed.
Here, the URL to be accessed is the address of the web page that the user wants to access. Specifically, the server in communication with the client receives the HTTP access request sent by the client. The request generally carries a web page address (URL), which is the address of the web page that the client asks the server to access.
S20: Obtain the corresponding original web page based on the URL to be accessed, where the original web page corresponds to an original DOM tree.
Specifically, the original web page is the web page corresponding to the URL to be accessed. The server obtains the original web page according to the URL to be accessed in the HTTP access request. Every original web page has a corresponding DOM tree, which is the original DOM tree of that page. The original DOM tree is the DOM tree corresponding to all web page resource information loaded by the original web page.
Here, the DOM (Document Object Model) is the object model used for HTML (HyperText Markup Language) documents, and HTML is a markup language designed for creating web pages and other information that can be viewed in a web browser. A web page is essentially an HTML document, and the DOM tree is the document object model of that page. In the DOM tree, each element of the page is treated as an object, so that the elements of the page can be read or edited by a programming language. A web page contains at least one element, and each element corresponds to one DOM tag in the DOM tree, so a DOM tree contains at least one DOM tag.
S30: Scan the original DOM tree with the anti-hijacking software development kit, and judge whether a suspected advertisement URL exists in the original DOM tree.
Here, the anti-hijacking software development kit is a software development kit composed of a set of JavaScript code and used to detect whether a suspected advertisement URL exists. It is introduced into the browser by means of a script tag. For example, the JavaScript code of the kit may be referenced as <script src="a.js">, where the value after src is the address of the kit. A software development kit (SDK) generally refers to a set of development tools used to build application software for a specific software package, software framework, hardware platform, operating system, or the like.
A suspected advertisement URL is the URL of a DOM tag that meets a preset feature. A preset feature is a feature of the DOM tags of ad code implanted by an advertisement operator. These features include, but are not limited to, an ad code integrity feature, a URL redirect feature, and an absolute positioning feature for content that must be displayed at a specific position on the page. The ad code integrity feature means that the complete advertisement an operator wants to display corresponds to one complete block of code, which appears in the DOM tree as a single unit, for example a block of code that starts with <div> and ends with </div>. The URL redirect feature means that an advertisement image is inserted together with an <a> URL link, where the link is a string representing the location where the image is stored. The absolute positioning feature means that the tail of the DOM tree of the original web page contains many extra iframes and divs with embedded ad code. For example, if the last element of the original web page is <div id='last-div'>, the illegally inserted code is the <script src="a.js"> that follows its closing </div>.
Specifically, the original web page of the URL to be accessed loads web page resource information, which can be displayed on the page in many forms, including but not limited to pictures, text, web addresses, and video. This web page resource information constitutes the elements of the page, and to the software development kit these elements all exist as DOM tags.
Further, after obtaining the HTTP access request sent by the client, the server obtains the anti-hijacking software development kit based on the request. After the original web page has finished loading all of its web page resource information, the page fires an onload event. This event is the trigger for the anti-hijacking software development kit to process the web page resource information loaded by the original web page, and it provides an interface through which the kit can be invoked to scan the DOM tree.
Triggered by this event, the anti-hijacking software development kit scans the DOM tree of the original web page breadth-first. Scanning starts from the outermost html tag of the DOM tree and proceeds level by level, looking for DOM tags that meet a preset feature in order to determine whether a suspected advertisement URL exists. A breadth-first scan traverses all DOM tags in the tree: it scans every tag at one level before moving on to the next level, visiting tags in the order in which they leave the queue, which makes it suitable for scanning the DOM tree comprehensively. If any one of the three preset features (ad code integrity, URL redirect, or absolute positioning) is present in the original DOM tree, it can be concluded that a suspected advertisement URL exists in the tree. A suspected advertisement URL is a URL preliminarily determined to possibly be an advertisement URL. Identifying it helps determine blacklist domain names, so that step S50 can extract the domain name from the suspected advertisement URL and store the blacklist domain name in the blacklist library.
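The level-by-level scan described above can be sketched as follows. This is only an illustration under stated assumptions: the kit itself is described as JavaScript running in the browser, whereas this sketch uses Python and a hypothetical `DomNode` class as a stand-in for a DOM tag, and it simply collects `src` and `href` values rather than testing the preset features.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DomNode:
    """Hypothetical stand-in for a DOM tag: tag name, attributes, child tags."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def collect_urls_breadth_first(root):
    """Visit tags level by level and return src/href values in visit order."""
    urls = []
    queue = deque([root])              # start from the outermost html tag
    while queue:
        node = queue.popleft()         # dequeue order equals level order
        for name in ("src", "href"):   # assumption: URLs live in these attributes
            if name in node.attrs:
                urls.append(node.attrs[name])
        queue.extend(node.children)    # children wait until this level is done
    return urls
```

Because children are appended to the end of the queue, every tag of one level is visited before any tag of the next level, which matches the dequeue order mentioned in the description.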
S40: If a suspected advertisement URL exists in the original DOM tree, store the suspected advertisement URL in the cache library.
Specifically, it is judged whether a DOM tag meeting a preset feature exists in the original DOM tree. If so, it can be concluded that a suspected advertisement URL exists in the DOM tree, and the URL of that DOM tag, i.e. the suspected advertisement URL, is stored in the cache library. Storing suspected advertisement URLs in the cache library allows the data held there to be processed quickly (including but not limited to queries), without having to request and wait for processing instructions from the server.
The cache library in this embodiment may be a MySQL relational database. MySQL is an open-source relational database management system that provides programming interfaces (APIs) for many programming languages, supports a variety of field types, and provides full operator support in the SELECT and WHERE clauses of queries. MySQL is fast, reliable, and adaptable. Storing suspected advertisement URLs in a MySQL database allows master-slave configuration and read/write splitting, and provides efficient data storage.
S50: Determine a blacklist domain name based on the suspected advertisement URLs in the cache library, and store the blacklist domain name in the blacklist library.
Here, a blacklist domain name is a domain name obtained by performing domain name extraction on a suspected advertisement URL. The blacklist library is the database that stores blacklist domain names. Specifically, domain name extraction is performed on the suspected advertisement URLs stored in the cache library. If the extracted domain name satisfies the preset blacklist judgment method, it is determined to be a blacklist domain name. The blacklist domain name is then stored in a pre-created blacklist library, to serve as a reference for subsequent blacklist domain name identification.
In steps S10 to S50, the URL to be accessed is obtained from the HTTP access request sent by the client, and the original DOM tree of the corresponding original web page is obtained based on that URL. The original DOM tree is scanned with the anti-hijacking software development kit to obtain the suspected advertisement URLs present in it, domain name extraction is performed on those URLs to obtain blacklist domain names, and the blacklist domain names are stored in the blacklist library. The resulting blacklist library helps improve the accuracy of subsequent blacklist domain name identification for the original web page of a URL to be accessed, and improves the comprehensiveness of web advertisement resource information identification.
In a specific embodiment, as shown in Fig. 2, step S30 (scanning the original DOM tree with the anti-hijacking software development kit and judging whether a suspected advertisement URL exists in the original DOM tree) specifically includes the following steps:
S31: Scan the original DOM tree with the anti-hijacking software development kit, and obtain the original URLs contained in the original DOM tree.
Specifically, a DOM tree contains at least one DOM tag. The anti-hijacking software development kit scans the original DOM tree of the original web page breadth-first, starting from the outermost html tag and scanning the DOM tags of each level in turn. It identifies the DOM tags that contain a URL and reads the URL contained in each of them; these URLs are the original URLs to be obtained.
S32: If the domain name of an original URL does not match the domain name of the URL to be accessed, determine that a suspected advertisement URL exists in the original DOM tree.
Here, the domain name of the original URL is the Internet address obtained by performing domain name extraction on the original URL, and the domain name of the URL to be accessed is the Internet address obtained by performing domain name extraction on the URL to be accessed. The process of obtaining both domain names is described in detail in step S51 and, to avoid repetition, is not detailed here.
Specifically, the domain name of each original URL contained in the original DOM tree is compared with the domain name of the URL to be accessed to judge whether they match. If they match, the original URL belongs to the original web page resource information of the page the user wants to access. If they do not match, a suspected advertisement URL exists in the original DOM tree, and the original URL is not part of the original web page resource information of the page the user wants to access. Matching the domain name of the original URL against the domain name of the URL to be accessed makes it possible to determine quickly and effectively whether a suspected advertisement URL exists in the original web page.
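A minimal sketch of the comparison in step S32 follows. Several points are assumptions made for illustration: the function names are hypothetical, the standard-library `urlsplit` stands in for the packaged regular expression of step S51, "match" is taken to mean exact equality of host names, and relative URLs (which have no host of their own) are treated as belonging to the page.

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lower-cased host of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def find_suspected_ad_urls(url_to_access, original_urls):
    """Return the original URLs whose domain differs from the page's domain."""
    page_domain = domain_of(url_to_access)
    suspected = []
    for url in original_urls:
        domain = domain_of(url)
        # Relative URLs have no host and are assumed to belong to the page.
        if domain and domain != page_domain:
            suspected.append(url)
    return suspected
```

An exact comparison would also flag sibling subdomains of the same site; whether those should count as a match is a policy choice the description leaves open.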
In a specific embodiment, as shown in Fig. 3, step S50 (determining a blacklist domain name based on the suspected advertisement URLs in the cache library) specifically includes the following steps:
S51: Perform domain name extraction on each suspected advertisement URL in the cache library to obtain the corresponding suspected domain name.
Once a URL is confirmed as a suspected advertisement URL, it is stored in the cache library, so the cache library holds at least one suspected advertisement URL. Domain name extraction is performed on each suspected advertisement URL in the cache library, and the extracted domain name is the suspected domain name.
Further, a regular expression in the anti-hijacking software development kit is called to perform domain name extraction on each suspected advertisement URL in the cache library, obtaining the corresponding suspected domain name.
Here, a regular expression (often abbreviated in code as regex, regexp, or RE) is a logical formula for operating on strings. In this embodiment, the regular expression expresses a filtering logic applied to a string. It consists of ordinary characters (such as the letters a to z) and special characters (also called metacharacters, such as $, *, and +).
Specifically, the anti-hijacking software development kit contains a pre-packaged regular expression. Step S51 is carried out as follows: the packaged regular expression is used to split each suspected advertisement URL in the cache library into three parts, namely the protocol name, the domain name, and the parameters; the protocol name and the parameter part after the domain name are then removed, leaving only the domain name, which is the corresponding suspected domain name. For example, for the suspected advertisement URL http://pos.baidu.com/sHei=250&wid=250&di=U3031286&ltu=lV-RgLBX*E5wJyFr&r=35d363d1cad5eabfcd131082 d275f954#, "http" is the protocol name, "pos.baidu.com" is the domain name, and everything after the domain name is collectively referred to as the parameters. When the regular expression is applied to this URL, only the domain name part is retained, so "pos.baidu.com" is the suspected domain name.
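The three-part split can be illustrated with a short sketch. The pattern below is an assumption written for this illustration, not the regular expression packaged in the kit, and the names `URL_PARTS`, `split_url`, and `extract_domain` are hypothetical. It ignores details such as user information embedded in a URL.

```python
import re

# Assumed pattern: protocol name, "://", domain name, then everything else.
URL_PARTS = re.compile(
    r"^(?P<protocol>[A-Za-z][A-Za-z0-9+.-]*)://"  # protocol name, e.g. http
    r"(?P<domain>[^/?#:]+)"                       # domain name, up to / ? # or :
    r"(?P<params>.*)$"                            # the rest, treated as parameters
)


def split_url(url):
    """Split a URL into (protocol, domain, params); return None if no match."""
    match = URL_PARTS.match(url)
    if match is None:
        return None
    return match.group("protocol"), match.group("domain"), match.group("params")


def extract_domain(url):
    """Keep only the domain name part, discarding protocol and parameters."""
    parts = split_url(url)
    return parts[1] if parts else None
```

For a shortened form of the example URL, `extract_domain("http://pos.baidu.com/sHei=250&wid=250")` yields `"pos.baidu.com"`.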
S52: Determine that a suspected domain name whose count in the cache library reaches a preset value is a blacklist domain name.
That is, when the number of times the same suspected domain name is stored in the cache library reaches (is greater than or equal to) the preset value, that suspected domain name is determined to be a blacklist domain name. The preset value is a preconfigured number of occurrences of a suspected domain name in the cache library, and it is used to judge whether a suspected domain name is a blacklist domain name.
If a suspected domain name appears only once in the cache library, or otherwise has not reached the preset value, it cannot yet be confirmed as a blacklist domain name; it may simply be a domain name that does not match the domain name of the URL to be accessed. Only when the number of times the suspected domain name is stored in the cache library reaches the preset value can it be confirmed as a blacklist domain name. Requiring the count to reach the preset value before a suspected domain name is treated as a blacklist domain name reduces misjudgment and improves the accuracy of blacklist domain name determination.
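The threshold rule of step S52 amounts to counting occurrences, as in the following sketch. The function name and the use of an in-memory list in place of the cache library are assumptions for illustration.

```python
from collections import Counter


def blacklist_domains(suspected_domains, preset_value):
    """Return the suspected domains stored at least preset_value times."""
    counts = Counter(suspected_domains)   # occurrences per suspected domain
    return {domain for domain, n in counts.items() if n >= preset_value}
```

With a preset value of 3, a domain that appears three times in the cache is blacklisted, while a domain seen once is left alone.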
In a specific embodiment, as noted above, treating a domain name as a blacklist domain name once the number of its suspected advertisement URLs in the cache library reaches the preset value may still produce misjudgments. A misjudged suspected advertisement URL would then enter the blacklist library and become inaccessible or otherwise unusable. As shown in Fig. 4, after the step of storing the blacklist domain name in the blacklist library, the method further includes:
S61: Obtain a misjudgment recovery request, where the misjudgment recovery request includes a target URL.
A misjudgment recovery request is a request, received by the server, in which the user asks to restore and view hidden content. Hidden content is the web page resource information displayed by a URL whose domain name has been added to the blacklist. The target URL is the URL of the hidden content to be restored. Specifically, misjudgment may occur when blacklist domain names are confirmed. When a user accesses a web page, the server may have judged the domain name of a suspected advertisement URL, which differs from the domain name of the page being accessed, to be a blacklist domain name and added it to the blacklist library. The page then displays only the content that is not in the blacklist library, and the content whose domain name is in the blacklist library is hidden. When the browser displays the page, a notification appears asking whether to view the hidden content. If the user clicks to restore the hidden content, the server receives a recovery request, which is the misjudgment recovery request. The request includes the URL of the hidden content to be restored, which is the target URL. Handling misjudgment recovery requests reduces the number of domain names wrongly stored in the blacklist library and helps users browse the complete page content.
S62: Call the regular expression in the anti-hijacking software development kit to perform domain name extraction on the target URL, obtaining the target domain name.
When the server receives the misjudgment recovery request sent by the user, it calls the regular expression in the anti-hijacking software development kit to perform domain name extraction on the target URL and obtain the corresponding target domain name. The extraction process is as described in step S51 and, to avoid repetition, is not repeated here.
S63: Delete the blacklist domain name stored in the blacklist library that is consistent with the target domain name, and update the blacklist library.
After obtaining the target domain name, the server compares it with the blacklist domain names stored in the blacklist library, deletes the stored blacklist domain name that is consistent with the target domain name, and updates the blacklist library. Step S63 ensures that the blacklist domain names stored in the blacklist library are continually adjusted according to actual conditions, which reduces the misjudgment rate and keeps the blacklist accurate.
In this specific embodiment, after step S63, i.e. after the step of deleting the blacklist domain name stored in the blacklist library that is consistent with the target domain name, the method may further include:
S64: Take the blacklist domain name that was stored in the blacklist library and is consistent with the target domain name as a whitelist domain name, and store the whitelist domain name in a whitelist library.
A whitelist library is created at the same time as the blacklist library. The whitelist library is a database storing the target domain names of URLs that the user is allowed to access on a web page. The blacklist domain names stored in the blacklist library are compared with the target domain name; the blacklist domain name consistent with the target domain name is taken as a whitelist domain name and stored in the whitelist library.
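Steps S62 to S64 can be sketched together as one function. The sketch assumes that the blacklist and whitelist libraries can be modelled as in-memory sets and uses an illustrative pattern rather than the kit's packaged regular expression; `recover_misjudged` and `extract_domain` are hypothetical names.

```python
import re

# Assumed pattern for illustration: capture the host after "protocol://".
_DOMAIN_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://([^/?#:]+)")


def extract_domain(url):
    """S62: extract the target domain name from the target URL."""
    match = _DOMAIN_RE.match(url)
    return match.group(1).lower() if match else None


def recover_misjudged(target_url, blacklist, whitelist):
    """S63 and S64: move the target domain from the blacklist to the whitelist.

    Returns True if a blacklist entry was removed, False otherwise.
    """
    target_domain = extract_domain(target_url)
    if target_domain is None or target_domain not in blacklist:
        return False
    blacklist.discard(target_domain)   # S63: delete the matching blacklist entry
    whitelist.add(target_domain)       # S64: record it as a whitelist domain name
    return True
```

Returning a boolean lets the caller tell a genuine recovery apart from a request whose target domain was never blacklisted.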
In this embodiment, the whitelist library also contains pre-stored whitelist domain names. These arise as follows: some web pages allow the insertion of web advertisement resource information that is not part of the page's normal web page resource information. In that case, the regular expression can be used to perform domain name extraction on the URL of that web advertisement resource information, and the extracted domain name is stored in the whitelist library.
When the anti-hijacking software development kit scans all DOM tags in the original DOM tree of the page the user accesses, and a suspected advertisement URL has been determined and stored in the cache library, domain name extraction must be performed on that URL to determine its domain name (the suspected domain name of step S51). If the suspected domain name is consistent with a whitelist domain name in the whitelist library, the web page resource information of the suspected advertisement URL is displayed. For example, Baidu web pages allow Baidu promotional advertisements to be inserted. The URLs of these advertisements are identified as suspected advertisement URLs by the scan, but after domain name extraction their domain name is found in the whitelist library, so the web page resource information of those URLs can be displayed. This prevents content that the user is allowed to access from being mistakenly blacklisted, avoids unnecessary loss of page content, and presents the page content more completely.
In a specific embodiment, after step S40, i.e. after the step of storing the suspected advertisement URL in the cache library, the method further includes: if the domain name of the suspected advertisement URL is stored in the whitelist library, deleting the suspected advertisement URL from the cache library.
After a suspected advertisement URL is stored in the cache library, domain name extraction must be performed on it to determine its domain name (the suspected domain name of step S51). If that domain name is found to be stored in the whitelist library, the domain name belongs to the whitelist, and the content of the corresponding URL is web page content that should be displayed. This step avoids the situation in which only the domain name stored in the blacklist library is deleted while the suspected advertisement URL remains in the cache library, so that the page content of that URL still cannot be displayed normally. Therefore, once it is confirmed that the domain name of a suspected advertisement URL is stored in the whitelist library, the suspected advertisement URL must be deleted from the cache library.
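The cache clean-up described above can be sketched as a filter over the cached URLs. The list-based cache, the set-based whitelist, and the function name are assumptions for illustration, and `urlsplit` again stands in for the kit's regular expression.

```python
from urllib.parse import urlsplit


def purge_whitelisted(cached_urls, whitelist):
    """Return the cached suspected URLs whose domain is not whitelisted."""
    kept = []
    for url in cached_urls:
        domain = (urlsplit(url).hostname or "").lower()
        if domain not in whitelist:    # whitelisted entries are dropped
            kept.append(url)
    return kept
```

Dropping whitelisted entries keeps them from counting toward the preset value in step S52, so a whitelisted domain cannot drift back into the blacklist library.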
With the blacklist library creation method for preventing traffic hijacking provided by this embodiment of the present invention, suspected advertisement URLs are stored in the cache library and their domain names are extracted with the regular expression in the anti-hijacking software development kit, which helps improve the efficiency of subsequent blacklist domain name extraction. When the number of occurrences of the same domain name in the cache library reaches the preset value, that domain name is stored in the blacklist library. The resulting blacklist library helps improve the accuracy of subsequent blacklist domain name identification for the original web page of a URL to be accessed, and makes the identification of web advertisement resource information more comprehensive.
It should be understood that the sequence numbers of the steps in the above embodiment do not imply an order of execution. The order in which each process is executed should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Embodiment 2
Fig. 5 shows a functional block diagram of the blacklist library creation apparatus for preventing traffic hijacking, which corresponds one-to-one to the blacklist library creation method for preventing traffic hijacking in Embodiment 1. As shown in Fig. 5, the apparatus includes an access request acquisition module 10, an original web page acquisition module 20, a suspected advertisement URL judgment module 30, a cache library storage module 40, and a blacklist domain name acquisition module 50. The functions implemented by these modules correspond one-to-one to the steps of the method in Embodiment 1 and, to avoid repetition, are not detailed in this embodiment.
The access request acquisition module 10 is configured to obtain the HTTP access request sent by the client, where the HTTP access request includes the URL to be accessed.
The original web page acquisition module 20 is configured to obtain the corresponding original web page based on the URL to be accessed, where the original web page corresponds to an original DOM tree.
The suspected advertisement URL judgment module 30 is configured to scan the original DOM tree with the anti-hijacking software development kit and judge whether a suspected advertisement URL exists in the original DOM tree.
The cache library storage module 40 is configured to store the suspected advertisement URL in the cache library when a suspected advertisement URL exists in the original DOM tree.
The blacklist domain name acquisition module 50 is configured to determine a blacklist domain name based on the suspected advertisement URLs in the cache library and store the blacklist domain name in the blacklist library.
Preferably, the suspected advertisement URL judgment module 30 includes an original URL acquisition unit 31 and a suspected advertisement URL confirmation unit 32.
Original URL acquisition unit 31, for scanning the original DOM tree using the anti-hijacking software development kit and obtaining the original URLs contained in the original DOM tree.
Suspected advertisement URL confirmation unit 32, for determining that a suspected advertisement URL exists in the original DOM tree when the domain name of an original URL does not match the domain name of the URL to be visited.
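The domain comparison performed by units 31 and 32 can be illustrated with a minimal Python sketch. It is not the anti-hijacking software development kit itself: the class `UrlCollector`, the function `find_suspected_ad_urls`, and the choice to read URLs only from `href` and `src` attributes are assumptions made for illustration, and the standard-library HTML parser stands in for the DOM scan.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class UrlCollector(HTMLParser):
    """Hypothetical stand-in for unit 31: collects URLs found in href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the URL to be visited.
                self.urls.append(urljoin(self.base_url, value))


def find_suspected_ad_urls(page_url, html):
    """Hypothetical stand-in for unit 32: keep URLs whose host differs from the page host."""
    page_host = urlparse(page_url).hostname
    collector = UrlCollector(page_url)
    collector.feed(html)
    suspected = []
    for url in collector.urls:
        host = urlparse(url).hostname
        # URLs without a host (for example javascript: links) are skipped.
        if host is not None and host != page_host:
            suspected.append(url)
    return suspected
```

In this sketch a relative link such as `/about` resolves to the host of the page and is therefore not flagged, while a script loaded from a different host is returned as a suspected advertisement URL.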
Preferably, the blacklist domain name acquisition module 50 includes a suspected domain name acquisition unit 51 and a blacklist domain name acquisition unit 52.
Suspected domain name acquisition unit 51, for performing domain name extraction on each suspected advertisement URL in the cache library to obtain the corresponding suspected domain name.
Blacklist domain name acquisition unit 52, for determining that a suspected domain name whose count in the cache library reaches the preset value is a blacklist domain name.
Preferably, the suspected domain name acquisition unit 51 is configured to call the regular expression in the anti-hijacking software development kit to perform domain name extraction on each suspected advertisement URL in the cache library and obtain the corresponding suspected domain name.
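As a rough illustration of units 51 and 52, the following Python sketch extracts a domain name from each cached URL with a regular expression and promotes a domain to the blacklist once its count reaches a preset value. The pattern `DOMAIN_PATTERN`, the function names, and the representation of the cache library as a list of URL strings are assumptions; the source does not disclose the actual regular expression used by the software development kit.

```python
import re
from collections import Counter

# Assumed pattern: captures the host that follows an http(s) scheme.
# It is a simplification and ignores user-info and IPv6 literals.
DOMAIN_PATTERN = re.compile(r"^https?://([^/:?#]+)", re.IGNORECASE)


def extract_domain(url):
    """Return the lower-cased domain name of url, or None if it does not match."""
    match = DOMAIN_PATTERN.match(url)
    return match.group(1).lower() if match else None


def find_blacklist_domains(cache_library, preset_value):
    """Count suspected domains in the cache and return those reaching preset_value."""
    counts = Counter()
    for url in cache_library:
        domain = extract_domain(url)
        if domain is not None:
            counts[domain] += 1
    return {domain for domain, count in counts.items() if count >= preset_value}
```

Counting by domain rather than by full URL means that differing paths, ports, or query strings on the same advertising host all contribute to the same tally.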
Preferably, the blacklist library creation device for preventing traffic hijacking further includes a misjudgment recovery request acquisition unit 61, a target domain name acquisition unit 62, a blacklist library updating unit 63, and a whitelist domain name acquisition unit 64.
Misjudgment recovery request acquisition unit 61, for obtaining a misjudgment recovery request, where the misjudgment recovery request includes a target URL.
Target domain name acquisition unit 62, for calling the regular expression in the anti-hijacking software development kit to perform domain name extraction on the target URL and obtain the target domain name.
Blacklist library updating unit 63, for deleting from the blacklist library the stored blacklist domain name that is consistent with the target domain name, thereby updating the blacklist library.
Whitelist domain name acquisition unit 64, for taking the blacklist domain name stored in the blacklist library that is consistent with the target domain name as a whitelist domain name, and storing the whitelist domain name in the whitelist library.
Preferably, the blacklist library creation device for preventing traffic hijacking further includes a suspected advertisement URL deletion module 70, for deleting a suspected advertisement URL from the cache library when the domain name corresponding to that suspected advertisement URL is stored in the whitelist library.
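The misjudgment recovery flow of units 61 to 64 and module 70 might be sketched in Python as follows. The function `handle_misjudgment_recovery`, the in-memory sets and list used for the three libraries, and the domain pattern are all illustrative assumptions; the source does not specify how the libraries are stored.

```python
import re

# Assumed pattern for domain extraction; the actual SDK expression is not disclosed.
DOMAIN_PATTERN = re.compile(r"^https?://([^/:?#]+)", re.IGNORECASE)


def extract_domain(url):
    """Return the lower-cased domain name of url, or None if it does not match."""
    match = DOMAIN_PATTERN.match(url)
    return match.group(1).lower() if match else None


def handle_misjudgment_recovery(target_url, blacklist, whitelist, cache_library):
    """Move the target domain from blacklist to whitelist and purge it from the cache.

    blacklist and whitelist are sets of domain names; cache_library is a list of
    suspected advertisement URLs. All three are modified in place.
    Returns True if a blacklist entry was recovered, otherwise False.
    """
    target_domain = extract_domain(target_url)
    if target_domain is None or target_domain not in blacklist:
        return False
    blacklist.discard(target_domain)   # unit 63: update the blacklist library
    whitelist.add(target_domain)       # unit 64: store as a whitelist domain name
    # Module 70: drop cached URLs whose domain is now in the whitelist library.
    cache_library[:] = [
        url for url in cache_library if extract_domain(url) not in whitelist
    ]
    return True
```

Purging the cache in the same step keeps a whitelisted domain from being counted toward the preset value again and re-entering the blacklist.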
Embodiment 4
Fig. 6 is a schematic diagram of the terminal device provided by an embodiment of the present invention. As shown in Fig. 6, the terminal device 80 of this embodiment includes a processor 81, a memory 82, and a computer program 83 that is stored in the memory 82 and can run on the processor 81, such as a blacklist library creation program for preventing traffic hijacking. When the processor 81 executes the computer program 83, it implements the steps of the above embodiments of the blacklist library creation method for preventing traffic hijacking, such as steps S10 to S50 shown in Fig. 1. Alternatively, when the processor 81 executes the computer program 83, it implements the functions of the modules/units in the above device embodiments, such as the functions of the access request acquisition module 10, the original web page acquisition module 20, the suspected advertisement URL judgment module 30, the cache library storage module 40, and the blacklist domain name acquisition module 50 shown in Fig. 5.
Illustratively, the computer program 83 may be divided into one or more modules/units, which are stored in the memory 82 and executed by the processor 81 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the computer program 83 in the terminal device 80. For example, the computer program 83 may be divided into the access request acquisition module 10, the original web page acquisition module 20, the suspected advertisement URL judgment module 30, the cache library storage module 40, and the blacklist domain name acquisition module 50.
The terminal device 80 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 81 and the memory 82. Those skilled in the art will understand that Fig. 6 is only an example of the terminal device 80 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, may combine certain components, or may use different components. For example, the terminal device may also include input/output devices, network access devices, buses, and the like.
The processor 81 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 82 may be an internal storage unit of the terminal device 80, such as a hard disk or internal memory of the terminal device 80. The memory 82 may also be an external storage device of the terminal device 80, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) fitted to the terminal device 80. Further, the memory 82 may include both an internal storage unit of the terminal device 80 and an external storage device. The memory 82 is used to store the computer program as well as other programs and data needed by the terminal device. The memory 82 may also be used to temporarily store data that has been output or is about to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules described above is given only as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow in the methods of the above embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction. For example, in certain jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and they should all be included within the protection scope of the present invention.