Summary of the Invention
A primary object of the present invention is to provide a malicious web crawler identification method and device, so as to solve the problem of poor accuracy when identifying malicious web crawlers.
To achieve the above object, according to one aspect of the present invention, a malicious web crawler identification method is provided.
The malicious web crawler identification method according to the present invention includes: obtaining a network address to be detected, where the network address to be detected is a network address that satisfies a first preset condition, and a network address is determined to satisfy the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold; obtaining user access information corresponding to the network address to be detected, where the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; calculating a target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; judging whether the target access ratio exceeds a preset ratio threshold; and, if the target access ratio exceeds the preset ratio threshold, determining that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
Further, obtaining the user access information corresponding to the network address to be detected includes: obtaining an access log of the target website; parsing the access log to obtain a parsing result; and obtaining, from the parsing result, the user access information corresponding to the network address to be detected.
Further, the preset ratio threshold is determined as follows: determining a reference network address set, where the reference network address set includes multiple network addresses, each of which satisfies a second preset condition, and a network address is determined to satisfy the second preset condition if the number of times the target website is accessed through it within the preset time period does not exceed the preset count threshold; obtaining the user access information corresponding to the reference network address set; and determining the preset ratio threshold from the user access information corresponding to the reference network address set, where the preset ratio threshold is the ratio of the number of accesses, through network addresses in the reference network address set, whose corresponding user access information contains the target network terminal information, to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Further, where the target website is accessed through multiple network addresses within the preset time period, determining the reference network address set includes: detecting, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and determining the network addresses whose access count within the preset time period does not exceed the preset count threshold as the network addresses in the reference network address set.
Further, calculating the target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period includes: counting the number of times the target website is accessed through the network address to be detected within the preset time period; judging whether the user access information corresponding to the network address to be detected contains the target network terminal information; if it does, counting the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and calculating the target access ratio by the formula S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
To achieve the above object, according to another aspect of the present invention, a malicious web crawler identification device is provided.
The malicious web crawler identification device according to the present invention includes: a first obtaining unit, configured to obtain a network address to be detected, where the network address to be detected is a network address that satisfies the first preset condition, and a network address is determined to satisfy the first preset condition if the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; a second obtaining unit, configured to obtain the user access information corresponding to the network address to be detected, where the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; a calculation unit, configured to calculate the target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; a judging unit, configured to judge whether the target access ratio exceeds the preset ratio threshold; and a determining unit, configured to determine, when the target access ratio exceeds the preset ratio threshold, that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
Further, the second obtaining unit includes: a first obtaining module, configured to obtain the access log of the target website; a parsing module, configured to parse the access log to obtain a parsing result; and a second obtaining module, configured to obtain, from the parsing result, the user access information corresponding to the network address to be detected.
Further, the preset ratio threshold is determined by the following modules: a first determining module, configured to determine a reference network address set, where the reference network address set includes multiple network addresses, each satisfying the second preset condition, and a network address is determined to satisfy the second preset condition if the number of times the target website is accessed through it within the preset time period does not exceed the preset count threshold; a third obtaining module, configured to obtain the user access information corresponding to the reference network address set; and a second determining module, configured to determine the preset ratio threshold from the user access information corresponding to the reference network address set, where the preset ratio threshold is the ratio of the number of accesses, through network addresses in the reference network address set, whose corresponding user access information contains the target network terminal information, to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Further, where the target website is accessed through multiple network addresses within the preset time period, the first determining module includes: a detection submodule, configured to detect, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and a determination submodule, configured to determine the network addresses whose access count within the preset time period does not exceed the preset count threshold as the network addresses in the reference network address set.
Further, the calculation unit includes: a first statistics module, configured to count the number of times the target website is accessed through the network address to be detected within the preset time period; a judgment module, configured to judge whether the user access information corresponding to the network address to be detected contains the target network terminal information; a second statistics module, configured to count, when the user access information corresponding to the network address to be detected contains the target network terminal information, the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and a calculation module, configured to calculate the target access ratio by the formula S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
The present invention adopts a method including the following steps: obtaining a network address to be detected, namely a network address satisfying the first preset condition (the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold); obtaining the user access information corresponding to the network address to be detected, where the user access information includes network terminal information of the terminals accessing the target website and the network terminal information includes target network terminal information; calculating the target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through that address within the preset time period; judging whether the target access ratio exceeds the preset ratio threshold; and, if it does, determining that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior. This solves the problem of poor accuracy when identifying malicious web crawlers: because such access behavior is determined to be malicious crawler access behavior only when the target access ratio exceeds the preset ratio threshold, the accuracy of malicious web crawler identification is improved.
Detailed Description of the Embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
According to an embodiment of the present invention, a malicious web crawler identification method is provided.
Fig. 1 is a flowchart of an embodiment of the malicious web crawler identification method according to the present invention. As shown in Fig. 1, the method includes steps S102 to S110:
Step S102: obtain a network address to be detected, where the network address to be detected is a network address that satisfies the first preset condition, and a network address is determined to satisfy the first preset condition if the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold.
In some cases, the number of times the target website is accessed through a particular network address within the preset time period is very large (well beyond normal traffic). The nature of the accesses through that network address then needs to be identified, that is, whether they are legitimate human accesses or malicious web crawler accesses. The preset count threshold here is a reference value, which may be, but is not limited to being, set according to the experience of a website analyst.
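A minimal sketch of step S102, assuming that each access has already been reduced to an (IP address, timestamp) pair and that the preset time period is a half-open window [start, start + length). The function name and input shape are hypothetical, not taken from the specification.

```python
from collections import Counter
from datetime import datetime, timedelta


def addresses_to_detect(records, window_start, window_length, count_threshold):
    """Return the IPs that satisfy the first preset condition.

    records: iterable of (ip, timestamp) pairs, one per access (assumed shape).
    An IP qualifies if its access count inside the window strictly exceeds
    count_threshold.
    """
    window_end = window_start + window_length
    # Count accesses per IP, ignoring records outside the preset time period.
    counts = Counter(ip for ip, ts in records if window_start <= ts < window_end)
    return {ip for ip, n in counts.items() if n > count_threshold}
```

With a threshold of 500, an IP seen 600 times in the window is returned, while one seen 100 times is not; an IP seen exactly 500 times is also not returned, because the condition is "exceeds".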
Step S104: obtain the user access information corresponding to the network address to be detected, where the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information.
The user access information corresponding to the network address to be detected may be obtained as follows: obtain the access log of the target website; parse the access log to obtain a parsing result; and obtain, from the parsing result, the user access information corresponding to the network address to be detected.
Preferably, the user agent information (UserAgent) corresponding to the network address to be detected is obtained from the parsing result. The UserAgent contains information such as the browser, operating system, and terminal device model used when the user accesses the target website.
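The log parsing can be sketched as follows, under the assumption (not stated in the specification) that the target website writes the Apache/Nginx "combined" log format, whose last quoted field is the UserAgent. Function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed "combined" log format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log(lines):
    """Parse log lines into (ip, timestamp, user_agent) tuples."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed format
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        records.append((m.group("ip"), ts, m.group("agent")))
    return records


def user_access_info(records, ip):
    """UserAgent strings of every access made through the given IP."""
    return [agent for rec_ip, _, agent in records if rec_ip == ip]
```

Lines that do not match the pattern are skipped rather than raising, so a partially malformed log still yields usable records.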
Step S106: calculate the target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period.
Preferably, the target access ratio may be calculated as follows: count the number of times the target website is accessed through the network address to be detected within the preset time period; judge whether the user access information corresponding to the network address to be detected contains the target network terminal information; if it does, count the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and calculate the target access ratio by the formula S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
For example, suppose the target network terminal information is that the browser used for access is Internet Explorer (IE). Suppose that, within the preset time period, the target website is accessed 1000 times through a first IP address, of which 900 accesses use IE. The target access ratio is then S = 900/1000 = 0.9.
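The formula S = A/B can be sketched as below. As an assumption for illustration, "contains the target network terminal information" is modeled as a caller-supplied predicate over one access's UserAgent, and the input lists one UserAgent per access in the preset time period, so its length is B.

```python
def target_access_ratio(user_agents, matches_target):
    """S = A / B for one network address.

    user_agents: one UserAgent entry per access in the preset time period (B = len).
    matches_target: hypothetical predicate, True if an access carries the
    target network terminal information.
    """
    b = len(user_agents)
    if b == 0:
        raise ValueError("no accesses recorded for this address")
    a = sum(1 for ua in user_agents if matches_target(ua))
    return a / b


# Worked example from the text: 900 of 1000 accesses use IE ("IE" is a placeholder label).
ratio = target_access_ratio(["IE"] * 900 + ["other"] * 100, lambda ua: ua == "IE")
```

Here `ratio` evaluates to 0.9, matching the example.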
Step S108: judge whether the target access ratio exceeds the preset ratio threshold.
The preset ratio threshold is a reference value. It may be set according to the experience of the person making the judgment, or according to the access ratio of legitimate IP addresses.
Preferably, the preset ratio threshold may be determined as follows: determine a reference network address set, where the reference network address set includes multiple network addresses, each satisfying the second preset condition, and a network address is determined to satisfy the second preset condition if the number of times the target website is accessed through it within the preset time period does not exceed the preset count threshold; obtain the user access information corresponding to the reference network address set; and determine the preset ratio threshold from that user access information, where the preset ratio threshold is the ratio of the number of accesses, through network addresses in the reference network address set, whose corresponding user access information contains the target network terminal information, to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Where the target website is accessed through multiple network addresses within the preset time period, the reference network address set may be determined as follows: detect, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and determine the network addresses whose access count within the preset time period does not exceed the preset count threshold as the network addresses in the reference network address set.
For example, it is IE browser that objective network end message, which is the browser that access uses,.Assuming that in preset time period,The network address that the number of access target website exceedes preset times threshold value (500 times) is the first IP address, is not above presettingThe network address of frequency threshold value is the second IP address, the 3rd IP address and the 4th IP address, wherein, pass through the first IP address and accessThe number of targeted website is 1000 times (browser that access uses is IE browser for 800 times);Respectively by the 2nd IPThe number of location, the 3rd IP address and the 4th IP address access target website be 100 times, 200 times and 300 times, access use it is clearDevice of looking at is respectively 50 times, 100 times and 150 times of IE browser.By the second IP address, the 3rd IP address and the 4th IP addressIt is considered as grid of reference address set, it is (50+100+150)/(100+200+300)=0.5 to calculate pre-set ratio threshold value.And targetAccess ratio is 800/1000=0.8.Because 0.8 more than 0.5, it is possible to think by the first IP address access target websiteBehavior be malice reptile access behavior.
Step S110: if the target access ratio exceeds the preset ratio threshold, determine that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
A web crawler is a program or script that automatically captures web information according to certain rules. Because the preset ratio threshold is a statistic of human access within the preset time period, the access situation it reflects is the most probable human access situation, so it can serve as a baseline for comparison. When the target access ratio exceeds the preset ratio threshold, the accesses through that network address can be regarded as not made by humans, that is, as malicious crawler access behavior.
This embodiment adopts the following steps: obtaining a network address to be detected, namely a network address satisfying the first preset condition (the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold); obtaining the user access information corresponding to the network address to be detected, which includes network terminal information of the terminals accessing the target website, the network terminal information including target network terminal information; calculating the target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through that address within the preset time period; judging whether the target access ratio exceeds the preset ratio threshold; and, if it does, determining that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior. This solves the problem of poor accuracy when identifying malicious web crawlers: because such access behavior is determined to be malicious crawler access behavior only when the target access ratio exceeds the preset ratio threshold, the accuracy of malicious web crawler identification is improved.
Fig. 2 is a flowchart of a second embodiment of the malicious web crawler identification method according to the present invention; it may serve as a preferred implementation of the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes steps S201 to S208:
Step S201: log user accesses, recording the user's IP address and the UserAgent at the time of access.
Step S202: parse the log and classify each IP address as a suspicious IP or a legitimate IP.
A suspicious IP is an IP address through which the target website is accessed more times than the preset count threshold within the preset time period; a legitimate IP is an IP address through which the target website is accessed no more times than the preset count threshold within the preset time period.
Step S203: for the IP addresses classified as suspicious, analyze the UserAgent corresponding to each suspicious IP.
Step S204: calculate the UserAgent ratio of each suspicious IP.
The UserAgent ratio is the target access ratio. For example, it may be the number of accesses to the target website through the suspicious IP within the preset time period whose operating system is Windows 7, divided by the total number of accesses to the target website through that suspicious IP.
Step S205: for the IP addresses classified as legitimate, take all legitimate IPs together as a legitimate IP group and calculate the UserAgent ratio of the legitimate IP group.
The UserAgent ratio of the legitimate IP group is the preset ratio threshold. For example, it may be the number of accesses to the target website through all IP addresses in the legitimate IP group whose operating system is Windows 7, divided by the total number of accesses to the target website through all IP addresses in the legitimate IP group.
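The "operating system is Windows 7" condition has to be tested against raw UserAgent strings. A minimal sketch follows, relying on the conventional tokens "Windows NT 6.1" (Windows 7; note that Windows Server 2008 R2 reports the same token) and "MSIE " or "Trident/" (Internet Explorer). Substring matching is an assumption made for illustration; the specification does not say how UserAgents are matched.

```python
def is_windows7(user_agent: str) -> bool:
    # Windows 7 reports itself as "Windows NT 6.1" in the UserAgent.
    return "Windows NT 6.1" in user_agent


def is_internet_explorer(user_agent: str) -> bool:
    # IE up to 10 sends "MSIE x.y"; IE 11 sends only "Trident/7.0".
    return "MSIE " in user_agent or "Trident/" in user_agent


def group_useragent_ratio(agents_by_ip, predicate) -> float:
    """UserAgent ratio pooled over every access of every IP in a group.

    agents_by_ip: hypothetical mapping ip -> list of UserAgent strings,
    one entry per access within the preset time period.
    """
    agents = [ua for uas in agents_by_ip.values() for ua in uas]
    if not agents:
        raise ValueError("group has no recorded accesses")
    return sum(1 for ua in agents if predicate(ua)) / len(agents)
```

Passing a different predicate, for example `is_internet_explorer`, switches the target network terminal information without changing the ratio computation.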
Step S206: judge whether the difference between each suspicious IP's UserAgent ratio and the UserAgent ratio of the legitimate IP group is greater than a preset error value.
Step S207: if the difference between the suspicious IP's UserAgent ratio and the UserAgent ratio of the legitimate IP group is not greater than the preset error value, the accesses through the suspicious IP are human accesses.
Step S208: if the difference between the suspicious IP's UserAgent ratio and the UserAgent ratio of the legitimate IP group is greater than the preset error value, the accesses through the suspicious IP are not made by humans and constitute malicious crawler access behavior.
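Steps S202 to S208 can be sketched end to end as below. Two assumptions are made for illustration: the input maps each IP to one UserAgent entry per access in the preset time period, and the "difference" is taken as the suspicious IP's ratio minus the group ratio (a one-sided test, in line with "exceeds" in Fig. 1); the specification does not say whether an absolute difference is intended.

```python
def classify_suspicious_ips(agents_by_ip, count_threshold, predicate, error_value):
    """Return {suspicious_ip: "crawler" | "human"} following steps S202 to S208.

    agents_by_ip: hypothetical mapping ip -> list of UserAgents, one per access.
    predicate: True if a UserAgent carries the target terminal info (e.g. Windows 7).
    """
    # S202: split IPs by access count.
    suspicious = {ip: a for ip, a in agents_by_ip.items() if len(a) > count_threshold}
    legitimate = [ua for ip, a in agents_by_ip.items()
                  if len(a) <= count_threshold for ua in a]
    if not legitimate:
        raise ValueError("no legitimate IPs to build the baseline from")
    # S205: UserAgent ratio of the legitimate IP group (the preset ratio threshold).
    group_ratio = sum(1 for ua in legitimate if predicate(ua)) / len(legitimate)
    result = {}
    for ip, agents in suspicious.items():
        # S203/S204: UserAgent ratio of this suspicious IP.
        ratio = sum(1 for ua in agents if predicate(ua)) / len(agents)
        # S206 to S208: compare the signed difference with the preset error value.
        result[ip] = "crawler" if ratio - group_ratio > error_value else "human"
    return result
```

With the figures of the Fig. 1 example (group ratio 0.5) and an error value of 0.1, a suspicious IP at 0.8 is classified as a crawler, while one at 0.55 stays within tolerance and is treated as a shared address used by many human users.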
When identifying malicious crawlers, this embodiment uses the above steps to analyze the UserAgent and thereby detect whether an IP address is shared by multiple users, which reduces the false-positive rate of malicious crawler identification and improves its accuracy.
According to an embodiment of the present invention, a malicious web crawler identification device is provided. It should be noted that the malicious web crawler identification device of this embodiment may be used to perform the malicious web crawler identification method provided by the embodiments of the present invention, and that method may likewise be performed by the malicious web crawler identification device provided by the embodiments of the present invention.
Fig. 3 is a schematic diagram of an embodiment of the malicious web crawler identification device according to the present invention. As shown in Fig. 3, the device includes: a first obtaining unit 10, a second obtaining unit 20, a calculation unit 30, a judging unit 40, and a determining unit 50.
The first obtaining unit 10 is configured to obtain a network address to be detected, where the network address to be detected is a network address that satisfies the first preset condition, and a network address is determined to satisfy the first preset condition if the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold.
The second obtaining unit 20 is configured to obtain the user access information corresponding to the network address to be detected, where the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information.
The second obtaining unit includes: a first obtaining module, configured to obtain the access log of the target website; a parsing module, configured to parse the access log to obtain a parsing result; and a second obtaining module, configured to obtain, from the parsing result, the user access information corresponding to the network address to be detected.
The calculation unit 30 is configured to calculate the target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period.
Optionally, the calculation unit may include: a first statistics module, configured to count the number of times the target website is accessed through the network address to be detected within the preset time period; a judgment module, configured to judge whether the user access information corresponding to the network address to be detected contains the target network terminal information; a second statistics module, configured to count, when the user access information corresponding to the network address to be detected contains the target network terminal information, the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and a calculation module, configured to calculate the target access ratio by the formula S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
The judging unit 40 is configured to judge whether the target access ratio exceeds the preset ratio threshold.
Optionally, the preset ratio threshold may be determined by the following modules: a first determining module, configured to determine a reference network address set, where the reference network address set includes multiple network addresses, each satisfying the second preset condition, and a network address is determined to satisfy the second preset condition if the number of times the target website is accessed through it within the preset time period does not exceed the preset count threshold; a third obtaining module, configured to obtain the user access information corresponding to the reference network address set; and a second determining module, configured to determine the preset ratio threshold from the user access information corresponding to the reference network address set, where the preset ratio threshold is the ratio of the number of accesses, through network addresses in the reference network address set, whose corresponding user access information contains the target network terminal information, to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Optionally, where the target website is accessed through multiple network addresses within the preset time period, the first determining module may include: a detection submodule, configured to detect, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and a determination submodule, configured to determine the network addresses whose access count within the preset time period does not exceed the preset count threshold as the network addresses in the reference network address set.
The determining unit 50 is configured to determine, when the target access ratio exceeds the preset ratio threshold, that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
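One way to map the units of Fig. 3 onto software is sketched below as a single class with one method per unit. The class, its method names, and the input shape (one (ip, user_agent) pair per access already restricted to the preset time period) are hypothetical; the threshold modules are folded into one method that pools the reference set, as in the Fig. 1 example.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

Record = Tuple[str, str]  # (ip, user_agent) for one access inside the preset time period


class CrawlerIdentificationDevice:
    """Hypothetical sketch: one method per unit of the Fig. 3 device."""

    def __init__(self, count_threshold: int, matches_target: Callable[[str], bool]):
        self.count_threshold = count_threshold
        self.matches_target = matches_target  # target network terminal information test

    def first_obtaining_unit(self, records: List[Record]) -> Set[str]:
        # Unit 10: addresses whose access count exceeds the preset count threshold.
        counts = Counter(ip for ip, _ in records)
        return {ip for ip, n in counts.items() if n > self.count_threshold}

    def second_obtaining_unit(self, records: List[Record], ip: str) -> List[str]:
        # Unit 20: user access information (UserAgents) of one address.
        return [ua for rec_ip, ua in records if rec_ip == ip]

    def calculation_unit(self, agents: List[str]) -> float:
        # Unit 30: S = A / B (raises ZeroDivisionError if agents is empty).
        return sum(1 for ua in agents if self.matches_target(ua)) / len(agents)

    def preset_ratio_threshold(self, records: List[Record], suspects: Set[str]) -> float:
        # Threshold modules: pooled ratio over the reference set, i.e. every
        # address whose count does not exceed the preset count threshold.
        return self.calculation_unit([ua for ip, ua in records if ip not in suspects])

    def judging_unit(self, ratio: float, threshold: float) -> bool:
        # Unit 40: does the target access ratio exceed the preset ratio threshold?
        return ratio > threshold

    def determining_unit(self, records: List[Record]) -> Set[str]:
        # Unit 50: addresses whose access behavior is judged malicious crawling.
        suspects = self.first_obtaining_unit(records)
        threshold = self.preset_ratio_threshold(records, suspects)
        return {
            ip for ip in suspects
            if self.judging_unit(
                self.calculation_unit(self.second_obtaining_unit(records, ip)), threshold
            )
        }
```

Keeping each unit as a separate method mirrors the modular split of the device, so, for instance, the parsing behind the second obtaining unit can be swapped without touching the ratio logic.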
The malicious web crawler identification device provided in this embodiment includes: the first obtaining unit 10, the second obtaining unit 20, the calculation unit 30, the judging unit 40, and the determining unit 50. This device solves the problem of poor accuracy when identifying malicious web crawlers: the determining unit 50 determines, when the target access ratio exceeds the preset ratio threshold, that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior, thereby improving the accuracy of malicious web crawler identification.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.