Background art:
Phishing is an attack technique based on social engineering. The attacker sends deceptive messages that claim to come from a bank or another well-known organization, delivered through spam e-mail, instant messaging, SMS, or fraudulent web pages, in order to lure the user into logging in to a fake website that looks highly authentic and handing over sensitive information (such as user names, passwords, account IDs, ATM PINs, or credit card details).
Phishing defense is the set of countermeasures against phishing and can be divided into server-side defense, client-side defense, and third-party defense. In server-side defense, the website server uses additional techniques, such as digital watermarking, digital fingerprinting, dynamic security skins, and dual authentication protocols, to prove the authenticity of the site's identity to the user. In client-side defense, a plug-in is installed in the user's browser; once a phishing page is detected, the plug-in warns the user or protects the input of sensitive information. Third-party defense includes phishing spam filtering, third-party authentication mechanisms, URL blacklist filtering by browser vendors, the defense mechanisms of security software vendors, and public protection mechanisms. Server-side defense aims to protect the authenticity of the website's identity; it raises the phisher's cost of counterfeiting and curbs phishing websites at the source, so it is an active defense. The latter two target phishing websites that already exist, and their techniques lag behind the counterfeiting techniques of phishing websites, so they are passive defenses. Although phishing defense has made considerable progress, active defense techniques still leave the final judgment of a site's authenticity to the end user, and passive defense techniques offer no protection unless a plug-in is installed.
Summary of the invention:
In view of the above problems, the object of the present invention is to provide an intelligent phishing defense system composed of a user behavior recognition module, a lightweight intelligent phishing website detection engine, and an intelligent phishing processing module, which provides network users with an intelligent and timely phishing defense service.
To achieve the above object, the present invention adopts the following technical scheme:
1. A user behavior recognition module;
2. A lightweight intelligent phishing website detection engine;
3. An intelligent phishing processing module.
By adopting the above technical scheme, the present invention has the following advantages:
1. A Bayes-based user behavior recognition algorithm;
2. A phishing website detection learning algorithm based on web page noise;
3. A phishing website detection learning algorithm based on website Logo recognition.
Embodiment:
(1) User behavior understanding, learning, and recognition
User behavior understanding comprises the formal understanding and learning of user behavior, the construction of a database of prior probability density distributions of user browsing behavior, and Bayes-based user behavior recognition. Online questionnaires, paper questionnaires, randomly sent test e-mails, and similar methods are used to collect the types of normal and suspicious browsing behavior: visiting a website by entering its URL address, entering information, clicking links, downloading, and following links from e-mail, QQ, or shopping websites. Browsing behavior is described formally with rules of the form "if the URL is entered in the address bar, then the behavior is normal browsing". The prior probability density distribution of user browsing behavior is then established, and a Bayes-based user behavior recognition algorithm is built that activates the detection engine when the user may be visiting a phishing website.
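The Bayes-based recognition step described above can be sketched as follows. This is only a minimal illustration, assuming two behavior classes (normal and suspicious) and a single categorical feature, namely how the user arrived at the page. The names `PRIORS`, `LIKELIHOODS`, `posterior_suspicious`, and `should_activate_engine`, the threshold, and all probability values are hypothetical placeholders, not survey results.

```python
# Hypothetical prior probabilities of the two behavior classes.
PRIORS = {"normal": 0.9, "suspicious": 0.1}

# Hypothetical likelihoods P(entry source | class); placeholder values only.
LIKELIHOODS = {
    "normal": {"typed_url": 0.6, "email_link": 0.1, "im_link": 0.1, "shop_link": 0.2},
    "suspicious": {"typed_url": 0.05, "email_link": 0.5, "im_link": 0.3, "shop_link": 0.15},
}


def posterior_suspicious(entry_source: str) -> float:
    """Return P(suspicious | entry_source) computed with Bayes' rule."""
    joint = {c: PRIORS[c] * LIKELIHOODS[c][entry_source] for c in PRIORS}
    return joint["suspicious"] / sum(joint.values())


def should_activate_engine(entry_source: str, threshold: float = 0.3) -> bool:
    """Activate the detection engine when the posterior reaches the threshold."""
    return posterior_suspicious(entry_source) >= threshold
```

With these placeholder values, a page reached by typing the URL stays below the threshold, while a page reached through an e-mail link activates the detection engine.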
(2) Lightweight intelligent phishing website detection engine
Published literature and the monthly statistics released by PhishTank and APAC show that the number of phishing attacks fluctuates, but the targets are concentrated, mainly on payment and transaction, financial securities, instant messaging, and broadcast media websites. According to the APAC bulletin of December 2011, phishing websites imitating Taobao, Tencent, Industrial and Commercial Bank of China, and Bank of China accounted for 94.39% of all reports. A knowledge base of well-known websites is therefore modeled to serve as the heuristic knowledge of the detection engine. The knowledge base contains identity-describing information: domain name, IP address, URL, brand name, copyright information, Logo descriptors, WHOIS records, and so on. The detailed technical route of the lightweight intelligent phishing website detection engine is as follows:
Online fast URL filtering. A whitelist provides fast filtering: for websites that the user has added to the whitelist, the detection engine skips detection entirely. The project team plans to use a blacklist synchronized with PhishTank, APWG, and the Google Safe Browsing API to filter phishing URLs quickly and block the user's access.
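The whitelist and blacklist check can be sketched as below. This is a minimal sketch that assumes the blacklist has already been synchronized into a local set of URLs; it does not query any real feed, and the names `host_of` and `quick_filter` are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host name of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def quick_filter(url: str, whitelist_hosts: set, blacklist_urls: set) -> str:
    """Return 'allow', 'block', or 'unknown' for a URL."""
    if host_of(url) in whitelist_hosts:
        return "allow"    # user-trusted site: the engine skips detection
    if url in blacklist_urls:
        return "block"    # known phishing URL: stop the visit
    return "unknown"      # hand over to the multi-layer URL classifier
```

A URL that yields `"unknown"` is the case that the next stage has to classify.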
For URLs that the whitelist and blacklist cannot decide, an online phishing URL detection algorithm that fuses multi-layer features is used. The algorithm is planned to use four layers of features (structural, lexical, domain name, and server) to build an SVM-based learning and classification model for the fast classification of phishing website URLs.
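As an illustration of the feature layers, the sketch below extracts a few structural, lexical, and domain name features from a URL using only the standard library. The concrete features, the word list `SUSPICIOUS_WORDS`, and the function name `url_features` are assumptions made for this example. Server-layer features, which need network lookups, and the SVM training step are omitted.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list of words that often appear in phishing URLs.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def url_features(url: str) -> dict:
    """Extract a small, illustrative feature set from a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.lower()
    text = host + path
    return {
        # Structural layer
        "url_length": len(url),
        "path_segments": len([s for s in path.split("/") if s]),
        "has_at_sign": int("@" in url),
        # Lexical layer
        "suspicious_words": sum(1 for w in SUSPICIOUS_WORDS if w in text),
        # Domain name layer
        "host_dots": host.count("."),
        "host_is_ipv4": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "host_has_hyphen": int("-" in host),
    }
```

The dictionary values, taken in a fixed key order, form the numeric vector that a classifier such as an SVM would consume.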
Fast interactivity filtering of phishing websites. Because the purpose of a phishing website is to obtain information entered by the user, analyzing whether a web page contains a form that submits input to a server (for example form tags, input tags, or a login form) is an effective first test. A website with no such input form can be judged directly not to be a phishing website; only a website that has an input form needs further detection based on content and visual similarity. Interactivity detection identifies and filters server submission forms on the basis of the DOM tree: the markup source is parsed to build the page's DOM tree, and the form, input, and login submission controls in the tree are identified to classify phishing pages quickly.
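The interactivity check can be sketched with Python's standard `html.parser` module. Note that this sketch walks the tag stream rather than building a full DOM tree, and the class name `FormDetector`, the function name `needs_further_detection`, and the set of input types treated as user-fillable are assumptions for illustration.

```python
from html.parser import HTMLParser

# Input types treated as user-fillable in this sketch (an assumption).
FILLABLE_TYPES = ("text", "password", "email", "tel")


class FormDetector(HTMLParser):
    """Records whether a page contains a form with user-fillable inputs."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.has_input_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            # A missing type attribute defaults to a text field in HTML.
            input_type = (dict(attrs).get("type") or "text").lower()
            if input_type in FILLABLE_TYPES:
                self.has_input_form = True

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def needs_further_detection(html: str) -> bool:
    """Return True if the page has an input form and must be examined further."""
    detector = FormDetector()
    detector.feed(html)
    detector.close()
    return detector.has_input_form
```

A page for which `needs_further_detection` returns `False` is filtered out at this stage; all other pages proceed to the content and visual similarity checks.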
Phishing website detection learning algorithm based on web page noise. Web page noise refers to content that belongs to the page template, such as navigation bars, organization marks, contact information, and advertisement banners. Most of this noise carries the website's identity information, and phishing websites copy it from the target website in order to deceive users more convincingly. Using the web page noise in place of the whole page content makes it possible to identify the website while reducing the impact of the remaining page content on the performance and efficiency of the detection algorithm. The project team plans to analyze web page noise with Web information processing techniques such as n-grams, term-frequency vectors, TF-IDF, and shingling, determine the feature patterns of the noise, and build an SVM learning and classification model. A suspect site that uses the template of a well-known website but is inconsistent with the information in the knowledge base is judged to be a phishing website.
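One of the techniques named above, shingling, can be illustrated as follows. This sketch compares the noise text of a suspect page with the noise text of a known website using Jaccard similarity over word shingles; it stands in for, and is much simpler than, the SVM model described in the text. The function names and the shingle size are assumptions.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle, or none if the text is empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Return the Jaccard similarity of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def noise_similarity(suspect_noise: str, known_noise: str, k: int = 3) -> float:
    """Similarity between the template noise of a suspect page and a known site."""
    return jaccard(shingles(suspect_noise, k), shingles(known_noise, k))
```

A high similarity suggests that the suspect page reuses the known website's template, which is then checked against the identity information in the knowledge base.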
Phishing website classification algorithm based on website Logo recognition. The website Logo is a salient point of a web page and a region of interest for the user. Network users often identify a website by its Logo, and phishing websites exploit this to deceive them. The project team plans to use SIFT to analyze the Logos of well-known websites, determine the feature factors that describe a website Logo, and model the Logos of the well-known target websites that phishing websites most often counterfeit. A suspect site that uses the Logo of a well-known website but is inconsistent with the information in the knowledge base is judged to be a phishing website.
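The final decision rule, a recognized Logo combined with an identity that does not match the knowledge base, can be sketched as below. The SIFT matching itself is not shown: `matched_brand` is assumed to be the output of some Logo matcher, and `SiteRecord`, `is_phishing`, and the reduction of the knowledge base to a brand name plus legitimate domains are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SiteRecord:
    """Hypothetical knowledge-base entry for one well-known website."""
    brand: str
    domains: frozenset  # domains that legitimately use the brand's Logo


def is_phishing(page_url: str, matched_brand: Optional[str], knowledge_base: dict) -> bool:
    """Flag a page that shows a known brand's Logo from a foreign domain."""
    if matched_brand is None or matched_brand not in knowledge_base:
        return False  # no known Logo was recognized, nothing to compare
    host = (urlsplit(page_url).hostname or "").lower()
    legit = knowledge_base[matched_brand].domains
    # The host must equal a legitimate domain or be one of its subdomains.
    return not any(host == d or host.endswith("." + d) for d in legit)
```

The same comparison applies when the brand is identified from web page noise instead of the Logo.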
(3) Intelligent phishing processing mechanism
The browser BHO (Browser Helper Object) interface provides the relevant interface specifications and thus an entry point for the intelligent processing of phishing websites. The specifications, interfaces, methods, events, and common classes of the browser BHO are analyzed first in order to determine the event methods for capturing the user's mouse clicks, keyboard input, address bar, and status bar. Input information is obfuscated with the SHA-1 algorithm, and the intelligent phishing processing mechanism is implemented in code.
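The SHA-1 obfuscation of input information can be illustrated with Python's standard `hashlib` module. This sketch shows only the hashing step, not the BHO event handling, and the function name `obfuscate_input` and the UTF-8 encoding are assumptions.

```python
import hashlib


def obfuscate_input(value: str) -> str:
    """Return the hexadecimal SHA-1 digest of a user-entered value."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()
```

The digest is always 40 hexadecimal characters long, so the original input is never passed on in clear text.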