Background technology
With the development of networks, Trojan attacks have become increasingly severe, and the varieties and methods of attack increasingly complex. The processing capability of a single machine has not kept pace with the development of Trojans: limited chiefly by small memory space and small data volume, both feature detection and anomaly-model building run into hardware bottlenecks. By the time a user identifies a Trojan signature, earlier data has already been lost, so it is impossible to detect, based on that signature, whether other users were attacked, or the size of the loss suffered after an attack. Furthermore, when building models for different anomaly types, the small-data problem requires that data be prepared first and the model built afterwards; if the model must be changed after it is built, data has to be gathered again and the model rebuilt, and for a single-machine system the data volume and storage space are far from sufficient.
In the big data era, with the development of various big data systems, the huge data sets they hold can already remedy the small-data defects of earlier single-machine systems. However, the combined application of single machines and big data remains undeveloped: applying network big data to network security analysis still faces major technical bottlenecks, which hampers the detection efficiency of network security and leaves it heavily dependent on standalone hardware performance.
Summary of the invention
In order to solve the above technical problems, the present invention aims to provide a big-data-based network security analysis system and an analysis method thereof.
A big-data-based network security analysis system of the present invention, characterized in that it comprises:
a MapReduce unit, for converting the original data source via MapReduce and outputting an ORC file to be preprocessed;
a preprocessing unit, for preprocessing the ORC file to be preprocessed on demand according to data instructions from a third-party data interface, and outputting an ORC file of data to be processed to the modeling unit;
a modeling unit, for building a corresponding real-time computation model according to the ORC file of data to be processed and the mining mode;
a mining unit, for performing data mining on the big data platform and outputting the mining results to the Trojan analysis unit and/or the anomaly analysis unit;
a Trojan analysis unit, for analyzing Trojans using the mining results, and feeding back the local data needed by the real-time analysis to the algorithm recasting unit;
an anomaly analysis unit, for analyzing network anomalies using the mining results, and feeding back the local data needed by the real-time analysis to the algorithm recasting unit;
an algorithm recasting unit, for recasting the mining algorithm for the fed-back required data.
The preprocessing unit comprises:
a further feature extraction module, for performing further feature extraction on the ORC file to be preprocessed;
an information completion/replacement module, for performing information completion or information replacement on the ORC file to be preprocessed according to instructions input from the third-party data interface;
a data normalization module, for unifying the value range of like attributes from different data sources;
a data discretization module, for layering the data of specified data columns.
The modeling unit persists the data/configuration of the real-time computation model into an RDBMS (Relational Database Management System).
A big-data-based network security analysis method of the present invention, characterized in that it comprises the following steps:
a. loading the input path in the mapper with the Hive driver / HBase driver / SequenceFile I/O API;
b. judging the data one by one with a validity condition, the output KEY being a timestamp and the VALUE being the set of target data columns;
c. outputting the qualifying data in the reducer according to the ORC file output format;
d. configuring the preprocessing options of the configuration file through the predefined statement of the data mining scene;
e. building the real-time computation model;
f. completing the mining step through multiple iterations;
g. performing Trojan analysis and/or anomaly analysis on a single machine according to the mining results output;
h. repeating step a according to the real-time local data needs of the Trojan analysis and/or anomaly analysis, until the analysis is complete.
The preprocessing options of step d include further feature extraction, information completion/replacement, data normalization and data discretization.
In step e, the data/configuration of the real-time computation model is persisted into the RDBMS.
The big-data-based network security analysis system and analysis method of the present invention have the advantage that historical data can be preserved: after data such as a Trojan signature and the IPs of the attacking and attacked ends are discovered, it is possible to quickly associate which users were attacked in which periods and which data was leaked. Furthermore, in a big data environment the data is preserved in bulk. If the model is found not to meet user requirements, it is only necessary to recast the algorithm for the individual data, obtain the new local data, and rebuild the model. This method eliminates the data preparation process, greatly shortens the model building time, improves the detection efficiency of the anomaly model, and also greatly reduces the dependence on standalone hardware.
Embodiment
With reference to Fig. 1 and Fig. 2, a big-data-based network security analysis system and its analysis method of the present invention are now described in detail. By converting a variety of data mining algorithms to MapReduce, the single-machine data mining flow is distributed over multiple machines and executed in parallel, greatly improving data mining efficiency.
Through the predefined statement of the built-in data mining model (this statement file is a configuration file in JSON format), the original data source required by the current data mining scene is described. The original data source exists on the big data platform, in the original form in which it was stored onto the platform from outside. The MapReduce unit performs the first-layer extraction of the original data: according to the validity conditions of different mining requirements, the corresponding original data is extracted and background noise data is filtered out. The input path is loaded in the mapper with the Hive driver / HBase driver / SequenceFile I/O API, and the data is judged one by one with the validity condition, the output KEY being a timestamp and the VALUE being the set of target data columns; the reducer then outputs the qualifying data according to the ORC file output format, yielding the ORC file to be preprocessed.
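The extraction step above can be sketched in pure Python, with the mapper filtering records by a validity condition and emitting (timestamp, target-columns) pairs, and the reducer grouping by timestamp. The record fields and the particular validity condition here are illustrative assumptions, not taken from the patent text.

```python
# Minimal in-process sketch of the mapper/reducer extraction step.
from collections import defaultdict

def mapper(record):
    """Emit (KEY=timestamp, VALUE=target columns) for valid records only."""
    if record.get("srcip") and record.get("bytes", 0) > 0:  # validity condition (assumed)
        yield record["ts"], (record["srcip"], record["dstip"], record["bytes"])

def reducer(key, values):
    """Collect all target-column tuples sharing one timestamp."""
    return key, list(values)

def run_job(records):
    groups = defaultdict(list)
    for rec in records:
        for k, v in mapper(rec):
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in groups.items())

raw = [
    {"ts": 1700000000, "srcip": "10.0.0.1", "dstip": "10.0.0.2", "bytes": 512},
    {"ts": 1700000000, "srcip": "", "dstip": "10.0.0.3", "bytes": 0},  # noise, filtered out
    {"ts": 1700000060, "srcip": "10.0.0.4", "dstip": "10.0.0.5", "bytes": 128},
]
result = run_job(raw)
```

In a real deployment this logic would run as a Hadoop MapReduce job writing ORC output rather than an in-memory dictionary.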
Through the predefined statement of the built-in data mining scene (this statement file is a configuration file in JSON format), the configuration document determines the preprocessing options required for the ORC file to be preprocessed: 1. further feature extraction: the original data is operated on to obtain new data columns; 2. third-party information completion/replacement: extra information is obtained from the third-party data interface to supplement the original data, for example a specific geographic information column can be added according to srcip/dstip; 3. data normalization: the value range of like attributes from different data sources is unified, for example all times are set to the UNIX timestamp format YYYY_MM_DD hh:mm:ss.uuuuuu; 4. data discretization: certain data columns are layered, for example the time is divided into work hours / off hours. Whether the original data needs preprocessing, and which preprocessing steps are to be performed, is determined by the definition of the data mining scene; the definition clearly indicates the program class name, method and parameters for performing the preprocessing, as well as the execution steps.
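A minimal sketch of the four JSON-declared preprocessing options follows. The column names, the derived "ratio" feature, and the stand-in geo lookup table are assumptions for illustration; only the four option categories and the timestamp format come from the description above.

```python
# Config-driven preprocessing: feature extraction, completion,
# normalization, and discretization, as selected by a JSON option list.
import json
from datetime import datetime, timezone

GEO = {"8.8.8.8": "US"}  # stand-in for the third-party data interface

def preprocess(row, options):
    out = dict(row)
    if "feature_extraction" in options:   # 1. derive a new data column
        out["ratio"] = out["bytes_out"] / max(out["bytes_in"], 1)
    if "completion" in options:           # 2. supplement geo info by dstip
        out["geo"] = GEO.get(out["dstip"], "unknown")
    if "normalization" in options:        # 3. unify times to YYYY_MM_DD hh:mm:ss.uuuuuu
        dt = datetime.fromtimestamp(out["ts"], tz=timezone.utc)
        out["time"] = dt.strftime("%Y_%m_%d %H:%M:%S.%f")
    if "discretization" in options:       # 4. layer time into work/off hours
        hour = datetime.fromtimestamp(out["ts"], tz=timezone.utc).hour
        out["shift"] = "work" if 9 <= hour < 18 else "off"
    return out

options = json.loads('["feature_extraction", "completion", "normalization", "discretization"]')
row = {"ts": 1700000000, "dstip": "8.8.8.8", "bytes_in": 100, "bytes_out": 300}
clean = preprocess(row, options)
```

In the system itself, the scene definition would additionally name the class, method and parameters executing each step.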
Newly built and existing mining scenes are maintained and managed, and the real-time computation model is built. To facilitate the later management and maintenance of the newly built model, the modeling unit persists the model data/configuration into the RDBMS.
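Persisting model data/configuration into an RDBMS can be sketched as follows; SQLite stands in for the RDBMS, and the table name, column names and example scene are illustrative assumptions.

```python
# Persist and reload a mining scene's model configuration via an RDBMS.
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in RDBMS
conn.execute("""CREATE TABLE model_config (
    scene  TEXT PRIMARY KEY,
    config TEXT NOT NULL)""")

def save_model(scene, config):
    """Serialize the model configuration and upsert it by scene name."""
    conn.execute("INSERT OR REPLACE INTO model_config VALUES (?, ?)",
                 (scene, json.dumps(config)))
    conn.commit()

def load_model(scene):
    """Reload a persisted configuration for later maintenance."""
    row = conn.execute("SELECT config FROM model_config WHERE scene = ?",
                       (scene,)).fetchone()
    return json.loads(row[0]) if row else None

save_model("trojan_realtime", {"window_s": 60, "threshold": 0.95})
cfg = load_model("trojan_realtime")
```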
In the main implementation steps of data mining, the input data is verified step by step according to the built-in mining plan and the data is processed. Depending on the mining plan, the data processing steps may iterate multiple times. After each step is completed (or a certain step fails), the result cleanup/integration step is invoked to format the output of the results (intermediate results may also be cleaned up) or clean the results. To export the result data with external consistency, the main results are recorded in ElasticSearch, MySQL DB and Hadoop HBase, including but not limited to formatted output, recording related logs, maintaining external links, etc.
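The step-by-step verify/process/iterate flow with a final formatting stage might be sketched as below; the plan structure, step names and result shape are assumptions, not the patent's actual interfaces.

```python
# Mining driver: each plan step verifies its input, may iterate, and a
# cleanup/formatting stage runs whether the plan finished or failed.
def run_mining(plan, data):
    results, failed = [], False
    for step in plan:
        for _ in range(step.get("iterations", 1)):  # steps may iterate
            if not step["verify"](data):            # verify input before processing
                failed = True
                break
            data = step["process"](data)
        if failed:
            break
    # result cleanup/integration: format a uniform record for external stores
    results.append({"status": "failed" if failed else "ok", "result": data})
    return results

plan = [
    {"verify": lambda d: len(d) > 0,
     "process": lambda d: [x * 2 for x in d],
     "iterations": 2},
    {"verify": lambda d: all(x < 100 for x in d),
     "process": sorted},
]
out = run_mining(plan, [3, 1, 2])
```

In the described system, the formatted records would then be written to ElasticSearch, MySQL and HBase rather than returned in memory.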
The intercepted Trojans and network anomalies are analyzed according to the mining results of the big data. During analysis, however, fault-tolerance events often occur in which some local data is missing or some local data is malformed; the Trojan analysis unit and the anomaly analysis unit then feed these local data demands back to the algorithm recasting unit. The algorithm recasting unit regenerates a new algorithm for the local data and re-runs the entire MapReduce mining process in a loop until all analysis is complete.
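This feedback loop can be sketched as follows: the analysis reports which local data it still lacks, the recasting step supplies it, and the loop repeats until the analysis can complete. The field names and the trivial "recast" stand-in are illustrative assumptions.

```python
# Feedback loop between the analysis units and the algorithm recasting unit.
def analyze(data, required_fields):
    """Return the fields still missing; an empty list means analysis can finish."""
    return [f for f in required_fields if f not in data]

def recast_algorithm(missing):
    """Stand-in for regenerating an algorithm that mines the missing local data."""
    return {f: f"mined_{f}" for f in missing}

def feedback_loop(data, required_fields, max_rounds=5):
    for _ in range(max_rounds):
        missing = analyze(data, required_fields)
        if not missing:
            return data                      # all analyses complete
        data.update(recast_algorithm(missing))  # re-run mining for the gap
    raise RuntimeError("analysis did not converge")

final = feedback_loop({"srcip": "10.0.0.1"}, ["srcip", "geo", "label"])
```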
For those skilled in the art, various corresponding changes and variations can be made according to the technical solution and concept described above, and all such changes and variations shall fall within the scope of protection of the claims of the present invention.