Detailed Description of the Embodiments
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other, provided that they do not conflict. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present invention described herein can be implemented. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
An embodiment of the present invention provides a log processing method. The method runs on computer equipment.
Fig. 1 is a flowchart of a log processing method according to an embodiment of the present invention. As shown in Fig. 1, the log processing method includes the following steps.
Step S102: a cluster server receives a log file from a user side.
The user side may be a server whose logs need to be collected, or a client on the user's side whose logs need to be collected. For example, one server may correspond to several clients of a user; different clients each run their own services and produce logs, while the server provides background services for those clients and also produces logs during operation. The cluster server can receive log files sent by servers or clients and process them. The cluster server can receive log files from multiple user sides at the same time and process the log files of different user sides separately.
In this embodiment of the present invention, a proxy (agent) module may be installed or carried on each user side whose logs need to be collected, so as to collect log files at regular intervals and send them to the cluster server. The user side sends a request and the corresponding log file over the HTTP protocol; after responding to the request, the cluster server receives the log file through the service interface it provides, so that the log file is stored on the cluster server.
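As an illustration of the agent-to-cluster upload described above, the following is a minimal sketch, assuming a user-side agent that periodically collects a log file and pushes it to a hypothetical service interface of the cluster server over HTTP. The URL, file path, client identifier, and collection interval are assumptions for illustration only and are not prescribed by the embodiment.

```python
import time
import requests  # third-party HTTP client

CLUSTER_URL = "http://cluster.example.com/logs/upload"  # hypothetical service interface
LOG_PATH = "/var/log/app/business.log"                  # hypothetical log file path
INTERVAL_SECONDS = 300                                  # collect every 5 minutes (assumed)

def push_log_file():
    """Read the local log file and push it to the cluster server over HTTP."""
    with open(LOG_PATH, "rb") as f:
        resp = requests.post(
            CLUSTER_URL,
            files={"logfile": (LOG_PATH, f)},
            data={"client_id": "user-side-001"},  # identifies which user side sent the file
            timeout=30,
        )
    resp.raise_for_status()

if __name__ == "__main__":
    while True:
        push_log_file()
        time.sleep(INTERVAL_SECONDS)
```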
Step S104: the cluster server stores the log file.
After the log file of the user side is received, the log file may be stored on the cluster server.
Specifically, storing the log file may consist of first splitting the log file into multiple lines of log data and then sending the lines of log data, one by one, to a distributed message queue, for example a Kafka message queue, so that the cluster server can read the log data from the distributed message queue for analysis. After the log data has been sent to the distributed message queue, the cluster server may also read the log data from the queue, parse the log data that has been read, generate key-value pairs, and store them in a distributed database. When the log file is stored, descriptor information about the log file (such as its path and creation time) may also be obtained and kept in a database of the cluster server.
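A minimal sketch of the splitting and queuing just described, assuming the kafka-python client and illustrative broker and topic names: the stored log file is split into individual lines, and each non-empty line is sent to a Kafka topic as one log record.

```python
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(bootstrap_servers="kafka-broker:9092")  # hypothetical broker address
TOPIC = "raw-logs"                                               # hypothetical topic name

def split_and_enqueue(path):
    """Split a stored log file into lines and send each line to the message queue."""
    with open(path, "rb") as f:
        for line in f:
            line = line.strip()
            if line:
                producer.send(TOPIC, line)  # one log record per message
    producer.flush()
```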
Step S106: the cluster server analyzes the log file to obtain an analysis result.
After the log file has been transferred from the user side to the cluster server, the user can access the cluster server and query the cluster server's analysis result for the log file. For example, log analysis can reveal the running state or fault state of the user side's services. Analyzing the log file may consist of performing statistics on the information in the log file to obtain statistical results.
Because users have different timeliness requirements for the analysis results of log files, log analysis can be divided into real-time analysis and offline analysis according to the timeliness of the query request. Real-time analysis is usually required to return the analysis of hundreds of millions of lines of log data within a few seconds, so that the user's query of the analysis result is not affected. The amount of log data that needs real-time statistics is generally not very large, so the statistical analysis can be performed by streaming computation; the analysis result is kept in a temporary result database, for example a Redis database, and stored after processing.
Offline analysis has lower timeliness requirements for the statistical results; the analysis results may be presented every other day or every month. The parsed log data is first stored in a distributed database, such as an HBase database; job tasks are written in advance according to the service logic, and the tasks are run at a predetermined period to compute statistics and analyze the logs.
Step S108: the cluster server outputs the analysis result.
Outputting the analysis result may consist of outputting the analysis result to the corresponding user side, where it can be presented through a web page or an application so that staff can view it on the user side.
In this embodiment of the present invention, multiple servers in the cluster server receive log files, multiple servers store log files, and multiple servers analyze log files. Because the complex computations are distributed across the individual servers, the system as a whole achieves high concurrency, and its processing capability can reach more than ten times that of a conventional architecture. By having the cluster server store and analyze the logs in a classified manner, massive logs are processed efficiently; massive log analysis is realized, the problem of inefficient log processing in the prior art is solved, and the effect of improving log processing efficiency is achieved.
This embodiment of the present invention may process log files based on the principles of cloud computing. Cloud computing is a model for the provision, use, and delivery of Internet-based services, and usually involves dynamically scalable and often virtualized resources provided over the Internet. "Cloud" is a metaphor for the network or the Internet: in the past a cloud was often drawn to represent the telecommunications network, and it was later used to represent an abstraction of the Internet and its underlying infrastructure. In a narrow sense, cloud computing refers to the delivery and usage model of IT infrastructure, in which the required resources are obtained over the network in an on-demand, easily scalable manner; in a broad sense, it refers to the delivery and usage model of services, in which the required services are obtained over the network in an on-demand, easily scalable manner. Such services may be related to IT and software, to the Internet, or to other fields, and computing power can thus circulate over the Internet as a commodity. As an emerging technical concept, cloud computing provides cloud storage (distributed storage of mass data), cloud computation (Hadoop MapReduce, real-time streaming computation), and cloud security, which are well suited to the requirements of big-data storage, mining, analysis, early warning, and statistics, and its high performance ensures that data is processed promptly and accurately. Based on the principles of a cloud computing platform, the early-stage choice of log data storage is made and classified processing is performed according to the data volume and the real-time requirements of queries; most importantly, a single business analysis task is processed in parallel, rather than merely running multiple tasks in parallel, which greatly improves query efficiency and the correctness of the statistics.
The purpose of this embodiment of the present invention is to provide cloud storage for massive logs and a cloud computing service that can analyze massive logs in a timely manner and mine them in depth, while ensuring the security and accuracy of the log data. At the same time, growth in log volume can be handled simply by adding computing nodes, without relying on new hardware alone, thereby improving data processing efficiency and increasing storage capacity.
Preferably, the step in which the cluster server stores the log file includes the following steps.
Step S1: the cluster server splits the log file into log data.
Because log files from different user sides have different formats and each log file contains multiple log records, splitting the log file into log data may consist of splitting the log file into multiple lines of log data, each line forming one record, so that log files of different formats can be split into log data and sent to the distributed message queue.
Step S2: the cluster server sends the log data to the distributed message queue. The cluster server then reads the log data from the distributed message queue and analyzes it.
The distributed message queue may be a Kafka message queue. A Kafka distributed message queue is well suited to simple message transmission and distribution and can support large data volumes, especially log data, and combining it with MapReduce for real-time analysis also gives good results.
Preferably, after the cluster server sends the log data to the distributed message queue, the log processing method further includes: the cluster server reads the log data from the distributed message queue; the cluster server parses the log data that has been read to obtain parsing results; the cluster server generates key-value pairs corresponding to the log data according to the parsing results; and the cluster server stores the log file by storing the key-value pairs in a distributed database.
Specifically, the log data is read from the distributed message queue and each log record is parsed to obtain the keywords of the log, such as the MAC address, the traffic, and the specific application. Key-value pairs corresponding to the log data are generated from these parsing results, for example with the MAC address as the key and the other parsing results as the value; the key-value pairs of the log data are thus obtained, and the log data is then mapped and stored in a distributed database such as an HBase database.
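The parsing and key-value storage described above can be sketched roughly as follows, assuming the happybase HBase client, an already-created "logs" table with a column family "cf", and a simple space-separated log format in which the MAC address is used as the row key. The field layout and names are assumptions for illustration only.

```python
import happybase  # HBase client over Thrift

connection = happybase.Connection("hbase-thrift-host")  # hypothetical Thrift gateway host
table = connection.table("logs")                        # assumed pre-created HBase table

def parse_and_store(raw_line):
    """Parse one log record into key-value pairs and store it in HBase."""
    # assumed record format: "<mac> <traffic_bytes> <application> ..."
    mac, traffic, application = raw_line.decode("utf-8").split()[:3]
    row_key = mac.encode("utf-8")                   # MAC address used as the key
    table.put(row_key, {
        b"cf:traffic": traffic.encode("utf-8"),     # remaining parsed fields are the values
        b"cf:application": application.encode("utf-8"),
    })
```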
In this embodiment of the present invention, the distributed database HBase is used to store the parsed log data. Because HBase stores data in a key-value data model, it scales well, fetching data from HBase for analysis is fast enough, and the results can be stored anywhere: they may go back into HBase, into a relational database, or into Redis without any compatibility problems.
Preferably, the cluster server analyzing the log file includes: the cluster server obtaining incremental log data from the distributed database in real time; and the cluster server performing statistics on the incremental log data by streaming computation.
Because log files accumulate continuously, the log data stored in the distributed database also grows continuously. In this embodiment of the present invention, real-time analysis may consist of the cluster server obtaining the incremental log data from the distributed database in real time and computing statistics only on the increment, thereby avoiding recomputation of log data that has already been processed. The incremental log data is obtained in real time and aggregated by streaming computation. The streaming computation is carried out by Storm bolts, which perform a series of operations such as filtering, aggregation, and database queries. The filtering can already be completed during the earlier parse stage, whose output is stored in HBase in the form of database tables, so the streaming computation only needs to perform a map operation to organize the required data and then aggregate and analyze it.
Specifically, first, the log data is taken from the Kafka queue, parsed by the parse stage, and stored in HBase; this process splits the log records and maps them into the form of database tables stored in HBase. Then, real-time statistical analysis is performed by streaming computation, which is carried out by Storm bolts; the bolts perform a series of operations such as filtering, aggregation, and database queries, where the filtering can already be completed in the earlier parse stage (its output is stored in HBase as database tables), so the streaming computation only performs a map operation to organize the required data and then aggregates and analyzes it. The results of the streaming statistics are then stored in a database, for example a Redis database. Finally, the result data kept in Redis is stored, as required, in the HBase database or in a relational MySQL database for the user to query the statistics.
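The embodiment uses Storm bolts for this streaming computation; the sketch below is a simplified, framework-free illustration of the same idea, namely folding newly arrived incremental records into running totals held in Redis. It assumes the redis-py client, the per-MAC traffic fields used above, and an illustrative hash key name; it is not the Storm topology itself.

```python
import redis  # redis-py client

r = redis.Redis(host="redis-host", port=6379)  # hypothetical temporary result database

def aggregate_increment(parsed_records):
    """Streaming-style statistics: fold newly arrived (incremental) records
    into running per-MAC traffic totals kept in Redis."""
    for rec in parsed_records:  # rec is assumed to look like {"mac": ..., "traffic": ...}
        r.hincrby("traffic_by_mac", rec["mac"], int(rec["traffic"]))

def read_result():
    """Fetch the current real-time statistics for display or later persistence to HBase/MySQL."""
    return {mac.decode(): int(total) for mac, total in r.hgetall("traffic_by_mac").items()}
```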
The above embodiment describes the flow of real-time analysis within log analysis: the real-time analysis of massive logs is processed according to this flow, and results are fed back to the client immediately, which improves the timeliness of the log analysis results.
Preferably, the cluster server analyzing the log file includes: the cluster server obtaining incremental log data from the distributed database at a predetermined period; and the cluster server performing statistical computation on the incremental log data.
Because users have different timeliness requirements for the analysis results of log files, the log data may be analyzed and processed in an offline manner. The analysis cycle may be set in advance as a predetermined period, which can be configured as needed, for example one week or one month. The incremental log data is obtained from the distributed database at the predetermined period, and statistical computation is performed on the incremental log data.
Specifically, this may be implemented by the following steps (a simplified sketch follows the list):
Step 1: the log data is taken from the Kafka queue, parsed by the parse stage, and stored in HBase; this process splits the log records and maps them into the form of database tables stored in HBase.
Step 2: job tasks are created one by one as specifically required, with the task logic determined by the actual service logic.
Step 3: a periodic scheduling task is created, that is, periodic scheduling of the job tasks is configured, for example creating task 1 in advance and running task 1 at midnight every day.
Step 4: when the scheduled time arrives, the task is started according to the scheduling content.
Step 5: the specific task logic is executed to compute statistics on the log data.
Step 6: if the task fails, the relevant user is notified by a preconfigured notification module via SMS or email, and the user restarts the job task manually after investigating the cause.
Step 7: after the task succeeds, the execution result is stored in the HBase database so that the user can query it conveniently.
Step 8: after the task has succeeded and the result has been stored in the HBase database, the user may be notified via SMS or email through the notification module that the task has succeeded.
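A minimal sketch of steps 4 to 8 above, under the assumption that the reading of incremental data from HBase, the statistical computation, the writing of results, and the notification are supplied as helper functions (all hypothetical); real deployments would typically use a cron-like scheduler rather than the simple loop shown here.

```python
import time
import traceback

PERIOD_SECONDS = 24 * 3600  # hypothetical predetermined period: run once a day

def run_job(read_increment, compute_statistics, write_result, notify_user):
    """One offline job run: read incremental log data, compute statistics,
    store the result, and notify the user of success or failure."""
    try:
        increment = read_increment()             # fetch incremental log data from HBase
        result = compute_statistics(increment)   # execute the concrete task logic
        write_result(result)                     # store the execution result back to HBase
        notify_user("job succeeded, result stored")
    except Exception:
        notify_user("job failed:\n" + traceback.format_exc())
        # the user investigates the cause and restarts the job task manually

def schedule(run_once):
    """Simplified periodic scheduling of the job task."""
    while True:
        run_once()
        time.sleep(PERIOD_SECONDS)
```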
The above embodiment describes the flow of offline analysis within log analysis: the offline analysis of massive logs is processed in parallel according to this flow, and the results are reported to the front end for presentation to the user.
Fig. 2 is a flowchart of a preferred log processing method according to an embodiment of the present invention. As shown in Fig. 2, the log processing method includes the following steps.
Step 202: extract the log files of the user side. Extracting the log files may consist of extracting the log files related to preset keywords. A script-type agent (proxy module) is designed and built on the server of the user side, and it collects the required logs at regular intervals according to the service needs. After the log files of the user side are extracted, they can be pushed to the cluster server.
Step 204: store the pushed log files on the cluster server. Storing a log file on the cluster server comprises, first, storing the log file itself and, second, storing a descriptor of the log file (including the path where the log is stored, its size, its time, and so on) in Redis.
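The descriptor record mentioned in step 204 could be kept in Redis roughly as follows; the key naming scheme and the fields recorded are assumptions for illustration, not a prescribed layout.

```python
import os
import time
import redis  # redis-py client

r = redis.Redis(host="redis-host", port=6379)  # hypothetical Redis instance

def store_descriptor(log_path):
    """Record where a received log file was stored, its size, and its arrival time."""
    r.hset("log:descriptor:" + os.path.basename(log_path), mapping={
        "path": log_path,
        "size": os.path.getsize(log_path),
        "received_at": int(time.time()),
    })
```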
Step 206: the cluster server reads the log data and sends the log data to the distributed message queue.
Step 208: the cluster server reads the log data from the distributed message queue and parses the log data. The logs are parsed first, the useful data is extracted, and the parsed data is stored in the corresponding fields in HBase.
Step 210: read and analyze the parsed log data to obtain an analysis result. The parsed data may be analyzed in two ways: real-time analysis and offline analysis.
Step 212: present the analysis result on the user side. The result may be presented, via Thrift, in the form of a web page or a mobile phone APP.
The above embodiment describes the complete flow of a log from collection through analysis to the final presentation of the result. By having the cluster server store and analyze the logs in a classified manner, massive logs are processed efficiently and massive log analysis is realized.
The present invention is described in detail below through an application scenario of the log processing method of the embodiment of the present invention.
The processing of aggregate video traffic logs includes the following. First, the aggregate video traffic logs are collected. Then, the cluster server splits the collected traffic logs into lines of log data and sends them to the Kafka queue.
After the traffic logs have been delivered to the Kafka queue, the cluster server reads the log data from the Kafka queue in sequence and parses each log record into a number of keywords, such as the MAC address, the traffic, and the specific application.
From the parsing results, the cluster server can form the key-value pattern corresponding to the log data, for example with the MAC address as the key and the remainder as the value, and the log data is mapped and stored in HBase.
Then, as required, the log files are analyzed and counted by real-time analysis or offline analysis. For offline analysis, every two hours may be used as one scheduling cycle: when the scheduling moment arrives, the pre-designed task is started, the traffic of these two hours is computed incrementally, and the monthly traffic record is updated. The user is also informed of the execution status of the task.
For real-time analysis, according to the query statement, the traffic information from the point at which the last task finished up to the query point is retrieved quickly, and the real-time query result combined with the statistics of the last completed task is fed back to the user as the actual traffic data.
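A sketch of how the feedback just described could combine the two layers, under the assumption that the last finished offline task has persisted per-MAC totals and the real-time layer holds the increment accumulated since that task ran; the function and variable names are illustrative only.

```python
def query_traffic(mac, offline_totals, realtime_increment):
    """Actual traffic for a MAC address = total computed by the last finished
    offline task + real-time increment accumulated since that task ran."""
    return offline_totals.get(mac, 0) + realtime_increment.get(mac, 0)

# usage sketch: offline_totals loaded from HBase/MySQL, realtime_increment read from Redis
# total = query_traffic("aa:bb:cc:dd:ee:ff", offline_totals, realtime_increment)
```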
Finally, the analysis result is presented to the user through an interface.
Based on the principles of a cloud computing platform, the early-stage choice of data storage is made and classified processing is performed according to the data volume and the real-time requirements of queries; most importantly, a single business analysis task is processed in parallel, rather than merely running multiple tasks in parallel, which greatly improves query efficiency and the correctness of the statistics.
An embodiment of the present invention provides a log processing device, whose functions may be implemented by a cluster server. It should be noted that the log processing device of the embodiment of the present invention can be used to carry out the log processing method provided by the embodiment of the present invention, and the log processing method of the embodiment of the present invention can also be carried out by the log processing device provided by the embodiment of the present invention.
Fig. 3 is a schematic diagram of a log processing device according to an embodiment of the present invention. As shown in Fig. 3, the log processing device comprises a receiving unit 10, a storage unit 30, an analysis unit 50, and an output unit 70.
The receiving unit 10 is configured to enable the cluster server to receive a log file from a user side.
The user side may be a server whose logs need to be collected, or a client on the user's side whose logs need to be collected. For example, one server may correspond to several clients of a user; different clients each run their own services and produce logs, while the server provides background services for those clients and also produces logs during operation. The cluster server can receive log files sent by servers or clients and process them. The cluster server can receive log files from multiple user sides at the same time and process the log files of different user sides separately.
In this embodiment of the present invention, a proxy (agent) module may be installed or carried on each user side whose logs need to be collected, so as to collect log files at regular intervals and send them to the cluster server. The user side sends a request and the corresponding log file over the HTTP protocol; after responding to the request, the cluster server receives the log file through the service interface it provides, so that the log file is stored on the cluster server.
The storage unit 30 is configured to enable the cluster server to store the log file.
After the log file of the user side is received, the log file may be stored on the cluster server.
Specifically, storing the log file may consist of first splitting the log file into multiple lines of log data and then sending the lines of log data, one by one, to a distributed message queue, for example a Kafka message queue, so that the cluster server can read the log data from the distributed message queue for analysis. After the log data has been sent to the distributed message queue, the cluster server may also read the log data from the queue, parse the log data that has been read, generate key-value pairs, and store them in a distributed database. When the log file is stored, descriptor information about the log file (such as its path and creation time) may also be obtained and kept in a database of the cluster server.
The analysis unit 50 is configured to enable the cluster server to analyze the log file and obtain an analysis result.
After the log file has been transferred from the user side to the cluster server, the user can access the cluster server and query the cluster server's analysis result for the log file. For example, log analysis can reveal the running state or fault state of the user side's services. Analyzing the log file may consist of performing statistics on the information in the log file to obtain statistical results.
Because users have different timeliness requirements for the analysis results of log files, log analysis can be divided into real-time analysis and offline analysis according to the timeliness of the query request. Real-time analysis is usually required to return the analysis of hundreds of millions of lines of log data within a few seconds, so that the user's query of the analysis result is not affected. The amount of log data that needs real-time statistics is generally not very large, so the statistical analysis can be performed by streaming computation; the analysis result is kept in a temporary result database, for example a Redis database, and stored after processing.
Offline analysis has lower timeliness requirements for the statistical results; the analysis results may be presented every other day or every month. The parsed log data is first stored in a distributed database, such as an HBase database; job tasks are written in advance according to the service logic, and the tasks are run at a predetermined period to compute statistics and analyze the logs.
The output unit 70 is configured to enable the cluster server to output the analysis result.
Outputting the analysis result may consist of outputting the analysis result to the corresponding user side, where it can be presented through a web page or an application so that staff can view it on the user side.
In this embodiment of the present invention, multiple servers in the cluster server receive log files, multiple servers store log files, and multiple servers analyze log files. Because the complex computations are distributed across the individual servers, the system as a whole achieves high concurrency, and its processing capability can reach more than ten times that of a conventional architecture. By having the cluster server store and analyze the logs in a classified manner, massive logs are processed efficiently; massive log analysis is realized, the problem of inefficient log processing in the prior art is solved, and the effect of improving log processing efficiency is achieved.
This embodiment of the present invention may process log files based on the principles of cloud computing. Cloud computing is a model for the provision, use, and delivery of Internet-based services, and usually involves dynamically scalable and often virtualized resources provided over the Internet. "Cloud" is a metaphor for the network or the Internet: in the past a cloud was often drawn to represent the telecommunications network, and it was later used to represent an abstraction of the Internet and its underlying infrastructure. In a narrow sense, cloud computing refers to the delivery and usage model of IT infrastructure, in which the required resources are obtained over the network in an on-demand, easily scalable manner; in a broad sense, it refers to the delivery and usage model of services, in which the required services are obtained over the network in an on-demand, easily scalable manner. Such services may be related to IT and software, to the Internet, or to other fields, and computing power can thus circulate over the Internet as a commodity. As an emerging technical concept, cloud computing provides cloud storage (distributed storage of mass data), cloud computation (Hadoop MapReduce, real-time streaming computation), and cloud security, which are well suited to the requirements of big-data storage, mining, analysis, early warning, and statistics, and its high performance ensures that data is processed promptly and accurately. Based on the principles of a cloud computing platform, the early-stage choice of log data storage is made and classified processing is performed according to the data volume and the real-time requirements of queries; most importantly, a single business analysis task is processed in parallel, rather than merely running multiple tasks in parallel, which greatly improves query efficiency and the correctness of the statistics.
The purpose of this embodiment of the present invention is to provide cloud storage for massive logs and a cloud computing service that can analyze massive logs in a timely manner and mine them in depth, while ensuring the security and accuracy of the log data. At the same time, growth in log volume can be handled simply by adding computing nodes, without relying on new hardware alone, thereby improving data processing efficiency and increasing storage capacity.
Preferably, the storage unit comprises a splitting module and a delivery module.
The splitting module is configured to enable the cluster server to split the log file into log data.
Because log files from different user sides have different formats and each log file contains multiple log records, splitting the log file into log data may consist of splitting the log file into multiple lines of log data, each line forming one record, so that log files of different formats can be split into log data and sent to the distributed message queue.
The delivery module is configured to enable the cluster server to send the log data to the distributed message queue. The cluster server then reads the log data from the distributed message queue and analyzes it.
The distributed message queue may be a Kafka message queue. A Kafka distributed message queue is well suited to simple message transmission and distribution and can support large data volumes, especially log data, and combining it with MapReduce for real-time analysis also gives good results.
Preferably, the storage unit further comprises a reading module, a parsing module, a generating module, and a storing module.
The reading module is configured to enable the cluster server to read the log data from the distributed message queue after the cluster server has sent the log data to the distributed message queue. The parsing module is configured to enable the cluster server to parse the log data that has been read to obtain parsing results. The generating module is configured to enable the cluster server to generate key-value pairs corresponding to the log data according to the parsing results. The storing module is configured to enable the cluster server to store the log file by storing the key-value pairs in the distributed database.
Specifically, the log data is read from the distributed message queue and each log record is parsed to obtain the keywords of the log, such as the MAC address, the traffic, and the specific application. Key-value pairs corresponding to the log data are generated from these parsing results, for example with the MAC address as the key and the other parsing results as the value; the key-value pairs of the log data are thus obtained, and the log data is then mapped and stored in a distributed database such as an HBase database.
In this embodiment of the present invention, the distributed database HBase is used to store the parsed log data. Because HBase stores data in a key-value data model, it scales well, fetching data from HBase for analysis is fast enough, and the results can be stored anywhere: they may go back into HBase, into a relational database, or into Redis without any compatibility problems.
Preferably, the analysis unit comprises a first obtaining module and a first computing module.
The first obtaining module is configured to enable the cluster server to obtain incremental log data from the distributed database in real time. The first computing module is configured to enable the cluster server to perform statistics on the incremental log data by streaming computation.
Because log files accumulate continuously, the log data stored in the distributed database also grows continuously. In this embodiment of the present invention, real-time analysis may consist of the cluster server obtaining the incremental log data from the distributed database in real time and computing statistics only on the increment, thereby avoiding recomputation of log data that has already been processed. The incremental log data is obtained in real time and aggregated by streaming computation. The streaming computation is carried out by Storm bolts, which perform a series of operations such as filtering, aggregation, and database queries. The filtering can already be completed during the earlier parse stage, whose output is stored in HBase in the form of database tables, so the streaming computation only needs to perform a map operation to organize the required data and then aggregate and analyze it.
Specifically, first, the log data is taken from the Kafka queue, parsed by the parse stage, and stored in HBase; this process splits the log records and maps them into the form of database tables stored in HBase. Then, real-time statistical analysis is performed by streaming computation, which is carried out by Storm bolts; the bolts perform a series of operations such as filtering, aggregation, and database queries, where the filtering can already be completed in the earlier parse stage (its output is stored in HBase as database tables), so the streaming computation only performs a map operation to organize the required data and then aggregates and analyzes it. The results of the streaming statistics are then stored in a database, for example a Redis database. Finally, the result data kept in Redis is stored, as required, in the HBase database or in a relational MySQL database for the user to query the statistics.
The above embodiment describes the flow of real-time analysis within log analysis: the real-time analysis of massive logs is processed according to this flow, and results are fed back to the client immediately, which improves the timeliness of the log analysis results.
Preferably, the analysis unit comprises a second obtaining module and a second computing module.
The second obtaining module is configured to enable the cluster server to obtain incremental log data from the distributed database at a predetermined period. The second computing module is configured to enable the cluster server to perform statistical computation on the incremental log data.
Because users have different timeliness requirements for the analysis results of log files, the log data may be analyzed and processed in an offline manner. The analysis cycle may be set in advance as a predetermined period, which can be configured as needed, for example one week or one month. The incremental log data is obtained from the distributed database at the predetermined period, and statistical computation is performed on the incremental log data.
Specifically, this may be implemented by the following steps:
Step 1: the log data is taken from the Kafka queue, parsed by the parse stage, and stored in HBase; this process splits the log records and maps them into the form of database tables stored in HBase.
Step 2: job tasks are created one by one as specifically required, with the task logic determined by the actual service logic.
Step 3: a periodic scheduling task is created, that is, periodic scheduling of the job tasks is configured, for example creating task 1 in advance and running task 1 at midnight every day.
Step 4: when the scheduled time arrives, the task is started according to the scheduling content.
Step 5: the specific task logic is executed to compute statistics on the log data.
Step 6: if the task fails, the relevant user is notified by a preconfigured notification module via SMS or email, and the user restarts the job task manually after investigating the cause.
Step 7: after the task succeeds, the execution result is stored in the HBase database so that the user can query it conveniently.
Step 8: after the task has succeeded and the result has been stored in the HBase database, the user may be notified via SMS or email through the notification module that the task has succeeded.
The above embodiment describes the flow of offline analysis within log analysis: the offline analysis of massive logs is processed in parallel according to this flow, and the results are reported to the front end for presentation to the user.
Fig. 4 is a schematic diagram of a preferred log processing device according to an embodiment of the present invention. As shown in Fig. 4, the log processing device of this embodiment comprises a log collection module 20, a log storage module 40, a log analysis module 60, and a display module 80.
The log collection module 20 is configured to extract the relevant logs from an external system. The external system may be a server whose logs need to be collected, or a client on the user's side whose logs need to be collected, that is, the user side provided in the embodiment of the present invention. Specifically, an agent can be designed and installed on the server whose logs need to be collected, and it collects the relevant logs at regular intervals and transmits them to the storage module.
The log storage module 40 is configured to store the collected logs on the collector cluster server. The log storage module 40 has two functions: first, it stores the log files collected over the HTTP protocol on the cluster server and keeps the descriptor information of the log files (such as the file path and creation time) in Redis; second, as the processor stage, it reads the descriptor information of a log file from Redis and sends the concrete log file data to the Kafka message queue for the log analysis module 60 to call and analyze. The log storage module 40 can implement its functions through the storage unit in the embodiment of the present invention.
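A minimal sketch of the second (processor) function of the log storage module 40, assuming the redis-py and kafka-python clients and the descriptor layout sketched earlier: the module looks up the stored file path in Redis and forwards the file's lines to the Kafka message queue. The key, topic, and host names are assumptions for illustration.

```python
import redis
from kafka import KafkaProducer

r = redis.Redis(host="redis-host", port=6379)                    # hypothetical Redis instance
producer = KafkaProducer(bootstrap_servers="kafka-broker:9092")  # hypothetical Kafka broker

def forward_to_queue(descriptor_key, topic="raw-logs"):
    """Read a log file's descriptor from Redis and send its contents to Kafka."""
    path = r.hget(descriptor_key, "path")
    if path is None:
        return
    with open(path.decode("utf-8"), "rb") as f:
        for line in f:
            line = line.strip()
            if line:
                producer.send(topic, line)
    producer.flush()
```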
The log analysis module 60 is configured to compute statistics on the log-related data and is divided into real-time analysis and offline analysis according to the timeliness of the query requests. The log analysis module 60 can implement its functions through the analysis unit of the embodiment of the present invention.
Real-time analysis is usually required to return the analysis of hundreds of millions of lines of data within a few seconds. The instantaneous log data distributed from the log storage module 40 is processed with real-time statistics; this amount of data is generally not very large, so the statistical analysis can be performed by streaming computation, with the results kept temporarily in Redis and, after processing, stored in HBase so that the front end can conveniently fetch them for display.
Offline analysis has lower timeliness requirements for the statistical results and may present results every other day or every month. The parsed log data from the log storage module is first stored in the HBase database; the job tasks are written in advance according to the service logic, and the tasks are run at fixed times to compute statistics and analyze the logs.
The display module 80 is configured to present the log analysis results to the user through a web page or a mobile phone APP.
The advantages of this embodiment of the present invention are as follows. First, a detachable agent is used to collect logs; the types of logs to be collected can be configured conveniently, the agent can be uninstalled at any time when it is no longer needed, and no further customized development is required. Second, cluster storage is adopted as a log center that can accept all incoming logs and store them centrally after key-value processing; in particular, as the log volume grows, capacity can be expanded simply by adding hardware such as hard disks and memory, which is convenient and reduces costs. Third, the log analysis module 60 processes logs in a classified manner according to the log volume and the actual demand; analysis of large data volumes is fast and accurate, the user can be notified of the results automatically, and timeliness is well guaranteed. Fourth, HBase is used to store the parsed log data; because HBase stores data in a key-value data model, it scales well, fetching data from the HBase database for analysis is fast enough, and the results can be stored anywhere: back into the HBase database, into a relational database, or into a Redis database, without any compatibility problems.
In summary, the present invention has the following effects.
High computing capability: the complex computations are distributed across the individual servers, so the whole device achieves high concurrency, and its processing capability is more than ten times that of a conventional architecture.
In a user's actual application environment, various kinds of hardware and software failures occur with relatively high probability; anomalies such as hardware damage, network interruption, and system crashes can all interrupt the service or even cause data loss. The embodiment of the present invention is a massive-log processing device built on a cloud platform, so it can use the multi-master redundancy of the cloud computing environment to guarantee high service reliability.
The embodiment of the present invention can aggregate the local storage of all user sides, can support storage capacity at the PB scale, and is very easy to expand; the entire expansion process does not affect the continuous operation of the service.
The software products used in the embodiment of the present invention are open-source products, and the hardware consists of low-end PC servers, so the total cost is low.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in a certain embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely schematic; the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not carried out. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a portable hard drive, a magnetic disk, or an optical disc.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.