A data transmission method based on Logstash+Kafka
Technical field
The present invention relates to the field of high-speed data acquisition applications, and specifically to a data transmission method based on Logstash+Kafka.
Background art
In the course of enterprise informatization, every enterprise has built its own information systems, but data cannot be shared between enterprises, and there is no unified data platform to integrate, process, mine, and analyze this business data. This seriously hinders the overall progress of enterprise informatization. To solve this problem, people have begun to study various data transmission approaches, attempting to gather data from different systems efficiently and securely so that, on the basis of a unified, centralized data platform, the data can be further mined and analyzed to discover patterns and support decision-making. Existing data transmission approaches fall mainly into the following categories:
1) CDN technology
By adding a new layer of network architecture to the existing network, the content of a website is published to the network "edge" closest to the user, so that users can obtain the content they need nearby, which improves the response speed of website access. The disadvantages are also obvious: updates are not real-time, specified objects are updated only indirectly, and the process involves manual intervention that requires careful and thorough planning;
2) Transmission based on the FTP protocol
FTP, the remote file transfer protocol, moves files from one computer to another. It is most commonly used for bidirectional transfer, that is, transferring data between a remote system and the local system. A user can download a file from the remote computer to the host where the user is located and then copy it to the user's terminal, or download it directly to the user's terminal; files on the user's host or terminal can likewise be transferred to the remote computer;
Transferring files by FTP requires setting up an FTP server. FTP with registered users also requires managing user names and passwords. Most hosts provide an FTP client; a dedicated FTP client or integrated FTP software can also be used. Under the People's Bank software constraints, transferring data over anonymous FTP is prohibited;
The main drawbacks of transferring files by FTP are that the integrity of the transferred data cannot be guaranteed and that scalability is poor;
3) Transmission based on e-mail
Files are transferred through an e-mail system. E-mail systems offer fast transmission, support for diverse file types, convenient sending and receiving, a wide range of correspondents, and security. However, the integrity of the transferred data cannot be guaranteed, and because files are transferred as mail messages, transmission efficiency is low;
4) Transmission based on middleware
Data is transferred using middleware such as MQ and MT, which offers data compression, large-file transfer, and resumable transfer, and can achieve secure, reliable file transmission. However, most current middleware-based approaches suffer from low efficiency, require interface development work, and are inflexible in data extraction.
Summary of the invention
The technical task of the present invention is to provide a data transmission method based on Logstash+Kafka.
The technical task of the present invention is achieved as follows:
A data transmission method based on Logstash+Kafka comprises operations at the business data producer side and operations at the business data consumer side;
The operating steps are as follows:
Step 1) The producer-side Logstash automatically collects business data from the business system database and transmits it to the Kafka middleware;
Step 2) The Kafka middleware gathers the data transmitted by each business data producer side and sends it uniformly to the consumer-side Logstash;
Step 3) The consumer-side Logstash receives the messages from the Kafka middleware, converts them into data files, and stores them in the target database.
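The three steps can be illustrated with a minimal in-memory sketch. It is not the Logstash or Kafka implementation: an in-memory SQLite table stands in for the business system database, a `queue.Queue` stands in for a Kafka topic, and a string buffer stands in for the data file. The function names (`make_source_db`, `produce`, `consume`, `run_pipeline`) and the sample rows are hypothetical.

```python
import csv
import io
import json
import queue
import sqlite3


def make_source_db():
    """Hypothetical stand-in for the business system database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE produce (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO produce VALUES (?, ?)", [(1, "bolt"), (2, "nut")])
    return db


def produce(db, topic):
    """Step 1: collect rows and publish each one as a JSON message."""
    for row_id, name in db.execute("SELECT id, name FROM produce ORDER BY id"):
        topic.put(json.dumps({"tablename": "produce", "id": row_id, "name": name}))


def consume(topic, out):
    """Step 3: drain the topic and write each message as one CSV row."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    while not topic.empty():
        event = json.loads(topic.get())
        writer.writerow([event["id"], event["name"]])
        count += 1
    return count


def run_pipeline():
    topic = queue.Queue()  # Step 2: the queue plays the role of a Kafka topic
    out = io.StringIO()    # stands in for the data file on the consumer side
    produce(make_source_db(), topic)
    consume(topic, out)
    return out.getvalue()
```

Because the producer and the consumer only share the queue, either side can be changed or scaled without touching the other, which is the decoupling the middleware step provides.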
In step 1), automatically collecting business data from the business system database includes:
The business data to be collected is specified in the Logstash configuration file, so no separate interface program needs to be developed.
In step 1), multiple Logstash instances may be deployed at the business data producer side.
Each Logstash instance extracts the data of a different business.
The multiple Logstash instances perform data extraction concurrently.
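As a rough illustration of concurrent extraction, the sketch below runs one worker thread per business, each publishing to a shared thread-safe queue that stands in for the Kafka middleware. The table names and rows in `SOURCES` and the helper names `extract` and `extract_all` are hypothetical; real Logstash instances are separate processes, which this sketch only approximates with threads.

```python
import json
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical business tables; each worker plays the role of one Logstash instance.
SOURCES = {
    "orders": [{"id": 1}, {"id": 2}],
    "items": [{"id": 10}],
    "customers": [{"id": 100}, {"id": 101}, {"id": 102}],
}


def extract(business, rows, topic):
    """Publish every row of one business to the shared topic; return the row count."""
    for row in rows:
        topic.put(json.dumps({"tablename": business, **row}))
    return len(rows)


def extract_all(sources):
    topic = queue.Queue()  # thread-safe stand-in for the Kafka middleware
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [
            pool.submit(extract, name, rows, topic) for name, rows in sources.items()
        ]
        sent = sum(f.result() for f in futures)
    messages = []
    while not topic.empty():
        messages.append(json.loads(topic.get()))
    return sent, messages
```

Each message carries its `tablename`, so the consumer side can still tell the businesses apart even though the arrival order across workers is not deterministic.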
The Kafka middleware is deployed as a cluster across multiple servers. Likewise, the consumer-side Logstash starts multiple Logstash instances according to the business scope of the extracted data; each instance receives different data from the Kafka cluster, and the instances process the data into data files concurrently.
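One way to picture several consumer-side instances each receiving different data is to give every instance its own topic. The sketch below assumes one topic per business and one thread per consumer; the `None` sentinel used to stop a consumer and the names `consumer` and `run` are illustrative choices, not part of Kafka or Logstash.

```python
import json
import queue
import threading


def consumer(topic, sink):
    """Drain one topic until a None sentinel arrives, collecting decoded events."""
    while True:
        msg = topic.get()
        if msg is None:
            break
        sink.append(json.loads(msg))


def run(messages_by_topic):
    topics = {name: queue.Queue() for name in messages_by_topic}
    sinks = {name: [] for name in messages_by_topic}
    workers = [
        threading.Thread(target=consumer, args=(topics[name], sinks[name]))
        for name in topics
    ]
    for worker in workers:
        worker.start()
    for name, events in messages_by_topic.items():
        for event in events:
            topics[name].put(json.dumps(event))
        topics[name].put(None)  # sentinel: no more messages for this topic
    for worker in workers:
        worker.join()
    return sinks
```

Each consumer sees only the events of its own topic and preserves their order, so the per-business output files can be produced independently and in parallel.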
A data transmission device based on Logstash+Kafka comprises a business data producer side and a business data consumer side;
The business data producer side is used to collect the business data generated by business systems;
The business data consumer side is used to process the collected data and consolidate it in a unified data platform.
The business data producer side includes an input module and an output module;
The input module performs data collection, automatically collecting the required business data from the business database;
The output module performs data transmission, converting the collected data into messages and sending them to the producer-side Kafka queue.
The business data consumer side includes the consumer-side Kafka data middleware and the consumer-side Logstash;
The consumer-side Kafka data middleware is used to transmit and receive the business data messages sent by each business system;
The consumer-side Logstash is used to receive messages from the Kafka middleware, process them into data files, and store them in the unified data platform.
Compared with the prior art, the data transmission method based on Logstash+Kafka of the present invention does not require interface development to determine the data content; the content is defined simply in the Logstash configuration file at the data producer side. When the business expands and the data scope grows, only the configuration file needs to be modified. As the data volume increases, performance can be maintained by scaling out the Logstash and Kafka clusters. The final data processing format can be realized by configuring different Logstash configuration files according to enterprise needs, and the output can be placed at a designated location on the server according to the naming rules in the configuration file.
Brief description of the drawings
Figure 1 is a flow diagram of a data transmission method based on Logstash+Kafka.
Detailed description of the embodiments
Embodiment 1:
Device configuration:
A data transmission device based on Logstash+Kafka comprises a business data producer side and a business data consumer side;
The business data producer side is used to collect the business data generated by business systems. It includes an input module and an output module. The input module performs data collection, automatically collecting the required business data from the business database. The output module performs data transmission, converting the collected data into messages and sending them to the producer-side Kafka queue.
The business data consumer side is used to process the collected data and consolidate it in a unified data platform. It includes the consumer-side Kafka data middleware and the consumer-side Logstash. The consumer-side Kafka data middleware is used to transmit and receive the business data messages sent by each business system. The consumer-side Logstash is used to receive messages from the Kafka middleware, process them into data files, and store them in the unified data platform.
Operating method:
Step 1) The producer-side Logstash automatically collects business data from the business system database and transmits it to the Kafka middleware. The business data to be collected is specified in the Logstash configuration file, so no separate interface program needs to be developed;
Multiple Logstash instances may be deployed at the business data producer side; each Logstash instance extracts the data of a different business, and the instances perform data extraction concurrently.
Step 2) The Kafka middleware gathers the data transmitted by each business data producer side and sends it uniformly to the consumer-side Logstash;
Step 3) The consumer-side Logstash receives the messages from the Kafka middleware, converts them into data files, and stores them in the target database.
The Kafka middleware is deployed as a cluster across multiple servers. Likewise, the consumer-side Logstash starts multiple Logstash instances according to the business scope of the extracted data; each instance receives different data from the Kafka cluster, and the instances process the data into data files concurrently.
Sample configuration file for the business data producer side:
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:db1"
    jdbc_user => "user1"
    jdbc_password => "upwd1"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_driver_library => "/opt/langchao/logstash-5.6.6/jdbcdrivers/ojdbc6.jar"
    jdbc_paging_enabled => "false"
    jdbc_page_size => "50000"
    # Select only rows created after the last recorded value (:sql_last_value)
    statement => "SELECT 'table1' as tablename, id, name, to_char(createdate, 'yyyy-MM-dd HH24:mi:ss') as createdate FROM produce where to_char(createdate, 'yyyy-MM-dd HH24:mi:ss') > to_char(:sql_last_value) order by createdate"
    schedule => "0 0 16 * * *"
    type => "test"
    # Remember the last createdate seen so that each run fetches only newer rows
    record_last_run => true
    use_column_value => true
    parameters => {"createdate" => "2018-02-28 23:59:59"}
    tracking_column => "createdate"
    tracking_column_type => "numeric"
    last_run_metadata_path => "/opt/langchao/logstash-5.6.6/runcache/tables/table1"
    clean_run => false
  }
}
output {
  # Send events of type "test" to the Kafka topic "test"
  if [type] == "test" {
    kafka {
      codec => json_lines {
        charset => "UTF-8"
      }
      topic_id => "test"
      bootstrap_servers => "xxx.xxx.xxx.xxx:9092"
    }
  }
}
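The producer-side sample relies on incremental extraction: with `use_column_value` and `tracking_column` set, the jdbc input records the last `createdate` it has seen and substitutes it for `:sql_last_value` on the next scheduled run. The following sketch illustrates that idea only; it assumes an in-memory SQLite table in place of Oracle, keeps the tracking value in a variable rather than in the `last_run_metadata_path` file, and the names `fetch_new_rows` and `demo` are hypothetical.

```python
import sqlite3


def fetch_new_rows(db, last_value):
    """Return rows created after last_value, plus the updated tracking value."""
    rows = db.execute(
        "SELECT id, name, createdate FROM produce "
        "WHERE createdate > ? ORDER BY createdate",
        (last_value,),
    ).fetchall()
    # Keep the tracking-column value of the last row seen, or the old value if none.
    new_last = rows[-1][2] if rows else last_value
    return rows, new_last


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE produce (id INTEGER, name TEXT, createdate TEXT)")
    db.executemany(
        "INSERT INTO produce VALUES (?, ?, ?)",
        [(1, "a", "2018-02-28 10:00:00"), (2, "b", "2018-03-01 09:00:00")],
    )
    # First run starts from the initial value used in the sample configuration.
    first, last = fetch_new_rows(db, "2018-02-28 23:59:59")
    db.execute("INSERT INTO produce VALUES (3, 'c', '2018-03-02 08:00:00')")
    # Second run picks up only the row added since the first run.
    second, last = fetch_new_rows(db, last)
    return [r[0] for r in first], [r[0] for r in second], last
```

The comparison works on strings here because the `yyyy-MM-dd HH:mm:ss` layout sorts in chronological order, which is also why the sample statement formats `createdate` with `to_char` before comparing.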
Sample configuration file for the business data consumer side:
input {
  kafka {
    codec => json {
      charset => "UTF-8"
    }
    auto_offset_reset => "earliest"
    topics => ["test"]
    bootstrap_servers => "10.1.80.238:9092"
    max_poll_records => "10"
    request_timeout_ms => "300000"
    session_timeout_ms => "180000"
  }
}
filter {
  date {
    match => ["message","UNIX_MS"]
    target => "@timestamp"
  }
  # Shift @timestamp forward by 8 hours
  ruby {
    code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
  }
  ruby {
    code => "event.set('@timestamp',event.get('timestamp'))"
  }
  mutate {
    remove_field => ["timestamp"]
  }
  json {
    source => "message"
  }
}
output {
  # Write plm_item events to CSV files partitioned by category, date, and hour
  if [tabname] == "plm_item" {
    csv {
      codec => plain {
        charset => "UTF-8"
      }
      path => "/opt/ldatas/data/%{category}/%{+YYYYMMdd}/%{tabname}_%{comid}_%{+YYYYMMdd-H}.csv"
      dir_mode => 0777
      file_mode => 0777
      filename_failure => "/opt/ldatas/failures/filefailures.txt"
      fields => ["item_id", "item_name"]
      csv_options => {"col_sep" => ","}
    }
  }
}
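The `path` and `fields` settings of the consumer-side sample decide where each event is written and which columns it contributes. The sketch below mimics that behavior in plain Python under stated assumptions: the event timestamp is passed in as a `datetime` that has already been shifted to local time, the hour is rendered without zero padding to mirror the `H` pattern, and the sample values for `category` and `comid` are invented. The helper names `render_path` and `render_row` are hypothetical.

```python
import csv
import io
from datetime import datetime


def render_path(event, ts):
    """Build the output path from event fields and the event timestamp."""
    day = ts.strftime("%Y%m%d")
    return "/opt/ldatas/data/{}/{}/{}_{}_{}-{}.csv".format(
        event["category"], day, event["tabname"], event["comid"], day, ts.hour
    )


def render_row(event, fields=("item_id", "item_name")):
    """Write only the configured fields, comma separated, as one CSV line."""
    out = io.StringIO()
    csv.writer(out, delimiter=",", lineterminator="\n").writerow(
        [event.get(name, "") for name in fields]
    )
    return out.getvalue()
```

Fields that are not listed, such as routing metadata, are left out of the row, so the file layout is controlled entirely by the configuration rather than by the shape of the incoming message.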
Those skilled in the art can readily implement the present invention based on the above specific embodiments. It should be understood, however, that the present invention is not limited to the specific embodiments described above. On the basis of the disclosed embodiments, those skilled in the art may combine different technical features in any way to realize different technical solutions.