medcl/esm
Elasticsearch cross version data migration.
Links:
- Dec 3rd, 2020: [EN] Cross version Elasticsearch data migration with ESM
- Use INFINI Gateway to check the Document-Level differences between two clusters or indices after the migration
- Cross version migration supported
- Overwrite index name
- Copy index settings and mapping
- Support http basic auth
- Support dumping an index to a local file
- Support loading an index from a local file
- Support http proxy
- Support sliced scroll (Elasticsearch 5.0+)
- Support running in the background
- Generate testing data by randomizing the source document id
- Support renaming field names
- Support unifying document type names
- Support specifying which _source fields to return from the source
- Support a query string query to filter the data source
- Support renaming source fields during bulk indexing
- Support incremental updates (add/update/delete changed records) with --sync. Note: it uses a different implementation that handles only the changed records, so it is not as fast as a full migration.

Load generation benchmark against a 3-node cluster (3 * c5d.4xlarge, 16C, 32GB, 10Gbps):
```
root@ip-172-31-13-181:/tmp# ./esm -s https://localhost:8000 -d https://localhost:8000 -x logs1kw -y logs122 -m elastic:medcl123 -n elastic:medcl123 -w 40 --sliced_scroll_size=60 -b 5 --buffer_count=2000000 --regenerate_id
[12-19 06:31:20] [INF] [main.go:506,main] start data migration..
Scroll 10064570 / 10064570 [=================================================] 100.00% 55s
Bulk 10062602 / 10064570 [==================================================] 99.98% 55s
[12-19 06:32:15] [INF] [main.go:537,main] data migration finished.
```
10,000,000 documents (Nginx logs generated from kibana_sample_data_logs) were migrated within a minute.
Before running esm, manually prepare the target index with its mapping and optimized settings to improve speed, for example:
```
PUT your-new-index
{
  "settings": {
    "index.translog.durability": "async",
    "refresh_interval": "-1",
    "number_of_shards": 10,
    "number_of_replicas": 0
  }
}
```
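These settings disable refresh and relax translog durability to speed up bulk indexing, so you will usually want to restore them once the migration has finished. A minimal sketch; the values below are assumed defaults, adjust them to your own requirements:

```
# restore index settings after the migration finishes (assumed values, adjust as needed)
PUT your-new-index/_settings
{
  "index": {
    "translog.durability": "request",
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}
```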
Copy index index_name from 192.168.1.x to 192.168.1.y:9200:
./bin/esm -s http://192.168.1.x:9200 -d http://192.168.1.y:9200 -x index_name -w=5 -b=10 -c 10000
Copy index src_index from 192.168.1.x to 192.168.1.y:9200 and save it as dest_index:
./bin/esm -s http://localhost:9200 -d http://localhost:9200 -x src_index -y dest_index -w=5 -b=100
Use the sync feature for an incremental update of index src_index from 192.168.1.x to 192.168.1.y:9200:
./bin/esm --sync -s http://localhost:9200 -d http://localhost:9200 -x src_index -y dest_index
Use HTTP basic auth:
./bin/esm -s http://localhost:9200 -x "src_index" -y "dest_index" -d http://localhost:9201 -n admin:111111
Copy settings and override the number of shards:
./bin/esm -s http://localhost:9200 -x "src_index" -y "dest_index" -d http://localhost:9201 -m admin:111111 -c 10000 --shards=50 --copy_settings
Copy settings and mappings, recreate the target index, filter the source with a query, and refresh after migration:
./bin/esm -s http://localhost:9200 -x "src_index" -q=query:phone -y "dest_index" -d http://localhost:9201 -c 10000 --shards=5 --copy_settings --copy_mappings --force --refresh
Dump Elasticsearch documents into a local file:
./bin/esm -s http://localhost:9200 -x "src_index" -m admin:111111 -c 5000 -q=query:mixer --refresh -o=dump.bin
Dump the source and target indices to local files and compare them to find differences quickly:
```
./bin/esm --sort=_id -s http://localhost:9200 -x "src_index" --truncate_output --skip=_index -o=src.json
./bin/esm --sort=_id -s http://localhost:9200 -x "dst_index" --truncate_output --skip=_index -o=dst.json
diff -W 200 -ry --suppress-common-lines src.json dst.json
```
Load data from a dump file and bulk insert it into another Elasticsearch instance:
./bin/esm -d http://localhost:9200 -y "dest_index" -n admin:111111 -c 5000 -b 5 --refresh -i=dump.bin
Use an HTTP proxy for the destination:
./bin/esm -d http://123345.ap-northeast-1.aws.found.io:9200 -y "dest_index" -n admin:111111 -c 5000 -b 1 --refresh -i dump.bin --dest_proxy=http://127.0.0.1:9743
Use sliced scroll (Elasticsearch 5.0+) to speed up scrolling, and update the shard count:
./bin/esm -s=http://192.168.3.206:9200 -d=http://localhost:9200 -n=elastic:changeme -f --copy_settings --copy_mappings -x=bestbuykaggle --sliced_scroll_size=5 --shards=50 --refresh
Migrate from 5.x to 6.x and unify all types to doc:
./esm -s http://source_es:9200 -x "source_index*" -u "doc" -w 10 -b 10 -t "10m" -d https://target_es:9200 -m elastic:passwd -n elastic:passwd -c 5000
To migrate to version 7.x, you may need to rename _type to _doc:
./esm -s http://localhost:9201 -x "source" -y "target" -d https://localhost:9200 --rename="_type:type,age:myage" -u "_doc"
Filter the migration with a range query:
./esm -s https://192.168.3.98:9200 -m elastic:password -o json.out -x kibana_sample_data_ecommerce -q "order_date:[2020-02-01T21:59:02+00:00 TO 2020-03-01T21:59:02+00:00]"
Range query on a keyword field, with escaping:
./esm -s https://192.168.3.98:9200 -m test:123 -o 1.txt -x test1 -q "@timestamp.keyword:[\"2021-01-17 03:41:20\" TO \"2021-03-17 03:41:20\"]"
Generate testing data: if input.json contains 10 documents, the following command will ingest 100 documents, which is useful for testing:
./bin/esm -i input.json -d http://localhost:9201 -y target-index1 --regenerate_id --repeat_times=10
Select source fields:
./bin/esm -s http://localhost:9201 -x my_index -o dump.json --fields=author,title
Rename fields during bulk indexing:
./bin/esm -i dump.json -d http://localhost:9201 -y target-index41 --rename=title:newtitle
Use buffer_count to control the memory used by ESM, and use gzip to compress network traffic:
./esm -s https://localhost:8000 -d https://localhost:8000 -x logs1kw -y logs122 -m elastic:medcl123 -n elastic:medcl123 --regenerate_id -w 20 --sliced_scroll_size=60 -b 5 --buffer_count=1000000 --compress false
Download prebuilt binaries from the releases page: https://github.com/medcl/esm/releases

If the released binaries do not fit your environment, you can compile it yourself; Go is required.
make build
- go version >= 1.7
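A minimal build-from-source sketch; the clone URL is the project repository, and the ./bin/ output path is an assumption based on the ./bin/esm paths used in the examples above:

```
# clone the repository and build with the Makefile target shown above
git clone https://github.com/medcl/esm.git
cd esm
make build
# the esm binary is expected under ./bin/ (assumption), matching the ./bin/esm invocations above
ls ./bin/
```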
```
Usage:
  esm [OPTIONS]

Application Options:
  -s, --source=                     source elasticsearch instance, ie: http://localhost:9200
  -q, --query=                      query against source elasticsearch instance, filter data before migrate, ie: name:medcl
      --sort=                       sort field when scroll, ie: _id (default: _id)
  -d, --dest=                       destination elasticsearch instance, ie: http://localhost:9201
  -m, --source_auth=                basic auth of source elasticsearch instance, ie: user:pass
  -n, --dest_auth=                  basic auth of target elasticsearch instance, ie: user:pass
  -c, --count=                      number of documents at a time: ie "size" in the scroll request (10000)
      --buffer_count=               number of buffered documents in memory (100000)
  -w, --workers=                    concurrency number for bulk workers (1)
  -b, --bulk_size=                  bulk size in MB (5)
  -t, --time=                       scroll time (1m)
      --sliced_scroll_size=         size of sliced scroll, to make it work, the size should be > 1 (1)
  -f, --force                       delete destination index before copying
  -a, --all                         copy indexes starting with . and _
      --copy_settings               copy index settings from source
      --copy_mappings               copy index mappings from source
      --shards=                     set a number of shards on newly created indexes
  -x, --src_indexes=                index names to copy, support regex and comma separated list (_all)
  -y, --dest_index=                 index name to save, allow only one index name, original index name will be used if not specified
  -u, --type_override=              override type name
      --green                       wait for both hosts cluster status to be green before dump. otherwise yellow is okay
  -v, --log=                        setting log level, options: trace, debug, info, warn, error (INFO)
  -o, --output_file=                output documents of source index into local file
      --truncate_output=            truncate before dump to output file
  -i, --input_file=                 indexing from local dump file
      --input_file_type=            the data type of input file, options: dump, json_line, json_array, log_line (dump)
      --source_proxy=               set proxy to source http connections, ie: http://127.0.0.1:8080
      --dest_proxy=                 set proxy to target http connections, ie: http://127.0.0.1:8080
      --refresh                     refresh after migration finished
      --sync=                       sync will use scroll for both source and target index, compare the data and sync (index/update/delete)
      --fields=                     filter source fields (white list), comma separated, ie: col1,col2,col3,...
      --skip=                       skip source fields (black list), comma separated, ie: col1,col2,col3,...
      --rename=                     rename source fields, comma separated, ie: _type:type, name:myname
  -l, --logstash_endpoint=          target logstash tcp endpoint, ie: 127.0.0.1:5055
      --secured_logstash_endpoint   target logstash tcp endpoint was secured by TLS
      --repeat_times=               repeat the data from source N times to dest output, use along with --regenerate_id to amplify the data size
  -r, --regenerate_id               regenerate id for documents, this will override the existing document id in the data source
      --compress                    use gzip to compress traffic
  -p, --sleep=                      sleep N seconds after finishing a bulk request (-1)

Help Options:
  -h, --help                        Show this help message
```
- Scroll ID too long: update elasticsearch.yml on the source cluster:

```
http.max_header_size: 16k
http.max_initial_line_length: 8k
```
Supported migration paths:

From | To |
---|---|
1.x | 1.x |
1.x | 2.x |
1.x | 5.x |
1.x | 6.x |
1.x | 7.x |
2.x | 1.x |
2.x | 2.x |
2.x | 5.x |
2.x | 6.x |
2.x | 7.x |
5.x | 1.x |
5.x | 2.x |
5.x | 5.x |
5.x | 6.x |
5.x | 7.x |
6.x | 1.x |
6.x | 2.x |
6.x | 5.0 |
6.x | 6.x |
6.x | 7.x |
7.x | 1.x |
7.x | 2.x |
7.x | 5.x |
7.x | 6.x |
7.x | 7.x |