medcl/esm
Elasticsearch cross version data migration.
Links:
- Dec 3rd, 2020: [EN] Cross version Elasticsearch data migration with ESM
- Use INFINI Gateway to check the Document-Level differences between two clusters or indices after the migration
- Cross version migration supported
- Overwrite index name
- Copy index settings and mapping
- Support http basic auth
- Support dumping an index to a local file
- Support loading an index from a local file
- Support http proxy
- Support sliced scroll (Elasticsearch 5.0+)
- Support running in the background
- Generate testing data by randomizing the source document id
- Support renaming field names
- Support unifying document type names
- Support specifying which _source fields to return from the source
- Support a query string query to filter the data source
- Support renaming source fields during bulk indexing
- Support incremental updates (add/update/delete changed records) with --sync. Note: it uses a different implementation that handles only the changed records, so it is not as fast as a full migration.

Load generation benchmark against a 3-node cluster (3 * c5d.4xlarge, 16C, 32GB, 10Gbps):
```
root@ip-172-31-13-181:/tmp# ./esm -s https://localhost:8000 -d https://localhost:8000 -x logs1kw -y logs122 -m elastic:medcl123 -n elastic:medcl123 -w 40 --sliced_scroll_size=60 -b 5 --buffer_count=2000000 --regenerate_id
[12-19 06:31:20] [INF] [main.go:506,main] start data migration..
Scroll 10064570 / 10064570 [=================================================] 100.00% 55s
Bulk 10062602 / 10064570 [==================================================] 99.98% 55s
[12-19 06:32:15] [INF] [main.go:537,main] data migration finished.
```
10,000,000 documents (Nginx logs generated from kibana_sample_data_logs) were migrated within a minute.
Before running esm, manually prepare the target index with its mapping and optimized settings to improve speed, for example:
```
PUT your-new-index
{
  "settings": {
    "index.translog.durability": "async",
    "refresh_interval": "-1",
    "number_of_shards": 10,
    "number_of_replicas": 0
  }
}
```
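These settings disable refresh and relax translog durability to speed up bulk indexing, so you will usually want to restore them once the migration has finished. A minimal sketch; the values below are assumed defaults, adjust them to your own requirements:

```
# restore index settings after the migration finishes (assumed values, adjust as needed)
PUT your-new-index/_settings
{
  "index": {
    "translog.durability": "request",
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}
```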
Copy index index_name from 192.168.1.x to 192.168.1.y:9200:
./bin/esm -s http://192.168.1.x:9200 -d http://192.168.1.y:9200 -x index_name -w=5 -b=10 -c 10000
Copy index src_index from 192.168.1.x to 192.168.1.y:9200 and save it as dest_index:
./bin/esm -s http://localhost:9200 -d http://localhost:9200 -x src_index -y dest_index -w=5 -b=100
Use the sync feature for an incremental update of index src_index from 192.168.1.x to 192.168.1.y:9200:
./bin/esm --sync -s http://localhost:9200 -d http://localhost:9200 -x src_index -y dest_index
Use HTTP basic auth:
./bin/esm -s http://localhost:9200 -x "src_index" -y "dest_index" -d http://localhost:9201 -n admin:111111
Copy settings and override the number of shards:
./bin/esm -s http://localhost:9200 -x "src_index" -y "dest_index" -d http://localhost:9201 -m admin:111111 -c 10000 --shards=50 --copy_settings
Copy settings and mappings, recreate the target index, filter the source with a query, and refresh after migration:
./bin/esm -s http://localhost:9200 -x "src_index" -q=query:phone -y "dest_index" -d http://localhost:9201 -c 10000 --shards=5 --copy_settings --copy_mappings --force --refresh
Dump Elasticsearch documents into a local file:
./bin/esm -s http://localhost:9200 -x "src_index" -m admin:111111 -c 5000 -q=query:mixer --refresh -o=dump.bin
Dump the source and target indices to local files and compare them to find differences quickly:
```
./bin/esm --sort=_id -s http://localhost:9200 -x "src_index" --truncate_output --skip=_index -o=src.json
./bin/esm --sort=_id -s http://localhost:9200 -x "dst_index" --truncate_output --skip=_index -o=dst.json
diff -W 200 -ry --suppress-common-lines src.json dst.json
```
Load data from a dump file and bulk insert it into another Elasticsearch instance:
./bin/esm -d http://localhost:9200 -y "dest_index" -n admin:111111 -c 5000 -b 5 --refresh -i=dump.bin
Use an HTTP proxy for the destination:
./bin/esm -d http://123345.ap-northeast-1.aws.found.io:9200 -y "dest_index" -n admin:111111 -c 5000 -b 1 --refresh -i dump.bin --dest_proxy=http://127.0.0.1:9743
Use sliced scroll (Elasticsearch 5.0+) to speed up scrolling, and update the shard count:
./bin/esm -s=http://192.168.3.206:9200 -d=http://localhost:9200 -n=elastic:changeme -f --copy_settings --copy_mappings -x=bestbuykaggle --sliced_scroll_size=5 --shards=50 --refresh
Migrate from 5.x to 6.x and unify all types to doc:
./esm -s http://source_es:9200 -x "source_index*" -u "doc" -w 10 -b 10 -t "10m" -d https://target_es:9200 -m elastic:passwd -n elastic:passwd -c 5000
To migrate to version 7.x, you may need to rename _type to _doc:
./esm -s http://localhost:9201 -x "source" -y "target" -d https://localhost:9200 --rename="_type:type,age:myage" -u "_doc"
Filter the migration with a range query:
./esm -s https://192.168.3.98:9200 -m elastic:password -o json.out -x kibana_sample_data_ecommerce -q "order_date:[2020-02-01T21:59:02+00:00 TO 2020-03-01T21:59:02+00:00]"
Range query on a keyword field, with escaping:
./esm -s https://192.168.3.98:9200 -m test:123 -o 1.txt -x test1 -q "@timestamp.keyword:[\"2021-01-17 03:41:20\" TO \"2021-03-17 03:41:20\"]"
Generate testing data: if input.json contains 10 documents, the following command will ingest 100 documents, which is useful for testing:
./bin/esm -i input.json -d http://localhost:9201 -y target-index1 --regenerate_id --repeat_times=10
Select source fields:
./bin/esm -s http://localhost:9201 -x my_index -o dump.json --fields=author,title
Rename fields during bulk indexing:
./bin/esm -i dump.json -d http://localhost:9201 -y target-index41 --rename=title:newtitle
Use buffer_count to control the memory used by ESM, and use gzip to compress network traffic:
./esm -s https://localhost:8000 -d https://localhost:8000 -x logs1kw -y logs122 -m elastic:medcl123 -n elastic:medcl123 --regenerate_id -w 20 --sliced_scroll_size=60 -b 5 --buffer_count=1000000 --compress false
Download prebuilt binaries from the releases page: https://github.com/medcl/esm/releases

If the released binaries do not fit your environment, you can compile it yourself; Go is required.
make build
- go version >= 1.7
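A minimal build-from-source sketch; the clone URL is the project repository, and the ./bin/ output path is an assumption based on the ./bin/esm paths used in the examples above:

```
# clone the repository and build with the Makefile target shown above
git clone https://github.com/medcl/esm.git
cd esm
make build
# the esm binary is expected under ./bin/ (assumption), matching the ./bin/esm invocations above
ls ./bin/
```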
```
Usage:
  esm [OPTIONS]

Application Options:
  -s, --source=                     source elasticsearch instance, ie: http://localhost:9200
  -q, --query=                      query against source elasticsearch instance, filter data before migrate, ie: name:medcl
      --sort=                       sort field when scroll, ie: _id (default: _id)
  -d, --dest=                       destination elasticsearch instance, ie: http://localhost:9201
  -m, --source_auth=                basic auth of source elasticsearch instance, ie: user:pass
  -n, --dest_auth=                  basic auth of target elasticsearch instance, ie: user:pass
  -c, --count=                      number of documents at a time: ie "size" in the scroll request (10000)
      --buffer_count=               number of buffered documents in memory (100000)
  -w, --workers=                    concurrency number for bulk workers (1)
  -b, --bulk_size=                  bulk size in MB (5)
  -t, --time=                       scroll time (1m)
      --sliced_scroll_size=         size of sliced scroll, to make it work, the size should be > 1 (1)
  -f, --force                       delete destination index before copying
  -a, --all                         copy indexes starting with . and _
      --copy_settings               copy index settings from source
      --copy_mappings               copy index mappings from source
      --shards=                     set a number of shards on newly created indexes
  -x, --src_indexes=                index names to copy, support regex and comma separated list (_all)
  -y, --dest_index=                 index name to save, allow only one index name, original index name will be used if not specified
  -u, --type_override=              override type name
      --green                       wait for both hosts cluster status to be green before dump. otherwise yellow is okay
  -v, --log=                        setting log level, options: trace, debug, info, warn, error (INFO)
  -o, --output_file=                output documents of source index into local file
      --truncate_output=            truncate before dump to output file
  -i, --input_file=                 indexing from local dump file
      --input_file_type=            the data type of input file, options: dump, json_line, json_array, log_line (dump)
      --source_proxy=               set proxy to source http connections, ie: http://127.0.0.1:8080
      --dest_proxy=                 set proxy to target http connections, ie: http://127.0.0.1:8080
      --refresh                     refresh after migration finished
      --sync=                       sync will use scroll for both source and target index, compare the data and sync (index/update/delete)
      --fields=                     filter source fields (white list), comma separated, ie: col1,col2,col3,...
      --skip=                       skip source fields (black list), comma separated, ie: col1,col2,col3,...
      --rename=                     rename source fields, comma separated, ie: _type:type, name:myname
  -l, --logstash_endpoint=          target logstash tcp endpoint, ie: 127.0.0.1:5055
      --secured_logstash_endpoint   target logstash tcp endpoint was secured by TLS
      --repeat_times=               repeat the data from source N times to dest output, use along with --regenerate_id to amplify the data size
  -r, --regenerate_id               regenerate id for documents, this will override the existing document id in the data source
      --compress                    use gzip to compress traffic
  -p, --sleep=                      sleep N seconds after finishing a bulk request (-1)

Help Options:
  -h, --help                        Show this help message
```
- Scroll ID too long: update elasticsearch.yml on the source cluster:

```
http.max_header_size: 16k
http.max_initial_line_length: 8k
```
Supported migration paths:

From | To |
---|---|
1.x | 1.x |
1.x | 2.x |
1.x | 5.x |
1.x | 6.x |
1.x | 7.x |
2.x | 1.x |
2.x | 2.x |
2.x | 5.x |
2.x | 6.x |
2.x | 7.x |
5.x | 1.x |
5.x | 2.x |
5.x | 5.x |
5.x | 6.x |
5.x | 7.x |
6.x | 1.x |
6.x | 2.x |
6.x | 5.0 |
6.x | 6.x |
6.x | 7.x |
7.x | 1.x |
7.x | 2.x |
7.x | 5.x |
7.x | 6.x |
7.x | 7.x |