Powerful database anonymizer with flexible rules. Written in Rust.
Datanymizer is created and supported by Evrone. See what else we develop with Rust.

You can find more information in articles in English and Russian.
Database -> Dumper (+Faker) -> Dump.sql
Datanymizer generates a database-native dump, so you can import or process it with your database's standard tools, without third-party importers.
There are several ways to install pg_datanymizer; choose the one most convenient for you.
```sh
# Linux / macOS / Windows (MINGW etc.). Installs into ./bin/ by default
$ curl -sSfL https://raw.githubusercontent.com/datanymizer/datanymizer/main/cli/pg_datanymizer/install.sh | sh -s

# Or a shorter way
$ curl -sSfL https://git.io/pg_datanymizer | sh -s

# Specify installation directory and version
$ curl -sSfL https://git.io/pg_datanymizer | sudo sh -s -- -b /usr/local/bin v0.2.0

# Alpine Linux (wget)
$ wget -q -O - https://git.io/pg_datanymizer | sh -s
```
```sh
# Installs the latest stable release
$ brew install datanymizer/tap/pg_datanymizer

# Builds the latest version from the repository
$ brew install --HEAD datanymizer/tap/pg_datanymizer
```
```sh
$ docker run --rm -v `pwd`:/app -w /app datanymizer/pg_datanymizer
```
First, inspect your database schema, choose fields with sensitive data, and create a config file based on it.
```yaml
# config.yml
tables:
  - name: markets
    rules:
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 1
                max: 2
            - words:
                min: 1
                max: 2
  - name: franchisees
    rules:
      operator_mail:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
      operator_name:
        first_name: {}
      operator_phone:
        phone:
          format: +###########
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 2
                max: 3
            - words:
                min: 2
                max: 3
  - name: users
    rules:
      first_name:
        first_name: {}
      last_name:
        last_name: {}
  - name: customers
    rules:
      email:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
        uniq:
          required: true
          try_count: 5
      phone:
        phone:
          format: +7##########
        uniq: true
      city:
        city: {}
      age:
        random_num:
          min: 10
          max: 99
      first_name:
        first_name: {}
      last_name:
        last_name: {}
      birth_date:
        datetime:
          from: 1990-01-01T00:00:00+00:00
          to: 2010-12-31T00:00:00+00:00
```
Then start making a dump from your database instance:
```sh
pg_datanymizer -f /tmp/dump.sql -c ./config.yml postgres://postgres:postgres@localhost/test_database
```
This creates a new dump file `/tmp/dump.sql` containing a native SQL dump for PostgreSQL. You can import the anonymized data from this dump into a new PostgreSQL database with:
```sh
psql -U postgres -d new_database < /tmp/dump.sql
```
The dumper can stream the dump to STDOUT (like `pg_dump`), so you can use it in other pipelines:
```sh
pg_datanymizer -c ./config.yml postgres://postgres:postgres@localhost/test_database > /tmp/dump.sql
```
You can specify which tables to include in or exclude from the dump.
To dump data only from `public.markets` and `public.users`:
```yaml
# config.yml
# ...
filter:
  only:
    - public.markets
    - public.users
```
To ignore these tables and dump data from all the others:
```yaml
# config.yml
# ...
filter:
  except:
    - public.markets
    - public.users
```
You can also specify data and schema filters separately.
This is equivalent to the previous example.
```yaml
# config.yml
# ...
filter:
  data:
    except:
      - public.markets
      - public.users
```
To skip both the schema and the data of all other tables:
```yaml
# config.yml
# ...
filter:
  schema:
    only:
      - public.markets
      - public.users
```
To skip the schema of the `markets` table and dump data only from the `users` table:
```yaml
# config.yml
# ...
filter:
  data:
    only:
      - public.users
  schema:
    except:
      - public.markets
```
You can use wildcards in the `filter` section:

- `?` matches exactly one occurrence of any character;
- `*` matches any number (including zero) of occurrences of any character.
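For example, a filter using wildcard patterns might look like this (a sketch; the table names are hypothetical):

```yaml
# config.yml
# ...
filter:
  only:
    # all tables in the `public` schema whose names start with `user`
    - public.user*
    # tables like `public.log_a`, `public.log_1`, etc.
    - public.log_?
```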
You can specify conditions (an SQL `WHERE` statement) and a limit for the dumped data per table:
```yaml
# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # select maximum 100 rows
      limit: 100
```
As an additional option, you can specify SQL conditions that define which rows will be transformed (anonymized):
```yaml
# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # preserve original values for some rows
      transform_condition: "NOT (first_name = 'John' AND last_name = 'Doe')"
      # select maximum 100 rows
      limit: 100
```
You can use the `dump_condition`, `transform_condition` and `limit` options in any combination (only `transform_condition`; `transform_condition` and `limit`; and so on).
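For instance, a query section that dumps everything but anonymizes only non-test rows might look like this (a sketch; the `is_test_account` column is hypothetical):

```yaml
# config.yml
tables:
  - name: people
    query:
      # transform only real accounts; test rows keep their original values
      transform_condition: "is_test_account = FALSE"
      # no dump_condition: all rows are dumped
      limit: 1000
```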
You can specify global variables available in any `template` rule.
```yaml
# config.yml
tables:
  users:
    bio:
      template:
        format: "User bio is {{var_a}}"
    age:
      template:
        format: "{{_0 | float * global_multiplicator}}"
# ...
globals:
  var_a: Global variable 1
  global_multiplicator: 6
```
Rule | Description |
---|---|
email | Emails with different options |
ip | IP addresses. Supports IPv4 and IPv6 |
words | Lorem words of different lengths |
first_name | First name generator |
last_name | Last name generator |
city | City name generator |
phone | Generates a random phone with a configurable `format` |
pipeline | Use a pipeline to generate more complicated values |
capitalize | Like the filter, it capitalizes the input value |
template | Template engine to generate random text with included rules |
digit | Random digit (in the range `0..9`) |
random_num | Random number with `min` and `max` options |
password | Password with length options (supports `min` and `max`) |
datetime | Makes DateTime strings with options (`from` and `to`) |

...and more than 70 rules in total.
For the complete list of rules, please refer to this document.
You can specify that result values must be unique (they are not unique by default). You can use the short or the full syntax.
Short:
```yaml
uniq: true
```
Full:
```yaml
uniq:
  required: true
  try_count: 5
```
Uniqueness is ensured by re-generating values when they are the same. You can customize the number of attempts with `try_count` (this is an optional field; the default number of tries depends on the rule).
Currently, uniqueness is supported by `email`, `ip`, `phone` and `random_num`.
You can specify the locale for individual rules:
```yaml
first_name:
  locale: RU
```
The default locale is `EN`, but you can specify a different default locale:
```yaml
tables:
  # ...
default:
  locale: RU
```
We also support `ZH_TW` (traditional Chinese) and `RU` (translation in progress).
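Both options can be combined; a sketch with `RU` as the default locale and a single rule overridden (the table and field names are hypothetical):

```yaml
# config.yml
tables:
  - name: users
    rules:
      # uses the default locale (RU)
      last_name:
        last_name: {}
      # overrides the locale for this rule only
      first_name:
        first_name:
          locale: EN
default:
  locale: RU
```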
You can reference the values of other row fields in templates. Use `prev` for original values and `final` for anonymized ones:
```yaml
tables:
  - name: some_table
    # You must specify the order of rule execution when using `final`
    rule_order:
      - greeting
      - options
    rules:
      first_name:
        first_name: {}
      greeting:
        template:
          # Keeping the first name, but anonymizing the last name
          format: "Hello, {{ prev.first_name }} {{ final.last_name }}!"
      options:
        template:
          # Using the anonymized value again
          format: "{greeting: \"{{ final.greeting }}\"}"
```
You must specify the order of rule execution with `rule_order` when using `final`. All rules not listed will be placed at the beginning (i.e., you only need to list the rules that use `final`).
We implemented a built-in key-value store that allows information to be exchanged between anonymized rows.
It is available via special functions in templates.
Take a look at an example:
```yaml
tables:
  - name: users
    rules:
      name:
        template:
          # Save a name to the store as a side effect; the key is `user_names.<USER_ID>`
          format: "{{ _1 }}{{ store_write(key='user_names.' ~ prev.id, value=_1) }}"
          rules:
            - person_name: {}
  - name: user_operations
    rules:
      user_name:
        template:
          # Using the saved value again
          format: "{{ store_read(key='user_names.' ~ prev.user_id) }}"
```
- PostgreSQL
- MySQL or MariaDB (TODO)
- pg_datanymizer CLI application manual.
- config.yml file specification.
- Full list of transformation rules.
- Integration testing manual.
Mac to Linux
```sh
rustup target add x86_64-unknown-linux-gnu
brew tap messense/macos-cross-toolchains
brew install x86_64-unknown-linux-gnu
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc cargo build --target x86_64-unknown-linux-gnu --release --features openssl/vendored
```