Powerful database anonymizer with flexible rules. Written in Rust.
Datanymizer is created and supported by Evrone. See what else we develop with Rust.

You can find more information in articles in English and Russian.
Database -> Dumper (+Faker) -> Dump.sql
Datanymizer generates a database-native dump, so you can import or process it with your database's standard tools, without third-party importers.
There are several ways to install pg_datanymizer; choose the one most convenient for you.
```sh
# Linux / macOS / Windows (MINGW etc.). Installs into ./bin/ by default
$ curl -sSfL https://raw.githubusercontent.com/datanymizer/datanymizer/main/cli/pg_datanymizer/install.sh | sh -s

# Or a shorter way
$ curl -sSfL https://git.io/pg_datanymizer | sh -s

# Specify installation directory and version
$ curl -sSfL https://git.io/pg_datanymizer | sudo sh -s -- -b /usr/local/bin v0.2.0

# Alpine Linux (wget)
$ wget -q -O - https://git.io/pg_datanymizer | sh -s
```
```sh
# Installs the latest stable release
$ brew install datanymizer/tap/pg_datanymizer

# Builds the latest version from the repository
$ brew install --HEAD datanymizer/tap/pg_datanymizer
```
```sh
$ docker run --rm -v `pwd`:/app -w /app datanymizer/pg_datanymizer
```
First, inspect your database schema, choose fields with sensitive data, and create a config file based on it.
```yaml
# config.yml
tables:
  - name: markets
    rules:
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 1
                max: 2
            - words:
                min: 1
                max: 2
  - name: franchisees
    rules:
      operator_mail:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
      operator_name:
        first_name: {}
      operator_phone:
        phone:
          format: +###########
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 2
                max: 3
            - words:
                min: 2
                max: 3
  - name: users
    rules:
      first_name:
        first_name: {}
      last_name:
        last_name: {}
  - name: customers
    rules:
      email:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
        uniq:
          required: true
          try_count: 5
      phone:
        phone:
          format: +7##########
        uniq: true
      city:
        city: {}
      age:
        random_num:
          min: 10
          max: 99
      first_name:
        first_name: {}
      last_name:
        last_name: {}
      birth_date:
        datetime:
          from: 1990-01-01T00:00:00+00:00
          to: 2010-12-31T00:00:00+00:00
```
Then start making a dump from your database instance:
```sh
pg_datanymizer -f /tmp/dump.sql -c ./config.yml postgres://postgres:postgres@localhost/test_database
```
This creates a new dump file `/tmp/dump.sql` containing a native SQL dump for PostgreSQL. You can import the anonymized data from this dump into a new PostgreSQL database with:
```sh
psql -U postgres -d new_database < /tmp/dump.sql
```
The dumper can stream the dump to STDOUT (like `pg_dump`), so you can use it in other pipelines:
```sh
pg_datanymizer -c ./config.yml postgres://postgres:postgres@localhost/test_database > /tmp/dump.sql
```
You can specify which tables to include in or exclude from the dump.
To dump data only from `public.markets` and `public.users`:
```yaml
# config.yml
# ...
filter:
  only:
    - public.markets
    - public.users
```
To ignore these tables and dump data from all the others:
```yaml
# config.yml
# ...
filter:
  except:
    - public.markets
    - public.users
```
You can also specify data and schema filters separately.
This is equivalent to the previous example.
```yaml
# config.yml
# ...
filter:
  data:
    except:
      - public.markets
      - public.users
```
To skip both the schema and the data of all other tables:
```yaml
# config.yml
# ...
filter:
  schema:
    only:
      - public.markets
      - public.users
```
To skip the schema of the `markets` table and dump data only from the `users` table:
```yaml
# config.yml
# ...
filter:
  data:
    only:
      - public.users
  schema:
    except:
      - public.markets
```
You can use wildcards in the `filter` section:

- `?` matches exactly one occurrence of any character;
- `*` matches any number (including zero) of occurrences of any character.
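For example, a filter using wildcard patterns might look like this (a sketch; the table names are hypothetical):

```yaml
# config.yml
# ...
filter:
  only:
    # all tables in the `public` schema whose names start with `user`
    - public.user*
    # tables like `public.log_a`, `public.log_1`, etc.
    - public.log_?
```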
You can specify conditions (an SQL `WHERE` statement) and a limit for the dumped data per table:
```yaml
# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # select maximum 100 rows
      limit: 100
```
As an additional option, you can specify SQL conditions that define which rows will be transformed (anonymized):
```yaml
# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # preserve original values for some rows
      transform_condition: "NOT (first_name = 'John' AND last_name = 'Doe')"
      # select maximum 100 rows
      limit: 100
```
You can use the `dump_condition`, `transform_condition` and `limit` options in any combination (only `transform_condition`; `transform_condition` and `limit`; and so on).
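For instance, a query section that dumps everything but anonymizes only non-test rows might look like this (a sketch; the `is_test_account` column is hypothetical):

```yaml
# config.yml
tables:
  - name: people
    query:
      # transform only real accounts; test rows keep their original values
      transform_condition: "is_test_account = FALSE"
      # no dump_condition: all rows are dumped
      limit: 1000
```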
You can specify global variables available in any `template` rule.
```yaml
# config.yml
tables:
  users:
    bio:
      template:
        format: "User bio is {{var_a}}"
    age:
      template:
        format: "{{_0 | float * global_multiplicator}}"
# ...
globals:
  var_a: Global variable 1
  global_multiplicator: 6
```
Rule | Description |
---|---|
email | Emails with different options |
ip | IP addresses. Supports IPv4 and IPv6 |
words | Lorem words of different lengths |
first_name | First name generator |
last_name | Last name generator |
city | City name generator |
phone | Generates a random phone with a configurable `format` |
pipeline | Use a pipeline to generate more complicated values |
capitalize | Like the filter, it capitalizes the input value |
template | Template engine to generate random text with included rules |
digit | Random digit (in the range `0..9`) |
random_num | Random number with `min` and `max` options |
password | Password with length options (supports `min` and `max`) |
datetime | Makes DateTime strings with options (`from` and `to`) |

...and more than 70 rules in total.
For the complete list of rules, please refer to this document.
You can specify that result values must be unique (they are not unique by default). You can use the short or the full syntax.
Short:
```yaml
uniq: true
```
Full:
```yaml
uniq:
  required: true
  try_count: 5
```
Uniqueness is ensured by re-generating values when they are the same. You can customize the number of attempts with `try_count` (this is an optional field; the default number of tries depends on the rule).
Currently, uniqueness is supported by `email`, `ip`, `phone` and `random_num`.
You can specify the locale for individual rules:
```yaml
first_name:
  locale: RU
```
The default locale is `EN`, but you can specify a different default locale:
```yaml
tables:
  # ...
default:
  locale: RU
```
We also support `ZH_TW` (traditional Chinese) and `RU` (translation in progress).
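Both options can be combined; a sketch with `RU` as the default locale and a single rule overridden (the table and field names are hypothetical):

```yaml
# config.yml
tables:
  - name: users
    rules:
      # uses the default locale (RU)
      last_name:
        last_name: {}
      # overrides the locale for this rule only
      first_name:
        first_name:
          locale: EN
default:
  locale: RU
```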
You can reference the values of other row fields in templates. Use `prev` for original values and `final` for anonymized ones:
```yaml
tables:
  - name: some_table
    # You must specify the order of rule execution when using `final`
    rule_order:
      - greeting
      - options
    rules:
      first_name:
        first_name: {}
      greeting:
        template:
          # Keeping the first name, but anonymizing the last name
          format: "Hello, {{ prev.first_name }} {{ final.last_name }}!"
      options:
        template:
          # Using the anonymized value again
          format: "{greeting: \"{{ final.greeting }}\"}"
```
You must specify the order of rule execution with `rule_order` when using `final`. All rules not listed will be placed at the beginning (i.e., you only need to list the rules that use `final`).
We implemented a built-in key-value store that allows information to be exchanged between anonymized rows.
It is available via special functions in templates.
Take a look at an example:
```yaml
tables:
  - name: users
    rules:
      name:
        template:
          # Save a name to the store as a side effect; the key is `user_names.<USER_ID>`
          format: "{{ _1 }}{{ store_write(key='user_names.' ~ prev.id, value=_1) }}"
          rules:
            - person_name: {}
  - name: user_operations
    rules:
      user_name:
        template:
          # Using the saved value again
          format: "{{ store_read(key='user_names.' ~ prev.user_id) }}"
```
- PostgreSQL
- MySQL or MariaDB (TODO)
- pg_datanymizer CLI application manual.
- config.yml file specification.
- Full list of transformation rules.
- Integration testing manual.
Mac to Linux
```sh
rustup target add x86_64-unknown-linux-gnu
brew tap messense/macos-cross-toolchains
brew install x86_64-unknown-linux-gnu
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc cargo build --target x86_64-unknown-linux-gnu --release --features openssl/vendored
```