
datanymizer


Powerful database anonymizer with flexible rules. Written in Rust.

Datanymizer is created and supported by Evrone. See what else we develop with Rust.

You can find more information in the articles in English and Russian.

How it works

Database -> Dumper (+Faker) -> Dump.sql

You can import or process the dump with a supported database without third-party importers.

Datanymizer generates a database-native dump.

Installation

There are several ways to install pg_datanymizer. Choose the option most convenient for you.

Pre-compiled binary

# Linux / macOS / Windows (MINGW, etc.). Installs into ./bin/ by default
$ curl -sSfL https://raw.githubusercontent.com/datanymizer/datanymizer/main/cli/pg_datanymizer/install.sh | sh -s
# Or the shorter way
$ curl -sSfL https://git.io/pg_datanymizer | sh -s
# Specify installation directory and version
$ curl -sSfL https://git.io/pg_datanymizer | sudo sh -s -- -b /usr/local/bin v0.2.0
# Alpine Linux (wget)
$ wget -q -O - https://git.io/pg_datanymizer | sh -s

Homebrew / Linuxbrew

# Installs the latest stable release
$ brew install datanymizer/tap/pg_datanymizer
# Builds the latest version from the repository
$ brew install --HEAD datanymizer/tap/pg_datanymizer

Docker

$ docker run --rm -v `pwd`:/app -w /app datanymizer/pg_datanymizer

Getting started with CLI dumper

First, inspect your database schema, choose fields with sensitive data, and create a config file based on it.

# config.yml
tables:
  - name: markets
    rules:
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 1
                max: 2
            - words:
                min: 1
                max: 2
  - name: franchisees
    rules:
      operator_mail:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
      operator_name:
        first_name: {}
      operator_phone:
        phone:
          format: +###########
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 2
                max: 3
            - words:
                min: 2
                max: 3
  - name: users
    rules:
      first_name:
        first_name: {}
      last_name:
        last_name: {}
  - name: customers
    rules:
      email:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
                uniq:
                  required: true
                  try_count: 5
      phone:
        phone:
          format: +7##########
          uniq: true
      city:
        city: {}
      age:
        random_num:
          min: 10
          max: 99
      first_name:
        first_name: {}
      last_name:
        last_name: {}
      birth_date:
        datetime:
          from: 1990-01-01T00:00:00+00:00
          to: 2010-12-31T00:00:00+00:00

Then create a dump from your database instance:

pg_datanymizer -f /tmp/dump.sql -c ./config.yml postgres://postgres:postgres@localhost/test_database

It creates a new dump file, /tmp/dump.sql, containing a native SQL dump for a PostgreSQL database. You can import the fake data from this dump into a new PostgreSQL database with the command:

psql -U postgres -d new_database < /tmp/dump.sql

The dumper can stream the dump to STDOUT, like pg_dump, so you can use it in other pipelines:

pg_datanymizer -c ./config.yml postgres://postgres:postgres@localhost/test_database > /tmp/dump.sql

Additional options

Tables filter

You can specify which tables to include in or exclude from the dump.

To dump data only from public.markets and public.users:

# config.yml
#...
filter:
  only:
    - public.markets
    - public.users

To ignore those tables and dump data from all others:

# config.yml
#...
filter:
  except:
    - public.markets
    - public.users

You can also specify data and schema filters separately.

This is equivalent to the previous example.

# config.yml
#...
filter:
  data:
    except:
      - public.markets
      - public.users

To skip both schema and data for all other tables:

# config.yml
#...
filter:
  schema:
    only:
      - public.markets
      - public.users

To skip the schema for the markets table and dump data only from the users table:

# config.yml
#...
filter:
  data:
    only:
      - public.users
  schema:
    except:
      - public.markets

You can use wildcards in the filter section:

  • ? matches exactly one occurrence of any character;
  • * matches arbitrarily many (including zero) occurrences of any character.
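
The wildcard semantics described above can be sketched in a few lines of Python. This is only an illustration of how `?` and `*` behave, not datanymizer's actual matcher; the function names `wildcard_to_regex` and `matches` are hypothetical, and matching against the full `schema.table` name is an assumption.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter pattern with `?` and `*` into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, table_name: str) -> bool:
    """Return True when the whole table name matches the pattern."""
    return wildcard_to_regex(pattern).fullmatch(table_name) is not None


print(matches("public.user?", "public.users"))  # True
print(matches("public.*", "public.markets"))    # True
print(matches("public.user?", "public.user"))   # False
```

Note that `.` in a pattern is treated literally here, so `public.*` does not match a table in another schema.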

Dump conditions and limit

You can specify a condition (an SQL WHERE clause) and a row limit for the dumped data of each table:

# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # select maximum 100 rows
      limit: 100

Transform conditions and limit

As an additional option, you can specify an SQL condition that defines which rows will be transformed (anonymized):

# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # preserve original values for some rows
      transform_condition: "NOT (first_name = 'John' AND last_name = 'Doe')"
      # select maximum 100 rows
      limit: 100

You can use the dump_condition, transform_condition and limit options in any combination (only transform_condition; transform_condition and limit; etc).
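
To make the interplay of the three options concrete, here is a minimal Python sketch over in-memory rows. In datanymizer the conditions are SQL evaluated by the database; the Python predicates, the `dump_rows` function and the placeholder transform below are hypothetical stand-ins, and applying the limit after the dump condition (as in `WHERE ... LIMIT`) is an assumption.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def dump_rows(
    rows: List[Row],
    transform: Callable[[Row], Row],
    dump_condition: Optional[Callable[[Row], bool]] = None,
    transform_condition: Optional[Callable[[Row], bool]] = None,
    limit: Optional[int] = None,
) -> List[Row]:
    # 1. dump_condition decides which rows reach the dump at all.
    selected = [r for r in rows if dump_condition is None or dump_condition(r)]
    # 2. limit caps the number of dumped rows (assumed to apply after filtering).
    if limit is not None:
        selected = selected[:limit]
    # 3. transform_condition decides which dumped rows are anonymized;
    #    the remaining rows keep their original values.
    return [
        transform(r) if transform_condition is None or transform_condition(r) else r
        for r in selected
    ]


people = [
    {"first_name": "John", "last_name": "Doe"},
    {"first_name": "Ann", "last_name": "Sensitive"},
    {"first_name": "Bob", "last_name": "Smith"},
]

result = dump_rows(
    people,
    transform=lambda r: {**r, "first_name": "XXX", "last_name": "YYY"},
    dump_condition=lambda r: r["last_name"] != "Sensitive",
    transform_condition=lambda r: not (
        r["first_name"] == "John" and r["last_name"] == "Doe"
    ),
    limit=100,
)
print(result)
```

With this input, the "Sensitive" row is dropped, John Doe is dumped unchanged, and Bob Smith is dumped with placeholder values.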

Global variables

You can specify global variables that are available in any template rule.

# config.yml
tables:
  - name: users
    rules:
      bio:
        template:
          format: "User bio is {{var_a}}"
      age:
        template:
          format: "{{_0 | float * global_multiplicator}}"
#...
globals:
  var_a: Global variable 1
  global_multiplicator: 6
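
As a rough illustration of how a global ends up in a rendered value, the Python sketch below substitutes plain `{{ name }}` placeholders from a dictionary. It is not datanymizer's template engine: the `render` helper is hypothetical, it does not support filters or expressions, and treating `_0` as the value fed into the template is an assumption.

```python
import re

# Matches simple placeholders such as {{var_a}} or {{ var_a }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(fmt: str, context: dict) -> str:
    """Replace each `{{ name }}` with str(context[name]); no filters or expressions."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), fmt)


# Mirrors the `globals` section of the config.
globals_section = {"var_a": "Global variable 1", "global_multiplicator": 6}

bio = render("User bio is {{var_a}}", globals_section)
print(bio)  # User bio is Global variable 1

# The `age` template multiplies the incoming value by a global; done by hand:
original_age = "7"  # hypothetical input value
age = float(original_age) * globals_section["global_multiplicator"]
print(age)  # 42.0
```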

Available rules

Rule          Description
email         Emails with different options
ip            IP addresses. Supports IPv4 and IPv6
words         Lorem words with different length
first_name    First name generator
last_name     Last name generator
city          City names generator
phone         Generates random phone numbers with a configurable format
pipeline      Use pipeline to generate more complicated values
capitalize    Like a filter, it capitalizes the input value
template      Template engine for generating random text with included rules
digit         Random digit (in range 0..9)
random_num    Random number with min and max options
password      Passwords of varying length (supports max and min options)
datetime      Makes DateTime strings with options (from and to)
...           more than 70 rules in total

For the complete list of rules, please refer to this document.

Uniqueness

You can specify that result values must be unique (they are not unique by default). You can use short or full syntax.

Short:

uniq: true

Full:

uniq:
  required: true
  try_count: 5

Uniqueness is ensured by regenerating a value whenever it collides with one already produced. You can customize the number of attempts with try_count (this is an optional field; the default number of tries depends on the rule).

Currently, uniqueness is supported by: email, ip, phone, random_num.
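
The regenerate-on-collision idea can be sketched as follows. This is a simplified Python model, not datanymizer's code: `generate_unique` and `UniquenessError` are hypothetical names, and both the reading of `try_count` as the total number of attempts and the choice to raise an error once attempts run out are assumptions.

```python
from typing import Callable, Set, TypeVar

T = TypeVar("T")


class UniquenessError(Exception):
    """Raised when no unseen value was produced within the allowed attempts."""


def generate_unique(generate: Callable[[], T], seen: Set[T], try_count: int) -> T:
    """Call `generate` up to `try_count` times until it yields an unseen value."""
    for _ in range(try_count):
        value = generate()
        if value not in seen:
            seen.add(value)
            return value
    raise UniquenessError(f"no unique value after {try_count} attempts")


# Deterministic stand-in for a random rule such as random_num.
values = iter([1, 1, 2, 2, 2, 2, 2, 2])
seen: Set[int] = set()

first = generate_unique(lambda: next(values), seen, try_count=5)
second = generate_unique(lambda: next(values), seen, try_count=5)
print(first, second)  # 1 2
```

A rule with a small value space (for example a narrow min/max range) can run out of unique values quickly, which is why the number of attempts is bounded.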

Locales

You can specify the locale for individual rules:

first_name:
  locale: RU

The default locale is EN, but you can specify a different default locale:

tables:
  # ........
default:
  locale: RU

We also support ZH_TW (Traditional Chinese) and RU (translation in progress).

Referencing row values from templates

You can reference values of other row fields in templates. Use prev for original values and final for anonymized ones:

tables:
  - name: some_table
    # You must specify the order of rule execution when using `final`
    rule_order:
      - greeting
      - options
    rules:
      first_name:
        first_name: {}
      greeting:
        template:
          # Keeping the first name, but anonymizing the last name
          format: "Hello, {{ prev.first_name }} {{ final.last_name }}!"
      options:
        template:
          # Using the anonymized value again
          format: "{greeting:\"{{ final.greeting }}\"}"

You must specify the order of rule execution with rule_order when using final. All rules not listed are placed at the beginning (i.e. you only need to list the rules that use final).
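
The ordering described above can be expressed as a small Python function. This is a sketch of the stated behaviour, not the project's implementation: `execution_order` is a hypothetical name, and keeping unlisted rules in their declaration order is an assumption.

```python
from typing import List


def execution_order(rule_names: List[str], rule_order: List[str]) -> List[str]:
    """Unlisted rules run first, followed by the rules named in rule_order."""
    listed = set(rule_order)
    unlisted = [name for name in rule_names if name not in listed]
    return unlisted + list(rule_order)


order = execution_order(
    ["greeting", "options", "first_name"],  # rules defined for the table
    ["greeting", "options"],                # rule_order from the config
)
print(order)  # ['first_name', 'greeting', 'options']
```

Because `first_name` is not listed, it runs before `greeting`, and `greeting` runs before `options`, so `final.greeting` is available when `options` is rendered.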

Sharing information between rows

We implemented a built-in key-value store that allows information to be exchanged between anonymized rows.

It is available via the special functions in templates.

Take a look at an example:

tables:
  - name: users
    rules:
      name:
        template:
          # Save a name to the store as a side effect, the key is `user_names.<USER_ID>`
          format: "{{ _1 }}{{ store_write(key='user_names.' ~ prev.id, value=_1) }}"
          rules:
            - person_name: {}
  - name: user_operations
    rules:
      user_name:
        template:
          # Using the saved value again
          format: "{{ store_read(key='user_names.' ~ prev.user_id) }}"
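
The flow in this example can be modelled with a plain dictionary. The Python sketch below is only an analogy for the built-in store: the sample rows and fake names are made up, `store_write` returning an empty string (so it adds nothing to the rendered text) is an assumption, and so is the users table being processed before user_operations.

```python
store = {}


def store_write(key: str, value: str) -> str:
    """Save a value; return an empty string so nothing is added to the output."""
    store[key] = value
    return ""


def store_read(key: str) -> str:
    """Return the value previously saved under `key`."""
    return store[key]


users = [{"id": 1}, {"id": 2}]
user_operations = [{"user_id": 2}, {"user_id": 1}, {"user_id": 2}]
fake_names = iter(["Alice Smith", "Bob Jones"])  # stand-in for person_name

# Anonymize users and remember each generated name under `user_names.<id>`.
for user in users:
    name = next(fake_names)
    user["name"] = name + store_write(f"user_names.{user['id']}", name)

# Later rows look the same names up by their foreign key.
for op in user_operations:
    op["user_name"] = store_read(f"user_names.{op['user_id']}")

print(user_operations)
```

Every operation row ends up with the same fake name as the user it refers to, which keeps the anonymized tables consistent with each other.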

Supported databases

  • PostgreSQL
  • MySQL or MariaDB (TODO)

Documentation

Sponsors

Sponsored by Evrone

License

MIT

Development

Cross compilation

Mac to Linux

rustup target add x86_64-unknown-linux-gnu
brew tap messense/macos-cross-toolchains
brew install x86_64-unknown-linux-gnu
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc cargo build --target x86_64-unknown-linux-gnu --release --features openssl/vendored
