Rugged embedded and client/server key-value database (Python implementation)
alttch/yedb-py
Is it fast?
Fast to read, slow to write
Is it smart?
No
So what is YEDB for?
YEDB is ultra-reliable, thread-safe and very easy to use.
I don't like Python
There are other implementations
https://www.youtube.com/watch?v=i3hSWjrNqLo
YEDB is an absolutely reliable, rugged key-value database that can survive any power loss, unless the OS file system dies. Key data is saved in a very reliable way and immediately flushed to disk (this can be disabled to speed up the engine, but is not recommended, as it defeats the purpose of using YEDB).
YEDB database objects are absolutely thread-safe.
YEDB has built-in tools to automatically repair itself if any keys are broken.
If the tools fail to help, YEDB can be easily repaired by a system administrator using standard Linux tools.
YEDB can automatically validate keys with JSON Schema (https://json-schema.org/)
YEDB has a cool CLI
Practical usage:
Create a database and start writing continuously
Turn the power switch off
Boot the machine again. The typical result: the latest saved key has not survived, but the database is still valid and working. In 99% of cases, the latest key can be automatically restored with the built-in repair tools.
We created YEDB to use in our embedded products as config registry trees and rugged key-value data storage. We use it a lot and hope you'll like it too.
Note: YEDB works well on SSDs and SD cards. As it immediately syncs all written data, it can be really slow on classic HDDs.
Modern SSDs give about 200-300 keys/sec written with auto-flush enabled. The write speed can be 10-15 times faster without it, but we do not recommend turning auto-flush off, as it is the key feature of YEDB.
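The cost of auto-flush comes from forcing every write through to the storage device before continuing. As a rough illustration of that general technique (not YEDB's actual implementation; the function name `write_json_durably` is hypothetical), a durable write in Python could be sketched as:

```python
import json
import os
import tempfile


def write_json_durably(path, value):
    """Write value as JSON to path and sync it to disk before returning.

    Hypothetical helper illustrating flush-on-write; not part of YEDB.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so that a
    # power loss never leaves a half-written file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(value, f)
            f.flush()              # push Python buffers to the OS
            os.fsync(f.fileno())   # ask the OS to push data to the device
        os.replace(tmp_path, path)  # atomically move into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if os.name == 'posix':
        # Sync the directory entry as well, so the rename itself is durable.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Each `os.fsync` call waits for the device, which is why flushed writes are much slower than buffered ones, especially on classic HDDs.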
Reading speed varies:
- for embedded: 30-40k keys/second (70-100k keys/second when cached).
- for UNIX/TCP socket: 7-15k keys/second
- for HTTP: 700-800 keys/second. Transport via HTTP is slow mostly because the YEDB client uses the synchronous "requests" library (while the default server is async). To get better results, consider tuning the server manually and using a custom async client.
    # install YEDB
    pip3 install yedb
    # to use as embedded or client/server - go on. to use CLI - install additional
    # required libraries
    pip3 install "yedb[cli]"
    # create a new database and go interactive
    yedb /path/to/my/database
    # set a key
    yedb set key1 value1
    # get the key value
    yedb get key1

    # Install required system packages
    # Debian/Ubuntu: apt-get install -y --no-install-recommends python3 python3-dev gcc
    # RedHat/Fedora/CentOS: yum install -y python3 python3-devel gcc
    sudo mkdir /opt/yedbd
    cd /opt/yedbd && curl https://raw.githubusercontent.com/alttch/yedb-py/main/setup-server.sh | sudo sh
Use env to specify extra options:
- YEDBD_BIND - override the bind address (tcp://host:port, http://host:port or path to UNIX socket)
- YEDBD_SERVICE - system service name
- YEDB_PS - CLI prompt
- PIP_EXTRA_OPTIONS - specify pip extra options
- PYTHON - override Python path
- PIP - override pip path
    from yedb import YEDB

    with YEDB('/path/to/db', auto_repair=True) as db:
        pass  # do some stuff

    # OR

    db = YEDB('/path/to/db')
    db.open()
    try:
        pass  # do some stuff
    finally:
        db.close()
- If socket transport is requested, the built-in server requires the "msgpack" Python module
- If HTTP transport is requested, the built-in server requires the "aiohttp" Python module
    # listen to tcp://localhost:8870 (default), to bind UNIX socket, specify the
    # full socket path, to use http transport, specify http://host:port to bind to
    python3 -m yedb.server /path/to/db
- If socket transport is requested, the built-in client requires the "msgpack" Python module
- If HTTP transport is requested, the built-in client requires the "requests" Python module
    from yedb import YEDB

    with YEDB('tcp://localhost:8870') as db:
        pass  # do some stuff, remember to send all parameters as kwargs
YEDB creates thread-local objects. If the software uses permanent threads or a thread pool, it is recommended to use sessions to correctly drop these objects at the end of the statement:
    from yedb import YEDB

    with YEDB('tcp://localhost:8870') as db:
        with db.session() as session:
            # do some stuff, remember to send all parameters as kwargs
            session.key_set(key='key1', value='val1')
            print(session.key_get(key='key1'))
YEDB uses JSON RPC (https://www.jsonrpc.org/) as the API protocol. Any method listed in yedb.server.METHODS can be called. Payloads can be packed either with JSON or with MessagePack.
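As an illustration, a request payload could be built as below. This is a sketch only: it assumes the JSON RPC 2.0 request layout with named parameters and assumes that `key_get` (used in the session example) is among the methods listed in yedb.server.METHODS. The helper `make_request` is hypothetical.

```python
import json


def make_request(method, params, request_id=1):
    """Build a JSON RPC 2.0 request object with named parameters.

    Hypothetical helper, not part of the yedb package.
    """
    return {
        'jsonrpc': '2.0',
        'id': request_id,
        'method': method,
        'params': params,  # sent as a dict, i.e. as kwargs
    }


request = make_request('key_get', {'key': 'key1'})
# JSON encoding shown here; MessagePack would encode the same structure.
payload = json.dumps(request).encode()
```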
If working via UNIX or TCP socket:
only MessagePack payload encoding is supported
Request/response format: PROTO_VER + DATA_FMT + FRAME_LEN (32-bit little-endian) + frame
Where PROTO_VER = protocol version (0x01) and DATA_FMT = data encoding format (0x02 for MessagePack, which is the only format supported by the built-in server).
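The socket framing described above can be sketched with the standard `struct` module. The sketch assumes PROTO_VER and DATA_FMT occupy one byte each and FRAME_LEN is unsigned; the function names are hypothetical, and encoding the frame itself with MessagePack is left to the caller.

```python
import struct

PROTO_VER = 0x01
DATA_FMT_MSGPACK = 0x02
# "<" = little-endian, no padding: one byte, one byte, 32-bit unsigned int
HEADER = struct.Struct('<BBI')


def pack_frame(frame):
    """Prepend the 6-byte header to an already encoded frame (bytes)."""
    return HEADER.pack(PROTO_VER, DATA_FMT_MSGPACK, len(frame)) + frame


def unpack_frame(data):
    """Validate the header and return the frame bytes that follow it."""
    proto_ver, data_fmt, frame_len = HEADER.unpack_from(data)
    if proto_ver != PROTO_VER:
        raise ValueError('unsupported protocol version')
    if data_fmt != DATA_FMT_MSGPACK:
        raise ValueError('unsupported data format')
    frame = data[HEADER.size:HEADER.size + frame_len]
    if len(frame) != frame_len:
        raise ValueError('truncated frame')
    return frame
```

When reading from a real socket, the 6 header bytes would be read first, and then exactly FRAME_LEN more bytes.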
    from yedb import YEDB

    with YEDB('/path/to/db') as db:
        with db.key_as_dict('path/to/keydict') as key:
            key.set('field', 'value')
    # If modified, the key is automatically saved at the end of the statement.
The default engine data format is JSON (https://github.com/python-rapidjson/python-rapidjson is detected and imported automatically if present)
Other possible formats and their benefits:
- YAML - (requires manually installing the "pyyaml" Python module) slow, but key files are more human-readable and editable
- msgpack - (requires manually installing the "msgpack" Python module) fast, reliable binary serialization format. If used, keys can hold binary values as well.
- cbor - similar to msgpack (requires manually installing the "cbor" Python module)
- pickle - native Python pickle binary data serialization format. It is slower than msgpack/cbor, but keys can hold Python objects and functions as-is.
Databases can be easily converted between formats using the "yedb" CLI tool or the "convert_fmt" method, unless format-specific features are used (e.g. if keys have binary data, they can't be converted to JSON properly).
See https://github.com/alttch/yedb
As all keys are serialized values, they can be automatically schema-validated with JSON Schema (https://json-schema.org/).
To create a validation schema for a chosen key or key group, create a special key ".schema/path/to", which must contain a valid JSON Schema.
E.g. the schema stored in the key ".schema/groups/group1" is used to validate all keys in "groups/group1", including the group primary key, while the schema stored in ".schema/groups/group1/key1" is used to validate "groups/group1/key1" only (if a key or subgroup schema is present, the parent schemas are omitted).
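The lookup rule can be illustrated with a short sketch: the most specific schema key wins and parent schemas are ignored. This only models the rule as described here, assuming the search walks up one path component at a time; it is not YEDB's internal code, and `find_schema_key` is a hypothetical name.

```python
def find_schema_key(key, existing_keys):
    """Return the schema key that applies to `key`, or None.

    `existing_keys` is any container of key names present in the database.
    """
    parts = key.split('/')
    while parts:
        candidate = '.schema/' + '/'.join(parts)
        if candidate in existing_keys:
            return candidate  # the most specific match wins
        parts.pop()  # otherwise try the parent group
    return None


stored = {'.schema/groups/group1', '.schema/groups/group1/key1'}
own = find_schema_key('groups/group1/key1', stored)
inherited = find_schema_key('groups/group1/key2', stored)
```

Here `own` resolves to the key-specific schema, while `inherited` falls back to the group schema.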
YEDB also supports a non-standard schema:
    { "type": "code.python" }
which requires the key to have valid Python code, without syntax errors.
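A check of this kind can be approximated with the standard `ast` module. This is a sketch of the idea only, not necessarily how YEDB performs the validation; `is_valid_python` is a hypothetical name.

```python
import ast


def is_valid_python(source):
    """Return True if `source` parses as Python code without syntax errors."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers sources containing null bytes on older Pythons
        return False
    return True
```

Note that parsing only proves the code is syntactically valid; it does not execute it or check that the names it uses exist.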
If schema validation fails on set or on exit from a structure "with" statement, the exception yedb.SchemaValidationError is raised.
Full backup: simply backup the database directory with any preferred method.
Partial/server backup:
Use the "dump_keys" / "load_keys" methods. If the dump is created with the CLI (which requires the "msgpack" Python module), it has the format:
    DUMP_VER + DUMP_FMT
    KEY_LEN (32-bit little-endian) + KEY
    KEY_LEN (32-bit little-endian) + KEY
    ....
    KEY_LEN (32-bit little-endian) + KEY
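A dump with this layout could be split into records as sketched below. The sizes of DUMP_VER and DUMP_FMT are not stated here, so the sketch assumes one byte each; it also treats every KEY record as opaque bytes and leaves decoding to the caller. `split_dump` is a hypothetical name.

```python
import struct


def split_dump(data):
    """Split dump bytes into (dump_ver, dump_fmt, list of raw key records).

    Assumes DUMP_VER and DUMP_FMT are one byte each (an assumption).
    """
    if len(data) < 2:
        raise ValueError('dump header is missing')
    dump_ver, dump_fmt = data[0], data[1]
    records = []
    pos = 2
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError('truncated record length')
        (key_len,) = struct.unpack_from('<I', data, pos)  # 32-bit little-endian
        pos += 4
        record = data[pos:pos + key_len]
        if len(record) != key_len:
            raise ValueError('truncated record')
        records.append(record)
        pos += key_len
    return dump_ver, dump_fmt, records
```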
Start client/server with DEBUG=1 env variable:
DEBUG=1 yedb /path/to/db
To debug when embedded, enable debug logging:
    import yedb
    yedb.debug = True
Afterwards, lower the default logging level.
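A minimal sketch of lowering the level with the standard `logging` module, assuming YEDB's debug messages reach the root logger (an assumption, not confirmed here):

```python
import logging

# force=True (Python 3.8+) replaces any handlers configured earlier,
# so the DEBUG level takes effect even if logging was already set up.
logging.basicConfig(level=logging.DEBUG, force=True)
```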