microsoft/documentdb

DocumentDB is the open-source engine powering vCore-based Azure Cosmos DB for MongoDB. It offers a native implementation of a document-oriented NoSQL database, enabling seamless CRUD operations on BSON data types within a PostgreSQL framework.

License

MIT license

Introduction

DocumentDB is the engine powering vCore-based Azure Cosmos DB for MongoDB. It offers a native implementation of a document-oriented NoSQL database, enabling seamless CRUD operations on BSON data types within a PostgreSQL framework. Beyond basic operations, DocumentDB lets you run complex workloads on your dataset, including full-text searches, geospatial queries, and vector embeddings.

PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads.

Components

The project comprises two primary components, which work together to support document operations.

pg_documentdb_core: PostgreSQL extension introducing BSON datatype support and operations for native Postgres.
pg_documentdb: The public API surface for DocumentDB, providing CRUD functionality on documents in the store.

Why DocumentDB?

At DocumentDB, we believe in the power of open-source to drive innovation and collaboration. Our commitment to being a fully open-source document database means that we are dedicated to transparency, community involvement, and continuous improvement. DocumentDB is open-sourced under the permissive MIT license, so developers and organizations alike can freely incorporate the project into new and existing solutions of their own. DocumentDB introduces the BSON data type and provides APIs that operate within native PostgreSQL, so document workloads can run on existing PostgreSQL infrastructure and tooling.

DocumentDB also provides a powerful on-premises solution, allowing organizations to maintain full control over their data and infrastructure. You can deploy it in your own environment to meet your specific security, compliance, and performance requirements. With DocumentDB, you get the best of both worlds: the innovation of open-source and the control of on-premises deployment.

Based on Postgres

DocumentDB is built on top of PostgreSQL, one of the most advanced and reliable open-source relational database systems available. We chose PostgreSQL as our base layer for several reasons:

Proven Stability and Performance: PostgreSQL has a long history of stability and performance, making it a trusted choice for mission-critical applications.
Extensibility: Its extensible architecture allows us to integrate a DocumentDB API on the BSON data type seamlessly, providing the flexibility to handle both relational and document data.
Active Community: PostgreSQL has a vibrant and active community that continuously contributes to its development, ensuring that it remains at the forefront of database technology.
Advanced Features: PostgreSQL offers a rich set of features, including advanced indexing, full-text search, and powerful querying capabilities, which enhance the functionality of DocumentDB.
Compliance and Security: PostgreSQL's robust security features and compliance with various standards make it an ideal choice for organizations with stringent security and regulatory requirements.

By building on PostgreSQL, DocumentDB leverages these strengths to provide a powerful, flexible, and reliable document database that meets the needs of modern applications. DocumentDB will continue to benefit from advancements in the PostgreSQL ecosystem.

Get Started

Prerequisite

Ensure Docker is installed on your system.

Building DocumentDB with Docker

Step 1: Clone the DocumentDB repo.

git clone https://github.com/microsoft/documentdb.git

Step 2: Create the Docker image. Navigate to the cloned repo first.

docker build . -f .devcontainer/Dockerfile -t documentdb

Note: Validate using docker image ls

Step 3: Run the image as a container

docker run -v $(pwd):/home/documentdb/code -it documentdb /bin/bash
cd code

(This mounts your local checkout into the container, so you do not need to clone the repo again inside the image.)
Note: Validate that the container is running using docker container ls

Step 4: Build & Deploy the binaries

make

Note: If the build is unsuccessful, run git config --global --add safe.directory /home/documentdb/code within the image.

sudo make install

Note: To run the backend PostgreSQL tests after installing, you can run make check.

You are all set to work with DocumentDB.

Using the Prebuilt Docker Image

You can use a prebuilt Docker image for DocumentDB instead of building it from source. Follow these steps:

Pull the Prebuilt Image

Pull the prebuilt image directly from the GitHub Container Registry (ghcr.io):

docker pull ghcr.io/microsoft/documentdb/documentdb-oss:PG16-amd64-0.105.0

Running the Prebuilt Image

To run the prebuilt image, use one of the following commands:

Run the container:

docker run -dt ghcr.io/microsoft/documentdb/documentdb-oss:PG16-amd64-0.105.0

If external access is required, run the container with parameter "-e":

docker run -p 127.0.0.1:9712:9712 -dt ghcr.io/microsoft/documentdb/documentdb-oss:PG16-amd64-0.105.0 -e

This will start the container and map port 9712 from the container to the host.

Connecting to the Server

Internal Access

Step 1: Run start_oss_server.sh to initialize the DocumentDB server and manage dependencies.

./scripts/start_oss_server.sh

Or, if you are using the prebuilt image, log into the container:

docker exec -it <container-id> bash

Step 2: Connect to the psql shell

psql -p 9712 -d postgres

External Access

Connect to the psql shell

psql -h localhost --port 9712 -d postgres -U documentdb

Usage

Once your DocumentDB setup is running, you can start creating collections and indexes and running queries on them.

Create a collection

DocumentDB provides the documentdb_api.create_collection function to create a new collection within a specified database, enabling you to manage and organize your BSON documents effectively.

SELECT documentdb_api.create_collection('documentdb','patient');

Perform CRUD operations

Insert documents

The documentdb_api.insert_one command is used to add a single document into a collection.

select documentdb_api.insert_one('documentdb','patient','{ "patient_id": "P001", "name": "Alice Smith", "age": 30, "phone_number": "555-0123", "registration_year": "2023","conditions": ["Diabetes", "Hypertension"]}');
select documentdb_api.insert_one('documentdb','patient','{ "patient_id": "P002", "name": "Bob Johnson", "age": 45, "phone_number": "555-0456", "registration_year": "2023", "conditions": ["Asthma"]}');
select documentdb_api.insert_one('documentdb','patient','{ "patient_id": "P003", "name": "Charlie Brown", "age": 29, "phone_number": "555-0789", "registration_year": "2024", "conditions": ["Allergy", "Anemia"]}');
select documentdb_api.insert_one('documentdb','patient','{ "patient_id": "P004", "name": "Diana Prince", "age": 40, "phone_number": "555-0987", "registration_year": "2024", "conditions": ["Migraine"]}');
select documentdb_api.insert_one('documentdb','patient','{ "patient_id": "P005", "name": "Edward Norton", "age": 55, "phone_number": "555-1111", "registration_year": "2025", "conditions": ["Hypertension", "Heart Disease"]}');
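When the documents come from application code rather than hand-written SQL, the JSON text has to be embedded safely in a SQL string literal. The following is a minimal Python sketch of that step, not part of DocumentDB itself: the helper name insert_one_sql and the patient P006 are hypothetical, and the only assumption about the API is the insert_one call shape shown above. It doubles single quotes, which is the standard SQL way to escape them inside a literal.

```python
import json


def insert_one_sql(database: str, collection: str, document: dict) -> str:
    """Render a documentdb_api.insert_one call for one Python dict.

    Hypothetical helper for illustration; database and collection names
    are inserted as-is and are assumed to contain no quotes.
    """
    # Serialize to JSON, then double single quotes so the JSON text is a
    # valid SQL string literal.
    payload = json.dumps(document).replace("'", "''")
    return (
        "SELECT documentdb_api.insert_one("
        f"'{database}','{collection}','{payload}');"
    )


# Hypothetical extra patient, not part of the sample data in this README.
patient = {"patient_id": "P006", "name": "Fiona O'Neil", "age": 37}
statement = insert_one_sql("documentdb", "patient", patient)
print(statement)
```

In a real application, passing the JSON text as a bound parameter through your PostgreSQL driver is usually preferable to building SQL strings by hand.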

Read document from a collection

The documentdb_api.collection function is used for retrieving the documents in a collection.

SELECT document FROM documentdb_api.collection('documentdb','patient');

Alternatively, we can apply a filter to our queries.

SET search_path TO documentdb_api, documentdb_core;
SET documentdb_core.bsonUseEJson TO true;
SELECT cursorPage FROM documentdb_api.find_cursor_first_page('documentdb','{ "find" : "patient", "filter" : {"patient_id":"P005"}}');

We can perform range queries as well.

SELECT cursorPage FROM documentdb_api.find_cursor_first_page('documentdb','{ "find" : "patient", "filter" : { "$and": [{ "age": { "$gte": 10 } },{ "age": { "$lte": 35 } }] }}');
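To see which documents the two filters above should select, here is a small local Python sketch that evaluates them against the sample ages. It is only an illustration of the filter semantics as they work in MongoDB-style queries, not DocumentDB's implementation; the matches function is hypothetical and supports only equality, $and, $gte and $lte. It assumes the five patients as originally inserted, before any update or delete.

```python
# Ages copied from the insert statements earlier in this README.
patients = [
    {"patient_id": "P001", "age": 30},
    {"patient_id": "P002", "age": 45},
    {"patient_id": "P003", "age": 29},
    {"patient_id": "P004", "age": 40},
    {"patient_id": "P005", "age": 55},
]


def matches(doc, flt):
    """Return True if doc satisfies the (very small) filter subset."""
    for key, cond in flt.items():
        if key == "$and":
            # Every sub-filter in the list must match.
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(key)
            for op, bound in cond.items():
                if op == "$gte":
                    ok = value is not None and value >= bound
                elif op == "$lte":
                    ok = value is not None and value <= bound
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif doc.get(key) != cond:
            # Plain value means an equality match.
            return False
    return True


range_filter = {"$and": [{"age": {"$gte": 10}}, {"age": {"$lte": 35}}]}
id_filter = {"patient_id": "P005"}

in_range = [p["patient_id"] for p in patients if matches(p, range_filter)]
by_id = [p["patient_id"] for p in patients if matches(p, id_filter)]
print(in_range)  # ['P001', 'P003']
print(by_id)     # ['P005']
```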

Update document in a collection

DocumentDB uses the documentdb_api.update function to modify existing documents within a collection.

The following SQL command updates the age for patient P004.

select documentdb_api.update('documentdb','{"update":"patient", "updates":[{"q":{"patient_id":"P004"},"u":{"$set":{"age":14}}}]}');

Similarly, we can update multiple documents using the multi property.

SELECT documentdb_api.update('documentdb','{"update":"patient", "updates":[{"q":{},"u":{"$set":{"age":24}},"multi":true}]}');
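The two update specifications above differ only in the query and the multi flag. As a rough sketch, a small Python helper can generate both. The helper name update_command is hypothetical, and it covers only the $set form used in this README.

```python
import json


def update_command(collection, query, changes, multi=False):
    """Build the JSON text passed as the second argument of update.

    Hypothetical helper: q is the match query, u the update document.
    The multi key is only emitted when requested, as in the examples.
    """
    statement = {"q": query, "u": {"$set": changes}}
    if multi:
        statement["multi"] = True
    return json.dumps({"update": collection, "updates": [statement]})


# Same specifications as the two SQL commands above.
single = update_command("patient", {"patient_id": "P004"}, {"age": 14})
everyone = update_command("patient", {}, {"age": 24}, multi=True)
print(single)
print(everyone)
```

Note that an empty query with multi set to true touches every document in the collection, so it is worth double-checking before running it against real data.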

Delete document from the collection

DocumentDB uses the documentdb_api.delete function for precise document removal based on specified criteria.

The following SQL command deletes the document for patient P002.

SELECT documentdb_api.delete('documentdb','{"delete": "patient", "deletes": [{"q": {"patient_id": "P002"}, "limit": 1}]}');

Collection management

We can review the available collections and databases by querying documentdb_api.list_collections_cursor_first_page.

SELECT * FROM documentdb_api.list_collections_cursor_first_page('documentdb','{ "listCollections": 1 }');

documentdb_api.list_indexes_cursor_first_page allows reviewing the existing indexes on a collection. We can find the collection_id from documentdb_api.list_collections_cursor_first_page.

SELECT documentdb_api.list_indexes_cursor_first_page('documentdb','{"listIndexes": "patient"}');

TTL indexes are scheduled by default through the pg_cron scheduler; the scheduled jobs can be reviewed by querying the cron.job table.

select * from cron.job;

Indexing

Create an Index

DocumentDB uses the documentdb_api.create_indexes_background function, which allows background index creation without disrupting database operations.

The following SQL command demonstrates how to create a single field index on age on the patient collection of the documentdb database.

SELECT * FROM documentdb_api.create_indexes_background('documentdb','{ "createIndexes": "patient", "indexes": [{ "key": {"age": 1},"name": "idx_age"}]}');

The following SQL command demonstrates how to create a compound index on the fields age and registration_year on the patient collection of the documentdb database.

SELECT * FROM documentdb_api.create_indexes_background('documentdb','{ "createIndexes": "patient", "indexes": [{ "key": {"registration_year": 1, "age": 1},"name": "idx_regyr_age"}]}');
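The index specifications above can also be generated programmatically. The Python sketch below is a hypothetical helper, not part of DocumentDB. When no name is given it falls back to the field_direction naming pattern familiar from MongoDB, which is an assumption here and not something this README specifies. Key order is preserved, which matters for compound indexes.

```python
import json


def create_indexes_command(collection, keys, name=None):
    """Build the JSON text for a single-index createIndexes request.

    keys maps field name to direction (1 ascending); insertion order
    is kept, so the first key is the leading field of the index.
    """
    if name is None:
        # Assumed fallback naming, e.g. "registration_year_1_age_1".
        name = "_".join(f"{field}_{direction}" for field, direction in keys.items())
    spec = {"createIndexes": collection, "indexes": [{"key": keys, "name": name}]}
    return json.dumps(spec)


single_field = create_indexes_command("patient", {"age": 1}, "idx_age")
compound = create_indexes_command("patient", {"registration_year": 1, "age": 1})
print(single_field)
print(compound)
```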

Drop an Index

DocumentDB uses the documentdb_api.drop_indexes function, which allows you to remove an existing index from a collection. The following SQL command demonstrates how to drop the index named idx_age from the patient collection of the documentdb database.

CALL documentdb_api.drop_indexes('documentdb','{"dropIndexes": "patient", "index":"idx_age"}');

Perform aggregations `Group by`

DocumentDB provides the documentdb_api.aggregate_cursor_first_page function for performing aggregations over the document store.

The following example counts the number of patients registered in each year.

SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb','{ "aggregate": "patient", "pipeline": [ { "$group": { "_id": "$registration_year", "count_patients": { "$count": {} } } } ] , "cursor": { "batchSize": 3 } }');
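As a sanity check on what the $group stage above computes, the same count can be reproduced locally in a few lines of Python. This is only an illustration of the grouping logic, assuming all five sample patients as originally inserted; if you ran the delete example, P002 is gone and the 2023 count drops to 1. The output order of a real $group stage is not guaranteed, so the sketch sorts by year.

```python
from collections import Counter

# registration_year per patient, copied from the insert statements.
registration_years = {
    "P001": "2023",
    "P002": "2023",
    "P003": "2024",
    "P004": "2024",
    "P005": "2025",
}

# Group by year and count the members of each group.
counts = Counter(registration_years.values())
result = [
    {"_id": year, "count_patients": n} for year, n in sorted(counts.items())
]
print(result)
```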

We can perform more complex operations; a few more usage examples are listed below. This example demonstrates an aggregation on patients, categorizing them into buckets defined by registration_year boundaries.

SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb','{ "aggregate": "patient", "pipeline": [ { "$bucket": { "groupBy": "$registration_year", "boundaries": ["2022","2023","2024"], "default": "unknown" } } ], "cursor": { "batchSize": 3 } }');
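The bucket boundaries can be surprising at first: in MongoDB-style $bucket semantics each bucket includes its lower boundary and excludes its upper one, and anything outside all buckets goes to the default. The Python sketch below illustrates that rule on the sample data. It assumes DocumentDB follows the same semantics and that all five patients are present; bucket_id is a hypothetical helper.

```python
from bisect import bisect_right
from collections import Counter


def bucket_id(value, boundaries, default):
    """Return the lower boundary of the bucket holding value.

    boundaries must be sorted; bucket i covers
    boundaries[i] <= value < boundaries[i + 1].
    """
    index = bisect_right(boundaries, value) - 1
    if 0 <= index < len(boundaries) - 1:
        return boundaries[index]
    return default


boundaries = ["2022", "2023", "2024"]
years = ["2023", "2023", "2024", "2024", "2025"]  # P001..P005

# Four-digit year strings compare correctly as plain strings.
buckets = Counter(bucket_id(y, boundaries, "unknown") for y in years)
print(dict(buckets))
```

With these boundaries, 2024 is the exclusive upper edge of the last bucket, so patients registered in 2024 and 2025 land in the "unknown" bucket.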

This query performs an aggregation on the patient collection to group documents by registration_year. It collects unique patient conditions for each registration year using the $addToSet operator.

SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb','{ "aggregate": "patient", "pipeline": [ { "$group": { "_id": "$registration_year", "conditions": { "$addToSet": { "conditions" : "$conditions" } } } } ], "cursor": { "batchSize": 3 } }');

Join data from multiple collections

Let's create an additional collection named appointment to demonstrate how a join operation can be performed.

select documentdb_api.insert_one('documentdb','appointment','{"appointment_id": "A001", "patient_id": "P001", "doctor_name": "Dr. Milind", "appointment_date": "2023-01-20", "reason": "Routine checkup" }');
select documentdb_api.insert_one('documentdb','appointment','{"appointment_id": "A002", "patient_id": "P001", "doctor_name": "Dr. Moore", "appointment_date": "2023-02-10", "reason": "Follow-up"}');
select documentdb_api.insert_one('documentdb','appointment','{"appointment_id": "A004", "patient_id": "P003", "doctor_name": "Dr. Smith", "appointment_date": "2024-03-12", "reason": "Allergy consultation"}');
select documentdb_api.insert_one('documentdb','appointment','{"appointment_id": "A005", "patient_id": "P004", "doctor_name": "Dr. Moore", "appointment_date": "2024-04-15", "reason": "Migraine treatment"}');
select documentdb_api.insert_one('documentdb','appointment','{"appointment_id": "A007","patient_id": "P001", "doctor_name": "Dr. Milind", "appointment_date": "2024-06-05", "reason": "Blood test"}');
select documentdb_api.insert_one('documentdb','appointment','{ "appointment_id": "A009", "patient_id": "P003", "doctor_name": "Dr. Smith","appointment_date": "2025-01-20", "reason": "Follow-up visit"}');

The following example presents each patient along with the doctors visited.

SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb','{ "aggregate": "patient", "pipeline": [ { "$lookup": { "from": "appointment","localField": "patient_id", "foreignField": "patient_id", "as": "appointment" } },{"$unwind":"$appointment"},{"$project":{"_id":0,"name":1,"appointment.doctor_name":1,"appointment.appointment_date":1}} ], "cursor": { "batchSize": 3 } }');
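The three pipeline stages above ($lookup, $unwind, $project) amount to an inner join followed by a column selection. The Python sketch below walks through the same steps on the sample data to show the expected shape of the result. It is a local illustration under MongoDB-style semantics, not DocumentDB's execution plan, and the row order of a real query may differ.

```python
patients = [
    {"patient_id": "P001", "name": "Alice Smith"},
    {"patient_id": "P002", "name": "Bob Johnson"},
    {"patient_id": "P003", "name": "Charlie Brown"},
    {"patient_id": "P004", "name": "Diana Prince"},
    {"patient_id": "P005", "name": "Edward Norton"},
]
appointments = [
    {"patient_id": "P001", "doctor_name": "Dr. Milind", "appointment_date": "2023-01-20"},
    {"patient_id": "P001", "doctor_name": "Dr. Moore", "appointment_date": "2023-02-10"},
    {"patient_id": "P003", "doctor_name": "Dr. Smith", "appointment_date": "2024-03-12"},
    {"patient_id": "P004", "doctor_name": "Dr. Moore", "appointment_date": "2024-04-15"},
    {"patient_id": "P001", "doctor_name": "Dr. Milind", "appointment_date": "2024-06-05"},
    {"patient_id": "P003", "doctor_name": "Dr. Smith", "appointment_date": "2025-01-20"},
]

rows = []
for patient in patients:
    # $lookup: gather appointments with the same patient_id.
    matched = [a for a in appointments if a["patient_id"] == patient["patient_id"]]
    # $unwind: one row per matched appointment; patients with no
    # appointments produce no rows at all.
    for appt in matched:
        # $project: keep only the name and two appointment fields.
        rows.append({
            "name": patient["name"],
            "appointment": {
                "doctor_name": appt["doctor_name"],
                "appointment_date": appt["appointment_date"],
            },
        })

for row in rows:
    print(row)
```

Bob Johnson and Edward Norton have no appointments, so they do not appear in the output. The query above requests a batchSize of 3, so its first page should carry at most three of the six rows.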

Community

Please refer to the contributing page to contribute to our Roadmap list.
FerretDB integration allows using DocumentDB as a backend engine.

Contributors and users can join the DocumentDB Discord channel in the Microsoft OSS server for quick collaboration.

FAQs

Q1. While running make check, what should you do if you encounter the error FATAL: "/home/documentdb/code/pg_documentdb_core/src/test/regress/tmp/data" has wrong ownership?

Please drop the /home/documentdb/code/pg_documentdb_core/src/test/regress/tmp/ directory and rerun make check.
