A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.[1]
Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the term "document-oriented database" has grown[2] with the use of the term NoSQL itself. XML databases are a subclass of document-oriented databases that are optimized to work with XML documents. Graph databases are similar, but add another layer, the relationship, which allows them to link documents for rapid traversal.
Document-oriented databases are inherently a subclass of the key-value store, another NoSQL database concept. The difference lies in the way the data is processed; in a key-value store, the data is considered to be inherently opaque to the database, whereas a document-oriented system relies on internal structure in the document in order to extract metadata that the database engine uses for further optimization. Although the difference is often negligible due to tools in the systems,[a] conceptually the document store is designed to offer a richer experience with modern programming techniques.
Document databases[b] contrast strongly with the traditional relational database (RDB). Relational databases generally store data in separate tables that are defined by the programmer, and a single object may be spread across several tables. Document databases store all information for a given object in a single instance in the database, and every stored object can be different from every other. This eliminates the need for object-relational mapping while loading data into the database.
The central concept of a document-oriented database is the notion of a document. While each document-oriented database implementation differs on the details of this definition, in general, they all assume documents encapsulate and encode data (or information) in some standard format or encoding.[3][4] Encodings in use include XML, YAML, and JSON, as well as binary forms like BSON.[5]
Documents in a document store are roughly equivalent to the programming concept of an object. They are not required to adhere to a standard schema, nor will they have all the same sections, slots, parts or keys. Generally, programs using objects have many different types of objects, and those objects often have many optional fields. Every object, even those of the same class, can look very different. Document stores are similar in that they allow different types of documents in a single store, allow the fields within them to be optional, and often allow them to be encoded using different encoding systems. For example, the following is a document, encoded in JSON:
    {
        "firstName": "Bob",
        "lastName": "Smith",
        "address": {
            "type": "Home",
            "street1": "5 Oak St.",
            "city": "Boys",
            "state": "AR",
            "zip": "32225",
            "country": "US"
        },
        "hobby": "sailing",
        "phone": {
            "type": "Cell",
            "number": "(555)-123-4567"
        }
    }
A second document might be encoded in XML as:
    <contact>
        <firstname>Bob</firstname>
        <lastname>Smith</lastname>
        <phone type="Cell">(123) 555-0178</phone>
        <phone type="Work">(890) 555-0133</phone>
        <address>
            <type>Home</type>
            <street1>123 Back St.</street1>
            <city>Boys</city>
            <state>AR</state>
            <zip>32225</zip>
            <country>US</country>
        </address>
    </contact>
These two documents share some structural elements with one another, but each also has unique elements. The structure and text and other data inside the document are usually referred to as the document's content and may be referenced via retrieval or editing methods (see below). Unlike a relational database, where every record contains the same fields and unused fields are left empty, there are no empty 'fields' in either document (record) in the above example. This approach allows new information to be added to some records without requiring that every other record in the database share the same structure.
Document databases typically provide for additional metadata to be associated with and stored along with the document content. That metadata may be related to facilities the datastore provides for organizing documents, providing security, or other implementation-specific features.
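As a minimal sketch of this idea, the snippet below keeps two differently shaped contact documents in one plain Python list and compares their field names. The second contact ("Alice Jones" and her email address) is a hypothetical record invented for illustration; no particular database product is assumed.

```python
# Two contact documents with different shapes, kept in the same collection.
# The first mirrors part of the JSON example; the second is hypothetical.
contacts = [
    {
        "firstName": "Bob",
        "lastName": "Smith",
        "hobby": "sailing",
        "phone": {"type": "Cell", "number": "(555)-123-4567"},
    },
    {
        "firstName": "Alice",
        "lastName": "Jones",
        "email": "alice@example.com",
    },
]


def field_names(doc):
    """Return the set of top-level field names present in a document."""
    return set(doc)


first, second = (field_names(doc) for doc in contacts)

shared = first & second        # fields both documents carry
only_first = first - second    # fields unique to the first document
only_second = second - first   # fields unique to the second document

print(sorted(shared), sorted(only_first), sorted(only_second))
```

Neither document stores a placeholder for the fields it lacks; a field that is absent is simply not there.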
The core operations that a document-oriented database supports for documents are similar to other databases, and while the terminology is not perfectly standardized, most practitioners will recognize them as CRUD: creation (or insertion), retrieval (or query, search, read or find), update (or edit), and deletion (or removal).
Documents are addressed in the database via a unique key that represents that document. This key is a simple identifier (or ID), typically a string, a URI, or a path. The key can be used to retrieve the document from the database. Typically the database retains an index on the key to speed up document retrieval, and in some cases the key is required to create or insert the document into the database.
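Key-based addressing can be sketched with a toy in-memory store in which a Python dictionary plays the role of the key index. The class name `DocumentStore`, its method names, and the path-style key `/contacts/bob-smith` are assumptions made for this illustration, not the API of any real product.

```python
import uuid


class DocumentStore:
    """Toy in-memory store: a dict acts as the index from key to document."""

    def __init__(self):
        self._docs = {}

    def create(self, doc, key=None):
        """Insert a document; generate a key when the caller supplies none."""
        if key is None:
            key = str(uuid.uuid4())
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = dict(doc)
        return key

    def read(self, key):
        """Retrieve a document by its key."""
        return self._docs[key]

    def update(self, key, doc):
        """Replace the document stored under an existing key."""
        if key not in self._docs:
            raise KeyError(key)
        self._docs[key] = dict(doc)

    def delete(self, key):
        """Remove the document stored under a key."""
        del self._docs[key]


store = DocumentStore()
key = store.create({"firstName": "Bob", "lastName": "Smith"},
                   key="/contacts/bob-smith")
store.update(key, {"firstName": "Bob", "lastName": "Smith", "hobby": "sailing"})
bob = store.read(key)
generated = store.create({"title": "untitled"})  # key generated by the store
store.delete(generated)
```

A real system would persist the documents and keep the key index on disk; the dictionary here only stands in for that lookup structure.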
Another defining characteristic of a document-oriented database is that, beyond the simple key-to-document lookup that can be used to retrieve a document, the database offers an API or query language that allows the user to retrieve documents based on content (or metadata).[3] For example, you may want a query that retrieves all the documents with a certain field set to a certain value. The set of query APIs or query language features available, as well as the expected performance of the queries, varies significantly from one implementation to another. Likewise, the specific set of indexing options and configuration that is available varies greatly by implementation.
It is here that the document store varies most from the key-value store. In theory, the values in a key-value store are opaque to the store; they are essentially black boxes. Key-value stores may offer search systems similar to those of a document store, but may have less understanding of the organization of the content. Document stores use the metadata in the document to classify the content, allowing them, for instance, to understand that one series of digits is a phone number, and another is a postal code. This allows them to search on those types of data, for instance, all phone numbers containing 555, which would ignore the zip code 55555.
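The phone-number example can be sketched as follows. The helper names `get_path`, `find_containing` and `opaque_search`, the dotted-path convention, and the sample documents are all assumptions for illustration; real document stores expose their own query languages.

```python
import json

# Hypothetical sample data: c1 has a 555 phone number, c2 has zip code 55555.
docs = {
    "c1": {
        "lastName": "Smith",
        "phone": {"type": "Cell", "number": "(555)-123-4567"},
        "address": {"zip": "32225"},
    },
    "c2": {
        "lastName": "Jones",
        "phone": {"type": "Work", "number": "(890)-123-0133"},
        "address": {"zip": "55555"},
    },
}


def get_path(doc, path):
    """Follow a dotted path such as 'phone.number'; None if a step is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find_containing(docs, path, text):
    """Structure-aware search: look only inside the named field."""
    matches = []
    for key, doc in docs.items():
        value = get_path(doc, path)
        if isinstance(value, str) and text in value:
            matches.append(key)
    return sorted(matches)


def opaque_search(docs, text):
    """Search the serialized value as a whole, with no notion of fields."""
    return sorted(key for key, doc in docs.items() if text in json.dumps(doc))


field_aware = find_containing(docs, "phone.number", "555")
opaque = opaque_search(docs, "555")
print(field_aware, opaque)
```

The field-aware query returns only `c1`, while the search over the serialized value also returns `c2` because of its zip code.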
Document databases typically provide some mechanism for updating or editing the content (or metadata) of a document, either by allowing for replacement of the entire document, or individual structural pieces of the document.
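The two update styles can be sketched on a plain dictionary of documents. The function names `replace_document` and `set_field` and the dotted-path convention are assumptions for this illustration rather than the interface of a specific database.

```python
import copy


def replace_document(store, key, new_doc):
    """Whole-document update: overwrite everything stored under the key."""
    store[key] = copy.deepcopy(new_doc)


def set_field(store, key, path, value):
    """Partial update: change one nested field, creating parents as needed."""
    node = store[key]
    parts = path.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


store = {
    "bob": {
        "firstName": "Bob",
        "phone": {"type": "Cell", "number": "(555)-123-4567"},
    }
}

# Edit a single structural piece; the rest of the document is untouched.
set_field(store, "bob", "phone.type", "Work")
set_field(store, "bob", "address.city", "Boys")
after_partial = copy.deepcopy(store["bob"])

# Replace the entire document; fields not in the new version disappear.
replace_document(store, "bob", {"firstName": "Robert"})
```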
Document database implementations offer a variety of ways of organizing documents, including notions of collections, tags, non-visible metadata, and directory hierarchies.
These organizational notions sometimes vary in how far they are logical rather than physical (e.g. on disk or in memory) representations.
A document-oriented database is a specialized key-value store, which itself is another NoSQL database category. In a simple key-value store, the document content is opaque. A document-oriented database provides APIs or a query/update language that exposes the ability to query or update based on the internal structure in the document.[4] This difference may be minor for users that do not need the richer query, retrieval, or editing APIs that are typically provided by document databases. Modern key-value stores often include features for working with metadata, blurring the line between them and document stores.
Some search engine (aka information retrieval) systems like Apache Solr and Elasticsearch provide enough of the core operations on documents to fit the definition of a document-oriented database.
In a relational database, data is first categorized into a number of predefined types, and tables are created to hold individual entries, or records, of each type. The tables define the data within each record's fields, meaning that every record in the table has the same overall form. The administrator also defines the relationships between the tables, selects certain fields that they believe will be most commonly used for searching, and defines indexes on them. A key concept in the relational design is that any data that may be repeated is normally placed in its own table, and if these instances are related to each other, a column is selected to group them together, the foreign key. This design is known as database normalization.[6]
For example, an address book application will generally need to store the contact name, an optional image, one or more phone numbers, one or more mailing addresses, and one or more email addresses. In a canonical relational database, tables would be created for each of these items with predefined fields for each bit of data: the CONTACT table might include FIRST_NAME, LAST_NAME and IMAGE columns, while the PHONE_NUMBER table might include COUNTRY_CODE, AREA_CODE, PHONE_NUMBER and TYPE (home, work, etc.). The PHONE_NUMBER table also contains a foreign key column, "CONTACT_ID", which holds the unique ID number assigned to the contact when it was created. In order to recreate the original contact, the database engine uses the foreign keys to look for the related items across the group of tables and reconstruct the original data.
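The relational layout described above can be sketched with Python's built-in `sqlite3` module. The table and column names follow the text; the `PHONE_ID` primary key, the column types, and the sample phone numbers are assumptions added so that the example runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CONTACT (
    CONTACT_ID INTEGER PRIMARY KEY,
    FIRST_NAME TEXT NOT NULL,
    LAST_NAME  TEXT NOT NULL,
    IMAGE      BLOB
);
CREATE TABLE PHONE_NUMBER (
    PHONE_ID     INTEGER PRIMARY KEY,  -- assumed surrogate key
    CONTACT_ID   INTEGER NOT NULL REFERENCES CONTACT(CONTACT_ID),
    COUNTRY_CODE TEXT,
    AREA_CODE    TEXT,
    PHONE_NUMBER TEXT,
    TYPE         TEXT
);
""")

# One contact row, plus one row per phone number pointing back at it.
cur = conn.execute(
    "INSERT INTO CONTACT (FIRST_NAME, LAST_NAME) VALUES (?, ?)",
    ("Bob", "Smith"),
)
contact_id = cur.lastrowid
conn.executemany(
    "INSERT INTO PHONE_NUMBER "
    "(CONTACT_ID, COUNTRY_CODE, AREA_CODE, PHONE_NUMBER, TYPE) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (contact_id, "1", "123", "555-0178", "Cell"),
        (contact_id, "1", "890", "555-0133", "Work"),
    ],
)

# Reassembling the contact requires a join across the foreign key.
rows = conn.execute(
    """
    SELECT c.FIRST_NAME, c.LAST_NAME, p.AREA_CODE, p.PHONE_NUMBER, p.TYPE
    FROM CONTACT AS c
    JOIN PHONE_NUMBER AS p ON p.CONTACT_ID = c.CONTACT_ID
    WHERE c.CONTACT_ID = ?
    ORDER BY p.PHONE_ID
    """,
    (contact_id,),
).fetchall()
```

Mailing addresses and email addresses would each need a further table and a further join in the same style.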
In contrast, in a document-oriented database there may be no internal structure that maps directly onto the concept of a table, and the fields and relationships generally don't exist as predefined concepts. Instead, all of the data for an object is placed in a single document, and stored in the database as a single entry. In the address book example, the document would contain the contact's name, image, and any contact info, all in a single record. That entry is accessed through its key, which allows the database to retrieve and return the document to the application. No additional work is needed to retrieve the related data; all of this is returned in a single object.
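By way of contrast, a rough sketch of the document approach stores the whole contact as one JSON value under one key. The dictionary `store`, the key `contact:1`, the field names, and the sample email address are hypothetical and chosen only for illustration.

```python
import json

store = {}  # key -> serialized document

contact = {
    "firstName": "Bob",
    "lastName": "Smith",
    "image": None,
    "phones": [
        {"type": "Cell", "number": "(123) 555-0178"},
        {"type": "Work", "number": "(890) 555-0133"},
    ],
    "emails": ["bob@example.com"],
}

# The entire object is written as a single entry ...
store["contact:1"] = json.dumps(contact)

# ... and a single key lookup returns everything, with no joins.
loaded = json.loads(store["contact:1"])
```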
A key difference between the document-oriented and relational models is that the data formats are not predefined in the document case. In most cases, any sort of document can be stored in any database, and those documents can change in type and form at any time. If one wishes to add a COUNTRY_FLAG to a CONTACT, this field can be added to new documents as they are inserted; this will have no effect on the database or the existing documents already stored. To aid retrieval of information from the database, document-oriented systems generally allow the administrator to provide hints to the database to look for certain types of information. These work in a similar fashion to indexes in the relational case. Most also offer the ability to add additional metadata outside of the content of the document itself, for instance, tagging entries as being part of an address book, which allows the programmer to retrieve related types of information, like "all the address book entries". This provides functionality similar to a table, but separates the concept (categories of data) from its physical implementation (tables).
In the classic normalized relational model, objects in the database are represented as separate rows of data with no inherent structure beyond that given to them as they are retrieved. This leads to problems when trying to translate programming objects to and from their associated database rows, a problem known as object-relational impedance mismatch.[7] Document stores more closely, or in some cases directly, map programming objects into the store. These are often marketed using the term NoSQL.
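A small sketch can illustrate schema-free inserts, index hints and tags together. The class `TaggedStore`, its method names, and the sample documents are hypothetical; the "hint" is modelled here simply as a list of field names for which the store maintains a lookup table.

```python
from collections import defaultdict


class TaggedStore:
    """Toy store with tags (metadata outside the content) and index hints."""

    def __init__(self, indexed_fields=()):
        self.docs = {}
        self.tags = defaultdict(set)    # tag -> keys of tagged documents
        self.indexed_fields = tuple(indexed_fields)
        self.index = defaultdict(set)   # (field, value) -> keys

    def insert(self, key, doc, tags=()):
        self.docs[key] = doc
        for tag in tags:
            self.tags[tag].add(key)
        # Only hinted fields are indexed, and only when the document has them.
        for field in self.indexed_fields:
            if field in doc:
                self.index[(field, doc[field])].add(key)

    def with_tag(self, tag):
        return sorted(self.tags.get(tag, ()))

    def lookup(self, field, value):
        return sorted(self.index.get((field, value), ()))


store = TaggedStore(indexed_fields=["LAST_NAME"])
store.insert("c1", {"FIRST_NAME": "Bob", "LAST_NAME": "Smith"},
             tags=["address-book"])
# A newer contact carries COUNTRY_FLAG; the older document is not altered.
store.insert("c2", {"FIRST_NAME": "Ann", "LAST_NAME": "Smith",
                    "COUNTRY_FLAG": "US"}, tags=["address-book"])
store.insert("n1", {"TITLE": "Shopping list"}, tags=["notes"])

address_book = store.with_tag("address-book")
smiths = store.lookup("LAST_NAME", "Smith")
```

The tag lookup behaves like selecting from a table, yet the documents it returns need not share a structure.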
| Name | Publisher | License | Languages supported | Notes | RESTful API |
|---|---|---|---|---|---|
| Aerospike | Aerospike | AGPL and Proprietary | C, C#, Java, Scala, Python, Node.js, PHP, Go, Rust, Spring Framework | Aerospike is a flash-optimized and in-memory distributed key value NoSQL database which also supports a document store model.[8] | Yes[9] |
| AllegroGraph | Franz, Inc. | Proprietary | Java, Python, Common Lisp, Ruby, Scala, C#, Perl | The database platform supports document store and graph data models in a single database. Supports JSON, JSON-LD, RDF, full-text search, ACID, two-phase commit, Multi-Master Replication, Prolog and SPARQL. | Yes[10] |
| ArangoDB | ArangoDB | Business Source Licence | C, C#, Java, Python, Node.js, PHP, Scala, Go, Ruby, Elixir | The database system supports document store as well as key/value and graph data models with one database core and a unified query language AQL (ArangoDB Query Language). | Yes[11] |
| BaseX | BaseX Team | BSD License | Java, XQuery | Support for XML, JSON and binary formats; client-/server based architecture; concurrent structural and full-text searches and updates. | Yes |
| Caché | InterSystems Corporation | Proprietary | Java, C#, Node.js | Commonly used in Health, Business and Government applications. | Yes |
| Cloudant | Cloudant, Inc. | Proprietary | Erlang, Java, Scala, and C | Distributed database service based on BigCouch, the company's open source fork of the Apache-backed CouchDB project. Uses JSON model. | Yes |
| Clusterpoint Database | Clusterpoint Ltd. | Proprietary with free download | JavaScript, SQL, PHP, C#, Java, Python, Node.js, C, C++ | Distributed document-oriented XML / JSON database platform with ACID-compliant transactions; high-availability data replication and sharding; built-in full-text search engine with relevance ranking; JS/SQL query language; GIS; available as pay-per-use database as a service or as an on-premise free software download. | Yes |
| Couchbase Server | Couchbase, Inc. | Business Source Licence | C, C#, Java, Python, Node.js, PHP, SQL, Go, Spring Framework, LINQ | Distributed NoSQL Document Database, JSON model and SQL based Query Language. | Yes[12] |
| CouchDB | Apache Software Foundation | Apache License | Any language that can make HTTP requests | JSON over REST/HTTP with Multi-Version Concurrency Control and limited ACID properties. Uses map and reduce for views and queries.[13] | Yes[14] |
| CrateDB | Crate.io, Inc. | Apache License | Java | Use familiar SQL syntax for real time distributed queries across a cluster. Based on Lucene / Elasticsearch ecosystem with built-in support for binary objects (BLOBs). | Yes[15] |
| Cosmos DB | Microsoft | Proprietary | C#, Java, Python, Node.js, JavaScript, SQL | Platform-as-a-Service offering, part of the Microsoft Azure platform. Builds upon and extends the earlier Azure DocumentDB. | Yes |
| DocumentDB | Amazon Web Services | Proprietary online service | Various, REST | Fully managed MongoDB v3.6-compatible database service | Yes |
| DynamoDB | Amazon Web Services | Proprietary | Java, JavaScript, Node.js, Go, C# .NET, Perl, PHP, Python, Ruby, Rust, Haskell, Erlang, Django, and Grails | Fully managed proprietary NoSQL database service that supports key–value and document data structures | Yes |
| Elasticsearch | Shay Banon | Dual-licensed under Server Side Public License and Elastic license. | Java | JSON, Search engine. | Yes |
| eXist | eXist | LGPL | XQuery, Java | XML over REST/HTTP, WebDAV, Lucene Fulltext search, binary data support, validation, versioning, clustering, triggers, URL rewriting, collections, ACLS, XQuery Update | Yes[16] |
| Informix | IBM | Proprietary, with no-cost editions[17] | Various (Compatible with MongoDB API) | RDBMS with JSON, replication, sharding and ACID compliance. | Yes |
| Jackrabbit | Apache Foundation | Apache License | Java | Java Content Repository implementation | ? |
| HCL Notes (HCL Domino) | HCL | Proprietary | LotusScript, Java, Notes Formula Language | MultiValue | Yes |
| MarkLogic | MarkLogic Corporation | Proprietary with free developer download | Java, JavaScript, Node.js, XQuery, SPARQL, XSLT, C++ | Distributed document-oriented database for JSON, XML, and RDF triples. Built-in full-text search, ACID transactions, high availability and disaster recovery, certified security. | Yes |
| MongoDB | MongoDB, Inc | Server Side Public License for the DBMS, Apache 2 License for the client drivers[18] | C, C++, C#, Java, Perl, PHP, Python, Go, Node.js, Ruby, Rust,[19] Scala[20] | Document database with replication and sharding, BSON store (binary format JSON). | Yes[21][22] |
| MUMPS Database | ? | Proprietary and AGPL[23] | MUMPS | Commonly used in health applications. | ? |
| ObjectDatabase++ | Ekky Software | Proprietary | C++, C#, TScript | Binary Native C++ class structures | ? |
| OpenLink Virtuoso | OpenLink Software | GPLv2 and Proprietary | C++, C#, Java, SPARQL | Middleware and database engine hybrid | Yes |
| OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP, SQL support, ACID transactions | Yes |
| Oracle NoSQL Database | Oracle Corp | Apache License and Proprietary | C, C#, Java, Python, Node.js, Go | Shared nothing, horizontally scalable database with support for schema-less JSON, fixed schema tables, and key/value pairs. Also supports ACID transactions. | Yes |
| Qizx | Qualcomm | Proprietary | REST, Java, XQuery, XSLT, C, C++, Python | Distributed document-oriented XML database with integrated full-text search; support for JSON, text, and binaries. | Yes |
| RavenDB | RavenDB Ltd. | AGPL, commercial and free | C#, C++, Java, NodeJS, Python, Ruby, PHP and Go | RavenDB is an open-source document-oriented cross-platform database written in C#, developed by RavenDB Ltd. Supported on Windows, Linux, Mac OS, AWS, Azure, and GCP | Yes |
| RedisJSON | Redis | Redis Source Available License (RSAL) | Python | JSON with integrated full-text search.[24] | Yes |
| RethinkDB | ? | Apache License[25] | C++, Python, JavaScript, Ruby, Java | Distributed document-oriented JSON database with replication and sharding. | No |
| SAP HANA | SAP | Proprietary | SQL-like language | ACID transaction supported, JSON only | Yes |
| Sedna | sedna.org | Apache License | C++, XQuery | XML database | No |
| SimpleDB | Amazon Web Services | Proprietary online service | Erlang | ? | |
| Apache Solr | Apache Software Foundation | Apache License[26] | Java | JSON, CSV, XML, and a few other formats.[27] Search engine. | Yes[28] |
| TerminusDB | TerminusDB | Apache License | Python, Node.js, JavaScript | The database system supports document store as well as graph data models with one database core and a unified, datalog based query language WOQL (Web Object Query Language).[29] | Yes |
Most XML databases are document-oriented databases.
Document-oriented databases, or document stores, are NoSQL databases that store data in the form of documents. Document stores are a type of key-value store: each document has a unique identifier — its key — and the document itself serves as the value.