You’ve probably heard of or even worked with relational databases. The row-and-table format is the most popular and intuitive structure for storing information. Unfortunately, not all the data that comes your way fits into rows and tables. In fact, many real-world problems call for non-relational databases. So, are there alternatives?
The answer is YES! There are four main types of databases that don’t have rows or tables: document, key-value, wide-column, and graph databases. They are called NoSQL databases because they are not queried with standard SQL.
This article focuses on document databases and how to work with MongoDB, a popular document database server. But before we jump into the technical details, let’s look at the use cases of document databases. You can check out our separate guide to graph databases for more information on them.
One of the main reasons to choose a document database is data that doesn’t neatly fit into a predefined schema like a table. Applications in many industries store this kind of data. Here are some examples:
Take a moment to think about how data collected in these industries would fit into tables. For example, e-commerce platforms would have difficulty storing product catalogs in a predefined schema. Different products have different attributes or, worse, different numbers of attributes. Do you create 10 columns to store the 10 physical attributes of drones from 100 different brands, or just 5–6 to store book information?
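To make the problem concrete, here is a minimal sketch in plain Python (no MongoDB involved). It uses two hypothetical catalog entries, represented as dictionaries, which is how `pymongo` represents documents. The sketch counts how many columns a single flat table would need to hold both entries and how many of those cells would be empty. The product values and helper names are made up for illustration.

```python
# Two hypothetical catalog entries: each product carries only the fields it needs
products = [
    {"sku": "DRN-001", "type": "drone", "brand": "ExampleAir",
     "specs": {"weight_g": 249, "max_speed_kmh": 57, "flight_time_min": 31}},
    {"sku": "BK-042", "type": "book", "title": "Example Title",
     "author": "A. Writer", "pages": 320},
]


def table_columns(docs):
    """Return the sorted union of top-level keys: the columns one flat table would need."""
    columns = set()
    for doc in docs:
        columns.update(doc.keys())
    return sorted(columns)


def empty_cells(docs):
    """Count the cells that would be NULL if every document became a row in that table."""
    columns = table_columns(docs)
    return sum(1 for doc in docs for column in columns if column not in doc)


print(table_columns(products))
print(empty_cells(products))
```

With every new product type, the column list keeps growing and most cells stay empty. A relational design would also need extra columns or a separate table for the nested `specs` dictionary, whereas a document store keeps each product's own fields as they are.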
Table-based databases struggle in such scenarios. By using document databases such as MongoDB, you gain the following benefits:
Now, let’s take a look at the core concepts around document databases and MongoDB.

I’ve been saying document databases over and over up to this point, but what actually are they? Here are the main concepts:
Here is a great table summarizing the differences between relational databases and document databases:
| Feature | Document Databases (e.g., MongoDB) | Relational Databases (e.g., MySQL, PostgreSQL) |
|---|---|---|
| Data Structure | Stores data as documents (e.g., JSON, BSON), allowing for flexible, hierarchical structures. | Stores data in tables with rows and columns, following a predefined schema. |
| Schema Flexibility | Schema-less: Documents can have varying structures, allowing different fields and data types. | Fixed schema: Requires a predefined schema with specific columns and data types. |
| Query Language | Uses MongoDB Query Language (MQL) or similar, which is object-based and more flexible. | Uses SQL (Structured Query Language) for querying structured data. |
| Joins | Avoids joins by embedding related data inside documents (denormalization). | Supports complex joins across tables (normalization). |
| Performance | Faster reads and writes for unstructured or semi-structured data. Avoids overhead from joins. | Strong performance for structured data, but joins can slow down queries. |
| Scalability | Horizontally scalable: Can distribute data across multiple servers using sharding. | Typically vertically scalable: Relies on more powerful hardware, though some support horizontal scaling (e.g., with partitions). |
| Transaction Support | Supports multi-document ACID transactions (on replica sets since MongoDB 4.0, on sharded clusters since 4.2), but was initially designed for non-transactional operations. | Full support for ACID-compliant transactions, providing strong consistency and reliability. |
| Use Cases | Best for unstructured or semi-structured data like user profiles, logs, catalogs, and flexible data structures. | Ideal for structured data with clear relationships, such as financial records or enterprise resource planning (ERP). |
| Data Relationships | Supports embedded data (denormalization), which makes it easy to retrieve related information from a single query. | Relational databases rely on foreign keys to establish relationships across tables (normalization). |
| Indexing | Supports several kinds of indexes (e.g., single-field, compound, multikey, text, geospatial, hashed). | Strong indexing capabilities, supporting multiple index types (e.g., B-tree, hash) for better performance optimization. |
| Consistency | Reads go to the primary by default; reading from secondaries in a distributed setup can return stale (eventually consistent) data. Read concern, write concern, and ACID transactions let you choose stronger guarantees when needed. | Ensures strong consistency in most cases due to ACID transactions and relational integrity. |
| Scaling Data Volume | Easily scales to accommodate large amounts of data by adding servers (sharding). | Can scale vertically, but horizontal scaling requires more complex configuration (e.g., partitioning). |
| Data Integrity | Data integrity is managed within each document, but managing relationships between documents can be more challenging. | Strong built-in support for data integrity through primary and foreign keys, as well as constraints like UNIQUE and NOT NULL. |
| Developer Friendliness | Developer-friendly: Flexible data modeling, works well with modern applications (e.g., JSON, REST APIs). | Rigid data modeling but well-understood by developers familiar with SQL and structured data. |
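The "Joins" and "Data Relationships" rows are easier to picture with an example. The sketch below, in plain Python with hypothetical data, starts from two normalized tables: races, and a link table connecting races to sponsors. It performs the join once and builds embedded documents shaped like the ones used later in this tutorial. After that, reading a race together with its sponsors needs no join.

```python
# Hypothetical normalized data, as two relational tables would store it
races_table = [
    {"race_id": 1, "name": "Honorable", "laps": 3},
    {"race_id": 2, "name": "Example Cup", "laps": 5},
]
race_sponsors_table = [
    {"race_id": 1, "sponsor": "Fat Shark"},
    {"race_id": 1, "sponsor": "DJI"},
    {"race_id": 2, "sponsor": "Etisalat"},
]


def embed_sponsors(races, race_sponsors):
    """Denormalize: group sponsor rows by race and embed them as an array in each race."""
    sponsors_by_race = {}
    for row in race_sponsors:
        sponsors_by_race.setdefault(row["race_id"], []).append(row["sponsor"])

    return [
        {
            "name": race["name"],
            "laps": race["laps"],
            # A race without sponsor rows gets an empty array instead of a missing field
            "sponsors": sponsors_by_race.get(race["race_id"], []),
        }
        for race in races
    ]


documents = embed_sponsors(races_table, race_sponsors_table)
print(documents[0])
```

The trade-off is duplication. If the same related data is embedded in many documents, every copy must be updated when it changes, which is the challenge the "Data Integrity" row alludes to.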
Let’s actually start working with documents in MongoDB!
To query document databases, we need to install the MongoDB server. On Ubuntu or Debian, you can install it with `apt-get`:

```shell
$ sudo apt-get install -y mongodb
```

(The `mongodb` package is no longer available on recent Ubuntu and Debian releases. On those, install MongoDB's official `mongodb-org` package instead. Its service is called `mongod`.)

Then, inside a virtual environment, install the libraries `pymongo` and `requests`. `pymongo` is the official Python driver for MongoDB. We will need the `requests` library to pull data from an API.

```shell
$ pip install pymongo
$ pip install requests
```

Then, from the terminal, start the MongoDB server with the following command:

```shell
$ sudo service mongodb start
```

Now we are ready to load some data into a document database. There are two scenarios: loading the data from a local file and loading it from an API.
We will cover both. First, let’s load a local file named `drone_races.json`. Here is the snippet to do so:
```python
import json

from pymongo import MongoClient

# Establish a connection to the local MongoDB server
client = MongoClient("localhost", 27017)

# Access the "drones" database (MongoDB creates it on the first write)
drones = client["drones"]

# Access the "races" collection inside it (also created lazily)
races = drones["races"]

# Load the dataset and insert its documents into the collection
with open("data/drone_races.json", "r") as file:
    data = json.load(file)  # expected: a JSON array of race objects
races.insert_many(data)
```

The two most important objects for us are `drones` (a database) and `races` (a collection). Most of the functions and methods we will use are collection methods. Database objects are mostly used for managing collections.
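`insert_many` expects a non-empty list of documents. `pymongo` raises an error if it is given an empty list or a single dictionary. The snippet above therefore assumes that `drone_races.json` holds a JSON array of objects. If you are not sure about a file's shape, a small guard can normalize the data first. The `as_document_list` helper below is hypothetical, written in plain Python, and not part of `pymongo`:

```python
import json


def as_document_list(parsed):
    """Normalize parsed JSON into a non-empty list of dicts for insert_many (hypothetical helper)."""
    if isinstance(parsed, dict):
        # A single top-level object becomes a one-document batch
        return [parsed]
    if isinstance(parsed, list):
        if not parsed:
            raise ValueError("nothing to insert: the JSON array is empty")
        if all(isinstance(item, dict) for item in parsed):
            return parsed
    raise ValueError("expected a JSON object or an array of JSON objects")


single = as_document_list(json.loads('{"name": "Honorable", "laps": 3}'))
batch = as_document_list(json.loads('[{"laps": 3}, {"laps": 5}]'))
print(single, len(batch))
```

With this guard in place, the insert line becomes `races.insert_many(as_document_list(data))`.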
Now, let’s see how to load the same data from an API. I’ve exposed the information as a mock API using a service called Mockaroo. Here is the snippet:
```python
import requests
from pymongo import MongoClient

# Fetch data from the API
api_url = "https://my.api.mockaroo.com/drone_race_matches.json?key=6f5a6b50"
response = requests.get(api_url, timeout=30)

if response.status_code == 200:
    data = response.json()  # Parse the JSON payload into Python objects

    # Establish a connection to MongoDB (defaults to localhost:27017)
    client = MongoClient()
    # Access or create a specific database
    drones = client["drones"]
    # Access or create a specific collection within the database
    races = drones["races"]
    # Insert the fetched data into the MongoDB collection
    races.insert_many(data)
else:
    print("Failed to fetch data from the API.")
```

We’ve loaded some data into the `races` collection of the `drones` database, or did we? Let’s check by using queries! (Run only one of the two loading snippets. Running both inserts every race twice and doubles the counts below.)
To find out whether a collection contains any data, we can count its documents with the `count_documents` method:
```python
print(races.count_documents({}))  # 9040
```

Notice the empty dictionary passed to `count_documents`. That dictionary is called a filter in MongoDB. As we go through the tutorial, we will learn how to fill the dictionary in to create different kinds of filters. Right now, we have no filter, so the code above is the equivalent of `SELECT COUNT(*) FROM table_name` in SQL.
We’ve got 9040 documents — yay! Now, let’s look at some data.
To look at one document with `pymongo`, we can use the `find_one` method:
```python
from pprint import pprint

pprint(races.find_one())
```

```
{'_id': ObjectId('659d31e9255ec0cf4bab529d'),
 'laps': 3,
 'league': 'F1 Drones',
 'location': {'city': 'Ford',
              'country': 'United Kingdom',
              'date': 'error: invalid date "2024-10-25"',
              'venue': 'Manhattan Seas'},
 'name': 'Honorable',
 'pilots': {'drone': 'DJI3-old',
            'finishing_position': 66,
            'name': 'Kariotta Cow',
            'qualification_time': 27.39,
            'team': 'Sky Crusaders',
            'telemetry': {'altitude': 34.3,
                          'battery_voltage': 12.1,
                          'speed': 68.3,
                          'timestamp': 'error: invalid date '
                                       '"2024-10-25T14:09:26Z"'}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat'],
 'weather_conditions': 'snowy'}
```

Take note of the fields (keys) of this document. It stores information about a single drone race, including:

- the league, race name, and number of laps
- the location and date
- the pilot, with their drone, team, qualification time, finishing position, and telemetry readings
- the sponsors
- the weather conditions
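The document also has a required `_id` field. Unless you supply your own value, the driver generates an `ObjectId` for it. An `ObjectId` is a unique 12-byte identifier (not a hash) that starts with the document's creation timestamp.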
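An `ObjectId` packs three parts: a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value, and a 3-byte counter. As an illustration, the hypothetical helper below uses only the standard library to decode the timestamp from the `_id` printed above. In practice, `bson.ObjectId` (installed with `pymongo`) exposes the same value as its `generation_time` attribute.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex):
    """Return the creation time encoded in the first 4 bytes of a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId hex string has exactly 24 characters")
    seconds = int(oid_hex[:8], 16)  # big-endian seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("659d31e9255ec0cf4bab529d")
print(created.isoformat())
```

Because the timestamp comes first, sorting by `_id` roughly follows insertion order. Note that the timestamp only has second-level precision.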
`count_documents` always returns a number, but sometimes we want to look at the data our query matches. To do so, we can use the big brother of `find_one`, which is `find`:
```python
from pprint import pprint

# find() returns a cursor; print only the first document it yields
for race in races.find():
    pprint(race)
    break
```

```
{'_id': ObjectId('659d31e9255ec0cf4bab529d'),
 'laps': 3,
 'league': 'F1 Drones',
 'location': {'city': 'Ford',
              'country': 'United Kingdom',
              'date': 'error: invalid date "2024-10-25"',
              'venue': 'Manhattan Seas'},
 'name': 'Honorable',
 'pilots': {'drone': 'DJI3-old',
            'finishing_position': 66,
            'name': 'Kariotta Cow',
            'qualification_time': 27.39,
            'team': 'Sky Crusaders',
            'telemetry': {'altitude': 34.3,
                          'battery_voltage': 12.1,
                          'speed': 68.3,
                          'timestamp': 'error: invalid date '
                                       '"2024-10-25T14:09:26Z"'}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat'],
 'weather_conditions': 'snowy'}
```

`find` with an empty filter (no arguments) returns a cursor over every document in the collection, yielding them one by one. That's not what we want! We want to run queries that answer interesting questions about our data. This is where filter documents prove useful.
Let’s start with the simplest filters — matching documents where some field equals some value. This would be the same as:
```sql
SELECT * FROM table_name
WHERE field = value
```

Let’s do it in MongoDB:
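```python
criteria = {"sponsors": "Fat Shark"}
fat_shark_races = races.count_documents(criteria)
print(fat_shark_races)  # 6194
```

Above, we are choosing the races that have "Fat Shark" among their sponsors. The syntax is simply a dictionary that maps the `sponsors` field to "Fat Shark". Because `sponsors` is an array, the filter matches any document whose array contains that value.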
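For array fields, an equality filter matches when any element equals the value, or when the whole array equals it. The sketch below mimics that rule in plain Python for a single top-level field. It is a simplified illustration, not MongoDB's actual matcher, and it deliberately ignores null handling and nested paths. The helper name is hypothetical.

```python
def matches_equality(doc, field, value):
    """Simplified sketch of MongoDB's {field: value} match on one top-level field."""
    if field not in doc:
        return False  # null/missing-field semantics are not modeled here
    stored = doc[field]
    if isinstance(stored, list):
        # Arrays match if any element equals the value, or the whole array equals it
        return value in stored or stored == value
    return stored == value


race = {
    "sponsors": ["Fat Shark", "DJI", "Etisalat"],
    "weather_conditions": "snowy",
}
print(matches_equality(race, "sponsors", "Fat Shark"))
print(matches_equality(race, "weather_conditions", "snowy"))
```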
MongoDB query language wouldn’t be a language if it didn’t have some common inequality operators. Here is how to use the “less than” operator:
```python
criteria = {"pilots.qualification_time": {"$lt": 10}}
quick_races = races.count_documents(criteria)
print(quick_races)  # 3061
```

The above query introduces two new features of the MongoDB Query Language (MQL):
- `pilots.qualification_time` uses dot notation to reach the `qualification_time` field nested inside `pilots`.
- `$lt` is the "less than" operator.

So, the result of this query tells us that there were 3061 races where a pilot had a qualification time under 10 seconds. This query was possible thanks to the `$lt` operator. Here are its brothers and sisters:

- `$lte`: less than or equal
- `$gt`: greater than
- `$gte`: greater than or equal

They have the same syntax as `$lt`.
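The following rough sketch, in plain Python, shows how a filter such as `{"pilots.qualification_time": {"$lt": 10}}` can be evaluated against one document. It first follows the dotted path through the nested dictionaries, then applies each comparison operator. It is a simplified model (no array traversal, no BSON type ordering), and the helper names are hypothetical.

```python
import operator

# Map the MQL comparison operators to Python's built-in equivalents
COMPARISONS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
}


def get_path(doc, dotted_path):
    """Follow a dot-notation path through nested dicts; return None if any step is missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches_comparisons(doc, criteria):
    """Return True if every {path: {operator: operand}} condition holds for doc."""
    for path, condition in criteria.items():
        value = get_path(doc, path)
        for op_name, operand in condition.items():
            if value is None:
                return False  # in this sketch, missing or null values never satisfy a comparison
            try:
                if not COMPARISONS[op_name](value, operand):
                    return False
            except TypeError:
                return False  # e.g. a string compared with a number: treated as no match
    return True


race = {"pilots": {"name": "Kariotta Cow", "qualification_time": 27.39}}
print(matches_comparisons(race, {"pilots.qualification_time": {"$lt": 10}}))
print(matches_comparisons(race, {"pilots.qualification_time": {"$gte": 20, "$lt": 30}}))
```

The second call shows that several operators can share one condition document. The field must satisfy all of them.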
MQL also includes logical operators such as `$and` and `$or`. Let's start with the latter.
We will retrieve races with either the United Kingdom as the location or Etisalat as the sponsor:
```python
criteria = {
    "$or": [
        {"location.country": "United Kingdom"},
        {"sponsors": "Etisalat"},
    ]
}
print(races.count_documents(criteria))  # 6223
```
There are 6223 documents matching our criteria. To apply OR logic to multiple values of the same field, we can use the `$in` operator.
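`$in` and `$or` can express the same condition. The small sketch below, in plain Python with hypothetical helper names, builds both forms for the same field and values. Comparing them shows why MongoDB's documentation recommends `$in` over `$or` when every clause checks the same field.

```python
def in_filter(field, values):
    """Compact form: one field, several accepted values."""
    return {field: {"$in": list(values)}}


def or_filter(field, values):
    """Equivalent long form: one equality clause per accepted value."""
    return {"$or": [{field: value} for value in values]}


countries = ["United Kingdom", "Australia"]
print(in_filter("location.country", countries))
print(or_filter("location.country", countries))
```

Passing either dictionary to `races.count_documents` should return the same count. The `$in` version stays one line long no matter how many values you add.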
For example, we can check for bad weather conditions the following way:
```python
criteria = {"weather_conditions": {"$in": ["rainy", "snowy", "cloudy"]}}
print(races.count_documents(criteria))  # 5508
```

This query would have been a pain to write with the `$or` operator. Now, onto `$and`.
This time, we want to find the races with Australia as the location AND Fat Shark as the sponsor. Here is how we can do it with `$and`:
```python
criteria = {
    "$and": [
        {"location.country": "Australia"},
        {"sponsors": "Fat Shark"},
    ]
}
print(races.count_documents(criteria))  # 193
```

In practice, though, you will rarely use `$and`, because the same logic can be written in a much simpler way:
```python
criteria = {
    "location.country": "Australia",
    "sponsors": "Fat Shark",
}
print(races.count_documents(criteria))  # 193
```

Just add more key-value pairs to the filter document, and MongoDB combines them with the AND logical operator.
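This shortcut has one caveat. A Python dictionary cannot hold the same key twice, so two conditions on the same field cannot simply be listed side by side. The snippet below, in plain Python with hypothetical thresholds, shows the silent overwrite and two ways around it: merge the operators into one condition, or fall back to an explicit `$and`.

```python
# Pitfall: a dict literal keeps only the LAST value for a repeated key,
# so the $gt condition below is silently dropped
broken = {
    "pilots.qualification_time": {"$gt": 5},
    "pilots.qualification_time": {"$lt": 10},
}

# Option 1: put both operators in a single condition document for that field
combined = {"pilots.qualification_time": {"$gt": 5, "$lt": 10}}

# Option 2: spell the AND out explicitly with $and
explicit = {
    "$and": [
        {"pilots.qualification_time": {"$gt": 5}},
        {"pilots.qualification_time": {"$lt": 10}},
    ]
}

print(broken)
print(combined)
```

Either `combined` or `explicit` can be passed to `races.count_documents`. The merged form is usually the more readable choice.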
Finally, there is the `$nin` operator, which checks for non-membership. For example, we can return all races that weren't held in the United States, the United Kingdom, or Australia:
```python
criteria = {
    "location.country": {
        "$nin": ["United States", "United Kingdom", "Australia"]
    }
}
print(races.count_documents(criteria))  # 126
```

In this dataset, that leaves only the United Arab Emirates, so the above query could also be written as:
```python
criteria = {"location.country": "United Arab Emirates"}
print(races.count_documents(criteria))  # 126
```

But you get the idea.
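`$nin` has one subtlety: it also matches documents in which the field is missing altogether. The two counts above agree because every race has a country, but on messier data they might not. The plain-Python sketch below mimics that rule, using hypothetical documents and a hypothetical helper limited to top-level, non-array fields. It also shows how adding `"$exists": True` to the condition excludes the missing-field case.

```python
def matches_nin(doc, field, excluded):
    """Simplified sketch of {field: {"$nin": excluded}} for a top-level, non-array field."""
    if field not in doc:
        return True  # $nin also selects documents that lack the field entirely
    return doc[field] not in excluded


docs = [
    {"race": "a", "country": "United Arab Emirates"},
    {"race": "b", "country": "Australia"},
    {"race": "c"},  # no country recorded
]
excluded = ["United States", "United Kingdom", "Australia"]

nin_matches = [d["race"] for d in docs if matches_nin(d, "country", excluded)]

# MongoDB filter that also requires the field to be present
strict_criteria = {"country": {"$exists": True, "$nin": excluded}}
strict_matches = [
    d["race"] for d in docs if "country" in d and matches_nin(d, "country", excluded)
]
print(nin_matches, strict_matches)
```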
Checking for null or missing values is a universal operation in data analysis tasks, and MongoDB has an operator for it: `$exists`. Here are two examples that check whether a certain field exists:
```python
criteria = {"location.district": {"$exists": True}}
print(races.count_documents(criteria))  # 0
```

Hmm, it turns out the `district` field doesn't exist in any of the documents. The `laps` field, however, must exist in all documents, as it is a key piece of information about races:
```python
criteria = {"laps": {"$exists": True}}
print(races.count_documents(criteria))  # 9040
```

As expected, all documents have the `laps` field. But what about fields that exist but have a null value? We can check that too:
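```python
criteria = {"pilots.finishing_position": None}
print(races.count_documents(criteria))  # 0
```

Python's built-in `None` maps to MongoDB's null. A filter like this one therefore matches documents where the field is null or missing altogether. Here, every race has a non-null finishing position.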
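The difference between `$exists` and a `None` filter is easy to blur. The plain-Python sketch below applies both rules to three hypothetical documents: one with a value, one with an explicit null, and one without the field. `$exists` only cares whether the key is present, while `{field: None}` matches both null and missing. If you need explicit nulls only, MongoDB's `{"$type": "null"}` condition is one option. The helper names are hypothetical.

```python
# Sentinel that distinguishes "key absent" from "key present with value None"
MISSING = object()


def matches_exists(doc, field, should_exist):
    """Sketch of {field: {"$exists": should_exist}}: only key presence matters (null counts as present)."""
    return (doc.get(field, MISSING) is not MISSING) == should_exist


def matches_null(doc, field):
    """Sketch of {field: None}: matches an explicit null OR a missing field."""
    value = doc.get(field, MISSING)
    return value is MISSING or value is None


docs = [
    {"race": "a", "finishing_position": 3},
    {"race": "b", "finishing_position": None},
    {"race": "c"},
]

exists_matches = [d["race"] for d in docs if matches_exists(d, "finishing_position", True)]
null_matches = [d["race"] for d in docs if matches_null(d, "finishing_position")]
print(exists_matches, null_matches)
```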
There are some advanced scenarios that require null or existence checks, as well. For example, you may want to check whether certain elements of some massive nested arrays exist.
To do this, we can use array indexing syntax in MQL. For instance, to find the races with only one sponsor, we need to check whether the second element of the `sponsors` array exists:
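```python
# Array positions start at 0, so "sponsors.1" is the second element
criteria = {"sponsors.1": {"$exists": False}}
print(races.count_documents(criteria))  # 2929
```

It is as easy as appending the element's index number to the key. So, in our collection, almost 3000 races were sponsored by only one entity. Strictly speaking, this filter would also match races with an empty or missing `sponsors` array.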
This array indexing syntax works with many other operators, not just `$exists`.
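`"sponsors.1": {"$exists": False}` really means "there is no second element". The plain-Python sketch below mirrors that check with a hypothetical helper and hypothetical documents, including the empty and missing cases. If you want exactly one sponsor, the `$size` operator states that directly.

```python
def element_missing(doc, field, index):
    """Sketch of {f"{field}.{index}": {"$exists": False}} on a top-level field."""
    value = doc.get(field)
    if isinstance(value, list):
        return len(value) <= index  # positions start at 0
    if isinstance(value, dict):
        return str(index) not in value  # on an embedded document, "1" is just a key
    return True  # field absent or a scalar: there is nothing at that position


docs = [
    {"race": "a", "sponsors": ["Fat Shark", "DJI"]},
    {"race": "b", "sponsors": ["Etisalat"]},
    {"race": "c", "sponsors": []},
    {"race": "d"},
]

# MongoDB filters for "at most one sponsor" and "exactly one sponsor"
at_most_one = {"sponsors.1": {"$exists": False}}
exactly_one = {"sponsors": {"$size": 1}}

missing_second = [d["race"] for d in docs if element_missing(d, "sponsors", 1)]
exactly_one_matches = [
    d["race"] for d in docs
    if isinstance(d.get("sponsors"), list) and len(d["sponsors"]) == 1
]
print(missing_second, exactly_one_matches)
```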
One last thing we are going to cover in this tutorial is projections. Up to this point, our query results included every single field of each document. This is not ideal when your documents have hundreds of fields. Imagine the eyesore of printing them!
So, to choose the fields we want returned, we can use projections. Here is how:
```python
from pprint import pprint

criteria = {"pilots.telemetry.speed": {"$gte": 20}}
projection = {
    "sponsors": 1,
    "location.country": 1,
    "pilots.telemetry.speed": 1,
    "pilots.name": 1,
}

fast_pilots = races.find(criteria, projection)
for pilot in fast_pilots:
    pprint(pilot)
    break
```

In the above case, we write our filter criteria as usual, but this time we also define another document with four fields set to 1. If we pass this `projection` document as the second argument to `find` (or `find_one`), we only get the fields set to 1 in the output:
```
{'_id': ObjectId('659d31e9255ec0cf4bab529d'),
 'location': {'country': 'United Kingdom'},
 'pilots': {'name': 'Kariotta Cow', 'telemetry': {'speed': 68.3}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat']}
```

Even though we chose only four fields, the pesky `_id` field got squeezed in: MongoDB returns `_id` unless you explicitly exclude it. To suppress this behavior, set it to 0 in the `projection` dictionary:
```python
criteria = {"pilots.telemetry.speed": {"$gte": 20}}
projection = {
    "sponsors": 1,
    "location.country": 1,
    "pilots.telemetry.speed": 1,
    "pilots.name": 1,
    "_id": 0,
}

fast_pilots = races.find(criteria, projection)
for pilot in fast_pilots:
    pprint(pilot)
    break
```

```
{'location': {'country': 'United Kingdom'},
 'pilots': {'name': 'Kariotta Cow', 'telemetry': {'speed': 68.3}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat']}
```

Now, this is prettier.
Finally, to return all but a few fields, we can set those fields to 0:
```python
projection = {"_id": 0, "league": 0, "pilots": 0}

# Empty criteria for this one
races.find_one({}, projection)
```

```
{'name': 'Honorable',
 'location': {'venue': 'Manhattan Seas',
              'city': 'Ford',
              'country': 'United Kingdom',
              'date': 'error: invalid date "2024-10-25"'},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat'],
 'laps': 3,
 'weather_conditions': 'snowy'}
```

As you can see, this time we get all the fields except `_id`, `league`, and `pilots`.
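Keep one rule in mind when writing projections. Apart from `_id`, a projection must either list only fields to include (1) or only fields to exclude (0); MongoDB rejects a mix of the two. The hypothetical helper below, in plain Python, checks a projection made of 0/1 flags before you send it. It does not handle projection operators such as `$slice`.

```python
def projection_mode(projection):
    """Classify a 0/1 projection as "include" or "exclude", rejecting disallowed mixes."""
    # _id is the one field that may be excluded inside an inclusion projection
    flags = {bool(value) for field, value in projection.items() if field != "_id"}
    if len(flags) > 1:
        raise ValueError("cannot mix inclusion (1) and exclusion (0), except for _id")
    if flags:
        return "include" if flags == {True} else "exclude"
    # Only _id was mentioned: {"_id": 1} keeps just _id, {"_id": 0} drops only _id
    return "include" if projection.get("_id") else "exclude"


print(projection_mode({"sponsors": 1, "location.country": 1, "_id": 0}))
print(projection_mode({"_id": 0, "league": 0, "pilots": 0}))
```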
This tutorial doesn’t do justice to the massive size of MongoDB as a database management tool. Today, we only covered read queries (queries that retrieve information), but MongoDB also allows data specialists to insert, update, or delete information in document databases. We’ve also left out a whole class of queries: aggregations.
All these topics are beyond the scope of this article and require additional resources to learn.
Document databases, such as MongoDB, store data in documents (often in JSON-like formats), which can contain nested data structures. This differs from relational databases, which store data in rows and tables with a fixed schema. Document databases allow for more flexibility as the schema is dynamic, meaning that each document can have different fields and data types. This makes MongoDB suitable for unstructured or semi-structured data, unlike relational databases, which require a predefined schema.
MongoDB is beneficial when working with data that doesn't fit neatly into a tabular structure. Use MongoDB if your data has a flexible schema, if you anticipate frequent changes in data structure, or if you need to handle large volumes of unstructured data. It's also a good choice for applications requiring high-speed read and write operations at scale, such as e-commerce, logging, and content management systems.
MongoDB offers compatibility with a wide range of programming languages, including Python, Java, JavaScript, Node.js, Go, Ruby, and C#, through official drivers and libraries. The Python library `pymongo` is commonly used for interacting with MongoDB in data science applications. MongoDB also integrates well with modern frameworks such as Django, Flask, and Express.js.
MongoDB is designed for horizontal scaling through sharding, where data is distributed across multiple servers to manage large-scale data efficiently. As your data grows, MongoDB can distribute the load across multiple machines, allowing for better performance and capacity. This makes MongoDB ideal for big data applications or those experiencing rapid growth in data volume.
Yes, MongoDB can handle complex queries, but its query language (MQL, MongoDB Query Language) is quite different from SQL. MongoDB supports filters, projections, logical operators, and aggregations to perform sophisticated queries, allowing you to retrieve, filter, and transform data. However, unlike SQL databases, MongoDB is not built around joins: it encourages denormalizing related data into flexible document structures, although the `$lookup` aggregation stage can perform left outer joins between collections when needed.
MongoDB can be used for real-time analytics, but its performance largely depends on how the data is structured and indexed. Using MongoDB’s indexing and aggregation framework, you can run real-time queries and generate insights efficiently. For heavier analytical workloads, you might consider integrating MongoDB with tools like Apache Spark.
MongoDB provides several security features, including authentication, authorization (role-based access control), encryption (both in transit and at rest), and auditing. MongoDB’s Enterprise Edition offers additional security features like LDAP integration and Kerberos authentication for enterprise-level security. These features help secure sensitive data while maintaining compliance with industry regulations.
Yes, MongoDB supports ACID-compliant transactions. Operations on a single document have always been atomic, and since version 4.0 MongoDB also supports multi-document transactions (on sharded clusters since version 4.2), similar to those in relational databases. These ensure atomicity, consistency, isolation, and durability for operations involving multiple documents or collections, which makes MongoDB more suitable for scenarios that require transaction guarantees.
While JSON is a human-readable format commonly used for representing data, BSON (Binary JSON) is MongoDB’s storage format. BSON allows for more efficient storage and retrieval of data and supports additional data types like dates and binary data, which JSON does not natively handle. BSON also stores extra metadata, such as the type and length of each element, which improves performance during document storage and retrieval.

I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.
