You can also find all 50 answers here 👉 Devinterview.io - Databases
A Database Management System (DBMS) serves as an intermediary between users and the database. It facilitates data maintenance, retrieval, and ongoing management, using a structured mechanism to ensure data integrity, security, and efficiency.
- Data Modeling: Organizes data into logically structured tables and relationships.
- Data Manipulation: Allows for CRUD operations (Create, Read, Update, Delete), usually via a query language.
- Data Integrity: Enforces referential integrity and ensures consistent data.
- Security & Access Control: Regulates user permissions to the data.
- Data Concurrency: Manages simultaneous data access by multiple users to avoid conflicts.
- Backup & Recovery: Provides mechanisms for data restoration in case of loss or corruption.
- Data Analysis & Reporting: Often includes tools for query optimization and report generation.
Relational DBMS (RDBMS): Organizes data into tables with predefined structures defined by a schema. SQL (Structured Query Language) is typically used for data operations. Common examples include MySQL, PostgreSQL, and Microsoft SQL Server.
NoSQL DBMS: Evolved as a response to the limitations of RDBMS, designed for unstructured or semi-structured data and horizontal scalability. Examples include MongoDB for document-oriented storage and Redis for key-value stores.
Cloud-Based DBMS: Hosted on cloud platforms, these systems provide flexibility and scalability, requiring minimal maintenance from users. Notable examples are Amazon RDS, Google Cloud Bigtable, and Azure Cosmos DB.
NewSQL DBMS: Combines the transactional guarantees of traditional RDBMS with modern needs like horizontal scalability, often delivering improved performance for big-data scenarios. Examples include Google Cloud Spanner and NuoDB.
Object-Oriented DBMS (OODBMS): Designed for managing complex, object-based data models. They provide persistence for objects, disentangling the object structure from a relational schema. OODBMS are less popular today, but examples include db4o and ObjectStore.
In-memory DBMS: Maintains data in the system’s memory, enabling rapid access and processing. Examples include Oracle TimesTen and Redis.
Multimodel DBMS: Can handle multiple kinds of databases, such as key-value stores, document stores, and graph databases. This offers a variety of data models in a single system. ArangoDB is an example of such a system.
Graph DBMS: Specialized for dealing with interconnected data elements. They are optimized for operations like traversals and pathfinding. Neo4j is a well-known example of this type of DBMS.
In database management, ACID ensures that transactions are processed in a reliable, standardized manner.
Atomicity ensures that all tasks in a transaction are completed, or none are. Databases use transaction logs to manage atomicity: if a task fails, the log lets the system recognize the incomplete state and restore the last known consistent state.
Consider a banking transfer: if the debit is successful but the credit fails, atomicity ensures that the debit is also rolled back.
Consistency requires that a transaction takes the database from one consistent state to another consistent state. It ensures that data integrity rules are not violated. For example, after a transfer, the sum of account balances should remain the same.
Isolation ensures that the operations within concurrent transactions are invisible to each other until they are completed, to protect against potential conflicts and inconsistencies.
Different isolation levels, like read uncommitted, read committed, repeatable read, and serializable, define the extent to which transactions are isolated from one another.
Durability guarantees that once a transaction is committed, its changes persist, even in the event of a system failure. This is typically achieved through mechanisms such as write-ahead logging and buffer management, which ensure changes are written to disk.
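To make atomicity and durability concrete, here is a minimal SQL sketch of the banking transfer described earlier (the `accounts` table, its columns, and the amounts are hypothetical):

```sql
BEGIN;

-- Debit one account and credit the other; both must succeed together.
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- If either UPDATE fails, issue ROLLBACK instead and neither change
-- is applied; COMMIT makes both changes durable as one atomic unit.
COMMIT;
```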
Banking Applications: ACID ensures that monetary transactions are secure and reliable.
Inventory Management: When goods are purchased, inventory levels are updated consistently, without corruption or errors.
Scheduling Systems: Activities, such as booking appointments, either take place entirely or not at all, avoiding messy partial bookings.
E-commerce Order Processing: The entire cycle, from a customer placing an order to its fulfilment, is a cohesive and consistent unit of work.
Messaging Services: For example, ensuring that a message is sent only after it is completely composed, not mid-way.
SQL and NoSQL databases vary in several key aspects, elaborated below.
Data Structure: SQL databases are table-based, whereas NoSQL databases can be document, key-value, wide-column, or graph-oriented.
Schema: SQL databases are schema-based; they require a predefined schema, and any deviation requires schema modifications. NoSQL databases are typically schema-less or schema-on-read, so records in the same collection need not share an identical structure.
Querying: SQL databases employ Structured Query Language to execute queries. NoSQL databases use methods specific to their data model.
Scalability: SQL databases historically have employed vertical scaling or "scaling up" by increasing CPU power, memory, and storage on a single node. However, recent trends show support for horizontal scaling or "scaling out" across multiple nodes. NoSQL databases, by nature, support horizontal scaling more readily, making them "scale-out" architectures.
ACID: Traditional SQL databases often guarantee ACID (Atomicity, Consistency, Isolation, and Durability) compliance. NoSQL databases, especially in eventual consistency models, might trade off immediate consistency for performance and availability.
Consistency Models: SQL often indicates immediate or strict consistency. NoSQL provides a range of consistency models, from eventual consistency to strong consistency.
Data Relationships and Transactions: SQL databases typically enforce data relationships using methods such as foreign keys and support more complex transactional behavior. NoSQL databases might sacrifice some of these features for greater performance.
Use Cases and Popularity: SQL databases are time-tested, especially in cases that need strict consistency or have clear data relationships. NoSQL databases are popular for their flexibility, particularly in rapidly changing or expansive data needs, and within the realm of big data and other modern computing paradigms.
Relational Integrity: SQL databases pride themselves on ensuring referential integrity in relational data.
Efficiency in Certain Query Workloads: SQL databases are frequently favored in scenarios that involve complex querying or multi-table joins because of their optimizer capabilities. NoSQL excels in certain types of queries, like key-value lookups, due to their data models.
Ease of Horizontal Scaling: NoSQL is often the preferred choice in setups requiring high availability and distributed data.
A Relational Database Schema is a set of rules that define the structure, constraints, and relationships of data tables in a Relational Database Management System (RDBMS).
- Tables: Central data containers often referred to as "relations."
- Defined by column names and data types.
- Columns: Attributes or fields, defined by their name, type, and optional constraints.
- Each column should have a unique name within the table.
- Common data types include integers, strings, dates, and more.
- Rows: Individual data records, consisting of data based on the table's defined columns.
- Each row should have a unique identifier, often referred to as a "primary key".
- Primary Key (PK): A unique identifier for each record within a table, often an auto-incrementing integer. Each table should have precisely one primary key.
- Foreign Key (FK): A field or group of fields in one table that references the primary key of another table, establishing relationships between tables.
- One-to-One: Each record in the first table is related to one and only one record in the second table, and vice versa.
- One-to-Many: A record in the first table can be related to one or more records in the second table, but a record in the second table is related to only one record in the first table.
- Many-to-Many: Records in both tables can be related to multiple records in the other table. This relationship requires a linking table.
- NOT NULL: Specifies that a column must have a value and cannot be left empty.
- UNIQUE: Requires all entries in a column or set of columns to be distinct.
- CHECK: Allows for conditional checks to be applied to column data.
- DEFAULT: Automatically provides a predetermined value when a row is inserted without one.
- INDEX: Optimizes data retrieval by creating an index on the column(s), speeding up relevant queries.
Here is the SQL code:
```sql
-- Table that holds the list of possible majors
CREATE TABLE Majors (
    MajorID INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL UNIQUE
);

-- Table for students
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Age       INTEGER CHECK (Age >= 18),
    MajorID   INTEGER,
    FOREIGN KEY (MajorID) REFERENCES Majors(MajorID)
);

-- Table for courses
CREATE TABLE Courses (
    CourseID INTEGER PRIMARY KEY,
    Title    TEXT NOT NULL,
    Credits  INTEGER CHECK (Credits >= 0)
);

-- Table that maps the many-to-many relationship between Students and Courses
CREATE TABLE Enrollments (
    StudentID      INTEGER,
    CourseID       INTEGER,
    EnrollmentDate DATE DEFAULT CURRENT_DATE,
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID)  REFERENCES Courses(CourseID)
);
```
A primary key is a unique identifier for each record in a table. It ensures every row is distinct and can be efficiently referenced.
- Uniqueness: Every record in a table must have a unique primary key.
- Non-Null Value: A primary key field cannot contain null or undefined values.
- Stability: Changes to primary key values should be rare, ensuring its reliability as an identifier.
- Data Integrity: Ensures accuracy and consistency within the dataset.
- Data Retrieval: Helps in efficient data retrieval and can easily reference related data tables through Foreign Keys.
- Data Normalization: Aids in organizing data across multiple tables, reducing redundancy and improving data integrity.
- Table Relationships: Serves as a basis for establishing multiple types of relationships with other tables (one-to-one, one-to-many, many-to-many).
Primary key support varies slightly across popular systems:

MySQL
- Key Length Limit: Roughly 1000 bytes in total across all columns in the key for MyISAM tables (InnoDB allows up to 3072 bytes).
- Restrictions: TEXT and BLOB columns are not permitted in primary keys.
- Supported Types: Most column types can be used, including integers, strings, and dates.

PostgreSQL
- Naming: The primary key constraint name defaults to `table_name_pkey`; identifier names are limited to 63 bytes.
- Supported Types: Most column types can be defined as primary keys, including UUID.

SQL Server
- Key Size Limit: 900 bytes for a clustered index key.
- Restrictions: text and image columns are forbidden in primary keys.
- Supported Types: The uniqueidentifier data type is commonly used as a primary key.

SQLite
- Key Size Limit: No fixed-length requirement.
- Supported Types: Most column types, including integers, can be used for primary keys.

Oracle
- Naming: Constraint names cannot exceed 30 characters (128 since Oracle 12.2).
- Supported Types: Most column data types can be used, excluding LOB and LONG columns.
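As a quick illustration of the constraint names discussed above, a primary key can be named explicitly rather than relying on system defaults (the table and constraint names here are hypothetical):

```sql
CREATE TABLE orders (
    order_id INTEGER,
    CONSTRAINT orders_pk PRIMARY KEY (order_id)
);
```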
A foreign key (FK) establishes a relationship between two tables in a relational database.
- Column Type: The FK column in the child table must have the same data type as the primary key column in the parent table.
- Referential Integrity: When a referenced row is updated or deleted, the FK preserves consistency by rejecting the change, setting the referencing value to NULL, or cascading the change (a sketch of these actions follows the example below).
- Joining Tables: Utilizing the foreign key in SQL queries helps combine related data from multiple tables.
The JOIN clause is the SQL construct central to combining tables related through foreign keys:
- Inner Join: Matches only the records having related entries in both tables.
- Outer Join: Includes records from one table even if there are no related entries in the other table.
Here is the SQL code:
```sql
-- 1. Parent table creation
CREATE TABLE authors (
    author_id   INT PRIMARY KEY,
    author_name VARCHAR(100)
);

-- 2. Child table creation with a foreign key
CREATE TABLE books (
    book_id   INT PRIMARY KEY,
    book_name VARCHAR(255),
    author_id INT,
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
);

-- 3. Data insertion
INSERT INTO authors (author_id, author_name) VALUES (1, 'J.K. Rowling');
INSERT INTO books (book_id, book_name, author_id) VALUES (1, 'Harry Potter', 1);

-- 4. Select query for data retrieval
SELECT book_name, author_name
FROM books
JOIN authors ON books.author_id = authors.author_id;
```
In this code, `author_id` in the `books` table is a foreign key pointing to `author_id` in the `authors` table. The `SELECT` statement employs a `JOIN` to unify the related book and author data.
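The referential-integrity actions mentioned earlier (reject, set to NULL, cascade) are declared on the foreign key itself. Here is a hedged sketch, reusing the authors/books schema:

```sql
-- Variant of the books table where deleting an author
-- automatically removes that author's books.
CREATE TABLE books (
    book_id   INT PRIMARY KEY,
    book_name VARCHAR(255),
    author_id INT,
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
        ON DELETE CASCADE
);
-- Alternatives: ON DELETE RESTRICT (reject the delete) or
-- ON DELETE SET NULL (keep the books, clear their author_id).
```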
To inspect the keys defined in a database, you can query the system metadata:

- System catalog views such as sys.key_constraints and sys.foreign_keys (in SQL Server) list tables with their primary, unique, and foreign key constraints.
- INFORMATION_SCHEMA.KEY_COLUMN_USAGE furnishes standardized key information, such as the table containing each key and the ordinal position of each column within it.
For instance, in SQL Server:
```sql
SELECT TABLE_NAME, COLUMN_NAME, CONSTRAINT_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_NAME = 'books';
```
Database normalization is a set of practices that ensure data integrity by minimizing redundancy and dependency within a relational database.
- Data Consistency: Reduction in redundancy decreases the risk of inconsistent data.
- Improved Maintainability: With a more structured database, maintenance becomes more straightforward.
- Easier Updates: With each fact stored in one place, an update usually touches fewer records.
The commonly cited normal forms are 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.

First Normal Form (1NF)
- Unique Primary Keys: A table should have a unique primary key for each record.
- Atomic Values: Each cell in a table should hold a single value, eliminating multi-valued attributes.

Second Normal Form (2NF)
- All requirements of 1NF, as well as:
- Removal of Partial Dependencies: Non-key columns should depend on the entire primary key, not just part of it.

Third Normal Form (3NF)
- All requirements of 2NF, as well as:
- Elimination of Transitive Dependencies: Non-key columns should not depend on other non-key columns.

Boyce-Codd Normal Form (BCNF)
- A stricter version of 3NF that ensures every determinant is a candidate key.

Fourth Normal Form (4NF)
- Deals with multi-valued dependencies.

Fifth Normal Form (5NF)
- Also called "Projection-Join Normal Form"; deals with join dependencies.
Most relational databases aim for 3NF, as it typically strikes a good balance between performance and data integrity.
However, it's essential to understand the specific requirements of your database and decide on an optimal normalization level accordingly. For instance, reporting databases might be less heavily normalized to improve query performance.
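As a brief illustration, here is a hedged SQL sketch of moving a multi-valued attribute into 1NF (the `contacts` tables and columns are hypothetical):

```sql
-- Before (violates 1NF): multiple phone numbers packed into one column,
-- e.g. phones = '555-0100, 555-0101'
-- CREATE TABLE contacts (id INT PRIMARY KEY, name TEXT, phones TEXT);

-- After (1NF): one atomic value per cell; phones split into a child table.
CREATE TABLE contacts (
    id   INT PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE contact_phones (
    contact_id INT REFERENCES contacts(id),
    phone      TEXT NOT NULL,
    PRIMARY KEY (contact_id, phone)
);
```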
Denormalization is the process of reducing normalization (typically for performance reasons) by introducing redundant data into one or more tables. While normalization ensures data integrity, denormalization mainly focuses on improving query performance by eliminating complex JOIN operations.
- CTEs and Views: Common Table Expressions and views layered over a denormalized schema can preserve some of the readability and maintainability benefits of a normalized design, softening the drawbacks.
- Maintainability: Denormalization can make data harder to manage and keep consistent.
- Query Performance: Queries may become more efficient, as data required from different normalized tables is now available in a single denormalized table.
- Storage and I/O: A query against a single denormalized table can require less disk I/O and fit into memory more easily than one joining several smaller tables, at the cost of extra storage for the redundant data.
- Data Integrity: Redundant data can lead to inconsistencies if not managed properly.
Lookup Tables: Tables with relatively static data (like status types or location information) that are frequently joined with other tables in your system are strong candidates for denormalization. This can make frequent lookups faster.
Reporting and Analytics: Systems skewed towards read operations over write operations benefit from denormalization. When designing systems for reporting or analytics, it's common practice to have dedicated read replicas that are denormalized for improved read performance.
Partitioning: In large systems, denormalization can be used to partition data across different tables or clusters to distribute load. For example, customer data can be segregated into one cluster and order data into another, linked via denormalized references.
Here is the SQL code:
```sql
-- Example of Normalized Tables
--
-- Customer Table:           Status Table:
-- | ID | Name | StatusID |  | StatusID | Description |
-- | 1  | John | 2        |  | 1        | Pending     |
-- | 2  | Jane | 1        |  | 2        | Active      |

-- Querying the status using a JOIN
SELECT c.Name, s.Description AS Status
FROM Customer c
JOIN Status s ON c.StatusID = s.StatusID;

-- Denormalized Customer Table:
-- | ID | Name | Status  |
-- | 1  | John | Active  |
-- | 2  | Jane | Pending |

-- Eliminating the JOIN in the query
SELECT Name, Status
FROM Customer;
```
Let's look at the three commands `DROP`, `DELETE`, and `TRUNCATE` in terms of their function, operation, speed, and recovery.

The `DROP TABLE` statement deletes an entire table and its structure from the database.
```sql
DROP TABLE table_name;
```
- Implicit Commit: In most systems, the action is immediately committed, and its effects are irreversible.
- Irreversible: Drops the table as a whole; it doesn't delete records row by row.
- Data Loss: Any data in the table is permanently removed.
The `DELETE` statement operates on individual rows in the table and is reversible.
```sql
DELETE FROM table_name WHERE condition;
```
- Needs a Transaction: Works within a transaction and needs to be explicitly committed to become permanent.
- Precision: Removes specific rows based on the provided condition.
- Recoverability: Deleted rows can be restored by rolling back the transaction before it is committed.
The `TRUNCATE TABLE` statement quickly removes all rows from a table, providing both speed and simplicity. Unlike `DELETE`, it typically deallocates the data pages rather than deleting rows one by one.
```sql
TRUNCATE TABLE table_name;
```
- Transactional Behavior: In PostgreSQL, `TRUNCATE` runs within a transaction and can be rolled back until committed; in many other systems (e.g., MySQL, Oracle) it implicitly commits and cannot be undone.
- Efficiency: Processes significantly faster than `DELETE` because rows are not removed individually. Ideal for clearing all records and resetting table state in certain applications.
- No Selective Deletion: Unlike `DELETE`, it clears all records in the table without regard for specific conditions.
Here is the PostgreSQL code:
```sql
-- Create a sample table
CREATE TABLE employees (
    id   SERIAL PRIMARY KEY,
    name VARCHAR(50),
    age  INT
);

-- Insert sample data
INSERT INTO employees (name, age)
VALUES ('Alice', 25), ('Bob', 30), ('Charlie', 28);

-- Display initial content
SELECT * FROM employees;

-- Open a transaction so the changes below can still be rolled back
BEGIN;

-- Use DELETE to remove specific rows
DELETE FROM employees WHERE age > 29;
SELECT * FROM employees;  -- Bob is gone, but the change is not yet committed

-- Use TRUNCATE to remove all rows (transactional in PostgreSQL)
TRUNCATE TABLE employees;
SELECT * FROM employees;  -- empty, but still not committed

-- Final commit to make the changes permanent
COMMIT;

SELECT * FROM employees;  -- the table is now permanently empty
```
Full joins and inner joins serve different purposes in database management systems and exhibit distinct behavior.
An inner join finds and combines matching records from both tables based on a specified join condition. Records without a match in the other table are excluded from the result.
```sql
SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
```
A full join, also known as a full outer join, returns all records from both tables and fills in NULL values where no match is found in the corresponding table. This type of join is used less frequently than the others, as it can generate very large result sets.
```sql
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
```
Let's look at two common methods for identifying and handling duplicate records in SQL. The first uses `GROUP BY` with `HAVING` to surface groups that occur more than once:
```sql
SELECT column1, column2, ..., columnN, COUNT(*)
FROM mytable
GROUP BY column1, column2, ..., columnN
HAVING COUNT(*) > 1;
```
The second method uses the `ROW_NUMBER()` window function and is particularly useful when you also want to keep track of the duplicate rows' IDs:
```sql
WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column1, column2, ..., columnN ORDER BY ID) AS rnum
    FROM mytable
)
SELECT *
FROM duplicates
WHERE rnum > 1;
```
In both of these methods, `column1, column2, ..., columnN` refer to the column or set of columns you're using to identify duplicates.
```sql
SELECT name, age, COUNT(*)
FROM mytable
GROUP BY name, age
HAVING COUNT(*) > 1;
```
In this example, we're looking for duplicate records based on the "name" and "age" columns of `mytable`.
```sql
WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY ID) AS rnum
    FROM mytable
)
SELECT *
FROM duplicates
WHERE rnum > 1;
```
In this example, we're using `ROW_NUMBER()` to identify records sharing the same "name" and "age", keeping track of the unique row number for each.
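To go a step further and remove the duplicates while keeping the first row of each group, the same CTE can serve as the target of a DELETE. This form works in SQL Server; other systems need an ID-based variant:

```sql
WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY ID) AS rnum
    FROM mytable
)
DELETE FROM duplicates
WHERE rnum > 1;
```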
Prepared statements reduce the risk of SQL injection by separating SQL commands from data. They also optimize query execution by allowing repeated execution with different parameter bindings.
- Security: They defend against SQL injection by distinguishing between SQL code and input values.
- Performance: Prepared statements can be faster when used repeatedly, as they are parsed and executed in discrete steps.
- Readability and Maintainability: Separating the SQL code from the parameters makes it more readable. It can make the code easier to understand, review, and maintain.
Input from Untrusted Sources: When SQL queries are constructed with data from untrusted sources, prepared statements ensure the data is treated as literal values, preventing SQL injection.
Repeated Executions: For queries executed multiple times, using a prepared statement can be more efficient than constructing and executing a new query each time. This is especially relevant in loops or high-volume operations.
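As an illustration, here is a minimal JDBC sketch; the connection string, table, and column names are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PreparedStatementDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/shop", "app", "secret")) {
            // The '?' placeholder keeps user input strictly as data, never as SQL code.
            String sql = "SELECT id, name FROM customers WHERE email = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "alice@example.com"); // bound as a literal value
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getInt("id") + ": " + rs.getString("name"));
                    }
                }
            }
        }
    }
}
```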
The N+1 query problem arises when an object graph is retrieved using one query per parent object instead of a single combined query, resulting in unnecessary database hits.
For instance, consider a one-to-many relationship where each "Order" has many "LineItems". If you fetch all "Orders" and then individually fetch their "LineItems," it leads to multiple query rounds. These excessive queries are inefficient, especially when the number of related items is high.
Let's say, in a more specific case, you fetch:
- All Orders.
- Then for each Order, fetch its LineItems.
This corresponds to:
- Primary Query: `SELECT * FROM Orders`.
- Secondary Query: `SELECT * FROM LineItems WHERE order_id = :order_id` (potentially executed several times, based on order count).
Here you have an N+1 scenario because the second query is executed N times, once for each Order, hence N+1.
By employing a JOIN, you can retrieve related entities in a single query.
SQL:

```sql
SELECT o.*, li.*
FROM Orders o
JOIN LineItems li ON o.id = li.order_id;
```
ORM: Tools like Hibernate can handle this for you; you might not even know whether the ORM uses a JOIN under the hood.
Some ORMs are configured to execute extra queries only when a related object is accessed for the first time. This is known as lazy loading. It reduces the initial query size, improving efficiency, but might lead to an N+1 problem if multiple related entities are accessed.
Modern ORMs often provide mechanisms for explicitly defining which related entities to fetch eagerly. This way, you can still benefit from lazy loading while strategically opting for immediate access when needed. In Entity Framework (EF) and the Java Persistence API (JPA), for example, you can use annotations like `@ManyToOne` and `@OneToMany` to control lazy/eager loading behavior.
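A hedged JPA sketch of these annotations (entity and field names are illustrative; `jakarta.persistence` assumes JPA 3.x, older stacks use `javax.persistence`):

```java
import jakarta.persistence.*;
import java.util.List;

@Entity
@Table(name = "orders") // "order" is a reserved SQL word, so map the table explicitly
class Order {
    @Id
    Long id;

    // LAZY: line items load only on first access. Convenient, but touching
    // this collection for each of N orders triggers N extra queries --
    // the classic N+1 pattern.
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    List<LineItem> lineItems;
}

@Entity
class LineItem {
    @Id
    Long id;

    @ManyToOne(fetch = FetchType.LAZY) // the parent Order also loads lazily
    Order order;
}
```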
Instead of loading entire entities, you can select specific fields needed for immediate operations. This can be beneficial when you aren't interested in all of an entity's properties.
SQL:

```sql
SELECT id, name
FROM Orders;
```
ORM: With JPA, you can use constructor expressions in JPQL to achieve this:
```java
TypedQuery<OrderSummary> query = em.createQuery(
    "SELECT NEW OrderSummary(o.id, o.name) FROM Order o", OrderSummary.class);
```
And in Entity Framework, you can utilize LINQ projections:
```csharp
var orderSummaries = from order in context.Orders
                     select new OrderSummary { Id = order.Id, Name = order.Name };
```
When dealing with a large dataset, it's often more reasonable to employ paging rather than fetching everything at once. A SELECT query can be modified with `OFFSET ... FETCH` (SQL Server, PostgreSQL, Oracle 12c+), `LIMIT ... OFFSET` (PostgreSQL, MySQL), or the `ROW_NUMBER()` window function to achieve this neatly.
SQL Server and PostgreSQL:

```sql
SELECT *
FROM Orders
ORDER BY id
OFFSET 0 ROWS
FETCH NEXT 5 ROWS ONLY;
```
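PostgreSQL and MySQL also accept the shorter `LIMIT` form for the same page:

```sql
SELECT *
FROM Orders
ORDER BY id
LIMIT 5 OFFSET 0;
```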
`GROUP BY` and `HAVING` work together with aggregate functions to transform and filter result sets in more complex, nuanced ways.
- Aggregation: Identifies groups and computes aggregate functions (e.g., COUNT, SUM) for each group.
- Redundancy Reduction: Collapses redundant data based on the grouped columns to provide a leaner dataset.
- Column Constraint: Only aggregated expressions and columns listed in the `GROUP BY` clause may appear in the select list, ensuring consistent data representation.
- Group-Wise Filtering: Applies conditional checks to grouped data.
- Post-Group Filtering: Allows filtering based on the results of aggregate functions.
- Aggregation Assertion: Validates aggregated values against specific conditions.
- Comparison with WHERE: Enables aggregate-aware filtering after grouping, in contrast to the row-level, pre-aggregation filtering provided by `WHERE`.
Consider the following relational schema:
```
EMPLOYEE   (ID, Name, DeptID, Salary)
DEPARTMENT (ID, Name)
```
To retrieve department IDs where the average salary is over a certain threshold:
```sql
SELECT DeptID
FROM EMPLOYEE
GROUP BY DeptID
HAVING AVG(Salary) > 50000;
```
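To combine row-level and group-level filtering, `WHERE` (applied before grouping) can be paired with `HAVING` (applied after). A small sketch using the same schema; the thresholds are illustrative:

```sql
-- Keep only employees earning above 30,000, then keep departments
-- where more than five such employees remain.
SELECT DeptID, COUNT(*) AS Headcount
FROM EMPLOYEE
WHERE Salary > 30000
GROUP BY DeptID
HAVING COUNT(*) > 5;
```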
An index is a data structure that enhances the efficiency and speed of data lookup operations by maintaining an ordered (or hashed) auxiliary structure over the indexed data.
- B-Tree (Balanced Tree): Suited for range and equality operations, such as predicates in `WHERE` clauses.
- Hash: Ideal for exact-match lookups, like primary keys.
- Full-text: Specifically designed for text searches, often seen in search engines.
- Bitmap: Efficient for low-cardinality columns, where there are few distinct values, like gender.
- Ordered Scans: Possible through B-Tree indexes where data is sorted, aiding range queries.
- Exact-Match Lookups: Hash and tree-based indexes both ensure swift exact-value matches.
- Range Searches: Supported by B-Trees, they enable operations like finding numbers within a range.
- Consider Index Maintenance Overhead: While indexes speed up reads, they can slow down data modifications like `INSERT`, `UPDATE`, and `DELETE`.
- Column Order in Composite Indexes: Place more selective columns first, followed by less selective ones, for best results.
- Index Coverage: Aim to cover frequently queried columns, but don't go overboard; too many indexes lead to bloat.
Here is the SQL code:
```sql
-- B-Tree index (the default in most systems)
CREATE INDEX idx_btree ON table_name (column_name);

-- Hash index (PostgreSQL syntax)
CREATE INDEX idx_hash ON table_name USING HASH (column_name);
```
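Building on the column-order guidance above, here is a hedged sketch of a composite index (the table and column names are hypothetical):

```sql
-- More selective column (customer_id) first, less selective (order_date)
-- second; supports lookups by customer and by customer-plus-date-range,
-- but not efficient lookups by date alone.
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);
```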
Explore all 50 answers here 👉 Devinterview.io - Databases