Devinterview-io/sql-interview-questionsPublic

NotificationsYou must be signed in to change notification settings
Fork22
Star86

🟣 SQL interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.

86 stars 22 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md

Repository files navigation

100 Must-Know SQL Interview Questions in 2025

You can also find all 100 answers here 👉Devinterview.io - SQL

1. What isSQL and what is it used for?

SQL (Structured Query Language) is a domain-specific, declarative programming language designed for managing relational databases. It is the primary language for tasks like data retrieval, data manipulation, and database administration.

Core Components

DDL (Data Definition Language): Used for defining and modifying the structure of the database.
DML (Data Manipulation Language): Deals with adding, modifying, and removing data in the database.
DCL (Data Control Language): Manages the permissions and access rights of the database.
TCL (Transaction Control Language): Governs the transactional management of the database, such as commits or rollbacks.

Common Database Management Tasks

Data Retrieval and Reporting: Retrieve and analyze data, generate reports, and build dashboards.

Data Manipulation: Insert, update, or delete records from tables. Powerful features like Joins and Subqueries enable complex operations.

Data Integrity: Ensure data conform to predefined rules. Techniques like foreign keys, constraints, and triggers help maintain the integrity of the data.

Data Security: Manage user access permissions and roles.

Data Consistency: Enforce ACID properties (Atomicity, Consistency, Isolation, Durability) in database transactions.

Data Backups and Recovery: Perform database backups and ensure data is restorable in case of loss.

Data Normalization: Design databases for efficient storage and reduce data redundancy.

Indices and Performance Tuning: Optimize queries for faster data retrieval.

Replication and Sharding: Advanced techniques for distributed systems.

Basic SQL Commands

CREATE DATABASE: Used to create a new database.
CREATE TABLE: Defines a new table.
INSERT INTO: Adds a new record into a table.
SELECT: Retrieves data from one or more tables.
UPDATE: Modifies existing records.
DELETE: Removes records from a table.
ALTER TABLE: Modifies an existing table (e.g., adds a new column, renames an existing column, etc.).
DROP TABLE: Deletes a table (along with its data) from the database.
INDEX: Adds an index to a table for better performance.
VIEW: Creates a virtual table that can be used for data retrieval.
TRIGGER: Triggers a specified action when a database event occurs.
PROCEDURE andFUNCTION: Store database logic for reuse and to simplify complex operations.

Code Example: Basic SQL Queries

Here is the SQL code:

-- Create a databaseCREATEDATABASECompany;-- Use Company databaseUSE Company;-- Create tablesCREATETABLEDepartment (    DeptIDINTPRIMARY KEY AUTO_INCREMENT,    DeptNameVARCHAR(50)NOT NULL);CREATETABLEEmployee (    EmpIDINTPRIMARY KEY AUTO_INCREMENT,    EmpNameVARCHAR(100)NOT NULL,    EmpDeptIDINT,FOREIGN KEY (EmpDeptID)REFERENCES Department(DeptID));-- Insert dataINSERT INTO Department (DeptName)VALUES ('Engineering');INSERT INTO Department (DeptName)VALUES ('Sales');INSERT INTO Employee (EmpName, EmpDeptID)VALUES ('John Doe',1);INSERT INTO Employee (EmpName, EmpDeptID)VALUES ('Jane Smith',2);-- Select data from databaseSELECT*FROM Department;SELECT*FROM Employee;-- Perform an inner join to combine data from two tablesSELECTEmployee.EmpID,Employee.EmpName,Department.DeptNameFROM EmployeeJOIN DepartmentONEmployee.EmpDeptID=Department.DeptID;

2. Describe the difference betweenSQL andNoSQL databases.

SQL andNoSQL databases offer different paradigms, each designed to suit various types of data and data handling.

Top-Level Differences

SQL: Primarily designed for structured (structured, semi-structured) data — data conforming to a predefined schema.
NoSQL: Suited for unstructured or semi-structured data that evolves gradually, thereby supporting flexible schemas.
SQL: Employs SQL (Structured Query Language) for data modification and retrieval.
NoSQL: Offers various APIs (like the document and key-value store interfaces) for data operations; the use of structured query languages can vary across different NoSQL implementations.
SQL: Often provides ACID (Atomicity, Consistency, Isolation, Durability) compliance to ensure data integrity.
NoSQL: Databases are oftentimes optimized for high performance and horizontal scalability, with potential trade-offs in consistency.

Common NoSQL Database Types

Document Stores

Example: MongoDB, Couchbase
Key Features: Each record is a self-contained document, typically formatted as JSON. Relationship between documents is established through embedded documents or references.Example: Users and their blog posts could be encapsulated within a single document or linked via document references.

Key-Value Stores

Example: Redis, Amazon DynamoDB
Key Features: Data is stored as a collection of unique keys and their corresponding values. No inherent structure or schema is enforced, providing flexibility in data content.Example: Shopping cart items keyed by a user's ID.

Wide-Column Stores (Column Families)

Example: Apache Cassandra, HBase
Key Features: Data is grouped into column families, akin to tables in traditional databases. Each column family can possess a distinct set of columns, granting a high degree of schema flexibility.Example: User profiles, where certain users might have additional or unique attributes.

Graph Databases

Example: Neo4j, JanusGraph
Key Features: Tailored for data with complex relationships. Data entities are represented as nodes, and relationships between them are visualized as edges.Example: A social media platform could ensure efficient friend connections management.

Data Modeling Differences

SQL: Normalization is employed to minimize data redundancies and update anomalies.
NoSQL: Data is often denormalized, packaging entities together to minimize the need for multiple queries.

Auto-Incrementing IDs

SQL: Often, each entry is assigned a unique auto-incrementing ID.
NoSQL: The generation of unique IDs can be driven by external systems or even specific to individual documents within a collection.

Handling Data Relationships

SQL: Relationships between different tables are established using keys (e.g., primary, foreign).
NoSQL: Relationships are handled either through embedded documents, referencing techniques, or as graph-like structures in dedicated graph databases.

Transaction Support

SQL: Transactions (a series of operations that execute as a single unit) are standard.
NoSQL: The concept and features of transactions can be more varied based on the specific NoSQL implementation.

Data Consistency Levels

SQL: Traditionally ensures strong consistency across the database to maintain data integrity.
NoSQL: Offers various consistency models, ranging from strong consistency to eventual consistency. This flexibility enables performance optimizations in distributed environments.

Scalability

SQL: Typically scales vertically, i.e., by upgrading hardware.
NoSQL: Is often designed to scale horizontally, using commodity hardware across distributed systems.

Data Flexibility

SQL: Enforces a predefined, rigid schema, making it challenging to accommodate evolving data structures.
NoSQL: Supports dynamic, ad-hoc schema updates for maximum flexibility.

Data Integrity & Validation

SQL: Often relies on constraints and strict data types to ensure data integrity and validity.
NoSQL: Places greater emphasis on the application layer to manage data integrity and validation.

3. What are the different types ofSQL commands?

SQL commands fall into four primary categories:Data Query Language (DQL),Data Definition Language (DDL),Data Manipulation Language (DML), andData Control Language (DCL).

Data Query Language (DQL)

These commands focus on querying data within tables.

Keywords and Examples:

SELECT: Retrieve data.
FROM: Identify the source table.
WHERE: Apply filtering conditions.
GROUP BY: Group results based on specified fields.
HAVING: Establish qualifying conditions for grouped data.
ORDER BY: Arrange data based on one or more fields.
LIMIT: Specify result count (sometimes replacesSELECT TOP for certain databases).
JOIN: Bring together related data from multiple tables.

Data Definition Language (DDL)

DDL commands are for managing the structure of the database, including tables and constraints.

Keywords and Examples:

CREATE TABLE: Generate new tables.
ALTER TABLE: Modify existing tables.
- ADD,DROP: Incorporate or remove elements like columns, constraints, or properties.
CREATE INDEX: Establish indexes to improve query performance.
DROP INDEX: Remove existing indexes.
TRUNCATE TABLE: Delete all rows from a table, but the table structure remains intact.
DROP TABLE: Delete tables from the database.

Data Manipulation Language (DML)

These commands are useful for handling data within tables.

Keywords and Examples:

INSERT INTO: Add new rows of data.
- SELECT: Copy data from another table or tables.
UPDATE: Modify existing data in a table.
DELETE: Remove rows of data from a table.

Data Control Language (DCL)

DCL is all about managing the access and permissions to database objects.

Keywords and Examples:

GRANT: Assign permission to specified users or roles for specific database objects.
REVOKE: Withdraw or remove these permissions previously granted.

4. Explain the purpose of theSELECT statement.

TheSELECT statement in SQL is fundamental to data retrieval and manipulation within relational databases. Its primary role is to precisely choose, transform, and organize data per specific business requirements.

Key Components of the SELECT Statement

TheSELECT statement typically comprises the following elements:

SELECT: Identifies the columns or expressions to be included in the result set.
FROM: Specifies the table(s) from which the data should be retrieved.
WHERE: Introduces conditional statements to filter rows based on specific criteria.
GROUP BY: Aggregates data for summary or statistical reporting.
HAVING: Functions likeWHERE, but operates on aggregated data.
ORDER BY: Defines the sort order for result sets.
LIMIT orTOP: Limits the number of rows returned.

Practical Applications of SELECT

The robust design of theSELECT statement empowers data professionals across diverse functions, enabling:

Data Exploration: Gaining insights through filtered views or aggregated summaries.
Data Transformation: Creating new fields via operations such as concatenation or mathematical calculations.
Data Validation: Verifying data against defined criteria.
Data Reporting: Generating formatted outputs for business reporting needs.
Data Consolidation: Bringing together information from multiple tables or databases.
Data Export: Facilitating the transfer of query results to other systems or for data backup.

Beyond these functions, proper utilization of the other components ensures efficiency and consistency working with relational databases.

SELECT Query Example

Here is the SQL code:

SELECTOrders.OrderID,Customers.CustomerName,Orders.OrderDate,OrderDetails.UnitPrice,OrderDetails.Quantity,Products.ProductName,Employees.LastNameFROM     ((OrdersINNER JOIN CustomersONOrders.CustomerID=Customers.CustomerID)INNER JOIN EmployeesONOrders.EmployeeID=Employees.EmployeeID)INNER JOIN OrderDetailsONOrders.OrderID=OrderDetails.OrderID

5. What is the difference betweenWHERE andHAVING clauses?

WHERE andHAVING clauses are both used in SQL queries to filter data, but they operate in distinct ways.

WHERE Clause

TheWHERE clause is primarily used to filter records before they are grouped or aggregated. It's typically employed with non-aggregated fields or raw data.

HAVING Clause

Conversely, theHAVING clause filters dataafter the grouping step, often in conjunction with aggregate functions likeSUM orCOUNT. This makes it useful for setting group-level conditions.

6. Define what aJOIN is in SQL and list its types.

Join operations in SQL are responsible for combining rows from multiple tables, primarily based on related columns that are established using a foreign key relationship.

Thethree common types of joins in SQL are:

Inner Join
Outer Join
- Left Outer Join
- Right Outer Join
- Full Outer Join
Cross Join
Self Join

Inner Join

Inner Join only returns rows where there is a match in both tables for the specified column(s).

Visual Representation:

Table1:        Table2:            Result (Inner Join):A   B         B    C               A    B    C-   -         -    -               -    -    -1  aa         aa   20              1   aa   202  bb         bb   30              2   bb   303  cc         cc   40

SQL Query:

SELECTTable1.A,Table1.B,Table2.CFROM Table1INNER JOIN Table2ONTable1.B=Table2.B;

Outer Join

Outer Joins—whether left, right or full—include all records from one table (the "left" or the "right" table") and matched existing records from the other table. Unmatched records are filled with NULL values for missing columns from the other table.

Left Outer Join

Left Outer Join (or simply Left Join) returns all records from the "left" table and the matched records from the "right" table.

Visual Representation:

Table1:        Table2:            Result (Left Outer Join):A   B         B    C               A    B    C-   -         -    -               -    -    -1  aa         aa   20              1   aa   202  bb         bb   30              2   bb   303  cc         NULL  NULL            3   cc  NULL

SQL Query:

SELECTTable1.A,Table1.B,Table2.CFROM Table1LEFT JOIN Table2ONTable1.B=Table2.B;

Right Outer Join

Right Outer Join (or Right Join) returns all records from the "right" table and the matched records from the "left" table.

Visual Representation:

Table1:        Table2:            Result (Right Outer Join):A   B         B    C               A    B    C-   -         -    -               -    -    -1  aa         aa   20              1   aa   202  bb         bb   30              2   bb   30NULL NULL      cc   40             NULL NULL  40

SQL Query:

SELECTTable1.A,Table1.B,Table2.CFROM Table1RIGHT JOIN Table2ONTable1.B=Table2.B;

Full Outer Join

Full Outer Join (or Full Join) returns all records when there is a match in either the left or the right table.

Visual Representation:

Table1:        Table2:            Result (Full Outer Join):A   B         B    C               A    B    C-   -         -    -               -    -    -1  aa         aa   20              1   aa   202  bb         bb   30              2   bb   303  cc         NULL  NULL            3   cc  NULL                           NULL NULL      cc   40            NULL NULL  40

SQL Query:

SELECT COALESCE(Table1.A,Table2.A)AS A,Table1.B,Table2.CFROM Table1FULLJOIN Table2ONTable1.B=Table2.B;

Cross Join

A Cross Join, also known as a Cartesian Join, produces a result set that is the cartesian product of the two input sets. It will generate every possible combination of rows from both tables.

Visual Representation:

Table1:        Table2:            Result (Cross Join):A   B         C    D               A    B    C    D-   -         -    -               -    -    -    -1  aa        20    X               1   aa   20   X2  bb        30    Y               1   aa   30   Y3  cc        40    Z               1   aa   40   Z                                    2   bb   20   X                                    2   bb   30   Y                                    2   bb   40   Z                                    3   cc   20   X                                    3   cc   30   Y                                    3   cc   40   Z

SQL Query:

SELECT Table1.*, Table2.*FROM Table1CROSS JOIN Table2;

Self Join

A Self Join is when a table is joined with itself. This is used when a table has a relationship with itself, typically when it has a parent-child relationship.

Visual Representation:

Employee:                              Result (Self Join):EmpID   Name       ManagerID       EmpID  Name     ManagerID-       -           -               -       -          -1      John         3               1     John        32      Amy          3               2     Amy         33      Chris       NULL              3    Chris      NULL4      Lisa        2                4     Lisa        25      Mike        2                5     Mike        2

SQL Query:

SELECTE1.EmpID,E1.Name,E1.ManagerIDFROM EmployeeAS E1LEFT JOIN EmployeeAS E2ONE1.ManagerID=E2.EmpID;

7. What is aprimary key in a database?

Aprimary key in a database is a unique identifier for each record in a table.

Key Characteristics

Uniqueness: Each value in the primary key column is unique, distinguishing every record.
Non-Nullity: The primary key cannot be null, ensuring data integrity.
Stability: It generally does not change throughout the record's lifetime, promoting consistency.

Data Integrity Benefits

Entity Distinctness: Enforces that each record in the table represents a unique entity.
Association Control: Helps manage relationships across tables and ensures referential integrity in foreign keys.

Performance Advantages

Efficient Indexing: Primary keys are often auto-indexed, making data retrieval faster.
Optimized Joins: When the primary key links to a foreign key, query performance improves for related tables.

Industry Best Practice

Pick a Natural Key: Whenever possible, choose existing data values that are unique and stable.
Keep It Simple: Single-column primary keys are easier to manage.
Avoid Data in Column Attributes: Using data can lead to bloat, adds complexity, and can be restrictive.
Avoid Data Sensitivity: Decrease potential risks associated with sensitive data by separating it from keys.
Evaluate Multi-Column Keys Carefully: Identify and justify the need for such complexity.

Code Example: Declaring a Primary Key

Here is the SQL code:

CREATETABLEStudents (    student_idINTPRIMARY KEY,    grade_levelINT,    first_nameVARCHAR(50),    last_nameVARCHAR(50));

8. Explain what aforeign key is and how it is used.

Aforeign key (FK) is a column or a set of columns in a table that uniquely identifies a row or a set of rows in another table. It establishes a relationship between two tables, often referred to as theparent table and thechild table.

Key Functions of a Foreign Key

Data Integrity: Assures that each entry in the referencing table has a corresponding record in the referenced table, ensuring the data's accuracy and reliability.
Relationship Mapping: Defines logical connections between tables that can be used to retrieve related data.
Action Propagation: Specify what action should be taken in the child table when a matching record in the parent table is created, updated, or deleted.
Cascade Control: Allows operations like deletion or updates to propagate to related tables, maintaining data consistency.

Foreign Key Constraints

The database ensures the following with foreign key constraints:

Uniqueness: The referencing column or combination of columns in the child table is unique.
Consistency: Each foreign key in the child table either matches a corresponding primary key or unique key in the parent table or contains a null value.

Use Cases and Best Practices

Data Integrity and Consistency: FKs ensure that references between tables are valid and up-to-date. For instance, a sales entry references a valid product ID and a customer ID.
Relationship Representation: FKs depict relationships between tables, such as 'One-to-Many' (e.g., one department in a company can have multiple employees) or 'Many-to-Many' (like in associative entities).
Querying Simplification: They aid in performing joined operations to retrieve related data, abstracting away complex data relationships.

Code Example: Creating a Foreign Key Relationship

Here is the SQL code:

-- Create the parent (referenced) table firstCREATETABLEdepartments (    idINTPRIMARY KEY,    nameVARCHAR(100));-- Add a foreign key reference to the child tableCREATETABLEemployees (    idINTPRIMARY KEY,    nameVARCHAR(100),    department_idINT,FOREIGN KEY (department_id)REFERENCES departments(id));

9. How can you preventSQL injections?

SQL injection occurs when untrusted data is mixed with SQL commands. To prevent these attacks, useparameterized queries and input validation.

Here are specific methods to guard against SQL injection:

Parameterized Queries

Description: Also known as a prepared statement, it separates SQL code from user input, rendering direct command injection impossible.

Code Example:

Java (JDBC):

Stringquery ="SELECT * FROM users WHERE username = ? AND password = ?";PreparedStatementps =con.prepareStatement(query);ps.setString(1,username);ps.setString(2,password);ResultSetrs =ps.executeQuery();

Python (MySQL):

cursor.execute("SELECT * FROM users WHERE username = %s AND password = %s", (username,password))

Benefits:
- Improved security.
- Reliability across different databases.
- No need for manual escaping.

Stored Procedures

Description: Allows the database to pre-compile and store your SQL code, providing a layer of abstraction between user input and database operations.

Code Example:

With MySQL:
- Procedure definition:

CREATE PROCEDURE login(IN p_usernameVARCHAR(50),IN p_passwordVARCHAR(50))BEGINSELECT*FROM usersWHERE username= p_usernameAND password= p_password;END

Calling the procedure:

cursor.callproc('login', (username,password))

Advantages:
- Reduction of code redundancy.
- Allows for granular permissions.
- Can improve performance through query plan caching.

Input Validation

Description: Examine user-supplied data to ensure it meets specific criteria before allowing it in a query.

Code Example:Using regex:

ifnotre.match("^[A-Za-z0-9_-]*$",username):print("Invalid username format")

Drawbacks:
- Not a standalone method for preventing SQL injection.
- Might introduce false positives, limiting the user's input freedom.

Code Filtering

Description: Sanitize incoming data based on its type, like strings or numbers. This approach works best in conjunction with other methods.

Code Example:In Python:

username=re.sub("[^a-zA-Z0-9_-]","",username)

Considerations:
- Still necessitates additional measures for robust security.
- Can restrict legitimate user input.

10. What isnormalization? Explain with examples.

Normalization is a database design method, refining table structures to reduce data redundancy and improve data integrity. It is a multi-step process, divided into five normal forms (1NF, 2NF, 3NF, BCNF, 4NF), each with specific rules.

Normalization in Action

Let's consider a simplistic "Customer Invoices" scenario, starting from an unnormalized state:

Unnormalized Table (0NF)

ID	Name	Invoice No.	Invoice Date	Item No.	Description	Quantity	Unit Price

In this initial state, all data is stored in a single table without structural cohesion. Each record is a mix of customer and invoice information. This can lead to data redundancy and anomalies.

First Normal Form (1NF)

To reach 1NF, ensureall cells are atomic, meaning they hold single values. Make separate tables for related groups of data. In our example, let's separate customer details from invoices and address multiple items on a single invoice.

Customer Details Table

ID	Name

Invoices Table

Invoice No.	Customer_ID	Invoice Date

Items Table

Invoice No.	Item No.	Description	Quantity	Unit Price

Now, each table focuses on specific data, unique to 1NF.

1NF is crucial for efficient database operations, especially for tasks like reporting and maintenance.

Second Normal Form (2NF)

To achieve 2NF, consider the context of a complete data entry.Each non-key column should be dependent on the whole primary key.

In our example, the Items table already satisfies 2NF, as all non-key columns, likeDescription andUnit Price, depend on the entire primary key, formed byInvoice No. andItem No. together.

Third Normal Form (3NF)

For 3NF compliance,there should be no transitive dependencies. Non-key columns should rely only on the primary key.

The Invoices table requires further refinement:

Updated Invoices Table

Invoice No.	Customer_ID	Invoice Date

Here,Customer_ID is the sole attribute associated with the customer.

Practical Implications

Higher normal forms providestronger data integrity but might be harder to maintain during regular data operations.
Consider your specific application needs when determining the target normal form.

Real-World Usage

Many databases aim for 3NF.
In scenarios requiring exhaustive data integrity, 4NF, and sometimes beyond, are appropriate.

Code Example: Implementing 3NF

Here is the SQL code:

-- Create Customer and Invoices TableCREATETABLECustomers (    IDINTPRIMARY KEY,    NameVARCHAR(50));CREATETABLEInvoices (    InvoiceNoINTPRIMARY KEY,    Customer_IDINT,    InvoiceDateDATE,FOREIGN KEY (Customer_ID)REFERENCES Customers(ID));-- Create Items TableCREATETABLEItems (    InvoiceNoINT,    ItemNoINT,    DescriptionVARCHAR(100),    QuantityINT,    UnitPriceDECIMAL(10,2),PRIMARY KEY (InvoiceNo, ItemNo),FOREIGN KEY (InvoiceNo)REFERENCES Invoices(InvoiceNo));

This code demonstrates the specified 3NF structure with distinct tables for Customer, Invoices, and Items, ensuring data integrity during operations.

11. Describe the concept ofdenormalization and when you would use it.

Denormalization involves optimizing database performance by reducing redundancy at the cost of some data integrity.

Common Techniques for Denormalization

Flattening Relationships:
- Combining related tables to minimize joins.
- Example:Order andProduct tables are merged, eliminating the many-to-many relationship.
Aggregating Data:
- Precomputing derived values to minimize costly calculations.
- Example: aSales_Total column in anOrder table.
Adding Additional Redundant Data:
- Replicating data from one table in another to reduce the need for joins.
- Example: TheCustomer andSales tables can both have aCountry column, even though the country is indirectly linked through theCustomer table.

Common Use Cases

Reporting and Analytics:
- Companies often need to run complex reports that span numerous tables.
- Denormalization can flatten these tables, making the reporting process more efficient.
High-Volume Transaction Systems:
- In systems where data consistency can be relaxed momentarily, denormalization can speed up operations.
- It's commonly seen in e-commerce sites where a brief delay in updating the sales figures might be acceptable for faster checkouts and improved user experience.
Read-Mostly Applications:
- Systems that are heavy on data reads and relatively light on writes can benefit from denormalization.
Search- and Query-Intensive Applications:
- For example, search engines often store data in a denormalized format to enhance retrieval speed.
Partitioning Data:
- In distributed systems like Hadoop or NoSQL databases, data is often stored redundantly across multiple nodes for enhanced performance.

Considerations and Trade-offs

Performance vs. Consistency:
- Denormalization can boost performance but at the expense of data consistency.
Maintenance Challenges:
- Redundant data must be managed consistently, which can pose challenges.
Operational Simplicity:
- Sometimes, having a simple, denormalized structure can outweigh the benefits of granularity and normalization.
Query Flexibility:
- A normalized structure can be more flexible for ad-hoc queries and schema changes. Denormalized structures might require more effort to adapt to such changes.

12. What areindexes and how can they improve query performance?

Indexes are essential in SQL to accelerate queries by providing quick data lookups.

How Do Indexes Improve Performance?

Faster Data Retrieval: Think of an index like a book's table of contents, which leads you right to the desired section.
Sorted Data Access: With data logically ordered, lookups are more efficient.
Reduces Disk I/O: Queries may read fewer data pages when using an index.
Enhances Joins: Indexes help optimize join conditions, particularly in larger tables.
Aggregates and Uniques: They can swiftly resolve aggregate functions and enforce data uniqueness.

Index Types

B-Tree: Standard for most databases, arranges data in a balanced tree structure.
Hash: Direct lookup based on a hash of the indexed column.
Bitmap: Best used for columns with a low cardinality.
R-Tree: Optimized for spatial data, such as maps.

Different databases may offer additional specialized index types.

When to Use Carefully

Excessive or unnecessary indexing can:

Consume Resources: Indexes require disk space and upkeep during data modifications.
Slow Down Writes: Each write operation might trigger updates to associated indexes.

Best Practices

Appropriate Index Count: Identify crucial columns and refrain from over-indexing.
Monitor and Refactor: Regularly assess index performance and refine or remove redundant ones.
Consistency: Ensure all queries access data in a consistent manner to take full advantage of indexes.
Data Type Consideration: Certain data types are better suited for indexing than others.

Types of Keys

Primary Key: Uniquely identifies each record in a table.
Foreign Key: Establishes a link between tables, enforcing referential integrity.
Compound Key: Combines two or more columns to form a unique identifier.

13. Explain the purpose of theGROUP BY clause.

TheGROUP BY clause in SQL serves to consolidate data and perform operations across groups of records.

Key Functions

Data Aggregation: Collapses rows into summary data.
Filtering: Provides filtering criteria for groups.
Calculated Fields: Allows computation on group-level data.

Usage Examples

Consider aSales table with the following columns:Product,Region, andAmount.

Data Aggregation

For data aggregation, we use aggregate functions such asSUM,AVG,COUNT,MIN, orMAX.

The query below calculates total sales by region:

SELECT Region,SUM(Amount)AS TotalSalesFROM SalesGROUP BY Region;

Filtering

TheGROUP BY clause can include conditional statements. For example, to count only those sales that exceed $100 in amount:

SELECT Region,COUNT(Amount)AS SalesAbove100FROM SalesWHERE Amount>100GROUP BY Region;

Calculated Fields

You can compute derived values for groups. For instance, to find what proportion each product contributes to the overall sales in a region, use this query:

SELECT Region, Product,SUM(Amount)/ (SELECTSUM(Amount)FROM SalesWHERE Region=s.Region)AS RelativeContributionFROM Sales sGROUP BY Region, Product;

Performance Considerations

Efficient database design aims to balance query performance with storage requirements. Aggregating data during retrieval can optimize performance, especially when dealing with huge datasets.

It's essential to verify these calculations for accuracy, as improper data handling can lead to skewed results.

14. What is asubquery, and when would you use one?

Subqueries are embedded SQL select statements that provide inputs for an outer query. They can perform various tasks, such as filtering and aggregate computations.Subqueries can also be useful for complex join conditions, self-joins, and more.

Common Subquery Types

Scalar Subquery

AScalar Subquery returns a single value. They're frequently used for comparisons—like>,=, orIN.

Examples:

Getting themaximum value:
- SELECT col1 FROM table1 WHERE col1 = (SELECT MAX(col1) FROM table1);
Checking existence:
- SELECT col1, col2 FROM table1 WHERE col1 = (SELECT col1 FROM table2 WHERE condition);
Usingaggregates:
- SELECT col1 FROM table1 WHERE col1 = (SELECT SUM(col2) FROM table2);

Table Subquery

ATable Subquery is like a temporary table. It returns rows and columns and can be treated as a regular table for further processing.

Examples:

Filtering data:
- SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table2 WHERE condition);
Data deduplication:
- SELECT DISTINCT col1 FROM table1 WHERE condition1 AND col1 IN (SELECT col1 FROM table2 WHERE condition2);

Advantages of Using Subqueries

Simplicity: They offer cleaner syntax, especially for complex queries.
Structured Data: Subqueries can ensure that intermediate data is properly processed, making them ideal for multi-step tasks.
Reduced Code Duplication: By encapsulating certain logic within a subquery, you can avoid repetitive code.
Dynamic Filtering: The data returned by a subquery can dynamically influence the scope of the outer query.
Milestone Calculations: For long and complex queries, subqueries can provide clarity and help break down the logic into manageable parts.

Limitations and Optimization

Performance: Subqueries can sometimes be less efficient. Advanced databases like Oracle, SQL Server, and PostgreSQL offer optimizations, but it's essential to monitor query performance.
Versatility: While subqueries are powerful, they can be less flexible in some scenarios compared to other advanced features like Common Table Expressions (CTEs) and Window Functions.
Understanding and Debugging: Nested logic might make a stored procedure or more advanced techniques like CTEs easier to follow and troubleshoot.

Code Example: Using Subqueries

Here is the SQL code:

-- Assuming you have table1 and table2-- Scalar Subquery ExampleSELECT col1FROM table1WHERE col1= (SELECTMAX(col1)FROM table1);-- Table Subquery ExampleSELECT col1, col2FROM table1WHERE col1= (SELECT col1FROM table2WHERE condition);

15. Describe the functions of theORDER BY clause.

TheORDER BY clause in SQL serves to sort the result set based on specified columns, in either ascending (ASC, default) or descending (DESC) order. It's often used in conjunction with various SQL statements likeSELECT orUNION to enhance result presentation.

Key Features

Column-Specific Sorting: You can designate one or more columns as the basis for sorting. For multiple columns, the order of precedence is from left to right.
ASC and DESC Directives: These allow for both ascending and descending sorting. If neither is specified, it defaults to ascending.

Use Cases

Top-N Queries: Selecting a specific number of top or bottom records can be accomplished usingORDER BY along withLIMIT orOFFSET.
Trends Identification: WithORDER BY, you can identify trends or patterns in your data, such as ranking by sales volume or time-based sequences.
Improved Data Presentation: By sorting records in a logical order, you can enhance the visual appeal and comprehension of your data representations.

Code Example: Order by Multiple Columns and Limit Results

Let's say you have a "sales" table with columnsproduct_name,sale_date, andunits_sold. You want to fetch the top 3 products that sold the most units on a specific date, sorted by units sold (in descending order) and product name (in ascending order).

Here is the SQL query:

SELECT product_name, sale_date, units_soldFROM salesWHERE sale_date='2022-01-15'ORDER BY units_soldDESC, product_nameASCLIMIT3;

The expected result will show the top 3 products with the highest units sold on the given date. If two products have the same number of units sold, they will be sorted in alphabetical order by their names.

SQL Server Specific: Order by Column Position

InSQL Server, you can also use the column position in the ORDER BY clause. For example, instead of using column names, you can use 1 for the first column, 2 for the second, and so on. This syntax:

SELECT product_name, sale_date, units_soldFROM salesWHERE sale_date='2022-01-15'ORDER BY3DESC,1ASCLIMIT3;

performs the same operation as the previous example.

MySQL Specific: Random Order

InMySQL, you can reorder the results in a random sequence. This can be useful, for instance, in a quiz app to randomize the order of questions. TheORDER BY clause with theRAND() function looks like this:

SELECT product_nameFROM productsORDER BY RAND()LIMIT1;

Explore all 100 answers here 👉Devinterview.io - SQL

About

🟣 SQL interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.

Releases

No releases published

Packages

No packages published

Movatterモバイル変換

Devinterview-io/sql-interview-questions

Folders and files

Latest commit

History

Repository files navigation

100 Must-Know SQL Interview Questions in 2025

You can also find all 100 answers here 👉Devinterview.io - SQL

1. What isSQL and what is it used for?

Core Components

Common Database Management Tasks

Basic SQL Commands

Code Example: Basic SQL Queries

2. Describe the difference betweenSQL andNoSQL databases.

Top-Level Differences

Common NoSQL Database Types

Document Stores

Key-Value Stores

Wide-Column Stores (Column Families)

Graph Databases

Data Modeling Differences

Auto-Incrementing IDs

Handling Data Relationships

Transaction Support

Data Consistency Levels

Scalability

Data Flexibility

Data Integrity & Validation

3. What are the different types ofSQL commands?

Data Query Language (DQL)

Keywords and Examples:

Data Definition Language (DDL)

Keywords and Examples:

Data Manipulation Language (DML)

Keywords and Examples:

Data Control Language (DCL)

Keywords and Examples:

4. Explain the purpose of theSELECT statement.

Key Components of the SELECT Statement

Practical Applications of SELECT

SELECT Query Example

5. What is the difference betweenWHERE andHAVING clauses?

WHERE Clause

HAVING Clause

6. Define what aJOIN is in SQL and list its types.

Inner Join

Outer Join

Left Outer Join

Right Outer Join

Full Outer Join

Cross Join

Self Join

7. What is aprimary key in a database?

Key Characteristics

Data Integrity Benefits

Performance Advantages

Industry Best Practice

Code Example: Declaring a Primary Key

8. Explain what aforeign key is and how it is used.

Key Functions of a Foreign Key

Foreign Key Constraints

Use Cases and Best Practices

Code Example: Creating a Foreign Key Relationship

9. How can you preventSQL injections?

Parameterized Queries

Stored Procedures

Input Validation

Code Filtering

10. What isnormalization? Explain with examples.

Normalization in Action

Unnormalized Table (0NF)

First Normal Form (1NF)

Customer Details Table

Invoices Table

Items Table

Second Normal Form (2NF)

Third Normal Form (3NF)

Updated Invoices Table

Practical Implications

Real-World Usage

Packages