You signed in with another tab or window.Reload to refresh your session.You signed out in another tab or window.Reload to refresh your session.You switched accounts on another tab or window.Reload to refresh your session.Dismiss alert
End-to-end Order data analysis project using Python and SQL to extract, clean, and analyze order data from Kaggle.
Project Overview
A brief description of this project
This project involves data analysis of orders using Python to extract data from Kaggle via API, load it intoJupyter Notebook for manipulation, and then into aMS SQL server for comprehensive analysis. This end-to-end process showcases aETL (Extract, Transform, Load) project aimed at deriving insights from order data to understand trends, improve decision-making, and optimize business strategies.
Problem Statement
Retailers often face challenges in managing orders, understanding customer behavior, and predicting future trends. This project addresses these challenges by leveraging data analysis techniques to extract meaningful insights from order data. The goal is to analyze sales patterns, product performance, and regional differences to support business operations and strategy formulation.
Table of Contents
Dataset
Data Extraction
Data Cleaning
Data Loading
Data Analysis
Libraries Used
Key Findings
Insights
Conclusion
Dataset
The dataset used in this project includes the following features:
Order ID
Order Date
Ship Mode
Segment
Country
City
State
Postal Code
Region
Category
Sub-category
Product ID
Quantity
Discount
Sale Price
Profit
Data Extraction
Data extraction steps performed include:
Using the Kaggle API to download the dataset
Loading the dataset into Jupyter Notebook for initial inspection
Data Cleaning
Data cleaning steps performed in Python include:
Handling missing values
Removing duplicates
Converting data types
Standardizing categorical variables
Data Loading into SQL Server
Data loading steps include:
Storing the cleaned data into an SQL server
Setting up the appropriate database schema
Writing SQL queries to manipulate
Data Analysis
During the data analysis, several analyses were conducted:
Distribution analysis of orders over different categories and sub-categories
Customer segmentation based on demographics and segments
Time-series analysis of order trends by date and region
Performance metrics of different product categories and sub-categories
Analysis of shipping modes and their impact on sales and profit
Regional analysis of sales and profit distribution
Libraries Used
The following Python libraries were used in this project:
Pandas
SQLAlchemy
Jupyter Notebook
Zip file
Kaggle
Project Architecture
The following diagram illustrates the data flow for this project:
**Key Findings
Key Findings
Finding the top 10 highest revenue-generating products.
Finding the top 5 selling products in each region.
Query for finding each category that had the highest month of sales.
Finding month-over-month growth comparison for 2022 and 2023 sales.
Which sub categories had highest growth by profit in 2023 compare to 2022.
SQL Queries
Finding the top 10 highest revenue-generating products.
Finding the top 5 selling products in each region.
Query for finding each category that had the highest month of sales.
Finding month-over-month growth comparison for 2022 and 2023 sales.
Which sub categories had highest growth by profit in 2023 compare to 2022.
About
End-to-end Orders data analysis project using Python and SQL to extract, clean, and analyze order data from Kaggle (dataset).