A curated list of free courses from reputable universities that meets the requirements of an undergraduate curriculum in Data Science, excluding general education, with projects and supporting materials in an organized structure.
A structured roadmap to learn data science for free
- 🧠 About
- 🎯 Learning Goals
- 🗂️ Curriculum Overview
- 🧩 How to Use This Curriculum
- 📘 Full Curriculum Sections
- 📖 Extra Bibliography
- 📝 Notes and Clarifications
- 🔗 References
This Self-Taught Data Science Curriculum is a structured roadmap I designed to guide my own journey into data science — using only free and high-quality online resources.
My goal was to create a program that covered the full stack of data science knowledge, from basic concepts to advanced applications, including:
- Programming
- Mathematics and Statistics
- Machine Learning and Deep Learning
- Databases, Big Data, and Cloud Computing
This roadmap is ideal for:
- Aspiring data scientists learning independently
- Professionals who want to deepen their analytics skills
- Anyone seeking a structured, free alternative to paid bootcamps
📌 This is a living document — I update it regularly as I complete courses or discover better resources.
This curriculum is designed to help you gain practical, theoretical, and technical proficiency across key areas of data science:
- Python: Data manipulation, visualization, ML tools
- Linear Algebra, Calculus, Probability
- Inferential Statistics, Bayesian Methods, Regression
- ML theory and algorithmic foundations
- SQL & NoSQL
- Data lakes, pipelines, and ETL
- Tools like Hadoop, Spark, and cloud-native storage
- Supervised & Unsupervised Learning
- Neural Networks, CNNs, NLP, RL
- AI ethics and responsible modeling
The curriculum is structured into 10 sections, grouped by learning stage and topic. Each includes carefully selected resources with estimated time commitment.
Section | Area | Approx. Hours |
---|---|---|
01 | Fundamentals | ~40h |
02 | Mathematics & Statistics | ~90h |
03 | Programming (Python) | ~215h |
04 | Data Mining | ~120h |
05 | Databases & SQL | ~80h |
06 | Big Data | ~85h |
07 | Machine Learning | ~120h |
08 | Deep Learning | ~125h |
09 | Data Warehousing | ~300h |
10 | Cloud Computing | ~120h |
🧩 Detailed tables with links, skills, and certificates are available in each section.
This roadmap is flexible and can be adapted based on your learning pace and background:
- ✅ Follow it sequentially if you're starting from scratch.
- ✅ Skip sections if you already have knowledge in a particular area.
- ✅ Combine different resources, projects, and additional readings.
Each module contains curated courses with estimated effort and certification options when available.
In this first section, my goal is to establish a solid foundation in data science by understanding the role of data in decision-making, the fundamentals of the field, and the key tools used by professionals. Additionally, I aim to develop a clear understanding of what it means to be a data scientist, the essential skills required, and how to apply this knowledge in practice.
The main skills I want to acquire in this stage include:
- ✅ Understanding what data is and how it can be used
- ✅ Fundamental concepts of data science and its impact on various industries
- ✅ Familiarity with essential tools for data analysis and manipulation
This course provides a clear introduction to what data is, how it is generated, and how it can be used to answer questions and support decision-making. I chose this course to build a conceptual foundation before moving on to more complex techniques.
Skills developed:
- Understanding the concept of data and its different forms
- Practical applications of data usage in problem-solving
- Introduction to data collection, organization, and interpretation
Course | Offered by | Effort |
---|---|---|
Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h |
This course offers an overview of the field of data science, exploring the responsibilities of a data scientist, the stages of the data analysis process, and its applications. It helps to better understand the career and the importance of data science in the modern world.
Skills developed:
- Understanding what data science is and its applications
- Insights into the data science lifecycle
- Knowledge of the key tools and technologies used in the field
Course | Offered by | Effort |
---|---|---|
What is Data Science? | IBM Skills Network | ~11h |
This course is essential for gaining familiarity with the fundamental tools used by data scientists. It introduces basic programming concepts, version control, and project organization—essential elements for working with data in a structured and efficient way.
Skills developed:
- Introduction to R and RStudio
- Basic concepts of Git and GitHub for version control
- Insights into data science workflows
Course | Offered by | Effort |
---|---|---|
The Data Scientist's Toolbox | Johns Hopkins University | ~18h |
This section is essential to understand the mathematical and statistical foundations of data science. My goal here is to acquire strong theoretical tools to support more complex models, especially in machine learning and inferential analysis.
The content covers linear algebra, calculus, probability, and statistics — from basic concepts to the application of Bayesian methods and ML theory.
The main skills I want to develop at this stage include:
- ✅ Understanding matrix operations, vectors, eigenvalues, and decompositions
- ✅ Derivatives, gradients, and optimization for ML
- ✅ Concepts of probability, distributions, and statistical inference
- ✅ Bayesian thinking and probabilistic reasoning
- ✅ Theoretical foundation behind supervised and unsupervised models
Description: This course offers an applied and visual introduction to linear algebra — one of the most crucial areas for working with data. It explores matrices, vector spaces, linear transformations, and the math behind dimensionality reduction and neural networks.
Why I chose this course: It’s part of the Mathematics for Machine Learning and Data Science specialization by DeepLearning.AI, created with Andrew Ng’s endorsement. The practical approach with visual tools makes it ideal for learners in applied data science.
Skills developed:
- Matrix operations, vector norms, and projections
- Singular Value Decomposition (SVD)
- Applications in data compression and feature extraction
Course | Offered by | Effort |
---|---|---|
Linear Algebra for ML and DS | DeepLearning.AI | ~34h |
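To make the SVD idea concrete, here is a minimal NumPy sketch (the matrix values are made up for illustration) showing a rank-2 approximation of a small data matrix — the mechanism behind the compression and feature-extraction applications listed above:

```python
import numpy as np

# A small data matrix: 4 samples x 3 features (illustrative values).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 2.0, 0.0]])

# Thin SVD: A = U @ diag(S) @ Vt, singular values sorted descending.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-2 approximation keeps only the two largest singular values.
k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

print(np.round(S, 3))                       # singular values
print(np.allclose(A, U @ np.diag(S) @ Vt))  # exact reconstruction: True
```

The spectral-norm error of the rank-k approximation equals the first discarded singular value, which is why truncating the SVD is the optimal low-rank compression.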
Description: This course provides a practical introduction to calculus, focusing on how derivatives and integrals are used to train and optimize machine learning models.
Why I chose this course: Traditional calculus courses are very theoretical. This one, however, is laser-focused on real applications like gradient descent, cost functions, and model convergence — essential concepts for data scientists.
Skills developed:
- Derivatives and chain rule
- Optimization using gradients
- Applications of calculus in ML training
Course | Offered by | Effort |
---|---|---|
Calculus for ML and DS | DeepLearning.AI | ~25h |
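The core optimization idea the course teaches fits in a few lines: gradient descent repeatedly steps opposite the derivative. A toy sketch (the function, starting point, and learning rate are chosen purely for illustration):

```python
# Minimal gradient descent on f(x) = (x - 3)^2, whose derivative is
# f'(x) = 2 * (x - 3). The minimum is at x = 3.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step against the gradient
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges very close to 3.0
```

The same update rule, applied to a cost function over model parameters, is what trains the ML models covered later in the curriculum.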
Description: A comprehensive course that introduces both descriptive and inferential statistics, with a focus on applications in machine learning. Topics include probability theory, conditional probability, hypothesis testing, and Bayesian methods.
Why I chose this course: This course doesn't just teach "classical stats" — it explicitly bridges the gap between statistics and ML, making it perfect for applied work in data science.
Skills developed:
- Descriptive statistics and probability distributions
- Conditional probability and Bayes' Theorem
- Confidence intervals and hypothesis testing
- Probabilistic thinking in ML
Course | Offered by | Effort |
---|---|---|
Probability & Stats for ML | DeepLearning.AI | ~33h |
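Bayes' Theorem from the skills list can be illustrated with the classic medical-test calculation (all probabilities below are made up for illustration):

```python
# Bayes' Theorem: P(H | E) = P(E | H) * P(H) / P(E).
p_disease = 0.01             # prior P(H)
p_pos_given_disease = 0.90   # sensitivity, P(E | H)
p_pos_given_healthy = 0.05   # false-positive rate, P(E | not H)

# Total probability of a positive test, P(E).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 4))  # 0.1538: even a positive test leaves ~15% probability
```

The counterintuitive result (a 90%-sensitive test yielding only ~15% posterior probability) is exactly the kind of probabilistic thinking the course builds.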
In this section, my goal is to master Python as the primary language for data analysis, visualization, and machine learning. Python is the industry standard in data science, widely adopted thanks to its simplicity, community, and rich ecosystem of libraries like NumPy, pandas, matplotlib, scikit-learn, and many others.
The focus here is on hands-on experience, building the ability to:
- Write clean, efficient code for data manipulation
- Use Python tools to explore, visualize, and analyze data
- Implement and evaluate machine learning models
- Work with real datasets, pipelines, and applied problems
The main skills I want to develop at this stage include:
- ✅ Python programming for data manipulation (NumPy, pandas)
- ✅ Data visualization using matplotlib and seaborn
- ✅ Building and validating machine learning models with scikit-learn
- ✅ Natural language processing and social network analysis
- ✅ Applying Python in real-world projects across different domains
Description: A foundational course that introduces data manipulation with pandas, working with DataFrames, and the basics of cleaning and transforming data for analysis.
Why I chose this course: It’s the first course of the Applied Data Science with Python Specialization, one of the most respected Python tracks on Coursera. It provides a smooth learning curve for practical data tasks.
Skills developed:
- Data structures in pandas
- Handling missing values and data types
- Basic exploratory data analysis (EDA)
Course | Offered by | Effort |
---|---|---|
Intro to Data Science in Python | Univ. of Michigan | ~34h |
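A tiny pandas sketch of the cleaning tasks this course covers (column names and values are made up), assuming pandas and NumPy are installed:

```python
import numpy as np
import pandas as pd

# A small DataFrame with one missing value, mirroring typical cleaning tasks.
df = pd.DataFrame({
    "city": ["Ann Arbor", "Detroit", "Lansing"],
    "population": [120_000, np.nan, 115_000],
})

# Inspect the missing data, then impute with the column mean.
print(df["population"].isna().sum())  # 1 missing value
df["population"] = df["population"].fillna(df["population"].mean())
print(df["population"].tolist())      # [120000.0, 117500.0, 115000.0]
```

Mean imputation is just one of several strategies the course compares; dropping rows (`dropna`) or forward-filling are equally common depending on the data.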
Description: This course introduces practical techniques to create visualizations using matplotlib and other Python libraries, focusing on choosing the right type of plot for different data contexts.
Why I chose this course: Data visualization is often underestimated, but it’s critical for communicating insights. This course strengthens your ability to create professional, informative visuals.
Skills developed:
- Line plots, histograms, scatterplots, and advanced charts
- Visual perception principles
- Interactive plotting and dashboard elements
Course | Offered by | Effort |
---|---|---|
Plotting in Python | Univ. of Michigan | ~24h |
Description: Focuses on implementing machine learning models using scikit-learn, including classification, regression, and clustering.
Why I chose this course: It emphasizes not just the use of models but also best practices like train/test splits, model evaluation, overfitting, and performance metrics — all essential for a solid ML foundation.
Skills developed:
- scikit-learn pipelines
- Supervised learning (logistic regression, decision trees)
- Model evaluation and validation techniques
Course | Offered by | Effort |
---|---|---|
ML in Python | Univ. of Michigan | ~31h |
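The train/test discipline the course emphasizes can be sketched with scikit-learn on a made-up, cleanly separated toy dataset (assuming scikit-learn is installed):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Two widely separated groups of 1-D points (made-up data):
# values 0-9 are class 0, values 100-109 are class 1.
X = [[float(v)] for v in list(range(10)) + list(range(100, 110))]
y = [0] * 10 + [1] * 10

# Hold out a test set so the model is scored on data it never saw --
# the core evaluation habit the course drills in.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)  # 1.0 on this cleanly separated toy set
```

On real data the held-out score is almost always lower than the training score; that gap is the overfitting signal the course teaches you to watch.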
Description: Covers the fundamentals of natural language processing (NLP) in Python, including tokenization, TF-IDF, and basic text classification.
Why I chose this course: Text data is everywhere — and this course provides the essential tools to process and analyze it using real-world datasets.
Skills developed:
- Working with text data using pandas and NLTK
- Document-term matrices
- Basic text classifiers (e.g., Naive Bayes)
Course | Offered by | Effort |
---|---|---|
Text Mining in Python | Univ. of Michigan | ~25h |
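TF-IDF, one of the core ideas here, can be computed from scratch in a few lines; this is only a sketch of what NLTK and scikit-learn automate, run on made-up documents:

```python
import math

# Minimal TF-IDF from scratch on a tiny made-up corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

def tf(word, doc):
    # Term frequency: share of the document occupied by the word.
    return doc.count(word) / len(doc)

def idf(word):
    # Inverse document frequency: rare words score higher.
    n_containing = sum(1 for doc in tokenized if word in doc)
    return math.log(len(tokenized) / n_containing)

# "the" appears in 2 of 3 docs, so its idf is low; "cats" is rarer.
print(round(idf("the"), 3), round(idf("cats"), 3))  # 0.405 1.099
```

Multiplying `tf * idf` per word per document yields the document-term matrix the course builds classifiers on.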
Description: Explores how to analyze network structures such as social graphs, user connections, and centrality using NetworkX and Python.
Why I chose this course: Social network analysis is increasingly useful in marketing, user behavior, fraud detection, and influence modeling.
Skills developed:
- Graph theory and network metrics
- Using NetworkX for social graphs
- Identifying influential nodes and clusters
Course | Offered by | Effort |
---|---|---|
Social Network Analysis | Univ. of Michigan | ~26h |
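Degree centrality, one of the network metrics listed above, is simple enough to compute by hand (NetworkX's `degree_centrality` does the equivalent). The friendship graph below is made up:

```python
# Graph as an adjacency dict of a made-up friendship network.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"alice", "carol"},
}

# Degree centrality: fraction of the other nodes each node connects to.
n = len(graph)
centrality = {node: len(neigh) / (n - 1) for node, neigh in graph.items()}

most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])  # alice 1.0
```

Identifying influential nodes like `alice` here is the seed of the fraud-detection and influence-modeling applications the course discusses.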
In this section, I explore how to extract useful patterns, knowledge, and structures from large volumes of data — both structured and unstructured. The focus is on practical techniques in text mining, clustering, and pattern discovery, with applications in business intelligence, recommendation systems, and behavioral analysis.
The goals here are to:
- ✅ Understand fundamental data mining concepts
- ✅ Learn how to extract insights from text data
- ✅ Apply clustering and pattern recognition algorithms
- ✅ Improve decision-making with data visualization
All the courses come from the Data Mining Specialization by the University of Illinois Urbana-Champaign.
Description:
This course covers the essentials of visualizing data effectively — not just creating pretty charts, but telling meaningful stories through data. It introduces principles of design, perception, and interpretation.
Why I chose this course:
Understanding how to communicate insights visually is just as important as the analysis itself. This course emphasizes design thinking and good visualization practices.
Skills developed:
- Best practices for chart selection and design
- Use of color, layout, and perception in data storytelling
- Hands-on experience building visualizations
Course | Offered by | Effort |
---|---|---|
Data Visualization | University of Illinois | ~15h |
Description:
Explores how modern search engines work, including indexing, ranking, and retrieval of large text collections. Introduces TF-IDF, inverted indexes, and Boolean models.
Why I chose this course:
It offers a solid foundation for building search systems and working with large-scale text data — crucial for recommendation systems, search platforms, and NLP.
Skills developed:
- Document indexing and search algorithms
- TF-IDF and cosine similarity
- Evaluation of retrieval performance (precision, recall)
Course | Offered by | Effort |
---|---|---|
Text Retrieval and Search Engines | University of Illinois | ~30h |
Description:
Delves into the mining of unstructured text data, covering key topics like topic modeling, sentiment analysis, and named entity recognition.
Why I chose this course:
Text is one of the most abundant data formats today. This course builds NLP fundamentals that are crucial for applications in marketing, product reviews, and social media analysis.
Skills developed:
- Text preprocessing and feature engineering
- Topic modeling (e.g., LDA)
- Sentiment analysis and classification
Course | Offered by | Effort |
---|---|---|
Text Mining and Analytics | University of Illinois | ~33h |
Description:
Focuses on algorithms to discover frequent patterns, associations, and sequences in datasets. Introduces the Apriori algorithm and association rule mining.
Why I chose this course:
It’s essential for market basket analysis, fraud detection, and behavior prediction, giving insight into recurring and meaningful patterns.
Skills developed:
- Frequent itemset mining (Apriori, FP-Growth)
- Association rules (support, confidence, lift)
- Sequential pattern mining
Course | Offered by | Effort |
---|---|---|
Pattern Discovery in Data Mining | University of Illinois | ~17h |
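Support and confidence, the quantities Apriori prunes on, can be computed by brute force on a made-up basket dataset; Apriori's contribution is avoiding this exhaustive counting at scale:

```python
# Made-up market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) estimated from the transactions.
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Rule {diapers} -> {beer}: how often beer co-occurs with diapers.
print(support({"diapers", "beer"}))       # 3/5 = 0.6
print(confidence({"diapers"}, {"beer"}))  # 3/4 = 0.75
```

Lift, the third metric in the skills list, is just this confidence divided by `support({"beer"})`.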
Description:
Introduces unsupervised learning techniques to group similar items without labeled outcomes. Covers clustering metrics, methods, and applications.
Why I chose this course:
Clustering is a powerful tool for customer segmentation, anomaly detection, and unsupervised exploration of datasets.
Skills developed:
- k-means and hierarchical clustering
- Density-based clustering (DBSCAN)
- Cluster evaluation and visualization
Course | Offered by | Effort |
---|---|---|
Cluster Analysis in Data Mining | University of Illinois | ~16h |
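The k-means loop itself is short. Here is a bare-bones 1-D version (scikit-learn handles the general case), with data and initial centers chosen so the result is easy to check:

```python
# Made-up 1-D points forming two obvious groups near 1.0 and 8.0.
points = [0.5, 1.0, 1.5, 7.5, 8.0, 8.5]

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [sum(ms) / len(ms) for ms in clusters.values() if ms]
    return sorted(centers)

print(kmeans_1d(points, centers=[0.0, 5.0]))  # [1.0, 8.0]
```

The two alternating steps (assign, then update) are exactly what the course generalizes to higher dimensions and to density-based variants like DBSCAN.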
This section focuses on mastering relational databases and SQL, the backbone of storing and querying structured data. Understanding how databases work — from design principles to advanced querying — is essential for any data analyst or data scientist.
Main goals in this section:
- ✅ Learn how to design and normalize relational databases
- ✅ Query data efficiently using SQL
- ✅ Understand advanced database topics, including emerging technologies
- ✅ Build a solid foundation for data warehousing and backend data engineering
All courses are part of the Databases for Data Scientists Specialization by the University of Colorado.
Description:
Introduces the foundations of relational databases, including normalization, entity-relationship modeling, and schema design for structured data storage.
Why I chose this course:
Before writing any SQL, it’s essential to understand how databases are structured and why proper design ensures data integrity and performance.
Skills developed:
- Entity-Relationship (ER) modeling
- Normalization (1NF to 3NF)
- Schema creation and database logic
Course | Offered by | Effort |
---|---|---|
Relational Database Design | University of Colorado | ~34h |
Description:
A hands-on introduction to SQL, covering SELECT statements, joins, subqueries, filtering, aggregation, and working with multiple tables.
Why I chose this course:
SQL is a must-have skill for data professionals. This course reinforces the fundamentals while also preparing for complex queries and real-world use cases.
Skills developed:
- SELECT, WHERE, GROUP BY, and JOIN clauses
- Writing nested queries and subqueries
- Filtering, sorting, and aggregating data
Course | Offered by | Effort |
---|---|---|
The Structured Query Language (SQL) | University of Colorado | ~26h |
Description:
Covers cutting-edge and emerging database topics such as NoSQL, NewSQL, distributed databases, and database scalability.
Why I chose this course:
As data ecosystems evolve, it’s important to understand where database technology is heading — especially with big data, real-time systems, and cloud-native tools.
Skills developed:
- Concepts of NoSQL, document, key-value, and columnar stores
- Distributed database systems and CAP theorem
- Emerging trends: scalability, cloud databases, and database-as-a-service
Course | Offered by | Effort |
---|---|---|
Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h |
This section introduces the architecture, tools, and methods used to work with massive volumes of data that exceed the capabilities of traditional systems. The courses cover everything from data storage and integration to distributed processing and machine learning at scale.
Key goals for this section:
- ✅ Understand the foundations of big data systems and architectures
- ✅ Explore tools for storing, querying, and integrating large datasets
- ✅ Learn scalable machine learning techniques
- ✅ Apply graph analytics to uncover relationships in complex data
All courses come from the Big Data Specialization by the University of California, San Diego.
Description:
A high-level overview of what big data is, why it matters, and how it’s transforming business and research. Covers the big data ecosystem, including Hadoop and NoSQL.
Why I chose this course:
It provides a clear introductory framework for the concepts, challenges, and technologies of working with large-scale data.
Skills developed:
- Definitions and scope of big data
- Overview of the Hadoop ecosystem
- Real-world applications and case studies
Course | Offered by | Effort |
---|---|---|
Introduction to Big Data | University of California | ~17h |
Description:
Covers how to structure and organize data in distributed systems, including NoSQL databases like HBase, Cassandra, and MongoDB.
Why I chose this course:
To understand the different paradigms of data storage and how schema design affects performance and scalability.
Skills developed:
- Data modeling in big data environments
- NoSQL systems: document, columnar, and key-value stores
- Data consistency and availability trade-offs
Course | Offered by | Effort |
---|---|---|
Big Data Modeling and Management Systems | University of California | ~13h |
Description:
Focuses on data ingestion and transformation at scale, using Apache Spark, MapReduce, and ETL pipelines for distributed processing.
Why I chose this course:
Efficient processing is key in big data — this course builds hands-on skills for integrating and transforming large datasets.
Skills developed:
- Distributed data processing (Spark, MapReduce)
- ETL and data integration pipelines
- Batch vs. stream processing
Course | Offered by | Effort |
---|---|---|
Big Data Integration and Processing | University of California | ~17h |
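The MapReduce pattern behind Spark and Hadoop can be shown in miniature in plain Python: map each record to (key, 1) pairs, then reduce by key. The input lines are made up:

```python
from collections import Counter
from itertools import chain

lines = ["big data big ideas", "data pipelines move data"]

# Map phase: emit (word, 1) for every word in every line.
mapped = chain.from_iterable(
    ((word, 1) for word in line.split()) for line in lines)

# Shuffle + reduce phase: sum the counts per key.
counts = Counter()
for word, n in mapped:
    counts[word] += n

print(counts["data"], counts["big"])  # 3 2
```

Distributed engines run the map phase on many machines in parallel and route each key to one reducer; the per-key logic is identical to this loop.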
Description:
Teaches how to build and scale machine learning models using tools like Apache Spark’s MLlib, focusing on classification, clustering, and recommendation systems.
Why I chose this course:
It connects machine learning theory with big data tools, which is essential for working in real-world production environments.
Skills developed:
- Scalable machine learning with Spark MLlib
- Model training and evaluation in distributed systems
- Feature engineering at scale
Course | Offered by | Effort |
---|---|---|
Machine Learning with Big Data | University of California | ~23h |
Description:
Explores how to analyze relationships in large graphs, such as social networks or web link structures, using graph theory and distributed algorithms.
Why I chose this course:
Graph analytics is a powerful approach for understanding structure and influence in connected datasets — from fraud detection to recommendation systems.
Skills developed:
- Graph modeling and structure
- Graph traversal and centrality
- Distributed graph processing (e.g., GraphX)
Course | Offered by | Effort |
---|---|---|
Graph Analytics for Big Data | University of California | ~13h |
This section builds the foundation for understanding and applying machine learning algorithms, from basic regression to advanced techniques like ensemble learning, recommendation systems, and reinforcement learning.
The focus is on both conceptual understanding and hands-on implementation, using real-world datasets to develop practical, production-ready ML pipelines.
Main goals for this section:
- ✅ Master core ML algorithms (supervised and unsupervised)
- ✅ Build and evaluate models using regression, classification, and clustering
- ✅ Understand trade-offs in model complexity, bias, and variance
- ✅ Explore recommender systems and reinforcement learning techniques
All courses are part of the Machine Learning Specialization by DeepLearning.AI, taught by Andrew Ng.
Description:
This course introduces the most fundamental machine learning techniques: linear regression, logistic regression, and decision boundaries — all explained with practical coding examples.
Why I chose this course:
It provides the best conceptual intro to supervised learning, with hands-on notebooks and real-world exercises. Andrew Ng’s teaching style makes even complex topics accessible.
Skills developed:
- Linear and logistic regression
- Gradient descent and loss functions
- Bias-variance tradeoff and regularization
Course | Offered by | Effort |
---|---|---|
Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h |
Description:
Goes deeper into supervised learning with advanced algorithms such as decision trees, random forests, XGBoost, and support vector machines.
Why I chose this course:
To expand beyond linear models and gain confidence in implementing some of the most powerful ML algorithms used in industry.
Skills developed:
- Decision trees and ensemble methods (Random Forests, XGBoost)
- SVMs and kernel tricks
- Model selection and hyperparameter tuning
Course | Offered by | Effort |
---|---|---|
Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h |
Description:
Covers powerful unsupervised learning techniques such as clustering, anomaly detection, and PCA, along with real-world applications like recommendation systems and Q-learning.
Why I chose this course:
It connects theory to application, showing how clustering and reinforcement learning power modern platforms — from YouTube recommendations to game AIs.
Skills developed:
- k-means clustering and anomaly detection
- Dimensionality reduction (PCA)
- Recommender systems and collaborative filtering
- Reinforcement learning and Q-learning
Course | Offered by | Effort |
---|---|---|
Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h |
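PCA from the skills list can be sketched with NumPy as an eigendecomposition of the covariance matrix; the points below are made up and lie exactly on the line y = x, so the first principal component is the diagonal:

```python
import numpy as np

# Made-up 2-D points, perfectly correlated along y = x.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
Xc = X - X.mean(axis=0)                 # center the data

cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

pc1 = eigvecs[:, -1]                    # direction of maximum variance
print(np.round(np.abs(pc1), 4))         # [0.7071 0.7071], the diagonal
```

Projecting `Xc` onto `pc1` reduces the data to one dimension with no information loss here, which is the dimensionality-reduction idea the course generalizes to noisy, high-dimensional data.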
This section dives deep into neural networks and deep learning, the foundation of modern AI systems. It covers a full pipeline from basic neural networks to advanced architectures like CNNs and RNNs, with an emphasis on practical techniques for building and improving deep models.
Main goals in this section:
- ✅ Understand the math and mechanics behind deep neural networks
- ✅ Learn how to tune, train, and optimize deep learning models
- ✅ Apply deep learning to images, sequences, and NLP tasks
- ✅ Gain experience with TensorFlow/Keras and real-world use cases
All courses are part of the Deep Learning Specialization by DeepLearning.AI, taught by Andrew Ng.
Description:
Introduces the fundamentals of deep learning, including perceptrons, forward/backpropagation, activation functions, and basic architectures.
Why I chose this course:
It lays the core theoretical foundation for all deep learning work and presents it in an accessible, structured way.
Skills developed:
- Basics of deep neural networks
- Forward and backward propagation
- Activation functions and weight initialization
Course | Offered by | Effort |
---|---|---|
Neural Networks and Deep Learning | DeepLearning.AI | ~24h |
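Forward and backward propagation can be shown on a single sigmoid neuron trained on one made-up example; frameworks like TensorFlow generalize exactly this chain-rule step across layers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One input, one target (toy numbers), squared-error loss.
x, target = 1.5, 1.0
w, b, lr = 0.0, 0.0, 1.0

for _ in range(50):
    # Forward propagation.
    y = sigmoid(w * x + b)
    # Backward propagation for L = (y - target)^2, by the chain rule:
    # dL/dz = 2*(y - target) * y*(1 - y); dL/dw = dL/dz * x; dL/db = dL/dz.
    dz = 2 * (y - target) * y * (1 - y)
    w -= lr * dz * x
    b -= lr * dz

print(sigmoid(w * x + b))  # the prediction approaches the target 1.0
```

Weight initialization matters in real networks precisely because this gradient contains the factor `y * (1 - y)`, which vanishes when the neuron saturates.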
⚙️ Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization – DeepLearning.AI
Description:
Covers practical tools for improving deep learning models: optimization strategies, hyperparameter tuning, batch normalization, dropout, and more.
Why I chose this course:
It bridges the gap between theory and practice, offering hands-on techniques for boosting model performance.
Skills developed:
- Learning rate decay and mini-batch gradient descent
- Regularization: L2, dropout
- Hyperparameter tuning and optimizers (Adam, RMSprop)
Course | Offered by | Effort |
---|---|---|
Improving Deep Neural Networks | DeepLearning.AI | ~23h |
Description:
Focuses on the mindset and best practices for managing ML projects — how to prioritize errors, build scalable pipelines, and iterate effectively.
Why I chose this course:
It offers strategic thinking that is often overlooked: how to debug, scale, and manage ML projects in real-world environments.
Skills developed:
- Error analysis and ceiling analysis
- Avoiding data leakage
- Managing train/dev/test splits in production
Course | Offered by | Effort |
---|---|---|
Structuring Machine Learning Projects | DeepLearning.AI | ~6h |
Description:
Explores convolutional architectures used in image recognition, detection, and segmentation tasks — including ResNet and YOLO.
Why I chose this course:
CNNs are essential for working with image data — this course gives a hands-on introduction to convolutional layers and computer vision tasks.
Skills developed:
- Convolutions, pooling, padding
- Deep CNN architectures (ResNet, Inception)
- Image classification and object detection
Course | Offered by | Effort |
---|---|---|
Convolutional Neural Networks | DeepLearning.AI | ~35h |
Description:
Covers how to build models for sequential data, such as time series or natural language, using RNNs, GRUs, LSTMs, and attention mechanisms.
Why I chose this course:
Sequence models power everything from chatbots to music generation — and this course gives the tools to implement them.
Skills developed:
- Recurrent neural networks (RNN, LSTM, GRU)
- Natural language processing basics
- Attention and sequence-to-sequence models
Course | Offered by | Effort |
---|---|---|
Sequence Models | DeepLearning.AI | ~37h |
This section focuses on the architecture and implementation of data warehouses and business intelligence systems — critical infrastructure for enterprise analytics. It covers everything from relational database theory to ETL pipelines and BI reporting.
Main goals in this section:
- ✅ Understand how data warehouses are designed and structured
- ✅ Learn how to build scalable ETL processes and integrate data from multiple sources
- ✅ Apply business intelligence tools to extract actionable insights
- ✅ Prepare for roles in backend analytics, data engineering, and BI architecture
All courses are part of the Data Warehousing for Business Intelligence Specialization by the University of Colorado Boulder.
Description:
Covers relational database foundations: relational algebra, SQL queries, schema design, and data integrity enforcement.
Why I chose this course:
It provides the core theoretical and technical background needed for understanding how relational databases support analytical workloads.
Skills developed:
- Relational model, ER modeling, and constraints
- SQL for data definition and manipulation
- Foundations for OLAP vs. OLTP systems
Course | Offered by | Effort |
---|---|---|
Database Management Essentials | Colorado Boulder | ~122h |
Description:
Introduces dimensional modeling, star/snowflake schemas, and the processes of integrating data from disparate sources into a central warehouse.
Why I chose this course:
It focuses on the design principles behind scalable data warehouses, which are crucial for efficient querying and reporting.
Skills developed:
- Dimensional data modeling (facts/dimensions)
- Star, snowflake, and constellation schemas
- ETL design and implementation
Course | Offered by | Effort |
---|---|---|
Data Warehouse Concepts, Design, and Data Integration | Colorado Boulder | ~62h |
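A star schema and its typical aggregation query can be sketched with Python's built-in `sqlite3` module; the table names and data below are illustrative, not from the course:

```python
import sqlite3

# A toy star schema: one fact table (sales) joined to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 11, 80.0);
""")

# The canonical warehouse query: aggregate facts along dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)
conn.close()
```

Snowflake schemas simply normalize the dimension tables further (e.g. splitting category into its own table), trading join cost for less redundancy.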
Description:
Explores how relational systems support warehouse workloads, including indexing, query optimization, and data partitioning.
Why I chose this course:
It connects relational database theory with warehousing practice, helping you understand performance and scalability challenges.
Skills developed:
- Query performance tuning
- Materialized views and indexing strategies
- Physical schema design for OLAP
Course | Offered by | Effort |
---|---|---|
Relational Database Support for Data Warehouses | Colorado Boulder | ~71h |
Description:
Covers how BI tools are used to extract, visualize, and act on business data — with case studies and practical examples of analytics dashboards.
Why I chose this course:
To connect data infrastructure to end-user decision-making, focusing on storytelling, KPIs, and dashboards.
Skills developed:
- BI tool landscape and use cases
- OLAP operations (roll-up, drill-down)
- Data-driven decision frameworks
Course | Offered by | Effort |
---|---|---|
Business Intelligence Concepts, Tools, and Applications | Colorado Boulder | ~21h |
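The roll-up and drill-down operations can be illustrated with a pandas groupby on a made-up sales cube (assuming pandas is installed):

```python
import pandas as pd

# Made-up monthly sales. Aggregating month -> quarter is a roll-up;
# viewing quarters back at month granularity is a drill-down.
sales = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "month":   ["Jan", "Feb", "Apr", "May"],
    "revenue": [100, 120, 90, 110],
})

monthly = sales.groupby(["quarter", "month"])["revenue"].sum()  # drill-down view
quarterly = sales.groupby("quarter")["revenue"].sum()           # roll-up

print(quarterly.to_dict())  # {'Q1': 220, 'Q2': 200}
```

OLAP engines precompute and cache these aggregates across many dimensions at once; the operation itself is exactly this grouped sum.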
Description:
A capstone-style course that guides you through designing and implementing a working data warehouse, integrating ETL processes and building reports.
Why I chose this course:
It offers hands-on experience that ties together all previous concepts — from schema design to final BI delivery.
Skills developed:
- Full warehouse architecture lifecycle
- Data sourcing, transformation, and loading
- Reporting and BI dashboard implementation
Course | Offered by | Effort |
---|---|---|
Design and Build a Data Warehouse for Business Intelligence Implementation | Colorado Boulder | ~31h |
This section focuses on the core principles of cloud computing, including infrastructure, applications, networking, and practical project deployment. It builds a foundational understanding of how cloud systems work and how to design scalable, distributed applications in the cloud.
Key goals for this section:
- ✅ Understand cloud infrastructure, virtualization, and scalability
- ✅ Learn how to design and deploy cloud-native applications
- ✅ Explore networking, security, and orchestration in the cloud
- ✅ Complete a practical project simulating real-world deployment
All courses are part of the Cloud Computing Specialization by the University of Illinois Urbana-Champaign.
Description:
Introduces the fundamental building blocks of cloud computing, including data centers, virtualization, and service models like IaaS, PaaS, and SaaS.
Why I chose this course:
It builds the foundational knowledge needed to understand the economics, architecture, and design of modern cloud systems.
Skills developed:
- Cloud service models and deployment strategies
- Virtualization and resource allocation
- Intro to AWS, Google Cloud, and Azure paradigms
Course | Offered by | Effort |
---|---|---|
Cloud Concepts 1 | University of Illinois | ~24h |
Description:
Expands on the first course by discussing elasticity, fault tolerance, containers, and scalability strategies in cloud architecture.
Why I chose this course:
It dives deeper into cloud resilience and elasticity, which are key aspects of high-availability systems.
Skills developed:
- Containers and microservices
- Cloud scalability and elasticity
- Managing reliability and availability
Course | Offered by | Effort |
---|---|---|
Cloud Concepts 2 | University of Illinois | ~19h |
Description:
Focuses on developing cloud-native applications using APIs, data storage services, and managed compute instances.
Why I chose this course:
It introduces the developer's perspective, teaching how to design and deploy real applications on the cloud.
Skills developed:
- Cloud APIs and storage models
- Stateless and stateful application design
- Handling scale and concurrency
Course | Offered by | Effort |
---|---|---|
Cloud Applications 1 | University of Illinois | ~15h |
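The stateless-versus-stateful distinction in the skills list above is easiest to see in code. A minimal sketch (handler names and request shape are hypothetical, not from the course):

```python
# Stateless handler: the response is a pure function of the request,
# so any replica behind a load balancer can serve any request.
def stateless_handler(request):
    return {"user": request["user"], "greeting": f"Hello, {request['user']}!"}

# Stateful handler: keeps a per-instance visit counter, so requests from
# the same client must stick to the same replica, or the state must be
# externalized to a shared store (database, cache) to allow scaling out.
class StatefulHandler:
    def __init__(self):
        self.visits = 0

    def handle(self, request):
        self.visits += 1
        return {"user": request["user"], "visit": self.visits}

greeting = stateless_handler({"user": "ada"})
server = StatefulHandler()
server.handle({"user": "ada"})
second = server.handle({"user": "ada"})
```

This is why cloud platforms push application code toward statelessness: stateless replicas can be created and destroyed freely under autoscaling.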
Description:
Continues development topics with a focus on performance, monitoring, container orchestration, and user authentication.
Why I chose this course:
This course emphasizes operational excellence and monitoring, which are crucial for real-world systems in production.
Skills developed:
- Logging and monitoring cloud apps
- Load balancing and caching
- Authentication and access control
Course | Offered by | Effort |
---|---|---|
Cloud Applications 2 | University of Illinois | ~19h |
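Two of the skills above, load balancing and caching, can be sketched with the standard library alone. Backend names and the `fetch_profile` lookup are illustrative assumptions:

```python
import itertools
from functools import lru_cache

# Round-robin load balancing: rotate requests across a fixed backend pool.
backends = ["app-1", "app-2", "app-3"]
rotation = itertools.cycle(backends)

def pick_backend():
    return next(rotation)

assigned = [pick_backend() for _ in range(5)]

# Caching: memoize an expensive lookup so repeated requests skip the backend.
backend_hits = []

@lru_cache(maxsize=128)
def fetch_profile(user_id):
    backend_hits.append(user_id)  # record a real backend hit
    return {"id": user_id, "name": f"user-{user_id}"}

fetch_profile(7)
fetch_profile(7)  # second call is served from the cache
```

Production systems replace `itertools.cycle` with a managed load balancer and `lru_cache` with a shared cache such as Redis, but the ideas are the same.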
Description:
Covers how networking works in cloud environments, including virtual networks, firewalls, routing, and SDNs.
Why I chose this course:
To understand how services communicate at scale, securely and efficiently across virtualized infrastructure.
Skills developed:
- Virtual Private Clouds (VPCs)
- Network configuration and subnetting
- Load balancers and security groups
Course | Offered by | Effort |
---|---|---|
Cloud Networking | University of Illinois | ~22h |
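The VPC and subnetting skills above map directly onto CIDR arithmetic, which Python's `ipaddress` module handles. The address ranges below are a hypothetical layout, not tied to any specific provider:

```python
import ipaddress

# Hypothetical VPC address space, carved into four equal subnets
# (for example, one per availability zone or application tier).
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=18))

first = subnets[0]
# Each /18 subnet spans 2**(32 - 18) = 16384 addresses.
size = first.num_addresses
# Membership tests show which subnet a given host lands in.
contains_host = ipaddress.ip_address("10.0.10.5") in first
```

Security groups and route tables are then attached per subnet, which is why getting the CIDR layout right up front matters.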
Description:
A hands-on capstone project where you build and deploy a full-stack application in the cloud, integrating all concepts from the specialization.
Why I chose this course:
To apply all concepts in a realistic, end-to-end scenario, simulating a true production deployment pipeline.
Skills developed:
- App deployment using cloud platforms
- Integrating storage, compute, and networking
- Debugging and monitoring a cloud-native app
Course | Offered by | Effort |
---|---|---|
Cloud Computing Project | University of Illinois | ~21h |
If you're looking for deeper insights, consider these additional resources:
- The Elements of Statistical Learning - Hastie, Tibshirani, Friedman.
- Introduction to Statistical Learning - James, Witten, Hastie, Tibshirani.
- Bayesian Statistics - Peter M. Lee.
- Artificial Intelligence: A Modern Approach - Stuart Russell, Peter Norvig.
- Deep Learning Papers Reading Roadmap - Collection of AI research papers.
- SQL for Smarties - Joe Celko.
- The Missing Semester of Your CS Education - MIT.
These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.
- Course durations are approximate and based on platform estimates.
- Some books were accessed through university partnerships, but if you don't have access... well, explore alternative ways. If possible, support the authors by purchasing their books.
- The curriculum is continuously evolving as new resources become available.
Sources used to structure this curriculum:
- OSSU Data Science - Open-source university model.
- AI Expert Roadmap - AI & Data Science roadmap.
- Roadmap SH - Learning paths for various tech disciplines.
- USP Statistics Course - Inspiration for course selection.