A curated list of free courses from reputable universities that meets the requirements of an undergraduate curriculum in Data Science, excluding general education, with projects and supporting materials in an organized structure.
A structured roadmap to learn data science for free
- 🧠 About
- 🎯 Learning Goals
- 🗂️ Curriculum Overview
- 🧩 How to Use This Curriculum
- 📘 Full Curriculum Sections
- 📖 Extra Bibliography
- 📝 Notes and Clarifications
- 🔗 References
This Self-Taught Data Science Curriculum is a structured roadmap I designed to guide my own journey into data science — using only free and high-quality online resources.
My goal was to create a program that covered the full stack of data science knowledge, from basic concepts to advanced applications, including:
- Programming
- Mathematics and Statistics
- Machine Learning and Deep Learning
- Databases, Big Data, and Cloud Computing
This roadmap is ideal for:
- Aspiring data scientists learning independently
- Professionals who want to deepen their analytics skills
- Anyone seeking a structured, free alternative to paid bootcamps
📌 This is a living document — I update it regularly as I complete courses or discover better resources.
This curriculum is designed to help you gain practical, theoretical, and technical proficiency across key areas of data science:
- Python: Data manipulation, visualization, ML tools
- Linear Algebra, Calculus, Probability
- Inferential Statistics, Bayesian Methods, Regression
- ML theory and algorithmic foundations
- SQL & NoSQL
- Data lakes, pipelines, and ETL
- Tools like Hadoop, Spark, and cloud-native storage
- Supervised & Unsupervised Learning
- Neural Networks, CNNs, NLP, RL
- AI ethics and responsible modeling
The curriculum is structured into 10 sections, grouped by learning stage and topic. Each includes carefully selected resources with estimated time commitment.
Section | Area | Approx. Hours |
---|---|---|
01 | Fundamentals | ~40h |
02 | Mathematics & Statistics | ~90h |
03 | Programming (Python) | ~215h |
04 | Data Mining | ~120h |
05 | Databases & SQL | ~80h |
06 | Big Data | ~85h |
07 | Machine Learning | ~120h |
08 | Deep Learning | ~125h |
09 | Data Warehousing | ~300h |
10 | Cloud Computing | ~120h |
🧩 Detailed tables with links, skills, and certificates are available in each section.
This roadmap is flexible and can be adapted based on your learning pace and background:
- ✅ Follow it sequentially if you're starting from scratch.
- ✅ Skip sections if you already have knowledge in a particular area.
- ✅ Combine different resources, projects, and additional readings.
Each module contains curated courses with estimated effort and certification options when available.
In this first section, my goal is to establish a solid foundation in data science by understanding the role of data in decision-making, the fundamentals of the field, and the key tools used by professionals. Additionally, I aim to develop a clear understanding of what it means to be a data scientist, the essential skills required, and how to apply this knowledge in practice.
The main skills I want to acquire in this stage include:
- ✅ Understanding what data is and how it can be used
- ✅ Fundamental concepts of data science and its impact on various industries
- ✅ Familiarity with essential tools for data analysis and manipulation
This course provides a clear introduction to what data is, how it is generated, and how it can be used to answer questions and support decision-making. I chose this course to build a conceptual foundation before moving on to more complex techniques.
Skills developed:
- Understanding the concept of data and its different forms
- Practical applications of data usage in problem-solving
- Introduction to data collection, organization, and interpretation
Course | Offered by | Effort |
---|---|---|
Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h |
This course offers an overview of the field of data science, exploring the responsibilities of a data scientist, the stages of the data analysis process, and its applications. It helps to better understand the career and the importance of data science in the modern world.
Skills developed:
- Understanding what data science is and its applications
- Insights into the data science lifecycle
- Knowledge of the key tools and technologies used in the field
Course | Offered by | Effort |
---|---|---|
What is Data Science? | IBM Skills Network | ~11h |
This course is essential for gaining familiarity with the fundamental tools used by data scientists. It introduces basic programming concepts, version control, and project organization—essential elements for working with data in a structured and efficient way.
Skills developed:
- Introduction to R and RStudio
- Basic concepts of Git and GitHub for version control
- Insights into data science workflows
Course | Offered by | Effort |
---|---|---|
The Data Scientist's Toolbox | Johns Hopkins University | ~18h |
This section is essential to understand the mathematical and statistical foundations of data science. My goal here is to acquire strong theoretical tools to support more complex models, especially in machine learning and inferential analysis.
The content covers linear algebra, calculus, probability, and statistics — from basic concepts to the application of Bayesian methods and ML theory.
The main skills I want to develop at this stage include:
- ✅ Understanding matrix operations, vectors, eigenvalues, and decompositions
- ✅ Derivatives, gradients, and optimization for ML
- ✅ Concepts of probability, distributions, and statistical inference
- ✅ Bayesian thinking and probabilistic reasoning
- ✅ Theoretical foundation behind supervised and unsupervised models
Description: This course offers an applied and visual introduction to linear algebra — one of the most crucial areas for working with data. It explores matrices, vector spaces, linear transformations, and the math behind dimensionality reduction and neural networks.
Why I chose this course: It’s part of the Mathematics for Machine Learning and Data Science specialization by DeepLearning.AI, created with Andrew Ng’s endorsement. The practical approach with visual tools makes it ideal for learners in applied data science.
Skills developed:
- Matrix operations, vector norms, and projections
- Singular Value Decomposition (SVD)
- Applications in data compression and feature extraction
Course | Offered by | Effort |
---|---|---|
Linear Algebra for ML and DS | DeepLearning.AI | ~34h |
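To make the SVD idea concrete, here is a minimal NumPy sketch (the matrix values are made up for illustration) showing a rank-2 approximation of a small data matrix — the mechanism behind the compression and feature-extraction applications listed above:

```python
import numpy as np

# A small data matrix: 4 samples x 3 features (illustrative values).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 2.0, 0.0]])

# Thin SVD: A = U @ diag(S) @ Vt, singular values sorted descending.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-2 approximation keeps only the two largest singular values.
k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

print(np.round(S, 3))                       # singular values
print(np.allclose(A, U @ np.diag(S) @ Vt))  # exact reconstruction: True
```

The spectral-norm error of the rank-k approximation equals the first discarded singular value, which is why truncating the SVD is the optimal low-rank compression.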
Description: This course provides a practical introduction to calculus, focusing on how derivatives and integrals are used to train and optimize machine learning models.
Why I chose this course: Traditional calculus courses are very theoretical. This one, however, is laser-focused on real applications like gradient descent, cost functions, and model convergence — essential concepts for data scientists.
Skills developed:
- Derivatives and chain rule
- Optimization using gradients
- Applications of calculus in ML training
Course | Offered by | Effort |
---|---|---|
Calculus for ML and DS | DeepLearning.AI | ~25h |
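The core optimization idea the course teaches fits in a few lines: gradient descent repeatedly steps opposite the derivative. A toy sketch (the function, starting point, and learning rate are chosen purely for illustration):

```python
# Minimal gradient descent on f(x) = (x - 3)^2, whose derivative is
# f'(x) = 2 * (x - 3). The minimum is at x = 3.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step against the gradient
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges very close to 3.0
```

The same update rule, applied to a cost function over model parameters, is what trains the ML models covered later in the curriculum.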
Description: A comprehensive course that introduces both descriptive and inferential statistics, with a focus on applications in machine learning. Topics include probability theory, conditional probability, hypothesis testing, and Bayesian methods.
Why I chose this course: This course doesn't just teach "classical stats" — it explicitly bridges the gap between statistics and ML, making it perfect for applied work in data science.
Skills developed:
- Descriptive statistics and probability distributions
- Conditional probability and Bayes' Theorem
- Confidence intervals and hypothesis testing
- Probabilistic thinking in ML
Course | Offered by | Effort |
---|---|---|
Probability & Stats for ML | DeepLearning.AI | ~33h |
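Bayes' Theorem from the skills list can be illustrated with the classic medical-test calculation (all probabilities below are made up for illustration):

```python
# Bayes' Theorem: P(H | E) = P(E | H) * P(H) / P(E).
p_disease = 0.01             # prior P(H)
p_pos_given_disease = 0.90   # sensitivity, P(E | H)
p_pos_given_healthy = 0.05   # false-positive rate, P(E | not H)

# Total probability of a positive test, P(E).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 4))  # 0.1538: even a positive test leaves ~15% probability
```

The counterintuitive result (a 90%-sensitive test yielding only ~15% posterior probability) is exactly the kind of probabilistic thinking the course builds.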
In this section, my goal is to master Python as the primary language for data analysis, visualization, and machine learning. Python is the industry standard in data science, widely adopted thanks to its simplicity, community, and rich ecosystem of libraries like NumPy, pandas, matplotlib, scikit-learn, and many others.
The focus here is on hands-on experience, building the ability to:
- Write clean, efficient code for data manipulation
- Use Python tools to explore, visualize, and analyze data
- Implement and evaluate machine learning models
- Work with real datasets, pipelines, and applied problems
The main skills I want to develop at this stage include:
- ✅ Python programming for data manipulation (NumPy, pandas)
- ✅ Data visualization using matplotlib and seaborn
- ✅ Building and validating machine learning models with scikit-learn
- ✅ Natural language processing and social network analysis
- ✅ Applying Python in real-world projects across different domains
Description: A foundational course that introduces data manipulation with pandas, working with DataFrames, and the basics of cleaning and transforming data for analysis.
Why I chose this course: It’s the first course of the Applied Data Science with Python Specialization, one of the most respected Python tracks on Coursera. It provides a smooth learning curve for practical data tasks.
Skills developed:
- Data structures in pandas
- Handling missing values and data types
- Basic exploratory data analysis (EDA)
Course | Offered by | Effort |
---|---|---|
Intro to Data Science in Python | Univ. of Michigan | ~34h |
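A tiny pandas sketch of the cleaning tasks this course covers (column names and values are made up), assuming pandas and NumPy are installed:

```python
import numpy as np
import pandas as pd

# A small DataFrame with one missing value, mirroring typical cleaning tasks.
df = pd.DataFrame({
    "city": ["Ann Arbor", "Detroit", "Lansing"],
    "population": [120_000, np.nan, 115_000],
})

# Inspect the missing data, then impute with the column mean.
print(df["population"].isna().sum())  # 1 missing value
df["population"] = df["population"].fillna(df["population"].mean())
print(df["population"].tolist())      # [120000.0, 117500.0, 115000.0]
```

Mean imputation is just one of several strategies the course compares; dropping rows (`dropna`) or forward-filling are equally common depending on the data.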
Description: This course introduces practical techniques to create visualizations using matplotlib and other Python libraries, focusing on choosing the right type of plot for different data contexts.
Why I chose this course: Data visualization is often underestimated, but it’s critical for communicating insights. This course strengthens your ability to create professional, informative visuals.
Skills developed:
- Line plots, histograms, scatterplots, and advanced charts
- Visual perception principles
- Interactive plotting and dashboard elements
Course | Offered by | Effort |
---|---|---|
Plotting in Python | Univ. of Michigan | ~24h |
Description: Focuses on implementing machine learning models using scikit-learn, including classification, regression, and clustering.
Why I chose this course: It emphasizes not just the use of models but also best practices like train/test splits, model evaluation, overfitting, and performance metrics — all essential for a solid ML foundation.
Skills developed:
- scikit-learn pipelines
- Supervised learning (logistic regression, decision trees)
- Model evaluation and validation techniques
Course | Offered by | Effort |
---|---|---|
ML in Python | Univ. of Michigan | ~31h |
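The train/test discipline the course emphasizes can be sketched with scikit-learn on a made-up, cleanly separated toy dataset (assuming scikit-learn is installed):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Two widely separated groups of 1-D points (made-up data):
# values 0-9 are class 0, values 100-109 are class 1.
X = [[float(v)] for v in list(range(10)) + list(range(100, 110))]
y = [0] * 10 + [1] * 10

# Hold out a test set so the model is scored on data it never saw --
# the core evaluation habit the course drills in.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(accuracy)  # 1.0 on this cleanly separated toy set
```

On real data the held-out score is almost always lower than the training score; that gap is the overfitting signal the course teaches you to watch.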
Description: Covers the fundamentals of natural language processing (NLP) in Python, including tokenization, TF-IDF, and basic text classification.
Why I chose this course: Text data is everywhere — and this course provides the essential tools to process and analyze it using real-world datasets.
Skills developed:
- Working with text data using pandas and NLTK
- Document-term matrices
- Basic text classifiers (e.g., Naive Bayes)
Course | Offered by | Effort |
---|---|---|
Text Mining in Python | Univ. of Michigan | ~25h |
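TF-IDF, one of the core ideas here, can be computed from scratch in a few lines; this is only a sketch of what NLTK and scikit-learn automate, run on made-up documents:

```python
import math

# Minimal TF-IDF from scratch on a tiny made-up corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

def tf(word, doc):
    # Term frequency: share of the document occupied by the word.
    return doc.count(word) / len(doc)

def idf(word):
    # Inverse document frequency: rare words score higher.
    n_containing = sum(1 for doc in tokenized if word in doc)
    return math.log(len(tokenized) / n_containing)

# "the" appears in 2 of 3 docs, so its idf is low; "cats" is rarer.
print(round(idf("the"), 3), round(idf("cats"), 3))  # 0.405 1.099
```

Multiplying `tf * idf` per word per document yields the document-term matrix the course builds classifiers on.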
Description: Explores how to analyze network structures such as social graphs, user connections, and centrality using NetworkX and Python.
Why I chose this course: Social network analysis is increasingly useful in marketing, user behavior, fraud detection, and influence modeling.
Skills developed:
- Graph theory and network metrics
- Using NetworkX for social graphs
- Identifying influential nodes and clusters
Course | Offered by | Effort |
---|---|---|
Social Network Analysis | Univ. of Michigan | ~26h |
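Degree centrality, one of the network metrics listed above, is simple enough to compute by hand (NetworkX's `degree_centrality` does the equivalent). The friendship graph below is made up:

```python
# Graph as an adjacency dict of a made-up friendship network.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"alice", "carol"},
}

# Degree centrality: fraction of the other nodes each node connects to.
n = len(graph)
centrality = {node: len(neigh) / (n - 1) for node, neigh in graph.items()}

most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])  # alice 1.0
```

Identifying influential nodes like `alice` here is the seed of the fraud-detection and influence-modeling applications the course discusses.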
In this section, I explore how to extract useful patterns, knowledge, and structures from large volumes of data — both structured and unstructured. The focus is on practical techniques in text mining, clustering, and pattern discovery, with applications in business intelligence, recommendation systems, and behavioral analysis.
The goals here are to:
- ✅ Understand fundamental data mining concepts
- ✅ Learn how to extract insights from text data
- ✅ Apply clustering and pattern recognition algorithms
- ✅ Improve decision-making with data visualization
All the courses come from the Data Mining Specialization by the University of Illinois Urbana-Champaign.
Description:
This course covers the essentials of visualizing data effectively — not just creating pretty charts, but telling meaningful stories through data. It introduces principles of design, perception, and interpretation.
Why I chose this course:
Understanding how to communicate insights visually is just as important as the analysis itself. This course emphasizes design thinking and good visualization practices.
Skills developed:
- Best practices for chart selection and design
- Use of color, layout, and perception in data storytelling
- Hands-on experience building visualizations
Course | Offered by | Effort |
---|---|---|
Data Visualization | University of Illinois | ~15h |
Description:
Explores how modern search engines work, including indexing, ranking, and retrieval of large text collections. Introduces TF-IDF, inverted indexes, and Boolean models.
Why I chose this course:
It offers a solid foundation for building search systems and working with large-scale text data — crucial for recommendation systems, search platforms, and NLP.
Skills developed:
- Document indexing and search algorithms
- TF-IDF and cosine similarity
- Evaluation of retrieval performance (precision, recall)
Course | Offered by | Effort |
---|---|---|
Text Retrieval and Search Engines | University of Illinois | ~30h |
Description:
Delves into the mining of unstructured text data, covering key topics like topic modeling, sentiment analysis, and named entity recognition.
Why I chose this course:
Text is one of the most abundant data formats today. This course builds NLP fundamentals that are crucial for applications in marketing, product reviews, and social media analysis.
Skills developed:
- Text preprocessing and feature engineering
- Topic modeling (e.g., LDA)
- Sentiment analysis and classification
Course | Offered by | Effort |
---|---|---|
Text Mining and Analytics | University of Illinois | ~33h |
Description:
Focuses on algorithms to discover frequent patterns, associations, and sequences in datasets. Introduces the Apriori algorithm and association rule mining.
Why I chose this course:
It’s essential for market basket analysis, fraud detection, and behavior prediction, giving insight into recurring and meaningful patterns.
Skills developed:
- Frequent itemset mining (Apriori, FP-Growth)
- Association rules (support, confidence, lift)
- Sequential pattern mining
Course | Offered by | Effort |
---|---|---|
Pattern Discovery in Data Mining | University of Illinois | ~17h |
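Support and confidence, the quantities Apriori prunes on, can be computed by brute force on a made-up basket dataset; Apriori's contribution is avoiding this exhaustive counting at scale:

```python
# Made-up market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) estimated from the transactions.
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Rule {diapers} -> {beer}: how often beer co-occurs with diapers.
print(support({"diapers", "beer"}))       # 3/5 = 0.6
print(confidence({"diapers"}, {"beer"}))  # 3/4 = 0.75
```

Lift, the third metric in the skills list, is just this confidence divided by `support({"beer"})`.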
Description:
Introduces unsupervised learning techniques to group similar items without labeled outcomes. Covers clustering metrics, methods, and applications.
Why I chose this course:
Clustering is a powerful tool for customer segmentation, anomaly detection, and unsupervised exploration of datasets.
Skills developed:
- k-means and hierarchical clustering
- Density-based clustering (DBSCAN)
- Cluster evaluation and visualization
Course | Offered by | Effort |
---|---|---|
Cluster Analysis in Data Mining | University of Illinois | ~16h |
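The k-means loop itself is short. Here is a bare-bones 1-D version (scikit-learn handles the general case), with data and initial centers chosen so the result is easy to check:

```python
# Made-up 1-D points forming two obvious groups near 1.0 and 8.0.
points = [0.5, 1.0, 1.5, 7.5, 8.0, 8.5]

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [sum(ms) / len(ms) for ms in clusters.values() if ms]
    return sorted(centers)

print(kmeans_1d(points, centers=[0.0, 5.0]))  # [1.0, 8.0]
```

The two alternating steps (assign, then update) are exactly what the course generalizes to higher dimensions and to density-based variants like DBSCAN.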
This section focuses on mastering relational databases and SQL, the backbone of storing and querying structured data. Understanding how databases work — from design principles to advanced querying — is essential for any data analyst or data scientist.
Main goals in this section:
- ✅ Learn how to design and normalize relational databases
- ✅ Query data efficiently using SQL
- ✅ Understand advanced database topics, including emerging technologies
- ✅ Build a solid foundation for data warehousing and backend data engineering
All courses are part of the Databases for Data Scientists Specialization by the University of Colorado.
Description:
Introduces the foundations of relational databases, including normalization, entity-relationship modeling, and schema design for structured data storage.
Why I chose this course:
Before writing any SQL, it’s essential to understand how databases are structured and why proper design ensures data integrity and performance.
Skills developed:
- Entity-Relationship (ER) modeling
- Normalization (1NF to 3NF)
- Schema creation and database logic
Course | Offered by | Effort |
---|---|---|
Relational Database Design | University of Colorado | ~34h |
Description:
A hands-on introduction to SQL, covering SELECT statements, joins, subqueries, filtering, aggregation, and working with multiple tables.
Why I chose this course:
SQL is a must-have skill for data professionals. This course reinforces the fundamentals while also preparing for complex queries and real-world use cases.
Skills developed:
- SELECT, WHERE, GROUP BY, and JOIN clauses
- Writing nested queries and subqueries
- Filtering, sorting, and aggregating data
Course | Offered by | Effort |
---|---|---|
The Structured Query Language (SQL) | University of Colorado | ~26h |
Description:
Covers cutting-edge and emerging database topics such as NoSQL, NewSQL, distributed databases, and database scalability.
Why I chose this course:
As data ecosystems evolve, it’s important to understand where database technology is heading — especially with big data, real-time systems, and cloud-native tools.
Skills developed:
- Concepts of NoSQL, document, key-value, and columnar stores
- Distributed database systems and CAP theorem
- Emerging trends: scalability, cloud databases, and database-as-a-service
Course | Offered by | Effort |
---|---|---|
Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h |
This section introduces the architecture, tools, and methods used to work with massive volumes of data that exceed the capabilities of traditional systems. The courses cover everything from data storage and integration to distributed processing and machine learning at scale.
Key goals for this section:
- ✅ Understand the foundations of big data systems and architectures
- ✅ Explore tools for storing, querying, and integrating large datasets
- ✅ Learn scalable machine learning techniques
- ✅ Apply graph analytics to uncover relationships in complex data
All courses come from the Big Data Specialization by the University of California, San Diego.
Description:
A high-level overview of what big data is, why it matters, and how it’s transforming business and research. Covers the big data ecosystem, including Hadoop and NoSQL.
Why I chose this course:
It provides a clear introductory framework for the concepts, challenges, and technologies of working with large-scale data.
Skills developed:
- Definitions and scope of big data
- Overview of the Hadoop ecosystem
- Real-world applications and case studies
Course | Offered by | Effort |
---|---|---|
Introduction to Big Data | University of California | ~17h |
Description:
Covers how to structure and organize data in distributed systems, including NoSQL databases like HBase, Cassandra, and MongoDB.
Why I chose this course:
To understand the different paradigms of data storage and how schema design affects performance and scalability.
Skills developed:
- Data modeling in big data environments
- NoSQL systems: document, columnar, and key-value stores
- Data consistency and availability trade-offs
Course | Offered by | Effort |
---|---|---|
Big Data Modeling and Management Systems | University of California | ~13h |
Description:
Focuses on data ingestion and transformation at scale, using Apache Spark, MapReduce, and ETL pipelines for distributed processing.
Why I chose this course:
Efficient processing is key in big data — this course builds hands-on skills for integrating and transforming large datasets.
Skills developed:
- Distributed data processing (Spark, MapReduce)
- ETL and data integration pipelines
- Batch vs. stream processing
Course | Offered by | Effort |
---|---|---|
Big Data Integration and Processing | University of California | ~17h |
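The MapReduce pattern behind Spark and Hadoop can be shown in miniature in plain Python: map each record to (key, 1) pairs, then reduce by key. The input lines are made up:

```python
from collections import Counter
from itertools import chain

lines = ["big data big ideas", "data pipelines move data"]

# Map phase: emit (word, 1) for every word in every line.
mapped = chain.from_iterable(
    ((word, 1) for word in line.split()) for line in lines)

# Shuffle + reduce phase: sum the counts per key.
counts = Counter()
for word, n in mapped:
    counts[word] += n

print(counts["data"], counts["big"])  # 3 2
```

Distributed engines run the map phase on many machines in parallel and route each key to one reducer; the per-key logic is identical to this loop.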
Description:
Teaches how to build and scale machine learning models using tools like Apache Spark’s MLlib, focusing on classification, clustering, and recommendation systems.
Why I chose this course:
It connects machine learning theory with big data tools, which is essential for working in real-world production environments.
Skills developed:
- Scalable machine learning with Spark MLlib
- Model training and evaluation in distributed systems
- Feature engineering at scale
Course | Offered by | Effort |
---|---|---|
Machine Learning with Big Data | University of California | ~23h |
Description:
Explores how to analyze relationships in large graphs, such as social networks or web link structures, using graph theory and distributed algorithms.
Why I chose this course:
Graph analytics is a powerful approach for understanding structure and influence in connected datasets — from fraud detection to recommendation systems.
Skills developed:
- Graph modeling and structure
- Graph traversal and centrality
- Distributed graph processing (e.g., GraphX)
Course | Offered by | Effort |
---|---|---|
Graph Analytics for Big Data | University of California | ~13h |
This section builds the foundation for understanding and applying machine learning algorithms, from basic regression to advanced techniques like ensemble learning, recommendation systems, and reinforcement learning.
The focus is on both conceptual understanding and hands-on implementation, using real-world datasets to develop practical, production-ready ML pipelines.
Main goals for this section:
- ✅ Master core ML algorithms (supervised and unsupervised)
- ✅ Build and evaluate models using regression, classification, and clustering
- ✅ Understand trade-offs in model complexity, bias, and variance
- ✅ Explore recommender systems and reinforcement learning techniques
All courses are part of the Machine Learning Specialization by DeepLearning.AI, taught by Andrew Ng.
Description:
This course introduces the most fundamental machine learning techniques: linear regression, logistic regression, and decision boundaries — all explained with practical coding examples.
Why I chose this course:
It provides the best conceptual intro to supervised learning, with hands-on notebooks and real-world exercises. Andrew Ng’s teaching style makes even complex topics accessible.
Skills developed:
- Linear and logistic regression
- Gradient descent and loss functions
- Bias-variance tradeoff and regularization
Course | Offered by | Effort |
---|---|---|
Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h |
Description:
Goes deeper into supervised learning with advanced algorithms such as decision trees, random forests, XGBoost, and support vector machines.
Why I chose this course:
To expand beyond linear models and gain confidence in implementing some of the most powerful ML algorithms used in industry.
Skills developed:
- Decision trees and ensemble methods (Random Forests, XGBoost)
- SVMs and kernel tricks
- Model selection and hyperparameter tuning
Course | Offered by | Effort |
---|---|---|
Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h |
Description:
Covers powerful unsupervised learning techniques such as clustering, anomaly detection, and PCA, along with real-world applications like recommendation systems and Q-learning.
Why I chose this course:
It connects theory to application, showing how clustering and reinforcement learning power modern platforms — from YouTube recommendations to game AIs.
Skills developed:
- k-means clustering and anomaly detection
- Dimensionality reduction (PCA)
- Recommender systems and collaborative filtering
- Reinforcement learning and Q-learning
Course | Offered by | Effort |
---|---|---|
Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h |
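PCA from the skills list can be sketched with NumPy as an eigendecomposition of the covariance matrix; the points below are made up and lie exactly on the line y = x, so the first principal component is the diagonal:

```python
import numpy as np

# Made-up 2-D points, perfectly correlated along y = x.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
Xc = X - X.mean(axis=0)                 # center the data

cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

pc1 = eigvecs[:, -1]                    # direction of maximum variance
print(np.round(np.abs(pc1), 4))         # [0.7071 0.7071], the diagonal
```

Projecting `Xc` onto `pc1` reduces the data to one dimension with no information loss here, which is the dimensionality-reduction idea the course generalizes to noisy, high-dimensional data.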
This section dives deep into neural networks and deep learning, the foundation of modern AI systems. It covers a full pipeline from basic neural networks to advanced architectures like CNNs and RNNs, with an emphasis on practical techniques for building and improving deep models.
Main goals in this section:
- ✅ Understand the math and mechanics behind deep neural networks
- ✅ Learn how to tune, train, and optimize deep learning models
- ✅ Apply deep learning to images, sequences, and NLP tasks
- ✅ Gain experience with TensorFlow/Keras and real-world use cases
All courses are part of the Deep Learning Specialization by DeepLearning.AI, taught by Andrew Ng.
Description:
Introduces the fundamentals of deep learning, including perceptrons, forward/backpropagation, activation functions, and basic architectures.
Why I chose this course:
It lays the core theoretical foundation for all deep learning work and presents it in an accessible, structured way.
Skills developed:
- Basics of deep neural networks
- Forward and backward propagation
- Activation functions and weight initialization
Course | Offered by | Effort |
---|---|---|
Neural Networks and Deep Learning | DeepLearning.AI | ~24h |
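Forward and backward propagation can be shown on a single sigmoid neuron trained on one made-up example; frameworks like TensorFlow generalize exactly this chain-rule step across layers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One input, one target (toy numbers), squared-error loss.
x, target = 1.5, 1.0
w, b, lr = 0.0, 0.0, 1.0

for _ in range(50):
    # Forward propagation.
    y = sigmoid(w * x + b)
    # Backward propagation for L = (y - target)^2, by the chain rule:
    # dL/dz = 2*(y - target) * y*(1 - y); dL/dw = dL/dz * x; dL/db = dL/dz.
    dz = 2 * (y - target) * y * (1 - y)
    w -= lr * dz * x
    b -= lr * dz

print(sigmoid(w * x + b))  # the prediction approaches the target 1.0
```

Weight initialization matters in real networks precisely because this gradient contains the factor `y * (1 - y)`, which vanishes when the neuron saturates.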
⚙️ Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization – DeepLearning.AI
Description:
Covers practical tools for improving deep learning models: optimization strategies, hyperparameter tuning, batch normalization, dropout, and more.
Why I chose this course:
It bridges the gap between theory and practice, offering hands-on techniques for boosting model performance.
Skills developed:
- Learning rate decay and mini-batch gradient descent
- Regularization: L2, dropout
- Hyperparameter tuning and optimizers (Adam, RMSprop)
Course | Offered by | Effort |
---|---|---|
Improving Deep Neural Networks | DeepLearning.AI | ~23h |
Description:
Focuses on the mindset and best practices for managing ML projects — how to prioritize errors, build scalable pipelines, and iterate effectively.
Why I chose this course:
It offers strategic thinking that is often overlooked: how to debug, scale, and manage ML projects in real-world environments.
Skills developed:
- Error analysis and ceiling analysis
- Avoiding data leakage
- Managing train/dev/test splits in production
Course | Offered by | Effort |
---|---|---|
Structuring Machine Learning Projects | DeepLearning.AI | ~6h |
Description:
Explores convolutional architectures used in image recognition, detection, and segmentation tasks — including ResNet and YOLO.
Why I chose this course:
CNNs are essential for working with image data — this course gives a hands-on introduction to convolutional layers and computer vision tasks.
Skills developed:
- Convolutions, pooling, padding
- Deep CNN architectures (ResNet, Inception)
- Image classification and object detection
Course | Offered by | Effort |
---|---|---|
Convolutional Neural Networks | DeepLearning.AI | ~35h |
Description:
Covers how to build models for sequential data, such as time series or natural language, using RNNs, GRUs, LSTMs, and attention mechanisms.
Why I chose this course:
Sequence models power everything from chatbots to music generation — and this course gives the tools to implement them.
Skills developed:
- Recurrent neural networks (RNN, LSTM, GRU)
- Natural language processing basics
- Attention and sequence-to-sequence models
Course | Offered by | Effort |
---|---|---|
Sequence Models | DeepLearning.AI | ~37h |
This section focuses on the architecture and implementation of data warehouses and business intelligence systems — critical infrastructure for enterprise analytics. It covers everything from relational database theory to ETL pipelines and BI reporting.
Main goals in this section:
- ✅ Understand how data warehouses are designed and structured
- ✅ Learn how to build scalable ETL processes and integrate data from multiple sources
- ✅ Apply business intelligence tools to extract actionable insights
- ✅ Prepare for roles in backend analytics, data engineering, and BI architecture
All courses are part of the Data Warehousing for Business Intelligence Specialization by the University of Colorado Boulder.
Description:
Covers relational database foundations: relational algebra, SQL queries, schema design, and data integrity enforcement.
Why I chose this course:
It provides the core theoretical and technical background needed for understanding how relational databases support analytical workloads.
Skills developed:
- Relational model, ER modeling, and constraints
- SQL for data definition and manipulation
- Foundations for OLAP vs. OLTP systems
Course | Offered by | Effort |
---|---|---|
Database Management Essentials | Colorado Boulder | ~122h |
Description:
Introduces dimensional modeling, star/snowflake schemas, and the processes of integrating data from disparate sources into a central warehouse.
Why I chose this course:
It focuses on the design principles behind scalable data warehouses, which are crucial for efficient querying and reporting.
Skills developed:
- Dimensional data modeling (facts/dimensions)
- Star, snowflake, and constellation schemas
- ETL design and implementation
Course | Offered by | Effort |
---|---|---|
Data Warehouse Concepts, Design, and Data Integration | Colorado Boulder | ~62h |
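A star schema and its typical aggregation query can be sketched with Python's built-in `sqlite3` module; the table names and data below are illustrative, not from the course:

```python
import sqlite3

# A toy star schema: one fact table (sales) joined to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 11, 80.0);
""")

# The canonical warehouse query: aggregate facts along dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)
conn.close()
```

Snowflake schemas simply normalize the dimension tables further (e.g. splitting category into its own table), trading join cost for less redundancy.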
Description:
Explores how relational systems support warehouse workloads, including indexing, query optimization, and data partitioning.
Why I chose this course:
It connects relational database theory with warehousing practice, helping you understand performance and scalability challenges.
Skills developed:
- Query performance tuning
- Materialized views and indexing strategies
- Physical schema design for OLAP
Course | Offered by | Effort |
---|---|---|
Relational Database Support for Data Warehouses | Colorado Boulder | ~71h |
Description:
Covers how BI tools are used to extract, visualize, and act on business data — with case studies and practical examples of analytics dashboards.
Why I chose this course:
To connect data infrastructure to end-user decision-making, focusing on storytelling, KPIs, and dashboards.
Skills developed:
- BI tool landscape and use cases
- OLAP operations (roll-up, drill-down)
- Data-driven decision frameworks
Course | Offered by | Effort |
---|---|---|
Business Intelligence Concepts, Tools, and Applications | Colorado Boulder | ~21h |
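The roll-up and drill-down operations can be illustrated with a pandas groupby on a made-up sales cube (assuming pandas is installed):

```python
import pandas as pd

# Made-up monthly sales. Aggregating month -> quarter is a roll-up;
# viewing quarters back at month granularity is a drill-down.
sales = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "month":   ["Jan", "Feb", "Apr", "May"],
    "revenue": [100, 120, 90, 110],
})

monthly = sales.groupby(["quarter", "month"])["revenue"].sum()  # drill-down view
quarterly = sales.groupby("quarter")["revenue"].sum()           # roll-up

print(quarterly.to_dict())  # {'Q1': 220, 'Q2': 200}
```

OLAP engines precompute and cache these aggregates across many dimensions at once; the operation itself is exactly this grouped sum.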
Description:
A capstone-style course that guides you through designing and implementing a working data warehouse, integrating ETL processes and building reports.
Why I chose this course:
It offers hands-on experience that ties together all previous concepts — from schema design to final BI delivery.
Skills developed:
- Full warehouse architecture lifecycle
- Data sourcing, transformation, and loading
- Reporting and BI dashboard implementation
Course | Offered by | Effort |
---|---|---|
Design and Build a Data Warehouse for Business Intelligence Implementation | Colorado Boulder | ~31h |
This section focuses on the core principles of cloud computing, including infrastructure, applications, networking, and practical project deployment. It builds a foundational understanding of how cloud systems work and how to design scalable, distributed applications in the cloud.
Key goals for this section:
- ✅ Understand cloud infrastructure, virtualization, and scalability
- ✅ Learn how to design and deploy cloud-native applications
- ✅ Explore networking, security, and orchestration in the cloud
- ✅ Complete a practical project simulating real-world deployment
All courses are part of the Cloud Computing Specialization by the University of Illinois Urbana-Champaign.
Description:
Introduces the fundamental building blocks of cloud computing, including data centers, virtualization, and service models like IaaS, PaaS, and SaaS.
Why I chose this course:
It builds the foundational knowledge needed to understand the economics, architecture, and design of modern cloud systems.
Skills developed:
- Cloud service models and deployment strategies
- Virtualization and resource allocation
- Intro to AWS, Google Cloud, and Azure paradigms
Course | Offered by | Effort |
---|---|---|
Cloud Concepts 1 | University of Illinois | ~24h |
Description:
Expands on the first course by discussing elasticity, fault tolerance, containers, and scalability strategies in cloud architecture.
Why I chose this course:
It dives deeper into cloud resilience and elasticity, which are key aspects of high-availability systems.
Skills developed:
- Containers and microservices
- Cloud scalability and elasticity
- Managing reliability and availability
Course | Offered by | Effort |
---|---|---|
Cloud Concepts 2 | University of Illinois | ~19h |
Description:
Focuses on developing cloud-native applications using APIs, data storage services, and managed compute instances.
Why I chose this course:
It introduces the developer's perspective, teaching how to design and deploy real applications on the cloud.
Skills developed:
- Cloud APIs and storage models
- Stateless and stateful application design
- Handling scale and concurrency
Course | Offered by | Effort |
---|---|---|
Cloud Applications 1 | University of Illinois | ~15h |
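The stateless-versus-stateful distinction in the skills list above is easiest to see in code. A minimal sketch (handler names and request shape are hypothetical, not from the course):

```python
# Stateless handler: the response is a pure function of the request,
# so any replica behind a load balancer can serve any request.
def stateless_handler(request):
    return {"user": request["user"], "greeting": f"Hello, {request['user']}!"}

# Stateful handler: keeps a per-instance visit counter, so requests from
# the same client must stick to the same replica, or the state must be
# externalized to a shared store (database, cache) to allow scaling out.
class StatefulHandler:
    def __init__(self):
        self.visits = 0

    def handle(self, request):
        self.visits += 1
        return {"user": request["user"], "visit": self.visits}

greeting = stateless_handler({"user": "ada"})
server = StatefulHandler()
server.handle({"user": "ada"})
second = server.handle({"user": "ada"})
```

This is why cloud platforms push application code toward statelessness: stateless replicas can be created and destroyed freely under autoscaling.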
Description:
Continues development topics with a focus on performance, monitoring, container orchestration, and user authentication.
Why I chose this course:
This course emphasizes operational excellence and monitoring, which are crucial for real-world systems in production.
Skills developed:
- Logging and monitoring cloud apps
- Load balancing and caching
- Authentication and access control
Course | Offered by | Effort |
---|---|---|
Cloud Applications 2 | University of Illinois | ~19h |
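Two of the skills above, load balancing and caching, can be sketched with the standard library alone. Backend names and the `fetch_profile` lookup are illustrative assumptions:

```python
import itertools
from functools import lru_cache

# Round-robin load balancing: rotate requests across a fixed backend pool.
backends = ["app-1", "app-2", "app-3"]
rotation = itertools.cycle(backends)

def pick_backend():
    return next(rotation)

assigned = [pick_backend() for _ in range(5)]

# Caching: memoize an expensive lookup so repeated requests skip the backend.
backend_hits = []

@lru_cache(maxsize=128)
def fetch_profile(user_id):
    backend_hits.append(user_id)  # record a real backend hit
    return {"id": user_id, "name": f"user-{user_id}"}

fetch_profile(7)
fetch_profile(7)  # second call is served from the cache
```

Production systems replace `itertools.cycle` with a managed load balancer and `lru_cache` with a shared cache such as Redis, but the ideas are the same.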
Description:
Covers how networking works in cloud environments, including virtual networks, firewalls, routing, and SDNs.
Why I chose this course:
To understand how services communicate at scale, securely and efficiently across virtualized infrastructure.
Skills developed:
- Virtual Private Clouds (VPCs)
- Network configuration and subnetting
- Load balancers and security groups
Course | Offered by | Effort |
---|---|---|
Cloud Networking | University of Illinois | ~22h |
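The VPC and subnetting skills above map directly onto CIDR arithmetic, which Python's `ipaddress` module handles. The address ranges below are a hypothetical layout, not tied to any specific provider:

```python
import ipaddress

# Hypothetical VPC address space, carved into four equal subnets
# (for example, one per availability zone or application tier).
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=18))

first = subnets[0]
# Each /18 subnet spans 2**(32 - 18) = 16384 addresses.
size = first.num_addresses
# Membership tests show which subnet a given host lands in.
contains_host = ipaddress.ip_address("10.0.10.5") in first
```

Security groups and route tables are then attached per subnet, which is why getting the CIDR layout right up front matters.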
Description:
A hands-on capstone project where you build and deploy a full-stack application in the cloud, integrating all concepts from the specialization.
Why I chose this course:
To apply all concepts in a realistic, end-to-end scenario, simulating a true production deployment pipeline.
Skills developed:
- App deployment using cloud platforms
- Integrating storage, compute, and networking
- Debugging and monitoring a cloud-native app
Course | Offered by | Effort |
---|---|---|
Cloud Computing Project | University of Illinois | ~21h |
If you're looking for deeper insights, consider these additional resources:
- The Elements of Statistical Learning - Hastie, Tibshirani, Friedman.
- Introduction to Statistical Learning - James, Witten, Hastie, Tibshirani.
- Bayesian Statistics - Peter M. Lee.
- Artificial Intelligence: A Modern Approach - Stuart Russell, Peter Norvig.
- Deep Learning Papers Reading Roadmap - Collection of AI research papers.
- SQL for Smarties - Joe Celko.
- The Missing Semester of Your CS Education - MIT.
These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.
- Course durations are approximate and based on platform estimates.
- Some books were accessed through university partnerships, but if you don't have access... well, explore alternative ways. If possible, support the authors by purchasing their books.
- The curriculum is continuously evolving as new resources become available.
Sources used to structure this curriculum:
- OSSU Data Science - Open-source university model.
- AI Expert Roadmap - AI & Data Science roadmap.
- Roadmap SH - Learning paths for various tech disciplines.
- USP Statistics Course - Inspiration for course selection.