CICIFLY/Data-Analytics-Projects
Certificate: https://graduation.udacity.com/confirm/KUM3F4AJ
This repository mainly contains the projects I completed for the Udacity Data Analysis Nanodegree.
The Udacity online data analyst program prepares me for a career as a data analyst by teaching me to clean and organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. I am developing proficiency in Python, its data analysis libraries (NumPy, pandas, Matplotlib), and SQL as I build a portfolio of projects.
Tips: For data science projects with Python, I recommend installing these basic libraries: NumPy, pandas, SciPy, scikit-learn, Matplotlib, and seaborn.
Subjects Covered:
- Anaconda: Learn to use Anaconda to manage packages and environments for use with Python
- Jupyter Notebook: Learn to use this open-source web application for creating documents that combine live code, visualizations, and narrative text
- Data Analysis Process
- NumPy for 1D and 2D Data
- pandas Series and DataFrames
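A minimal sketch of the 1D/2D NumPy and pandas ideas listed above. The exam scores, student names, and column names are made up for illustration and are not taken from any of the projects:

```python
import numpy as np
import pandas as pd

# 1D array: arithmetic is vectorized, so no explicit loop is needed
scores = np.array([72, 85, 90, 64])
curved = scores + 5  # adds 5 to every element

# 2D array (hypothetical data): rows are students, columns are two exams
exams = np.array([[72, 80],
                  [85, 88],
                  [90, 94],
                  [64, 70]])
col_means = exams.mean(axis=0)  # mean of each column (one value per exam)
row_means = exams.mean(axis=1)  # mean of each row (one value per student)

# pandas Series: a 1D array with a labeled index
s = pd.Series(scores, index=["ann", "bob", "cy", "dee"])

# pandas DataFrame: a labeled 2D table whose columns may have different dtypes
df = pd.DataFrame(exams, index=s.index, columns=["exam1", "exam2"])
df["improved"] = df["exam2"] > df["exam1"]  # new boolean column

print(col_means)
print(df.loc["bob"])  # select one row by its index label
```

The `axis` argument is the main difference between the 1D and 2D cases: `axis=0` aggregates down the columns, `axis=1` across the rows.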
In this project, I chose one of Udacity's curated datasets and investigated it using NumPy and pandas. I completed the entire data analysis process, starting by posing a question and finishing by sharing the findings. (This section may be better placed in the README of Project 1.)
I was given a dataset collected from an experiment. I used statistical techniques to answer questions about the data and presented my conclusions and recommendations in a report.
Subjects Covered:
- Probability
- Conditional Probability
- Binomial Distribution
- Sampling Distribution and Central Limit Theorem
- Descriptive Statistics
- Inferential Statistics
- Confidence Levels and Intervals
- Hypothesis Testing
- T-tests and A/B Tests
- Regression
- Multiple Linear Regression
- Logistic Regression
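Several of the topics above (sampling distributions, the Central Limit Theorem, confidence intervals, hypothesis testing, and A/B tests) can be combined in a small simulation-based A/B test. This is a sketch of one common approach, not necessarily the exact method used in the project; the function names `simulate_ab_pvalue` and `diff_confidence_interval` and the conversion counts are hypothetical. Under the null hypothesis both pages share the pooled conversion rate, so the p-value is the share of simulated differences at least as large as the observed one. The interval uses the normal approximation that the Central Limit Theorem justifies for large samples.

```python
import math
import numpy as np


def simulate_ab_pvalue(conv_old, n_old, conv_new, n_new, n_sims=10_000, seed=0):
    """One-sided test of H0: p_new - p_old <= 0 against H1: p_new - p_old > 0.

    Under H0 both pages are assumed to convert at the pooled rate.
    Returns (observed difference, simulated p-value).
    """
    rng = np.random.default_rng(seed)
    observed_diff = conv_new / n_new - conv_old / n_old
    p_pooled = (conv_old + conv_new) / (n_old + n_new)
    # Sampling distribution of the difference in conversion rates under H0
    sim_new = rng.binomial(n_new, p_pooled, size=n_sims) / n_new
    sim_old = rng.binomial(n_old, p_pooled, size=n_sims) / n_old
    null_diffs = sim_new - sim_old
    # Share of null differences at least as extreme as the observed one
    p_value = float((null_diffs >= observed_diff).mean())
    return observed_diff, p_value


def diff_confidence_interval(conv_old, n_old, conv_new, n_new, z=1.96):
    """Normal-approximation interval for p_new - p_old (z=1.96 gives about 95%)."""
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    se = math.sqrt(p_old * (1 - p_old) / n_old + p_new * (1 - p_new) / n_new)
    diff = p_new - p_old
    return diff - z * se, diff + z * se


# Hypothetical counts: 10,000 users saw each page
diff, p = simulate_ab_pvalue(1200, 10_000, 1320, 10_000)
low, high = diff_confidence_interval(1200, 10_000, 1320, 10_000)
print(f"observed diff={diff:.4f}, p-value={p:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

With these made-up counts the observed lift is 1.2 percentage points; a small p-value together with an interval that excludes zero would both favor the new page.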
Using Python, I gathered data from a variety of sources, assessed its quality and tidiness, and then cleaned it. I documented the wrangling efforts in a Jupyter Notebook and showcased them through analyses and visualizations using Python and SQL. I used A/B testing and regression methods to decide whether the company should launch a new webpage or keep the old one.
Subjects Covered:
- GATHERING DATA
- Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs
- Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files
- Store gathered data in a PostgreSQL database
- ASSESSING DATA
- Assess data visually and programmatically using pandas
- Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
- Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
- CLEANING DATA
- Identify each step of the data cleaning process (defining, coding, and testing)
- Clean data using Python and pandas
- Test cleaning code visually and programmatically using Python
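The assess-then-define/code/test cycle above might look like this on a tiny table. The columns and values are invented for illustration; the data is read from an inline string so the example runs without files, whereas a real project would gather it from files, web pages, or APIs:

```python
import io
import pandas as pd

# Hypothetical raw data with a duplicate row, a missing value,
# inconsistent country spellings, and dates stored as text
raw = pd.read_csv(io.StringIO(
    "user_id,signup_date,country,age\n"
    "1,2021-03-05,US,34\n"
    "2,2021-03-07,us,29\n"
    "2,2021-03-07,us,29\n"
    "3,2021-03-09,UK,\n"
    "4,2021-03-10,U.S.,41\n"
))

# --- Assess (programmatic) ---
raw.info()                                       # signup_date has object (string) dtype
n_duplicates = int(raw.duplicated().sum())       # exact duplicate rows
n_missing_age = int(raw["age"].isna().sum())     # completeness issue
country_values = sorted(raw["country"].unique())  # consistency issue: same country, several spellings

# --- Clean: work on a copy so the raw data stays untouched ---
clean = raw.copy()

# Define: remove exact duplicate rows.  Code:
clean = clean.drop_duplicates()

# Define: standardize country codes to upper-case "US"/"UK".  Code:
clean["country"] = clean["country"].str.upper().replace({"U.S.": "US"})

# Define: convert signup_date from string to datetime.  Code:
clean["signup_date"] = pd.to_datetime(clean["signup_date"])

# --- Test: programmatic checks that each fix worked ---
assert not clean.duplicated().any()
assert set(clean["country"]) == {"US", "UK"}
assert pd.api.types.is_datetime64_any_dtype(clean["signup_date"])
```

The missing age is deliberately left as-is: whether to drop, impute, or keep it depends on the analysis question, so it is recorded during assessment rather than fixed automatically.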
I collected data from different sources, assessed it visually and programmatically, and cleaned it so it could be visualized and mined for insights later.
Subjects Covered:
- Univariate exploration of data (histograms, bar charts, axis limits, and different scales)
- Bivariate exploration of data (scatter plots, clustered bar charts, violin plots and bar charts, faceting)
- Multivariate exploration of data (encodings, plot matrices, feature engineering)
- Explanatory Visualizations (storytelling with data, polishing plots, creating a slide deck)
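As a sketch of the univariate techniques (histograms, axis limits, and different scales), the example below compares a linear and a logarithmic x-axis for right-skewed data. The data is randomly generated and hypothetical (it is not the diamonds or bike data), and the example assumes Matplotlib is installed; it renders off-screen to an in-memory PNG instead of opening a window:

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display required
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical right-skewed values (think "prices") from a log-normal distribution
rng = np.random.default_rng(42)
prices = rng.lognormal(mean=7, sigma=1, size=5_000)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear scale: most values crowd into the first few bins,
# so the x-limit trims the long right tail from view
ax_lin.hist(prices, bins=50)
ax_lin.set_xlim(0, np.percentile(prices, 99))
ax_lin.set_xlabel("price")
ax_lin.set_title("linear scale")

# Log scale: geometrically spaced bins (exact min/max endpoints) on a log axis
# spread the skewed values out evenly
log_bins = np.geomspace(prices.min(), prices.max(), num=40)
counts_log, edges_log, _ = ax_log.hist(prices, bins=log_bins)
ax_log.set_xscale("log")
ax_log.set_xlabel("price (log scale)")
ax_log.set_title("log scale")

buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Pairing log-spaced bins with a log axis keeps every bar the same visual width; using linear bins on a log axis would make the left-hand bars look artificially wide.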
I applied data visualization to a dataset describing the characteristics of diamonds and their prices.
In this project, I used Python’s data visualization tools to systematically explore the bike dataset for its properties and the relationships between its variables. Then, I created a presentation that communicates the findings to others.
About
This repository contains projects related to data collection, assessment, cleaning, visualization, and analysis.