Big data manipulation and modelling projects

KevinAbrahamRepo/Data-Analytics-Projects

This repo contains some data manipulation and modelling projects that I worked on over the last few months. All analysis was performed in Python 3 (Jupyter Notebook). Below is a brief introduction to each project.

For more information on the individual projects, including some interesting finds from exploratory analysis, please see the sub-folders. I am also looking to improve the existing code and extend its functionality, so if you have ideas or suggestions for future work, please do let me know!

Projects using Supervised Learning Models:

  1. Analysis of the United Kingdom's road safety and traffic demographics dataset, obtained from UK Traffic Dataset - Kaggle, with the following key goals:

    • Identify common factors responsible for higher accident rates through various feature engineering techniques
    • Carry out a retrospective study of the historical dataset and perform descriptive analysis (Tableau, Power BI and Excel Power Pivot)
    • Attempt to correct an imbalanced target class (SMOTE, Cluster Centroids, Tomek Links)
    • Perform hyperparameter tuning using GridSearchCV (scikit-learn) to enhance the predictive power of several supervised learning models (KNN, SVM, Naive Bayes, Logistic Regression, Random Forest, Gradient Boosting)
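As a rough illustration of the tuning step, here is a minimal sketch of a `GridSearchCV` run over a KNN classifier. It is not the notebook's actual code: the synthetic data from `make_classification` stands in for the accident dataset, and the parameter grid, the F1 scoring choice, and the fold count are all assumptions picked for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the accident data (assumption): roughly a 90% / 10% class split.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is refit on each CV training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical grid; the "knn__" prefix routes each parameter to the classifier step.
param_grid = {
    "knn__n_neighbors": [3, 5, 9],
    "knn__weights": ["uniform", "distance"],
}

# F1 is used instead of accuracy because the target class is imbalanced.
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean CV F1 of best setting:", round(search.best_score_, 3))
print("held-out F1:", round(search.score(X_test, y_test), 3))
```

The same pattern should carry over to the other models listed above by swapping the final pipeline step and its grid. Resampling (for example SMOTE) is left out here; if it is added, it should be applied to the training folds only.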
  2. Analysis of several thousand tweets, collected in JSON format using Twitter's Streaming API, to perform sentiment analysis and classify them into subcategories for a more general consensus. The topic for this NLP project was the 106th Grey Cup (#Greycup/#greycup), held in Edmonton in November 2018. Key analytic goals:

    • Perform a clean data pull from Twitter and transform the data for analysis in Python (Tweepy)
    • Carry out descriptive and time-series analyses for insights (matplotlib (Basemap), Mapboxgl)
    • Build predictive models to classify sentiment of a tweet (Naive Bayes, SVM - Linear/Polynomial)
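To make the sentiment classification goal concrete, the following is a small from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing. It only illustrates the mechanics and is not the project's implementation: the `NaiveBayesSentiment` class, the `tokenize` helper, and the six toy tweets are invented for this example, and real tweets would need much more cleaning.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (hypothetical, deliberately naive)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy data, not taken from the collected tweets.
train_texts = [
    "what a great game tonight",
    "love the halftime show",
    "amazing catch in the end zone",
    "terrible call by the referee",
    "worst halftime show ever",
    "so cold and boring game",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great catch"))
print(model.predict("boring referee"))
```

In practice a library implementation such as scikit-learn's `MultinomialNB` combined with a text vectorizer would likely be preferable; the hand-rolled version is shown only because it exposes the log-prior and smoothed log-likelihood terms directly.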
