
Using data scraped from mydramalist.com, I would like to create a recommendation system tailored to me.


KDrama

References:
https://www.kaggle.com/chanoncharuchinda/sample-top-100-korean-dramas/notebook
https://github.com/swati-gwc/DramaList/blob/main/WebScrapingMyDramaList.py
https://medium.com/@apoorvareddy612/korean-drama-analysis-4da6146ffaab
https://github.com/doranbae/Recommender-for-Korean-Dramas
https://feifang.github.io/New-Lipstick-Effect/
https://towardsai.net/p/artificial-intelligence/an-ai-practitioners-guide-to-the-kdrama-start-up-56ab95c2afd8

DATASET

Korean weekly ratings on Naver: https://search.naver.com/search.naver?where=nexearch&query=%EB%93%9C%EB%9D%BC%EB%A7%88

International ratings on MyDramaList: https://mydramalist.com/shows

Most awards won: https://haen.ai/l/dataset-of-korean-dramas-with-the-most-awards-won

https://www.reddit.com/r/datasets/comments/ioepvd/looking_for_datasets_related_to_korean_dramas_and/
https://dramaqa.snu.ac.kr/Dataset

TNmS & Nielsen ratings: http://www.koreandrama.org/voice-season-4/

How-To:

https://www.analyticsvidhya.com/blog/2021/12/comprehensive-project-on-building-a-movie-recommender-website/
https://medium.com/analytics-vidhya/recommender-systems-in-10-minutes-2e50b430f98d

Read & Laugh: Start-Up review by an AI engineer: https://towardsai.net/p/artificial-intelligence/an-ai-practitioners-guide-to-the-kdrama-start-up-56ab95c2afd8

Background

Overview of Project

I love KDramas, but not all of them. I do not have a preferred genre, so I would like a recommender system that introduces me to good KDramas, whether "new" or older.

Another useful link is the list of the 100 most popular Korean dramas: https://mydramalist.com/shows/top_korean_dramas

Purpose

Analysis And Challenges

Work out a feasible way to do collaborative filtering. The URL/statistics pages hold some information that may help.
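As a starting point for the collaborative filtering idea above, here is a minimal sketch of the user-based variant in plain Python. The `ratings` dictionary, the user names and the drama titles are all made up for illustration; nothing here is scraped data, and the helper names `cosine_similarity` and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical ratings (user -> {drama title: score out of 10}); not real data.
ratings = {
    "user_a": {"Drama A": 9, "Drama B": 8, "Drama C": 3},
    "user_b": {"Drama A": 8, "Drama B": 9, "Drama C": 3, "Drama D": 7},
    "user_c": {"Drama A": 3, "Drama B": 2, "Drama C": 9, "Drama D": 4},
    "me": {"Drama A": 9, "Drama C": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the dramas both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(target, ratings, top_n=3):
    """Rank dramas the target has not rated by similarity-weighted mean score."""
    totals, weights = {}, {}
    for user, scores in ratings.items():
        if user == target:
            continue
        sim = cosine_similarity(ratings[target], scores)
        if sim <= 0:
            continue
        for title, score in scores.items():
            if title in ratings[target]:
                continue  # skip dramas the target has already rated
            totals[title] = totals.get(title, 0.0) + sim * score
            weights[title] = weights.get(title, 0.0) + sim
    predicted = {title: totals[title] / weights[title] for title in totals}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recommendations = recommend("me", ratings)
for title, score in recommendations:
    print(f"{title}: {score:.2f}")
```

With this toy data, users whose tastes match "me" pull "Drama B" above "Drama D". The approach only works once per-user ratings are available, which is exactly the open question for this project.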

Methodology: Analytics Paradigm

https://cloudy.achakladar.com/a-movie-recommender-engine-using-k-means-and-collaborative-filtering-and-deployed-to-kubernetes-ckj7mj1280292w7s1bzxw4uiv

1. Decomposing the Ask

2. Identify the Datasource

Although an API for mydramalist.com exists, API keys are no longer being issued.

Use the web-scraping method from: https://towardsdatascience.com/web-scraping-basics-82f8b5acd45c

https://stackoverflow.com/questions/58419896/writing-scraped-data-into-json-using-python

1. Inspect the HTML of the website that you want to crawl.
2. Access the URL of the website using code and download all the HTML contents on the page.
3. Format the downloaded content into a readable format.
4. Extract the useful information and save it in a structured format.
5. For information displayed on multiple pages of the website, you may need to repeat steps 2–4 to have the complete information.
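The scraping steps above can be sketched roughly as follows, using only the Python standard library so the example runs offline (Beautiful Soup would do the same job with less code). The HTML in `SAMPLE_HTML`, the `<h6 class="title">` markup and the `?page=` query parameter are assumptions made for illustration, not the real structure of mydramalist.com; inspect the live page first and adjust.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleLinkParser(HTMLParser):
    """Collects link text and href for <a> tags inside <h6 class="title">."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.current_href = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h6" and "title" in (attrs.get("class") or "").split():
            self.in_title = True
        elif tag == "a" and self.in_title:
            self.current_href = attrs.get("href")

    def handle_data(self, data):
        if self.in_title and self.current_href and data.strip():
            self.rows.append({"title": data.strip(), "url": self.current_href})

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None
        elif tag == "h6":
            self.in_title = False


def fetch_html(url):
    """Step 2: download the page HTML (not called in this offline demo)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_rows(html):
    """Steps 3 and 4: parse the HTML and pull out title/url records."""
    parser = TitleLinkParser()
    parser.feed(html)
    return parser.rows


def scrape_pages(base_url, last_page):
    """Step 5: repeat steps 2 to 4 per page (the ?page= parameter is assumed)."""
    rows = []
    for page in range(1, last_page + 1):
        rows.extend(extract_rows(fetch_html(f"{base_url}?page={page}")))
    return rows


# Made-up markup standing in for a downloaded listing page.
SAMPLE_HTML = """
<div class="box"><h6 class="title"><a href="/1-sample-drama-one">Sample Drama One</a></h6></div>
<div class="box"><h6 class="title"><a href="/2-sample-drama-two">Sample Drama Two</a></h6></div>
"""

rows = extract_rows(SAMPLE_HTML)
as_json = json.dumps(rows, indent=2)  # structured output, ready to write to a file
print(as_json)
```

Keeping the download (`fetch_html`) separate from the parsing (`extract_rows`) means the parser can be tested against saved HTML without hitting the site on every run.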

3. Define Strategy & Metrics

Resources: Python 3, Flask, Pandas, Jupyter Notebook, Splinter, Beautiful Soup, PyMongo, MongoDB, html5lib, lxml

4. Data Retrieval Plan

mydramalist.com: this is a good site for getting information because its high number of users and critics makes the dataset viable. It also appears to be the only English-language site with this level of traffic.

5. Assemble & Clean the Data

6. Analyse for Trends

7. Acknowledging Limitations

Ideally the information would come directly from Naver, the Korean search engine. However, it is all in Korean, and I do not currently know how to handle NLP or non-English data.
The current list does not include newly released dramas and may therefore be a bit outdated.
Older dramas most likely have more critics, so their rankings will be more normalised and/or higher.
New dramas might have fewer critics, which makes their rankings unreliable.
Classification - User-based: Not sure where to get the data for this.
Classification - Product / Item-based: Not enough user-based data.
How can the user reviews of each drama be used for NLP machine learning?
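One common way to soften the "few critics, unreliable ranking" limitation listed above is a vote-weighted (Bayesian average) rating, which pulls a drama's score toward the overall mean when it has few votes. This is a sketch of that general technique, not something the project currently does; the drama entries, vote counts and the `MIN_VOTES` threshold are invented for illustration.

```python
def weighted_rating(rating, votes, mean_rating, min_votes):
    """Blend a drama's own rating with the overall mean, weighted by vote count."""
    total = votes + min_votes
    return (votes / total) * rating + (min_votes / total) * mean_rating


# Hypothetical entries; not scraped data.
dramas = [
    {"title": "Older Drama", "rating": 8.8, "votes": 20000},
    {"title": "New Drama", "rating": 9.5, "votes": 50},
    {"title": "Mid Drama", "rating": 7.0, "votes": 5000},
]

MIN_VOTES = 1000  # assumed threshold: how many votes before a rating is trusted
mean_rating = sum(d["rating"] for d in dramas) / len(dramas)

for d in dramas:
    d["weighted"] = weighted_rating(d["rating"], d["votes"], mean_rating, MIN_VOTES)

ranked = sorted(dramas, key=lambda d: d["weighted"], reverse=True)
for d in ranked:
    print(f'{d["title"]}: raw {d["rating"]:.1f}, weighted {d["weighted"]:.2f}')
```

In this toy example "New Drama" has the highest raw rating but only 50 votes, so its weighted score drops below "Older Drama". The threshold controls how strongly sparse ratings are discounted and would need tuning against the real data.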

8. Making the Call:

The "proper" conclusion is given below in the Summary.

Analysis

June Temperature Aggregates

June Temperature

Summary

Appendix


suyinwb/KDrama
