suyinwb/KDrama
Reference from:
- https://www.kaggle.com/chanoncharuchinda/sample-top-100-korean-dramas/notebook
- https://github.com/swati-gwc/DramaList/blob/main/WebScrapingMyDramaList.py
- https://medium.com/@apoorvareddy612/korean-drama-analysis-4da6146ffaab
- https://github.com/doranbae/Recommender-for-Korean-Dramas
- https://feifang.github.io/New-Lipstick-Effect/
- https://towardsai.net/p/artificial-intelligence/an-ai-practitioners-guide-to-the-kdrama-start-up-56ab95c2afd8
DATASET
- Korean Weekly Rating in Naver: https://search.naver.com/search.naver?where=nexearch&query=%EB%93%9C%EB%9D%BC%EB%A7%88
- International Rating in DramaList: https://mydramalist.com/shows
- Most awards won: https://haen.ai/l/dataset-of-korean-dramas-with-the-most-awards-won
- https://www.reddit.com/r/datasets/comments/ioepvd/looking_for_datasets_related_to_korean_dramas_and/
- https://dramaqa.snu.ac.kr/Dataset
- Tnms & Nielsen ratings: http://www.koreandrama.org/voice-season-4/
How-To:
- https://www.analyticsvidhya.com/blog/2021/12/comprehensive-project-on-building-a-movie-recommender-website/
- https://medium.com/analytics-vidhya/recommender-systems-in-10-minutes-2e50b430f98d
Read & Laugh: Start Up Review by an AI engineer: https://towardsai.net/p/artificial-intelligence/an-ai-practitioners-guide-to-the-kdrama-start-up-56ab95c2afd8
I love KDramas, but not all of them. I do not favour specific genres, so I would like a Recommender System that introduces me to "new" or older good KDramas.
Another link is the 100 most popular: https://mydramalist.com/shows/top_korean_dramas
- Think of a potential way to achieve Collaborative Filtering. URL/statistics has some information.
An API for mydramalist.com exists, but API keys are no longer being given out.
Use the web scraping method from:
- https://towardsdatascience.com/web-scraping-basics-82f8b5acd45c
- https://stackoverflow.com/questions/58419896/writing-scraped-data-into-json-using-python
1. Inspect the website HTML that you want to crawl.
2. Access the URL of the website using code and download all the HTML contents on the page.
3. Format the downloaded content into a readable format.
4. Extract the useful information and save it into a structured format.
5. For information displayed on multiple pages of the website, you may need to repeat steps 2–4 to have the complete information.
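The extract-and-save steps above can be sketched roughly as follows. This is only a minimal illustration, not the scraper used in this repo: the HTML snippet, the `title` and `score` class names, and the helper names `ShowParser`, `parse_shows` and `to_json` are all assumptions made up for the example, and the standard-library `html.parser` is swapped in for Beautiful Soup so the sketch runs without extra installs. The download step is left out; the sample string stands in for a page that has already been fetched.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page (step 2).
# The real site's tags and class names will differ: inspect them first (step 1).
SAMPLE_HTML = """
<div class="box"><h6 class="title"><a href="/1">Drama A</a></h6>
  <span class="score">9.1</span></div>
<div class="box"><h6 class="title"><a href="/2">Drama B</a></h6>
  <span class="score">8.7</span></div>
"""


class ShowParser(HTMLParser):
    """Collects one {"title", "score"} record per show (steps 3 and 4)."""

    def __init__(self):
        super().__init__()
        self.shows = []
        self._field = None  # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "h6" and css_class == "title":
            # A title heading starts a new record.
            self.shows.append({"title": "", "score": None})
            self._field = "title"
        elif tag == "span" and css_class == "score":
            self._field = "score"

    def handle_endtag(self, tag):
        if tag in ("h6", "span"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None or not self.shows:
            return
        if self._field == "title":
            self.shows[-1]["title"] += text
        else:
            self.shows[-1]["score"] = float(text)


def parse_shows(html):
    """Return a list of show records extracted from one page of HTML."""
    parser = ShowParser()
    parser.feed(html)
    return parser.shows


def to_json(shows):
    """Serialise the records as JSON text (the structured format in step 4)."""
    return json.dumps(shows, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(to_json(parse_shows(SAMPLE_HTML)))
```

For multi-page listings (step 5), `parse_shows` would be called once per downloaded page and the resulting lists concatenated before saving.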
Resource: Python 3, Flask, Pandas, Jupyter Notebook, Splinter, Beautiful Soup, PyMongo, MongoDB, HTML5Lib, LXML
- mydramalist.com: this is a good site for getting information because its high number of users and critics makes the dataset viable. It is also the only English site with this traffic.
It would actually be better to get information directly from the Korean search engine, Naver. However, it is all in Korean, and I do not currently know how to deal with NLP or non-English information.
The current list does not factor in newly released dramas and may therefore be a bit outdated.
Older dramas most likely have more critics, so their rankings will be more normalised and/or higher.
New dramas might have fewer critics, which makes their rankings unreliable.
Classification - User-based: Not sure where to get this data for classification.
Classification - Product / Item-based: Do not have enough user-based data.
How can the user reviews of each drama be used for NLP machine learning?
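If per-user ratings can eventually be collected, the item-based idea mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `RATINGS` table is made-up data, the function names are hypothetical, and cosine similarity is just one common choice of similarity measure, not something this repo has settled on.

```python
from math import sqrt

# Made-up ratings for illustration only: user -> {drama: score out of 10}.
RATINGS = {
    "user1": {"Drama A": 9, "Drama B": 8, "Drama C": 2},
    "user2": {"Drama A": 8, "Drama B": 9},
    "user3": {"Drama A": 2, "Drama C": 9},
}


def item_vectors(ratings):
    """Invert user -> {item: score} into item -> {user: score}."""
    items = {}
    for user, scores in ratings.items():
        for item, score in scores.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_items(ratings, target, top_n=3):
    """Rank the other dramas by how similarly users rated them to `target`."""
    items = item_vectors(ratings)
    scored = [
        (other, cosine(items[target], vector))
        for other, vector in items.items()
        if other != target
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for name, score in similar_items(RATINGS, "Drama A"):
        print(f"{name}: {score:.2f}")
```

With so few users, the similarities are dominated by one or two ratings, which is the same data shortage noted above.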
The "Proper" Conclusion is indicated below on Summary
June Temperature Aggregates
June Temperature