khuyentran1401/Extract-text-from-article



About this project

This project extracts the text of an article using the Newspaper3k library, then uses NLTK (Natural Language Toolkit) to preprocess the text and find the most common words in the article.

Tools

Newspaper3k: tool to scrape articles
NLTK: tool to process text

Steps

Scrape articles with newspaper3k

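from newspaper import Article

url = 'https://mystudentvoices.com/it-took-me-2-years-to-get-1000-followers-life-lessons-ive-learned-throughout-the-journey-9bc44f2959f0'
article = Article(url)
article.download()  # fetch the raw HTML
article.parse()     # extract text and metadata; needed before reading fields such as publish_date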

Find the publish date

article.publish_date
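The scraping steps in this section can be sketched together as follows. This is a minimal sketch, not the notebook's actual code: `load_article` and `article_fields` are hypothetical helper names, and the `fake` object holds made-up values so the example runs offline. The attributes read here (`publish_date`, `top_image`, `authors`, `keywords`, `summary`) are the ones newspaper3k exposes on a parsed `Article`; `keywords` and `summary` are only filled in after calling `article.nlp()`, which depends on NLTK's tokenizer data.

```python
from types import SimpleNamespace


def load_article(url):
    """Download, parse, and analyze an article (needs network access)."""
    # Imported lazily so the rest of this sketch runs without newspaper3k.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the raw HTML
    article.parse()     # fill in text, authors, publish_date, top_image
    article.nlp()       # fill in keywords and summary
    return article


def article_fields(article):
    """Collect the fields named in the steps into a plain dict."""
    return {
        "publish_date": article.publish_date,
        "top_image": article.top_image,
        "authors": article.authors,
        "keywords": article.keywords,
        "summary": article.summary,
    }


# Stand-in object with made-up values, used only to try the helper offline.
fake = SimpleNamespace(
    publish_date=None,
    top_image="https://example.com/cover.png",
    authors=["Jane Doe"],
    keywords=["followers", "lessons"],
    summary="A short summary.",
)
fields = article_fields(fake)
print(fields["authors"])
```

With newspaper3k installed and a network connection, `article_fields(load_article(url))` would return the same dict for a real article.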

Extract image
Find the author
Find the keywords
Find the summary
Preprocessing with NLTK
- Tokenize text
- Lowercase and remove stopwords
Visualize the frequency of words with Matplotlib
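The preprocessing and counting steps can be illustrated with a dependency-free sketch. Note the substitutions: a regular expression stands in for NLTK's `word_tokenize`, a tiny hand-picked `STOPWORDS` set stands in for NLTK's English stopword list, and `collections.Counter` does the counting. All function names and the sample sentence are made up for illustration; the notebook itself may do this differently.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; NLTK's English list is much larger.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "it", "is", "i", "me", "my"}


def tokenize(text):
    """Split text into word tokens (regex stand-in for an NLTK tokenizer)."""
    return re.findall(r"[A-Za-z']+", text)


def preprocess(text, stopwords=STOPWORDS):
    """Tokenize, lowercase, and drop stopwords."""
    lowered = (token.lower() for token in tokenize(text))
    return [word for word in lowered if word not in stopwords]


def most_common_words(text, n=10):
    """Return the n most frequent remaining words as (word, count) pairs."""
    return Counter(preprocess(text)).most_common(n)


def plot_frequencies(pairs):
    """Draw a bar chart of (word, count) pairs with Matplotlib."""
    # Imported lazily so the counting code runs without Matplotlib.
    import matplotlib.pyplot as plt

    words, counts = zip(*pairs)
    plt.bar(words, counts)
    plt.xlabel("word")
    plt.ylabel("count")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()


sample = (
    "It took me two years to get followers. "
    "The followers taught me lessons, and the lessons took time."
)
top = most_common_words(sample, n=3)
print(top)
```

Passing `top` to `plot_frequencies` would draw the chart. To follow the README more closely, swap `tokenize` and `STOPWORDS` for their NLTK counterparts after downloading the required NLTK data.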

Tutorial blog

Find the Medium article for this repository here
