
In this article, we will see how to scrape Reddit using Python. We will use PRAW (Python Reddit API Wrapper), a module that lets Python scripts access the Reddit API.

Installation

To install PRAW, run the following command at the command prompt:

pip install praw

Creating a Reddit App

Step 1: To extract data from Reddit, we need to create a Reddit app. You can create one at https://www.reddit.com/prefs/apps.

[Image: Reddit - Create an App]

Step 2: Click on "are you a developer? create an app...".

Step 3: A form like this will show up on your screen. Enter a name and description of your choice. In the redirect uri box, enter http://localhost:8080

[Image: App Form]

Step 4: After entering the details, click on "create app".

[Image: Developed Application]

The Reddit app has been created. Now we can use Python and PRAW to scrape data from Reddit. Note down the app's client_id and secret, and decide on a user_agent (a short string of your choosing that identifies your script to Reddit). These values will be used to connect to Reddit from Python.
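One way to keep these values out of your scripts is to read them from environment variables. The sketch below is only an illustration: the helper name load_credentials and the three variable names (REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT) are assumptions made for this example, so pick whatever names suit your setup.

```python
import os

# Hypothetical environment variable names chosen for this example.
REQUIRED_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_credentials(env=os.environ):
    """Return a dict of keyword arguments that can be passed to praw.Reddit."""
    # Collect every variable that is unset or empty so the error lists them all.
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

With the variables set, the result can be unpacked into the constructor, for example praw.Reddit(**load_credentials()).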

Creating a PRAW Instance

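To connect to Reddit, we need to create a PRAW instance. There are two types of PRAW instances: a read-only instance, which can only retrieve publicly available information, and an authorized instance, which can also act on behalf of your Reddit account (for example, to upvote, post, or comment).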

Python3
import praw

# Read-only instance
reddit_read_only = praw.Reddit(client_id="",      # your client id
                               client_secret="",  # your client secret
                               user_agent="")     # your user agent

# Authorized instance
reddit_authorized = praw.Reddit(client_id="",      # your client id
                                client_secret="",  # your client secret
                                user_agent="",     # your user agent
                                username="",       # your reddit username
                                password="")       # your reddit password

Now that we have created an instance, we can use Reddit's API to extract data. In this tutorial, we will use only the read-only instance.
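To confirm which kind of instance you are holding, you can inspect its read_only attribute, which PRAW sets on every Reddit instance. The helper name describe_mode below is hypothetical and exists only for this sketch.

```python
def describe_mode(reddit):
    """Return 'read-only' or 'authorized' based on the instance's read_only flag."""
    # PRAW exposes read_only as a boolean attribute on a Reddit instance.
    return "read-only" if reddit.read_only else "authorized"
```

For an instance created without a username and password, such as reddit_read_only above, describe_mode(reddit_read_only) should report "read-only".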

Scraping Reddit Subreddits

There are different ways of extracting data from a subreddit. The posts in a subreddit can be sorted by hot, new, top, controversial, and so on. You can use any sorting method of your choice.
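Because each sort order corresponds to a method of the same name on a PRAW subreddit object, the choice can be made at run time. The following is a minimal sketch; fetch_posts and SORT_METHODS are hypothetical names introduced for this example, and the subreddit argument is assumed to be an object returned by a PRAW instance's subreddit() method.

```python
# Sort orders used in this sketch; each one names a listing method
# on a PRAW subreddit object (subreddit.hot, subreddit.new, ...).
SORT_METHODS = ("hot", "new", "top", "controversial", "rising")


def fetch_posts(subreddit, sort="hot", limit=5):
    """Return up to `limit` posts from `subreddit` using the named sort order."""
    if sort not in SORT_METHODS:
        raise ValueError(f"Unknown sort {sort!r}; expected one of {SORT_METHODS}")
    listing = getattr(subreddit, sort)  # e.g. subreddit.hot
    return list(listing(limit=limit))
```

For example, fetch_posts(reddit_read_only.subreddit("Python"), sort="new", limit=10) would request the ten newest posts.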

Let's extract some information from the redditdev subreddit.

Python3
import praw
import pandas as pd

reddit_read_only = praw.Reddit(client_id="",      # your client id
                               client_secret="",  # your client secret
                               user_agent="")     # your user agent

subreddit = reddit_read_only.subreddit("redditdev")

# Display the name of the Subreddit
print("Display Name:", subreddit.display_name)

# Display the title of the Subreddit
print("Title:", subreddit.title)

# Display the description of the Subreddit
print("Description:", subreddit.description)

Output:

[Image: Name, Title, and Description]

Now let's extract 5 hot posts from the Python subreddit:

Python3
subreddit = reddit_read_only.subreddit("Python")

for post in subreddit.hot(limit=5):
    print(post.title)
    print()

Output:

[Image: Top 5 hot posts]

We will now save the top posts of the Python subreddit in a pandas DataFrame:

Python3
# Scraping the top posts of the past month
posts = subreddit.top(time_filter="month")

posts_dict = {"Title": [], "Post Text": [],
              "ID": [], "Score": [],
              "Total Comments": [], "Post URL": []}

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)

    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)

    # Unique ID of each post
    posts_dict["ID"].append(post.id)

    # The score of a post
    posts_dict["Score"].append(post.score)

    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)

    # URL of each post
    posts_dict["Post URL"].append(post.url)

# Saving the data in a pandas dataframe
top_posts = pd.DataFrame(posts_dict)
top_posts  # displays the dataframe when run in a notebook

Output:

[Image: Top posts of the Python subreddit]
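The same table can also be built one row at a time, which some readers find easier to extend with new columns. This is a sketch of that alternative, not the approach used above; post_to_row and posts_to_rows are hypothetical helper names, and each post is assumed to expose the same attributes read in the loop above.

```python
def post_to_row(post):
    """Map one post to a flat dict with the same columns as posts_dict."""
    return {
        "Title": post.title,
        "Post Text": post.selftext,
        "ID": post.id,
        "Score": post.score,
        "Total Comments": post.num_comments,
        "Post URL": post.url,
    }


def posts_to_rows(posts):
    """Convert an iterable of posts into a list of row dicts."""
    return [post_to_row(post) for post in posts]
```

A list of dicts like this can be passed straight to pd.DataFrame, so pd.DataFrame(posts_to_rows(posts)) should produce the same columns.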

Exporting Data to a CSV File

Python3
import pandas as pd

top_posts.to_csv("Top Posts.csv", index=True)

Output:

[Image: CSV File of Top Posts]

Scraping Reddit Posts

To extract data from Reddit posts, we need the URL of the post. Once we have the URL, we need to create a submission object.

Python3
import praw
import pandas as pd

reddit_read_only = praw.Reddit(client_id="",      # your client id
                               client_secret="",  # your client secret
                               user_agent="")     # your user agent

# URL of the post
url = ("https://www.reddit.com/r/IAmA/comments/m8n4vt/"
       "im_bill_gates_cochair_of_the_bill_and_melinda/")

# Creating a submission object
submission = reddit_read_only.submission(url=url)
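PRAW parses the URL for you, but it can help to see which part of it identifies the post. The sketch below pulls out the path segment that follows "comments" using only the standard library; submission_id_from_url is a hypothetical helper written for this example and is not part of PRAW.

```python
from urllib.parse import urlparse


def submission_id_from_url(url):
    """Return the ID that follows 'comments' in a Reddit post URL path."""
    # Split the path into its non-empty segments, e.g. ["r", "IAmA", "comments", ...].
    parts = [part for part in urlparse(url).path.split("/") if part]
    if "comments" not in parts:
        raise ValueError(f"Not a Reddit post URL: {url}")
    index = parts.index("comments") + 1
    if index >= len(parts):
        raise ValueError(f"No submission ID found in: {url}")
    return parts[index]
```

The resulting ID can be used in place of the URL, for example reddit_read_only.submission(id=submission_id_from_url(url)).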

We will extract the top-level comments from the post we have selected. We will need the MoreComments class from praw.models. To extract the comments, we loop over submission.comments and append each comment's body to the post_comments list. Reddit does not return every comment at once: where more are available, the comment list contains a MoreComments placeholder instead of a comment. A placeholder has no body, so the if-statement in the loop checks for it and skips it. Finally, we convert the list into a pandas DataFrame.

Python3
from praw.models import MoreComments

post_comments = []

for comment in submission.comments:
    # Skip placeholders that stand in for comments not yet loaded
    if isinstance(comment, MoreComments):
        continue

    post_comments.append(comment.body)

# creating a dataframe
comments_df = pd.DataFrame(post_comments, columns=['comment'])
comments_df  # displays the dataframe when run in a notebook

Output:

[Image: List converted into a pandas dataframe]
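An alternative to checking each item's type is to let PRAW deal with the MoreComments placeholders before you iterate. The sketch below uses the comment forest's replace_more() and list() methods; collect_comment_bodies is a hypothetical helper name. Note that list() flattens the whole tree, so unlike the loop above the result also includes replies.

```python
def collect_comment_bodies(submission, more_limit=0):
    """Return the body text of every loaded comment on a submission."""
    # limit=0 drops every MoreComments placeholder without fetching more;
    # limit=None would instead fetch all of them, which costs extra requests.
    submission.comments.replace_more(limit=more_limit)
    # list() returns the comments flattened, replies included.
    return [comment.body for comment in submission.comments.list()]
```

The returned list can be passed to pd.DataFrame with columns=['comment'] in the same way as post_comments.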


 

