IPRoyal/python-web-scraping-guide

A complete step-by-step Python web scraping guide for 2025 - learn how to fetch, parse, and analyze website data using Requests and BeautifulSoup.

If you want to explore web scraping, Python is the best place to start. Thanks to its simple syntax and great library support, Python makes it easy to extract data from websites.

In this tutorial, you’ll learn how to use Requests and Beautiful Soup to scrape web pages and analyze them. As an example, the project will collect post titles from the r/programming subreddit and determine the most mentioned programming languages.


What Is Web Scraping?

Web scraping is the automated collection of data from websites.
Scrapers fetch a page’s HTML and extract the data they need. Advanced tools may even use headless browsers to simulate user actions.

⚠️ Web scraping can break easily when a website’s structure changes. Always check for available APIs before scraping.
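For example, Reddit exposes most listing data as JSON if you append .json to a listing URL, so a plain API-style request may already give you what you need. The sketch below shows that approach under the assumption that the standard Reddit JSON layout (data → children → data → title) is what comes back; rate limits and Reddit's terms of use still apply.

# Checking a JSON endpoint before resorting to HTML scraping.
import requests

response = requests.get(
    "https://www.reddit.com/r/programming/.json",
    headers={'User-agent': 'Sorry, learning Python!'},
)
data = response.json()
titles = [post["data"]["title"] for post in data["data"]["children"]]
print(titles[:5])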


Why Use Python?

Python offers unmatched simplicity and a strong ecosystem:

  • Requests for handling HTTP requests
  • BeautifulSoup for HTML parsing
  • Scrapy and Playwright for advanced use cases

These tools are well-documented, reliable, and widely used by developers.


Setup

You’ll need Python installed. Then, install the libraries:

pip install requests
pip install bs4

Create a file named scraper.py for your code.
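To confirm both packages installed correctly, you can run a quick import check. The version attributes used below are standard in Requests and Beautiful Soup; the exact version numbers printed will differ on your machine.

# Optional check that both libraries import correctly.
import requests
import bs4

print("requests:", requests.__version__)
print("beautifulsoup4:", bs4.__version__)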


Fetching HTML

Fetching page data is the first step. The example below loads the front page of r/programming from the old Reddit interface, whose simpler markup is easier to work with than the modern redesign.

import requests

page = requests.get(
    "https://old.reddit.com/r/programming/",
    headers={'User-agent': 'Sorry, learning Python!'}
)
html = page.content
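Before parsing, it is worth confirming the request actually succeeded. The calls below are standard Requests methods; treating any non-2xx response as a hard stop is just one reasonable policy, not part of the original example.

# Optional sanity check on the response before parsing (assumes `page` from above).
print(page.status_code)   # 200 means the request succeeded
page.raise_for_status()   # raises requests.HTTPError for 4xx/5xx responses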

Parsing HTML

To extract titles from the HTML, use BeautifulSoup. On old Reddit, each post title is a link inside a <p class="title"> element, which is what the code below looks for.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
p_tags = soup.find_all("p", "title")
titles = [p.find("a").get_text() for p in p_tags]
print(titles)

This prints the titles of the posts on the first page.
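If you prefer CSS selectors, BeautifulSoup's select() method gives an equivalent one-liner. The selector below assumes old Reddit's markup, where the title link is a direct child of the <p class="title"> element, so it is a sketch rather than a guaranteed drop-in.

# Same titles, extracted with a CSS selector instead of find_all().
titles = [a.get_text() for a in soup.select("p.title > a")]
print(titles)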


Scraping Multiple Pages

You can extend the script to scrape multiple pages by following each page's "next" button link in a loop.

import requests
from bs4 import BeautifulSoup
import time

post_titles = []
next_page = "https://old.reddit.com/r/programming/"

for current_page in range(0, 20):
    # Fetch and parse the current page
    page = requests.get(next_page, headers={'User-agent': 'Sorry, learning Python!'})
    html = page.content
    soup = BeautifulSoup(html, "html.parser")

    # Collect the titles on this page
    p_tags = soup.find_all("p", "title")
    titles = [p.find("a").get_text() for p in p_tags]
    post_titles += titles

    # Follow the "next" button to the following page, pausing between requests
    next_page = soup.find("span", "next-button").find("a")['href']
    time.sleep(3)

print(post_titles)
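One fragile spot in the loop above is the pagination lookup: if the subreddit runs out of pages or the markup changes, soup.find("span", "next-button") returns None and the script crashes with an AttributeError. A hedged variant that stops cleanly in that case is sketched below.

import time

import requests
from bs4 import BeautifulSoup

post_titles = []
next_page = "https://old.reddit.com/r/programming/"

for current_page in range(0, 20):
    page = requests.get(next_page, headers={'User-agent': 'Sorry, learning Python!'})
    soup = BeautifulSoup(page.content, "html.parser")
    post_titles += [p.find("a").get_text() for p in soup.find_all("p", "title")]

    # Stop cleanly if there is no "next" button (last page or layout change).
    next_button = soup.find("span", "next-button")
    if next_button is None:
        break
    next_page = next_button.find("a")["href"]
    time.sleep(3)  # pause between requests to stay polite

print(len(post_titles), "titles collected")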

Finding the Most Mentioned Programming Languages

After scraping, you can analyze which programming languages appear most often in post titles.

language_counter = {
    "javascript": 0, "html": 0, "css": 0, "sql": 0, "python": 0,
    "typescript": 0, "java": 0, "c#": 0, "c++": 0, "php": 0,
    "c": 0, "powershell": 0, "go": 0, "rust": 0, "kotlin": 0,
    "dart": 0, "ruby": 0,
}

# Split every title into lowercase words
words = []
for title in post_titles:
    words += [word.lower() for word in title.split()]

# Count exact matches against the language names
for word in words:
    for key in language_counter:
        if word == key:
            language_counter[key] += 1

print(language_counter)
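The nested loops work, but the standard library's collections.Counter can do the same tally in fewer lines and makes sorting the results easy. The sketch below assumes the same post_titles list and language names as above; it is an alternative, not the guide's original approach.

# Alternative tally using collections.Counter, printed most-mentioned first.
from collections import Counter

languages = [
    "javascript", "html", "css", "sql", "python", "typescript", "java",
    "c#", "c++", "php", "c", "powershell", "go", "rust", "kotlin", "dart", "ruby",
]

# Count every lowercased word once, then look up only the language names.
word_counts = Counter(word.lower() for title in post_titles for word in title.split())
language_counter = {lang: word_counts[lang] for lang in languages}

for lang, count in sorted(language_counter.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang}: {count}")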

Using Proxies for Web Scraping

Frequent scraping can get you blocked. Use a proxy server to hide your IP and distribute requests.

Example with IPRoyal Residential Proxies:

PROXIES = {
    "http": "http://yourusername:yourpassword@geo.iproyal.com:22323",
    "https": "http://yourusername:yourpassword@geo.iproyal.com:22323"
}

page = requests.get(
    next_page,
    headers={'User-agent': 'Just learning Python, sorry!'},
    proxies=PROXIES
)

This routes requests through the proxy servers, which helps avoid rate limiting and bans.
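If you make many requests, it can be cleaner to attach the proxy settings and User-agent to a requests.Session once instead of passing them on every call. This is a minimal sketch using the same placeholder credentials as above, not the only way to wire it up.

# Reuse one Session so headers and proxies apply to every request it makes.
import requests

session = requests.Session()
session.headers.update({'User-agent': 'Just learning Python, sorry!'})
session.proxies.update({
    "http": "http://yourusername:yourpassword@geo.iproyal.com:22323",
    "https": "http://yourusername:yourpassword@geo.iproyal.com:22323",
})

page = session.get("https://old.reddit.com/r/programming/")
print(page.status_code)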


Summary

You’ve learned how to:

  • Fetch and parse HTML with Requests + BeautifulSoup
  • Scrape multiple pages of Reddit
  • Count programming language mentions
  • Add proxy rotation for safer scraping

For more advanced scraping, explore frameworks like Scrapy or Playwright.
