A web scraping project extracting book data from 'Books to Scrape', followed by exploratory data analysis (EDA) and visualizations.

Run-d1/books-web-scraping-and-analysis

This repository contains a Python project that demonstrates web scraping and data analysis. It extracts book data from the Books to Scrape website, then applies exploratory data analysis (EDA) and visualizations to draw insights from the collected data.

Project Structure

The repository includes the following Jupyter Notebooks:

  1. books-website-scraping.ipynb

    • Extracts book data such as titles, ratings, prices, and availability from the website.
    • Saves the scraped data into a CSV file for further analysis.
  2. books-data-analysis.ipynb

    • Loads the scraped data from the CSV file.
    • Cleans and preprocesses the dataset (e.g., converting ratings to numerical values).
    • Performs EDA and visualizations to analyze pricing, ratings, and other trends.
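The rating conversion mentioned in the cleaning step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the names `RATING_WORDS` and `rating_to_int` are hypothetical, and it assumes ratings are stored as the words "One" through "Five".

```python
# Assumed mapping: ratings are stored as the words "One" through "Five".
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def rating_to_int(word):
    """Return the numeric rating for a word such as "Three", or None if unknown."""
    # Normalise whitespace and capitalisation before the lookup.
    return RATING_WORDS.get(word.strip().title())


# Illustrative inputs only, not values taken from the dataset.
ratings = ["Three", "one", " Five ", "n/a"]
numeric = [rating_to_int(r) for r in ratings]
print(numeric)
```

With pandas, the same dictionary could be passed to `Series.map` (for example `df["Rating"].map(RATING_WORDS)`), which leaves unmapped values as NaN.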

Features

  • Web Scraping:

    • Extract book details including:
      • Book ID (UPC)
      • Title
      • Category
      • Rating
      • Price
      • Stock availability (Stock status)
      • Quantity available
  • Exploratory Data Analysis (EDA):

    • Visualizes key metrics such as price distributions and rating trends.
    • Identifies relationships between features like price and rating.
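As a rough sketch of how raw scraped strings might be turned into the stock status, quantity, and price fields listed above: the helper names below are hypothetical, and the input formats ("In stock (22 available)" and "£51.77") are assumptions about how the site presents these values, not something confirmed by this README.

```python
import re


def parse_availability(text):
    """Split text such as "In stock (22 available)" into (status, quantity).

    The "(N available)" suffix is an assumed format; if it is missing,
    the quantity falls back to 0 and the whole text is the status.
    """
    match = re.search(r"\((\d+)\s+available\)", text)
    if match is None:
        return text.strip(), 0
    status = text[:match.start()].strip()
    return status, int(match.group(1))


def parse_price(text):
    """Convert a price string such as "£51.77" to a float in GBP."""
    # Drop the currency symbol and any other non-numeric characters.
    return float(re.sub(r"[^\d.]", "", text))


print(parse_availability("In stock (22 available)"))
print(parse_price("£51.77"))
```

Keeping the parsing in small pure functions like these makes it possible to check them without network access, separately from the `requests` and `BeautifulSoup` code that fetches and navigates the pages.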

Tools and Libraries

  • Web Scraping:

    • requests
    • BeautifulSoup
  • Data Manipulation:

    • pandas
  • Data Visualization:

    • matplotlib
    • seaborn
    • squarify

About the Dataset

The scraped dataset is available on Kaggle: Books Data on Kaggle

Dataset Columns:

  1. ID: Unique Product Code (UPC) for each book.
  2. Title: The title of the book.
  3. Category: Genre or category of the book.
  4. Price [£]: Price in GBP (£).
  5. Rating: Star rating (One to Five) based on customer reviews.
  6. Availability: Whether the book is in stock.
  7. Quantity: The number of available copies.
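To illustrate how these columns could be read into typed records, here is a minimal sketch using only the standard library. It assumes the CSV header matches the column names listed above exactly (including `Price [£]`); the `Book` class, the `load_books` function, and the sample row are hypothetical placeholders, not data from the actual dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Book:
    id: str            # UPC
    title: str
    category: str
    price_gbp: float   # from the "Price [£]" column
    rating: str        # "One" to "Five"
    availability: str
    quantity: int


def load_books(handle):
    """Read rows from an open CSV file object into Book records."""
    reader = csv.DictReader(handle)
    return [
        Book(
            id=row["ID"],
            title=row["Title"],
            category=row["Category"],
            price_gbp=float(row["Price [£]"]),
            rating=row["Rating"],
            availability=row["Availability"],
            quantity=int(row["Quantity"]),
        )
        for row in reader
    ]


# Made-up placeholder row, used only to show the expected shape.
SAMPLE = (
    "ID,Title,Category,Price [£],Rating,Availability,Quantity\n"
    "abc123,Example Title,Example Category,12.50,Three,In stock,4\n"
)
books = load_books(io.StringIO(SAMPLE))
print(books[0])
```

For a real file, `load_books(open(path, newline="", encoding="utf-8"))` would be the equivalent call, where `path` points at the downloaded CSV. In the analysis notebook, `pandas.read_csv` is the more likely choice given the listed tools.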
