Run-d1/books-web-scraping-and-analysis

A web scraping project extracting book data from 'Books to Scrape', followed by exploratory data analysis (EDA) and visualizations.

www.kaggle.com/datasets/randsj/books-dataset

Books Web Scraping and Analysis

This repository contains a Python-based project that demonstrates web scraping and data analysis. The project extracts book-related data from the Books to Scrape website, then applies exploratory data analysis (EDA) and visualizations to gain insights from the collected data.

Project Structure

The repository includes the following Jupyter Notebooks:

books-website-scraping.ipynb
- Extracts book data such as titles, ratings, prices, and availability from the website.
- Saves the scraped data into a CSV file for further analysis.
books-data-analysis.ipynb
- Loads the scraped data from the CSV file.
- Cleans and preprocesses the dataset (e.g., converting ratings to numerical values).
- Performs EDA and visualizations to analyze pricing, ratings, and other trends.
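The rating conversion mentioned in the cleaning step can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the names `RATING_WORDS`, `rating_to_int`, and `load_ratings` are hypothetical, and the sample rows are made up rather than taken from books-data.csv.

```python
import csv
import io

# Word ratings ("One" to "Five") mapped to integers.
# The dictionary name is hypothetical, not taken from the notebooks.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def rating_to_int(word):
    """Return the numeric rating for a word such as "Three", or None if unknown."""
    return RATING_WORDS.get(word.strip().title())


# Made-up sample rows for illustration only.
SAMPLE_CSV = """Title,Rating
Example Book A,Three
Example Book B,five
Example Book C,Unknown
"""


def load_ratings(text):
    """Read CSV text and return (title, numeric rating) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Title"], rating_to_int(row["Rating"])) for row in reader]


if __name__ == "__main__":
    for title, rating in load_ratings(SAMPLE_CSV):
        print(title, rating)
```

With pandas, the same mapping could likely be applied to a whole column via `df["Rating"].map(RATING_WORDS)`, which leaves unmapped values as missing.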

Features

Web Scraping:
- Extract book details including:
  - Book ID (UPC)
  - Title
  - Category
  - Rating
  - Price
  - Stock availability (Stock status)
  - Quantity available
Exploratory Data Analysis (EDA):
- Visualizes key metrics such as price distributions and rating trends.
- Identifies relationships between features like price and rating.
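One simple way to look at the relationship between price and rating is to average prices within each rating group. The sketch below assumes ratings have already been converted to integers; the function name `mean_price_by_rating` is hypothetical and the data is invented for illustration, so it says nothing about the actual dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_rating(books):
    """Group (rating, price) pairs by rating and average the prices."""
    prices = defaultdict(list)
    for rating, price in books:
        prices[rating].append(price)
    # Sort by rating so the result reads from lowest to highest rating.
    return {
        rating: round(mean(values), 2)
        for rating, values in sorted(prices.items())
    }


# Made-up (rating, price in GBP) pairs for illustration only.
SAMPLE = [(1, 10.0), (1, 20.0), (3, 30.0), (5, 12.5), (5, 17.5)]

if __name__ == "__main__":
    print(mean_price_by_rating(SAMPLE))
```

In a pandas workflow, a comparable summary would be `df.groupby("Rating")["Price [£]"].mean()`, assuming the column names listed in the dataset description.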

Tools and Libraries

Web Scraping:
- requests
- BeautifulSoup
Data Manipulation:
- pandas
Data Visualization:
- matplotlib
- seaborn
- squarify

About the Dataset

The scraped dataset is available on Kaggle: Books Data on Kaggle (www.kaggle.com/datasets/randsj/books-dataset)

Dataset Columns:

ID: Unique Product Code (UPC) for each book.
Title: The title of the book.
Category: Genre or category of the book.
Price [£]: Price in GBP (£).
Rating: Star rating (One to Five) based on customer reviews.
Availability: Whether the book is in stock.
Quantity: The number of available copies.
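As a rough sketch of how one row with these columns could be turned into a typed record: the `BookRecord` class and `record_from_row` function are hypothetical, the sample row is made up, and the assumption that Availability is stored as text such as "In stock" is not confirmed by the dataset description.

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    upc: str
    title: str
    category: str
    price_gbp: float
    rating: str
    in_stock: bool
    quantity: int


def record_from_row(row):
    """Build a BookRecord from a dict keyed by the dataset's column names."""
    return BookRecord(
        upc=row["ID"],
        title=row["Title"],
        category=row["Category"],
        price_gbp=float(row["Price [£]"]),
        rating=row["Rating"],
        # Assumption: availability is stored as text such as "In stock".
        in_stock=row["Availability"].strip().lower() == "in stock",
        quantity=int(row["Quantity"]),
    )


# Made-up row for illustration only; not taken from books-data.csv.
SAMPLE_ROW = {
    "ID": "0000000000000000",
    "Title": "Example Book",
    "Category": "Fiction",
    "Price [£]": "9.99",
    "Rating": "Four",
    "Availability": "In stock",
    "Quantity": "3",
}

if __name__ == "__main__":
    print(record_from_row(SAMPLE_ROW))
```

Rows read with `csv.DictReader` arrive as dicts of strings, so they can be passed to `record_from_row` directly.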

Languages

Jupyter Notebook: 100.0%
