This project presents a straightforward approach to collecting, processing, and storing data from YouTube channels: it uses the YouTube Data API to fetch the data and stores it in an SQL (MySQL) database, all implemented in a Jupyter Notebook.
RockManRK/YouTubeDataCollector
This repository contains a Jupyter Notebook (youtube_data_download.ipynb) that demonstrates how to download data from YouTube using the YouTube Data API.
The notebook performs the following tasks:
- Retrieves channel information (name, ID, handle) for specified YouTube channels.
- Collects video data from these channels, including views, likes, comments, and other statistics.
- Processes and formats the data for SQL compatibility.
- Stores the collected data in a MySQL database.
The primary goal of this project is to serve as a portfolio piece and a learning exercise in working with APIs, data processing, and database management.
Built with:
- Python 3.10.9
- MySQL 8.0.37
- YouTube Data API v3
Before you begin, ensure you have met the following requirements:
- You have installed Python 3.x.
- You have installed Jupyter Notebook.
- You have a Google account to access the YouTube Data API.
For those who want to get up and running quickly:
- Clone the repository and enter the project directory:
      git clone https://github.com/RockManRK/YouTubeDataCollector
      cd YouTubeDataCollector
- Install dependencies:
      pip install -r requirements.txt
- Set up your MySQL database:
  - Open the schema.sql file and replace {DATABASE_NAME} with your desired database name.
  - Run the SQL script in your MySQL environment:
        mysql -u your_username -p < schema.sql
- Copy .env.example to .env and update it with your database credentials and YouTube API key.
- Run the Jupyter notebook:
      jupyter notebook youtube_data_download.ipynb
- Execute all cells in the notebook.
For more detailed instructions, see the full installation and usage sections below.
Project structure:

    YouTubeDataCollector/
    │
    ├── .env.example                 # Example environment variable file
    ├── .gitignore                   # Git ignore rules
    ├── LICENSE                      # License file (MIT)
    ├── requirements.txt             # Python dependencies
    ├── schema.sql                   # SQL script to set up the initial database
    ├── youtube_data_download.ipynb  # Main Jupyter notebook with the code
    └── README.md                    # This file

Installation:
- Clone the repository:
      git clone https://github.com/RockManRK/YouTubeDataCollector
- Navigate to the project directory:
      cd YouTubeDataCollector
- Install the required Python packages:
      pip install -r requirements.txt
In the project directory, you'll find a file named .env.example that contains template environment variables. Create a copy of this file named .env. You can do this in several ways:
- On Unix-like systems (Linux, macOS), use the terminal:
      cp .env.example .env
- On Windows, use the command prompt:
      copy .env.example .env
- Alternatively, create a new file named .env and copy the contents of .env.example into it using any text editor.

Open the newly created .env file in a text editor and update its values with your specific details:
- Replace YOUR_YOUTUBE_API_KEY with the API key you obtained from the Google Developers Console.
- Update the database connection details (DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) with your MySQL database information.

Save the .env file.
This configuration file will be used by the script to access your YouTube API key and connect to your database.
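A minimal sketch of how these settings could be read and checked before anything else runs, assuming the python-dotenv package is used to load `.env` (this README does not confirm that). `YOUTUBE_API_KEY` is a hypothetical variable name standing in for whatever key name `.env.example` actually uses; `DB_HOST`, `DB_USER`, `DB_PASSWORD` and `DB_NAME` come from the steps above. `Settings` and `load_settings` are illustrative names, not part of the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

try:
    # Assumption: python-dotenv is installed (pip install python-dotenv).
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# YOUTUBE_API_KEY is a hypothetical name; check .env.example for the real one.
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME")


@dataclass(frozen=True)
class Settings:
    api_key: str
    db_host: str
    db_user: str
    db_password: str
    db_name: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read API and database settings, failing early if any are missing."""
    if env is None:
        if load_dotenv is not None:
            # Loads variables from a .env file into os.environ (existing ones are kept).
            load_dotenv()
        env = os.environ
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return Settings(
        api_key=env["YOUTUBE_API_KEY"],
        db_host=env["DB_HOST"],
        db_user=env["DB_USER"],
        db_password=env["DB_PASSWORD"],
        db_name=env["DB_NAME"],
    )
```

Failing early on a missing value gives a clearer error than a rejected API request or a refused database connection halfway through the notebook.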
If you haven't obtained a YouTube API key yet, follow these steps:
- Go to the Google Developers Console.
- Create a new project or select an existing one.
- Enable the YouTube Data API v3 for your project.
- Create credentials (API Key) for the YouTube Data API.
- Copy the API key and add it to your .env file as described above.
For detailed instructions, refer to the YouTube Data API documentation.
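To confirm the key works before running the whole notebook, you can send a single `channels.list` request. The sketch below only builds the request URL with the standard library; the endpoint and the `part`, `forHandle` and `key` parameters come from the public YouTube Data API v3 reference, but whether the notebook looks channels up this way is an assumption, and `build_channel_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"


def build_channel_url(handle: str, api_key: str) -> str:
    """Build a channels.list URL that looks a channel up by its @handle."""
    params = {
        "part": "snippet,contentDetails,statistics",
        # forHandle accepts the handle with or without a leading "@".
        "forHandle": handle if handle.startswith("@") else "@" + handle,
        "key": api_key,
    }
    # urlencode percent-escapes "," and "@" so the URL is safe to request as-is.
    return f"{API_BASE}/channels?{urlencode(params)}"
```

Opening the resulting URL in a browser or with curl should return JSON with an `items` list; a 400 or 403 error usually means the key is invalid or the YouTube Data API v3 is not enabled for the project.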
This project requires a MySQL database. Follow these steps to set up the required database structure:
- Open the schema.sql file in a text editor.
- Replace all occurrences of {DATABASE_NAME} with your desired database name.
- Run the SQL script in your MySQL environment. You can do this in several ways:
  a. Via the command line:
         mysql -u your_username -p < schema.sql
     Replace your_username with your MySQL username. You'll be prompted to enter your password.
  b. Or, if you prefer to enter your password directly in the command:
         mysql -u your_username -pyour_password < schema.sql
     Replace your_username and your_password with your MySQL credentials. Note that there is no space between -p and your password. Passing the password this way can expose it in your shell history, so option (a) is safer.
  c. Alternatively, use a MySQL client such as MySQL Workbench:
     - Open MySQL Workbench and connect to your server.
     - Open the schema.sql file.
     - Execute the script.
This will create the necessary database and tables for the YouTubeDataCollector to function.
Note: Ensure that your MySQL server is running before executing these commands. If you encounter any permission issues, you may need to use sudo (on Unix-like systems) or run your command prompt as an administrator (on Windows).
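If you'd rather not edit schema.sql by hand, a small script can write a copy with the placeholder filled in. This is a sketch, not part of the repository: `render_schema`, `render_schema_file` and the output name `schema_rendered.sql` are hypothetical, and the name check (letters, digits and underscores, at most 64 characters, which is MySQL's identifier length limit) is a deliberately conservative rule rather than MySQL's full identifier grammar.

```python
import re
from pathlib import Path

PLACEHOLDER = "{DATABASE_NAME}"
# Conservative rule: letters, digits and underscores only, 1 to 64 characters.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]{1,64}")


def render_schema(sql_text: str, database_name: str) -> str:
    """Return the schema SQL with every {DATABASE_NAME} placeholder replaced."""
    if not SAFE_NAME.fullmatch(database_name):
        raise ValueError(f"Unsafe database name: {database_name!r}")
    if PLACEHOLDER not in sql_text:
        raise ValueError(f"No {PLACEHOLDER} placeholder found")
    return sql_text.replace(PLACEHOLDER, database_name)


def render_schema_file(src: Path, dst: Path, database_name: str) -> None:
    """Write a filled-in copy of src to dst, leaving the template untouched."""
    rendered = render_schema(src.read_text(encoding="utf-8"), database_name)
    dst.write_text(rendered, encoding="utf-8")
```

For example, `render_schema_file(Path("schema.sql"), Path("schema_rendered.sql"), "youtube_data")` followed by `mysql -u your_username -p < schema_rendered.sql` keeps the original template reusable for other database names.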
- Open the Jupyter Notebook:
jupyter notebook youtube_data_download.ipynb
- Run the cells in order to collect data from the specified YouTube channels and store it in your MySQL database.
When you run the notebook, you can expect the following:
- The script will start by executing a function to fetch details of one or more YouTube channels. You'll need to specify the names of the channels you want to collect data from. This function will retrieve the ID, Name, and Handle for each channel.
- Using the retrieved channel information, the script will then authenticate with the YouTube API using your provided key.
- For each channel, it will collect data on recent videos (views, likes, comments, etc.).
- The collected data will be processed and formatted for SQL compatibility.
- Finally, the data will be stored in your configured MySQL database.
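The exact processing lives in the notebook, but the sketch below shows the kind of work that formatting this API's output for SQL typically involves: `statistics` counts arrive as strings (and `likeCount` or `commentCount` can be missing when likes are hidden or comments disabled), `contentDetails.duration` is an ISO 8601 duration such as `PT1H2M3S`, and `snippet.publishedAt` is an ISO 8601 UTC timestamp that is easiest to store as a plain `YYYY-MM-DD HH:MM:SS` value. The table name `videos`, its columns and all helper names are hypothetical; the `%s` placeholders match the parameter style of MySQL Connector/Python.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Tuple

# Matches durations such as "PT1H2M3S", "PT45S" or "P1DT2H" (days appear on very long streams).
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def duration_to_seconds(value: str) -> int:
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    parts = {name: int(num or 0) for name, num in match.groupdict().items()}
    return parts["days"] * 86400 + parts["hours"] * 3600 + parts["minutes"] * 60 + parts["seconds"]


def to_mysql_datetime(value: str) -> str:
    # "2024-05-01T12:30:00Z" -> "2024-05-01 12:30:00" (kept in UTC)
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def video_to_row(video: Dict[str, Any]) -> Tuple[Any, ...]:
    """Flatten one videos.list item into a tuple matching INSERT_VIDEO."""
    snippet = video["snippet"]
    stats = video.get("statistics", {})
    return (
        video["id"],
        snippet["channelId"],
        snippet["title"],
        to_mysql_datetime(snippet["publishedAt"]),
        duration_to_seconds(video["contentDetails"]["duration"]),
        int(stats.get("viewCount", 0)),
        int(stats["likeCount"]) if "likeCount" in stats else None,  # hidden likes -> NULL
        int(stats["commentCount"]) if "commentCount" in stats else None,  # disabled comments -> NULL
    )


INSERT_VIDEO = (
    "INSERT INTO videos (video_id, channel_id, title, published_at, duration_seconds, "
    "view_count, like_count, comment_count) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)


def store_videos(cursor, videos: Iterable[Dict[str, Any]]) -> int:
    """Insert videos with parameterized queries; the caller commits the connection."""
    rows: List[Tuple[Any, ...]] = [video_to_row(v) for v in videos]
    if rows:
        cursor.executemany(INSERT_VIDEO, rows)
    return len(rows)
```

Passing values as parameters rather than formatting them into the SQL string lets the connector handle quoting (video titles often contain apostrophes) and avoids SQL injection. If video_id is a primary key, re-running the notebook against the same table would need an upsert such as ON DUPLICATE KEY UPDATE.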
Note: The process may take some time depending on the number of channels and videos being processed.
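One driver of run time is the number of API round trips. The `videos.list` method accepts several comma-separated IDs in its `id` parameter (up to 50 per call), so fetching statistics in batches rather than one request per video cuts both the request count and quota use considerably. Whether the notebook already batches this way is not stated here; `chunked` and `batch_id_params` are hypothetical helpers.

```python
from typing import Iterator, List, Sequence

MAX_IDS_PER_REQUEST = 50  # videos.list accepts at most 50 IDs per call


def chunked(items: Sequence[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def batch_id_params(video_ids: Sequence[str]) -> List[dict]:
    """Build one set of videos.list query parameters per batch of IDs."""
    return [
        {"part": "snippet,contentDetails,statistics", "id": ",".join(batch)}
        for batch in chunked(video_ids)
    ]
```

For 120 video IDs this yields three parameter sets (50, 50 and 20 IDs), so three requests instead of 120; each set still needs your API key added before it is sent.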
The collected data is visualized in a Power BI dashboard, which is not part of this repository. You can view it here: YouTube Channel Analytics Dashboard
This project is licensed under the MIT License; see the LICENSE file for details.
Davi Prata
- GitHub: RockManRK
- Email: rockmanrk@hotmail.com
If you have any questions, please open an issue or contact me.
- requirements.txt: lists the Python packages required to run the project. Install them using the command provided in the Installation section.
- .env.example: a template for your environment variables. Copy this file to .env and update it with your specific configuration details.
- YouTube Data API documentation
- MySQL Connector/Python documentation