Python module for batch-downloading media from Reddit, with support for duplicate removal.
NullSingularity3/RedditImageDownloader
RedditImageDownloader is a lightweight Python tool for batch-downloading Reddit-hosted images, tailored for AI/ML workflows like training generative models.
- Batch Download - Fetch media from Reddit posts using asyncpraw, an asynchronous Python Reddit API wrapper.
- MD5 Deduplication - Built-in MD5 hash checking to auto-remove reposts and duplicates across multiple subreddits, ensuring dataset quality.
- Async/Await Download Flow - High-speed, non-blocking downloads to handle large datasets efficiently.
- Simple CLI - Powered by argparse for flexible dataset crawling without writing code.
- Docker-Ready - Easily containerized for reproducible and environment-independent dataset crawling.
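The async download flow and MD5 deduplication listed above can be sketched together as follows. This is a minimal illustration, not the project's actual code: `fetch_bytes`, `download_unique`, and the in-memory `fake_store` are hypothetical names, and the network request is replaced by a stub so the example runs offline.

```python
import asyncio
import hashlib


async def fetch_bytes(url: str, fake_store: dict) -> bytes:
    """Stand-in for a real HTTP download (hypothetical; no network is used)."""
    await asyncio.sleep(0)  # yield control, as a non-blocking request would
    return fake_store[url]


async def download_unique(urls: list, fake_store: dict) -> dict:
    """Fetch all URLs concurrently and keep one payload per MD5 digest."""
    payloads = await asyncio.gather(*(fetch_bytes(u, fake_store) for u in urls))
    unique = {}
    for data in payloads:
        digest = hashlib.md5(data).hexdigest()
        # The first occurrence wins; later payloads with the same hash are skipped.
        unique.setdefault(digest, data)
    return unique


if __name__ == "__main__":
    # Two of the three "posts" carry identical bytes, i.e. a repost.
    store = {
        "https://example.com/a.jpg": b"cat picture",
        "https://example.com/b.jpg": b"dog picture",
        "https://example.com/c.jpg": b"cat picture",
    }
    result = asyncio.run(download_unique(list(store), store))
    print(len(result), "unique files out of", len(store))
```

Hashing the file contents rather than comparing URLs or file names is what lets reposts be caught across subreddits, since the same image usually reappears under a different link.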
- Clone the repository:

      git clone https://github.com/kaledgar/RedditImageDownloader
- Create an authorized Reddit application, read about the Reddit API, and obtain the necessary credentials: client ID, client secret, username, password, and user agent. Store these credentials in a JSON file named credentials.json in your local clone of the repository.

      {
        "username": "your reddit username",
        "password": "pw to your reddit account",
        "user_agent": "anything here",
        "client_secret": "client secret of reddit app you create",
        "client_id": "app id, see below for details"
      }

- Customize the constants.py file if needed, adjusting default file paths or other constants according to your preferences.
- Install the required dependencies:

      # Install requirements
      pip install -r requirements.txt
      # Check possible arguments
      python3 -m reddit_image_downloader -h
      # Run module with your custom arguments
      python3 -m reddit_image_downloader -rd -u example_user -d '/mnt/d/downloads'
The last command runs the module, downloads media from the listed users, and saves it in a separate directory for each user.
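Before running the module, it can help to check that credentials.json from the setup steps contains every expected key. The sketch below is an assumption about how such a check might look, not the project's own loader: `load_credentials` and `REQUIRED_KEYS` are hypothetical names, and the key list is taken from the JSON example in this README.

```python
import json
from pathlib import Path

# Keys shown in the credentials.json example in this README.
REQUIRED_KEYS = {"username", "password", "user_agent", "client_secret", "client_id"}


def load_credentials(path: str = "credentials.json") -> dict:
    """Read the credentials file and fail early if any key is missing."""
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise KeyError(f"credentials file is missing: {sorted(missing)}")
    return creds


if __name__ == "__main__":
    creds = load_credentials()
    # Print only the key names, never the secret values.
    print("Loaded keys:", sorted(creds))
```

These key names match the keyword arguments accepted by `asyncpraw.Reddit`, so the resulting dictionary could presumably be passed along as `asyncpraw.Reddit(**creds)`.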
To use the "Reddit Image Downloader" with Docker, follow these steps:
- Adjust the Dockerfile to your preferences

      # Build the Docker image
      docker build -t reddit-image-downloader .
      # Run
      docker run -v /your/local/directory:/app/downloads reddit-image-downloader
To use pre-commit during development, run:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install pre-commit
    pre-commit install

.pre-commit-config.yaml stores the pre-commit configuration.
In the authorized Reddit application settings:
- Windows - run the script in a PowerShell Admin session
- Linux - run the script with sudo
