Python module for batch-downloading media from Reddit, with support for duplicate removal.


NullSingularity3/RedditImageDownloader


RedditImageDownloader is a lightweight Python tool for batch-downloading Reddit-hosted images, tailored for AI/ML workflows like training generative models.

Features

  • Batch Download - Fetch media from Reddit posts using asyncpraw, an asynchronous Python Reddit API wrapper.
  • MD5 Deduplication - Built-in MD5 hash checking to auto-remove reposts and duplicates across multiple subreddits, ensuring dataset quality.
  • Async/Await Download Flow - High-speed, non-blocking downloads to handle large datasets efficiently.
  • Simple CLI - Powered by argparse, so datasets can be crawled from the command line without writing code.
  • Docker-Ready - Easily containerized for reproducible and environment-independent dataset crawling.
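The MD5 deduplication idea listed above can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual implementation; the function names `md5_of_file` and `remove_duplicates` are hypothetical, and the choice to keep the first file in sorted order is an assumption.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_duplicates(directory: Path) -> list[Path]:
    """Delete files whose content hash was already seen; return the removed paths."""
    seen: dict[str, Path] = {}
    removed: list[Path] = []
    # Sort so that the choice of which copy is kept is deterministic.
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        file_hash = md5_of_file(path)
        if file_hash in seen:
            path.unlink()  # same bytes as an earlier file: drop this copy
            removed.append(path)
        else:
            seen[file_hash] = path
    return removed
```

Because the hash covers the file bytes only, two byte-identical images saved under different names or from different subreddits are treated as duplicates, while re-encoded or resized copies are not.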

Usage

Preliminary

  • Clone the repository
git clone https://github.com/kaledgar/RedditImageDownloader
  • Create an authorized Reddit application, read about the Reddit API, and obtain the necessary credentials: client ID, client secret, username, password, and user agent. Store these credentials in a JSON file named credentials.json in your local clone of the repository.
{
  "username": "your reddit username",
  "password": "pw to your reddit account",
  "user_agent": "anything here",
  "client_secret": "client secret of reddit app you create",
  "client_id": "app id, see below for details"
}
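Before running the module, it can help to check that credentials.json contains every key listed above. The following is a small sketch for that purpose, not part of the project; `load_credentials` and `REQUIRED_KEYS` are hypothetical names, and the default path assumes the file sits in the current working directory.

```python
import json
from pathlib import Path

# The five keys shown in the credentials.json example above.
REQUIRED_KEYS = ("username", "password", "user_agent", "client_secret", "client_id")


def load_credentials(path: str = "credentials.json") -> dict:
    """Load the credentials file and fail early if a required key is missing or empty."""
    with Path(path).open(encoding="utf-8") as handle:
        credentials = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not credentials.get(key)]
    if missing:
        raise KeyError(f"credentials file is missing: {', '.join(missing)}")
    return credentials
```

These key names match the keyword arguments accepted by asyncpraw.Reddit, so the returned dictionary could plausibly be passed along as `asyncpraw.Reddit(**credentials)`; whether the project does exactly that is not stated here.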

Run the script

  • Customize the constants.py file if needed, adjusting default file paths or other constants according to your preferences.
  • Install the required dependencies and run the module:
# Install requirements
pip install -r requirements.txt
# Check possible arguments
python3 -m reddit_image_downloader -h
# Run module with your custom arguments
python3 -m reddit_image_downloader -rd -u example_user -d '/mnt/d/downloads'

The last command runs the module, downloads media from each listed user (here example_user), and saves each user's media in a separate directory.
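The per-user directory layout combined with an async flow can be sketched as below. This is an illustration only: `fetch_media` is a hypothetical stand-in for whatever actually retrieves media from Reddit, and `download_user` and `download_users` are assumed names that do not come from the project.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable

# Hypothetical fetcher type: given a username, return (filename, content) pairs.
FetchMedia = Callable[[str], Awaitable[list[tuple[str, bytes]]]]


async def download_user(user: str, root: Path, fetch_media: FetchMedia) -> int:
    """Save everything fetched for one user into root/<user>/ and return the file count."""
    target = root / user
    target.mkdir(parents=True, exist_ok=True)
    items = await fetch_media(user)
    for filename, content in items:
        (target / filename).write_bytes(content)
    return len(items)


async def download_users(
    users: list[str], root: Path, fetch_media: FetchMedia
) -> dict[str, int]:
    """Process all users concurrently; each user gets a separate directory."""
    counts = await asyncio.gather(
        *(download_user(user, root, fetch_media) for user in users)
    )
    return dict(zip(users, counts))
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular Reddit client, so it can be exercised with a fake coroutine that returns a few in-memory files.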

Run with Docker

To use the "Reddit Image Downloader" with Docker, follow these steps:

  • Adjust the Dockerfile to your preferences
# build docker image
docker build -t reddit-image-downloader .
# run
docker run -v /your/local/directory:/app/downloads reddit-image-downloader

Pre-commit

To use pre-commit during development, run:

python3 -m venv .venv
source .venv/bin/activate
pip install pre-commit
pre-commit install

.pre-commit-config.yaml stores the pre-commit configuration.

FAQ

What are client_id and client_secret?

In the authorized Reddit application settings:

[image: screenshot of the Reddit application settings]

No permissions error

  1. Windows - run the script in a PowerShell administrator session
  2. Linux - run the script with sudo
