Yakabuff/redarc

Reddit archiver

A self-hosted solution to search, view and archive link aggregators.

Supports:

  • Reddit
  • HackerNews (in progress)

Features:

  • Ingest pushshift dumps
  • View threads/comments
  • Full-text search via Postgres FTS
  • Submit threads to be archived via the API
  • Periodically fetch rising, new and hot threads from specified subreddits
  • Download i.redd.it images from threads

Please abide by the Reddit Terms of Service and User Agreement if you are using their API.

Download pushshift dumps

https://the-eye.eu/redarcs/

All data 2005-06 to 2022-12:

magnet:?xt=urn:btih:7c0645c94321311bb05bd879ddee4d0eba08aaee&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

Top 20,000 subreddits:

magnet:?xt=urn:btih:c398a571976c78d346c325bd75c47b82edf6124e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

Installation:

The master branch is unstable. Please check out a release.

Docker

Install Docker: https://docs.docker.com/engine/install

Services:

  • postgres: Main database for threads, comments and subreddits
  • postgres_fts: Database for full-text searching
  • redarc: API backend and React frontend
    • Requires: redis, reddit_worker if INGEST_ENABLED
  • redis: Required for any service that uses a task queue
  • image_downloader: Asynchronously downloads images from Reddit if DOWNLOAD_IMAGES
    • Requires: redis, reddit_worker
  • index_worker: Indexes threads/comments into postgres_fts
    • Requires: postgres_fts and postgres
  • reddit_worker: Asynchronously fetches threads/comments from Reddit
    • Requires: redis, image_downloader
  • subreddit_worker: Asynchronously fetches hot/new/rising thread IDs from subreddits
    • Requires: reddit_worker and redis
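The Requires entries above form a dependency graph. Because reddit_worker and image_downloader require each other, there is no strict start order. The following Python sketch walks that graph to list every service a given service needs running, using only the dependencies stated in the list above. It is not part of redarc, and the names `SERVICE_DEPS`, `direct_deps` and `required_services` are hypothetical.

```python
# Dependencies copied from the service list; redarc's dependency on redis and
# reddit_worker only applies when INGEST_ENABLED is set, so it is handled
# separately in direct_deps().
SERVICE_DEPS: dict[str, set[str]] = {
    "postgres": set(),
    "postgres_fts": set(),
    "redis": set(),
    "image_downloader": {"redis", "reddit_worker"},
    "index_worker": {"postgres_fts", "postgres"},
    "reddit_worker": {"redis", "image_downloader"},
    "subreddit_worker": {"reddit_worker", "redis"},
}


def direct_deps(service: str, ingest_enabled: bool) -> set[str]:
    """Return the services listed under 'Requires' for one service."""
    if service == "redarc":
        return {"redis", "reddit_worker"} if ingest_enabled else set()
    return SERVICE_DEPS[service]


def required_services(service: str, ingest_enabled: bool = True) -> set[str]:
    """Collect every service transitively required by `service`.

    A plain graph walk with a 'seen' set is used instead of a topological
    sort, because reddit_worker and image_downloader depend on each other.
    """
    seen: set[str] = set()
    stack = list(direct_deps(service, ingest_enabled))
    while stack:
        dep = stack.pop()
        if dep in seen or dep == service:
            continue  # already visited, or a cycle leading back to the start
        seen.add(dep)
        stack.extend(direct_deps(dep, ingest_enabled))
    return seen


if __name__ == "__main__":
    for name in ("redarc", "subreddit_worker", "index_worker"):
        print(name, "->", sorted(required_services(name)))
```

For example, starting subreddit_worker implies running reddit_worker, redis and image_downloader, while index_worker only needs the two Postgres databases.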

If you wish to change the Postgres password, make sure POSTGRES_PASSWORD and PGPASSWORD are the same.

If you are running redarc on your personal machine, set the Docker envars REDARC_API=http://localhost/api and SERVER_NAME=localhost.

REDARC_API is the URL of your API server; it must end with /api, e.g. http://redarc.mysite.org/api.

REDARC_FE_API is the URL of the API server you want the frontend to send requests to.
If you are not using a reverse proxy, it should be the same as REDARC_API.

SERVER_NAME is the URL your redarc instance is running on, e.g. redarc.mysite.org.

Setting an INGEST_PASSWORD and ADMIN_PASSWORD in your API is highly recommended to prevent abuse.

IMAGE_PATH is the path where you want the image_downloader worker to download images. The API backend fetches images from the same path.

INDEX_DELAY is how often you want index_worker to index comments/threads.

SUBREDDITS is a comma-delimited list of subreddits you want subreddit_worker to fetch threads from.

FETCH_DELAY is how often you want subreddit_worker to fetch threads.

NUM_THREADS is the number of threads you want downloaded from hot, rising or new.
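The rules above can be checked before starting the containers. Below is a minimal Python sketch that tests an env mapping for matching Postgres passwords, a REDARC_API ending in /api, the recommended INGEST_PASSWORD/ADMIN_PASSWORD, and splits the comma-delimited SUBREDDITS value. The helper names `check_env` and `parse_subreddits` are hypothetical, and redarc itself may parse these variables differently (for example, in how it handles spaces in SUBREDDITS).

```python
import os


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a redarc-style env mapping."""
    problems = []
    if env.get("POSTGRES_PASSWORD") != env.get("PGPASSWORD"):
        problems.append("POSTGRES_PASSWORD and PGPASSWORD must be the same")
    if not env.get("REDARC_API", "").endswith("/api"):
        problems.append("REDARC_API must end with /api")
    for key in ("INGEST_PASSWORD", "ADMIN_PASSWORD"):
        if not env.get(key):
            problems.append(f"{key} is not set (recommended to prevent abuse)")
    return problems


def parse_subreddits(value: str) -> list[str]:
    """Split the comma-delimited SUBREDDITS value, dropping blanks and spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


if __name__ == "__main__":
    # Checks the current process environment, e.g. after exporting the
    # variables from your .env file into the shell.
    for problem in check_env(dict(os.environ)):
        print("warning:", problem)
    print("subreddits:", parse_subreddits(os.environ.get("SUBREDDITS", "")))
```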

Docker compose (Recommended):

Modify envars as needed.

$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc
$ git fetch --all --tags
$ git checkout tags/vx.y.z -b vx.y.z
$ cp default.env .env
# Modify .env as needed
$ docker compose up -d

Manual installation:

$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc

1) Provision Postgres database

$ docker pull postgres
$ docker run \
  --name pgsql-dev \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgres-docker:/var/lib/postgresql/data \
  -p 5432:5432 postgres
$ docker run \
  --name pgsql-fts \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgresfts-docker:/var/lib/postgresql/data \
  -p 5433:5432 postgres

psql -h localhost -U postgres -a -f scripts/db_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_comments.sql
psql -h localhost -U postgres -a -f scripts/db_subreddits.sql
psql -h localhost -U postgres -a -f scripts/db_submissions_index.sql
psql -h localhost -U postgres -a -f scripts/db_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions_index.sql
psql -h localhost -U postgres -p 5433 -a -f scripts/db_fts.sql
psql -h localhost -U postgres -a -f scripts/db_progress.sql
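The schema scripts above can also be applied from Python. This is a sketch, assuming `psql` is on your PATH and the password is supplied through the `PGPASSWORD` environment variable; `psql_commands` and `apply_schemas` are hypothetical helpers. It adds `-v ON_ERROR_STOP=1`, a standard psql option not used in the commands above, so that a failing SQL statement makes psql exit with a non-zero status instead of carrying on.

```python
import subprocess

# Schema scripts from the commands above, grouped by target database:
# everything except db_fts.sql goes to the main database on port 5432.
MAIN_SCRIPTS = [
    "scripts/db_submissions.sql",
    "scripts/db_comments.sql",
    "scripts/db_subreddits.sql",
    "scripts/db_submissions_index.sql",
    "scripts/db_comments_index.sql",
    "scripts/db_status_comments.sql",
    "scripts/db_status_comments_index.sql",
    "scripts/db_status_submissions.sql",
    "scripts/db_status_submissions_index.sql",
    "scripts/db_progress.sql",
]
FTS_SCRIPTS = ["scripts/db_fts.sql"]  # full-text database on port 5433


def psql_commands(host: str = "localhost", user: str = "postgres",
                  main_port: int = 5432, fts_port: int = 5433) -> list[list[str]]:
    """Build one psql argv per schema script (main DB first, then FTS DB)."""
    commands = []
    for port, scripts in ((main_port, MAIN_SCRIPTS), (fts_port, FTS_SCRIPTS)):
        for script in scripts:
            commands.append([
                "psql", "-h", host, "-U", user, "-p", str(port),
                "-v", "ON_ERROR_STOP=1", "-a", "-f", script,
            ])
    return commands


def apply_schemas() -> None:
    """Run every psql command, raising on the first non-zero exit status."""
    for argv in psql_commands():
        subprocess.run(argv, check=True)
```

Call `apply_schemas()` from the repository root once both Postgres containers are up.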

2) Process the dumps and insert rows into the Postgres databases with the load_sub/load_comments scripts

Note: Be sure the ingest and Reddit workers are disabled.

python3 scripts/load_sub.py <path_to_submission_file>
python3 scripts/load_comments.py <path_to_comment_file>
python3 scripts/load_sub_fts.py <path_to_submission_file>
python3 scripts/load_comments_fts.py <path_to_comment_file>
python3 scripts/index.py [subreddit_name]
python3 scripts/unlist.py <subreddit> <true|false>
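Decompressed pushshift dumps are newline-delimited JSON, with one submission or comment object per line. The sketch below streams such a file without loading it into memory and keeps only selected fields. It only illustrates the file format and is not how load_sub.py or load_comments.py are implemented; `iter_dump` is a hypothetical name.

```python
import json
from typing import Iterator


def iter_dump(path: str, fields: tuple[str, ...]) -> Iterator[dict]:
    """Yield the requested fields of each JSON record in a decompressed dump.

    Blank lines, unparseable lines and non-object values are skipped rather
    than aborting a long-running load; missing fields come back as None.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(record, dict):
                continue
            yield {key: record.get(key) for key in fields}
```

For example, `iter_dump("<path_to_submission_file>", ("id", "subreddit", "created_utc", "title"))` yields one small dict per submission, using Reddit's standard submission field names.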

3) Start the API server.

$ cd api
$ python -m venv venv
$ source venv/bin/activate
$ pip install gunicorn
$ pip install falcon
$ pip install rq
$ pip install python-dotenv
$ pip install psycopg2-binary
$ gunicorn app

4) Start the frontend

cd ../redarc-frontend
mv sample.env .env

Set the address of the API server in the .env file:

VITE_API_DOMAIN=http://my-api-server.com/api/
npm i
npm run dev # Dev server

5) Provision NGINX (Optional)

Edit nginx/nginx_original.conf with your own values

$ cd ..
$ mv nginx/redarc_original.conf /etc/nginx/conf.d/redarc.conf
cd redarc-frontend
npm run build
cp -R dist/* /var/www/html/redarc/
systemctl restart nginx

6) Set up submission workers

Fill in .env files with your own credentials.

$ docker pull redis
$ docker run --name some-redis -d redis
$ cd redarc/ingest
$ python -m venv venv
$ source venv/bin/activate
$ pip install rq
$ pip install python-dotenv
$ pip install praw
$ pip install psycopg2-binary
$ pip install gallery-dl
$ python3 ingest/reddit_worker/reddit_worker.py
$ python3 ingest/index_worker/index_worker.py
$ python3 ingest/subreddit_worker/subreddit_worker.py
$ python3 ingest/image_downloader/image_downloader.py

Ingest data:

Postgres:

Note: Be sure the ingest and Reddit workers are disabled.

Ensure python3, pip and psycopg2-binary are installed:

# Decompress dumps
$ unzstd <submission_file>.zst
$ unzstd <comment_file>.zst
$ pip install psycopg2-binary
# Change database credentials if needed
$ python3 scripts/load_sub.py <path_to_submission_file>
$ python3 scripts/load_sub_fts.py <path_to_submission_file>
$ python3 scripts/load_comments.py <path_to_comment_file>
$ python3 scripts/load_comments_fts.py <path_to_comment_file>
$ python3 scripts/index.py [subreddit_name]
# Optional
$ python3 scripts/unlist.py <subreddit> <true|false>
$ python3 scripts/backfill_images.py <subreddit> <after timestamp utc> <num urls>

Web:

  • Submit a Reddit URL using the web form at /submit to have it fetched by reddit_worker
  • Add subreddits to the SUBREDDITS envar (delimited by commas) to have them periodically fetched by subreddit_worker
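Before submitting a URL, it can help to know which thread it refers to. This sketch extracts the base-36 thread ID from two common Reddit URL shapes, `reddit.com/r/<sub>/comments/<id>/...` and `redd.it/<id>`. The URL patterns are an assumption for illustration, and `thread_id_from_url` is hypothetical: it is not the parser used by reddit_worker or the /submit form.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed path shape of a Reddit thread URL: /r/<subreddit>/comments/<id>/<slug>/
_THREAD_PATH = re.compile(r"^/r/[^/]+/comments/([a-z0-9]+)(?:/|$)", re.IGNORECASE)
_BARE_ID = re.compile(r"[a-z0-9]+", re.IGNORECASE)


def thread_id_from_url(url: str) -> Optional[str]:
    """Return the base-36 thread ID in a Reddit URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "redd.it":
        # Short links carry the ID as the whole path: https://redd.it/<id>
        candidate = parsed.path.strip("/")
        return candidate.lower() if _BARE_ID.fullmatch(candidate) else None
    if host == "reddit.com" or host.endswith(".reddit.com"):
        match = _THREAD_PATH.match(parsed.path)
        if match:
            return match.group(1).lower()
    return None
```

Matching on the parsed hostname, rather than searching the raw string, keeps look-alike URLs on other domains from being accepted.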

API:

search/comments?

  • [unflatten = <True|False>]
  • [subreddit = <name>]
  • [id = <id>]
  • [before = <utc_timestamp>]
  • [after = <utc_timestamp>]
  • [parent_id = <parent_id>]
  • [link_id = <link_id>]
  • [sort = <ASC|DESC>]
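The `unflatten` flag suggests that search/comments can return comments nested under their parents instead of as a flat list. Assuming comments carry Reddit's `parent_id` fullnames (`t1_<id>` when the parent is a comment, `t3_<id>` when it is the submission), a flat result can be nested client-side as sketched below. `unflatten` here is a hypothetical helper, not redarc's server-side code.

```python
from typing import Any


def unflatten(comments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest a flat comment list into a tree using Reddit fullname parent_ids.

    Each returned comment is a copy with an added "replies" list. Comments
    whose parent is the submission (t3_...) or whose parent comment is not in
    the list become top-level roots. Input order is preserved at every level.
    """
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for c in comments:
        node = nodes[c["id"]]
        kind, _, parent_id = (c.get("parent_id") or "").partition("_")
        if kind == "t1" and parent_id in nodes:
            nodes[parent_id]["replies"].append(node)
        else:
            roots.append(node)
    return roots
```

Orphaned replies are promoted to roots rather than dropped, which suits archives where parent comments may be missing from a dump.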

search/submissions?

  • [subreddit = <name>]
  • [id = <id>]
  • [before = <utc_timestamp>]
  • [after = <utc_timestamp>]
  • [sort = <ASC|DESC>]

search/subreddits

search?

  • <subreddit = <subreddit>>
  • [before = <unix timestamp>]
  • [after = <unix timestamp>]
  • [sort = <asc|desc>]
  • [query = <search phrase>]
  • <type = <comment|submission>>
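A sketch of building request URLs for the endpoints above from Python, assuming the API is reachable at `http://localhost/api` (the REDARC_API value suggested for personal machines) and that responses are JSON. `search_url`, `to_unix` and `fetch_json` are hypothetical helpers. Parameters passed as `None` are omitted, so optional filters stay optional.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL: the REDARC_API value suggested for a personal machine.
API_BASE = "http://localhost/api"


def to_unix(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a unix timestamp for before/after."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(dt.timestamp())


def search_url(endpoint: str, base: str = API_BASE, **params) -> str:
    """Build an endpoint URL, dropping parameters whose value is None."""
    query = {key: value for key, value in params.items() if value is not None}
    url = f"{base.rstrip('/')}/{endpoint.lstrip('/')}"
    return f"{url}?{urlencode(query)}" if query else url


def fetch_json(url: str):
    """GET a URL and decode the body as JSON (the response shape is assumed)."""
    with urlopen(url) as resp:
        return json.load(resp)
```

For example, `fetch_json(search_url("search", subreddit="python", type="submission", query="asyncio", after=to_unix(datetime(2022, 1, 1, tzinfo=timezone.utc))))` runs a full-text search restricted to submissions created after 1 January 2022 UTC.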

License:

Redarc is licensed under the MIT License.

