Yakabuff/redarc
A self-hosted solution to search, view and archive link aggregators.
- HackerNews (in progress)
- Ingest pushshift dumps
- View threads/comments
- Fulltext search via PostgresFTS
- Submit threads to be archived via API
- Periodically fetch rising, new and hot threads from specified subreddits
- Download i.redd.it images from threads
Please abide by the Reddit Terms of Service and User Agreement if you are using their API.
https://the-eye.eu/redarcs/
All data 2005-06 to 2022-12:
magnet:?xt=urn:btih:7c0645c94321311bb05bd879ddee4d0eba08aaee&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Top 20,000 subreddits:
magnet:?xt=urn:btih:c398a571976c78d346c325bd75c47b82edf6124e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
The master branch is unstable. Please check out a release.
Install Docker: https://docs.docker.com/engine/install
Services:
- postgres: Main database for threads, comments and subreddits
- postgres_fts: Database for full-text searching
- redarc: API backend and React frontend
  - Requires: redis, reddit_worker if INGEST_ENABLED
- redis: Required for any service that uses a task queue
- image_downloader: Asynchronously downloads images from Reddit if DOWNLOAD_IMAGES
  - Requires: redis, reddit_worker
- index_worker: Indexes threads/comments into postgres_fts
  - Requires: postgres_fts and postgres
- reddit_worker: Asynchronously fetches threads/comments from Reddit
  - Requires: redis, image_downloader
- subreddit_worker: Asynchronously fetches hot/new/rising thread IDs from subreddits
  - Requires: reddit_worker and redis
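To see which services have to be running before a given one can work, the "Requires" entries above can be followed transitively. The sketch below is only an illustration: the map is transcribed from the list, it assumes INGEST_ENABLED is on so that redarc's conditional requirement applies, and the helper name `required_services` is made up for this example.

```python
# Dependency map transcribed from the service list.
# Assumption: INGEST_ENABLED is on, so redarc's conditional requirement applies.
REQUIRES = {
    "postgres": [],
    "postgres_fts": [],
    "redis": [],
    "redarc": ["redis", "reddit_worker"],
    "image_downloader": ["redis", "reddit_worker"],
    "index_worker": ["postgres_fts", "postgres"],
    "reddit_worker": ["redis", "image_downloader"],
    "subreddit_worker": ["reddit_worker", "redis"],
}


def required_services(service: str) -> set[str]:
    """Return every service reachable from `service` through REQUIRES."""
    seen: set[str] = set()
    stack = list(REQUIRES[service])
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(REQUIRES[current])
    # reddit_worker and image_downloader list each other, so drop the start node.
    seen.discard(service)
    return seen


if __name__ == "__main__":
    for name in sorted(REQUIRES):
        print(f"{name}: {sorted(required_services(name))}")
```

Because reddit_worker and image_downloader require each other, a plain topological sort would reject this graph; a visited-set traversal handles the cycle.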
If you wish to change the postgres password, make sure POSTGRES_PASSWORD and PGPASSWORD are the same.
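A quick way to catch a mismatch before starting the stack is to compare the two values in your .env file. This is a minimal sketch, assuming the file uses plain KEY=VALUE lines; `parse_env` and `passwords_match` are hypothetical helpers, not part of redarc.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def passwords_match(env: dict[str, str]) -> bool:
    """True only when both variables are present and identical."""
    password = env.get("POSTGRES_PASSWORD")
    return password is not None and password == env.get("PGPASSWORD")


if __name__ == "__main__":
    sample = "POSTGRES_PASSWORD=test1234\nPGPASSWORD=test1234\n"
    print(passwords_match(parse_env(sample)))
```

To check a real file, pass the contents of .env to `parse_env` instead of the sample string.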
If you are using redarc on your personal machine, set docker envars REDARC_API=http://localhost/api and SERVER_NAME=localhost.
REDARC_API is the URL of your API server; it must end with /api, e.g. http://redarc.mysite.org/api.
REDARC_FE_API is the URL of the API server you want the frontend to send requests to. If you are not using a reverse proxy, it should be the same as REDARC_API.
SERVER_NAME is the URL your redarc instance is running on, e.g. redarc.mysite.org
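The rules for REDARC_API and REDARC_FE_API can be checked mechanically before deploying. The function below is a sketch under the rules stated above; `check_api_settings` and its `behind_proxy` flag are illustrative names, not redarc code.

```python
from urllib.parse import urlsplit


def check_api_settings(redarc_api: str, redarc_fe_api: str,
                       behind_proxy: bool = False) -> list[str]:
    """Return a list of problems found in the two API settings."""
    problems = []
    parts = urlsplit(redarc_api)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("REDARC_API should be an absolute http(s) URL")
    if not redarc_api.endswith("/api"):
        problems.append("REDARC_API must end with /api")
    if not behind_proxy and redarc_fe_api != redarc_api:
        problems.append("REDARC_FE_API should equal REDARC_API without a reverse proxy")
    return problems


if __name__ == "__main__":
    # Values suggested for a personal machine.
    print(check_api_settings("http://localhost/api", "http://localhost/api"))
    # A trailing slash violates the "must end with /api" rule.
    print(check_api_settings("http://localhost/api/", "http://localhost/api/"))
```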
Setting an INGEST_PASSWORD and ADMIN_PASSWORD in your API is highly recommended to prevent abuse.
IMAGE_PATH is the path you want the image_downloader worker to download images to. This is the same path the API backend fetches images from.
INDEX_DELAY is how often you want index_worker to index comments/threads.
SUBREDDITS is a comma-delimited list of subreddits you want subreddit_worker to fetch threads from.
FETCH_DELAY is how often you want subreddit_worker to fetch threads.
NUM_THREADS is the number of threads you want downloaded from hot, rising or new.
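As an illustration of the expected value shapes, the sketch below parses a comma-delimited SUBREDDITS string and a NUM_THREADS count. The helper names are hypothetical, and whether redarc itself tolerates spaces or empty entries is not stated here, so writing the list without spaces is the safest choice.

```python
def parse_subreddits(raw: str) -> list[str]:
    """Split a comma-delimited list, trimming spaces and dropping empty items."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def parse_positive_int(name: str, raw: str) -> int:
    """Parse a setting such as NUM_THREADS and reject values below 1."""
    value = int(raw)
    if value < 1:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


if __name__ == "__main__":
    print(parse_subreddits("programming,python,linux"))
    print(parse_positive_int("NUM_THREADS", "25"))
```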
Docker compose:
Modify envars as needed
$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc
$ git fetch --all --tags
$ git checkout tags/vx.y.z -b vx.y.z
# Copy default.env to .env, then modify .env as needed
$ cp default.env .env
$ docker compose up -d
$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc
$ docker pull postgres
$ docker run \
  --name pgsql-dev \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgres-docker:/var/lib/postgresql/data \
  -p 5432:5432 postgres
$ docker run \
  --name pgsql-fts \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgresfts-docker:/var/lib/postgresql/data \
  -p 5433:5432 postgres
psql -h localhost -U postgres -a -f scripts/db_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_comments.sql
psql -h localhost -U postgres -a -f scripts/db_subreddits.sql
psql -h localhost -U postgres -a -f scripts/db_submissions_index.sql
psql -h localhost -U postgres -a -f scripts/db_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions_index.sql
psql -h localhost -U postgres -p 5433 -a -f scripts/db_fts.sql
psql -h localhost -U postgres -a -f scripts/db_progress.sql
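Only db_fts.sql targets the full-text database on port 5433; every other script goes to the main database on psql's default port, 5432. The sketch below makes that mapping explicit and prints the commands without running them. `psql_command` is a hypothetical helper, and the host and user defaults simply mirror the commands above.

```python
import shlex

MAIN_PORT = 5432  # pgsql-dev container
FTS_PORT = 5433   # pgsql-fts container

# Script order and target database, copied from the psql commands.
SCHEMA_SCRIPTS = [
    ("db_submissions.sql", MAIN_PORT),
    ("db_comments.sql", MAIN_PORT),
    ("db_subreddits.sql", MAIN_PORT),
    ("db_submissions_index.sql", MAIN_PORT),
    ("db_comments_index.sql", MAIN_PORT),
    ("db_status_comments.sql", MAIN_PORT),
    ("db_status_comments_index.sql", MAIN_PORT),
    ("db_status_submissions.sql", MAIN_PORT),
    ("db_status_submissions_index.sql", MAIN_PORT),
    ("db_fts.sql", FTS_PORT),
    ("db_progress.sql", MAIN_PORT),
]


def psql_command(script: str, port: int, host: str = "localhost",
                 user: str = "postgres") -> list[str]:
    """Build the psql argument list for one schema script."""
    return ["psql", "-h", host, "-U", user, "-p", str(port),
            "-a", "-f", f"scripts/{script}"]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for script, port in SCHEMA_SCRIPTS:
        print(shlex.join(psql_command(script, port)))
```

To execute instead of print, each argument list could be passed to `subprocess.run(..., check=True)` with PGPASSWORD set in the environment.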
Note: Be sure the ingest and Reddit workers are disabled
python3 scripts/load_sub.py <path_to_submission_file>
python3 scripts/load_comments.py <path_to_comment_file>
python3 scripts/load_sub_fts.py <path_to_submission_file>
python3 scripts/load_comments_fts.py <path_to_comment_file>
python3 scripts/index.py [subreddit_name]
python3 scripts/unlist.py <subreddit> <true|false>
$ cd api
$ python -m venv venv
$ source venv/bin/activate
$ pip install gunicorn
$ pip install falcon
$ pip install rq
$ pip install python-dotenv
$ pip install psycopg2-binary
$ gunicorn app
cd ../redarc-frontend
mv sample.env .env
Set address for API server in the .env file
VITE_API_DOMAIN=http://my-api-server.com/api/
npm i
npm run dev  # Dev server
Edit nginx/nginx_original.conf with your own values
$ cd ..
$ mv nginx/redarc_original.conf /etc/nginx/conf.d/redarc.conf
cd redarc-frontend
npm run build
cp -R dist/* /var/www/html/redarc/
systemctl restart nginx
Fill in .env files with your own credentials.
$ docker pull redis
$ docker run --name some-redis -d redis
$ cd redarc/ingest
$ python -m venv venv
$ source venv/bin/activate
$ pip install rq
$ pip install python-dotenv
$ pip install praw
$ pip install psycopg2-binary
$ pip install gallery-dl
$ python3 ingest/reddit_worker/reddit_worker.py
$ python3 ingest/index_worker/index_worker.py
$ python3 ingest/subreddit_worker/subreddit_worker.py
$ python3 ingest/image_downloader/image_downloader.py
Note: Be sure the ingest and Reddit workers are disabled
Ensure python3, pip and psycopg2-binary are installed:
# Decompress dumps
$ unzstd <submission_file>.zst
$ unzstd <comment_file>.zst
$ pip install psycopg2-binary
# Change database credentials if needed
$ python3 scripts/load_sub.py <path_to_submission_file>
$ python3 scripts/load_sub_fts.py <path_to_submission_file>
$ python3 scripts/load_comments.py <path_to_comment_file>
$ python3 scripts/load_comments_fts.py <path_to_comment_file>
$ python3 scripts/index.py [subreddit_name]
# Optional
$ python3 scripts/unlist.py <subreddit> <true|false>
$ python3 scripts/backfill_images.py <subreddit> <after timestamp utc> <num urls>
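Before loading a large dump it can help to confirm what it contains. The sketch below counts the subreddit field over the first records of a decompressed file. It assumes the decompressed dump is newline-delimited JSON with a "subreddit" key, as Pushshift dumps are; `count_subreddits` is a hypothetical helper, not one of the redarc scripts.

```python
import json
import sys
from collections import Counter
from itertools import islice


def count_subreddits(path: str, limit: int = 1000) -> Counter:
    """Count the `subreddit` field over the first `limit` lines of a dump."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as dump:
        for line in islice(dump, limit):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            counts[record.get("subreddit", "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 peek_dump.py <path_to_decompressed_dump>
    if len(sys.argv) > 1:
        for name, total in count_subreddits(sys.argv[1]).most_common(10):
            print(f"{name}\t{total}")
```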
- Submit Reddit URL using the web form /submit to be fetched by reddit_worker
- Add subreddits to the SUBREDDITS envar (delimited by commas) to be periodically fetched by subreddit_worker
search/comments?
[unflatten = <True/False>]
[subreddit = <name>]
[id = <id>]
[before = <utc_timestamp>]
[after = <utc_timestamp>]
[parent_id = <parent_id>]
[link_id = <link_id>]
[sort = <ASC/DESC>]
search/submissions?
[subreddit = <name>]
[id = <id>]
[before = <utc_timestamp>]
[after = <utc_timestamp>]
[sort = <ASC|DESC>]
search/subreddits
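A request URL for these endpoints can be assembled from the optional parameters listed above. The sketch below assumes the endpoints are served under the REDARC_API base (http://localhost/api on a local install); `API_BASE` and `build_url` are illustrative names, and the response format is not covered.

```python
from urllib.parse import urlencode

# Assumption: the REDARC_API value for a local install.
API_BASE = "http://localhost/api"


def build_url(endpoint: str, **params) -> str:
    """Join the base URL, the endpoint and any parameters that are not None."""
    query = {key: value for key, value in params.items() if value is not None}
    url = f"{API_BASE}/{endpoint}"
    return f"{url}?{urlencode(query)}" if query else url


if __name__ == "__main__":
    print(build_url("search/comments", subreddit="python", sort="DESC"))
    print(build_url("search/submissions", subreddit="python", after=1640995200))
    print(build_url("search/subreddits"))
```

The resulting string can be fetched with any HTTP client, for example `urllib.request.urlopen`.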
search?
<subreddit = <subreddit>>
[before = <unix timestamp>]
[after = <unix timestamp>]
[sort = <asc|desc>]
[query = <search phrase>]
<type = <comment|submission>>
Redarc is licensed under the MIT license