Yakabuff/redarc
A self-hosted solution to search, view and archive link aggregators.
- HackerNews (in progress)
- Ingest pushshift dumps
- View threads/comments
- Fulltext search via PostgresFTS
- Submit threads to be archived via API
- Periodically fetch rising, new and hot threads from specified subreddits
- Download i.redd.it images from threads
Please abide by the Reddit Terms of Service and User Agreement if you are using their API.
https://the-eye.eu/redarcs/
All data 2005-06 to 2022-12:
magnet:?xt=urn:btih:7c0645c94321311bb05bd879ddee4d0eba08aaee&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Top 20,000 subreddits:
magnet:?xt=urn:btih:c398a571976c78d346c325bd75c47b82edf6124e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
The master branch is unstable. Please check out a release.
Install Docker: https://docs.docker.com/engine/install
Services:
- postgres: Main database for threads, comments and subreddits
- postgres_fts: Database for full-text searching
- redarc: API backend and React frontend
  - Requires: redis, reddit_worker if INGEST_ENABLED
- redis: Required for any service that uses a task queue
- image_downloader: Asynchronously downloads images from Reddit if DOWNLOAD_IMAGES
  - Requires: redis, reddit_worker
- index_worker: Indexes threads/comments into postgres_fts
  - Requires: postgres_fts and postgres
- reddit_worker: Asynchronously fetches threads/comments from Reddit
  - Requires: redis, image_downloader
- subreddit_worker: Asynchronously fetches hot/new/rising thread IDs from subreddits
  - Requires: reddit_worker and redis
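The Requires notes in the service list can be read as a small dependency graph. The following is a minimal sketch, assuming the listed requirements are complete and that conditional ones (such as INGEST_ENABLED) are switched on; `REQUIRES` and `required_services` are illustrative names and not part of redarc.

```python
# Illustrative only: dependency table transcribed from the service list.
REQUIRES = {
    "postgres": [],
    "postgres_fts": [],
    "redarc": ["redis", "reddit_worker"],  # only when INGEST_ENABLED is set
    "redis": [],
    "image_downloader": ["redis", "reddit_worker"],
    "index_worker": ["postgres_fts", "postgres"],
    "reddit_worker": ["redis", "image_downloader"],
    "subreddit_worker": ["reddit_worker", "redis"],
}


def required_services(service):
    """Return every service reachable from `service`, including itself."""
    seen = set()
    stack = [service]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also guards against cycles
        seen.add(name)
        stack.extend(REQUIRES[name])
    return sorted(seen)


print(required_services("subreddit_worker"))
```

Because reddit_worker and image_downloader list each other, the walk tracks visited services instead of attempting a strict start order.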
If you wish to change the postgres password, make sure POSTGRES_PASSWORD and PGPASSWORD are the same.
If you are using redarc on your personal machine, set the Docker envars REDARC_API=http://localhost/api and SERVER_NAME=localhost.
REDARC_API is the URL of your API server; it must end with /api, e.g. http://redarc.mysite.org/api.
REDARC_FE_API is the URL of the API server you want the frontend to send requests to. If you are not using a reverse proxy, it should be the same as REDARC_API.
SERVER_NAME is the host name your redarc instance is running on, e.g. redarc.mysite.org.
Setting an INGEST_PASSWORD and ADMIN_PASSWORD in your API is highly recommended to prevent abuse.
IMAGE_PATH is the path where you want the image_downloader worker to save images. The API backend fetches images from this same path.
INDEX_DELAY is how often you want index_worker to index comments/threads.
SUBREDDITS is a comma-delimited list of subreddits you want subreddit_worker to fetch threads from.
FETCH_DELAY is how often you want subreddit_worker to fetch threads.
NUM_THREADS is the number of threads you want downloaded from hot, rising or new.
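The constraints on these envars can be checked before starting the stack. This is a minimal sketch, not part of redarc: `check_env` and `parse_subreddits` are hypothetical helpers, the example values are placeholders, and stripping whitespace around subreddit names is a convenience of the sketch and not documented redarc behaviour.

```python
def check_env(env):
    """Return a list of problems found in a dict of envar names to values."""
    problems = []
    # The README asks for both password envars to match.
    if env.get("POSTGRES_PASSWORD") != env.get("PGPASSWORD"):
        problems.append("POSTGRES_PASSWORD and PGPASSWORD differ")
    # REDARC_API must end with /api.
    if not env.get("REDARC_API", "").endswith("/api"):
        problems.append("REDARC_API must end with /api")
    # Recommended, not mandatory: flag missing passwords.
    for name in ("INGEST_PASSWORD", "ADMIN_PASSWORD"):
        if not env.get(name):
            problems.append(name + " is not set")
    return problems


def parse_subreddits(value):
    """Split the comma-delimited SUBREDDITS value into a list of names."""
    return [s.strip() for s in value.split(",") if s.strip()]


example = {
    "POSTGRES_PASSWORD": "test1234",
    "PGPASSWORD": "test1234",
    "REDARC_API": "http://localhost/api",
    "SERVER_NAME": "localhost",
    "SUBREDDITS": "python,linux",  # hypothetical subreddit list
}
print(check_env(example))
print(parse_subreddits(example["SUBREDDITS"]))
```

With the example values, only the two unset passwords are reported.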
Docker compose:
Modify envars as needed
$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc
$ git fetch --all --tags
$ git checkout tags/vx.y.z -b vx.y.z
// Modify .env as-needed
$ cp default.env .env
$ docker compose up -d
$ git clone https://github.com/Yakabuff/redarc.git
$ cd redarc
$ docker pull postgres
$ docker run \
  --name pgsql-dev \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgres-docker:/var/lib/postgresql/data \
  -p 5432:5432 postgres
$ docker run \
  --name pgsql-fts \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v postgresfts-docker:/var/lib/postgresql/data \
  -p 5433:5432 postgres
psql -h localhost -U postgres -a -f scripts/db_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_comments.sql
psql -h localhost -U postgres -a -f scripts/db_subreddits.sql
psql -h localhost -U postgres -a -f scripts/db_submissions_index.sql
psql -h localhost -U postgres -a -f scripts/db_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments.sql
psql -h localhost -U postgres -a -f scripts/db_status_comments_index.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions.sql
psql -h localhost -U postgres -a -f scripts/db_status_submissions_index.sql
psql -h localhost -U postgres -p 5433 -a -f scripts/db_fts.sql
psql -h localhost -U postgres -a -f scripts/db_progress.sql
Note: Be sure the ingest and Reddit workers are disabled
python3 scripts/load_sub.py <path_to_submission_file>
python3 scripts/load_comments.py <path_to_comment_file>
python3 scripts/load_sub_fts.py <path_to_submission_file>
python3 scripts/load_comments_fts.py <path_to_comment_file>
python3 scripts/index.py [subreddit_name]
python3 scripts/unlist.py <subreddit> <true|false>
$ cd api
$ python -m venv venv
$ source venv/bin/activate
$ pip install gunicorn
$ pip install falcon
$ pip install rq
$ pip install python-dotenv
$ pip install psycopg2-binary
$ gunicorn app
cd ../redarc-frontend
mv sample.env .env
Set address for API server in the .env file
VITE_API_DOMAIN=http://my-api-server.com/api/
npm i
npm run dev // Dev server
Edit nginx/nginx_original.conf with your own values
$ cd ..
$ mv nginx/redarc_original.conf /etc/nginx/conf.d/redarc.conf
cd redarc-frontend
npm run build
cp -R dist/* /var/www/html/redarc/
systemctl restart nginx
Fill in .env files with your own credentials.
$ docker pull redis
$ docker run --name some-redis -d redis
$ cd redarc/ingest
$ python -m venv venv
$ source venv/bin/activate
$ pip install rq
$ pip install python-dotenv
$ pip install praw
$ pip install psycopg2-binary
$ pip install gallery-dl
$ python3 ingest/reddit_worker/reddit_worker.py
$ python3 ingest/index_worker/index_worker.py
$ python3 ingest/subreddit_worker/subreddit_worker.py
$ python3 ingest/image_downloader/image_downloader.py
Note: Be sure the ingest and Reddit workers are disabled
Ensure python3, pip and psycopg2-binary are installed:
# Decompress dumps
$ unzstd <submission_file>.zst
$ unzstd <comment_file>.zst
$ pip install psycopg2-binary
# Change database credentials if needed
$ python3 scripts/load_sub.py <path_to_submission_file>
$ python3 scripts/load_sub_fts.py <path_to_submission_file>
$ python3 scripts/load_comments.py <path_to_comment_file>
$ python3 scripts/load_comments_fts.py <path_to_comment_file>
$ python3 scripts/index.py [subreddit_name]
# Optional
$ python3 scripts/unlist.py <subreddit> <true|false>
$ python3 scripts/backfill_images.py <subreddit> <after timestamp utc> <num urls>
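When loading several dumps, it can help to generate the load commands in the documented order instead of typing them by hand. The following is a minimal sketch: `load_commands` is a hypothetical helper, the file names are placeholders, and it only builds the command lists without running anything.

```python
def load_commands(submission_file, comment_file, subreddit=None):
    """Build the load commands in the order the README lists them."""
    commands = [
        ["python3", "scripts/load_sub.py", submission_file],
        ["python3", "scripts/load_sub_fts.py", submission_file],
        ["python3", "scripts/load_comments.py", comment_file],
        ["python3", "scripts/load_comments_fts.py", comment_file],
    ]
    # index.py takes an optional subreddit name.
    index = ["python3", "scripts/index.py"]
    if subreddit:
        index.append(subreddit)
    commands.append(index)
    return commands


# Placeholder file and subreddit names, for illustration only.
for cmd in load_commands("example_submissions", "example_comments", "example"):
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failing step stops the sequence.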
- Submit a Reddit URL using the web form /submit to be fetched by reddit_worker
- Add subreddits to the SUBREDDITS envar (delimited by commas) to be periodically fetched by subreddit_worker
search/comments?
[unflatten = <True/False>]
[subreddit = <name>]
[id = <id>]
[before = <utc_timestamp>]
[after = <utc_timestamp>]
[parent_id = <parent_id>]
[link_id = <link_id>]
[sort = <ASC/DESC>]
search/submissions?
[subreddit = <name>]
[id = <id>]
[before = <utc_timestamp>]
[after = <utc_timestamp>]
[sort = <ASC|DESC>]
search/subreddits
search?
<subreddit = <subreddit>>
[before = <unix timestamp>]
[after = <unix timestamp>]
[sort = <asc|desc>]
[query = <search phrase>]
<type = <comment|submission>>
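The endpoints listed here can be turned into request URLs with the standard library. This is a minimal sketch that assumes the endpoint paths are relative to the REDARC_API base URL (the README does not state this explicitly); `build_url`, `fulltext_search_url` and the example values are illustrative, and no request is actually sent.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_url(api_base, endpoint, **params):
    """Join the API base, an endpoint path and URL-encoded query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = api_base.rstrip("/") + "/" + endpoint
    return url + "?" + query if query else url


def fulltext_search_url(api_base, subreddit, kind, **params):
    """Build a search URL; subreddit and type are the required parameters."""
    if kind not in ("comment", "submission"):
        raise ValueError("type must be 'comment' or 'submission'")
    return build_url(api_base, "search", subreddit=subreddit, type=kind, **params)


API = "http://localhost/api"  # assumed base; use your own REDARC_API value

# before/after take UTC timestamps in seconds.
after = int(datetime(2022, 1, 1, tzinfo=timezone.utc).timestamp())

print(build_url(API, "search/comments", subreddit="python", after=after, sort="DESC"))
print(build_url(API, "search/subreddits"))
print(fulltext_search_url(API, "python", "comment", query="asyncio tips"))
```

Parameters set to None are dropped, so optional filters can be passed through unchanged from calling code.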
Redarc is licensed under the MIT license