# Docker
Running ArchiveBox with Docker allows you to manage it in a container without exposing it to the rest of your system. ArchiveBox generally works the same in Docker as it does outside Docker. You can even use `pip`-installed ArchiveBox and Docker ArchiveBox in tandem, as they both share the same data directory format.
- Overview
- Docker Compose ⭐️ (recommended)
- Plain Docker
## Overview

Official Docker Hub image: hub.docker.com/r/archivebox/archivebox

```bash
docker pull archivebox/archivebox:latest
```

Published Docker tags:

- `:latest`, `:stable` (latest stable release, the default)
- `:x.x` and `:x.x.x` for specific versions (e.g. `:0.7` or `:0.7.2`)
- `:dev` for unstable alpha builds (breaks often, only for developers and willing beta testers)
- `:sha-xxxxxxx` for builds of specific git commits (to test or pin specific PRs or commits)
> [!IMPORTANT]
> Make sure Docker is installed and up-to-date before following any instructions below! ➡️
> To check the installed version, run `docker --version` (must be `>=17.04.0`).
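
If you want to automate that version check (for example at the top of a setup script), here is a minimal Bash sketch. It assumes GNU `sort -V` is available and that `docker --version` prints a line like `Docker version 24.0.7, build afdd53b`. The function names are hypothetical helpers, not part of Docker or ArchiveBox.

```bash
# Hypothetical helper: succeeds if version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: print only "X.Y.Z" from output like
# "Docker version 24.0.7, build afdd53b"; prints nothing if the format differs.
docker_version() {
  docker --version 2>/dev/null | sed -nE 's/^Docker version ([0-9]+\.[0-9]+\.[0-9]+).*/\1/p'
}

# Hypothetical helper: fail unless Docker >= 17.04.0 is installed.
check_docker_version() {
  local have
  have="$(docker_version)"
  if [ -z "$have" ]; then
    echo "could not determine the Docker version (is docker installed?)" >&2
    return 1
  fi
  if version_ge "$have" "17.04.0"; then
    echo "Docker $have is new enough"
  else
    echo "Docker $have is older than 17.04.0, please upgrade" >&2
    return 1
  fi
}
```

`check_docker_version` returns non-zero when Docker is missing or too old, so it can gate the rest of a script.
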

## Docker Compose ⭐️ (recommended)

A full `docker-compose.yml` file is provided with all the extras included. You can uncomment sections within it to enable extra features, or run the basic version as-is.
```bash
# create a folder to store your data (can be anywhere)
mkdir -p ~/archivebox/data && cd ~/archivebox

# download the compose file into the directory
curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
# (shortcut for getting https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/stable/docker-compose.yml)

# initialize your collection and create an admin user for the Web UI (or set ADMIN_USERNAME/ADMIN_PASSWORD env vars)
docker compose run archivebox init
docker compose run archivebox manage createsuperuser
```
To use Sonic for improved full-text search, download this config and uncomment the sonic service in `docker-compose.yml`:

```bash
# download the sonic config file into the folder containing docker-compose.yml (e.g. ~/archivebox)
curl -fsSL 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/etc/sonic.cfg' > sonic.cfg

# then uncomment the sonic-related sections in docker-compose.yml
nano docker-compose.yml

# to backfill any existing archive data into the search index, run:
docker compose run archivebox update --index-only
```
See the wiki page on Upgrading or Merging Archives: Upgrading with Docker Compose for instructions. ➡️
You can use `docker compose run archivebox [subcommand]` just like the non-Docker `archivebox [subcommand]` CLI.

First, make sure you're `cd`'ed into the same folder as your `docker-compose.yml` file (e.g. `~/archivebox`):

```bash
docker compose run archivebox help
```
To add an individual URL, pass it in as an arg or via stdin:
```bash
docker compose run archivebox add 'https://example.com'
# OR
echo 'https://example.com' | docker compose run -T archivebox add
```
To add multiple URLs at once, pipe them in via stdin, or place them in a file inside `./data/sources` so that ArchiveBox can access it from within the container:

```bash
# pipe URLs in from a file outside Docker
docker compose run -T archivebox add < ~/Downloads/example_urls.txt

# OR ingest URLs from a file mounted inside Docker
docker compose run archivebox add --depth=1 /data/sources/example_urls.txt

# OR pipe in URLs from a remote source (-T disables the TTY so stdin can be piped)
curl 'https://example.com/some/rss/feed.xml' | docker compose run -T archivebox add
# OR
docker compose run archivebox add --depth=1 'https://example.com/some/rss/feed.xml'
```
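
The `./data/sources` approach can be wrapped in a small helper. This is a sketch that assumes the default volume mapping of `./data` on the host to `/data` in the container, and that it is run from the folder containing `docker-compose.yml`. `add_urls_from_file` is a hypothetical name.

```bash
# Hypothetical helper: copy a local URL list into ./data/sources (visible as
# /data/sources inside the container, assuming the default ./data:/data volume),
# then ask ArchiveBox to ingest it, mirroring the command shown above.
add_urls_from_file() {
  local src="$1" name
  name="$(basename "$src")"
  mkdir -p ./data/sources
  cp "$src" "./data/sources/$name" || return 1
  docker compose run archivebox add --depth=1 "/data/sources/$name"
}

# usage: add_urls_from_file ~/Downloads/example_urls.txt
```
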
The `--depth=1` flag tells ArchiveBox to look inside the provided source and archive all the URLs within:

```bash
# this archives just the RSS file itself (probably not what you want)
docker compose run archivebox add 'https://example.com/some/feed.rss'

# this archives the RSS feed file + all the URLs mentioned inside of it
docker compose run archivebox add --depth=1 'https://example.com/some/feed.rss'
```
The archive output is stored in `data/` (relative to the project root), or in whatever folder path you specified in the `volumes:` section of `docker-compose.yml`. Make sure the `data/` folder on the host initially has its permissions set to `777` so that ArchiveBox is able to change them to the value of the `OUTPUT_PERMISSIONS` config setting on the first run.
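
A minimal sketch of that one-time permission setup, assuming `./data` is the host folder mounted as `/data` (`prepare_data_dir` is a hypothetical helper name):

```bash
# Hypothetical helper: create the host-side data folder and open its permissions
# to 777 before the first run, so ArchiveBox can then apply OUTPUT_PERMISSIONS.
prepare_data_dir() {
  local dir="${1:-./data}"
  mkdir -p "$dir" && chmod 777 "$dir"
}

# usage: prepare_data_dir ./data && docker compose run archivebox init
```
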
To access the results directly via the filesystem, open `./data/archive/<timestamp>/index.html` (the timestamp is shown in the output of the previous command).
Alternatively, to use the web UI, start the server with:
```bash
docker compose up    # add -d to run in the background
```
Then open http://127.0.0.1:8000.
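
When the server is started in the background with `-d`, a script may need to wait for the UI before continuing. This is a sketch assuming `curl` is installed on the host and the default `127.0.0.1:8000` port mapping. `wait_for_archivebox` is a hypothetical helper.

```bash
# Hypothetical helper: poll the web UI once per second until it answers
# (any non-error HTTP status) or the timeout in seconds expires.
wait_for_archivebox() {
  local url="${1:-http://127.0.0.1:8000}" timeout="${2:-60}" waited=0
  until curl -fsS -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "ArchiveBox did not respond at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "ArchiveBox is up at $url"
}

# usage: docker compose up -d && wait_for_archivebox
```
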
ArchiveBox running with `docker compose` accepts all the same config options as other ArchiveBox distributions; see the full list of options available on the Configuration page.

The recommended way to configure ArchiveBox in Docker Compose is using `archivebox config --set ...` or by editing `ArchiveBox.conf`.

```bash
docker compose run archivebox config --set MEDIA_MAX_SIZE=750mb
# OR
echo 'MEDIA_MAX_SIZE=750mb' >> ./data/ArchiveBox.conf
```
This applies the config to all containers or ArchiveBox instances that access the collection.
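
To persist several settings in one go, you could loop over `config --set`, one `KEY=value` pair per call. This is a sketch assuming it is run from the folder containing `docker-compose.yml`; `set_archivebox_config` is a hypothetical name.

```bash
# Hypothetical helper: write each KEY=value pair into the collection's
# ArchiveBox.conf by calling `archivebox config --set` once per pair.
set_archivebox_config() {
  local pair
  for pair in "$@"; do
    case "$pair" in
      *=*) docker compose run archivebox config --set "$pair" || return 1 ;;
      *)   echo "expected KEY=value, got: $pair" >&2; return 1 ;;
    esac
  done
}

# usage: set_archivebox_config MEDIA_TIMEOUT=120 CHECK_SSL_VALIDITY=False
```
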
If you're only running one container, or if you want to scope config options to only apply to a particular container, you can set them in that container's `environment:` section:

```yaml
...
services:
    archivebox:
        ...
        environment:
            - USE_COLOR=False
            - SHOW_PROGRESS=False
            - CHECK_SSL_VALIDITY=False
            - RESOLUTION=1900,1820
            - MEDIA_TIMEOUT=512000
...
```
You can also pass an env file on the CLI when running compose, using `docker compose --env-file=/path/to/config.env ...`. However, you must still list each variable that you want passed down from that env file to the ArchiveBox container in its `environment:` section.
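
Here is a sketch of that env-file pattern. It assumes the file names `config.env` and `docker-compose.override.yml` (Compose merges an override file with that name automatically when no `-f` flag is given), and it relies on `--env-file` supplying values for `${VAR}` interpolation in the compose files. The keys shown are ones used elsewhere on this page.

```bash
# Settings live in a plain KEY=value file (file name is an assumption).
cat > config.env <<'EOF'
MEDIA_TIMEOUT=120
CHECK_SSL_VALIDITY=False
EOF

# Forward each variable to the archivebox service via ${VAR} interpolation.
# The quoted 'EOF' keeps the shell from expanding ${...} while writing the file.
# Note: this overwrites any existing docker-compose.override.yml.
cat > docker-compose.override.yml <<'EOF'
services:
  archivebox:
    environment:
      - MEDIA_TIMEOUT=${MEDIA_TIMEOUT}
      - CHECK_SSL_VALIDITY=${CHECK_SSL_VALIDITY}
EOF

# then, for example:
# docker compose --env-file=config.env run archivebox add 'https://example.com'
```
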
If you want to access your archive server over HTTPS, put a reverse proxy like Nginx or Caddy in front of http://127.0.0.1:8000 to do SSL termination. Here is an example ArchiveBox nginx container + `nginx.conf` that you can modify to add your preferred TLS settings.
## Plain Docker

Fetch and run the ArchiveBox Docker image to create your initial archive:

```bash
docker pull archivebox/archivebox
mkdir -p ~/archivebox/data && cd ~/archivebox/data
docker run -it -v $PWD:/data archivebox/archivebox init --setup
```
(You can create a collection in any directory you want; `~/archivebox/data` is just used as an example here.)
If you encounter permissions issues, you may need to configure user/group ownership explicitly with `PUID`/`PGID`.
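
For example, to have files in the data folder owned by your own host user, you could pass your UID/GID in. This is a sketch that assumes the image honors `PUID`/`PGID` as described above; `archivebox_as_me` is a hypothetical wrapper name.

```bash
# Hypothetical wrapper: run any ArchiveBox subcommand against the current folder,
# passing the host user's numeric UID/GID as PUID/PGID.
archivebox_as_me() {
  docker run -it -v "$PWD":/data \
    -e PUID="$(id -u)" -e PGID="$(id -g)" \
    archivebox/archivebox "$@"
}

# usage (from inside your collection folder):
# archivebox_as_me init --setup
```
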
See the wiki page on Upgrading or Merging Archives: Upgrading with plain Docker for instructions. ➡️
The Docker CLI `docker run ... archivebox/archivebox [subcommand]` works just like the non-Docker `archivebox [subcommand]` CLI.

First, make sure you're `cd`'ed into your collection data folder (e.g. `~/archivebox/data`).

```bash
docker run -it -v $PWD:/data archivebox/archivebox help
```
To add a single URL, pass it as an arg or pipe it in via stdin:
```bash
docker run -it -v $PWD:/data archivebox/archivebox add 'https://example.com'
# OR
echo 'https://example.com' | docker run -i -v $PWD:/data archivebox/archivebox add
```
To archive multiple URLs at once, pass text containing URLs in via stdin:
```bash
docker run -i -v $PWD:/data archivebox/archivebox add < urls.txt
# OR
curl 'https://example.com/some/rss/feed.xml' | docker run -i -v $PWD:/data archivebox/archivebox add
```
You can also use the `--depth=1` flag to tell ArchiveBox to also archive the URLs found within a provided source (one level deep):

```bash
docker run -it -v $PWD:/data archivebox/archivebox add --depth=1 'https://example.com/some/rss/feed.xml'
```
The `docker run` flag `-v /path/on/host:/path/inside/container` specifies where your data dir lives on the host.

For example, to use a folder on an external USB drive (instead of the current directory `$PWD` or `~/archivebox/data`):

```bash
docker run -it -v /media/USB-DRIVE/archivebox/data:/data archivebox/archivebox ...
```
Then to view your data, you can look in the folder on the host, `/media/USB-DRIVE/archivebox/data`, or use the Web UI:

```bash
docker run -it -v /media/USB-DRIVE/archivebox/data:/data -p 8000:8000 archivebox/archivebox
# then open http://127.0.0.1:8000
```
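
If you always keep your collection on the same drive, a pair of shell functions saves retyping the `-v` mapping. This is a sketch; `ARCHIVEBOX_DATA_DIR`, `archivebox_usb`, and `archivebox_usb_server` are hypothetical names, not ArchiveBox settings.

```bash
# Hypothetical variable: host folder to mount as /data (not an ArchiveBox setting).
ARCHIVEBOX_DATA_DIR="${ARCHIVEBOX_DATA_DIR:-/media/USB-DRIVE/archivebox/data}"

# Run any ArchiveBox subcommand against the collection on the drive.
archivebox_usb() {
  docker run -it -v "$ARCHIVEBOX_DATA_DIR":/data archivebox/archivebox "$@"
}

# Start the image's default command with port 8000 published, as in the Web UI example above.
archivebox_usb_server() {
  docker run -it -v "$ARCHIVEBOX_DATA_DIR":/data -p 8000:8000 archivebox/archivebox
}

# usage: archivebox_usb add 'https://example.com'
```
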
The easiest way to configure ArchiveBox in plain Docker is to use `archivebox config --set KEY=value` or to edit `./ArchiveBox.conf` (in your collection dir).

For example, this sets `MEDIA_TIMEOUT=120` as a persistent setting for the collection:

```bash
docker run -it -v $PWD:/data archivebox/archivebox config --set MEDIA_TIMEOUT=120
# OR
echo 'MEDIA_TIMEOUT=120' >> ./ArchiveBox.conf
```
ArchiveBox in Docker also accepts config as environment variables; see more on the Configuration page.

For example, this applies `FETCH_SCREENSHOT=False` to a single run (without persisting it for other runs):

```bash
docker run -it -v $PWD:/data -e FETCH_SCREENSHOT=False archivebox/archivebox add 'https://example.com'
# OR
echo 'FETCH_SCREENSHOT=False' >> ./.env
docker run ... --env-file=./.env archivebox/archivebox ...
```