Nick Sweeting edited this page May 9, 2024 · 152 revisions

Docker

Overview

Running ArchiveBox with Docker allows you to manage it in a container without exposing it to the rest of your system. ArchiveBox generally works the same in Docker as it does outside Docker. You can even use pip-installed ArchiveBox and Docker ArchiveBox in tandem, as they both share the same data directory format.


Official Docker Hub image: hub.docker.com/r/archivebox/archivebox

docker pull archivebox/archivebox:latest

Published Docker tags:

  • :latest, :stable (latest stable release, the default)
  • :x.x and :x.x.x for specific versions (e.g. :0.7 or :0.7.2)
  • :dev for unstable alpha builds (breaks often, only for developers and willing beta testers)
  • :sha-xxxxxxx for builds of specific git commits (to test or pin specific PRs or commits)

Important

Make sure Docker is installed and up-to-date before following any instructions below! ➡️
To check the installed version, run: docker --version (must be >=17.04.0)
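
The version requirement above can be checked in a script. The following is a minimal sketch, not part of ArchiveBox: the helpers version_ge and extract_version are hypothetical names introduced here, and the comparison relies on sort -V being available (GNU coreutils and recent BSD sort).

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: pull the first x.y.z number out of a line of text,
# such as the output of `docker --version`.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

if command -v docker >/dev/null 2>&1; then
    installed="$(extract_version "$(docker --version)")"
    if version_ge "$installed" "17.04.0"; then
        echo "Docker $installed is new enough"
    else
        echo "Docker $installed is older than 17.04.0, please upgrade"
    fi
else
    echo "docker not found on PATH"
fi
```

If docker is not installed, the script only reports that and does nothing else.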


Docker Compose


Setup

A full docker-compose.yml file is provided with all the extras included.
You can uncomment sections within it to enable extra features, or run the basic version as-is.

# create a folder to store your data (can be anywhere)
mkdir -p ~/archivebox/data && cd ~/archivebox

# download the compose file into the directory
curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
# (shortcut for getting https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/stable/docker-compose.yml)

# initialize your collection and create an admin user for the Web UI (or set ADMIN_USERNAME/ADMIN_PASSWORD env vars)
docker compose run archivebox init
docker compose run archivebox manage createsuperuser
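
Because the compose file is fetched over the network, it can be worth confirming that the download produced a usable file before running init. This is a rough sketch under the assumption that the file has a top-level services: key; looks_like_compose_file is a hypothetical helper, not an ArchiveBox or Docker command.

```shell
# Hypothetical helper: succeed only if the file exists, is non-empty,
# and contains a top-level "services:" key.
looks_like_compose_file() {
    [ -s "$1" ] && grep -q '^services:' "$1"
}

if looks_like_compose_file docker-compose.yml; then
    echo "docker-compose.yml looks usable"
else
    echo "docker-compose.yml is missing or incomplete; re-run the curl step" >&2
fi
```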

To use Sonic for improved full-text search, download this config & uncomment the sonic service in docker-compose.yml:

# download the sonic config file into your data folder (e.g. ~/archivebox)
curl -fsSL 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/etc/sonic.cfg' > sonic.cfg

# then uncomment the sonic-related sections in docker-compose.yml
nano docker-compose.yml

# to backfill any existing archive data into the search index, run:
docker compose run archivebox update --index-only

Upgrading

See the wiki page on Upgrading or Merging Archives: Upgrading with Docker Compose for instructions. ➡️


Usage

You can use docker compose run archivebox [subcommand] just like the non-Docker archivebox [subcommand] CLI.

First, make sure you're cd'ed into the same folder as your docker-compose.yml file (e.g. ~/archivebox):

docker compose run archivebox help

To add an individual URL, pass it in as an arg or via stdin:

docker compose run archivebox add 'https://example.com'
# OR
echo 'https://example.com' | docker compose run -T archivebox add

To add multiple URLs at once, pipe them in via stdin, or place them in a file inside ./data/sources so that ArchiveBox can access it from within the container:

# pipe URLs in from a file outside Docker
docker compose run -T archivebox add < ~/Downloads/example_urls.txt

# OR ingest URLs from a file mounted inside Docker
docker compose run archivebox add --depth=1 /data/sources/example_urls.txt

# OR pipe in URLs from a remote source (-T disables the TTY so stdin can be piped)
curl 'https://example.com/some/rss/feed.xml' | docker compose run -T archivebox add

# OR have ArchiveBox fetch the remote source itself
docker compose run archivebox add --depth=1 'https://example.com/some/rss/feed.xml'
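
A file placed in ./data/sources on the host is referenced by a different path inside the container. The sketch below illustrates that mapping, assuming the compose file mounts ./data at /data as the commands above imply; container_path is a hypothetical helper, and the final command is printed rather than run.

```shell
# Hypothetical helper: translate a host path under ./data into the path the
# container sees, assuming ./data is mounted at /data.
container_path() {
    case "$1" in
        ./data/*) printf '/data/%s\n' "${1#./data/}" ;;
        data/*)   printf '/data/%s\n' "${1#data/}" ;;
        *)        echo "not under ./data: $1" >&2; return 1 ;;
    esac
}

# Write a small URL list where the container can read it
mkdir -p ./data/sources
printf '%s\n' 'https://example.com' 'https://example.com/about' > ./data/sources/example_urls.txt

# Print (rather than run) the command that would ingest the file
echo "docker compose run archivebox add --depth=1 $(container_path ./data/sources/example_urls.txt)"
```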

The --depth=1 flag tells ArchiveBox to look inside the provided source and archive all the URLs within:

# this archives just the RSS file itself (probably not what you want)
docker compose run archivebox add 'https://example.com/some/feed.rss'

# this archives the RSS feed file + all the URLs mentioned inside of it
docker compose run archivebox add --depth=1 'https://example.com/some/feed.rss'
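
To get a feel for what "the URLs within" a source means, you can scan a feed file for links yourself. This is only a crude regex scan for illustration, not ArchiveBox's own parser; list_urls and feed_sample.xml are hypothetical names used for the example.

```shell
# Hypothetical helper: list the unique http(s) URLs mentioned in a text file.
list_urls() {
    grep -oE "https?://[^\"'<> ]+" "$1" | sort -u
}

# A tiny stand-in for a downloaded RSS feed
cat > feed_sample.xml <<'EOF'
<rss><channel>
<item><link>https://example.com/post-1</link></item>
<item><link>https://example.com/post-2</link></item>
<item><link>https://example.com/post-1</link></item>
</channel></rss>
EOF

# Prints each distinct link once, one per line
list_urls feed_sample.xml
```

Without --depth=1 only the feed URL itself would be archived; with it, links like the ones listed here are roughly what gets added as well.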

Accessing the data

The outputted archive data is stored in data/ (relative to the project root), or whatever folder path you specified in the docker-compose.yml volumes: section. Make sure the data/ folder on the host has permissions initially set to 777 so that the ArchiveBox command is able to set it to the specified OUTPUT_PERMISSIONS config setting on the first run.
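
A minimal sketch of preparing the folder permissions before the first run, assuming the default ./data location next to docker-compose.yml:

```shell
# Create the data folder if needed and open its permissions for the first run
mkdir -p ./data
chmod 777 ./data

# Show the resulting mode string (first column of ls -ld)
ls -ld ./data | cut -c1-10
```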

To access the results directly via the filesystem, open ./data/archive/<timestamp>/index.html (the timestamp is shown in the output of the previous command).
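
If the timestamp has scrolled out of view, the newest snapshot can be located from the folder names. This sketch assumes snapshot folders are named with numeric timestamps; latest_snapshot_index is a hypothetical helper.

```shell
# Hypothetical helper: print the index.html path of the newest snapshot,
# assuming snapshot folder names are numeric timestamps.
latest_snapshot_index() {
    archive_dir="${1:-./data/archive}"
    newest="$(ls -1 "$archive_dir" 2>/dev/null | sort -n | tail -n 1)"
    [ -n "$newest" ] || return 1
    printf '%s/%s/index.html\n' "$archive_dir" "$newest"
}

latest_snapshot_index ./data/archive || echo "no snapshots found yet"
```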

Alternatively, to use the web UI, start the server with:

docker compose up
# add -d to run in the background

Then open http://127.0.0.1:8000.


Configuration

ArchiveBox running with docker compose accepts all the same config options as other ArchiveBox distributions; see the full list of options available on the Configuration page.

The recommended way to configure ArchiveBox in Docker Compose is using archivebox config --set ... or by editing ArchiveBox.conf.

docker compose run archivebox config --set MEDIA_MAX_SIZE=750mb
# OR
echo 'MEDIA_MAX_SIZE=750mb' >> ./data/ArchiveBox.conf

This will apply the config to all containers or ArchiveBox instances that access the collection.
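
Appending with >> adds a new line every time, so repeated edits can leave several assignments of the same key in the file. The sketch below replaces an earlier line instead. It assumes plain KEY=value lines as in the echo example, works on a scratch file (example.conf) rather than your real ArchiveBox.conf, and set_conf is a hypothetical helper, not an ArchiveBox command.

```shell
# Hypothetical helper: set KEY=VALUE in a config file, replacing any earlier
# line for the same key instead of appending a duplicate.
set_conf() {
    key="$1"; value="$2"; file="$3"
    touch "$file"
    # keep every line except previous assignments of this key
    grep -v "^${key}=" "$file" > "${file}.tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

: > example.conf
set_conf MEDIA_MAX_SIZE 750mb example.conf
set_conf MEDIA_MAX_SIZE 500mb example.conf

# Count how many assignments of the key remain in the file
grep -c '^MEDIA_MAX_SIZE=' example.conf
```

When in doubt, archivebox config --set remains the safer route, since it is the tool's own way of writing the file.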

If you're only running one container, or if you want to scope config options to only apply to a particular container, you can set them in that container's environment: section:

...
services:
    archivebox:
        ...
        environment:
            - USE_COLOR=False
            - SHOW_PROGRESS=False
            - CHECK_SSL_VALIDITY=False
            - RESOLUTION=1900,1820
            - MEDIA_TIMEOUT=512000
        ...

You can also specify an env file via the CLI when running Compose with docker compose --env-file=/path/to/config.env ..., but you must still list in the environment: section each variable that you want passed down from that env file to the ArchiveBox container.
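
One way to keep the env file and the environment: section in step is to generate the entries from the file's keys. This is a sketch that assumes Compose's ${NAME} variable interpolation is used to pass each value through; config.env and env_entries are hypothetical names, and the output is meant to be pasted into docker-compose.yml by hand.

```shell
# Example env file (values taken from elsewhere on this page)
cat > config.env <<'EOF'
MEDIA_TIMEOUT=120
SHOW_PROGRESS=False
EOF

# Hypothetical helper: print one environment: entry per KEY=value line,
# written so that Compose substitutes the value from the env file.
env_entries() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | while IFS= read -r key; do
        printf '            - %s=${%s}\n' "$key" "$key"
    done
}

env_entries config.env
```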

If you want to access your archive server with HTTPS, put a reverse proxy like Nginx or Caddy in front of http://127.0.0.1:8000 to do SSL termination. Here is an example ArchiveBox nginx container + nginx.conf that you can modify to add your preferred TLS settings.




Docker


Setup

Fetch and run the ArchiveBox Docker image to create your initial archive.

docker pull archivebox/archivebox
mkdir -p ~/archivebox/data && cd ~/archivebox/data
docker run -it -v $PWD:/data archivebox/archivebox init --setup

(You can create a collection in any directory you want; ~/archivebox/data is just used as an example here.)

If you encounter permissions issues, you may need to configure user/group ownership explicitly with PUID/PGID.
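
A common choice for PUID/PGID is the numeric IDs of the host user who owns the data folder. The sketch below looks them up with id and prints a candidate command. Passing the values with -e is an assumption here, based on ArchiveBox accepting config as environment variables; check the ArchiveBox documentation for the exact mechanism.

```shell
# Numeric user and group IDs of the current host user
puid="$(id -u)"
pgid="$(id -g)"

# Print (rather than run) the resulting command; passing PUID/PGID via -e
# is an assumption, not confirmed by this page.
echo "docker run -it -v \$PWD:/data -e PUID=$puid -e PGID=$pgid archivebox/archivebox init --setup"
```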


Upgrading

See the wiki page on Upgrading or Merging Archives: Upgrading with plain Docker for instructions. ➡️


Usage

The Docker CLI docker run ... archivebox/archivebox [subcommand] works just like the non-Docker archivebox [subcommand] CLI.

First, make sure you're cd'ed into your collection data folder (e.g. ~/archivebox/data).

docker run -it -v $PWD:/data archivebox/archivebox help

To add a single URL, pass it as an arg or pipe it in via stdin:

docker run -it -v $PWD:/data archivebox/archivebox add 'https://example.com'
# OR
echo 'https://example.com' | docker run -i -v $PWD:/data archivebox/archivebox add

To archive multiple URLs at once, pass text containing URLs in via stdin:

docker run -i -v $PWD:/data archivebox/archivebox add < urls.txt
# OR
curl 'https://example.com/some/rss/feed.xml' | docker run -i -v $PWD:/data archivebox/archivebox add

You can also use the --depth=1 flag to tell ArchiveBox to recursively archive the URLs within a provided source.

docker run -it -v $PWD:/data archivebox/archivebox add --depth=1 'https://example.com/some/rss/feed.xml'

Accessing the data

The docker run -v /path/on/host:/path/inside/container flag specifies where your data dir lives on the host.

For example, to use a folder on an external USB drive (instead of the current directory $PWD or ~/archivebox/data):

docker run -it -v /media/USB-DRIVE/archivebox/data:/data archivebox/archivebox ...

Then to view your data, you can look in the folder on the host /media/USB-DRIVE/archivebox/data, or use the Web UI:

docker run -it -v /media/USB-DRIVE/archivebox/data:/data -p 8000:8000 archivebox/archivebox
# then open http://127.0.0.1:8000
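
When pointing -v at a custom host folder, a typo in the path can go unnoticed because Docker creates a missing host folder for a bind mount instead of failing. The following sketch checks the folder first and prints the -v argument with an absolute path; data_mount_arg is a hypothetical helper.

```shell
# Hypothetical helper: print the -v argument for a host data folder,
# refusing folders that do not exist yet.
data_mount_arg() {
    if [ ! -d "$1" ]; then
        echo "data folder not found: $1" >&2
        return 1
    fi
    # resolve to an absolute path; a bare relative name would be treated
    # by docker run as a named volume rather than a host folder
    printf '%s %s:/data\n' '-v' "$(cd "$1" && pwd)"
}

mkdir -p ./data
data_mount_arg ./data
```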

Configuration

The easiest way is to use archivebox config --set KEY=value or edit ./ArchiveBox.conf (in your collection dir).

For example, this sets MEDIA_TIMEOUT=120 as a persistent setting for the collection:

docker run -it -v $PWD:/data archivebox/archivebox config --set MEDIA_TIMEOUT=120
# OR
echo 'MEDIA_TIMEOUT=120' >> ./ArchiveBox.conf

ArchiveBox in Docker also accepts config as environment variables; see more on the Configuration page.

For example, this applies FETCH_SCREENSHOT=False to a single run (without persisting for other runs):

docker run -it -v $PWD:/data -e FETCH_SCREENSHOT=False archivebox/archivebox add 'https://example.com'
# OR
echo 'FETCH_SCREENSHOT=False' >> ./.env
docker run ... --env-file=./.env archivebox/archivebox ...