Convert a Kolibri channel into ZIM file(s)
kolibri2zim allows you to create a ZIM file from a Kolibri Channel.
It downloads the videos (webm or mp4 extension, optionally recompressed in lower quality for a smaller size), the thumbnails, the subtitles and the authors' profile pictures; it then creates a folder of static HTML files from them before creating a ZIM off of it.
Warning
This scraper is under heavy modification to prepare a v2, including a brand new UI for navigating the tree of content and a move to Vue.JS. These changes are already merged into the main branch but are not yet complete. Should you be interested in a stable version, please use the published versions (PyPI or Docker). We also have a v1 branch for any urgent patch needed on the current production version.
- Node 20.x
- Python 3.11
- ffmpeg for video transcoding (only used with --use-webm or --low-quality).
- curl and unzip to install JavaScript dependencies. See get_web_deps.sh if you want to do it manually.
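Before installing, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch, not part of kolibri2zim: the helper name `missing_tools` is hypothetical, the executable names `node` and `python3` are assumptions about how Node and Python are installed on your system, and the check only looks for presence on the PATH, not for the required versions.

```python
import shutil

# Tool names taken from the requirements list above.
# "node" and "python3" are assumed executable names and may differ on your system.
REQUIRED_TOOLS = ("node", "python3", "ffmpeg", "curl", "unzip")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Since ffmpeg is only used with --use-webm or --low-quality, a missing ffmpeg is not necessarily a problem for every run.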
kolibri2zim is a Python 3 program. If you are not using the Docker image, you are advised to use it in a virtual environment to avoid installing software dependencies on your system.
python3 -m venv env       # Create virtualenv
source env/bin/activate   # Activate the virtualenv ('env/Scripts/Activate' in Windows)
pip3 install kolibri2zim  # Install dependencies
kolibri2zim --help        # Display kolibri2zim help
Call deactivate to quit the virtual environment.
See pyproject.toml for the list of Python dependencies.
To test EPUB and PDF rendering, a potentially useful command is:
kolibri2zim --name "Biblioteca Elejandria" --output /output --tmp-dir /tmp --zim-file Biblioteca_Elejandria.zim --channel-id "fed29d60e4d84a1e8dcfc781d920b40e" --node-ids 'd92c07655128458f8248416154b18a68,89fe2f86ee3f4fbaa7fb2bf9bd56d088,75f99e6b97d14b14a4e74762ad77391f,89fe2f86ee3f4fbaa7fb2bf9bd56d088'
docker run -v my_dir:/output ghcr.io/openzim/kolibri kolibri2zim --help
kolibri2zim works off a channel-id that you must provide. This is a 32-character ID that you can find in the URL of the channel you want, either from Kolibri Studio or the Kolibri Catalog.
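Pulling the ID out of a copied URL can be sketched as below. This is an illustration under stated assumptions: the function name `extract_channel_id` is hypothetical, the ID is assumed to be 32 hexadecimal characters (as in the IDs shown in this README), and the URL in the usage note is a made-up example, not a real Kolibri Studio or Kolibri Catalog address.

```python
import re
from typing import Optional

# Assumption: a channel ID is exactly 32 hexadecimal characters, not embedded
# in a longer hexadecimal run. The lookarounds reject 33+ character runs.
_CHANNEL_ID = re.compile(r"(?<![0-9a-f])[0-9a-f]{32}(?![0-9a-f])")


def extract_channel_id(url: str) -> Optional[str]:
    """Return the first 32-character hex ID found in the URL (lowercased), or None."""
    match = _CHANNEL_ID.search(url.lower())
    return match.group(0) if match else None
```

For instance, `extract_channel_id("https://example.org/channels/fed29d60e4d84a1e8dcfc781d920b40e/view")` picks out the ID segment of that hypothetical URL, which can then be passed to --channel-id.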
kolibri2zim adheres to openZIM's Contribution Guidelines.
kolibri2zim has implemented openZIM's Python bootstrap, conventions and policies v1.0.0.
Before contributing, be sure to check out the CONTRIBUTING.md guidelines.
Some useful test channels:
- 7f744ce8d28b471eaf663abd60c92267: a very minimal channel with all kinds of content
- 9f15f4e9aeaa48b5ae271e5749d6fe80: a small channel with significantly nested items and all kinds of content
You have to:
- build the zimui frontend, which will be embedded inside the ZIM (and redo it every time you make modifications to the zimui)
- run the scraper to retrieve the channel content and build the ZIM
Sample commands:
cd zimui
yarn install
yarn build
cd ../scraper
hatch run kolibri2zim --name "Biblioteca Elejandria" --output output --zim-file Biblioteca_Elejandria.zim --channel-id "fed29d60e4d84a1e8dcfc781d920b40e" --node-ids 'd92c07655128458f8248416154b18a68,89fe2f86ee3f4fbaa7fb2bf9bd56d088,75f99e6b97d14b14a4e74762ad77391f,89fe2f86ee3f4fbaa7fb2bf9bd56d088'
Run from the official version (published on GHCR.io); the ZIM will be available in the output sub-folder of the current working directory.
docker run --rm -it -v $(pwd)/output:/output ghcr.io/openzim/kolibri:latest kolibri2zim --name "Biblioteca Elejandria" --output /output --tmp-dir /tmp --zim-file Biblioteca_Elejandria.zim --channel-id "fed29d60e4d84a1e8dcfc781d920b40e" --node-ids 'd92c07655128458f8248416154b18a68,89fe2f86ee3f4fbaa7fb2bf9bd56d088,75f99e6b97d14b14a4e74762ad77391f,89fe2f86ee3f4fbaa7fb2bf9bd56d088'
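Long command lines like the ones above are easy to mistype, so they can also be assembled programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, it uses only the flags that appear in the sample commands (--name, --output, --tmp-dir, --zim-file, --channel-id, --node-ids), and dropping repeated node IDs is a choice of this sketch rather than a documented requirement of kolibri2zim.

```python
import shlex


def build_command(name, channel_id, zim_file, output, node_ids=(), tmp_dir=None):
    """Assemble a kolibri2zim argument list from the flags shown in the sample commands."""
    argv = ["kolibri2zim", "--name", name, "--output", output]
    if tmp_dir:
        argv += ["--tmp-dir", tmp_dir]
    argv += ["--zim-file", zim_file, "--channel-id", channel_id]
    if node_ids:
        # dict.fromkeys drops repeated IDs while keeping first-seen order
        # (a choice of this sketch, not something the tool is known to need).
        argv += ["--node-ids", ",".join(dict.fromkeys(node_ids))]
    return argv


if __name__ == "__main__":
    argv = build_command(
        name="Biblioteca Elejandria",
        channel_id="fed29d60e4d84a1e8dcfc781d920b40e",
        zim_file="Biblioteca_Elejandria.zim",
        output="/output",
        tmp_dir="/tmp",
        node_ids=[
            "d92c07655128458f8248416154b18a68",
            "89fe2f86ee3f4fbaa7fb2bf9bd56d088",
            "75f99e6b97d14b14a4e74762ad77391f",
            "89fe2f86ee3f4fbaa7fb2bf9bd56d088",
        ],
    )
    # shlex.join quotes arguments containing spaces, so the result can be pasted into a shell.
    print(shlex.join(argv))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with names that contain spaces.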