Building
- 欲速则不达。
- Haste makes waste.
Because of its intimate use of specific versions of both Poppler and FontForge, cleanly building pdf2htmlEX is rather more complex than normal.
The (shell) scripts in the buildScripts directory help automate this multi-stage process.
For all but the most experienced programmers, we strongly encourage you to use these scripts to build pdf2htmlEX.
Most users probably just want to download one of the precompiled versions of pdf2htmlEX:
- As a Debian archive
- As an Alpine tar archive
- As an AppImage
- As a Docker image
pdf2htmlEX can be built in any Unix-like environment:
- GNU/Linux: pdf2htmlEX is currently built and released inside Ubuntu (Bionic, Eoan, and Focal) and Alpine 3.12 Docker containers, as well as Ubuntu Bionic on Travis, so pdf2htmlEX is known to build on any Debian-based distribution. The current buildScripts assume the use of either apt (Debian) or apk (Alpine) for (automatic) installation of all required dependencies. These scripts should be easy to modify for other distributions.
- macOS: While it should in principle be possible to build on macOS, unfortunately we currently have no access to a development/testing environment with which to ensure the buildScripts are adequately tuned to build on macOS. NOTE that the existing homebrew build script is not up to date and will fail. Offers of help and/or temporary access to development/testing machines would be greatly appreciated.
- Windows 10 with the Windows Subsystem for Linux: The Debian (Apt) versions of our build scripts should build pdf2htmlEX (untested). The AppImage or Debian archive binary release objects are reputed to work.
- Android: Have a look at Vilius Sutkus's pdf2htmlEX-Android.
To build pdf2htmlEX on a Debian/Apt related machine, inside the root directory of a fresh clone of the pdf2htmlEX/pdf2htmlEX repository, type:

    ./buildScripts/buildInstallLocallyApt

This will automatically install all required development tools and libraries, and then proceed to download and statically compile the required versions of both Poppler and FontForge before compiling and installing pdf2htmlEX into /usr/local/bin.
NOTE: at the moment this will only work on machines with a Debian-based distribution, such as Ubuntu, Linux Mint, etc.
NOTE: there is currently an experimental build script, ./buildScripts/buildInstallLocallyAlpine, for builds in Alpine environments.
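As a minimal sketch of choosing between the two scripts named above, the helpers below map a package manager to its build script. The function names `build_script_for` and `detect_package_manager` are hypothetical and not part of buildScripts; only the two script paths come from this page.

```shell
# Hypothetical helper: map a package manager name to the build script
# named on this page. The Alpine script is described as experimental.
build_script_for() {
  case "$1" in
    apt) echo "./buildScripts/buildInstallLocallyApt" ;;
    apk) echo "./buildScripts/buildInstallLocallyAlpine" ;;
    *)   echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: report which supported package manager is installed.
detect_package_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo apt
  elif command -v apk >/dev/null 2>&1; then
    echo apk
  else
    return 1
  fi
}

# Usage, from the root of a fresh clone (not executed here):
#   script=$(build_script_for "$(detect_package_manager)") && "$script"
```

On any other distribution both helpers fail rather than guess, since the scripts would first need to be adapted by hand.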
The definitive list of build dependencies can be found in the following scripts:
- getBuildToolsAlpine for Alpine Linux
- getBuildToolsApt for Debian based systems
- getDevLibrariesAlpine for Alpine Linux
- getDevLibrariesApt for Debian based systems
To build pdf2htmlEX you require static versions of the Poppler and FontForge libraries in specific 'well-known' locations.
An automatic build uses cmake to build all of Poppler, FontForge and pdf2htmlEX.
The definitive list of cmake build options can be found in the following scripts:
To provide its full functionality, the pdf2htmlEX sources make direct use of source code and unexposed methods from both the Poppler and FontForge projects. Unfortunately, the source code in the Poppler and FontForge projects that pdf2htmlEX uses changes regularly.
This means that the pdf2htmlEX source code must be updated regularly to match specific releases of both Poppler and FontForge.
Unfortunately, the installed versions of both Poppler and FontForge in most Linux distributions lag the official releases of both of these projects. Even worse, few distributions install the same versions.
This means that it is nearly impossible for the pdf2htmlEX code to 'predict' which version of Poppler or FontForge will be installed on a given user's machine.
While we could keep multiple versions of the pdf2htmlEX source code, each version matched to a particular distribution's installed versions of Poppler and FontForge, this would be a logistic and testing 'nightmare'.
Instead, when building pdf2htmlEX, we download specific versions of both the Poppler and FontForge sources (usually the most recent), and then compile static versions of the Poppler and FontForge libraries, which are then statically linked into the pdf2htmlEX binary.
This means that the pdf2htmlEX binary is completely independent of any locally installed versions of either Poppler or FontForge.
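One way to sanity-check this independence on a GNU/Linux machine is to inspect the binary's shared-library dependencies with `ldd`. The sketch below is illustrative only: both function names are hypothetical, and the library name patterns `libpoppler` and `libfontforge` are assumptions about how a dynamically linked build would list them.

```shell
# Hypothetical helper: read `ldd` output on stdin and succeed if any line
# mentions a Poppler or FontForge shared library (assumed name patterns).
mentions_poppler_or_fontforge() {
  grep -Eiq 'libpoppler|libfontforge'
}

# Hypothetical helper: report whether a binary appears to link either
# library dynamically. A binary that ldd cannot inspect is reported as ok.
check_static_linking() {
  if ldd "$1" 2>/dev/null | mentions_poppler_or_fontforge; then
    echo "dynamic: $1 links a shared Poppler or FontForge library"
    return 1
  fi
  echo "ok: $1 does not link shared Poppler or FontForge libraries"
}

# Usage (not executed here):
#   check_static_linking /usr/local/bin/pdf2htmlEX
```

The check only looks at shared libraries, so the binary may still list other system libraries in its `ldd` output.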
However, to get the matched versions of Poppler and FontForge and then compile them statically, our build process becomes much more complex than a "simple" configure, make, make install cycle.
Hence there are a large number of shell scripts in the buildScripts directory, each of which automates one 'simple' step in the overall build process.
When page images are stored as WebP in base64 format instead of PNG, the size of the resulting HTML output is significantly reduced. If the images are referenced externally as WebP files instead of being embedded as base64, the size is reduced by approximately a further 30%. Below, I'm sharing an example Bash script that converts the PNGs to WebP and embeds the base64-encoded WebP images into all pages.

    # Loop through all .png images in the specified directory (bg*.png)
    for img in /path/to/your/directory/bg*.png; do
        # Extract the image filename without the extension (.png)
        img_name=$(basename "$img" .png)
        # Convert the .png image to .webp format with quality 75 and save it in the same directory
        convert "$img" -quality 75 "/path/to/your/directory/$img_name.webp"
    done

    # Set the folder path variable to the directory containing the images and other files
    folder_path="/path/to/your/directory"
    # Loop through all .page files in the specified folder
    for file in "$folder_path"/*.page; do
        # Check if the file is a regular file (not a directory)
        if [[ -f "$file" ]]; then
            # Extract the src URL of the image in the .page file and replace the .png extension with .webp
            # (this assumes each .page file references a single image)
            x=$(grep -oP 'src="\K[^"]+' "$file" | sed 's/\.png$//') && x="$x.webp"
            # Encode the .webp image file to base64 and save it to encode.txt
            base64 "$folder_path/$x" > "$folder_path/encode.txt"
            # Remove any newlines from the base64-encoded content and save to a temporary file
            tr -d '\n' < "$folder_path/encode.txt" > "$folder_path/temp_base64.txt"
            # Update the .page file to use the .webp extension instead of .png
            sed -i 's/\(src="[^"]*\)\.png"/\1.webp"/g' "$file"
            # Replace the image src in the .page file with the base64-encoded data URI for the .webp image
            awk -v x="$x" 'NR==FNR{base64=$0; next} {gsub(x, "data:image/webp;base64," base64)}1' \
                "$folder_path/temp_base64.txt" "$file" \
                > "$folder_path/temp.page" \
                && mv "$folder_path/temp.page" "$file"
        fi
    done
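
For the external variant mentioned above, where the WebP files are referenced rather than embedded, a minimal sketch might only rewrite the `src` attributes and leave the converted images next to the pages. The function name `use_external_webp` is hypothetical, and the sketch assumes the .webp files have already been created alongside the original .png files.

```shell
# Hypothetical helper: point every src="....png" in one .page file at the
# same-named .webp file, leaving the image external (no base64 embedding).
use_external_webp() {
  page="$1"
  tmp="$page.tmp"
  # Write to a temporary file and move it into place, so no in-place
  # sed option is needed.
  sed 's/\(src="[^"]*\)\.png"/\1.webp"/g' "$page" > "$tmp" && mv "$tmp" "$page"
}

# Usage (not executed here):
#   for file in /path/to/your/directory/*.page; do use_external_webp "$file"; done
```

With this approach the .webp files must be deployed together with the pages, which is the trade-off for the smaller output.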