stephengaito edited this page Jun 24, 2020 · 64 revisions

- 欲速则不达。
- Haste makes waste.

Building pdf2htmlEX

Because of its intimate use of specific versions of both Poppler and FontForge, cleanly building pdf2htmlEX is rather more complex than normal.

The (shell) scripts in the buildScripts directory help automate this multi-stage process.

For all but the most experienced programmers, we strongly encourage you to use these scripts to build pdf2htmlEX.

Downloading precompiled versions

Most users will probably want to simply download one of the precompiled versions of pdf2htmlEX.

Environment

pdf2htmlEX can be built in any Unix-like environment:

  • GNU/Linux: pdf2htmlEX is currently built and released inside Ubuntu (Bionic, Eoan, and Focal) and Alpine 3.12 docker containers, as well as on Ubuntu Bionic on Travis, so pdf2htmlEX is known to build on any Debian based distribution.

    The current buildScripts assume the use of either apt (Debian) or apk (Alpine) for (automatic) installation of all required dependencies. These scripts should be easy to modify for other distributions.

  • macOS: While it should in principle be possible to build on macOS, unfortunately we currently have no access to a development/testing environment with which to ensure the buildScripts are adequately tuned to build on macOS.

    NOTE that the existing homebrew build script is not up to date and will fail.

    Offers of help and/or temporary access to development/testing machines would be greatly appreciated.

  • Windows 10 with the Windows Subsystem for Linux:

    The Debian (Apt) versions of our build scripts should build pdf2htmlEX (untested).

    The AppImage and Debian archive binary releases are reputed to work.

  • Android: Have a look at Vilius Sutkus's pdf2htmlEX-Android.

Building yourself

To build pdf2htmlEX on a Debian/Apt related machine, inside the root directory of a fresh clone of the pdf2htmlEX/pdf2htmlEX repository, type:

    ./buildScripts/buildInstallLocallyApt

This will automatically install all required development tools and libraries, and then proceed to download and statically compile the required versions of both Poppler and FontForge before compiling and installing pdf2htmlEX into /usr/local/bin.

NOTE: at the moment this will only work on machines with a Debian based distribution, such as Ubuntu, Linux Mint, etc.

NOTE: there is currently an experimental build script, ./buildScripts/buildInstallLocallyAlpine, for builds in Alpine environments.
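As a rough illustration of choosing between the two top-level scripts named above, the following sketch picks one based on which package manager is present. The function names `build_script_for` and `detect_pkg_manager` are hypothetical helpers invented for this example and are not part of the buildScripts directory; the sketch only prints the script it would run.

```shell
#!/bin/sh
# Hypothetical helper: map a package manager name to the matching
# top-level build script (script names taken from this page).
build_script_for() {
    case "$1" in
        apt) echo "./buildScripts/buildInstallLocallyApt" ;;
        apk) echo "./buildScripts/buildInstallLocallyAlpine" ;;
        *)   return 1 ;;
    esac
}

# Hypothetical helper: report which supported package manager is installed.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then
        echo apt
    elif command -v apk >/dev/null 2>&1; then
        echo apk
    else
        return 1
    fi
}

# Print the script that would be run; run it by hand from the repository root.
if pm=$(detect_pkg_manager) && script=$(build_script_for "$pm"); then
    echo "Would run: $script"
else
    echo "No supported package manager found; the build scripts need adapting."
fi
```

Printing rather than executing keeps the sketch safe to try on any machine; replace the `echo` with the script invocation once the choice looks right.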

Dependencies

The definitive list of build dependencies can be found in the following scripts:

  1. getBuildToolsAlpine for Alpine Linux
  2. getBuildToolsApt for Debian based systems
  3. getDevLibrariesAlpine for Alpine Linux
  4. getDevLibrariesApt for Debian based systems

Build options

To build pdf2htmlEX you require static versions of the Poppler and FontForge libraries in specific 'well-known' locations.

An automatic build uses cmake to build all of Poppler, FontForge and pdf2htmlEX.

The definitive list of cmake build options can be found in the following scripts:

  1. buildFontforge
  2. buildPdf2htmlEX
  3. buildPoppler

Why such a complex build system?

The problem

To provide its full functionality, the pdf2htmlEX sources make direct use of source code and unexposed methods from both the Poppler and FontForge projects. Unfortunately, the source code in the Poppler and FontForge projects that pdf2htmlEX uses changes regularly.

This means that the pdf2htmlEX source code must be updated regularly to match specific releases of both Poppler and FontForge.

Unfortunately, the installed versions of both Poppler and FontForge in most Linux distributions lag the official releases of both of these projects. Even worse, few distributions install the same versions.

This means that it is nearly impossible for the pdf2htmlEX code to 'predict' which version of Poppler or FontForge will be installed on a given user's machine.

Our solution

While we could keep multiple versions of the pdf2htmlEX source code, each version matched to a particular distribution's installed versions of Poppler and FontForge, this would be a logistic and testing 'nightmare'.

Instead, when building pdf2htmlEX, we download specific versions of both the Poppler and FontForge sources (usually the most recent), and then compile static versions of the Poppler and FontForge libraries, which are then statically linked into the pdf2htmlEX binary.

This means that the pdf2htmlEX binary is completely independent of any locally installed versions of either Poppler or FontForge.
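One way to spot-check this independence on a built binary is to inspect its dynamic dependencies with `ldd`. The sketch below assumes the binary was installed at /usr/local/bin/pdf2htmlEX (the install location mentioned earlier) and that a shared Poppler or FontForge library would show up under a name containing `libpoppler` or `libfontforge`; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and succeed if it
# mentions a shared Poppler or FontForge library.
has_dynamic_poppler_or_fontforge() {
    grep -Eq 'libpoppler|libfontforge'
}

# Assumed install location (see "Building yourself").
binary=/usr/local/bin/pdf2htmlEX

if [ -x "$binary" ]; then
    if ldd "$binary" 2>/dev/null | has_dynamic_poppler_or_fontforge; then
        echo "dynamic Poppler/FontForge dependency found"
    else
        echo "no dynamic Poppler/FontForge dependency found"
    fi
else
    echo "$binary not found; build and install pdf2htmlEX first"
fi
```

A statically linked build is expected to report no dynamic Poppler/FontForge dependency; other shared libraries (for example the C library) may still be listed by `ldd`.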

However, to get the matched versions of Poppler and FontForge and then compile them statically, our build process becomes much more complex than a "simple" configure, make, make install cycle.

Hence there are a large number of shell scripts in the buildScripts directory, each of which automates one 'simple' step in the overall build process.
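The idea of chaining many single-purpose scripts can be sketched with a small driver that runs each step in order and stops at the first failure. This is an illustration only, not the project's actual top-level script: `run_steps` is a hypothetical name, and the order shown in the commented usage is an assumption based on the script names listed on this page (the real build runs further steps that are not listed here).

```shell
#!/bin/sh
# Hypothetical driver: run each given command in order and stop at the
# first one that fails, so later steps never run against a broken state.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        "$step" || { echo "step failed: $step" >&2; return 1; }
    done
}

# Assumed usage, from the repository root on a Debian based system:
# run_steps \
#     ./buildScripts/getBuildToolsApt \
#     ./buildScripts/getDevLibrariesApt \
#     ./buildScripts/buildPoppler \
#     ./buildScripts/buildFontforge \
#     ./buildScripts/buildPdf2htmlEX
```

Keeping each step in its own script means a failed step can be re-run on its own once the problem is fixed.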

When page images are stored as base64-encoded WebP instead of PNG, the size of the resulting HTML output is significantly reduced. If the WebP images are referenced as external files instead of being embedded as base64, the size is reduced by approximately a further 30% (base64 encoding adds roughly a third to the size of the data it encodes). Below, I'm sharing an example Bash script that converts the PNGs to WebP and embeds the base64-encoded WebP images into all pages.

    # Directory containing the bg*.png images and the .page files
    folder_path="/path/to/your/directory"

    # Loop through all bg*.png images in the directory
    for img in "$folder_path"/bg*.png; do
        # Extract the image filename without the .png extension
        img_name=$(basename "$img" .png)
        # Convert the .png image to .webp with quality 75, in the same directory
        convert "$img" -quality 75 "$folder_path/$img_name.webp"
    done

    # Loop through all .page files in the directory
    for file in "$folder_path"/*.page; do
        # Check that the file is a regular file (not a directory)
        if [[ -f "$file" ]]; then
            # Extract the image src from the .page file and replace .png with .webp
            x=$(grep -oP 'src="\K[^"]+' "$file" | sed 's/\.png$//') && x="$x.webp"
            # Encode the .webp image as base64 and save it to encode.txt
            base64 "$folder_path/$x" > "$folder_path/encode.txt"
            # Remove any newlines from the base64 output and save to a temporary file
            tr -d '\n' < "$folder_path/encode.txt" > "$folder_path/temp_base64.txt"
            # Update the .page file to use the .webp extension instead of .png
            sed -i 's/\(src="[^"]*\)\.png"/\1.webp"/g' "$file"
            # Replace the image src with the base64-encoded data URI for the .webp image
            awk -v x="$x" 'NR==FNR{base64=$0; next} {gsub(x, "data:image/webp;base64," base64)}1' \
                "$folder_path/temp_base64.txt" "$file" \
                > "$folder_path/temp.page" \
                && mv "$folder_path/temp.page" "$file"
        fi
    done
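
The externally referenced variant mentioned above can be sketched in the same way: instead of embedding base64 data, each .page file is only rewritten to point at the .webp file. This is a minimal sketch that assumes the .webp files have already been created next to the .page files; `use_external_webp` is a hypothetical helper name, and the placeholder directory must be replaced with a real one.

```shell
#!/bin/sh
# Hypothetical helper: rewrite every src="....png" in one .page file to
# src="....webp", leaving the image as an external file.
use_external_webp() {
    # Write to a temporary file and move it into place, which avoids
    # relying on the non-portable `sed -i`.
    sed 's/\(src="[^"]*\)\.png"/\1.webp"/g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Placeholder directory containing the .page and .webp files.
folder_path="/path/to/your/directory"

for file in "$folder_path"/*.page; do
    # Skip the unexpanded pattern when no .page files exist.
    if [ -f "$file" ]; then
        use_external_webp "$file"
    fi
done
```

With this variant the .webp files must be deployed alongside the pages, since nothing is embedded.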
