JoshData/pdf-diff


A PDF comparison utility in Python.

License

CC0-1.0 license



pdf-diff

Finds differences between two PDF documents:

Compares the text layers of two PDF documents and outputs the bounding boxes of changed text in JSON.
Rasterizes the changed pages in the PDFs to a PNG and draws red outlines around changed text.
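
To make the first step concrete, here is a minimal sketch of diffing two word sequences and reporting the bounding boxes of the words that changed. It is not the tool's own code: the standard-library `difflib` is swapped in for whatever diff library pdf-diff uses, and the `changed_boxes` and `box` helpers, the `(word, box)` input shape, and the JSON field names (`side`, `text`, `page`, `x`, `y`, `width`, `height`) are assumptions made for illustration.

```python
import json
from difflib import SequenceMatcher


def changed_boxes(before, after):
    """Return the boxes of words that differ between two word lists.

    Each input is a list of (word, box) pairs. The box layout is a
    hypothetical dict, not the format pdf-diff actually emits.
    """
    matcher = SequenceMatcher(
        a=[word for word, _ in before],
        b=[word for word, _ in after],
        autojunk=False,
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged run of words, nothing to report
        # Words removed or replaced in the first document.
        changes.extend(
            {"side": "before", "text": word, **bbox} for word, bbox in before[i1:i2]
        )
        # Words inserted or replaced in the second document.
        changes.extend(
            {"side": "after", "text": word, **bbox} for word, bbox in after[j1:j2]
        )
    return changes


def box(page, x, y):
    """Build an illustrative bounding box with a fixed size."""
    return {"page": page, "x": x, "y": y, "width": 40, "height": 12}


before = [("The", box(1, 10, 10)), ("quick", box(1, 55, 10)), ("fox", box(1, 100, 10))]
after = [("The", box(1, 10, 10)), ("slow", box(1, 55, 10)), ("fox", box(1, 100, 10))]

print(json.dumps(changed_boxes(before, after), indent=2))
```

Only the replaced word is reported, once for each document, so a renderer would have enough information to outline the change on both pages.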

The script is written in Python 3 and relies on the pdftotext program.

Requirements

libxml2 >= 2.7.0, libxslt >= 1.1.23, poppler

Requirements installation for Ubuntu:

sudo apt-get install python3-lxml poppler-utils

Requirements installation for OS X:

brew install libxml2 libxslt poppler

Installation

From PyPI:

pip install pdf-diff

From source:

sudo python3 setup.py install

Running

Turn two PDFs into one large PNG image showing the differences:

pdf-diff before.pdf after.pdf > comparison_output.png
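
The same invocation can be driven from Python when the comparison is part of a larger script. The sketch below assumes only what the command above shows: `pdf-diff` is on the PATH, takes the two PDF paths as arguments, and writes the PNG to standard output. The `build_command` and `write_comparison` helper names are hypothetical.

```python
import subprocess
import sys


def build_command(before_pdf, after_pdf):
    """Return the argument list for the pdf-diff command line shown above."""
    return ["pdf-diff", before_pdf, after_pdf]


def write_comparison(before_pdf, after_pdf, output_png):
    """Run pdf-diff and save its standard output (PNG bytes) to output_png."""
    with open(output_png, "wb") as out:
        # check=True raises CalledProcessError if pdf-diff exits non-zero.
        subprocess.run(build_command(before_pdf, after_pdf), stdout=out, check=True)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python compare.py before.pdf after.pdf comparison_output.png
    write_comparison(*sys.argv[1:4])
```

If the command fails, an empty or partial output file may be left behind, so a caller may want to delete it when handling the exception.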

Maintainer Notes

To deploy:

python3 -m pip install --user --upgrade setuptools wheel twine
python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/*

Function flow diagram

compute_changes
│
├── serialize_pdf (called twice)
│    ├── pdf_to_bboxes
│    ├── mark_eol_hyphens
│    │    └── mark_eol_hyphen
│    └── Processes bounding boxes and text
│
├── perform_diff
│    └── Calls external `fast_diff_match_patch`
│
└── process_hunks
     ├── Iterates through diff hunks
     └── mark_difference (called multiple times)

render_changes
│
├── simplify_changes
├── make_pages_images
│    └── pdftopng (converts PDF pages to images)
├── realign_pages
│    ├── Splits pages into sub-pages
│    └── Adjusts box coordinates
├── draw_red_boxes
│    └── Annotates images with rectangles or lines
└── zealous_crop
     └── Crops the image to reduce unnecessary margins

stack_pages
│
└── Combines processed images into a final output

