smalot/pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.

License

LGPL-3.0 license

PDF parser

The smalot/pdfparser library is a standalone PHP package that provides various tools to extract data from PDF files.

This library is under active maintenance. There is no active development by the author of this library at the moment, but we welcome any pull request that adds or extends functionality! See CONTRIBUTING.md for further information about how to contribute.

Features

Load/parse objects and headers
Extract metadata (author, description, ...)
Extract text from ordered pages
Support for compressed PDFs
Support for Mac OS Roman charset encoding
Handling of hexadecimal and octal encoding in text sections
Create custom configurations (see CustomConfig.md)

Currently, secured documents are not supported, and form data cannot be extracted.
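The hexadecimal and octal handling listed under Features concerns how PDF text strings may be written. The following is an independent sketch in PHP, not the library's own code; `decodePdfHexString` and `decodePdfLiteralEscapes` are hypothetical names. It follows the PDF string rules: a hex string such as `<48656C6C6F>` may contain whitespace and is treated as if a trailing 0 followed when it has an odd number of digits, and a literal string may contain `\ddd` escapes of one to three octal digits alongside escapes such as `\n`, `\(` and `\\`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decodes the body of a PDF hex string, e.g. the
 * "48656C6C6F" inside <48656C6C6F>.
 */
function decodePdfHexString(string $hex): string
{
    // Whitespace between hex digits is ignored.
    $hex = (string) preg_replace('/\s+/', '', $hex);
    if (preg_match('/^[0-9A-Fa-f]*$/', $hex) !== 1) {
        throw new InvalidArgumentException('Invalid hexadecimal string body');
    }
    // An odd number of digits is treated as if a final 0 followed.
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';
    }
    return (string) hex2bin($hex);
}

/**
 * Hypothetical helper: resolves backslash escapes in the body of a PDF
 * literal string, including \ddd octal escapes of one to three digits.
 */
function decodePdfLiteralEscapes(string $literal): string
{
    $simple = [
        'n' => "\n", 'r' => "\r", 't' => "\t", 'b' => "\x08", 'f' => "\x0C",
        '(' => '(', ')' => ')', '\\' => '\\',
        // A backslash before an end-of-line continues the string on the next line.
        "\r\n" => '', "\n" => '', "\r" => '',
    ];

    return (string) preg_replace_callback(
        '/\\\\([0-7]{1,3}|\r\n|.)/s',
        static function (array $m) use ($simple): string {
            $escape = $m[1];
            if (preg_match('/^[0-7]+$/', $escape) === 1) {
                // Octal character code; bits beyond one byte are discarded.
                return chr(octdec($escape) & 0xFF);
            }
            // Unknown escapes drop the backslash and keep the character.
            return $simple[$escape] ?? $escape;
        },
        $literal
    );
}
```

With these helpers, `decodePdfHexString('48656C6C6F')` and `decodePdfLiteralEscapes('\110ello')` both yield `Hello`. Real content streams additionally map the decoded bytes through the font's encoding, which this sketch does not attempt.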

License

This library is released under the LGPLv3 license.

Install

This library requires PHP 7.1+ since v1. You can install it via Composer:

```shell
composer require smalot/pdfparser
```

In case you can't use Composer, you can include alt_autoload.php-dist. It will include all required files automatically.

Quick example

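```php
<?php

// Load Composer's autoloader (adjust the path to match your project layout).
require_once __DIR__ . '/vendor/autoload.php';

// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('/path/to/document.pdf');

// Extract the text of all pages and print it.
$text = $pdf->getText();
echo $text;
```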
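Building on the quick example, the following sketch collects the metadata and the per-page text mentioned under Features. It assumes the library is already autoloaded (for example through Composer's `vendor/autoload.php`) and relies on the document's `getDetails()` and `getPages()` methods and each page's `getText()`. `extractPdfSummary` is a hypothetical helper name, and the `new Parser([], $config)` constructor form and the `setRetainImageContent()` option are used as we understand them from CustomConfig.md, so check that file for your version.

```php
<?php

use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

/**
 * Hypothetical helper: returns a PDF's metadata and the text of each page.
 * Assumes the library is already autoloaded (e.g. via Composer).
 */
function extractPdfSummary(string $path, ?Config $config = null): array
{
    // Without a Config, the parser uses its default settings.
    $parser = new Parser([], $config);
    $pdf = $parser->parseFile($path);

    $pages = [];
    foreach ($pdf->getPages() as $page) {
        // One entry per page, in the order getPages() returns them.
        $pages[] = $page->getText();
    }

    return [
        // Metadata such as Author, Title or Producer, when present.
        'details' => $pdf->getDetails(),
        'pages' => $pages,
    ];
}

// Usage (needs the library and a real PDF file):
// $config = new Config();
// $config->setRetainImageContent(false); // assumed option, see CustomConfig.md
// $summary = extractPdfSummary('/path/to/document.pdf', $config);
// echo $summary['details']['Title'] ?? '(untitled)';
```

Returning plain arrays keeps the helper independent of how the result is displayed; the text of the first page, for instance, ends up in `$summary['pages'][0]`.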