adobe/pdftools-extract-java-sdk-samples


This sample project provides a preview of the PDF Extract API. Using the sample project and this documentation, you will easily be able to integrate the PDF Extract API in your own server-side code.

opensource.adobe.com/pdftools-sdk-docs/beta/extract/


Samples for the Adobe PDFTools Extract Java SDK

This sample project helps you get started with the PDFTools Extract SDK.

The sample classes illustrate how to extract the content of a PDF in a user-friendly structured format using the SDK.

Prerequisites

The sample application has the following requirements:

Java JDK: Version 8 or above.
Build tool: Maven must be installed. Maven installation instructions can be found here.

Authentication Setup

The API credentials file and the corresponding private key file for the samples are pdftools-api-credentials.json and private.key, respectively. Before running the samples, replace both files with the ones in the zip file received via the Beta Program Access workflow.

The SDK also supports providing the authentication credentials at runtime, without storing them in a config file. Refer to this section to learn more.

Build with maven

Run the following command to build the project:

mvn clean install

Note that the PDFTools Extract SDK is listed as a dependency in the pom.xml and will be downloaded automatically.

A Note on Logging

For logging, this SDK uses the slf4j API with a log4j2-slf4j binding. The logging configuration is provided in src/main/resources/log4j2.properties. Alternate bindings, if required, can be specified in pom.xml.

Structured Information Output Format

The output of an SDK extract operation is a zip package. The zip package consists of the following:

The structuredData.json file with the extracted content and PDF element structure. See the JSON schema.
One or more renditions folders containing renditions for each element type selected as input. The folder name is either “tables” or “figures”, depending on the specified element type. Each folder contains renditions with filenames that correspond to the element information in the JSON file.
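
The layout described above can be inspected with the JDK's own zip support. The following is a minimal sketch, not part of the SDK: the class name `ExtractOutputInspector` and the zip path passed on the command line are hypothetical, and the grouping relies only on the entry names mentioned above (structuredData.json, and the “tables” and “figures” folders).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not part of the SDK) that groups the entries of an extract result zip. */
final class ExtractOutputInspector {

    /** Sorts file entries into "structuredData", "tables", "figures" and "other". */
    static Map<String, List<String>> groupEntries(InputStream zipStream) throws IOException {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String key : new String[] {"structuredData", "tables", "figures", "other"}) {
            groups.put(key, new ArrayList<>());
        }
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only files are of interest
                }
                String name = entry.getName();
                if (name.equals("structuredData.json")) {
                    groups.get("structuredData").add(name);
                } else if (name.startsWith("tables/")) {
                    groups.get("tables").add(name);
                } else if (name.startsWith("figures/")) {
                    groups.get("figures").add(name);
                } else {
                    groups.get("other").add(name);
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ExtractOutputInspector <path-to-result-zip>");
            return;
        }
        // The path is supplied by the caller; the samples' actual output file names are not assumed here.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            groupEntries(in).forEach((group, names) -> System.out.println(group + ": " + names));
        }
    }
}
```

Reading the archive as a stream keeps the sketch usable on a zip held in memory as well as on a file in the output folder.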

Running the samples

The following sub-sections describe how to run the samples. Before running the samples, check that the credentials file is set up as described above and that the project has been built.

The code itself is in the com.adobe.platform.operation.samples.extractpdf package under the src/main/java/ folder. Test files used by the samples can be found in src/main/resources/. When executed, all samples create an output child folder under the working directory to store their results.

Extract PDF Elements from PDF Document

These samples illustrate how to extract PDF elements from a PDF document. Refer to the documentation of ExtractPDFOperation.java to see the list of inputs.

Extract Text Elements

The sample class ExtractTextInfoFromPDF.java extracts text elements from a PDF document.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoFromPDF

Extract Text, Table Elements

The sample class ExtractTextTableInfoFromPDF extracts text and table elements from a PDF document.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoFromPDF

Extract Text, Table Elements with Renditions of Table Elements

The sample class ExtractTextTableInfoWithRenditionsFromPDF extracts text and table elements, along with table renditions, from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithRenditionsFromPDF

Extract Text, Table Elements with Renditions of Figure, Table Elements

The sample class ExtractTextTableInfoWithFiguresTablesRenditionsFromPDF extracts text and table elements, along with renditions of figure and table elements, from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithFiguresTablesRenditionsFromPDF

Extract Text Elements (By providing in-memory Authentication credentials)

The sample class ExtractTextInfoFromPDFWithInMemoryAuthCredentials.java extracts text elements from a PDF document. This sample shows how to provide in-memory authentication credentials for an operation, which lets clients fetch the credentials from a secret server at runtime instead of storing them in a file.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoFromPDFWithInMemoryAuthCredentials
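
As an illustration of the "fetch at runtime" step only, the sketch below gathers credential values from environment variables and fails early when one is missing. Everything in it is an assumption made for illustration: the class name `RuntimeCredentialLoader`, the environment variable names, and the set of five values are hypothetical and are not taken from the SDK. How the values are handed to the SDK is shown in the sample class itself and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the SDK) that collects credential values at runtime. */
final class RuntimeCredentialLoader {

    // Assumed variable names, chosen for this sketch only.
    static final String[] REQUIRED_KEYS = {
        "PDFTOOLS_CLIENT_ID",
        "PDFTOOLS_CLIENT_SECRET",
        "PDFTOOLS_ORGANIZATION_ID",
        "PDFTOOLS_ACCOUNT_ID",
        "PDFTOOLS_PRIVATE_KEY"
    };

    /** Returns the required values from the given source, or throws if any is absent or blank. */
    static Map<String, String> load(Map<String, String> source) {
        Map<String, String> values = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = source.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            } else {
                values.put(key, value);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing credential values: " + missing);
        }
        return values;
    }

    public static void main(String[] args) {
        try {
            Map<String, String> credentials = load(System.getenv());
            // Print only the names, never the secret values.
            System.out.println("Loaded credential values for: " + credentials.keySet());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Taking the source as a `Map` keeps the lookup independent of where the values come from, so the same check could sit in front of a secret server client instead of `System.getenv()`.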

Extract Text Elements and bounding boxes for Characters present in text blocks

The sample class ExtractTextInfoWithCharBoundsFromPDF extracts text elements and bounding boxes for the characters present in text blocks. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoWithCharBoundsFromPDF

Extract Text, Table Elements and bounding boxes for Characters present in text blocks with Renditions of Table Elements

The sample class ExtractTextTableInfoWithCharBoundsFromPDF extracts text and table elements, bounding boxes for the characters present in text blocks, and renditions of table elements from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithCharBoundsFromPDF

Extract Text, Table Elements with Renditions and CSVs of Table Elements

The sample class ExtractTextTableInfoWithTableStructureFromPdf extracts text and table elements, table structures as CSV, and renditions of table elements from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithTableStructureFromPdf
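
All of the commands in this section follow the same pattern and differ only in the sample class name. The sketch below shows one possible way to build and launch that command from Java. The class `SampleLauncher` is hypothetical and not part of this repository; it assumes that `mvn` is on the PATH (on Windows the executable may need to be `mvn.cmd`) and that it is started from the project root, next to pom.xml.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher (not part of this repository) for the sample classes. */
final class SampleLauncher {

    static final String SAMPLE_PACKAGE = "com.adobe.platform.operation.samples.extractpdf";

    /** Builds the Maven command line for a sample class given by its simple name. */
    static List<String> mavenCommand(String sampleClass) {
        if (sampleClass == null || sampleClass.isEmpty()
                || !Character.isJavaIdentifierStart(sampleClass.charAt(0))) {
            throw new IllegalArgumentException("Not a valid class name: " + sampleClass);
        }
        for (char c : sampleClass.toCharArray()) {
            if (!Character.isJavaIdentifierPart(c)) {
                throw new IllegalArgumentException("Not a valid class name: " + sampleClass);
            }
        }
        return Arrays.asList("mvn", "-f", "pom.xml", "exec:java",
                "-Dexec.mainClass=" + SAMPLE_PACKAGE + "." + sampleClass);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Defaults to the first sample when no class name is given.
        String sample = args.length > 0 ? args[0] : "ExtractTextInfoFromPDF";
        Process process = new ProcessBuilder(mavenCommand(sample)).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

For example, passing `ExtractTextTableInfoFromPDF` as the argument produces the same command line as the one shown for that sample.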

Contributing

Contributions are welcome! Read the Contributing Guide for more information.

Licensing

This project is licensed under the MIT License. See LICENSE for more information.
