This sample project provides a preview of the PDF Extract API. Using the sample project and this documentation, you can integrate the PDF Extract API into your own server-side code.
adobe/pdftools-extract-java-sdk-samples
This sample project helps you get started with the PDFTools Extract SDK.
The sample classes illustrate how to perform PDF-related extraction (extracting the content of a PDF in a user-friendly structured format) using the SDK.
The sample application has the following requirements:
- Java JDK: Version 8 or above.
- Build Tool: The application requires Maven to be installed. Maven installation instructions can be found on the Apache Maven website.
The API credentials file and the corresponding private key file for the samples are pdftools-api-credentials.json and private.key, respectively. Before the samples can be run, replace both files with the ones in the zip file received via the Beta Program Access workflow.
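Because the samples cannot authenticate without both files, a quick presence check before running them can save a confusing error. The class below is a hypothetical helper (not part of the SDK or the samples) that reports which of the two file names above are missing from a directory. It only checks that the files exist, not that they are the replaced copies from the Beta Program Access zip, and which directory to check is left to the caller as an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical preflight helper (not part of the SDK or the samples).
 * Reports which of the credential files named in this README are missing.
 */
final class CredentialPreflight {
    // File names as given in this README.
    static final List<String> REQUIRED_FILES =
            Arrays.asList("pdftools-api-credentials.json", "private.key");

    /** Returns the required file names that are not regular files in {@code dir}. */
    static List<String> missingFiles(Path dir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_FILES) {
            // Only presence is checked; the file contents are not validated.
            if (!Files.isRegularFile(dir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

For example, `CredentialPreflight.missingFiles(Paths.get("."))` returns an empty list once both files are present in the current directory.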
The SDK also supports providing the authentication credentials at runtime, without storing them in a config file. To learn more, see the ExtractTextInfoFromPDFWithInMemoryAuthCredentials sample below.
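As a minimal sketch of the runtime approach, the class below gathers credential values from a key-value source such as `System.getenv()` and fails fast if any value is absent or blank. The environment variable names and the set of values (client ID, client secret, organization ID, account ID, private key) are assumptions for illustration. The sketch stops short of calling the SDK, so check ExtractTextInfoFromPDFWithInMemoryAuthCredentials.java for how the values are actually passed to it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical holder for credentials fetched at runtime (for example, from
 * environment variables populated by a secret store). The key names are
 * assumptions; the SDK's own credentials API is not shown here.
 */
final class RuntimeCredentials {
    // Hypothetical variable names, one per credential value.
    static final List<String> KEYS = Arrays.asList(
            "PDF_TOOLS_CLIENT_ID",
            "PDF_TOOLS_CLIENT_SECRET",
            "PDF_TOOLS_ORGANIZATION_ID",
            "PDF_TOOLS_ACCOUNT_ID",
            "PDF_TOOLS_PRIVATE_KEY");

    private final Map<String, String> values;

    private RuntimeCredentials(Map<String, String> values) {
        this.values = Collections.unmodifiableMap(values);
    }

    /** Reads every key from {@code source}; throws if any value is missing or blank. */
    static RuntimeCredentials from(Map<String, String> source) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = source.get(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing credential value: " + key);
            }
            values.put(key, value);
        }
        return new RuntimeCredentials(values);
    }

    /** Returns the value stored under {@code key}, or null if the key is unknown. */
    String get(String key) {
        return values.get(key);
    }

    int size() {
        return values.size();
    }
}
```

Usage: `RuntimeCredentials.from(System.getenv())`, or pass a map filled from your secret server instead. Failing at startup keeps a missing secret from surfacing later as an authentication error in the middle of an operation.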
Run the following command to build the project:

mvn clean install

Note that the PDFTools Extract SDK is listed as a dependency in pom.xml and will be downloaded automatically.
For logging, this SDK uses the slf4j API with a log4j2-slf4j binding. The logging configurations are provided in src/main/resources/log4j2.properties. Alternate bindings, if required, can be specified in pom.xml.
The output of the SDK extract operation is a zip package, which contains the following:
- The structuredData.json file with the extracted content and PDF element structure. See the JSON schema.
- One or more renditions folders containing renditions for each element type selected as input. The folder name is either “tables” or “figures”, depending on the specified element type. Each folder contains renditions with file names that correspond to the element information in the JSON file.
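To check what an extract operation produced, the zip can be inspected with the JDK's java.util.zip package alone. This sketch groups file entries by their top-level folder, so structuredData.json lands under the root (empty-string) key and renditions under "tables" or "figures". The class name is hypothetical, and the sketch relies on entry names using '/' as the separator, which the zip format requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper for inspecting an extract-operation output zip. */
final class ExtractOutputInspector {
    /**
     * Groups file entries by top-level folder. Files at the zip root
     * (such as structuredData.json) are grouped under the empty string.
     * The given stream is closed when reading finishes.
     */
    static Map<String, List<String>> groupEntries(InputStream zipStream) throws IOException {
        Map<String, List<String>> groups = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // folder markers carry no content
                }
                String name = entry.getName();
                int slash = name.indexOf('/');
                String folder = slash < 0 ? "" : name.substring(0, slash);
                groups.computeIfAbsent(folder, k -> new ArrayList<>()).add(name);
            }
        }
        return groups;
    }
}
```

Usage: open one of the zips written to the output folder with `Files.newInputStream(path)` and pass the stream to `groupEntries`, then print or iterate over the resulting map.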
The following sub-sections describe how to run the samples. Before running them, check that the credentials file is set up as described above and that the project has been built.
The code itself is in the com.adobe.platform.operation.samples.extractpdf package under the src/main/java/ folder. Test files used by the samples can be found in src/main/resources/. When executed, all samples create an output child folder under the working directory to store their results.
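If you adapt a sample into your own code, the same output convention can be reproduced with java.nio.file. The helper below is a hypothetical sketch, not code taken from the samples: it resolves a file name inside an output child folder of a given working directory and creates that folder if it does not exist yet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper mirroring the samples' "output" folder convention. */
final class OutputPaths {
    /**
     * Returns workingDir/output/fileName, creating the output folder first
     * if needed. An existing output folder is left untouched.
     */
    static Path prepareOutputFile(Path workingDir, String fileName) throws IOException {
        Path outputDir = workingDir.resolve("output");
        Files.createDirectories(outputDir); // succeeds silently if the folder already exists
        return outputDir.resolve(fileName);
    }
}
```

Passing `Paths.get(System.getProperty("user.dir"))` as the working directory places results where the samples put theirs.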
These samples illustrate how to extract PDF elements from a PDF. Refer to the documentation of ExtractPDFOperation.java for the list of inputs.
The sample class ExtractTextInfoFromPDF.java extracts text elements from a PDF document.
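Every sample below is launched with the same Maven exec:java invocation; only the -Dexec.mainClass value changes. As a small illustration, this hypothetical helper assembles that argument list from a sample's simple class name, using the package named above. It does not run Maven itself, although the list could be handed to ProcessBuilder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the run command used throughout this README. */
final class SampleCommand {
    // Package that holds every sample class, as stated in this README.
    static final String SAMPLE_PACKAGE = "com.adobe.platform.operation.samples.extractpdf";

    /** Builds the argument list for running one sample via the Maven exec plugin. */
    static List<String> forSample(String simpleClassName) {
        List<String> cmd = new ArrayList<>(Arrays.asList("mvn", "-f", "pom.xml", "exec:java"));
        cmd.add("-Dexec.mainClass=" + SAMPLE_PACKAGE + "." + simpleClassName);
        return cmd;
    }

    public static void main(String[] args) {
        // Prints the same command shown for ExtractTextInfoFromPDF below.
        System.out.println(String.join(" ", forSample("ExtractTextInfoFromPDF")));
    }
}
```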
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoFromPDF

The sample class ExtractTextTableInfoFromPDF extracts text and table elements from a PDF document.
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoFromPDF

The sample class ExtractTextTableInfoWithRenditionsFromPDF extracts text and table elements, along with table renditions, from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described above.
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithRenditionsFromPDF

The sample class ExtractTextTableInfoWithFiguresTablesRenditionsFromPDF extracts text and table elements, along with figure and table renditions, from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described above.
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithFiguresTablesRenditionsFromPDF

The sample class ExtractTextInfoFromPDFWithInMemoryAuthCredentials.java extracts text elements from a PDF document. This sample highlights how to provide in-memory auth credentials for performing an operation. This enables clients to fetch the credentials from a secret server at runtime instead of storing them in a file.
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoFromPDFWithInMemoryAuthCredentials

The sample class ExtractTextInfoWithCharBoundsFromPDF extracts text elements and the bounding boxes of the characters in text blocks. Note that the output is a zip containing the structured information along with renditions, as described above.
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoWithCharBoundsFromPDF

Extract Text, Table Elements and Bounding Boxes for Characters Present in Text Blocks with Renditions of Table Elements
The sample class ExtractTextTableInfoWithCharBoundsFromPDF extracts text and table elements, the bounding boxes of characters in text blocks, and table renditions from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described above.
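If you need one box for a run of characters (for example, a word) rather than one box per character, the character boxes can be merged. This sketch assumes each character box is four numbers in the order left, bottom, right, top; that ordering is an assumption to confirm against the JSON schema. Under it, the enclosing box is simply the per-coordinate minimum of the first two values and maximum of the last two. The class name is hypothetical.

```java
/** Hypothetical helper: merges per-character boxes into one enclosing box. */
final class BoundsUnion {
    /**
     * Each input box is assumed to be {left, bottom, right, top}.
     * Returns the smallest box in the same layout that contains all of them.
     */
    static double[] enclosing(double[][] boxes) {
        if (boxes.length == 0) {
            throw new IllegalArgumentException("need at least one box");
        }
        double left = Double.POSITIVE_INFINITY;
        double bottom = Double.POSITIVE_INFINITY;
        double right = Double.NEGATIVE_INFINITY;
        double top = Double.NEGATIVE_INFINITY;
        for (double[] b : boxes) {
            if (b.length != 4) {
                throw new IllegalArgumentException("each box must have 4 values");
            }
            left = Math.min(left, b[0]);
            bottom = Math.min(bottom, b[1]);
            right = Math.max(right, b[2]);
            top = Math.max(top, b[3]);
        }
        return new double[] {left, bottom, right, top};
    }
}
```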
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithCharBoundsFromPDF

The sample class ExtractTextTableInfoWithTableStructureFromPdf extracts text and table elements, table structures as CSV, and table renditions from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described above.
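To work with the CSV table structures in Java without an extra dependency, a small reader can turn a CSV file's text into rows of cells. The sketch below assumes standard comma-separated values with double-quote escaping (RFC 4180 style) and defensively skips a leading byte-order mark. The class name is hypothetical, and a maintained CSV library is the sturdier choice for production use.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal CSV reader: quoted fields, "" escapes, and LF or CRLF line endings. */
final class SimpleCsv {
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = text.startsWith("\uFEFF") ? 1 : 0; // skip a byte-order mark if present
        for (; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // commas and line breaks are literal inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as a single line break
                }
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else {
                field.append(c);
            }
        }
        // Flush the last row when the text does not end with a line break.
        if (field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }
}
```

Usage: read a CSV file from the tables folder into a string (for example with `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)`, assuming UTF-8) and pass it to `SimpleCsv.parse`.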
mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithTableStructureFromPdf

Contributions are welcome! Read the Contributing Guide for more information.
This project is licensed under the MIT License. See LICENSE for more information.