BobLd/tabula-sharp

Extract tables from PDF files (port of tabula-java)

License

MIT license

tabula-sharp

tabula-sharp is a library for extracting tables from PDF files. It is a port of tabula-java.

Supports netstandard2.0, net462, net471, net6.0, net8.0
No Java bindings

NuGet packages are available on the releases page and on www.nuget.org.

Differences with tabula-java

Uses PdfPig, and not PdfBox.
The coordinate system starts from the bottom left point of the page (going up), and not from the top left point (going down).
The NurminenDetectionAlgorithm is replaced by SimpleNurminenDetectionAlgorithm, because the original requires an image management library.
Table results might differ because of the way PdfPig builds the bounding boxes of Letters.
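The coordinate-system difference matters when reusing table areas that were defined for tabula-java. The following is a minimal sketch of the conversion, assuming the area is given as top/left/bottom/right distances from the top left corner, that the page's lower left corner sits at the origin, and that the page height is known in the same units. `CoordinateConversion` and `FromTopLeftArea` are hypothetical names for illustration, not part of tabula-sharp.

```csharp
using System;

public static class CoordinateConversion
{
    // Hypothetical helper (not part of tabula-sharp).
    // Input: an area measured from the top left corner, y growing downwards.
    // Output: the same area measured from the bottom left corner, y growing upwards.
    public static (double Left, double Bottom, double Right, double Top) FromTopLeftArea(
        double top, double left, double bottom, double right, double pageHeight)
    {
        // x coordinates are unchanged; y coordinates are flipped around the page height.
        return (left, pageHeight - bottom, right, pageHeight - top);
    }
}
```

For example, on a page 842 units high, an area with top = 100, left = 50, bottom = 300, right = 400 becomes left = 50, bottom = 542, right = 400, top = 742.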

Usage

Stream mode - BasicExtractionAlgorithm

    using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
    {
        PageArea page = ObjectExtractor.Extract(document, 1);

        // detect candidate table zones
        SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
        var regions = detector.Detect(page);

        IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
        IReadOnlyList<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
        var table = tables[0];
        var rows = table.Rows;
    }
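The snippet above indexes `regions[0]` and `tables[0]` directly, which fails when nothing is detected. A possible variant that guards against an empty result and walks every candidate region is sketched below. The `using` namespaces, the `Count` property on the detection result and on `Rows`, and the `StreamModeExample` wrapper are assumptions that are not confirmed by this README; "doc.pdf" is a placeholder path.

```csharp
using System;
using System.Collections.Generic;
using Tabula;             // assumed namespace for PageArea, ObjectExtractor, Table
using Tabula.Detectors;   // assumed namespace for SimpleNurminenDetectionAlgorithm
using Tabula.Extractors;  // assumed namespace for the extraction algorithms
using UglyToad.PdfPig;    // PdfPig: PdfDocument, ParsingOptions

public static class StreamModeExample
{
    public static void Main()
    {
        using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
        {
            PageArea page = ObjectExtractor.Extract(document, 1);

            // detect candidate table zones
            var detector = new SimpleNurminenDetectionAlgorithm();
            var regions = detector.Detect(page);

            // assumption: the detection result exposes Count (list-like)
            if (regions.Count == 0)
            {
                Console.WriteLine("No candidate table regions detected on page 1.");
                return;
            }

            IExtractionAlgorithm ea = new BasicExtractionAlgorithm();

            // extract from every candidate area instead of only the first one
            foreach (var region in regions)
            {
                IReadOnlyList<Table> tables = ea.Extract(page.GetArea(region.BoundingBox));
                foreach (Table table in tables)
                {
                    // assumption: Rows is a list-like collection with Count
                    Console.WriteLine($"Table with {table.Rows.Count} rows");
                }
            }
        }
    }
}
```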

Lattice mode - SpreadsheetExtractionAlgorithm

    using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
    {
        PageArea page = ObjectExtractor.Extract(document, 1);

        IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
        IReadOnlyList<Table> tables = ea.Extract(page);
        var table = tables[0];
        var rows = table.Rows;
    }
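To go from `table.Rows` to readable output across a whole document, something like the following sketch may work. It assumes that each row is an enumerable of cells exposing a `GetText()` method and that the `using` namespaces are as shown; neither is confirmed by this README. `NumberOfPages` is the PdfPig page count, and `LatticeModeExample` and "doc.pdf" are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Tabula;             // assumed namespace for PageArea, ObjectExtractor, Table
using Tabula.Extractors;  // assumed namespace for the extraction algorithms
using UglyToad.PdfPig;    // PdfPig: PdfDocument, ParsingOptions

public static class LatticeModeExample
{
    public static void Main()
    {
        using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
        {
            IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();

            // page numbers are 1-based, matching ObjectExtractor.Extract(document, 1) above
            for (int pageNumber = 1; pageNumber <= document.NumberOfPages; pageNumber++)
            {
                PageArea page = ObjectExtractor.Extract(document, pageNumber);
                IReadOnlyList<Table> tables = ea.Extract(page);

                foreach (Table table in tables)
                {
                    foreach (var row in table.Rows)
                    {
                        // assumption: each cell exposes GetText()
                        Console.WriteLine(string.Join(" | ", row.Select(cell => cell.GetText())));
                    }
                    Console.WriteLine(); // blank line between tables
                }
            }
        }
    }
}
```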

Results

Stream mode - BasicExtractionAlgorithm

Lattice mode - SpreadsheetExtractionAlgorithm

Releases

10 releases. Latest: v0.1.5 (Mar 17, 2025).
