Extract tables from PDF files (port of tabula-java)
BobLd/tabula-sharp
tabula-sharp is a library for extracting tables from PDF files. It is a port of tabula-java.
- Supports netstandard2.0, net462, net471, net6.0, net8.0
- No Java bindings
NuGet packages are available on the releases page and on www.nuget.org.
Differences from tabula-java:
- Uses PdfPig instead of PdfBox.
- The coordinate system has its origin at the bottom-left corner of the page (y increases upwards), not at the top-left corner (y increases downwards).
- The NurminenDetectionAlgorithm is replaced by SimpleNurminenDetectionAlgorithm, because the original requires an image management library.
- Table results might differ from tabula-java because of the way PdfPig builds letter bounding boxes.
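Because of the bottom-left origin, an area measured from the top of the page (as in tabula-java) has to be mirrored on the y axis before it is used here. The snippet below is a minimal sketch, assuming the area is given as top/left/bottom/right in PDF points and that the page height is known; `ToBottomLeftOrigin` is a hypothetical helper, not part of tabula-sharp or PdfPig.

```csharp
using System;

// Hypothetical helper (not a tabula-sharp or PdfPig API): converts an area given as
// (top, left, bottom, right) with the origin at the top-left corner and y growing
// downwards into (left, bottom, right, top) with the origin at the bottom-left
// corner and y growing upwards.
static (double Left, double Bottom, double Right, double Top) ToBottomLeftOrigin(
    double top, double left, double bottom, double right, double pageHeight)
{
    if (top > bottom)
        throw new ArgumentException("Expected top <= bottom when y grows downwards.");

    // Only the y axis is mirrored; x coordinates are unchanged.
    // The old bottom edge becomes the new lower y value, the old top edge the upper one.
    return (left, pageHeight - bottom, right, pageHeight - top);
}

// Example: a US Letter page is 792 points high.
var area = ToBottomLeftOrigin(top: 100, left: 50, bottom: 300, right: 400, pageHeight: 792);
Console.WriteLine(area); // (50, 492, 400, 692)
```

Only y changes: the edge that was "top" (smallest y when measuring downwards) ends up with the largest y value once the origin moves to the bottom of the page.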
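Stream mode, using BasicExtractionAlgorithm on the first table area found by SimpleNurminenDetectionAlgorithm:

```csharp
using System.Collections.Generic;
using Tabula;
using Tabula.Detectors;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    PageArea page = ObjectExtractor.Extract(document, 1);

    // detect candidate table zones
    SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
    var regions = detector.Detect(page);

    IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
    IReadOnlyList<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
    var table = tables[0];
    var rows = table.Rows;
}
```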
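Lattice mode, using SpreadsheetExtractionAlgorithm on the whole page:

```csharp
using System.Collections.Generic;
using Tabula;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    PageArea page = ObjectExtractor.Extract(document, 1);

    IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
    IReadOnlyList<Table> tables = ea.Extract(page);
    var table = tables[0];
    var rows = table.Rows;
}
```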