StevieCode/pythoncode-tutorials (Public)

forked from x4nth055/pythoncode-tutorials

Commit e696905

update pdf tables extractor tutorial

1 parent 4c2a0e8, commit e696905

7 files changed, with 33 additions and 8 deletions

`‎general/pdf-table-extractor/1710.05006.pdf`

5.09 MB

Binary file not shown.

`‎general/pdf-table-extractor/README.md`

Lines changed: 4 additions & 6 deletions

`@@ -1,8 +1,6 @@`
`# [How to Extract PDF Tables in Python](https://www.thepythoncode.com/article/extract-pdf-tables-in-python-camelot)`
`To run this:`
`-- You need to install required dependencies for the library [here](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#install-deps).`
-- `pip3 install -r requirements.txt`
-- Extract PDFs of the file `foo.pdf`:
`` -``` ``
`- python pdf_table_extractor.py foo.pdf`
`` - ``` ``
`+- You need to install required dependencies for the camelot library [here](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#install-deps).`
+- `pip3 install -r requirements.txt`.
+- `pdf_table_extractor_camelot.py` is using camelot library.
+- `pdf_table_extractor_tabula.py` is using tabula-py library.

`general/pdf-table-extractor/pdf_table_extractor.py` renamed to `general/pdf-table-extractor/pdf_table_extractor_camelot.py`

Lines changed: 3 additions & 1 deletion

`@@ -13,8 +13,10 @@`
`# print the first table as Pandas DataFrame`
`print(tables[0].df)`

`-# export individually`
`+# export individually as CSV`
`tables[0].to_csv("foo.csv")`
`+# export individually as Excel (.xlsx extension)`
`+tables[0].to_excel("foo.xlsx")`

`# or export all in a zip`
`tables.export("foo.csv", f="csv", compress=True)`
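
For readers following the camelot hunk above, here is a minimal sketch of the same export steps wrapped in a function. It is not part of the commit: `export_targets` and `export_with_camelot` are hypothetical helper names, and the output names are simply derived from the PDF's file name. Note that `camelot.read_pdf` parses only page 1 unless a `pages` argument is given.

```python
from pathlib import Path


def export_targets(pdf_path):
    """Derive CSV and Excel file names from the PDF's name (hypothetical helper)."""
    stem = Path(pdf_path).stem
    return {"csv": f"{stem}.csv", "excel": f"{stem}.xlsx"}


def export_with_camelot(pdf_path):
    """Export the first table as CSV and Excel, then all tables as zipped CSVs."""
    # Imported lazily so export_targets() can be used without camelot installed.
    import camelot

    targets = export_targets(pdf_path)
    tables = camelot.read_pdf(pdf_path)  # default: first page only
    if len(tables) == 0:
        # No table was detected, so there is nothing to export.
        return None
    tables[0].to_csv(targets["csv"])
    tables[0].to_excel(targets["excel"])
    # With compress=True the per-table CSV files are bundled into a zip archive.
    tables.export(targets["csv"], f="csv", compress=True)
    return targets
```

Calling `export_with_camelot("foo.pdf")` would mirror the `foo.csv` / `foo.xlsx` names used in the script, assuming camelot and its dependencies are installed.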

`‎general/pdf-table-extractor/pdf_table_extractor_tabula.py`

Lines changed: 24 additions & 0 deletions

`@@ -0,0 +1,24 @@`
`+import tabula`
`+import os`
`+# uncomment if you want to pass pdf file from command line arguments`
`+# import sys`
`+`
`+# read PDF file`
`+# uncomment if you want to pass pdf file from command line arguments`
`+# tables = tabula.read_pdf(sys.argv[1], pages="all")`
`+tables = tabula.read_pdf("1710.05006.pdf", pages="all")`
`+`
`+# save them in a folder`
`+folder_name = "tables"`
`+if not os.path.isdir(folder_name):`
`+    os.mkdir(folder_name)`
`+# iterate over extracted tables and export as excel individually`
`+for i, table in enumerate(tables, start=1):`
`+    table.to_excel(os.path.join(folder_name, f"table_{i}.xlsx"), index=False)`
`+`
`+# convert all tables of a PDF file into a single CSV file`
`+# supported output_formats are "csv", "json" or "tsv"`
`+tabula.convert_into("1710.05006.pdf", "output.csv", output_format="csv", pages="all")`
`+# convert all PDFs in a folder into CSV format`
+# `pdfs` folder should exist in the current directory
`+tabula.convert_into_by_batch("pdfs", output_format="csv", pages="all")`
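
The new tabula script hardcodes the PDF name and leaves the command-line variant commented out. One possible way to support both, sketched here under assumptions rather than taken from the commit: `resolve_pdf_path`, `export_tables`, and `main` are hypothetical names, and `os.makedirs(..., exist_ok=True)` stands in for the `isdir`/`mkdir` pair.

```python
import os
import sys


def resolve_pdf_path(argv, default="1710.05006.pdf"):
    """Return the PDF path given on the command line, else the script's default."""
    return argv[1] if len(argv) > 1 else default


def export_tables(tables, folder_name="tables"):
    """Write each table to <folder_name>/table_<i>.xlsx and return the paths."""
    # Creates the folder if needed and does nothing if it already exists.
    os.makedirs(folder_name, exist_ok=True)
    paths = []
    for i, table in enumerate(tables, start=1):
        path = os.path.join(folder_name, f"table_{i}.xlsx")
        table.to_excel(path, index=False)
        paths.append(path)
    return paths


def main(argv):
    """Read every page of the PDF with tabula and export each table to Excel."""
    # Imported lazily; needs tabula-py and a Java runtime to actually run.
    import tabula

    tables = tabula.read_pdf(resolve_pdf_path(argv), pages="all")
    return export_tables(tables)
```

Running `main(sys.argv)` from a script entry point would then accept `python pdf_table_extractor_tabula.py foo.pdf` and fall back to `1710.05006.pdf` when no argument is passed.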

`‎general/pdf-table-extractor/pdfs/1710.05006.pdf`

5.09 MB

Binary file not shown.

`‎general/pdf-table-extractor/pdfs/foo.pdf`

82.2 KB

Binary file not shown.

`‎general/pdf-table-extractor/requirements.txt`

Lines changed: 2 additions & 1 deletion

`@@ -1 +1,2 @@`
`-camelot-py[cv]`
`+camelot-py[cv]`
`+tabula-py`
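
Since this hunk adds a second dependency, a quick environment check can be useful before running either script. The sketch below is an illustration only: the mapping from pip names to import names (`camelot`, `tabula`) is stated as an assumption in the code, and `missing_requirements` and `java_available` are hypothetical helpers. The Java check is included because tabula-py wraps tabula-java.

```python
import importlib.util
import shutil

# Assumed mapping from the pip names in requirements.txt to their import names.
REQUIRED_MODULES = {"camelot-py[cv]": "camelot", "tabula-py": "tabula"}


def missing_requirements(required=REQUIRED_MODULES):
    """Return the pip names whose modules cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def java_available():
    """Report whether a `java` executable is on PATH (needed by tabula-py)."""
    return shutil.which("java") is not None
```

An empty list from `missing_requirements()` together with `java_available()` returning `True` suggests both scripts have what they need, although it does not verify camelot's own system dependencies.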

0 commit comments

