Commit e696905

update pdf tables extractor tutorial
1 parent 4c2a0e8 commit e696905

7 files changed: +33 -8 lines changed
5.09 MB
Binary file not shown.

‎general/pdf-table-extractor/README.md

Lines changed: 4 additions & 6 deletions
@@ -1,8 +1,6 @@
 # [How to Extract PDF Tables in Python](https://www.thepythoncode.com/article/extract-pdf-tables-in-python-camelot)
 To run this:
-- You need to install required dependencies for the library [here](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#install-deps).
-- `pip3 install -r requirements.txt`
-- Extract PDFs of the file `foo.pdf`:
-```
-python pdf_table_extractor.py foo.pdf
-```
+- You need to install required dependencies for the camelot library [here](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#install-deps).
+- `pip3 install -r requirements.txt`.
+- `pdf_table_extractor_camelot.py` is using camelot library.
+- `pdf_table_extractor_tabula.py` is using tabula-py library.

general/pdf-table-extractor/pdf_table_extractor.py renamed to general/pdf-table-extractor/pdf_table_extractor_camelot.py

Lines changed: 3 additions & 1 deletion
@@ -13,8 +13,10 @@
 # print the first table as Pandas DataFrame
 print(tables[0].df)
 
-# export individually
+# export individually as CSV
 tables[0].to_csv("foo.csv")
+# export individually as Excel (.xlsx extension)
+tables[0].to_excel("foo.xlsx")
 
 # or export all in a zip
 tables.export("foo.csv", f="csv", compress=True)
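
The hunk above exports only `tables[0]`. As a hedged illustration of applying the same `to_csv` / `to_excel` calls to every extracted table, here is a minimal sketch. The helper names `export_names` and `export_each_table` and the `foo_<n>` naming scheme are assumptions made for this example and are not part of the commit.

```python
def export_names(stem, index):
    """Return the (csv, xlsx) file names for the table at position `index`."""
    return f"{stem}_{index}.csv", f"{stem}_{index}.xlsx"


def export_each_table(pdf_path, stem="foo"):
    """Write every table camelot finds in `pdf_path` as both CSV and Excel."""
    # Imported lazily so that export_names() stays usable without camelot installed.
    import camelot

    # With no `pages` argument camelot only parses the first page.
    tables = camelot.read_pdf(pdf_path)
    for index, table in enumerate(tables, start=1):
        csv_name, xlsx_name = export_names(stem, index)
        table.to_csv(csv_name)     # same call the script uses for tables[0]
        table.to_excel(xlsx_name)  # same call this commit adds for tables[0]
    return len(tables)
```

Calling `export_each_table("foo.pdf")` would then write `foo_1.csv`, `foo_1.xlsx`, and so on, one pair per detected table.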
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+import tabula
+import os
+# uncomment if you want to pass pdf file from command line arguments
+# import sys
+
+# read PDF file
+# uncomment if you want to pass pdf file from command line arguments
+# tables = tabula.read_pdf(sys.argv[1], pages="all")
+tables = tabula.read_pdf("1710.05006.pdf", pages="all")
+
+# save them in a folder
+folder_name = "tables"
+if not os.path.isdir(folder_name):
+    os.mkdir(folder_name)
+# iterate over extracted tables and export as excel individually
+for i, table in enumerate(tables, start=1):
+    table.to_excel(os.path.join(folder_name, f"table_{i}.xlsx"), index=False)
+
+# convert all tables of a PDF file into a single CSV file
+# supported output_formats are "csv", "json" or "tsv"
+tabula.convert_into("1710.05006.pdf", "output.csv", output_format="csv", pages="all")
+# convert all PDFs in a folder into CSV format
+# `pdfs` folder should exist in the current directory
+tabula.convert_into_by_batch("pdfs", output_format="csv", pages="all")
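
The added script (presumably the `pdf_table_extractor_tabula.py` named in the README) hard-codes `1710.05006.pdf` and leaves the command-line variant commented out. A minimal sketch of that command-line variant is shown here, assuming the same `tabula.read_pdf` and `DataFrame.to_excel` calls as the diff. The names `build_parser`, `table_path`, `export_tables`, `main`, and the `--out-dir` option are hypothetical and do not appear in the commit.

```python
import argparse
import os


def build_parser():
    """Command-line interface: one positional PDF path, optional output folder."""
    parser = argparse.ArgumentParser(description="Export PDF tables to Excel files")
    parser.add_argument("pdf", help="path to the PDF file to read")
    parser.add_argument("--out-dir", default="tables", help="folder for the .xlsx files")
    return parser


def table_path(folder_name, index):
    """Mirror the table_{i}.xlsx naming used in the committed script."""
    return os.path.join(folder_name, f"table_{index}.xlsx")


def export_tables(pdf_path, folder_name):
    """Read all pages of the PDF and write each table to its own Excel file."""
    # Imported lazily so the helpers above work without tabula-py installed.
    import tabula

    tables = tabula.read_pdf(pdf_path, pages="all")
    # makedirs(exist_ok=True) replaces the isdir() check followed by mkdir()
    os.makedirs(folder_name, exist_ok=True)
    for i, table in enumerate(tables, start=1):
        table.to_excel(table_path(folder_name, i), index=False)
    return len(tables)


def main(argv=None):
    args = build_parser().parse_args(argv)
    count = export_tables(args.pdf, args.out_dir)
    print(f"exported {count} table(s) to {args.out_dir}")
```

For example, `main(["1710.05006.pdf"])` would write the extracted tables into the default `tables` folder, matching the behaviour of the committed script.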
5.09 MB
Binary file not shown.
82.2 KB
Binary file not shown.
Lines changed: 2 additions & 1 deletion
@@ -1 +1,2 @@
-camelot-py[cv]
+camelot-py[cv]
+tabula-py
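
`requirements.txt` lists only the two Python packages. tabula-py also needs a Java runtime on the `PATH`, and pandas needs an Excel writer engine such as openpyxl for `.xlsx` output. The following sketch of a pre-flight check uses only the standard library; the function name `missing_requirements` and the default lists are assumptions chosen for this example, not something the commit provides.

```python
import importlib.util
import shutil


def missing_requirements(modules=("camelot", "tabula", "openpyxl"),
                         executables=("java",)):
    """Return the names of importable modules and executables that cannot be found."""
    # find_spec() returns None when a top-level module is not installed
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which() returns None when the executable is not on PATH
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing
```

Running `print(missing_requirements())` before either script gives a quick list of what still has to be installed; an empty list means all checked items were found.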
