Commit 83c4743

added pdf image extractor tutorial
1 parent 79a41f5 commit 83c4743

5 files changed: +47 -0 lines changed


‎README.md

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [How to Get Domain Name Information in Python](https://www.thepythoncode.com/article/extracting-domain-name-information-in-python). ([code](web-scraping/get-domain-info))
 - [How to Extract YouTube Comments in Python](https://www.thepythoncode.com/article/extract-youtube-comments-in-python). ([code](web-scraping/youtube-comments-extractor))
 - [How to Extract All PDF Links in Python](https://www.thepythoncode.com/article/extract-pdf-links-with-python). ([code](web-scraping/pdf-url-extractor))
+- [How to Extract Images from PDF in Python](https://www.thepythoncode.com/article/extract-pdf-images-in-python). ([code](web-scraping/pdf-image-extractor))

 - ### [Python Standard Library](https://www.thepythoncode.com/topic/python-standard-library)
 - [How to Transfer Files in the Network using Sockets in Python](https://www.thepythoncode.com/article/send-receive-files-using-sockets-python). ([code](general/transfer-files/))
5.09 MB
Binary file not shown.
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# [How to Extract Images from PDF in Python](https://www.thepythoncode.com/article/extract-pdf-images-in-python)
+To run this:
+- `pip3 install -r requirements.txt`
+- To extract and save all images of `1710.05006.pdf` PDF file, you run:
+```
+python pdf_image_extractor.py 1710.05006.pdf
+```
+This will save all available images in the current directory and outputs:
+```
+[!] No images found on page 0
+[+] Found a total of 3 images in page 1
+[+] Found a total of 3 images in page 2
+[!] No images found on page 3
+[!] No images found on page 4
+```
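The usage above passes the PDF path on the command line. A minimal sketch of how a script could read that argument with the standard library's `argparse` follows. `parse_args` and the `pdf_path` attribute are hypothetical names chosen for illustration and are not part of this commit.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Extract and save all images of a PDF file."
    )
    # Hypothetical positional argument holding the path to the PDF.
    parser.add_argument("pdf_path", help="path to the PDF file to process")
    return parser.parse_args(argv)


# Passing an explicit list mimics `python pdf_image_extractor.py 1710.05006.pdf`.
args = parse_args(["1710.05006.pdf"])
print(f"PDF to process: {args.pdf_path}")
```

Accepting an optional `argv` list keeps the parser easy to exercise without touching the real command line.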
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+import fitz # PyMuPDF
+import io
+from PIL import Image
+
+# file path you want to extract images from
+file = "1710.05006.pdf"
+# open the file
+pdf_file = fitz.open(file)
+# iterate over PDF pages
+for page_index in range(len(pdf_file)):
+    # get the page itself
+    page = pdf_file[page_index]
+    image_list = page.getImageList()
+    # printing number of images found in this page
+    if image_list:
+        print(f"[+] Found a total of {len(image_list)} images in page {page_index}")
+    else:
+        print("[!] No images found on page", page_index)
+    for image_index, img in enumerate(page.getImageList(), start=1):
+        # get the XREF of the image
+        xref = img[0]
+        # extract the image bytes
+        base_image = pdf_file.extractImage(xref)
+        image_bytes = base_image["image"]
+        # get the image extension
+        image_ext = base_image["ext"]
+        # load it to PIL
+        image = Image.open(io.BytesIO(image_bytes))
+        # save it to local disk
+        image.save(open(f"image{page_index+1}_{image_index}.{image_ext}", "wb"))
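Newer PyMuPDF releases document snake_case method names (`get_images`, `extract_image`) in place of the camelCase ones used in this commit. The sketch below is one possible variant under that assumption, not the tutorial's code. It writes the extracted bytes straight to disk, so Pillow is not needed, and every output file is closed after writing. The helper names `image_filename` and `extract_images` and the `out_dir` parameter are hypothetical.

```python
from pathlib import Path


def image_filename(page_index, image_index, ext):
    """Build a name such as image2_1.png (page shown 1-based, image 1-based)."""
    return f"image{page_index + 1}_{image_index}.{ext}"


def extract_images(pdf_path, out_dir="."):
    """Save every embedded image of the PDF and return the written paths."""
    import fitz  # PyMuPDF; imported here so image_filename works without it

    written = []
    with fitz.open(pdf_path) as pdf_file:
        for page_index, page in enumerate(pdf_file):
            # each entry is a tuple whose first item is the image xref
            for image_index, img in enumerate(page.get_images(), start=1):
                base_image = pdf_file.extract_image(img[0])
                target = Path(out_dir) / image_filename(
                    page_index, image_index, base_image["ext"]
                )
                # write the raw image bytes; the file is closed on return
                target.write_bytes(base_image["image"])
                written.append(target)
    return written
```

Calling `extract_images("1710.05006.pdf")` would then write files named in the same `image<page>_<n>.<ext>` pattern as the script above.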
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+PyMuPDF
