Commit e840928

update pdf image extractor code to fit the new version of PyMuPDF

1 parent 94ee964, commit e840928

Lines changed: 4 additions & 3 deletions

Original file line number	Diff line number	Diff line change
`@@ -10,17 +10,18 @@`
`10`	`10`	`for page_index in range(len(pdf_file)):`
`11`	`11`	`    # get the page itself`
`12`	`12`	`    page = pdf_file[page_index]`
`13`		`-    image_list = page.getImageList()`
	`13`	`+    # get image list`
	`14`	`+    image_list = page.get_images()`
`14`	`15`	`    # printing number of images found in this page`
`15`	`16`	`    if image_list:`
`16`	`17`	`        print(f"[+] Found a total of {len(image_list)} images in page {page_index}")`
`17`	`18`	`    else:`
`18`	`19`	`        print("[!] No images found on page", page_index)`
`19`		`-    for image_index, img in enumerate(page.getImageList(), start=1):`
	`20`	`+    for image_index, img in enumerate(image_list, start=1):`
`20`	`21`	`        # get the XREF of the image`
`21`	`22`	`        xref = img[0]`
`22`	`23`	`        # extract the image bytes`
`23`		`-        base_image = pdf_file.extractImage(xref)`
	`24`	`+        base_image = pdf_file.extract_image(xref)`
`24`	`25`	`        image_bytes = base_image["image"]`
`25`	`26`	`        # get the image extension`
`26`	`27`	`        image_ext = base_image["ext"]`
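
Note: the diff swaps the camelCase PyMuPDF names (`getImageList`, `extractImage`) for their snake_case equivalents (`get_images`, `extract_image`). For a script that may still run against an older PyMuPDF install, one possible approach is a small fallback wrapper. The sketch below is not part of this commit; the helper names `get_page_images` and `extract_image_by_xref` are hypothetical, and it assumes only that one of the two method names from the diff exists on the object passed in.

```python
def get_page_images(page):
    """Return the page's image list using whichever method name is available.

    Hypothetical helper: prefers the newer get_images() and falls back to the
    older getImageList() seen on the removed lines of the diff.
    """
    getter = getattr(page, "get_images", None) or getattr(page, "getImageList", None)
    if getter is None:
        raise AttributeError("page has neither get_images nor getImageList")
    return getter()


def extract_image_by_xref(pdf_file, xref):
    """Return the image dict for an xref, trying the newer method name first.

    Hypothetical helper: prefers extract_image() and falls back to the older
    extractImage().
    """
    extractor = getattr(pdf_file, "extract_image", None) or getattr(pdf_file, "extractImage", None)
    if extractor is None:
        raise AttributeError("document has neither extract_image nor extractImage")
    return extractor(xref)
```

With these in place, the loop body could call `get_page_images(page)` and `extract_image_by_xref(pdf_file, xref)` instead of the version-specific methods. If only recent PyMuPDF versions need to be supported, the direct calls in the diff are simpler.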
