
`{xmpdf}` provides functions for getting and setting Extensible Metadata Platform (XMP) metadata in a variety of media file formats as well as getting and setting PDF documentation info entries and bookmarks (aka outline aka table of contents).

```r
remotes::install_github("trevorld/r-xmpdf")
```

Depending on what you’d like to do you’ll need to install some additional R packages and/or command-line tools:

- `{qpdf}` can be used to concatenate pdf files together as well as get the number of pages in a pdf. Note it is currently a dependency of `{pdftools}`.
  - `install.packages("qpdf")`
- `{pdftools}` can be used to get bookmarks and documentation info entries in pdf files. Note it currently depends on `{qpdf}`.
  - `install.packages("pdftools")` will probably install `{qpdf}` as well
- exiftool can be used to get/set XMP metadata in a variety of media files as well as documentation info entries in pdf files. Can also be used to get the number of pages in a pdf. Note it can be installed by `{exiftoolr}`.
  - `install.packages("exiftoolr"); exiftoolr::install_exiftool()` (Cross-Platform)
  - `sudo apt-get install libimage-exiftool-perl` (Debian/Ubuntu)
  - `brew install exiftool` (Homebrew)
  - `choco install exiftool` (Chocolatey)
- ghostscript can be used to set bookmarks and documentation info entries in pdf files. Can also be used to concatenate pdf files together as well as get the number of pages in a pdf.
  - `sudo apt-get install ghostscript` (Debian/Ubuntu)
  - `brew install ghostscript` (Homebrew)
  - `choco install ghostscript` (Chocolatey)
- pdftk-java or perhaps pdftk can be used to get/set bookmarks and documentation info entries in pdf files. Can also be used to concatenate pdf files together as well as get the number of pages in a pdf.
  - `sudo apt-get install pdftk-java` (Debian/Ubuntu)
  - `brew install pdftk-java` (Homebrew)
  - `choco install pdftk-java` (Chocolatey)

A simple example where we create a two page pdf using `pdf()` and then add XMP metadata, PDF documentation info metadata, and PDF bookmarks to it:

```r
library("xmpdf")

# Create a two page pdf using `pdf()`
f <- tempfile(fileext = ".pdf")
pdf(f, onefile = TRUE)
grid::grid.text("Page 1")
grid::grid.newpage()
grid::grid.text("Page 2")
invisible(dev.off())

# See what default metadata `pdf()` created
get_docinfo(f)[[1]] |> print()
## Author: NULL
## CreationDate: 2024-03-27T23:19:05
## Creator: R
## Producer: R 4.3.3
## Title: R Graphics Output
## Subject: NULL
## Keywords: NULL
## ModDate: 2024-03-27T23:19:05

get_xmp(f)[[1]] |> print()
## No XMP metadata found

get_bookmarks(f)[[1]] |> print()
## [1] title    page     level    count    open     color    fontface
## <0 rows> (or 0-length row.names)

# Edit PDF documentation info
d <- get_docinfo(f)[[1]] |>
    update(author = "John Doe",
           subject = "A minimal document to demonstrate {xmpdf} features on",
           title = "Two Boring Pages",
           keywords = c("R", "xmpdf"))
set_docinfo(d, f)
get_docinfo(f)[[1]] |> print()
## Author: John Doe
## CreationDate: 2024-03-27T23:19:05
## Creator: R
## Producer: R 4.3.3
## Title: Two Boring Pages
## Subject: A minimal document to demonstrate {xmpdf} features on
## Keywords: R, xmpdf
## ModDate: 2024-03-27T23:19:05

# Edit XMP metadata
x <- as_xmp(d) |>
    update(attribution_url = "https://example.com/attribution",
           date_created = Sys.Date(),
           spdx_id = "CC-BY-4.0")
set_xmp(x, f)
get_xmp(f)[[1]] |> print()
## cc:attributionName := John Doe
## cc:attributionURL := https://example.com/attribution
## cc:license := https://creativecommons.org/licenses/by/4.0/
## dc:creator := John Doe
## dc:description := A minimal document to demonstrate {xmpdf} features on
## dc:rights := © 2024 John Doe. Some rights reserved.
## dc:subject := R, xmpdf
## dc:title := Two Boring Pages
## pdf:Keywords := R, xmpdf
## pdf:Producer := R 4.3.3
## photoshop:Credit := John Doe
## photoshop:DateCreated := 2024-03-27
## x:XMPToolkit := Image::ExifTool 12.40
## xmp:CreateDate := 2024-03-27T23:19:05
## xmp:CreatorTool := R
## xmp:ModifyDate := 2024-03-27T23:19:05
## xmpRights:Marked := TRUE
## xmpRights:UsageTerms := This work is licensed to the public under the Creative Commons
##     Attribution 4.0 International license
##     https://creativecommons.org/licenses/by/4.0/
## xmpRights:WebStatement := https://creativecommons.org/licenses/by/4.0/

# Edit PDF bookmarks
bm <- data.frame(title = c("Page 1", "Page 2"), page = c(1, 2))
set_bookmarks(bm, f)
get_bookmarks(f)[[1]] |> print()
##    title page level count open color fontface
## 1 Page 1    1     1    NA   NA  <NA>     <NA>
## 2 Page 2    2     1    NA   NA  <NA>     <NA>
```

Besides pdf files, with exiftool we can also edit the XMP metadata for a large number of image formats including “gif”, “png”, “jpeg”, “tiff”, and “webp”. In particular we may be interested in setting the subset of IPTC Photo XMP metadata displayed by Google Images as well as embedding Creative Commons license XMP metadata.

```r
library("xmpdf")

f <- tempfile(fileext = ".png")
png(f)
grid::grid.text("This is an image!")
dev.off() |> invisible()

get_xmp(f)[[1]] |> print()
## No XMP metadata found

x <- xmp(attribution_url = "https://example.com/attribution",
         creator = "John Doe",
         description = "An image caption",
         date_created = Sys.Date(),
         spdx_id = "CC-BY-4.0")

print(x, mode = "google_images", xmp_only = TRUE)
##    dc:creator := John Doe
## => dc:rights = © 2024 John Doe. Some rights reserved.
## => photoshop:Credit = John Doe
## X  plus:Licensor (not currently supported by {xmpdf})
## => xmpRights:WebStatement = https://creativecommons.org/licenses/by/4.0/

print(x, mode = "creative_commons", xmp_only = TRUE)
## => cc:attributionName = John Doe
##    cc:attributionURL := https://example.com/attribution
## => cc:license = https://creativecommons.org/licenses/by/4.0/
##    cc:morePermissions := NULL
## => dc:rights = © 2024 John Doe. Some rights reserved.
## => xmpRights:Marked = TRUE
## => xmpRights:UsageTerms = This work is licensed to the public under the Creative Commons
##     Attribution 4.0 International license
##     https://creativecommons.org/licenses/by/4.0/
## => xmpRights:WebStatement = https://creativecommons.org/licenses/by/4.0/

set_xmp(x, f)
get_xmp(f)[[1]] |> print()
## cc:attributionName := John Doe
## cc:attributionURL := https://example.com/attribution
## cc:license := https://creativecommons.org/licenses/by/4.0/
## dc:creator := John Doe
## dc:description := An image caption
## dc:rights := © 2024 John Doe. Some rights reserved.
## photoshop:Credit := John Doe
## photoshop:DateCreated := 2024-03-27
## x:XMPToolkit := Image::ExifTool 12.40
## xmpRights:Marked := TRUE
## xmpRights:UsageTerms := This work is licensed to the public under the Creative Commons
##     Attribution 4.0 International license
##     https://creativecommons.org/licenses/by/4.0/
## xmpRights:WebStatement := https://creativecommons.org/licenses/by/4.0/
```

```r
# Create two multi-page pdfs and add bookmarks to them
f_a <- tempfile(fileext = ".pdf")
pdf(f_a, title = "Document A", onefile = TRUE)
grid::grid.text("Document A: First Page")
grid::grid.newpage()
grid::grid.text("Document A: Second Page")
dev.off() |> invisible()

f_b <- tempfile(fileext = ".pdf")
pdf(f_b, title = "Document B", onefile = TRUE)
grid::grid.text("Document B: First Page")
grid::grid.newpage()
grid::grid.text("Document B: Second Page")
dev.off() |> invisible()

bm <- data.frame(title = c("First Page", "Second Page"), page = c(1, 2))
set_bookmarks(bm, f_a)
set_bookmarks(bm, f_b)

# Concatenate pdfs to a single pdf and add their concatenated bookmarks to it
files <- c(f_a, f_b)
f_cat <- tempfile(fileext = ".pdf")
cat_pages(files, f_cat)
cat_bookmarks(get_bookmarks(files), method = "title") |>
    set_bookmarks(f_cat)
print(get_bookmarks(f_cat)[[1]])
##         title page level count open color fontface
## 1  Document A    1     1    NA   NA  <NA>     <NA>
## 2  First Page    1     2    NA   NA  <NA>     <NA>
## 3 Second Page    2     2    NA   NA  <NA>     <NA>
## 4  Document B    3     1    NA   NA  <NA>     <NA>
## 5  First Page    3     2    NA   NA  <NA>     <NA>
## 6 Second Page    4     2    NA   NA  <NA>     <NA>
```

| {xmpdf} feature | exiftool | pdftk | ghostscript |
|---|---|---|---|
| Get XMP metadata | Yes | No | No |
| Set XMP metadata | Yes | No | Poor: when documentation info metadata is set then as a side effect it seems the documentation info metadata will also be set as XMP metadata |
| Get PDF bookmarks | No | Okay: can only get Title, Page number, and Level | No |
| Set PDF bookmarks | No | Okay: can only set Title, Page number, and Level | Good: supports most bookmarks features including color and font face but the only action supported is to view a particular page |
| Get PDF documentation info | Good: may “widen” datetimes which are less than “second” precision | Yes | No |
| Set PDF documentation info | Yes | Good: may not handle entries with newlines in them | Yes: as a side effect when documentation info metadata is set then it seems it will also be set as XMP metadata |
| Concatenate PDF files | No | Yes | Yes |
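
Since each feature in the table above depends on a particular back-end, it can help to check which tools are present before calling a getter or setter. The following is a minimal sketch that uses only base R and `{tools}`; `available_backends()` is a hypothetical helper and not part of `{xmpdf}`. It assumes the command-line tools are on the `PATH` under the names `exiftool` and `pdftk`, so a copy of exiftool installed via `exiftoolr::install_exiftool()` may be reported as missing.

```r
# Hypothetical helper (not part of {xmpdf}): report which optional
# back-ends appear to be available on this machine.
available_backends <- function() {
    # TRUE if an executable with this name is found on the PATH
    has_cmd <- function(cmd) nzchar(Sys.which(cmd)[[1]])
    # TRUE if the R package can be loaded
    has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)
    c(exiftool = has_cmd("exiftool"),
      # `tools::find_gs_cmd()` returns "" if no GhostScript executable is found
      ghostscript = nzchar(tools::find_gs_cmd()[[1]]),
      pdftk = has_cmd("pdftk"),
      qpdf = has_pkg("qpdf"),
      pdftools = has_pkg("pdftools"))
}

print(available_backends())
```

The result is a named logical vector, so it can be read against the columns of the table, for example to see whether any back-end that can set PDF bookmarks is installed.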
Known limitations:
- `get_bookmarks_pdftk()` doesn’t report information about bookmarks color, font face, and whether the bookmarks should start open or closed.
- `get_bookmarks_pdftools()` doesn’t report information about bookmarks pages, color, font face, and whether the bookmarks should start open or closed.
- `get_docinfo_exiftool()` “widens” datetimes to second precision. An hour-only UTC offset will be “widened” to minute precision.
- `get_docinfo_pdftools()`’s datetimes may not accurately reflect the embedded datetimes.
- `set_bookmarks_gs()` supports most bookmarks features including color and font face but the only action supported is to view a particular page.
- `set_bookmarks_pdftk()` only supports setting the title, page number, and level of bookmarks.
- `set_docinfo_pdftk()` may not handle entries with newlines in them.
- The `set_docinfo()` methods currently do not support arbitrary info dictionary entries.
- `set_docinfo_gs()` seems to also update any matching XMP metadata while `set_docinfo_exiftool()` and `set_docinfo_pdftk()` don’t update any previously set matching XMP metadata. Some pdf viewers will preferentially use the previously set document title from XMP metadata if it exists instead of using the title set in the documentation info dictionary entry. Consider also manually setting this XMP metadata using `set_xmp()`.
- … `qpdf::pdf_compress(input, linearize = TRUE)` at the end.

Note most of the R packages listed below are focused on getting metadata rather than setting metadata and/or only provide low-level wrappers around the relevant command-line tools. Please feel free to open a pull request to add any missing relevant R packages.

- … exiftool command-line tool. Can download exiftool.
- … exiftool command-line tool. Can download exiftool.
- `{tools}` has `find_gs_cmd()` to find a GhostScript executable in a cross-platform way.
- … `pdftk()`, a low-level wrapper around the pdftk command-line tool.
- … pdfinfo tool