Python script to add file extensions based on PRONOM ID (PUID)
tw4l/addext
Python script to add file extensions to files without them, based on Siegfried format identification.
`addext.py` takes two positional arguments:
- `target`: Path to target file or directory
- `json`: Path to addext PRONOM JSON file (`pronom_v95.json` is included in this repository for convenience; see the PRONOM JSON file section below for instructions on how to create a new JSON file in the expected format from PRONOM XML exports)
Options include:
- `-d, --dryrun`: Perform dry run (print would-be changes to terminal instead of renaming files)
- `-m, --manual`: Manually choose extension to add to files when PRONOM gives several options (not available on Windows)
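The interface described above can be sketched with `argparse`. This is a minimal illustration built only from the arguments and options listed here, not a copy of the actual parser in `addext.py`; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="addext.py")
    # Optional flags, both off by default
    parser.add_argument("-d", "--dryrun", action="store_true",
                        help="print would-be changes instead of renaming files")
    parser.add_argument("-m", "--manual", action="store_true",
                        help="manually choose extension when PRONOM gives several")
    # Two positional arguments, in the documented order
    parser.add_argument("target", help="path to target file or directory")
    parser.add_argument("json", help="path to addext PRONOM JSON file")
    return parser


# Parse an explicit argument list rather than sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["--dryrun", "/path/to/files", "pronom_v95.json"])
```

Here `args.dryrun` is `True`, `args.manual` is `False`, and the two positionals are available as `args.target` and `args.json`.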
In its default mode, `addext` adds file extensions to files if they meet a few conditions:
- Siegfried can positively identify a PUID for the file
- There is at least one file extension associated with the PUID in PRONOM
- The file does not already have one of the extensions listed in PRONOM for that PUID (case-insensitive)
If all conditions are met, `addext` adds the file extension to the file in-place. It is recommended that you try a dry run first to evaluate the proposed changes before renaming files.
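The three conditions above can be expressed as a small decision function. This is a sketch under stated assumptions rather than the script's actual code: the name `proposed_name` is hypothetical, the PUID is passed in as a string (with `"UNKNOWN"` assumed to be what Siegfried reports for an unidentified file), the extension is assumed to be appended to the existing filename, and when PRONOM lists several extensions the first one is assumed to be used.

```python
from pathlib import Path


def proposed_name(filename, puid, pronom):
    """Return the new filename under the default-mode rules, or None to skip."""
    # Condition 1: Siegfried must have positively identified a PUID
    # (assumption: unidentified files carry None or "UNKNOWN")
    if puid is None or puid == "UNKNOWN":
        return None
    # Condition 2: PRONOM must list at least one extension for that PUID
    extensions = pronom.get(puid, {}).get("file_extensions", [])
    if not extensions:
        return None
    # Condition 3: the file must not already have one of those extensions
    # (case-insensitive comparison)
    current = Path(filename).suffix.lstrip(".").lower()
    if current in (ext.lower() for ext in extensions):
        return None
    # Assumption: append the first extension PRONOM lists
    return f"{filename}.{extensions[0]}"


pronom = {
    "fmt/858": {
        "file_format": "Navisworks Document",
        "version": "2010",
        "file_extensions": ["nwd", "nwc"],
    }
}
new_name = proposed_name("model", "fmt/858", pronom)
```

With the sample entry, a file named `model` would be proposed as `model.nwd`, while `model.NWC` would be skipped because it already carries a listed extension.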
In `-m, --manual` mode, `addext` uses the following logic:
- If Siegfried cannot positively identify a PUID for the file, skip the file
- If there is only one file extension associated with the PUID in PRONOM and the file does not already have this extension (case-insensitive), add the extension
- If there is more than one file extension associated with the PUID in PRONOM and the file does not already have one of these extensions, allow the user to choose which extension to add and then modify the filename in-place
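The manual-mode logic above can be sketched as follows. To keep the example runnable without Inquirer, the interactive prompt is replaced by a `choose` callback; that callback, the function name `manual_name`, the `"UNKNOWN"` marker for unidentified files, and the `example/...` PUIDs are all illustrative assumptions, not part of `addext`.

```python
from pathlib import Path


def manual_name(filename, puid, pronom, choose):
    """Return the new filename under the manual-mode rules, or None to skip.

    `choose(filename, extensions)` stands in for the interactive prompt
    and must return one of the offered extensions.
    """
    # Skip files Siegfried could not positively identify
    if puid is None or puid == "UNKNOWN":
        return None
    extensions = pronom.get(puid, {}).get("file_extensions", [])
    current = Path(filename).suffix.lstrip(".").lower()
    # Skip if PRONOM lists no extension or the file already has a listed one
    if not extensions or current in (ext.lower() for ext in extensions):
        return None
    if len(extensions) == 1:
        # Only one candidate: no prompt needed
        ext = extensions[0]
    else:
        # Several candidates: let the user pick
        ext = choose(filename, extensions)
    return f"{filename}.{ext}"


# Hypothetical PUIDs and extensions, used purely for illustration
pronom = {
    "example/1": {"file_extensions": ["abc"]},
    "example/2": {"file_extensions": ["nwd", "nwc"]},
}
pick_last = lambda filename, extensions: extensions[-1]
single = manual_name("report", "example/1", pronom, pick_last)
several = manual_name("model", "example/2", pronom, pick_last)
```

Passing the chooser in as a parameter also makes the selection step easy to replace in tests or on platforms where an interactive prompt is unavailable.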
Note that for directories with many files, going through the files one-by-one in manual mode may take some time. Running `addext` as a dry run in manual mode may help give an idea of the extent of manual choices you will be asked to make.
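The difference between a dry run and a real run comes down to whether the rename is performed or only reported. A minimal sketch, assuming a hypothetical helper `apply_rename` and an invented message wording (the actual output of `addext` may differ):

```python
import os
import tempfile


def apply_rename(old_path, new_path, dryrun=False):
    """Rename old_path to new_path, or only report the change in a dry run."""
    if dryrun:
        # Dry run: print the would-be change and leave the file untouched
        print(f"Would rename {old_path} -> {new_path}")
        return False
    os.rename(old_path, new_path)
    return True


# Demonstrate both modes on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "model")
    open(src, "w").close()
    dst = src + ".nwd"
    dry_result = apply_rename(src, dst, dryrun=True)
    untouched = os.path.exists(src) and not os.path.exists(dst)
    real_result = apply_rename(src, dst)
    renamed = os.path.exists(dst) and not os.path.exists(src)
```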
Due to its dependency on Inquirer, manual mode is not available on Windows.
- Python 3.6+
- Siegfried
- Inquirer: For selection between extension options in `-m, --manual` mode (Linux/macOS only); installed with `pip install inquirer`
Install Siegfried following the instructions found here.
The easiest way to use `addext` is to clone or download this repository and then run the script with `python3 /path/to/addext.py [options]`.
If taking this route, install additional Python library dependencies: `pip install -r requirements.txt` or `pip install inquirer` (this may require sudo permissions).
`addext` can also be installed via `pip install addext`. This will install a script in the `/usr/local/bin` directory (assuming a Linux/macOS installation) so that `addext` can be called from anywhere with simply `addext.py [options]`.
Note that following installation, you will need to download or create a PRONOM JSON file to use with `addext`.
The PRONOM JSON file is a lightweight representation of information from PRONOM needed for addext to function. The file contains an object for each format described with a PRONOM ID (PUID), structured like the following example:
"fmt/858": { "file_format": "Navisworks Document", "version": "2010", "file_extensions": [ "nwd", "nwc" ] }
`pronom_v95.json` is currently up-to-date with PRONOM release v95.
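Looking up the extensions for a PUID in a file structured like the example above might look like the following. This is a sketch that assumes the file's top level is a single JSON object keyed by PUID, as the example entry suggests; `load_pronom` and `extensions_for` are hypothetical helper names, not functions from `addext`.

```python
import json


def load_pronom(path):
    """Load a PRONOM JSON file into a dict keyed by PUID (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def extensions_for(pronom, puid):
    """Return the extensions listed for a PUID, or an empty list if none."""
    return pronom.get(puid, {}).get("file_extensions", [])


# Inline sample mirroring the documented entry, so no file is needed to run this
sample = json.loads("""
{
  "fmt/858": {
    "file_format": "Navisworks Document",
    "version": "2010",
    "file_extensions": ["nwd", "nwc"]
  }
}
""")
known = extensions_for(sample, "fmt/858")
missing = extensions_for(sample, "not/listed")
```

Returning an empty list for an unlisted PUID lets callers treat "no entry" and "no extensions" the same way when deciding whether to skip a file.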
To create a new PRONOM JSON file (for instance, after a new PRONOM release):
- Get PRONOM XML export from Ross Spencer's Release repository for The Skeleton Test Suite, which provides a set of DOIs for archives of PRONOM releases.
- Run `addext/pronom_xml_to_json.py` to create a new PRONOM JSON file from the XML exports: `python3 pronom_xml_to_json.py /path/to/pronom/export/directory pronom.json`
- Canadian Centre for Architecture
- Tessa Walsh
This project was initially developed in 2016-2017 for the Canadian Centre for Architecture by Tessa Walsh, Digital Archivist, as part of the development of the Archaeology of the Digital project.