File validation and characterisation.
JSTOR/Harvard Object Validation Environment
Copyright 2003-2012 by JSTOR and the President and Fellows of Harvard College, 2015-2022 by the Open Preservation Foundation. JHOVE is made available under the GNU Lesser General Public License (LGPL).
Rev. 1.32.1, 2025-02-06
http://jhove.openpreservation.org/
JHOVE (the JSTOR/Harvard Object Validation Environment, pronounced "jove") is an extensible software framework for performing format identification, validation, and characterization of digital objects.
- Format identification is the process of determining the format to which a digital object conforms: "I have a digital object; what format is it?"
- Format validation is the process of determining the level of compliance of a digital object to the specification for its purported format: "I have an object purportedly of format F; is it?"
- Format characterization is the process of determining the format-specific significant properties of an object of a given format: "I have an object of format F; what are its salient properties?"
These actions are frequently necessary during routine operation of digital repositories and for digital preservation activities.
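To make the idea of format identification concrete, the sketch below guesses a format from the leading "magic" bytes of a file. It is a simplified illustration, not JHOVE's implementation: the class name `SignatureSketch` and its methods are hypothetical, and real modules inspect far more than the first few bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: identifies a few formats from their leading signature bytes. */
final class SignatureSketch {

    /** Returns a format name for the leading bytes, or "bytestream" if nothing matches. */
    static String identify(byte[] head) {
        if (startsWith(head, ascii("GIF87a")) || startsWith(head, ascii("GIF89a"))) {
            return "GIF";
        }
        if (startsWith(head, ascii("%PDF-"))) {
            return "PDF";
        }
        // TIFF: "II" plus 42 in little-endian order, or "MM" plus 42 in big-endian order.
        if (startsWith(head, new byte[] {'I', 'I', 42, 0})
                || startsWith(head, new byte[] {'M', 'M', 0, 42})) {
            return "TIFF";
        }
        // JPEG streams open with the SOI marker FF D8, followed by another marker byte FF.
        if (startsWith(head, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF})) {
            return "JPEG";
        }
        // Any content at all is still a valid arbitrary byte stream.
        return "bytestream";
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] sample = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(sample)); // prints "GIF"
    }
}
```

A signature match only answers the identification question; validation and characterization require parsing the rest of the object against the format specification.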
The output from JHOVE is controlled by output handlers. JHOVE uses an extensible plug-in architecture; it can be configured at the time of its invocation to include whatever specific format modules and output handlers are desired. The initial release of JHOVE includes modules for arbitrary byte streams, ASCII and UTF-8 encoded text, AIFF and WAVE audio, GIF, JPEG, JPEG 2000, TIFF, and PDF; and text and XML output handlers.
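The split between format modules and output handlers can be pictured with a small registry. The following is a minimal sketch under stated assumptions: `PluginSketch`, `SketchModule` and `SketchHandler` are hypothetical names invented for illustration and are not JHOVE's own interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical plug-in registry; the names here are NOT JHOVE's real API. */
final class PluginSketch {

    /** Stand-in for a format module: turns raw content into a short report. */
    interface SketchModule {
        String describe(byte[] content);
    }

    /** Stand-in for an output handler: renders a module's report as text. */
    interface SketchHandler {
        String render(String moduleName, String report);
    }

    private final Map<String, SketchModule> modules = new LinkedHashMap<>();
    private final Map<String, SketchHandler> handlers = new LinkedHashMap<>();

    void registerModule(String name, SketchModule module) {
        modules.put(name, module);
    }

    void registerHandler(String name, SketchHandler handler) {
        handlers.put(name, handler);
    }

    /** Looks up the named module and handler, then chains them. */
    String run(String moduleName, String handlerName, byte[] content) {
        SketchModule module = modules.get(moduleName);
        SketchHandler handler = handlers.get(handlerName);
        if (module == null || handler == null) {
            throw new IllegalArgumentException("Unknown module or handler");
        }
        return handler.render(moduleName, module.describe(content));
    }

    /** Builds a registry with one module and two handlers for demonstration. */
    static PluginSketch withDefaults() {
        PluginSketch registry = new PluginSketch();
        registry.registerModule("BYTESTREAM", content -> "size=" + content.length);
        registry.registerHandler("TEXT", (name, report) -> name + ": " + report);
        registry.registerHandler("XML",
                (name, report) -> "<report module=\"" + name + "\">" + report + "</report>");
        return registry;
    }

    public static void main(String[] args) {
        PluginSketch registry = withDefaults();
        byte[] content = new byte[] {1, 2, 3, 4, 5};
        System.out.println(registry.run("BYTESTREAM", "TEXT", content));
        System.out.println(registry.run("BYTESTREAM", "XML", content));
    }
}
```

Because the module and the handler are chosen by name at run time, new formats or output styles can be added without touching the code that chains them, which is the property the plug-in design is after.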
The JHOVE project is a collaboration of JSTOR and the Harvard University Library. Development of JHOVE was funded in part by the Andrew W. Mellon Foundation. JHOVE is made available under the GNU Lesser General Public License (LGPL; see the file LICENSE for details).
JHOVE is currently being maintained by the Open Preservation Foundation.
Java JRE 1.8
Version 1.20 of JHOVE is built and tested against Oracle JDK 8, and OpenJDK 8 on Travis. Releases are built using Oracle JDK 8 from the OPF's Jenkins server. If you would like to build JHOVE from source, then life will be easiest if you use Apache Maven.
You can download the latest version of JHOVE here.
From v1.16 onwards all production releases of JHOVE are deployed to Maven Central. Add the version of JHOVE you'd like to use as a property in your Maven POM:
<properties>
  ...
  <jhove.version>1.20.1</jhove.version>
</properties>
Use this dependency for the core classes Maven module (e.g. `JhoveBase`, `Module`, `ModuleBase`, etc.):
<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-core</artifactId>
  <version>${jhove.version}</version>
</dependency>
this for the JHOVE internal module implementations:
<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-modules</artifactId>
  <version>${jhove.version}</version>
</dependency>
this for the JHOVE external module implementations:
<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-ext-modules</artifactId>
  <version>${jhove.version}</version>
</dependency>
and this for the JHOVE applications:
<dependency>
  <groupId>org.openpreservation.jhove</groupId>
  <artifactId>jhove-apps</artifactId>
  <version>${jhove.version}</version>
</dependency>
If you want the latest development packages you'll need to add the Open Preservation Foundation's Maven repository to your settings file:
<profiles>
  <profile>
    <id>opf-artifactory</id>
    <repositories>
      <repository>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
        <id>central</id>
        <name>opf-dev</name>
        <url>http://artifactory.openpreservation.org/artifactory/opf-dev</url>
      </repository>
    </repositories>
  </profile>
</profiles>
<activeProfiles>
  <activeProfile>opf-artifactory</activeProfile>
</activeProfiles>
You can then follow the instructions above to include particular Maven modules, but you can now also choose odd minor versioned development builds. At the time of writing the latest development version could be included by using the following property:
<properties>
  ...
  <jhove.version>1.21.1</jhove.version>
</properties>
or even:
<properties>
  ...
  <jhove.version>[1.21.0,1.22.0)</jhove.version>
</properties>
to always use the latest 1.21 build.
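In Maven version ranges a square bracket marks an inclusive bound and a round bracket an exclusive one, so a range ending in `1.22.0]` also admits 1.22.0 itself, while one ending in `1.22.0)` stops short of it. The sketch below illustrates the difference for plain numeric versions. It is a simplification: `VersionRangeSketch` is a hypothetical helper, and Maven's real version ordering also handles qualifiers such as `-SNAPSHOT`, which this code does not.

```java
import java.util.Arrays;

/** Simplified range check for dotted numeric versions; not Maven's own algorithm. */
final class VersionRangeSketch {

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Compares two dotted numeric versions, treating missing parts as zero. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Lower bound is always inclusive here; the upper bound is configurable. */
    static boolean inRange(String version, String lower, String upper, boolean upperInclusive) {
        int againstUpper = compare(version, upper);
        boolean belowUpper = upperInclusive ? againstUpper <= 0 : againstUpper < 0;
        return compare(version, lower) >= 0 && belowUpper;
    }

    public static void main(String[] args) {
        // Hypothetical candidate versions, chosen only to exercise the bounds.
        for (String candidate : new String[] {"1.20.1", "1.21.1", "1.22.0"}) {
            System.out.println(candidate
                    + " in [1.21.0,1.22.0]: " + inRange(candidate, "1.21.0", "1.22.0", true)
                    + ", in [1.21.0,1.22.0): " + inRange(candidate, "1.21.0", "1.22.0", false));
        }
    }
}
```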
Clone this project, check out the integration branch, and use Maven, e.g.:
git clone git@github.com:openpreserve/jhove.git
cd jhove
git checkout integration
mvn clean install
See the Project Structure section for a guide to the Maven artifacts produced by the build.
Download the JHOVE installer. The installer itself requires Java 1.6 or later to be pre-installed. Installation is OS-dependent:
Currently only tested on Windows 7.
Simply double-click the downloaded installer JAR. If Java is installed then the windowed installer will guide you through selection. It's best to stay with the default choices if installing the beta.
Once the installation has finished you'll be able to double-click `C:\Users\yourName\jhove\jhove-gui` to start the JHOVE GUI. Alternatively, open a Command window, e.g. press the `Windows` key and type `cmd`, then issue these commands:
C:\Users\yourName>cd jhove
C:\Users\yourName\jhove>jhove
to display the command-line usage message.
It is also possible to use JHOVE with OpenJDK, e.g. jdk-13. It might be necessary to set the Java path in the environment variables, which usually requires administrator rights on the Windows machine.
Currently only tested on OS X Mavericks.
Simply double-click the downloaded installer JAR. If Java is installed then the windowed installer will guide you through selection. It's best to stay with the default choices if installing the beta.
Once the installation has finished you'll be able to double-click `/Users/yourName/jhove/jhove-gui` to start the JHOVE GUI. Alternatively, open a Terminal command window and then issue these commands:
cd ~/jhove
./jhove
to display the command-line usage message.
Currently tested on Ubuntu 16.10 and Debian Jessie.
Once the installer has downloaded, start a terminal, e.g. `Ctrl+Alt+T`, and type the following, assuming the download is in `~/Downloads`:
java -jar ~/Downloads/jhove-latest.jar
Once the installation is finished you'll be able to:
cd ~/jhove
./jhove
to run the command-line application and show the usage message. Alternatively:
cd ~/jhove
./jhove-gui
will run the GUI application.
We've moved to Maven and have taken the opportunity to update the distribution. For now we're producing:
- a Maven package, for developers wishing to incorporate JHOVE into their own software;
- a "fat" (1MB) JAR that contains the old CLI and desktop GUI, for anyone who doesn't want to use the new installer; and
- a simple cross-platform installer that installs the application JAR, support scripts, etc.
jhove [-c config] [-m module] [-h handler] [-e encoding] [-H handler] [-o output]
      [-x saxclass] [-t tempdir] [-b bufsize] [-l loglevel] [[-krs] dir-file-or-uri [...]]

-c config        Configuration file pathname
-m module        Module name
-h handler       Output handler name (defaults to TEXT)
-e encoding      Character encoding used by output handler (defaults to UTF-8)
-H handler       About handler name
-o output        Output file pathname (defaults to standard output)
-x saxclass      SAX parser class (defaults to J2SE default)
-t tempdir       Temporary directory in which to create temporary files
-b bufsize       Buffer size for buffered I/O (defaults to J2SE 1.4 default)
-l loglevel      Logging level
-k               Calculate CRC32, MD5, and SHA-1 checksums
-r               Display raw data flags, not textual equivalents
-s               Format identification based on internal signatures only
dir-file-or-uri  Directory or file pathname or URI of formatted content stream
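For readers unfamiliar with the three checksums that `-k` requests, here is a minimal sketch of how CRC32, MD5 and SHA-1 can be computed with the Java standard library. It shows the general technique only; `ChecksumSketch` is a hypothetical class and this is not taken from JHOVE's source.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

/** Illustrative checksum helpers built on java.util.zip and java.security. */
final class ChecksumSketch {

    /** Returns the CRC32 of the data as eight lowercase hex digits. */
    static String crc32Hex(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return String.format("%08x", crc.getValue());
    }

    /** Returns the digest for a JCA algorithm name such as "MD5" or "SHA-1". */
    static String digestHex(String algorithm, byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                // Mask to avoid sign extension when widening the byte.
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-1 are required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("CRC32: " + crc32Hex(data));
        System.out.println("MD5:   " + digestHex("MD5", data));
        System.out.println("SHA-1: " + digestHex("SHA-1", data));
    }
}
```

For large files a streaming approach, updating the checksum objects buffer by buffer, avoids holding the whole object in memory.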
All named modules and output handlers must be found on the Java CLASSPATH at the time of invocation. The JHOVE driver script, jhove/jhove, automatically sets the CLASSPATH and invokes the Jhove main class:
jhove [-c config] [-m module] [-h handler] [-e encoding] [-H handler] [-o output] [-x saxclass] [-t tempdir] [-b bufsize] [-l loglevel] [[-krs] dir-file-or-uri [...]]
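When JHOVE is driven from another Java program through the driver script, the synopsis above maps naturally onto an argument list. The sketch below assembles such a list using only the documented `-m`, `-h`, `-o` and `-k` options. The class `JhoveCommandSketch`, the launcher path, the handler name `XML` and the file names are assumptions made for illustration; the command is only built and printed, not executed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles a jhove command line; it does not run it. */
final class JhoveCommandSketch {

    static List<String> buildCommand(String launcher, String module, String handler,
                                     String output, boolean checksums, List<String> targets) {
        List<String> command = new ArrayList<>();
        command.add(launcher);
        if (module != null) {
            command.add("-m");
            command.add(module);
        }
        if (handler != null) {
            command.add("-h");
            command.add(handler);
        }
        if (output != null) {
            command.add("-o");
            command.add(output);
        }
        if (checksums) {
            command.add("-k");
        }
        // Files, directories or URIs to process come last.
        command.addAll(targets);
        return command;
    }

    public static void main(String[] args) {
        // Assumed values: launcher location, handler name and file names are examples only.
        List<String> command = buildCommand("./jhove", null, "XML", "report.xml", true,
                Arrays.asList("sample.tif"));
        System.out.println(String.join(" ", command));
        // To actually run it, one could pass the list to
        // new ProcessBuilder(command).inheritIO().start().
    }
}
```

Keeping each option and its value as separate list elements avoids shell quoting problems when paths contain spaces.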
The following additional programs are available, primarily for testing and debugging purposes. They display a minimally processed, human-readable version of the contents of AIFF, GIF, JPEG, JPEG 2000, PDF, TIFF, and WAVE files:
java ADump aiff-file
java GDump gif-file
java JDump jpeg-file
java J2Dump jpeg2000-file
java PDump pdf-file
java TDump tiff-file
java WDump wave-file
For convenience, the following driver scripts are also available:
adump aiff-file
gdump gif-file
jdump jpeg-file
j2dump jpeg2000-file
pdump pdf-file
tdump tiff-file
wdump wave-file
The JHOVE Swing-based GUI can be invoked from a command shell from the jhove/bin sub-directory:
jhove-gui -c <configFile>
where `<configFile>` is the pathname of the JHOVE configuration file.
A quick introduction to the restructured Maven project. The project's been broken into four Maven modules with an additional installer module added.
jhove/
  |-jhove-apps/
  |-jhove-core/
  |-jhove-installer/
  |-jhove-ext-modules/
  |-jhove-modules/
All Maven artifacts are produced in versioned form, i.e. `${artifactId}-${project.version}.jar`, where `${project.version}` defaults to `1.20.0` unless you explicitly set the version number.
The `jhove` project root acts as a Maven parent and reactor for the sub-modules. This simply builds sub-modules and doesn't produce any artifacts, but decides which sub-modules are built.
The `jhove-core` and `jhove-modules` are most likely all that are required for developers wishing to call and run JHOVE from their own code.
The `jhove-core` module contains all of the main data type definitions and the output handlers. This module produces a single JAR:
./jhove/jhove-core/target/jhove-core-${project.version}.jar
The `jhove-core` JAR contains a single module implementation, the default `BytestreamModule`. For the format-specific modules you'll need the `jhove-modules` JAR.
The `jhove-modules` contains all of JHOVE's core format-specific module implementations, specifically:
- AIFF
- ASCII
- GIF
- HTML
- JPEG
- JPEG 2000
- TIFF
- UTF-8
- WAVE
- XML
These are all packaged in a single modules JAR:
./jhove/jhove-modules/target/jhove-modules-${project.version}.jar
The `jhove-ext-modules` contains JHOVE modules developed by external parties, specifically:
- PNG
- WARC
- GZIP
- EPUB
These are all packaged in a single modules JAR:
./jhove/jhove-ext-modules/target/jhove-ext-modules-${project.version}.jar
The `jhove-apps` module contains the command-line and GUI application code and builds a fat JAR containing the entire Java application. This JAR can be used to execute the command-line app:
./jhove/jhove-apps/target/jhove-apps-${project.version}.jar
Finally, the `jhove-installer` module takes the fat JAR and creates a Java-based installer for JHOVE. The installer bundles up invocation scripts and the like, installs them under `<userHome>/jhove/` (default, can be changed) while also looking after:
- variable substitution to ensure that JHOVE_HOME and the like are set to reflect a user's install location;
- making sure that Windows users get batch scripts, while Mac and Linux users get bash scripts; and
- optionally generating unattended install and uninstall files.
The module produces two JARs, one called `jhove-installer-${project.version}`, which contains the JARs for the installer, and an executable JAR to install JHOVE:
./jhove/jhove-installer/target/jhove-xplt-installer-${project.version}.jar
The `xplt` stands for cross-platform.