OAICAT-figshare is an extension library to OAICat (the OAI-PMH java servlet) that implements customised interfaces for harvesting the figshare repository. It's also a command-line tool for extracting recent records as XML from figshare.
lylewinton/oaicat-figshare
OAICAT-figshare is an extension library for OAICat that implements customisable interfaces for accessing your figshare repository. Once configured, OAICat will provide an OAI-PMH web service that can be used to harvest recently updated figshare records. By configuring FigshareOAICatalog.searchFilter and/or FigshareOAICatalog.institution you can present a virtual repository, e.g. an institutional figshare repository, specific groups, or records selected via specific tags.

The JSON2qdc Crosswalk outputs qualified Dublin Core metadata (DC). The JSON2oai_dc Crosswalk outputs essentially the same Dublin Core metadata (DC) but can be customised separately. The JSON2json Crosswalk simply outputs the source JSON from figshare. Beyond ordinary DC elements, figshare files and custom_fields can be output as other metadata elements and can be customised flexibly.
OAICAT-figshare is now also an executable tool. If all you want is to harvest recent figshare records (OAI-PMH ListRecords style) without the need for an OAI web server in between, you can now also execute the JAR library via the command line. Run it without arguments to obtain help on what arguments are required.
OAICat is an open source software project. It is a Java Servlet web application which provides a repository framework that conforms to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) v2.0. OAICat can be customised to work with arbitrary data repositories.
OAICAT-figshare has been built with oaicat-1.5.63 and tested on figshare API v2 (2021-Nov-21); Apache Tomcat Version 9.0.41; Ubuntu 20.04.1 LTS; default-jdk package (openjdk 11.0.9.1).
Instructions for Tomcat:
- Download the following files required for install:
- First deploy oaicat.war on a running Tomcat. This should create the `webapps\oaicat` folder.
- Copy the other libraries (oaicat-figshare.jar, json-simple-1.1.1.jar) somewhere they can be found by Tomcat, ideally the oaicat lib folder `webapps\oaicat\WEB-INF\lib`.
- Replace oaicat.properties with oaicat-figshare-example.properties in the `webapps\oaicat\WEB-INF` folder. In the web.xml file in this folder, find the `<context-param>` block containing `<param-name>properties</param-name>`; its `<param-value>` should specify the oaicat.properties file. IMPORTANT: Modify the `<param-value>` line to specify the full path to the file, as this is often necessary.
- (Optional) Check the new oaicat.properties values that set the oaicat-figshare custom classes:

      AbstractCatalog.oaiCatalogClassName=net.datanoid.oaipmh.figshare.FigshareOAICatalog
      AbstractCatalog.recordFactoryClassName=net.datanoid.oaipmh.figshare.JSONRecordFactory
      Crosswalks.oai_dc=net.datanoid.oaipmh.figshare.JSON2oai_dc
      Crosswalks.qdc=net.datanoid.oaipmh.figshare.JSON2qdc
      Crosswalks.json=net.datanoid.oaipmh.figshare.JSON2json

- Update oaicat.properties settings, especially the following:

      Identify.* - normal OAICAT settings
      FigshareOAICatalog.searchFilter - set to a custom search string
      FigshareOAICatalog.institution - set to your institution/portal ID, an integer
      FigshareOAICatalog.* JSON2oai_dc.* JSON2qdc.* - defaults should work for most, but check custom settings

- (Optional) Install the example logging.properties file in `webapps\oaicat\WEB-INF\classes` so you can get an oaicat logfile including oaicat-figshare info or debug messages.
- Replace the oaicat.xsl (web browser transform): oaicat-figshare makes use of extended xmlns:dcterms and URI types, and the replacement stylesheet improves their presentation.
- Restart Tomcat, then access oaicat (http://localhost:8080/oaicat/) and watch the console for errors.
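
Once Tomcat is running, the service can be exercised with ordinary OAI-PMH requests. The sketch below only builds request URLs and does not contact the server. It assumes the OAICat servlet is mapped at `/oaicat/OAIHandler` (an assumption; check the servlet mapping in your web.xml), and the class name `OaiRequestUrls` is hypothetical. `verb`, `metadataPrefix` and `from` are standard OAI-PMH v2.0 arguments, and `qdc` corresponds to the `Crosswalks.qdc` entry in oaicat.properties.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class OaiRequestUrls {
    // Assumed endpoint; adjust to the servlet mapping of your installation.
    public static final String BASE_URL = "http://localhost:8080/oaicat/OAIHandler";

    /** Builds an OAI-PMH request URL from a verb and its (ordered) arguments. */
    public static String build(String baseUrl, String verb, Map<String, String> args) {
        StringJoiner query = new StringJoiner("&");
        query.add("verb=" + URLEncoder.encode(verb, StandardCharsets.UTF_8));
        for (Map.Entry<String, String> e : args.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    /** Identify request: a quick check that the repository responds at all. */
    public static String identify(String baseUrl) {
        return build(baseUrl, "Identify", new LinkedHashMap<>());
    }

    /** ListRecords request; 'from' is optional and may be null. */
    public static String listRecords(String baseUrl, String metadataPrefix, String from) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", metadataPrefix);
        if (from != null) {
            args.put("from", from);
        }
        return build(baseUrl, "ListRecords", args);
    }

    public static void main(String[] args) {
        System.out.println(identify(BASE_URL));
        System.out.println(listRecords(BASE_URL, "qdc", "2022-04-01"));
    }
}
```

Opening the printed URLs in a browser (or with any HTTP client) should return OAI-PMH XML if the installation steps above succeeded.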
- Download the following files into a folder:
- Update the properties file oaicat-figshare-example.properties, especially the FigshareOAICatalog.searchFilter custom filter or FigshareOAICatalog.institution.
- Create an output folder in which the tool can write record files.
- Execute the jar file. Run it without arguments for more information on the arguments:

      $ java -jar oaicat-figshare.jar
      $ java -jar oaicat-figshare.jar -get-xml-element qdc:qualifieddc oaicat-figshare-example.properties ./outputfolder 2022-04-01 - qdc
      $ java -jar oaicat-figshare.jar -get-xml-element json:element -get-xml-content oaicat-figshare-example.properties ./outputfolder 2022-04-01 - json
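
Before running the tool or restarting Tomcat, it can help to sanity-check the edited properties file. The following is a minimal sketch, not part of oaicat-figshare: the class `PropertiesCheck` and its method are hypothetical, and it only looks for the keys named in this README using the standard `java.util.Properties` API. Whether every key is strictly required by the library is an assumption, so treat the output as hints.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class PropertiesCheck {
    // Keys this README lists as pointing at the oaicat-figshare custom classes.
    static final String[] CLASS_KEYS = {
        "AbstractCatalog.oaiCatalogClassName",
        "AbstractCatalog.recordFactoryClassName",
        "Crosswalks.oai_dc",
        "Crosswalks.qdc",
        "Crosswalks.json"
    };

    // Keys this README suggests customising to define a virtual repository.
    static final String[] FILTER_KEYS = {
        "FigshareOAICatalog.searchFilter",
        "FigshareOAICatalog.institution"
    };

    /** Returns human-readable notes about keys that are absent or blank. */
    public static List<String> findNotes(Properties props) {
        List<String> notes = new ArrayList<>();
        for (String key : CLASS_KEYS) {
            if (isBlank(props.getProperty(key))) {
                notes.add("missing: " + key);
            }
        }
        boolean anyFilter = false;
        for (String key : FILTER_KEYS) {
            if (!isBlank(props.getProperty(key))) {
                anyFilter = true;
            }
        }
        if (!anyFilter) {
            notes.add("note: neither searchFilter nor institution is set");
        }
        return notes;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PropertiesCheck oaicat-figshare-example.properties
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            props.load(reader);
        }
        findNotes(props).forEach(System.out::println);
    }
}
```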
Source for OAICAT files: https://github.com/OCLC-Research/oaicat

Source for OAICAT library, look in the distribution war file: oaicat.war\WEB-INF\lib\oaicat.jar

The README.txt from OAICAT has essential information for installation, included here for reference:
    To upgrade OAICat with the latest code changes, copy the latest
    oaicat.jar file to webapps/oaicat/WEB-INF/lib/.

    Before customizing OAICat, first install oaicat.war in a J2EE Servlet
    Engine and verify that the default configuration works. If so, proceed
    with any necessary code and configuration changes as described below.

    Before building this project with Ant, create a 'build.properties'
    file in the project directory with the following entries:

    catalina.home=/path/to/jakarta-tomcat

    To create a new distribution set, issue the command:

    ant dist

    To customize OAICat, answer these questions:

    Q1: What Java package should I use to hold my custom classes?
      a) For example, if you work for Acme Inc., create a directory
         hierarchy somewhere named: com/acme/oai

    Q2: What database engine will I use?
      a) For example, if using the Foo database, copy
         oaicatjar/src/ORG/oclc/oai/server/catalog/DummyOAICatalog.java to
         com/acme/oai/server/catalog/FooOAICatalog.java and modify the
         code so the class name matches the new filename.
      b) Change the code in this class to use the Foo database Java API.
         In general, all this class needs to know about the records is
         that they are black-box Java Objects. To make life easier
         downstream, however, it may be worthwhile to convert the records
         to a more convenient processing form immediately after reading.
         For example, if the records are stored as XML Strings, load them
         into DOM objects as soon as they are read. Beyond that, though,
         leave it to the Crosswalk and RecordFactory implementations to
         understand the true semantics of the records. Doing this may
         mean you can't reuse this class for cases where the database
         returns non-XML byte arrays, but then again, what are the
         chances of that?
      c) Make a corresponding package/class name change to the
         AbstractCatalog.oaiCatalogClassName entry in the
         webapps/oaicat/WEB-INF/oaicat.properties file to have OAICat use
         your custom class.

    Q3: What are the semantics of these record objects?
      a) If FooOAICatalog returns records as byte arrays, examples can be
         anything such as MARC Communications Format. If FooOAICatalog
         returns Strings, examples might include MARC BER, or any kind of
         XML String. If FooOAICatalog returns DOM Documents, examples can
         be any XML-based metadata format. Let's assume FooOAICatalog
         returns records as DOM Documents containing MARCXML content.
      b) Copy oaicatjar/src/ORG/oclc/oai/server/catalog/XMLRecordFactory.java
         to com/acme/oai/server/catalog/MARCXMLDOMRecordFactory.java and
         modify the code so the class name matches the new filename.
      c) Change the methods to cast each Object nativeItem parameter to a
         org.w3c.dom.Document and use it to extract the relevant data for
         each method.
      d) Make a corresponding package/class name change to the
         AbstractCatalog.recordFactoryClassName entry in the
         webapps/oaicat/WEB-INF/oaicat.properties file to have OAICat use
         your custom class.

    Q4: What OAI metadataFormats will be supported?
      a) Examples include oai_dc, marcxml, or oai_etdms.
      b) For oai_dc, copy oaicatjar/src/ORG/oclc/oai/crosswalk/XML2oai_dc.java
         to com/acme/oai/server/catalog/MARCXMLDOM2oai_dc.java and modify
         the code so the class name matches the new filename.
      c) Change the constructor to use the appropriate schemaLocation for
         this metadataFormat.
      d) Change the methods to cast each Object nativeItem parameter to a
         org.w3c.dom.Document and use it to service the method
         accordingly. In this case, you could use the Library of Congress
         MARCXML to DC XSL stylesheet (see
         http://www.loc.gov/standards/marcxml/) to perform the crosswalk
         to Dublin Core.
      e) Repeat steps b, c, and d for each metadataFormat to be supported.
      f) Make a corresponding package/class name change to the
         Crosswalks.* entries in the
         webapps/oaicat/WEB-INF/oaicat.properties file to have OAICat use
         your custom classes.

    Finally, change other properties in oaicat.properties according to your
    preferences.

    That's essentially what it takes to customize OAICat.

    Contact Jeff Young at jyoung@oclc.org with questions and comments.
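
The OAICat README above suggests converting records stored as XML Strings into DOM objects as soon as they are read. A minimal sketch of that one step, using only the standard JDK XML APIs, might look as follows. The class name `XmlRecordLoader` and the sample record are hypothetical and are not part of OAICat or oaicat-figshare (which works from figshare JSON rather than XML).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlRecordLoader {
    /**
     * Parses an XML String into a DOM Document, so that downstream
     * RecordFactory and Crosswalk code can cast the native item to
     * org.w3c.dom.Document instead of re-parsing the text.
     */
    public static Document toDom(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // metadata formats rely on namespaces
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers get one failure type.
            throw new IllegalArgumentException("Record is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical record content, for illustration only.
        Document doc = toDom("<record><title>Example</title></record>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```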