Datahub - A standards compliant metadata aggregator platform
The Datahub is a metadata aggregator. This application allows data providers to aggregate and publish metadata describing objects on the web through a RESTful API leveraging standardized exchange formats.
The Datahub is built with the Symfony framework and MongoDB.
- A RESTful API which supports:
  - Ingest and retrieval of individual metadata records.
  - Validation of ingested records against XSD schemas.
  - OAuth to restrict access to the API.
- An OAI-PMH endpoint for harvesting metadata records.
- Support for LIDO XML, which can be extended to include MARC XML, Dublin Core or other formats.
This project requires the following dependencies:
- PHP = 5.6.* or 7.0.*
  - With the php-cli, php-intl, php-mbstring and php-mcrypt extensions.
  - The PECL Mongo (PHP5) or PECL Mongodb (PHP7) extension. Note that the mongodb extension must be version 1.2.0 or higher. Notably, the package included in Ubuntu 16.04 (php-mongodb) is only at 1.1.5.
- MongoDB >= 3.2.10
Via Git:
$ git clone https://github.com/thedatahub/Datahub.git datahub
$ cd datahub
$ composer install # Composer will ask you to fill in any missing parameters before it continues
You will be asked to configure the connection to your MongoDB database. You will need to provide these details:
- The connection to your MongoDB instance (e.g. mongodb://127.0.0.1:27017)
- The username of the user (e.g. datahub)
- The password of the user
- The database where your data will persist (e.g. datahub)
Before you install, ensure that you have a running MongoDB instance and that you have created a user with the right permissions. From the [Mongo shell](https://docs.mongodb.com/getting-started/shell/client/) run these commands to create the required artefacts in MongoDB:
> use datahub
> db.createUser({ user: "datahub", pwd: "password", roles: [ "readWrite", "dbAdmin" ] })
The configuration parameters will be stored in app/config/parameters.yml.
You'll need to run an initial one-time setup script, which will scaffold the database structure, generate CSS assets and create the application 'admin' user.
$ app/console app:setup
$ app/console doctrine:mongodb:fixtures:load --append
If you want to run the datahub for testing or development purposes, execute this command:
$ app/console server:run
Use a browser to navigate to http://127.0.0.1:8000. You should now see the welcome screen.
To make your installation operational in a production environment, refer to the Symfony setup documentation and complete your installation using a fully featured web server.
The application is installed with the default username admin and the default password datahub. Changing these is highly recommended.
The REST API is available at api/v1/data. Documentation about the available API methods can be found at /docs/api.
The PUT and POST actions expect an XML formatted body in the HTTP request. The Content-Type HTTP request header also needs to be set accordingly. Currently supported: application/lido+xml. Finally, you will need to add a valid OAuth token via the access_token query parameter.
A valid POST HTTP request looks like this:
POST /api/v1/data?access_token=MThmYWMxMjFlZWZmYjVmZDU2NDNmZWIzYTE0YmNiYTk3YTc5ODJmMWJjOGI1MjE5MWY4ZjEyZWZlZmM2ZmZmNg HTTP/1.1
Host: example.org
Content-Type: application/lido+xml
Cache-Control: no-cache

<?xml version="1.0" encoding="UTF-8"?>
<lido:lido xmlns:lido="http://www.lido-schema.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.lido-schema.org http://www.lido-schema.org/schema/v1.0/lido-v1.0.xsd">
<lido:lidoRecID lido:source="Deutsches Dokumentationszentrum für Kunstgeschichte - Bildarchiv Foto Marburg" lido:type="local">DE-Mb112/lido-obj00154983</lido:lidoRecID>
<lido:category>...
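The same ingest request can be assembled from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_ingest_request`, the host `http://localhost:8000` and the placeholder token are assumptions for illustration, and the stub XML is not a complete LIDO record, so the Datahub would be expected to reject it during XSD validation.

```python
import urllib.request
from urllib.parse import urlencode


def build_ingest_request(base_url, access_token, lido_xml):
    """Build (but do not send) a POST request that ingests one LIDO record.

    Hypothetical helper: the path, query parameter and Content-Type follow
    the README; everything else is an assumption.
    """
    query = urlencode({"access_token": access_token})
    url = f"{base_url.rstrip('/')}/api/v1/data?{query}"
    return urllib.request.Request(
        url,
        data=lido_xml.encode("utf-8"),
        headers={"Content-Type": "application/lido+xml"},
        method="POST",
    )


# Placeholder body: a real record needs the full LIDO structure.
stub_record = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<lido:lido xmlns:lido="http://www.lido-schema.org"/>'
)

request = build_ingest_request(
    "http://localhost:8000",   # assumed development host
    "PLACEHOLDER_TOKEN",       # replace with a token issued by the Datahub
    stub_record,
)
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(request)` against a running instance.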
Sending a GET HTTP request to the api/v1/data endpoint will return a paginated list of all the records available in the API. The endpoint will return an HTTP response with a JSON formatted body. The endpoint respects the HATEOAS constraint.
Content negotiation is currently only supported via a file extension on individual resource URLs. Negotiation via the HTTP Accept header is on the roadmap.
GET api/v1/data # only JSON supported
GET api/v1/data/objectPID # return JSON
GET api/v1/data/objectPID.xml # return XML
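Because the format is selected by file extension, a client can derive the URL from the identifier and the desired format. The sketch below illustrates this; `record_url` is a hypothetical helper, and percent-encoding the identifier is an assumption, since the README does not say how identifiers containing reserved characters are handled.

```python
from urllib.parse import quote


def record_url(base_url, record_id=None, fmt="json"):
    """Return the URL for the record list or for a single record.

    Hypothetical helper mirroring the three GET examples in the README.
    """
    if fmt not in ("json", "xml"):
        raise ValueError(f"unsupported format: {fmt}")
    root = base_url.rstrip("/") + "/api/v1/data"
    if record_id is None:
        # The collection endpoint only supports JSON.
        if fmt != "json":
            raise ValueError("the record list is only available as JSON")
        return root
    # Assumption: the identifier is percent-encoded as one path segment.
    path = root + "/" + quote(record_id, safe="")
    return path + ".xml" if fmt == "xml" else path


print(record_url("http://localhost:8000"))
print(record_url("http://localhost:8000", "objectPID"))
print(record_url("http://localhost:8000", "objectPID", fmt="xml"))
```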
The datahub supports the OAI-PMH protocol. The endpoint is available via the /oai path.
GET oai/?metadataPrefix=oai_lido&verb=ListIdentifiers
GET oai/?metadataPrefix=oai_lido&verb=ListSets
GET oai/?metadataPrefix=oai_lido&verb=ListRecords
GET oai/?verb=ListRecords&metadataPrefix=oai_lido&set=creator:brueghel_pieter_ii
GET oai/?verb=GetRecord&metadataPrefix=oai_lido&identifier=objectPID
GET oai/?verb=ListIdentifiers&metadataPrefix=oai_lido&from=2017-06-29T05:22:30Z&until=2017-07-14T04:22:30Z
The datahub implements grouping of records into sets, but no soft deletes. As such, the OAI endpoint doesn't indicate whether a record has been deleted.
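OAI-PMH requests like the ones listed above are plain query strings, so they are easy to generate without repeating or mistyping an argument. The following is a minimal sketch; `oai_url` and its keyword names are hypothetical, while the query argument names (`verb`, `metadataPrefix`, `identifier`, `set`, `from`, `until`) come from the OAI-PMH protocol. The host is an assumed development instance.

```python
from urllib.parse import urlencode


def oai_url(base_url, verb, metadata_prefix=None, identifier=None,
            set_spec=None, from_=None, until=None):
    """Build an OAI-PMH request URL, including each argument at most once."""
    arguments = {
        "verb": verb,
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
        "set": set_spec,
        "from": from_,   # 'from' is a Python keyword, hence the trailing underscore
        "until": until,
    }
    # Drop arguments that were not supplied.
    query = urlencode({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url.rstrip('/')}/oai/?{query}"


print(oai_url("http://localhost:8000", "GetRecord",
              metadata_prefix="oai_lido", identifier="objectPID"))
print(oai_url("http://localhost:8000", "ListIdentifiers",
              metadata_prefix="oai_lido",
              from_="2017-06-29T05:22:30Z", until="2017-07-14T04:22:30Z"))
```

Note that `urlencode` percent-encodes the colons in set names and timestamps, which is equivalent to the unencoded form shown in the examples.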
The datahub API can be set up to be either a public or a private API. The public_api_method_access parameter in parameters.yml allows you to configure which parts of the API are public or private:
# Setting this to some unknown value like [FOO] disables public api access
# Leaving this option empty [] means allowing all methods for anonymous access
# public_api_method_access: [FOO]
public_api_method_access: [GET]
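The semantics described in the configuration comments can be modelled in a few lines. This is only an illustration of the documented behaviour, not the Datahub's actual implementation; the function name `is_public` and the exact, case-sensitive match on the method name are assumptions.

```python
def is_public(method, public_api_method_access):
    """Return True if `method` may be called without authentication.

    Models the documented rules:
    - an empty list allows every method anonymously;
    - otherwise only the listed methods are public, so a list holding
      only an unknown value such as "FOO" makes the whole API private.
    """
    if not public_api_method_access:
        return True
    return method in public_api_method_access


print(is_public("GET", ["GET"]))    # anonymous reads allowed
print(is_public("POST", ["GET"]))   # writes still need a token
print(is_public("GET", ["FOO"]))    # nothing matches, API is private
print(is_public("DELETE", []))      # empty list, everything is public
```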
The datahub requires OAuth authentication to ingest or retrieve metadata records. The administrator has to issue a user account with a client_id and a client_secret to individual users or client applications. Before clients can access the API, they have to request an access token:
curl 'http://localhost:8000/oauth/v2/token?grant_type=password&username=admin&password=datahub&client_id=slightlylesssecretpublicid&client_secret=supersecretsecretphrase'
Example output:
{
  "access_token": "ZDIyMGFiZGZkZWUzY2FjMmY4YzNmYjU0ODZmYmQ2ZGM0NjZiZjBhM2Q0Y2ZjMGNiMjc0ZWIyMmYyODMzMGJjZg",
  "expires_in": 3600,
  "token_type": "bearer",
  "scope": "internal web external",
  "refresh_token": "MzhkYzY0MzMxM2FmNmQyODhiOWM4YzEzZjI3YzViZjg3ZThlMTA2YWY4ZTc2YjUwYzgxNzVhNTlmYTBkYWZhNQ"
}
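A client typically builds the token request, then records when the returned token expires so it knows when to request a new one. The sketch below shows both steps with the standard library; `token_request_url` and `parse_token_response` are hypothetical helpers, the credentials are the example values from the curl command, and the response body is a shortened stand-in for the example output.

```python
import json
from urllib.parse import urlencode


def token_request_url(base_url, username, password, client_id, client_secret):
    """Build the password-grant token URL shown in the curl example."""
    query = urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return f"{base_url.rstrip('/')}/oauth/v2/token?{query}"


def parse_token_response(body, issued_at):
    """Extract the tokens and compute the expiry time in epoch seconds."""
    payload = json.loads(body)
    return {
        "access_token": payload["access_token"],
        "refresh_token": payload.get("refresh_token"),
        "expires_at": issued_at + payload["expires_in"],
    }


url = token_request_url(
    "http://localhost:8000", "admin", "datahub",
    "slightlylesssecretpublicid", "supersecretsecretphrase",
)

# Shortened stand-in for the example response; token values are placeholders.
example_body = (
    '{"access_token": "ACCESS", "expires_in": 3600, "token_type": "bearer",'
    ' "scope": "internal web external", "refresh_token": "REFRESH"}'
)
tokens = parse_token_response(example_body, issued_at=1000)
print(url)
print(tokens["expires_at"])
```

In real use, `issued_at` would be `time.time()` taken when the response arrives.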
The OAuth endpoint can also be used to revoke both access and refresh tokens.
curl 'http://localhost:8000/oauth/v2/revoke?token=ZDIyMGFiZGZkZWUzY2FjMmY4YzNmYjU0ODZmYmQ2ZGM0NjZiZjBhM2Q0Y2ZjMGNiMjc0ZWIyMmYyODMzMGJjZg'
Example output:
{ "result": "success", "message": "The token has been revoked."}
Please see CHANGELOG for more information on what has changed recently.
Testing will require a MongoDB instance, as well as Catmandu installed. You can either take care of this yourself, or run the tests using the provided Docker container.
Please ensure you've taken care of the initial setup described above before attempting to run the tests.
Running tests:
./scripts/run_tests
Running tests using Docker:
./scripts/run_tests_docker
Front end workflows are managed via yarn and webpack-encore.
The layout is based on Bootstrap 3.3 and managed via sass. The code can be found under app/resources/public/sass.
Javascript files can be found under app/resources/public/js. Dependencies are managed via yarn. Add vendor modules using require.
Files are built and stored in web/build and included in app/views/app/base.html.twig via the asset() function.
The workflow configuration can be found in webpack.config.js.
Get started:
# Install all dependencies
$ yarn install
# Build everything in development
$ yarn run encore dev
# Watch files and build automatically
$ yarn run encore dev --watch
# Build for production
$ yarn run encore production
Please see CONTRIBUTING for details.
The Datahub is copyright (c) 2016 by Vlaamse Kunstcollectie vzw and PACKED vzw.
This is free software; you can redistribute it and/or modify it under the terms of the GPLv3 License (GPL). Please see License File for more information.