Docker Compose for Metadata Quality Assessment (MQA) on CKAN and European Data Portal catalogs

mjanez/ckan-mqa

mqa2ckan version | License: Unlicense

Overview

ckan-mqa offers a Docker Compose solution for performing Metadata Quality Assessment (MQA) on both CKAN endpoints and European Data Portal catalogs. MQA checks the accuracy, completeness, and reliability of metadata, which in turn supports data interoperability and accessibility.

This Docker Compose configuration wraps a Python MQA program [1] so that the MQA toolset can be run against CKAN endpoints and European Data Portal catalogs. The setup runs quality checks on a range of metadata attributes, including data relevance, schema compliance, data format consistency, and adherence to standard vocabularies.

Tip

It can be tested against a CKAN-based open data portal such as mjanez/ckan-docker [2].

The MQA scores a set of quality indicators, which are grouped into the dimensions described below. The results of the checks are stored using the Data Quality Vocabulary (DQV), a W3C specification for describing the quality of a dataset.
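To make the DQV idea concrete, here is a minimal Python sketch that writes a single quality measurement as a Turtle snippet. The terms `dqv:QualityMeasurement`, `dqv:hasQualityMeasurement`, `dqv:isMeasurementOf` and `dqv:value` come from the W3C vocabulary; the function name and the example.org dataset and metric URIs are assumptions for illustration and are not the identifiers ckan-mqa emits.

```python
def quality_measurement_turtle(dataset_uri: str, metric_uri: str, value: bool) -> str:
    """Build a Turtle snippet holding one dqv:QualityMeasurement for a dataset."""
    literal = "true" if value else "false"
    return (
        "@prefix dqv: <http://www.w3.org/ns/dqv#> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
        "\n"
        f"<{dataset_uri}> dqv:hasQualityMeasurement [\n"
        "    a dqv:QualityMeasurement ;\n"
        f"    dqv:isMeasurementOf <{metric_uri}> ;\n"
        f'    dqv:value "{literal}"^^xsd:boolean\n'
        "] .\n"
    )


# Hypothetical URIs: a real run would use the catalog's dataset URI and
# whatever metric identifiers the MQA tool defines.
snippet = quality_measurement_turtle(
    "http://example.org/dataset/1",
    "http://example.org/metric/keywordAvailability",
    True,
)
print(snippet)
```

A string template keeps the sketch free of dependencies; an RDF library would be the more robust choice for real output.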

| Dimension | Maximal points |
|---|---|
| Findability | 100 |
| Accessibility | 100 |
| Interoperability | 110 |
| Reusability | 75 |
| Contextuality | 20 |
| Sum | 405 |

The dimensions are derived from the FAIR principles:

  • Findability: metrics that help people and machines find datasets. A maximum of 100 points can be scored in this area.

  • Accessibility: metrics used to determine whether access to the data referenced by the distributions is guaranteed. A maximum of 100 points can be scored in this area.

  • Interoperability: metrics used to determine whether a distribution is considered interoperable. Under the assumption of "identical content with several distributions", only the distribution with the highest number of points is used to calculate the score. A maximum of 110 points can be scored in this area.

  • Reusability: metrics used to check the reusability of the data. A maximum of 75 points can be scored in this area.

  • Contextuality: lightweight properties that provide more context to the user. A maximum of 20 points can be scored in this area.

[Figure: MQA dimensions]

The final rating is assigned through four rating categories. The table below maps point totals to categories. The MQA reports the rating only as a category, so a provider can reach the highest rating even with a small deduction of points.

| Rating | Range of points |
|---|---|
| Excellent | 351 - 405 |
| Good | 221 - 350 |
| Sufficient | 121 - 220 |
| Bad | 0 - 120 |
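The mapping from points to rating can be sketched in a few lines of Python. The name `rating_for` is hypothetical, and the treatment of fractional scores that fall between two ranges (for example 350.5) is an assumption; the ranges themselves are the ones listed above.

```python
# Lower bound of each rating category, highest first.
# Assumption: a fractional score below the next category's lower bound
# stays in the lower category (e.g. 350.5 is still "Good").
RATING_LOWER_BOUNDS = [
    (351, "Excellent"),
    (221, "Good"),
    (121, "Sufficient"),
    (0, "Bad"),
]

MAX_POINTS = 405  # sum of the maximal points of all five dimensions


def rating_for(points: float) -> str:
    """Return the rating category for a total MQA score."""
    if not 0 <= points <= MAX_POINTS:
        raise ValueError(f"points must be between 0 and {MAX_POINTS}")
    for lower_bound, label in RATING_LOWER_BOUNDS:
        if points >= lower_bound:
            return label
    raise AssertionError("unreachable: 0 is the lowest bound")


print(rating_for(280.31))  # Good
```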

Example of ckan-mqa results summary

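| Dimension | Indicator/property | Count | Population | Percentage | Points | Weight |
|---|---|---|---|---|---|---|
| Findability | dcat:keyword | 46 | 46 | 1.0 | 30.0 | 30 |
| Findability | dcat:theme | 46 | 46 | 1.0 | 30.0 | 30 |
| Findability | dct:spatial | 42 | 46 | 0.91 | 18.26 | 20 |
| Findability | dct:temporal | 0 | 46 | 0.0 | 0 | 20 |
| Accessibility | dcat:accessURL code=200 | 255 | 255 | 1.0 | 50.0 | 50 |
| Accessibility | dcat:downloadURL | 0 | 255 | 0.0 | 0 | 20 |
| Accessibility | dcat:downloadURL code=200 | 0 | 255 | 0.0 | 0 | 30 |
| Interoperability | dct:format | 255 | 255 | 1.0 | 20.0 | 20 |
| Interoperability | dcat:mediaType | 255 | 255 | 1.0 | 10.0 | 10 |
| Interoperability | dct:format/dcat:mediaType from vocabulary | 378 | 510 | 0.74 | 7.41 | 10 |
| Interoperability | dct:format non-proprietary | 131 | 255 | 0.51 | 10.27 | 20 |
| Interoperability | dct:format machine-readable | 252 | 255 | 0.99 | 19.76 | 20 |
| Interoperability | DCAT-AP compliance | 0 | 46 | 0.0 | 0 | 30 |
| Reusability | dct:license | 255 | 255 | 1.0 | 20.0 | 20 |
| Reusability | dct:license from vocabulary | 245 | 255 | 0.96 | 9.61 | 10 |
| Reusability | dct:accessRights | 46 | 46 | 1.0 | 10.0 | 10 |
| Reusability | dct:accessRights from vocabulary | 0 | 46 | 0.0 | 0 | 5 |
| Reusability | dcat:contactPoint | 46 | 46 | 1.0 | 20.0 | 20 |
| Reusability | dct:publisher | 46 | 46 | 1.0 | 10.0 | 10 |
| Contextuality | dct:rights | 255 | 255 | 1.0 | 5.0 | 5 |
| Contextuality | dcat:byteSize | 0 | 255 | 0.0 | 0 | 5 |
| Contextuality | dct:issued | 46 | 46 | 1.0 | 5.0 | 5 |
| Contextuality | dct:modified | 46 | 46 | 1.0 | 5.0 | 5 |
| Total points | Rating: Good | | | 0.69 | 280.31 | 405 |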
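The Points column is consistent with each indicator earning its weight multiplied by the share of the population that satisfies it, for example 42/46 × 20 ≈ 18.26 for dct:spatial. The Python sketch below reproduces that arithmetic. The function name is hypothetical, the zero-population rule is an assumption, and the formula is inferred from the table rather than taken from the ckan-mqa source.

```python
def indicator_points(count: int, population: int, weight: float) -> float:
    """Points for one indicator: its weight scaled by the fraction of items that pass."""
    if population == 0:
        return 0.0  # assumption: an empty population scores nothing
    return weight * count / population


# Three rows from the summary table: (count, population, weight).
rows = {
    "dct:spatial": (42, 46, 20),
    "dct:format non-proprietary": (131, 255, 20),
    "dct:license from vocabulary": (245, 255, 10),
}

for name, (count, population, weight) in rows.items():
    print(f"{name}: {indicator_points(count, population, weight):.2f}")
```

Summing the Points column gives the total of 280.31 out of 405, which falls in the Good range.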

Quick start

First copy the .env.example template as .env, then configure it by changing CKAN_CATALOG_URL and, if needed, the DCAT-AP profile version (DCATAP_FILES_VERSION).

cp .env.example .env

Custom environment variables:

  • CKAN_CATALOG_URL: URL of the CKAN catalog to be downloaded (e.g. http://localhost:5000/catalog.rdf?q=organization:test).
  • APP_DIR: Path to the application folder in Docker.
  • TZ: Timezone.
  • DCATAP_FILES_VERSION: DCAT-AP version (available: 2.0.1, 2.1.0, 2.1.1).
  • UPDATE_VOCABS: Update vocabularies from the EU Publications Office at start (True or False).
  • CKAN_METADATA_TYPE: Type of CKAN metadata elements: ckan_uris for a GeoDCAT-AP schema with all elements described by URIs (e.g. dct:format = http://publications.europa.eu/resource/authority/file-type/XML), or ckan for a default CKAN schema with label metadata elements (e.g. dct:format = "XML").
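As a rough illustration of how these variables might be consumed, the following Python sketch reads a subset of them from the environment. The `Settings` class, the `load_settings` function, and every default value are assumptions made for this example; ckan-mqa's own configuration handling may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    ckan_catalog_url: str
    dcatap_files_version: str
    update_vocabs: bool
    ckan_metadata_type: str


def load_settings(env=os.environ) -> Settings:
    """Read the variables described above; the defaults are illustrative only."""
    metadata_type = env.get("CKAN_METADATA_TYPE", "ckan_uris")
    if metadata_type not in ("ckan_uris", "ckan"):
        raise ValueError(f"unsupported CKAN_METADATA_TYPE: {metadata_type!r}")
    return Settings(
        ckan_catalog_url=env["CKAN_CATALOG_URL"],  # required, no default
        dcatap_files_version=env.get("DCATAP_FILES_VERSION", "2.1.1"),
        update_vocabs=env.get("UPDATE_VOCABS", "False").strip().lower() == "true",
        ckan_metadata_type=metadata_type,
    )


# A plain dict stands in for the process environment here.
settings = load_settings(
    {"CKAN_CATALOG_URL": "http://localhost:5000/catalog.rdf", "UPDATE_VOCABS": "True"}
)
print(settings)
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.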

With docker compose

To deploy the environment, docker compose will build the latest image (ghcr.io/mjanez/ckan-mqa:latest).

git clone https://github.com/mjanez/ckan-mqa
cd ckan-mqa
docker compose up --build
# Or detached mode
docker compose up -d --build

Note

Deploy the dev (local build) configuration, docker-compose.dev.yml, with:

docker compose -f docker-compose.dev.yml up --build

If needed, to build a specific container simply run:

 docker build -t target_name xxxx/

Without Docker

Dependencies:

python3 -m pip install --user pipx
pipx install pdm
pdm install --no-self

Run:

pdm run python ckan2mqa/ckan2mqa.py

Debug

VSCode

  1. Build and run container.
  2. Attach Visual Studio Code to the container.
  3. Start debugging on the ckan2mqa.py Python file (Debug the currently active Python file).

Containers

List of containers:

Base images

| Repository | Type | Docker tag | Size | Notes |
|---|---|---|---|---|
| python 3.11 | base image | python/python:3.11-slim | 45.57 MB | - |

Built images

| Repository | Type | Docker tag | Size | Notes |
|---|---|---|---|---|
| mjanez/ckan-mqa | custom image | mjanez/ckan-mqa:v*.*.* | 264 MB | Tag version. |
| mjanez/ckan-mqa | custom image | mjanez/ckan-mqa:latest | 264 MB | Latest stable version. |
| mjanez/ckan-mqa | custom image | mjanez/ckan-mqa:main | 264 MB | Dev version. |

References

DCAT-AP Validator Validation Cases

The validation cases in the DCAT-AP Validator differ in how complete the checks are and in whether background knowledge (vocabularies) is included. Each case is designed for a specific data exchange scenario. The following describes each case and recommends which one to use for a CKAN catalog:

Case 1: DCAT-AP Base Zero (no background knowledge)

Includes all constraints required for technical coherence, excluding range class membership constraints and controlled vocabulary usage.

SHACL Profiles:

Case 2: DCAT-AP Ranges Zero (no background knowledge)

Includes all range class membership constraints.

SHACL Profiles:

Case 3: DCAT-AP Base (with background knowledge)

Extends Case 1 with background knowledge, including all vocabularies used in DCAT-AP.

SHACL Profiles:

Case 4: DCAT-AP Ranges (with background knowledge)

Extends Case 2 with background knowledge, adding validation of range class membership and vocabulary standards compliance.

SHACL Profiles:

Case 5: DCAT-AP Recommendations (with background knowledge)

Includes all constraints related to recommended properties.

SHACL Profiles:

Case 6: DCAT-AP Controlled Vocabularies

Includes all constraints related to controlled vocabularies.

SHACL Profiles:

Case 7: DCAT-AP Full (with background knowledge)

The union of Cases 3, 4, 5, and 6.

SHACL Profiles:

Recommendation:

For most use cases, Case 3: DCAT-AP Base (with background knowledge) is recommended. It provides comprehensive validation of basic coherence and vocabulary standards compliance.

If your CKAN catalog uses controlled vocabularies, consider using Case 6: DCAT-AP Controlled Vocabularies or Case 7: DCAT-AP Full (with background knowledge) for more exhaustive validation.

Remember, the choice of the appropriate validation case depends on your specific needs and data exchange context.

License

Copyright (c) the respective contributors. It is open and licensed under the GNU Affero General Public License (AGPL) v3.0, whose full text may be found at: http://www.fsf.org/licensing/licenses/agpl-3.0.html

Footnotes

  1. Program to test MQA evaluation: Javier Nogueras (jnog@unizar.es), Javier Lacasta (jlacasta@unizar.es), Manuel Ureña (maurena@ujaen.es), F. Javier Ariza (fjariza@ujaen.es), Héctor Ochoa Ortiz (719509@unizar.es). Trafair Project 2020.

  2. A custom installation of Docker Compose with specific extensions for spatial data and GeoDCAT-AP/INSPIRE metadata profiles.
