👩⚖️ Library that sanitizes and formalizes the data model of Dutch court judgments from Rechtspraak.nl
cacfd3a/rechtspraak-js
This library sanitizes and formalizes the data model for Dutch court judgments published by Rechtspraak.nl. The process is applied to all documents publicly offered by Rechtspraak.nl, and the results are published as a linked data graph (well-formed JSON-LD with a JSON Schema).
Rechtspraak.nl publishes information about a large number of Dutch court judgments, with a rich collection of metadata. Sadly, the data model is ill-described and rife with syntactical errors. Rechtspraak.nl provides no schema for its documents other than an incomplete PDF in natural language, and a lot of RDF fields are invalid. It's hard to know what to expect when downloading a document, especially for some of the more esoteric metadata fields.
The purpose of this project is to formalize the data model of Rechtspraak.nl. I have done this by analyzing all existing documents (~2 million) on Rechtspraak.nl to generate a JSON Schema and TypeScript typings for the metadata associated with the court judgments. I have corrected some common errors in the source files (mostly to do with not properly encoding URIs) and generate valid JSON-LD (which is compatible with RDF) from them.
This work is a tangible step towards machine-readable legal data: it makes automated processing of these documents easier.
A dump of the sanitized metadata is available at https://rechtspraak.lawreader.nl/_all. A human-readable interface around the data is at https://rechtspraak.lawreader.nl/
The former URL will load the complete knowledge graph of Rechtspraak.nl. This page returns a JSON document of about 20 gigabytes in size. The document has two fields (see the JSON-LD specification for more information):
- `@context`, which provides the URI mappings for the concepts
- `@graph`, which is an array filled with the actual data
I recommend using a streaming JSON parser like Oboe.js to consume the data.
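As a rough illustration, the two-field shape of the dump can be typed as follows. This is a sketch, not the typings shipped with this library: the `GraphNode` and `RechtspraakDump` interfaces, the `listIds` helper and the assumption that nodes carry their identifier in `@id` are all made up for the example. It works on a small in-memory sample; for the full dump a streaming parser remains the better choice.

```typescript
// Shape of the dump: a JSON-LD document with @context and @graph.
// The node fields below are assumptions made for illustration.
interface GraphNode {
  "@id"?: string; // assumed: a judgment node carries its ECLI as @id
  [property: string]: unknown;
}

interface RechtspraakDump {
  "@context": Record<string, unknown>; // URI mappings for the concepts
  "@graph": GraphNode[]; // the actual data
}

// Collect the identifiers of all nodes that have one.
function listIds(dump: RechtspraakDump): string[] {
  const ids: string[] = [];
  for (const node of dump["@graph"]) {
    const id = node["@id"];
    if (typeof id === "string") {
      ids.push(id);
    }
  }
  return ids;
}

// Tiny hand-written sample; not real data from the dump.
const sample: RechtspraakDump = {
  "@context": { dcterms: "http://purl.org/dc/terms/" },
  "@graph": [{ "@id": "ECLI:NL:CBB:2015:5" }, { "dcterms:title": "untitled" }],
};

console.log(listIds(sample));
```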
For accessing subsets of the knowledge graph, you can use most of the API from CouchDB views, e.g. https://rechtspraak.lawreader.nl/_all?limit=100&skip=50 will limit your request to 100 docs after the first 50. Mind that you can also use `startkey` to paginate faster: `_all?startkey="ECLI:NL:CBB:2015:5"&limit=50` will fetch the first 50 docs starting at `ECLI:NL:CBB:2015:5`. Documents are ordered alphabetically by their ids ([European Case Law Identifier](https://en.wikipedia.org/wiki/European_Case_Law_Identifier)).
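The query parameters above can be assembled with a small helper. This is a minimal sketch: `buildPageUrl` and `PageOptions` are hypothetical names, and only the `limit`, `skip` and `startkey` parameters mentioned in the text are covered. The `startkey` value is serialized as JSON (hence the quotes) and then percent-encoded.

```typescript
// Base URL taken from the text above; the helper and option names are hypothetical.
const ALL_DOCS_URL = "https://rechtspraak.lawreader.nl/_all";

interface PageOptions {
  limit?: number;
  skip?: number;
  startkey?: string; // a document id (ECLI)
}

function buildPageUrl(options: PageOptions): string {
  const params: string[] = [];
  if (options.startkey !== undefined) {
    // CouchDB-style keys are JSON values, so the id is wrapped in quotes.
    params.push("startkey=" + encodeURIComponent(JSON.stringify(options.startkey)));
  }
  if (options.limit !== undefined) {
    params.push("limit=" + options.limit);
  }
  if (options.skip !== undefined) {
    params.push("skip=" + options.skip);
  }
  return params.length > 0 ? ALL_DOCS_URL + "?" + params.join("&") : ALL_DOCS_URL;
}

console.log(buildPageUrl({ limit: 100, skip: 50 }));
console.log(buildPageUrl({ startkey: "ECLI:NL:CBB:2015:5", limit: 50 }));
```

When paging with `startkey`, a common CouchDB pattern is to pass the last id of the previous page as the next `startkey` together with `skip: 1`; whether this endpoint behaves identically is an assumption worth checking.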
I try to stick to the vocabularies used in the source documents (dcterms, and some from the Dutch government), and also introduce relevant fields from schema.org. I've invented my own URIs where appropriate. In time I'm planning to make all of these URIs resolvable as well.
Tip: use a tool like JSON-LD playground to visualise the data.
Code is written in TypeScript; the compiled project supplies `d.ts` typing files along with the JavaScript code.
~~I'm still working on converting the TypeScript interface to JSON Schema~~ (for the impatient, look for source files to generate the JSON-LD document)
Here is a list of some of the syntactical errors I encountered in the data offered by Rechtspraak.nl, which are sanitized in this work.
- Some `dcterms:type` triples don't have a resourceIdentifier, e.g. `ECLI:NL:RBMNE:2016:1637`: `<dcterms:type rdf:language="nl" resourceIdentifier="">Uitspraak</dcterms:type>`
- Some docs miss .nl in the URI, e.g. `ECLI:NL:CBB:2002:AD9059`: `psi:type="http://psi.rechtspraak/conclusie"`
- Many URIs aren't encoded properly, most notably the "gevolg" URIs, e.g. `http://psi.rechtspraak.nl/gevolg#(Gedeeltelijke) vernietiging en zelf afgedaan`. Considering the official URI specification, spaces are illegal in URIs.
  - This also applies to some references, e.g. in http://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:HR:1992:AA2957: `1.0:v:BWB:BWBV0001506&artikel=7 (oud)&g=1992-12-23`
  - Most dramatically, the URI `http://psi.rechtspraak.nl/procedure#`, followed by a line feed and `tussenbeschikking`, contains line feeds (see `ECLI:NL:RBMNE:2016:1780`)
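The classes of URI errors listed above could be cleaned up along these lines. This is a minimal sketch and not the library's actual code: `sanitizeUri` is a hypothetical helper, and it assumes the input is not already percent-encoded (otherwise `%` signs would be encoded twice).

```typescript
// Illustrative clean-up rules for the URI errors listed above.
function sanitizeUri(raw: string): string {
  // 1. Drop line feeds, carriage returns and tabs embedded in the URI.
  let uri = raw.replace(/[\r\n\t]+/g, "").trim();
  // 2. Restore the missing ".nl" in the psi.rechtspraak host name.
  uri = uri.replace(/^http:\/\/psi\.rechtspraak\//, "http://psi.rechtspraak.nl/");
  // 3. Percent-encode characters that are illegal in URIs, such as spaces.
  //    Assumes the input is not already percent-encoded.
  return encodeURI(uri);
}

console.log(sanitizeUri("http://psi.rechtspraak.nl/gevolg#(Gedeeltelijke) vernietiging en zelf afgedaan"));
console.log(sanitizeUri("http://psi.rechtspraak/conclusie"));
console.log(sanitizeUri("http://psi.rechtspraak.nl/procedure#\ntussenbeschikking"));
```

Note that `encodeURI` leaves reserved characters such as `#`, `&` and parentheses untouched, so the fragment structure of the original URI is preserved.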
Some issues derived from an earlier report:
- In general, the W3C RDF validator crashes on input documents.
- The subject of a triple is not always clear. There are two `dcterms:modified` properties described, and it is unclear which one refers to the date on which the document was modified and which one to the date on which the metadata was modified.
- Values are usually not typed, for example in the case of dates.
- Resource identifiers are not always used, when they easily can be. An example is the `dcterms:coverage` property. This might not seem important, such as in the case of `dcterms:accessRights`, which is fixed to the string literal `public`. But RDF processors typically do not treat two equal string literals as the same concept: URIs are used for that. (Also, properties in the Dublin Core normally define a range, which usually implies URIs.)
- There are some ECLI identifiers that turn up when searching for documents that have a body, but actually do not have a body. Encountered are:
Property-specific issues:
- `dcterms:references` prefixes the resourceIdentifier attribute with the namespace of the corpus that the referent is in. This is not properly formed RDF.
- `dcterms:subject`: when a judgment is about multiple fields, a resource identifier is given that contains both subjects concatenated. An example is `http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_socialezekerheidsrecht`. It makes more sense to have one URI for 'bestuursrecht' and one URI for 'socialezekerheidsrecht'.
- `psi:zaaknummer` doesn't seem to split lists of identifiers correctly. A string like `97/8236 TW, 97/8241 TW` is probably two case numbers, not one.

The XML defines a prefix that refers to the relative URI `bwb-dl`. Prefixing to relative URIs is a practice that has been deprecated by W3C.
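The `dcterms:subject` and `psi:zaaknummer` issues both come down to splitting a combined value. A sketch of how that might look, with two loud assumptions: that commas never occur inside a single case number, and that `_` in the fragment only ever separates subjects. The helper names `splitCaseNumbers` and `splitSubjectUri` are hypothetical and not part of this library.

```typescript
// Split a psi:zaaknummer literal that holds several comma-separated case numbers.
// Assumes a comma never occurs inside a single case number.
function splitCaseNumbers(value: string): string[] {
  return value
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Split a combined subject URI into one URI per field of law.
// Assumes "_" only ever separates subjects in the fragment.
function splitSubjectUri(uri: string): string[] {
  const hashIndex = uri.indexOf("#");
  if (hashIndex === -1) {
    return [uri];
  }
  const base = uri.slice(0, hashIndex + 1);
  const fragment = uri.slice(hashIndex + 1);
  const parts = fragment.split("_").filter((part) => part.length > 0);
  return parts.length > 0 ? parts.map((part) => base + part) : [uri];
}

console.log(splitCaseNumbers("97/8236 TW, 97/8241 TW"));
console.log(splitSubjectUri("http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_socialezekerheidsrecht"));
```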
This work is a spin-off from my Master thesis, Automatic Assignment of Section Structure to Texts of Dutch Court Judgments.
GPL v3. This is a viral open source license. If you create derivatives, you must publish your code under compatible license terms.
