Scripts for uploading Dutch legislation to a CouchDB database
statengeneraal/tools-bwb-cloner
The Dutch government has changed (and greatly improved) its open data service, which now completely overshadows this project. This repository now serves only as an artifact of history.
This project contains some scripts to clone Dutch laws from the official government CRM to a CouchDB database. The resulting database is faster, easier to use and keeps track of laws through time. We also generate a table of contents.
While the Dutch government publishes Dutch laws through their own BWB service, some problems exist with it:
- The service is very slow
- Only one version of a law (the current) is available; there are no historical consolidations available
- Bulk requests are awkward (we can only download a metadata dump for all laws as a zip file)
While the MetaLex Document Server solves the first two of these problems, and arguably the third, it introduces problems of its own: it performs destructive XML transformations and throws away the source material. Also, the MDS has not been updated since April 2014.
Install Docker.

    # Run from Docker registry
    docker run \
      -ti \
      -e COUCH_URL_WETTEN={COUCHDB_URL} \
      -e COUCH_USER_WETTEN={COUCHDB_USERNAME} \
      -e COUCH_PASSWORD_WETTEN={COUCHDB_PASSWORD} \
      digitalheir/bwb-cloner

Install Ruby 2.1.6
Set environment variables:
| Variable | Description |
|---|---|
| COUCH_URL_WETTEN | URL to a CouchDB database |
| COUCH_USER_WETTEN | username for CouchDB database |
| COUCH_PASSWORD_WETTEN | password for CouchDB database |
Run script:

    gem install bundler
    bundle install
    ruby update_couch_db
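
Since the script depends on the three environment variables listed above, a small pre-flight check can catch a missing value before anything is uploaded. The following is only a sketch: the helper name `missing_couch_variables` is hypothetical and is not part of this repository.

```ruby
# Names of the environment variables described in the table above.
REQUIRED_COUCH_VARIABLES = %w[
  COUCH_URL_WETTEN
  COUCH_USER_WETTEN
  COUCH_PASSWORD_WETTEN
].freeze

# Hypothetical helper (not part of this repository): returns the names of
# required variables that are unset or blank in the given environment.
def missing_couch_variables(env = ENV)
  REQUIRED_COUCH_VARIABLES.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_couch_variables
if missing.empty?
  puts 'All CouchDB variables are set.'
else
  warn "Missing environment variables: #{missing.join(', ')}"
end
```

The `env` parameter defaults to the real process environment, but accepts any hash-like object, which makes the check easy to try out without touching the shell.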
The reference database is hosted at https://wetten.cloudant.com/. Read access is public.
Following the standard CouchDB Document API, one may access any document through the `_all_docs` view with some query parameters set, e.g.: https://wetten.cloudant.com/bwb/_all_docs?limit=10&startkey="BWBR0002178"&endkey="BWBR0002179"
This will show full documents, but we may be interested in just the metadata. I have defined some additional views:
| View name | Description | Example |
|---|---|---|
| all_from_metalex | Like `all`, but shows only index files that have been converted from MetaLex | http://wetten.cloudant.com/bwb/_design/RegelingInfo/_view/all_from_metalex?limit=10 |
| all_non_metalex | Like `all`, but shows only index files that have not been converted from MetaLex | http://wetten.cloudant.com/bwb/_design/RegelingInfo/_view/all_non_metalex?limit=10 |
| countKinds | Summary showing the number of documents pertaining to a certain kind (e.g.: law, circulaire, etc.) | http://wetten.cloudant.com/bwb/_design/RegelingInfo/_view/countKinds |
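
The query URLs shown above can also be assembled programmatically, which avoids quoting mistakes: CouchDB expects `startkey` and `endkey` as JSON values, so string keys must keep their double quotes and be percent-encoded. The sketch below only builds URLs and performs no network request; the helper names `all_docs_url` and `view_url` are hypothetical, and the base URL assumes the reference database is reachable over HTTPS.

```ruby
require 'json'
require 'uri'

# Base URL of the reference database mentioned above (HTTPS is an assumption).
BWB_DB_URL = 'https://wetten.cloudant.com/bwb'

# Hypothetical helper: URL for an _all_docs range query between two IDs.
def all_docs_url(start_id, end_id, limit: 10)
  query = URI.encode_www_form(
    limit: limit,
    startkey: start_id.to_json, # JSON-encode so the string keeps its quotes
    endkey: end_id.to_json
  )
  "#{BWB_DB_URL}/_all_docs?#{query}"
end

# Hypothetical helper: URL for one of the views in the RegelingInfo design
# document, with optional query parameters.
def view_url(view_name, params = {})
  url = "#{BWB_DB_URL}/_design/RegelingInfo/_view/#{view_name}"
  params.empty? ? url : "#{url}?#{URI.encode_www_form(params)}"
end

puts all_docs_url('BWBR0002178', 'BWBR0002179')
puts view_url('all_non_metalex', limit: 10)
puts view_url('countKinds')
```

The resulting strings can be passed to any HTTP client, for example `Net::HTTP.get(URI(url))` from the Ruby standard library.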
Documents are stored as CouchDB JSON files with metadata fields that are not exactly the same as the XML elements. Compare `XmlConstants.rb` with `JsonConstants.rb`. An attachment called `data.xml` contains the document content, unsurprisingly in XML format. Document IDs are of the form `{BWBID}:{EXPRESSION DATE}`, where the expression date is specified by the field `datumLaatsteWijzing` (date last modified).
Note that a field called "xml" has been added to the documents, which should always be null. This is a remnant of early iterations of the database in which document content was inlined along with the metadata. The reasoning was that bulk requests are easier this way. However, metadata does not get compressed, as attachments do. Inlining XML also made it harder to query documents for just their metadata, requiring secondary queries.
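
To illustrate the ID scheme and the `data.xml` attachment, here is a minimal sketch of composing a document ID, splitting it again, and deriving the attachment URL. It assumes the standard CouchDB attachment path `/{db}/{docid}/{attachment}`; the helper names are hypothetical, and the date string in the usage lines is a made-up placeholder, since the exact date format is not specified here.

```ruby
require 'uri'

# Hypothetical helper: compose a document ID of the form {BWBID}:{EXPRESSION DATE}.
def document_id(bwb_id, expression_date)
  "#{bwb_id}:#{expression_date}"
end

# Hypothetical helper: split a document ID back into its two parts.
# Splits on the first colon only, so the date part is returned untouched.
def split_document_id(doc_id)
  bwb_id, expression_date = doc_id.split(':', 2)
  { bwb_id: bwb_id, expression_date: expression_date }
end

# Hypothetical helper: URL of the data.xml attachment for a document,
# assuming the standard CouchDB attachment path. The ID is percent-encoded
# so that the colon cannot be misread by an HTTP client or proxy.
def attachment_url(db_url, doc_id)
  "#{db_url}/#{URI.encode_www_form_component(doc_id)}/data.xml"
end

# The date below is a placeholder, not a real expression date.
id = document_id('BWBR0002178', '2014-01-01')
puts id
puts split_document_id(id).inspect
puts attachment_url('https://wetten.cloudant.com/bwb', id)
```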