AtomGraph/JSON2RDF


Streaming generic JSON to RDF converter

Reads JSON data and streams N-Triples output. The conversion algorithm is similar to that of JSON-LD, but it accepts arbitrary JSON and does not require a @context.
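The README does not spell out the conversion rules, but the example outputs later on suggest them. Each JSON object becomes a blank node. Each key becomes a property in the base URI + "#" namespace. Each array element repeats that property. A JSON null produces no triple (the "author": null key in city-distances.json has no counterpart in its output). The sketch below restates those observations in Java under several assumptions. It is not JSON2RDF's code, and the class name GenericJsonToTriples is hypothetical. The JSON is assumed to be already parsed into Map/List/String/Number/Boolean/null. The sketch builds the triples in memory instead of streaming them. It writes every scalar as a plain string literal, unlike the real output, which gives numbers and booleans XSD datatypes. Nested arrays end up flattened here, and the README does not say how JSON2RDF handles them. Java 16+ is required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory sketch of a generic JSON-to-RDF walk (not JSON2RDF's own code).
// Input: JSON already parsed into Map / List / String / Number / Boolean / null.
final class GenericJsonToTriples {
    private final String namespace;                 // property namespace: base URI + "#"
    private final List<String> triples = new ArrayList<>();
    private int nextBlankNode = 0;

    GenericJsonToTriples(String base) {
        this.namespace = base + "#";
    }

    // Converts one top-level JSON object and returns its N-Triples lines.
    List<String> convert(Map<String, ?> root) {
        triples.clear();
        nextBlankNode = 0;
        writeObject(root);
        return List.copyOf(triples);
    }

    // Every JSON object becomes a fresh blank node; each key becomes a property.
    private String writeObject(Map<?, ?> object) {
        String subject = "_:b" + nextBlankNode++;
        for (Map.Entry<?, ?> entry : object.entrySet()) {
            // Assumption: keys are safe to use as IRI fragments (no percent-encoding here).
            String property = "<" + namespace + entry.getKey() + ">";
            writeValue(subject, property, entry.getValue());
        }
        return subject;
    }

    private void writeValue(String subject, String property, Object value) {
        if (value == null) {
            return;                                    // JSON null: no triple
        } else if (value instanceof Map<?, ?> nested) {
            String object = writeObject(nested);       // nested object: linked blank node
            triples.add(subject + " " + property + " " + object + " .");
        } else if (value instanceof List<?> items) {
            for (Object item : items) {
                writeValue(subject, property, item);   // array: one triple per element, position not kept
            }
        } else {
            // Simplification: every scalar becomes a plain string literal.
            triples.add(subject + " " + property + " " + quote(value.toString()) + " .");
        }
    }

    // Escapes the characters N-Triples does not allow unescaped inside a quoted literal.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                       .replace("\n", "\\n").replace("\r", "\\r") + "\"";
    }
}
```

For example, converting {"name": "Markus Lanthaler"} with base https://localhost/ yields the single line _:b0 <https://localhost/#name> "Markus Lanthaler" . in this sketch.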

The resulting RDF representation is lossless, with the exception of array ordering and some datatype round-tripping. The lost ordering should not be a problem in most cases, as RDF applications tend to impose their own value-based ordering using SPARQL ORDER BY.
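The ordering loss follows from the RDF data model: a graph is a set of triples. The example outputs suggest that each array element becomes its own triple with the same subject and property, which leaves nothing to record the element's position. The following minimal sketch shows this under that assumption. The class and method names are hypothetical, and item strings are assumed to need no escaping.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: one triple per array element, all sharing subject and property.
final class ArrayOrderDemo {
    static Set<String> arrayTriples(String subject, String propertyIri, List<String> items) {
        Set<String> graph = new LinkedHashSet<>();   // an RDF graph is a set of triples
        for (String item : items) {
            // Assumes items contain no characters that need N-Triples escaping.
            graph.add(subject + " <" + propertyIri + "> \"" + item + "\" .");
        }
        return graph;
    }
}
```

Because Set equality ignores iteration order, ["a", "b"] and ["b", "a"] yield equal graphs here. Set semantics also collapse identical repeated scalar values in this sketch.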

A common use case is feeding the JSON2RDF output into a triplestore or SPARQL processor and using a SPARQL CONSTRUCT query to map the generic RDF to more specific RDF that uses terms from some vocabulary. SPARQL is an inherently more flexible RDF mapping mechanism than a JSON-LD @context.

Build

mvn clean install

That should produce an executable JAR file, target/json2rdf-jar-with-dependencies.jar, which bundles the dependency libraries.

Maven

Each version is released to the Maven Central repository as com.atomgraph.etl.json/json2rdf.

Usage

The JSON data is read from stdin, and the resulting RDF data is written to stdout.

JSON2RDF is available as a .jar as well as a Docker image, atomgraph/json2rdf (recommended).

Parameters:

  • base - the base URI for the data. The property namespace is constructed by appending # to the base URI, so a base of https://localhost/ yields properties such as https://localhost/#name.

Options:

  • --input-charset - JSON input encoding, by default UTF-8
  • --output-charset - RDF output encoding, by default UTF-8
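The two options only control how bytes are decoded on the way in and encoded on the way out. The sketch below shows those two ends in plain Java, with the JSON-to-RDF conversion that would sit between them left out. It is an assumption-based illustration, not JSON2RDF's code, and the names CharsetPipe, pipe and transcode are hypothetical. It requires Java 10+ for Reader.transferTo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical sketch of what --input-charset and --output-charset govern.
final class CharsetPipe {
    static void pipe(InputStream in, Charset inputCharset, OutputStream out, Charset outputCharset) {
        try {
            Reader reader = new InputStreamReader(in, inputCharset);     // like --input-charset
            Writer writer = new OutputStreamWriter(out, outputCharset);  // like --output-charset
            reader.transferTo(writer);                                   // Java 10+
            writer.flush();                                              // flush but leave the stream open (e.g. System.out)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper for in-memory byte arrays.
    static byte[] transcode(byte[] input, Charset inputCharset, Charset outputCharset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        pipe(new ByteArrayInputStream(input), inputCharset, out, outputCharset);
        return out.toByteArray();
    }
}
```

For instance, CharsetPipe.pipe(System.in, StandardCharsets.ISO_8859_1, System.out, StandardCharsets.UTF_8) would read Latin-1 input and write UTF-8 output.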

Examples

JSON2RDF streams its output as N-Triples, so in the examples below we pipe it through Apache Jena's riot to get more readable Turtle output.
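N-Triples puts one complete triple on each line, terminated by " .". A converter can therefore write each triple as soon as it is known, and tools such as riot can re-serialize the result as Turtle. The helper below is a hypothetical sketch of how such lines are formed and is not part of JSON2RDF's API. It escapes the four characters that the N-Triples grammar does not allow unescaped inside a quoted literal: double quote, backslash, line feed and carriage return.

```java
// Hypothetical N-Triples formatting helpers (not JSON2RDF's API).
final class NTriples {
    static String iri(String iri) {
        return "<" + iri + ">";
    }

    // Quoted literal with the escapes N-Triples requires for ", \, LF and CR.
    static String literal(String lexical) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : lexical.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    static String typedLiteral(String lexical, String datatypeIri) {
        return literal(lexical) + "^^" + iri(datatypeIri);
    }

    // One triple per line: subject, predicate, object, then " .".
    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }
}
```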


Bob DuCharme's blog post on using JSON2RDF: Converting JSON to RDF.


JSON data in ordinary-json-document.json

{
  "name": "Markus Lanthaler",
  "homepage": "http://www.markus-lanthaler.com/",
  "image": "http://twitter.com/account/profile_image/markuslanthaler"
}

Java execution from shell:

cat ordinary-json-document.json | java -jar json2rdf-jar-with-dependencies.jar https://localhost/ | riot --formatted=TURTLE

Alternatively, Docker execution from shell:

cat ordinary-json-document.json | docker run --rm -i -a stdin -a stdout -a stderr atomgraph/json2rdf https://localhost/ | riot --formatted=TURTLE

Note that when using Docker you need to bind the stdin/stdout/stderr streams (the -i -a stdin -a stdout -a stderr options above).

Turtle output

[ <https://localhost/#homepage>  "http://www.markus-lanthaler.com/" ;
  <https://localhost/#image>     "http://twitter.com/account/profile_image/markuslanthaler" ;
  <https://localhost/#name>      "Markus Lanthaler"
] .

The following SPARQL query can be used to map this generic RDF to the desired target RDF, e.g. a structure that uses the schema.org vocabulary.

BASE <https://localhost/>
PREFIX : <#>
PREFIX schema: <http://schema.org/>

CONSTRUCT
{
  ?person schema:homepage ?homepage ;
    schema:image ?image ;
    schema:name ?name .
}
{
  ?person :homepage ?homepageStr ;
    :image ?imageStr ;
    :name ?name .
  BIND (URI(?homepageStr) AS ?homepage)
  BIND (URI(?imageStr) AS ?image)
}

Turtle output after the mapping

[ <http://schema.org/homepage>  <http://www.markus-lanthaler.com/> ;
  <http://schema.org/image>     <http://twitter.com/account/profile_image/markuslanthaler> ;
  <http://schema.org/name>      "Markus Lanthaler"
] .
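To make the matching rules of the query explicit, here is a hypothetical plain-Java restatement of what the CONSTRUCT does for this example. A subject is mapped only if it has all three of :homepage, :image and :name, because the WHERE pattern is a single basic graph pattern. The string values of homepage and image become IRIs, as URI(...) does in the query. The names SchemaOrgMapping, Triple and map are illustrative. Every generic object is assumed to be a plain string literal, as in the output above. Java 16+ is required for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical restatement of the CONSTRUCT query above (not part of JSON2RDF).
final class SchemaOrgMapping {
    // objectIsIri distinguishes IRIs from plain string literals.
    record Triple(String subject, String predicate, String object, boolean objectIsIri) {}

    static final String GENERIC_NS = "https://localhost/#";   // base URI + "#"
    static final String SCHEMA_NS = "http://schema.org/";
    private static final List<String> LOCAL_NAMES = List.of("homepage", "image", "name");

    static List<Triple> map(List<Triple> input) {
        // Collect :homepage, :image and :name values per subject.
        Map<String, Map<String, List<String>>> bySubject = new LinkedHashMap<>();
        for (Triple t : input) {
            if (!t.predicate().startsWith(GENERIC_NS)) continue;
            String local = t.predicate().substring(GENERIC_NS.length());
            if (!LOCAL_NAMES.contains(local)) continue;
            bySubject.computeIfAbsent(t.subject(), k -> new LinkedHashMap<>())
                     .computeIfAbsent(local, k -> new ArrayList<>())
                     .add(t.object());   // assumption: object is a plain string literal
        }
        List<Triple> output = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> e : bySubject.entrySet()) {
            Map<String, List<String>> values = e.getValue();
            if (values.size() < LOCAL_NAMES.size()) continue;   // pattern needs all three properties
            String s = e.getKey();
            for (String v : values.get("homepage")) output.add(new Triple(s, SCHEMA_NS + "homepage", v, true)); // URI(?homepageStr)
            for (String v : values.get("image")) output.add(new Triple(s, SCHEMA_NS + "image", v, true));       // URI(?imageStr)
            for (String v : values.get("name")) output.add(new Triple(s, SCHEMA_NS + "name", v, false));        // copied as-is
        }
        return output;
    }
}
```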

JSON data in city-distances.json

{
  "desc"    : "Distances between several cities, in kilometers.",
  "updated" : "2014-02-04T18:50:45",
  "uptodate": true,
  "author"  : null,
  "cities"  : {
    "Brussels": [
      {"to": "London",    "distance": 322},
      {"to": "Paris",     "distance": 265},
      {"to": "Amsterdam", "distance": 173}
    ],
    "London": [
      {"to": "Brussels",  "distance": 322},
      {"to": "Paris",     "distance": 344},
      {"to": "Amsterdam", "distance": 358}
    ],
    "Paris": [
      {"to": "Brussels",  "distance": 265},
      {"to": "London",    "distance": 344},
      {"to": "Amsterdam", "distance": 431}
    ],
    "Amsterdam": [
      {"to": "Brussels",  "distance": 173},
      {"to": "London",    "distance": 358},
      {"to": "Paris",     "distance": 431}
    ]
  }
}

Java execution from shell:

cat city-distances.json | java -jar json2rdf-jar-with-dependencies.jar https://localhost/ | riot --formatted=TURTLE

Alternatively, Docker execution from shell:

cat city-distances.json | docker run --rm -i -a stdin -a stdout -a stderr atomgraph/json2rdf https://localhost/ | riot --formatted=TURTLE

Turtle output

[ <https://localhost/#cities>
    [ <https://localhost/#Amsterdam> [ <https://localhost/#distance> "431"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Paris" ] ;
      <https://localhost/#Amsterdam> [ <https://localhost/#distance> "358"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "London" ] ;
      <https://localhost/#Amsterdam> [ <https://localhost/#distance> "173"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Brussels" ] ;
      <https://localhost/#Brussels>  [ <https://localhost/#distance> "322"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "London" ] ;
      <https://localhost/#Brussels>  [ <https://localhost/#distance> "265"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Paris" ] ;
      <https://localhost/#Brussels>  [ <https://localhost/#distance> "173"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Amsterdam" ] ;
      <https://localhost/#London>    [ <https://localhost/#distance> "358"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Amsterdam" ] ;
      <https://localhost/#London>    [ <https://localhost/#distance> "322"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Brussels" ] ;
      <https://localhost/#London>    [ <https://localhost/#distance> "344"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Paris" ] ;
      <https://localhost/#Paris>     [ <https://localhost/#distance> "431"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Amsterdam" ] ;
      <https://localhost/#Paris>     [ <https://localhost/#distance> "344"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "London" ] ;
      <https://localhost/#Paris>     [ <https://localhost/#distance> "265"^^<http://www.w3.org/2001/XMLSchema#int> ; <https://localhost/#to> "Brussels" ]
    ] ;
  <https://localhost/#desc>     "Distances between several cities, in kilometers." ;
  <https://localhost/#updated>  "2014-02-04T18:50:45" ;
  <https://localhost/#uptodate> true
] .
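The output above shows four scalar cases. JSON integers such as 322 become xsd:int literals. true becomes an xsd:boolean literal (Turtle abbreviates it as a bare true). Strings, including the date-like "updated" value, stay plain literals. The null "author" produces no triple at all. The sketch below encodes only those observed cases. JSON2RDF's handling of other numbers (fractions, very large integers) is not shown in this README, so the sketch rejects them rather than guess. JsonScalarToRdf and toObjectTerm are hypothetical names. Java 16+ is required.

```java
import java.util.Optional;

// Hypothetical sketch of the scalar-to-literal cases visible in the README's example output.
final class JsonScalarToRdf {
    private static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Returns the N-Triples object term for a JSON scalar, or empty for JSON null.
    static Optional<String> toObjectTerm(Object value) {
        if (value == null) return Optional.empty();                        // "author": null -> no triple
        if (value instanceof String s) return Optional.of(quote(s));       // plain literal, e.g. "updated"
        if (value instanceof Integer i) return Optional.of(quote(i.toString()) + "^^<" + XSD + "int>");
        if (value instanceof Boolean b) return Optional.of(quote(b.toString()) + "^^<" + XSD + "boolean>");
        // Not covered: numeric types that the README examples do not show.
        throw new IllegalArgumentException("not covered by this sketch: " + value.getClass());
    }

    // Escapes ", \, LF and CR as N-Triples requires inside quoted literals.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                       .replace("\n", "\\n").replace("\r", "\\r") + "\"";
    }
}
```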

Mapping Twitter export to RDF

You can download your Twitter data, which includes your tweets in tweets.js. Remove the window.YTD.tweets.part0 = prefix and save the rest as tweets.json.

To get the RDF output, save the following query as tweets.rq:
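The prefix removal can also be scripted. The Java sketch below assumes that tweets.js starts with exactly the window.YTD.tweets.part0 = assignment named above, followed by the JSON array. It fails loudly if the file starts differently. TwitterExport, stripAssignment and convert are hypothetical names. Java 11+ is required for Files.readString/writeString.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: turns the tweets.js export into plain JSON for JSON2RDF.
final class TwitterExport {
    private static final String PREFIX = "window.YTD.tweets.part0 =";

    // Drops the JavaScript assignment in front of the JSON array.
    static String stripAssignment(String js) {
        String trimmed = js.stripLeading();
        if (!trimmed.startsWith(PREFIX)) {
            throw new IllegalArgumentException("expected the text to start with: " + PREFIX);
        }
        return trimmed.substring(PREFIX.length()).strip();
    }

    // Reads e.g. tweets.js and writes e.g. tweets.json (both UTF-8).
    static void convert(Path tweetsJs, Path tweetsJson) throws IOException {
        String js = Files.readString(tweetsJs, StandardCharsets.UTF_8);
        Files.writeString(tweetsJson, stripAssignment(js), StandardCharsets.UTF_8);
    }
}
```

A call such as TwitterExport.convert(Path.of("tweets.js"), Path.of("tweets.json")) would produce the file that the command below pipes into JSON2RDF.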

BASE <https://twitter.com/>
PREFIX : <#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/terms/>

CONSTRUCT
{
  ?tweet a sioc:Post ;
    sioc:id ?id ;
    dct:created ?created ;
    sioc:content ?content ;
    sioc:reply_of ?reply_of .
}
{
  ?tweet_obj :id ?id ;
    :created_at ?created_at_string ;
    :full_text ?content .
  OPTIONAL
  {
    ?tweet_obj :in_reply_to_status_id ?in_reply_to_status_id ;
      :in_reply_to_screen_name ?in_reply_to_screen_name .
    BIND(URI(CONCAT(?in_reply_to_screen_name, "/status/", ?in_reply_to_status_id)) AS ?reply_of)
  }
  BIND("atomgraphhq" AS ?username)
  BIND(URI(CONCAT(?username, "/status/", ?id)) AS ?tweet)
  BIND(SUBSTR(?created_at_string, 27, 4) AS ?year_string)
  BIND(SUBSTR(?created_at_string, 5, 3) AS ?month_string)
  BIND(SUBSTR(?created_at_string, 9, 2) AS ?day_string)
  VALUES (?month_string ?month_number_string)
  {
    ("Jan" "01")
    ("Feb" "02")
    ("Mar" "03")
    ("Apr" "04")
    ("May" "05")
    ("Jun" "06")
    ("Jul" "07")
    ("Aug" "08")
    ("Sep" "09")
    ("Oct" "10")
    ("Nov" "11")
    ("Dec" "12")
  }
  BIND(SUBSTR(?created_at_string, 12, 8) AS ?time)
  BIND(SUBSTR(?created_at_string, 21, 3) AS ?tz_hours)
  BIND(SUBSTR(?created_at_string, 24, 2) AS ?tz_minutes)
  BIND(STRDT(CONCAT(?year_string, "-", ?month_number_string, "-", ?day_string, "T", ?time, ?tz_hours, ":", ?tz_minutes), xsd:dateTime) AS ?created)
}

Adjust the Twitter handle bound to ?username in the query, then run this command:

cat tweets.json | docker run --rm -i -a stdin -a stdout -a stderr atomgraph/json2rdf https://twitter.com/ > tweets.nt && \
    sparql --data tweets.nt --query tweets.rq > tweets.ttl

Output sample:

<https://twitter.com/atomgraphhq/status/1535239790693699587>
        a              sioc:Post ;
        dct:created    "2022-06-10T12:37:44+00:00"^^xsd:dateTime ;
        sioc:content   "Follow it on GitHub!\nhttps://t.co/pu5KkOoIOX" ;
        sioc:id        "1535239790693699587" ;
        sioc:reply_of  <https://twitter.com/atomgraphhq/status/1535211486582382593> .
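The fixed SUBSTR offsets in tweets.rq only work if created_at uses a fixed-width layout. That layout is assumed here to be Twitter's "Fri Jun 10 12:37:44 +0000 2022" style, reconstructed from the sample output above. The Java sketch below repeats the same extraction, translating SPARQL's 1-based SUBSTR(str, start, length) into Java's 0-based, end-exclusive substring. It also resolves the relative tweet IRI against the query's BASE, as URI(...) does there. TweetFields and its methods are hypothetical names.

```java
import java.net.URI;
import java.util.List;

// Hypothetical Java mirror of the date and IRI logic in tweets.rq.
final class TweetFields {
    private static final List<String> MONTHS = List.of(
        "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Expects the fixed-width form assumed above, e.g. "Fri Jun 10 12:37:44 +0000 2022".
    static String toXsdDateTime(String createdAt) {
        String year = createdAt.substring(26, 30);       // SUBSTR(?s, 27, 4)
        String month = createdAt.substring(4, 7);        // SUBSTR(?s, 5, 3)
        String day = createdAt.substring(8, 10);         // SUBSTR(?s, 9, 2)
        String time = createdAt.substring(11, 19);       // SUBSTR(?s, 12, 8)
        String tzHours = createdAt.substring(20, 23);    // SUBSTR(?s, 21, 3)
        String tzMinutes = createdAt.substring(23, 25);  // SUBSTR(?s, 24, 2)
        int m = MONTHS.indexOf(month);                   // plays the role of the VALUES table
        if (m < 0) throw new IllegalArgumentException("unknown month: " + month);
        return String.format("%s-%02d-%sT%s%s:%s", year, m + 1, day, time, tzHours, tzMinutes);
    }

    // Relative IRI "<username>/status/<id>" resolved against BASE <https://twitter.com/>.
    static String tweetIri(String username, String id) {
        return URI.create("https://twitter.com/").resolve(username + "/status/" + id).toString();
    }
}
```

With the assumed input, toXsdDateTime returns 2022-06-10T12:37:44+00:00, matching the dct:created value in the sample.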

Improvements to the mapping query are welcome.

Performance

Largest dataset tested so far: 2.95 GB / 30,459,482 lines of JSON converted to 4.5 GB / 21,964,039 triples in 2m10s. Hardware: x64 Windows 10 PC with an Intel Core i5-7200U 2.5 GHz CPU and 16 GB RAM.
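For a rough sense of throughput, the figures above work out as follows. This assumes decimal units (1 GB = 1000 MB) and 2m10s = 130 s. These are averages derived from the stated numbers, not separate measurements.

```latex
\begin{aligned}
\frac{2.95\ \text{GB}}{130\ \text{s}} &\approx 22.7\ \text{MB/s of JSON read} \\
\frac{4.5\ \text{GB}}{130\ \text{s}} &\approx 34.6\ \text{MB/s of N-Triples written} \\
\frac{21\,964\,039\ \text{triples}}{130\ \text{s}} &\approx 1.69 \times 10^{5}\ \text{triples/s} \\
\frac{30\,459\,482\ \text{lines}}{130\ \text{s}} &\approx 2.34 \times 10^{5}\ \text{lines/s}
\end{aligned}
```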

Dependencies

