Martin Morgan1 and Marcel Ramos2
1Roswell Park Comprehensive Cancer Center, Buffalo, NY, US
2CUNY School of Public Health at Hunter College, New York, NY, US
Use rjsoncons for querying, transforming, and searching JSON, NDJSON, or R objects using JMESpath, JSONpath, or JSONpointer. rjsoncons supports JSON patch for document editing, and JSON schema validation. Link to the package for direct access to additional features in the jsoncons C++ library.
Install the released package version from CRAN
install.packages("rjsoncons", repos = "https://CRAN.R-project.org")

Install the development version with
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes", repos = "https://CRAN.R-project.org")
remotes::install_github("mtmorgan/rjsoncons")

Attach the installed package to your R session, and check the version of the C++ library in use
library(rjsoncons)
rjsoncons::version()
## [1] "0.173.4 [+57967655d]"

Functions in this package work on JSON or NDJSON character vectors, file paths and URLs to JSON or NDJSON documents, and R objects that can be transformed to a JSON string.
j_query()

Here is a simple JSON example document
json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"}
  ]
}'

There are several common use cases. Use rjsoncons to query the JSON string using JSONpath, JMESpath or JSONpointer syntax to filter larger documents to records of interest, e.g., only cities in New York state, using ‘JMESpath’ syntax.
j_query(json, "locations[?state == 'NY']") |>
    cat("\n")
## [{"name":"New York","state":"NY"}]

Use the as = "R" argument to extract deeply nested elements as R objects, e.g., a character vector of city names in Washington state.
j_query(json, "locations[?state == 'WA'].name", as = "R")
## [1] "Seattle"  "Bellevue" "Olympia"

The JSON Pointer specification is simpler, indexing a single object in the document. JSON arrays are 0-based.
j_query(json, "/locations/0/state")
## [1] "WA"

The examples above use j_query(), which automatically infers the query specification from the form of path using j_path_type(). It may be useful to indicate the query specification more explicitly using jsonpointer(), jsonpath(), or jmespath(); examples illustrating features available for each query specification are on the help pages ?jsonpointer, ?jsonpath, and ?jmespath.
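As a hedged illustration of the explicit functions named above, the sketch below queries one small document with jmespath(), jsonpath(), and jsonpointer(). The JSONpath filter expression is an assumption about the syntax accepted by the underlying jsoncons library; the help pages remain the authoritative reference.

```r
library(rjsoncons)

json <- '{"locations": [
  {"name": "Seattle", "state": "WA"},
  {"name": "New York", "state": "NY"}
]}'

## JMESpath: filter expression, result returned as an R character vector
ny <- jmespath(json, "locations[?state == 'NY'].name", as = "R")

## JSONpath: paths start at the document root '$'
wa <- jsonpath(json, "$.locations[?(@.state == 'WA')].name", as = "R")

## JSONpointer: addresses a single object; array indexes are 0-based
first_state <- jsonpointer(json, "/locations/0/state", as = "R")
```

Naming the query specification explicitly avoids any ambiguity in how a path is interpreted, at the cost of committing to one syntax.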
j_pivot()

The following transforms a nested JSON document into a format that can be incorporated directly in R as a data.frame.
path <- '{
  name: locations[].name,
  state: locations[].state
}'
j_query(json, path, as = "R") |>
    data.frame()
##       name state
## 1  Seattle    WA
## 2 New York    NY
## 3 Bellevue    WA
## 4  Olympia    WA

The transformation from JSON ‘array-of-objects’ to ‘object-of-arrays’ suitable for direct representation as a data.frame is common, and is implemented directly as j_pivot()
j_pivot(json, "locations", as = "data.frame")
##       name state
## 1  Seattle    WA
## 2 New York    NY
## 3 Bellevue    WA
## 4  Olympia    WA

j_pivot() also supports as = "tibble" when the dplyr package is installed.
rjsoncons supports NDJSON (new-line delimited JSON). NDJSON consists of a file or character vector where each line / element represents a JSON record. This example uses data from the GitHub Archive project recording all actions on public GitHub repositories. The data included in the package are the first 10 lines of https://data.gharchive.org/2023-02-08-0.json.gz.
ndjson_file <- system.file(package = "rjsoncons", "extdata", "2023-02-08-0.json")

NDJSON can be read into R (ndjson <- readLines(ndjson_file)) and used in j_query() / j_pivot(), but it is often better to leave full NDJSON files on disk. Thus the first argument to j_query() or j_pivot() is usually a (text or gz-compressed) file path or URL.

Two additional options are available when working with NDJSON. n_records limits the number of records processed. Using n_records can be very useful when exploring the data. For instance, the first record of a file can be viewed interactively with
j_query(ndjson_file, n_records = 1) |>
    listviewer::jsonedit()

The option verbose = TRUE adds a progress indicator, which provides confidence that progress is being made while parsing large files. The progress bar requires the cli package.
j_query() provides a one-to-one mapping of NDJSON lines / elements to the return value, e.g., j_query(ndjson_file, "@", as = "string") on an NDJSON file with 1000 lines will return a character vector of 1000 elements, or with j_query(ndjson, "@", as = "R") an R list with length 1000.
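The one-to-one mapping can be checked on the NDJSON file shipped with the package. This is a minimal sketch: the variable names are illustrative, and the path "type" is assumed to be inferred as a JMESpath expression selecting the type field of each record.

```r
library(rjsoncons)

ndjson_file <- system.file(package = "rjsoncons", "extdata", "2023-02-08-0.json")
ndjson <- readLines(ndjson_file)    # one JSON record per element

## one result per record, either as JSON strings or as an R list
as_string <- j_query(ndjson, "type")
as_list <- j_query(ndjson, "type", as = "R")

length(as_string) == length(ndjson) # same number of elements as records
table(unlist(as_list))              # tally of event types
```

Because each input record produces exactly one element of output, results can be lined up with the original records by position.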
j_query(ndjson_file, "{id: id, type: type}", n_records = 5)
## [1] "{\"id\":\"26939254345\",\"type\":\"DeleteEvent\"}"
## [2] "{\"id\":\"26939254358\",\"type\":\"PushEvent\"}"
## [3] "{\"id\":\"26939254361\",\"type\":\"CreateEvent\"}"
## [4] "{\"id\":\"26939254365\",\"type\":\"CreateEvent\"}"
## [5] "{\"id\":\"26939254366\",\"type\":\"PushEvent\"}"

j_pivot() transforms an NDJSON file or character vector of objects into a format convenient for input in R. j_pivot() works particularly well with NDJSON files and JMESpath paths, because JMESpath provides flexibility in creating JSON objects to be pivoted.
j_pivot(ndjson_file, "{id: id, type: type}", as = "data.frame")
##             id        type
## 1  26939254345 DeleteEvent
## 2  26939254358   PushEvent
## 3  26939254361 CreateEvent
## 4  26939254365 CreateEvent
## 5  26939254366   PushEvent
## 6  26939254367   PushEvent
## 7  26939254379   PushEvent
## 8  26939254380 IssuesEvent
## 9  26939254382   PushEvent
## 10 26939254383   PushEvent

Filtering NDJSON files can require relatively more complicated paths, e.g., to filter ‘PushEvent’ types from organizations, construct a query that acts on each NDJSON record to return an array of a single object, then apply a filter to replace uninteresting elements with 0-length arrays (using as = "tibble" often transforms the R list-of-vectors to a tibble in a more pleasing and robust manner compared to as = "data.frame").
path <- "[{id: id, type: type, org: org}] [?@.type == 'PushEvent' && @.org != null] | [0]"
j_pivot(ndjson_file, path, as = "data.frame")
##            id      type    org.id          org.login org.gravatar_id
## 1 26939254358 PushEvent 123667276 johnbieren-testing
## 2 26939254382 PushEvent 123667276 johnbieren-testing
##                                          org.url
## 1 https://api.github.com/orgs/johnbieren-testing
## 2 https://api.github.com/orgs/johnbieren-testing
##                                       org.avatar_url  org.id.1  org.login.1
## 1 https://avatars.githubusercontent.com/u/123667276? 120284018 mornystannit
## 2 https://avatars.githubusercontent.com/u/123667276? 120284018 mornystannit
##   org.gravatar_id.1                                org.url.1
## 1                   https://api.github.com/orgs/mornystannit
## 2                   https://api.github.com/orgs/mornystannit
##                                     org.avatar_url.1
## 1 https://avatars.githubusercontent.com/u/120284018?
## 2 https://avatars.githubusercontent.com/u/120284018?

A more complete example is used in the NDJSON extended vignette
rjsoncons can filter and transform R objects. These are converted to JSON using jsonlite::toJSON() before queries are made; toJSON() arguments like auto_unbox = TRUE can be added to the function call.
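To see why auto_unbox matters, the following sketch compares the JSON that jsonlite::toJSON() produces with and without it. The list x is a made-up example; only jsonlite is used.

```r
x <- list(name = "Seattle", state = "WA")

## default: length-1 R vectors become JSON arrays of length 1
boxed <- as.character(jsonlite::toJSON(x))

## auto_unbox = TRUE: length-1 R vectors become JSON scalars
unboxed <- as.character(jsonlite::toJSON(x, auto_unbox = TRUE))

cat(boxed, unboxed, sep = "\n")
## {"name":["Seattle"],"state":["WA"]}
## {"name":"Seattle","state":"WA"}
```

A query that compares a field to a string, such as state == 'WA', is written against the unboxed form, so it is usually the representation wanted when querying R objects.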
## `lst` is an *R* list
lst <- jsonlite::fromJSON(json, simplifyVector = FALSE)
j_query(lst, "locations[?state == 'WA'].name | sort(@)", auto_unbox = TRUE) |>
    cat("\n")
## ["Bellevue","Olympia","Seattle"]

JSON Patch provides a simple way to edit or transform a JSON document using JSON commands.
j_patch_apply()

Starting with the JSON document
json <- '{
  "biscuits": [
    { "name": "Digestive" },
    { "name": "Choco Leibniz" }
  ]
}'

one can "add" another biscuit, and copy a favorite biscuit to a new location using the following patch
patch <- '[
  {"op": "add", "path": "/biscuits/1", "value": { "name": "Ginger Nut" }},
  {"op": "copy", "from": "/biscuits/2", "path": "/best_biscuit"}
]'

The paths are specified using JSONpointer notation; remember that JSON arrays are 0-based, compared to 1-based R arrays. Applying the patch results in a new JSON document.
j_patch_apply(json, patch)
## [1] "{\"biscuits\":[{\"name\":\"Digestive\"},{\"name\":\"Ginger Nut\"},{\"name\":\"Choco Leibniz\"}],\"best_biscuit\":{\"name\":\"Choco Leibniz\"}}"

Patches can also be created from R objects with the helper function j_patch_op().
ops <- c(
    j_patch_op(
        "add", "/biscuits/1", value = list(name = "Ginger Nut"),
        auto_unbox = TRUE
    ),
    j_patch_op("copy", "/best_biscuit", from = "/biscuits/2")
)
identical(j_patch_apply(json, patch), j_patch_apply(json, ops))
## [1] TRUE

j_patch_op() takes care of unboxing op=, path=, and from=, but some care must be taken in ‘unboxing’ the value= argument for operations such as ‘add’; it may also be appropriate to unbox only specific fields, e.g.,
value <- list(name = jsonlite::unbox("Ginger Nut"))
j_patch_op("add", "/biscuits/1", value = value)
## [
##   {"op": "add", "path": "/biscuits/1", "value": {"name": "Ginger Nut"}}
## ]

From the JSON patch web site, available operations and example JSON are:
add – add elements to an existing document.
{"op": "add", "path": "/biscuits/1", "value": {"name": "Ginger Nut"}}

remove – remove elements from a document.

{"op": "remove", "path": "/biscuits/0"}

replace – replace one element with another.

{"op": "replace", "path": "/biscuits/0/name", "value": "Chocolate Digestive"}

copy – copy a path to another location.

{"op": "copy", "from": "/biscuits/0", "path": "/best_biscuit"}

move – move a path to another location.

{"op": "move", "from": "/biscuits", "path": "/cookies"}

test – test that a path holds the specified value; if the test fails, do not apply any of the patch.

{"op": "test", "path": "/best_biscuit/name", "value": "Choco Leibniz"}

Formal description of these operations is provided in Section 4 of RFC6902. A patch command is always an array, even when a single operation is involved.
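The ‘test’ operation is most useful as a guard in front of other operations. The sketch below is illustrative only: the biscuit names are invented, and it assumes that a failed test is signalled to R as an error (hence the tryCatch()), which should be confirmed against ?j_patch_apply.

```r
library(rjsoncons)

json <- '{"biscuits": [{"name": "Digestive"}, {"name": "Choco Leibniz"}]}'

## the 'test' passes, so the 'replace' that follows is applied
patch <- '[
  {"op": "test", "path": "/biscuits/0/name", "value": "Digestive"},
  {"op": "replace", "path": "/biscuits/0/name", "value": "Chocolate Digestive"}
]'
patched <- j_patch_apply(json, patch)
new_name <- j_query(patched, "/biscuits/0/name", as = "R")

## the 'test' fails here; ASSUMPTION: the failure surfaces as an R error,
## in which case the handler falls back to the original document
bad_patch <- '[
  {"op": "test", "path": "/biscuits/0/name", "value": "Hobnob"},
  {"op": "remove", "path": "/biscuits/0"}
]'
unchanged <- tryCatch(j_patch_apply(json, bad_patch), error = function(e) json)
```

Placing the test first means the later operations are attempted only when the document is in the expected state.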
j_patch_from()

The j_patch_from() function constructs a patch from the difference between two documents
j_patch_from(j_patch_apply(json, patch), json)
## [1] "[{\"op\":\"replace\",\"path\":\"/biscuits/1/name\",\"value\":\"Choco Leibniz\"},{\"op\":\"remove\",\"path\":\"/biscuits/2\"},{\"op\":\"remove\",\"path\":\"/best_biscuit\"}]"

JSON schema provides structure to JSON documents. j_schema_is_valid() checks that a JSON document is valid against a specified schema, and j_schema_validate() tries to illustrate how a document deviates from the schema.
As an example consider j_patch_op(), where the operation is supposed to conform to the JSON patch schema. For convenience, a copy of this schema is available in rjsoncons.
## alternatively: schema <- "https://json.schemastore.org/json-patch"
schema <- system.file(package = "rjsoncons", "extdata", "json-patch.json")
cat(readLines(schema), sep = "\n")
## {
##   "$schema": "http://json-schema.org/draft-04/schema#",
##   "definitions": {
##     "path": {
##       "description": "A JSON Pointer path.",
##       "type": "string"
##     }
##   },
##   "id": "https://json.schemastore.org/json-patch.json",
##   "items": {
##     "oneOf": [
##       {
##         "additionalProperties": false,
##         "required": ["value", "op", "path"],
##         "properties": {
##           "path": {
##             "$ref": "#/definitions/path"
##           },
##           "op": {
##             "description": "The operation to perform.",
##             "type": "string",
##             "enum": ["add", "replace", "test"]
##           },
##           "value": {
##             "description": "The value to add, replace or test."
##           }
##         }
##       },
##       {
##         "additionalProperties": false,
##         "required": ["op", "path"],
##         "properties": {
##           "path": {
##             "$ref": "#/definitions/path"
##           },
##           "op": {
##             "description": "The operation to perform.",
##             "type": "string",
##             "enum": ["remove"]
##           }
##         }
##       },
##       {
##         "additionalProperties": false,
##         "required": ["from", "op", "path"],
##         "properties": {
##           "path": {
##             "$ref": "#/definitions/path"
##           },
##           "op": {
##             "description": "The operation to perform.",
##             "type": "string",
##             "enum": ["move", "copy"]
##           },
##           "from": {
##             "$ref": "#/definitions/path",
##             "description": "A JSON Pointer path pointing to the location to move/copy from."
##           }
##         }
##       }
##     ]
##   },
##   "title": "JSON schema for JSONPatch files",
##   "type": "array"
## }

The well-formed ‘op’ is valid, and j_schema_validate() reports no errors
op <- '[{
  "op": "add", "path": "/biscuits/1",
  "value": { "name": "Ginger Nut" }
}]'
j_schema_is_valid(op, schema)
## [1] TRUE
j_schema_validate(op, schema)
## [1] "[]"

Introduce an invalid ‘op’, "op": "invalid_op", and the document is no longer valid against the schema.
op <- '[{
  "op": "invalid_op", "path": "/biscuits/1",
  "value": { "name": "Ginger Nut" }
}]'
j_schema_is_valid(op, schema)
## [1] FALSE

The reason can be understood from (careful!) consideration of the output of j_schema_validate(), with reference to the schema itself.
j_schema_validate(op, schema, as = "tibble") |>
    tibble::glimpse()
## Rows: 1
## Columns: 6
## $ valid            <lgl> FALSE
## $ evaluationPath   <chr> "/items/oneOf"
## $ schemaLocation   <chr> "https://json.schemastore.org/json-patch.json#/items/…
## $ instanceLocation <chr> "/0"
## $ error            <chr> "No schema matched, but exactly one of them is requir…
## $ details          <list> [[FALSE, "/items/oneOf/0/properties/op/enum", "https:…

The validation indicates that the schema evaluationPath ‘/items/oneOf’ is not satisfied, because of the error ‘No schema [i.e., ‘oneOf’ elements] matched, …’.
The ‘details’ column summarizes why each of the 3 elements of /items/oneOf fails the schema specification; use as = "details" to extract this directly
j_schema_validate(op, schema, as = "details") |>
    tibble::glimpse()
## Rows: 6
## Columns: 5
## $ valid            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
## $ evaluationPath   <chr> "/items/oneOf/0/properties/op/enum", "/items/oneOf/1/…
## $ schemaLocation   <chr> "https://json.schemastore.org/json-patch.json#/items/…
## $ instanceLocation <chr> "/0/op", "/0/op", "/0/value", "/0", "/0/op", "/0/valu…
## $ error            <chr> "'invalid_op' is not a valid enum value.", "'invalid_…

This indicates that the first item in the schema is rejected because ‘invalid_op’ is not a valid enum value
j_query(schema, "/items/oneOf/0/properties/op/enum") |>
    noquote()
## [1] ["add","replace","test"]

Reasons for rejecting other items can be explored using similar steps.
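The same validation functions can be tried on a much smaller schema, which makes the relationship between schema and error easier to see. The schema below is hypothetical and written inline; it is assumed that a schema without a "$schema" declaration is accepted under the library's default draft.

```r
library(rjsoncons)

## hypothetical schema: an object with a required string-valued 'name'
schema <- '{
  "type": "object",
  "required": ["name"],
  "properties": {"name": {"type": "string"}}
}'

ok <- j_schema_is_valid('{"name": "Seattle"}', schema)  # conforms
bad <- j_schema_is_valid('{"name": 42}', schema)        # 'name' is not a string

## the reason for the failure, one row per validation error
j_schema_validate('{"name": 42}', schema, as = "tibble")
```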
It can sometimes be helpful to explore JSON documents by ‘flattening’ the JSON to an object of path / value pairs, where the path is the JSONpointer path to the corresponding value. It is then straightforward to search this flattened object for, e.g., the path to a known field or value. As an example, consider the object
codes <- '{
  "discards": {
    "1000": "Record does not exist",
    "1004": "Queue limit exceeded",
    "1010": "Discarding timed-out partial msg"
  },
  "warnings": {
    "0": "Phone number missing country code",
    "1": "State code missing",
    "2": "Zip code missing"
  }
}'

The ‘flat’ JSON of this can be represented as a named list (using str() to provide a compact visual representation)
j_flatten(codes, as = "R") |>
    str()
## List of 6
##  $ /discards/1000: chr "Record does not exist"
##  $ /discards/1004: chr "Queue limit exceeded"
##  $ /discards/1010: chr "Discarding timed-out partial msg"
##  $ /warnings/0   : chr "Phone number missing country code"
##  $ /warnings/1   : chr "State code missing"
##  $ /warnings/2   : chr "Zip code missing"

The names of the list are JSONpointer (default) or JSONpath, so can be used in j_query() and j_pivot() as appropriate
j_query(codes, "/discards/1010")
## [1] "Discarding timed-out partial msg"

There are two ways to find known keys and values. The first is to use exact matching to one or more keys or values, e.g.,
j_find_values(
    codes, c("Record does not exist", "State code missing"),
    as = "tibble"
)
## # A tibble: 2 × 2
##   path           value
##   <chr>          <chr>
## 1 /discards/1000 Record does not exist
## 2 /warnings/1    State code missing
j_find_keys(codes, "warnings", as = "tibble")
## # A tibble: 3 × 2
##   path        value
##   <chr>       <chr>
## 1 /warnings/0 Phone number missing country code
## 2 /warnings/1 State code missing
## 3 /warnings/2 Zip code missing

It is also possible to match using a regular expression.
j_find_values_grep(codes, "missing", as = "tibble")
## # A tibble: 3 × 2
##   path        value
##   <chr>       <chr>
## 1 /warnings/0 Phone number missing country code
## 2 /warnings/1 State code missing
## 3 /warnings/2 Zip code missing
j_find_keys_grep(codes, "card.*/100", as = "tibble") # span key delimiters
## # A tibble: 2 × 2
##   path           value
##   <chr>          <chr>
## 1 /discards/1000 Record does not exist
## 2 /discards/1004 Queue limit exceeded

Keys are always character vectors, but values can be of different type; j_find_values() supports searches on these.
j <- '{"x":[1,[2, 3]],"y":{"a":4}}'
j_flatten(j, as = "R") |>
    str()
## List of 4
##  $ /x/0  : int 1
##  $ /x/1/0: int 2
##  $ /x/1/1: int 3
##  $ /y/a  : int 4
j_find_values(j, c(2, 4), as = "tibble")
## # A tibble: 2 × 2
##   path   value
##   <chr>  <int>
## 1 /x/1/0     2
## 2 /y/a       4

A common operation might be to find the path to a known value, and then to query the original JSON to find the object in which the value is contained.
j_find_values(j, 3, as = "tibble")
## # A tibble: 1 × 2
##   path   value
##   <chr>  <int>
## 1 /x/1/1     3
## path to '3' is '/x/1/1', so containing object is at '/x/1'
j_query(j, "/x/1")
## [1] "[2,3]"
j_query(j, "/x/1", as = "R")
## [1] 2 3

Both JSONpointer and JSONpath are supported; an advantage of the latter is that the path distinguishes between integer-valued (unquoted) and string-valued (quoted) keys
j_find_values(j, 3, as = "tibble", path_type = "JSONpath")
## # A tibble: 1 × 2
##   path         value
##   <chr>        <int>
## 1 $['x'][1][1]     3

The first argument to j_find_*() can be an R object, JSON or NDJSON string, file, or URL. Using j_find_values() with an R object and JSONpath path_type leads to a path that is easily converted into an R index: double the [ and ] in the path and increment each numerical index by 1:
l <- j |> as_r()
j_find_values(l, 3, auto_unbox = TRUE, path_type = "JSONpath", as = "tibble")
## # A tibble: 1 × 2
##   path         value
##   <chr>        <int>
## 1 $['x'][1][1]     3
l[['x']][[2]] # siblings
## [1] 2 3

NDJSON files are flattened into character vectors, with each element the flattened version of the corresponding NDJSON record.
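The JSONpath-to-R-index conversion described above (double the brackets, add 1 to each numeric index) can be sketched as a small helper. jsonpath_to_r_index() is hypothetical and not part of rjsoncons; it uses base R only and assumes keys do not themselves contain ‘]’.

```r
## hypothetical helper, not part of rjsoncons
jsonpath_to_r_index <- function(path) {
    ## extract each bracketed key, e.g., "['x']" or "[1]"; the leading '$'
    ## (document root) is ignored
    keys <- regmatches(path, gregexpr("\\[[^]]*\\]", path))[[1]]
    keys <- substring(keys, 2, nchar(keys) - 1)  # strip the outer '[' and ']'
    ## JSON array indexes are 0-based, R indexes are 1-based
    is_index <- grepl("^[0-9]+$", keys)
    keys[is_index] <- as.integer(keys[is_index]) + 1L
    paste0("[[", keys, "]]", collapse = "")
}

## base R equivalent of the parsed example document
l <- list(x = list(1L, c(2L, 3L)), y = list(a = 4L))

index <- jsonpath_to_r_index("$['x'][1][1]")
index
## [1] "[['x']][[2]][[2]]"
value <- eval(parse(text = paste0("l", index)))  # evaluates l[['x']][[2]][[2]]
```

Dropping the last component of the index gives the containing object, as in the ‘siblings’ example.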
The package includes a JSON parser, used with the argument as = "R" or directly with as_r()
as_r('{"a": 1.0, "b": [2, 3, 4]}') |>
    str()
#> List of 2
#>  $ a: num 1
#>  $ b: int [1:3] 2 3 4

The main rules of this transformation are outlined here. JSON arrays of a single type (boolean, integer, double, string) are transformed to R vectors of the same length and corresponding type.
as_r('[true, false, true]') # boolean -> logical
## [1]  TRUE FALSE  TRUE
as_r('[1, 2, 3]')           # integer -> integer
## [1] 1 2 3
as_r('[1.0, 2.0, 3.0]')     # double -> numeric
## [1] 1 2 3
as_r('["a", "b", "c"]')     # string -> character
## [1] "a" "b" "c"

JSON arrays mixing integer and double values are transformed to R numeric vectors.
as_r('[1, 2.0]') |> class() # numeric
## [1] "numeric"

If a JSON integer array contains a value larger than R’s 32-bit integer representation, the array is transformed to an R numeric vector. NOTE that this results in loss of precision for JSON integer values greater than 2^53.
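The two limits mentioned in this note can be checked in base R, independently of rjsoncons. This is a small illustration, not part of the package.

```r
## largest value representable as an R (32-bit) integer
int_max <- .Machine$integer.max
int_max
## [1] 2147483647

## above 2^53, consecutive integers are no longer distinguishable as doubles
collapsed <- (2^53 + 1) == 2^53
collapsed
## [1] TRUE
```

A JSON integer larger than int_max is therefore stored as a double, and one larger than 2^53 may be silently rounded to a neighbouring value.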
as_r('[1, 2147483648]') |> class() # 64-bit integers -> numeric
## [1] "numeric"

JSON objects are transformed to R named lists.
as_r('{}')
## named list()
as_r('{"a": 1.0, "b": [2, 3, 4]}') |> str()
## List of 2
##  $ a: num 1
##  $ b: int [1:3] 2 3 4

There are several additional details. A JSON scalar and a JSON vector of length 1 are represented in the same way in R.
identical(as_r("3.14"), as_r("[3.14]"))
## [1] TRUE

JSON arrays mixing types other than integer and double are transformed to R lists
as_r('[true, 1, "a"]') |> str()
## List of 3
##  $ : logi TRUE
##  $ : int 1
##  $ : chr "a"

JSON null values are represented as R NULL values; arrays of null are transformed to lists
as_r('null')                  # NULL
## NULL
as_r('[null]') |> str()       # list(NULL)
## List of 1
##  $ : NULL
as_r('[null, null]') |> str() # list(NULL, NULL)
## List of 2
##  $ : NULL
##  $ : NULL

Ordering of object members is controlled by the object_names= argument. The default preserves names as they appear in the JSON definition; use "sort" to sort names alphabetically. This argument is applied recursively.
json <- '{"b": 1, "a": {"d": 2, "c": 3}}'
as_r(json) |> str()
## List of 2
##  $ b: int 1
##  $ a:List of 2
##   ..$ d: int 2
##   ..$ c: int 3
as_r(json, object_names = "sort") |> str()
## List of 2
##  $ a:List of 2
##   ..$ c: int 3
##   ..$ d: int 2
##  $ b: int 1

The parser corresponds approximately to jsonlite::fromJSON() with arguments simplifyVector = TRUE, simplifyDataFrame = FALSE, simplifyMatrix = FALSE. Unit tests (using the tinytest framework) providing additional details are available at
system.file(package = "rjsoncons", "tinytest", "test_as_r.R")

jsonlite::fromJSON()

The built-in parser can be replaced by alternative parsers by returning the query as a JSON string, e.g., using fromJSON() in the jsonlite package.
json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"}
  ]
}'
j_query(json, "locations[?state == 'WA']") |>
    ## `fromJSON()` simplifies list-of-objects to data.frame
    jsonlite::fromJSON()
##       name state
## 1  Seattle    WA
## 2 Bellevue    WA
## 3  Olympia    WA

The rjsoncons package is particularly useful when accessing elements that might otherwise require complicated application of nested lapply(), purrr expressions, or tidyr unnest_*() (see R for Data Science chapter ‘Hierarchical data’).
The package includes the complete ‘jsoncons’ C++ header-only library,available to other R packages by adding
LinkingTo: rjsoncons
SystemRequirements: C++11

to the DESCRIPTION file. Typical use in an R package would also include LinkingTo: specifications for the cpp11 or Rcpp (this package uses cpp11) packages to provide a C / C++ interface between R and the C++ ‘jsoncons’ library.
This vignette was compiled using the following software versions
sessionInfo()
## R Under development (unstable) (2025-03-08 r87910)
## Platform: aarch64-apple-darwin24.3.0
## Running under: macOS Sequoia 15.3.1
##
## Matrix products: default
## BLAS:   /Users/mtmorgan/bin/R-devel/lib/libRblas.dylib
## LAPACK: /Users/mtmorgan/bin/R-devel/lib/libRlapack.dylib;  LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] rjsoncons_1.3.2  BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5         cli_3.6.4           knitr_1.49
##  [4] rlang_1.1.5         xfun_0.50           jsonlite_1.8.9
##  [7] glue_1.8.0          htmltools_0.5.8.1   sass_0.4.9
## [10] rmarkdown_2.29      evaluate_1.0.3      jquerylib_0.1.4
## [13] tibble_3.2.1        fastmap_1.2.0       yaml_2.3.10
## [16] lifecycle_1.0.4     bookdown_0.42       BiocManager_1.30.25
## [19] compiler_4.5.0      pkgconfig_2.0.3     digest_0.6.37
## [22] R6_2.6.1            utf8_1.2.4          pillar_1.10.1
## [25] magrittr_2.0.3      bslib_0.9.0         tools_4.5.0
## [28] cachem_1.1.0