This R package provides tools to access the PX-WEB API. Your contributions, bug reports and other feedback are welcome!
More information on the PX-Web/PC-Axis API can be found here.
PXWEB is an API structure developed by Statistics Sweden and other national statistical institutions (NSI) to disseminate public statistics in a structured way. This API enables downloading and using data from statistical agencies directly over HTTP/HTTPS, without using a web browser.
The `pxweb` R package connects any PXWEB API to R and facilitates the access, use and referencing of data from PXWEB APIs.
A number of organizations use PXWEB to distribute hierarchical data. You can browse the available data sets at:
The data in PXWEB APIs consists of metadata and data parts. Metadata is structured in a hierarchical node tree, where each node contains information about subnodes. The leaf nodes have information on which dimensions are available for the data at that leaf node.
To install the latest stable release version from CRAN, use `install.packages("pxweb")`.
To install the latest stable release version from GitHub, use `remotes::install_github("ropengov/pxweb")` (this requires the `remotes` package).
Test the installation by loading the library with `library(pxweb)`.
A tutorial is included with the package and can be opened with `vignette(topic = "pxweb")`.
There are two ways of using the `pxweb` R package to access data, either interactively or using the core functions. To access data, two parts are needed: a URL to the data table in the API and a query specifying what data is of interest.
The simplest way of using `pxweb` is to use it interactively, navigate the API to the data of interest, and then set up the query of interest.
```r
# Navigate through all pxweb api:s in the R package API catalogue
d <- pxweb_interactive()

# Get data from SCB (Statistics Sweden)
d <- pxweb_interactive("api.scb.se")

# Fetching data from statfi (Statistics Finland)
d <- pxweb_interactive("pxnet2.stat.fi")

# Fetching data from StatBank (Statistics Norway)
d <- pxweb_interactive("data.ssb.no")

# To see all available PXWEB APIs use
pxweb_apis <- pxweb_api_catalogue()
```

In the example above, we use the interactive functionality from the PXWEB API root, but we could use any path to the API.
```r
# Start with a specific path.
d <- pxweb_interactive("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A")
```

This functionality also means that we can navigate any PXWEB API, regardless of whether it is part of the R package API catalogue. Just supply a URL to somewhere in the API and then navigate the API from there.
Due to new CRAN policies, it is not possible to use an R function to edit the API catalogue of the R package, but the catalogue can be edited quickly from R using `file.edit()`.
However, if `pxweb` is installed again, the installation will overwrite the edited API catalogue. The easiest way is therefore to add a PXWEB API to the global catalogue. To do this, open a pull request on the pxweb GitHub page here.
Under the hood, the pxweb package uses the `pxweb_get()` function to access data from the PXWEB API. It also keeps track of the API's time limits and splits big queries into optimal downloadable chunks. If we use `pxweb_get()` without a query, the function returns either a PXWEB LEVELS object or a PXWEB METADATA object. What is returned depends on whether the URL points to a table in the API. Here is an example of a PXWEB LEVELS object.
```r
# Get PXWEB levels
px_levels <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/")
px_levels
```

```
## PXWEB LEVELS
##   BefolkningNy (t): Population by region, marital status, age and sex. Year 1968 - 2022
##   FolkmangdNov (t): Population 1 November by region, age and sex. Year 2002 - 2023
##   FolkmangdDistrikt (t): Population by district, Landscape or Part of the country by sex. Year 2015 - 2022
##   BefolkManad (t): Population per month by region, age and sex. Year 2000M01 - 2023M11
##   BefolkningR1860N (t): Population by age and sex. Year 1860 - 2022
```

And if we use `pxweb_get()` for a table, a PXWEB METADATA object is returned.
```r
# Get PXWEB metadata about a table
px_meta <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy")
px_meta
```

```
## PXWEB METADATA
## Population by region, marital status, age, sex, observations and year 
## variables:
##  [[1]] Region: region
##  [[2]] Civilstand: marital status
##  [[3]] Alder: age
##  [[4]] Kon: sex
##  [[5]] ContentsCode: observations
##  [[6]] Tid: year
```

To download data, we need both the URL to the table and a query specifying what parts of the table are of interest. A URL to a table is a URL that returns a metadata object when no query is supplied. Creating a query can be done in three main ways. The first and most straightforward approach is to use `pxweb_interactive()` to explore the table URL and create a query interactively.
The interactive function will return the query and the URL, even if the data is not downloaded.
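As a hedged illustration of that return value, the sketch below assumes the returned list exposes the URL and the query as elements named `url` and `query`. Because the function prompts for input and contacts the API, the call only runs in an interactive session where `pxweb` is installed.

```r
# Table URL used throughout this document.
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"

# pxweb_interactive() prompts for input, so it is only called interactively.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  d <- pxweb::pxweb_interactive(table_url)
  print(d$url)   # assumed element name: URL of the selected table
  print(d$query) # assumed element name: query built during the session
}
```

Printing the two elements gives output of the kind shown next.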
```
## [1] "http://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
## PXWEB QUERY
## query:
##  [[1]] Region (item):
##    00
##  [[2]] Civilstand (item):
##    OG, G, ÄNKL, SK
##  [[3]] Alder (item):
##    tot
##  [[4]] ContentsCode (item):
##    BE0101N1
##  [[5]] Tid (item):
##    2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
```

We can also turn the query into a JSON query that we can use outside R.
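A minimal sketch of that conversion, using a shortened version of the query above built from an R list rather than from an interactive session. It assumes that `pxweb_query_as_json()` forwards `pretty = TRUE` to its JSON serializer; no network access is needed, and the conversion is skipped when `pxweb` is not installed.

```r
# Shortened version of the query printed above (values are variable codes).
query_list <- list(
  "Region" = c("00"),
  "Alder" = c("tot"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2010", "2011")
)

if (requireNamespace("pxweb", quietly = TRUE)) {
  pxq <- pxweb::pxweb_query(query_list)
  # `pretty = TRUE` is assumed to be passed on to the JSON serializer.
  print(pxweb::pxweb_query_as_json(pxq, pretty = TRUE))
}
```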
```
## {
##   "query": [
##     {
##       "code": "Region",
##       "selection": {
##         "filter": "item",
##         "values": ["00"]
##       }
##     },
##     {
##       "code": "Civilstand",
##       "selection": {
##         "filter": "item",
##         "values": ["OG", "G", "ÄNKL", "SK"]
##       }
##     },
##     {
##       "code": "Alder",
##       "selection": {
##         "filter": "item",
##         "values": ["tot"]
##       }
##     },
##     {
##       "code": "ContentsCode",
##       "selection": {
##         "filter": "item",
##         "values": ["BE0101N1"]
##       }
##     },
##     {
##       "code": "Tid",
##       "selection": {
##         "filter": "item",
##         "values": ["2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017"]
##       }
##     }
##   ],
##   "response": {
##     "format": "json"
##   }
## }
```

The second approach is to specify the query either as an R list or a JSON object. Some statistical agencies, such as Statistics Sweden, supply queries directly as a JSON object on their web pages. We can use these queries directly. Below is another example of a JSON query for the table above. For details on setting up a JSON query, see the PXWEB API documentation.
```json
{
  "query": [
    {
      "code": "Civilstand",
      "selection": {
        "filter": "item",
        "values": ["OG", "G", "ÄNKL", "SK"]
      }
    },
    {
      "code": "Kon",
      "selection": {
        "filter": "item",
        "values": ["1", "2"]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": ["BE0101N1"]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": ["2015", "2016", "2017"]
      }
    }
  ],
  "response": {
    "format": "json"
  }
}
```

To use this JSON query, we store the JSON query as a file and supply the path to the file to the `pxweb_query()` function.
Finally, we can create a PXWEB query from an R list where each list element is a variable and its selected observations.
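A minimal sketch of the file-based route, assuming a temporary file is an acceptable place to store the query. The JSON is a shortened, ASCII-only variant of the query above, and `query_path` is a name introduced here for illustration. Parsing the file needs no network access and is skipped when `pxweb` is not installed.

```r
# Shortened variant of the JSON query above, kept ASCII-only for portability.
json_query <- '{
  "query": [
    {"code": "Kon", "selection": {"filter": "item", "values": ["1", "2"]}},
    {"code": "ContentsCode", "selection": {"filter": "item", "values": ["BE0101N1"]}},
    {"code": "Tid", "selection": {"filter": "item", "values": ["2015", "2016", "2017"]}}
  ],
  "response": {"format": "json"}
}'

# Store the query in a temporary file (hypothetical location for this sketch).
query_path <- tempfile(fileext = ".json")
writeLines(json_query, query_path)

# Build the query object from the file path.
if (requireNamespace("pxweb", quietly = TRUE)) {
  pxq <- pxweb::pxweb_query(query_path)
  print(pxq)
}
```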
```r
pxweb_query_list <-
  list(
    "Civilstand" = c("*"), # Use "*" to select all
    "Kon" = c("1", "2"),
    "ContentsCode" = c("BE0101N1"),
    "Tid" = c("2015", "2016", "2017")
  )
pxq <- pxweb_query(pxweb_query_list)
pxq
```

```
## PXWEB QUERY
## query:
##  [[1]] Civilstand (all):
##    *
##  [[2]] Kon (item):
##    1, 2
##  [[3]] ContentsCode (item):
##    BE0101N1
##  [[4]] Tid (item):
##    2015, 2016, 2017
```

We can validate the query against the metadata object to check that the query can be used. This validation is done automatically when the data is fetched with `pxweb_get()` but can also be done manually.
When we have the URL to a data table and a query, we can download the data with `pxweb_get()`. The function returns a `pxweb_data` object that contains the downloaded data.
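A minimal sketch of the manual validation, assuming the exported helper `pxweb_validate_query_with_metadata()` takes the query and the metadata object in that order. Fetching the metadata contacts the API, so the call is limited to interactive sessions where `pxweb` is installed.

```r
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
query_list <- list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)

# Contacts the API, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  pxq <- pxweb::pxweb_query(query_list)
  px_meta <- pxweb::pxweb_get(table_url) # no query: metadata is returned
  pxweb::pxweb_validate_query_with_metadata(pxq, px_meta)
}
```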
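A minimal sketch of the download, reusing the table URL and the list query from this document. The name `pxd` is chosen here for the result. The call contacts the API, so it only runs in an interactive session where `pxweb` is installed.

```r
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
query_list <- list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)

# Contacts the API, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  pxq <- pxweb::pxweb_query(query_list)
  pxd <- pxweb::pxweb_get(table_url, pxq)
  print(pxd)
}
```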
```
## PXWEB DATA
## With 4 variables and 24 observations.
```

If we instead want a JSON-stat object, we change the response format to JSON-stat, and we will get a JSON-stat object returned.
```r
pxq$response$format <- "json-stat"
pxjstat <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy", pxq)
pxjstat
```

```
## {
##   "dataset": {
##     "dimension": {
##       "Civilstand": {
##         "label": ["marital status"],
##         "category": {
##           "index": {
##             "OG": [0],
##             "G": [1],
##             "ÄNKL": [2],
##             "SK": [3]
##           },
##           "label": {
##             "OG": ["single"],
##             "G": ["married"],
##             "ÄNKL": ["widowers/widows"],
##             "SK": ["divorced"]
##           }
##         },
##         "extension": {
##           "show": ["value"]
##         }
##       },
##       "Kon": {
##         "label": ["sex"],
##         "category": {
##           "index": {
##             "1": [0],
##             "2": [1]
##           },
##           "label": {
##             "1": ["men"],
##             "2": ["women"]
##           }
##         },
##         "link": {
##           "describedby": [
##             {
##               "extension": {
##                 "Kon": ["Kön"]
##               }
##             }
##           ]
##         },
##         "extension": {
##           "show": ["value"]
##         }
##       },
##       "ContentsCode": {
##         "label": ["observations"],
##         "category": {
##           "index": {
##             "BE0101N1": [0]
##           },
##           "label": {
##             "BE0101N1": ["Population"]
##           },
##           "unit": {
##             "BE0101N1": {
##               "base": ["number"],
##               "decimals": [0]
##             }
##           }
##         },
##         "extension": {
##           "show": ["value"]
##         }
##       },
##       "Tid": {
##         "label": ["year"],
##         "category": {
##           "index": {
##             "2015": [0],
##             "2016": [1],
##             "2017": [2]
##           },
##           "label": {
##             "2015": ["2015"],
##             "2016": ["2016"],
##             "2017": ["2017"]
##           }
##         },
##         "extension": {
##           "show": ["code"]
##         }
##       },
##       "id": [
##         ["Civilstand"],
##         ["Kon"],
##         ["ContentsCode"],
##         ["Tid"]
##       ],
##       "size": [
##         [4],
##         [2],
##         [1],
##         [3]
##       ],
##       "role": {
##         "metric": [
##           ["ContentsCode"]
##         ],
##         "time": [
##           ["Tid"]
##         ]
##       }
##     },
##     "label": ["Population by marital status, sex, observations and year"],
##     "source": ["Statistics Sweden"],
##     "updated": ["2023-02-09T07:57:00Z"],
##     "value": [
##       [2762601],
##       [2820248],
##       [2870477],
##       [2394842],
##       [2437315],
##       [2477012],
##       [1651482],
##       [1672460],
##       [1687016],
##       [1639519],
##       [1657129],
##       [1671381],
##       [99751],
##       [99654],
##       [99682],
##       [345008],
##       [340709],
##       [335961],
##       [417132],
##       [420985],
##       [425487],
##       [540682],
##       [546653],
##       [553226]
##     ],
##     "extension": {
##       "px": {
##         "infofile": ["BE0101"],
##         "tableid": ["TAB638"],
##         "decimals": [0]
##       }
##     }
##   }
## }
```

Some response formats return files. These responses are stored in the R `tempdir()` folder, and the file paths are returned by `pxweb_get()`. Currently, the `px` and `sdmx` formats can be downloaded as files, but file an issue if you need other response formats.
```r
pxq$response$format <- "px"
pxfp <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy", pxq)
pxfp
```

```
## [1] "/var/folders/x9/dsgck_4s5mx2nrzzs8zd64rc0000gq/T//RtmpFdmiD7/50026bd2b2d8df2e3f190ca568b3b587d8207465.px"
```

If a query is large (contains more values than the PXWEB API's maximum allowed values), it is split into optimal chunks that are downloaded sequentially. PXWEB data objects are then combined into one large PXWEB data object, while JSON-stat objects are returned as a list of JSON-stat objects, and other files are stored in `tempdir()` as separate files.
For more advanced connections to the API, `pxweb_advanced_get()` gives the flexibility to access the underlying HTTP calls using `httr` and to log the HTTP calls for debugging.
We can then convert the downloaded PXWEB data objects to a `data.frame` or to a character matrix. The character matrix contains the "raw" data, while `as.data.frame()` returns an R `data.frame` in a tidy format. This conversion means that missing values (such as "..") are converted to `NA` in the `data.frame`. Using the arguments `variable.value.type` and `column.name.type`, we can choose whether we want the code or the text column names and value types.
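A hedged sketch of such a call. It assumes that `pxweb_advanced_get()` accepts the same URL and query as `pxweb_get()` and has `verbose` and `log_http_calls` arguments; check the function's help page before relying on either. The call contacts the API, so it only runs in an interactive session where `pxweb` is installed.

```r
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
query_list <- list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)

# Contacts the API, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  pxq <- pxweb::pxweb_query(query_list)
  # `verbose` and `log_http_calls` are assumed argument names.
  pxd <- pxweb::pxweb_advanced_get(
    table_url, pxq,
    verbose = FALSE,
    log_http_calls = TRUE
  )
  print(pxd)
}
```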
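A minimal sketch of both conversions, assuming `"text"` and `"code"` are the accepted values of the two arguments. The download contacts the API, so the block only runs in an interactive session where `pxweb` is installed.

```r
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
query_list <- list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)

# Contacts the API, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  pxd <- pxweb::pxweb_get(table_url, pxweb::pxweb_query(query_list))

  # Human-readable column names and values.
  pxdf_text <- as.data.frame(pxd, column.name.type = "text", variable.value.type = "text")
  print(head(pxdf_text))

  # Variable codes as column names and values.
  pxdf_code <- as.data.frame(pxd, column.name.type = "code", variable.value.type = "code")
  print(head(pxdf_code))
}
```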
```
##   marital status   sex year Population
## 1         single   men 2015    2762601
## 2         single   men 2016    2820248
## 3         single   men 2017    2870477
## 4         single women 2015    2394842
## 5         single women 2016    2437315
## 6         single women 2017    2477012
```

```
##   Civilstand Kon  Tid BE0101N1
## 1         OG   1 2015  2762601
## 2         OG   1 2016  2820248
## 3         OG   1 2017  2870477
## 4         OG   2 2015  2394842
## 5         OG   2 2016  2437315
## 6         OG   2 2017  2477012
```

Similarly, we can access the raw data as a character matrix with `as.matrix`.
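A minimal sketch of the matrix conversion, assuming `as.matrix()` takes the same two naming arguments as `as.data.frame()` for these objects. The download contacts the API, so the block only runs in an interactive session where `pxweb` is installed.

```r
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
query_list <- list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)

# Contacts the API, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  pxd <- pxweb::pxweb_get(table_url, pxweb::pxweb_query(query_list))
  # Raw values as a character matrix, with variable codes as column names.
  pxmat <- as.matrix(pxd, column.name.type = "code", variable.value.type = "code")
  print(head(pxmat))
}
```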
```
##      Civilstand Kon Tid    BE0101N1 
## [1,] "OG"       "1" "2015" "2762601"
## [2,] "OG"       "1" "2016" "2820248"
## [3,] "OG"       "1" "2017" "2870477"
## [4,] "OG"       "2" "2015" "2394842"
## [5,] "OG"       "2" "2016" "2437315"
## [6,] "OG"       "2" "2017" "2477012"
```

In addition to the data, the PXWEB DATA object may also contain comments for the data. These can be accessed using the `pxweb_data_comments()` function.
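A minimal sketch of extracting the comments. The name `pxdc` is chosen here for the result. The download contacts the API, so the block only runs in an interactive session where `pxweb` is installed.

```r
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
query_list <- list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)

# Contacts the API, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  pxd <- pxweb::pxweb_get(table_url, pxweb::pxweb_query(query_list))
  pxdc <- pxweb::pxweb_data_comments(pxd)
  print(pxdc)
}
```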
```
## NO PXWEB DATA COMMENTS
```

In this case, we did not have any comments. If we have comments, we can turn the comments into a `data.frame` with one comment per row.
Finally, if we use the data, we can easily create a citation for a `pxweb_data` object using the `pxweb_cite()` function. For full reproducibility, please also cite the package.
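A minimal sketch of the citation call, assuming `pxweb_cite()` takes the downloaded data object as its only required argument. The download contacts the API, so the block only runs in an interactive session where `pxweb` is installed.

```r
table_url <- "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
query_list <- list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)

# Contacts the API, so it is skipped in non-interactive runs.
if (interactive() && requireNamespace("pxweb", quietly = TRUE)) {
  pxd <- pxweb::pxweb_get(table_url, pxweb::pxweb_query(query_list))
  # Prints a citation for the data and for the package.
  pxweb::pxweb_cite(pxd)
}
```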
```
## Statistics Sweden (2024). “Population by region, marital status, age,
## sex, observations and year.” [Data accessed 2024-01-27 16:19:42.712139
## using pxweb R package 0.16.3],
## <https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Misc{,
##     title = {Population by region, marital status, age, sex, observations and year},
##     author = {{Statistics Sweden}},
##     organization = {Statistics Sweden},
##     address = {Stockholm, Sweden},
##     year = {2024},
##     url = {https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy},
##     note = {[Data accessed 2024-01-27 16:19:42.712139 using pxweb R package 0.16.3]},
##   }
## Kindly cite the pxweb R package as follows:
## 
##   Mans Magnusson, Markus Kainu, Janne Huovari, and Leo Lahti
##   (rOpenGov). pxweb: R tools for PXWEB API. URL:
##   http://github.com/ropengov/pxweb
## 
## A BibTeX entry for LaTeX users is
## 
##   @Misc{,
##     title = {pxweb: R tools for PX-WEB API},
##     author = {Mans Magnusson and Markus Kainu and Janne Huovari and Leo Lahti},
##     year = {2019},
##   }
```

See TROUBLESHOOTING.md for a list of current known issues.
This work can be freely used, modified and distributed under the open license specified in the DESCRIPTION file.
We created this vignette with the following R session:
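The listing that follows has the layout printed by base R's `sessionInfo()`. A minimal sketch, assuming that is the call behind it:

```r
# Collect and print details about the current R session.
session_details <- sessionInfo()
print(session_details)
```

The output will differ from the listing here, since it reflects the R version, platform and attached packages of the session in which it is run.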
```
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Stockholm
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] pxweb_0.17.0
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.4.1   digest_0.6.33     R6_2.5.1          fastmap_1.1.1    
##  [5] xfun_0.40         cachem_1.0.8      knitr_1.43        htmltools_0.5.6  
##  [9] rmarkdown_2.24    cli_3.6.1         sass_0.4.7        jquerylib_0.1.4  
## [13] compiler_4.3.1    rstudioapi_0.15.0 tools_4.3.1       checkmate_2.2.0  
## [17] evaluate_0.21     bslib_0.5.1       yaml_2.3.7        rlang_1.1.2      
## [21] jsonlite_1.8.7
```