DSL for SPARQL in R. ✨
`glitter` produces ~~sparkle~~ SPARQL! :sparkles:
This package aims at writing and sending SPARQL queries without advanced knowledge of the SPARQL language syntax. It makes the exploration and use of Linked Open Data (Wikidata in particular) easier for those who do not know SPARQL well.
With glitter, compared to writing SPARQL queries by hand, your code should be easier to write, and easier to read by your peers who do not know SPARQL. The glitter package supports a “domain-specific language” (DSL) with function names (and syntax) closer to the tidyverse and base R than to SPARQL.
For instance, to find a corpus of 5 articles with a title in English and “wikidata” in that title, instead of writing SPARQL by hand you can run:
```r
library("glitter")
query <- spq_init() %>%
  spq_add("?item wdt:P31 wd:Q13442814") %>%
  spq_label(item) %>%
  spq_filter(str_detect(str_to_lower(item_label), 'wikidata')) %>%
  spq_head(n = 5)
query
#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#> SELECT ?item ?item_label
#> WHERE {
#>
#> ?item wdt:P31 wd:Q13442814.
#> OPTIONAL {
#> ?item rdfs:label ?item_labell.
#> FILTER(lang(?item_labell) IN ('en'))
#> }
#>
#> BIND(COALESCE(?item_labell,'') AS
#> ?item_label)FILTER(REGEX(LCASE(?item_label),"wikidata"))
#> }
#>
#> LIMIT 5
```
Note how we were able to use `str_detect()` and `str_to_lower()` (as in the stringr package) instead of SPARQL’s functions `REGEX` and `LCASE`.
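For readers who have not used stringr, here is a minimal sketch of what the same predicate does on a local character vector. The titles are hypothetical, invented for illustration, and this runs in plain R without sending any query; glitter only borrows the function names and translates them to SPARQL.

```r
library(stringr)

# Hypothetical titles, invented for illustration (not Wikidata results)
titles <- c(
  "Wikidata as a knowledge graph",
  "A study of linked data",
  "Querying WIKIDATA from R"
)

# Same predicate as inside spq_filter(): lower-case, then pattern match
keep <- str_detect(str_to_lower(titles), "wikidata")

# Titles that pass the filter
titles[keep]
```

Lower-casing before matching is what makes the filter case-insensitive, which is also the role `LCASE` plays in the generated SPARQL.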
To perform the query,
```r
spq_perform(query)
#> # A tibble: 5 × 2
#>   item                                     item_label
#>   <chr>                                    <chr>
#> 1 http://www.wikidata.org/entity/Q18507561 Wikidata: A Free Collaborative Knowl…
#> 2 http://www.wikidata.org/entity/Q21503276 Utilizing the Wikidata system to imp…
#> 3 http://www.wikidata.org/entity/Q21503284 Wikidata: A platform for data integr…
#> 4 http://www.wikidata.org/entity/Q23712646 Wikidata as a semantic framework for…
#> 5 http://www.wikidata.org/entity/Q24074986 From Freebase to Wikidata: The Great…
```
To get an arbitrary subset of movies with the date they were released, you could use:
```r
spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>%
  spq_label(film) %>%
  spq_add("?film wdt:P577 ?date") %>%
  spq_mutate(date = year(date)) %>%
  spq_head(10) %>%
  spq_perform()
#> # A tibble: 10 × 3
#>    film                                  date film_label
#>    <chr>                                <dbl> <chr>
#>  1 http://www.wikidata.org/entity/Q372   2009 We Live in Public
#>  2 http://www.wikidata.org/entity/Q595   2011 The Intouchables
#>  3 http://www.wikidata.org/entity/Q595   2011 The Intouchables
#>  4 http://www.wikidata.org/entity/Q595   2012 The Intouchables
#>  5 http://www.wikidata.org/entity/Q595   2012 The Intouchables
#>  6 http://www.wikidata.org/entity/Q593   2011 A Gang Story
#>  7 http://www.wikidata.org/entity/Q1365  1974 Swept Away
#>  8 http://www.wikidata.org/entity/Q1365  1974 Swept Away
#>  9 http://www.wikidata.org/entity/Q1365  1975 Swept Away
#> 10 http://www.wikidata.org/entity/Q1365  1975 Swept Away
```
Note that we were able to “overwrite” the date variable, which is straightforward in dplyr, but not so much in SPARQL.
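To see how glitter handles the overwrite, one option is to build the same pipeline without `spq_perform()` and print it, as was done for the first query. This is only a sketch reusing the calls shown above; the name `film_query` is ours, and the generated SPARQL is not reproduced here because it may differ between package versions.

```r
library("glitter")

# Same pipeline as above, stopping before spq_perform(),
# so nothing is sent to the Wikidata endpoint
film_query <- spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>%
  spq_label(film) %>%
  spq_add("?film wdt:P577 ?date") %>%
  spq_mutate(date = year(date)) %>%
  spq_head(10)

# Printing the object displays the SPARQL that would be sent
film_query
```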
Install this package through R-universe:
```r
install.packages("glitter", repos = "https://lvaudor.r-universe.dev")
```
Or through GitHub:
```r
install.packages("remotes") # if remotes is not already installed
remotes::install_github("lvaudor/glitter")
```
You can access the documentation regarding package `glitter` on its pkgdown website.