The bdl package is an interface to the Local Data Bank (Bank Danych Lokalnych - bdl) API with a set of useful tools, such as quick plotting of data from the data bank.
Working with bdl is based on id codes. Most of the data downloading functions require one unit or variable id, or a vector of several ids, given as strings.
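Because ids are passed as strings, leading zeros are part of the id. The sketch below, in base R, shows why a numeric value is not a safe substitute. The helper name is_unit_id_like and the 12-digit pattern are assumptions inferred from the unit ids used in the examples in this document; they are not rules taken from the package or API documentation.

```r
# Hypothetical helper (not part of bdl): TRUE for character strings made of
# exactly 12 digits. The 12-digit rule is an assumption for illustration only.
is_unit_id_like <- function(x) {
  is.character(x) & grepl("^[0-9]{12}$", x)
}

is_unit_id_like("023200000000")  # a string keeps its leading zero
is_unit_id_like(23200000000)     # a number is rejected by the check

# Rebuild a zero-padded 12-character string from a number that lost its zero
sprintf("%012.0f", 23200000000)
```

Variable ids in the examples (such as "3643") are shorter, so this check only makes sense for unit ids.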
It is recommended to use a private API key, which you can get here. To apply it, use:

    options(bdl.api_private_key = "your_key")
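To keep the key out of scripts, one option is to read it from an environment variable before setting the option. This is only a sketch: the function name set_bdl_key_from_env and the variable name BDL_API_KEY are hypothetical choices made for this example, and only the option name bdl.api_private_key comes from the text above.

```r
# Hypothetical helper (not part of bdl): copies an API key from an
# environment variable into the bdl.api_private_key option.
# "BDL_API_KEY" is an assumed variable name chosen for this sketch.
set_bdl_key_from_env <- function(env_var = "BDL_API_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (nzchar(key)) {
    options(bdl.api_private_key = key)
    return(invisible(TRUE))
  }
  # Nothing was set; the option keeps whatever value it had before
  invisible(FALSE)
}

set_bdl_key_from_env()
```

The variable itself could be defined in a user-level .Renviron file so that it is available in every session.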
Also, every function returns data in Polish by default. If you would like to get data in English, just add lang = "en" to any function.
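If most calls in a script should return English data, repeating lang = "en" can be avoided with a small wrapper. The following is a minimal base R sketch; with_lang is a hypothetical helper, not part of bdl, and it only assumes what the text states, namely that the wrapped function accepts a lang argument.

```r
# Hypothetical helper (not part of bdl): returns a version of `f` that
# always passes the given `lang` value along with the caller's arguments.
with_lang <- function(f, lang = "en") {
  force(f)
  force(lang)
  function(...) f(..., lang = lang)
}

# Usage sketch, assuming the bdl package is attached:
# search_units_en <- with_lang(search_units)
# search_units_en(name = "wro")
```

Passing lang explicitly to a wrapped function would result in a duplicated argument error, so the wrapper suits scripts that use a single language throughout.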
Any metadata information (unit levels, aggregates, NUTS code explanation, etc.) can be found here.
When searching for a unit id, we can use two methods:

- search_units()
- get_units()

Units consist of 6 levels:
    get_levels()

The lowest, seventh level has its own separate functions with the suffix localities. Warning: the localities functions have a different set of arguments. Check the package or API documentation for more info.
Direct searching

search_units() takes a couple of different arguments, like:
- name - required search phrase (can be an empty string)
- level - narrows returned units to the given level

and more. To see the other arguments of any given function, check the package or API documentation.
    search_units(name = "wro")
    search_units(name = "", level = 3)

To get all units available in the local data bank, run get_units() without any argument (warning: with around 4.5k rows, it can use up the data limit very fast):

    get_units()
To narrow the list, add parentId. The function will return all children units for a given parent at all levels. Add the level argument to filter units even further.
    get_units(parentId = "000000000000", level = 5)

Subjects are themed directories of variables.
We have two searching methods for both subjects and variables:
- search_variables() and search_subjects()
- get_subjects() and get_variables()

To search directly for a subject, we just provide a search phrase:
    search_subjects("lud")

Subjects consist of 3 levels (categories, groups, subgroups), marked K, G and P respectively. The fourth level of the subject (child of a subgroup) would be variables.
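The prefix convention can be expressed as a small lookup. This is an illustrative base R sketch; subject_level is a hypothetical helper, not part of bdl, and it relies only on the K, G and P prefixes described above.

```r
# Hypothetical helper (not part of bdl): maps the first letter of a
# subject id to the level name described in the text.
subject_level <- function(id) {
  prefix <- substr(id, 1, 1)
  level_names <- c(K = "category", G = "group", P = "subgroup")
  # Unknown prefixes give NA
  unname(level_names[prefix])
}

subject_level(c("K3", "G7", "P2425"))
```

A subgroup id (prefix P) is the one to pass when listing the variables of a subject.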
To list all top level subjects, use get_subjects():
    get_subjects()

To list the sub-subjects of a given category or group, use get_subjects() with the parentId argument:
    get_subjects(parentId = "K3")
    get_subjects(parentId = "G7")

Firstly, you can list the variables for a given subject (subgroup):
    get_variables("P2425")

Secondly, you can search variables directly with search_variables(). You can use an empty string as name to list all variables, but I strongly advise against it, as the result has around 40 000 rows and you will probably hit the data limit.
    search_variables("samochod")

You can narrow the search to a given subject (subgroup):
    search_variables("lud", subjectId = "P2425")

Once you have picked unit and variable codes, you are ready to download data. You can do this in two ways:
- get_data_by_unit()
- get_data_by_variable()

We will use get_data_by_unit(). We specify our single unit as the unitId string argument and the variables as a vector of strings. Optionally we can specify the years of data. If we do not, all available years are used.
    get_data_by_unit(unitId = "023200000000", varId = "3643")
    get_data_by_unit(unitId = "023200000000", varId = c("3643", "2137", "148190"))

To get more information about the data, we can add the type argument and set it to "label" to add an additional column with the variable info.
    get_data_by_unit(unitId = "023200000000", varId = "3643", type = "label")

We will use get_data_by_variable(). We specify our single variable as the varId string argument. If no unitParentId is provided, the function will return all available units for a given variable. Setting unitParentId will return all available children units (on all levels). To narrow the unit level, set unitLevel. Optionally we can specify the years of data. If we do not, all available years are used.
    get_data_by_variable("420", unitParentId = "011210000000", year = 2013:2016)
    get_data_by_variable("420", unitLevel = "2", year = 2013:2016)

The bdl package provides a couple of additional functions for summarizing and visualizing data.
Data downloaded via get_data_by_unit() or get_data_by_variable() and their locality versions can be easily summarized by summary():
    df <- get_data_by_variable(varId = "3643", unitParentId = "010000000000")
    summary(df)

Plotting functions in this package are interfaces to the data downloading functions. Some of them require specifying data_type, the method used for downloading data; the remaining arguments are those of the download function selected by data_type. Check the documentation for more details.
    line_plot(data_type = "unit", unitId = "000000000000", varId = c("415", "420"))
    pie_plot(data_type = "variable", "1", "2018", unitParentId = "042214300000", unitLevel = "6")

The scatter plot is a special case: it requires a vector of exactly 2 variables.
    scatter_2var_plot(data_type = "variable", c("60559", "415"), unitLevel = "2")

The bdl package comes with the bdl.maps dataset containing spatial maps for each unit level of Poland. generate_map() uses them to generate maps filled with the bdl data. Use unitLevel to change the type of map. When a lower level is chosen, map generation can take more time, as there is more spatial data to process. This function will download and load the maps automatically. In case of any errors, you can download them manually here.
Download the data file and double-click it to load it into your environment.
    generate_map(varId = "60559", year = "2017", unitLevel = 3)

The downloading functions get_data_by_unit() and get_data_by_variable() have an alternative "multi" downloading mode. A function that normally takes a single unit will, when given a vector of units, add a column of values for each unit provided:
    get_data_by_unit(unitId = c("023200000000", "020800000000"), varId = c("3643", "2137", "148190"))

Or multiple variables for get_data_by_variable():
    get_data_by_variable(varId = c("3643", "420"), unitParentId = "010000000000")

This mode works for the locality version as well.
A more consistent method of downloading multiple variables for multiple units is provided by the get_panel_data() function:
    get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"),
                   varId = c("60270", "461668"),
                   year = c(2015:2016))

It also offers the parameter ggplot = TRUE, which produces output in the long form suitable for plotting with the ggplot2 package:
    library(ggplot2)
    df <- get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"),
                         varId = c("60270", "461668"),
                         year = c(2015:2018),
                         ggplot = TRUE)
    ggplot(df, aes(x = year, y = values, color = unit)) +
      geom_line(aes(linetype = variables)) +
      scale_color_discrete(labels = c("A", "B", "C")) +
      scale_linetype_discrete(labels = c("X", "Y"))
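Every download counts against the API data limit mentioned earlier, so while iterating on a plot it may help to keep a local copy of the result instead of calling the API again. The sketch below uses only base R; cached is a hypothetical helper, not part of bdl, and the file name in the usage comment is arbitrary.

```r
# Hypothetical helper (not part of bdl): returns the object stored at `path`
# if the file exists; otherwise calls `fetch()`, saves the result, returns it.
cached <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Usage sketch (needs the bdl package and network access on the first run):
# df <- cached("var_3643.rds", function() {
#   get_data_by_variable(varId = "3643", unitParentId = "010000000000")
# })
```

Deleting the file forces a fresh download the next time the call runs.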