| Title: | A Method to Download Department of Education College Scorecard Data |
| Version: | 0.32.0 |
| Maintainer: | Benjamin Skinner <ben@btskinner.io> |
| URL: | https://www.btskinner.io/rscorecard/ |
| BugReports: | https://github.com/btskinner/rscorecard/issues |
| Description: | A method to download Department of Education College Scorecard data using the public API <https://collegescorecard.ed.gov/data/data-documentation/>. It is based on the 'dplyr' model of piped commands to select and filter data in a single chained function call. An API key from the U.S. Department of Education is required. |
| Depends: | R (≥ 4.1.0) |
| License: | MIT + file LICENSE |
| Imports: | dplyr, httr, jsonlite, lazyeval, tibble, tidyselect, tidyr, purrr |
| RoxygenNote: | 7.3.2 |
| Encoding: | UTF-8 |
| Suggests: | knitr, rmarkdown, testthat |
| NeedsCompilation: | no |
| Packaged: | 2025-04-30 14:23:08 UTC; benski |
| Author: | Benjamin Skinner |
| Repository: | CRAN |
| Date/Publication: | 2025-05-12 13:40:05 UTC |
rscorecard: A Method to Download College Scorecard Data.
Description
The rscorecard package provides a series of piped functions (à la dplyr) to facilitate downloading Department of Education College Scorecard data. In reality, it is simply a method for converting idiomatic R code into a properly formatted URL string that is then queried. This package requires an API key, which can be requested at https://api.data.gov/signup/.
Details
All command pipes must start with sc_init(), end with sc_get(), and be linked with either the base pipe, |>, or the magrittr pipe, %>%. The intermediate commands, sc_select(), sc_filter(), sc_year(), and sc_zip(), can come in any order in the pipe chain. Only sc_select() is required.
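The order-independence described above follows naturally if each intermediate step only edits its own slot in a shared list of request parameters. The base-R sketch below illustrates that idea with hypothetical functions (start(), pick(), where(), finish()); it is not the package's internal code, and the query fragments it produces are illustrative only.

```r
# Hypothetical pipeline sketch (not rscorecard internals): each step takes and
# returns a plain list of request parameters.
start <- function() list(fields = NULL, filters = character(0))

# Like a select step: record lower-cased variable names
pick <- function(req, ...) {
  req$fields <- tolower(c(...))
  req
}

# Like a filter step: append a query fragment
where <- function(req, fragment) {
  req$filters <- c(req$filters, fragment)
  req
}

# Like the final get step: refuse to build a query without selected fields
finish <- function(req) {
  if (is.null(req$fields)) stop("select at least one variable")
  fields <- paste0("fields=", paste(req$fields, collapse = ","))
  paste(c(fields, req$filters), collapse = "&")
}

# The two middle steps appear in different orders but yield the same query
a <- start() |> pick("UNITID", "INSTNM") |> where("stabbr=TN") |> finish()
b <- start() |> where("stabbr=TN") |> pick("UNITID", "INSTNM") |> finish()
print(a)
print(identical(a, b))
```

Because finish() stops when nothing was picked, the sketch also mirrors the rule that a select step is the only required intermediate command.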
Author(s)
Maintainer: Benjamin Skinner <ben@btskinner.io> (ORCID)
See Also
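Useful links:
https://www.btskinner.io/rscorecard/
Report bugs at https://github.com/btskinner/rscorecard/issues
sc_dict: Search data dictionary.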
Description
This function is used to search the College Scorecard data dictionary.
Usage
```r
sc_dict(
  search_string,
  search_col = c("all", "description", "varname", "dev_friendly_name",
                 "dev_category", "label", "source"),
  ignore_case = TRUE,
  limit = 10,
  confirm = FALSE,
  print_dev = FALSE,
  print_notes = FALSE,
  return_df = FALSE,
  print_off = FALSE,
  can_filter = FALSE,
  filter_vars = FALSE
)
```
Arguments
search_string | Character string for search. Can use regular expression for search. Must escape special characters. |
search_col | Column to search. The default is to search all columns. Other options include: "description", "varname", "dev_friendly_name", "dev_category", "label", "source". |
ignore_case | Search is case insensitive by default. Change to FALSE for a case-sensitive search. |
limit | Only the first 10 dictionary items are returned by default. Increase to return more values. Set to Inf to return all matching items. |
confirm | Use to confirm status of variable name in dictionary. Returns TRUE or FALSE. |
print_dev | Set to |
print_notes | Set to |
return_df | Return a tibble of the subset data dictionary. |
print_off | Do not print to console; useful if you only want to return a tibble of dictionary values. |
can_filter | Use to confirm that a variable can be used as a filtering variable. Returns TRUE or FALSE. |
filter_vars | Use to print variables that can be used to filter calls. Use with argument return_df = TRUE to also return them as a tibble. |
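Conceptually, a dictionary search like this is a regular-expression match across one or more columns, trimmed to a row limit. The sketch below uses a small, made-up stand-in for the dictionary and a hypothetical search_dict() helper; the real dictionary has more columns, and its rows and descriptions differ.

```r
# Made-up stand-in for the data dictionary (illustrative rows only)
dict <- data.frame(
  varname = c("unitid", "instnm", "stabbr", "st_fips"),
  description = c("Institution ID", "Institution name",
                  "State postcode", "FIPS code for state"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: regex search over all columns or one named column
search_dict <- function(dict, search_string, search_col = "all",
                        ignore_case = TRUE, limit = 10) {
  cols <- if (identical(search_col, "all")) names(dict) else search_col
  # TRUE for rows where the pattern matches in any searched column
  hits <- Reduce(`|`, lapply(dict[cols], grepl,
                             pattern = search_string,
                             ignore.case = ignore_case))
  utils::head(dict[hits, , drop = FALSE], limit)
}

search_dict(dict, "state")                       # matches in description
search_dict(dict, "^st", search_col = "varname") # names starting with "st"
```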
Examples
```r
## simple search for 'state' in any part of the dictionary
sc_dict('state')

## variable names starting with 'st'
sc_dict('^st', search_col = 'varname')

## return full dictionary (only recommended if not printing and
## storing in object)
df <- sc_dict('.', limit = Inf, print_off = TRUE, return_df = TRUE)

## print list of variables that can be used to filter
df <- sc_dict('.', filter_vars = TRUE, return_df = TRUE)
```
sc_filter: Filter scorecard data by variable values.
Description
This function is used to filter the downloaded scorecard data. It converts idiomatic R into the format required by the API call.
Usage
```r
sc_filter(sccall, ...)
sc_filter_(sccall, filter_string)
```
Arguments
sccall | Current list of parameters carried forward from prior functions in the chain (ignore) |
... | Expressions to evaluate |
filter_string | Filter as character string or vector of filters as character strings |
Functions
sc_filter_(): Standard evaluation version of sc_filter() (filter_string must be a string or vector of strings when using this version)
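The two entry points differ only in how the filter arrives: sc_filter() captures unevaluated R code, while sc_filter_() parses strings. The base-R sketch below shows one way both routes can be reduced to the same query fragment. The helpers (filter_to_param(), nse_filter(), se_filter()) are hypothetical, the "__not" suffix for != is an assumption about the API's syntax, and ranges such as 41:43 are simply expanded into a value list.

```r
# Hypothetical helper: turn `var == values` or `var != values` into a
# "name=value1,value2" fragment. The "__not" suffix is an assumption.
filter_to_param <- function(expr) {
  op <- as.character(expr[[1]])               # "==" or "!="
  if (!op %in% c("==", "!=")) stop("only == and != are handled in this sketch")
  var  <- as.character(expr[[2]])             # left-hand side, e.g. stabbr
  vals <- eval(expr[[3]], baseenv())          # right-hand side, e.g. c("TN", "KY")
  suffix <- if (op == "!=") "__not" else ""
  paste0(var, suffix, "=", paste(vals, collapse = ","))
}

# Non-standard evaluation front end: capture the expressions unevaluated
nse_filter <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]
  unname(vapply(exprs, filter_to_param, character(1)))
}

# Standard evaluation front end: parse each string into an expression
se_filter <- function(filter_string) {
  vapply(filter_string, function(s) filter_to_param(str2lang(s)),
         character(1), USE.NAMES = FALSE)
}

nse_filter(stabbr == c("TN", "KY"), control == 1:2)
se_filter(c("control != 3", "locale == 41:43"))
```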
Examples
```r
## Not run:
sc_filter(region == 1)                 # New England institutions
sc_filter(stabbr == c("TN","KY"))      # institutions in Tennessee and Kentucky
sc_filter(control != 3)                # exclude private, for-profit institutions
sc_filter(control == c(1,2))           # same as above
sc_filter(control == 1:2)              # same as above
sc_filter(stabbr == "TN", control == 1, locale == 41:43)  # TN rural publics
## End(Not run)

## Not run:
sc_filter_("region == 1")
sc_filter_("control != 3")

## With internal strings, you must either use both double and single quotes
## or escape internal quotes
sc_filter_("stabbr == c('TN','KY')")
sc_filter_('stabbr == c(\'TN\',\'KY\')')

## stored in object
filters <- c("control == 1", "locale == 41:43")
sc_filter_(filters)
## End(Not run)
```
sc_get: Get scorecard data.
Description
This function gets the College Scorecard data by compiling and converting all the previous piped output into a single URL string that is used to get the data.
Usage
```r
sc_get(
  sccall,
  api_key,
  debug = FALSE,
  print_key_debug = FALSE,
  return_json = FALSE
)
```
Arguments
sccall | Current list of parameters carried forward from prior functions in the chain (ignore) |
api_key | Personal API key requested from https://api.data.gov/signup stored in a string. If you first set your key using sc_key(), you may omit this argument. |
debug | Set to TRUE to print and return the API call (URL string) rather than make the actual request. Should only be used when debugging calls. |
print_key_debug | Only used when debug = TRUE. |
return_json | Return data in JSON format rather than as a tibble. |
Obtain a key
To obtain an API key, visit https://api.data.gov/signup
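The debug option returns the assembled URL instead of sending it. The sketch below shows what such a debug path might look like, including hiding the key unless asked to show it. The endpoint URL, the query fragments, the masking behavior, and the debug_url() helper are all assumptions for illustration, not the package's implementation.

```r
# Hypothetical debug-style request builder (not rscorecard code).
# The endpoint is an assumption used only for illustration.
base_url <- "https://api.data.gov/ed/collegescorecard/v1/schools"

debug_url <- function(query, api_key, print_key_debug = FALSE) {
  # Hide the key unless the caller explicitly asks to see it
  shown_key <- if (print_key_debug) api_key else strrep("X", nchar(api_key))
  url <- paste0(base_url, "?", paste(query, collapse = "&"),
                "&api_key=", shown_key)
  message(url)    # print the URL instead of sending the request
  invisible(url)  # and return it for inspection
}

u <- debug_url(c("fields=unitid,instnm", "stabbr=TN"), api_key = "abc123")
```

Masking by default keeps a real key out of logs and shared console output when debugging.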
Examples
```r
## Not run:
sc_get("<API KEY IN STRING>")
key <- "<API KEY IN STRING>"
sc_get(key)
## End(Not run)
```
sc_init: Initialize chained request.
Description
This function initializes the data request. It should always be the first in the series of piped functions.
Usage
```r
sc_init(dfvars = FALSE)
```
Arguments
dfvars | Set to |
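The initializer has to come first because it creates the object every later step expects. One common way to enforce that in R is to give the initial object a class and check it downstream. The sketch below does this with hypothetical names (init_request(), add_fields(), class "sketch_request") and does not reproduce the package's code.

```r
# Hypothetical constructor: tag the parameter list with a class so later
# steps can confirm the chain started from the initializer.
init_request <- function() {
  structure(list(fields = character(0)), class = "sketch_request")
}

# Hypothetical later step: refuse input that did not come from init_request()
add_fields <- function(req, ...) {
  if (!inherits(req, "sketch_request")) {
    stop("chain must start with init_request()")
  }
  req$fields <- c(req$fields, tolower(c(...)))
  req
}

req <- init_request() |> add_fields("UNITID")
req$fields
```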
Examples
```r
## Not run:
sc_init()
sc_init(dfvars = TRUE)
## End(Not run)
```
sc_key: Store Data.gov API key in system environment.
Description
This function stores your data.gov API key in the system environment so that you only have to load it once at the start of the session. If you set your key using sc_key(), you may omit the api_key argument in sc_get().
Usage
```r
sc_key(api_key)
```
Arguments
api_key | Personal API key requested from https://api.data.gov/signup stored in a string. |
Obtain a key
To obtain an API key, visit https://api.data.gov/signup.
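Storing the key once per session and falling back to it later can be done with base R's Sys.setenv() and Sys.getenv(). In the sketch below, the environment variable name DATAGOV_API_KEY and the helpers store_key() and resolve_key() are hypothetical; the package may use a different variable name and logic.

```r
# Hypothetical environment variable name; the package's actual name may differ
key_var <- "DATAGOV_API_KEY"

# Store the key for the rest of the R session
store_key <- function(api_key) {
  do.call(Sys.setenv, stats::setNames(list(api_key), key_var))
  invisible(TRUE)
}

# Prefer an explicitly supplied key, else fall back to the stored one
resolve_key <- function(api_key = NULL) {
  if (!is.null(api_key)) return(api_key)
  key <- Sys.getenv(key_var)
  if (!nzchar(key)) stop("No API key supplied and none stored in ", key_var)
  key
}

store_key("my-demo-key")
resolve_key()          # falls back to the stored key
resolve_key("other")   # an explicit key takes precedence
```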
Examples
```r
## Not run:
sc_key('<API KEY IN STRING>')
## End(Not run)
```
sc_select: Select scorecard data variables.
Description
This function is used to select the variables returned in the final dataset.
Usage
```r
sc_select(sccall, ...)
sc_select_(sccall, vars)
```
Arguments
sccall | Current list of parameters carried forward from prior functions in the chain (ignore) |
... | Desired variable names separated by commas (not case sensitive) |
vars | Character string of variable name or vector of character string variable names |
Functions
sc_select_(): Standard evaluation version of sc_select() (vars must be a string or vector of strings when using this version)
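The difference between the two versions comes down to whether the arguments are captured as names or evaluated as values. The sketch below uses hypothetical nse_select() and se_select() helpers to show both routes producing the same lower-cased field list, and why passing a stored vector to the non-standard evaluation form captures the object's name rather than its contents. The "fields=" fragment is illustrative.

```r
# Hypothetical standard-evaluation version: vars is a character vector
se_select <- function(vars) {
  paste0("fields=", paste(unique(tolower(vars)), collapse = ","))
}

# Hypothetical non-standard-evaluation version: bare names are captured
# as text, so case does not matter and no objects are looked up
nse_select <- function(...) {
  vars <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  se_select(unname(vars))
}

nse_select(UNITID, INSTNM)
se_select(c("unitid", "instnm"))

vars_to_pull <- c("unitid", "instnm")
nse_select(vars_to_pull)  # captures the name "vars_to_pull", not its values
se_select(vars_to_pull)   # uses the stored values
```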
Examples
```r
## Not run:
sc_select(UNITID)
sc_select(UNITID, INSTNM)
sc_select(unitid, instnm)
## End(Not run)

## Not run:
sc_select_("UNITID")
sc_select_(c("UNITID", "INSTNM"))
sc_select_(c("unitid", "instnm"))

## stored in object
vars_to_pull <- c("unitid","instnm")
sc_select_(vars_to_pull)
## End(Not run)
```
sc_year: Select scorecard data year.
Description
This function is used to select the year of the data.
Usage
```r
sc_year(sccall, year)
```
Arguments
sccall | Current list of parameters carried forward from prior functions in the chain (ignore) |
year | Four-digit year or string |
Important notes
Not all variables have a year option.
At this time, only one year at a time is allowed.
The year selected is not necessarily the year the data were produced. It may be the year the data were collected. For data collected over split years (fall to spring), it is likely the year represents the fall data (e.g., 2011 for 2011/2012 data).
Be sure to check the College Scorecard data documentation report when choosing the year.
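A year argument with the constraints listed above (one year at a time, a four-digit year or "latest") can be validated before any request is made. The sketch below uses hypothetical helpers check_year() and prefix_year(); the convention of prefixing the year to a dotted field name is an assumption about how yearly fields are addressed, not something this manual states, and "example.field" is a placeholder.

```r
# Hypothetical validator: exactly one year, either four digits or "latest"
check_year <- function(year = "latest") {
  if (length(year) != 1) stop("only one year at a time is allowed")
  year <- as.character(year)
  if (year != "latest" && !grepl("^[0-9]{4}$", year)) {
    stop("year must be a four-digit year or \"latest\"")
  }
  year
}

# Assumed naming scheme: "<year>.<field>"
prefix_year <- function(field, year = "latest") {
  paste(check_year(year), field, sep = ".")
}

check_year(2012)
prefix_year("example.field", 2012)
```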
Examples
```r
## Not run:
sc_year()          # latest
sc_year("latest")
sc_year(2012)
## End(Not run)
```
sc_zip: Subset results to those within specified area around zip code.
Description
Subset results to those within specified area around zip code.
Usage
```r
sc_zip(sccall, zip, distance = 25, km = FALSE)
```
Arguments
sccall | Current list of parameters carried forward from prior functions in the chain (ignore) |
zip | A 5-digit zip code |
distance | An integer distance in miles or kilometers |
km | A boolean value; set to TRUE if distance is given in kilometers rather than miles (the default) |
Note
Zip codes with leading zeros (Northeast) can be called using either a string ("02111") or a numeric value (02111). R will drop the leading zero from the second version, but sc_zip() will add it back before the call. The shortened version without the leading zero may also be used (2111 and "2111" both become "02111"), but is not recommended for clarity.
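Restoring a dropped leading zero amounts to zero-padding the integer value to five digits. The sketch below shows that with base R's sprintf(). The helpers pad_zip() and zip_params() are hypothetical, and the "_zip" and "_distance" parameter names with "mi"/"km" unit suffixes are assumptions about the API's query syntax.

```r
# Hypothetical helper: "02111", 02111, 2111, and "2111" all become "02111"
pad_zip <- function(zip) {
  zip_int <- as.integer(zip)
  if (is.na(zip_int) || zip_int < 0 || zip_int > 99999) {
    stop("zip must be a 5-digit code")
  }
  sprintf("%05d", zip_int)  # left-pad with zeros to five digits
}

# Hypothetical query fragments; parameter names and units are assumptions
zip_params <- function(zip, distance = 25, km = FALSE) {
  c(paste0("_zip=", pad_zip(zip)),
    paste0("_distance=", distance, if (km) "km" else "mi"))
}

pad_zip(02111)
zip_params(37203, 50, km = TRUE)
```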
Examples
```r
## Not run:
sc_zip(37203)
sc_zip(37203, 50)
sc_zip(37203, 50, km = TRUE)
sc_zip("02111")  # 1. Using string
sc_zip(02111)    # 2. Dropped leading zero will be added
sc_zip(2111)     # 3. Will become "02111" (not recommended)
## End(Not run)
```