| Type: | Package |
| Title: | Functions and Datasets for the Data Science Course at IBAW |
| Version: | 1.2.0 |
| Description: | A collection of useful functions and datasets for the Data Science Course at IBAW. |
| License: | MIT + file LICENSE |
| URL: | https://stibu81.github.io/ibawds/, https://github.com/stibu81/ibawds |
| BugReports: | https://github.com/stibu81/ibawds/issues |
| Encoding: | UTF-8 |
| LazyData: | true |
| Language: | en-GB |
| RoxygenNote: | 7.3.2 |
| Depends: | R (≥ 4.1.0), dslabs |
| Imports: | stats, tools, grDevices, rlang, rstudioapi, remotes, ggplot2, tidyr, readr, scales, dplyr (≥ 1.1.0), stringr, purrr, magrittr, cli, memuse |
| Suggests: | knitr, rmarkdown, kableExtra, deldir, rvest, lubridate, nanoparquet, usethis, vdiffr, testthat (≥ 3.0.0), httr2, covr, spelling, withr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2025-08-18 16:39:55 UTC; slanz |
| Author: | Stefan Lanz [aut, cre] |
| Maintainer: | Stefan Lanz <slanz1137@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-08-18 17:10:07 UTC |
Functionality for Data Science at IBAW
Description
A collection of useful functions and datasets for the Data Science Course at IBAW.
Author(s)
Maintainer: Stefan Lanz <slanz1137@gmail.com>
See Also
Useful links:
Report bugs at https://github.com/stibu81/ibawds/issues
Summarised Data on Restaurant Bills
Description
Summary of data on restaurant bills from the dataset reshape2::tips. Labels are in German.
Usage
bills
Format
A data frame with 8 rows and 4 variables:
- sex
sex of the bill payer
- time
time of day
- smoker
whether there were smokers in the party
- mean_bill
mean of all the bills in dollars
Wisconsin Breast Cancer Database
Description
Breast cancer database obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg. The data were collected in 8 groups from 1989 to 1991 and are sorted in chronological order.
Usage
breast_cancer
Format
a tibble with 699 rows and 11 variables. All numerical values are integers in the range 1 to 10.
- id
sample code number
- clump_thick
clump thickness
- unif_cell_size
uniformity of cell size
- unif_cell_shape
uniformity of cell shape
- marg_adh
marginal adhesion
- ep_cell_size
single epithelial cell size
- bare_nucl
bare nuclei
- bland_chromat
bland chromatin
- norm_nucl
normal nucleoli
- mitoses
mitoses
- class
"benign" (458) or "malignant" (241)
Source
The data is available on the UC Irvine Machine Learning Repository.
O. L. Mangasarian and W. H. Wolberg, Cancer diagnosis via linear programming, SIAM News, Volume 23(5) (1990) 1 & 18.
Check If the User Is Ready for the Course
Description
Check if the current system is ready for the course by verifying the following:
R and RStudio are up to date
the ibawds package is up to date
all the required packages are installed
The function must be run from RStudio in order to run properly.
Usage
check_ibawds_setup()
Value
a logical indicating whether the system is up to date (invisibly). Messages inform the user about the status of the system.
Find Packages Used For Lectures not Installed by ibawds
Description
ibawds offers the function install_ibawds(), which installs all the packages that are required for the course. check_lecture_packages() finds all the packages that are used in the slides and exercise solutions inside a directory. It then checks whether they are all installed by install_ibawds() and returns a tibble of those that are not. This can help to identify whether additional packages need to be installed by install_ibawds().
Usage
check_lecture_packages(path = ".")
Arguments
path | the path to a folder inside the directory with the slides and exercise solutions. The function automatically tries to identify the top level directory of the course material. |
Value
a tibble with two columns:
- file
the file where the package is used
- package
the name of the package
Check All Links in a Text File
Description
Find and check all http(s) URLs in a text file. Only links starting with http:// or https:// are found and checked.
Usage
check_links_in_file(file)
Arguments
file | the path to the file to be checked. |
Value
a tibble with two columns:
- url
the URL that was found and checked
- reachable
whether the URL could be reached
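How the URLs might be found can be sketched in base R. This is only an illustration: the helper name `extract_urls()` and the regular expression are assumptions made for this sketch, not the code used by the package, and no request is sent to the URLs.

```r
# Extract all http(s) URLs from the lines of a text file.
# The regular expression is a simplification chosen for this sketch.
extract_urls <- function(lines) {
  pattern <- "https?://[^[:space:]\"'<>)]+"
  unique(unlist(regmatches(lines, gregexpr(pattern, lines))))
}

txt <- c(
  "See https://cran.r-project.org for packages.",
  "Plain text without a link, and ftp://example.org is ignored.",
  "Docs: <http://example.com/docs> and https://cran.r-project.org"
)
extract_urls(txt)
```

Each extracted URL could then be passed to check_url() to fill the column reachable.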
Check All Links in the Slide Deck
Description
Check links in all files of a slide deck using check_links_in_file().
Usage
check_links_in_slides(path)
Arguments
path | path to the top level directory of a lecture |
Value
a tibble listing the links that did not work.
Check That an URL Can Be Reached
Description
Send a request to a URL and return a logical indicating whether the request was successful.
Usage
check_url(url)
Arguments
url | the URL to send the request to |
Value
a logical indicating whether the request was successful.
Cluster Data According to Centres and Recompute Centres
Description
For a given dataset and given centres, cluster_with_centers() assigns each data point to its closest centre and then recomputes the centres as the mean of all points assigned to each class. An initial set of random cluster centres can be obtained with init_rand_centers(). These functions can be used to visualise the mechanism of k-means.
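The assignment and update step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name `kmeans_step()` is hypothetical, and Euclidean distance is assumed for "closest".

```r
# One k-means iteration: assign points to the closest centre (Euclidean
# distance is an assumption), then recompute the centres as cluster means.
kmeans_step <- function(data, centers) {
  x <- as.matrix(data)
  ctr <- as.matrix(centers)
  # squared distance of every point (rows) to every centre (columns)
  d2 <- sapply(seq_len(nrow(ctr)), function(k) {
    rowSums(sweep(x, 2, ctr[k, ])^2)
  })
  cluster <- max.col(-d2, ties.method = "first")
  new_centers <- ctr
  # centres without assigned points keep their old position
  for (k in unique(cluster)) {
    new_centers[k, ] <- colMeans(x[cluster == k, , drop = FALSE])
  }
  list(centers = as.data.frame(new_centers), cluster = cluster)
}

iris2 <- iris[, c("Sepal.Length", "Petal.Length")]
start <- iris2[c(1, 51, 101), ]   # three data points as initial centres
step1 <- kmeans_step(iris2, start)
table(step1$cluster)
```

Feeding `step1$centers` back into the function repeats the iteration.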
Usage
cluster_with_centers(data, centers)
init_rand_centers(data, n, seed = sample(1000:9999, 1))
Arguments
data | a data.frame containing only the variables to be used for clustering. |
centers | a data.frame giving the centres of the clusters. It must have the same number of columns as |
n | the number of cluster centres to create |
seed | a random seed for reproducibility |
Value
a list containing two tibbles:
- centers
the new centres of the clusters computed after cluster assignment with the given centres
- cluster
the cluster assignment for each point in data using the centres that were passed to the function
Examples
# demonstrate k-means with iris data
# keep the relevant columns
iris2 <- iris[, c("Sepal.Length", "Petal.Length")]
# initialise the cluster centres
clust <- init_rand_centers(iris2, n = 3, seed = 2435)
# plot the data with the cluster centres
library(ggplot2)
ggplot(iris2, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(data = clust$centers, aes(colour = factor(1:3)),
             shape = 18, size = 6) +
  geom_point() +
  scale_colour_brewer(palette = "Set1")
# assign clusters and compute new centres
clust_new <- cluster_with_centers(iris2, clust$centers)
# plot the data with clustering
clust$cluster <- clust_new$cluster
voronoi_diagram(clust, x = "Sepal.Length", y = "Petal.Length", data = iris2)
# plot the data with new cluster centres
clust$centers <- clust_new$centers
voronoi_diagram(clust, x = "Sepal.Length", y = "Petal.Length",
                data = iris2, colour_data = FALSE)
# this procedure may be repeated until the algorithm converges
History of the Number of Available CRAN Packages
Description
Table with the number of packages available on CRAN and the current R version for historic dates back to 21 June 2001.
Usage
cran_history
Format
A data frame with 74 rows and 4 variables.
- date
date
- n_packages
the number of available R packages on CRAN
- version
the then current version of R
- source
source of the data (see 'Details')
Details
Data on the number of packages on CRAN between 2001-06-21 and 2014-04-13 is obtained from CRANpackages from the package Ecdat. This data was collected by John Fox and Spencer Graves. Intervals between data points are irregularly spaced. These data are marked with "John Fox" or "Spencer Graves" in the column source. They are licenced under GPL-2/GPL-3.
Data between 2014-10-01 and 2023-03-06 was collected by the package author from CRAN snapshots on Microsoft's MRAN, which was retired on 1 July 2023. Data was collected on the first day of each quarter. These data are marked with "MRAN" in the column source.
Newer data has been collected in irregular intervals using the functions n_available_packages() and available_r_version(). These data are marked with "CRAN" in the column source.
Examples
library(ggplot2)
ggplot(cran_history, aes(x = date, y = n_packages)) +
  geom_point()
Define LaTeX commands for statistical symbols
Description
Add the definitions for various useful LaTeX equation symbols for statistics to an RMarkdown or Quarto document.
Usage
define_latex_stats()
Details
Run this function from within a code chunk in an RMarkdown or Quarto document with options results = "asis" and echo = FALSE (see "Examples"). It only works for pdf output.
It defines the following macros: \E, \P, \Var, \Cov, \Cor, \SD, \SE, \Xb, \Yb.
Value
The function returns NULL invisibly. The command definitions are output as a side effect.
Examples
## Not run:
# add this code chunk to an RMarkdown or Quarto document
```{r results = "asis", echo = FALSE}
define_latex_stats()
```
## End(Not run)
Dentition of Mammals
Description
Dental formulas for various mammals. The dental formula describes the number of incisors, canines, premolars and molars per quadrant. Upper and lower teeth may differ and are therefore shown separately. The total number of teeth is twice the number given.
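As a small worked example of the last sentence, the total number of teeth can be computed from a row in the format of this dataset. The row below is a hypothetical one built by hand from the well-known human dental formula; it is not read from the dataset.

```r
# Hypothetical row in the format of the dentition dataset (human dental formula)
human <- data.frame(name = "Human", I = 2, i = 2, C = 1, c = 1,
                    P = 2, p = 2, M = 3, m = 3)
# sum over the upper and lower quadrant of one side, then double for both sides
total_teeth <- 2 * rowSums(human[, -1])
total_teeth
```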
Usage
dentition
Format
Data frame with 66 rows and 9 variables:
- name
name of the mammal
- I
number of top incisors
- i
number of bottom incisors
- C
number of top canines
- c
number of bottom canines
- P
number of top premolars
- p
number of bottom premolars
- M
number of top molars
- m
number of bottom molars
Source
The data have been downloaded from https://people.sc.fsu.edu/~jburkardt/datasets/hartigan/file19.txt
They come from the following textbook:
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Table 9.1, page 170.
Simulated Dice Throws
Description
A list with 6 numeric vectors containing the result of a number of simulated throws with a six-sided dice. Not all of the dice are fair and they are unfair in different ways.
Usage
dice_data
Format
a list containing 6 numeric vectors with varying length between 158 and 1027. The elements of the list are named "d1", "d2", etc.
Examples
# the numeric vectors differ in length
lengths(dice_data)
# compute the mean for each dice
sapply(dice_data, mean)
# look at the contingency table for dice 3
table(dice_data$d3)
Plot Density and Distribution Function With Markings
Description
Create plots of the density and distribution functions of a probability distribution. It is possible to mark points and shade the area under the curve.
Usage
distribution_plot(
  fun,
  range,
  ...,
  points = NULL,
  var = "x",
  title = "Verteilungsfunktion",
  is_discrete = NULL
)
density_plot(
  fun,
  range,
  ...,
  from = NULL,
  to = NULL,
  points = NULL,
  var = "x",
  title = "Dichte",
  is_discrete = NULL
)
Arguments
fun | a density or distribution function that takes quantiles as its first argument. |
range | numeric vector of length two giving the range of quantiles to be plotted. |
... | further arguments that are passed to |
points | numeric vector giving quantiles where the function should be marked with a red dot (continuous) or a red bar (discrete). |
var | character giving the name of the quantile variable. This is only used to label the axes. |
title | character giving the title of the plot |
is_discrete | logical indicating whether this is a discrete distribution. For discrete distributions, a bar plot is created. If omitted, the function tries to determine automatically whether the distribution is discrete. In case this should fail, set this argument explicitly. |
from, to | numeric values giving start and end of a range where the area under the density will be shaded (continuous) or the bars will be drawn in red (discrete). If only one of the two values is given, the shading will start at negative infinity or go until positive infinity, respectively. |
Value
a ggplot object
Examples
# plot density of the normal distribution
density_plot(dnorm, c(-5, 7), mean = 1, sd = 2, to = 3)
# plot distribution function of the Poisson distribution
distribution_plot(ppois, c(0, 12), lambda = 4, points = c(2, 6, 10), var = "y")
Downgrade Packages to an Older Version
Description
Downgrade packages to an older version available on CRAN. This can be useful when debugging problems that might have arisen due to a package update.
Usage
downgrade_packages(pkg, dec_version = c("any", "patch", "minor", "major"))
Arguments
pkg | character with the names of the packages to be downgraded. |
dec_version | character giving the version to decrease. Possible values are "any", "patch", "minor", and "major". See 'Details'. |
Details
Using the argument dec_version, the user can control which version will be installed. The possible values are:
- "any"
The previous available version will be installed.
- "patch"
The newest available version with a smaller patch version number will be installed. For packages with three version numbers, this is the same as using "any".
- "minor"
The newest available version with a smaller minor version number will be installed.
- "major"
The newest available version with a smaller major version number will be installed.
Downgrading is only possible for packages that are currently installed. For packages that are not installed, a warning is issued.
The function uses remotes::install_version() to install a version of a package that is older than the currently installed version.
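One way to read the selection rules above is sketched below in base R. The helper `pick_older_version()` is hypothetical and the version numbers are made up; the real function looks up the versions that are available on CRAN, and its exact rules may differ from this interpretation, in which "minor" means a smaller major.minor prefix and "major" a smaller major number.

```r
# Pick the version to install for a given value of dec_version.
# `available` and `installed` are plain version strings such as "1.2.3".
pick_older_version <- function(available, installed,
                               dec_version = c("any", "patch", "minor", "major")) {
  dec_version <- match.arg(dec_version)
  # number of leading version components that are compared
  n_comp <- c(any = Inf, patch = 3, minor = 2, major = 1)[[dec_version]]
  truncate <- function(v) {
    vapply(strsplit(v, "[.-]"), function(parts) {
      paste(parts[seq_len(min(n_comp, length(parts)))], collapse = ".")
    }, character(1))
  }
  keep <- numeric_version(truncate(available)) <
    numeric_version(truncate(installed))
  if (!any(keep)) return(NA_character_)
  # newest version among the remaining candidates
  as.character(max(numeric_version(available[keep])))
}

available <- c("1.1.0", "1.2.0", "2.0.0", "2.0.1", "2.1.0", "2.1.1")
sapply(c("any", "patch", "minor", "major"), function(d) {
  pick_older_version(available, installed = "2.1.1", dec_version = d)
})
```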
Value
A character vector with the names of the downgraded packages, invisibly.
Evaluate Predictions for the Case Study Handed in By Students
Description
Evaluate Predictions for the Case Study Handed in By Students
Usage
evaluate_casestudy(prediction_files, solution_file)
Arguments
prediction_files | character of file paths of csv files with model predictions. |
solution_file | path to the parquet file containing the correct solutions. |
Details
The prediction files must be csv-files (comma separated) with two columns:
- id
a five-digit integer giving the ID of the person.
- class
the predicted income class, one of "<=50K" and ">50K".
Missing IDs and any class that is not one of the accepted values count as failed predictions. The performance metrics are always computed on the full data set, not just on the available predictions.
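The metrics reported by the function (see the columns listed under "Value") can be sketched in base R as follows. The helper `evaluate_sketch()` is hypothetical and assumes that truth and prediction are already aligned by id, with a missing id represented as NA; it is not the package's implementation.

```r
# Compute the metrics for one set of predictions.
# Missing or invalid predictions count as failed predictions.
evaluate_sketch <- function(truth, prediction) {
  valid <- prediction %in% c("<=50K", ">50K")
  correct <- valid & prediction == truth
  sensitivity <- mean(correct[truth == "<=50K"])  # "positive" class
  specificity <- mean(correct[truth == ">50K"])   # "negative" class
  data.frame(
    n_valid = sum(valid),
    balanced_accuracy = (sensitivity + specificity) / 2,
    accuracy = mean(correct),
    sensitivity = sensitivity,
    specificity = specificity
  )
}

truth <- c("<=50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K")
prediction <- c("<=50K", "<=50K", "high", NA, ">50K", ">50K")
evaluate_sketch(truth, prediction)
```

All metrics are computed over all six rows of truth, so the invalid value "high" and the missing prediction lower the sensitivity.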
Value
a tibble with one row for each file given in prediction_files and the following columns:
- rank
the rank of the prediction among all predictions in the tibble. The tibble is sorted according to rank and ranking occurs first by balanced_accuracy and then accuracy.
- file
the name of the file that contained the prediction.
- n_valid
the number of valid predictions in the file.
- balanced_accuracy
the mean of sensitivity and specificity.
- accuracy
accuracy of the prediction.
- sensitivity
sensitivity, i.e., the rate of correct predictions for the "positive" class "<=50K".
- specificity
specificity, i.e., the rate of correct predictions for the "negative" class ">50K".
Find a Named Colour that is Similar to Any Given Colour
Description
Find the named colour that is most similar to a given colour.
Usage
find_similar_colour(
  colour,
  distance = c("euclidean", "manhattan"),
  verbose = interactive()
)
Arguments
colour | a colour specified in one of three forms: a hexadecimal stringof the form |
distance | character indicating the distance metric to be used. |
verbose | should additional output be produced? This shows the RGB values for the input colour, the most similar named colour and the difference between the two. |
Value
a character of length one with the name of the most similar named colour.
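The idea of the search can be sketched with functions from grDevices. This is an illustration under the assumption that distances are computed on RGB values between 0 and 255; the helper `nearest_named_colour()` is hypothetical and only accepts colour specifications understood by col2rgb().

```r
# Find the named colour closest to a given colour in RGB space.
nearest_named_colour <- function(colour,
                                 distance = c("euclidean", "manhattan")) {
  distance <- match.arg(distance)
  named <- grDevices::colours()
  rgb_named <- grDevices::col2rgb(named)          # 3 x n matrix
  rgb_target <- as.vector(grDevices::col2rgb(colour))
  diff <- rgb_named - rgb_target                  # recycled down each column
  d <- if (distance == "euclidean") {
    sqrt(colSums(diff^2))
  } else {
    colSums(abs(diff))
  }
  # ties are resolved in favour of the first name in colours()
  named[which.min(d)]
}

nearest_named_colour("#FF0000")
nearest_named_colour("#d339da", distance = "manhattan")
```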
Examples
find_similar_colour("#d339da")
find_similar_colour(c(124, 34, 201))
# suppress additional output
find_similar_colour("#85d3a1", verbose = FALSE)
# use Manhattan distance
find_similar_colour(c(124, 34, 201), distance = "manhattan")
Galton's data on the heights of fathers and their children
Description
Two tables of fathers' heights with heights of one of their sons (galton_sons) or daughters (galton_daughters), respectively. All heights are given in centimetres. The tables are created from HistData::GaltonFamilies by randomly selecting one son or daughter per family. Since some families consist of only sons or only daughters, not all families are contained in both tables.
Usage
galton_sons
galton_daughters
Format
Two data frames with 179 (galton_sons) or 176 (galton_daughters) rows, respectively, and 2 variables:
- father
size of the father in cm.
- son/daughter
size of the son or daughter, respectively, in cm.
Get Files for File Reading Exercise
Description
Copy the files for an exercise for reading files to a directory.
Usage
get_reading_exercise_files(path, unzip = TRUE)
Arguments
path | path where the files should be copied to. |
unzip | logical indicating whether the files should be unzipped. Set this to |
Details
There are 8 files in total. Apart from a few errors that were introduced for the purpose of the exercise, they all contain the same data: information about 100 randomly selected Swiss municipalities. The full file can be downloaded from https://www.bfs.admin.ch/bfsstatic/dam/assets/7786544/master.
Value
Logical indicating the success of the copy operation.
Tables Used for Grading the Papers
Description
These functions create two tables that can be used for the grading of the students' papers.
Usage
create_minreq_table(
  repro,
  n_tab,
  n_plot_kinds,
  n_plots,
  n_stat,
  lang = c("de", "en")
)
create_grading_table(
  p_text,
  p_tab,
  p_plot,
  p_code,
  p_stat,
  lang = c("de", "en")
)
Arguments
repro | logical, is the paper reproducible? |
n_tab | integer, number of tables |
n_plot_kinds | integer, number of different kinds of plots |
n_plots | integer, number of plots |
n_stat | integer, number of statistical computations |
lang | language to use in the tables. Supported languages are German ( |
p_text | numeric between 0 and 3, points given for the text |
p_tab | numeric between 0 and 3, points given for the tables |
p_plot | numeric between 0 and 5, points given for the plots |
p_code | numeric between 0 and 5, points given for the code |
p_stat | numeric between 0 and 5, points given for the statistical computations |
Details
The tables are created using knitr::kable(), and kableExtra::kableExtra is used for additional styling.
create_minreq_table() creates a table that checks that the minimal requirements are satisfied:
the paper must be reproducible
there must be at least one formatted table
there must be at least 5 plots of at least three different types
there must be at least two statistical computations
The table lists for each of those requirements whether it is satisfied or not.
create_grading_table() creates a table that gives grades in percent for each of five categories:
Text
Tables
Plots
Code
Statistical computations
In each category, up to five points may be awarded. The last row of the table gives the percentage over all categories.
Value
both functions return an object of class kableExtra.
Install the R-Packages Required for the Course
Description
A number of R-packages are used in the courses and the video lectures. They are also dependencies of this package. Use install_ibawds() to install the packages that are not yet installed.
Usage
install_ibawds()
Details
This function checks whether all the packages that ibawds depends on, imports or suggests are installed. In interactive sessions, it either informs the user that all packages are installed or asks to install missing packages. The function relies on rlang::check_installed().
Value
nothing or NULL invisibly
Dataset mtcars without row names
Description
In the mtcars dataset, the names of the car models are stored as row names. However, when working with ggplot2 and other packages from the tidyverse, it is convenient to have all data in columns. mtcars2 is a variant of mtcars that contains car models in a column instead of storing them as row names. mtcars2_na is the same dataset as mtcars2, but some of the columns contain missing values.
Usage
mtcars2
mtcars2_na
Format
A data frame with 32 rows and 12 variables. The format is identical to mtcars and details can be found in its documentation. The only difference is that the car model names are stored in the column model instead of the row names.
Number of Available R Packages and R Versions from CRAN
Description
Obtain the number of available packages on CRAN and the current R version.
Usage
n_available_packages(cran = getOption("repos"))
available_r_version(cran = getOption("repos"))
Arguments
cran | character vector giving the base URL of the CRAN server to use. |
Details
The number of packages on CRAN and the R version can be obtained for selected dates in the past from the dataset cran_history.
Note: Previously, these functions could also obtain the number of packages on CRAN and the then current R version for past dates by using snapshots from Microsoft's MRAN. However, MRAN shut down on 1 July 2023, so this functionality is no longer available.
Value
the number of available packages as an integer or the R version number as a character
Noisy Data From a Tenth Order Polynomial
Description
Training and test data created from a tenth order polynomial with added noise. The polynomial is given by
f(x) = 2 x - 10 x^5 + 15 x^{10}
The noise follows a standard normal distribution. The data can be used to demonstrate overfitting. It is inspired by section II. B. in A high-bias, low-variance introduction to Machine Learning for physicists.
Usage
noisy_data
Format
a list of two tibbles with two columns each. x stands for the independent, y for the dependent variable. The training data (noisy_data$train) contains 1000 rows, the test data (noisy_data$test) 20 rows.
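Data of this kind can be simulated in a few lines of base R. The sketch below is not the code that created the dataset: the range of x (uniform on [0, 1]) and the seed are assumptions, and the resulting numbers will differ from noisy_data.

```r
set.seed(1)
# the polynomial from the description
f <- function(x) 2 * x - 10 * x^5 + 15 * x^10
# x is drawn uniformly from [0, 1]; this range is an assumption
make_noisy <- function(n) {
  x <- runif(n)
  data.frame(x = x, y = f(x) + rnorm(n))   # standard normal noise
}
train <- make_noisy(1000)
test <- make_noisy(20)
# root mean squared error on the test data for polynomial fits
# of different degree
rmse <- function(degree) {
  fit <- lm(y ~ poly(x, degree), data = train)
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
}
sapply(c(1, 3, 10), rmse)
```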
References
P. Mehta et al., A high-bias, low-variance introduction to Machine Learning for physicists, Phys. Rep. 810 (2019), 1-124. arXiv:1803.08823, doi:10.1016/j.physrep.2019.03.001
Protein Consumption in European Countries
Description
Protein consumption from various sources in European countries in unspecified units. The exact year of data collection is not known but the oldest known publication of the data is from 1973.
Usage
protein
Format
Data frame with 25 rows and 10 variables:
- country
name of the country
- red_meat
red meat
- white_meat
white meat
- eggs
eggs
- milk
milk
- fish
fish
- cereals
cereals
- starch
starchy foods
- nuts
pulses, nuts, oil-seeds
- fruit_veg
fruits, vegetables
Source
The data have been downloaded from https://raw.githubusercontent.com/jgscott/STA380/master/data/protein.csv
They come from the following book:
Hand, D. J. et al. (1994). A Handbook of Small Data Sets, Chapman and Hall, London.
Chapter 360, p. 297.
In the book, it is stated that the data have first been published in
Weber, A. (1973). Agrarpolitik im Spannungsfeld der internationalen Ernährungspolitik, Institut für Agrarpolitik und Marktlehre, Kiel.
Create a Random Vector With Fixed Correlation With Another Vector
Description
rand_with_cor() creates a vector of random numbers that has correlation rho with a given vector y. The mean and standard deviation of the random vector can also be fixed by the user. By default, they will be equal to the mean and standard deviation of y, respectively.
Usage
rand_with_cor(y, rho, mu = mean(y), sigma = sd(y))
Arguments
y | a numeric vector |
rho | numeric value between -1 and 1 giving the desired correlation. |
mu | numeric value giving the desired mean |
sigma | numeric value giving the desired standard deviation |
Value
a vector of the same length as y that has correlation rho with y.
Source
This solution is based on an answer by whuber on Cross Validated.
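The construction can be sketched as follows. This is a reconstruction of the general idea (combining y with a random component that is uncorrelated with y), written for illustration; the function name `sketch_rand_with_cor()` is hypothetical and the package's code may differ in detail.

```r
sketch_rand_with_cor <- function(y, rho, mu = mean(y), sigma = sd(y)) {
  x <- rnorm(length(y))
  # part of x that is uncorrelated with y
  e <- residuals(lm(x ~ y))
  # weighted combination that has correlation rho with y
  z <- rho * sd(e) * y + sqrt(1 - rho^2) * sd(y) * e
  # shift and scale to the requested mean and standard deviation;
  # a linear transformation with positive slope keeps the correlation
  mu + sigma * (z - mean(z)) / sd(z)
}

set.seed(42)
y <- runif(200, 5, 8)
z <- sketch_rand_with_cor(y, rho = 0.6, mu = 2, sigma = 3)
c(cor(y, z), mean(z), sd(z))
```

Because the residuals e have zero sample covariance with y, the covariance of z and y is rho * sd(e) * var(y), while sd(z) equals sd(y) * sd(e), which gives a sample correlation of rho up to floating point error.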
Examples
x <- runif(1000, 5, 8)
# create a random vector with positive correlation
y1 <- rand_with_cor(x, 0.8)
all.equal(cor(x, y1), 0.8)
# create a random vector with negative correlation
# and fixed mean and standard deviation
y2 <- rand_with_cor(x, -0.3, 2, 3)
all.equal(cor(x, y2), -0.3)
all.equal(mean(y2), 2)
all.equal(sd(y2), 3)
Rescale Mean And/Or Standard Deviation of a Vector
Description
Rescale Mean And/Or Standard Deviation of a Vector
Usage
rescale(x, mu = mean(x), sigma = sd(x))
Arguments
x | numeric vector |
mu | numeric value giving the desired mean |
sigma | numeric value giving the desired standard deviation |
Details
By default, mean and standard deviation are not changed, i.e., rescale(x) is identical to x. Only if a value is specified for mu and/or sigma, the mean and/or the standard deviation are rescaled.
Value
a numeric vector with the same length as x with mean mu and standard deviation sigma.
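The rescaling amounts to standardising x and then applying the new scale and location. A minimal sketch, with the hypothetical name `rescale_sketch()` and no claim to match the package's code line by line:

```r
rescale_sketch <- function(x, mu = mean(x), sigma = sd(x)) {
  # standardise to mean 0 and sd 1, then scale and shift
  mu + sigma * (x - mean(x)) / sd(x)
}

set.seed(1)
x <- runif(100, 5, 8)
x2 <- rescale_sketch(x, mu = 3, sigma = 2)
c(mean(x2), sd(x2))
```

With the default arguments the transformation reduces to the identity, which is why rescale(x) returns x unchanged.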
Examples
x <- runif(1000, 5, 8)
# calling rescale without specifying mu and sigma doesn't change anything
all.equal(x, rescale(x))
# change the mean without changing the standard deviation
x1 <- rescale(x, mu = 3)
all.equal(mean(x1), 3)
all.equal(sd(x1), sd(x))
# rescale mean and standard deviation
x2 <- rescale(x, mu = 3, sigma = 2)
all.equal(mean(x2), 3)
all.equal(sd(x2), 2)
Road Casualties in Great Britain 1969-84
Description
Extract of the data in the Seatbelts dataset as a data frame. The original dataset is a multiple time series (class mts). Labels are in German.
Usage
seatbelts
Format
A data frame with 576 rows and 3 variables:
- date
date of the first day of the month for which the data was collected.
- seat
seat where the persons that were killed or seriously injured were seated. One of "Fahrer" (driver's seat), "Beifahrer" (front seat), "Rücksitz" (rear seat).
- victims
number of persons that were killed or seriously injured.
Set Options for Slides
Description
Set options for ggplot plots and tibble outputs for IBAW slides.
Usage
set_slide_options(
  ggplot_text_size = 22,
  ggplot_margin_pt = rep(10, 4),
  tibble_print_max = 12,
  tibble_print_min = 8
)
Arguments
ggplot_text_size | Text size to be used in ggplot2 plots. This applies to all texts in the plots. |
ggplot_margin_pt | numeric vector of length 4 giving the sizes of the top, right, bottom, and left margins in points. |
tibble_print_max | Maximum number of rows printed for a tibble. Set to |
tibble_print_min | Number of rows to be printed if a tibble has more than |
Details
The function uses ggplot2::theme_update() to modify the default theme for ggplot and options() to set base R options that influence the printing of tibbles.
Note that if you make changes to these options in an R Markdown file, you may have to delete the knitr cache in order for the changes to apply.
Value
a named list (invisibly) with two elements containing the old values of the options for the ggplot theme and the base R options, respectively. These can be used to reset the ggplot theme and the base R options to their previous values.
Check Spelling in the Evaluation of the Papers or the Slide Decks
Description
Evaluation of the student papers, lecture slides and some exercises are all done in the form of Rmd files. These functions find all the relevant Rmd-files in a directory and check the spelling using the package spelling.
Usage
spell_check_evaluation(path = ".", students = NULL, use_wordlist = TRUE)
spell_check_slides(path = ".", use_wordlist = TRUE)
Arguments
path | path to the top level directory of the evaluations for |
students | an optional character vector with student names. If given,only the evaluation for these students will be checked. |
use_wordlist | should a list of words be excluded from the spell check? The package contains separate word lists for evaluations and slides/exercises with words that have typically appeared in these documents in the past. When spell checking the paper evaluations, the names of the students will always be excluded from spell check, even if |
Details
spell_check_evaluation() finds Rmd-files with evaluations in subfolders starting from the current working directory or the directory given by path. The file names must be of the form "Beurteilung_Student.Rmd", where "Student" must be replaced by the student's name. By default, words contained in a wordlist that is part of the package as well as all the students' names are excluded from the spell check, but this can be turned off by setting use_wordlist = FALSE. (Note that the students' names will still be excluded.)
spell_check_slides() finds Rmd-files with evaluations in subfolders starting from the current working directory or the directory given by path. In order to exclude a file from the spell check, it must be marked with the html-comment
<!-- nospellcheck -->
The comment must appear either in the first line of the file or in the first two lines after the end of the YAML-header.
By default, words contained in a wordlist that is part of the package are excluded from the spell check, but this can be turned off by setting use_wordlist = FALSE.
Simulate Throws With One Or More Fair Dice
Description
Simulate throws with one or multiple fair dice with an arbitrary number of faces.
Usage
throw_dice(n, faces = 6L, dice = 1L)
Arguments
n | number of throws. The value is cast to integer. |
faces | the number of faces of the dice. The value is cast to integer. |
dice | the number of dice to use for each throw. The value is cast to integer. |
Value
an integer vector of length n with the results of the throws.
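A simulation of this kind can be written with sample.int(). The sketch below uses the hypothetical name `throw_dice_sketch()` and assumes that the result of a throw with several dice is the sum of the faces shown; the documentation above does not state how multiple dice are combined.

```r
throw_dice_sketch <- function(n, faces = 6L, dice = 1L) {
  n <- as.integer(n)
  faces <- as.integer(faces)
  dice <- as.integer(dice)
  # one row per throw, one column per dice
  throws <- matrix(sample.int(faces, n * dice, replace = TRUE), nrow = n)
  # assumption: the result of a throw is the sum over all dice
  as.integer(rowSums(throws))
}

set.seed(7)
throw_dice_sketch(5)
throw_dice_sketch(9, dice = 2)
```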
Examples
# throw a single 6-sided dice 5 times
throw_dice(5)
# throw a single 20-sided dice 7 times
throw_dice(7, faces = 20)
# throw two 6-sided dice 9 times
throw_dice(9, dice = 2)
Create a Voronoi Diagram for a Clustering
Description
Create a Voronoi diagram for a given clustering object.
Usage
voronoi_diagram(
  cluster,
  x,
  y,
  data = NULL,
  show_data = !is.null(data),
  colour_data = TRUE,
  legend = TRUE,
  point_size = 2,
  linewidth = 0.7
)
Arguments
cluster | an object containing the result of a clustering, e.g., created by |
x, y | character giving the names of the variables to be plotted on the x- and y-axis. |
data | The data that has been used to create the clustering. If this is provided, the extension of the plot is adapted to the data and the data points are plotted unless this is suppressed by specifying |
show_data | should the data points be plotted? This is |
colour_data | should the data points be coloured according to the assigned cluster? |
legend | should a colour legend for the clusters be plotted? |
point_size | numeric indicating the size of the data points and the cluster centres. |
linewidth | numeric indicating the width of the lines that separate the areas for the clusters. Set to 0 to show no lines at all. |
Details
The function uses the deldir package to create the polygons for the Voronoi diagram. The code has been inspired by ggvoronoi, which can handle more complex situations.
References
Garrett et al., ggvoronoi: Voronoi Diagrams and Heatmaps with ggplot2, Journal of Open Source Software 3(32) (2018) 1096, doi:10.21105/joss.01096
Examples
cluster <- kmeans(iris[, 1:4], centers = 3)
voronoi_diagram(cluster, "Sepal.Length", "Sepal.Width", iris)
Wine Quality
Description
Physicochemical data and quality ratings for red and white Portuguese Vinho Verde wines.
Usage
wine_quality
Format
a tibble with 6497 rows and 13 variables:
- colour
colour of the wine; "red" (1'599) or "white" (4'898)
- fixed_acidity
tartaric acid per volume in g/dm^3
- volatile_acidity
acetic acid per volume in g/dm^3
- citric_acid
citric acid per volume in g/dm^3
- residual_sugar
residual sugar per volume in g/dm^3
- chlorides
sodium chloride per volume in g/dm^3
- free_sulfur_dioxide
free sulphur dioxide per volume in mg/dm^3
- total_sulfur_dioxide
total sulphur dioxide per volume in mg/dm^3
- density
density in g/dm^3
- pH
pH value
- sulphates
potassium sulphate per volume in g/dm^3
- alcohol
alcohol content per volume in %
- quality
quality score between 0 (worst) and 10 (best) determined by sensory analysis.
Source
The data is available on the UC Irvine Machine Learning Repository.
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis, Modeling wine preferences by data mining from physicochemical properties, Decision Support Systems 47(4) (2009), 547-553.