bdilday/grcdrPublic

NotificationsYou must be signed in to change notification settings
Fork0
Star13

A collection of ggplot2 extensions and scripts for graphics in R

License

MIT license

13 stars 0 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
R		R
README_files		README_files
inst		inst
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
grcdr.Rproj		grcdr.Rproj

Repository files navigation

grcdr

This package provides a collection of customggplot2 extensions -Geoms,Stats,Coords,Themes, etc. The namegrcdr is a contraction ofGraphs with Code and Data in R and is a reference to theGraphs with Code and Data blog, to which this package is loosely related.

Installation

You can installgrcdr from github with:

# install.packages("devtools")devtools::install_github("bdilday/grcdr")

Examples

The below will usedplyr andggplot2

library(grcdr)## load basicslibrary(dplyr)library(ggplot2)## set the themeggplot2::theme_set(theme_minimal(base_size=14))

geom_excursion

geom_excursion plots running quantities as connected scatter plots. It requires anx andy aesthetic and also at aesthetic to give the ordering (i.e. "time")

MLB runs scored and allowed

Here's some example data provided with the package that gives team-level stats for 4 MLB teams (Cleveland 1999, New York Yankees 1998, Houston Astros 2017, Milwaukee Brewers 1982).

# load some example data from the packageteams_df= read.csv(system.file("extdata/team_stats.csv",package="grcdr"),stringsAsFactors=FALSE)teams_df$game_date= as.Date(teams_df$game_date)head(teams_df,2)#>          k     game_key game_source  game_date game_number site_key#> 1 CLE_1999 ANA199904060         evt 1999-04-06           0    ANA01#> 2 CLE_1999 ANA199904070         evt 1999-04-07           0    ANA01#>   season_phase team_alignment team_key opponent_key r_g r_w r_l r_t b_g#> 1            R              0      CLE          ANA   1   0   1   0   1#> 2            R              0      CLE          ANA   1   1   0   0   1#>   b_pa b_ab b_r b_h b_tb b_2b b_3b b_hr b_hr4 b_rbi b_gw b_bb b_ibb b_so#> 1   41   36   5  10   16    1    1    1     0     4   NA    4     0    6#> 2   47   40   9  13   17    4    0    0     0     9   NA    6     0    7#>   b_gdp b_hp b_sh b_sf b_sb b_cs b_xi b_lob p_g p_gs p_cg p_sho p_csho#> 1     0    0    1    0    0    0    0     9   1    1    0     0      0#> 2     1    1    0    0    2    0    0    11   1    1    0     0      0#>   p_gf p_w p_l p_sv p_out p_tbf p_ab p_r season season_game_number#> 1    1   0   1    0    24    35   32   6   1999                  1#> 2    1   1   0    0    27    34   31   1   1999                  2

We can plot runs scored (b_r) on the x-axis and runs allowed (p_r) on the y-axis. The time coordinate is the season game number.

base_plot=teams_df %>%   ggplot()+ labs(x="runs scored",y="runs allowed")

By default the data aren't averaged (the run length is 1)

p=base_plot+   geom_excursion(aes(x=b_r,y=p_r,t=season_game_number))+   facet_wrap(~k)print(p)

Therun_length parameter controls how many items get summed. The following usesrun_length = 10

p=base_plot+   geom_excursion(aes(x=b_r,y=p_r,t=season_game_number),run_length=10)+   facet_wrap(~k)print(p)

averages instead of sums

If you givex_weight ory_weight aesthetics then weighted averages are computed instead of sums. Passing_weight = 1 therefore results in straight averages.

p=base_plot+   geom_excursion(aes(x=b_r,y=p_r,t=season_game_number,x_weight=1,y_weight=1),run_length=10)+   facet_wrap(~k)print(p)

time ordering

Additionally, the time ordering can be changed

set.seed(101)random_idx= sample(1:nrow(teams_df), nrow(teams_df))# order by randomp=teams_df %>% cbind.data.frame(random_idx=random_idx) %>%    ggplot()+ labs(x="runs scored",y="runs allowed")+   geom_excursion(aes(x=b_r,y=p_r,t=random_idx,x_weight=1,y_weight=1),run_length=10)+   facet_wrap(~k)print(p)

stat_run

stat_run is a lower-level utility thangeom_excursion. It can change thegeom (frompath) and also can plot running line charts in addition to connected scatterplots.

smoothed line charts

Here I set the y aesthetic but not x, which is effectively a line chart.

p=teams_df %>% ggplot()+  stat_run(aes(y=b_r,t=season_game_number))+   facet_wrap(~k)+ labs(x="game number",y="runs scored")print(p)

However, unlike a traditional line chart, I can average or sum the y variable

p=teams_df %>% ggplot()+  stat_run(aes(y=b_r,t=season_game_number),run_length=10)+   facet_wrap(~k)+ labs(x="game number",y="runs scored")print(p)

The default is to generate running totals by using the cumulative sum function and taking differences. A different cumulative aggregation function can be specified, however. Note that following is for illustration and that the run_fun doesn't correspond to any particularly useful quantity(that I'm aware of, anyway).

fun_with_cumsum_fun=function(x) {  cumsum(x* (x-1)* sin(x/5*pi))}p=teams_df %>% ggplot()+  stat_run(aes(y=p_r,t=season_game_number),run_length=10,y_run_fun=fun_with_cumsum_fun)+   facet_wrap(~k)+ labs(x="game number",y="runs scored")print(p)

reproducing a geom_excursion

Setting thex aesthetic reproduces ageom_excursion

p=teams_df %>% ggplot()+  stat_run(aes(x=b_r,y=p_r,t=season_game_number),run_length=10)+   facet_wrap(~k)+ labs(x="runs scored",y="runs allowed")print(p)

fill missing values

If the time coordinate is missing values, we can fill in the corresponding x and y.

censored_df=teams_df %>% filter(season_game_number<40|season_game_number>60)p=censored_df %>%   ggplot()+   stat_run(aes(y=b_r,t=season_game_number,x=season_game_number))+ facet_wrap(~k)print(p)

p=censored_df %>%   ggplot()+   stat_run(aes(y=b_r,t=season_game_number,x=season_game_number),run_length=10)+   facet_wrap(~k)print(p)

p=censored_df %>%   ggplot()+   stat_run(aes(y=b_r,t=season_game_number,x=season_game_number),run_length=10,y_run_fill_value=20,x_run_fill_value=50)+    facet_wrap(~k)print(p)

other geoms

Thegeom_excursion layer forces apathGeom, but the lower-levelstat_run layer can change theGeom. For example it can usepolygon (although unclear what the interpretation is)

p=teams_df %>%   ggplot()+   stat_run(aes(x=b_r,y=p_r,t=game_date),geom='polygon')+   facet_wrap(~k)print(p)

StatMDS

This stat applies dimensionality reduction using multi-dimensional scaling. As of this writing the available algorithms are principal components analysis (pca) or t-distributed stochastic neighbor embedding (tsne). The variables to use in the dimensionality reduction are passed in the aestheticsx# where# is an arbitrary integer. The defaultgeom isGeomPoint.

example data

set.seed(101)df1=data.frame(x1= rnorm(100))for (iin2:10) {k= sprintf("x%d",i)df1[,k]= rnorm(100)}# now, for the last 25 add a constant to create two well separated groupsdf1[75:100, ]=df1[75:100,]+2

pca

Use only 2 variables

set.seed(101)p=df1 %>% ggplot(aes(x1=x1,x2=x2))+   stat_mds(mds_method="pca")print(p)

Use them all

set.seed(101)p=df1 %>% ggplot(aes(x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8,x9=x9,x10=x10))+   stat_mds(mds_method="pca")print(p)

Use them all and label them

set.seed(101)p=df1 %>% mutate(rn=row_number()) %>%   ggplot(aes(x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8,x9=x9,x10=x10))+   stat_mds(mds_method="pca",geom="text", aes(label=rn))print(p)

tsne

Apply t-SNE. This requires theRtsne package.

set.seed(101)p=df1 %>% mutate(rn=row_number()) %>%   ggplot(aes(x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8,x9=x9,x10=x10))+   stat_mds(mds_method="tsne",geom="text", aes(label=rn))print(p)#> Loading required package: Rtsne

GeomTailScatter

This geom implements a tail scatter plot. It is inspired by thexenographics project. Thex andy aesthetics are points in a two-d plane. Subsequent variables are passed in aesthetics namedx# where x is an arbitrary integer. They do not need to start at1, however, the order will be interpreted lexigraphically. Thex# variables are mapped to lines extending at an angle of-(15 + 30 * i) degrees. This means that variables trying to use 12 or more variables in addition tox andy is not supported at this time and will result in lines that overlap.

geom_tailscatter example

Some simulated data

set.seed(101)df1=data.frame(x1= rnorm(100),x2= rnorm(100))df1$x3= with(df1,x1**2+ abs(x2))df1$x4=100*df1$x1**2# make a categorical vardf1$g=factor(sample(c(0,1),100,replace=TRUE))

Plot withgeom_tailscatter

p=df1 %>%   ggplot(aes(x=x1,y=x2,x3=x3,x4=x4))+   geom_tailscatter(size=2)print(p)

The parametertail_scale controls the length of the tail lines

p=df1 %>%   ggplot(aes(x=x1,y=x2,x3=x3,x4=x4))+   geom_tailscatter(size=2,tail_scale=0.5)print(p)

Color by group

p=df1 %>%   ggplot(aes(x=x1,y=x2,x3=x3,x4=x4,color=g))+   geom_tailscatter(size=2)print(p)

tsne_linked

tsne_linked is anhtmlwidget. It takes a data set, projects it into 2-dimensions using thet-SNE algorithm, and then plots a 2-d scatter plot. The points in the scatter plot are linked to a bar graph that shows the values of the coordinates that went into thet-SNE calculation. The scatter plot uses a Voronoi tessellation to make the mouse-over highlighting smoother.

example

simulated data

set.seed(101)df1=data.frame(x1= rnorm(100),x2= rnorm(100))df1$x3= rnorm(100)df1$x4= rnorm(100)df1$id= row.names(df1)df1$g= ifelse(df1$x1>0,1,0)tsne_coords= c("x1","x2","x3","x4")

tsne_linked(df1,tsne_coords=tsne_coords,label_var="id",group_var="g")

About

A collection of ggplot2 extensions and scripts for graphics in R

Releases

No releases published

Packages

No packages published

Movatterモバイル変換

License

bdilday/grcdr

Folders and files

Latest commit

History

Repository files navigation

grcdr

Installation

Examples

geom_excursion

MLB runs scored and allowed

averages instead of sums

time ordering

stat_run

smoothed line charts

reproducing a geom_excursion

fill missing values

other geoms

StatMDS

example data

pca

tsne

GeomTailScatter

geom_tailscatter example

tsne_linked

example

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Uh oh!

Languages

Packages