Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

A collection of ggplot2 extensions and scripts for graphics in R

License

NotificationsYou must be signed in to change notification settings

bdilday/grcdr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This package provides a collection of customggplot2 extensions -Geoms,Stats,Coords,Themes, etc. The namegrcdr is a contraction ofGraphs with Code and Data in R and is a reference to theGraphs with Code and Data blog, to which this package is loosely related.

Installation

You can installgrcdr from github with:

# install.packages("devtools")devtools::install_github("bdilday/grcdr")

Examples

The below will usedplyr andggplot2

library(grcdr)## load basicslibrary(dplyr)library(ggplot2)## set the themeggplot2::theme_set(theme_minimal(base_size=14))

geom_excursion

geom_excursion plots running quantities as connected scatter plots. It requires anx andy aesthetic and also at aesthetic to give the ordering (i.e. "time")

MLB runs scored and allowed

Here's some example data provided with the package that gives team-level stats for 4 MLB teams (Cleveland 1999, New York Yankees 1998, Houston Astros 2017, Milwaukee Brewers 1982).

# load some example data from the packageteams_df= read.csv(system.file("extdata/team_stats.csv",package="grcdr"),stringsAsFactors=FALSE)teams_df$game_date= as.Date(teams_df$game_date)head(teams_df,2)#>          k     game_key game_source  game_date game_number site_key#> 1 CLE_1999 ANA199904060         evt 1999-04-06           0    ANA01#> 2 CLE_1999 ANA199904070         evt 1999-04-07           0    ANA01#>   season_phase team_alignment team_key opponent_key r_g r_w r_l r_t b_g#> 1            R              0      CLE          ANA   1   0   1   0   1#> 2            R              0      CLE          ANA   1   1   0   0   1#>   b_pa b_ab b_r b_h b_tb b_2b b_3b b_hr b_hr4 b_rbi b_gw b_bb b_ibb b_so#> 1   41   36   5  10   16    1    1    1     0     4   NA    4     0    6#> 2   47   40   9  13   17    4    0    0     0     9   NA    6     0    7#>   b_gdp b_hp b_sh b_sf b_sb b_cs b_xi b_lob p_g p_gs p_cg p_sho p_csho#> 1     0    0    1    0    0    0    0     9   1    1    0     0      0#> 2     1    1    0    0    2    0    0    11   1    1    0     0      0#>   p_gf p_w p_l p_sv p_out p_tbf p_ab p_r season season_game_number#> 1    1   0   1    0    24    35   32   6   1999                  1#> 2    1   1   0    0    27    34   31   1   1999                  2

We can plot runs scored (b_r) on the x-axis and runs allowed (p_r) on the y-axis. The time coordinate is the season game number.

base_plot=teams_df %>%   ggplot()+ labs(x="runs scored",y="runs allowed")

By default the data aren't averaged (the run length is 1)

p=base_plot+   geom_excursion(aes(x=b_r,y=p_r,t=season_game_number))+   facet_wrap(~k)print(p)

Therun_length parameter controls how many items get summed. The following usesrun_length = 10

p=base_plot+   geom_excursion(aes(x=b_r,y=p_r,t=season_game_number),run_length=10)+   facet_wrap(~k)print(p)

averages instead of sums

If you givex_weight ory_weight aesthetics then weighted averages are computed instead of sums. Passing_weight = 1 therefore results in straight averages.

p=base_plot+   geom_excursion(aes(x=b_r,y=p_r,t=season_game_number,x_weight=1,y_weight=1),run_length=10)+   facet_wrap(~k)print(p)

time ordering

Additionally, the time ordering can be changed

set.seed(101)random_idx= sample(1:nrow(teams_df), nrow(teams_df))# order by randomp=teams_df %>% cbind.data.frame(random_idx=random_idx) %>%    ggplot()+ labs(x="runs scored",y="runs allowed")+   geom_excursion(aes(x=b_r,y=p_r,t=random_idx,x_weight=1,y_weight=1),run_length=10)+   facet_wrap(~k)print(p)

stat_run

stat_run is a lower-level utility thangeom_excursion. It can change thegeom (frompath) and also can plot running line charts in addition to connected scatterplots.

smoothed line charts

Here I set the y aesthetic but not x, which is effectively a line chart.

p=teams_df %>% ggplot()+  stat_run(aes(y=b_r,t=season_game_number))+   facet_wrap(~k)+ labs(x="game number",y="runs scored")print(p)

However, unlike a traditional line chart, I can average or sum the y variable

p=teams_df %>% ggplot()+  stat_run(aes(y=b_r,t=season_game_number),run_length=10)+   facet_wrap(~k)+ labs(x="game number",y="runs scored")print(p)

The default is to generate running totals by using the cumulative sum function and taking differences. A different cumulative aggregation function can be specified, however. Note that following is for illustration and that the run_fun doesn't correspond to any particularly useful quantity(that I'm aware of, anyway).

fun_with_cumsum_fun=function(x) {  cumsum(x* (x-1)* sin(x/5*pi))}p=teams_df %>% ggplot()+  stat_run(aes(y=p_r,t=season_game_number),run_length=10,y_run_fun=fun_with_cumsum_fun)+   facet_wrap(~k)+ labs(x="game number",y="runs scored")print(p)

reproducing a geom_excursion

Setting thex aesthetic reproduces ageom_excursion

p=teams_df %>% ggplot()+  stat_run(aes(x=b_r,y=p_r,t=season_game_number),run_length=10)+   facet_wrap(~k)+ labs(x="runs scored",y="runs allowed")print(p)

fill missing values

If the time coordinate is missing values, we can fill in the corresponding x and y.

censored_df=teams_df %>% filter(season_game_number<40|season_game_number>60)p=censored_df %>%   ggplot()+   stat_run(aes(y=b_r,t=season_game_number,x=season_game_number))+ facet_wrap(~k)print(p)

p=censored_df %>%   ggplot()+   stat_run(aes(y=b_r,t=season_game_number,x=season_game_number),run_length=10)+   facet_wrap(~k)print(p)

p=censored_df %>%   ggplot()+   stat_run(aes(y=b_r,t=season_game_number,x=season_game_number),run_length=10,y_run_fill_value=20,x_run_fill_value=50)+    facet_wrap(~k)print(p)

other geoms

Thegeom_excursion layer forces apathGeom, but the lower-levelstat_run layer can change theGeom. For example it can usepolygon (although unclear what the interpretation is)

p=teams_df %>%   ggplot()+   stat_run(aes(x=b_r,y=p_r,t=game_date),geom='polygon')+   facet_wrap(~k)print(p)

StatMDS

This stat applies dimensionality reduction using multi-dimensional scaling. As of this writing the available algorithms are principal components analysis (pca) or t-distributed stochastic neighbor embedding (tsne). The variables to use in the dimensionality reduction are passed in the aestheticsx# where# is an arbitrary integer. The defaultgeom isGeomPoint.

example data

set.seed(101)df1=data.frame(x1= rnorm(100))for (iin2:10) {k= sprintf("x%d",i)df1[,k]= rnorm(100)}# now, for the last 25 add a constant to create two well separated groupsdf1[75:100, ]=df1[75:100,]+2

pca

Use only 2 variables

set.seed(101)p=df1 %>% ggplot(aes(x1=x1,x2=x2))+   stat_mds(mds_method="pca")print(p)

Use them all

set.seed(101)p=df1 %>% ggplot(aes(x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8,x9=x9,x10=x10))+   stat_mds(mds_method="pca")print(p)

Use them all and label them

set.seed(101)p=df1 %>% mutate(rn=row_number()) %>%   ggplot(aes(x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8,x9=x9,x10=x10))+   stat_mds(mds_method="pca",geom="text", aes(label=rn))print(p)

tsne

Apply t-SNE. This requires theRtsne package.

set.seed(101)p=df1 %>% mutate(rn=row_number()) %>%   ggplot(aes(x1=x1,x2=x2,x3=x3,x4=x4,x5=x5,x6=x6,x7=x7,x8=x8,x9=x9,x10=x10))+   stat_mds(mds_method="tsne",geom="text", aes(label=rn))print(p)#> Loading required package: Rtsne

GeomTailScatter

This geom implements a tail scatter plot. It is inspired by thexenographics project. Thex andy aesthetics are points in a two-d plane. Subsequent variables are passed in aesthetics namedx# where x is an arbitrary integer. They do not need to start at1, however, the order will be interpreted lexigraphically. Thex# variables are mapped to lines extending at an angle of-(15 + 30 * i) degrees. This means that variables trying to use 12 or more variables in addition tox andy is not supported at this time and will result in lines that overlap.

geom_tailscatter example

Some simulated data

set.seed(101)df1=data.frame(x1= rnorm(100),x2= rnorm(100))df1$x3= with(df1,x1**2+ abs(x2))df1$x4=100*df1$x1**2# make a categorical vardf1$g=factor(sample(c(0,1),100,replace=TRUE))

Plot withgeom_tailscatter

p=df1 %>%   ggplot(aes(x=x1,y=x2,x3=x3,x4=x4))+   geom_tailscatter(size=2)print(p)

The parametertail_scale controls the length of the tail lines

p=df1 %>%   ggplot(aes(x=x1,y=x2,x3=x3,x4=x4))+   geom_tailscatter(size=2,tail_scale=0.5)print(p)

Color by group

p=df1 %>%   ggplot(aes(x=x1,y=x2,x3=x3,x4=x4,color=g))+   geom_tailscatter(size=2)print(p)

tsne_linked

tsne_linked is anhtmlwidget. It takes a data set, projects it into 2-dimensions using thet-SNE algorithm, and then plots a 2-d scatter plot. The points in the scatter plot are linked to a bar graph that shows the values of the coordinates that went into thet-SNE calculation. The scatter plot uses a Voronoi tessellation to make the mouse-over highlighting smoother.

example

simulated data

set.seed(101)df1=data.frame(x1= rnorm(100),x2= rnorm(100))df1$x3= rnorm(100)df1$x4= rnorm(100)df1$id= row.names(df1)df1$g= ifelse(df1$x1>0,1,0)tsne_coords= c("x1","x2","x3","x4")
tsne_linked(df1,tsne_coords=tsne_coords,label_var="id",group_var="g")

About

A collection of ggplot2 extensions and scripts for graphics in R

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

[8]ページ先頭

©2009-2025 Movatter.jp