Historical statistics of every R package ever
ropensci-review-tools/pkgstats
Extract summary statistics of R package structure and functionality. Not all statistics of course, but a good go at balancing insightful statistics while ensuring computational feasibility. `pkgstats` is a static code analysis tool, so is generally very fast (a few seconds at most for very large packages). Installation is described in a separate vignette.
Statistics are derived from these primary sources:
- Numbers of lines of code, documentation, and white space (both between and within lines) in each directory and language
- Summaries of package `DESCRIPTION` file and related package meta-statistics
- Summaries of all objects created via package code across multiple languages and all directories containing source code (`./R`, `./src`, and `./inst/include`).
- A function call network derived from function definitions obtained from the code tagging library, `ctags`, and references ("calls") to those obtained from another tagging library, `gtags`. This network roughly connects every object making a call (as `from`) with every object being called (`to`).
- An additional function call network connecting calls within R functions to all functions from other R packages.
The primary function, `pkgstats()`, returns a list of these various components, including full `data.frame` objects for the final three components described above. The statistical properties of this list can be aggregated by the `pkgstats_summary()` function, which returns a `data.frame` with a single row of summary statistics. This function is demonstrated below, including full details of all statistics extracted.

The following code demonstrates the output of the main function, `pkgstats`, using an internally bundled `.tar.gz` "tarball" of this package. The `system.time` call demonstrates that the static code analyses of `pkgstats` are generally very fast.
    library (pkgstats)
    tarball <- system.file ("extdata", "pkgstats_9.9.tar.gz", package = "pkgstats")
    system.time (p <- pkgstats (tarball))

    ##    user  system elapsed
    ##   1.701   0.124   1.802

    names (p)

    ## [1] "loc"            "vignettes"      "data_stats"     "desc"
    ## [5] "translations"   "objects"        "network"        "external_calls"

The result is a list of various data extracted from the code. All except for `objects`, `network`, and `external_calls` represent summary data:

    p [!names (p) %in% c ("objects", "network", "external_calls")]

    ## $loc
    ## # A tibble: 4 × 12
    ##   langage dir        nfiles nlines ncode  ndoc nempty nspaces nchars nexpr ntabs
    ##   <chr>   <chr>       <int>  <int> <int> <int>  <int>   <int>  <int> <dbl> <int>
    ## 1 C++     src             3    365   277    21     67     933   7002     1     0
    ## 2 R       R              19   3741  2698   536    507   27575  94022     1     0
    ## 3 R       tests           2    146   121     1     24     395   2423     1     0
    ## 4 R       tests/tes…      5    202   145     9     48     375   3738     1     0
    ## # ℹ 1 more variable: indentation <int>
    ##
    ## $vignettes
    ## vignettes     demos
    ##         0         0
    ##
    ## $data_stats
    ##           n  total_size median_size
    ##           0           0           0
    ##
    ## $desc
    ##    package version                date license
    ## 1 pkgstats     9.9 2022-05-12 19:41:22   GPL-3
    ##                                                                                      urls
    ## 1 https://docs.ropensci.org/pkgstats/,\nhttps://github.com/ropensci-review-tools/pkgstats
    ##                                                       bugs aut ctb fnd rev ths
    ## 1 https://github.com/ropensci-review-tools/pkgstats/issues   1   0   0   0   0
    ##   trl depends                                                        imports
    ## 1   0      NA brio, checkmate, dplyr, fs, igraph, methods, readr, sys, withr
    ##                                                                         suggests
    ## 1 hms, knitr, pbapply, pkgbuild, Rcpp, rmarkdown, roxygen2, testthat, visNetwork
    ##   enhances linking_to
    ## 1       NA      cpp11
    ##
    ## $translations
    ## [1] NA

The various components of these results are described in further detail in the main package vignette.
A summary of the `pkgstats` data can be obtained by submitting the object returned from `pkgstats()` to the `pkgstats_summary()` function:

    s <- pkgstats_summary (p)
This function reduces the result of the `pkgstats()` function to a single line with 95 entries, represented as a `data.frame` with one row and that number of columns. This format is intended to enable summary statistics from multiple packages to be aggregated by simply binding rows together. While 95 statistics might seem like a lot, the `pkgstats_summary()` function aims to return as many usable raw statistics as possible in order to flexibly allow higher-level statistics to be derived through combination and aggregation. These 95 statistics can be roughly grouped into the following categories (not shown in the order in which they actually appear), with variable names in parentheses after each description. Some statistics are summarised as comma-delimited character strings, such as translations into human languages, or other packages listed under "depends", "imports", or "suggests". This enables subsequent analyses of their contents, for example of actual translated languages, or both aggregate numbers and individual details of all package dependencies, as demonstrated immediately below.
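The row-binding aggregation described above can be sketched as follows. The two one-row data frames are hypothetical stand-ins for `pkgstats_summary()` results (only three of the 95 columns are mocked, with made-up values), so this illustrates the idea rather than real package output.

```r
# Hypothetical one-row summaries standing in for pkgstats_summary() results;
# package names, versions, and imports are invented for illustration.
s1 <- data.frame (
    package = "pkgA", version = "0.1.0", imports = "dplyr, fs",
    stringsAsFactors = FALSE
)
s2 <- data.frame (
    package = "pkgB", version = "1.2.3", imports = "readr",
    stringsAsFactors = FALSE
)

# Bind the single rows into one table with one row per package
all_stats <- do.call (rbind, list (s1, s2))

# Comma-delimited columns can then be expanded to per-package counts
all_stats$n_imports <- lengths (strsplit (all_stats$imports, ", "))
print (all_stats)
```

Entries which are `NA` (such as `depends` in the output shown earlier) would need separate handling, because `strsplit()` returns a single `NA` for them, which `lengths()` counts as one.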
Package Summaries
- name (`package`)
- Package version (`version`)
- Package date, as modification time of `DESCRIPTION` file where not explicitly stated (`date`)
- License (`license`)
- Languages, as a single comma-separated character value (`languages`), and excluding `R` itself.
- List of translations where package includes translations files, given as list of (spoken) language codes (`translations`).
Information from `DESCRIPTION` file

- Package URL(s) (`url`)
- URL for BugReports (`bugs`)
- Number of contributors with role of author (`desc_n_aut`), contributor (`desc_n_ctb`), funder (`desc_n_fnd`), reviewer (`desc_n_rev`), thesis advisor (`ths`), and translator (`trl`, relating to translation between computer languages, not spoken languages).
- Comma-separated character entries for all `depends`, `imports`, `suggests`, and `linking_to` packages.
Numbers of entries in each of the last two kinds of items can be obtained with a simple `strsplit` call, like this:
    # Entries are separated by a comma followed by a space
    deps <- strsplit (s$suggests, ", ") [[1]]
    length (deps)

    ## [1] 9

    print (deps)

    ## [1] "hms"        "knitr"      "pbapply"    "pkgbuild"   "Rcpp"
    ## [6] "rmarkdown"  "roxygen2"   "testthat"   "visNetwork"

Numbers of files and associated data
- Number of vignettes (`num_vignettes`)
- Number of demos (`num_demos`)
- Number of data files (`num_data_files`)
- Total size of all package data (`data_size_total`)
- Median size of package data files (`data_size_median`)
- Numbers of files in main sub-directories (`files_R`, `files_src`, `files_inst`, `files_vignettes`, `files_tests`), where numbers are recursively counted in all sub-directories, and where `inst` only counts files in the `inst/include` sub-directory.
Statistics on lines of code
- Total lines of code in each sub-directory (`loc_R`, `loc_src`, `loc_inst`, `loc_vignettes`, `loc_tests`).
- Total numbers of blank lines in each sub-directory (`blank_lines_R`, `blank_lines_src`, `blank_lines_inst`, `blank_lines_vignettes`, `blank_lines_tests`).
- Total numbers of comment lines in each sub-directory (`comment_lines_R`, `comment_lines_src`, `comment_lines_inst`, `comment_lines_vignettes`, `comment_lines_tests`).
- Measures of relative white space in each sub-directory (`rel_space_R`, `rel_space_src`, `rel_space_inst`, `rel_space_vignettes`, `rel_space_tests`), as well as an overall measure for the `R/`, `src/`, and `inst/` directories (`rel_space`).
- The number of spaces used to indent code (`indentation`), with values of -1 indicating indentation with tab characters.
- The median number of nested expressions per line of code, counting only those lines which have any expressions (`nexpr`).
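As one example of deriving a higher-level statistic from these raw counts, the sketch below computes a comment-to-code ratio for the `R/` directory. The values are copied from the `R` row of the `loc` table shown earlier; that they map directly onto the `loc_R`, `comment_lines_R`, and `blank_lines_R` summary columns is an assumption made here for illustration, as is the choice of ratio.

```r
# Illustrative values from the R row of the `loc` table (ncode, ndoc, nempty);
# their mapping onto these summary column names is assumed, not confirmed.
s_mock <- data.frame (
    loc_R = 2698L,
    comment_lines_R = 536L,
    blank_lines_R = 507L
)

# Total line count implied by the three components
total_lines_R <- with (s_mock, loc_R + comment_lines_R + blank_lines_R)

# Hypothetical derived statistic: comment lines per line of code
comment_ratio_R <- s_mock$comment_lines_R / s_mock$loc_R
```

With these values the three components sum to the 3741 total lines (`nlines`) reported for the `R` directory in the table above.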
Statistics on individual objects (including functions)
These statistics all refer to "functions", but actually represent more general "objects," such as global variables or class definitions (generally from languages other than R), as detailed below.
- Numbers of functions in R (`n_fns_r`)
- Numbers of exported and non-exported R functions (`n_fns_r_exported`, `n_fns_r_not_exported`)
- Number of functions (or objects) in other computer languages (`n_fns_src`), including functions in both `src` and `inst/include` directories.
- Number of functions (or objects) per individual file in R and in all other (`src`) directories (`n_fns_per_file_r`, `n_fns_per_file_src`).
- Mean and median numbers of parameters per exported R function (`npars_exported_mn`, `npars_exported_md`).
- Mean and median lines of code per function in R and other languages, including distinction between exported and non-exported R functions (`loc_per_fn_r_mn`, `loc_per_fn_r_md`, `loc_per_fn_r_exp_mn`, `loc_per_fn_r_exp_md`, `loc_per_fn_r_not_exp_mn`, `loc_per_fn_r_not_exp_md`, `loc_per_fn_src_mn`, `loc_per_fn_src_md`).
- Equivalent mean and median numbers of documentation lines per function (`doclines_per_fn_exp_mn`, `doclines_per_fn_exp_md`, `doclines_per_fn_not_exp_mn`, `doclines_per_fn_not_exp_md`, `docchars_per_par_exp_mn`, `docchars_per_par_exp_md`).
Network Statistics
The full structure of the `network` table is described below, with summary statistics including:
- Number of edges, including distinction between languages (`n_edges`, `n_edges_r`, `n_edges_src`).
- Number of distinct clusters in package network (`n_clusters`).
- Mean and median centrality of all network edges, calculated from both directed and undirected representations of network (`centrality_dir_mn`, `centrality_dir_md`, `centrality_undir_mn`, `centrality_undir_md`).
- Equivalent centrality values excluding edges with centrality of zero (`centrality_dir_mn_no0`, `centrality_dir_md_no0`, `centrality_undir_mn_no0`, `centrality_undir_md_no0`).
- Numbers of terminal edges (`num_terminal_edges_dir`, `num_terminal_edges_undir`).
- Summary statistics on node degree (`node_degree_mn`, `node_degree_md`, `node_degree_max`)
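To make the node degree statistics concrete, the following base R sketch computes mean, median, and maximum degree from a small edge table with `from` and `to` columns. The edge table is hypothetical, and counting degree on an undirected view of the network is an assumption; the way `pkgstats` itself derives these values may differ.

```r
# Hypothetical call network: each row is one call from `from` to `to`
edges <- data.frame (
    from = c ("main", "main", "helper", "parse"),
    to = c ("helper", "parse", "parse", "check"),
    stringsAsFactors = FALSE
)

# Undirected degree: number of edges touching each node
tab <- table (c (edges$from, edges$to))
degree <- setNames (as.integer (tab), names (tab))

# Summary statistics analogous to node_degree_mn, _md, and _max
node_degree <- c (
    mn = mean (degree),
    md = median (degree),
    max = max (degree)
)
print (node_degree)
```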
External Call Statistics
The final column in the result of the `pkgstats_summary()` function summarises the `external_calls` object detailing all calls made to external packages (including to base and recommended packages). This summary is also represented as a single character string. Each package lists total numbers of function calls, and total numbers of unique function calls. Data for each package are separated by a comma, while data within each package are separated by a colon.
    s$external_calls

    ## [1] "base:447:78,brio:7:1,dplyr:7:4,fs:4:2,graphics:10:2,hms:1:1,igraph:3:3,pbapply:1:1,pkgstats:99:60,readr:8:5,stats:16:2,sys:13:1,tools:2:2,utils:10:7,visNetwork:3:2,withr:5:1"

This structure allows numbers of calls to all packages to be readily extracted with code like the following:

    # Split first by comma into packages, then by colon into fields
    calls <- do.call (
        rbind,
        strsplit (strsplit (s$external_calls, ",") [[1]], ":")
    )
    calls <- data.frame (
        package = calls [, 1],
        n_total = as.integer (calls [, 2]),
        n_unique = as.integer (calls [, 3])
    )
    print (calls)

    ##       package n_total n_unique
    ## 1        base     447       78
    ## 2        brio       7        1
    ## 3       dplyr       7        4
    ## 4          fs       4        2
    ## 5    graphics      10        2
    ## 6         hms       1        1
    ## 7      igraph       3        3
    ## 8     pbapply       1        1
    ## 9    pkgstats      99       60
    ## 10      readr       8        5
    ## 11      stats      16        2
    ## 12        sys      13        1
    ## 13      tools       2        2
    ## 14      utils      10        7
    ## 15 visNetwork       3        2
    ## 16      withr       5        1

The two numeric columns respectively show the total number of calls made to each package, and the total number of unique functions used within those packages. These results provide detailed information on numbers of calls made to, and functions used from, other R packages, including base and recommended packages.
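The same parsing can be wrapped in a small helper so that further quantities, such as each package's share of all external calls, are easy to derive. The helper name `parse_external_calls` is hypothetical and not part of `pkgstats`; the input string is copied from the output above so that the sketch runs without the package installed.

```r
# String copied from the s$external_calls output shown above
ext <- paste0 (
    "base:447:78,brio:7:1,dplyr:7:4,fs:4:2,graphics:10:2,hms:1:1,",
    "igraph:3:3,pbapply:1:1,pkgstats:99:60,readr:8:5,stats:16:2,",
    "sys:13:1,tools:2:2,utils:10:7,visNetwork:3:2,withr:5:1"
)

# Hypothetical helper (not a pkgstats function): "pkg:total:unique,..." to table
parse_external_calls <- function (x) {
    m <- do.call (rbind, strsplit (strsplit (x, ",") [[1]], ":"))
    data.frame (
        package = m [, 1],
        n_total = as.integer (m [, 2]),
        n_unique = as.integer (m [, 3]),
        stringsAsFactors = FALSE
    )
}

calls <- parse_external_calls (ext)

# Drop the package's calls to itself (an illustrative choice), then
# express each remaining package as a share of all external calls
calls <- calls [calls$package != "pkgstats", ]
calls$share <- calls$n_total / sum (calls$n_total)

# Three most heavily used packages
top <- head (calls [order (-calls$n_total), ], 3)
print (top)
```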
Finally, the summary statistics conclude with two further statistics of `afferent_pkg` and `efferent_pkg`. These are package-internal measures of afferent and efferent couplings between the files of a package. The afferent couplings (`ca`) are numbers of incoming calls to each file of a package from functions defined elsewhere in the package, while the efferent couplings (`ce`) are numbers of outgoing calls from each file of a package to functions defined elsewhere in the package. These can be used to derive a measure of "internal package instability" as the ratio of efferent to total coupling (`ce / (ce + ca)`).
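The instability ratio can be illustrated with a few made-up per-file coupling counts. The file names and numbers below are hypothetical, and how `pkgstats` aggregates per-file couplings into the single `afferent_pkg` and `efferent_pkg` summary values is not assumed here.

```r
# Hypothetical coupling counts for three files of a package
ca <- c (utils.R = 12, parse.R = 3, main.R = 0) # incoming calls (afferent)
ce <- c (utils.R = 0, parse.R = 9, main.R = 6) # outgoing calls (efferent)

# Instability as the ratio of efferent to total coupling
instability <- ce / (ce + ca)
print (instability)
```

Values near 0 indicate files that are mostly called by others, while values near 1 indicate files that mostly call out to others. A file with no couplings at all gives `0 / 0`, which R evaluates to `NaN`, so such files would need to be handled explicitly.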
There are many other "raw" statistics returned by the main `pkgstats()` function which are not represented in `pkgstats_summary()`. The main package vignette provides further detail on the full results.

The following sub-sections provide further detail on the `objects`, `network`, and `external_calls` items, which could be used to extract additional statistics beyond those described here.
Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

All contributions to this project are gratefully acknowledged using the `allcontributors` package following the allcontributors specification. Contributions of any kind are welcome!
mpadge, jhollist, jeroen, Bisaloo, thomaszwagerman, helske, rpodcast, assignUser, GFabien, pawelru, stitam, willgearty, krlmlr, noamross, maelle, mdsumner, kellijohnson-NOAA, ScottClaessens, schneiderpy