Historical statistics of every R package ever


ropensci-review-tools/pkgstats

pkgstats

Extract summary statistics of R package structure and functionality. Not all statistics of course, but a good go at balancing insightful statistics while ensuring computational feasibility. pkgstats is a static code analysis tool, so is generally very fast (a few seconds at most for very large packages). Installation is described in a separate vignette.

What statistics?

Statistics are derived from these primary sources:

  1. Numbers of lines of code, documentation, and white space (both between and within lines) in each directory and language
  2. Summaries of package DESCRIPTION file and related package meta-statistics
  3. Summaries of all objects created via package code across multiple languages and all directories containing source code (./R, ./src, and ./inst/include).
  4. A function call network derived from function definitions obtained from the code tagging library, ctags, and references (“calls”) to those obtained from another tagging library, gtags. This network roughly connects every object making a call (as from) with every object being called (to).
  5. An additional function call network connecting calls within R functions to all functions from other R packages.

The primary function, pkgstats(), returns a list of these various components, including full data.frame objects for the final three components described above. The statistical properties of this list can be aggregated by the pkgstats_summary() function, which returns a data.frame with a single row of summary statistics. This function is demonstrated below, including full details of all statistics extracted.

Demonstration

The following code demonstrates the output of the main function, pkgstats, using an internally bundled .tar.gz “tarball” of this package. The system.time call demonstrates that the static code analyses of pkgstats are generally very fast.

library (pkgstats)
tarball <- system.file ("extdata", "pkgstats_9.9.tar.gz", package = "pkgstats")
system.time (p <- pkgstats (tarball))
##    user  system elapsed
##   1.701   0.124   1.802
names (p)
## [1] "loc"            "vignettes"      "data_stats"     "desc"
## [5] "translations"   "objects"        "network"        "external_calls"

The result is a list of various data extracted from the code. All except for objects, network, and external_calls represent summary data:

p [!names (p) %in% c ("objects", "network", "external_calls")]
## $loc
## # A tibble: 4 × 12
##   langage dir        nfiles nlines ncode  ndoc nempty nspaces nchars nexpr ntabs
##   <chr>   <chr>       <int>  <int> <int> <int>  <int>   <int>  <int> <dbl> <int>
## 1 C++     src             3    365   277    21     67     933   7002     1     0
## 2 R       R              19   3741  2698   536    507   27575  94022     1     0
## 3 R       tests           2    146   121     1     24     395   2423     1     0
## 4 R       tests/tes…      5    202   145     9     48     375   3738     1     0
## # ℹ 1 more variable: indentation <int>
##
## $vignettes
## vignettes     demos
##         0         0
##
## $data_stats
##           n  total_size median_size
##           0           0           0
##
## $desc
##    package version                date license
## 1 pkgstats     9.9 2022-05-12 19:41:22   GPL-3
##                                                                                      urls
## 1 https://docs.ropensci.org/pkgstats/,\nhttps://github.com/ropensci-review-tools/pkgstats
##                                                       bugs aut ctb fnd rev ths
## 1 https://github.com/ropensci-review-tools/pkgstats/issues   1   0   0   0   0
##   trl depends                                                        imports
## 1   0      NA brio, checkmate, dplyr, fs, igraph, methods, readr, sys, withr
##                                                                         suggests
## 1 hms, knitr, pbapply, pkgbuild, Rcpp, rmarkdown, roxygen2, testthat, visNetwork
##   enhances linking_to
## 1       NA      cpp11
##
## $translations
## [1] NA

The various components of these results are described in further detail in the main package vignette.
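The loc table lends itself to simple derived measures. As a minimal sketch, the following rebuilds a few columns of the `$loc` output printed above by hand (so that it runs without pkgstats installed) and derives documentation lines per line of code for each directory. The `doc_ratio` column is introduced here for illustration only and is not something pkgstats returns.

```r
# Values copied by hand from the `$loc` output shown above; with pkgstats
# installed, `p$loc` should provide the same columns directly. The fourth
# row is omitted because its directory name is truncated in the printout.
loc <- data.frame (
    langage = c ("C++", "R", "R"), # column name as printed in the output
    dir = c ("src", "R", "tests"),
    nlines = c (365L, 3741L, 146L),
    ncode = c (277L, 2698L, 121L),
    ndoc = c (21L, 536L, 1L),
    nempty = c (67L, 507L, 24L)
)

# In these rows, every line is counted as code, documentation, or empty:
stopifnot (all (loc$ncode + loc$ndoc + loc$nempty == loc$nlines))

# Hypothetical derived measure: documentation lines per line of code.
loc$doc_ratio <- round (loc$ndoc / loc$ncode, 3)
print (loc [, c ("dir", "ncode", "ndoc", "doc_ratio")])
```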

Overview of statistics and the pkgstats_summary() function

A summary of the pkgstats data can be obtained by submitting the object returned from pkgstats() to the pkgstats_summary() function:

s <- pkgstats_summary (p)

This function reduces the result of the pkgstats() function to a single line with 95 entries, represented as a data.frame with one row and that number of columns. This format is intended to enable summary statistics from multiple packages to be aggregated by simply binding rows together. While 95 statistics might seem like a lot, the pkgstats_summary() function aims to return as many usable raw statistics as possible in order to flexibly allow higher-level statistics to be derived through combination and aggregation. These 95 statistics can be roughly grouped into the following categories (not shown in the order in which they actually appear), with variable names in parentheses after each description. Some statistics are summarised as comma-delimited character strings, such as translations into human languages, or other packages listed under “depends”, “imports”, or “suggests”. This enables subsequent analyses of their contents, for example of actual translated languages, or both aggregate numbers and individual details of all package dependencies, as demonstrated immediately below.
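The row-binding idea can be sketched with base R alone. The two one-row data frames below are invented stand-ins for pkgstats_summary() results (real results have 95 columns); only the column names `package`, `version`, `loc_R`, and `n_fns_r` are taken from the descriptions in this README, and the `loc_per_fn` column is a hypothetical derived statistic.

```r
# Stand-ins for pkgstats_summary() results: one row per package. The package
# names and all values here are invented for illustration.
s1 <- data.frame (package = "pkgA", version = "1.0", loc_R = 1200L, n_fns_r = 85L)
s2 <- data.frame (package = "pkgB", version = "0.3.1", loc_R = 430L, n_fns_r = 27L)

# Because every summary has the same columns, rows bind directly:
all_stats <- do.call (rbind, list (s1, s2))

# Higher-level statistics can then be derived across packages, for example
# a (hypothetical) ratio of lines of R code to numbers of R functions:
all_stats$loc_per_fn <- all_stats$loc_R / all_stats$n_fns_r
print (all_stats)
```

Binding through do.call (rbind, ...) keeps the sketch free of dependencies; any row-binding function that accepts a list of data frames would serve the same purpose.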

Package Summaries

  • Name (package)
  • Package version (version)
  • Package date, as modification time of DESCRIPTION file where not explicitly stated (date)
  • License (license)
  • Languages, as a single comma-separated character value (languages), and excluding R itself.
  • List of translations where package includes translations files, given as list of (spoken) language codes (translations).

Information from DESCRIPTION file

  • Package URL(s) (url)
  • URL for BugReports (bugs)
  • Number of contributors with role of author (desc_n_aut), contributor (desc_n_ctb), funder (desc_n_fnd), reviewer (desc_n_rev), thesis advisor (ths), and translator (trl, relating to translation between computer and not spoken languages).
  • Comma-separated character entries for all depends, imports, suggests, and linking_to packages.

Numbers of entries in each of the last two kinds of items can be obtained by a simple strsplit call, like this:

deps <- strsplit (s$suggests, ", ") [[1]]
length (deps)
## [1] 9
print (deps)
## [1] "hms"        "knitr"      "pbapply"    "pkgbuild"   "Rcpp"
## [6] "rmarkdown"  "roxygen2"   "testthat"   "visNetwork"
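The same approach extends to all four dependency fields at once. The sketch below copies the field values from the `$desc` output shown earlier rather than calling pkgstats, and assumes that empty fields appear as NA, as they do in that output. The helper `count_entries()` is hypothetical and not part of pkgstats.

```r
# Values copied from the `$desc` output shown earlier; `NA` marks empty fields.
desc <- list (
    depends = NA_character_,
    imports = "brio, checkmate, dplyr, fs, igraph, methods, readr, sys, withr",
    suggests = "hms, knitr, pbapply, pkgbuild, Rcpp, rmarkdown, roxygen2, testthat, visNetwork",
    linking_to = "cpp11"
)

# Hypothetical helper: count entries in a comma-delimited string, treating
# NA as zero. The pattern tolerates optional white space after each comma.
count_entries <- function (x) {
    if (is.na (x)) {
        return (0L)
    }
    length (strsplit (x, ",\\s*") [[1]])
}

n_deps <- vapply (desc, count_entries, integer (1))
print (n_deps)
```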

Numbers of files and associated data

  • Number of vignettes (num_vignettes)
  • Number of demos (num_demos)
  • Number of data files (num_data_files)
  • Total size of all package data (data_size_total)
  • Median size of package data files (data_size_median)
  • Numbers of files in main sub-directories (files_R, files_src, files_inst, files_vignettes, files_tests), where numbers are recursively counted in all sub-directories, and where inst only counts files in the inst/include sub-directory.

Statistics on lines of code

  • Total lines of code in each sub-directory (loc_R, loc_src, loc_inst, loc_vignettes, loc_tests).
  • Total numbers of blank lines in each sub-directory (blank_lines_R, blank_lines_src, blank_lines_inst, blank_lines_vignettes, blank_lines_tests).
  • Total numbers of comment lines in each sub-directory (comment_lines_R, comment_lines_src, comment_lines_inst, comment_lines_vignettes, comment_lines_tests).
  • Measures of relative white space in each sub-directory (rel_space_R, rel_space_src, rel_space_inst, rel_space_vignettes, rel_space_tests), as well as an overall measure for the R/, src/, and inst/ directories (rel_space).
  • The number of spaces used to indent code (indentation), with values of -1 indicating indentation with tab characters.
  • The median number of nested expressions per line of code, counting only those lines which have any expressions (nexpr).

Statistics on individual objects (including functions)

These statistics all refer to “functions”, but actually represent more general “objects,” such as global variables or class definitions (generally from languages other than R), as detailed below.

  • Numbers of functions in R (n_fns_r)
  • Numbers of exported and non-exported R functions (n_fns_r_exported,n_fns_r_not_exported)
  • Number of functions (or objects) in other computer languages (n_fns_src), including functions in both src and inst/include directories.
  • Number of functions (or objects) per individual file in R and in all other (src) directories (n_fns_per_file_r, n_fns_per_file_src).
  • Mean and median numbers of parameters per exported R function (npars_exported_mn, npars_exported_md).
  • Mean and median lines of code per function in R and other languages, including distinction between exported and non-exported R functions (loc_per_fn_r_mn, loc_per_fn_r_md, loc_per_fn_r_exp_mn, loc_per_fn_r_exp_md, loc_per_fn_r_not_exp_mn, loc_per_fn_r_not_exp_md, loc_per_fn_src_mn, loc_per_fn_src_md).
  • Equivalent mean and median numbers of documentation lines per function (doclines_per_fn_exp_mn, doclines_per_fn_exp_md, doclines_per_fn_not_exp_mn, doclines_per_fn_not_exp_md, docchars_per_par_exp_mn, docchars_per_par_exp_md).

Network Statistics

The full structure of the network table is described below, with summary statistics including:

  • Number of edges, including distinction between languages (n_edges, n_edges_r, n_edges_src).
  • Number of distinct clusters in package network (n_clusters).
  • Mean and median centrality of all network edges, calculated from both directed and undirected representations of network (centrality_dir_mn, centrality_dir_md, centrality_undir_mn, centrality_undir_md).
  • Equivalent centrality values excluding edges with centrality of zero (centrality_dir_mn_no0, centrality_dir_md_no0, centrality_undir_mn_no0, centrality_undir_md_no0).
  • Numbers of terminal edges (num_terminal_edges_dir, num_terminal_edges_undir).
  • Summary statistics on node degree (node_degree_mn, node_degree_md, node_degree_max)
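To make the edge and node-degree statistics concrete, here is a small base-R sketch on an invented call network with `from` and `to` columns. The function names are hypothetical, and counting each edge at both of its end points (undirected degree) is an assumption made for illustration; this README does not state how pkgstats itself computes node degree.

```r
# Hypothetical call network: each row is one call `from` a function `to` another.
network <- data.frame (
    from = c ("main_fn", "main_fn", "main_fn", "helper_a"),
    to = c ("helper_a", "helper_b", "helper_c", "helper_c")
)

# One edge per row of the table.
n_edges <- nrow (network)

# Undirected node degree: number of edges touching each function.
degree <- as.vector (table (c (network$from, network$to)))

node_degree <- c (
    mn = mean (degree),
    md = median (degree),
    max = max (degree)
)
print (n_edges)
print (node_degree)
```

Cluster counts and centrality measures need a graph library and are left out of this sketch.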

External Call Statistics

The final column in the result of the pkgstats_summary() function summarises the external_calls object detailing all calls made to external packages (including to base and recommended packages). This summary is also represented as a single character string. Each package lists total numbers of function calls, and total numbers of unique function calls. Data for each package are separated by a comma, while data within each package are separated by a colon.

s$external_calls
## [1] "base:447:78,brio:7:1,dplyr:7:4,fs:4:2,graphics:10:2,hms:1:1,igraph:3:3,pbapply:1:1,pkgstats:99:60,readr:8:5,stats:16:2,sys:13:1,tools:2:2,utils:10:7,visNetwork:3:2,withr:5:1"

This structure allows numbers of calls to all packages to be readily extracted with code like the following:

# Split first on commas (one entry per package), then on colons (fields within each entry)
calls <- do.call (
    rbind,
    strsplit (strsplit (s$external_calls, ",") [[1]], ":")
)
calls <- data.frame (
    package = calls [, 1],
    n_total = as.integer (calls [, 2]),
    n_unique = as.integer (calls [, 3])
)
print (calls)
##       package n_total n_unique
## 1        base     447       78
## 2        brio       7        1
## 3       dplyr       7        4
## 4          fs       4        2
## 5    graphics      10        2
## 6         hms       1        1
## 7      igraph       3        3
## 8     pbapply       1        1
## 9    pkgstats      99       60
## 10      readr       8        5
## 11      stats      16        2
## 12        sys      13        1
## 13      tools       2        2
## 14      utils      10        7
## 15 visNetwork       3        2
## 16      withr       5        1

The two numeric columns respectively show the total number of calls made to each package, and the total number of unique functions used within those packages. These results provide detailed information on numbers of calls made to, and functions used from, other R packages, including base and recommended packages.
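One possible follow-up analysis is to rank packages by their share of all outgoing calls. The sketch below is self-contained: it copies the `s$external_calls` string shown above instead of calling pkgstats, and the `share` column is a hypothetical derived measure.

```r
# String copied from the `s$external_calls` output shown above.
external_calls <- paste0 (
    "base:447:78,brio:7:1,dplyr:7:4,fs:4:2,graphics:10:2,hms:1:1,",
    "igraph:3:3,pbapply:1:1,pkgstats:99:60,readr:8:5,stats:16:2,",
    "sys:13:1,tools:2:2,utils:10:7,visNetwork:3:2,withr:5:1"
)

parts <- do.call (rbind, strsplit (strsplit (external_calls, ",") [[1]], ":"))
calls <- data.frame (
    package = parts [, 1],
    n_total = as.integer (parts [, 2]),
    n_unique = as.integer (parts [, 3])
)

# Drop calls the package makes to its own functions, then rank the rest by
# their share of all remaining calls.
ext <- calls [calls$package != "pkgstats", ]
ext$share <- round (ext$n_total / sum (ext$n_total), 3)
ext <- ext [order (ext$n_total, decreasing = TRUE), ]
print (head (ext, 3))
```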

Finally, the summary statistics conclude with two further statistics of afferent_pkg and efferent_pkg. These are package-internal measures of afferent and efferent couplings between the files of a package. The afferent couplings (ca) are numbers of incoming calls to each file of a package from functions defined elsewhere in the package, while the efferent couplings (ce) are numbers of outgoing calls from each file of a package to functions defined elsewhere in the package. These can be used to derive a measure of “internal package instability” as the ratio of efferent to total coupling (ce / (ce + ca)).
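The instability ratio is simple to compute once per-file coupling counts are available. How pkgstats encodes afferent_pkg and efferent_pkg is not shown in this README, so the sketch below starts from an invented data frame of counts; the file names and numbers are assumptions for illustration only.

```r
# Hypothetical per-file coupling counts; file names and numbers are invented.
couplings <- data.frame (
    file = c ("R/a.R", "R/b.R", "R/c.R"),
    ca = c (10L, 2L, 0L), # afferent: incoming calls from elsewhere in the package
    ce = c (2L, 6L, 4L) # efferent: outgoing calls to elsewhere in the package
)

# Instability = ce / (ce + ca): values near 0 indicate files that are mostly
# called by others, values near 1 files that mostly call others.
couplings$instability <- couplings$ce / (couplings$ce + couplings$ca)
print (couplings)
```

A file with no couplings at all gives 0 / 0, which R reports as NaN, so such files may need to be filtered out before averaging.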

There are many other “raw” statistics returned by the main pkgstats() function which are not represented in pkgstats_summary(). The main package vignette provides further detail on the full results.

The following sub-sections provide further detail on the objects, network, and external_calls items, which could be used to extract additional statistics beyond those described here.

Code of Conduct

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contributors

All contributions to this project are gratefully acknowledged using the allcontributors package following the allcontributors specification. Contributions of any kind are welcome!

Code


mpadge

jhollist

jeroen

Bisaloo

thomaszwagerman

Issue Authors


helske

rpodcast

assignUser

GFabien

pawelru

stitam

willgearty

Issue Contributors


krlmlr

noamross

maelle

mdsumner

kellijohnson-NOAA

ScottClaessens

schneiderpy
