Word counts and readability statistics in R markdown documents

benmarwick/wordcountaddin

This R package is an RStudio addin to count words and characters in text in an R markdown document. It also has a function to compute readability statistics so you can get an indication of how easy or difficult your document is to read.

You can count words in your Rmd file in three ways:

  • In a selection of text in your active Rmd, by selecting some text with your mouse in RStudio and using the Wordcount Addin
  • All the words in your active Rmd in RStudio, by using the Wordcount Addin with no text selected
  • All the words in an Rmd file, directly using the word_count function from the console or command line (RStudio not required), and specifying the filename as an argument to the function (e.g. wordcountaddin::word_count("my_file.Rmd")). This will give you a single integer result, rather than the Markdown table that the other functions return.

Independent of an Rmd file, you can also count words in a character vector from the console using the text_stats_chr function (and there is readability_chr for readability).
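As a rough sketch of the console usage described above, the following assumes that wordcountaddin and its dependencies are installed. The sample text and the temporary file are hypothetical and exist only for illustration. The calls are guarded so the script still runs when the package is missing.

```r
# Sketch only: assumes wordcountaddin (and its dependencies) are installed.
sample_text <- c(
  "This is a short paragraph.",
  "It is made of two sentences."
)

# Hypothetical temporary Rmd file, written here just to have something to count
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c("---", "title: Demo", "---", "", sample_text), rmd_path)

if (requireNamespace("wordcountaddin", quietly = TRUE)) {
  # A single integer word count for a file on disk (RStudio not required)
  n_words <- wordcountaddin::word_count(rmd_path)
  print(n_words)

  # Markdown tables for a character vector, no Rmd file needed
  wordcountaddin::text_stats_chr(sample_text)
  wordcountaddin::readability_chr(sample_text)
}
```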

Word count

When counting words in the text of your Rmd document, these things will be ignored:

  • YAML front matter
  • code chunks and inline code
  • text in HTML comment tags: <!-- text -->
  • HTML tags in the text: <br>, </br>
  • inline URLs in this format: [text of link](url)
  • images with captions in this format: ![this is the caption](/path/to/image.png)
  • header level indicators such as # and ##, etc.

And because my regex is quite simple, the word count function may also ignore parts of your actual text that resemble these things.
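To make the effect of the list above concrete, here is a minimal base R sketch of this kind of regex-based stripping. It is not the package's actual code: the function names strip_rmd and count_words are hypothetical, the patterns are simplified, and keeping the link text while dropping the URL is an assumption about how inline links are treated.

```r
# Illustrative sketch in base R; NOT the package's actual implementation.
strip_rmd <- function(lines) {
  text <- paste(lines, collapse = "\n")
  # YAML front matter at the very start of the document
  text <- sub("(?s)^---\n.*?\n---\n", "", text, perl = TRUE)
  # Fenced code chunks, then inline code
  text <- gsub("(?s)```.*?```", "", text, perl = TRUE)
  text <- gsub("`[^`\n]*`", "", text, perl = TRUE)
  # HTML comments
  text <- gsub("(?s)<!--.*?-->", "", text, perl = TRUE)
  # Images with captions: ![caption](path)
  text <- gsub("!\\[[^\\]]*\\]\\([^)]*\\)", "", text, perl = TRUE)
  # Inline links: keep the link text, drop the URL (an assumption)
  text <- gsub("\\[([^\\]]*)\\]\\([^)]*\\)", "\\1", text, perl = TRUE)
  # HTML tags, replaced by a space so neighbouring words stay separate
  text <- gsub("</?[A-Za-z][^>]*>", " ", text, perl = TRUE)
  # Header level indicators at the start of a line
  text <- gsub("(?m)^#+[ \t]*", "", text, perl = TRUE)
  text
}

# Naive whitespace-based word count
count_words <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  sum(nzchar(words))
}

rmd <- c(
  "---",
  "title: Demo",
  "---",
  "",
  "# A heading",
  "",
  "Some text with `inline code` and a [link](https://example.com).",
  "<!-- a hidden comment -->",
  "```{r}",
  "x <- 1",
  "```",
  "Final line<br>here."
)

n <- count_words(strip_rmd(rmd))
print(n)  # 11
```

Patterns this simple also show why real text can be dropped by mistake: any prose that happens to sit between two backticks or inside angle brackets is removed along with the markup.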

The word count will include text in headers, block quotations, verbatim code blocks, tables, raw LaTeX and raw HTML.

In general, there are numerous ways to count words, with no widely accepted standard method. The variety of methods is due to differences in the definitions of a word and a sentence. Run ?stringi::stri_stats_latex and ?koRpus::describe to learn more about the word counting methods.

For this addin I’ve included two methods, mostly out of curiosity to see how they differ from each other. I use functions from the stringi and koRpus packages. If you’re curious, you can compare the results you get with this addin to an online tool such as http://wordcounttools.com/.

The output of the Word count function is a markdown table in your R console that might look like this:

|Method          |koRpus      |stringi       |
|:---------------|:-----------|:-------------|
|Word count      |107         |104           |
|Character count |604         |603           |
|Sentence count  |10          |Not available |
|Reading time    |0.5 minutes |0.5 minutes   |

If you want to reuse these results in other R functions, you can use an unexported function like this: wordcountaddin:::text_stats_fn_(text), where text is a character vector of your text (with length one, i.e. all your text in a single character string). The output will be a list object, and will include several other items not shown in the markdown table.

Readability

The readability function ignores all the same parts of the text as the word count function, and then computes the values of a bunch of readability statistics.

Most of these readability measurements aim to approximate the years of education required to understand your text. They look at the number of characters and syllables per word, the number of words per sentence, and so on. They don’t analyse the meaning of the words. A score of around 10-12 is roughly the reading level on completion of high school in the US. These stats are computed by the koRpus package.

There are about 27 measurements that this readability function returns (depending on how long your text is), including the Automated Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level, and the Simple Measure of Gobbledygook (SMOG). For the full list of readability measurements that are returned by the readability function, run ?koRpus::readability. That help page also shows the formulae and citations for each statistic (and an additional 20-odd other readability statistics not used here).
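To illustrate how measures of this kind turn simple counts into a grade level, here is a small base R sketch of the commonly published ARI and Flesch-Kincaid Grade Level formulas. The function names and the counts are hypothetical, and koRpus may count characters, syllables and sentences differently, so its output will not necessarily match a hand calculation like this one.

```r
# Automated Readability Index from raw counts (commonly published formula)
ari <- function(n_chars, n_words, n_sentences) {
  4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sentences) - 21.43
}

# Flesch-Kincaid Grade Level from raw counts (commonly published formula)
flesch_kincaid <- function(n_words, n_sentences, n_syllables) {
  0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59
}

# Hypothetical counts, chosen only for illustration
ari_score <- ari(n_chars = 450, n_words = 100, n_sentences = 8)
fk_score <- flesch_kincaid(n_words = 100, n_sentences = 8, n_syllables = 140)

print(c(ARI = ari_score, Flesch_Kincaid = fk_score))
```

Both scores rise with longer words and longer sentences, which is why neither says anything about the meaning of the words themselves.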

Readability stats are, of course, no substitute for critical self-reflection on the effectiveness of your writing at communicating ideas and information. To help with that, read Style: Toward Clarity and Grace.

The output of the readability function is a markdown table in your R console that might look like this:

|index                 |flavour     |raw   |grade |age  |
|:---------------------|:-----------|:-----|:-----|:----|
|ARI                   |            |      |2.31  |     |
|Coleman-Liau          |            |66    |4.91  |     |
|Danielson-Bryan DB1   |            |6.46  |      |     |
|Danielson-Bryan DB2   |            |60.39 |6     |     |
|Dickes-Steiwer        |            |53.07 |      |     |
|ELF                   |            |1.83  |      |     |
|Farr-Jenkins-Paterson |            |66.81 |8-9   |     |
|Flesch                |en (Flesch) |69.57 |8-9   |     |
|Flesch-Kincaid        |            |      |4.85  |9.8  |
|FOG                   |            |      |7.84  |     |
|FORCAST               |            |      |10.28 |15.3 |
|Fucks                 |            |23.38 |4.83  |     |
|Linsear-Write         |            |      |2.35  |     |
|LIX                   |            |32.41 |< 5   |     |
|nWS1                  |            |      |4.19  |     |
|nWS2                  |            |      |4.72  |     |
|nWS3                  |            |      |4.14  |     |
|nWS4                  |            |      |3.64  |     |
|RIX                   |            |1.42  |5     |     |
|SMOG                  |            |      |8.08  |13.1 |
|Strain                |            |2.44  |      |     |
|TRI                   |            |-94   |      |     |
|Tuldava               |            |2.57  |      |     |
|Wheeler-Smith         |            |18.33 |2     |     |

Similar to the word count function, if you want to reuse these results in other R functions, you can use an unexported function like this: wordcountaddin:::readability_fn_(text), where text is a character vector of your text (with length one, i.e. all your text in a single character string). The output will be a list object with slightly more detail than the summary table above.
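A hedged sketch of reusing the results programmatically might look like the following. It assumes the package is installed, and because the functions are unexported their behaviour may change between versions. The element names of the returned lists are not documented here, so the sketch only inspects them rather than assuming any particular name.

```r
# Sketch only: unexported functions are accessed with ':::' and may change.
my_lines <- c("First sentence of my text.", "Second sentence of my text.")

# Collapse to a length-one character vector, as the functions expect
my_text <- paste(my_lines, collapse = " ")

if (requireNamespace("wordcountaddin", quietly = TRUE)) {
  stats <- wordcountaddin:::text_stats_fn_(my_text)
  names(stats)  # list the available items before picking one

  readab <- wordcountaddin:::readability_fn_(my_text)
  str(readab, max.level = 1)  # top-level structure only
}
```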

Inspiration for this addin came from jadd and WrapRmd.

How to install

Install with devtools::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)

Go to Tools > Addins in RStudio to select and configure addins.

How to use

  1. Open an Rmd file in RStudio.
  2. Select some text; it can include YAML, code chunks and inline code.
  3. Go to Tools > Addins in RStudio and click on Word count or Readability. Computing Readability may take a few moments on longer documents because it has to count syllables for some of the stats.
  4. Look in the console for the output.

Feedback, contributing, etc.

Please open an issue if you find something that doesn’t work as expected. Note that this project is released with a Guide to Contributing and a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
