Dictionary based sentiment analysis that considers valence shifters


sentimentr

sentimentr is designed to quickly calculate text polarity sentiment in the English language at the sentence level and optionally aggregate by rows or grouping variable(s).

sentimentr is a response to my own needs with sentiment detection that were not addressed by the current R tools. My own polarity function in the qdap package is slower on larger data sets. It is a dictionary lookup approach that tries to incorporate weighting for valence shifters (negation and amplifiers/deamplifiers). Matthew Jockers created the syuzhet package that utilizes dictionary lookups for the Bing, NRC, and Afinn methods as well as a custom dictionary. He also utilizes a wrapper for the Stanford coreNLP which uses much more sophisticated analysis. Jockers' dictionary methods are fast but are more prone to error in the case of valence shifters. Jockers addressed these critiques, explaining that the method is good with regard to analyzing general sentiment in a piece of literature. He points to the accuracy of the Stanford detection as well. In my own work I need better accuracy than a simple dictionary lookup; something that considers valence shifters yet optimizes speed, which Stanford's parser does not. This leads to a trade-off of speed vs. accuracy. Simply, sentimentr attempts to balance accuracy and speed.

Why sentimentr

So what does sentimentr do that other packages don't and why does it matter?

sentimentr attempts to take into account valence shifters (i.e., negators, amplifiers (intensifiers), de-amplifiers (downtoners), and adversative conjunctions) while maintaining speed. Simply put, sentimentr is an augmented dictionary lookup. The next questions address why it matters.

So what are these valence shifters?

A negator flips the sign of a polarized word (e.g., "I do not like it."). See lexicon::hash_valence_shifters[y==1] for examples. An amplifier (intensifier) increases the impact of a polarized word (e.g., "I really like it."). See lexicon::hash_valence_shifters[y==2] for examples. A de-amplifier (downtoner) reduces the impact of a polarized word (e.g., "I hardly like it."). See lexicon::hash_valence_shifters[y==3] for examples. An adversative conjunction overrules the previous clause containing a polarized word (e.g., "I like it but it's not worth it."). See lexicon::hash_valence_shifters[y==4] for examples.
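The four shifter classes can be pictured as a small lookup table. The sketch below is a base R illustration only: the four words come from the examples above, the integer codes follow the `y` values described above, and `shifter_type` is a hypothetical helper that is not part of sentimentr or lexicon.

```r
# Toy valence shifter table (illustrative subset, not the lexicon data).
# Codes follow the text: 1 = negator, 2 = amplifier,
# 3 = de-amplifier, 4 = adversative conjunction.
shifters <- data.frame(
    x = c("not", "really", "hardly", "but"),
    y = c(1L, 2L, 3L, 4L),
    stringsAsFactors = FALSE
)
labels <- c("negator", "amplifier", "de-amplifier", "adversative conjunction")

# Hypothetical helper: shifter type of each word, NA when it is not a shifter
shifter_type <- function(words) {
    labels[shifters$y[match(tolower(words), shifters$x)]]
}

shifter_type(c("I", "do", "not", "like", "it"))
# NA NA "negator" NA NA
```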

Do valence shifters really matter?

Well, valence shifters affect the polarized words. In the case of negators and adversative conjunctions the entire sentiment of the clause may be reversed or overruled. So if valence shifters occur fairly frequently a simple dictionary lookup may not be modeling the sentiment appropriately. You may be wondering how frequently these valence shifters co-occur with polarized words, potentially changing, or even reversing and overruling the clause's sentiment. The table below shows the rate of sentence level co-occurrence of valence shifters with polarized words across a few types of texts.

Text	Negator	Amplifier	Deamplifier	Adversative
Cannon reviews	21%	23%	8%	12%
2012 presidential debate	23%	18%	1%	11%
Trump speeches	12%	14%	3%	10%
Trump tweets	19%	18%	4%	4%
Dylan songs	4%	10%	0%	4%
Austen books	21%	18%	6%	11%
Hamlet	26%	17%	2%	16%

Indeed, negators appear ~20% of the time a polarized word appears in a sentence. Conversely, adversative conjunctions appear with polarized words ~10% of the time. Not accounting for the valence shifters could significantly impact the modeling of the text sentiment.

The script to replicate the frequency analysis, shown in the table above, can be accessed via:

    val_shift_freq <- system.file("the_case_for_sentimentr/valence_shifter_cooccurrence_rate.R", package = "sentimentr")
    file.copy(val_shift_freq, getwd())
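To make the idea of a sentence level co-occurrence rate concrete, here is a rough base R sketch on four toy sentences. It assumes the rate is the share of sentences containing a polarized word that also contain a negator; the word lists and sentences are made up for illustration, and this is not the packaged replication script.

```r
# Toy data: one polarized word and one negator (assumed, illustrative lists)
sentences <- c("I do not like it.", "I really like it.", "I like it.", "It is a dog.")
polarized <- c("like")
negators  <- c("not")

# Lower case, strip punctuation, split on whitespace
tokens <- strsplit(tolower(gsub("[^a-zA-Z ]", "", sentences)), "\\s+")

has_polar <- vapply(tokens, function(w) any(w %in% polarized), logical(1))
has_neg   <- vapply(tokens, function(w) any(w %in% negators), logical(1))

# Share of polarized sentences that also contain a negator
rate <- sum(has_polar & has_neg) / sum(has_polar)
rate  # 1 of the 3 polarized sentences contains a negator
```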

Functions

There are two main functions (top 2 in table below) in sentimentr with several helper functions summarized in the table below:

Function	Description
`sentiment`	Sentiment at the sentence level
`sentiment_by`	Aggregated sentiment by group(s)
`profanity`	Profanity at the sentence level
`profanity_by`	Aggregated profanity by group(s)
`emotion`	Emotion at the sentence level
`emotion_by`	Aggregated emotion by group(s)
`uncombine`	Extract sentence level sentiment from `sentiment_by`
`get_sentences`	Regex based string to sentence parser (or get sentences from `sentiment`/`sentiment_by`)
`replace_emoji`	Replace emojis with word equivalent
`replace_emoticon`	Replace emoticons with word equivalent
`replace_grade`	Replace grades (e.g., "A+") with word equivalent
`replace_internet_slang`	Replace internet slang with word equivalent
`replace_rating`	Replace ratings (e.g., "10 out of 10", "3 stars") with word equivalent
`as_key`	Coerce a `data.frame` lexicon to a polarity hash key
`is_key`	Check if an object is a hash key
`update_key`	Add/remove terms to/from a hash key
`highlight`	Highlight positive/negative sentences as an HTML document
`general_rescale`	Generalized rescaling function to rescale sentiment scoring
`sentiment_attribute`	Extract the sentiment based attributes from a text
`validate_sentiment`	Validate sentiment score sign against known results

The Equation

The equation below describes the augmented dictionary method of sentimentr that may give better results than a simple lookup dictionary approach that does not consider valence shifters. The equation used by the algorithm to assign value to polarity of each sentence first utilizes a sentiment dictionary (e.g., Jockers, (2017)) to tag polarized words. Each paragraph ($p_i = \{s_1, s_2, ..., s_n\}$) composed of sentences, is broken into element sentences ($s_{i,j} = \{w_1, w_2, ..., w_n\}$) where $w$ are the words within sentences. Each sentence ($s_j$) is broken into an ordered bag of words. Punctuation is removed with the exception of pause punctuations (commas, colons, semicolons) which are considered a word within the sentence. I will denote pause words as $cw$ (comma words) for convenience. We can represent these words in an i,j,k notation as $w_{i,j,k}$. For example $w_{3,2,5}$ would be the fifth word of the second sentence of the third paragraph. While I use the term paragraph this merely represents a complete turn of talk. For example, it may be a cell level response in a questionnaire composed of sentences.

The words in each sentence ($w_{i,j,k}$) are searched and compared to a dictionary of polarized words (e.g., a combined and augmented version of Jockers' (2017) [originally exported by the syuzhet package] & Rinker's augmented Hu & Liu (2004) dictionaries in the lexicon package). Positive ($w_{i,j,k}^{+}$) and negative ($w_{i,j,k}^{-}$) words are tagged with a +1 and −1 respectively (or other positive/negative weighting if the user provides the sentiment dictionary). I will denote polarized words as $pw$ for convenience. These will form a polar cluster ($c_{i,j,l}$) which is a subset of the sentence ($c_{i,j,l} \subseteq s_{i,j}$).

The polarized context cluster ($c_{i,j,l}$) of words is pulled from around the polarized word ($pw$) and defaults to four words before and two words after $pw$ to be considered as valence shifters. The cluster can be represented as ($c_{i,j,l} = \{pw_{i,j,k-nb}, ..., pw_{i,j,k}, ..., pw_{i,j,k+na}\}$), where $nb$ & $na$ are the parameters n.before and n.after set by the user. The words in this polarized context cluster are tagged as neutral ($w_{i,j,k}^{0}$), negator ($w_{i,j,k}^{n}$), amplifier [intensifier] ($w_{i,j,k}^{a}$), or de-amplifier [downtoner] ($w_{i,j,k}^{d}$). Neutral words hold no value in the equation but do affect word count ($n$). Each polarized word is then weighted ($w$) based on the weights from the polarity_dt argument and then further weighted by the function and number of the valence shifters directly surrounding the positive or negative word ($pw$). Pause ($cw$) locations (punctuation that denotes a pause including commas, colons, and semicolons) are indexed and considered in calculating the upper and lower bounds in the polarized context cluster. This is because these marks indicate a change in thought and words prior are not necessarily connected with words after these punctuation marks. The lower bound of the polarized context cluster is constrained to $\max\{pw_{i,j,k-nb}, 1, \max\{cw_{i,j,k} < pw_{i,j,k}\}\}$ and the upper bound is constrained to $\min\{pw_{i,j,k+na}, w_{i,jn}, \min\{cw_{i,j,k} > pw_{i,j,k}\}\}$ where $w_{i,jn}$ is the number of words in the sentence.
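The bounds described above can be sketched in a few lines of base R. This is an illustration of the max/min constraints as worded in this section, not the package's internal code; `cluster_bounds` is a hypothetical helper and the token vectors are made up.

```r
# Hypothetical helper: index bounds of the context cluster around the
# polarized word at position k, following the max/min constraints in the text.
cluster_bounds <- function(words, k, n_before = 4, n_after = 2) {
    pauses <- which(words %in% c(",", ":", ";"))   # pause word positions
    lower <- max(k - n_before, 1, pauses[pauses < k])
    upper <- min(k + n_after, length(words), pauses[pauses > k])
    c(lower = lower, upper = upper)
}

# No pause: the window is clipped only by the sentence edges
cluster_bounds(c("i", "do", "not", "like", "it"), k = 4)
# lower = 1, upper = 5

# A comma before "like" cuts the window, so "not" falls outside the cluster
cluster_bounds(c("not", "bad", ",", "i", "like", "it"), k = 5)
# lower = 3, upper = 6
```

In the second call the negator sits before the comma, so under these bounds it would not act on "like".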

The core value in the cluster, the polarized word, is acted upon by valence shifters. Amplifiers increase the polarity by a factor of 1.8 (1 + $z$, where .8 is the default weight $z$). Amplifiers ($w_{i,j,k}^{a}$) become de-amplifiers if the context cluster contains an odd number of negators ($w_{i,j,k}^{n}$). De-amplifiers work to decrease the polarity. Negation ($w_{i,j,k}^{n}$) acts on amplifiers/de-amplifiers as discussed but also flips the sign of the polarized word. Negation is determined by raising −1 to the power of the number of negators ($w_{i,j,k}^{n}$) plus 2. Simply, this is a result of a belief that two negatives equal a positive, 3 negatives a negative, and so on.
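The sign rule is easy to check numerically. The base R snippet below only evaluates the expression from the paragraph above; `negation_sign` is a hypothetical name used for illustration.

```r
# Sign applied to a polarized word given the number of negators in its cluster:
# -1 raised to the power (number of negators + 2)
negation_sign <- function(n_negators) (-1)^(n_negators + 2)

negation_sign(0:3)
# 1 -1 1 -1
```

An even number of negators leaves the sign unchanged and an odd number flips it.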

The adversative conjunctions (i.e., 'but', 'however', and 'although') also weight the context cluster. An adversative conjunction before the polarized word ($w_{adversative\ conjunction}, ..., w_{i,j,k}^{p}$) up-weights the cluster by $1 + z_2 * \{|w_{adversative\ conjunction}|, ..., w_{i,j,k}^{p}\}$ (.85 is the default weight ($z_2$) where $|w_{adversative\ conjunction}|$ is the number of adversative conjunctions before the polarized word). An adversative conjunction after the polarized word down-weights the cluster by $1 + \{w_{i,j,k}^{p}, ..., |w_{adversative\ conjunction}| * -1\} * z_2$. This corresponds to the belief that an adversative conjunction makes the next clause of greater value while lowering the value placed on the prior clause.
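As a rough numeric illustration of this weighting, the base R sketch below computes 1 + z₂ times the number of adversative conjunctions before the polarized word minus the number after it. `adversative_weight` is a hypothetical helper; any further bounding the package may apply is not modeled here.

```r
# Hypothetical helper: cluster weight from adversative conjunctions,
# counting those before the polarized word as +1 and those after as -1
adversative_weight <- function(n_before, n_after, z2 = 0.85) {
    1 + z2 * (n_before - n_after)
}

adversative_weight(n_before = 1, n_after = 0)  # 1.85, cluster is up-weighted
adversative_weight(n_before = 0, n_after = 1)  # 0.15, cluster is down-weighted
```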

The researcher may provide a weight ($z$) to be utilized with amplifiers/de-amplifiers (default is .8; de-amplifier weight is constrained to −1 lower bound). Last, these weighted context clusters ($c_{i,j,l}$) are summed ($c'_{i,j}$) and divided by the square root of the word count ($\sqrt{w_{i,jn}}$) yielding an unbounded polarity score ($\delta_{i,j}$) for each sentence.

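$$\delta_{i,j} = \frac{c'_{i,j}}{\sqrt{w_{i,jn}}}$$

Where:

$$c'_{i,j} = \sum{\left((1 + w_{amp} + w_{deamp}) \cdot w_{i,j,k}^{p}(-1)^{2 + w_{neg}}\right)}$$

$$w_{amp} = \sum{\left(w_{neg} \cdot (z \cdot w_{i,j,k}^{a})\right)}$$

$$w_{deamp} = \max(w_{deamp'}, -1)$$

$$w_{deamp'} = \sum{\left(z(-w_{neg} \cdot w_{i,j,k}^{a} + w_{i,j,k}^{d})\right)}$$

$$w_{b} = 1 + z_2 * w_{b'}$$

$$w_{b'} = \sum{\left(|w_{adversative\ conjunction}|, ..., w_{i,j,k}^{p}, w_{i,j,k}^{p}, ..., |w_{adversative\ conjunction}| * -1\right)}$$

$$w_{neg} = \left(\sum{w_{i,j,k}^{n}}\right) \bmod 2$$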
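A small worked example may help tie the pieces together. The base R sketch below scores a sentence with a single polarized word, handling only two simple cases: amplifiers without negation (weight 1 + z per the prose above) and negation without amplifiers. The polarity of 0.5 for "like" is an assumption chosen because it reproduces the scores shown in the Preferred Workflow demo later in this README; `toy_score` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper for one polarized word per sentence.
# Covers amplifiers OR negators, not both at once (that case is left out).
toy_score <- function(polarity, n_words, n_amplifiers = 0, n_negators = 0, z = 0.8) {
    stopifnot(n_amplifiers == 0 || n_negators == 0)
    weight <- 1 + z * n_amplifiers            # amplifier up-weighting
    sign   <- (-1)^(2 + n_negators %% 2)      # negation flips the sign
    weight * sign * polarity / sqrt(n_words)  # divide by sqrt of word count
}

toy_score(0.5, n_words = 4)                    # "do you like it?"        0.25
toy_score(0.5, n_words = 5, n_amplifiers = 1)  # "Do you really like it?" 0.4024922
toy_score(0.5, n_words = 5, n_negators = 1)    # "I do not like it."      -0.2236068
```

The first two values agree with the sentence scores printed in the demo (0.2500000 and 0.4024922); the third is simply what this sketch yields for a negated sentence.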

To get the mean of all sentences ($s_{i,j}$) within a paragraph/turn of talk ($p_i$) simply take the average sentiment score $p_{i,\delta_{i,j}} = \frac{1}{n} \cdot \sum \delta_{i,j}$ or use an available weighted average (the default average_weighted_mixed_sentiment which upweights the negative values in a vector while also downweighting the zeros in a vector or average_downweighted_zero which simply downweights the zero polarity scores).
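The plain (unweighted) average can be reproduced in base R from sentence level scores. The numbers below are copied from the sentence level demo output later in this README; the weighted averaging functions are not reimplemented here.

```r
# Sentence level scores copied from the demo output in this README
scores <- data.frame(
    element_id = c(1, 1, 2, 3, 3),
    sentiment  = c(0.2500000, -1.8677359, 0.5813777, 0.4024922, 0.0000000)
)

# Plain mean of the sentence scores within each element
simple_mean <- tapply(scores$sentiment, scores$element_id, mean)
simple_mean
```

For element 1 the plain mean is about -0.8088680, which agrees with the aggregated demo output. For element 3 the plain mean is about 0.2012461 while the demo reports 0.2196345, presumably because the averaging function used there down-weights the zero score.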

Installation

To download the development version of sentimentr:

Download the zipball or tarball, decompress and run R CMD INSTALL on it, or use the pacman package to install the development version:

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load_current_gh("trinker/lexicon", "trinker/sentimentr")

Examples

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load(sentimentr, dplyr, magrittr)

Preferred Workflow

Here is a basic sentiment demo. Notice that the first thing you should do is to split your text data into sentences (a process called sentence boundary disambiguation) via the get_sentences function. This can be handled within sentiment (i.e., you can pass a raw character vector) but it slows the function down and should be done one time rather than every time the function is called. Additionally, a warning will be thrown if a larger raw character vector is passed. The preferred workflow is to split the text into sentences with get_sentences before any sentiment analysis is done.
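To give a feel for what regex based sentence splitting involves, here is a deliberately naive base R sketch. It splits on whitespace that follows a period, question mark, or exclamation mark; it is not the regex that get_sentences uses and it will mis-split abbreviations such as "Mr. Smith". `naive_split` is a hypothetical helper.

```r
# Naive splitter: break after ., ?, or ! when followed by whitespace
naive_split <- function(x) {
    lapply(x, function(s) trimws(strsplit(s, "(?<=[.?!])\\s+", perl = TRUE)[[1]]))
}

demo_text <- c(
    'do you like it?  But I hate really bad dogs',
    'I am the best friend.',
    'Do you really like it?  I\'m not a fan'
)

lengths(naive_split(demo_text))
# 2 1 2
```

Splitting once up front and reusing the result is the point of the preferred workflow: the parsing cost is paid a single time rather than on every call.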

    mytext <- c(
        'do you like it?  But I hate really bad dogs',
        'I am the best friend.',
        'Do you really like it?  I\'m not a fan'
    )

    mytext <- get_sentences(mytext)
    sentiment(mytext)

    ##    element_id sentence_id word_count  sentiment
    ## 1:          1           1          4  0.2500000
    ## 2:          1           2          6 -1.8677359
    ## 3:          2           1          5  0.5813777
    ## 4:          3           1          5  0.4024922
    ## 5:          3           2          4  0.0000000

To aggregate by element (column cell or vector element) use sentiment_by with by = NULL.

    mytext <- c(
        'do you like it?  But I hate really bad dogs',
        'I am the best friend.',
        'Do you really like it?  I\'m not a fan'
    )

    mytext <- get_sentences(mytext)
    sentiment_by(mytext)

    ##    element_id word_count       sd ave_sentiment
    ## 1:          1         10 1.497465    -0.8088680
    ## 2:          2          5       NA     0.5813777
    ## 3:          3          9 0.284605     0.2196345

To aggregate by grouping variables use sentiment_by using the by argument.

    (out <- with(
        presidential_debates_2012,
        sentiment_by(
            get_sentences(dialogue),
            list(person, time)
        )
    ))

    ##        person   time word_count        sd ave_sentiment
    ##  1:     OBAMA time 1       3599 0.2535006    0.12256892
    ##  2:     OBAMA time 2       7477 0.2509177    0.11217673
    ##  3:     OBAMA time 3       7243 0.2441394    0.07975688
    ##  4:    ROMNEY time 1       4085 0.2525596    0.10151917
    ##  5:    ROMNEY time 2       7536 0.2205169    0.08791018
    ##  6:    ROMNEY time 3       8303 0.2623534    0.09968544
    ##  7:   CROWLEY time 2       1672 0.2181662    0.19455290
    ##  8:    LEHRER time 1        765 0.2973360    0.15473364
    ##  9:  QUESTION time 2        583 0.1756778    0.03197751
    ## 10: SCHIEFFER time 3       1445 0.2345187    0.08843478

Tidy Approach

Or if you prefer a more tidy approach:

    library(magrittr)
    library(dplyr)

    presidential_debates_2012 %>%
        dplyr::mutate(dialogue_split = get_sentences(dialogue)) %$%
        sentiment_by(dialogue_split, list(person, time))

    ##        person   time word_count        sd ave_sentiment
    ##  1:     OBAMA time 1       3599 0.2535006    0.12256892
    ##  2:     OBAMA time 2       7477 0.2509177    0.11217673
    ##  3:     OBAMA time 3       7243 0.2441394    0.07975688
    ##  4:    ROMNEY time 1       4085 0.2525596    0.10151917
    ##  5:    ROMNEY time 2       7536 0.2205169    0.08791018
    ##  6:    ROMNEY time 3       8303 0.2623534    0.09968544
    ##  7:   CROWLEY time 2       1672 0.2181662    0.19455290
    ##  8:    LEHRER time 1        765 0.2973360    0.15473364
    ##  9:  QUESTION time 2        583 0.1756778    0.03197751
    ## 10: SCHIEFFER time 3       1445 0.2345187    0.08843478

Note that you can skip the dplyr::mutate step by using get_sentences on a data.frame as seen below:

    presidential_debates_2012 %>%
        get_sentences() %$%
        sentiment_by(dialogue, list(person, time))

    ##        person   time word_count        sd ave_sentiment
    ##  1:     OBAMA time 1       3599 0.2535006    0.12256892
    ##  2:     OBAMA time 2       7477 0.2509177    0.11217673
    ##  3:     OBAMA time 3       7243 0.2441394    0.07975688
    ##  4:    ROMNEY time 1       4085 0.2525596    0.10151917
    ##  5:    ROMNEY time 2       7536 0.2205169    0.08791018
    ##  6:    ROMNEY time 3       8303 0.2623534    0.09968544
    ##  7:   CROWLEY time 2       1672 0.2181662    0.19455290
    ##  8:    LEHRER time 1        765 0.2973360    0.15473364
    ##  9:  QUESTION time 2        583 0.1756778    0.03197751
    ## 10: SCHIEFFER time 3       1445 0.2345187    0.08843478

Plotting

Plotting at Aggregated Sentiment

plot(out)

Plotting at the Sentence Level

The plot method for the class sentiment uses syuzhet's get_transformed_values combined with ggplot2 to make a reasonable, smoothed plot for the duration of the text based on percentage, allowing for comparison between plots of different texts. This plot gives the overall shape of the text's sentiment. The user can see syuzhet::get_transformed_values for more details.

plot(uncombine(out))

Making and Updating Dictionaries

It is pretty straightforward to make or update a dictionary (polarity or valence shifter). To create a key from scratch the user needs to create a 2 column data.frame, with words on the left and values on the right (see ?lexicon::hash_sentiment_jockers_rinker & ?lexicon::hash_valence_shifters for what the values mean). Note that the words need to be lower cased. Here I show an example data.frame ready for key conversion:

    set.seed(10)
    key <- data.frame(
        words = sample(letters),
        polarity = rnorm(26),
        stringsAsFactors = FALSE
    )

This is not yet a key. sentimentr provides the is_key function to test if a table is a key.

    is_key(key)

    ## [1] FALSE

It still needs to be data.table-ified. The as_key function coerces a data.frame to a data.table with the left column named x and the right column named y. It also checks the key against another key to make sure there is no overlap using the comparison argument. By default as_key checks against valence_shifters_table, assuming the user is creating a sentiment dictionary. If the user is creating a valence shifter key then a sentiment key needs to be passed to comparison instead and the argument sentiment = FALSE needs to be set. Below I coerce key to a dictionary that sentimentr can use.

mykey <- as_key(key)

Now we can check that mykey is a usable dictionary:

    is_key(mykey)

    ## [1] TRUE

The key is ready for use:

    sentiment_by("I am a human.", polarity_dt = mykey)

    ##    element_id word_count sd ave_sentiment
    ## 1:          1          4 NA    -0.7594893

You can see the values of a key that correspond to a word using data.table syntax:

    mykey[c("a", "b")][[2]]

    ## [1] -0.2537805 -0.1951504

Updating (adding or removing terms) a key is also useful. The update_key function allows the user to add or drop terms via the x (add a data.frame) and drop (drop a term) arguments. Below I drop the "a" and "h" terms (notice there are now 24 rows rather than 26):

    mykey_dropped <- update_key(mykey, drop = c("a", "h"))
    nrow(mykey_dropped)

    ## [1] 24

    sentiment_by("I am a human.", polarity_dt = mykey_dropped)

    ##    element_id word_count sd ave_sentiment
    ## 1:          1          4 NA     -0.632599

Next I add the terms "dog" and "cat" as a data.frame with sentiment values:

    mykey_added <- update_key(mykey, x = data.frame(x = c("dog", "cat"), y = c(1, -1)))

    ## Warning in as_key(x, comparison = comparison, sentiment = sentiment): Column 1 was a factor...
    ## Converting to character.

    nrow(mykey_added)

    ## [1] 28

    sentiment("I am a human. The dog.  The cat", polarity_dt = mykey_added)

    ##    element_id sentence_id word_count  sentiment
    ## 1:          1           1          4 -0.7594893
    ## 2:          1           2          2  0.7071068
    ## 3:          1           3          2 -0.7071068

Annie Swafford's Examples

Annie Swafford critiqued Jockers' approach to sentiment and gave the following examples of sentences (ase for Annie Swafford example). Here I test each of Jockers' 4 dictionary approaches (syuzhet, Bing, NRC, Afinn), his Stanford wrapper (note I use my own GitHub Stanford wrapper package based off of Jockers' approach as it works more reliably on my own Windows machine), the RSentiment package, the lookup based SentimentAnalysis package, the meanr package (written in C level code), and my own algorithm with default combined Jockers (2017) & Rinker's augmented Hu & Liu (2004) polarity lexicons as well as Hu & Liu (2004) and Baccianella, Esuli and Sebastiani's (2010) SentiWord lexicons available from the lexicon package.

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load_gh("trinker/sentimentr", "trinker/stansent", "sfeuerriegel/SentimentAnalysis", "wrathematics/meanr")
    pacman::p_load(syuzhet, qdap, microbenchmark, RSentiment)

    ase <- c(
        "I haven't been sad in a long time.",
        "I am extremely happy today.",
        "It's a good day.",
        "But suddenly I'm only a little bit happy.",
        "Then I'm not happy at all.",
        "In fact, I am now the least happy person on the planet.",
        "There is no happiness left in me.",
        "Wait, it's returned!",
        "I don't feel so bad after all!"
    )

    syuzhet <- setNames(as.data.frame(lapply(c("syuzhet", "bing", "afinn", "nrc"),
        function(x) get_sentiment(ase, method=x))), c("jockers", "bing", "afinn", "nrc"))

    SentimentAnalysis <- apply(analyzeSentiment(ase)[c('SentimentGI', 'SentimentLM', 'SentimentQDAP') ], 2, round, 2)
    colnames(SentimentAnalysis) <- gsub('^Sentiment', "SA_", colnames(SentimentAnalysis))

    left_just(data.frame(
        stanford = sentiment_stanford(ase)[["sentiment"]],
        sentimentr_jockers_rinker = round(sentiment(ase, question.weight = 0)[["sentiment"]], 2),
        sentimentr_jockers = round(sentiment(ase, lexicon::hash_sentiment_jockers, question.weight = 0)[["sentiment"]], 2),
        sentimentr_huliu = round(sentiment(ase, lexicon::hash_sentiment_huliu, question.weight = 0)[["sentiment"]], 2),
        sentimentr_sentiword = round(sentiment(ase, lexicon::hash_sentiment_sentiword, question.weight = 0)[["sentiment"]], 2),
        RSentiment = calculate_score(ase),
        SentimentAnalysis,
        meanr = score(ase)[['score']],
        syuzhet,
        sentences = ase,
        stringsAsFactors = FALSE
    ), "sentences")

    [1] "Processing sentence: i have not been sad in a long time"
    [1] "Processing sentence: i am extremely happy today"
    [1] "Processing sentence: its a good day"
    [1] "Processing sentence: but suddenly im only a little bit happy"
    [1] "Processing sentence: then im not happy at all"
    [1] "Processing sentence: in fact i am now the least happy person on the planet"
    [1] "Processing sentence: there is no happiness left in me"
    [1] "Processing sentence: wait its returned"
    [1] "Processing sentence: i do not feel so bad after all"

      stanford sentimentr_jockers_rinker sentimentr_jockers sentimentr_huliu
    1     -0.5                      0.18               0.18             0.35
    2        1                       0.6                0.6              0.8
    3      0.5                      0.38               0.38              0.5
    4     -0.5                         0                  0                0
    5     -0.5                     -0.31              -0.31            -0.41
    6     -0.5                      0.04               0.04             0.06
    7     -0.5                     -0.28              -0.28            -0.38
    8        0                     -0.14              -0.14                0
    9     -0.5                      0.28               0.28             0.38
      sentimentr_sentiword RSentiment SA_GI SA_LM SA_QDAP meanr jockers bing
    1                 0.18          1 -0.25     0   -0.25    -1    -0.5   -1
    2                 0.65          1  0.33  0.33       0     1    0.75    1
    3                 0.32          1   0.5   0.5     0.5     1    0.75    1
    4                    0          0     0  0.25    0.25     1    0.75    1
    5                -0.56         -1     1     1       1     1    0.75    1
    6                 0.11          1  0.17  0.17    0.33     1    0.75    1
    7                -0.05          1   0.5   0.5     0.5     1    0.75    1
    8                -0.14         -1     0     0       0     0   -0.25    0
    9                 0.24          0 -0.33 -0.33   -0.33    -1   -0.75   -1
      afinn nrc sentences
    1    -2   0 I haven't been sad in a long time.
    2     3   1 I am extremely happy today.
    3     3   1 It's a good day.
    4     3   1 But suddenly I'm only a little bit happy.
    5     3   1 Then I'm not happy at all.
    6     3   1 In fact, I am now the least happy person on the planet.
    7     2   1 There is no happiness left in me.
    8     0  -1 Wait, it's returned!
    9    -3  -1 I don't feel so bad after all!

Also of interest is the computational time used by each of these methods. To demonstrate this I replicated Annie's examples 100 times and ran microbenchmark for a few iterations (Stanford takes so long I didn't extend to more). Note that if a text needs to be broken into sentence parts, syuzhet has the get_sentences function that uses the openNLP package; this is a time expensive task. sentimentr uses a much faster regex based approach that is nearly as accurate in parsing sentences with a much lower computational time. We see that RSentiment and Stanford take the longest time while sentimentr and syuzhet are comparable depending upon lexicon used. meanr is lightning fast. SentimentAnalysis is a bit slower than other methods but is returning 3 scores from 3 different dictionaries. I do not test RSentiment because it causes an out of memory error.

    ase_100 <- rep(ase, 100)

    stanford <- function() {sentiment_stanford(ase_100)}

    sentimentr_jockers_rinker <- function() sentiment(ase_100, lexicon::hash_sentiment_jockers_rinker)
    sentimentr_jockers <- function() sentiment(ase_100, lexicon::hash_sentiment_jockers)
    sentimentr_huliu <- function() sentiment(ase_100, lexicon::hash_sentiment_huliu)
    sentimentr_sentiword <- function() sentiment(ase_100, lexicon::hash_sentiment_sentiword)

    RSentiment <- function() calculate_score(ase_100)

    SentimentAnalysis <- function() analyzeSentiment(ase_100)

    meanr <- function() score(ase_100)

    syuzhet_jockers <- function() get_sentiment(ase_100, method="syuzhet")
    syuzhet_binn <- function() get_sentiment(ase_100, method="bing")
    syuzhet_nrc <- function() get_sentiment(ase_100, method="nrc")
    syuzhet_afinn <- function() get_sentiment(ase_100, method="afinn")

    microbenchmark(
        stanford(),
        sentimentr_jockers_rinker(),
        sentimentr_jockers(),
        sentimentr_huliu(),
        sentimentr_sentiword(),
        #RSentiment(),
        SentimentAnalysis(),
        syuzhet_jockers(),
        syuzhet_binn(),
        syuzhet_nrc(),
        syuzhet_afinn(),
        meanr(),
        times = 3
    )

    Unit: milliseconds
                            expr          min           lq         mean
                      stanford() 20225.158418 20609.912899 23748.607689
     sentimentr_jockers_rinker()   283.271569   283.391307   285.273047
            sentimentr_jockers()   224.436569   228.487136   235.022980
              sentimentr_huliu()   255.438460   260.156352   261.994973
          sentimentr_sentiword()  1048.496476  1060.058681  1064.804513
             SentimentAnalysis()  4267.380620  4335.857740  4369.068442
               syuzhet_jockers()   342.764273   346.408800   349.115379
                  syuzhet_binn()   258.453721   267.449255   271.441450
                   syuzhet_nrc()   642.814135   648.150176   653.361347
                 syuzhet_afinn()   118.191289   120.576642   122.294740
                         meanr()     1.172578     1.317333     1.795786
           median          uq          max neval
     20994.667381 25510.33232 30025.997269     3
       283.511045   286.27379   289.036528     3
       232.537703   240.31619   248.094669     3
       264.874245   265.27323   265.672214     3
      1071.620886  1072.95853  1074.296176     3
      4404.334860  4419.91235  4435.489845     3
       350.053327   352.29093   354.528537     3
       276.444790   277.93532   279.425840     3
       653.486217   658.63495   663.783689     3
       122.961995   124.34647   125.730937     3
         1.462088     2.10739     2.752692     3

Comparing sentimentr, syuzhet, meanr, and Stanford

The accuracy of an algorithm weighs heavily into the decision as to what approach to take in sentiment detection. I have selected algorithms/packages that stand out as fast and/or accurate to perform benchmarking on actual data. The syuzhet package provides multiple dictionaries with a general algorithm to compute sentiment scores. Likewise, sentimentr uses a general algorithm but uses the lexicon package's dictionaries. syuzhet provides 4 dictionaries while sentimentr uses lexicon's 9 dictionaries and can easily be extended with other dictionaries, including the 4 dictionaries from the syuzhet package. meanr is a very fast algorithm. The following visualization provides the accuracy of these approaches in comparison to Stanford's Java based implementation of sentiment detection. The visualization is generated from testing on three reviews data sets from Kotzias, Denil, De Freitas, & Smyth (2015). These authors utilized the three 1000 element data sets from:

amazon.com
imdb.com
yelp.com

The data sets are hand scored as either positive or negative. The testing here uses Mean Directional Accuracy (MDA) and merely matches the sign of the algorithm to the human coded output to determine accuracy rates.
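Sign matching of this kind takes only a line of base R. The vectors below are made-up toy values, not results from the review data sets, and the sketch assumes that a predicted score of zero counts as a miss against a positive or negative human label.

```r
# Toy values (illustrative only): algorithm scores and human labels
predicted <- c(0.4, -0.2, 0, 0.1, -0.6)
human     <- c(1, -1, -1, 1, 1)   # +1 = positive, -1 = negative

# Share of cases where the predicted sign matches the human coded sign
mda <- mean(sign(predicted) == sign(human))
mda  # 0.6
```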

Kotzias, D., Denil, M., De Freitas, N., & Smyth, P. (2015). From group to individual labels using deep features. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 597-606. http://mdenil.com/media/papers/2015-deep-multi-instance-learning.pdf

The bar graph on the left shows the accuracy rates for the various sentiment set-ups in the three review contexts. The rank plot on the right shows how the rankings for the methods varied across the three review contexts.

The takeaway here seems to be that, unsurprisingly, Stanford's algorithm consistently outscores sentimentr, syuzhet, and meanr. The sentimentr approach loaded with Jockers' custom syuzhet dictionary is a top pick for speed and accuracy. In addition to Jockers' custom dictionary the bing dictionary also performs well within both the syuzhet and sentimentr algorithms. Generally, the sentimentr algorithm outperforms syuzhet when their dictionaries are comparable.

It is important to point out that this is a small sample data set that covers a narrow range of uses for sentiment detection. Jockers' syuzhet was designed to be applied across book chunks and it is, to some extent, unfair to test it out of this context. Still this initial analysis provides a guide that may be of use for selecting the sentiment detection set up most applicable to the reader's needs.

The reader may access the R script used to generate this visual via:

    testing <- system.file("sentiment_testing/sentiment_testing.R", package = "sentimentr")
    file.copy(testing, getwd())

In the figure below we compare raw table counts as a heat map, plotting the predicted values from the various algorithms on the x axis versus the human scored values on the y axis.

Across all three contexts, notice that the Stanford coreNLP algorithm is better at:

Detecting negative sentiment as negative
Discrimination (i.e., reducing neutral assignments)

The Jockers, Bing, Hu & Liu, and Afinn dictionaries all do well with regard to not assigning negative scores to positive statements, but perform less well in the reverse, often assigning positive scores to negative statements, though Jockers' dictionary outperforms the others. We can now see that the reason for the NRC's poorer performance in accuracy rate above is its inability to discriminate. The Sentiword dictionary does well at discriminating (like Stanford's coreNLP) but lacks accuracy. We can deduce two things from this observation:

Larger dictionaries discriminate better (Sentiword [n = 20,093] vs. Hu & Liu [n = 6,874])
The Sentiword dictionary may have words with reversed polarities

A reworking of the Sentiword dictionary may yield better results for a dictionary lookup approach to sentiment detection, potentially improving on discrimination and accuracy.

The reader may access the R script used to generate this visual via:

    testing2 <- system.file("sentiment_testing/raw_results.R", package = "sentimentr")
    file.copy(testing2, getwd())

Text Highlighting

The user may wish to see the output from sentiment_by line by line with positive/negative sentences highlighted. The highlight function wraps a sentiment_by output to produce a highlighted HTML file (positive = green; negative = pink). Here we look at three random reviews from Hu and Liu's (2004) Cannon G3 Camera Amazon product reviews.

    library(magrittr)
    library(dplyr)
    set.seed(2)

    hu_liu_cannon_reviews %>%
        filter(review_id %in% sample(unique(review_id), 3)) %>%
        mutate(review = get_sentences(text)) %$%
        sentiment_by(review, review_id) %>%
        highlight()