NEWS
Versioning
Releases will be numbered with the following semantic versioning format:
<major>.<minor>.<patch>
And constructed with the following guidelines:
- Breaking backward compatibility bumps the major (and resets the minor
and patch)
- New additions without breaking backward compatibility bump the minor
(and reset the patch)
- Bug fixes and misc changes bump the patch
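The three rules above can be sketched as a small function. This is an illustration in Python, not part of the package; the name `bump` and its arguments are hypothetical.

```python
def bump(version: str, change: str) -> str:
    """Return the next <major>.<minor>.<patch> string for a given kind of change.

    `change` is one of "major", "minor", or "patch" (hypothetical labels
    for the three guidelines listed above).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Breaking change: bump major, reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward compatible addition: bump minor, reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Bug fix or misc change: bump patch only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


after_breaking = bump("1.0.1", "major")
after_feature = bump("2.4.2", "minor")
after_fix = bump("2.0.0", "patch")
```

Here `after_breaking` is "2.0.0", `after_feature` is "2.5.0", and `after_fix` is "2.0.1".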
sentimentr 2.5.0 - 2.6.1
BUG FIXES
- plot returned an error for sentiment objects created by
sentiment.get_sentences.data.frame due to the class assignments of the
output ('sentiment' was not assigned as a class) and thus plot.sentiment
was not called.
- combine_data contained a bug in which data sets with extra columns were not
combined and resulted in an error (see #94).
- If a dataset was passed to get_sentences() that had a column named sentiment
and was then passed to sentiment_by(), the sentiment from the original data
set was returned as ave_sentiment, not the sentimentr computed value.
NEW FEATURES
- profanity added as a means to assess the use of profanity in text.
- extract_profanity_terms added to extract profanity terms from text.
- The remaining four Hu & Liu data sets (see
http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip) have been added in
addition to the Cannon reviews data set. The family of sentiment tagged data
from Hu & Liu now includes: "hu_liu_apex_reviews", "hu_liu_cannon_reviews",
"hu_liu_jukebox_reviews", "hu_liu_nikon_reviews", & "hu_liu_nokia_reviews".
CHANGES
- The cannon_reviews data set has been renamed to hu_liu_cannon_reviews to be
consistent with the other hu_liu_ data sets that have been added. This data
set is also now cleaner and excludes Hu & Liu's original categories, which
were sometimes still visible. Cleaning includes better capitalization and
removal of spaces before punctuation to look less normalized. Additionally,
the number column is now called reviewer_id to convey what the data actually
is.
sentimentr 2.4.0 - 2.4.2
BUG FIXES
- In sentiment, when there was a larger de-amplifier, negator, & polarized word
all in the same chunk, the sentiment would equal 0. This occurred because
de-amplifier weights below -1 were capped at a lower bound of -1. To compute
the weight for de-amplifiers, 1 is added to this value and the result is
multiplied by the polarity score. Adding 1 and -1 resulted in
0 * polarity = 0. This was spotted thanks to Ashley Wysocki (see #80). In this
case, Ashley's example involved an adversative conjunction, which is treated
as an extreme amplifier and which, when combined with a negator, is treated
as a de-amplifier. This resulted in a de-amplifier score of -1. De-amplifiers
are now capped at -.999 rather than -1 to avoid this.
- Chunks containing adversative conjunctions were supposed to act in the
following way: "An adversative conjunction before the polarized word...
up-weights the cluster...An adversative conjunction after the polarized word
down-weights the cluster...". A bug was introduced in which up-weighting
happened to the first clause as well. This bug has been reversed. See #85.
- The README contained a reference to the magritrr rather than the magrittr
package.
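The arithmetic behind the de-amplifier fix can be checked with a short sketch. It is written in Python for illustration and models only the relationship stated in the entry (cap the weight, add 1, multiply by the polarity score). The function name and example values are hypothetical, and the real sentiment function involves more terms than are shown here.

```python
def deamplified_polarity(deamp_weight: float, polarity: float, lower_bound: float) -> float:
    """Apply a capped de-amplifier weight to a polarity score.

    Simplified model of the entry above: weight = 1 + max(deamp_weight, lower_bound).
    """
    capped = max(deamp_weight, lower_bound)  # cap the de-amplifier from below
    return (1 + capped) * polarity


# Hypothetical chunk: a strong de-amplifier (-1.4) and a polarity score of 0.75.
old_result = deamplified_polarity(-1.4, 0.75, lower_bound=-1.0)    # old cap
new_result = deamplified_polarity(-1.4, 0.75, lower_bound=-0.999)  # new cap
```

With the old cap the weight is 1 + (-1) = 0, so `old_result` is zero whatever the polarity score is. With the cap at -.999 the weight is a small positive number, so `new_result` is small but keeps the sign of the polarity score.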
CHANGES
- highlight now writes the .html file to the temp directory rather than the
working directory by default.
sentimentr 2.3.0 - 2.3.2
BUG FIXES
- The README and highlight function documentation both contained code that
produced an error. This is because all the data sets within sentimentr
have been normalized to include the same columns, including cannon_reviews.
The code that caused the error referred to a column number which no longer
existed in the data set. This column now exists in cannon_reviews again.
Spotted thanks to Tim Fisher.
CHANGES
Maintenance release to bring package up to date with the lexicon package API changes.
sentimentr 2.1.0 - 2.2.3
BUG FIXES
- sentiment contained a bug that caused sentences with multiple polarized
words and comma/semicolon/colon breaks to inappropriately replicate rows too
many times (a recycling error). This in turn caused the same polarized word to
be counted multiple times, resulting in very extreme polarity values. This was
spotted by Lilly Wang.
- validate_sentiment contained an error in the documentation; the predicted
and actual data were put into the wrong arguments for the first example.
NEW FEATURES
- The default sentiment lookup table used within sentimentr is now
lexicon::hash_sentiment_jockers_rinker, a combined and augmented version of
lexicon::hash_sentiment_jockers (Jockers, 2017) & Rinker's augmented
lexicon::hash_sentiment_huliu (Hu & Liu, 2004) sentiment lookup tables.
- Eight new sentiment scored data sets added: kaggle_movie_reviews,
nyt_articles, hotel_reviews, crowdflower_self_driving_cars,
crowdflower_products, crowdflower_deflategate, crowdflower_weather, &
course_evaluations for testing and exploration.
- replace_emoji and replace_emoji_identifier re-exported from the textclean
package for replacing emojis with word equivalents or an identifier token
that can be detected by the lexicon::hash_sentiment_emoji polarity table
within the sentiment family of functions.
MINOR FEATURES
- sentiment picks up the neutral.nonverb.like argument. This allows the
user to treat specific non-verb uses of the word 'like' as neutral since 'like'
as a verb is usually when the word is polarized.
- combine_data added to easily combine trusted sentimentr sentiment
scored data sets.
CHANGES
- The sentiment data sets have been reformatted to conform to one another. This
means columns have been renamed, ratings have been rescaled so that zero is
neutral, and columns other than sentiment score and text have been removed.
This makes it easier to compare and combine data sets.
- update_key now allows a data.table object for x, meaning lexicon
hash_sentiment_xxx polarity tables can be combined. This is particularly
useful for combining hash_sentiment_emojis with other polarity tables.
sentimentr 2.0.1
BUG FIXES
- get_sentences assigned the class to the data.frame when a data.frame was
passed but not to the text column, meaning the individual column could not be
passed to sentiment or sentiment_by without having sentence boundary
detection re-done. This has been fixed. See #53.
sentimentr 1.0.1 - 2.0.0
BUG FIXES
- sentiment_attributes gave an incorrect count of words. This has been fixed
and number of tokens is reported as well now. Thanks to Siva Kottapalli for
catching this (see #42).
- extract_sentiment_terms did not return positive, negative, and/or neutral
columns if these terms didn't exist in the data passed to text.var, making it
difficult to use for programming. Thanks to Siva Kottapalli for
catching this (see #41).
- general_rescale would allow keep.zero when lower >= 0, meaning the
original mid values were rescaled lower than the lowest values.
MINOR FEATURES
- validate_sentiment picks up Mean Directional Accuracy (MDA) and Mean
Absolute Rescaled Error (MARE) measures of accuracy. These values are printed
for the validate_sentiment object and can be accessed via attributes.
CHANGES
- Many sentimentr functions performed sentence splitting (sentence boundary
disambiguation) internally. This (1) made the code difficult to maintain,
(2) slowed the functions down and potentially increased memory overhead, and
(3) required a repeated cost of splitting the text every time one of these
functions was called. Sentence splitting is now handled via the textshape
package as the backend for get_sentences. It is recommended that the user
splits their data into sentences prior to using the sentiment functions. Using
a raw character vector still works but results in a warning. While this won't
break any code it may cause errors and is a fundamental shift in workflow,
thus the major bump to 2.0.0.
sentimentr 0.5.0 - 1.0.0
BUG FIXES
- Previously update_polarity_table and update_valence_shifter_table were
accidentally not exported. This has been corrected.
NEW FEATURES
- downweighted_zero_average, average_weighted_mixed_sentiment, and
average_mean added for use with sentiment_by to reweight
zero and negative values in the group by averaging (depending upon the
assumptions the analyst is making).
- general_rescale added as a means to rescale sentiment scores in a
generalized way.
- validate_sentiment added as a means to assess sentiment model performance
against known sentiment scores.
- sentiment_attributes added as a means to assess the rate that sentiment
attributes (attributes about polarized words and valence shifters) occur and
co-occur.
MINOR FEATURES
- sentiment_by becomes a method function that now accepts sentiment_by
and sentiment objects for the text.var argument in addition to the default
character.
IMPROVEMENTS
- sentiment_by picks up an averaging.function argument for performing the
group by averaging. The default uses downweighted_zero_average, which
downweights zero values in the averaging (making them have less impact). To
get the old behavior back, pass average_mean as the averaging.function
argument. There is also an average_weighted_mixed_sentiment available, which
upweights negative sentences when the analyst suspects the speaker is likely
to surround negatives with positives (mixed) as a polite social convention
while the affective state is still negative.
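The effect of downweighting zeros can be illustrated with a small Python sketch. The function names are hypothetical stand-ins, not the package's R functions, and the denominator used for the downweighted version (count of non-zero scores plus the square root of log(1 + count of zeros)) is an assumption made for illustration; the package source holds the exact definition.

```python
import math


def plain_mean(scores):
    """Ordinary arithmetic mean: every sentence counts equally."""
    return sum(scores) / len(scores)


def zero_downweighted_mean(scores):
    """Mean in which zero scores add less than 1 each to the denominator.

    ASSUMPTION: denominator = n_nonzero + sqrt(log(1 + n_zero)).
    """
    n_nonzero = sum(1 for s in scores if s != 0)
    n_zero = len(scores) - n_nonzero
    return sum(scores) / (n_nonzero + math.sqrt(math.log(1 + n_zero)))


# Hypothetical sentence scores for one group: two polarized, two neutral.
scores = [0.5, 0.5, 0.0, 0.0]
ordinary = plain_mean(scores)
downweighted = zero_downweighted_mean(scores)
```

Under this assumption the two neutral sentences pull the group average toward zero less than they do in the ordinary mean, so `downweighted` is larger than `ordinary`. When a group has no zero scores the two functions agree.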
CHANGES
- The hash keys polarity_table, valence_shifters_table, and sentiword have
been moved to the lexicon (https://github.com/trinker/lexicon) package in
order to make them more modular and maintainable. They have been renamed to
hash_sentiment_huliu, hash_valence_shifters, and hash_sentiment_sentiword.
- The replace_emoticon, replace_grade, and replace_rating functions have
been moved from sentimentr to the textclean package as these are
cleaning functions. This makes the functions more modular and generalizable
to all types of text cleaning. These functions are still imported and
exported by sentimentr.
- but.weight argument in sentiment function renamed to adversative.weight
to better describe the function with a linguistics term.
- sentimentr now uses the Jockers (2017) dictionary by default rather than the
Hu & Liu (2004). This may result in breaks to backwards compatibility,
hence the major version bump (1.0.0).
sentimentr 0.3.0 - 0.4.0
BUG FIXES
- Missing documentation for 'but' conjunctions added.
Spotted by Richard Watson (see #23).
NEW FEATURES
- extract_sentiment_terms added to enable users to extract the sentiment terms
from text as polarity would return in the qdap package.
MINOR FEATURES
- update_polarity_table and update_valence_shifter_table added to abstract
away thinking about the comparison argument to update_key.
sentimentr 0.2.0 - 0.2.3
BUG FIXES
- Commas were not handled properly in some cases. This has been fixed (see #7).
- highlight parsed sentences differently than the main sentiment function,
resulting in an error when original.text was supplied that contained a colon
or semi-colon. Spotted by Patrick Carlson (see #2).
MINOR FEATURES
- as_key and update_key now coerce the first column of the x argument
data.frame to lower case and warn if capital letters are found.
IMPROVEMENTS
- A section on creating and updating dictionaries was added to the README:
https://github.com/trinker/sentimentr#making-and-updating-dictionaries
- plot.sentiment_by no longer color codes by grouping variables. This was
distracting and has been removed. A jitter + red average sentiment + boxplot
visual representation is used.
CHANGES
- Default sentiment and valence shifters get the following additions:
  - polarity_table: "excessively", 'overly', 'unduly', 'too much', 'too many',
  'too often', 'i wish', 'too good', 'too high', 'too tough'
  - valence_shifter_table: "especially"
sentimentr 0.1.0 - 0.1.3
BUG FIXES
- get_sentences converted to lower case too early in the regex parsing,
resulting in missed sentence boundary detection. This has been corrected.
- highlight failed for some occasions when using original.text because the
splitting algorithm for sentiment was different. sentiment's split algorithm
now matches and is more accurate but at the cost of speed.
NEW FEATURES
- emoticons dictionary added. This is a simple dataset containing common
emoticons (adapted from Popular Emoticon List).
- replace_emoticon function added to replace emoticons with word equivalents.
- get_sentences2 added to allow for users that may want to get sentences from
text and retain case and non-sentence boundary periods. This should be
preferable in such instances where these features are deemed important to the
analysis at hand.
- highlight added to allow positive/negative text highlighting.
- cannon_reviews data set added containing Amazon product reviews for the
Cannon G3 Camera compiled by Hu and Liu (2004).
- replace_ratings function + ratings data set added to replace ratings.
- polarity_table gets an upgrade with new positive and negative words to
improve accuracy.
- valence_shifters_table picks up a few non-traditional negators. Full list
includes: "could have", "would have", "should have", "would be",
"would suggest", "strongly suggest".
- is_key and update_key added to test and easily update keys.
- grades dictionary added. This is a simple dataset containing common
grades and word equivalents.
- replace_grade function added to replace grades with word equivalents.
IMPROVEMENTS
- plot.sentiment now uses ... to pass parameters to syuzhet's
get_transformed_values.
- as_key, is_key, & update_key all pick up a logical sentiment argument
that allows keys that have character y columns (2nd column).
sentimentr 0.0.1
This package is designed to quickly calculate text polarity sentiment at the
sentence level and optionally aggregate by rows or grouping variable(s).