- Notifications
You must be signed in to change notification settings - Fork85
Don't sort by sentiment in sentiment_by#13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Thanks for the PR and interest in sentimentr... Still debating on this. I use sentimentr for analysis and usually care most about the order of the output. Just like with plots, tables shine when they are ordered by the variable of interest. I think you have a specific reason of why it's screwing something up for you but am unsure what that is. I would appreciate a stronger argument for changing the behavior. I'd be more inclined to add this as an argument but retain the default behavior. |
I don't have a "stronger" argument than the one I have already said. If I do processing by a factor variable G, then I expect my output to be ordered by original order of data or by the order of the grouped factor. That's what all group_by processor do in R. The fact that the output is sorted by something else is unexpected unexpected. Groups are almost certainly have a meaningful order so, I think, preserving that order is desirable independently of the context. In my case the order is given by time. I have thousands sentences which are grouped by messages which in turn are ordered in time. Sorting it by sentiment doesn't make much sense AFAIC.
It's up to you to decide of course, but I think sorting is an 'extra" feature. It adds up to the default computation, so it should an extra optional option. Otherwise users would need to revert that sorting computation. |
I have been bitten by this twice this week. You're right. |
Usually grouping operators preserve the order of the groups. In this sense the behavior of sentiment_by is non-intuitive. Especially with data.table
bywould better have the same semantics as in data.table.While sorting by sentiment is potentially useful in some contexts, there is virtually no advantage for it with current plotting functionality. Plotting by grouping values is probably more meaningful in majority of the cases.
If they so desire, users can always sort by sentiment with a simple
setkey. Sorting by keys is more verbose and results in unnecessary groups names replication.For these reasons I would propose to remove this "feature".