Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

GH-43960: [R] fixstr_sub binding to properly handle negativeend values#44141

Merged
thisisnic merged 2 commits intoapache:mainfrom
coussens:fix-str_sub
Sep 21, 2024
Merged

GH-43960: [R] fixstr_sub binding to properly handle negativeend values#44141
thisisnic merged 2 commits intoapache:mainfrom
coussens:fix-str_sub

Conversation

@coussens
Copy link
Contributor

@coussenscoussens commentedSep 17, 2024
edited by github-actionsbot
Loading

First-time contributor here, so let me know where I can improve!

Rationale for this change

Thestr_sub binding in arrow was not handling negativeend values properly. The problem was two-fold:

  1. Whenend values were negative (and less than thestart value, which might be positive),str_sub would improperly return an empty string.
  2. Whenend values were < -1 but theend position was still to the right of thestart position,str_sub failed to return the final character in the substring, since it did not account for the fact thatend is counted exclusively in the underlying C++ function (utf8_slice_codeunits), but inclusively in R.

See discussion/examples at#43960 for details.

What changes are included in this PR?

  1. The removal of lines fromr/R/dplyr-funcs-string.R that previously setend= 0 whenstart < end, which meant if the user was counting backwards from the end of the string (with a negativeend value), an empty string would [wrongly] be returned. It appears that the case that the previous code was trying to address is already handled properly by the underlying C++ function (utf8_slice_codeunits).
  2. Addition of lines tor/R/dplyr-funcs-string.R in order to account the difference in between R's inclusiveend and C++'s exclusiveend whenend is negative.
  3. The addition of a test (described below) tor/tests/testthat/test-dplyr-funcs-string.R to test for these cases.

Are these changes tested?

Yes, I ran all tests inr/tests/testthat/test-dplyr-funcs-string.R, including one which I added (see attached commit), which explicitly tests the case whereend is negative (-3) and less than thestart value (1). This also tests the case whereend < -1 and to the right of thestart position.

Are there any user-facing changes?

No.

This PR contains a "Critical Fix". Previously:

  • Whenend values were negative (and less than thestart value, which might be positive),str_sub would improperly return an empty string.
  • Whenend values were < -1 but theend position was still to the right of thestart position,str_sub failed to return the final character in the substring, since it did not account for the fact thatend is counted exclusively in the underlying C++ function (utf8_slice_codeunits), but inclusively in R.

@github-actions
Copy link

⚠️ GitHub issue#43960has been automatically assigned in GitHub to PR creator.

@thisisnic
Copy link
Member

This looks great - thanks for updating this@coussens, great to see you join the Arrow dev community!

I've just approved the workflow runs - once this passes, I'm happy to merge. Feel free to ping me as I'm a bit slow checking my notifications these days!

coussens reacted with hooray emoji

@github-actionsgithub-actionsbot added awaiting changesAwaiting changes and removed awaiting reviewAwaiting review labelsSep 20, 2024
Copy link
Member

@thisisnicthisisnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks so much for updating this@coussens!

coussens reacted with laugh emoji
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit37f62d0.

There were no benchmark performance regressions. 🎉

Thefull Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

@amoebaamoeba removed the awaiting mergeAwaiting merge labelOct 22, 2024
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@thisisnicthisisnicthisisnic approved these changes

@jonkeanejonkeaneAwaiting requested review from jonkeanejonkeane is a code owner

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@coussens@thisisnic@amoeba

[8]ページ先頭

©2009-2026 Movatter.jp