Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

BUG: Raise MergeError when suffixes result in duplicate column names …#61422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
Farsidetfs wants to merge1 commit intopandas-dev:main
base:main
Choose a base branch
Loading
fromFarsidetfs:fix-merge-suffixes-61402

Conversation

Farsidetfs
Copy link

@FarsidetfsFarsidetfs commentedMay 9, 2025
edited
Loading

closes#61402

Allcode checks passed.
Addedtype annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch 2 times, most recently fromfe71e1a toc9bfc5aCompareMay 10, 2025 02:48
@Farsidetfs
Copy link
Author

pre-commit.ci autofix

pre-commit-ci[bot] reacted with rocket emoji

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch fromdc1bb47 toad744dfCompareMay 10, 2025 05:54
@chilin0525
Copy link
Contributor

Thanks for your contribution!

Just a quick note: you don't need to writeGH#61402 in the PR description — simply using#61402 in PR description, is enough, GitHub will automatically link it 😀.
Also, since this PR addresses a bug, please make sure to:

  • Add a unit test that covers this case
  • Include an entry in thedoc/source/whatsnew/vx.y.z.rst file to document your fix

For reference, you can check the contributing guidelines here:https://pandas.pydata.org/docs/development/contributing_codebase.html#documenting-your-code

nikaltipar reacted with thumbs up emoji

@Farsidetfs
Copy link
Author

Thanks for the pointers. I'll get those added in here soon. Trying to track down why the Unit Tests / Linux-32-bit(pull_request) is failing. I didn't change anything that should have effected Series, so it's kinda weird.

I also can't get the pytest to run normally on my dev yet either, so I haven't been able to fully replicate the failure locally yet. So, still a little more work to do here.

chilin0525 reacted with thumbs up emoji

@chilin0525
Copy link
Contributor

@Farsidetfs I believe the CI failure is not related to your changes. It appears to be caused by the cython version — pandas unit tests fail withcython==3.1.0. You may notice that the same test failures have occurred in several recent PRs as well. I already address the issue in#61423.

Copy link

@nikaltiparnikaltipar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Added a few comments from a bird's eye view. Thanks for the change.

Comment on lines 3064 to 3073
iflen(left_collisions)>0:
raiseMergeError(
"Passing 'suffixes' which cause duplicate columns "
f"{set(left_collisions)} is not allowed"
)
iflen(right_collisions)>0:
raiseMergeError(
"Passing 'suffixes' which cause duplicate columns "
f"{set(right_collisions)} is not allowed"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I would recommend you combine this into a common error to reduce repetition (bonus points for combining it with the pre-existing error just a few lines below)

chilin0525 reacted with thumbs up emoji
# Check for duplicates created by suffixes
left_collisions=llabels.intersection(right.difference(to_rename))
right_collisions=rlabels.intersection(left.difference(to_rename))
iflen(left_collisions)>0:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Doesn't justif not left_collisions.empty: work? Same for a similar check below.

@@ -3058,6 +3058,20 @@ def renamer(x, suffix: str | None):
llabels=left._transform_index(lrenamer)
rlabels=right._transform_index(rrenamer)

# Check for duplicates created by suffixes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

For new readers of this code, the comment might not be descriptive enough. While your code is supposed to find suffixes that are caused by duplicated would-be created columns across dataframes, there is an extra section that does a duplicate checking just below your new code (but would-be created columns due to duplicity within the same dataframe).

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch fromad744df to9a40cd0CompareMay 12, 2025 20:45
@Farsidetfs
Copy link
Author

pre-commit.ci autofix

pre-commit-ci[bot] reacted with rocket emoji

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch 3 times, most recently frome9efd0d to1476957CompareMay 12, 2025 22:35
@Farsidetfs
Copy link
Author

pre-commit.ci autofix

pre-commit-ci[bot] reacted with rocket emoji

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch fromcb23817 to6b82c85CompareMay 13, 2025 00:02
@Farsidetfs
Copy link
Author

pre-commit.ci autofix

pre-commit-ci[bot] reacted with rocket emoji

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch from5aff565 toe62c70fCompareMay 13, 2025 01:11
@Farsidetfs
Copy link
Author

pre-commit.ci autofix

pre-commit-ci[bot] reacted with rocket emoji

@Farsidetfs
Copy link
Author

@nikaltipar I think this should be ready now. Please let me know if I've missed anything. I took your advice and combined the two with slight modifications to improve efficiency using sets throughout rather than just convert at the end.

chilin0525 reacted with thumbs up emoji

@nikaltipar
Copy link

@nikaltipar I think this should be ready now. Please let me know if I've missed anything. I took your advice and combined the two with slight modifications to improve efficiency using sets throughout rather than just convert at the end.

Thanks for taking care of that,@Farsidetfs ! It looks good to me, no other comments from my side. Thanks for adding the unit-tests, too!

@chilin0525
Copy link
Contributor

@nikaltipar Could you rebase main branch to trigger CI again?

@nikaltipar
Copy link

@nikaltipar Could you rebase main branch to trigger CI again?

I am not able to, I'll have to wait for@Farsidetfs

chilin0525 reacted with thumbs up emoji

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch from7d3837a toc377804CompareMay 15, 2025 04:03
@Farsidetfs
Copy link
Author

Rebase complete. Thanks. Let me know if there's anything else needed.

chilin0525 reacted with hooray emoji

@nikaltipar
Copy link

#61402

Copy link
Member

@rhshadrachrhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looking good!

@@ -170,7 +170,7 @@ Groupby/resample/rolling

Reshaping
^^^^^^^^^
-
- Bug in:meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a:class:`MergeError` in such cases. (:issue:`61402`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you move this note to v3.0.0

df1=DataFrame({"col1": [1],"col2": [2]})
df2=DataFrame({"col1": [1],"col2": [2],"col2_dup": [3]})
withpytest.raises(MergeError,match="duplicate columns"):
merge(df1,df2,on="col1",suffixes=("_dup",""))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you parametrize this test with

@pytest.mark.parametrize("suffixes", [("_dup", ""), ("", "_dup")])

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@rhshadrach Done. Please let me know if I missed anything.

@rhshadrachrhshadrach added the ReshapingConcat, Merge/Join, Stack/Unstack, Explode labelMay 19, 2025
@rhshadrachrhshadrach added this to the3.0 milestoneMay 19, 2025
@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch 2 times, most recently fromb7b3a95 tobc62afeCompareMay 22, 2025 03:49
@@ -3058,17 +3058,19 @@ def renamer(x, suffix: str | None):
llabels=left._transform_index(lrenamer)
rlabels=right._transform_index(rrenamer)

dups=[]
dups=set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you keep this as alist and do the uniquenessset call whendups is not empty?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you elaborate why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Small performance optimization - if there are no duplicate labels (which is probably the most common case) we are doing unnecessary work hashing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Got it. I was thinking set was more efficient for operations, but since this should be a very rare case, it is wasted. I'll make the adjustment. Thanks for the clarity.

@FarsidetfsFarsidetfsforce-pushed thefix-merge-suffixes-61402 branch from5b816ce to9bc2b66CompareMay 23, 2025 19:20
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@mroeschkemroeschkemroeschke left review comments

@nikaltiparnikaltiparnikaltipar approved these changes

@rhshadrachrhshadrachAwaiting requested review from rhshadrach

Requested changes must be addressed to merge this pull request.

Assignees
No one assigned
Labels
BugReshapingConcat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Milestone
3.0
Development

Successfully merging this pull request may close these issues.

BUG: Duplicate columns allowed onmerge if originating from separate dataframes
5 participants
@Farsidetfs@chilin0525@nikaltipar@mroeschke@rhshadrach

[8]ページ先頭

©2009-2025 Movatter.jp