All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch 2 times, most recently from fe71e1a to c9bfc5a

May 10, 2025 02:48

Contributor, Author

Farsidetfs commented May 10, 2025

pre-commit.ci autofix

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from dc1bb47 to ad744df

May 10, 2025 05:54

Contributor

chilin0525 commented May 10, 2025

Thanks for your contribution!

Just a quick note: you don't need to write GH#61402 in the PR description; simply using #61402 is enough, and GitHub will link it automatically 😀.
Also, since this PR addresses a bug, please make sure to:

Add a unit test that covers this case
Include an entry in the doc/source/whatsnew/vx.y.z.rst file to document your fix

For reference, you can check the contributing guidelines here: https://pandas.pydata.org/docs/development/contributing_codebase.html#documenting-your-code

chilin0525 mentioned this pull request

May 10, 2025

BUG: Duplicate columns allowed on merge if originating from separate dataframes #61402

Closed

3 tasks

Contributor, Author

Farsidetfs commented May 10, 2025

Thanks for the pointers. I'll get those added here soon. I'm trying to track down why Unit Tests / Linux-32-bit (pull_request) is failing. I didn't change anything that should have affected Series, so it's kind of weird.

I also can't get pytest to run normally on my dev environment yet, so I haven't been able to replicate the failure locally. Still a little more work to do here.

Contributor

chilin0525 commented May 10, 2025

@Farsidetfs I believe the CI failure is not related to your changes. It appears to be caused by the Cython version: pandas unit tests fail with cython==3.1.0. You may notice that the same test failures have occurred in several recent PRs as well. I've already addressed the issue in #61423.

nikaltipar reviewed

May 10, 2025

nikaltipar left a comment

Added a few comments from a bird's eye view. Thanks for the change.

pandas/core/reshape/merge.py Outdated

Comment on lines 3064 to 3073

		if len(left_collisions) > 0:
		    raise MergeError(
		        "Passing 'suffixes' which cause duplicate columns "
		        f"{set(left_collisions)} is not allowed"
		    )
		if len(right_collisions) > 0:
		    raise MergeError(
		        "Passing 'suffixes' which cause duplicate columns "
		        f"{set(right_collisions)} is not allowed"
		    )

nikaltipar May 10, 2025

I would recommend combining these into a single error to reduce repetition (bonus points for also combining it with the pre-existing error just a few lines below).
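A minimal sketch of the combined check suggested here, assuming the labels have already been converted to Python sets (the author later mentions switching to sets). The helper name raise_on_cross_frame_collisions and the local MergeError class are hypothetical stand-ins, not the merged pandas code; in pandas the real exception is pandas.errors.MergeError, which is a ValueError subclass.

```python
class MergeError(ValueError):
    """Hypothetical stand-in for pandas.errors.MergeError."""


def raise_on_cross_frame_collisions(
    llabels: set[str],
    rlabels: set[str],
    left: set[str],
    right: set[str],
    to_rename: set[str],
) -> None:
    """Raise one error if suffixed labels clash with the other frame's labels."""
    # Left result labels equal to a right label that keeps its name,
    # plus right result labels equal to a left label that keeps its name.
    collisions = (llabels & (right - to_rename)) | (rlabels & (left - to_rename))
    # A single raise replaces the two near-identical left/right blocks.
    if collisions:
        raise MergeError(
            "Passing 'suffixes' which cause duplicate columns "
            f"{collisions} is not allowed"
        )


# Hypothetical inputs: frames from this PR's test with suffixes=("_dup", ""),
# assuming the join key "col1" has already been dropped from the right frame.
left = {"col1", "col2"}
right = {"col2", "col2_dup"}
to_rename = left & right          # {"col2"}
llabels = {"col1", "col2_dup"}    # left labels after suffixing
rlabels = {"col2", "col2_dup"}    # right labels after suffixing
try:
    raise_on_cross_frame_collisions(llabels, rlabels, left, right, to_rename)
except MergeError as err:
    print(err)
```

Both directions feed one set, so a single message reports every clashing label regardless of which frame produced it.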

pandas/core/reshape/merge.py Outdated

		# Check for duplicates created by suffixes
		left_collisions = llabels.intersection(right.difference(to_rename))
		right_collisions = rlabels.intersection(left.difference(to_rename))
		if len(left_collisions) > 0:

nikaltipar May 10, 2025

Doesn't just `if not left_collisions.empty:` work? Same for the similar check below.
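For reference, a pandas Index (which is what Index.intersection returns) exposes an empty attribute, so this suggestion applies while the collisions are still an Index. A Python set has no such attribute; if the collisions are converted to sets, plain truthiness is the idiomatic check. A small sketch, assuming pandas is installed; the label values are illustrative only.

```python
import pandas as pd

llabels = pd.Index(["col1", "col2_dup"])   # left labels after suffixing
right_untouched = pd.Index(["col2_dup"])   # right labels that keep their name

# Index.intersection returns another Index, which has an .empty attribute.
collisions = llabels.intersection(right_untouched)
if not collisions.empty:
    print(f"collisions: {list(collisions)}")

# A Python set has no .empty; an empty set is simply falsy.
collision_set = set(llabels) & set(right_untouched)
if collision_set:
    print(f"collisions: {sorted(collision_set)}")
```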

pandas/core/reshape/merge.py Outdated

		@@ -3058,6 +3058,20 @@ def renamer(x, suffix: str | None):
		llabels = left._transform_index(lrenamer)
		rlabels = right._transform_index(rrenamer)

		# Check for duplicates created by suffixes

nikaltipar May 10, 2025

For new readers of this code, the comment might not be descriptive enough. Your new code looks for duplicate columns that the suffixes would create across the two dataframes, whereas the existing check just below it looks for duplicate columns that the suffixes would create within the same dataframe. The comment should make that distinction clear.
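To make the distinction concrete, here is a pure-Python sketch of a within-frame check applied to the two suffix choices from this PR's test. It assumes the existing pandas check only flags repeats introduced by the suffix (not labels that were already duplicated), and that the join key "col1" has already been dropped from the right frame. The helpers add_suffix and within_frame_duplicates are hypothetical.

```python
def add_suffix(labels: list[str], to_rename: set[str], suffix: str) -> list[str]:
    """Append the suffix to overlapping labels only."""
    return [f"{c}{suffix}" if c in to_rename else c for c in labels]


def within_frame_duplicates(original: list[str], renamed: list[str]) -> list[str]:
    """Labels that repeat within one frame only because of the suffix."""
    dups: list[str] = []
    seen_original: set[str] = set()
    seen_renamed: set[str] = set()
    for orig, new in zip(original, renamed):
        # Flag a repeated renamed label unless the original label was itself
        # already a repeat (pre-existing duplicates are not the suffix's fault).
        if new in seen_renamed and orig not in seen_original:
            dups.append(new)
        seen_original.add(orig)
        seen_renamed.add(new)
    return dups


right = ["col2", "col2_dup"]   # right columns, join key excluded
to_rename = {"col2"}           # labels present in both frames

# suffixes=("", "_dup"): the duplicate arises inside the right frame.
print(within_frame_duplicates(right, add_suffix(right, to_rename, "_dup")))
# suffixes=("_dup", ""): the right frame stays unique, so this check finds
# nothing; the clash with the left frame's new "col2_dup" needs a cross-frame check.
print(within_frame_duplicates(right, add_suffix(right, to_rename, "")))
```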

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from ad744df to 9a40cd0

May 12, 2025 20:45

Contributor, Author

Farsidetfs commented May 12, 2025

pre-commit.ci autofix

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch 3 times, most recently from e9efd0d to 1476957

May 12, 2025 22:35

Contributor, Author

Farsidetfs commented May 12, 2025

pre-commit.ci autofix

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from cb23817 to 6b82c85

May 13, 2025 00:02

Contributor, Author

Farsidetfs commented May 13, 2025

pre-commit.ci autofix

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from 5aff565 to e62c70f

May 13, 2025 01:11

Contributor, Author

Farsidetfs commented May 13, 2025

pre-commit.ci autofix

Contributor, Author

Farsidetfs commented May 13, 2025

@nikaltipar I think this should be ready now. Please let me know if I've missed anything. I took your advice and combined the two, with slight modifications to improve efficiency by using sets throughout rather than converting only at the end.

nikaltipar commented May 14, 2025

> @nikaltipar I think this should be ready now. Please let me know if I've missed anything. I took your advice and combined the two, with slight modifications to improve efficiency by using sets throughout rather than converting only at the end.

Thanks for taking care of that, @Farsidetfs! It looks good to me; no other comments from my side. Thanks for adding the unit tests, too!

Contributor

chilin0525 commented May 14, 2025

@nikaltipar Could you rebase main branch to trigger CI again?

nikaltipar approved these changes

May 14, 2025

nikaltipar commented May 14, 2025

> @nikaltipar Could you rebase main branch to trigger CI again?

I am not able to; I'll have to wait for @Farsidetfs.

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from 7d3837a to c377804

May 15, 2025 04:03

Contributor, Author

Farsidetfs commented May 15, 2025

Rebase complete. Thanks. Let me know if there's anything else needed.

nikaltipar approved these changes

May 15, 2025

nikaltipar commented May 15, 2025

#61402

rhshadrach requested changes

May 19, 2025

Member

rhshadrach left a comment

Looking good!

doc/source/whatsnew/v2.3.0.rst Outdated

		@@ -170,7 +170,7 @@ Groupby/resample/rolling

		Reshaping
		^^^^^^^^^
		-
		- Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`)
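To make the whatsnew entry concrete, here is a small reproduction using the frames from this PR's test. It assumes pandas is installed. Whether the merge raises depends on the pandas version (this PR is milestoned for 3.0), so the sketch records either outcome rather than asserting one.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({"col1": [1], "col2": [2]})
df2 = pd.DataFrame({"col1": [1], "col2": [2], "col2_dup": [3]})

try:
    # Left "col2" becomes "col2_dup", colliding with df2's own "col2_dup".
    result = pd.merge(df1, df2, on="col1", suffixes=("_dup", ""))
except MergeError as err:
    outcome = f"raised MergeError: {err}"
else:
    # Without the fix, the merge silently returns a duplicated column label.
    dup_cols = result.columns[result.columns.duplicated()].tolist()
    outcome = f"duplicate columns: {dup_cols}"

print(outcome)
```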

Member

rhshadrach May 19, 2025

Can you move this note to v3.0.0?

pandas/tests/reshape/merge/test_merge.py Outdated

		df1 = DataFrame({"col1": [1], "col2": [2]})
		df2 = DataFrame({"col1": [1], "col2": [2], "col2_dup": [3]})
		with pytest.raises(MergeError, match="duplicate columns"):
		    merge(df1, df2, on="col1", suffixes=("_dup", ""))

Member

rhshadrach May 19, 2025

Can you parametrize this test with

@pytest.mark.parametrize("suffixes", [("_dup", ""), ("", "_dup")])

Contributor, Author

Farsidetfs May 22, 2025

@rhshadrach Done. Please let me know if I missed anything.

rhshadrach added the Bug label

May 19, 2025

rhshadrach added the ReshapingConcat, Merge/Join, Stack/Unstack, Explode label

May 19, 2025

rhshadrach added this to the 3.0 milestone

May 19, 2025

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch 2 times, most recently from b7b3a95 to bc62afe

May 22, 2025 03:49

Farsidetfs requested a review from rhshadrach

May 22, 2025 05:26

mroeschke reviewed

May 22, 2025

pandas/core/reshape/merge.py Outdated (resolved)

BUG: Raise MergeError when suffixes result in duplicate column names (GH#61402)

9bc2b66

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from 5b816ce to 9bc2b66

May 23, 2025 19:20

mroeschke approved these changes

May 23, 2025

Contributor, Author

Farsidetfs commented May 24, 2025

@rhshadrach All requested changes are complete, so it should be ready for your review to unblock the merge. Thanks!

rhshadrach approved these changes

May 30, 2025

Member

rhshadrach left a comment

lgtm

Member

rhshadrach commented May 30, 2025

@Farsidetfs, there's just a conflict in the whatsnew that needs to be resolved.

Member

datapythonista commented Jun 2, 2025

pre-commit.ci autofix

1 similar comment

Contributor, Author

Farsidetfs commented Jun 5, 2025 (edited)

pre-commit.ci autofix

Merge remote-tracking branch 'upstream/main' into fix-merge-suffixes-61402

fdb5861

Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from 12ce6ce to fdb5861

June 5, 2025 23:33

Contributor, Author

Farsidetfs commented Jun 6, 2025

@rhshadrach Merge conflicts resolved, ready for merge

datapythonista merged commit 297bec4 into pandas-dev:main

Jun 6, 2025

44 checks passed

Member

datapythonista commented Jun 6, 2025

Thanks @Farsidetfs, very nice!

Labels

Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

6 participants

BUG: Raise MergeError when suffixes result in duplicate column names … #61422

Farsidetfs commented May 9, 2025 (edited)
