We've improved the heuristic of the has_header() method in Lib/csv.py. We wanted to respect the methodology behind how this function was created, even if string length, the determining factor, is not a meaningful signal.

We compute the average of the string lengths and compare it to the header length, to stay consistent with the existing approach. We added a condition that checks whether the dictionary is empty and whether all elements are strings. If so, we use the average calculated earlier.
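The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function name, the per-column voting, and the "differs from the average" comparison are all assumptions made for the example.

```python
from statistics import mean


def looks_like_header_by_average_length(header, rows):
    """Hypothetical fallback: vote per column on whether the header cell's
    length differs from the average length of the data cells below it."""
    votes = 0
    for col, head in enumerate(header):
        # Only consider rows that have the same number of columns as the header.
        cells = [row[col] for row in rows if len(row) == len(header)]
        if not cells:
            continue
        average = mean(len(cell) for cell in cells)
        # Assumed rule: a header length that differs from the average counts
        # as evidence for a header, an equal length counts against it.
        votes += 1 if len(head) != average else -1
    return votes > 0


if __name__ == "__main__":
    rows = [["Alice", "Paris"], ["Bob", "Berlin"], ["Charlotte", "Rome"]]
    print(looks_like_header_by_average_length(["name", "city"], rows))
```

The sketch only covers the all-strings case; numeric columns would still go through the existing type-based checks.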

Contributors: @Drakariboo and @Vanille-22

Issue: False negative from csv.Sniffer.has_header with only strings #102140
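For context, one minimal way to exercise the kind of input the issue title describes might look like this. The sample data is made up for illustration and is not taken from the issue; the result depends on the Python version in use, so no expected output is shown.

```python
import csv

# Hypothetical sample: every cell is a string and the lengths vary per column.
SAMPLE = "name,city\nAlice,Paris\nBob,Berlin\nCharlotte,Rome\n"


def sample_has_header(sample: str) -> bool:
    """Ask csv.Sniffer whether the first row of `sample` looks like a header."""
    return csv.Sniffer().has_header(sample)


if __name__ == "__main__":
    print(sample_has_header(SAMPLE))
```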

johnD18 and others added 2 commits

March 27, 2023 12:07

fixing has_header

60d7501

Merge branch 'python:main' into has_header_false_neg_fix

1636e7b

bedevere-bot commented Apr 7, 2023

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

bedevere-bot mentioned this pull request

Apr 7, 2023

False negative from csv.Sniffer.has_header with only strings #102140

Open

bedevere-bot added the awaiting review label

Apr 7, 2023

Merge branch 'main' into has_header_false_neg_fix

7020dd5

arhadthedev added the stdlib (Python modules in the Lib dir) label

Apr 7, 2023

arhadthedev changed the title ~~# gh-102140: False neg csv header bug fix~~ gh-102140: False neg csv header bug fix

Apr 7, 2023

arhadthedev reviewed

Apr 7, 2023

Lib/csv.py (outdated, resolved)

adding genericalias

e2a76d9

adding fieldnames

02b645d

ghost commented Apr 7, 2023 (edited by ghost)

All commit authors signed the Contributor License Agreement.

johnD18 force-pushed the has_header_false_neg_fix branch from 667c9a2 to 02b645d

April 7, 2023 12:44

Merge branch 'python:main' into has_header_false_neg_fix

747cbc8

correcting deletions

6db356f

Eclips4 reviewed

Apr 7, 2023

Lib/csv.py (outdated, resolved)

Eclips4 reviewed

Apr 7, 2023

Lib/csv.py (outdated, resolved)

Eclips4 reviewed

Apr 7, 2023

Lib/csv.py (outdated, resolved)

Merge branch 'python:main' into has_header_false_neg_fix

38166fe

correction of comments

2fe30c0

corrections

b78cd78

merwok added the needs backport to 3.12 (only security fixes) label

May 29, 2023

adding a comment on csv.py & csv.rst update

bb136bb

Author

Drakariboo commented May 30, 2023

What do you expect us to do to resolve the backport labels? Is there another file to modify?

Member

merwok commented May 30, 2023

Oh don’t worry, the backport labels are used by bots to create follow-up pull requests!

Drakariboo and others added 4 commits

May 30, 2023 21:37

Merge branch 'main' into has_header_false_neg_fix

dcf1af4

Merge branch 'python:main' into has_header_false_neg_fix

920cc1e

Merge branch 'python:main' into has_header_false_neg_fix

1971df6

Merge branch 'main' into has_header_false_neg_fix

8746e64

johnD18 commented May 31, 2023

I have made the requested changes; please review again

Merge branch 'main' into has_header_false_neg_fix

a9ea1be

Author

Drakariboo commented May 31, 2023

Hi @merwok!
I don't really get it. What exactly do we have to do to pass every test?
Also, is test/hypothesis ubuntu fixed?
We have a failure again with test__xxsubinterpreters but, as said before, it does not come from our changes.

Thanks for taking the time to help us. I saw the other issue with the same problem; tell me if there is anything new on it.
Have a good day! :)

Member

AlexWaygood commented May 31, 2023 (edited)

@merwok, because you previously requested changes on this PR, you will either need to dismiss your prior review as "stale", or formally approve this PR. Otherwise the "check labels" CI check will continue to fail due to the "awaiting changes" label on this PR.

If you don't know how to dismiss your prior review as stale but would like to do that, I can do that for you.

@Drakariboo: please don't worry about test_threading and/or test__xxsubinterpreters failing on this PR. We're fully aware that it's not your fault, and it's not blocking this PR being merged. Once the PR has been approved by a core developer, we will be able to merge the PR even if test_threading and/or test__xxsubinterpreters are failing on this PR. (There's no requirement for all tests to be passing in order for a PR to be merged; if it's known that a test is failing for unrelated reasons, it can be ignored 🙂)

The test_threading and test__xxsubinterpreters crashes are a known problem, and other people are working on fixing those tests.

Merge branch 'python:main' into has_header_false_neg_fix

6d139a6

Member

AlexWaygood commented May 31, 2023

(You also don't really need to worry too much about keeping your PR branch bang-up-to-date with main, unless there's a merge conflict. The merge commits just add noise for people subscribed to the thread :-)

merwok self-requested a review

May 31, 2023 14:02

Merge branch 'main' into has_header_false_neg_fix

141cf5d

Member

arhadthedev commented Jun 14, 2023

> I don't really get it. What exactly do we have to do to pass every test?

The Check labels / DO-NOT-MERGE / unresolved review check fails because of the awaiting changes label left after @merwok's first review. So we just need to wait.

CAM-Gerlach added the type-bug (An unexpected behavior, bug, or error) label

Jun 15, 2023

CAM-Gerlach reviewed

Jun 15, 2023

Member

CAM-Gerlach left a comment (edited)

Standard reminder: You can directly apply all the suggestions you want in one go by going to Files changed -> clicking Add to batch on each suggestion -> when done, clicking Commit

Thanks for the ping @merwok (and all your great help and guidance here!) and sorry for the delay, I was taking my annual post-PyCon GitHub notification break to recover a bit.

BTW, the docs warnings that are not on or near lines touched by this PR can be ignored for now; we want to have those only show up for such lines, but due to a few issues it's not quite as easy to do as it would seem, and we haven't been able to implement that just yet, sorry.

Misc/NEWS.d/next/Library/2023-05-01-18-53-20.gh-issue-102140._4gFLu.rst (outdated, resolved)

Doc/library/csv.rst

Comment on lines +299 to +300

		lengths, the average length of all the strings becomes a crucial factor
		in the determination process.

Member

CAM-Gerlach Jun 15, 2023

I found this rather vague, and would really recommend being specific here about how the average length is used, and under what conditions it means this method returns True, just like the rest of this description does for the other cases. Even skimming the code and description here it wasn't totally clear to me, so I didn't suggest something specific, but this should presumably look something like the following:

      lengths, the average length of each row is (used to/compared with) ...
      and if (greater than/less than) ... , ``True`` is returned.

Lib/csv.py

		@@ -394,6 +394,8 @@ def has_header(self, sample):
		 # can't be determined, it is assumed to be a string in which case
		 # the length of the string is the determining factor: if all of the
		 # rows except for the first are the same length, it's a header.
		+# when the strings have varying length, the average length of all
		+# strings becomes a determining factor.

Member

CAM-Gerlach Jun 15, 2023

Similar to the above, this is very unclear to me. I suggest something like

        # columns in a row determines...what? how?
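To make the existing rule quoted in the hunk above concrete, here is a simplified single-column sketch. It is modelled on the code comment (numeric cells are typed, other cells are characterised by their length) and is not a copy of Lib/csv.py; the helper names are hypothetical.

```python
def column_signature(cell):
    """Return `complex` if the cell parses as a number, else its string length."""
    try:
        complex(cell)
    except (ValueError, OverflowError):
        return len(cell)
    return complex


def header_vote(head, cells):
    """Return +1 (header likely), -1 (header unlikely) or 0 (column ignored)."""
    signatures = {column_signature(cell) for cell in cells}
    if len(signatures) != 1:
        # Inconsistent column, e.g. strings of varying length: no vote at all.
        return 0
    signature = signatures.pop()
    if isinstance(signature, int):
        # String column: the header counts if its length differs from the data.
        return 1 if len(head) != signature else -1
    try:
        complex(head)
    except (ValueError, OverflowError):
        return 1  # numeric column with a non-numeric first cell
    return -1


if __name__ == "__main__":
    print(header_vote("name", ["abc", "xyz"]))    # same-length strings
    print(header_vote("name", ["Alice", "Bob"]))  # varying-length strings
```

In this sketch a column of varying-length strings contributes no vote, which is the situation the new comment lines are meant to address.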

Lib/csv.py (outdated, resolved)

Drakariboo and others added 6 commits

June 15, 2023 09:37

Update Lib/csv.py

9686653

Change line 397: "w" in uppercase.

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

Update Misc/NEWS.d/next/Library/2023-05-01-18-53-20.gh-issue-102140._…

680c67b

…4gFLu.rst

Rewording to specify this is more a defect fix than an enhancement

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

Update Lib/csv.py

149938a

Replacing "checking" by "check" in the comments

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

Update Lib/csv.py

4b22c77

Change comments to keep a more reasonable line length and use imperative.

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

Update Lib/csv.py

2092173

Change lines 407-410: initialize and assign columnTypes directly.

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

Merge branch 'main' into has_header_false_neg_fix

95f2b61

merwok reviewed

Jun 15, 2023

Lib/csv.py

		@@ -402,8 +404,9 @@ def has_header(self, sample):
		 header = next(rdr) # assume first row is header

		 columns = len(header)
		-columnTypes = {}
		-for i in range(columns): columnTypes[i] = None
		+columnTypes = {i: None for i in range(columns)}
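As a quick check that the comprehension is a drop-in replacement for the two-line loop, the following standalone snippet builds the mapping both ways. The variable names and the column count are chosen for the example only; `dict.fromkeys` is shown as a further equivalent and is not something proposed in this PR.

```python
columns = 3  # example value; in has_header() this is len(header)

# Original form: create an empty dict, then fill it in a loop.
column_types_loop = {}
for i in range(columns):
    column_types_loop[i] = None

# Form used after the change: a single dict comprehension.
column_types_comprehension = {i: None for i in range(columns)}

# Another equivalent spelling, since None is the default value of fromkeys().
column_types_fromkeys = dict.fromkeys(range(columns))

assert column_types_loop == column_types_comprehension == column_types_fromkeys
```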