Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-102140: fix false negative in csv.Sniffer.has_header#103341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
Drakariboo wants to merge75 commits intopython:main
base:main
Choose a base branch
Loading
fromDrakariboo:has_header_false_neg_fix

Conversation

Drakariboo
Copy link

@DrakaribooDrakariboo commentedApr 7, 2023
edited by AlexWaygood
Loading

We've improved the heuristic ofhas_header() method inLib/csv.py. We wanted torespect the methodology on how this function was created,even if the determining factorstring length is meaningless .

We've made theaverage of string lengths and compared it to the header length to keep the consistency. We added a condition in which we check if the dictionnary is empty and if all elements are strings. If it's true, we use the average calculated before.

Contributors :@Drakariboo and @Vanille-22

@bedevere-bot
Copy link

Most changes to Pythonrequire a NEWS entry.

Please add it using theblurb_it web app or theblurb command-line tool.

@bedevere-bot

This comment was marked as duplicate.

@arhadthedevarhadthedev added the stdlibPython modules in the Lib dir labelApr 7, 2023
@arhadthedevarhadthedev changed the title# gh-102140: False neg csv header bug fixgh-102140: False neg csv header bug fixApr 7, 2023
@bedevere-bot

This comment was marked as duplicate.

@bedevere-bot

This comment was marked as duplicate.

1 similar comment
@bedevere-bot

This comment was marked as duplicate.

@ghost
Copy link

ghost commentedApr 7, 2023
edited by ghost
Loading

All commit authors signed the Contributor License Agreement.
CLA signed

@johnD18johnD18force-pushed thehas_header_false_neg_fix branch from667c9a2 to02b645dCompareApril 7, 2023 12:44
@bedevere-bot

This comment was marked as duplicate.

@bedevere-bot

This comment was marked as duplicate.

@bedevere-bot

This comment was marked as duplicate.

@bedevere-bot

This comment was marked as duplicate.

@bedevere-bot

This comment was marked as duplicate.

@merwokmerwok added the needs backport to 3.12only security fixes labelMay 29, 2023
@Drakariboo
Copy link
Author

What you expect us to do to solve labels backports ? Is there another file to modify ?

@merwok
Copy link
Member

Oh don’t worry, the backport labels are used by bots to create follow-up pull requests!

@johnD18
Copy link

I have made the requested changes; please review again

@Drakariboo
Copy link
Author

Hi@merwok !
I don't really get it. What do we have to do exactly to pass every tests ?
Also, is test/hypothesis ubuntu fixed ?
Because, we have again a failure withtest_xxsubintepreters, but as said before, it was not coming from us.

Thanks for your time to help us. I saw the other issue with the same problem, tell me if there is something new from this.
Have a good day ! :)

@AlexWaygood
Copy link
Member

AlexWaygood commentedMay 31, 2023
edited
Loading

@merwok, because you previously requested changes on this PR, you will either need to dismiss your prior review as "stale", or formally approve this PR. Otherwise the "check labels" CI check will continue to fail due to the "awaiting changes" label on this PR.

If you don't know how to dismiss your prior review as stale but would like to do that, I can do that for you.

@Drakariboo: pleasedon't worry about thetest_threading and/ortest__xxsubinterpreters failing on this PR. We're fully aware that it's not your fault, and it's not blocking this PR being merged. Once the PR has been approved by a core developer, we will be able to merge the PReven iftest_threading and/ortest__xxsubinterpreters are failing on this PR. (There's no requirement for all tests to be passing in order for a PR to be merged — if it's known that a test is failing for unrelated reasons, it can be ignored 🙂)

Thetest_threading andtest__xxsubinterpreters crashes are a known problem, and other people are working on fixing those tests.

@AlexWaygood
Copy link
Member

(You also don't really need to worry too much about keeping your PR branch bang-up-to-date withmain, unless there's a merge conflict. The merge commits just add noise for people subscribed to the thread :-)

@merwokmerwok self-requested a reviewMay 31, 2023 14:02
@arhadthedev
Copy link
Member

I don't really get it. What do we have to do exactly to pass every tests ?

Check labels / DO-NOT-MERGE / unresolved review fails because of awaiting changes label left after the first review of@merwok. So we just need to wait.

@CAM-GerlachCAM-Gerlach added the type-bugAn unexpected behavior, bug, or error labelJun 15, 2023
Copy link
Member

@CAM-GerlachCAM-Gerlach left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Standard reminder: You can directly apply all the suggestions you want in one go by going toFiles changed -> ClickingAdd to batch on each suggestion -> When done, clickingCommit

Thanks for the ping@merwok (and all your great help and guidance here!) and sorry for the delay, I was taking my annual post-PyCon GitHub notification break to recover a bit.

BTW, the docs warnings that are not on or near lines touched by this PR can be ignored for now; we want to have those only show up for such lines, but due to a few issues it's not quite as easy to do as it would seem, and we haven't been able to implement that just yet, sorry.

Comment on lines +299 to +300
lengths, the average length of all the strings becomes a crucial factor
in the determination process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I found this rather vague, and would really recommend being specific here abouthow the average length is used, and under what conditions it means this method returnsTrue, just like the rest of this description does for the other cases. Even skimming the code and description here it wasn't totally clear to me, so I didn't suggest something specific, but this should presumably look something like the following:

      lengths, the average length of each row is (used to/compared with) ...      and if (greater than/less than) ... , ``True`` is returned.

@@ -394,6 +394,8 @@ def has_header(self, sample):
# can't be determined, it is assumed to be a string in which case
# the length of the string is the determining factor: if all of the
# rows except for the first are the same length, it's a header.
# when the strings have varying length, the average length of all
# strings becomes a determining factor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Similar to the above, this is very unclear to me. I suggest something like

        # columns in a row determines...what? how?

Drakaribooand others added6 commitsJune 15, 2023 09:37
Change line 397 : "w" in uppercase.Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
…4gFLu.rstRewording to specify this is more a defect fix than an enhancementCo-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Replacing "checking" by "check" in the commentsCo-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Change comments to keep a more reasonable line length and use imperative.Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
change lines 407-410, init and assignment columnTypes directly.Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
@@ -402,8 +404,9 @@ def has_header(self, sample):
header = next(rdr) # assume first row is header

columns = len(header)
columnTypes = {}
for i in range(columns): columnTypes[i] = None
columnTypes = {i: None for i in range(columns)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

(could even use dict.fromkeys, but this is already clear!)

@serhiy-storchakaserhiy-storchaka added needs backport to 3.13bugs and security fixes and removed needs backport to 3.11only security fixes labelsMay 9, 2024
@Yhg1sYhg1s removed the needs backport to 3.12only security fixes labelApr 8, 2025
@serhiy-storchakaserhiy-storchaka added the needs backport to 3.14bugs and security fixes labelMay 8, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@merwokmerwokmerwok requested changes

@arhadthedevarhadthedevarhadthedev left review comments

@CAM-GerlachCAM-GerlachCAM-Gerlach left review comments

@Eclips4Eclips4Eclips4 left review comments

@johnD18johnD18johnD18 approved these changes

Assignees
No one assigned
Labels
3.13bugs and security fixesawaiting changesneeds backport to 3.13bugs and security fixesneeds backport to 3.14bugs and security fixesstdlibPython modules in the Lib dirtype-bugAn unexpected behavior, bug, or error
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

10 participants
@Drakariboo@bedevere-bot@merwok@johnD18@arhadthedev@AlexWaygood@CAM-Gerlach@Eclips4@serhiy-storchaka@Yhg1s

[8]ページ先頭

©2009-2025 Movatter.jp