Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
gh-102140: fix false negative in csv.Sniffer.has_header#103341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
bedevere-bot commentedApr 7, 2023
Most changes to Pythonrequire a NEWS entry. Please add it using theblurb_it web app or theblurb command-line tool. |
This comment was marked as duplicate.
This comment was marked as duplicate.
Uh oh!
There was an error while loading.Please reload this page.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
1 similar comment
This comment was marked as duplicate.
This comment was marked as duplicate.
ghost commentedApr 7, 2023 • edited by ghost
Loading Uh oh!
There was an error while loading.Please reload this page.
edited by ghost
Uh oh!
There was an error while loading.Please reload this page.
667c9a2
to02b645d
Compare This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment was marked as duplicate.
What you expect us to do to solve labels backports ? Is there another file to modify ? |
Oh don’t worry, the backport labels are used by bots to create follow-up pull requests! |
johnD18 commentedMay 31, 2023
I have made the requested changes; please review again |
Hi@merwok ! Thanks for your time to help us. I saw the other issue with the same problem, tell me if there is something new from this. |
AlexWaygood commentedMay 31, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@merwok, because you previously requested changes on this PR, you will either need to dismiss your prior review as "stale", or formally approve this PR. Otherwise the "check labels" CI check will continue to fail due to the "awaiting changes" label on this PR. If you don't know how to dismiss your prior review as stale but would like to do that, I can do that for you. @Drakariboo: pleasedon't worry about the The |
(You also don't really need to worry too much about keeping your PR branch bang-up-to-date with |
|
CAM-Gerlach left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Standard reminder: You can directly apply all the suggestions you want in one go by going toFiles changed
-> ClickingAdd to batch on each suggestion -> When done, clickingCommit
Thanks for the ping@merwok (and all your great help and guidance here!) and sorry for the delay, I was taking my annual post-PyCon GitHub notification break to recover a bit.
BTW, the docs warnings that are not on or near lines touched by this PR can be ignored for now; we want to have those only show up for such lines, but due to a few issues it's not quite as easy to do as it would seem, and we haven't been able to implement that just yet, sorry.
Misc/NEWS.d/next/Library/2023-05-01-18-53-20.gh-issue-102140._4gFLu.rst OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
lengths, the average length of all the strings becomes a crucial factor | ||
in the determination process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I found this rather vague, and would really recommend being specific here abouthow the average length is used, and under what conditions it means this method returnsTrue
, just like the rest of this description does for the other cases. Even skimming the code and description here it wasn't totally clear to me, so I didn't suggest something specific, but this should presumably look something like the following:
lengths, the average length of each row is (used to/compared with) ... and if (greater than/less than) ... , ``True`` is returned.
@@ -394,6 +394,8 @@ def has_header(self, sample): | |||
# can't be determined, it is assumed to be a string in which case | |||
# the length of the string is the determining factor: if all of the | |||
# rows except for the first are the same length, it's a header. | |||
# when the strings have varying length, the average length of all | |||
# strings becomes a determining factor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Similar to the above, this is very unclear to me. I suggest something like
# columns in a row determines...what? how?
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Change line 397 : "w" in uppercase.Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
…4gFLu.rstRewording to specify this is more a defect fix than an enhancementCo-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Replacing "checking" by "check" in the commentsCo-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Change comments to keep a more reasonable line length and use imperative.Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
change lines 407-410, init and assignment columnTypes directly.Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
@@ -402,8 +404,9 @@ def has_header(self, sample): | |||
header = next(rdr) # assume first row is header | |||
columns = len(header) | |||
columnTypes = {} | |||
for i in range(columns): columnTypes[i] = None | |||
columnTypes = {i: None for i in range(columns)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
(could even use dict.fromkeys, but this is already clear!)
Uh oh!
There was an error while loading.Please reload this page.
We've improved the heuristic of
has_header()
method inLib/csv.py
. We wanted torespect the methodology on how this function was created,even if the determining factorstring length
is meaningless .Contributors :@Drakariboo and @Vanille-22