Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-72680: Fix false positives when using zipfile.is_zipfile()#5053

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
jjolly wants to merge3 commits intopython:mainfromjjolly:is_zipfile_verify

Conversation

jjolly
Copy link
Contributor

@jjollyjjolly commentedDec 30, 2017
edited by bedevere-bot
Loading

Fix zipfile validation issue by ... providing more validation!

Originally, zipfile.is_zipfile() only checked the End Central Directory
signature. If the signature could be found in the last 64k of the file,
success! This produced false positives on any file with 'PK\x05\x06' in the
last 64k of the file - including PDFs and PNGs.

This is now corrected by actually validating the Central Directory location
and size based on the information provided by the End Central Directory
along with verifying the Central Directory signature of the first entry.

This should be sufficient for the vast number of zipfiles, but more could be
done to absolutely validate the zipfile content - such as validating all
local file headers and Central Directory entries.

https://bugs.python.org/issue28494

mxmlnkn reacted with thumbs up emoji
@gpshead
Copy link
Member

Can you expand the unit testing foris_zipfile?is_zipfile should effectively suggest thatzipfile.ZipFile will be able to read the archive and in particular aFalse return value should always mean thatzipfile.ZipFile would fail to read it (ie: we don't want is_zipfile to fail something but be a file we could've read). We don't have much more than very basic coverage of this in thetest_zipfile.py andtest_zipfile64.py today.

It would not be unreasonable to assertTrue and assertFalse as appropriate the result ofis_zipfile() on any test.zip file that a test is using (writing, reading, or both), valid or invalid, and any zip file contents being opened for read.

The zipfile.is_zipfile function would only search for the EndOfZipfilesection header. This failed to correctly identify non-zipfiles thatcontained this header. Now the zipfile.is_zipfile function verifiesthe first central directory entry.Changes:* Extended zipfile.is_zipfile to verify zipfile catalog* Added tests to validate failure of binary non-zipfiles
@jjolly
Copy link
ContributorAuthor

Can you expand the unit testing foris_zipfile?is_zipfile should effectively suggest thatzipfile.ZipFile will be able to read the archive and in particular aFalse return value should always mean thatzipfile.ZipFile would fail to read it (ie: we don't want is_zipfile to fail something but be a file we could've read). We don't have much more than very basic coverage of this in thetest_zipfile.py andtest_zipfile64.py today.

I've added a PNG file that would succeed under the old algorithm, but fails correctly with the new algorithm. Perhaps the PNG is overkill, but it's proof that a valid format for another file should fail as a ZIP archive.

It would not be unreasonable to assertTrue and assertFalse as appropriate the result ofis_zipfile() on any test.zip file that a test is using (writing, reading, or both), valid or invalid, and any zip file contents being opened for read.

I've added a few checks throughout the test_zipfile.py script. If you see others you would like to check, let me know and I can add them.

facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull requestAug 30, 2019
Summary:The default implementation is lenient in that it recognizes a zipfile if the magic number appears anywhere in the archive. So if someone has the bytes `PK\x03\x04` in a tensor, it gets recognized as a zipfile. Seehttps://bugs.python.org/issue28494This implementation only checks the first 4 bytes of the file for the zip magic number. We could also copypython/cpython#5053 fix, but that seems like overkill.Fixes#25214](https://our.intern.facebook.com/intern/diff/17102516/)Pull Requestresolved:#25279Pulled By: driazatiDifferential Revision: D17102516fbshipit-source-id: 4d09645bd97e9ff7136a2229fba1d9a1bce5665a
@gpsheadgpshead self-assigned thisSep 9, 2019
@gpshead
Copy link
Member

FYI - This PR as is causeszipfile.is_zipfile to start rejecting some valid zipfiles thatzipfile.ZipFile() happily opens and properly processes. I'm looking into why and how to best update the test suite to reveal this...

@gpsheadgpshead added type-featureA feature request or enhancement staleStale PR or inactive for long period of time. labelsApr 20, 2022
@gpsheadgpshead removed their assignmentApr 20, 2022
@gpsheadgpshead self-requested a reviewApril 20, 2022 22:58
@thomaswaldmannthomaswaldmannmannequin mentioned this pull requestApr 10, 2022
@github-actionsgithub-actionsbot removed the staleStale PR or inactive for long period of time. labelJul 29, 2022
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actionsgithub-actionsbot added the staleStale PR or inactive for long period of time. labelFeb 23, 2023
@arhadthedev
Copy link
Member

I'm looking into why and how to best update the test suite to reveal this...

@gpshead ping

@arhadthedevarhadthedev changed the titlebpo-28494: Fix false positives when using zipfile.is_zipfile()gh-72680: Fix false positives when using zipfile.is_zipfile()Feb 25, 2023
@github-actionsgithub-actionsbot removed the staleStale PR or inactive for long period of time. labelFeb 26, 2023
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@gpshead
Copy link
Member

#134250 took this over.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@gpsheadgpsheadAwaiting requested review from gpshead

Assignees
No one assigned
Labels
awaiting reviewDO-NOT-MERGEstaleStale PR or inactive for long period of time.type-bugAn unexpected behavior, bug, or errortype-featureA feature request or enhancement
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

7 participants
@jjolly@gpshead@arhadthedev@ned-deily@the-knights-who-say-ni@ezio-melotti@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp