Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork939
Initial Migration of Fuzz Tests & Integration Scripts From the OSS-Fuzz Project Repo#1901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Initial Migration of Fuzz Tests & Integration Scripts From the OSS-Fuzz Project Repo#1901
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Migrates the OSS-Fuzz tests and setup scripts from the OSS-Fuzzrepository to GitPython's repo as discussed here:gitpython-developers#1887 (comment)These files include the changes that were originally proposed in:google/oss-fuzz#11763Additional changes include:- A first pass at documenting the contents of the fuzzing set up in a dedicated README.md- Adding the dictionary files to this repo for improved visibility. Seed corpra zips are still located in an external repo pending further discussion regarding where those should live in the long term.
Adds additional documentation links and fixes some typos.
- Updates the fuzzing documentation to include steps for working with locally modified versions of the gitpython repository.- Updates the build.sh script to make the fuzz target search path more specific, reducing the risk of local OSS-Fuzz builds picking up files located outside of where we expect them (for example, in a .venv directory.)- add artifacts produced by local OSS-Fuzz runs to gitignore
- Fix typos in the documentation on dictionaries- Link to the fuzzing directory in the main README where it is referenced.
Updates the gitpython project files to enable migrating and maintainingfuzz targets and build scripts upstream.Related PR in the upstream repo:gitpython-developers/GitPython#1901
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks so much for this wonderful PR! It's very well thought out and a delight. Particularly thefuzzing/README.md
is a great introduction to how it all works.>[!TIP]
is also new to me, and something I hope to be using more often in future.
About Fuzzing-Fixtures
Overall, the extra-files are just about 2MB in total, and I wonder how stable they will be. If stable, I could imagine including them in this repo.
If there is any concerns about this,gitpython-developers
could fork the repo and make you an admin on it.
About archives for publishing on PyPy
I created a new archive from the branch of this PR withpython -m build --sdist --wheel
and didn't see any trace offuzzing
in the produced archive.

It seems good.
Regarding the OSS fuzz PR
I didn't review it for correctness, but approved it in case it helps getting it merge.
Regarding my review
I just glanced at the shell scripts, and basically ignored the google-provided fuzz targets.
All that's left for me is to figure out where to put the ~2MB of corpus files.
And, thanks again for picking this up :)!
Uh oh!
There was an error while loading.Please reload this page.
Thanks, I appreciate the kind words 🙂 And yes,
I think a dedicated repo under the Ideally, a seed corpus would be added for each new fuzz target. Obviously that means more files but it doesn't necessarily mean "10 fuzz targets = 10mb". The size & number of interesting inputs will change over time and depending on the complexity/number of code paths in the functionality under test, and it's hard to know what that equates to in terms of per target corpus size at this point. The two zips in my repo right now actually demonstrate that pretty nicely: both of them basically top out their respective coverage quickly but one zip is ~450kb while the other is ~1.5mb. I like how Bitcoin Core handles inhttps://github.com/bitcoin-core/qa-assets. From what I've seen, that seems like the cleanest and most flexible.
Awesome thanks! |
This repo was created after discussion in PRgitpython-developers#1901.
Thanks@Byron! I've updated the URL to unblock this PR and I'll follow up with a PR to cleanup the new qa-assets repo following the pattern of the Bitcoin one. |
EliahKagan left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
At least one and possibly two changes will be needed to comply with the terms of theApache License. That license imposes requirements for distributing the licensed work, including, in subsection 4(a):
You must give any other recipients of the Work or Derivative Works a copy of this License
This is to say that merely preserving the license headers (even with other acknowledgement) is not sufficient, and the license text itself needs to be included somewhere in the GitPython repository.
It seems to me that the most hassle-free way to achieve this is to copy the license file from the repository the Apache 2.0 licensed Python scripts originally came from (so long as there is nothing unusual about it) into thefuzzing/
directory, and name itLICENSE-APACHE
.
The reason to put in thefuzzing/
directory rather than in thefuzzing/fuzz-targets/
subdirectory that contains those scripts, is that new files not covered by that license may end up in there later. So putting it there wouldn't really make it more specific. There is also a benefit to having it in a directory that also contains aREADME.md
that covers, among other code, the code it pertains to--especially if the details about what it covers are also moved from the main readme into that one, as I suggest in a review comment below. But it could go in thefuzzing/fuzz-targets/
subdirectory if you prefer.
It shouldn't go at the top level of the repository, because that could lead to confusion, and also because (at least without further changes) it would end up in built sdist and wheel packages. Currently nothing from thefuzzing/
directory is included in such packages; I have verified this (as has Byron).
The slightly trickier part is that the Apache License also requires that changes be noted. Per 4(b):
You must cause any modified files to carry prominent notices stating that You changed the files
I believe thatdoes apply here. The subtle aspect of this is that this is often not followed, and I think is not expected to be followed, when proposing changes for inclusion in an existing project that the files came from. For example, pull requests on a project that uses the Apache License often do not, and I think are not expected to, add any self-attribution, if the contributors decide they don't need to be attributed individually.
So I think anything that has formerly been published under this license in the OSS-Fuzz repository or other such places, if it included your modificationsthere, should not require that anything be added to put it here.
But it looks like your changes to the Apache licensed fileswere not ever included there. If this is where they are first published, then I think a note has to be added. If you want it to be a copyright notice naming you, I think that is fine. If not, then it could just say "This file has been modified by" followed by your name, or "GitPython contributors," or "contributors to GitPython," or something like that.
Please definitely do say if you think I am mistaken about any of that, or even if you are not sure but are worried I may be mistaken.
Besides the above, I think there are no other blockers for merging this. I've posted a number of comments below, some recommending other changes.
- A small number pertain to how the licensing situation is described. Unlike the above, none are important enough to delay merging this. However, those are changes I might not feel comfortable making separately without checking with you first. I recommend at least reading those, and they may lead to changes in this PR. But I emphasize that they do not need to stand in the way of this being merged.
- Most others are technical, and I believe at least one identifies a minor bug, but none are anywhere near serious enough that they need to delay merging,and you can even ignore them completely if you like. For these, if you happen to find any that recommend changes that you would prefernot be made, then you may want to state that, since unlike changes to how the licensing situation is documented, I would not otherwise feel a need to exercise any special caution when making such changes later. (You can also make changes in this PR based on them, but I am not asking you to do that and I don't want to delay this unnecessarily or push you into doing any further work on this that you may not wish to do at this time.)
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
shutil.rmtree(git_dir) | ||
os.mkdir(git_dir) | ||
with open(head_file, "w") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Unless there is a specific reason to do otherwise, I recommend specifying an encoding explicitly, for example,encoding="utf-8"
. Otherwise it is unclear if the reason it is omitted is just because it is expected not to make a practical difference across different platforms, or because a difference is actually desired. (I am guessing the reason is the former, because I don't think the fuzz tests run on Windows, which is where these issues usually come into play.) If omitting it is intentional, then it might be worthwhile to include a comment to state the reason.
python3 -m pip install . | ||
# Directory to look in for dictionaries, options files, and seed corpa: | ||
SEED_DATA_DIR="$SRC/seed_data" |
EliahKaganApr 15, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
IthinkSRC
comes from outside this script and is set in such a way that it is very unlikely to start with-
. I have refrained from mentioning places where--
should be added before arguments that have$SRC
as a prefix if the value ofSRC
can begin, even accidentally, with-
. Most commands accept--
, but not all, and I have no objection to omitting it on the occasion it istruly known not to be needed. I'll assume such comments may not be wanted, unless you request them. The same goes forcontainer-environment-boostrap.sh
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ithink
SRC
comes from outside this script and is set in such a way that it is very unlikely to start with-
.
You are correct. I explained the OSS-Fuzz provided env vars a bit more in this comment:#1901 (comment). These variables are certainly a case where--
istruly known not to be needed, because if the omission of it results in these scripts breaking then most (more likely, all) of the other OSS-Fuzz projects would break as well.
That said, I wasn't particularly happy with having the"$SRC/seed_data"
directory specified in two separate scripts, where one implicitly depends on the other creating it. If nothing else, that feels typo prone.
Initially I considered having a third script that just defined the shared variable andsource
ing it in each, but that felt like overkill.
Ultimately, I think the best thing to do would be to remove the need for that directory all together. When I get around to the changes I described in a reply to you here:#1903 (comment), I should be able to do all of the dictionary & corpus zip creation insidecontainer-environment-bootstrap.sh
(i.e., during the Docker image build step) and put everything in$OUT/
directly from there. That would allow the removal of the"$SRC/seed_data"
directory completely, and also make the fuzz harness build step more efficient because it wouldn't need to produce the fixture artifacts every time.
I'm open to alternative suggestions though 🙂
I'll leave this comment unresolved for now for my own reference
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Thanks for the thorough review@EliahKagan! It's very helpful, and I'm particularly appreciative of the thoughtful explanations and suggestions you shared. I'll push some changes shortly to address your feedback, prioritizing the updates related to the license, so we can get this merged and unblock the downstream OSS-Fuzz PR. As for anything else that you explicitly noted as non-blocking: I'll either address them in a follow-up, or reply to in the |
Addresses feedback and encorperates suggestions from PRgitpython-developers#1901 to ensurethat the Apache License requirements are met for the two files that theyapply to, and the documentation pertaining to licensing of the files inthis repository is clear and concise.
Addresses feedback and encorperates suggestions from PRgitpython-developers#1901 to ensurethat the Apache License requirements are met for the two files that theyapply to, and the documentation pertaining to licensing of the files inthis repository is clear and concise.The contects of LICENSE-APACHE were coppied from the LICENSE file of theOSS-Fuzz repository that the two fuzz harnesses came from as of commit:https://github.com/google/oss-fuzz/blob/c2c0632831767ff05c568e7b552cef2801d739ff/LICENSE
321534c
to945a767
CompareDaveLak commentedApr 16, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@EliahKagan, commit945a767 addresses the blocking concerns regarding the licensing and documentation of Apache licensed files. Please see the diff of that commit for specifics. Sorry about the force-push; I wanted to make sure the OSS-Fuzz commit at the time I copied the Apache license file from was captured in the commit message.1 Regarding:
I did exactly that. I also checked the diff of the contents of the file included here againsthttps://www.apache.org/licenses/LICENSE-2.0.txt just to be sure; it looks good. Diff Results--- fuzzing/LICENSE-APACHE+++ fuzzing/LICENSE-APACHE-from-website-src@@ -1,3 +1,4 @@+ Apache License Version 2.0, January 2004 http://www.apache.org/licenses/@@ -178,7 +179,7 @@APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following- boilerplate notice, with the fields enclosed by brackets "{}"+ boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a@@ -186,7 +187,7 @@same "printed page" as the copyright notice for easieridentification within third-party archives.- Copyright {yyyy} {name of copyright owner}+ Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. @@ -198,4 +199,4 @@ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and- limitations under the License.+ limitations under the License. \ No newline at end of file CC@Byron Footnotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for sorting this out!
It looks good to me, but I will wait for@EliahKagan to double-check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks great, thanks!
FYI, I'm hesitant to push more commits here, including via the suggestion comments, because I don't want to keep making the review approvals stale. I'll reply to the remaining as necessary of course, but I'm going to start a new branch for them unless someone suggests otherwise. Lmk if either of you would prefer them on this PR directly. |
EliahKagan commentedApr 16, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I agree. I think this PR is in good shape to be merged, and further refinements can be made in one or more subsequent PRs. This is the case even for future adjustments, if any, that would affect how the licensing situation is described. Everything that was blocking has been fully resolved. |
Prefer executing these files using the OSS-Fuzz or `python` commandmethods outlined in the `fuzzing/README`.Based on feedback and discussion on:gitpython-developers#1901
This script is meant to be sourced by the OSS-Fuzz file of the samename, rather than executed directly. The shebang may lead to theincorrect assumption that the script is meant for direct execution.Replacing it with this directive instructs ShellCheck to treatthe script as a Bash script, regardless of how it is executed.Based@EliahKagan's suggestion and feedback on:gitpython-developers#1901
This script is executed directly, not sourced as is the case with`build.sh`, so it should have an executable bit set to avoid ambiguity.Based@EliahKagan's suggestion and feedback on:gitpython-developers#1901
- Make the link text for the OSS-Fuzz test status URL more descriptive- Fix capitalization of GitPython repository nameBased@EliahKagan's suggestion and feedback on:gitpython-developers#1901
Replaces the null character delimiter `-d $'\0'` with the simplerempty string `-d ''` in the fuzzing harness build loop.This changes leverages the Bash `read` builtin behavior to avoidunnecessary complexity and improving script readability.Based@EliahKagan's suggestion and feedback on:gitpython-developers#1901
Based@EliahKagan's suggestion and feedback on:gitpython-developers#1901
A misspelling in thehttps://github.com/gitpython-developers/qa-assetsrepository is still present here. It will need to be fixed in thatrepository first."corpora" is a difficult word to spell consistently I guess. This madefor a good opportunity to improve the phrasing of two other comments atat least.Based@EliahKagan's suggestion and feedback on:gitpython-developers#1901
FWIW@EliahKagan I tackled most of the simpler changes and put up a Draft PR:#1903 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Apologies, I read the last two comments too late.
However, what follows is still something I consider feasible, nothing is going to be stale. Instead, I prefer to have all the existing comments in one place and resolve conversations.
Never mind :D - see#1903 .
Thanks a lot for keeping working on the refinements. I thought I'd wait until the suggestions are committed (they can be batched), and a few more things are cleaned up.
Regarding the hardcoded/tmp
path that then expects a/tmp/.git
, I agree that this under no circumstances should be running in an unisolated environment. Since the documentation clearly states that docker is needed, andif it's unlikely to impossible to run outside (or to get to that line), then I think it's OK to leave it.
It seems to be common practice in the OSS-fuzz project as well, and portability is not their concern.
Thanks for bearing with us, this PR is already fantastic, and so is working with you on this!
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
e3fb1f2
intogitpython-developers:mainUh oh!
There was an error while loading.Please reload this page.
Thanks both! I updated the branch for#1903 and marked it ready for review. I agree the state of I still plan to share in a reply on the unresolved thread here, I just haven't had a chance to yet. I should be able to in a few hours. |
To be clear,Docker is not needed, so this can be the real, unisolated However, I realize I omitted a piece of my reasoning: Ithink it's easier to fix this after both this PR andgoogle/oss-fuzz#11803 are merged (the latter is not merged yet but looks like it will be merged soon), because that will make it easier to test the fix both outside and inside Docker. (The Docker case is, I believe, more important, since that's how the tests are automatically run on the OSS-Fuzz infrastructure, and running individual tests outside of Docker is, if I understand correctly, mostly of interest when working on the fuzz tests themselves.) For the same reason, I think it is justified for#1903 not to include a fix for this, since the changes in#1903 otherwise could be merged at any time and do not need to wait ongoogle/oss-fuzz#11803. |
DaveLak commentedApr 18, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Ok, sorry for the delay. Below are my thoughts regarding About the OSS-Fuzz EnvironmentYou both touched on it, but for the sake of shared understanding, the scripts and tests look like they do in part because of the OSS-Fuzz container environment implementation details. The key points to understand are:
Build Stage Execution Environment
Everything else is read-only. Fuzzer Execution Stage Environment
Regarding the Fuzz TestsWhen I initially started on this, my primary objective was really only about fixing the build. I did try to clean them up a bit in the process and apply some best practices, but those changes were mostly incidental and I intentionally avoided any significant refactors. After we decided to bring them here, my focus shifted to lowering the barrier to entry for contributors because of the nature of the project. In particular, I tried to distill the useful information from various place that I haven't seen documented in one place yet. One such tip is the direct execution section, but in hindsight I overlooked the environment assumptions in On |
Updates the gitpython project files to enable migrating and maintainingfuzz targets and build scripts upstream.Related PR in the upstream repo:gitpython-developers/GitPython#1901`project.yaml` updates:-@Byron, the maintainer of GitPython, is added as the primary contact.-@EliahKagan and myself are added to the `auto_ccs` list as discussedwith@Byron here:gitpython-developers/GitPython#1889 (comment)-@DavidKorczynski I removed what I believe is your email from the`vendor_ccs` because it looked like you were included as the defaultwhen no other contacts were listed. If this was a mistake on my part andyou want to remain listed as a CC, please let me know and I'll correctit.Thanks!
Thanks so much for the follow-up - I feel well-informed now! As OSS-fuzz strongly relies on docker, maybe it would be easiest to 'go with the flow' and provide a dockerfile as part of the As second step, one could see if improving the tempdir usage would fix certain issues that you have been seeing, including to make using OSS fuzz safe for running it directly. With that said, I also think that… I'd really want you to do more with |
Adds a Dockerfile to enable easily executing the fuzz targets directlyinside a container environment instead of directly on a host machine.This addresses concerns raised in PRgitpython-developers#1901 related to how `fuzz_tree.py`writes to the real `/tmp` directory of the file system it is executed onas part of setting up its own test fixtures, but also makes for aneasier to use development workflow.See this related comment on PRgitpython-developers#1901 for additional context:gitpython-developers#1901 (comment)
I've added a simple Dockerfile and I instructions for using it and a warning about not using Docker in#1904. Let me know what you think!
I'm planning to take a deeper at this. Even without the
I'm definitely interested in taking you up on that! I've been interested in learning more about Rust for a while and I've found working on fizzers a great way to get familiar with new projects and languages, so you can likely expect to see me pop up in that project sooner or later 😁 In other news, the downstream OSS-Fuzz PR was merged and the Gitpython builds are succeeding! https://introspector.oss-fuzz.com/project-profile?project=gitpython The Coverage build is still reporting failures, but that is expected until ClusterFuzz generates enough corpora to run the analysis which I expect will happen in the next day or two. Exciting to see! |
These files are already BSD-3-Clause even without the headers, butadding these comments and the `LICENSE-BSD` symlink to the root level`LICENSE` file are helpful to reinforce that there are only twoparticular files in the `fuzzing/` that are not under BSD-3-Clause.See:gitpython-developers#1901 (comment)
While discussing adding similar license comments to the shell scriptsintroduced in PRgitpython-developers#1901, it was noticed that the shell scripts in therepository root directory did not have such comments and suggested thatwe could add them when the scripts in the `fuzzing/` directory wereupdated, so this commit does just that.See:gitpython-developers#1901 (comment)
As discussed in the initial fuzzing integration PR[^1], `fuzz_tree.py`'simplementation was not ideal in terms of coverage and its reading/writing tohard-coded paths inside `/tmp` was problematic as (among other concerns), itcauses intermittent crashes on ClusterFuzz[^2] when multiple workers executethe test at the same time on the same machine.The changes here replace `fuzz_tree.py` completely with a completely new`fuzz_repo.py` fuzz target which:- Uses `tempfile.TemporaryDirectory()` to safely manage tmpdir creation and tear down, including during multi-worker execution runs.- Retains the same feature coverage as `fuzz_tree.py`, but it also adds considerably more from much smaller data inputs and with less memory consumed (and it doesn't even have a seed corpus or target specific dictionary yet.)- Can likely be improved further in the future by exercising additional features of `Repo` to the harness.Because `fuzz_tree.py` was removed and `fuzz_repo.py` was not derived from it,the Apache License call outs in the docs were also updated as they only apply tothe singe `fuzz_config.py` file now.[^1]:gitpython-developers#1901 (comment)[^2]:https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68355
Uh oh!
There was an error while loading.Please reload this page.
As discussed in#1889, this PR introduces the changes necessary to migrate the OSS-Fuzz fuzz tests and configuration scripts from the OSS-Fuzz repository into GitPython's repo.
Summary of Changes
Most of the changes proposed here are the same as described inthe discussion thread linked above. Beyond that, there are a few minor differences and potentially non-obvious details that I want to call out here:
Directory Structure
Initially, I proposed a single, flat "
fuzzing/
" directory at the top level, but when I moved the files, I felt that a few sub-directories would help with organization.The
fuzzing/READEME.md
file introduced in this PR has a section describing them and some additional documentation.Updates to Fuzz Targets
The fuzz test files in this PR include the changes that were originally proposed ingoogle/oss-fuzz#11763.
Seed Data
Part of efficiency improvements I proposed ingoogle/oss-fuzz#11763 included the addition of dictionary files (described in thenew README) andseed corpus zip files.
I stored these in a repo that I maintain:https://github.com/DaveLak/oss-fuzz-inputs/tree/main/gitpython, because OSS-Fuzz has understandable concerns about seed data bloating the size of their repo.
In this PR:
Next Steps
@Byron &@EliahKagan, please:
fuzzing/
directory doesn't get picked up by the publishing process, I'd appreciate it! I did what I could to test it locally but I'm not familiar enough with the release process to be confident.Thanks! I'll address feedback any feedback as quickly as possible.