Investigate separating test data from repository #5329

Open
Labels: keep (items to be ignored by the “Stale” Github Action), topic: testing
@mdboom

Description

matplotlib includes its test data for image comparison tests in the git repository. Current HEAD is about 131MB of test data uncompressed. Not sure what the whole history of that data is, but it's a safe bet it's a significant fraction of the git repository.
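
A figure like the one above can be re-measured on any checkout. The following is a minimal sketch, not matplotlib tooling: the function names are made up for illustration, and the directory name `baseline_images` is an assumption about where the comparison images live.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def baseline_image_sizes(repo_root, marker="baseline_images"):
    """Map each directory named `marker` under repo_root to its size in bytes.

    `marker` is an assumed directory name, not something this issue specifies.
    """
    sizes = {}
    for dirpath, dirnames, _filenames in os.walk(repo_root):
        if os.path.basename(dirpath) == marker:
            sizes[dirpath] = directory_size_bytes(dirpath)
            dirnames[:] = []  # already counted; do not descend further
    return sizes
```

Calling `baseline_image_sizes(".")` from the top of a checkout gives the uncompressed size of the working tree data only. It says nothing about how much of the git history the data occupies; `git count-objects -vH` reports the packed size of the whole repository, which is a separate measurement.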

There are some real advantages to this approach: the test data and the version of matplotlib it corresponds to are easily synchronized by being in the same repo. The downside, of course, is the size of the repo.

There are a few alternative solutions I've been investigating, none of which seem to be the perfect answer, so I thought I'd open this up to a wider discussion.

git submodule: The test data would move to another repo (call it the tests repo), and the main repo has a special kind of symbolic link that points to a specific revision in the tests repo. The tests repo is not cloned unless specifically asked for (git submodule update). The downside of git submodule is that a PR that requires both updating functionality in matplotlib and updating test data would have to be split into two PRs, one for each repo, and coordinated very carefully. The link in the matplotlib repo cannot point to a revision in a fork of the tests repo, so it will fail until the PR for the tests repo is merged. In short: git submodule is awfully close to what we need, but it doesn't interact very well with the github PR workflow.
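
To make the submodule option concrete, here is a rough sketch of the commands involved, wrapped in Python so they can be scripted. The URL and path are hypothetical placeholders, and the helper names are invented for this example; only the `git submodule add` and `git submodule update --init` invocations are real git commands.

```python
import subprocess

# Hypothetical URL and path, used only for illustration.
TESTS_REPO_URL = "https://example.com/matplotlib-test-data.git"
TESTS_PATH = "test-data"


def submodule_setup_commands(url=TESTS_REPO_URL, path=TESTS_PATH):
    """Commands a maintainer would run once to link the tests repo."""
    return [
        # Records the URL in .gitmodules and pins the current revision.
        ["git", "submodule", "add", url, path],
        ["git", "commit", "-m", f"Track test data as a submodule at {path}"],
    ]


def submodule_fetch_commands(path=TESTS_PATH):
    """Commands a developer runs only when the test data is wanted."""
    return [["git", "submodule", "update", "--init", path]]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

A plain `git clone` of the main repo leaves the submodule directory empty until `run_all(submodule_fetch_commands())` (or the equivalent command typed by hand) is run, which is the opt-in behavior described above. The sketch does not address the two-PR coordination problem.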

git subtree: Seems to avoid the extreme separation of repos that comes with git submodule, and merges can take place involving both repos. However, it doesn't solve the problem of only cloning the test data if requested: git subtrees are always deeply cloned. Additionally, git submodule seems more appropriate if the two repos are separate projects usable on their own. I don't think that's the case here.

git annex: Lets you check special links into the git repo instead of the files themselves. The files these links refer to can then be fetched or cleared as requested. The actual file contents can live in a number of places, like a WebDAV server, or another git repo (which probably makes the most sense for us, to use free github hosting). git annex is a cool but fairly complex tool; still, I think it's the closest to what we need.
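
The fetch-or-clear cycle described above maps onto the `git annex get` and `git annex drop` subcommands. The sketch below wraps them in Python; the helper names and the default path are assumptions made for illustration, and it presumes git-annex is installed and the repository has already been set up with `git annex init`.

```python
import subprocess

# Assumed location of the image-comparison baselines (illustrative only).
BASELINE_DIR = "lib/matplotlib/tests/baseline_images"


def annex_command(action, path=BASELINE_DIR):
    """Build a git-annex command for one of the basic content operations.

    add  -- check a file in as an annex link instead of regular content
    get  -- fetch the real content behind the links from a remote
    drop -- remove the local content, keeping only the links
    """
    if action not in ("add", "get", "drop"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["git", "annex", action, path]


def fetch_test_data(path=BASELINE_DIR, runner=subprocess.run):
    """Download the test data, for developers who want to run image tests."""
    runner(annex_command("get", path), check=True)


def clear_test_data(path=BASELINE_DIR, runner=subprocess.run):
    """Free local disk space; the links stay in the working tree."""
    runner(annex_command("drop", path), check=True)
```

By default `git annex drop` refuses to remove content unless it can confirm a copy exists elsewhere, so `clear_test_data()` should only succeed when the data is still available from a remote.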

Of course, none of this impacts how we distribute matplotlib, and more and more of our packages for end users just don't include the tests, and this is easy enough to do. So given the added complexity of all of the options above vs. the bandwidth and data costs of the status quo, I'm not sure it's obvious we should do anything. But, as I said, there might be some good solutions that come out of discussion.
