
Use partial (blobless) git clones and auto-recovery on fetch failure #729

Open
stratakis wants to merge 1 commit into python:main from stratakis:save_me_some_diskspace

Conversation

@stratakis
Contributor

Enable filters=["blob:none"] for partial (blobless) clones, reducing .git/objects from ~800MB to ~250MB per build directory.

Enable clobberOnFailure so that a failed fetch wipes the checkout and re-clones instead of failing the build.

@stratakis
Contributor Author

This will save a lot of space for newly created build dirs. Existing build dirs keep their full object store, so to see the benefits, the buildbot admins should clean up their build dirs.

Basically, with this change, a worker with 8 builders and 4 active branches will save roughly 17GB of disk space, and the pr_ directories will be smaller as well.
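A quick check of the ~17GB figure. This assumes one build directory per builder per branch and uses the approximate ~800MB → ~250MB per-directory numbers from the PR description; the pr_ directories are not counted.

```python
builders = 8
branches = 4
before_mb = 800  # approx. .git/objects per build dir with a full clone
after_mb = 250   # approx. .git/objects per build dir with a blobless clone

build_dirs = builders * branches                 # one dir per builder per branch
saved_mb = build_dirs * (before_mb - after_mb)   # 32 * 550 MB
print(f"{build_dirs} dirs, ~{saved_mb / 1000:.1f} GB saved")
# prints: 32 dirs, ~17.6 GB saved
```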

Also, when clobberOnFailure kicks in, the re-clone will use the blob:none filter, so builders whose fetch failed will get the benefit automatically (although that depends on a fetch actually failing). I think it's good to have it there nonetheless.
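The recovery path can be sketched in plain git terms: try an incremental fetch and, if it fails, delete the working directory and re-clone with the same filter, so a recovered builder also ends up blobless. The function `update_or_reclone` and its injectable `run` callable are hypothetical (the callable lets the logic be exercised without a real repository). This only approximates the idea behind Buildbot's `clobberOnFailure`; it is not Buildbot's implementation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A runner takes (argv, cwd) and returns the process exit code.
Runner = Callable[[List[str], str], int]


def default_run(cmd: List[str], cwd: str) -> int:
    """Run a command for real and return its exit code."""
    return subprocess.run(cmd, cwd=cwd).returncode


def update_or_reclone(repo_url: str, workdir: str,
                      filters: Sequence[str] = ("blob:none",),
                      run: Runner = default_run) -> str:
    """Fetch into an existing clone; on failure, wipe it and re-clone.

    Hypothetical helper. Returns "fetched" or "recloned" and raises
    RuntimeError if the re-clone fails as well.
    """
    path = Path(workdir)
    if path.is_dir() and run(["git", "fetch", "origin"], workdir) == 0:
        return "fetched"
    # Fetch failed (or there is no checkout yet): clobber and start over.
    shutil.rmtree(path, ignore_errors=True)
    # One --filter=<spec> per entry, e.g. --filter=blob:none for a blobless clone.
    clone = ["git", "clone", *(f"--filter={f}" for f in filters),
             repo_url, str(path)]
    if run(clone, str(path.parent)) != 0:
        raise RuntimeError("re-clone failed")
    return "recloned"
```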

@vstinner
Member

I didn't know about the --filter=blob:none option. The git-clone manual page says:

For example, --filter=blob:none will filter out all blobs (file contents) until needed by Git. Also, --filter=blob:limit=<size> will filter out all blobs of size at least <size>.

Member

@vstinner left a comment


LGTM

Member

@zware left a comment


I'm definitely on board with this. I've been less than thrilled with the disk usage lately and have wondered about switching to a shallow clone freshly pulled every build, but that would entail a 45MB download on every build rather than a nearly 1GB download every now and then. Blobless (maybe even treeless?) seems like a nice middle ground, assuming everybody has a new enough git; thank you for beating me to figuring it out :)
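Rough arithmetic for the trade-off described above, taking the comment's numbers at face value (~45MB per fresh shallow clone, "nearly 1GB", here rounded to 1000MB, for an occasional full clone). As a simplifying assumption, it ignores the size of the incremental fetches a persistent clone still needs.

```python
shallow_per_build_mb = 45   # fresh shallow clone downloaded on every build
full_clone_mb = 1000        # "nearly 1GB", downloaded once per build dir

# Builds after which re-downloading a shallow clone each time has
# transferred as much data as one persistent full clone.
break_even_builds = full_clone_mb / shallow_per_build_mb
print(f"break-even after ~{break_even_builds:.0f} builds")
# prints: break-even after ~22 builds
```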

@stratakis
Contributor Author


Treeless should save 100+MB per fetch, but since it needs to fetch extra objects on every checkout, and those will slowly accumulate anyway, I don't think it's worth adding it here.

I believe blobless is the best fit for the buildbots (though treeless could perhaps be explored in the future).

From https://github.blog/open-source/git/get-up-to-speed-with-partial-clone-and-shallow-clone/ :

git clone --filter=blob:none creates a blobless clone. These clones download all reachable commits and trees while fetching blobs on-demand. These clones are best for developers and build environments that span multiple builds.

git clone --filter=tree:0 creates a treeless clone. These clones download all reachable commits while fetching trees and blobs on-demand. These clones are best for build environments where the repository will be deleted after a single build, but you still need access to commit history.

git clone --depth=1 creates a shallow clone. These clones truncate the commit history to reduce the clone size. This creates some unexpected behavior issues, limiting which Git commands are possible. These clones also put undue stress on later fetches, so they are strongly discouraged for developer use. They are helpful for some build environments where the repository will be deleted after a single build.
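The three strategies quoted above differ only in the extra flags passed to `git clone`. A small lookup table makes the mapping explicit; `CLONE_ARGS` and `clone_command` are illustrative names, not part of Buildbot or git.

```python
from typing import Dict, List

# Extra `git clone` arguments for each strategy described in the blog post.
CLONE_ARGS: Dict[str, List[str]] = {
    "blobless": ["--filter=blob:none"],  # all commits and trees; blobs on demand
    "treeless": ["--filter=tree:0"],     # all commits; trees and blobs on demand
    "shallow": ["--depth=1"],            # truncated commit history
}


def clone_command(kind: str, repo_url: str, dest: str) -> List[str]:
    """Build a `git clone` argv for one of the strategies above."""
    try:
        extra = CLONE_ARGS[kind]
    except KeyError:
        raise ValueError(f"unknown clone kind: {kind!r}") from None
    return ["git", "clone", *extra, repo_url, dest]
```

For example, `clone_command("blobless", url, dest)` yields the same command the PR's `filters=["blob:none"]` setting is meant to produce for a fresh build directory.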

@stratakis
Contributor Author

Filed 2 RFEs for buildbot that could help with the situation.


3 participants