Re: Why Git is so fast (was: Re: Eric Sink's blog - notes on git, dscms and a "whole product" approach)

On Fri, 1 May 2009, Jeff King wrote:
> Thanks for the analysis; what you said makes sense to me. However, there
> is at least one case of somebody complaining that git doesn't scale as
> well as perforce for their load:

So we definitely do have scaling issues, there's no question about that. I
just don't think they are about enterprise network servers vs the more
workstation-oriented OSS world..

I think they're likely about the whole git mentality of looking at the big
picture, and then getting swamped by just how _huge_ that picture can be
if somebody just put the whole world in a single repository..

With perforce, repository maintenance is such a central issue that the
whole p4 mentality seems to _encourage_ everybody to put everything into
basically one single p4 repository. And afaik, p4 basically works mostly
like CVS, ie it really ends up being pretty much oriented to a "one file
at a time" model.

Which is nice in that you can have a million files, and then only check
out a few of them - you'll never even _see_ the impact of the other
999,995 files.

And git obviously doesn't have that kind of model at all. Git
fundamentally never really looks at less than the whole repo. Even if you
limit things a bit (ie check out just a portion, or have the history go
back just a bit), git ends up still always caring about the whole thing,
and carrying the knowledge around.
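
To make the contrast concrete, here is a toy cost model. It is only a
sketch: the function names are made up for illustration, and it is not how
p4 or git are actually implemented. It just counts how many path entries
each style of tool has to consider for one operation.

```python
# Toy cost model, NOT real p4 or git internals: both function names are
# hypothetical and exist only to illustrate the argument above.

def per_file_cost(total_files: int, checked_out: int) -> int:
    """'One file at a time' model: work scales with the files you asked for."""
    return min(checked_out, total_files)


def whole_tree_cost(total_files: int, checked_out: int) -> int:
    """Whole-repository model: every tracked path is considered,
    no matter how few of them you actually touch."""
    return total_files


total, wanted = 1_000_000, 5
print(per_file_cost(total, wanted))    # 5
print(whole_tree_cost(total, wanted))  # 1000000
print(total - wanted)                  # 999995 files the per-file model never sees
```

In this model, growing the repository changes nothing for the per-file
tool and is the only thing that matters for the whole-tree tool.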

So git scales really badly if you force it to look at everything as one
_huge_ repository. I don't think that part is really fixable, although we
can probably improve on it.

And yes, then there's the "big file" issues. I really don't know what to
do about huge files. We suck at them, I know. There are work-arounds (like
not deltaing big objects at all), but they aren't necessarily that great.
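
As a rough sketch of the "don't delta big objects" work-around: the idea
is a size cutoff above which an object is stored whole instead of being
considered for delta compression. The cutoff value and the names
`BIG_OBJECT_BYTES` and `plan_packing` below are assumptions made up for
this example, not git's actual code or defaults.

```python
# Hypothetical size cutoff, picked only for illustration.
BIG_OBJECT_BYTES = 100 * 1024 * 1024


def plan_packing(object_sizes, threshold=BIG_OBJECT_BYTES):
    """Split objects into delta candidates and objects stored whole.

    object_sizes maps an object name to its size in bytes.
    """
    delta_candidates, stored_whole = [], []
    for name, size in sorted(object_sizes.items()):
        if size >= threshold:
            # Too big: skip the expensive delta search entirely.
            stored_whole.append(name)
        else:
            delta_candidates.append(name)
    return delta_candidates, stored_whole


sizes = {"README": 2_000, "main.c": 40_000, "video.raw": 3 * 1024**3}
small, big = plan_packing(sizes)
print(small)  # ['README', 'main.c']
print(big)    # ['video.raw']
```

The trade-off is the obvious one: skipping the delta search saves time and
memory, but every version of a big object then takes its full size on disk.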

I bet we could probably improve git large-file behavior for many common
cases. Do we have a good test-case of some particular suckiness that is
actually relevant enough that people might decide to look at it? (And by
"people", I do mean myself too - but I'd need to be somewhat motivated by
it: a usage case that we suck at and that is available and relevant.)

