Python Buzz Forum - Distributed vs. Centralized Version Control

If you've paid any attention, you'll have noticed there's a ton of new version control systems. It's a little odd, really -- the community languished with CVS for such a long time. CVS got lots of really important things right, but as we all know it also was a mess. Then Subversion and Arch came along with two different models -- one based closely on CVS (Subversion), and one based on a new distributed model (Arch). Arch seems crufty and difficult, or maybe it's just that its designer has a somewhat crufty and difficult personality... but clearly that branch of design has had an explosion of implementations, where Subversion remains alone (though far more successful).

People treat the benefit of distributed development as self-evident, but I don't agree. The best real justification I've seen of the centralized model is Greg Hudson's Why Bitkeeper Isn't Right For Free Software, which is still relevant since it wasn't about the BK license or proprietary software. All the arguments against BK on the basis of license have been clearly proven, but the model remains relevant. The basic argument Greg makes is that the Linux development process is an anomaly and doesn't apply to most projects.

But I think this can be taken further. There's nothing wrong with the centralized model. There is something wrong with the way we are using Subversion (and CVS before it). The wrongness isn't that you need a server, or a network connection, or disk space. It's that you need commit privileges.

I see these issues as the important ones that source control can solve for open source:

1. Getting people's work off their hard disks, where it is dead and useless to the community.
2. Making work available and locatable by other developers.
3. Making it easy to understand the changes made.
4. Making it easy to integrate changes.

The distributed systems do some work on (1), usually by not needing a "server" (except maybe for rsync and any web server). But frankly the "server-less" systems they set up are usually much more complex in practice than a single well-maintained server. Now that Subversion has fixed many of its server problems (with fsfs among other things), server maintenance is really not a problem. And we share the work around; there are far fewer servers than developers, and that works fine.

But more practically, I think distributed systems enable private work in a way that is bad for the community. I think the private workflow so touted by distributed systems is a total non-feature, even an anti-feature. Open source development should happen in the open; that's what people usually want to do, and that's what we should encourage at every opportunity.

The distributed systems offer nothing for (2). Centralized systems allow you to list the files and branches and whatnot. Subversion made an important improvement on CVS by making the branching and tagging very transparent, where it was somewhat invisible and mysterious in CVS. That makes a real and practical difference in the usability of branching. Distributed systems are a step back in this respect.

Honestly I don't know how distributed systems compare on (3) and (4). Subversion could definitely be better, but I don't think that has anything to do with centralization. I find handling patches very difficult, but I think merging branches in Subversion is generally easier, and with far more room for improvement. As an aside, I don't see why the exchange of patches in email is even relevant in a usable and complete system -- emailing files around is a crappy interface for everyone. Making it less crappy is missing the point; email is not a good file transfer protocol. But because Linus does everything in email... sigh.

But centralized version control does need to become more open. The difference between someone with commit access and some random contributor should be reduced. Anonymous commits should be allowed (or commits with a very low registration threshold -- strict anonymity isn't really important here). The tools should be usable enough that we can say "we don't accept patches, we only accept pointers to branches in our repository". Maybe we could even say "we accept bug reports, but we prefer bug reports in the form of commits into our bug_example/ directory". We should stop using Wikis, and just use web frontends to our version control.

There's great potential in version control -- but it's all in usability and tools, security and scaling, not the mathematical appeal of patch management algorithms.

People like to talk about the benefit of open source's distributed development, but I think the communal aspect is just as important. We all already know that a successful open source developer must play well with others, must both follow and lead in different projects, must be able to handle and resolve personal and technical conflicts. We succeed when we work in public; so why should we be so drawn to version control that encourages isolation, where everything is a fork?

