If you've paid any attention, you'll have noticed there are a ton of new
version control systems. It's a little odd, really -- the community
languished with CVS for such a long time. CVS got lots of really
important things right, but as we all know it was also a mess. Then
Subversion and Arch came along with two different models --
one based closely on CVS (Subversion), and one based on a new
distributed model (Arch). Arch seems crufty and difficult, or maybe it's
just that its designer has a somewhat crufty and difficult
personality... but clearly that branch of design has had an explosion
of implementations, whereas Subversion remains alone (though far more
successful).
People treat the benefit of distributed development as self-evident,
but I don't agree. The best real justification I've seen of the
centralized model is Greg Hudson's "Why Bitkeeper Isn't Right For Free
Software", which is still relevant since it wasn't about the BK license or
proprietary software. All the arguments against BK on the basis of
license have been clearly proven, but the model remains relevant. The
basic argument Greg makes is that the Linux development process is an
anomaly and doesn't apply to most projects.
But I think this can be taken further. There's nothing wrong with the
centralized model. There is something wrong with the way we are
using Subversion (and CVS before it). The wrongness isn't that you
need a server, or a network connection, or disk space. It's that you need commit privileges.
I see these issues as the important ones that source control can solve
for open source:
1. Getting people's work off their hard disks, where it is dead and
   useless to the community.
2. Making work available and locatable by other developers.
3. Making it easy to understand the changes made.
4. Making it easy to integrate changes.
The distributed systems do some work on (1), usually by not needing a
"server" (except maybe for rsync and any web server). But frankly the
"server-less" systems they set up are usually much more complex in
practice than a single well-maintained server. Now that Subversion
has fixed many of its server problems (with fsfs among other things),
server maintenance is really not a problem. And we share the work
around; there are far fewer servers than developers, and that works
fine.
But more practically, I think distributed systems enable private work
in a way that is bad for the community. I think the private workflow
so touted by distributed systems is a total non-feature, even an
anti-feature. Open source development should happen in the open;
that's what people usually want to do, and that's what we should
encourage at every opportunity.
The distributed systems offer nothing for (2). Centralized systems
allow you to list the files and branches and whatnot. Subversion made
an important improvement on CVS by making the branching and tagging
very transparent, where it was somewhat invisible and mysterious in
CVS. That makes a real and practical difference in the usability of
branching. Distributed systems are a step back in this respect.
Honestly I don't know how distributed systems compare on (3) and (4).
Subversion could definitely be better, but I don't think that has
anything to do with centralization. I find handling patches very
difficult, but I think merging branches in Subversion is generally
easier, and has far more room for improvement. As an aside, I don't
see why the exchange of patches in email is even relevant in a usable
and complete system -- emailing files around is a crappy interface for
everyone. Making it less crappy is missing the point; email is not a good
file transfer protocol. But because Linus does everything in
email... sigh.
But centralized version control does need to become more open. The
difference between someone with commit access and some random
contributor should be reduced. Anonymous commits should be allowed
(or commits with a very low registration threshold -- strict anonymity
isn't really important here). The tools should be usable enough that
we can say "we don't accept patches, we only accept pointers to
branches in our repository". Maybe we could even say "we accept bug
reports, but we prefer bug reports in the form of commits into our
bug_example/ directory". We should stop using Wikis, and just use
web frontends to our version control.
There's great potential in version control -- but it's all in
usability and tools, security and scaling, not the mathematical appeal
of patch management algorithms.
People like to talk about the benefit of open source's
distributed development, but I think the communal
aspect is just as important. We all already know that a successful
open source developer must play well with others, must both follow and
lead in different projects, must be able to handle and resolve
personal and technical conflicts. We succeed when we work in public;
so why should we be so drawn to version control that encourages
isolation, where everything is a fork?