What a DVCS gets you (maybe)
Czajnik, responding to Jeff Atwood's howto on setting up Subversion on Windows: "Have you tried any distributed source control? I've recently switched from Subversion to Mercurial, and I'm very happy about the change. The most important reason was the ability to clone the repository to my laptop, do some checkins there (without network access, on a plane, train, etc.) and resync with just one command when I'm back home. Distributed model seems cool even for a single developer :)"
Jeff Atwood responding to Czajnik: "What is the difference between what you describe, and working traditionally offline, then checking in when you get back into the office? If a "checkin" occurs on your local machine, and nobody else in the world knows about it... did it really happen? Maybe I'm not understanding the distinction here. I still need to watch the rest of Linus Torvalds' presentation on this topic ( http://www.youtube.com/watch?v=4XpnKHJAok8 )"
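For reference, the offline workflow Czajnik describes boils down to a handful of Mercurial commands. This is only a sketch - the repository URL and the commit messages are made up:

    # one-off: take a complete local copy of the repository
    hg clone http://example.org/repo myrepo
    cd myrepo

    # hack away offline; every checkin lands in the local clone
    hg commit -m "offline change 1"
    hg commit -m "offline change 2"

    # back on the network: fetch what happened upstream, merge if it moved on, publish
    hg pull
    hg merge
    hg commit -m "merge upstream"
    hg push

Everything before the pull happens with no network in sight; the resync at the end is what makes the offline checkins visible to everyone else.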
I don't know that Torvalds' presentation on Git is going to explain much (Randal Schwartz's Git tutorial is much better). I agree with Jeff when he says elsewhere it will take years to see distributed version control systems (DVCS) being generally adopted. But having switched to a DVCS (Mercurial) a while back, coming from Subversion and before that CVS, let me lay out three things I think you get when you use a distributed model:
Better branching, thus control over code
Speed, thus ease of use
Better IDEs (potentially)
But the short version is this - distributed version control is the general case.
Better branching
This one needs some justification, when you think of the costs associated with branching and the folklore around them - industry consensus is that branching is a bad thing, a necessary evil.
A while back I said, "branching isn't just a process or code duplication matter to avoid - it's inevitable - as soon as you check out code or lock a file, you've branched - checking back in *is* a merge operation." That was in the context of using Mercurial as an offline supplement to Perforce.
Saying that a local sandbox is operationally a branch can mean a few things:
update is a merge operation from another codeline,
checkin is a merge operation to another codeline,
you want to update and check in frequently to avoid drifting.
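To make that concrete, here is roughly how those "sandbox as branch" operations line up in a centralised and a distributed tool. The commands are the everyday Subversion and Mercurial ones; the commit messages are illustrative:

    # the implicit branch in a centralised tool...
    svn update                    # ...is a merge from the shared codeline into your sandbox
    svn commit -m "my change"     # ...and a merge from your sandbox back onto the shared codeline

    # a DVCS makes the same operations explicit, and versions them
    hg commit -m "my change"      # checkin records the change on your own codeline first
    hg pull                       # bring in the other codeline
    hg merge                      # the update *is* a merge, and the merge itself is a revision
    hg commit -m "merge"
    hg push                       # publishing is a separate, explicit step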
Centralised systems like Perforce, CVS and Subversion do not treat checkouts as branching operations. As a result, centralised systems support branching only in a limited sense - that's often why branching is seen as a bad thing. But once you accept this model of having your sandbox under version control, a lot of the pain (and fear) of dealing with branches evaporates. Passing around changesets and patches becomes normal and logical.
Centralised VCS also results in a bias towards the server as the single point of truth - your local sandbox can get messed up via conflicts, but a centralised model doesn't ever allow you to check in conflicted files. If the local merge after update fails you have to clean up the conflicts manually. This points to a limitation in centralised version control systems - the developer's local history of changes is not preserved. It is as though you have a maintenance/dev branch where every time you commit to the branch, the checkin is routed to the codeline where the branch was taken from. That means no branch history is kept, ever. The information is thrown away. And if your version of the file prior to the merge is never versioned, that in turn means any post-facto work or cleanup of mistakes has to be dealt with manually. You can't go back through the history.
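A quick way to see what preserved local history buys you, sketched in Mercurial - the revision number here is illustrative:

    hg log -l 5             # your pre-merge checkins are real revisions, still in the history
    hg diff -r 12 -r tip    # compare what you had before the merge with what you have now
    hg update -r 12         # or get back to exactly the state you were in before the merge went wrong

In a centralised tool that pre-merge state was never a revision, so there is nothing to go back to.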
I have more than once seen a developer effectively stranded, where they can't check in because they can't integrate locally without a lot of pain, and it's too late to cut a branch after the fact. Their sandbox is going to get hosed before they'll be able to check in. I've seen it enough that I'm inclined to think it's not a training/skill problem, it's a tooling problem. I also suspect it kills innovation and experimentation on codelines - when branching is that heavyweight and problematic, what's the incentive?
The branching point takes some wind out of Subversion/CVS - before I used a DVCS, I had always felt Subversion's model was superior as it put branches into physical space rather than some time dimension you can't see (I find explaining branching via CVS/Perforce hard).
Speed
This one is easy. Once you start using a DVCS for local work, going back to a centralised model feels slow - slow enough that your mind wanders and breaks flow, which is the worst kind of slow. Sometimes if something can be made much faster, it becomes a matter of improved usability rather than technology - think broadband compared to dialup, or being able to run your unit tests in 20 seconds. If the basic versioning operations all become sub-second, this has the potential to change your workflow for the better. The speed point takes some wind out of Perforce, which is the speed king as far as centralised models go (although that comes with an ugly tradeoff: you have to tell the server which files you're working on).
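To give a feel for what sub-second means in practice, these are the operations you run dozens of times a day, and in Mercurial none of them leaves the local disk (the commit message is illustrative):

    hg status            # what have I changed?
    hg diff              # show me the change
    hg log               # the whole history is already local, no server round trip
    hg commit -m "wip"   # checkpointing is cheap, so you do it more often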
Better IDEs (potentially)
Jeff also has a comment on using the IDE to do versioning: "This is sort of a religious issue. Some developers believe source control should *never* be done inside the IDE, and I've started to see their point after dealing with the many, many bugs in the Team Explorer Visual Studio integration point of Team System."
Long bet: all IDE-local versioning tooling will come to use a DVCS internally, probably one that supports rename operations (for refactoring support). Using a real VCS instead of a private library is likely to be a good thing, as it opens up the toolchain, in much the same way that all Java IDEs eventually supporting Ant/Maven directly did.
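For a sense of why rename support matters to an IDE, here is what a refactoring rename looks like when the underlying VCS records it - a Mercurial sketch, with the file names made up:

    hg rename Foo.java Bar.java    # recorded as a rename, not a delete plus an add
    hg commit -m "rename Foo to Bar"
    hg log --follow Bar.java       # history follows the file back across the rename

An IDE refactoring that drops straight onto operations like these keeps the file's history intact, which a private, rename-blind store tends to lose.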