by Steve Mallett
Version control systems are a tool close to any programmer's heart and a lot has been made of advancements in Subversion, but there is another version control system out there that completely redefines the boundaries of how such a system should work. Tom Lord is the author of the Arch Revision Control System. OSDir interviews Tom on the story behind Arch and just how different it is from what you're likely using today.
TAKAHASHI Tamotsu has translated our Interview with Tom Lord of Arch into Japanese. Thanks Tamo!
OSDir: What was the motivation behind writing Arch?
Tom Lord: First, when I was a working student, years and years ago, some of the people I respected, and was trying to learn from, were interested in a topic they called "programming in the large": the question of how to manage programming projects involving hundreds or thousands of programmers. I became interested in that problem and revision control is a subset of that problem.
Just a few years ago I had a stint consulting for a now-defunct research lab at a major telecom company. My boss there had asked me to use CVS to make him less nervous. I couldn't. CVS just makes my skin crawl. Most of the time it just gets in the way, and the critical core things it's supposed to help with, like branching, merging, and tagging, are just painful and awkward. It's a horrible system. At that time, I did spend a few weeks on a skunk-works project to invent something better to make my boss less nervous by at least using some form of revision control. That little project failed and failed hard. I didn't understand, back then, the importance of branching and merging. I just wanted something to snapshot my source trees. I got that, but even as just a limited system it was way too slow and used up too much disk space even by my standards as a notorious disk-hog.
It was around that time that I noticed the Subversion project. I found it particularly interesting for two reasons: First, they were promising smart merging features. Second, they decided to make the storage management part of Subversion a transactional filesystem. I'm a big fan of the general idea of a transactional filesystem, so Subversion sounded like a great project. I later came to regard Subversion as a very flawed design and implementation, but it wasn't really a question for me back then... this was sufficiently early in the lifetime of Subversion that I couldn't really use it anyway.
OSDir: Critiquing is one thing; what prompted you to actually buckle down and get started coding?
Tom Lord: Some years later I was unexpectedly unemployed. This was after the dot-com bubble had burst, so tech jobs were pretty hard to come by. Out of desperation, as much as anything, I thrashed around trying to find a free software project I could do quickly in order to try to salvage some semblance of a career, to have something I could put on my resume and say "hey, see what I can do?" Maybe, if I got very lucky, I could have a project I could make money from directly.
During that search I again noticed Subversion and revisited the whole issue of revision control systems. I thought about the issues more than I had in the past: the requirements for storage management, avoiding using too much disk, what's needed from branching and merging, how to do atomic commits of whole trees, etc.
With all due respect to the Subversion folks, many of whom appear from over here to be very good hackers, I realized that they had completely blown it. Subversion is a dog. It's a horrible, horrible design based on a few very good ideas. It's hard, I think, for casual observers to recognize the problems with Subversion given the good ideas it contains, the high skill level of the developers, and the good job of project management they do, but the problems were enough to prevent me from just asking to join the Subversion project.
I realize that that's more than a little bit inflammatory so please let me qualify it a bit. I'm well aware that many people now use Subversion and are reasonably happy with it. In my view it's not an improvement over CVS because it takes too many steps backward in various areas --- but at the same time I recognize that in spite of those backwards steps, it also gets a few things right that CVS does not. If those are the only things you are really paying close attention to, Subversion is, from your perspective, an improvement over CVS.
But for me, it was clear I wouldn't want the hassle of maintaining a Subversion archive in my environment: the admin burdens are too large, worse than CVS, it seemed to me. It was also clear I wouldn't want to hack on Subversion itself; the code was way too large and complex. I had very much wanted the smart merging features that they've long promised but not yet delivered.
In short, part of what propelled me to write arch is just simple, old fashioned, ego-driven rivalry: I wanted to show those folks up.
OSDir: What was it about CVS you set out to fix?
Tom Lord: CVS is clunky junk. I don't think anyone seriously disagrees with that any more. Operations such as tagging are painfully slow. The interface to branching and merging is so anemic as to be almost, but not quite, useless. File renames are handled gracelessly. The network protocol is a dog. The remote server is flaky. Heck, local operation is flaky, too. The absence of atomic commits is problematic. The design of CVS's archival format makes it flaky and non-robust. Ever hear of the "zero-block over NFS" problem?
CVS has some strengths. It's a very stable piece of code largely because nobody wants to work on it anymore. It's a known quantity in the sense that many admins know how to set up and babysit a CVS repository and many hackers know how to use it. The deep problems it has, such as the lack of atomic commits and the lousy support for branching and merging, can be worked around by a sufficiently disciplined team. The GCC project is, I think, the best public example of how to use CVS well.
So, CVS blows, but it's not like the sky is falling or anything. It's only that we could be doing much better. And, while the sky isn't falling, the widespread use of CVS in the free software community does put a sharp limit on how high we can fly.
One obvious example is the way that CVS, being a non-distributed system, distinguishes between committers and everyone else. Only committers get to make branches. Only committers get first-class help from CVS keeping a branch in sync with mainline. CVS creates a kind of class-distinction in the free software world that doesn't have to be there. Non-committers are second-class citizens. It's reasonable, of course, to allow only the actual maintainers of a project to commit changes to the mainline of that project. Without that class distinction there's no point to having maintainers at all. But it's not reasonable to deny non-committers all the other benefits of revision control.
Another obvious example is the network protocol. Talking to a CVS server over a fast LAN is harmless enough but, now, try talking to one across the internet, especially far away, and especially if the server is busy. This notion that you have to contact a central server to get anything done with a revision control system is a complete anachronism and just plain wrong for the modern world.
A less obvious example is the way that CVS makes file renaming, branching, tagging, and merging awkward and flaky. Those limits keep maintainers from doing their best work. Maintainers are reluctant to restructure a complicated source tree because that would mean renaming files and directories. So a certain degree of cruft in our projects never gets cleaned up. Maintainers are reluctant to work on branches because CVS makes that painful enough that everything gets done on mainlines, which therefore go through periods of flaky unreliability followed by "freeze-for-bug-fix" phases.
Ultimately, most users seem to wind up using CVS in the most simple-minded way: as a hub via which multiple programmers can all hack on a single tree. They don't get much more out of it than that. That's fine, but that's like 10% of the version control problem. That people are using CVS that way suggests that the shortcomings of the tool are limiting how programmers organize their projects.
OSDir: And so what about Subversion? What's left to address?
Tom Lord: Subversion fixes some CVS things. It has atomic commits, as does Arch, of course. It has a very CVS-like CLI. It has an incrementally better naming scheme for revisions and branches than CVS has. Arch has a better namespace too. Arch's has proven to be a little off-putting to newbies, but most experienced users think it's at least pretty good. Subversion is based on a very sexy idea: a transactional filesystem in which copying a file or even an entire tree is a very cheap operation.
Subversion is also worse, in my opinion, than CVS in some other areas. The implementation is too complicated, the use of BDB as the primary back end creates admin hassles... managing database log files and backups, having to run a recovery process or worse on the server whenever it fails. It's great when you're managing just a mainline, but if you start branching the poor merging support comes into play. Subversion requires a fairly heavyweight server, and such distributed operation support as there is is being added as a kind of afterthought.
And the problems with Subversion are, in my opinion, pretty deeply rooted. The merging problems will be very hard to fix because they require some rethinking about the whole "branching is just tree-copying" paradigm. The BDB dependency can, in principle, be solved, but Subversion will probably never approach the elegant simplicity of Arch's archive storage management system. Subversion will probably always depend on server-side computation, a completely inappropriate choice for scalable development on a global scale.
This gets back to what prompted me to write arch. Having figured out how to do whole-tree "diff" and "patch", I realized that there was a much, much simpler solution to revision control, one that had none of the problems of Subversion. I believe that arch already is, and will become more convincingly so over time, unambiguously better than CVS, as well as Subversion and SVK.
OSDir: So should developers turn to using Arch?
Tom Lord: Look at the way the Linux kernel project works, at least for developers who are willing to drink the Kool-Aid of BitKeeper (BK) licensing. The BK-using kernel developers are using a system that has a lot in common with arch: good support for distributed development, changeset orientation, fairly painless branching and merging support. Like Arch, BK eliminates the class distinction between committers who get the benefits of revision control and non-committers who don't. And that community has embraced those features. They're ecstatic about them. Many of those developers are willing to speak out publicly about how much of an improvement to kernel development BK was. Evidently there is something about such tools that helps projects to function better.
That example isn't quite a scientific proof. But it is awfully suggestive. It suggests to me that the quality and capabilities of the tools we use to manage our software projects, especially the revision control tools, have a huge impact on the quality of the project itself. Right now, free software is looking awfully competitive against non-free in more and more domains. I think that if we want to sustain and increase that lead, thinking about how we run our projects, and what tools we use for that, is one of the important places to focus attention. We, the free software developer community, have proven that we're at least as smart as our proprietary competitors, we're larger in number, we're driven to produce great software and often succeed. It'll be a shame if we get tangled up in our own shoestrings by using poor tools to manage our projects and to implement our means of cooperation with one another.
OSDir: What's cooking for future releases?
Tom Lord: Arch already scales pretty darn well to handle huge trees and large projects but there's a little catch. The catch is that you have to configure your local environment with just a _little_ bit of care to get those benefits.
If you use arch naively, out of the box, taking no special steps, and you try to work on a very large and/or busy project, you'll run into some bumps in the road, performance-wise. In some sense, that's ok. Not all users run into the problems. Those that do can learn pretty quickly what to do about them. I think we can do a little better to make arch work just a bit more smoothly "out of the box".
Beyond just simplifying configuration we also have some tricks up our sleeves to speed up even a perfectly configured arch. So, today we're pretty fast in the ideal situation. Later this year we plan to be darn fast in the ideal situation.
Also, currently, the Arch command set has some notable differences from the CVS command set. For example, with some very limited exceptions, Arch encourages you to commit all the changes you've made to a given tree at once, whereas CVS lets you commit just some of those changes.
We currently have limited facilities that let you work in a more CVS-ish way, but we've recently figured out how to generalize those quite a bit. Hopefully well before the end of this year, arch will feel as similar to CVS as Subversion does, at least in those areas of overlapping functionality.
We also have some beginnings of arch GUIs. Here too I hope to see one of them mature and perhaps become distributed with Arch. One nice side effect of that is that it will clear a path for integrating arch with existing IDEs.
Steve Mallett is the founder and managing editor of OSDir.com, the caretaker for opensource.org, general software wonk, and a dreadful programmer. You can read about his life in one light meaty snake at http://steve.osdir.com.