by Mike Mason
CVS is the de facto standard for Open Source version control. SourceForge, the largest collection of Open Source projects in the world, uses CVS to manage project source code, meaning thousands of developers are familiar with the system. But with CVS showing its age, a new source control tool is set to take on the world. After over three years in development, Subversion 1.0 has just been released.
This guide is aimed at CVS users who want to know more about the new features in Subversion, but it is also useful for someone familiar with source control systems who wants to find out what Subversion offers. We'll be talking about the philosophy and design behind Subversion, how it improves upon CVS, and how to get started using it.
The Subversion project started in earnest in February 2000, when CollabNet offered Karl Fogel a full-time job developing a replacement for CVS. Karl Fogel and Jim Blandy had previously founded Cyclic Software, which provides commercial support for CVS. Karl's 1999 book, Open Source Development with CVS, has since gone on to a third edition and is one of the most comprehensive guides to CVS available. The Subversion development team has grown to include half a dozen full-time developers and hundreds of volunteers, all collaborating over the Internet.
Subversion was designed from the ground up as a modern, high-performance version control system. In contrast to CVS, which had grown organically from shell scripts and RCS, Subversion carries no historical baggage. Subversion takes advantage of a proper database backend, unlike CVS, which is file based. The Subversion team has tried to make the new system similar in feel to CVS, so that users are immediately at home with it. Most of the features of CVS, including tagging, branching and merging, are implemented in Subversion, along with a host of new features:
- versioning support for directories, files and meta-data
- history tracking across moves, copies and renames
- truly atomic commits
- cheap branching and merging operations
- efficient network usage
- offline diff and revert
- efficient handling of binary files
Subversion uses a few new terms to describe its operation, most of which correspond to familiar CVS concepts.
- Repository: The central store for all of the files under revision control, similar to a CVS repository.
- Working copy: A set of files checked out of Subversion and stored on a user's local disk, similar to a CVS working directory.
- Revision: A set of changes to the repository, committed at once and associated with a single log message. In Subversion, a revision number applies to the whole repository, whilst in CVS, each file has its own revision number.
- Tag: A Subversion tag is a read-only copy of a directory tree, whilst in CVS a tag, or label, is a property of each file in the repository. Both can be used to retrieve a "known state" of the files under revision control.
- Property: Arbitrary, possibly binary, meta-data associated with a file. Properties are versioned in the same way as a file's contents. There is no equivalent in CVS.
Subversion is implemented on top of Berkeley DB, a high-performance database, which allows operations to be performed efficiently on large numbers of files. CVS accesses a separate file on the server for each file under revision control, meaning that some operations -- most notably history and tagging -- can be extremely slow. The underlying database allows Subversion to handle multiple users much better than CVS, without the need for slow, unwieldy lockfiles. The database backend also allows Subversion to provide atomic commits, a feature described below. The Subversion server can also produce a consistent, reliable backup without needing to be shut down.
Directories, Renaming and Meta-Data
Subversion records the state of your files in a repository, which keeps track of the history of both files and directories. CVS cannot track history for a directory, whereas directories in Subversion are versionable objects just like files. Subversion also provides the ability for the user to track arbitrary information about each file and directory. Properties support is a powerful feature that allows the user to attach metadata to any versionable object in the repository. The metadata is itself versioned just like files and directories. Properties are used to store things like mime-type, whether the file should be executable, and so on. Because properties can be user defined, they make Subversion extensible -- it's even possible to store binary data in a property, such as a thumbnail for a graphics file.
In CVS, manual hacking of the repository is required to move a file without losing revision history. In contrast, move, copy and rename operations have first class support under Subversion, which retains the history of these operations as well as branches and merges.
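The versioned properties described above can be pictured with a small toy model. This is only a sketch of the idea, not Subversion's storage format: the `VersionedNode` class and its methods are hypothetical, and the revision numbers here are local to the toy node rather than repository-wide. The property names `svn:mime-type` and `svn:executable` are real Subversion properties, but the values shown are illustrative.

```python
class VersionedNode:
    """Toy model of a versioned file or directory that carries properties."""

    def __init__(self):
        # One entry per revision: a (content, properties) snapshot.
        self.history = []

    def commit(self, content, props):
        # Copy the properties so later changes by the caller cannot alter history.
        self.history.append((content, dict(props)))
        return len(self.history)  # toy revision number, starting at 1

    def props_at(self, revision):
        """Return the properties exactly as they were at the given revision."""
        return self.history[revision - 1][1]


script = VersionedNode()
r1 = script.commit(b"#!/bin/sh\necho hi\n", {"svn:mime-type": "text/plain"})
r2 = script.commit(
    b"#!/bin/sh\necho hi\n",
    {"svn:mime-type": "text/plain", "svn:executable": "*"},
)

# The content is unchanged between r1 and r2; only the metadata differs,
# and both states remain retrievable.
print(script.props_at(r1))
print(script.props_at(r2))
```

Because each snapshot pairs content with properties, a change to metadata alone still produces a new revision that can be inspected or rolled back later.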
Atomic Commits and Changesets
When a user commits a change to a Subversion repository, the whole change is applied or is rolled back, and it is only visible to other users once completed. Changes to the repository work just like a transaction in a database, where a commit happens atomically (all at once) or not at all. In CVS a commit alters each file in turn until it completes. If a network connection goes down during a commit, a CVS repository can be left partially changed, often leaving the code in an unusable state. Furthermore, if a user updates their working directory whilst another user is committing a change, they may retrieve a partial commit from the CVS server. Subversion solves both of these problems.
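The all-or-nothing behaviour can be sketched in a few lines. The following is a minimal illustration, assuming a repository modelled as an in-memory dictionary; `ToyRepository` and the `validate` hook are hypothetical names invented for this example and say nothing about how Subversion is implemented internally.

```python
class ToyRepository:
    """Toy repository whose commits are applied entirely or not at all."""

    def __init__(self):
        self.files = {}    # path -> content, as seen by other users
        self.revision = 0  # repository-wide revision number

    def commit(self, changes, validate=lambda path, content: True):
        # Stage every change against a private copy of the tree.
        staged = dict(self.files)
        for path, content in changes.items():
            if not validate(path, content):
                # Abandon the staged copy; the visible tree was never touched.
                raise ValueError(f"rejected change to {path}")
            staged[path] = content
        # Publish all changes in one step and bump the revision.
        self.files = staged
        self.revision += 1
        return self.revision


repo = ToyRepository()
repo.commit({"a.c": "v1", "b.c": "v1"})

# Simulate a failure part-way through a two-file commit.
try:
    repo.commit({"a.c": "v2", "b.c": "v2"}, validate=lambda p, c: p != "b.c")
except ValueError:
    pass

print(repo.revision, repo.files)  # 1 {'a.c': 'v1', 'b.c': 'v1'}
```

Even though the change to `a.c` was staged before the failure, other users never see it, which is the property a per-file commit loop cannot guarantee.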
Subversion's atomic commit mechanism keeps changes together in one logical group with a single commit message, called a revision or changeset, and assigns a revision number to it. The revision number actually applies to the whole repository, and can be used to describe and recreate the state of the repository at any point in time. A developer can say, "I fixed that in revision 646" and another developer can immediately see exactly what they changed. With CVS, revision history and commit messages are stored per-file -- there's no way to see what other files were changed without a lengthy scan of the repository. In contrast, history browsing with Subversion is fast.
Changesets will be familiar to anyone who has used systems such as Perforce, BitKeeper or Arch, and the notion is extremely powerful. By grouping changes to multiple files into a single logical unit, developers are able to better organise and track their changes. Changesets are an effective communication tool as well -- when discussing patches to be applied to a particular branch of development, all that is needed is a single revision number. Everyone is clear exactly what logical change is being discussed. It also makes it easy to apply a change, with no need to resort to long timestamps, which can often be incorrect.
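To make the changeset idea concrete, here is a rough sketch of a repository-wide log in which one revision number identifies a message and every path it touched. The `Changeset` and `ChangeLog` names, the file names and the log messages are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    revision: int
    message: str
    paths: tuple


class ChangeLog:
    """Toy repository-wide log: one entry per commit, numbered from 1."""

    def __init__(self):
        self._changesets = []

    def record(self, message, paths):
        changeset = Changeset(len(self._changesets) + 1, message, tuple(sorted(paths)))
        self._changesets.append(changeset)
        return changeset.revision

    def lookup(self, revision):
        """Everything about one logical change, found by its number alone."""
        return self._changesets[revision - 1]

    def history_of(self, path):
        """Revisions in which a given path changed."""
        return [c.revision for c in self._changesets if path in c.paths]


log = ChangeLog()
log.record("Initial import", ["main.c", "util.c", "util.h"])
fix = log.record("Fix overflow in parser", ["main.c", "util.h"])
log.record("Update documentation", ["README"])

print(log.lookup(fix))           # the message and both files changed together
print(log.history_of("main.c"))  # [1, 2]
```

With per-file histories, answering "what else changed with this fix?" means scanning every file; with a changeset log it is a single lookup.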
Cheap Branching and Merging
Because Subversion uses a database to store the repository, it can perform "lazy copies" of files very quickly. Subversion uses lazy copies for branches and tags; you simply copy your trunk to a new location in the tree, with Subversion tracking the common history. Very little space is used up on the server, and the operation happens almost instantly. Since CVS stores history in versioned files, making a branch requires that all the files in the repository are accessed and updated. In Subversion, a tag is simply a branch that is read-only and never updated.
By using its lazy copy system, Subversion makes merging changes very fast. It can track changes to all branches using a simple database query and generate precise merge information very quickly.
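One way to picture a lazy copy is as a tree whose nodes are never modified in place, so a branch can simply point at the same nodes as the trunk. The sketch below assumes that model; the `Node` class, the helper functions and the directory names are hypothetical and are not Subversion's actual data structures.

```python
class Node:
    """Tree node treated as immutable: a file (content) or a directory (children)."""

    def __init__(self, content=None, children=None):
        self.content = content
        self.children = dict(children or {})


def lazy_copy(root, src, dst):
    """Return a new root in which dst refers to the very same node as src."""
    children = dict(root.children)
    children[dst] = root.children[src]  # share the subtree, copy nothing
    return Node(children=children)


def write_file(directory, name, content):
    """Return a new directory with one file replaced; siblings stay shared."""
    children = dict(directory.children)
    children[name] = Node(content=content)
    return Node(children=children)


trunk = Node(children={
    "main.c": Node(content="original main"),
    "util.c": Node(content="original util"),
})
root = Node(children={"trunk": trunk})

# Branching costs one new directory entry, however large the trunk is.
root = lazy_copy(root, "trunk", "branch-1.0")
assert root.children["branch-1.0"] is root.children["trunk"]

# Changing a file on the branch creates new nodes only along the changed path.
patched = write_file(root.children["branch-1.0"], "main.c", "patched main")
root = Node(children={**root.children, "branch-1.0": patched})

print(root.children["trunk"].children["main.c"].content)       # original main
print(root.children["branch-1.0"].children["main.c"].content)  # patched main
```

The untouched `util.c` is still a single shared node after the branch diverges, which is why such copies use very little extra space.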
Efficient Network Usage
The Subversion designers noticed that whilst disk space has vastly increased over the years, available network bandwidth has not. Subversion exploits this by storing pristine copies of repository files in a user's working copy, and using them to reduce network usage wherever possible. Subversion can diff and revert files without accessing the network, and sends only diff information both when updating from the server and when committing changes back to the server. In contrast, CVS always sends the whole file back to the server when performing a commit.
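The role of the pristine copy can be sketched as follows. This is a simplified illustration using Python's standard `difflib` module on text held in memory; the `WorkingFile` class is hypothetical, and a real working copy keeps its pristine files on disk.

```python
import difflib


class WorkingFile:
    """Toy working-copy file that keeps the last text received from the server."""

    def __init__(self, text_from_server):
        self.pristine = text_from_server  # untouched copy, stored locally
        self.working = text_from_server   # the copy the user edits

    def diff(self):
        """Compare working text with the pristine copy; no network needed."""
        return "".join(difflib.unified_diff(
            self.pristine.splitlines(keepends=True),
            self.working.splitlines(keepends=True),
            fromfile="pristine",
            tofile="working",
        ))

    def revert(self):
        """Throw away local edits by restoring the pristine text."""
        self.working = self.pristine


f = WorkingFile("line one\nline two\n")
f.working = "line one\nline 2\n"

patch = f.diff()
print(patch)

f.revert()
print(repr(f.diff()))  # '' because nothing differs after a revert
```

The same locally computed difference is what allows a client to send only the changed lines on commit rather than the whole file.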
Subversion has a clean, layered design, allowing multiple mechanisms for accessing the repository. The simplest is the file:// protocol, which accesses a repository on the local disk. The svnserve program provides a simple server for networking a repository using the svn:// protocol, much like CVS' pserver, although svnserve never sends passwords in plain text. For users requiring more security, or who wish to leverage existing SSH infrastructure, Subversion can use the svn+ssh:// protocol, which tunnels a connection over secure shell. This is similar to tunnelling a CVS connection over SSH.
Subversion can also leverage the Apache webserver to network a repository using the standard http and https protocols. It's important to realise you don't need Apache to use Subversion, but if you want to put a repository on the web you can take advantage of Apache's security and authorisation mechanisms to do so.
When combined with Apache, Subversion uses the WebDAV and DeltaV protocols to describe the versioned file system. Many operating systems contain support for these protocols -- for example Windows XP's "web folders" can access a Subversion repository.
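As a quick summary of the access methods above, the URL scheme alone tells you which layer a client will use. The mapping below is a sketch based only on the descriptions in this article; the `describe_access` helper and the example host names are hypothetical.

```python
from urllib.parse import urlsplit

# Scheme -> access method, as described in the text above.
ACCESS_METHODS = {
    "file": "direct access to a repository on the local disk",
    "svn": "the svnserve server",
    "svn+ssh": "an svnserve connection tunnelled over SSH",
    "http": "Apache using WebDAV and DeltaV",
    "https": "Apache using WebDAV and DeltaV over an encrypted connection",
}


def describe_access(url):
    """Return a short description of how a repository URL is reached."""
    scheme = urlsplit(url).scheme
    if scheme not in ACCESS_METHODS:
        raise ValueError(f"unsupported repository URL scheme: {scheme!r}")
    return ACCESS_METHODS[scheme]


# Example URLs; the hosts and paths are placeholders.
for url in (
    "file:///var/svn/project",
    "svn://svn.example.com/project",
    "svn+ssh://svn.example.com/project",
    "https://svn.example.com/project",
):
    print(url, "->", describe_access(url))
```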
Efficient Handling of Binary Files
CVS separately stores each revision of a binary file, meaning 10 revisions of a 100k file will always take up nearly a megabyte of disk space on the server. Subversion stores all files in a binary representation and uses an efficient binary diff algorithm to compute differences between them. This means multiple revisions of binary files take up a lot less space on the server. Storing files as binary also avoids many problems with line-endings in text files. Subversion can be configured to translate line-endings both when updating from and committing to the repository.
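The space saving is easy to see with a deliberately naive delta scheme. The sketch below stores only the bytes that differ between the common prefix and common suffix of two versions. It is not the algorithm Subversion uses; `make_delta` and `apply_delta` are hypothetical helpers chosen to keep the arithmetic visible.

```python
def make_delta(old: bytes, new: bytes):
    """Describe `new` relative to `old` as (prefix_len, suffix_len, middle)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The suffix must not overlap the prefix in either version.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    middle = new[prefix:len(new) - suffix]
    return prefix, suffix, middle


def apply_delta(old: bytes, delta):
    """Rebuild the new version from the old one plus a delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


old = bytes(100_000)  # a 100k "binary file" of zero bytes
new = old[:50_000] + b"\x01\x02\x03" + old[50_003:]  # three bytes changed

delta = make_delta(old, new)
assert apply_delta(old, delta) == new

full_copy_cost = len(new)   # bytes needed to store the revision whole
delta_cost = len(delta[2])  # bytes of changed data stored by the delta
print(full_copy_cost, delta_cost)  # 100000 3
```

Ten revisions stored whole cost ten times the file size, as described above for CVS, whereas a delta-based store pays for one full copy plus the changed bytes of each later revision.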
True Cross-Platform Support
Subversion is available on a wide variety of platforms, as you'd expect from a modern application written in portable ANSI C. Binaries are available for Windows, many flavours of Linux, Solaris, and even Apple's Mac OS X. If there isn't a package for your operating system, you can download the source code and compile it fairly easily. Where Subversion really beats CVS is in its robust server support on Windows. A Subversion server will run happily on Windows 2000, 2003 and XP, whereas you'll have a very hard time persuading CVS to run its server on anything other than Unix. This support significantly lowers the barrier to entry for using Subversion and allows it to make inroads even in "Windows-only" development teams.
There are a wide variety of tools available for CVS, making it probably the most supported version control software available. For Subversion to really be a compelling replacement for CVS, its tool support must also be good. Fortunately a large number of tools are already available:
- TortoiseSVN is a Windows client for Subversion that integrates right into Explorer. Your files become colour coded to indicate their status (up to date, modified, in conflict), and most Subversion operations can be invoked using a simple right-click. TortoiseSVN comes with a graphical diff and merge tool to help you track changes.
- ViewCVS allows you to view a CVS repository in a web browser, listing files, branches, tags, and history. Happily, ViewCVS already supports Subversion repositories, but make sure you get an up to date version. Alternatively, if you're using Apache with Subversion, you can browse your repository straight away using a web browser.
- For IDE integration, Subclipse provides Subversion support for the popular Open Source Eclipse platform and AnkhSVN plugs into Visual Studio. For more general Windows integration, there are several projects developing a Microsoft SCC provider implementation. This allows most Windows development environments to talk to a Subversion repository.
- For developers interested in Agile and XP, CruiseControl and CruiseControl.NET both already support Continuous Integration against source code stored in Subversion.
- subversion.tigris.org is the official home of Subversion. You can find out more about the project, download packages for your operating system, and read the FAQ here.
- svnbook.red-bean.com holds the official Subversion book, which is due to be published mid-2004. This is the main reference manual for Subversion, and will answer most questions you may have.
- firstname.lastname@example.org is the main mailing list for getting help from other Subversion users. Post a message here if you can't find an answer in the FAQ or the book. For realtime chat with actual people, try IRC channel #svn on irc.freenode.net.
- collab.net funds development and hosting of Subversion and employs several full-time Subversion developers.
Mike Mason is a consultant with ThoughtWorks, where he's administered CVS, Perforce and Subversion servers. ThoughtWorks is a global leader in delivering complex custom software solutions for the world’s largest companies. Operating across five countries with offices in America, the United Kingdom, Australia, Canada and India, ThoughtWorks employs some of the best IT professionals and embraces the use of open source software to arrive at higher quality solutions faster and more cost-effectively than traditional software consultancies.