Subversion over HTTP: soon with less suck!

This entry was posted by on Tuesday, 11 November, 2008 at

So everyone knows my job at Google is to tech-lead the team responsible for our Subversion servers, as part of our larger open-source project hosting service. Thus, having come to a reasonable temporary stopping point with my previous 20% project, I’ve turned to a new 20% project: making Subversion itself better.

Specifically, I want to right a wrong, undo something I’ve felt nasty about for years. Subversion’s HTTP protocol is very complicated and unintelligible to mere mortals. Honestly, if Greg Stein and I got hit by buses, nobody would really understand what’s going on inside mod_dav_svn. What’s the backstory here? Basically, we tried to make mod_dav_svn implement a reasonable subset of DeltaV, which was a mostly-failed spec written long ago to implement Clearcase^H^H^H^H version control over HTTP. Eight years later, this extra complexity hasn’t bought us any interoperability with other version control systems — just a big headache to maintain and a icky performance penalty. The Subversion client, in being a “good DeltaV citizen”, isn’t allowed to directly talk about URLs that represent revisions, transactions, historical objects, and so on. Instead, it has to play dumb and continually issue a series of requests to “discover” opaque URLs that represent these concepts. It’s sort of like the client playing a formal game of 20 Questions, when it already knows the answers.

So after some chats with Greg Stein and others, I’ve collected ideas on how to streamline our existing protocol into something much more simple, tight, and comprehensible. Way fewer requests too. You can read our evolving design document and send questions/feedback to dev@subversion.tigris.org. Subversion 1.6 is planned to be released at year’s end, so if we’re lucky we’ll see this new protocol in Subversion 1.7 next summer.

Speaking of Subversion 1.6, however: a smaller sort of glastnost is happening there as well. In this new spirit of HTTP openness, we’re officially ending our policy of “not telling people how to access older revisions” over HTTP. If you recall, the Subversion book has always said:

Q: Can I view older revisions?
A: Your web browser speaks ordinary HTTP only. That means it knows only how to GET public URLs, which represent the latest versions of files and directories. [...] To find an older version of a file, a client must follow a specific procedure to “discover” the proper URL; the procedure involves issuing a series of WebDAV PROPFIND requests and understanding DeltaV concepts. This is something your web browser simply can’t do.

I’m here to break the chains! Reveal the lies! RELEASE THE KRAKEN. In Subversion 1.6, we’ve gone and implemented an official public query syntax for accessing older (revision, path) coordinate pairs:


http://host/repos/path?r=REV

http://host/repos/path?p=PEGREV

http://host/repos/path?p=PEGREV&r=REV

This query syntax offers the same peg-revision concept that one sees in the Subversion commandline client. The first syntax means “start at PATH in the latest revision, then follow the object back in time to revision REV.” This works even if the object was renamed and exists at a different place in the older revision. The second syntax allows one to pinpoint an object with no history tracing: just jump to revision PEGREV, and find PATH. The third syntax is very much like running “svn subcommand -r REV path@PEGREV”: start at PEGREV, find PATH, then trace the object back into older revision REV.

In any case, this means source code browsers and other tools can stop using “secret” internal urls to access older objects.