Archive for category Subversion

Your Community is NOT Your Tools.

Posted on Wednesday, 30 November, 2011

(Disclaimer: I’m one of the ‘old guard’ open source guys. I co-founded the Subversion project back in 2000 and am a proud member of the ASF. These opinions are my own.)

A very popular blog post has been going around lately called Apache Considered Harmful, which criticizes the Apache Software Foundation (ASF) for being impossible to work with. On the surface, it looks a bit like a culture war between older and younger generations of open source hackers: the older generation is portrayed as stodgy and skeptical of distributed version control systems, making the ASF inhospitable to a younger generation used to the fast-and-freewheeling world of git and Github.

One of the ASF’s leaders, Jim Jagielski, then wrote a blog response which seems to say, “We’re not irrelevant; we just have high integrity. We care about long-term health of open source projects, not passing fads or hip popularity contests.”

But I think Jim is truly missing the main complaint.

Backing up a bit: what is the mission of the ASF? Why does it exist? My understanding is simple:

  1. to be a legal umbrella of protection
  2. to foster long-term, healthy open-source communities

The first goal is achieved by putting all of a project’s code under the Apache license, and getting all code contributors to grant nonexclusive IP rights to the ASF. This guarantees that the ASF “owns” the code, and thus can legally defend it.

The second goal is about encouraging and preserving healthy culture. The ASF has a famous saying: “community over code”. In other words, the ASF doesn’t accept donations of code (or code thrown over walls), it only accepts communities that happen to work on a common codebase. The community is the main asset, not the source code.

The ASF has a great set of cultural norms that it pushes on its communities via political means and lightweight processes. For example, the ASF requires that each community have a set of stewards (“committers”), which they call a “project management committee”; that communities use consensus-based discussions to resolve disputes; that they use a standardized voting system to resolve questions when discussion fails; that certain standards of humility and respect are used between members of a project, and so on. These cultural traditions are fantastic, and are the reason the ASF provides true long-term sustainability to open source projects. It’s the reason I pushed so hard to get the Subversion project into the ASF.

Let’s go back to the original “Apache Considered Harmful” post again. Yes, the blog post rambled a bit about the ASF becoming “irrelevant”, but I think that’s just random grumbling around the actual issue at stake: the ASF’s insistence on forcing their hosting infrastructure onto projects. We have repeated examples of mature open source communities trying to join the ASF, which already use git as their version control system — and the ASF is insisting that they convert to Subversion and store their code in the ASF’s One Big Subversion Repository.

I fear what’s happening here is that the ASF elders have tragically confused “be part of our community” with “you must use our infrastructure”. There is no reason for these things to be entangled.

The ASF has teams of people dedicated to running servers for Subversion, SSH, QA testing, email lists, and so on. Ten years ago, infrastructure hosting was a Hard Thing. Getting to use the ASF’s hosting services was considered an attractive perk. These days, project hosting is utterly commoditized: we have Sourceforge, Google Code, Github, and other sites. In a matter of minutes, any two people can conjure up a hosted source repository, bugtracker, wiki, etc. So is it really a surprise that newer communities, ready to join the ASF, already have functional (and possibly superior) tools and infrastructure?

So why oh why does the ASF demand everyone use their Subversion service? They don’t force every project to use the same bugtracker; I wonder if source code is different because it’s the “special” asset being protected. Perhaps the ASF elders think it has to all be in one place in order for it to be protectable and controlled? A simple solution here is to require that at least one canonical copy of source code be stored on ASF servers. If that means doing an “hg pull” or “git pull” via cron job every hour, so be it. Who cares where the real coding is happening, or in how many repositories? Irrelevant. As long as a community has blessed a central repository as Official, and the ASF is keeping a synced copy of that somewhere, we should be all set. The ASF’s job is to shepherd communities, not force everyone to use the same software tools.
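To make the cron idea concrete, here is a minimal sketch of what such a sync job might look like for a git-based project. The URL, the mirror path, the script location, and the function name `sync_mirror` are all placeholders I made up for illustration; this is not anything the ASF actually runs.

```shell
#!/bin/sh
# Hypothetical mirror job: keep a read-only copy of a community's blessed
# repository on foundation-run servers. All names and paths are assumptions.

sync_mirror() {
    upstream_url=$1   # the community's Official repository (placeholder)
    mirror_dir=$2     # where the synced copy lives on the mirror host

    if [ ! -d "$mirror_dir" ]; then
        # First run: create a bare mirror of every branch and tag.
        git clone --mirror "$upstream_url" "$mirror_dir"
    else
        # Later runs: fetch whatever changed upstream, dropping deleted refs.
        git --git-dir="$mirror_dir" remote update --prune
    fi
}

# When invoked as a script with two arguments, perform the sync.
if [ "$#" -eq 2 ]; then
    sync_mirror "$1" "$2"
fi

# Example crontab entry (top of every hour), assuming the script is
# installed at this made-up location:
#   0 * * * * /usr/local/bin/sync-mirror.sh https://example.org/project.git /srv/mirrors/project.git
```

The point of the sketch is only that the canonical copy can be a passive mirror: the community keeps working wherever it likes, and the hosted copy follows along.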

Ironically, years ago I too was suspicious of distributed version control, and wrote an article about how it tended to discourage ASF-style project cohesion. But in this case, we have examples of communities that are already cohesive and high-functioning, despite using git. They don’t need ASF’s tools; they just need a nice place to park their community. If they ain’t broke, stay out of their development processes.

(Note the ASF isn’t alone in this insanity. Others have told me that FSF projects are forced to use the Savannah collaborative platform, whether they want to or not. Crazy! Repeat after me, folks: your community is not your tools.)

WANdisco, ur doin it rong

Posted on Monday, 3 January, 2011

Author’s Note: These opinions are my own. I’m one of the original folks that started the Subversion project, but no longer work on it. These thoughts do not reflect the official position of either the Subversion project or the Apache Software Foundation, which is posted on the ASF blog.

Subversion has reached the realm of Mature software — it’s yesterday’s technology, not cool or hip to work on anymore. It moves slowly. It is developed almost entirely by engineers working for corporations that need it or sell support for it. Alpha-geeks consider software like this “dead”, but the fact is that something like half of all corporate programmers use Subversion as their SCM (depending on which surveys you read.) This is a huge userbase; it may not be sexy, but it’s entrenched and here for the long haul.

Subversion isn’t unique in this position. It sits alongside other mature software such as Apache HTTPD or the GCC toolchain, which are famous projects that are similarly developed by corporate interests. There’s a tricky line to walk: none of these corporations “own” these projects. They understand that they’re acting as part of a consortium. Each interest sends representatives to the open source project, contributes code, and allows their engineers to participate in the full consensus-based evolution of the software. IBM, Apple, Google, and numerous other companies have figured out how to do this correctly:

  1. Let your engineers know what’s important to work on.
  2. Let them participate individually in the community process as usual.
  3. Profit. 98% of the time the corporations eventually get the features they want.

Today, however, we have a great counterexample of how not to participate in an open source project. Subversion was initially funded and developed by CollabNet; today at least two other companies — Elego and WANdisco — are employing numerous engineers to improve Subversion, and are just as vested in selling support and derivative products. CollabNet and Elego continue to function normally in the community, but WANdisco recently seems to have lost its marbles. Last week, they put out a press release and a CEO blogpost making some crazy statements.

It’s clear that the WANdisco CEO — David Richards — is frustrated at the slow pace at which Subversion is improving. But the two posts are simply making outrageous claims, either directly or via insinuation. David seems to believe that a cabal is preventing Subversion from advancing, and that “debate” is the evil instrument being used to block progress. He believes users are crying for the product to be improved, that the Subversion developers are ignoring them, and his company is now going to ride in on a white horse to save the project. By commanding engineers to Just Fix things, he’ll “protect the future” of Subversion, “overhauling” Subversion into a “radical new” product.

Is this guy for real? It sounds like someone read my friend Karl’s book and created a farce of “everything you’re not supposed to do” when participating in corporate open source.

Even weirder, he’s accusing developers of trying to game statistics by creating lots of trivial commits. This is staggering proof that he has no knowledge of the svn developer community or its culture. If he did, he would know that nobody counts stats at all or even cares about them. David appears so desperate to prove that his company is the “leader” that he accuses a community of behaviors that he’s doing himself. (“We have the most active developers of any other company on staff” — who’s counting stats here? The svn developers, or David?)

OK, fine. So Dave Richards is a salesperson, and perhaps what he wrote is generic PR sales junk in order to get his customers excited. Unfortunately, in attempting to woo customers, he’s had the side-effect of making his company appear both clueless and antagonistic to the project:

  • Clueless: It’s obvious he has no technical knowledge of Subversion’s design, has no idea why certain features have or haven’t been written yet, and hasn’t actually brought any new technical proposals or insights to the table. All he’s done is repeat descriptions of features that everybody wants. And he actually seems to believe that all one needs to do is throw more developers at the problems. Suuuuuure.
  • Antagonistic: He’s insulted two-thirds of the active developers (and embarrassed his own employees) by declaring them to be incompetent stewards. There’s no simpler way to garner hate and come off like an ass than to say “everyone move aside and let me fix this” — it’s the opposite of consensus-driven development. It’s a juvenile, conceited behavior that completely disrespects the people and the process.

The Subversion developer community (and ASF) are known for their cool, calm-headed responses to provocations like this, which they’ve just posted. They know not to feed trolls. But speaking as a private developer, I just had to point out WANdisco’s insanity and hold it up as a textbook example of how to Fail in the open source community process.

Subversion moving to the Apache Software Foundation

Posted on Thursday, 5 November, 2009

It’s no longer a secret, but now a public press release.

Not that this should shock anybody, but in case you didn’t know, now you do. The overlap between Apache and Subversion communities has always been huge since day one — with essentially identical cultures. We’ve talked about doing this for years. It means we can finally dissolve the ‘Subversion corporation’ and let ASF handle all our finances and legal needs.

“Why didn’t this happen sooner? Why now?”, you may ask. There are several answers.

First, the intellectual property was scattered. Collabnet owned a huge chunk of it, but so did other corporations and a large handful of other random volunteers from the internet. The ASF requires software grants to join, and we didn’t have our eggs in one basket.

Second, when the Subversion project first developed legal needs a few years ago — and also started receiving money from Google’s Summer of Code — it was relatively easy to set up our own non-profit. It gave us a place for money to live, and an entity to defend the Subversion trademark from a number of abusive third parties.

But over time, running our own non-profit turned out to be an awkward time suck. So about a year ago I started focusing on collecting Contributor License Agreements (CLAs) from both individuals and corporations, including Collabnet itself. Once the IP was all concentrated in the Subversion Corporation, it freed us up to move to the ASF and dump all of the bureaucracy on them. 🙂

So this announcement is also a bit of a point of pride for myself. I’ve long stopped working on Subversion code, but I wanted to make sure the project was parked in a good place before I could really walk away guilt-free. I now feel like my “work is done”, and that the ASF will be an excellent long-term home for the project. This is exactly what the ASF specializes in: being a financial and legal umbrella for a host of communities over the long haul. The project is in excellent hands now.

Of course, Collabnet has always been the main supplier of “human capital” for the project in terms of full-time programmers writing code, and that’s not going to change as far as I can see. Collabnet deserves huge kudos for the massive financial investment (and risk) in funding this project for nearly 10 years, and it seems clear they’re going to continue to be the “center” of project direction and corporate support for years to come. And this pattern isn’t uncommon either: the Apache HTTPD Server itself is mostly made up of committers working on behalf of interested corporations.

What’s interesting to me, however, are all the comments on the net about how this is a “death knell” for Subversion — as though the ASF were some sort of graveyard. That seems like a very typical viewpoint from the open source universe — mistaking mature software like Apache or Subversion (or anything not new and shiny) for “old and crappy”. In my opinion, the open source world seems to ignore the other 90% of programmers working in tiny software shops that utterly rely on these technologies as foundational. Even though I’ve become a Mercurial user myself, I can assure you that these other products aren’t going away anytime soon!

Hm. I smell another talk here.

Subversion over HTTP: soon with less suck!

Posted on Tuesday, 11 November, 2008

So everyone knows my job at Google is to tech-lead the team responsible for our Subversion servers, as part of our larger open-source project hosting service. Thus, having come to a reasonable temporary stopping point with my previous 20% project, I’ve turned to a new 20% project: making Subversion itself better.

Specifically, I want to right a wrong, undo something I’ve felt nasty about for years. Subversion’s HTTP protocol is very complicated and unintelligible to mere mortals. Honestly, if Greg Stein and I got hit by buses, nobody would really understand what’s going on inside mod_dav_svn. What’s the backstory here? Basically, we tried to make mod_dav_svn implement a reasonable subset of DeltaV, which was a mostly-failed spec written long ago to implement Clearcase^H^H^H^H version control over HTTP. Eight years later, this extra complexity hasn’t bought us any interoperability with other version control systems — just a big headache to maintain and an icky performance penalty. The Subversion client, in being a “good DeltaV citizen”, isn’t allowed to directly talk about URLs that represent revisions, transactions, historical objects, and so on. Instead, it has to play dumb and continually issue a series of requests to “discover” opaque URLs that represent these concepts. It’s sort of like the client playing a formal game of 20 Questions, when it already knows the answers.

So after some chats with Greg Stein and others, I’ve collected ideas on how to streamline our existing protocol into something much more simple, tight, and comprehensible. Way fewer requests too. You can read our evolving design document and send us questions/feedback. Subversion 1.6 is planned to be released at year’s end, so if we’re lucky we’ll see this new protocol in Subversion 1.7 next summer.

Speaking of Subversion 1.6, however: a smaller sort of glasnost is happening there as well. In this new spirit of HTTP openness, we’re officially ending our policy of “not telling people how to access older revisions” over HTTP. If you recall, the Subversion book has always said:

Q: Can I view older revisions?
A: Your web browser speaks ordinary HTTP only. That means it knows only how to GET public URLs, which represent the latest versions of files and directories. […] To find an older version of a file, a client must follow a specific procedure to “discover” the proper URL; the procedure involves issuing a series of WebDAV PROPFIND requests and understanding DeltaV concepts. This is something your web browser simply can’t do.

I’m here to break the chains! Reveal the lies! RELEASE THE KRAKEN. In Subversion 1.6, we’ve gone and implemented an official public query syntax for accessing older (revision, path) coordinate pairs:


This query syntax offers the same peg-revision concept that one sees in the Subversion commandline client. The first syntax means “start at PATH in the latest revision, then follow the object back in time to revision REV.” This works even if the object was renamed and exists at a different place in the older revision. The second syntax allows one to pinpoint an object with no history tracing: just jump to revision PEGREV, and find PATH. The third syntax is very much like running “svn subcommand -r REV path@PEGREV”: start at PEGREV, find PATH, then trace the object back into older revision REV.
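The three forms described above can be sketched as plain query strings on the object’s URL. As far as I know, the Subversion book for 1.6 documents them as `?r=REV`, `?p=PEGREV`, and `?p=PEGREV&r=REV`; the repository root and file path below are made up, and `svn_http_url` is just a hypothetical helper for building the URLs.

```shell
# Build a mod_dav_svn URL for an older (revision, path) coordinate pair.
# base is a hypothetical repository root; p= carries the peg revision and
# r= carries the operative revision (parameter names per the 1.6 book).
svn_http_url() {
    base=$1; path=$2; pegrev=$3; rev=$4
    query=""
    if [ -n "$pegrev" ]; then
        query="p=$pegrev"
    fi
    if [ -n "$rev" ]; then
        if [ -n "$query" ]; then
            query="$query&"
        fi
        query="${query}r=$rev"
    fi
    if [ -n "$query" ]; then
        printf '%s/%s?%s\n' "$base" "$path" "$query"
    else
        printf '%s/%s\n' "$base" "$path"
    fi
}

# First syntax: start at PATH in HEAD, trace back to revision 21.
svn_http_url http://svn.example.com/repos trunk/README.txt "" 21
# prints http://svn.example.com/repos/trunk/README.txt?r=21

# Second syntax: jump straight to revision 123 and find PATH there.
svn_http_url http://svn.example.com/repos trunk/README.txt 123 ""
# prints http://svn.example.com/repos/trunk/README.txt?p=123

# Third syntax: find PATH in revision 123, then trace back to revision 21.
svn_http_url http://svn.example.com/repos trunk/README.txt 123 21
# prints http://svn.example.com/repos/trunk/README.txt?p=123&r=21
```

Any ordinary HTTP client (a browser, or `curl` with the URL quoted so the shell leaves the `&` alone) should then be able to fetch the older object directly.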

In any case, this means source code browsers and other tools can stop using “secret” internal urls to access older objects.

Subversion 1.5.0 released

Posted on Thursday, 19 June, 2008

No, really. Seriously!

After nearly two years of work, it’s been released to the public. Semi-intelligent tracking of merges, sparse directories, interactive conflict resolution, and much much more all described here. I had previously posted about how easy it is to manage feature branches; now you can try it yourself.

Visit to download. (It may take a few days for volunteers to update binary packages from the source code.) Also, we’ve just about finished up the new edition of the Subversion Book, which now covers 1.5. You can read it online, or wait a couple of months to buy a hardcopy from O’Reilly.

Subversion 1.5 merge-tracking in a nutshell

Posted on Saturday, 10 May, 2008

As I’ve mentioned in other posts, the Subversion project is on the verge of releasing version 1.5, a culmination of nearly two years of work. The release is jam-packed with some huge new features, but the one everyone’s excited about is “merge tracking”.

Merge-tracking is when your version control system keeps track of how lines of development (branches) diverge and re-form together. Historically, open source tools such as CVS and Subversion haven’t done this at all; they’ve relied on “advanced” users carefully examining history and typing arcane commands with just the right arguments. Branching and merging is possible, but it sure ain’t easy. Of course, distributed version control systems have now started to remove the fear and paranoia around branching and merging—they’re actually designed around merging as a core competency. While Subversion 1.5 doesn’t make merging as easy as a system like Git or Mercurial, it certainly solves common points of pain. As a famous quote goes, “it makes easy things easy, and hard things possible.” Subversion is now beginning to match features in larger, commercial tools such as Clearcase and Perforce.

My collaborators and I are gearing up to release a 2nd Edition of the free online Subversion book soon (and you should be able to buy it from O’Reilly in hardcopy this summer.) If you want gritty details about how merging works, you can glance over Chapter 4 right now, but I thought a “nutshell” summary would make a great short blog post, just to show people how easy the common case now is.

  1. Make a branch for your experimental work:

    $ svn cp trunkURL branchURL
    $ svn switch branchURL

  2. Work on the branch for a while:

    # ...edit files
    $ svn commit
    # ...edit files
    $ svn commit

  3. Sync your branch with the trunk, so it doesn’t fall behind:

    $ svn merge trunkURL
    --- Merging r3452 through r3580 into '.':
    U button.c
    U integer.c

    $ svn commit

  4. Repeat the prior two steps until you’re done coding.
  5. Merge your branch back into the trunk:

    $ svn switch trunkURL
    $ svn merge --reintegrate branchURL
    --- Merging differences between repository URLs into '.':
    U button.c
    U integer.c

    $ svn commit

  6. Go have a beer, and live in fear of feature branches no more.

Notice how I never had to type a single revision number in my example: Subversion 1.5 knows when the branch was created, which changes need to be synced from trunk to branch, and which changes need to be merged back into the trunk when I’m done. It’s all magic now. This is how it should have been in the first place. 🙂
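If you’re curious where the “magic” lives, you can peek at the bookkeeping yourself. A small sketch, assuming you’re sitting at the top of a branch working copy; `show_merge_state` is a name I made up, and the trunk URL is a placeholder just like in the steps above.

```shell
# Inspect the merge bookkeeping Subversion 1.5 keeps for a branch.
# Run from the root of a branch working copy; trunk_url is a placeholder.
show_merge_state() {
    trunk_url=$1

    # The svn:mergeinfo property on the branch root records which
    # revisions have already been merged in, and from where.
    svn propget svn:mergeinfo .

    # List the trunk revisions that have not been merged into this
    # branch yet, i.e. what the next sync would pick up.
    svn mergeinfo --show-revs eligible "$trunk_url"
}

# Example call (hypothetical URL):
#   show_merge_state http://svn.example.com/repos/project/trunk
```

This is the same information `svn merge trunkURL` consults when it works out the revision range for you.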

Subversion 1.5 isn’t officially released yet, but we’re looking for people to test one of our final release candidate source tarballs. CollabNet has also created some nice binary packages for testing, as part of their early adopter program. Try it out and report any bugs!

Version Control and the… Long Gradated Scale

Posted on Tuesday, 27 November, 2007

My previous post about version control and the 80% deserves a follow-up post, mainly because it caused such an uproar, and because I don’t want people to think I’m an ignorant narcissist. Some people agreed with my post, but a huge number of people took offense at my gross generalizations. I’ve seen endless comments on my post (as well as the supporting post by Jeff Atwood) where people are either trying to decide if they’re in the “80%” or in the “20%”, or are calling foul on the pompous assertion that everyone fits into those two categories.

So let me begin by apologizing. It’s all too easy to read the post and think that my thesis is “80% of programmers are stupid mouth-breathing followers, and 20% are cool smart people like me.” Obviously, I don’t believe that. 🙂 Despite the disclaimer at the top of the post (stating that I was deliberately making “oversimplified stereotypes” to illustrate a point), the writing device wasn’t worth it; I simply offended too many people. The world is grey, of course, and every programmer is different. Particular interests don’t make you more or less “20%”, and it’s impossible to point to a team of coders within an organization and make ridiculous statements like “this team is clearly a bunch of dumb 80% people”. Nothing is ever so clear cut as that.

And yet, despite the fact that we’re all unique and beautiful snowflakes, we all have some sort of vague platonic notion of the “alpha geek”. Over time, I’ve come to my own sort of intuition about identifying the degree to which someone is an alpha-geek. I read a lot of resumes and interview a huge number of engineering candidates at work, and the main question I ask myself after the interview is: “if this person were independently wealthy and didn’t need a job at all, would they still be writing software for fun?” In other words, does the person have an inherent passion for programming as an art? That’s the sort of thing that leads to {open-source participation, writing lisp compilers, [insert geeky activity here]}. This is the basis for my super-exaggerated 80/20 metaphor in my prior post, and hopefully a less offensive way of describing it.

That said, my experience with the software industry is that the majority of people who write software for a living do not have a deep passion for the craft of programming, and don’t do it for fun. They consume and use tools written by other people, and the tools need to be really user-friendly before they get adopted. As others have pointed out, they need to just work out of the box. The main point I was trying to make was that distributed version control systems (DVCS) haven’t reached that friendliness point yet, and Subversion is only just starting to reach that level (thanks to clients like TortoiseSVN). I subscribe to a custom Google Alert about my corner of the software world, meaning that anytime Google finds a new web page that mentions Subversion or version control, I get notified about it. You would be simply astounded at the number of new blog posts I see every day that essentially say “Hey, maybe our team should start using version control! Subversion seems pretty usable, have you tried it yet?” I see close to zero penetration of DVCS into this world: that’s the next big challenge for DVCS as it matures.

Others have pointed out that while I scream for DVCS evangelists not to thoughtlessly trash centralized systems like Subversion, I’m busy thoughtlessly trashing DVCS! I certainly hope this isn’t the case; I’ve used Mercurial a bit here and there, and perhaps my former assertions are simply based on old information. I had previously complained that most DVCS systems don’t run on Windows, don’t have easy access control, and don’t have nice GUI clients. Looking at wikipedia, I sure seem to be wrong. 🙂

Version Control and “the 80%”

Posted on Tuesday, 16 October, 2007

11/17/07: Before posting an angry comment about this post, please see the follow-up post!

Disclaimer: I’m going to make some crazy sweeping generalizations — ones which are based on my 12 years of observing the software development industry. I’m aware that I’m drawing some oversimplified stereotypes, but I think most of my peers who work in this industry will nod their head at some point, able to see the grains of truth in my characterizations.

Two Types of Programmers

There are two “classes” of programmers in the world of software development: I’m going to call them the 20% and the 80%.

The 20% folks are what many would call “alpha” programmers — the leaders, trailblazers, trendsetters, the kind of folks that places like Google and Fog Creek software are obsessed with hiring. These folks were the first ones to install Linux at home in the 90’s; the people who write lisp compilers and learn Haskell on weekends “just for fun”; they actively participate in open source projects; they’re always aware of the latest, coolest new trends in programming and tools.

The 80% folks make up the bulk of the software development industry. They’re not stupid; they’re merely vocational. They went to school, learned just enough Java/C#/C++, then got a job writing internal apps for banks, governments, travel firms, law firms, etc. The world usually never sees their software. They use whatever tools Microsoft hands down to them — usually VS.NET if they’re doing C++, or maybe a GUI IDE like Eclipse or IntelliJ for Java development. They’ve never used Linux, and aren’t very interested in it anyway. Many have never even used version control. If they have, it’s only whatever tool shipped in the Microsoft box (like SourceSafe), or some ancient thing handed down to them. They know exactly enough to get their job done, then go home on the weekend and forget about computers.

Shocking statement #1: Most of the software industry is made up of 80% programmers. Yes, most of the world is small Windows development shops, or small firms hiring internal programmers. Most companies have a few 20% folks, and they’re usually the ones lobbying against pointy-haired bosses to change policies, or upgrade tools, or to use a sane version-control system.

Shocking statement #2: Most alpha-geeks forget about shocking statement #1. People who work on open source software, participate in passionate cryptography arguments on Slashdot, and download the latest GIT releases are extremely likely to lose sight of the fact that “the 80%” exists at all. They get all excited about the latest Linux distro or AJAX toolkit or distributed SCM system, spend all weekend on it, blog about it… and then are confounded about why they can’t get their office to start using it.

I will be the first to admit that I completely lost sight of the 80% as well. When I was first hired by Collabnet to “design a replacement for CVS” back in 2000, my two collaborators and I were really excited. All the 20% folks were using CVS, especially for open source projects. We viewed this as an opportunity to win the hearts and minds of the open source world, and to especially attract the attention of all those alpha-geeks. But things turned out differently. When we finally released Subversion 1.0 in early 2004, guess what happened? Did we have flocks of 20% people converting open source projects to Subversion? No, actually, just a few small projects did that. Instead, we were overwhelmed with dozens of small companies tossing out Microsoft SourceSafe, and hundreds of 80% people flocking to our user lists for tech support.

Today, Subversion has now gone from “cool subversive product” to “the default safe choice” for both 80% and 20% audiences. The 80% companies who were once using crappy version control (or no version control at all) are now blogging to one another — web developers giving “hot tips” to each other about using version control (and Subversion in particular) to manage their web sites at their small web-development shops. What was once new and hot to 20% people has finally trickled down to everyday-tool status among the 80%.

The great irony here (as Karl Fogel points out in one of his recent OSCON slides) is that Subversion was originally intended to subvert the open source world. It’s done that to a reasonable degree, but it’s proven far more subversive in the corporate world!

Enter Distributed Version Control

In 2007, Distributed Version Control Systems (DVCS) are all the rage among the alpha-geeks. They’re thrilled with tools like git, mercurial, bazaar-ng, darcs, monotone… and they view Subversion as a dinosaur. Bleeding-edge open source projects are switching to DVCS. Many of these early adopters come off as either incredibly pretentious and self-righteous (like Linus Torvalds!), or are just obnoxious fanboys who love DVCS because it’s new and shiny.

And what’s not to love about DVCS? It is really cool. It liberates users, empowers them to work in disconnected situations, makes branching and merging into trivial operations.

Shocking statement #3: No matter how cool DVCS is, anyone who tells you that DVCS is perfect for everyone is completely out of touch with reality.

Why? Because (1) DVCS has tradeoffs that are not appropriate for all teams, and (2) DVCS completely blows over the head of the 80%.

Let’s talk about tradeoffs first. While DVCS dramatically lowers the bar for participation in a project (just clone the repository and start making local commits!), it also encourages anti-social behavior. I already wrote a long essay about this (see The Risks of Distributed Version Control). In a nutshell: with a centralized system, people are forced to collaborate and review each other’s work; in a decentralized system, the default behavior is for each developer to privately fork the project. They have to put in some extra effort to share code and organize themselves into some sort of collaborative structure. Yes, I’m aware that a DVCS is able to emulate a centralized system; but defaults matter. The default action is to fork, not to collaborate! This encourages people to crawl into caves and write huge new features, then “dump” these code-bombs on their peers, at which point the code is unreviewable. Yes, best practices are possible with DVCS, but they’re not encouraged. It makes me nervous about the future of open source development. (Maybe the great liberation is worth it; time will tell.)

Second, how about all those 80% folks working in small Windows development shops? How would we go about deploying DVCS to them?

  • Most DVCS systems don’t run on Windows at all.
  • Most DVCS have no shell or GUI tool integrations; they’re command-line only.
  • Most 80% coders find TortoiseSVN full of new, challenging concepts like “update” and “commit”. They often struggle to use version control at all; are you now going to teach them the difference between “pull” and “update”, between “commit” and “push”? Look me in the eyes and say that with a straight face.
  • Corporations are inherently centralized entities. Not only is their power-structure centralized, but their shared resources are centralized as well.
    • Managers don’t want 20 different private forks of a codebase; they want one codebase that they can monitor all activity on.
    • Cloning a repository is bad for corporate security. Most corporations have an absolute need for access control on their code; sensitive intellectual property in specific parts of the repository is only readable/writeable by certain teams. No DVCS is able to provide fine-grained access control; the entire code history is sitting on local disk.
    • Cloning is often unscalable for corporations. Many companies have huge codebases — repositories which are dozens or even hundreds of gigabytes in size. When a new developer starts out, it’s simply a waste of time (and disk space) to clone a repository that big.

Again, I repeat the irony: Subversion was designed for open source geeks, but the reality is that it’s become much more of a “home run” for corporate development. Subversion is centralized. Subversion runs on Windows, both client and server. Subversion has fine-grained access control. It has an absolutely killer GUI (TortoiseSVN) that makes version control accessible to people who barely know what it is. It integrates with all the GUI IDEs like VS.NET and Eclipse. In short, it’s an absolutely perfect fit for the 80%, and it’s why Collabnet is doing so well in supporting this audience.

DVCS and Subversion’s Future

Most Subversion developers are well aware of the cool new ground being broken by DVCS, and there’s already a lot of discussion out there to “evolve” Subversion 2.0 in those directions. However, as Karl Fogel pointed out in a long email, the challenge before us is to keep Subversion simple, while still co-opting many of the features of DVCS. We will not forget about the 80%!

Subversion 1.5 is getting very close to a release candidate, and this fixes the long-standing DVCS criticism that “Subversion merging is awful”. Branching is still a constant-time operation, but you can now repeatedly merge one branch to another without searching history for the exact arguments you need. Subversion automatically keeps track of which changes you’ve merged already, and which still need merging. We even allow cherry-picking of changes. We’ve also got nice interactive conflict resolution now, so you can plug in your favorite Mercurial merging tool and away you go. A portable patch format is also coming soon.
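To make the bookkeeping idea concrete, here is a toy model of merge tracking in Python. It is only a sketch of the concept (remember what has been merged, so repeated merges pick up just the remainder); it is not how Subversion stores or computes this, and the names `Branch`, `eligible`, and `merge`, along with the revision numbers, are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical stand-in for a branch; not a Subversion data structure."""
    name: str
    # Revisions committed on this branch (made-up numbers).
    revisions: list[int] = field(default_factory=list)
    # For each source branch name, the source revisions already merged here.
    merged: dict[str, set[int]] = field(default_factory=dict)


def eligible(source: Branch, target: Branch) -> list[int]:
    """Return source revisions not yet merged into target, oldest first."""
    done = target.merged.get(source.name, set())
    return sorted(r for r in source.revisions if r not in done)


def merge(source: Branch, target: Branch, only: list[int] | None = None) -> list[int]:
    """Record a merge and return the revisions it applied.

    With `only`, cherry-pick just those revisions; otherwise take
    everything that is still outstanding.
    """
    todo = eligible(source, target)
    if only is not None:
        todo = [r for r in todo if r in only]
    target.merged.setdefault(source.name, set()).update(todo)
    return todo


trunk = Branch("trunk", revisions=[10, 12, 15, 18])
feature = Branch("feature")

first = merge(trunk, feature, only=[12])  # cherry-pick a single change
second = merge(trunk, feature)            # merge whatever is left
third = merge(trunk, feature)             # nothing left to do
```

The point of the sketch is that the user never has to dig through history for revision ranges: the second call works out for itself that revision 12 is already in, and the third call is a no-op.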

For Subversion 2.0, a few of us are imagining a centralized system, but with certain decentralized features. We’d like to allow working copies to store “offline commits” and manage “local branches”, which can then be pushed to the central repository when you’re online again. Our prime directive is to keep the UI simple, and avoid the curse of DVCS UIs (which often have 40, 50, or even 100 different commands!)

We also plan to centralize our working copy metadata into one place, which will make many client operations much faster. We may also end up stealing Mercurial’s “revlog” repository format as a replacement for the severely I/O bottlenecked FSFS format.

A Last Plea

Allow me to make a plea to all the DVCS fanatics out there: yes, it’s awesome, but please have some perspective! Understand that all tools have tradeoffs and that different teams have different needs. There is no magic bullet for version control. Anyone who argues that DVCS is “the bullet” is either selling you something or utterly forgetting about the 80%. They need to pull their head out of Slashdot and pay attention to the rest of the industry.

Update, 10/18/07: A number of comments indicate that my post should have been clearer in some ways. It was never my intent to say that “Subversion is good enough for everyone” or that “most of the world is too dumb to use DVCS, so don’t use it.” Instead, I’m simply presenting a checklist — a list of obstacles that DVCS needs to overcome in order to be accepted into mainstream corporate software development. I have no doubt that DVCS systems will get there someday, and that will be a great thing. And I’m imploring DVCS evangelists to be aware of these issues, rather than running around thoughtlessly trashing centralized systems. 🙂

Monitoring Subversion

Posted by on Thursday, 8 February, 2007

A confession: for many months now, I’ve been using the Google Alerts service to help me tune into general net buzz about Subversion. If you haven’t tried the service, I really recommend it. It’s like having a personal PR agent sending you updates about whatever topic you want. Each day, I get an email from Google showing all the latest web pages it’s found that mention all the words “subversion”, “version”, “control”. What I mostly see are blogs talking about people’s experiences with Subversion, but that’s still really interesting stuff. If you have your own pet topic that you want to monitor, give it a try.

In any case, allow me to make one bold statement to the public: the name of the system is “Subversion”, not “SubVersion”. There is no capital V, and there never has been. It’s kind of amazing to me how many times I see “SubVersion” written in these alert-emails.

What I’ve Really Been Working on at Google

Posted by on Friday, 28 July, 2006

It’s been a long week. After almost a year at Google, our team finally released our new service to the world at OSCON this week in Portland. Of course, I didn’t actually get to attend the conference; I spent four days holed up in a smelly hotel room with three other team-mates, trying to prepare for public launch. This lovely photo shows us like the caged animals we are.

My co-worker Fitz and I did manage to escape the hotel room twice to present two talks. We’ve been working together so long now that we can actually finish each other’s sentences, so we’ve turned this ability into a presentation gimmick. We stand in front of crowds each holding a microphone, riffing on slide bullets back and forth. I’m not sure if it’s comedic or just novel, but we get really great feedback. One blogger did an amazing writeup of our first talk, and another blogger practically republished the slides of our second talk. A third blogger said that our first talk was the “best session of the day”. Woo!

In any case, the real news is our product announcement.

Long ago I made a post describing What I Do at Google. Namely, I work on the Open Source team, whose mission is to promote open source software development however we can: create more of it, and make it better. Working for a large company like Google, we have two main resources: (1) money, and (2) a massive data-serving infrastructure. Given these resources, how can we accomplish our mission? Our team has been using money in a few ways. We make financial donations to important open source projects, and sometimes even pay people to work full-time on them. We also fund the hugely successful Summer of Code program, which pays hundreds of students to work on open source software as a ‘virtual summer internship’; the result is not only accelerated open source development across the board, but a whole new generation of open source programmers.

But it’s the massive data-serving infrastructure that we’ve not really used much, until now. Our big product announcement was Open Source Project Hosting on our team’s main website. You can read all the gritty details in our FAQ, and there are many blogs and news posts that talk about our new service, as well as folks posting screen shots.

The general gist of the service is: come to our site and create an open source project. There’s no approval process. Within a few seconds, your project is ready to go. You get a super-simple front page which describes your project, an issue tracker, and a Subversion repository to store your source code.

No, this is not a new concept. There are many project-hosting sites out there such as Sourceforge and Tigris. Our intent isn’t to compete directly with these sites, but rather to provide more options to open source developers; we feel that we’ve got a fresh new take on project hosting, and we’re excited to see how developers make use of it. Following the motto on our front page, we’ve “released early” (our service is admittedly quite spartan right now) but also plan to “release often” — there are a whole slew of new features planned over the next several months.

Let me draw your attention to the two main features we’re providing at launch. The issue tracker (written by a team-mate of mine) is unlike any I’ve ever seen. Like many Google products, it’s extremely fast and AJAX-y, has cool customizable views, and has a simple Gmail-like interface. Instead of forcing users to fill out myriads of fields, the issue entry is simple and uncluttered. Developers can invent any sort of arbitrary ‘labels’ to describe an issue, much like Gmail labels. The issue tracker then uses Google search technology to search over all of the data in a free-form fashion. There’s just one search box. I really think it’s a whole new approach to issue tracking applications, and we hope it’s useful to open source developers.
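For the curious, the “arbitrary labels plus one search box” idea can be sketched in a few lines of Python. This is purely an illustration of the concept and says nothing about how the real tracker is built; the `Issue` class, the `search` function, and the sample labels are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical issue record: free-form labels instead of fixed fields."""
    id: int
    summary: str
    labels: frozenset


def search(issues, query):
    """Return ids of issues where every query term appears (case-insensitive,
    as a substring) in some word of the summary or in some label."""
    terms = query.lower().split()

    def matches(issue):
        haystack = issue.summary.lower().split()
        haystack += [label.lower() for label in issue.labels]
        return all(any(term in word for word in haystack) for term in terms)

    return [issue.id for issue in issues if matches(issue)]


# Sample data; the label names are invented by the "developers" themselves.
issues = [
    Issue(1, "Crash on startup", frozenset({"Type-Defect", "Priority-High"})),
    Issue(2, "Add dark theme", frozenset({"Type-Enhancement"})),
    Issue(3, "Startup is slow", frozenset({"Type-Defect", "Priority-Low"})),
]

startup_defects = search(issues, "startup defect")
high_priority = search(issues, "priority-high")
```

Because labels and summary text are searched the same way, a developer can invent a new label today and query on it immediately, with no schema change and no extra form field.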

However, the thing which I’ve been working on for the last ten months (with Fitz) is the Subversion hosting feature. A typical Subversion repository has two different back-end options for storing your code: either a BerkeleyDB database or an FSFS (flat filesystem) store. What Fitz and I have done is write a new Google-specific back-end which stores the code in our datacenters, specifically in an internal technology called Bigtable. Jeff Dean (another Googler) has given public presentations about Bigtable, and you can read more about this technology by Googling for it. It’s much like a gigantic spreadsheet for holding data, but it can run over thousands of machines spread over multiple datacenters. So just like other Google services (picasaweb, Gmail, Calendar, etc.), your data gets put into a massively scalable, redundant, and reliable system. I know that in the past, I’ve been personally frustrated when my own Subversion server goes down; it can be a lot of work to manage your own repository. Instead of being distracted with that, let Google Code be your free host, and get on with coding!

I know that some of my closer friends may wonder if I’ve turned to the Dark Side. Aren’t I the same guy who worked on open-source Subversion for five years? Who wrote a free book about Subversion? Why have I spent all this time writing a proprietary (!) extension to an open source system? My answer is that it’s all about seeing the forest for the trees. Yes, I wish that our new Subversion back-end were open-sourceable, but it’s not a realistic possibility. Google has amazing hardware and a complex mountain of software to make use of it effectively, but it’s all one proprietary ecosystem. Releasing this new Subversion back-end as open source would be meaningless, since it’s not something that can function outside of Google. That said, the point here is the larger, longer-term benefit: by providing free, highly-available Subversion repositories to the world at large, we hope to “embiggen” open source everywhere. (Yes, that’s the word used by my team leader, Chris DiBona.) 🙂

To technical friends: I hope this post has been enlightening. To my nontechnical friends and family: hope that wasn’t too much babble!