Version Control and “the 80%”
11/17/07: Before posting an angry comment about this post, please see the follow-up post!
Disclaimer: I’m going to make some crazy sweeping generalizations — ones which are based on my 12 years of observing the software development industry. I’m aware that I’m drawing some oversimplified stereotypes, but I think most of my peers who work in this industry will nod their head at some point, able to see the grains of truth in my characterizations.
Two Types of Programmers
There are two “classes” of programmers in the world of software development: I’m going to call them the 20% and the 80%.
The 20% folks are what many would call “alpha” programmers — the leaders, trailblazers, trendsetters, the kind of folks that places like Google and Fog Creek software are obsessed with hiring. These folks were the first ones to install Linux at home in the 90′s; the people who write lisp compilers and learn Haskell on weekends “just for fun”; they actively participate in open source projects; they’re always aware of the latest, coolest new trends in programming and tools.
The 80% folks make up the bulk of the software development industry. They’re not stupid; they’re merely vocational. They went to school, learned just enough Java/C#/C++, then got a job writing internal apps for banks, governments, travel firms, law firms, etc. The world usually never sees their software. They use whatever tools Microsoft hands down to them — usally VS.NET if they’re doing C++, or maybe a GUI IDE like Eclipse or IntelliJ for Java development. They’ve never used Linux, and aren’t very interested in it anyway. Many have never even used version control. If they have, it’s only whatever tool shipped in the Microsoft box (like SourceSafe), or some ancient thing handed down to them. They know exactly enough to get their job done, then go home on the weekend and forget about computers.
Shocking statement #1: Most of the software industry is made up of 80% programmers. Yes, most of the world is small Windows development shops, or small firms hiring internal programmers. Most companies have a few 20% folks, and they’re usually the ones lobbying against pointy-haired bosses to change policies, or upgrade tools, or to use a sane version-control system.
Shocking statement #2: Most alpha-geeks forget about shocking statement #1. People who work on open source software, participate in passionate cryptography arguments on Slashdot, and download the latest GIT releases are extremely likely to lose sight of the fact that “the 80%” exists at all. They get all excited about the latest Linux distro or AJAX toolkit or distributed SCM system, spend all weekend on it, blog about it… and then are confounded about why they can’t get their office to start using it.
I will be the first to admit that I completely lost sight of the 80% as well. When I was first hired by Collabnet to “design a replacement for CVS” back in 2000, my two collaborators and I were really excited. All the 20% folks were using CVS, especially for open source projects. We viewed this as an opportunity to win the hearts and minds of the open source world, and to especially attract the attention of all those alpha-geeks. But things turned out differently. When we finally released Subversion 1.0 in early 2004, guess what happened? Did we have flocks of 20% people converting open source projects to Subversion? No, actually, just a few small projects did that. Instead, we were overwhelmed with dozens of small companies tossing out Microsoft SourceSafe, and hundreds of 80% people flocking to our user lists for tech support.
Today, Subversion has now gone from “cool subversive product” to “the default safe choice” for both 80% and 20% audiences. The 80% companies who were once using crappy version control (or no version control at all) are now blogging to one another — web developers giving “hot tips” to each other about using version control (and Subversion in particular) to manage their web sites at their small web-development shops. What was once new and hot to 20% people has finally trickled down to everyday-tool status among the 80%.
The great irony here (as Karl Fogel points out in one of his recent OSCON slides) is that Subversion was originally intended to subvert the open source world. It’s done that to a reasonable degree, but it’s proven far more subversive in the corporate world!
Enter Distributed Version Control
In 2007, Distributed Version Control Systems (DVCS) are all the range among the alpha-geeks. They’re thrilled with tools like git, mercurial, bazaar-ng, darcs, monotone… and they view Subversion as a dinosaur. Bleeding-edge open source projects are switching to DVCS. Many of these early adopters come off as either incredibly pretentious and self-righteous (like Linus Torvalds!), or are just obnoxious fanboys who love DVCS because it’s new and shiny.
And what’s not to love about DVCS? It is really cool. It liberates users, empowers them to work in disconnected situations, makes branching and merging into trivial operations.
Shocking statement #3: No matter how cool DVCS is, anyone who tells you that DVCS is perfect for everyone is completely out of touch with reality.
Why? Because (1) DVCS has tradeoffs that are not appropriate for all teams, and (2) DVCS completely blows over the head of the 80%.
Let’s talk about tradeoffs first. While DVCS dramatically lowers the bar for participation in a project (just clone the repository and start making local commits!), it also encourages anti-social behavior. I already wrote a long essay about this (see The Risks of Distributed Version Control). In a nutshell: with a centralized system, people are forced to collaborate and review each other’s work; in a decentralized system, the default behavior is for each developer to privately fork the project. They have to put in some extra effort to share code and organize themselves into some sort of collaborative structure. Yes, I’m aware that a DVCS is able to emulate a centralized system; but defaults matter. The default action is to fork, not to collaborate! This encourages people to crawl into caves and write huge new features, then “dump” these code-bombs on their peers, at which point the code is unreviewable. Yes, best practices are possible with DVCS, but they’re not encouraged. It makes me nervous about the future of open source development. (Maybe the great liberation is worth it; time will tell.)
Second, how about all those 80% folks working in small Windows development shops? How would we go about deploying DVCS to them?
- Most DVCS systems don’t run on Windows at all.
- Most DVCS have no shell or GUI tool integrations; they’re command-line only.
- Most 80% coders find TortoiseSVN full of new, challenging concepts like “update” and “commit”. They often struggle to use version control at all; are you now going to teach them the difference between “pull” and “update”, between “commit” and “push”? Look me in the eyes and say that with a straight face.
- Corporations are inherently centralized entities. Not only is their power-structure centralized, but their shared resources are centralized as well.
- Managers don’t want 20 different private forks of a codebase; they want one codebase that they can monitor all activity on.
- Cloning a repository is bad for corporate security. Most corporations have an absolute need for access control on their code; sensitive intellectual property in specific parts of the repository is only readable/writeable by certain teams. No DVCS is able to provide fine-grained access control; the entire code history is sitting on local disk.
- Cloning is often unscalable for corporations. Many companies have huge codebases — repositories which are dozens or even hundreds of gigabytes in size. When a new developer starts out, it’s simply a waste of time (and disk space) to clone a repository that big.
Again, I repeat the irony: Subversion was designed for open source geeks, but the reality is that it’s become much more of a “home run”for corporate development. Subversion is centralized. Subversion runs on Windows, both client and server. Subversion has fine-grained access control. It has an absolutely killer GUI (TortoiseSVN) that makes version control accessible to people who barely know what it is. It integrates with all the GUI IDEs like VS.NET and Eclipse. In short, it’s an absolute perfect fit for the 80%, and it’s why Collabnet is doing so well in supporting this audience.
DVCS and Subversion’s Future
Most Subversion developers are well aware of the cool new ground being broken by DVCS, and there’s already a lot of discussion out there to “evolve” Subversion 2.0 in those directions. However, as Karl Fogel pointed out in a long email, the challenge before us is to keep Subversion simple, while still co-opting many of the features of DVCS. We will not forget about the 80%!
Subversion 1.5 is getting very close to a release candidate, and this fixes the long-standing DVCS criticism that “Subversion merging is awful”. Branching is still a constant-time operation, but you can now repeatedly merge one branch to another without searching history for the exact arguments you need. Subversion automatically keeps track of which changes you’ve merged already, and which still need merging. We even allow cherry-picking of changes. We’ve also got nice interactive conflict resolution now, so you can plug in your favorite Mercurial
merging tool and away you go. A portable patch format is also coming soon.
For Subversion 2.0, a few of us are imagining a centralized system, but with certain decentralized features. We’d like to allow working copies to store “offline commits” and manage “local branches”, which can then be pushed to the central repository when you’re online again. Our prime directive is to keep the UI simple, and avoid the curse of DVCS UI (which often have 40, 50, or even 100 different commands!)
We also plan to centralize our working copy metadata into one place, which will make many client operations much faster. We may also end up stealing Mercurial’s “revlog” repository format as a replacement for the severely I/O bottlenecked FSFS format.
A Last Plea
Allow me to make a plea to all the DVCS fanatics out there: yes, it’s awesome, but please have some perspective! Understand that all tools have tradeoffs and that different teams have different needs. There is no magic bullet for version control. Anyone who argues that DVCS is “the bullet” is either selling you something or utterly forgetting about the 80%. They need to pull their head out of Slashdot and pay attention to the rest of the industry.
Update, 10/18/07: A number of comments indicate that my post should have been clearer in some ways. It was never my intent to say that “Subversion is good enough for everyone” or that “most of the world is too dumb to use DVCS, so don’t use it.” Instead, I’m simply presenting a checklist — a list of obstacles that DVCS needs to overcome in order to be accepted into mainstream corporate software development. I have no doubt that DVCS systems will get there someday, and that will be a great thing. And I’m imploring DVCS evangelists to be aware of these issues, rather than running around thoughtlessly trashing centralized systems.