Version Control and “the 80%”
11/17/07: Before posting an angry comment about this post, please see the follow-up post!
Disclaimer: I’m going to make some crazy sweeping generalizations — ones which are based on my 12 years of observing the software development industry. I’m aware that I’m drawing some oversimplified stereotypes, but I think most of my peers who work in this industry will nod their head at some point, able to see the grains of truth in my characterizations.
Two Types of Programmers
There are two “classes” of programmers in the world of software development: I’m going to call them the 20% and the 80%.
The 20% folks are what many would call “alpha” programmers: the leaders, trailblazers, trendsetters, the kind of folks that places like Google and Fog Creek Software are obsessed with hiring. These folks were the first ones to install Linux at home in the ’90s; the people who write Lisp compilers and learn Haskell on weekends “just for fun”; they actively participate in open source projects; they’re always aware of the latest, coolest new trends in programming and tools.
The 80% folks make up the bulk of the software development industry. They’re not stupid; they’re merely vocational. They went to school, learned just enough Java/C#/C++, then got a job writing internal apps for banks, governments, travel firms, law firms, etc. The world usually never sees their software. They use whatever tools Microsoft hands down to them, usually VS.NET if they’re doing C++, or maybe a GUI IDE like Eclipse or IntelliJ for Java development. They’ve never used Linux, and aren’t very interested in it anyway. Many have never even used version control. If they have, it’s only whatever tool shipped in the Microsoft box (like SourceSafe), or some ancient thing handed down to them. They know exactly enough to get their job done, then go home on the weekend and forget about computers.
Shocking statement #1: Most of the software industry is made up of 80% programmers. Yes, most of the world is small Windows development shops, or small firms hiring internal programmers. Most companies have a few 20% folks, and they’re usually the ones lobbying against pointy-haired bosses to change policies, or upgrade tools, or to use a sane version-control system.
Shocking statement #2: Most alpha-geeks forget about shocking statement #1. People who work on open source software, participate in passionate cryptography arguments on Slashdot, and download the latest GIT releases are extremely likely to lose sight of the fact that “the 80%” exists at all. They get all excited about the latest Linux distro or AJAX toolkit or distributed SCM system, spend all weekend on it, blog about it… and then are confounded about why they can’t get their office to start using it.
I will be the first to admit that I completely lost sight of the 80% as well. When I was first hired by Collabnet to “design a replacement for CVS” back in 2000, my two collaborators and I were really excited. All the 20% folks were using CVS, especially for open source projects. We viewed this as an opportunity to win the hearts and minds of the open source world, and to especially attract the attention of all those alpha-geeks. But things turned out differently. When we finally released Subversion 1.0 in early 2004, guess what happened? Did we have flocks of 20% people converting open source projects to Subversion? No, actually, just a few small projects did that. Instead, we were overwhelmed with dozens of small companies tossing out Microsoft SourceSafe, and hundreds of 80% people flocking to our user lists for tech support.
Today, Subversion has gone from “cool subversive product” to “the default safe choice” for both 80% and 20% audiences. The 80% companies who were once using crappy version control (or no version control at all) are now blogging to one another: web developers giving “hot tips” to each other about using version control (and Subversion in particular) to manage their web sites at their small web-development shops. What was once new and hot to 20% people has finally trickled down to everyday-tool status among the 80%.
The great irony here (as Karl Fogel points out in one of his recent OSCON slides) is that Subversion was originally intended to subvert the open source world. It’s done that to a reasonable degree, but it’s proven far more subversive in the corporate world!
Enter Distributed Version Control
In 2007, Distributed Version Control Systems (DVCS) are all the rage among the alpha-geeks. They’re thrilled with tools like git, mercurial, bazaar-ng, darcs, monotone… and they view Subversion as a dinosaur. Bleeding-edge open source projects are switching to DVCS. Many of these early adopters come off as either incredibly pretentious and self-righteous (like Linus Torvalds!), or are just obnoxious fanboys who love DVCS because it’s new and shiny.
And what’s not to love about DVCS? It is really cool. It liberates users, empowers them to work in disconnected situations, makes branching and merging into trivial operations.
Shocking statement #3: No matter how cool DVCS is, anyone who tells you that DVCS is perfect for everyone is completely out of touch with reality.
Why? Because (1) DVCS has tradeoffs that are not appropriate for all teams, and (2) DVCS goes completely over the heads of the 80%.
Let’s talk about tradeoffs first. While DVCS dramatically lowers the bar for participation in a project (just clone the repository and start making local commits!), it also encourages anti-social behavior. I already wrote a long essay about this (see The Risks of Distributed Version Control). In a nutshell: with a centralized system, people are forced to collaborate and review each other’s work; in a decentralized system, the default behavior is for each developer to privately fork the project. They have to put in some extra effort to share code and organize themselves into some sort of collaborative structure. Yes, I’m aware that a DVCS is able to emulate a centralized system; but defaults matter. The default action is to fork, not to collaborate! This encourages people to crawl into caves and write huge new features, then “dump” these code-bombs on their peers, at which point the code is unreviewable. Yes, best practices are possible with DVCS, but they’re not encouraged. It makes me nervous about the future of open source development. (Maybe the great liberation is worth it; time will tell.)
Second, how about all those 80% folks working in small Windows development shops? How would we go about deploying DVCS to them?
- Most DVCS systems don’t run on Windows at all.
- Most DVCS have no shell or GUI tool integrations; they’re command-line only.
- Most 80% coders find TortoiseSVN full of new, challenging concepts like “update” and “commit”. They often struggle to use version control at all; are you now going to teach them the difference between “pull” and “update”, between “commit” and “push”? Look me in the eyes and say that with a straight face.
- Corporations are inherently centralized entities. Not only is their power-structure centralized, but their shared resources are centralized as well.
- Managers don’t want 20 different private forks of a codebase; they want one codebase that they can monitor all activity on.
- Cloning a repository is bad for corporate security. Most corporations have an absolute need for access control on their code; sensitive intellectual property in specific parts of the repository is only readable/writeable by certain teams. No DVCS is able to provide fine-grained access control; the entire code history is sitting on local disk.
- Cloning is often unscalable for corporations. Many companies have huge codebases — repositories which are dozens or even hundreds of gigabytes in size. When a new developer starts out, it’s simply a waste of time (and disk space) to clone a repository that big.
Again, I repeat the irony: Subversion was designed for open source geeks, but the reality is that it’s become much more of a “home run” for corporate development. Subversion is centralized. Subversion runs on Windows, both client and server. Subversion has fine-grained access control. It has an absolutely killer GUI (TortoiseSVN) that makes version control accessible to people who barely know what it is. It integrates with all the GUI IDEs like VS.NET and Eclipse. In short, it’s an absolutely perfect fit for the 80%, and it’s why Collabnet is doing so well in supporting this audience.
DVCS and Subversion’s Future
Most Subversion developers are well aware of the cool new ground being broken by DVCS, and there’s already a lot of discussion out there to “evolve” Subversion 2.0 in those directions. However, as Karl Fogel pointed out in a long email, the challenge before us is to keep Subversion simple, while still co-opting many of the features of DVCS. We will not forget about the 80%!
Subversion 1.5 is getting very close to a release candidate, and this fixes the long-standing DVCS criticism that “Subversion merging is awful”. Branching is still a constant-time operation, but you can now repeatedly merge one branch to another without searching history for the exact arguments you need. Subversion automatically keeps track of which changes you’ve merged already, and which still need merging. We even allow cherry-picking of changes. We’ve also got nice interactive conflict resolution now, so you can plug in your favorite Mercurial merging tool and away you go. A portable patch format is also coming soon.
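To make the merge-tracking idea a bit more concrete, here is a minimal Python sketch. It is not Subversion's actual implementation or data model; the `Branch` class and the `eligible_revisions` and `merge` helpers are hypothetical names invented for illustration. The sketch only shows the bookkeeping: each branch remembers which revisions it has already pulled in from each source, so a repeated merge can work out what is still pending, and a cherry-pick is just a merge restricted to chosen revisions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Branch:
    """Hypothetical branch record (an assumption, not Subversion's model)."""
    name: str
    # Revision numbers committed directly on this branch.
    revisions: List[int] = field(default_factory=list)
    # Revisions already merged into this branch, keyed by source branch name.
    merged_from: Dict[str, Set[int]] = field(default_factory=dict)


def eligible_revisions(source: Branch, target: Branch) -> List[int]:
    """Return the source revisions that the target has not merged yet."""
    already = target.merged_from.get(source.name, set())
    return [rev for rev in source.revisions if rev not in already]


def merge(source: Branch, target: Branch,
          only: Optional[Set[int]] = None) -> List[int]:
    """Record a merge from source into target and return what was merged.

    With only=None every pending revision is merged; passing a set of
    revision numbers cherry-picks just those.
    """
    pending = eligible_revisions(source, target)
    chosen = [rev for rev in pending if only is None or rev in only]
    target.merged_from.setdefault(source.name, set()).update(chosen)
    return chosen
```

For example, with a `trunk` holding revisions 10, 12, and 15, calling `merge(trunk, feature, only={12})` cherry-picks revision 12; a later plain `merge(trunk, feature)` picks up only 10 and 15, and a third call has nothing left to do. Nobody has to dig through history to find the right arguments.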
For Subversion 2.0, a few of us are imagining a centralized system, but with certain decentralized features. We’d like to allow working copies to store “offline commits” and manage “local branches”, which can then be pushed to the central repository when you’re online again. Our prime directive is to keep the UI simple, and avoid the curse of DVCS UIs (which often have 40, 50, or even 100 different commands!).
We also plan to centralize our working copy metadata into one place, which will make many client operations much faster. We may also end up stealing Mercurial’s “revlog” repository format as a replacement for the severely I/O bottlenecked FSFS format.
A Last Plea
Allow me to make a plea to all the DVCS fanatics out there: yes, it’s awesome, but please have some perspective! Understand that all tools have tradeoffs and that different teams have different needs. There is no magic bullet for version control. Anyone who argues that DVCS is “the bullet” is either selling you something or utterly forgetting about the 80%. They need to pull their head out of Slashdot and pay attention to the rest of the industry.
Update, 10/18/07: A number of comments indicate that my post should have been clearer in some ways. It was never my intent to say that “Subversion is good enough for everyone” or that “most of the world is too dumb to use DVCS, so don’t use it.” Instead, I’m simply presenting a checklist — a list of obstacles that DVCS needs to overcome in order to be accepted into mainstream corporate software development. I have no doubt that DVCS systems will get there someday, and that will be a great thing. And I’m imploring DVCS evangelists to be aware of these issues, rather than running around thoughtlessly trashing centralized systems. 🙂
RE: James in post 23
The hierarchical model you describe is pretty much how the Linux kernel and many (if not most) open source projects work.
The point about choosing the best tool (i.e. VCS) for the job (i.e. your current project) is pretty much a truism. And one that fanatical supporters of any particular tool/product/project tend to forget. But I think conflating vocational programmers with incompetent (i.e. incapable of learning new tools) and dead wood (i.e. unwilling to learn tools) programmers was a mistake which insults the vocational programmers. I’m certainly in the alpha programmer group, but I’ve worked with vocational, incompetent and dead wood programmers.
The vocational programmers are perfectly capable of learning any VCS or pretty much any other new tool. Some know about best practices, and would like to learn the new tools and techniques; however, they won’t go out of their way to do so, nor will they take the initiative to convince management of the need for process changes. But they will quietly support the alpha programmers. Other vocational programmers are smart but risk averse, and thus don’t like change, and still others are ignorant of best practices. These two groups will likely become interested in best practices after seeing them used. If management are alpha programmers, or support the alpha programmer(s) then the vocational programmers will do what they’re told.
Some organizations completely lack alpha programmers (perhaps processes are so bad they got fed up and found employment elsewhere…). In other places the dead wood programmers wield a lot of influence (probably because they’re the “legacy system” experts), or the management themselves are dead wood programmers. And these are the places where resistance to change is pathological. And unfortunately, the programming departments in a lot of large corporations are dominated by dead wood.
Actually, there are three types of programmers!
You forgot to mention the people that do version control by hand using dos batches that copy files between directories whose names contain version numbers.
A centralized VCS with sound features from DVCS (offline commits, local branches) is the right way for Subversion.
For me, Subversion’s vendor branches are hard to maintain. Hg+MQ is better. If Subversion 2.0 had some sort of MQ-like plugin, that would be cool.
Your argument about Subversion fitting the needs of big corporations better makes sense, based on the need for fine-grained access control and preferring a centralized system. (It’s generally best to use the tools best suited to the job at hand!)
And your future changes proposed for SVN are also quite welcome. Subversion’s future looks bright indeed.
Your arguments about the risks of distributed version control, however, really make it sound like you’re grasping at straws. I almost stopped reading at that point, laughing and deciding SVN was dead in the water if this was the best argument its creator had to offer.
May I recommend expunging that paragraph? You’ll have a much stronger argument, and will alienate fewer people, and will be dismissed by fewer people who’ve never seen the terrible things happening that you’re describing as risks of systems that you probably don’t use.
Thanks, and good luck with SVN 2. I look forward to trying it out again at that point.
I think the whole premise is wrong. I think it’s more like:
80% > 15% > 5%
where the 15% is what everyone here calls the ‘20%’, and the 5% are the REAL legends: these are the guys who don’t just use the latest tools and tech, they create them from scratch.
If they are given source control, for example, they want to create their own, from scratch.
Or given OpenGL, they want to write their own, but no, it’s not enough. They dig even further: they want to write graphics rasterisers in assembly, but even that is not enough. They dislike the fact that modern CPU (ALU) adder microelectronics are slow due to propagation delays, so they want to re-engineer them! I know I have. Come to think of it, at this stage this is probably the 1%, so:
80% > 19% > 1%