Version Control and “the 80%”
11/17/07: Before posting an angry comment about this post, please see the follow-up post!
Disclaimer: I’m going to make some crazy sweeping generalizations — ones which are based on my 12 years of observing the software development industry. I’m aware that I’m drawing some oversimplified stereotypes, but I think most of my peers who work in this industry will nod their head at some point, able to see the grains of truth in my characterizations.
Two Types of Programmers
There are two “classes” of programmers in the world of software development: I’m going to call them the 20% and the 80%.
The 20% folks are what many would call “alpha” programmers: the leaders, trailblazers, trendsetters, the kind of folks that places like Google and Fog Creek Software are obsessed with hiring. These folks were the first ones to install Linux at home in the ’90s; the people who write Lisp compilers and learn Haskell on weekends “just for fun”; they actively participate in open source projects; they’re always aware of the latest, coolest new trends in programming and tools.
The 80% folks make up the bulk of the software development industry. They’re not stupid; they’re merely vocational. They went to school, learned just enough Java/C#/C++, then got a job writing internal apps for banks, governments, travel firms, law firms, etc. The world usually never sees their software. They use whatever tools Microsoft hands down to them: usually VS.NET if they’re doing C++, or maybe a GUI IDE like Eclipse or IntelliJ for Java development. They’ve never used Linux, and aren’t very interested in it anyway. Many have never even used version control. If they have, it’s only whatever tool shipped in the Microsoft box (like SourceSafe), or some ancient thing handed down to them. They know exactly enough to get their job done, then go home on the weekend and forget about computers.
Shocking statement #1: Most of the software industry is made up of 80% programmers. Yes, most of the world is small Windows development shops, or small firms hiring internal programmers. Most companies have a few 20% folks, and they’re usually the ones lobbying against pointy-haired bosses to change policies, or upgrade tools, or to use a sane version-control system.
Shocking statement #2: Most alpha-geeks forget about shocking statement #1. People who work on open source software, participate in passionate cryptography arguments on Slashdot, and download the latest GIT releases are extremely likely to lose sight of the fact that “the 80%” exists at all. They get all excited about the latest Linux distro or AJAX toolkit or distributed SCM system, spend all weekend on it, blog about it… and then are confounded about why they can’t get their office to start using it.
I will be the first to admit that I completely lost sight of the 80% as well. When I was first hired by Collabnet to “design a replacement for CVS” back in 2000, my two collaborators and I were really excited. All the 20% folks were using CVS, especially for open source projects. We viewed this as an opportunity to win the hearts and minds of the open source world, and to especially attract the attention of all those alpha-geeks. But things turned out differently. When we finally released Subversion 1.0 in early 2004, guess what happened? Did we have flocks of 20% people converting open source projects to Subversion? No, actually, just a few small projects did that. Instead, we were overwhelmed with dozens of small companies tossing out Microsoft SourceSafe, and hundreds of 80% people flocking to our user lists for tech support.
Today, Subversion has gone from “cool subversive product” to “the default safe choice” for both 80% and 20% audiences. The 80% companies who were once using crappy version control (or no version control at all) are now blogging to one another: web developers giving “hot tips” to each other about using version control (and Subversion in particular) to manage their web sites at their small web-development shops. What was once new and hot to 20% people has finally trickled down to everyday-tool status among the 80%.
The great irony here (as Karl Fogel points out in one of his recent OSCON slides) is that Subversion was originally intended to subvert the open source world. It’s done that to a reasonable degree, but it’s proven far more subversive in the corporate world!
Enter Distributed Version Control
In 2007, Distributed Version Control Systems (DVCS) are all the rage among the alpha-geeks. They’re thrilled with tools like git, mercurial, bazaar-ng, darcs, monotone… and they view Subversion as a dinosaur. Bleeding-edge open source projects are switching to DVCS. Many of these early adopters come off as either incredibly pretentious and self-righteous (like Linus Torvalds!), or are just obnoxious fanboys who love DVCS because it’s new and shiny.
And what’s not to love about DVCS? It is really cool. It liberates users, empowers them to work in disconnected situations, and makes branching and merging trivial operations.
Shocking statement #3: No matter how cool DVCS is, anyone who tells you that DVCS is perfect for everyone is completely out of touch with reality.
Why? Because (1) DVCS has tradeoffs that are not appropriate for all teams, and (2) DVCS goes completely over the heads of the 80%.
Let’s talk about tradeoffs first. While DVCS dramatically lowers the bar for participation in a project (just clone the repository and start making local commits!), it also encourages anti-social behavior. I already wrote a long essay about this (see The Risks of Distributed Version Control). In a nutshell: with a centralized system, people are forced to collaborate and review each other’s work; in a decentralized system, the default behavior is for each developer to privately fork the project. They have to put in some extra effort to share code and organize themselves into some sort of collaborative structure. Yes, I’m aware that a DVCS is able to emulate a centralized system; but defaults matter. The default action is to fork, not to collaborate! This encourages people to crawl into caves and write huge new features, then “dump” these code-bombs on their peers, at which point the code is unreviewable. Yes, best practices are possible with DVCS, but they’re not encouraged. It makes me nervous about the future of open source development. (Maybe the great liberation is worth it; time will tell.)
Second, how about all those 80% folks working in small Windows development shops? How would we go about deploying DVCS to them?
- Most DVCS systems don’t run on Windows at all.
- Most DVCS have no shell or GUI tool integrations; they’re command-line only.
- Most 80% coders find TortoiseSVN full of new, challenging concepts like “update” and “commit”. They often struggle to use version control at all; are you now going to teach them the difference between “pull” and “update”, between “commit” and “push”? Look me in the eyes and say that with a straight face.
- Corporations are inherently centralized entities. Not only is their power-structure centralized, but their shared resources are centralized as well.
- Managers don’t want 20 different private forks of a codebase; they want one codebase that they can monitor all activity on.
- Cloning a repository is bad for corporate security. Most corporations have an absolute need for access control on their code; sensitive intellectual property in specific parts of the repository is only readable/writeable by certain teams. No DVCS is able to provide fine-grained access control; the entire code history is sitting on local disk.
- Cloning is often unscalable for corporations. Many companies have huge codebases — repositories which are dozens or even hundreds of gigabytes in size. When a new developer starts out, it’s simply a waste of time (and disk space) to clone a repository that big.
Again, I repeat the irony: Subversion was designed for open source geeks, but the reality is that it’s become much more of a “home run” for corporate development. Subversion is centralized. Subversion runs on Windows, both client and server. Subversion has fine-grained access control. It has an absolutely killer GUI (TortoiseSVN) that makes version control accessible to people who barely know what it is. It integrates with all the GUI IDEs like VS.NET and Eclipse. In short, it’s an absolute perfect fit for the 80%, and it’s why Collabnet is doing so well in supporting this audience.
DVCS and Subversion’s Future
Most Subversion developers are well aware of the cool new ground being broken by DVCS, and there’s already a lot of discussion out there to “evolve” Subversion 2.0 in those directions. However, as Karl Fogel pointed out in a long email, the challenge before us is to keep Subversion simple, while still co-opting many of the features of DVCS. We will not forget about the 80%!
Subversion 1.5 is getting very close to a release candidate, and this fixes the long-standing DVCS criticism that “Subversion merging is awful”. Branching is still a constant-time operation, but you can now repeatedly merge one branch to another without searching history for the exact arguments you need. Subversion automatically keeps track of which changes you’ve merged already, and which still need merging. We even allow cherry-picking of changes. We’ve also got nice interactive conflict resolution now, so you can plug in your favorite Mercurial merging tool and away you go. A portable patch format is also coming soon.
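The merge-tracking idea described above can be pictured with a toy model. This is only a sketch and not how Subversion implements it: the `MergeTracker` class and its methods are hypothetical names invented for illustration, and revisions are treated as plain integers. It shows the two behaviors mentioned: repeated merges skip revisions that were already merged, and a single revision can be cherry-picked.

```python
class MergeTracker:
    """Toy model of merge tracking (hypothetical, not Subversion's code).

    Remembers which source revisions have already been merged into
    each target branch, so repeated merges need no manual bookkeeping.
    """

    def __init__(self):
        # target branch -> source branch -> set of merged revision numbers
        self._merged = {}

    def eligible(self, source, target, source_revs):
        """Return the revisions of `source` not yet merged into `target`."""
        done = self._merged.get(target, {}).get(source, set())
        return sorted(r for r in source_revs if r not in done)

    def merge(self, source, target, source_revs, only=None):
        """Merge every eligible revision, or just those listed in `only`
        (a cherry-pick). Returns the revisions that were applied."""
        todo = self.eligible(source, target, source_revs)
        if only is not None:
            wanted = set(only)
            todo = [r for r in todo if r in wanted]
        self._merged.setdefault(target, {}).setdefault(source, set()).update(todo)
        return todo


tracker = MergeTracker()
trunk_revs = [101, 102, 103]

# Cherry-pick a single change from trunk into the feature branch.
print(tracker.merge("trunk", "feature", trunk_revs, only=[102]))  # [102]

# A later full merge skips what was already merged.
print(tracker.merge("trunk", "feature", trunk_revs))  # [101, 103]

# New work lands on trunk; the next merge picks up only the new revision.
trunk_revs.append(104)
print(tracker.merge("trunk", "feature", trunk_revs))  # [104]
```

The point of the design is that the record of merged revisions lives with the tool rather than in the user’s head, which is what removes the need to search history for the right arguments.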
For Subversion 2.0, a few of us are imagining a centralized system, but with certain decentralized features. We’d like to allow working copies to store “offline commits” and manage “local branches”, which can then be pushed to the central repository when you’re online again. Our prime directive is to keep the UI simple, and avoid the curse of DVCS UIs (which often have 40, 50, or even 100 different commands!).
We also plan to centralize our working copy metadata into one place, which will make many client operations much faster. We may also end up stealing Mercurial’s “revlog” repository format as a replacement for the severely I/O bottlenecked FSFS format.
A Last Plea
Allow me to make a plea to all the DVCS fanatics out there: yes, it’s awesome, but please have some perspective! Understand that all tools have tradeoffs and that different teams have different needs. There is no magic bullet for version control. Anyone who argues that DVCS is “the bullet” is either selling you something or utterly forgetting about the 80%. They need to pull their head out of Slashdot and pay attention to the rest of the industry.
Update, 10/18/07: A number of comments indicate that my post should have been clearer in some ways. It was never my intent to say that “Subversion is good enough for everyone” or that “most of the world is too dumb to use DVCS, so don’t use it.” Instead, I’m simply presenting a checklist — a list of obstacles that DVCS needs to overcome in order to be accepted into mainstream corporate software development. I have no doubt that DVCS systems will get there someday, and that will be a great thing. And I’m imploring DVCS evangelists to be aware of these issues, rather than running around thoughtlessly trashing centralized systems. 🙂
I think the main point your article misses (or is not clear about) is that corporate environments (small Windows shops, banks, …) and Open Source community need NOT use the same tools.
Ben, thanks for this post. You wrote:
“In a nutshell: with a centralized system, people are forced to collaborate and review each other’s work”
I work in your 80%, and you simply cannot overestimate the importance of this. In a very large corporate IT shop, the inability of developers to cooperate with each other in sane ways literally shuts down development by leaving the build perpetually broken.
One other comment: your numbers are off – the 80% is much more like 99% outside of Silicon Valley, and many of us – shockingly – aren’t *complete* idiots and don’t like being addressed as such.
Ben:
I’m beginning to think Subversion development is being poisoned by an influx of Perforce brain damage.
First, changesets, which are basically useless as implemented. You _need_ patch-hunk-level granularity to make this a useful feature for general use — the person who justified the lack of this by saying “this is what 90% of users need — ignore the other 10%” has the percentages backwards, or at least very skewed from reality.
Then the svn update interactive merge feature, which breaks lots of existing shell scripts and is difficult to drive from a script (and --non-interactive is insufficiently granular, because it shuts off password access as well).
And now a distinction between “offline” and “online” usage. Holy thionite, man. Next, you’ll be introducing Perforce’s idiotic mainframe-era “forms”.
Shabs, perhaps you’d like to come to the dev@subversion.tigris.org list and discuss your opinions about new svn 1.5 features? Drive-by criticisms of discussions that took months are sort of… unfair. 🙂
Hi Ben,
We had this discussion some time back at the local Linux users group
http://www.ae.iitm.ac.in/pipermail/ilugc/2007-May/034190.html
http://www.ae.iitm.ac.in/pipermail/ilugc/2007-May/034293.html.
Ben,
It’s great to see you raising these issues and doing so in such a well written way. While these days I’m actively encouraging people to look at DVCS technology and Bazaar in particular, I’ve spent most of the last 12 years managing teams in a corporate environment using CVS. I’m highly familiar with many of the points you’ve raised. For example, the maximum number of branches our developers could manage on their laptops was 3 or 4 because of the huge size of the code base. The code was mission critical 24×7 – quality and processes to achieve it were major focuses of our team.
I must say though, that while each of your “shocking statements” is true, many of your other points about DVCS technology simply aren’t correct. They may apply to select tools and DVCS fan-boys (Git and Linus perhaps) but they don’t apply to Bazaar and (to a lesser extent) Hg. Bazaar in particular directly supports the central workflow model. In fact, it supports many workflows really well and really cleanly as illustrated in http://bazaar-vcs.org/Workflows. I’ve argued for some time now that the future of version control is neither central nor distributed, it’s adaptive. See http://ianclatworthy.wordpress.com/2007/06/21/version-control-the-future-is-adaptive/.
Suggesting that central VCS is inherently better because it forces developers to check in to a central location more frequently is a myth. It’s a bit like saying “You’re allowed to talk to anyone in the company on the phone but only through a centrally registered conference call.” Used correctly, DVCS increases collaboration and leads to a much higher quality trunk than I’ve ever experienced or seen by teams using central VCS tools.
It’s good to hear of some of the improvements planned for Subversion. They will greatly help many people do their job more effectively. Together with companion tools like TeamCity, they will be sufficient for many teams for some time. However, there are compelling advantages to DVCS that mean no amount of adding features to a central-only VCS tool can match it. See my recent paper on Why DVCS Matters (http://ianclatworthy.files.wordpress.com/2007/10/dvcs-why-and-how3.pdf) for a summary of these.
I think you have to have a middle ground between the 20%ers and the 80%ers. I’ve done things like taking the time to learn a language over the weekend for fun, and I’ve certainly gotten in arguments over algorithms with friends and of course have tried Linux, but I also program in C# and Java (although, I barely do any work in my Java class and the only C# work I do is XNA stuff), and I like Windows. I’ve never participated in an open source project before although I guess I’m only 17. Whatever, maybe one day I’ll be a 20%er, but for right now I feel more like a mix between the two.
Thanks for putting the stuff I already knew and was pondering upon into a blog. It is definitely better received and accepted when someone as well known as you states it. I have been considering similar nature of collaboration in the CAD world and your article provides some valuable insight on which route to pursue to reach a larger audience.
I think there are three types of programmers, not just two. You forgot those who are interested in new stuff and try everything out, but still go home Friday afternoon and forget about computers until Monday. Those who use Windows every day but still try Linux from time to time (and then go back to Windows again, because there’s always at least one device that doesn’t have a driver or the driver is so buggy it constantly freezes the system).
And sometimes even those people contribute to open source projects 🙂
But you’re absolutely right about the problems of DVCS. They’re (at least for now, I’m sure they’ll improve) way too complicated for the average person. If you count the programmers out there who never even used version control (besides making a copy of their source folder from time to time and renaming that folder with the date of the copy) you will find that those still outnumber the ones who use version control by thousands.
Try explaining DVCS to them – after a five minute introduction they will already have mentally shut down, built a defense wall against such a complicated beast – and from that time on, you won’t get through to them anymore and they’ll find any excuse to not use version control at all.
When I joined TortoiseSVN (yes, I joined, I didn’t start it), it was because at that time there simply was no version control system available that I would have considered introducing in our company. I found a lot back then, but they all were much too complicated to use or had a really high learning curve. I knew that I just wouldn’t be able to explain those to my coworkers (especially the one at the desk to my left back then) and get them to understand.
That’s why I joined TortoiseSVN, and I’ve tried to keep it as easy to use as possible, hiding as much of the internals of version control as possible (e.g., you won’t find any peg revisions in TortoiseSVN – it always tries to find that itself). To be honest, most of the time I had to implement something in TortoiseSVN I was thinking about that guy to my left and how I would explain that feature to him – if I couldn’t come up with an explanation that he would understand I went back to the drawing board and tried to implement the feature a different way.
And I think that’s one of the reasons why TortoiseSVN is one of the more successful SVN clients.
Sam Vilain: “However the GUIs have already far surpassed anything you will see in the Subversion space so this is becoming less and less important.”
Great! I’m looking for some DVCS plugin that is on par with Subclipse/Subversive. Could you please post the link? I only found “Mercurial Eclipse” but that didn’t impress me, so I’m stuck with a local SVK repository because I can access that with Subclipse.
By that measure, we wouldn’t have either Linux or *BSD today, to name just two.
i wouldn’t say that many of the 20% people (working in banks etcetera) have never used version control – they all do, in my experience, they use an IDE with SCM plugin. Also i wouldn’t say they all use Microsoft only – there is a large percentage using Java/Eclipse, especially in banks, government, etcetera, mostly for web app server development.
They use whatever tools Microsoft hands down to them: usually VS.NET if they’re doing C++, or maybe a GUI IDE like Eclipse or IntelliJ for Java development.
You must be one of those crazy bitches that uses strictly vi for your editor. Things like Intellisense/Content Assist are more helpful than you think.
sven: I use emacs, actually. 🙂 And yes, emacs has something quite similar to Intellisense too. I never type whole symbol-names out.
Great post, and it inspired my own post. I agree totally with your 80/20 description. Personally, I would have divided the post into two separate ones: one describing the 80/20 observation, and the other regarding DVCS.
“version control and a working Linux abilities are vital tools for a programmer”.
That’s an interesting statement Zack. While I totally agree with the former, the latter is just one way of saying “the ability to motivate yourself to learn (and, crucially, keep learning) how to manipulate a system at a low level” or similar. Learning how to use GNU/Linux effectively is just one way of doing that – you could equally well substitute self-motivated learning of any skill which isn’t well supported by the mainstream.
In my experience all too many of the “80%” just don’t do that – I’ve lost count of the number of times I’ve had to “educate” a supposedly senior developer about const correctness or how to use STL containers efficiently. For a disturbingly large subset of developers, “Merge” is a terrifying word.
FWIW I’ve cut myself a career as a Windows developer and I’ve never got around to learning GNU/Linux – but that doesn’t mean I don’t keep up to date with what’s going on in the wider community. I’m all too well aware of what’s going on with C++0x, dynamic languages etc…it’s just that I simply don’t have the time to do everything I’d like to (my free time usually gets eaten up by company admin), so something usually has to give.
As the saying goes, “We live in interesting times”. 🙂
Besides the fact that this 80/20 is just some number you pulled out of thin air, you are setting up a false dichotomy. As a corporate developer myself, I have grown increasingly frustrated with CVS and tools like it precisely because of its poor branch/merge semantics and/or lack of a local repository. I had found myself doing my own “ad hoc” source control by saving away files in an “intermediate state” before check-in to the central code repository.
The central code repository is supposed to represent a high quality, if not pristine view of the source. Every checkin is supposed to bring the source closer towards the project’s goal. However, that does not jibe with the way people (like me) work. I need to sometimes be very experimental, and try radical/risky things that I don’t wish to expose to the rest of the developers. But I need to be able to back out in baby steps, precisely because I know that my “catastrophic code error” (like a global find and replace gone bad) rate will be much higher. And I don’t want to pay the insane cost of branching then merging the tree every time I do this. So having a local repository is critical for me to maintain my own productivity level, while not interfering with anyone else’s.
I have not yet tried something like GIT or Mercurial, or Bazaar. I instead use *two different* source control systems: CVSNT and Perforce. CVS is the corporate central repository, and Perforce is my local repository. This way I get the benefit of CVS annotate to know what other people have done to the sources, and I get to use the very powerful and nice Perforce (that has a free single user license) for me to try my experiments in. This works great for me (since I get to work uninterrupted in the way *I* am most comfortable) and my company (which is only exposed to my good changes that have reached the corporate standards for level of quality).
The goal of DVCS systems, as I understand it, is to solve my problem in a single system. I don’t buy this idea that I am among some elite 20% of developers that code this way. My contention is that *anyone* whose main development system is a laptop instead of a desktop *already* understands a good argument for having a local repository.
If you want to clone some features from GIT you are going to have to start with the idea of a local repository. Otherwise, I just don’t see SVN as having any serious future.
My instinct, from my experience, is to say that the majority of developers in the world are lone developers. Sometimes they are successful and get more developers to join them in their quest. Then there are purpose software teams that are building large applications who will have established cultures and are resistant to change. Almost every developer I know started as a lone developer tinkering with code. Perhaps these statements are incorrect and I certainly can’t back them up but it makes sense to me.
The vast majority of these lone developers will generally not see the benefit of revision control over taking some sort of backup (my turn to generalise with sweeping statements).
The group I see as being a large driving force in adopting tools like Mercurial and Bazaar is developers growing from working alone to getting a co-worker, and the progression from that point. The reason I say this is that the barrier to entry is next to nothing for these two tools.
To use either of these two tools is simply adding a single folder which does not pollute your project. To pull out of the decision is simply deleting the folder. The tools are simple and require only a few commands to get the basics done.
Centralised repositories require an install of a server and are more constraining in their use. Centralised repositories have a larger learning curve in how to deal with a server, or in a lot of cases require a sysadmin who is willing to help; bureaucracy is a killer of a lot of things.
For me I see it as a natural evolutionary step for lone developers progressing onto team development which will drive Bazaar and Mercurial.
I use Bazaar and Mercurial in particular for this comment as these are the only two DVCS I have used and I do not know if my comments are true of others.
There are other reasons why I think Mercurial and Bazaar are improvements over subversion but this post is about majorities and status quos and so is probably not relevant.
I don’t think that I am disagreeing with your post too much just adding more perspective.
The only sure thing I know is that things will change.
In this blog you criticize DVCS for the wrong reasons, I’ll get to that later. Then you mention all the enhancements to SVN which are all DVCS features (offline commits, local branches, consolidating all the metadata directories…) These features won’t fix some of the biggest failings of SVN.
Here are reasons why some of your criticisms of DVCS are misguided.
“In a nutshell: with a centralized system, people are forced to collaborate and review each other’s work”
With SVN people are forced to put their code in the main repository in order to commit their work at all. They don’t have a choice. Everyone reading this knows that putting your code in the same place and collaborating are NOT the same thing.
“They have to put in some extra effort to share code and organize themselves into some sort of collaborative structure.”
It’s not extra work, it is sharing code along the lines of an existing social structure. It’s usually less work to commit locally until you want to collaborate or move the code off your machine, and then push it all when you want to.
“Yes, I’m aware that a DVCS is able to emulate a centralized system; but defaults matter. The default action is to fork, not to collaborate!”
When it is your only option it is not a default. It is a limitation.
The fact is that if you have developers that don’t play nicely with others, the tool isn’t going to matter. I have seen more people refrain from commits with SVN than DVCS because in their mind they aren’t ready to commit to trunk and creating a branch is too hard. This is a problem with the individual regardless of tool, and with SVN this guy (who is the 80%) doesn’t even have the capabilities of VCS locally.
” * Managers don’t want 20 different private forks of a codebase; they want one codebase that they can monitor all activity on.”
There is only the one codebase that matters usually, this is moot.
” * Cloning a repository is bad for corporate security. Most corporations have an absolute need for access control on their code; sensitive intellectual property in specific parts of the repository is only readable/writeable by certain teams. No DVCS is able to provide fine-grained access control; the entire code history is sitting on local disk.”
Parts of the repo, and history are two orthogonal concepts. This doesn’t even make sense. If parts of your repo need different security you can create a different repo. You have to do the same thing in SVN. A certain range of history is not going to be what you need to protect.
” * Cloning is often unscalable for corporations. Many companies have huge codebases — repositories which are dozens or even hundreds of gigabytes in size. When a new developer starts out, it’s simply a waste of time (and disk space) to clone a repository that big.”
This is blatantly false. I’ve used git-svn on several large repos. My local git repo with all the history is ALWAYS smaller than the SVN snapshot. One 7gb svn snapshot was a 3.5gb git repo. Half the size and the whole history. The bare git repo (without a working copy) was only 35mb (yeah, with an m). Having the whole history, and the ability to make local commits and branches, makes most operations way faster. Which all argues that on a large code base you SHOULD be using DVCS.
Just like most of the rest of your post, I find that assertion was made completely out of ignorance.
The real trade-off between Subversion and a DVCS like git is usability vs. power. Subversion has great IDE integration and Windows GUIs. To me the GUI seems unnecessary and I would rather have the features. This is not the case for everyone. It’s a safe bet that most of the DVCSs are going to catch up in the usability department before SVN gets the features, and that fact does matter to everyone.
I don’t agree with you. You are focusing on the team, and assuming the model is fixed.
With DVCS you can still follow a centralized model with one repository where all developers can push to.
The nice thing is that even if you follow that model, I am still free to push my changes to my laptop instead, work from home, then push them to my server, continue working on the train, create 300 branches and easily merge them when I am back. I can’t do that with Subversion; I usually need to commit in order to work again from another place (and I still need network access).
I totally agree with you on one point – DVCS is not for every project.
But I disagree with your Pareto distribution.
My point of view:
80% – doing NOTHING. (As my colleague said after learning Ruby: “Anyway, Java is the only language in which you can write MILLIONS LINES for enterprise projects”)
16% – doing SOMETHING
3% – doing WELL
And 1% – are extremely good, with .2 % doing EXTRA.
It’s my not very honest opinion.
Anyway:
1. Thank you for SVN
2. My apologies for my English