Subversion’s Future?

This entry was posted on Tuesday, 29 April, 2008.

According to Google Analytics, one of the most heavily trafficked posts on my blog is one I wrote years ago, the Risks of Distributed Version Control. It’s full of semi-angry comments about how wrong I am. I thought I would follow up on that post with some newer thoughts and news.

I have to say, after using Mercurial for a bit, I think distributed version control is pretty neat stuff. As Subversion tests a final release candidate for 1.5 (which features limited merge-tracking abilities), there’s a bit of angst going on in the Subversion developer community about what exactly the future of Subversion is. Mercurial and Git are everywhere, getting more popular all the time (certainly among the 20% trailblazers). What role does Subversion — a “best of breed” centralized version control system — have in a world where everyone is slowly moving to decentralized systems? Subversion has clearly accomplished the mission we established back in 2000 (“to replace CVS”). But you can’t hold still. If Subversion doesn’t have a clear mission going into the future, it will be replaced by something shinier. It might be Mercurial or Git, or maybe something else. Ideally, Subversion would replace itself. 🙂 If we were to design Subversion 2.0, how would we do it?

Last week one of our developers wrote an elegant email that summarizes a potential new mission statement very well. You should really read the whole thing here. Here’s a nice excerpt:

I'm pretty confident that, for a new open source project of non-huge
size, I would not choose Subversion to host it [...]
 
So does that mean Subversion is dead? That we should all jump ship
and just write a new front-end for git and make sure it runs on
windows?

Nah. Centralized version control is still good for some things:

* Working on huge projects where putting all of the *current* source
  code on everyone's machine is infeasible, let alone complete
  history (but where atomic commits across arbitrary pieces of the
  project are required).
* Read authorization! A client/server model is pretty key if you
  just plain aren't allowed to give everyone all the data. (Sure,
  there are theoretical ways to do read authorization in distributed
  systems, but they aren't that easy.)

My opinion? The Subversion project shouldn't spend any more time
trying to make Subversion a better version control tool for non-huge
open source projects. Subversion is already decent for that task, and
other tools have greater potential than it. We need to focus on
making Subversion the best tool for organizations whose users need to
interact with repositories in complex ways[...]

I’ve chatted with other developers, and we’ve all come to some similar private conclusions about Subversion’s future. First, we think that this will probably be the “final” centralized system that gets written in the open source world — it represents the end-of-the-line for this model of code collaboration. It will continue to be used for many years, but specifically it will gain huge mindshare in the corporate world, while (eventually) losing mindshare to distributed systems in the open-source arena. Those of us living in the open source universe really have a skewed view of reality. From where we stand, it may seem like “everyone’s switching to git”, but then when you look at a graph like the one below (which shows all public (not private!) Apache Subversion servers discoverable on the internet), you can see that Subversion isn’t anywhere near “fading away”. Quite the opposite: its adoption is still growing quadratically in the corporate world, with no sign of slowing down. This is happening independently of open source trailblazers losing interest in it. It may end up becoming a mainly “corporate” open source project (that is, all development funded by corporations that depend on it), but that’s a fine way for a piece of mature software to settle down. 🙂

71 Responses to “Subversion’s Future?”

  1. Stephen Baynes

    The commercial world needs aspects of centralized and distributed CM.
    Projects are often multi-site and multi-team. They may involve external partners (I have seen Git and Mercurial used in some cases), but sometimes communication with these is restricted to email only. Separately produced IP that is integrated into many products is common – something the open source tools have not yet tackled. On the other hand, there are usually many developers at one location, and data sizes (particularly once CAD data is added) are large. 10 GB work areas are not rare, and I have seen one that was a quarter of a terabyte. For this you have to have link-based work areas to a shared cache. Most of the tools targeting open source assume one developer per location. On the other hand, access controls need to be good – data may need to be replicated between repositories granting different accesses. Most of this stretches the best commercial CM tools too.

  2. I’m sure SVN is not going anywhere in the near future. It’s far too valuable of a tool.

    Not only is it necessary in order to provide some form of central auditing of the source, but it’s also far less confusing. Many developers, even in a highly technical organization like the one where I work, have problems dealing with the more complicated aspects of source control. Even *I* find git somewhat confusing; I can’t imagine throwing it at the people here. Furthermore, SVN integrates nicely with tools like Trac, which is a complete godsend for our environment.

    Troels: I’ve written a script that pulls LDAP group members and places them in the authz file. Please feel free to contact me at kamil@kamilkisiel.net if you are interested in a copy.

  3. Pascal Varet

    @Ben:

    Hi Ben,

    Thank you for this interesting article! While there has been a lot of posting here already, I would still like to chime in and thank you for your level-headedness, the clarity of your vision, and the way you welcome discussion of Subversion’s flaws without antagonism against other solutions.

    I hope the twitchy zealots and trolls that have unfortunately flocked to your post will not discourage you from this approach, and in the name of the silent mass whose use cases require centralized source control, I would like to humbly thank you for providing us with such a good tool for this purpose.

    PS: The anecdote about how Karl and you received the delta editor design from Jim Blandy is thus far among my favorite chapters of the ‘Beautiful Code’ book, not only for its excellent software design literacy, but also, and perhaps mostly, for its deep humility.

    @Troels Arvin says:

    > – LDAP-integration in the svn access configuration files, so that groups may
    > refer to groups in an LDAP tree

    Excellent point. We solved it here with a small Python script that automatically generates the svnpolicy file by polling the LDAP directory, and runs from cron; it ended up working much more smoothly than I had anticipated.
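A minimal sketch of what such a generator script might look like, assuming the group membership has already been fetched from LDAP into a plain dictionary. The function name `render_authz`, the group `foo`, the user names, and the path are hypothetical and are not taken from either commenter's actual script; only the `[groups]` / `[/path]` layout follows the Subversion authz file format.

```python
def render_authz(groups, rules):
    """Render the text of a Subversion authz file.

    groups: {"group": ["user", ...]}, e.g. the (hypothetical) result of
            an LDAP query done elsewhere.
    rules:  {"/path": {"@group" | "user" | "*": "rw" | "r" | ""}}
            An empty access string means "no access".
    """
    lines = ["[groups]"]
    for name in sorted(groups):
        members = ", ".join(sorted(groups[name]))
        lines.append(f"{name} = {members}")

    for path in sorted(rules):
        lines.append("")            # blank line between sections
        lines.append(f"[{path}]")
        for principal, access in rules[path].items():
            # rstrip() avoids a trailing space for "no access" entries
            lines.append(f"{principal} = {access}".rstrip())

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical data: only group "foo" may touch /some/sub/tree.
    text = render_authz(
        {"foo": ["bob", "alice"]},
        {"/some/sub/tree": {"@foo": "rw", "*": ""}},
    )
    print(text, end="")
```

A cron job would write this text to the file that the server's authz configuration points at. The LDAP lookup itself is deliberately left out, since it depends on the directory layout.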

    > – A way to handle charset conversions on the fly:

    I would like to second this wish. Although the core issue really is that the Linux filesystem thinks of filenames as *bytes* (with no explicit encoding information) as opposed to characters, causing all kinds of problems when you can’t tell what the LC_CTYPE was at the time the file was created, a workaround at SVN’s level would be sweet. This is probably the number one issue my users get.

  4. I would just think about how to make subversion compatible with distributed systems. Just because subversion sits on one server doesn’t mean it can’t participate in a distributed system. Might sound strange, but that’s what I would try to work out.

    Stephan

  5. I’m surprised that only two previous comments have mentioned SVK. From http://svk.bestpractical.com/view/HomePage
    “svk is a decentralized version control system built with the robust Subversion filesystem. It supports repository mirroring, disconnected operation, history-sensitive merging, and integrates with other version control systems, as well as popular visual merge tools.”
    I’ve used SVK extensively. It provides 90% of the functionality of other DVCS tools, on top of SVN. So you can have your cake and eat it too (mostly).
    It’s not a panacea though. First of all it’s got a very bad case of CPAN dependency hell. If you don’t mind installing it with the cpan shell, or if a native package already exists for your platform, then no problem. If you are trying to roll your own native package, welcome to hell.
    I work for the Scripps Institution of Oceanography and I have to maintain synchronized source code and configuration information for a shoreside office and 4 ocean-going research vessels, with very slow and unreliable internet access, and that might only be in home port once every 3 years. We’ve opted to go with Mercurial instead of SVK because of HG’s ability to sync any repository with any other repository and dump changesets to files. E.g., I can clone a repo from the shore server onto my laptop, then to the ship’s main server, then my buddy can grab it from there, and we can all push/pull in whatever sequence is most convenient at the time. When it comes time to push our changes back to shore, I can dump the changesets to an HG changeset file, BZIP and RSYNC it. Alternatively, I can just bring the repo home on my laptop and push the changes to the shore server once I’m back in the office.
    This is much more convenient than SVN/SVK where a large commit might take an hour over the satellite, and if it breaks then I have to start all over again because it’s atomic and there’s no way to resume a partial commit, or an update for that matter. Our satellite links rarely stay up for an hour without interruption. I know SVK can dump out a patch set, but then I lose history. The bottom line is that SVN/SVK just doesn’t work over slow, unreliable links.
    Granted my environment is probably highly unusual.
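The bundle-and-copy workflow described in this comment could be scripted roughly as follows. This is only a sketch under assumptions: the revision, file name, and remote destination are made up, and the separate compression step mentioned above is left out. It uses `hg bundle --base`, `hg unbundle`, and rsync's `--partial` option (which keeps a partially transferred file so an interrupted copy can be continued).

```python
import subprocess


def ship_to_shore_commands(base_rev, bundle_file, remote_dest):
    """Commands to run on the ship: dump changesets, then copy the file.

    base_rev:    a revision the shore repository is assumed to have already
    bundle_file: local file name for the changeset bundle (hypothetical)
    remote_dest: rsync destination such as "shore:/incoming/" (hypothetical)
    """
    return [
        # Write every changeset that is not an ancestor of base_rev to a file.
        ["hg", "bundle", "--base", base_rev, bundle_file],
        # --partial keeps a half-copied file if the link drops mid-transfer.
        ["rsync", "--partial", bundle_file, remote_dest],
    ]


def apply_on_shore_command(bundle_file):
    """Command to run in the shore repository once the file has arrived."""
    return ["hg", "unbundle", bundle_file]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print rather than execute, so the sketch is safe to run anywhere.
    for cmd in ship_to_shore_commands("1200", "cruise.hg", "shore:/incoming/"):
        print(" ".join(cmd))
    print(" ".join(apply_on_shore_command("cruise.hg")))
```

Because the bundle is an ordinary file, a dropped satellite link only costs a retry of the copy, not of the commit itself.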

  6. Adam Schrotenboer

    It is not necessary to pull ldap group members anymore, at least with Apache 2.2. Instead, try this.

    LoadModule authnz_ldap_module modules/mod_authnz_ldap.so

    Order deny,allow
    AuthType basic
    AuthName LDAP
    AuthBasicProvider ldap
    AuthzLDAPAuthoritative On
    AuthLDAPURL "ldap://ldap.lan.example.com/cn=users,dc=example,dc=com?uid?one"
    AuthLDAPGroupAttribute memberUid
    AuthLDAPGroupAttributeIsDN off
    # these are ‘OR’ requirements, not ‘AND’
    require ldap-group cn=specialgroup,cn=groups,dc=example,dc=com
    require ldap-group cn=softgroup,cn=groups,dc=example,dc=com
    # 1026 is ‘specialgroup’
    # 1037 is softgroup
    require ldap-attribute gidNumber=1026
    require ldap-attribute gidNumber=1037

  7. Adam Schrotenboer

    Oops, didn’t realize that would be interpreted as a tag.

    LoadModule authnz_ldap_module modules/mod_authnz_ldap.so
    <Location />
    Order deny,allow
    AuthType basic
    AuthName LDAP
    AuthBasicProvider ldap
    AuthzLDAPAuthoritative On
    AuthLDAPURL "ldap://ldap.lan.example.com/cn=users,dc=example,dc=com?uid?one"
    AuthLDAPGroupAttribute memberUid
    AuthLDAPGroupAttributeIsDN off
    # these are ‘OR’ requirements, not ‘AND’
    require ldap-group cn=specialgroup,cn=groups,dc=example,dc=com
    require ldap-group cn=softgroup,cn=groups,dc=example,dc=com
    # 1026 is ‘specialgroup’
    # 1037 is softgroup
    require ldap-attribute gidNumber=1026
    require ldap-attribute gidNumber=1037
    </Location>

  8. Probably a bit of a crazy thing to say, but maybe subversion should just keep doing what it’s doing. There are lots of non-technical reasons why people use subversion versus distributed systems. Some that come to mind (and, in some cases, apply to me) are things like:

    *. It does the job.
    *. It’s robust.
    *. I understand how it works.
    *. Maybe I want one central version.
    *. It requires minimal set up (can run ad-hoc, from filespace or over SSH w/o any servers).
    *. I like it.

    For what I want to do (including things like running a private, personal code repository) it’s just fine. I could spend the N hours picking $THE_BEST_DVCS and getting as familiar with it as I am with svn, but, hey, you know what, I’d rather spend that time doing real work, or with my partner.

    I guess the bottom line is that subversion is becoming infrastructure. That changes its appeal as a piece of software: it may not be breaking much new ground, but it’s becoming really important to lots of people. It’s transitioning into a new phase of the software lifecycle, and some of the head scratching and arguing about direction is because of that ‘growing up’. For example, look at Apache and the amount of hassle they’ve had moving people off the 1.3 series to the (technically much stronger) 2.X series.

    So, yea, keep up the good work guys. Please. We kinda need you to…

  9. Jeff Ebert

    I use centralized revision control in a corporate environment for chip design. Working area size is a big concern for us. The assumption that disk space is cheap is simply not accurate. Fast, backed-up disk space is not cheap, and that is what we use in a corporate environment. We have shared compute resources that are networked to shared filers (net apps). We are going over a very fast, very reliable network for every file access already.

    So, obviously, the biggest problem in using subversion in my work is the text-base penalty. Distributed version control systems are worse, I would assume, because you are not sharing the repository with other people. Therefore, our only options seem to be CVS and Perforce. Without the text-base penalty, I know of a large number of chip designers that would have switched to SVN a few years ago.

    I do use subversion for small hobby projects, but I cannot justify switching to it at work. If the project does begin to focus on the needs of typical corporate users, then I think the disk vs. network assumptions will need to be revisited.

    Thanks for your hard work!

  10. Yeah, we’ve got to get that text-base penalty lowered or made optional eventually… If we had an optional ‘svn edit’ command (meaning: “I’m about to edit this file”), then that command could first copy the working file to hitherto-nonexistent text-base, doing any eol or keyword subst along the way, of course.

  11. Certainly, for my workplace (a video games development company) our head checkout is on the order of 15 GB (30 GB with .svn folders -_-) but each developer needs all that to be able to test changes etc.

    The _big_ advantage of SVN for us however is that it lets us give limited read and commit access to third-parties (outsourced work) without giving them either the ability to write things they should not be able to, nor read things they should not be able to (eg things we do not own ourselves).

    On the other hand, when workflow permits, I use git-svn so I can use git for my work, and then push it back up to the SVN server as appropriate. (Workflow permits here means Linux-side work. git under windows just doesn’t rock my pants successfully. Also, current project has checkin-requirements that git-svn would not integrate well with. >_<)

    The other big advantage of SVN is that it has a Windows client that the non-technical people (artists, producers, game designers etc) can use without significant support. I’m hoping one of the Mercurial windows frontends comes through here, although _I_ would like a git-svn windows shell addin, personally.

    Linux.conf.au 2008’s Gaming Miniconf has a video in which I ramble undirectedly about the reasons we can’t yet move from SVN. It’s the FOSS In Commercial Games video.

  12. Troels Arvin

    @Adam: I know that Apache can handle LDAP authorization. That way, you may give all-or-none access to a repository. However, there is no obvious way to specify that LDAP group “foo” is the only group which may access /some/sub/tree of a repository.

    (I prefer to have one big repository, so that history doesn’t get lost if a file is moved/copied/… between repositories.)

  13. I totally agree with previous comments such as “the corporate world (especially someplace like Google or Apple with loads of IP to protect) will stay on a fundamentally central model.” And I’ve definitely worked on a number of projects where it is totally impractical to keep an entire working copy on your own local drive, to say nothing of an entire repository history. (And *two* copies of every file? That’s just crazy talk!)

    I think DVCS has been overhyped as a solution. Yes, it’s a problem that access to a centralized server over a WAN is slow. Yes, it’s a problem that you have to create an actual branch on the centralized server if you want to checkpoint some temporary, unfinished work in progress. But is the answer to mirror the entire repository on your own local disk? Of course not!

    So what’s the real answer? I think it’s a combination of a centralized system with tools to help enable better distributed development. Admittedly, I’m not exactly a disinterested party in this discussion, but the answer, I think, is something along the lines of my company’s tool Cascade. You don’t have to throw away or migrate your existing Subversion or Perforce repository, yet you get many of those same benefits that a DVCS offers through Cascade’s caching system, proxy server, and “checkpointing” mechanism — plus a bunch of other cool stuff.

  14. Mike

    some of us at work are using git as a sandbox organizer.
    but really, we are using it to hide our early-stage work.
    this is a psychological issue, not a technical one – and
    maybe its not a good idea.

    we depend upon Subversion for our Central Authoritative Repositories.

  15. BSD

    I think you are seeing this type of approach emerge:

    A team of 10-12 people checks out something from svn, converts it to hg/git/whatever, and starts hacking on it – collaborating “outside” svn using DVCS tools. When they have done enough work to make their changes significant enough to check back in to the “core repository”, things are cleaned up and, having used hgsvn or git-svn conversion to start with (leaving the subversion checkout intact), they check the changes back in to svn. You can use hg log and other local tools to help build a well-documented commit message to put back into the subversion repo. Some care is required to avoid checking in any files/directories that are hg- or git-specific (use ignore). Used together, SVN and DVCS can make the canonical subversion repo less busy and messy and can keep multiple modes of collaboration “on track” and working together.

    One comment on *big* projects: the freebsd project uses p4 and svn together – it’s a mess and not documented well but meets a need. I think if subversion documented how to do this kind of thing well with hg and git … and/or made it possible to use svn *itself* in a semi-distributed kind of way (local history etc.) it would be great.

    A spiffier nicer builtin web viewer/publisher (ViewVC is a little olde-fashioned) that could work from inside checked out versions of a project (sort of like “hg serve”) would be nice too.

    http://pypi.python.org/pypi/hgsvn

  16. Catherine

    I would recommend SCM Anywhere, which is a SQL Server-based software configuration management (SCM) tool with fully integrated version control, bug tracking and build automation.
    http://www.scmsoftwareconfigurationmanagement.com

  17. Best take I’ve heard on the future of VC.
