It’s been a long week. After almost a year at Google, our team finally released our new service to the world at OSCON this week in Portland. Of course, I didn’t actually get to attend the conference; I spent four days holed up in a smelly hotel room with three other team-mates, trying to prepare for public launch. This lovely photo shows us like the caged animals we are.
My co-worker Fitz and I did manage to escape the hotel room twice to present two talks. We’ve been working together so long now that we can actually finish each other’s sentences, so we’ve turned this abliity into a presentation gimmick. We stand in front of crowds each holding a microphone, riffing on slide bullets back and forth. I’m not sure if it’s comedic or just novel, but we get really great feedback. One blogger did an amazing writeup of our first talk, and another blogger practically republished the slides of our second talk. A third blogger said that our first talk was the “best session of the day”. Woo!
In any case, the real news is our product announcement.
Long ago I made post describing What I Do at Google. Namely, I work on the Open Source team, whose mission is to promote open source software development however we can: create more of it, and make it better. Working for a large company like Google, we have two main resources: (1) money, and (2) a massive data-serving infrastructure. Given these resources, how can we accomplish our mission? Our team has been using money in a few ways. We make financial donations to important open source projects, and sometimes even pay people to work full-time on them. We also fund the hugely successful Summer of Code program, which pays hundreds students to work on open source software as ‘virtual summer internship’; the result is not only accelerated open source development across the board, but a whole new generation of open source programmers.
But it’s the massive data-serving infrastructure that we’ve not really used much — until now. Our big product announcement was Open Source Project Hosting on our team’s main website, code.google.com. You can read all the gritty details in our FAQ, and there are many blogs and news posts that talk about our new service, as well as folks posting screen shots.
The general jist of the service is: come to our site and create an open source project. There’s no approval process. Within a few seconds, your project is ready to go. You get a super-simple front page which describes your project, an issue tracker, and a Subversion repository to store your source code.
No, this is not a new concept. There are many project-hosting sites out there such as Sourceforge and Tigris. Our intent isn’t to compete directly with these sites, but rather to provide more options to open source developers; we feel that we’ve got a fresh new take on project hosting, and we’re excited to see how developers make use of it. Following the motto on our front page, we’ve “released early” (our service is admittedly quite spartan right now) but also plan to “release often” — there are a whole slew of new features planned over the next several months.
Let me draw your attention to the two main features we’re providing at launch. The issue tracker (written by a team-mate of mine) is unlike any I’ve ever seen. Like many Google products, it’s extremely fast and AJAX-y, has cool customizable views, and has a simple Gmail-like interface. Instead of forcing users to fill out myriads of fields, the issue entry is simple and uncluttered. Developers can invent any sort of arbitrary ‘labels’ to describe an issue, much like Gmail labels. The issue tracker then uses Google search technology to search over all of the data in a free-form fashion. There’s just one search box. I really think it’s a whole new approach to issue tracking applications, and we hope it’s useful to open source developers.
However, the thing which I’ve been working on for the last ten months (with Fitz) is the Subversion hosting feature. A typical Subversion repository has two different back-end options for storing your code: either a BerkeleyDB database or a FSFS (flat filesystem) store. What Fitz and I have done is write a new Google-specific back-end which stores the code in our datacenters, specifically in an internal technology called Bigtable. Jeff Dean (another Googler) has given public presentations about Bigtable, and you can read more about this technology by Googling for it. It’s much like a gigantic spreadsheet for holding data, but it can run over thousands of machines spread over mulitple datacenters. So just like other Google services (picasaweb, Gmail, Calendar, etc.), your data gets put into a massively scalable, redundant, and reliable system. I know that in the past, I’ve been personally frustrated when my own Subversion server goes down; it can be a lot of work to manage your own repository. Instead of being distracted with that, let Google Code be your free host, and get on with coding!
I know that some of my closer friends may wonder if I’ve turned to the Dark Side. Aren’t I the same guy who worked on open-source Subversion for five years? Who wrote a free book about Subversion? Why have I spent time all this time writing a proprietary (!) extension to an open source system? My answer is that it’s all about seeing the forest through the trees. Yes, I wish that our new Subversion back-end were open-sourceable, but it’s not a realistic possibility. Google has amazing hardware and a complex mountain of software to make use of it effectively, but it’s all one proprietary ecosystem. Releasing this new Subversion back-end as open source would be meaningless, since it’s not something that can function outside of Google. That said, the point here is the larger, longer-term benefit: by providing free, highly-available Subversion repositories to the world at large, we open to “embiggen” open source everywhere. (Yes, that’s the word used by my team leader, Chris DiBona). ๐
To technical friends: I hope this post has been enlightening. To my nontechnical friends and family: hope that wasn’t too much babble!