For a research paper I am working on, we wanted to analyze the top 30 “most collaborative” projects on GitHub. Defining a quantitative metric of collaboration and sorting projects by it is not an easy task: collaboration is in many cases implicit and not recorded, and not all acts of collaboration are equal. As a proxy, we chose to count the people who perform actions that change the state of a repository. On GitHub, we could identify the following such actions:

  • A: Create a commit to a repository
  • B: Perform a code review on an individual commit
  • C: Create/Update/Merge/Close a pull request
  • D: Perform a code review on a pull request
  • E: Comment on a pull request
  • F: Create/Close an issue
  • G: Comment on an issue

Using GHTorrent as a data source, I wrote a script that counts the distinct people who performed each of the actions above, for all non-forked repositories, and then sorted the repositories by their total number of distinct contributors. The results can be seen in the table below:

repo                   A B C D E F G                all
isaacs/npm             100211672324725683302        6147
torvalds/linux         59681467316100               6212
symfony/symfony        1021521261395130518442160    6215
jquery/jquery-mobile   212134312135028883008        6391
joyent/node            6575283313294323042805       6653
CocoaPods/Specs        2658902584391235515268       6674
gitlabhq/gitlabhq      6058987113891522513608       7344
angular/angular.js     875921306139152015403778     7919
rails/rails            26993092315607317447464890   15339
mxcl/homebrew          3426763125528388851577301    20510
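
The counting behind a table like the one above can be sketched in a few lines of Python. This is only an illustration, not the script used for the post: the `events` rows, the repository and user names, and the helper names `count_contributors` and `rank_repos` are all hypothetical, and the sketch assumes that the "all" column counts each person once no matter how many kinds of action they performed.

```python
from collections import defaultdict

# Hypothetical event records: (repo, user, action), where action is one of
# the letters A-G from the list above. Real data would come from GHTorrent;
# these rows are made up purely for illustration.
events = [
    ("octo/demo", "alice", "A"),
    ("octo/demo", "alice", "C"),
    ("octo/demo", "bob", "G"),
    ("octo/other", "carol", "F"),
]


def count_contributors(events):
    """Return {repo: {action: distinct users, ..., "all": distinct users}}."""
    users_by_action = defaultdict(lambda: defaultdict(set))
    for repo, user, action in events:
        users_by_action[repo][action].add(user)

    result = {}
    for repo, actions in users_by_action.items():
        counts = {action: len(users) for action, users in actions.items()}
        # "all" is the size of the union, so a person who both commits and
        # comments is counted once (an assumption about the original metric).
        counts["all"] = len(set().union(*actions.values()))
        result[repo] = counts
    return result


def rank_repos(events):
    """Sort repositories by total distinct contributors, largest first."""
    counts = count_contributors(events)
    return sorted(counts.items(), key=lambda item: item[1]["all"], reverse=True)


ranking = rank_repos(events)
```

Because "all" is a set union rather than a sum, it can be smaller than the sum of the per-action columns whenever the same person shows up under several actions.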

The numbers are staggering. A project (Homebrew) that is just 5 years old has attracted 20.5k people to contribute to it (20,500, the size of a small city!). Ruby on Rails has been collaboratively developed by a community of 15k people and still works! Comparing these numbers with other software engineering projects is futile: most projects, even ones with a very long lifetime, are very small in comparison. Perhaps a fairer comparison is with other online collaborative initiatives: the English Wikipedia is maintained by 130,800 people, while the effort of decoding the human genome has been carried out by thousands of people.

If nothing else, the numbers above are an example of the power of the commons, and certainly of the usefulness of GitHub as a collaboration platform.



Published

27 March 2014
