A Mixed Methods Approach to Mining Code Review Data: Examples and a study of multi-commit reviews and pull requests

by Rigby, Peter C. and Bacchelli, Alberto and Gousios, Georgios and Mukadam, Murtuza

edited by Bird, Christian and Menzies, Tim and Zimmermann, Thomas

You can get a pre-print version from here.

Abstract

Software code review has been considered an important quality assurance mechanism for the last 35 years. The techniques for conducting modern code reviews have evolved along with the software industry and have become progressively incremental and lightweight. We have studied code review in a number of contemporary settings, including Apache, Linux, KDE, Microsoft, Android, and GitHub. Code review is an inherently social activity, so we have used both quantitative and qualitative methods to understand the underlying parameters (or measures) of the process, as well as the rich interactions and motivations for doing code review. In this chapter, we describe how we have used a mixed methods approach to triangulate our findings on code review. We also describe how we use quantitative data to help us sample the most interesting cases from our data to be analyzed qualitatively. To illustrate code review research, we provide new results that contrast single- and multi-commit reviews. We find that while multi-commit reviews take longer and have more lines churned than single-commit reviews, the same number of people are involved in both types of review. To enrich and triangulate our findings, we qualitatively analyze the characteristics of multi-commit reviews and find that there are two types: reviews of branches and revisions to single commits. We also examine the reasons why commits on GitHub pull requests are rejected.

Bibtex record

@incollection{RBGM14,
  author = {Rigby, Peter C. and Bacchelli, Alberto and Gousios, Georgios and Mukadam, Murtuza},
  editor = {Bird, Christian and Menzies, Tim and Zimmermann, Thomas},
  title = {A Mixed Methods Approach to Mining Code Review Data: Examples and a study of multi-commit reviews and pull requests},
  booktitle = {The Art and Science of Analyzing Software Data},
  year = {2015},
  publisher = {Morgan Kaufmann},
  pages = {231--255},
  isbn = {0124115195},
  url = {/pub/code-review-mixed-methods.pdf}
}

The paper