Paper discussion: Poncin et al. [1]

Seminar discussion

by Ernst Mulders

Moderator of the discussion is Ayushi Rastogi.

The discussion session starts with an analogy. Ayushi mentions the delays one can experience inside an airport. The similarity with software is that a single bottleneck can delay the entire project. Furthermore, assumptions are made about the expected scenarios: in the airport example a student mentions that check-in might have been assumed to take 20 minutes, but in reality takes longer, causing delays.

The moderator specifically mentions the three parts of the paper: processes, mining, and software repositories.

The students are asked what the keywords / important items of the paper are. The following replies are given:

  • Prototype presentation paper
  • It proposed a tool
  • Trying to identify processes in software development
  • Analytics + processes + software development

Ayushi mentions that process mining can be described as extracting the data that help you discover the process; the paper is thus about how these processes can be found. Why is this relevant in software? Because the amount of data is too large to analyze manually.

“What repositories is the paper talking about?” the moderator asks. After a slight delay a student answers that the paper takes a broader view of the term repository: not only version control repositories (commit history), but also bug trackers and e-mail logs. Ayushi adds that these repositories record who, what, and when.
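To make the who/what/when view concrete, the sketch below shows one possible way to flatten commits and bug-tracker entries into a single event log, the usual input for process-mining tools. The field names and records are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    who: str        # developer or reporter performing the activity
    what: str       # activity, e.g. "commit" or "bug-closed"
    when: datetime  # timestamp of the activity
    case: str       # case id that groups events, e.g. a bug id

# Illustrative records as they might come out of two different repositories.
commits = [("alice", "2011-03-01T10:02:00", "bug-42"),
           ("bob",   "2011-03-02T09:15:00", "bug-42")]
bug_log = [("alice", "2011-02-28T16:40:00", "bug-42", "bug-opened"),
           ("carol", "2011-03-03T11:05:00", "bug-42", "bug-closed")]

log  = [Event(w, "commit", datetime.fromisoformat(t), c) for w, t, c in commits]
log += [Event(w, a, datetime.fromisoformat(t), c) for w, t, c, a in bug_log]

# Ordering events by time per case is the starting point for process discovery.
log.sort(key=lambda e: (e.case, e.when))
for e in log:
    print(e.case, e.when.isoformat(), e.who, e.what)
```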

The discussion continues about the experiments conducted in the paper. The first experiment is summarized by a student and can be described as “matching characters.” The following elements are mentioned:

  • Name matching, with heuristics (a minimal matching sketch follows this list)
  • Context merging (Georgios notes: git time zones are notoriously hard to match, since commit times are local times on the developer’s machine)
  • Person title matching
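To give an impression of the kind of name-matching heuristics discussed above, here is a minimal identity-merging sketch over (name, e-mail) pairs. The normalization rules and the examples are assumptions for illustration, not the algorithm from the paper.

```python
import re

def normalize(name):
    """Lower-case a name and strip punctuation and common titles such as 'dr.'."""
    name = re.sub(r"\b(dr|ir|prof|mr|ms)\.?\b", "", name.lower())
    return re.sub(r"[^a-z ]", "", name).strip()

def same_person(a, b):
    """Heuristic identity merge over (name, e-mail) pairs.

    Two entries are merged when the e-mail addresses match, the normalized
    names match, or one e-mail prefix equals the other's dotted name.
    """
    name_a, mail_a = normalize(a[0]), a[1].lower()
    name_b, mail_b = normalize(b[0]), b[1].lower()
    if mail_a == mail_b or name_a == name_b:
        return True
    prefix_a, prefix_b = mail_a.split("@")[0], mail_b.split("@")[0]
    return prefix_a == name_b.replace(" ", ".") or prefix_b == name_a.replace(" ", ".")

# True: the normalized names match despite the title and different e-mails.
print(same_person(("Dr. Jane Doe", "jane.doe@example.org"),
                  ("jane doe", "jd@other.example")))
# True: the e-mail prefix "jane.doe" matches the dotted form of "Jane Doe".
print(same_person(("J. Doe", "jane.doe@example.org"),
                  ("Jane Doe", "other@example.org")))
```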

The explanation of the second experiment is given by a student after rereading that part of the paper. The following points come forward in the discussion:

  • It is about the bug lifecycle
  • The lifecycle is in practice different from the theory (a sketch of recovering the observed lifecycle follows this list)
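As a rough illustration of how the observed lifecycle can be recovered and compared against the documented one, the sketch below counts directly-follows status transitions over some invented bug-tracker histories.

```python
from collections import Counter

# Invented bug histories: the ordered status changes recorded per bug id.
histories = {
    "bug-1": ["UNCONFIRMED", "NEW", "ASSIGNED", "RESOLVED", "VERIFIED"],
    "bug-2": ["NEW", "RESOLVED", "REOPENED", "RESOLVED", "CLOSED"],
    "bug-3": ["NEW", "RESOLVED", "CLOSED"],
}

# Count directly-follows transitions; the frequent edges form the de-facto
# lifecycle, which can then be compared against the documented one.
transitions = Counter()
for states in histories.values():
    transitions.update(zip(states, states[1:]))

for (src, dst), n in transitions.most_common():
    print(f"{src:>11} -> {dst:<8} ({n}x)")
```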

When asked by Ayushi why the paper is useful, the following answers are given:

  • For giving time estimates
  • For improvement of processes

Everybody agrees that the paper isn’t a good one. Some reasons are given:

  • Temporal inefficiency (data life span should be around 6 months, no longer)
  • Missing data, often about why certain actions happen
  • Outliers aren’t explained
  • They didn’t validate their findings

Paper discussion: Løhre and Jørgensen [2]

Motivation

  • Software estimation is very hard (not everyone agrees), and often wrong (think software crisis)
  • Usually done by experienced developers, rather than following research findings
  • Humans are very bad estimators, because of cognitive biases

How do anchoring effects affect estimations?

Research method

3 experiments (randomized controlled trials):

  • Experiment 1: Is there an anchoring effect? Does numerical precision affect anchoring?
  • Experiment 2: Does too narrow numerical precision affect anchoring?
  • Experiment 3: Does the source of the anchoring information affect the anchoring effect?

Results

  • Experiment 1: Strong anchoring effect. The precision of the anchoring values is not important
  • Experiment 2: Strong anchoring effect. Again, the precision of the anchoring values is not important
  • Experiment 3: The credibility of the anchor source does not affect the anchor effect

Implications

To summarize, our findings indicate on the one hand that anchors have strong effects on estimates of software project effort, and on the other hand, that the anchoring effect is not moderated by numerical preciseness or source credibility.

Questions

  • Please summarize the paper

Biases / Anchoring

  • What do you think is a bias?
  • How do biases affect research and results?
  • What other biases might affect effort estimation?

Experiments

  • What is an experiment and how to run one?
  • How to check differences between groups? (a small illustration follows this list)
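To make the last two questions concrete, here is a minimal sketch of a randomized two-group comparison in the spirit of the paper’s trials: participants are assigned at random to a low-anchor or a high-anchor condition and their effort estimates are compared with Welch’s t-test. All numbers are invented, and scipy is assumed to be available.

```python
import random
from statistics import mean
from scipy import stats  # assumed to be available for the significance test

random.seed(1)

# Randomly assign 40 (hypothetical) participants to two conditions.
participants = list(range(40))
random.shuffle(participants)
low_group, high_group = participants[:20], participants[20:]

# Invented effort estimates (work-hours): each participant first saw either
# a low or a high numerical anchor before estimating.
low_anchor_estimates  = [random.gauss(60, 15) for _ in low_group]
high_anchor_estimates = [random.gauss(110, 15) for _ in high_group]

# Compare the two independent groups with Welch's t-test.
t, p = stats.ttest_ind(low_anchor_estimates, high_anchor_estimates,
                       equal_var=False)
print(f"low-anchor mean:  {mean(low_anchor_estimates):.1f} hours")
print(f"high-anchor mean: {mean(high_anchor_estimates):.1f} hours")
print(f"t = {t:.2f}, p = {p:.4f}")
```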

Seminar discussion

by Ernst Mulders

The discussion was started with the moderator asking for a summary of the paper. Two summaries were given. From these summaries it is discussed that an earlier paper on anchoring already existed and that this paper adds extra dimensions and input. Furthermore, it is noted that the motivation for the paper came from questions not asked in other papers about anchoring.

From this conversation the topic immediately turns to biases.

The moderator mentions and recommends the book Thinking, Fast and Slow by Daniel Kahneman. Kahneman is an important figure in the field of behavioral economics. The book also explains how we make decisions that we think are rational, but that are actually biased. Until the sixties it was thought that people could think purely rationally.

After the introduction of the book the moderator hands out the Cognitive Bias Codex, which shows the influences on decision making. The group goes through the items on the codex in a clockwise manner. The items on the main circle are briefly read out loud by the moderator. It is noted that the items in the inner circle are the corresponding research fields.

The bias discussion continues with the moderator asking the question “Can you think of places where biases can play a role in SE?” Several students reply:

  • In a hiring process: previous experience with people influences the decision made about them. It is also mentioned that people tend to hire persons similar to themselves (although diversity in a team is actually better).
  • In choosing a programming language
  • Within a team, biases can occur on the ethnicity of fellow members (racism)

When the group diversity argument is mentioned, the moderator turns to Ayushi Rastogi, who has performed research on biases in accepting pull requests. Her research found that reviewers of a pull request are 20% more likely to accept the request when the committer is from the same country as the reviewer. Furthermore, people performing the reviews often say they aren’t biased, while the committers mention that they do feel the bias.

The moderator continues the discussion by asking for examples of biases in research. The examples given by students are:

  • “I know better, because I’m the professor”
  • Confirmation bias
  • Always looking for the simpler methods instead of the more complex ones
  • Bias in research method selection (the moderator disagrees; there aren’t that many)
  • Representativeness, e.g. you select repositories you know or have some bias towards. So we should choose at random, or we could also use all of them. However, a random pick doesn’t necessarily give a good representation, and selecting all of them isn’t always possible due to the sheer volume.

As an example, the moderator draws an exponential decay graph on the board with the number of pull requests on the Y-axis and the number of projects on the X-axis. The solution for getting a good sample is stratified sampling, explained by a fellow student as taking x samples from every part of the distribution (a small sampling sketch follows the list below).

  • Sampling bias
  • Interpretation bias
  • Construct bias
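A minimal sketch of the stratified sampling idea mentioned above: projects are binned by pull-request count and the same number of projects is drawn from every bin, instead of sampling the skewed population uniformly. The bin edges and the population are invented for illustration.

```python
import random
from collections import defaultdict

random.seed(42)

# Invented, heavily skewed population: (project name, number of pull requests).
projects = [(f"proj-{i}", int(1000 * 0.97 ** i)) for i in range(300)]

# Strata over the pull-request counts; these bin edges are an assumption.
edges = [0, 10, 50, 200, 1000]

def stratum(pr_count):
    for low, high in zip(edges, edges[1:]):
        if low <= pr_count < high:
            return (low, high)
    return (edges[-1], float("inf"))

buckets = defaultdict(list)
for name, prs in projects:
    buckets[stratum(prs)].append(name)

# Draw the same number of projects from every stratum instead of sampling
# the population uniformly (which would mostly return tiny projects).
per_stratum = 5
sample = [name for bucket in buckets.values()
          for name in random.sample(bucket, min(per_stratum, len(bucket)))]
print(sorted(sample))
```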

References

[1]
W. Poncin, A. Serebrenik, and M. Van Den Brand, “Process mining software repositories,” in Software maintenance and reengineering (CSMR), 2011 15th European conference on, 2011, pp. 5–14.
[2]
E. Løhre and M. Jørgensen, “Numerical anchors and their strong effects on software development effort estimates,” J. Syst. Softw., vol. 116, no. C, pp. 49–56, Jun. 2016.
[3]
W. Van Der Aalst, “Process mining: Overview and opportunities,” ACM Transactions on Management Information Systems (TMIS), vol. 3, no. 2, p. 7, 2012.
[4]
W. Van Der Aalst, Process mining: Discovery, conformance and enhancement of business processes. Berlin: Springer-Verlag, 2011.
[5]
C. W. Günther and A. Rozinat, “Disco: Discover your processes.” BPM (Demos), vol. 940, pp. 40–44, 2012.
[6]
B. F. Van Dongen, A. K. A. de Medeiros, H. Verbeek, A. Weijters, and W. M. Van Der Aalst, “The ProM framework: A new era in process mining tool support,” in International conference on application and theory of Petri nets, 2005, pp. 444–454.
[7]
M. Gupta and A. Sureka, “Nirikshan: Mining bug report history for discovering process maps, inefficiencies and inconsistencies,” in Proceedings of the 7th India software engineering conference, 2014, p. 1.
[8]
V. Rubin, C. W. Günther, W. M. Van Der Aalst, E. Kindler, B. F. Van Dongen, and W. Schäfer, “Process mining framework for software processes,” in International conference on software process, 2007, pp. 169–181.
[9]
M. Gupta, A. Sureka, and S. Padmanabhuni, “Process mining multiple repositories for software defect resolution from control and organizational perspective,” in Proceedings of the 11th working conference on mining software repositories, 2014, pp. 122–131.
[10]
M. Gupta, “Improving software maintenance using process mining and predictive analytics,” in Software maintenance and evolution (ICSME), 2017 IEEE international conference on, 2017, pp. 681–686.