The purpose of a seminar is to read and discuss papers critically:
A good question:
Types of good questions:
Characterize the following questions:
In some cases, actually assigning roles may help the discussion:
Another form of role-playing is debating: groups are assigned opposing views and defend them until an agreement is reached.
Encourage everyone to participate:
Help participants summarize and articulate what they've learned
Be honest, open and inviting.
Most importantly: Keep notes!
Before each session:
At the end of each session:
We will discuss the paper by Zimmermann et al., “Cross-project defect prediction” [1]
Why do this research?
What did the authors do?
The authors took several versions of 12 projects, extracted features, and trained a logistic regression model to predict whether a component is defect-prone.
Then, they trained a model on a project version A to predict defect proneness on a project version B; these cross-project predictions failed to achieve good precision and recall.
Finally, they quantified the effect of project similarity on prediction accuracy and ranked the features by how much they contribute to cross-project predictive power.
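A minimal sketch of this cross-project setup, using scikit-learn rather than the authors' original tooling; load_project() and the synthetic data are placeholders, not the paper's actual feature set:

```python
# Sketch of the cross-project setup described above, using scikit-learn rather
# than the authors' original tooling. load_project() is a hypothetical stand-in
# for extracting real code/churn features and defect labels from a project version.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

def load_project(seed):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(500, 10))                                    # synthetic feature matrix
    y = (X[:, 0] + rng.normal(scale=2.0, size=500) > 0).astype(int)   # synthetic defect-prone labels
    return X, y

X_a, y_a = load_project(seed=1)   # "project version A"
X_b, y_b = load_project(seed=2)   # "project version B"

# Train on project A ...
model = LogisticRegression(max_iter=1000).fit(X_a, y_a)

# ... then predict defect proneness on project B (the cross-project case)
# and report precision and recall for the comparison.
y_pred = model.predict(X_b)
print("precision:", precision_score(y_b, y_pred))
print("recall:   ", recall_score(y_b, y_pred))
```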
What did the authors find?
The authors argue (and present evidence) that cross-project defect prediction does not work.
Through feature ranking, they find that the number of samples, the use of specific technologies, and average churn contribute the most to cross-project predictive power.
Why are the results important?
“A consequence for research is that rather than increasing the precision and recall of models by some small percentage, it should focus on how to make defect prediction work across projects and relevant for a wide audience. We believe that this will be an important trend for software engineering in general. Learn from one project, to improve another.”
Is the definition of defect proneness satisfying? What could a more fine-grained version of it look like?
If you were to extend the prediction model, what features would you use?
Why do the authors need to check similarity of features?
Are there any alternative ways to check feature similarity?
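On the last question, one illustrative way to check feature similarity between two projects (a sketch of one possible approach, not necessarily the measure used in the paper) is to compare each feature's distribution across the projects, for example with a two-sample Kolmogorov–Smirnov test:

```python
# Illustrative only: compare each feature's distribution in two projects with a
# two-sample Kolmogorov-Smirnov test. This is one possible similarity check,
# not necessarily the paper's; the feature data here is synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
features_a = rng.normal(size=(500, 5))             # stand-in for project A's feature matrix
features_b = rng.normal(loc=0.3, size=(400, 5))    # stand-in for project B's feature matrix

for i in range(features_a.shape[1]):
    stat, p_value = ks_2samp(features_a[:, i], features_b[:, i])
    verdict = "similar" if p_value > 0.05 else "different"
    print(f"feature {i}: KS={stat:.3f}, p={p_value:.3f} -> distributions look {verdict}")
```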
by Thomas Kluiters
The moderator opened the discussion by asking how one would summarize the paper.
Multiple students gave their summaries; however, the moderator noted that a summary should be told like a story.
After the paper had been summarized, the moderator asked the students whether they had any technical questions about it.
General questions asked:
The moderator put the first question to the audience, and the group agreed that a technical defect is “Anything that’s a bug, something that doesn’t conform to the specification.”
The second question was answered quickly, as it was a factual question whose answer could be found in the paper: the authors studied post-release defects.
A remark was made that, these days, we can apply patches to systems that contain defects and fix bugs much more quickly.
A student then asked another technical question: “Why did the authors choose to use logistic regression?”
The moderator agreed that this was debatable and asked the students first, “What is logistic regression?”
The students then defined logistic regression and agreed on its definition.
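For reference, the standard textbook form (not a verbatim record of the students' definition or of the paper): logistic regression models the probability that a component with features x1, …, xk is defect-prone as

P(defect-prone | x) = 1 / (1 + exp(-(b0 + b1*x1 + … + bk*xk)))

i.e., a linear combination of the features passed through the logistic (sigmoid) function, giving a probability that can be thresholded (for example at 0.5) to classify the component.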
The follow-up question was “What other methods could the authors have used? Why would they use logistic regression?”
A student responded that logistic regression is very simple to apply and easy to explain; thus, in the interest of simplicity, the authors likely used logistic regression.
The moderator noted the different ways we can classify data these days, going over SVMs, random forests, and decision trees.
A student gave his opinion on the choice of logistic regression: “It’s an arguable decision to use logistic regression, as it does not seem like a good fit.”
Another student disagreed: “It makes sense to use logistic regression as it’s interpretable by humans, in other words, ‘explainable’ to humans.”
A third student disagreed with this and argued that neural networks can also be explained and will considerably outperform logistic regression.
The moderator raised a controversial opinion: “Perhaps the authors did not know any better?”
The instructor added to this: “The paper was published in 2009; would the authors have known about neural networks then?”
The moderator continued: “The paper was published in 2009; is it possible that neural networks were simply not as popular then?”
A student answered: “They should have known, though, it would yield the best result.”
The instructor also brought up model fit: “What about the model fit? Did they report the model fit?” (Model fit describes how well a model captures the data it was trained on.)
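For context, one common fit statistic for a logistic regression is a pseudo-R², such as McFadden's; the sketch below is illustrative only, uses synthetic data, and is not taken from the paper:

```python
# Illustrative only: McFadden's pseudo-R^2 as one possible "model fit" statistic
# for a logistic regression (not the measure discussed in the paper).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# McFadden's pseudo-R^2 = 1 - LL(model) / LL(null), where the null model
# always predicts the base rate of defect-prone components.
ll_model = -log_loss(y, model.predict_proba(X)[:, 1], normalize=False)
ll_null = -log_loss(y, np.full(len(y), y.mean()), normalize=False)
print("McFadden pseudo-R^2:", 1.0 - ll_model / ll_null)
```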
The students responded that the paper does not mention the model fit.
The moderator then raised a new question: “Now I have a question about the modeling used by the authors: they have used their own set of 40 features… Do you think there is any feature missing?”
Some students replied:
The moderator continued to ask the students about the features: “Do they have any process features?”
Multiple students gave answers and agreed that the authors could have included more process features.
A short discussion was held on the difference between a product feature and a process feature. The moderator commented that the authors were constrained by the data they were given, rather than gathering their own data, and went on to note that researchers tend to choose between quantitative and qualitative data.
The students agreed that the authors should have formed a hypothesis about which features would be interesting to keep, instead of using all of them.
The moderator then raises the question: “In your opinion, can the results of this paper be generalized?”
Lastly, the moderator asked the students: “At a higher level of abstraction, what’s the message of this paper?”
After a short discussion, the moderator and students agreed on the following: “The value of this paper is the message it conveys: doing cross-project defect prediction does not really work.”