

posted Oct 1, 2010, 11:11 AM by Thanh Nguyen
I will present the following paper at the Working Conference on Reverse Engineering 2010 - Oct 13-16, 2010 in Beverly, MA:
Nguyen, T. H. D., Adams, B., and Hassan, A. E. 2010. A Case Study of Bias in Bug-Fix Datasets. In Proceedings of the Working Conference on Reverse Engineering (WCRE), Beverly, Massachusetts. Accepted.


Software quality researchers build software quality models by recovering traceability links between bug reports in issue tracking repositories and source code files. However, all too often the data stored in issue tracking repositories is not explicitly tagged or linked to source code. Researchers have to resort to heuristics to tag the data (e.g., to determine whether an issue is a bug report or a work item) and to link a piece of code to a particular issue or bug.
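To make the idea concrete, here is a minimal sketch of one common linking heuristic (a hypothetical illustration, not the exact method used in the paper): scan commit messages for bug-report identifiers such as "fix #512" or "bug 42" and link the commit to those bugs. The pattern and function names are my own.

```python
import re

# Hypothetical linking heuristic: match bug identifiers mentioned in
# commit messages, e.g. "bug 42", "fixed #512", "issue 7".
BUG_ID_PATTERN = re.compile(r'(?:bug|fix(?:es|ed)?|issue)[\s:#]*#?(\d+)',
                            re.IGNORECASE)

def link_commit_to_bugs(commit_message):
    """Return the set of bug IDs a commit message appears to fix."""
    return {int(bug_id) for bug_id in BUG_ID_PATTERN.findall(commit_message)}

# Commits whose messages match are tagged as bug fixes; "silent" fixes
# that mention no bug ID are missed, which is one source of the bias
# the abstract below discusses.
print(link_commit_to_bugs("Fixed #512: null pointer in parser"))  # {512}
print(link_commit_to_bugs("Refactor build scripts"))              # set()
```

Because such heuristics only recover the links that developers happened to write down, the resulting bug-fix dataset can systematically under-represent certain kinds of fixes.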

Recent studies by Bird et al. and by Antoniol et al. suggest that software models based on imperfect datasets, with missing links to the code and incorrect tagging of issues, exhibit biases that compromise the validity and generality of the quality models built on top of those datasets. In this study, we verify the effects of such biases for a commercial project that enforces strict development guidelines and rules on the quality of the data in its issue tracking repository. Our results show that even in such a near-ideal setting biases do exist -- leading us to conjecture that biases are more likely a symptom of the underlying software development process than an artifact of the heuristics used.

Hope to see you there.