Questions for Data Scientists in Software Engineering: A Replication

by Huijgens, Hennie and Rastogi, Ayushi and Mulders, Ernst and Gousios, Georgios and Deursen, Arie van

You can get a pre-print version from here.
You can view the publisher's page here.


In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. Five years later, no further studies have investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with a different primary focus (which we refer to as software-defined enterprises). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering. This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provide in-house software solutions. It offers a comprehensive guide of questions for data scientists, selected from the previous study at Microsoft along with our current work at ING. We replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. We also add new questions that emerged from differences in the context of the two companies and the five-year gap in between. Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to those in the software company, albeit with exceptions. We hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge.

BibTeX record

  author = {Huijgens, Hennie and Rastogi, Ayushi and Mulders, Ernst and Gousios, Georgios and Deursen, Arie van},
  title = {Questions for Data Scientists in Software Engineering: A Replication},
  year = {2020},
  isbn = {9781450370431},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {},
  doi = {10.1145/3368089.3409717},
  booktitle = {Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  pages = {568–579},
  numpages = {12},
  keywords = {Software Analytics, Data Science, Software Engineering},
  location = {Virtual Event, USA},
  series = {ESEC/FSE 2020}

The paper