Here is a list of papers/problems to study.
Again, we specifically encourage you to propose new ideas! source{d}, a company that applies ML4SE extensively, also keeps a repository of interesting papers on the topic: https://github.com/src-d/awesome-machine-learning-on-source-code. Feel free to explore it!
Code translation. Translate source code from one language to another, e.g., from Java to C# or, perhaps more interesting to industry, from Cobol to Java. See Chen et al. [1] as a reference. As a possible dataset to explore, coding websites such as Codeforces and Rosetta Code contain the same problems implemented in multiple languages.
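As a rough illustration, here is a minimal sketch of how such a parallel corpus could be assembled from a local dump of Rosetta Code solutions. The directory layout and file names (`rosetta_tasks/<task>/Java.java`, `CSharp.cs`) are hypothetical assumptions; adapt them to however you actually collect the data.

```python
from pathlib import Path

def collect_pairs(root="rosetta_tasks", src_name="Java.java", tgt_name="CSharp.cs"):
    """Pair up Java and C# solutions of the same task into (source, target) examples."""
    pairs = []
    for task_dir in Path(root).iterdir():
        src, tgt = task_dir / src_name, task_dir / tgt_name
        if src.exists() and tgt.exists():
            pairs.append((src.read_text(), tgt.read_text()))
    return pairs

# pairs = collect_pairs()  # feed these (Java, C#) pairs to your translation model
```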
Code completion. IDEs have offered code completion for years now. However, DL opens up new possibilities: suggesting more contextual completions. Researchers have shown that this is indeed a tricky task [2]. This project is about replicating (or improving upon) this paper.
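To get a feel for the task, here is a toy bigram completion baseline over whitespace-tokenized code, far simpler than the DL models evaluated in [2]; the tiny training corpus below is made up for illustration.

```python
from collections import Counter, defaultdict

corpus = [  # made-up training snippets, already tokenized by whitespace
    "for ( int i = 0 ; i < n ; i ++ )",
    "for ( int j = 0 ; j < m ; j ++ )",
    "if ( i < n ) return i ;",
]

# Count which token follows which.
bigrams = defaultdict(Counter)
for snippet in corpus:
    tokens = snippet.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

def complete(prev_token, k=3):
    """Suggest the k most frequent next tokens seen after prev_token."""
    return [tok for tok, _ in bigrams[prev_token].most_common(k)]

print(complete("("))  # ['int', 'i']
print(complete("i"))  # ['<', '=', '++']
```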
Type inference. Inferring the type of a variable, especially in dynamically typed languages, can be a challenge. Hellendoorn et al. [3] have shown that DL techniques can be very precise at this task. This project aims at replicating this paper.
API usage. Developers often need help in learning how to use an API. Can we suggest API usage to developers, given a natural-language query? Gu et al. [4] and Liu et al. [5] showed that this is possible. Your project here is to replicate one of these papers.
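As a trivial starting point (and nothing like the deep models of Gu et al. [4] or Liu et al. [5]), one could simply retrieve, for a natural-language query, the API sequence whose description overlaps most with the query. The (description, API sequence) pairs below are made up for illustration.

```python
corpus = [  # hypothetical (description, API sequence) pairs mined from code and docs
    ("read a text file line by line", ["open", "readlines", "close"]),
    ("parse json from a string", ["json.loads"]),
    ("download a web page over http", ["urllib.request.urlopen", "read", "decode"]),
]

def suggest(query):
    """Return the API sequence whose description shares the most words with the query."""
    q = set(query.lower().split())
    return max(corpus, key=lambda pair: len(q & set(pair[0].split())))[1]

print(suggest("how do I read lines from a file"))  # ['open', 'readlines', 'close']
```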
Mutation testing. In mutation testing, we mutate the original program and check whether the existing test cases are able to detect the change. Large companies, such as Google, have been adopting mutation testing, though not without challenges [6]. In particular, given the size of programs, the number of possible mutants is enormous; thus, prioritizing which mutants to generate is currently an open problem. Tufano et al. [7] proposed the use of deep learning to learn which mutants are really relevant, based on bug fixes.
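To make the idea concrete, here is a minimal sketch of one classic mutation operator (relational operator replacement) implemented with Python's ast module; real mutation tools apply many such operators across the whole program.

```python
import ast

class RelationalMutator(ast.NodeTransformer):
    """Mutation operator: replace '>' with '>=' in comparisons."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.GtE() if isinstance(op, ast.Gt) else op for op in node.ops]
        return node

original = "def is_adult(age):\n    return age > 18\n"
mutant = RelationalMutator().visit(ast.parse(original))
ast.fix_missing_locations(mutant)
print(ast.unparse(mutant))  # the mutant returns 'age >= 18' (requires Python 3.9+)
```

A test suite that only checks ages 10 and 20 does not kill this mutant, while a test at the boundary value 18 does; deciding which of the many possible mutants are worth generating is exactly the prioritization problem above.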
Logging strategies. Deciding where to log is a hard task in large systems: on the one hand, you don’t want to log too much; on the other hand, if you don’t log an important part of the code, you might miss the information needed to debug a crash. Researchers have empirically studied how developers decide where to log [8], and have proposed supervised ML techniques to suggest improvements to log lines [9, 10]. In this project, you will study whether NLP-based approaches provide better results.
Anomaly detection in logs. Under construction.
Log reduction. Modern software systems generate lots of runtime information that developers need to examine in order to identify the causes of failures, when those happen. In this project, you will build a tool that, given logs and their labels (e.g., pass/fail), learns a model that automatically identifies the important lines in an input log.
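A very simple baseline, just to fix ideas (the data below is made up): score each distinct log line by how much more often it occurs in failing logs than in passing ones, and rank lines by that score.

```python
from collections import Counter

logs = [  # hypothetical (lines, label) pairs
    (["starting service", "connecting to db", "request handled"], "pass"),
    (["starting service", "connecting to db", "request handled"], "pass"),
    (["starting service", "connection timeout", "retrying", "request failed"], "fail"),
    (["starting service", "connection timeout", "request failed"], "fail"),
]

pass_count, fail_count = Counter(), Counter()
n_pass = sum(1 for _, label in logs if label == "pass")
n_fail = len(logs) - n_pass
for lines, label in logs:
    (pass_count if label == "pass" else fail_count).update(set(lines))

def importance(line):
    """Laplace-smoothed ratio of occurrence in failing vs. passing logs."""
    return ((fail_count[line] + 1) / (n_fail + 2)) / ((pass_count[line] + 1) / (n_pass + 2))

for line in sorted({l for lines, _ in logs for l in lines}, key=importance, reverse=True):
    print(f"{importance(line):.2f}  {line}")  # failure-specific lines rank highest
```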
Code refactoring. Maintaining (bad) source code is not an easy task. And, although industry has widely adopted linters, they have a well-known problem: the number of false positives [11]. We conjecture that ML-based techniques will be able to provide more useful refactoring recommendations to developers. In this task, you will train ML models to recommend (or maybe even automatically apply) refactorings. See the RefactoringMiner tool, which might help you in collecting real-world refactorings.
Flaky tests. Flaky tests are tests that present non-deterministic behavior (i.e., tests that sometimes pass, sometimes fail). Mark Harman, Facebook’s senior scientist, mentioned that flaky tests are an important problem at Facebook. Google says that 1.5% of their 4.2M tests present flaky behavior at some point [12]. Researchers have been empirically investigating the problem. Luo et al. [13] noticed that async wait, concurrency, and test order dependency are the most common causes of flakiness. Lam et al. [14] recently developed iDFlakies, a tool that aims at identifying tests that are flaky due to their execution order. We ask: can the use of ML help us in identifying flaky tests?
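For illustration, here is a contrived pytest-style test that is flaky because of an async wait, the most common cause reported by Luo et al. [13]: its outcome depends on timing rather than on the code under test.

```python
import random
import threading
import time

result = []

def slow_worker():
    time.sleep(random.uniform(0.05, 0.15))  # work whose duration varies from run to run
    result.append("done")

def test_worker_finishes():
    result.clear()
    t = threading.Thread(target=slow_worker)
    t.start()
    time.sleep(0.1)            # fixed wait instead of joining the thread
    assert result == ["done"]  # sometimes passes, sometimes fails
```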
Tagging algorithms. When solving coding challenges, e.g., the ones from CodeForces, you have to choose a strategy: will you apply dynamic programming? Will you apply brute force? Does it involve probabilities? Labeling a piece of code with such tags might be really useful for education. Or, somewhat related: given the textual description of a problem, can we suggest solution strategies? The current and closest (non-ML) related work on this topic aims at inferring algorithmic complexity from Java bytecode [15].
Programming styles. A recent paper [16] in the PL field caused quite a stir by (very vocally) refuting the findings of a Comm. ACM highlight paper [17]. Both papers try to quantify the effect of programming language choice on the error-proneness of software. However, their approach is too coarse-grained, as we can write any style of code in any programming language. What is needed is a more fine-grained approach that, for example, links coding styles (e.g., functional, imperative, or declarative) to bugs. In this project, you will build an automated program style detector by feeding an ML solution with code written in functional and imperative styles and letting it learn to differentiate between the two.
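To make "style" tangible, here is the same computation written in an imperative and a functional style; labelled pairs like these are the kind of training data such a detector could learn from (e.g., via features such as loop counts, assignments, or use of higher-order functions).

```python
# Imperative style: explicit loop and mutable state.
def sum_even_squares_imperative(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total

# Functional style: composition of filter/map/sum, no mutation.
def sum_even_squares_functional(numbers):
    return sum(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

assert sum_even_squares_imperative(range(10)) == sum_even_squares_functional(range(10)) == 120
```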
The course contents are copyrighted (c) 2018 - onwards by TU Delft and their respective authors and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.