Expected start date: 2023
Estimated duration: 3 years
Contact: Arnaud Blouin, Djamel Eddine Khelladi (prenom.email@example.com)
Detecting and fixing bugs that threaten the stability of software systems is crucial to the industry. For example, recent research work studied patches that fixed security issues in Java code to understand how to prevent such issues in future development. The literature on studying code histories is also vast: it investigates how and why issues appear [2,3,4], and how to leverage such histories to warn about upcoming issues.
However, it is also crucial to be able to study software issues at the scale of large evolution histories. Large software systems evolve at a high frequency, with several commits (i.e., changes) per day or even per hour. Over years of evolution, a single history can thus reach hundreds of thousands of commits, which poses scalability challenges for any analysis that covers the whole history of a software system.
In our recent work, we proposed HyperAST, a novel approach that incrementally and efficiently captures, in a single AST (Abstract Syntax Tree), the numerous ASTs that a code history such as a Git repository contains in raw form (i.e., one AST per commit). In terms of scalability, HyperAST enables large-scale temporal code analyses (i.e., code analyses on large code histories), such as code evolution analyses, history-based security analyses, and efficient code element tracking. In terms of features, HyperAST provides its users with an API at the appropriate level of abstraction for working on different versions of code. The work on HyperAST received an ACM SIGSOFT Distinguished Paper Award at the ASE 2022 conference (CORE rank A*).
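HyperAST's internal design is not detailed in this posting. As a rough intuition for how one structure can hold an AST per commit without duplicating everything, the sketch below uses hash-consing (structural sharing of identical subtrees), a classic technique. `NodeStore`, `intern`, and `build` are hypothetical names; this is an illustrative assumption, not HyperAST's API or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class NodeStore:
    """Hypothetical hash-consing store: each distinct (kind, label, children) node is kept once."""
    ids: dict = field(default_factory=dict)    # node key -> node id
    nodes: list = field(default_factory=list)  # node id -> node key

    def intern(self, kind, label, children=()):
        # A node's identity depends only on its content and its (already interned)
        # children, so an unchanged subtree in a later commit maps to the same id.
        key = (kind, label, tuple(children))
        if key not in self.ids:
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]


def build(store, tree):
    """Intern a nested (kind, label, [subtrees]) tuple bottom-up and return the root id."""
    kind, label, subtrees = tree
    return store.intern(kind, label, [build(store, t) for t in subtrees])


# Two versions of a tiny class: the second "commit" only changes the body of method b.
v1 = ("class", "A", [("method", "a", [("return", "1", [])]),
                     ("method", "b", [("return", "2", [])])])
v2 = ("class", "A", [("method", "a", [("return", "1", [])]),
                     ("method", "b", [("return", "3", [])])])

store = NodeStore()
roots = [build(store, v) for v in (v1, v2)]
print(roots, len(store.nodes))  # prints "[4, 7] 8"
```

The two versions contain 10 nodes in total, but only 8 are stored: the unchanged method `a` subtree is shared by both roots, so the second version only adds the nodes on the path from the edited leaf up to the root.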
Thanks to its ability to scale and its features, HyperAST opens new research perspectives in terms of temporal code analysis. The goal of this PhD is to propose novel scientific contributions for analyzing large code histories.
The goals are multiple. To reach them, the candidate will leverage HyperAST. The targeted scientific contributions require:
1/ working on polyglot temporal code analysis and adding polyglot support to HyperAST. This requires relating ASTs of different languages stored in a single HyperAST in order to perform code analyses, produce fixes, or co-evolve code.
2/ performing large-scale code analyses to study the coding processes that lead to issues. The learned processes can then be used to build prediction models or to provide fixes or fix ingredients.
3/ exploring the co-evolution challenge on top of HyperAST, for interdependent projects or for libraries and their client projects.
The candidate will work in the DiverSE team, a joint team of IRISA and Inria located in Rennes. DiverSE's research is in the field of software engineering. The team is actively involved in European, French, and industrial projects and is composed of 9 professors/researchers, 20 PhD students, 4 post-docs, and 3 engineers. The main supervisors of the thesis will be Arnaud Blouin and Djamel Eddine Khelladi. The candidate will enrol in the doctoral school in computer science of the University of Rennes 1.
We are looking for exceptional and motivated candidates for this 3-year PhD. The candidate must have (or be about to obtain) a master's or engineering degree in computer science. A mastery of scientific English is necessary. Knowledge of French is not required. Gross monthly salary: around 2050 € (years 1 and 2), then around 2150 € (year 3).
The candidates must send their application to Arnaud Blouin and Djamel Eddine Khelladi (arnaud.blouin at irisa.fr, djamel.khelladi at irisa.fr), with the following documents: