Source Code Analysis and Natural Language Lab

SCANL is a diverse team of scientists dedicated to studying the latent connection between source code behavior and the natural language elements used to describe that behavior.


Program Comprehension and Textual Analysis

There is a strong relationship between the natural language in source code (e.g., in identifiers) and the code's behavior; developers rely on this relationship to understand the code they read every day. We explore it by studying rename refactorings, grammar patterns, and static source code analysis. Our goal is to support stronger techniques for automating identifier naming and to help developers read and comprehend code more quickly. This topic underlies all of our other research. Please see our core research section for our recent work in this area.

Program Transformation and Refactoring

Program transformations allow us to modify code programmatically. These techniques must be safe, customizable, and easy to integrate with today’s software development processes so that developers can, for example, migrate APIs or refactor code. We support transformations both through our research on identifier naming and through the creation of flexible, easy-to-use techniques for creating and applying program transformations.

Static Source Code Analysis

Much of our work relies on static analysis techniques; most frequently, we use the srcML Framework to normalize, transform, and analyze source code. Our lab supports several tools built on srcML and hosts Dr. Emily Hill’s natural language framework, SWUM. We are dedicated to providing high-quality research tools and data sets for software research and development. Check our GitHub page regularly to see what we have to offer, and feel free to contact us with questions.