Intensive Software, Construction and Engineering

Scaling Software Engineering to Big Code with Machine Learning


One of the aims of modern software engineering is to automate the process of understanding software artifacts. This course focuses on techniques for analyzing programs to find, e.g., security vulnerabilities and errors, as well as understanding the software development process. You will gain a working knowledge of program analysis and its theoretical underpinning foundations rooted in abstract interpretation. The class starts with lectures on key concepts of program analysis including data flow analysis and abstract interpretation as well as applications to security, bug finding and testing. Topics in dynamic analysis and machine-learning will be covered to explore the tradeoff between soundness and scalability. The second part of the course will emfocuses on reading, presenting and validating research papers.

Workload and Grading

The expected workload is one day a week. Mostly reading and writing. Some opportunities for coding are in the second half but they are optional. The deliverables are:

  • One midterm exam.
  • Two, one-page, paper reviews.
  • One, in-class, presentation, done in groups.
  • One project, done in groups.

The project is the only open-ended task. While grades are besides the point in a research class, they are required and the expectation is for them to fall the range [B, B+, A-, A], where a B is below average, B+ and A- mean expectations were met, and A+ means awesome.

Textbook and Papers

The textbook is Principle of Program Analysis (PPA) by F. Nielson, H.R. Nielson and C. Hankin. (An old version of the text is on ResearchGate here. The newest edition has 476 pages.)

  • Fairness Testing: Testing Software for Discrimination FSE 2017. [link]](https://dl.acm.org/citation.cfm?id=3106277)
  • Do Automatically Generated Unit Tests Find Real Faults? ASE 2015. link
  • QuickChecking Static Analysis Properties STVR 2017. link
Bug finding
  • Finding Broken Promises in Asynchronous JavaScript Programs OOPSLA 2018. link
  • ExceLint: Automatically Finding Spreadsheet Formula Errors OOPSLA 2018. link
  • To Type or Not to Type: Quantifying Detectable Bugs in JavaScript ICSE 2017. link
  • Automated Identification of Security Issues from Commit Messages and Bug Reports FSE 2017. link
  • ZENIDS: Introspective Intrusion Detection for PHP Applications ICSE 2017.. link
  • Active Learning of Points-To Specifications PLDI 2018. link
  • P/Taint: Unified Points-to and Taint Analysis OOPSLA 2017. link
  • Modular static analysis of string manipulations in C programs SAS 2018. link
Big Code
  • Automatically Diagnosing and Repairing Error Handling Bugs in C FSE 2017. link
  • Recovering Clear, Natural Identifiers from Obfuscated JS Names FSE 2017. link


Working in groups students, you will select a research paper, obtain its data and code and perform a reproduction study, write a report, and give a lecture.

Reproduction study

There are three alternatives of increasing difficulty. (1) A repetition is the simplest kind of reproduction where you simply run the code provided by the authors on their data and report on the results. (2) A reanalysis starts from the author’s code and data, but attempts to evaluate if the results are robust in the presence of changes to the methodology. (3) And finally, an independent reproduction is a reimplementation of the original in different conditions and with different data with the aim of determining if the same conclusions can still be obtained.


Your report should do the following: (1) summarize the paper’s approach and results, (2) describe how you validated the results, and (3) summarize what you found. In simple cases, when all is as claimed, the report can be just a couple of pages. If there are interesting differences a report could be extended into a full paper.


A lecture is 90 minutes long including time for questions and discusion. The goal is to give an in-depth presentation of the paper, placing it in context of previous research and talk about is strenghs and limitation.


“Jan Vitek is the most horrible professor I’ve ever met. […] he made students read and learn all by themselves. […] He kind of enjoys watching students working so hard over night and night” Be warned.