Towards Partially Evaluating Symbolic Interpreters for All

by Shangyin Tan, Guannan Wei, Tiark Rompf

1 Abstract

Symbolic execution is a program analysis technique to automatically explore the execution space of programs by treating some inputs symbolically. To efficiently perform symbolic execution, one emerging way is to construct a compiler that translates input programs to symbolic programs without the interpretation overhead. Previous work has explored compiling nondeterministic symbolic execution by partially evaluating a symbolic interpreter.

In this paper, we follow a semantics-first approach and investigate compiling concolic execution and backward symbolic execution by multi-stage programming. In particular, we construct variants of staged symbolic interpreters that can be partially evaluated using the Lightweight Modular Staging (LMS) framework. We demonstrate our approach using a simple low-level intermediate representation (IR) and evaluate the prototype implementations for the LLVM IR. Our concolic compiler shows comparable performance to SymCC, a state-of-the-art concolic compiler, and our backward symbolic compiler solves tasks that are difficult for forward execution engines. The demonstrated approach shows a unifying methodology that can be applied to compiling diverse flavors of symbolic execution.

[Paper] [Artifact] [Presentation](not available yet)

This short paper was presented at the ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation 2022 (PEPM 2022).

2 Experiment

This section briefly explains how to reproduce our experiments reported in the paper. We conducted all the experiments on an Intel i7-8750H machine running Ubuntu 20.04 with 16 GB main memory. If you have any questions beyond the content in the paper and this page, feel free to contact the authors.

We implement our concolic execution compiler and backward symbolic execution compiler as extensions to the existing symbolic execution compiler LLSC. We make the source code available here.

dev-clean/src/main/scala/sai/conc implements the concolic compiler.
dev-clean/src/main/scala/sai/ccbse implements the backward symbolic execution compiler.

2.1 Concolic Execution Compiler

Our concolic execution compiler is available here, and the benchmark programs for concolic execution is located here. Since our tool operates on the LLVM IR, we provide a script to compile all the C programs to the corresponding LLVM IR.

Benchmarking this tool

We implement the proposed concolic compiler in ConcOpt.scala. To run the concolic compiler, we provide a simple function in Driver.scala:

def runConc(file: String, proj: String, fname: String, args: List[Int])

In this function, file is the path to the target LLVM IR program; proj is the name for the generated folder; fname is the name of the entry function; and args is an optional (concrete) argument list to the target function. Running this function will compile the target LLVM IR source code to C++ code in conc_gen/proj, which perform the concolic execution. Then, we can navigate to this directory and compile the C++ files to an executable.

sai/dev-clean $ cd ./conc_gen/proj
sai/dev-clean/conc_gen/proj $ make -j 4

For simplicity, the name of the executable will be proj, too. Then we can run the executable and obtain the test cases in conc_gen/proj/tests.

Benchmarking SymCC

We evaluate our benchmarks using SymCC’s simple Z3 backend. To compile SymCC with Z3 backend, follow the instructions and when running the cmake command, set the flag -DQSYM_BACKEND=OFF.

The benchmark program set is here. Because the way we accept input is different from SymCC, we modify the input handling part of the program.

To compile the target program with SymCC, run symcc target.c -o target. This yields an executable a.out. We provide the input seeds (binary form) we used to run these programs here. We use the following command to benchmark SymCC:

$ target < seeds/target.in &> target.out

2.2 Backward Symbolic Execution Compiler

Our concolic execution compiler is available here, and thebenchmark programs for concolic execution is located here. As we previously said, we provide a makefile to compile all the programs to LLVM IR at once.

We implement the proposed backward symbolic execution (CCBSE) compiler in Engine.scala. To run the CCBSE compiler, we provide a simple function in Driver.scala:

def runCCBSE(file: String, proj: String, fname: String)

The file and proj are similar to the arguments of the concolic compiler. Instead giving the entry function, we assume the entry function to be main for the CCBSE compiler. fname here is the function contains the target (erroneous) location. In our benchmarks, we implement the target location as a call to the target() function. The runCCBSE function will generate a folder for the desired LLVM IR program file at ccbse_gen/proj that contains all the compiled C++ source code Same as we did in the concolic compiler, we can change to this directory and run make to obtain the executable. This executable performs the call-chain backward symbolic execution. The input leads to the target location will be stored in ccbse_gen/proj/tests.