Towards Partially Evaluating Symbolic Interpreters for All
by Shangyin Tan, Guannan Wei, Tiark Rompf
Symbolic execution is a program analysis technique to automatically explore the execution space of programs by treating some inputs symbolically. To efficiently perform symbolic execution, one emerging way is to construct a compiler that translates input programs to symbolic programs without the interpretation overhead. Previous work has explored compiling nondeterministic symbolic execution by partially evaluating a symbolic interpreter.
In this paper, we follow a semantics-first approach and investigate compiling concolic execution and backward symbolic execution by multi-stage programming. In particular, we construct variants of staged symbolic interpreters that can be partially evaluated using the Lightweight Modular Staging (LMS) framework. We demonstrate our approach using a simple low-level intermediate representation (IR) and evaluate the prototype implementations for the LLVM IR. Our concolic compiler shows comparable performance to SymCC, a state-of-the-art concolic compiler, and our backward symbolic compiler solves tasks that are difficult for forward execution engines. The demonstrated approach shows a unifying methodology that can be applied to compiling diverse flavors of symbolic execution.
This short paper was presented at the ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation 2022 (PEPM 2022).
This section briefly explains how to reproduce our experiments reported in the paper. We conducted all the experiments on an Intel i7-8750H machine running Ubuntu 20.04 with 16 GB main memory. If you have any questions beyond the content in the paper and this page, feel free to contact the authors.
dev-clean/src/main/scala/sai/concimplements the concolic compiler.
dev-clean/src/main/scala/sai/ccbseimplements the backward symbolic execution compiler.
2.1 Concolic Execution Compiler
Our concolic execution compiler is available here, and the benchmark programs for concolic execution is located here. Since our tool operates on the LLVM IR, we provide a script to compile all the C programs to the corresponding LLVM IR.
Benchmarking this tool
We implement the proposed concolic compiler in
ConcOpt.scala. To run the concolic compiler, we provide a simple function in
def runConc(file: String, proj: String, fname: String, args: List[Int])
In this function,
file is the path to the target LLVM IR program;
proj is the name for the generated folder;
fname is the name of the entry function; and
args is an optional (concrete) argument list to the target function. Running this function will compile the target LLVM IR source code to C++ code in
conc_gen/proj, which perform the concolic execution. Then, we can navigate to this directory and compile the C++ files to an executable.
sai/dev-clean $ cd ./conc_gen/proj sai/dev-clean/conc_gen/proj $ make -j 4
For simplicity, the name of the executable will be
proj, too. Then we can run the executable and obtain the test cases in
We evaluate our benchmarks using SymCC’s simple Z3 backend. To compile SymCC with Z3 backend, follow the instructions and when running the
cmake command, set the flag
The benchmark program set is here. Because the way we accept input is different from SymCC, we modify the input handling part of the program.
To compile the target program with SymCC, run
symcc target.c -o target. This yields an executable
a.out. We provide the input seeds (binary form) we used to run these programs here. We use the following command to benchmark SymCC:
$ target < seeds/target.in &> target.out
2.2 Backward Symbolic Execution Compiler
Our concolic execution compiler is available here, and thebenchmark programs for concolic execution is located here. As we previously said, we provide a makefile to compile all the programs to LLVM IR at once.
We implement the proposed backward symbolic execution (CCBSE) compiler in
Engine.scala. To run the CCBSE compiler, we provide a simple function in
def runCCBSE(file: String, proj: String, fname: String)
proj are similar to the arguments of the concolic compiler. Instead giving the entry function, we assume the entry function to be
main for the CCBSE compiler.
fname here is the function contains the target (erroneous) location. In our benchmarks, we implement the target location as a call to the
target() function. The
runCCBSE function will generate a folder for the desired LLVM IR program
ccbse_gen/proj that contains all the compiled C++ source code Same as we did in the concolic compiler, we can change to this directory and run
make to obtain the executable. This executable performs the call-chain backward symbolic execution. The input leads to the target location will be stored in