If you need an introduction to AFL, you have probably missed out on a lot of the instrumented binary fuzzing saga over the past couple of years. afl-fuzz (the fuzzer part of this toolset) is extremely fast, easy to use and requires minimal configuration. Technical details of AFL are available here. All this awesomeness is written in C, a language that I have almost never used. So I wanted to try and understand the implementation, i.e. how the ideas were translated to code in AFL.
Before proceeding further, it is recommended to read through afl compile time instrumentation. Now, what about black-box binaries for which source code is unavailable? Instrumentation is used to
One way is to parse the given binary and rewrite it along with the instrumentation (afl-dyninst).
QEMU is also a process emulator that lets you run binaries built for different architectures on a single machine by dynamically translating their code.
Read qemu binary translation (All subsequent qemu internals' images are taken from this presentation). QEMU can
Let us walk through an abstracted qemu execution run
We need to find the function in qemu that gets called for executing a translated block. Keep in mind that qemu and the binary run in the same process, so this allows us to write instrumentation in C and patch qemu source.
    cur_loc = (cur_loc >> 4) ^ (cur_loc << 8);
    cur_loc &= MAP_SIZE - 1;

    /* Implement probabilistic instrumentation by looking at scrambled block
       address. This keeps the instrumented locations stable across runs. */
    if (cur_loc >= afl_inst_rms) return;

    afl_area_ptr[cur_loc ^ prev_loc]++;
    prev_loc = cur_loc >> 1;
Same as compile time instrumentation
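To make the edge-logging step concrete, here is a minimal standalone sketch of the same scramble-and-count logic. It is not the patch itself: `trace_map`, `inst_threshold`, `scramble_pc`, `log_block`, `reset_trace` and `count_hit_slots` are hypothetical names, the map is a plain array rather than AFL's shared memory, and the map size of 2^16 is an assumption.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE (1u << 16)                /* assumed map size (power of two) */

static uint8_t  trace_map[MAP_SIZE];       /* stand-in for afl_area_ptr */
static uint32_t inst_threshold = MAP_SIZE; /* stand-in for afl_inst_rms: log every block */
static uint32_t prev_loc;                  /* scrambled id of the previous block, shifted */

/* Scramble a block address into a map index, as in the snippet above. */
static uint32_t scramble_pc(uint64_t pc) {
  uint64_t h = (pc >> 4) ^ (pc << 8);
  return (uint32_t)(h & (MAP_SIZE - 1));
}

/* Record the edge "previous block -> current block" in the map. */
static void log_block(uint64_t pc) {
  uint32_t cur = scramble_pc(pc);
  if (cur >= inst_threshold) return;       /* probabilistic instrumentation */
  trace_map[cur ^ prev_loc]++;
  prev_loc = cur >> 1;                     /* shift so A->B and B->A differ */
}

/* Clear the map and forget the previous block. */
static void reset_trace(void) {
  memset(trace_map, 0, sizeof trace_map);
  prev_loc = 0;
}

/* Count map slots that were hit at least once. */
static unsigned count_hit_slots(void) {
  unsigned n = 0;
  for (uint32_t i = 0; i < MAP_SIZE; i++) n += trace_map[i] != 0;
  return n;
}
```

Calling `log_block()` with two different block addresses in sequence marks two slots: one for entering the first block and one for the edge between them. Because `prev_loc` is shifted right by one bit, visiting the same two blocks in the opposite order lands in a different slot.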
QEMU specific tweaks
cpu_tb_exec() is responsible for executing a TB, and information such as the pc address is available there. Recall that compile time instrumentation used random constants to identify locations for tracing; here we can use the pc address of the basic block as that constant.
    /* Execute a TB, and fix up the CPU state afterwards if necessary */
    static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
    {
        CPUArchState *env = cpu->env_ptr;
        uintptr_t ret;
        TranslationBlock *last_tb;
        int tb_exit;
        uint8_t *tb_ptr = itb->tc.ptr;

        /* AFL Instrumentation here */
        if (itb->pc == afl_entry_point) {
            afl_setup();
            afl_forkserver(cpu);
        }
        afl_maybe_log(itb->pc);
        /* End AFL Instrumentation here */

        qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
                               "Trace %d: %p ["
                               TARGET_FMT_lx "/" TARGET_FMT_lx "/%#x] %s\n",
                               cpu->cpu_index, itb->tc.ptr,
                               itb->cs_base, itb->pc, itb->flags,
                               lookup_symbol(itb->pc));
        ....
tb_find() is responsible for finding a TB based on the current state. This function takes care of the cache lookup and calls tb_gen_code() in case translation is required. We can add afl_request_tsl() here to signal the forkserver to translate this block and keep it in its memory for future clones. The parameters required for translation are packed into a struct and passed along.
    struct afl_tsl t;

    if (!afl_fork_child) return;

    t.pc      = pc;
    t.cs_base = cb;
    t.flags   = flags;

    if (write(TSL_FD, &t, sizeof(struct afl_tsl)) != sizeof(struct afl_tsl))
        return;
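The request above is one half of a small pipe protocol: the child writes a fixed-size struct, and the forkserver reads it back and translates the block. The sketch below illustrates that exchange over an ordinary pipe. The names `tsl_request`, `send_tsl_request`, `recv_tsl_request` and `drain_tsl_requests` are hypothetical, the field types are assumptions, and the reader only records the requested pc where the real forkserver would translate the block.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical mirror of struct afl_tsl; the field types are assumptions. */
struct tsl_request {
  uint64_t pc;
  uint64_t cs_base;
  uint64_t flags;
};

/* Child side: send one translation request. Returns 0 on success, -1 otherwise. */
static int send_tsl_request(int fd, uint64_t pc, uint64_t cs_base, uint64_t flags) {
  struct tsl_request t = { pc, cs_base, flags };
  return write(fd, &t, sizeof t) == (ssize_t)sizeof t ? 0 : -1;
}

/* Parent side: read one full request. Returns 1 if one was read, 0 on EOF or error. */
static int recv_tsl_request(int fd, struct tsl_request *out) {
  return read(fd, out, sizeof *out) == (ssize_t)sizeof *out;
}

/* Parent side: drain requests until the writer closes the pipe or the buffer
   is full. A real forkserver would translate each block here; this sketch
   only records the requested pc values. Returns the number of requests seen. */
static int drain_tsl_requests(int fd, uint64_t *pcs, int max) {
  struct tsl_request t;
  int n = 0;
  while (n < max && recv_tsl_request(fd, &t)) pcs[n++] = t.pc;
  return n;
}
```

Fixed-size messages keep the reader simple: every successful read is exactly one request, and a short read signals that the child has gone away.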
elfload.patch to record afl_entry_point, afl_start_code & afl_end_code. These attributes are used in afl_maybe_log() for some bounds checks.
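As a rough illustration of how such recorded values could be used, the sketch below filters block addresses against the code range and detects the entry point. The struct `code_bounds` and the helpers `in_code_range` and `is_entry_block` are hypothetical, and treating the range as inclusive on both ends is an assumption.

```c
#include <stdint.h>

/* Hypothetical container for the three values recorded at ELF load time. */
struct code_bounds {
  uint64_t entry_point;   /* stand-in for afl_entry_point */
  uint64_t start_code;    /* stand-in for afl_start_code */
  uint64_t end_code;      /* stand-in for afl_end_code */
};

/* Only blocks inside the target's own code range are worth logging;
   anything outside it (for example shared libraries) is skipped. */
static int in_code_range(const struct code_bounds *b, uint64_t pc) {
  return pc >= b->start_code && pc <= b->end_code;
}

/* The forkserver is started when execution first reaches the entry point. */
static int is_entry_block(const struct code_bounds *b, uint64_t pc) {
  return pc == b->entry_point;
}
```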
syscall.patch to pass the right pid and tgid in case of SIGABRT on the forkserver.
These are just plain C ports of the existing assembly.
PS: Considering what QEMU is capable of, I was amazed by the simplicity of this patch which required no major modifications to afl-fuzz.