Analyzing Program Execution Flow Through Disassembly of main()
Compilation Stages of a Program
Knowing what each compilation stage produces makes the disassembly easier to interpret. For a source file such as helloworld.c, gcc passes through several intermediate stages (the intermediate files can be kept with -save-temps):
helloworld.c # Source code
helloworld.i # Preprocessed source after macro expansion
helloworld.s # Generated assembly code
helloworld.o # Relocatable object file
helloworld # Final executable after linking
Disassembling the Executable
The objdump utility can disassemble both object files and executables:
# Disassemble the final executable
objdump -S -M intel,i386 helloworld
# Disassemble the relocatable object file
objdump -S -M intel,i386 helloworld.o
A key difference emerges: the .o file contains only the code compiled from helloworld.c. It has no .init section and no _start() entry point; the linker adds these from the C runtime startup objects when it builds the executable. The executable therefore carries the bootstrap code that runs before main() is called.
Debugging with GDB
Using gdb with the layout asm command displays assembly code that matches the objdump output. To trace main()'s execution, start the program with starti, set a breakpoint at the first instruction of main (break *main), and continue.
Observe the initial stack pointer (esp) and base pointer (ebp):
(gdb) info registers esp ebp
The first instruction, lea ecx, [esp+4], loads the address of argc into ecx. The +4 offset skips the return address on the stack.
Next, and esp, 0xfffffff0 clears the low four bits of the stack pointer, rounding it down to a 16-byte boundary. Aligned SSE memory operands require this alignment.
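The masking step can be sketched in C. The function names and the sample stack pointer value used in the usage note are assumptions for illustration; they do not come from the original debugging session.

```c
#include <stdint.h>

/* Mirrors "and esp, 0xfffffff0": clear the low four bits so the
 * result is the nearest 16-byte boundary at or below esp. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}

/* Number of bytes the stack pointer moves down because of the mask
 * (hypothetical helper, equal to the low four bits of esp). */
uint32_t alignment_padding(uint32_t esp)
{
    return esp & 0xfu;
}
```

For an assumed esp of 0xffffd0fc, align_down_16 yields 0xffffd0f0 and alignment_padding yields 12. Because the stack grows downward, rounding down never overwrites data that is already on the stack.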
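The instruction push DWORD PTR [ecx-4] pushes a copy of main()'s return address onto the newly aligned stack. Because ecx was set to the original esp + 4, [ecx-4] is the slot at the original esp, which holds the return address. Together with the push ebp that follows, this makes the realigned frame look like a conventional one to debuggers and unwinders. The original stack pointer itself is recovered later from ecx.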
Saving the caller's frame: push ebp preserves the previous base pointer, then mov ebp, esp establishes a new frame for main().
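Register preservation: push ecx and push ebx save these registers. ecx holds the address of argc computed earlier (the original esp + 4), which is needed to restore the original stack pointer before returning. ebx is a callee-saved register, so main() must preserve it for its caller if it modifies it; in position-independent code it commonly holds the GOT address for calls through the PLT.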
Allocating local variable space: sub esp, 16 reserves stack space for local variables while maintaining alignment.
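The IA-32 hardware itself only needs the stack to be 4-byte aligned for ordinary pushes and pops. GCC, however, keeps the stack 16-byte aligned at call sites by default so that SSE code can rely on it, and main() realigns explicitly because it does not assume how its caller set up the stack.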
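The stack-pointer arithmetic of the whole prologue can be replayed in a few lines of C. This is a minimal sketch, not an emulator: struct frame_state and simulate_prologue are hypothetical names, only esp, ebp and ecx are tracked, and the entry value of esp is an assumed input.

```c
#include <stdint.h>

/* Registers tracked by this sketch (hypothetical model). */
struct frame_state {
    uint32_t esp;
    uint32_t ebp;
    uint32_t ecx;
};

/* Replays the prologue for an assumed esp at the first instruction
 * of main(). Memory contents and the caller's ebp are not modelled. */
struct frame_state simulate_prologue(uint32_t esp_at_entry)
{
    struct frame_state s = { esp_at_entry, 0, 0 };

    s.ecx = s.esp + 4;      /* lea  ecx, [esp+4]       */
    s.esp &= 0xfffffff0u;   /* and  esp, 0xfffffff0    */
    s.esp -= 4;             /* push DWORD PTR [ecx-4]  */
    s.esp -= 4;             /* push ebp                */
    s.ebp = s.esp;          /* mov  ebp, esp           */
    s.esp -= 4;             /* push ecx                */
    s.esp -= 4;             /* push ebx                */
    s.esp -= 16;            /* sub  esp, 16            */
    return s;
}
```

After the mask, the four pushes and the 16-byte allocation move esp down by 32 bytes in total, so esp is again a multiple of 16 when the body of main() starts. With an assumed entry value of 0xffffd0fc, the sketch gives ecx = 0xffffd100, ebp = 0xffffd0e8 and esp = 0xffffd0d0.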
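Loading the instruction pointer: call 0x80483b0 <__x86.get_pc_thunk.ax> places the address of the next instruction (the return address of the call) into eax, because eip cannot be read directly on IA-32. The code then uses this value as a base for addressing data.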
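The same trick can be imitated from C, assuming GCC or Clang: the builtin __builtin_return_address(0) returns the address the current function will return to, which is what the thunk reads from the top of the stack. The function name get_pc is hypothetical, and this is an analogy rather than the compiler's actual thunk.

```c
/* Returns the address of the instruction that follows the call to
 * this function, much like __x86.get_pc_thunk.ax does for its caller.
 * noinline keeps the call (and therefore a return address) in place.
 * Both the attribute and the builtin are GCC/Clang extensions. */
__attribute__((noinline)) void *get_pc(void)
{
    return __builtin_return_address(0);
}
```

Calling get_pc() from two different places in a program yields two different addresses, each just past the corresponding call instruction.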
Variable initialization: The code stores constants into local variable slots:
mov DWORD PTR [ebp-0x14], 0x3 ; int val_a = 3
mov DWORD PTR [ebp-0x10], 0x5 ; int val_b = 5
Check memory at these locations:
(gdb) x/xw $ebp-0x14
(gdb) x/xw $ebp-0x10
Addition operation: Values are loaded into registers, summed, and stored:
mov ecx, DWORD PTR [ebp-0x14]
mov edx, DWORD PTR [ebp-0x10]
add edx, ecx
mov DWORD PTR [ebp-0xc], edx ; int sum = val_a + val_b
Preparing for printf: The stack pointer is adjusted by 8 bytes (sub esp, 8) to maintain alignment before pushing arguments. Because cdecl passes arguments right to left, the sum variable is pushed first (push DWORD PTR [ebp-0xc]), followed by the format string address.
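The reason for the 8 bytes of padding becomes visible when the adjustments are added up. The sketch below assumes esp is 16-byte aligned before the sequence starts; esp_before_call is a hypothetical name and the sample value is illustrative.

```c
#include <stdint.h>

/* Replays the stack adjustments made just before "call printf". */
uint32_t esp_before_call(uint32_t esp)
{
    esp -= 8;   /* sub  esp, 8: padding only                  */
    esp -= 4;   /* push DWORD PTR [ebp-0xc]: second argument  */
    esp -= 4;   /* push <format string address>: first one    */
    return esp;
}
```

The total is 8 + 4 + 4 = 16 bytes, so an esp that was a multiple of 16 is still a multiple of 16 at the call instruction. For an assumed starting value of 0xffffd0d0 the result is 0xffffd0c0.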
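Locating the format string: The address that is pushed is computed as eax - 0x1fec, that is, relative to the base value obtained from the thunk (the compiler typically forms it with lea before the push). The result is the address of the string literal. Find it in the disassembly: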
.rodata 080484c0: 2f 2f 20 25 64 0a 00 ; "// %d\n"
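The seven bytes in that dump can be checked against the string literal with a short C sketch. The array and function names are hypothetical; the byte values are copied from the line above.

```c
#include <stddef.h>

/* Bytes as they appear in .rodata: '/', '/', ' ', '%', 'd', '\n', NUL. */
static const unsigned char rodata_bytes[] = {
    0x2f, 0x2f, 0x20, 0x25, 0x64, 0x0a, 0x00
};

/* Views the raw bytes as a C string. */
const char *format_string(void)
{
    return (const char *)rodata_bytes;
}

/* Size of the literal in memory, including the terminating NUL. */
size_t format_string_size(void)
{
    return sizeof rodata_bytes;
}
```

Comparing format_string() with "// %d\n" using strcmp returns 0, and the stored size is 7 bytes: six visible characters plus the terminator.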
Check the runtime memory mapping to verify:
(gdb) info proc mappings
Calculate and print the address:
(gdb) print $eax - 0x1fec
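The same subtraction can be written in C to cross-check the value gdb prints. The function names are hypothetical. The value 0x0804a4ac used in the usage note is not taken from the original session; it is simply the eax value that would have to hold if the string sits at 0x080484c0 as in the dump.

```c
#include <stdint.h>

/* Address of the format string, given the base value in eax. */
uint32_t format_string_address(uint32_t eax)
{
    return eax - 0x1fecu;
}

/* Inverse: the eax value implied by a known string address. */
uint32_t eax_for_string_at(uint32_t string_address)
{
    return string_address + 0x1fecu;
}
```

With the string at 0x080484c0, eax_for_string_at gives 0x0804a4ac, and feeding that back into format_string_address returns 0x080484c0. If gdb reports a different eax, the string address changes by the same amount.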
The call to printf produces the output. Afterward, add esp, 16 removes the two 4-byte arguments and the 8 bytes of padding from the stack.
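Putting the pieces together, the C code behind these instructions plausibly looks like the sketch below. It is a reconstruction from the disassembly comments, not the original helloworld.c. The function name format_sum is hypothetical, and snprintf is swapped in for printf so that the formatted text lands in a buffer and can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Mirrors the body of main(): two constants, their sum, and one
 * formatted output using the "// %d\n" format string.
 * Returns the number of characters snprintf produced. */
int format_sum(char *out, size_t out_size)
{
    int val_a = 3;              /* stored at [ebp-0x14] */
    int val_b = 5;              /* stored at [ebp-0x10] */
    int sum = val_a + val_b;    /* stored at [ebp-0xc]  */

    return snprintf(out, out_size, "// %d\n", sum);
}
```

Called with a 16-byte buffer, format_sum writes the five characters "// 8" plus a newline into it and returns 5.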
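Function return: eax is set to zero as main()'s return value, and the stack frame is torn down. The essential steps are shown below in simplified form; with the realigning prologue described above, the compiler also restores ecx and ebx and recovers the original esp from ecx before ret.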
mov esp, ebp ; Deallocate local variables
pop ebp ; Restore caller's base pointer
ret ; Return to caller
This analysis reveals how the compiler translates C code into machine instructions, manages the stack, and interfaces with library functions.