Introduction

Compiling and executing a C program is a multi-stage process. In this post I’ll walk through each stage of compiling and executing the following C program, saved as test.c:

#include <stdio.h>

#define LOOP_TIMES 10

int main(int argc, char *argv[])
{
    for (int i = 0; i < LOOP_TIMES; i++)
    {
        printf("Hello World #%i!\n", i);
    }
    return 0;
}

All tests were performed on Debian Bullseye (AMD64); intermediate results may vary depending on the OS, compiler version and hardware.

Preprocessing

The first stage of compilation is called preprocessing. In this stage, the C preprocessor handles preprocessor directives (lines starting with a # character). These directives form a simple macro language with its own syntax and semantics, used mainly to reduce repetition in source code and to select which code gets compiled: lines with #include are replaced by the contents of the referenced file (with different search rules for names in quotes versus those in angle brackets), names introduced with #define are systematically replaced with their definitions throughout the program, and #if and its relatives are evaluated to conditionally omit code.

To get the result of the preprocessing stage, pass the -E option to gcc:

gcc -E -o test.i test.c

On my machine, the output of the preprocessing stage looks like this:

// ... omitted for brevity
# 873 "/usr/include/stdio.h" 3 4

# 2 "test.c" 2




# 5 "test.c"
int main(int argc, char *argv[])
{
    for (int i = 0; i < 10; i++)
    {
        printf("Hello World #%i!\n", i);
    }
    return 0;
}

Compilation

In this stage, the actual compiler translates the preprocessed source into assembly language, an intermediate, human-readable form. The existence of this step allows C code to contain inline assembly instructions and allows different assemblers to be used. To get the result of the compilation stage, pass the -S option to gcc:

gcc -S -o test.s test.i

On my machine, the output of the compilation stage looks like this:

    .file	"test.c"
    .text
    .section	.rodata
.LC0:
    .string	"Hello World #%i!\n"
    .text
    .globl	main
    .type	main, @function
main:
.LFB0:
    .cfi_startproc
    pushq	%rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq	%rsp, %rbp
    .cfi_def_cfa_register 6
    subq	$32, %rsp
    movl	%edi, -20(%rbp)
    movq	%rsi, -32(%rbp)
    movl	$0, -4(%rbp)
    jmp	.L2
.L3:
    movl	-4(%rbp), %eax
    movl	%eax, %esi
    leaq	.LC0(%rip), %rdi
    movl	$0, %eax
    call	printf@PLT
    addl	$1, -4(%rbp)
.L2:
    cmpl	$9, -4(%rbp)
    jle	.L3
    movl	$0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size	main, .-main
    .ident	"GCC: (Debian 10.2.1-6) 10.2.1 20210110"
    .section	.note.GNU-stack,"",@progbits

Assembly

During this stage, the assembler converts the assembly source into an unlinked relocatable object file in ELF format. The output contains the actual machine instructions to be run by the target processor. However, a relocatable object file is not executable yet: it may require definitions from other files, including libraries (here, printf from the C library). To get the result of the assembly stage, pass the -c option to gcc:

gcc -c -o test.o test.s

or invoke the assembler, as, directly:

as -o test.o test.s

Either command produces an unlinked relocatable object file in ELF format named test.o. We can inspect its ELF sections with readelf -a test.o | less, and dump the contents of a specific section with readelf -x .text test.o.

Linking

The object file generated in the assembly stage is composed of machine instructions that the processor understands, but some pieces of the program are still missing: references to symbols defined elsewhere, such as printf, are unresolved, and final addresses have not been assigned yet. The linker resolves all the references in a set of object files or archives, so that functions in one piece can successfully call functions in another, and then produces an executable. To get the final executable, use the following command; the -v option prints detailed information about the linking process.

gcc -v -o test.elf test.o

We can also invoke the linker, ld, manually to get the final executable:

GLIBC_LIB_DIR="/usr/lib/x86_64-linux-gnu"
GCC_LIB_DIR="/usr/lib/gcc/x86_64-linux-gnu/10"
STARTFILES="$GLIBC_LIB_DIR/crt1.o $GLIBC_LIB_DIR/crti.o"
ENDFILES="$GLIBC_LIB_DIR/crtn.o"
ld -o test.elf -dynamic-linker /lib64/ld-linux-x86-64.so.2 $STARTFILES test.o $GLIBC_LIB_DIR/libc.so $ENDFILES

The final executable is also an ELF file.

ELF

Executable and Linkable Format (ELF) is a common standard file format used on Unix-like systems for executable files, object code, shared libraries, and core dumps.

Execution

At first glance, it seems that a program starts executing at int main(int argc, char *argv[]), but that is not quite true.

Load Executable with Interpreter

When we run a program, the shell makes an execve system call to the kernel. The kernel allocates a struct linux_binprm describing the program to execute, opens the executable file, and looks for a binary format handler that recognizes it; since our executable is in ELF format, it is handled by the ELF loader.

Load Dynamic Linker

The ELF loader reads the program header table of the executable. For a dynamically linked program, this table contains a PT_INTERP entry holding the path of the dynamic linker. We can use readelf --program-headers test.elf to see the program header table and readelf -x .interp test.elf to see the value of INTERP; on my machine it is /lib64/ld-linux-x86-64.so.2. The kernel then opens the dynamic linker, itself an ELF file, and loads it into memory as well.

Auxiliary Vector

The kernel uses a special structure called the auxiliary vector, or auxv, to communicate with the dynamic linker. The kernel places the auxv on the stack of the newly created program, just after the argument and environment pointers, so when the dynamic linker starts it can use its stack pointer to find all the startup information it needs. The auxv contains system-specific information that may be required, such as the size of a virtual memory page on the system or the hardware capabilities. We can ask the dynamic linker to print the auxv by setting the environment variable LD_SHOW_AUXV=1, e.g. LD_SHOW_AUXV=1 ./test.elf.

Call Dynamic Linker with Program Entry Point

The kernel reads the e_entry field from the ELF header of our program, which contains the entry point address; by default this is the address of the symbol _start. We can examine the entry point with objdump -f test.elf, and we can use ld's --entry=<symbol name> option to make another symbol the entry point.

The kernel adds the value of e_entry to the auxv as AT_ENTRY, then starts execution at the dynamic linker's own entry point; the dynamic linker later jumps to the program's entry point it finds in AT_ENTRY.

Dynamic Linker

Investigating the dynamic linker with objdump -f /lib64/ld-linux-x86-64.so.2 and objdump --disassemble --section=.text /lib64/ld-linux-x86-64.so.2, we find that objdump labels the entry point as _dl_rtld_di_serinfo. That label is only the nearest exported symbol, because ld.so ships without its local symbols; the code at the entry point is the dynamic linker's own _start (defined by the RTLD_START macro in glibc's sysdeps/x86_64/dl-machine.h), which calls _dl_start. The dynamic linker performs linking on the fly, loading the libraries listed in the dynamic section of our executable, and then continues execution at our program's entry point, which was passed in via AT_ENTRY.

Kernel Library

System calls made by trapping into the kernel are relatively slow. To reduce this overhead, the kernel maps a small shared library, the vDSO (ref: #1, #2, #3), into the address space of every newly created process; it contains functions that make certain system calls for you, or avoid them entirely. When the kernel starts the dynamic linker, it adds an entry AT_SYSINFO_EHDR to the auxv (ref: #1, #2) holding the address in memory where this special kernel library lives. When the dynamic linker starts, it looks for the AT_SYSINFO_EHDR pointer and, if found, loads that library for the program. The program has no idea this library exists; this is a private arrangement between the dynamic linker and the kernel.

Programmers usually make system calls indirectly by calling functions in the standard C library. The C library can check whether the special kernel library is loaded and, if so, use the functions within it to make system calls. If the kernel determines the hardware is capable, these use the fast system call method.

The role of _start function

As you might have noticed, in the linking section we had to include some extra files. This is because the symbol _start is defined in crt1.o (some systems use crt0.o, while others use crt1.o, and a few even use crt2.o or higher). _start takes care of bootstrapping the initial execution of the program, e.g. setting up arguments and preparing environment variables for program execution. What exactly that entails is highly dependent on the libc implementation; these objects are provided by each libc implementation and cannot be mixed with those of other implementations.

The following is the disassembly of _start, obtained with objdump --disassemble=_start test.elf:

0000000000401040 <_start>:
          401040:	      31 ed                	xor    %ebp,%ebp
          401042:	      49 89 d1             	mov    %rdx,%r9
          401045:	      5e                   	pop    %rsi
          401046:	      48 89 e2             	mov    %rsp,%rdx
          401049:	      48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
          40104d:	      50                   	push   %rax
          40104e:	      54                   	push   %rsp
          40104f:	      49 c7 c0 10 11 40 00 	mov    $0x401110,%r8        # __libc_csu_fini
          401056:	      48 c7 c1 b0 10 40 00 	mov    $0x4010b0,%rcx       # __libc_csu_init
          40105d:	      48 c7 c7 71 10 40 00 	mov    $0x401071,%rdi       # our main function
          401064:	      ff 15 86 2f 00 00    	callq  *0x2f86(%rip)        # 403ff0 <__libc_start_main@GLIBC_2.2.5>
          40106a:	      f4                   	hlt    

On glibc 2.31, _start takes care of very early ABI requirements (clearing the frame pointer %rbp and aligning %rsp to 16 bytes), sets up the argc/argv/env values, and then passes pointers to __libc_csu_init, __libc_csu_fini and main to __libc_start_main, which does more general bootstrapping before finally calling the real main function.

The implementation of __libc_start_main is quite complicated, as it needs to be portable across the very wide range of systems and architectures that glibc runs on. It does a number of specific things related to setting up the C library that most programmers don’t need to worry about.

Initialization and Termination Routines

init and fini are two special pieces of code in shared libraries (and executables) that may need to be called before the library starts and before it is unloaded, respectively. This is useful for library programmers who need to set up variables when the library is started, or to clean up at the end. __libc_start_main calls __libc_csu_init before calling our main function, and registers __libc_csu_fini with __cxa_atexit as a callback to be called before the program exits. __libc_csu_init and __libc_csu_fini simply loop over the lists of init and fini functions and invoke them.

To traverse the list of init functions, the linker defines two symbols, __init_array_start and __init_array_end, during the linking process; they appear in the ELF symbol table .symtab.

We can use __attribute__((constructor)) and __attribute__((destructor)) (ref: #1) to add initialization and termination routines to our program, e.g.

#include <stdio.h>

void __attribute__((constructor)) program_init(void)
{
    printf("init\n");
}

void __attribute__((destructor)) program_fini(void)
{
    printf("fini\n");
}

int main(void)
{
    printf("main\n");
    return 0;
}

In newer releases of glibc, the way fini routines are invoked was changed as part of this commit.

Call Main Function

Once __libc_start_main has completed initialization, it finally calls the main function! Recall that the kernel placed the argument and environment pointers on the stack and _start passed them along; this is how main gets its argc, argv[] and envp[] arguments.

Exit

When main returns, __libc_start_main calls void exit(int status) with main’s return value as the exit status. exit first runs the functions registered with atexit/__cxa_atexit (including __libc_csu_fini) and flushes stdio streams, then triggers the exit_group system call (ref: #1, #2, #3, #4, #5, #6), which immediately terminates the current process.

Writing program without startfiles

Now that we know how main gets called, we can provide our own _start function that calls main() directly.

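#include <stdio.h>
#include <stdlib.h>

#define LOOP_TIMES 10

int main(void); /* declare main before _start uses it */

/* The kernel enters _start with %rsp 16-byte aligned rather than as a normal
   call would leave it, so ask GCC (x86) to realign the stack, as the
   and $0xfffffffffffffff0,%rsp in glibc's _start does. */
__attribute__((force_align_arg_pointer)) void _start(void)
{
    exit(main());
}

int main(void)
{
    for (int i = 0; i < LOOP_TIMES; i++)
    {
        printf("Hello World #%i!\n", i);
    }
    return 0;
}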

Then we tell gcc not to link the standard startup files, so that our own _start is used:

gcc -nostartfiles -o test.elf test.c

We can also manually invoke ld:

gcc -c -o test.o test.c
GLIBC_LIB_DIR="/usr/lib/x86_64-linux-gnu"
GCC_LIB_DIR="/usr/lib/gcc/x86_64-linux-gnu/10"
ld -o test.elf -dynamic-linker /lib64/ld-linux-x86-64.so.2 test.o $GLIBC_LIB_DIR/libc.so

Reference