Fading Coder

One Final Commit for the Last Sprint

Understanding C Program Compilation and Preprocessing

Tech May 14 1

Translation and Execution Environments

In any ANSI C implementation, there are two distinct environments:

  • The translation environment, where source code is converted into executable machine instructions
  • The execution environment, which actually runs the code
When a source file such as source.c needs to produce output on screen, it must first become an executable program (an .exe file on Windows; on Linux, gcc names it a.out by default). Converting the .c file into an executable is the job of the translation environment, which consists of compilation and linking. Compilation itself is divided into three phases: preprocessing, compilation proper, and assembly. The result of translation is an executable file.

In the execution environment, the program is loaded into memory, uses a stack for local variables and return addresses, computes results, and prints output to the screen if requested.

Detailed Compilation and Linking

Background Knowledge

Each source file (.c) is compiled separately into its own object file. The linker then combines the object files with any required libraries to create a single executable program. Functions like printf() are library functions stored in link libraries, so when a program calls functions defined elsewhere, the linker must pull in the libraries that provide them.

An IDE (integrated development environment) such as VS2019 bundles editing, compilation, linking, and debugging tools. More precisely:
  • Editor for editing functionality
  • Compiler (cl.exe in VS2019) for compilation
  • Linker (link.exe in VS2019) for linking
  • Debugger for debugging functionality

Translation Environment

Let's examine compilation and linking on Linux (CentOS 7), using the vim editor and the gcc compiler.

Compilation
Consider these source files:

math.c
#include <stdio.h>

int calculate(int x, int y)
{
    return x + y;
}
main.c
#include "math.c"

int main()
{
    int num1 = 10;
    int num2 = 20;
    
    int result = calculate(num1, num2);
    printf("result = %d\n", result);
    return 0;
}
Preprocessing (Macro Expansion)
The preprocessing stage handles directives beginning with #. Using gcc -E stops compilation after preprocessing:

gcc -E main.c -o main.i
The preprocessor performs:
  • Header file expansion
  • Comment removal
  • Macro symbol substitution
For example, with added macros and comments:

#include "math.c"

// Macro definition
#define LIMIT 100

int main()
{
    int value = LIMIT;
    int num1 = 10;
    int num2 = 20;
    
    int result = calculate(num1, num2);
    printf("result = %d\n", result + value);
    
    return 0;
}
After preprocessing, comments disappear and LIMIT is replaced with 100.

Compilation (Assembly Generation)
Using gcc -S stops after compilation, generating assembly code:

gcc -S main.i
This creates main.s containing assembly instructions. The compilation process includes:
  • Lexical analysis
  • Syntax analysis
  • Semantic analysis
  • Symbol summarization (collecting global symbols for the symbol table)
Assembly (Machine Code Generation)
Using gcc -c converts assembly to machine code, creating an object file:

gcc -c main.s
This generates main.o (or .obj in Windows), containing binary instructions and a symbol table.

Linking
Let's modify the code to separate the function declaration:

math.h
#pragma once
#include <stdio.h>

#define LIMIT 100
int calculate(int x, int y);
math.c
int calculate(int x, int y)
{
    return x + y;
}
main.c
#include "math.h"

int main()
{
    int num1 = 10;
    int num2 = 20;
    
    int result = calculate(num1, num2);
    printf("result = %d\n", result);
    return 0;
}
Now compile both source files:

gcc math.c main.c
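The linker merges the corresponding sections of the object files, then merges and relocates their symbol tables so that every symbol reference resolves to a final address. The result is the executable, which gcc names a.out by default.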

Execution Environment

Program execution involves:
  1. Loading the program into memory (done by the operating system where one exists, otherwise arranged manually)
  2. Calling the main function
  3. Executing code using a runtime stack for local variables and return addresses, and static memory for variables with persistent values
  4. Terminating the program (normally or unexpectedly)
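
The difference between the runtime stack and static memory in step 3 can be sketched with two counters. This is a minimal illustration; the function names count_with_static and count_with_local are hypothetical and not part of the original example.

```c
/* Hypothetical helper: the counter lives in static memory, so its value
   persists from one call to the next. */
int count_with_static(void)
{
    static int calls = 0;   /* initialized once, not on every call */
    calls++;
    return calls;
}

/* Hypothetical helper: the counter lives on the runtime stack and is
   created again on every call. */
int count_with_local(void)
{
    int calls = 0;          /* fresh variable for each call */
    calls++;
    return calls;
}
```

Calling count_with_static() repeatedly yields 1, 2, 3, and so on, while count_with_local() returns 1 every time.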

Preprocessing Details

Predefined Symbols

C provides predefined symbols that describe the file being compiled and the build itself:

__FILE__      // Source file being compiled
__LINE__      // Current line number
__DATE__      // Compilation date
__TIME__      // Compilation time
__STDC__      // 1 if the compiler conforms to ANSI C
Example usage:

#include <stdio.h>

int main()
{
    printf("File: %s\n", __FILE__);
    printf("Line: %d\n", __LINE__);
    printf("Date: %s\n", __DATE__);
    printf("Time: %s\n", __TIME__);
    return 0;
}
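
A common use of these symbols is a logging macro that records where it was called. The sketch below assumes a hypothetical LOG macro and log_at helper; neither comes from the original article.

```c
#include <stdio.h>

/* Hypothetical macro: __FILE__ and __LINE__ are expanded where the macro
   is used, so each message reports the caller's location. */
#define LOG(msg) log_at(__FILE__, __LINE__, (msg))

/* Writes "file:line: message" to stderr and returns the line number. */
int log_at(const char *file, int line, const char *msg)
{
    fprintf(stderr, "%s:%d: %s\n", file, line, msg);
    return line;
}

/* Two uses on adjacent lines report adjacent line numbers. */
int line_gap(void)
{
    int first = LOG("first message");
    int second = LOG("second message");
    return second - first;   /* 1, because the calls sit on adjacent lines */
}
```

Wrapping the symbols in a macro matters here: written inside log_at itself, __LINE__ would always report the same line of the helper.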

#define Directive

Defining Identifiers
Basic syntax: #define name stuff

Examples:

#define MAXIMUM 1000
#define REG register
#define INFINITE_LOOP for(;;)
#define CASE break;case
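Note: Don't end a #define with a semicolon. The semicolon becomes part of the replacement text and is pasted into every use, which can break expressions and if/else statements.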
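
To see why the semicolon matters, consider a use of MAXIMUM inside a condition. This is a small sketch; clamp_to_maximum is a hypothetical function added for illustration.

```c
/* Correct: no trailing semicolon in the definition. */
#define MAXIMUM 1000

/* With "#define MAXIMUM 1000;" the condition below would expand to
   "if (value > 1000;)" and fail to compile. */
int clamp_to_maximum(int value)
{
    if (value > MAXIMUM)
        return MAXIMUM;
    return value;
}
```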

Defining Macros
Macros resemble functions but perform text replacement:

#define SQUARE(x) x * x
Problem with precedence:

int result = SQUARE(5 + 1); // Becomes 5 + 1 * 5 + 1 = 11
Solution: Use parentheses:

#define SQUARE(x) ((x) * (x))
#define DOUBLE(x) ((x) + (x))
Replacement Rules
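  1. When a macro is invoked, its arguments are checked for #define symbols, and those are replaced first
  2. The replacement text is inserted at the original location, with each parameter name replaced by its argument
  3. The result is scanned again for #define symbols, and the process repeats
Notes:
  • A macro may refer to other macros, but it cannot be recursive
  • The contents of string literals are not searched for replacements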
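
The rules can be traced on a small case. The macro names SIDE, PERIMETER, and TWICE below are hypothetical, chosen only to show an argument that is itself a macro and a string literal that is left untouched.

```c
#define SIDE 10
#define PERIMETER (SIDE * 4)     /* refers to another macro */
#define TWICE(x) ((x) + (x))

/* TWICE(PERIMETER): the argument is expanded first, the result is
   substituted for x, and the rescan leaves (((10 * 4)) + ((10 * 4))). */
int twice_perimeter(void)
{
    return TWICE(PERIMETER);
}

/* Macro names inside a string literal are not replaced. */
const char *perimeter_text(void)
{
    return "PERIMETER";
}
```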
# and ## Operators
# converts a macro argument to a string literal; ## concatenates two tokens into one:

#define PRINT(value, format) printf("The value of " #value " is " format "\n", value)
#define CONCAT(x, y) x##y

int number = 42;
PRINT(number, "%d"); // Prints: The value of number is 42

int version = 7;
printf("%d", CONCAT(ver, sion)); // Prints: 7
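
The two operators are often combined to generate a family of similar functions. The following is a sketch under that assumption; DEFINE_MAX and the generated names are hypothetical.

```c
/* ## pastes tokens to build one function name per type;
   # turns the type name into a string literal. */
#define DEFINE_MAX(type)                    \
    type max_##type(type a, type b)         \
    {                                       \
        return a > b ? a : b;               \
    }                                       \
    const char *max_##type##_name(void)     \
    {                                       \
        return "max_" #type;                \
    }

DEFINE_MAX(int)      /* defines max_int and max_int_name */
DEFINE_MAX(double)   /* defines max_double and max_double_name */
```

After these two lines, max_int(3, 7) and max_double(2.5, 1.5) are ordinary functions, and max_int_name() returns the string "max_int".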
Macros with Side Effects
Avoid passing arguments with side effects to macros. An argument that appears more than once in the replacement text is evaluated more than once:

#define MAX(x, y) ((x) > (y) ? (x) : (y))

int a = 3, b = 4;
int max = MAX(++a, ++b); // b is incremented twice: a == 4, b == 6, max == 6
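
Tracing the expansion by hand shows where the extra increment comes from. The conditional operator finishes the comparison before evaluating the chosen branch, so for these values the outcome is predictable, though probably not what the caller intended. The function names below are hypothetical.

```c
#define MAX(x, y) ((x) > (y) ? (x) : (y))

/* Returns b's final value after MAX(++a, ++b), or -1 if the trace is wrong. */
int b_after_max(void)
{
    int a = 3, b = 4;
    int max = MAX(++a, ++b);   /* ((++a) > (++b) ? (++a) : (++b)) */
    /* ++b ran once in the comparison and once more when it was selected. */
    return (a == 4 && max == 6) ? b : -1;
}

/* Applying the side effects first keeps each one to a single occurrence. */
int b_after_safe_max(void)
{
    int a = 3, b = 4;
    ++a;
    ++b;
    int max = MAX(a, b);       /* a == 4, b == 5, max == 5 */
    return (a == 4 && max == 5) ? b : -1;
}
```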
Macros vs Functions
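Advantages of macros:
  • Faster for small operations (no function call overhead)
  • Type independent
  • Can do things functions cannot (such as taking a type name as an argument)
Disadvantages of macros:
  • Can increase code size, because the replacement text is pasted at every use
  • Hard to debug, because the debugger sees only the expanded code
  • No type checking of arguments
  • Prone to operator precedence issues
Naming convention: Use all caps for macro names and mixed case for function names.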
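
As an illustration of something a function cannot do, a macro can take a type name as an argument. This is a minimal sketch; ALLOC_ARRAY and sum_first are hypothetical names.

```c
#include <stdlib.h>

/* A type name is passed as a macro argument, which no function
   parameter allows. */
#define ALLOC_ARRAY(count, type) ((type *)malloc((count) * sizeof(type)))

/* Hypothetical helper: fills a heap array with 0..n-1 and returns the sum,
   or -1 if the allocation fails. */
int sum_first(int n)
{
    int *values = ALLOC_ARRAY(n, int);
    if (values == NULL)
        return -1;
    int sum = 0;
    for (int i = 0; i < n; i++) {
        values[i] = i;
        sum += values[i];
    }
    free(values);
    return sum;
}
```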

#undef Directive

Removes a macro definition:

#define TEMP 100
int val = TEMP; // Works
#undef TEMP
int val2 = TEMP; // Error: TEMP undefined

Command Line Definitions

Symbols can also be defined on the compiler command line, which has the same effect as a #define at the top of the file:

gcc -D ARRAY_SIZE=10 program.c
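
A source file written for this style of build usually supplies a fallback in case the symbol is not passed. The sketch below assumes a default of 4, and the names data and element_count are hypothetical.

```c
/* Fallback used when the build does not pass -D ARRAY_SIZE=... */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE 4
#endif

int data[ARRAY_SIZE];

/* Number of elements, fixed at compile time by ARRAY_SIZE. */
int element_count(void)
{
    return (int)(sizeof data / sizeof data[0]);
}
```

Building with gcc -D ARRAY_SIZE=10 makes element_count() return 10; building without the option gives the fallback value.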

Conditional Compilation

Include code conditionally:

#if DEBUG_MODE
    // Debug code
#endif

#ifdef OS_UNIX
    // Unix-specific code
#elif defined(OS_WINDOWS)
    // Windows-specific code
#endif
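
Filled in with real statements, the two patterns might look like this. It is a sketch only: add_checked and platform_name are hypothetical functions, and the default of 0 for DEBUG_MODE is an assumption.

```c
#include <stdio.h>

#ifndef DEBUG_MODE
#define DEBUG_MODE 0      /* assumed default: debugging off */
#endif

int add_checked(int x, int y)
{
#if DEBUG_MODE
    fprintf(stderr, "add_checked(%d, %d)\n", x, y);   /* debug builds only */
#endif
    return x + y;
}

const char *platform_name(void)
{
#ifdef OS_UNIX
    return "unix";
#elif defined(OS_WINDOWS)
    return "windows";
#else
    return "unknown";     /* neither symbol was defined */
#endif
}
```

Code in a branch that is not selected is removed by the preprocessor and never reaches the compiler proper.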

File Inclusion

Local files: #include "filename" (typically searches the directory of the including file first, then the standard library paths)

Library files: #include <filename> (searches only the standard library paths)

Preventing Multiple Inclusion
Use include guards:

#ifndef HEADER_NAME_H
#define HEADER_NAME_H

// Header contents

#endif
Or, using a directive that is non-standard but widely supported:

#pragma once
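
The effect of an include guard can be simulated in a single file by writing the guarded contents twice, as if the header had been included twice. POINT_H, struct point, and point_sum are hypothetical names for this sketch.

```c
/* First "inclusion": POINT_H is not yet defined, so the contents are kept. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

/* Second "inclusion": POINT_H is now defined, so the preprocessor skips
   the body and struct point is not defined a second time. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the guards, the second definition of struct point would be a redefinition error.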

Other Preprocessor Directives

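Additional directives include #error (stops translation with a message), #pragma (requests implementation-specific behavior), and #line (overrides the line number and file name the compiler reports).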
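
A brief sketch of #error and #line in use follows. The 16-bit check, the file name generated.c, and the function reported_line are assumptions chosen for illustration.

```c
#include <limits.h>

/* #error stops translation with a message when a requirement is not met. */
#if INT_MAX < 32767
#error "int must hold at least 16 bits"
#endif

/* #line sets the line number (and optionally the file name) that
   __LINE__ and __FILE__ report, starting with the next line. */
int reported_line(void)
{
#line 100 "generated.c"
    return __LINE__;   /* the compiler now treats this as line 100 */
}
```

Code generators use #line so that compiler diagnostics point back to the original input file instead of the generated C source.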

Implementing offsetof

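A macro that mimics the standard offsetof from <stddef.h>. It computes the address a member would have if the struct started at address 0. This works on common compilers but is not guaranteed by the C standard, so real code should prefer offsetof itself:

#include <stdio.h>
#include <stddef.h>

#define OFFSETOF(type, member) ((size_t)&(((type*)0)->member))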

struct Example {
    char c;
    int i;
    double d;
};

int main() {
    printf("Offset of c: %zu\n", OFFSETOF(struct Example, c));
    printf("Offset of i: %zu\n", OFFSETOF(struct Example, i));
    printf("Offset of d: %zu\n", OFFSETOF(struct Example, d));
    return 0;
}
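
For comparison, the standard offsetof macro from <stddef.h> reports the same offsets. This sketch repeats struct Example so that it stands alone; offset_of_i and offset_of_d are hypothetical wrappers.

```c
#include <stddef.h>   /* offsetof, size_t */

struct Example {
    char c;
    int i;
    double d;
};

/* The standard macro gives the member offsets without any null-pointer
   arithmetic in user code. */
size_t offset_of_i(void) { return offsetof(struct Example, i); }
size_t offset_of_d(void) { return offsetof(struct Example, d); }
```

The exact values depend on the platform's alignment rules, so the offsets of i and d are usually larger than the sizes of the preceding members alone would suggest.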
