Fading Coder

Linux Basic I/O: File Descriptors, dup2 Redirection, and the Everything-Is-A-File Model

File Descriptors: The Array Index Underpinning I/O

Disk Files vs In-Memory Open Files

Files stored persistently on storage are called disk files. When a process opens a file, the kernel builds in-memory structures that describe it and caches its data on demand, so it becomes an in-memory open file. The relationship mirrors that of a program on disk and a running process in memory.

Since any system can have many processes, each opening multiple files, the operating system must manage all open files efficiently. Following the kernel's common "describe first, organize later" management pattern:

  1. The kernel creates a struct file for every open file, which stores the open file's state (such as the current offset and access mode) and references to its metadata and cached content.
  2. All struct file instances are linked together (conceptually, a global doubly linked list), so managing open files reduces to simple operations on that list.

To map which open files belong to which process, we need a mapping between processes and open files. When a process is created, the kernel sets up its task_struct (process control block), which includes a pointer to a files_struct structure. This structure holds an array of struct file * pointers, where each entry points to the struct file of an open file. The index of this array is the file descriptor (fd).

To perform I/O, the kernel uses the fd to index into this array, get the pointer to the struct file, and access the file's data. When you open a new file, the kernel adds the new struct file pointer to the first available empty slot in the array, and returns the index of that slot to the process.
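The lookup and allocation steps above can be sketched as a user-space toy model. This is not kernel code: toy_file, toy_fd_table, toy_install, toy_lookup, and toy_close are hypothetical names invented for illustration, and the fixed table size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 8   /* assumed table size, for illustration only */

/* Hypothetical stand-in for the kernel's struct file. */
struct toy_file {
    const char *name;
};

/* Hypothetical stand-in for the per-process array of struct file pointers. */
struct toy_fd_table {
    struct toy_file *slots[TOY_MAX_FDS];
};

/* Store f in the lowest empty slot and return its index, or -1 if full. */
int toy_install(struct toy_fd_table *t, struct toy_file *f) {
    assert(t != NULL && f != NULL);
    for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
        if (t->slots[fd] == NULL) {
            t->slots[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* Resolve a descriptor to its file, or NULL if the index is invalid or empty. */
struct toy_file *toy_lookup(const struct toy_fd_table *t, int fd) {
    if (fd < 0 || fd >= TOY_MAX_FDS) {
        return NULL;
    }
    return t->slots[fd];
}

/* Empty a slot so that a later toy_install can reuse the index. */
void toy_close(struct toy_fd_table *t, int fd) {
    if (fd >= 0 && fd < TOY_MAX_FDS) {
        t->slots[fd] = NULL;
    }
}
```

Installing three entries and then a fourth yields index 3 for the fourth; after toy_close on index 1, the next toy_install returns 1 again.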

Note: Writes to a file are first buffered in memory, and flushed to disk at a later time for performance.
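When a program cannot wait for that later flush, it can ask for one explicitly with fsync. The helper below is a minimal sketch; write_and_sync is a hypothetical name, and the path is supplied by the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write text to path, then ask the kernel to flush it to storage.
 * Returns 0 on success, -1 on any failure. */
int write_and_sync(const char *path, const char *text) {
    assert(path != NULL && text != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }

    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);                  /* buffered in memory first */
    int ok = (n == (ssize_t)len) && (fsync(fd) == 0);  /* request write-back now */

    if (close(fd) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

Without the fsync call the data would still reach the disk eventually; the call only removes the delay, at the cost of blocking until the write-back completes.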

File Descriptor Allocation Rule

Let's test the allocation rule with a simple example:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    int f1 = open("./first.log", O_WRONLY | O_CREAT, 0644);
    int f2 = open("./second.log", O_WRONLY | O_CREAT, 0644);
    int f3 = open("./third.log", O_WRONLY | O_CREAT, 0644);
    int f4 = open("./fourth.log", O_WRONLY | O_CREAT, 0644);
    
    printf("%d\n", f1);
    printf("%d\n", f2);
    printf("%d\n", f3);
    printf("%d\n", f4);
    
    close(f1);
    close(f2);
    close(f3);
    close(f4);
    return 0;
}

Running this code prints 3, 4, 5, 6: descriptors 0, 1, and 2 are already in use, so allocation starts at 3. What happens if we close the default descriptors 0 and 2 before opening new files?

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    close(STDIN_FILENO);
    close(STDERR_FILENO);

    int f1 = open("./first.log", O_WRONLY | O_CREAT, 0644);
    int f2 = open("./second.log", O_WRONLY | O_CREAT, 0644);
    int f3 = open("./third.log", O_WRONLY | O_CREAT, 0644);
    int f4 = open("./fourth.log", O_WRONLY | O_CREAT, 0644);
    
    printf("%d\n", f1);
    printf("%d\n", f2);
    printf("%d\n", f3);
    printf("%d\n", f4);
    
    close(f1);
    close(f2);
    close(f3);
    close(f4);
    return 0;
}

Now the output is 0, 2, 3, 4: the freed slots 0 and 2 are reused first. What if we close 1 instead?

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    close(STDOUT_FILENO);

    int f1 = open("./first.log", O_WRONLY | O_CREAT, 0644);
    int f2 = open("./second.log", O_WRONLY | O_CREAT, 0644);
    int f3 = open("./third.log", O_WRONLY | O_CREAT, 0644);
    int f4 = open("./fourth.log", O_WRONLY | O_CREAT, 0644);
    
    printf("%d\n", f1);
    printf("%d\n", f2);
    printf("%d\n", f3);
    printf("%d\n", f4);
    
    close(f1);
    close(f2);
    close(f3);
    close(f4);
    return 0;
}

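No output appears on screen! Descriptor 1 now refers to first.log, so printf no longer reaches the display. (The text typically does not reach first.log either: stdio buffers output that goes to a non-terminal, and close(f1) closes fd 1 before that buffer is flushed at exit.) Together with the previous results, this confirms the allocation rule: a new file descriptor always receives the smallest unused index in the fd array.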
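The same rule can be checked without touching the standard descriptors. The sketch below assumes a single-threaded process and that /dev/null can be opened; closed_fd_is_reused is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the descriptor number just closed is handed out again by the
 * next open(), 0 if it is not, and -1 if an open() call fails. */
int closed_fd_is_reused(void) {
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }
    assert(a < b);          /* a took the lowest free slot, b the next one */

    close(a);               /* slot a is now the lowest free index */
    int c = open("/dev/null", O_RDONLY);
    int reused = (c == a);

    if (c >= 0) close(c);
    close(b);
    return reused;
}
```

In a multithreaded program another thread could open a file between the close and the open, so code should not rely on getting a specific number back this way.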

Every new Linux process normally starts with three open file descriptors:

  • 0: Standard input, defaults to the keyboard device
  • 1: Standard output, defaults to the display
  • 2: Standard error, defaults to the display

This is an operating-system convention, not a feature of any programming language: a process inherits these descriptors from its parent (ultimately the shell or login program), and every user-space library relies on the same numbering. We can verify this with simple tests:

#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main() {
    const char *msg = "Hello standard output\n";
    write(STDOUT_FILENO, msg, strlen(msg));
    return 0;
}

#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main() {
    const char *msg = "Hello standard error\n";
    write(STDERR_FILENO, msg, strlen(msg));
    return 0;
}

#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main() {
    char buf[1024];
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("Received input: %s", buf);
    }
    return 0;
}

Redirection

Output Redirection

The earlier test, where we closed standard output and saw nothing on screen, is actually a simple demonstration of output redirection. Consider first a baseline that opens a file but still writes to fd 1:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define TARGET_FILE "output.log"

int main() {
    int fd = open(TARGET_FILE, O_CREAT | O_WRONLY | O_TRUNC, 0666);
    if (fd < 0) {
        perror("open failed");
        return 1;
    }

    const char *line = "hello linux\n";
    for (int i = 0; i < 5; i++) {
        write(STDOUT_FILENO, line, strlen(line));
    }

    close(fd);
    return 0;
}

This code prints 5 lines to the screen, because fd 1 still refers to the display and the new file received fd 3. If we modify it to close stdout before calling open:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define TARGET_FILE "output.log"

int main() {
    close(STDOUT_FILENO);
    int fd = open(TARGET_FILE, O_CREAT | O_WRONLY | O_TRUNC, 0666);
    if (fd < 0) {
        perror("open failed");
        return 1;
    }

    const char *line = "hello linux\n";
    for (int i = 0; i < 5; i++) {
        write(STDOUT_FILENO, line, strlen(line));
    }

    close(fd);
    return 0;
}

All 5 lines are now written to output.log instead of the display. This is output redirection. The core idea: by changing what a file descriptor entry points to, we change where the I/O goes. Closing stdout frees entry 1, so the new file is allocated index 1, and all code that writes to fd 1 (standard output) now writes to our new file.

Input Redirection

Input redirection follows exactly the same pattern: we redirect what would normally come from standard input (keyboard) to come from another file:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    close(STDIN_FILENO);
    int fd = open("input.txt", O_RDONLY);
    if (fd < 0) {
        perror("open failed");
        exit(1);
    }
 
    char buf[64];
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("Read from file: %s\n", buf);
    }
 
    close(fd);
    return 0;
}

By closing stdin (0), we make 0 available, so the new file gets fd 0, and all reads from 0 now come from the file.

Append Redirection

Append redirection is just output redirection where we open the file with the O_APPEND flag, so new writes are added to the end of the file instead of overwriting existing content:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    close(STDOUT_FILENO);
    int fd = open("output.log", O_WRONLY | O_APPEND);
    if (fd < 0) {
        perror("open failed");
        exit(1);
    }

    const char *line = "new appended line\n";
    write(STDOUT_FILENO, line, strlen(line));
 
    close(fd);
    return 0;
}

The dup2 System Call

Manually closing descriptors to get the right index works, but it's error-prone. Linux provides the dup2 system call to directly perform the redirection by copying the file pointer from one descriptor to another.

  • Function signature: int dup2(int oldfd, int newfd);
  • Behavior: Copies the struct file * pointer from fd_array[oldfd] to fd_array[newfd]. If newfd was already open, it is silently closed first.
  • Return value: Returns newfd on success, -1 on error (with errno set).

Edge cases:

  1. If oldfd is not a valid open file descriptor, the call fails, and newfd is not modified.
  2. If oldfd is valid and oldfd == newfd, dup2 does nothing and returns newfd immediately.
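Both edge cases can be observed directly. The two helpers below are a small sketch; dup2_onto_itself and dup2_from_invalid_fd are hypothetical names, and /dev/null is assumed to be writable.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* dup2(fd, fd) on a valid descriptor: returns 1 if the call returned fd,
 * 0 if it returned anything else, and -1 if /dev/null could not be opened. */
int dup2_onto_itself(void) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0) {
        return -1;
    }
    int same = (dup2(fd, fd) == fd);
    close(fd);
    return same;
}

/* dup2 with an invalid oldfd: returns the errno value reported by the failed
 * call, or 0 if the call unexpectedly succeeded. newfd is left untouched. */
int dup2_from_invalid_fd(int newfd) {
    assert(newfd >= 0);
    errno = 0;
    if (dup2(-1, newfd) == -1) {
        return errno;
    }
    return 0;
}
```

Calling dup2_from_invalid_fd(STDOUT_FILENO) reports EBADF, and standard output keeps working afterwards because the failed call did not close it.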

Example of output redirection with dup2:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
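    int fd = open("./output.txt", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open failed");
        return 1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0) { // Make fd 1 refer to output.txt
        perror("dup2 failed");
        return 1;
    }
    close(fd); // Clean up the original descriptor, no longer needed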

    printf("This text is redirected to the file\n");
    printf("No output appears on the screen\n");
    return 0;
}

After the copy, both fd and the target descriptor (1 in this case) point to the same struct file, so we can safely close the original fd to avoid descriptor leaks.
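One visible consequence of two descriptors sharing a struct file is that they also share the file offset. The sketch below uses dup, which behaves like dup2 except that the kernel picks the lowest free descriptor number; offset_seen_by_duplicate is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 5 bytes through one descriptor, then report the current offset as
 * seen through its duplicate. Returns -1 on failure. */
long offset_seen_by_duplicate(const char *path) {
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    int copy = dup(fd);                 /* second index, same open file */
    if (copy < 0) {
        close(fd);
        return -1;
    }

    long result = -1;
    if (write(fd, "hello", 5) == 5) {
        /* Query the position via the duplicate without moving it. */
        result = (long)lseek(copy, 0, SEEK_CUR);
    }

    close(copy);
    close(fd);
    return result;
}
```

The duplicate reports offset 5 even though nothing was written through it. Two independent open calls on the same path would instead create two separate struct file objects, each with its own offset.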

Append and input redirection with dup2 follow the same pattern:

// Append redirection example
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define OUTPUT_FILE "output.log"

int main() {
	int fd = open(OUTPUT_FILE, O_CREAT | O_WRONLY | O_APPEND, 0666);
	if (fd < 0) {
		perror("open failed");
		return 1;
	}

	dup2(fd, STDOUT_FILENO);
	close(fd);

	const char *msg = "appended hello linux\n";
	for (int i = 0; i < 5; i++) {
		write(STDOUT_FILENO, msg, strlen(msg));
	}

	return 0;
}

// Input redirection example
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define INPUT_FILE "input.log"

int main() {
	int fd = open(INPUT_FILE, O_RDONLY);
	if (fd < 0) {
		perror("open failed");
		return 1;
	}

	dup2(fd, STDIN_FILENO);
	close(fd);

	char buf[64];
	ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
	if (n > 0) {
		buf[n] = '\0';
		printf("Read content: %s\n", buf);
	}
	
	return 0;
}

Redirection for C Standard Library Functions

C standard I/O functions like printf and fprintf work with redirection done via dup2, because the stdout stream is a user-space wrapper around fd 1: every buffered write eventually becomes a write system call on that descriptor:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define OUTPUT_FILE "log.txt"

int main() {
	int fd = open(OUTPUT_FILE, O_CREAT | O_WRONLY | O_APPEND, 0666);
	if (fd < 0) {
		perror("open failed");
		return 1;
	}
	dup2(fd, STDOUT_FILENO);
	close(fd);

	printf("fd: %d\n", fd); // fd is already closed; this only prints its old number
	printf("hello from printf\n");
	fprintf(stdout, "hello from fprintf\n");

	return 0;
}

All output from the C library functions is correctly redirected to the file.
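One caveat is stdio's user-space buffer: text that is still buffered when fd 1 changes will end up at whichever target fd 1 refers to when the flush happens. The sketch below redirects temporarily and flushes on both sides of the switch; printf_to_file is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Temporarily point fd 1 at path, printf text into it, then restore the
 * previous target of fd 1. Returns 0 on success, -1 on failure. */
int printf_to_file(const char *path, const char *text) {
    assert(path != NULL && text != NULL);

    fflush(stdout);                         /* drain text meant for the old target */
    int saved = dup(STDOUT_FILENO);         /* remember where fd 1 pointed */
    if (saved < 0) {
        return -1;
    }

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0) {
        if (fd >= 0) close(fd);
        close(saved);
        return -1;
    }
    close(fd);

    printf("%s", text);
    fflush(stdout);                         /* push the buffer into the file now */

    int rc = (dup2(saved, STDOUT_FILENO) < 0) ? -1 : 0;  /* restore fd 1 */
    close(saved);
    return rc;
}
```

If the second fflush were omitted, the text could stay in the buffer until after fd 1 is restored and then appear on the original target instead of in the file.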

Shell Redirection Operators

The common shell redirection operators > (overwrite output), >> (append output), < (input), and << (here document) are all built on this mechanism. Before executing a user command, the shell typically forks a child process, rewires the child's file descriptor table (for example with open and dup2), and then executes the command, which starts with the redirection already in place.
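A rough sketch of what a shell might do for cmd > file is shown below. It is not taken from any real shell's source; run_redirected is a hypothetical name, and the exit codes 126 and 127 for setup and exec failures are a convention assumed here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its standard output sent to out_path.
 * Returns the child's exit status, or -1 if it could not be run or reaped. */
int run_redirected(char *const argv[], const char *out_path) {
    assert(argv != NULL && argv[0] != NULL && out_path != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {                             /* child: set up, then exec */
        int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            _exit(126);
        }
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0) {
                _exit(126);
            }
            close(fd);
        }
        execvp(argv[0], argv);                  /* fd 1 stays open across exec */
        _exit(127);                             /* reached only if exec failed */
    }

    int status = 0;                             /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the descriptor table is changed only in the child, the parent's own standard output is unaffected. For >> the open flags would use O_APPEND instead of O_TRUNC, and for < the file would be opened read-only and duplicated onto STDIN_FILENO.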


Understanding Linux's Everything-Is-A-File Model

A key property of the Linux design is that redirection state (the mapping of file descriptors to open files) is preserved across execve (process replacement), except for descriptors marked close-on-exec. This is how the shell can implement redirection for user programs.

To understand why everything is a file in Linux, we need to look at how the kernel abstracts different types of resources and devices:

  • Any resource or device that can be read from or written to is abstracted as an open file. When you open the device, the kernel creates a struct file just like it does for a regular disk file.
  • Each struct file includes a pointer to a struct file_operations, a structure that holds function pointers to device-specific implementations of core operations like read, write, open, and release.

When a process calls read(fd, buf, size):

  1. The kernel looks up struct file * from the fd array entry
  2. Follows the pointer to the file_operations table for that file
  3. Calls the device-specific read function stored in the table via the function pointer

This means that regardless of whether the target is a regular disk file, keyboard, display, network card, or block device, the process uses the exact same read/write/open/close interface to interact with it. The kernel handles dispatching to the correct device-specific implementation via the function pointers.

This abstraction is analogous to object-oriented polymorphism: struct file acts as a base class, and every device/file type implements its own operations based on the base interface. All resources look identical to the calling process, hence the "everything is a file" design. This abstraction layer is called the Virtual File System (VFS), which unifies access to all types of file systems and devices under a single interface.
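The dispatch idea can be imitated in plain user-space C. The sketch below is a toy model, not the kernel's definitions: mini_file, mini_file_ops, mini_read, and the two example "device" types are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mini_file;

/* Hypothetical, cut-down analogue of struct file_operations. */
struct mini_file_ops {
    size_t (*read)(struct mini_file *f, char *buf, size_t size);
};

/* Hypothetical analogue of struct file: state plus a pointer to its ops. */
struct mini_file {
    const struct mini_file_ops *ops;
    const char *data;   /* backing bytes, used by the "regular file" type */
    size_t pos;         /* current read offset */
};

/* "Regular file" read: copy from the backing bytes and advance the offset. */
static size_t mem_read(struct mini_file *f, char *buf, size_t size) {
    size_t left = strlen(f->data) - f->pos;
    size_t n = size < left ? size : left;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return n;
}

/* "Device" read in the style of /dev/zero: always fills the buffer with zeros. */
static size_t zero_read(struct mini_file *f, char *buf, size_t size) {
    (void)f;
    memset(buf, 0, size);
    return size;
}

const struct mini_file_ops mem_ops = { .read = mem_read };
const struct mini_file_ops zero_ops = { .read = zero_read };

/* The single entry point callers use; the ops pointer selects the behavior. */
size_t mini_read(struct mini_file *f, char *buf, size_t size) {
    assert(f != NULL && f->ops != NULL && f->ops->read != NULL);
    return f->ops->read(f, buf, size);
}
```

A caller holding a struct mini_file * cannot tell which implementation runs behind mini_read, which is the property the read system call offers to user programs.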
