Understanding Linux File I/O: System Interfaces, File Descriptors, and Redirection
File Operations in C
In C programming, file operations involve managing both file content and file attributes. Before a file can be accessed it must be opened, which loads its attributes (and, as needed, its data) into memory. This follows from the von Neumann architecture: the CPU works only on data in memory, so reads and writes operate on in-memory copies of file data. Files can therefore be divided into memory files (opened files) and disk files (files stored on disk that are not open).
File operations are performed by processes. When a process executes code that calls functions like fopen or write, it interacts with the file. Therefore, learning file operations is essentially about understanding the relationship between processes and opened files.
Writing to Files in C
To write data to a file in C, use the fopen function with appropriate modes. For example, to create and write to a file:
#include <stdio.h>

int main() {
    FILE* file_handle = fopen("example.txt", "w");
    if (file_handle == NULL) {
        perror("fopen");
        return 1;
    }
    for (int i = 0; i < 20; i++) {
        fprintf(file_handle, "Line %d\n", i);
    }
    fclose(file_handle);
    return 0;
}
If a filename is passed without a path, fopen creates the file in the current working directory of the process.
Current Working Directory
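The current path refers to the working directory of the process. A process inherits its working directory from its parent, so a program started from a shell begins in the directory the shell was in when the command was run, which is not necessarily the directory that contains the executable. The working directory can be changed with functions such as chdir. For instance, the following program prints its process ID and then stays alive so that its working directory can be inspected:
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("Process ID: %d\n", (int)getpid());
    while (1) {
        sleep(1); // Stay alive without busy-waiting so the process can be inspected
    }
    return 0;
}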
Run this and, from another terminal, inspect the process (for example with ls -l /proc/<PID>/cwd, using the printed process ID) to see its current working directory (cwd).
File Operation Modes
C provides several modes for file operations:
r: Read-only mode for existing text files.
r+: Read-write mode for existing text files.
w: Write-only mode; creates a new file or truncates an existing one.
w+: Read-write mode with truncation.
a: Append mode; writes data to the end of the file without overwriting.
a+: Read and append mode.
For example, using append mode:
FILE* append_file = fopen("log.txt", "a");
if (append_file != NULL) {
    fprintf(append_file, "New entry\n");
    fclose(append_file);
}
This adds text to log.txt without clearing existing content.
Reading Files and Implementing cat
To read a file line by line, use fgets:
#include <stdio.h>

int main() {
    FILE* input_file = fopen("log.txt", "r");
    if (input_file == NULL) {
        perror("fopen");
        return 1;
    }
    char line_buffer[100];
    while (fgets(line_buffer, sizeof(line_buffer), input_file) != NULL) {
        printf("%s", line_buffer);
    }
    fclose(input_file);
    return 0;
}
To mimic the cat command, read a file specified as a command-line argument:
#include <stdio.h>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return 1;
    }
    FILE* target_file = fopen(argv[1], "r");
    if (target_file == NULL) {
        perror("fopen");
        return 1;
    }
    char text_chunk[80];
    while (fgets(text_chunk, sizeof(text_chunk), target_file) != NULL) {
        printf("%s", text_chunk);
    }
    fclose(target_file);
    return 0;
}
File System Interfaces
In Linux, files encompass not just disk files but also devices like keyboards and printers. Various programming languages provide file operation interfaces, but they ultimately rely on Linux system calls.
System Calls and Encapsulation
System calls are the low-level interfaces through which the operating system exposes kernel services, including access to hardware. Higher-level languages encapsulate these calls to simplify usage and ensure cross-platform compatibility. For example, printf in C formats its output in user space and ultimately issues a write system call to send it to standard output.
Opening Files with open
The open system call is used to open files. Its prototype is:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
int open(const char* pathname, int flags, mode_t mode);
pathname: File path.
flags: Options such as O_RDONLY (read-only), O_WRONLY (write-only), and O_CREAT (create if missing).
mode: Permissions for new files (e.g., 0666); consulted when the call creates a file (O_CREAT), and the process umask is applied to it.
Flags are combined with the bitwise OR operator (|). For instance, to open a file for writing, creating it if necessary:
int file_desc = open("data.txt", O_WRONLY | O_CREAT, 0644);
if (file_desc < 0) {
    perror("open");
    return 1;
}
Closing, Writing, and Reading Files
Use close to close a file descriptor:
#include <unistd.h>
close(file_desc);
To write data:
char message[] = "Hello, file!\n";
write(file_desc, message, sizeof(message) - 1);
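To read data (the descriptor must have been opened with O_RDONLY or O_RDWR; reading from one opened with O_WRONLY fails):
char read_buffer[50];
ssize_t bytes_read = read(file_desc, read_buffer, sizeof(read_buffer));
if (bytes_read > 0) {
    // Process the first bytes_read bytes of read_buffer; read does not add a null terminator
}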
Additional flags:
O_TRUNC: Truncate file to zero length when opening.
O_APPEND: Append data to the end of the file.
For example, to open a file for appending:
int append_fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644); // O_CREAT creates log.txt if it is missing; without it the mode argument is ignored
File Descriptors (fd)
File descriptors are integers returned by open, representing open files. Standard descriptors are:
0: Standard input (stdin)
1: Standard output (stdout)
2: Standard error (stderr)
Because a process normally starts with descriptors 0-2 already open, the first file it opens itself typically receives descriptor 3.
Underlying Mechanism
The kernel represents each open file with a struct file object; in a simplified picture, these objects are chained together in a linked list. Each process has a files_struct that holds an array of pointers to these structures. The array index is the file descriptor, which is how a small per-process integer maps to an open file.
Everything is a File in Linux
Linux treats hardware devices as files by abstracting them into struct file objects with function pointers for read/write operations. This abstraction allows uniform access via file descriptors.
fd Allocation Rule
File descriptors are allocated by finding the smallest unused index in the array. For example, if descriptor 0 is closed, the next file the process opens receives fd 0.
Redirection
Redirection changes where input comes from or output goes to by modifying file descriptor mappings.
Using dup2 for Redirection
The dup2 system call duplicates a file descriptor:
int dup2(int oldfd, int newfd);
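It makes newfd refer to the same open file as oldfd, first closing newfd if it was already open. For output redirection to a file:
int output_fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644); // O_TRUNC discards old content, as the shell's > does
if (output_fd < 0) {
    perror("open");
    return 1;
}
dup2(output_fd, STDOUT_FILENO); // STDOUT_FILENO is 1
close(output_fd);
printf("This goes to output.txt\n");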
Append and Input Redirection
For append redirection, use O_APPEND:
int append_fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644); // O_CREAT creates log.txt if missing, as the shell's >> does
dup2(append_fd, STDOUT_FILENO);
close(append_fd);
For input redirection, use O_RDONLY:
int input_fd = open("input.txt", O_RDONLY);
dup2(input_fd, STDIN_FILENO); // STDIN_FILENO is 0
close(input_fd);
char user_input[100];
fgets(user_input, sizeof(user_input), stdin); // Reads from input.txt