Practical File Operations in Python
Python's file operations are straightforward: the built-in open() function returns a file object (often called a file handle) through which you read and write. The access mode passed to open() determines which operations are permitted.
Available access modes include: r, w, a, r+, w+, a+, rb, wb, ab, r+b, w+b, a+b. The default mode is r for read-only.
Read-Only Modes (r, rb)
data_file = open('document.txt', mode='r', encoding='utf-8')
data_content = data_file.read()
print(data_content)
data_file.close()
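The encoding parameter specifies how the file's bytes are decoded into text. UTF-8 is the most common encoding; if the parameter is omitted, Python typically falls back to a platform-dependent default, so it is safer to state it explicitly.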
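To make the role of encoding concrete, the following sketch writes a short string as UTF-8 and reads it back with two different encodings. The file name and the sample text are illustrative assumptions, and the file is created in a temporary directory.

```python
import os
import tempfile

# Hypothetical sample file in a throwaway directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'encoding_demo.txt')

with open(path, mode='w', encoding='utf-8') as f:
    f.write('café')

# Decoding with the same encoding that was used for writing restores the text.
with open(path, mode='r', encoding='utf-8') as f:
    decoded_ok = f.read()

# Decoding the same bytes as Latin-1 raises no error but yields wrong characters.
with open(path, mode='r', encoding='latin-1') as f:
    decoded_wrong = f.read()

print(repr(decoded_ok))
print(repr(decoded_wrong))
```

The second read does not fail, which is why a mismatched encoding can go unnoticed until the text is displayed.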
rb mode reads data as raw bytes and does not accept an encoding parameter; passing one raises a ValueError.
binary_file = open('document.txt', mode='rb')
binary_data = binary_file.read()
print(binary_data)
binary_file.close()
# Output (example): b'\xef\xbb\xbfSample Content...'
Binary read mode (rb) is essential for non-text files like images, audio, or video where data cannot be directly represented as text. It is also fundamental for streaming and file transfer operations.
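For large binary files, reading in fixed-size chunks keeps memory use bounded. The sketch below shows one way to copy a file this way; the helper name copy_in_chunks, the chunk size, and the file names are assumptions made for illustration.

```python
import os
import tempfile


def copy_in_chunks(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path, holding at most chunk_size bytes in memory."""
    total = 0
    with open(src_path, mode='rb') as src, open(dst_path, mode='wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # read() returns b'' at end of file
                break
            dst.write(chunk)
            total += len(chunk)
    return total


# Build a hypothetical 256,000-byte source file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
source = os.path.join(tmp_dir, 'source.bin')
target = os.path.join(tmp_dir, 'copy.bin')
with open(source, mode='wb') as f:
    f.write(bytes(range(256)) * 1000)

copied = copy_in_chunks(source, target, chunk_size=4096)
print(copied)
```

The standard library's shutil.copyfileobj() performs the same kind of loop, so a hand-written version is mainly useful when each chunk needs extra processing such as hashing or progress reporting.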
File Paths:
- Absolute Path: The full path from the root directory to the file.
- Relative Path: The path relative to the current working directory of the running process, which is often, but not always, the directory containing the script (e.g., './data.txt', '../config.txt'). Using relative paths improves portability, as the project can be moved without breaking file references.
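Because relative paths are resolved against the current working directory, a script launched from another directory may look in the wrong place. The sketch below shows how pathlib resolves a relative path, plus one possible helper for anchoring a path to a known file instead; the name path_next_to and the file names are hypothetical.

```python
from pathlib import Path

relative = Path('data.txt')      # hypothetical file name
resolved = relative.resolve()    # made absolute using the current working directory

print(relative.is_absolute())
print(resolved.is_absolute())


def path_next_to(anchor_file, name):
    """Return the path of `name` in the same directory as anchor_file."""
    return Path(anchor_file).resolve().parent / name


# Inside a script, anchor_file would typically be __file__.
sibling = path_next_to(Path.cwd() / 'main.py', 'data.txt')
print(sibling.name)
```

Anchoring to the script's own location keeps the portability benefit of relative paths while removing the dependence on where the process was started.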
Methods for Reading Files:
- read(): Reads the entire file content into memory. Can cause memory issues with large files.
  file_handle = open('sample.txt', mode='r', encoding='utf-8')
  full_text = file_handle.read()
  print(full_text)
- read(size): Reads a specified number of characters (or bytes in rb mode). Subsequent reads continue from the current file position.
  file_handle = open('sample.txt', mode='r', encoding='utf-8')
  chunk_one = file_handle.read(4) # Reads 4 characters
  chunk_two = file_handle.read(4) # Reads the next 4 characters
  print(chunk_one, chunk_two)
- readline(): Reads a single line from the file, including the newline character (\n). The strip() method is often used to remove whitespace.
  file_handle = open('sample.txt', mode='r', encoding='utf-8')
  line_one = file_handle.readline()
  line_two = file_handle.readline()
  print(line_one.strip())
  print(line_two.strip())
- readlines(): Reads all lines into a list, with each line as an element. This also loads the entire file into memory.
  file_handle = open('sample.txt', mode='r', encoding='utf-8')
  lines_list = file_handle.readlines()
  for single_line in lines_list:
      print(single_line.strip())
- Iterating over the file object: The most memory-efficient method for reading large files, processing one line at a time.
  file_handle = open('sample.txt', mode='r', encoding='utf-8')
  for each_line in file_handle:
      print(each_line.strip())
Always close the file object with close() once you have finished with it, so that buffered data is written out and the operating system resource is released.
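A with statement is a common way to guarantee that close() is called, even when an exception interrupts the block. The following sketch combines it with line-by-line iteration; the sample file and its contents are assumptions created in a temporary directory.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical sample file

with open(path, mode='w', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')

stripped_lines = []
with open(path, mode='r', encoding='utf-8') as file_handle:
    for each_line in file_handle:
        stripped_lines.append(each_line.strip())

# Leaving the with block closed the file; no explicit close() call is needed.
print(file_handle.closed)
print(stripped_lines)
```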
Write-Only Modes (w, wb)
In write mode (w), if the file does not exist, it is created. If it exists, its content is erased before writing.
output_file = open('output.txt', mode='w', encoding='utf-8')
output_file.write('Initial Content')
output_file.flush() # Pushes Python's internal buffer to the operating system
output_file.close()
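Note that flush() only hands the data to the operating system, which may still hold it in its own cache. When the data must reach the storage device before the program continues, one option is to follow flush() with os.fsync(). A minimal sketch, using a temporary directory as an assumed location:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'output.txt')

with open(path, mode='w', encoding='utf-8') as output_file:
    output_file.write('Initial Content')
    output_file.flush()              # Python's buffer -> operating system
    os.fsync(output_file.fileno())   # ask the OS to commit its cache to the device

size_on_disk = os.path.getsize(path)
print(size_on_disk)
```

For most scripts the implicit flush performed by close() is sufficient; fsync is slower and mainly matters for data that has to survive a crash or power loss.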
In wb (binary write) mode, strings must be encoded to bytes before writing.
bin_output = open('data.bin', mode='wb')
bin_output.write('Binary Data'.encode('utf-8'))
bin_output.close()
Append Modes (a, ab)
Append mode adds new data to the end of an existing file without overwriting previous content.
log_file = open('app.log', mode='a', encoding='utf-8')
log_file.write('New log entry\n')
log_file.close()
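The difference between w and a is easiest to see side by side. This sketch opens the same file twice in w mode and then once in a mode; the file name and contents are illustrative assumptions.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file

with open(path, mode='w', encoding='utf-8') as f:
    f.write('one\n')

# Opening in w mode again discards the previous content.
with open(path, mode='w', encoding='utf-8') as f:
    f.write('two\n')
with open(path, mode='r', encoding='utf-8') as f:
    after_w = f.read()

# Opening in a mode keeps the existing content and adds to the end.
with open(path, mode='a', encoding='utf-8') as f:
    f.write('three\n')
with open(path, mode='r', encoding='utf-8') as f:
    after_a = f.read()

print(repr(after_w))
print(repr(after_a))
```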
Read and Write (r+)
This mode allows both reading and writing without truncating the file; the file must already exist, otherwise open() raises FileNotFoundError. The cursor starts at the beginning of the file, so the result depends on the order of reads and writes.
Recommended workflow: Read first, then write.
file_rw = open('test.txt', mode='r+', encoding='utf-8')
existing = file_rw.read() # Cursor moves to end after read
file_rw.write('Appended Text') # Write occurs at the end
print(existing)
file_rw.close()
Note: Writing before reading in r+ mode will overwrite content from the cursor's starting position.
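The overwrite behaviour described in the note can be checked with a short experiment. The file name and contents below are assumptions chosen so the effect is easy to see.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'test.txt')  # hypothetical file

with open(path, mode='w', encoding='utf-8') as f:
    f.write('abcdef')

with open(path, mode='r+', encoding='utf-8') as file_rw:
    file_rw.write('XY')   # replaces 'ab'; nothing is inserted or shifted
    file_rw.seek(0)       # go back to the start before reading
    result = file_rw.read()

print(result)  # XYcdef
```

The file keeps its original length because the two written characters replace the first two existing ones.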
Write and Read (w+)
This mode truncates (clears) the file upon opening, then allows writing and subsequent reading. The initial read after opening yields nothing.
file_wr = open('test_wplus.txt', mode='w+', encoding='utf-8')
file_wr.write('Sample')
file_wr.seek(0) # Move cursor to start before reading
content = file_wr.read() # Now content can be read
print(content)
file_wr.close()
Append and Read (a+)
This mode opens the file for appending and reading, creating it if it does not exist. The cursor is positioned at the end of the file, so a direct read() call returns an empty string unless the cursor is moved first.
file_ar = open('test_aplus.txt', mode='a+', encoding='utf-8')
file_ar.write('Appended Line\n')
file_ar.seek(0) # Move to beginning to read
old_content = file_ar.read()
print(old_content)
file_ar.close()
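One consequence of append mode is that seek() affects where reads happen but, on common platforms, not where writes land: writes generally still go to the end of the file. The sketch below illustrates this under that assumption, with a hypothetical file name and contents.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'test_aplus.txt')  # hypothetical file

with open(path, mode='w', encoding='utf-8') as f:
    f.write('first\n')

with open(path, mode='a+', encoding='utf-8') as file_ar:
    file_ar.seek(0)
    before = file_ar.read()      # existing content

    file_ar.seek(0)              # cursor is at the start...
    file_ar.write('second\n')    # ...but the write is still appended

    file_ar.seek(0)
    after = file_ar.read()

print(repr(before))
print(repr(after))
```

If the goal is to overwrite data at a specific position, r+ is the more suitable mode.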
File Pointer and Truncation
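- seek(offset, whence): Moves the file pointer. offset is in bytes. whence is 0 (start, the default), 1 (current), or 2 (end). In text mode, only an offset of zero or a value previously returned by tell() is reliable, and whence values 1 and 2 require an offset of 0; use binary mode for arbitrary byte offsets.
- tell(): Returns the current file pointer position in bytes.
- truncate(size=None): Truncates the file to size bytes. If size is omitted, truncates from the current position.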
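Important: In r+ mode, do not rely on where a write lands after a read. A full read() leaves the cursor at the end of the file, and in text mode read-ahead buffering can place a write that follows a partial read further along than expected (often at the end of a small file). Call seek() explicitly before writing or truncating at a specific point.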
with open('example.dat', mode='r+', encoding='utf-8') as f:
    f.seek(10) # Move to byte 10
    current_pos = f.tell()
    f.truncate() # Deletes all content from byte 10 onward
    f.write('Replacement') # Writes starting at current (post-truncation) position
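Byte offsets are unambiguous in binary mode, so the pointer methods are easiest to verify there. The following sketch runs seek(), tell(), and truncate() against a small file whose name and contents are assumptions.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.dat')  # hypothetical file

with open(path, mode='wb') as f:
    f.write(b'0123456789ABCDEF')

with open(path, mode='rb+') as f:
    f.seek(10)                 # absolute offset from the start (whence=0)
    position = f.tell()
    f.truncate()               # drop everything from byte 10 onward
    f.write(b'XYZ')            # written at byte 10

    f.seek(-3, os.SEEK_END)    # 3 bytes before the end (whence=2)
    tail = f.read()

with open(path, mode='rb') as f:
    result = f.read()

print(position)  # 10
print(result)    # b'0123456789XYZ'
print(tail)      # b'XYZ'
```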
File Modification Patterns
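Existing bytes can be overwritten in place, but content cannot be inserted into or removed from the middle of a file without rewriting what follows. The standard approach is therefore to read from the source file, modify the content, and write it to a new temporary file, then replace the original file with the new one.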
Method 1: In-Memory Replacement (for small files)
import os
with open('source.txt', 'r', encoding='utf-8') as src, open('temp.txt', 'w', encoding='utf-8') as dst:
    original_text = src.read()
    updated_text = original_text.replace('old_word', 'new_word')
    dst.write(updated_text)
# Both files are closed here. The rename is atomic on POSIX when both paths are on the same filesystem.
os.replace('temp.txt', 'source.txt')
Method 2: Line-by-Line Processing (for large files)
import os
with open('source.txt', 'r', encoding='utf-8') as src, open('temp.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        new_line = line.replace('old_word', 'new_word')
        dst.write(new_line)
# Replace the original only after both files have been closed.
os.replace('temp.txt', 'source.txt')
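A fixed name such as temp.txt can collide with an existing file, and a failure midway leaves the temporary file behind. One possible refinement is sketched below: it creates a uniquely named temporary file next to the original and cleans up on error. The helper name replace_in_file and its parameters are hypothetical, not part of any library.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Rewrite path line by line, replacing old with new via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the original, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, mode='w', encoding=encoding) as dst, \
                open(path, mode='r', encoding=encoding) as src:
            for line in src:
                dst.write(line.replace(old, new))
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temporary file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage with a hypothetical sample file.
tmp_dir = tempfile.mkdtemp()
source = os.path.join(tmp_dir, 'source.txt')
with open(source, mode='w', encoding='utf-8') as f:
    f.write('old_word here\nand old_word there\n')

replace_in_file(source, 'old_word', 'new_word')

with open(source, mode='r', encoding='utf-8') as f:
    updated = f.read()
print(updated)
```

One caveat of this sketch: mkstemp() creates the temporary file with restrictive permissions, so the replaced file does not inherit the original's permission bits unless they are copied explicitly.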