Fading Coder

One Final Commit for the Last Sprint


Practical File Operations in Python

Tech · May 17

Python's file operations are straightforward. The built-in open() function returns a file object (often called a file handle), and the access mode you pass determines which operations that object permits.

Available access modes include r, w, a, r+, w+, a+, rb, wb, ab, r+b, w+b, and a+b. Adding b selects binary mode; without it, the file is opened in text mode. The default mode is r (read-only text).
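To see how the mode changes what you get back, here is a minimal sketch that writes a throwaway file (the name modes_demo.txt is hypothetical) and reads it once with the default mode and once with rb. It uses only the standard library.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical demo file

    # 'w' creates the file; text mode, so an encoding is passed
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('hi')

    # No mode argument: open() falls back to 'r' (read-only text)
    with open(path, encoding='utf-8') as f:
        default_mode = f.mode
        text_value = f.read()   # str

    # Adding 'b' switches to binary: read() now returns bytes
    with open(path, mode='rb') as f:
        binary_value = f.read()

print(default_mode, repr(text_value), repr(binary_value))
```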

Read-Only Modes (r, rb)

data_file = open('document.txt', mode='r', encoding='utf-8')
data_content = data_file.read()
print(data_content)
data_file.close()

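The encoding parameter tells Python how to decode the file's bytes into text. UTF-8 is the most common encoding. If encoding is omitted, Python falls back to the platform's locale encoding, which is not always UTF-8 (notably on Windows), so passing it explicitly is safer.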

rb mode reads raw bytes and does not accept an encoding parameter; passing one raises ValueError.

binary_file = open('document.txt', mode='rb')
binary_data = binary_file.read()
print(binary_data)
binary_file.close()
# Output (example): b'\xef\xbb\xbfSample Content...'
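The leading b'\xef\xbb\xbf' in the sample output above is a UTF-8 byte order mark (BOM), which some editors prepend when saving. A sketch of how it shows up under different read settings, using a hypothetical bom_demo.txt: in rb mode it appears as raw bytes, with encoding='utf-8' it survives as the character '\ufeff', and the 'utf-8-sig' codec strips it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'bom_demo.txt')  # hypothetical demo file
    # Simulate a file saved as "UTF-8 with BOM"
    with open(path, mode='wb') as f:
        f.write(b'\xef\xbb\xbfSample Content')

    with open(path, mode='rb') as f:
        raw = f.read()        # bytes, BOM included
    with open(path, mode='r', encoding='utf-8') as f:
        plain = f.read()      # BOM kept as the character '\ufeff'
    with open(path, mode='r', encoding='utf-8-sig') as f:
        cleaned = f.read()    # 'utf-8-sig' drops a leading BOM

print(raw[:3], repr(plain[0]), cleaned)
```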

Binary read mode (rb) is essential for non-text files like images, audio, or video where data cannot be directly represented as text. It is also fundamental for streaming and file transfer operations.
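Streaming a binary file usually means reading fixed-size chunks instead of the whole file at once. The helper below (copy_binary and its 64 KiB default chunk size are illustrative assumptions, not a standard API) copies a file chunk by chunk in rb/wb mode.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size; any positive int works


def copy_binary(src_path: str, dst_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Hypothetical helper: copy a file in fixed-size chunks, return bytes copied."""
    total = 0
    with open(src_path, mode='rb') as src, open(dst_path, mode='wb') as dst:
        while True:
            chunk = src.read(chunk_size)  # at most chunk_size bytes
            if not chunk:                 # b'' means end of file
                break
            dst.write(chunk)
            total += len(chunk)
    return total


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'image.bin')
    dst = os.path.join(tmp_dir, 'copy.bin')
    payload = bytes(range(256)) * 1000  # 256,000 bytes of non-text data
    with open(src, mode='wb') as f:
        f.write(payload)
    copied = copy_binary(src, dst, chunk_size=4096)
    with open(dst, mode='rb') as f:
        copy_ok = f.read() == payload
```

The standard library's shutil.copyfileobj() implements essentially the same loop, so in real code it is usually the simpler choice.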

File Paths:

  • Absolute Path: The full path from the root directory to the file.
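  • Relative Path: A path interpreted relative to the current working directory of the running process (e.g., './data.txt', '../config.txt'), which is not necessarily the directory containing the script. Relative paths keep a project portable, since it can be moved without breaking file references, provided the program is started from the expected directory.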
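The following sketch shows that the same relative path opens different files depending on the current working directory. It creates two temporary directories and switches between them with os.chdir(); the helper read_relative is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def read_relative(rel_path: str) -> str:
    # open() resolves a relative path against os.getcwd(), not the script's folder
    with open(rel_path, mode='r', encoding='utf-8') as f:
        return f.read()


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    Path(dir_a, 'data.txt').write_text('from A', encoding='utf-8')
    Path(dir_b, 'data.txt').write_text('from B', encoding='utf-8')
    try:
        os.chdir(dir_a)
        result_a = read_relative('data.txt')
        os.chdir(dir_b)
        result_b = read_relative('data.txt')  # same string, different file
    finally:
        os.chdir(original_cwd)  # restore before the temp dirs are deleted

print(result_a, result_b)
```

To anchor a path to the script's own folder instead, a common idiom is Path(__file__).resolve().parent / 'data.txt' (using pathlib).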

Methods for Reading Files:

  1. read(): Reads the entire file content into memory. Can cause memory issues with large files.

    file_handle = open('sample.txt', mode='r', encoding='utf-8')
    full_text = file_handle.read()  # Whole file as one string
    print(full_text)
    file_handle.close()
    
  2. read(size): Reads a specified number of characters (or bytes in rb mode). Subsequent reads continue from the current file position.

    file_handle = open('sample.txt', mode='r', encoding='utf-8')
    chunk_one = file_handle.read(4)  # Reads 4 characters
    chunk_two = file_handle.read(4)  # Reads the next 4 characters
    print(chunk_one, chunk_two)
    file_handle.close()
    
  3. readline(): Reads a single line from the file, including its trailing newline character (\n), and returns an empty string at end of file. strip() is often used to remove surrounding whitespace; rstrip('\n') removes only the newline.

    file_handle = open('sample.txt', mode='r', encoding='utf-8')
    line_one = file_handle.readline()
    line_two = file_handle.readline()
    print(line_one.strip())
    print(line_two.strip())
    file_handle.close()
    
  4. readlines(): Reads all lines into a list, with each line as an element. This also loads the entire file into memory.

    file_handle = open('sample.txt', mode='r', encoding='utf-8')
    lines_list = file_handle.readlines()  # List of lines, each ending in '\n'
    for single_line in lines_list:
        print(single_line.strip())
    file_handle.close()
    
  5. Iterating over the file object: The most memory-efficient method for reading large files, processing one line at a time.

    file_handle = open('sample.txt', mode='r', encoding='utf-8')
    for each_line in file_handle:  # Lazily yields one line at a time
        print(each_line.strip())
    file_handle.close()
    
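A small sketch that combines the methods above on a hypothetical three-line file, showing that each call continues from the current position and that readline() returns an empty string once the end of the file is reached.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # stand-in for sample.txt
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('alpha\nbeta\ngamma\n')

    with open(path, mode='r', encoding='utf-8') as f:
        first_chars = f.read(3)      # 'alp': first 3 characters
        rest_of_line = f.readline()  # 'ha\n': continues where read(3) stopped
        remaining = f.readlines()    # ['beta\n', 'gamma\n']: everything left
        at_eof = f.readline()        # '': empty string signals end of file

    with open(path, mode='r', encoding='utf-8') as f:
        stripped = [line.rstrip('\n') for line in f]  # line-by-line iteration
```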

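Always close the file handle after operations using close(), or better, open it in a with statement, which closes the file automatically even if an exception is raised.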
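Here is a minimal sketch of a with statement closing the file even when processing fails partway through; the RuntimeError is raised deliberately to simulate an error, and sample.txt is a temporary stand-in.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical demo file
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('line one\n')

    handle = None
    try:
        with open(path, mode='r', encoding='utf-8') as handle:
            handle.readline()
            raise RuntimeError('simulated failure mid-processing')
    except RuntimeError:
        pass

    # The with block closed the file on the way out, despite the exception
    closed_after_error = handle.closed
```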

Write-Only Modes (w, wb)

In write mode (w), if the file does not exist, it is created. If it exists, its content is erased as soon as the file is opened, before anything is written.

output_file = open('output.txt', mode='w', encoding='utf-8')
output_file.write('Initial Content')
output_file.flush()  # Hands Python's buffer to the OS (close() also flushes); os.fsync() forces it to disk
output_file.close()
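The sketch below checks the truncation behavior described above: reopening a file in w mode discards its old content, even when the new text is shorter. It also shows os.fsync(), which asks the operating system to push its own buffers to the storage device after flush() has handed Python's buffer to the OS. output.txt lives in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.txt')
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('A much longer original line')

    # Reopening with 'w' truncates the file to zero length immediately
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('Short')
        f.flush()             # Python buffer -> OS
        os.fsync(f.fileno())  # OS buffers -> storage device

    with open(path, mode='r', encoding='utf-8') as f:
        final_text = f.read()
```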

In wb (binary write) mode, strings must be encoded to bytes before writing.

bin_output = open('data.bin', mode='wb')
bin_output.write('Binary Data'.encode('utf-8'))
bin_output.close()
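As a quick illustration of the encoding requirement, a sketch that first tries to write a str in wb mode (which raises TypeError) and then writes the encoded bytes; data.bin is created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.bin')
    with open(path, mode='wb') as bin_output:
        try:
            bin_output.write('Binary Data')  # str is rejected in binary mode
            error_name = None
        except TypeError as exc:
            error_name = type(exc).__name__
        bin_output.write('Binary Data'.encode('utf-8'))  # bytes are accepted

    with open(path, mode='rb') as f:
        stored = f.read()
```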

Append Modes (a, ab)

Append mode adds new data to the end of the file without overwriting previous content. If the file does not exist, it is created.

log_file = open('app.log', mode='a', encoding='utf-8')
log_file.write('New log entry\n')
log_file.close()
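A minimal sketch of append mode, opening a hypothetical app.log twice: the first open creates the file, and the second adds to it rather than replacing it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'app.log')
    existed_before = os.path.exists(path)  # the file does not exist yet

    for entry in ('first entry', 'second entry'):
        # Each open in 'a' mode positions writes at the current end of file
        with open(path, mode='a', encoding='utf-8') as log_file:
            log_file.write(entry + '\n')

    with open(path, mode='r', encoding='utf-8') as f:
        log_lines = f.read().splitlines()
```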

Read and Write (r+)

This mode allows both reading and writing on an existing file (it raises FileNotFoundError if the file is missing). The cursor starts at the beginning of the file, so the result depends on the order of reads and writes.

Recommended workflow: Read first, then write.

file_rw = open('test.txt', mode='r+', encoding='utf-8')
existing = file_rw.read()  # Cursor moves to end after read
file_rw.write('Appended Text')  # Write occurs at the end
print(existing)
file_rw.close()

Note: Writing before reading in r+ mode will overwrite content from the cursor's starting position.
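The note above is easy to verify. In this sketch (test.txt is a temporary file), writing immediately after opening in r+ mode overwrites characters at the start rather than inserting them, and r+ refuses to open a file that does not exist.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.txt')
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('Hello World')

    with open(path, mode='r+', encoding='utf-8') as file_rw:
        file_rw.write('Jelly')  # cursor starts at 0, so this overwrites 'Hello'

    with open(path, mode='r', encoding='utf-8') as f:
        result = f.read()

    # Unlike w+ and a+, r+ never creates the file
    try:
        with open(os.path.join(tmp_dir, 'missing.txt'), mode='r+', encoding='utf-8'):
            pass
        missing_error = None
    except FileNotFoundError:
        missing_error = 'FileNotFoundError'
```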

Write and Read (w+)

This mode truncates (clears) the file upon opening, then allows writing and subsequent reading. The initial read after opening yields nothing.

file_wr = open('test_wplus.txt', mode='w+', encoding='utf-8')
file_wr.write('Sample')
file_wr.seek(0)  # Move cursor to start before reading
content = file_wr.read()  # Now content can be read
print(content)
file_wr.close()

Append and Read (a+)

This mode opens the file for appending and reading, creating it if it does not exist. The cursor is positioned at the end of the file, so calling read() right away returns an empty string unless the cursor is moved first.

file_ar = open('test_aplus.txt', mode='a+', encoding='utf-8')
file_ar.write('Appended Line\n')
file_ar.seek(0)  # Move to beginning to read
old_content = file_ar.read()
print(old_content)
file_ar.close()
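One consequence of append mode worth knowing: seek() changes where reads happen, but on Linux and other POSIX systems, where Python opens a and a+ files with the O_APPEND flag, writes still go to the end of the file. A sketch, assuming such a platform and a temporary test_aplus.txt:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test_aplus.txt')
    with open(path, mode='a+', encoding='utf-8') as file_ar:
        file_ar.write('first\n')
        file_ar.seek(0)            # reads now start at the beginning
        seen = file_ar.read()
        file_ar.seek(0)            # try to write at the start...
        file_ar.write('second\n')  # ...but O_APPEND still places it at the end

    with open(path, mode='r', encoding='utf-8') as f:
        final = f.read()
```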

File Pointer and Truncation

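  • seek(offset, whence): Moves the file pointer. whence is 0 (start, the default), 1 (current position), or 2 (end). In binary mode, offset is a byte count. In text mode, only seek(0), seek(0, 2), or a value previously returned by tell() are reliable, and nonzero offsets relative to the current position or the end raise io.UnsupportedOperation.
  • tell(): Returns the current position. In binary mode this is a byte offset; in text mode it is an opaque number meant only to be passed back to seek().
  • truncate(size=None): Resizes the file to size bytes. If size is omitted, the file is truncated at the current position. The pointer itself is not moved.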

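Important: In r+ text mode, a read (especially one that reaches the end of the file) moves the pointer, so subsequent writes usually land at the end rather than where you started. To truncate or write at a specific point, move the pointer with seek() first; for exact byte offsets, open the file in binary mode (r+b).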

with open('example.dat', mode='r+b') as f:  # Binary mode, so offsets are exact bytes
    f.seek(10)  # Move to byte offset 10
    current_pos = f.tell()  # 10
    print(current_pos)
    f.truncate()  # Keeps bytes 0-9 and removes everything from byte 10 onward
    f.write(b'Replacement')  # truncate() leaves the pointer at 10, so writing starts there
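To illustrate the whence argument and the text-mode restriction, a sketch that reads the last four bytes of a hypothetical 16-byte example.dat in binary mode, then shows that the same end-relative seek is rejected in text mode. os.SEEK_END can be used in place of the literal 2.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.dat')
    with open(path, mode='wb') as f:
        f.write(b'0123456789ABCDEF')  # 16 bytes

    with open(path, mode='rb') as f:
        f.seek(-4, 2)       # 4 bytes before the end (whence=2)
        tail = f.read()     # the last 4 bytes
        end_pos = f.tell()  # now at the end: equals the file length

    with open(path, mode='r', encoding='utf-8') as f:
        try:
            f.seek(-4, 2)   # text mode rejects nonzero end-relative seeks
            text_seek_error = None
        except io.UnsupportedOperation:
            text_seek_error = 'UnsupportedOperation'
        f.seek(0, 2)        # seeking exactly to the end is allowed
```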

File Modification Patterns

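Files cannot be edited in place in the usual sense: you can overwrite existing bytes (as with r+), but you cannot insert or delete content in the middle without rewriting everything after it. A common, safe approach is to read from the source file, modify the content, and write the result to a new temporary file. Finally, replace the original file with the new one.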

Method 1: In-Memory Replacement (for small files)

import os
with open('source.txt', 'r', encoding='utf-8') as src, open('temp.txt', 'w', encoding='utf-8') as dst:
    original_text = src.read()  # Whole file in memory
    updated_text = original_text.replace('old_word', 'new_word')
    dst.write(updated_text)
os.replace('temp.txt', 'source.txt')  # Atomic on POSIX when both files are on the same filesystem

Method 2: Line-by-Line Processing (for large files)

import os
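with open('source.txt', 'r', encoding='utf-8') as src, open('temp.txt', 'w', encoding='utf-8') as dst:
    for line in src:  # Only one line in memory at a time
        new_line = line.replace('old_word', 'new_word')
        dst.write(new_line)
os.replace('temp.txt', 'source.txt')  # Swap the rewritten file into place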
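The two methods above write to a fixed temp.txt, which can collide with an existing file and is left behind if something fails midway. A more defensive sketch (the helper name replace_in_file is hypothetical) creates a uniquely named temporary file in the same directory with tempfile.mkstemp(), streams the edits into it, and removes it on error.

```python
import os
import tempfile


def replace_in_file(path: str, old: str, new: str, encoding: str = 'utf-8') -> None:
    """Hypothetical helper: rewrite path line by line via a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace() stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with open(fd, mode='w', encoding=encoding) as dst:  # open() accepts a raw fd
            with open(path, mode='r', encoding=encoding) as src:
                for line in src:
                    dst.write(line.replace(old, new))
        os.replace(tmp_path, path)  # temp file is closed before the swap
    except BaseException:
        os.remove(tmp_path)  # don't leave a stray temp file behind
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'source.txt')
    with open(target, mode='w', encoding='utf-8') as f:
        f.write('old_word one\nkeep\nold_word two\n')

    replace_in_file(target, 'old_word', 'new_word')

    with open(target, mode='r', encoding='utf-8') as f:
        updated = f.read()
    leftover = sorted(os.listdir(tmp_dir))  # only source.txt should remain
```

One caveat of this approach: mkstemp() creates the file with owner-only permissions, so the replaced file does not keep the original's permission bits unless you copy them (for example with shutil.copymode()) before calling os.replace().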
