Fading Coder

Efficient File I/O Techniques in Python

Tech May 8 4

Reading and Writing Text Files

In Python 2, the default str type is a byte string, and Unicode text requires a separate unicode type (written with a u'' prefix). Text should be handled as Unicode internally, with explicit encoding and decoding at I/O boundaries to avoid corruption.

Python 3 simplifies this: the str type is Unicode by default, and the separate bytes type holds raw data (literals are prefixed with b). In text mode, the built-in open() function accepts an encoding parameter and converts transparently:

message = '你好'
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(message)

with open('output.txt', 'r', encoding='utf-8') as f:
    print(f.read())
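
What open() does behind the encoding parameter can be sketched by hand with str.encode() and bytes.decode(). This is only an illustration of the boundary conversion, not something the article's code requires; the variable names are made up for the example.

```python
# Encode str to bytes on output, decode bytes to str on input.
raw = '你好'.encode('utf-8')      # str -> bytes (what ends up on disk)
print(len('你好'), len(raw))       # 2 characters, 6 bytes

text = raw.decode('utf-8')         # bytes -> str
print(text == '你好')

# Decoding with the wrong codec fails loudly here; other codecs may
# instead return garbled text without raising.
try:
    raw.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)
```

Keeping all in-memory text as str and converting only when bytes enter or leave the program means the codec has to be named in just one place.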

Handling Binary Files

Binary formats like WAV audio files contain structured headers followed by raw data. To parse them:

  • Open in binary mode ('rb' or 'wb')
  • Use struct.unpack() to interpret fixed-size fields
  • Read directly into pre-allocated buffers (e.g., NumPy arrays) for efficiency

import struct
import numpy as np

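def locate_chunk(file_obj, target):
    """Return (payload offset, payload size) of the first chunk named target."""
    file_obj.seek(12)  # Skip the 12-byte 'RIFF' <size> 'WAVE' header
    while True:
        header = file_obj.read(8)
        if len(header) < 8:
            raise ValueError("Chunk not found")
        name, size = struct.unpack('<4sI', header)
        if name == target:
            return file_obj.tell(), size
        # Skip the payload; RIFF chunks are padded to an even length
        file_obj.seek(size + (size & 1), 1)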

with open('audio.wav', 'rb') as src:
    offset, data_size = locate_chunk(src, b'data')
    # Assumes 16-bit PCM samples (2 bytes each)
    buffer = np.empty(data_size // 2, dtype=np.int16)
    src.readinto(buffer)  # Fill the array in place, no intermediate copy
    buffer //= 8  # Reduce amplitude

    with open('processed.wav', 'wb') as dst:
        src.seek(0)
        dst.write(src.read(offset))
        buffer.tofile(dst)
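
The struct.unpack() step from the list above can be shown on the fixed-size 'fmt ' chunk. The following is a minimal sketch, assuming a plain PCM file whose 'fmt ' chunk starts with the standard 16-byte layout; read_fmt_chunk and the in-memory test data are hypothetical names and values introduced for this example.

```python
import io
import struct

def read_fmt_chunk(file_obj):
    """Return the basic format fields from a RIFF/WAVE stream."""
    header = file_obj.read(12)
    if len(header) < 12:
        raise ValueError("File too short")
    riff, _, wave = struct.unpack('<4sI4s', header)
    if riff != b'RIFF' or wave != b'WAVE':
        raise ValueError("Not a RIFF/WAVE file")
    while True:
        chunk_header = file_obj.read(8)
        if len(chunk_header) < 8:
            raise ValueError("fmt chunk not found")
        name, size = struct.unpack('<4sI', chunk_header)
        if name == b'fmt ':
            # Two 16-bit, two 32-bit, then two 16-bit little-endian fields
            fields = struct.unpack('<HHIIHH', file_obj.read(16))
            keys = ('format_tag', 'channels', 'sample_rate',
                    'byte_rate', 'block_align', 'bits_per_sample')
            return dict(zip(keys, fields))
        file_obj.seek(size + (size & 1), 1)  # Chunks are padded to even length

# Build a tiny in-memory WAV (made-up values) to exercise the parser.
fmt = struct.pack('<HHIIHH', 1, 2, 44100, 44100 * 4, 4, 16)
samples = struct.pack('<4h', 0, 0, 100, -100)
body = (b'WAVE'
        + b'fmt ' + struct.pack('<I', len(fmt)) + fmt
        + b'data' + struct.pack('<I', len(samples)) + samples)
wav_bytes = b'RIFF' + struct.pack('<I', len(body)) + body

info = read_fmt_chunk(io.BytesIO(wav_bytes))
print(info['channels'], info['sample_rate'], info['bits_per_sample'])
```

Reading bits_per_sample this way would let a caller pick the NumPy dtype from the file instead of hard-coding 16-bit samples.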

Controlling File Buffering

File writes are buffered to minimize system calls. Buffering strategies include:

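  • Full buffering: Data is flushed when the buffer fills (often 4–8 KB; Python picks the size from the device's block size, falling back on io.DEFAULT_BUFFER_SIZE)
  • Line buffering: Flushes on newline (\n); selected with buffering=1 and available only in text mode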
  • Unbuffered: Immediate writes (use buffering=0 for binary files)

Explicit buffer size control:

# Full buffering with custom size
with open('data.bin', 'wb', buffering=8192) as f:
    f.write(b'...')

# Line buffering (text mode only)
with open('log.txt', 'w', buffering=1) as f:
    f.write("Entry\n")  # Flushed immediately

# Unbuffered binary write
with open('realtime.bin', 'wb', buffering=0) as f:
    f.write(b'immediate')
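
The effect of full buffering can be observed by comparing the on-disk size before and after an explicit flush(). This is a small sketch under the assumption that the write is far smaller than the buffer; the scratch file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'buffered.bin')  # hypothetical scratch file

    with open(path, 'wb') as f:               # default: full buffering
        f.write(b'hello')
        size_before = os.path.getsize(path)   # data still sits in Python's buffer
        f.flush()                             # hand the buffer to the OS
        size_after = os.path.getsize(path)
        os.fsync(f.fileno())                  # ask the OS to commit it to storage

print(size_before, size_after)
```

flush() only moves data from Python's buffer to the operating system; os.fsync() is the additional step when the data must survive a crash or power loss.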

Memory-Mapped Files

The mmap module maps files directly into memory, enabling:

  • Random access to large files without full loading
  • Shared memory between processes
  • Direct hardware register access (e.g., /dev/mem)

import mmap

with open('/dev/fb0', 'r+b') as fb:
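    screen_size = 8294400  # Example framebuffer size: 1920 x 1080 pixels, 4 bytes each
    with mmap.mmap(fb.fileno(), screen_size) as mm:
        # Fill the first half of the buffer with white pixels
        # (4 bytes per pixel; channel order depends on the framebuffer format)
        mm[:screen_size//2] = b'\xff\xff\xff\x00' * (screen_size // 8)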
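
The framebuffer example needs special hardware access, so here is a sketch of the random-access use case on an ordinary file that can be run anywhere. The file name and the 1 KB size are assumptions made for the example.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'records.bin')   # hypothetical data file
    with open(path, 'wb') as f:
        f.write(b'\x00' * 1024)               # mmap cannot map an empty file

    with open(path, 'r+b') as f:
        # A length of 0 maps the whole file
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[512:516] = b'DATA'             # slice assignment must keep the length
            found_at = mm.find(b'DATA')       # search without read() calls
            mm.flush()                        # push changes back to the file

    with open(path, 'rb') as f:
        f.seek(512)
        on_disk = f.read(4)

print(found_at, on_disk)
```

The mapping behaves like a mutable bytes object, so slicing, find(), and struct.unpack_from() all work on it without loading the whole file.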

Retrieving File Metadata

Use os.stat() to obtain detailed file attributes:

import os
import stat
import time

info = os.stat('script.py')
print(f"Size: {info.st_size} bytes")
print(f"Modified: {time.ctime(info.st_mtime)}")

# Check file type
if stat.S_ISREG(info.st_mode):
    print("Regular file")

# Check permissions
if info.st_mode & stat.S_IRUSR:
    print("User-readable")

Convenience functions in os.path simplify common checks:

os.path.isfile('data.txt')      # True if regular file
os.path.getsize('data.txt')     # File size in bytes
os.path.getmtime('data.txt')    # Modification time (seconds since the epoch)
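
Each of these os.path helpers performs its own stat call, so when several attributes of the same file are needed, one os.stat() result can be reused. The helper below is a sketch; describe and its dictionary keys are names invented for this example.

```python
import os
import stat
import tempfile
from datetime import datetime

def describe(path):
    """Summarise a path using a single os.stat() call."""
    info = os.stat(path)
    return {
        'size': info.st_size,
        'mode': stat.filemode(info.st_mode),   # ls-style string such as '-rw-r--r--'
        'is_dir': stat.S_ISDIR(info.st_mode),
        'modified': datetime.fromtimestamp(info.st_mtime),
    }

# Exercise the helper on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'12345')
summary = describe(tmp.name)
os.remove(tmp.name)

print(summary['size'], summary['mode'], summary['is_dir'])
```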

Working with Temporary Files

For transient data that shouldn’t persist:

  • TemporaryFile(): Anonymous, auto-deleted on close
  • NamedTemporaryFile(): Has a filesystem path; auto-deleted on close unless delete=False

from tempfile import TemporaryFile, NamedTemporaryFile

# Anonymous temporary file
with TemporaryFile() as tf:
    tf.write(b'temporary data')
    tf.seek(0)
    print(tf.read(5))

# Named temporary file
with NamedTemporaryFile() as ntf:
    print(f"Path: {ntf.name}")
    ntf.write(b'shared data')
    ntf.flush()  # Make the data visible to anything that opens the path
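
When the file has to outlive the with block, for example to be handed to another function or program by path, delete=False can be used. The following is a sketch of that pattern; the suffix, contents, and variable names are made up, and cleanup becomes the caller's job.

```python
import os
from tempfile import NamedTemporaryFile

# delete=False keeps the file after close so it can be reopened by name
with NamedTemporaryFile(mode='w', suffix='.txt', delete=False,
                        encoding='utf-8') as ntf:
    ntf.write('handoff data')
    saved_path = ntf.name

try:
    with open(saved_path, encoding='utf-8') as f:
        contents = f.read()
finally:
    os.remove(saved_path)   # nothing deletes the file automatically now

print(contents)
```

Closing the file before reopening it by name also avoids platform differences in whether an already-open temporary file can be opened a second time.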
