Home > Tech > Content

Efficient File I/O Techniques in Python

Tech May 8 4

Reading and Writing Text Files

In Python 2, strings are byte sequences by default (ASCII), while Unicode strings require a u'' prefix. All text processing should use Unicode internally, with explicit encoding/decoding at I/O boundaries to avoid corruption.

Python 3 simplifies this: the str type is Unicode by default, and bytes represent raw data (prefixed with b). The built-in open() function supports an encoding parameter for transparent conversion:

message = '你好'
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(message)

with open('output.txt', 'r', encoding='utf-8') as f:
    print(f.read())

Handling Binary Files

Binary formats like WAV audio files contain structured headers followed by raw data. To parse them:

Open in binary mode ('rb' or 'wb')
Use struct.unpack() to interpret fixed-size fields
Read directly into pre-allocated buffers (e.g., NumPy arrays) to efficiency

import struct
import numpy as np

def locate_chunk(file_obj, target):
    file_obj.seek(12)
    while True:
        name = file_obj.read(4)
        if len(name) < 4:
            raise ValueError("Chunk not found")
        size = struct.unpack('<I', file_obj.read(4))[0]
        if name == target:
            return file_obj.tell(), size
        file_obj.seek(size, 1)

with open('audio.wav', 'rb') as src:
    offset, data_size = locate_chunk(src, b'data')
    buffer = np.empty(data_size // 2, dtype=np.int16)
    src.readinto(buffer)
    buffer //= 8  # Reduce amplitude

    with open('processed.wav', 'wb') as dst:
        src.seek(0)
        dst.write(src.read(offset))
        buffer.tofile(dst)

Controlling File Buffering

File writes are buffered to minimize system calls. Buffering strategies include:

Full buffering: Data flushed when buffer fills (typically 4–8 KB)
Line buffering: Flushes on newline (\n), only in text mode
Unbuffered: Immediate writes (use buffering=0 for binary files)

Explicit buffer size control:

# Full buffering with custom size
with open('data.bin', 'wb', buffering=8192) as f:
    f.write(b'...')

# Line buffering (text mode only)
with open('log.txt', 'w', buffering=1) as f:
    f.write("Entry\n")  # Flushed immediately

# Unbuffered binary write
with open('realtime.bin', 'wb', buffering=0) as f:
    f.write(b'immediate')

Memory-Mapped Files

The mmap module maps files directly into memory, enabling:

Random access to large files without full loading
Shared memory between procesess
Direct hardware register access (e.g., /dev/mem)

import mmap

with open('/dev/fb0', 'r+b') as fb:
    screen_size = 8294400  # Example framebuffer size
    with mmap.mmap(fb.fileno(), screen_size) as mm:
        # Fill first half of screen with white (RGBA)
        mm[:screen_size//2] = b'\xff\xff\xff\x00' * (screen_size // 8)

Retrieving File Metadata

Use os.stat() to obtain detailed file attributes:

import os
import stat
import time

info = os.stat('script.py')
print(f"Size: {info.st_size} bytes")
print(f"Modified: {time.ctime(info.st_mtime)}")

# Check file type
if stat.S_ISREG(info.st_mode):
    print("Regular file")

# Check permissions
if info.st_mode & stat.S_IRUSR:
    print("User-readable")

Convenience functions in os.path simplify common checks:

os.path.isfile('data.txt')      # True if regular file
os.path.getsize('data.txt')     # File size in bytes
os.path.getmtime('data.txt')    # Modification time

Working with Temporary Files

For transient data that shouldn’t persist:

TemporaryFile(): Anonymous, auto-deleted on close
NamedTemporaryFile(): Has a filesystem path; auto-deleted unless delete=False

from tempfile import TemporaryFile, NamedTemporaryFile

# Anonymous temporary file
with TemporaryFile() as tf:
    tf.write(b'temporary data')
    tf.seek(0)
    print(tf.read(5))

# Named temporary file
with NamedTemporaryFile() as ntf:
    print(f"Path: {ntf.name}")
    ntf.write(b'shared data')

Tags: Python file-io Performance binary-data

Back to List

Prev: Docker Installation on Ubuntu and Essential Commands Guide

Next: Working with JLabel Components and Custom Icons in Swing

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Fading Coder

Efficient File I/O Techniques in Python

Reading and Writing Text Files

Handling Binary Files

Controlling File Buffering

Memory-Mapped Files

Retrieving File Metadata

Working with Temporary Files

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a Comment

Copyright © fadingcoder.top

Fading Coder

Efficient File I/O Techniques in Python

Reading and Writing Text Files

Handling Binary Files

Controlling File Buffering

Memory-Mapped Files

Retrieving File Metadata

Working with Temporary Files

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a CommentCancel Reply

Copyright © fadingcoder.top

Leave a Comment