Fading Coder

Extracting Common Archive Formats with Python

Notes May 12 2

Python's standard library can decompress .gz, .tar, .tgz, and .zip archives, and a third-party library covers .rar. The sections below show one extraction function per format.

Format Overview

  • .gz (Gzip): Typically compresses a single file. It is often used in combination with tar to package multiple files.
  • .tar (Tape Archive): A Unix archival format that bundles files without compression.
  • .tgz / .tar.gz: A tar archive compressed with Gzip.
  • .zip: A common format that compresses multiple files individually.
  • .rar: A proprietary archive format that often achieves higher compression ratios than ZIP but is generally slower.
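
File extensions are not always reliable, so it can help to identify a format from the file's content. The following is a minimal sketch using only the standard library; the function name detect_archive_type is hypothetical, and RAR is left out because the standard library has no checker for it.

```python
import tarfile
import zipfile

def detect_archive_type(path):
    """Return 'tar', 'zip', 'gz', or None based on the file's content."""
    # Check tar first: is_tarfile also recognises compressed tars such as
    # .tar.gz, which would otherwise be reported as plain Gzip.
    if tarfile.is_tarfile(path):
        return 'tar'
    if zipfile.is_zipfile(path):
        return 'zip'
    with open(path, 'rb') as f:
        # Every Gzip stream starts with the two magic bytes 0x1f 0x8b
        if f.read(2) == b'\x1f\x8b':
            return 'gz'
    return None
```

The order of the checks matters here: a .tgz file is both a valid Gzip stream and a tar archive, and the sketch reports it as 'tar' so that the caller extracts the bundled files rather than a single decompressed blob.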

Decompressing Gzip Files (.gz)

Gzip files usually contain a single item. The following function decompresses a .gz file.

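import gzip
import shutil

def extract_gzip(gzip_path):
    """Decompresses a .gz file."""
    # str.rstrip('.gz') strips characters, not a suffix ('log.gz' -> 'lo'),
    # so the extension is removed by slicing instead.
    if gzip_path.endswith('.gz'):
        output_path = gzip_path[:-3]
    else:
        # Assumed fallback name for inputs without a .gz extension
        output_path = gzip_path + '.out'
    with gzip.open(gzip_path, 'rb') as compressed_file, open(output_path, 'wb') as output_file:
        # Copy in chunks rather than reading the whole file into memory
        shutil.copyfileobj(compressed_file, output_file)
    return output_path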
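
A round trip through a temporary directory is a quick way to check this kind of function. The sketch below is illustrative only: gunzip_to is a hypothetical helper that takes the output path explicitly, which avoids deriving it from the file name, and the file names are made up.

```python
import gzip
import os
import shutil
import tempfile

def gunzip_to(gzip_path, output_path):
    """Decompress gzip_path into output_path, streaming in chunks."""
    with gzip.open(gzip_path, 'rb') as src, open(output_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)

with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, 'notes.txt.gz')
    # Create a small .gz file to work with
    with gzip.open(gz_path, 'wb') as f:
        f.write(b'hello gzip\n')
    out_path = os.path.join(tmp, 'notes.txt')
    gunzip_to(gz_path, out_path)
    with open(out_path, 'rb') as f:
        restored = f.read()
```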

Extracting Tar Archives (.tar, .tgz, .tar.gz)

Tar archives can contain multiple files and directories. The tarfile module handles extraction, including compressed variants.

import tarfile
import os

def extract_tar(tar_path):
    """Extracts a tar archive, optionally compressed (e.g., .tar.gz)."""
    extract_dir = os.path.splitext(tar_path)[0] + '_extracted'
    os.makedirs(extract_dir, exist_ok=True)
    with tarfile.open(tar_path, 'r:*') as archive:
        archive.extractall(path=extract_dir)

Use 'r:*' to open archives with transparent compression detection.
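
Archives from untrusted sources can contain members with absolute paths or '..' components. Recent Python releases offer extraction filters to guard against this. The following is a sketch, assuming the caller chooses the destination directory; extract_tar_safely is a hypothetical name, and the hasattr check is there because older interpreters do not accept the filter argument.

```python
import os
import tarfile

def extract_tar_safely(tar_path, extract_dir):
    """Extract tar_path into extract_dir, using the 'data' filter if available."""
    os.makedirs(extract_dir, exist_ok=True)
    with tarfile.open(tar_path, 'r:*') as archive:
        if hasattr(tarfile, 'data_filter'):
            # Refuses members that would land outside extract_dir,
            # as well as device files and links pointing outside
            archive.extractall(path=extract_dir, filter='data')
        else:
            # No filter support on this interpreter: use trusted input only
            archive.extractall(path=extract_dir)
```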

Extracting ZIP Files (.zip)

The zipfile module provides functionality for ZIP archives.

import zipfile
import os

def extract_zip(zip_path):
    """Extracts a .zip archive."""
    extract_dir = os.path.splitext(zip_path)[0] + '_extracted'
    os.makedirs(extract_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, 'r') as archive:
        archive.extractall(extract_dir)
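
Before extracting, it can be useful to look at what a ZIP archive contains and how large it will be once unpacked. The following is a minimal sketch; summarize_zip is a hypothetical helper, and what counts as "too large" is left to the caller.

```python
import zipfile

def summarize_zip(zip_path):
    """Return (member_count, total_uncompressed_bytes, first_bad_member)."""
    with zipfile.ZipFile(zip_path, 'r') as archive:
        infos = archive.infolist()
        # file_size is the uncompressed size recorded for each member
        total_size = sum(info.file_size for info in infos)
        # testzip() reads every member and returns the first name with a
        # bad CRC or header, or None if all members check out
        bad_member = archive.testzip()
    return len(infos), total_size, bad_member
```

A caller might compare the reported total against available disk space, or skip extraction when the third value is not None.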

Extracting RAR Files (.rar)

RAR files require the third-party rarfile library. Install it with pip install rarfile. The library relies on an external extraction tool, such as unrar, being installed on the system.

import rarfile
import os

def extract_rar(rar_path):
    """Extracts a .rar archive."""
    extract_dir = os.path.splitext(rar_path)[0] + '_extracted'
    os.makedirs(extract_dir, exist_ok=True)
    with rarfile.RarFile(rar_path, 'r') as archive:
        archive.extractall(extract_dir)
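
Because RAR support depends on pieces outside the standard library, a script may want to check for them before attempting extraction. The sketch below is an assumption-laden illustration: rar_support_status is a hypothetical helper, and the list of candidate tool names is a guess at common backends, so consult the rarfile documentation for what your installed version actually supports.

```python
import importlib.util
import shutil

def rar_support_status():
    """Report whether the rarfile module and a backend tool appear to be present."""
    has_module = importlib.util.find_spec('rarfile') is not None
    # Candidate backend executables (assumed list, not authoritative)
    candidates = ('unrar', 'unar', '7z', 'bsdtar')
    backend = next((tool for tool in candidates if shutil.which(tool)), None)
    return has_module, backend
```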

Creating Tar Archives

The tarfile module also supports creating archives. The arcname parameter controls the name of the file inside the archive.

import tarfile
import os

def create_tar(source_dir, output_tar_path, compression=''):
    """Creates a tar archive from a directory."""
    # Define mode based on compression
    mode_map = {'': 'w', 'gz': 'w:gz', 'bz2': 'w:bz2'}
    mode = mode_map.get(compression, 'w')
    with tarfile.open(output_tar_path, mode) as tar:
        for root, dirs, files in os.walk(source_dir):
            for file in files:
                full_path = os.path.join(root, file)
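                # Store the path relative to source_dir so that files with the
                # same name in different subdirectories do not collide
                tar.add(full_path, arcname=os.path.relpath(full_path, source_dir))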

The mode for tarfile.open() determines compression:

  • 'w' or 'w:': No compression.
  • 'w:gz': Gzip compression.
  • 'w:bz2': Bzip2 compression.
  • 'r:*': Read with automatic compression detection.
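
The modes listed above can be exercised with a small round trip. This is a sketch only: roundtrip_names is a hypothetical helper, and the member name and payload are made up. It writes a one-file archive with each write mode and reads it back with 'r:*'.

```python
import io
import os
import tarfile
import tempfile

def roundtrip_names(write_mode):
    """Write a one-file archive with write_mode and list it back via 'r:*'."""
    with tempfile.TemporaryDirectory() as tmp:
        # The same file name is used for every mode on purpose
        tar_path = os.path.join(tmp, 'demo.tar')
        payload = b'example payload'
        with tarfile.open(tar_path, write_mode) as tar:
            info = tarfile.TarInfo(name='docs/readme.txt')
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        with tarfile.open(tar_path, 'r:*') as tar:
            return tar.getnames()

results = {mode: roundtrip_names(mode) for mode in ('w', 'w:gz', 'w:bz2')}
```

Each archive is saved under the same name regardless of compression, which illustrates that 'r:*' detects the compression from the file's content rather than from its extension.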
