Extracting Common Archive Formats with Python
Python's standard library, plus the third-party rarfile package, can extract the most common archive formats: .gz, .tar, .tgz, .zip, and .rar.
Format Overview
- .gz (Gzip): Typically compresses a single file. It is often used in combination with tar to handle multiple files.
- .tar (Tape Archive): A Unix archival format that bundles files without compressing them.
- .tgz / .tar.gz: A tar archive compressed with Gzip.
- .zip: A widely used format that bundles multiple files and compresses each one individually.
- .rar: A proprietary archive format that often achieves higher compression ratios than ZIP but is generally slower.
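File extensions can be missing or wrong, so it can help to check a file's leading bytes instead. The sketch below is a hypothetical helper (identify_archive is not part of any library) that recognizes the standard signatures for gzip, ZIP, RAR, and POSIX ustar tar files. It assumes a local, readable file. A .tgz file reports as gzip because the tar data sits inside the compressed stream, and very old pre-POSIX tar files without the ustar magic are not detected.

```python
# Hypothetical helper: identify an archive by its leading "magic" bytes
# rather than trusting the file extension.

# Signatures that appear at the very start of the file.
_PREFIX_SIGNATURES = [
    (b"\x1f\x8b", "gzip"),      # gzip header (also covers .tgz / .tar.gz)
    (b"PK\x03\x04", "zip"),     # ZIP local file header
    (b"PK\x05\x06", "zip"),     # empty ZIP (end-of-central-directory record only)
    (b"Rar!\x1a\x07", "rar"),   # prefix shared by RAR 4.x and RAR 5.x
]

def identify_archive(path):
    """Return 'gzip', 'zip', 'rar', 'tar', or None for an unrecognized file."""
    with open(path, "rb") as f:
        header = f.read(512)  # one tar header block is 512 bytes
    for signature, name in _PREFIX_SIGNATURES:
        if header.startswith(signature):
            return name
    # POSIX (ustar) tar headers carry the magic string "ustar" at offset 257.
    if header[257:262] == b"ustar":
        return "tar"
    return None
```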
Decompressing Gzip Files (.gz)
A .gz file usually contains a single compressed file. The following function decompresses it next to the original, dropping the .gz suffix.
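import gzip
import shutil

def extract_gzip(gzip_path):
    """Decompresses a .gz file and returns the path of the output file."""
    # Remove the '.gz' suffix explicitly; str.rstrip('.gz') would strip any
    # trailing '.', 'g', or 'z' characters (e.g. 'log.gz' -> 'lo').
    if not gzip_path.endswith('.gz'):
        raise ValueError(f"Expected a .gz file: {gzip_path!r}")
    output_path = gzip_path[:-len('.gz')]
    # Stream the data in chunks instead of loading the whole file into memory.
    with gzip.open(gzip_path, 'rb') as compressed_file, \
            open(output_path, 'wb') as output_file:
        shutil.copyfileobj(compressed_file, output_file)
    return output_path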
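When the gzip data is already in memory (for example, a compressed HTTP response body), the module-level gzip.compress() and gzip.decompress() functions avoid temporary files entirely. A minimal sketch with a made-up payload:

```python
import gzip

# Made-up payload standing in for bytes received over the network.
original = b"line 1\nline 2\n" * 100

compressed = gzip.compress(original)    # gzip-format bytes, starting with b"\x1f\x8b"
restored = gzip.decompress(compressed)  # the original bytes again

print(len(original), "bytes ->", len(compressed), "bytes compressed")
```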
Extracting Tar Archives (.tar, .tgz, .tar.gz)
Tar archives can contain multiple files and directories. The tarfile module handles extraction, including compressed variants.
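import os
import tarfile

def extract_tar(tar_path):
    """Extracts a tar archive, optionally compressed (e.g., .tar.gz, .tgz)."""
    extract_dir = os.path.splitext(tar_path)[0] + '_extracted'
    os.makedirs(extract_dir, exist_ok=True)
    with tarfile.open(tar_path, 'r:*') as archive:
        if hasattr(tarfile, 'data_filter'):
            # Extraction filters (Python 3.12+ and security backports) reject
            # members, such as absolute or '..' paths, that would escape extract_dir.
            archive.extractall(path=extract_dir, filter='data')
        else:
            archive.extractall(path=extract_dir)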
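The 'r:*' mode (equivalent to the default 'r') opens the archive with transparent compression detection, so the same call handles uncompressed tar files as well as gzip-, bzip2-, and xz-compressed ones.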
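Because the member list is available before anything is written to disk, an archive can be inspected first. The sketch below builds a small gzip-compressed tar in memory, reads it back with 'r:*', and flags member names that are absolute or contain '..'. The helper names (unsafe_members, build_demo_archive) are hypothetical, and the check is deliberately simple: it does not cover symbolic or hard links, Windows drive letters, or the other cases that the 'data' extraction filter in recent Python versions handles.

```python
import io
import posixpath
import tarfile

def unsafe_members(archive):
    """Return names of members whose paths could escape the extraction directory."""
    flagged = []
    for member in archive.getmembers():
        # Tar member names use '/' separators regardless of the host OS.
        parts = posixpath.normpath(member.name).split("/")
        if member.name.startswith("/") or ".." in parts:
            flagged.append(member.name)
    return flagged

def build_demo_archive():
    """Build a gzip-compressed tar in memory with one safe and one unsafe member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in ("docs/readme.txt", "../outside.txt"):
            data = b"example\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer

# 'r:*' detects the gzip layer without being told about it.
with tarfile.open(fileobj=build_demo_archive(), mode="r:*") as tar:
    print(tar.getnames())
    print(unsafe_members(tar))
```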
Extracting ZIP Files (.zip)
The zipfile module provides functionality for ZIP archives.
import zipfile
import os

def extract_zip(zip_path):
    """Extracts a .zip archive."""
    extract_dir = os.path.splitext(zip_path)[0] + '_extracted'
    os.makedirs(extract_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, 'r') as archive:
        archive.extractall(extract_dir)
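zipfile can also list and read members without extracting them to disk. A minimal sketch, assuming an archive built in memory with made-up file names:

```python
import io
import zipfile

# Build a small ZIP in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes/todo.txt", "buy milk\n")
    archive.writestr("notes/done.txt", "write docs\n")

with zipfile.ZipFile(buffer, "r") as archive:
    names = archive.namelist()
    # infolist() exposes per-member metadata without extracting anything.
    for info in archive.infolist():
        print(info.filename, info.file_size, info.compress_size)
    # read() returns one member's decompressed bytes.
    todo = archive.read("notes/todo.txt").decode("utf-8")

print(todo)
```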
Extracting RAR Files (.rar)
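RAR files require the third-party rarfile library. Install it with pip install rarfile. rarfile reads the archive structure itself but calls an external tool (unrar, unar, 7-Zip, or bsdtar) to decompress file contents, so one of these must also be installed.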
import rarfile
import os

def extract_rar(rar_path):
    """Extracts a .rar archive."""
    extract_dir = os.path.splitext(rar_path)[0] + '_extracted'
    os.makedirs(extract_dir, exist_ok=True)
    with rarfile.RarFile(rar_path, 'r') as archive:
        archive.extractall(extract_dir)
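The per-format functions above can be combined behind a single entry point. This is a hypothetical dispatcher (extract_archive is an illustrative name) that chooses an extractor from the file suffix, assuming the suffix is trustworthy. It checks tar suffixes before plain .gz so that .tar.gz files are unpacked as tar archives. The RAR branch still needs rarfile and one of its external decompression tools, so rarfile is imported only when a .rar file is actually seen.

```python
import gzip
import os
import shutil
import tarfile
import zipfile

def extract_archive(path, extract_dir):
    """Hypothetical dispatcher: pick an extractor based on the file name suffix."""
    lower = path.lower()
    os.makedirs(extract_dir, exist_ok=True)
    # Tar suffixes come first so 'x.tar.gz' is not treated as a bare .gz file.
    if lower.endswith((".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")):
        with tarfile.open(path, "r:*") as archive:
            # Use the 'data' extraction filter where this Python version has it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(path=extract_dir, **kwargs)
    elif lower.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            archive.extractall(extract_dir)
    elif lower.endswith(".gz"):
        # A bare .gz holds one compressed file; name the output after the input.
        out_path = os.path.join(extract_dir, os.path.basename(path)[:-3])
        with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    elif lower.endswith(".rar"):
        import rarfile  # third-party; imported lazily so other formats work without it
        with rarfile.RarFile(path) as archive:
            archive.extractall(extract_dir)
    else:
        raise ValueError(f"Unsupported archive type: {path!r}")
    return extract_dir
```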
Creating Tar Archives
The tarfile module also supports creating archives. The arcname parameter of TarFile.add() controls the name under which a file is stored inside the archive.
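import tarfile
import os

def create_tar(source_dir, output_tar_path, compression=''):
    """Creates a tar archive from a directory."""
    # Map the compression name to a tarfile write mode
    mode_map = {'': 'w', 'gz': 'w:gz', 'bz2': 'w:bz2', 'xz': 'w:xz'}
    if compression not in mode_map:
        raise ValueError(f"Unsupported compression: {compression!r}")
    mode = mode_map[compression]
    with tarfile.open(output_tar_path, mode) as tar:
        for root, dirs, files in os.walk(source_dir):
            for file in files:
                full_path = os.path.join(root, file)
                # Store the path relative to source_dir so subdirectories are kept
                # and same-named files in different folders do not collide.
                tar.add(full_path, arcname=os.path.relpath(full_path, source_dir))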
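TarFile.add() recurses into directories by default, so a whole tree can also be archived with a single call; arcname then renames only the top-level entry, and everything below it keeps its relative path. A minimal sketch with a hypothetical helper name and a throwaway directory layout:

```python
import os
import tarfile
import tempfile

def create_tar_recursive(source_dir, output_tar_path):
    """Hypothetical helper: archive source_dir under its own folder name."""
    with tarfile.open(output_tar_path, "w:gz") as tar:
        # add() walks the directory itself; arcname renames the top-level entry.
        tar.add(source_dir, arcname=os.path.basename(os.path.normpath(source_dir)))

# Usage: two files that share a base name in different subdirectories.
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "project")
    for sub in ("a", "b"):
        os.makedirs(os.path.join(src, sub))
        with open(os.path.join(src, sub, "x.txt"), "w") as f:
            f.write(sub)
    # The archive is written outside src so it does not include itself.
    archive_path = os.path.join(workdir, "project.tar.gz")
    create_tar_recursive(src, archive_path)
    with tarfile.open(archive_path, "r:*") as tar:
        names = sorted(tar.getnames())

print(names)
```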
The mode passed to tarfile.open() determines compression:
- 'w' or 'w:': No compression.
- 'w:gz': Gzip compression.
- 'w:bz2': Bzip2 compression.
- 'w:xz': LZMA (xz) compression.
- 'r:*': Read with automatic compression detection.
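The modes can be compared by writing the same payload with each one and reading every result back with 'r:*'. This is a small in-memory sketch with an artificial, highly repetitive payload, so the printed sizes say nothing general about which compressor is best. It includes 'w:xz'; the bz2 and xz modes rely on Python's bz2 and lzma modules, which some minimal Python builds omit.

```python
import io
import tarfile

def tar_with_mode(mode, payload):
    """Write one member using the given tarfile write mode; return the raw bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as tar:
        info = tarfile.TarInfo(name="data.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()

payload = b"repetitive text " * 1000  # artificial, very compressible data
sizes = {}
for mode in ("w", "w:gz", "w:bz2", "w:xz"):
    blob = tar_with_mode(mode, payload)
    # 'r:*' reads every variant back without being told the compression.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        assert tar.extractfile("data.txt").read() == payload
    sizes[mode] = len(blob)

print(sizes)
```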