Python Modules, Packages, and Standard Library Utilities
Module Fundamentals
A module is a collection of related functionality. Modules originate from Python's built-in standard library, third-party packages, or custom-developed scripts.
Formats include:
- .py scripts written in Python.
- Compiled C/C++ extensions (shared libraries or DLLs).
- Directories containing an __init__.py file, recognized as packages.
- Built-in modules compiled into the Python interpreter.
Utilizing modules improves development velocity and minimizes code duplication.
Import Mechanisms
The interpreter distinguishes the main script from an imported module via the module-level __name__ variable: it is set to '__main__' in the executing script and to the module's own name everywhere else.
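A minimal sketch of the standard guard (the file and function names are hypothetical):

```python
# demo.py (hypothetical file name)
def greet():
    return "hello"

if __name__ == "__main__":
    # This branch runs only when the file is executed directly,
    # not when it is imported by another module.
    print(greet())
```

Importing demo elsewhere defines greet without triggering the print.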
The import Statement
When initially importing:
- A dedicated namespace is generated for the module.
- The module's code executes, populating its namespace.
- The importing script receives a reference to the module's namespace within its own scope.
Subsequent imports reuse the existing namespace without re-executing the code.
Usage requires prefixing: module_name.function_name.
- Pros: Prevents naming collisions.
- Cons: Verbose syntax.
import core.utils
core.utils.validate_input()
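The caching behaviour can be observed through sys.modules, the dictionary that maps module names to already-loaded module objects:

```python
import sys
import json

# A repeated import is a cheap dictionary lookup, not a re-execution.
import json as json_again

print('json' in sys.modules)   # True
print(json is json_again)      # True: both names reference the same cached object
```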
The from ... import ... Statement
The initial import process mirrors the import steps above, but binds the specified names directly into the current script's namespace.
Usage is prefix-free.
- Pros: Cleaner syntax.
- Cons: Higher risk of namespace collisions.
from module import * imports all public names, controllable via the module's __all__ attribute.
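The effect of __all__ can be illustrated with an in-memory module (the module demo and its attributes are hypothetical):

```python
import types

demo = types.ModuleType('demo')   # hypothetical in-memory module
demo.public_fn = lambda: 'ok'
demo._internal = 'hidden'
demo.__all__ = ['public_fn']

# Names that 'from demo import *' would bind: __all__ if present,
# otherwise every attribute not starting with an underscore.
star_names = getattr(demo, '__all__',
                     [n for n in dir(demo) if not n.startswith('_')])
print(star_names)  # ['public_fn']
```

Without __all__, _internal would still be skipped by the underscore rule, but every other public attribute would be exported.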
Circular Dependencies
Circular imports occur when two modules attempt to import each other. Mitigation strategies:
- Delay the import by moving the from ... import ... statement to the end of the file.
- Localize the import by placing it inside a function, so it only executes when needed.
# auth.py
print('Loading authentication module')

def verify():
    from db import fetch_user  # local import breaks the circular dependency
    user = fetch_user()
    return user is not None

token = 'secret'

# db.py
print('Loading database module')
from auth import token

def fetch_user():
    return token
Dynamic imports can be achieved using importlib:
import importlib
target = 'core.engine'
handler = importlib.import_module(target)
print(dir(handler))
Module Search Path Resolution
The interpreter locates modules based on this priority:
- Modules already loaded in memory.
- Built-in standard modules.
- Directories listed in sys.path (starting with the executing script's directory).
Every import, including imports performed inside imported modules, is resolved against the sys.path of the executing script, not the location of the importing module.
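The search path is inspectable and mutable at runtime; a small sketch (the appended directory is hypothetical):

```python
import sys

# The first entry is normally the executing script's directory
# (an empty string when running interactively).
print(sys.path[0])

# Additional search locations can be appended at runtime.
sys.path.append('/opt/hypothetical/libs')  # hypothetical directory
print('/opt/hypothetical/libs' in sys.path)  # True
```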
Absolute imports traverse from the top-level directory listed in sys.path.
- Pros: Universally accessible.
- Cons: Lengthy paths.
Relative imports reference the current module's location using . (current) and .. (parent).
- Pros: Compact syntax.
- Cons: Restricted to intra-package usage; invalid in top-level execution scripts. Traversing beyond the top-level package raises an ImportError.
Package Architecture
A package is a directory containing an __init__.py file. Importing a package effectively executes its __init__.py.
During the first import:
- A namespace is generated for __init__.py.
- The code inside __init__.py runs.
- The current script binds to the package namespace.
In Python 2, __init__.py was mandatory; Python 3 allows implicit namespace packages.
Rules:
- The left-hand operand of the dot (.) must be a package.
- Absolute imports inside a package should start from the top-level project directory.
- Relative imports (using .) are preferred for internal package dependencies; they remain valid when the top-level directory is renamed.
- Relative imports cannot traverse beyond the package's root directory.
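The rules above can be sketched with a hypothetical layout (all names are illustrative):

```
project/                 # top-level directory on sys.path
    main.py              # from app.services import report   (absolute)
    app/
        __init__.py
        helpers.py
        services.py      # from .helpers import fmt          (relative)
                         # from app.helpers import fmt       (absolute, equivalent)
```

Running main.py from project/ keeps both forms valid; renaming project/ breaks nothing, while renaming app/ breaks only the absolute form.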
Standard Library Essentials
1. time
Time representations: Timestamp (seconds since epoch), Local time, UTC.
import time
current_timestamp = time.time() # Float
local_struct = time.localtime() # struct_time
utc_struct = time.gmtime()
formatted_str = time.strftime("%Y-%m-%d %H:%M:%S", local_struct)
parsed_struct = time.strptime("2023-10-05 14:30:00", "%Y-%m-%d %H:%M:%S")
timestamp_from_struct = time.mktime(parsed_struct)
time.sleep(2) # Delay execution
2. datetime
import datetime
present = datetime.datetime.now()
custom_date = datetime.datetime(2023, 5, 12, 10, 0, 0)
time_diff = present - custom_date
future_date = present + datetime.timedelta(days=7)
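datetime objects support the same format codes as time.strftime, so formatting and parsing round-trip cleanly:

```python
from datetime import datetime

stamp = datetime(2023, 5, 12, 10, 0, 0)

# Format a datetime into a string, then parse it back.
text = stamp.strftime("%Y-%m-%d %H:%M:%S")
print(text)  # 2023-05-12 10:00:00

round_trip = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
print(round_trip == stamp)  # True
```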
3. random
import random
import string
float_val = random.uniform(5.0, 10.0)
int_val = random.randint(10, 99)
even_val = random.randrange(0, 100, 2)
char_list = random.choices('xyz123', k=4)  # k samples, with replacement
sample_str = ''.join(random.sample(string.ascii_letters + string.digits, 6))
data_list = [1, 2, 3, 4]
random.shuffle(data_list)
4. sys
import sys
args = sys.argv # Command-line arguments
sys.exit(0) # Exit program
version_info = sys.version
platform_name = sys.platform
def show_progress(ratio, bar_width=40, prefix='Progress: '):
    ratio = min(ratio, 1.0)
    filled = '*' * int(bar_width * ratio)
    empty = '-' * (bar_width - int(bar_width * ratio))
    print(f"\r{prefix}[{filled}{empty}] {int(ratio*100)}%", end='')

for step in range(0, 101, 20):  # usage: redraw the bar in place
    show_progress(step / 100)
print()
5. shutil
import shutil
import zipfile
shutil.copyfile('src.txt', 'dst.txt')
shutil.copytree('folder_src', 'folder_dst', ignore=shutil.ignore_patterns('*.tmp'))
shutil.rmtree('folder_dst')
shutil.move('src.txt', 'new_location.txt')
shutil.make_archive('archive_name', 'zip', root_dir='target_folder')
# Extracting
with zipfile.ZipFile('archive_name.zip', 'r') as zf:
    zf.extractall()
6. os
import os
current_dir = os.getcwd()
os.makedirs('new_dir/sub_dir')
os.rmdir('new_dir/sub_dir')
os.rename('old.txt', 'new.txt')
combined = os.path.join('/var', 'data', 'file.txt')
print(os.path.exists('file.txt'))
print(os.path.isfile('file.txt'))
print(os.path.isdir('my_folder'))
print(os.path.getsize('file.txt'))
print(os.path.abspath('relative_path'))
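os.walk traverses an entire directory tree; a small sketch using a throwaway temporary tree:

```python
import os
import tempfile

# Build a tiny disposable tree to traverse.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
open(os.path.join(root, 'sub', 'a.txt'), 'w').close()

# os.walk yields (directory, subdirectories, files) for every level.
for dirpath, dirnames, filenames in os.walk(root):
    print(dirpath, dirnames, filenames)
```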
7. pickle
import pickle
# Caution: only unpickle data from trusted sources; pickle can execute arbitrary code.
data_payload = {'key': 'value'}
serialized = pickle.dumps(data_payload)
deserialized = pickle.loads(serialized)
# with open('data.pkl', 'wb') as f: pickle.dump(data_payload, f)
# with open('data.pkl', 'rb') as f: loaded = pickle.load(f)
8. json
import json
from datetime import datetime
data_map = {"active": True, "count": 5}
json_str = json.dumps(data_map)
parsed_map = json.loads(json_str)
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)
print(json.dumps({"now": datetime.utcnow()}, cls=DateTimeEncoder))
9. shelve
import shelve
with shelve.open('persistent_db') as db:
    db['record_1'] = {'id': 1, 'status': 'open'}
    db['record_2'] = {'id': 2, 'status': 'closed'}
    print(db['record_1'])
10. xml.etree.ElementTree
import xml.etree.ElementTree as ET
tree = ET.parse('config.xml')
root = tree.getroot()
for child in root:
    print(child.tag, child.attrib)

for node in root.iter('setting'):
    node.text = 'updated_value'
    node.set('modified', 'yes')
new_elem = ET.Element('new_setting')
new_elem.text = 'added'
root.append(new_elem)
tree.write('updated_config.xml')
# Creating XML
root_elem = ET.Element("configuration")
sub_elem = ET.SubElement(root_elem, "parameter", attrib={"type": "string"})
sub_elem.text = "example"
tree_obj = ET.ElementTree(root_elem)
tree_obj.write("new_config.xml", encoding="utf-8", xml_declaration=True)
11. configparser
import configparser
parser = configparser.ConfigParser()
parser.read('setup.ini')
sections = parser.sections()
host_val = parser.get('database', 'host')
port_val = parser.getint('database', 'port')
parser.set('database', 'host', 'localhost')
with open('setup.ini', 'w') as f:  # close the file handle properly
    parser.write(f)
# Creating ini
config = configparser.ConfigParser()
config['DEFAULT'] = {'timeout': '30'}
config['database'] = {'host': '127.0.0.1', 'port': '5432'}
with open('new_setup.ini', 'w') as f:
    config.write(f)
12. hashlib and hmac
import hashlib
import hmac
hash_obj = hashlib.sha256()
hash_obj.update(b'initial_data')
hash_obj.update(b'additional_data')
print(hash_obj.hexdigest())
# HMAC (compare digests with hmac.compare_digest to avoid timing attacks)
mac = hmac.new(b'secret_key', b'message_data', hashlib.sha256)
print(mac.hexdigest())
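For large files, feed the hash object in chunks rather than reading the whole file into memory; a sketch using a temporary sample file:

```python
import hashlib
import tempfile

# Write a small sample file to hash.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'sample payload ' * 1000)
    path = tmp.name

digest = hashlib.sha256()
with open(path, 'rb') as f:
    for chunk in iter(lambda: f.read(8192), b''):  # read until EOF
        digest.update(chunk)  # incremental update, same result as one pass
print(digest.hexdigest())
```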
13. subprocess
import subprocess
result = subprocess.run(['ls', '-l'], capture_output=True, text=True)
print(result.stdout)
pipe1 = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
pipe2 = subprocess.Popen(['grep', 'py'], stdin=pipe1.stdout, stdout=subprocess.PIPE)
pipe1.stdout.close()  # let pipe1 receive SIGPIPE if pipe2 exits early
output = pipe2.communicate()[0]
14. logging
import logging
import logging.config
logging.basicConfig(filename='app.log', level=logging.DEBUG,
format='%(asctime)s - %(levelname)s - %(message)s')
logging.debug('Debug event')
logging.error('Error encountered')
# Advanced Configuration
logger = logging.getLogger("network_ops")
handler1 = logging.FileHandler('detail.log', encoding='utf-8')
handler2 = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler1.setFormatter(formatter)
handler2.setFormatter(formatter)
logger.addHandler(handler1)
logger.addHandler(handler2)
logger.setLevel(logging.INFO)
logger.info("Operation started")
# Dictionary Configuration
LOG_CONFIG = {
'version': 1,
'formatters': {'basic': {'format': '%(asctime)s %(message)s'}},
'handlers': {'console': {'class': 'logging.StreamHandler', 'formatter': 'basic', 'level': 'DEBUG'}},
'loggers': {'main': {'handlers': ['console'], 'level': 'DEBUG'}}
}
logging.config.dictConfig(LOG_CONFIG)
log_inst = logging.getLogger('main')
log_inst.info('Configured via dictionary')
15. re (Regular Expressions)
Regex syntax and methods:
import re
matches = re.findall(r'\d+', 'ID: 42, Age: 25')
search_obj = re.search(r'(\d+)-(\d+)', '123-456')
if search_obj:
    print(search_obj.group(1))  # 123
split_res = re.split(r'[;,]', 'a,b;c')
sub_res = re.sub(r'old', 'new', 'old data old values', count=1)
pattern = re.compile(r'\bword\b')
pattern.findall('a word and another word')
# Named groups and group swapping
text = "apples|oranges|bananas"
grouped = re.search(r"(.+?)\|(.+?)\|(.+)", text)
swapped = re.sub(r"(.+?)\|(.+?)\|(.+)", r"\3|\2|\1", text) # bananas|oranges|apples