Essential Python Development Patterns and LLM Integration Strategies
Legacy String Handling in Python 2
In Python 2, managing text requires careful distinction between byte strings and Unicode objects. To prevent encoding conflicts during concatenation, ensure all strings share the same type.
# Converting to Unicode using the 'u' prefix
text_unicode = u'\u4e2d\u56fd'
print(type(text_unicode)) # <type 'unicode'>
# Exporting for storage (UTF-8 encoding)
serialized_text = text_unicode.encode('utf-8')
print(type(serialized_text)) # <type 'str'>
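For comparison, the same round-trip in Python 3 uses str (Unicode by default) and bytes; a minimal sketch:

```python
# Python 3: text is str (Unicode) by default; bytes hold encoded data
text = '\u4e2d\u56fd'              # str
encoded = text.encode('utf-8')     # bytes, suitable for storage/transport
decoded = encoded.decode('utf-8')  # back to str
print(type(text), type(encoded), decoded == text)
# <class 'str'> <class 'bytes'> True
```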
Modern Python 3 Utilities
Advanced Slicing Behavior
When using indices in string slicing, if the start index is negative and the end index is positive in a way that creates an invalid range, Python returns an empty string rather than an error.
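A quick demonstration of how mixed-sign indices resolve:

```python
s = "python"           # len(s) == 6
print(s[-2:5])   # start -2 resolves to index 4; 4 < 5, so 'o'
print(s[-1:2])   # start resolves to 5, end is 2: invalid range, so ''
print(s[2:-4])   # end -4 resolves to index 2; 2 >= 2, so ''
```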
Debugging and Diagnostics
Trigger an interactive debugger at a specific line of code:
import pdb; pdb.set_trace()
Data Partitioning and Randomization
Splitting a list for machine learning tasks (e.g., training vs. validation) can be handled efficiently using random.shuffle and slicing.
import random
def partition_dataset(items, split_ratio=0.1, seed_value=42):
    random.seed(seed_value)
    shuffled_items = list(items)
    random.shuffle(shuffled_items)
    split_point = int(len(shuffled_items) * split_ratio)
    if split_point < 1 or len(shuffled_items) == 0:
        return shuffled_items, []
    val_set = shuffled_items[:split_point]
    train_set = shuffled_items[split_point:]
    return train_set, val_set
# Random sampling
population = list(range(100))
sampled_unique = random.sample(population, k=5) # No replacement
sampled_repeats = random.choices(population, k=5) # With replacement
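Seeding is what makes such a split reproducible across runs. A quick check, using a local random.Random instance (so the global RNG state is untouched; shuffled_copy is an illustrative name):

```python
import random

def shuffled_copy(items, seed):
    rng = random.Random(seed)  # local RNG avoids mutating global state
    out = list(items)
    rng.shuffle(out)
    return out

a = shuffled_copy(range(10), seed=42)
b = shuffled_copy(range(10), seed=42)
print(a == b)                         # True: same seed, same permutation
print(sorted(a) == list(range(10)))   # True: a permutation, not a resample
```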
Formatted Logging
Standardizing timestamp output for logs improves readability and traceability.
from datetime import datetime
def log_with_timestamp(message):
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{timestamp}] - {message}")
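The standard logging module can produce the same timestamped format without a manual print wrapper; a minimal sketch:

```python
import logging

logging.basicConfig(
    format="[%(asctime)s] - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
)
logging.info("pipeline started")  # e.g. [2024-01-01 12:00:00] - pipeline started
```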
Logic Flow: The Loop-Else Construction
The else block in a Python loop executes only if the loop completes naturally without hitting a break statement.
def find_element(target, collection):
    for item in collection:
        if item == target:
            print(f"Found: {item}")
            break
    else:
        print("Target not found in collection.")
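A classic use of for-else is a search that needs a "no match" fallback, such as a simple primality test (illustrative sketch):

```python
def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            break          # found a divisor; the else block is skipped
    else:
        return True        # loop finished without break: no divisors found
    return False

print([x for x in range(2, 20) if is_prime(x)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```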
Robust Exception Handling
Handling multiple error types within a single block allows for cleaner error recovery strategies.
def safe_division():
    try:
        numerator = float(input("Enter numerator: "))
        denominator = float(input("Enter denominator: "))
        result = numerator / denominator
    except (ZeroDivisionError, ValueError) as err:
        if isinstance(err, ZeroDivisionError):
            print("Error: Division by zero.")
        else:
            print("Error: Invalid numerical input.")
    else:
        print(f"Calculation successful: {result}")
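For non-interactive code, the same try/except/else pattern works with arguments instead of input(); a sketch (the function name is illustrative):

```python
def safe_divide(numerator, denominator):
    try:
        result = float(numerator) / float(denominator)
    except ZeroDivisionError:
        return "Error: Division by zero."
    except ValueError:
        return "Error: Invalid numerical input."
    else:
        return result

print(safe_divide("10", "4"))   # 2.5
print(safe_divide(1, 0))        # Error: Division by zero.
print(safe_divide("x", 2))      # Error: Invalid numerical input.
```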
Integrating with Large Language Models
Azure OpenAI GPT-4 API Wrapper
This implementation targets the legacy openai Python SDK (pre-1.0, where openai.ChatCompletion and the Azure engine parameter are available) and includes retry logic to handle transient API failures.
import time
import openai
from datetime import datetime
def fetch_gpt_response(messages, config, retries=5, backoff=5):
    attempt = 0
    while attempt < retries:
        try:
            response = openai.ChatCompletion.create(
                engine=config['engine'],
                messages=messages,
                temperature=config.get('temperature', 0.7),
                max_tokens=config.get('max_tokens', 1024)
            )
            return response.choices[0]["message"]["content"]
        except Exception as e:
            attempt += 1
            print(f"[{datetime.now()}] Error: {e}. Retrying in {backoff}s...")
            time.sleep(backoff)
    return "[API_ERROR]"
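The retry logic can also be factored into a reusable helper that is independent of any particular API, with exponential rather than fixed backoff; a minimal sketch (all names are illustrative):

```python
import time

def with_retries(fn, retries=5, backoff=1.0, factor=2.0, sleep=time.sleep):
    """Call fn(); on exception, wait and retry with exponential backoff."""
    delay = backoff
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise                 # out of attempts: surface the error
            sleep(delay)
            delay *= factor

# Demonstration: a flaky callable that succeeds on the third try
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retries(flaky, retries=5, backoff=0.01))  # ok
```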
Baidu Wenxin (Ernie Bot) API Integration
Interacting with Wenxin requires an access token and specific model endpoint mapping.
import requests
import json
def get_baidu_token(api_key, secret_key):
    url = f"https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={api_key}&client_secret={secret_key}"
    res = requests.post(url)
    return res.json().get("access_token")

def call_ernie_bot(prompt, token, model_url):
    headers = {'Content-Type': 'application/json'}
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1
    })
    endpoint = f"{model_url}?access_token={token}"
    response = requests.post(endpoint, headers=headers, data=payload)
    return response.json().get("result")
Infrastructure and Database Operations
MySQL Connection Management
Using pymysql to manage relational data, ensuring connections are refreshed to avoid timeouts.
import pymysql
def execute_query(connection, sql_statement, is_write=False):
    try:
        connection.ping(reconnect=True)
        with connection.cursor() as cursor:
            cursor.execute(sql_statement)
            if is_write:
                connection.commit()
                return True
            return cursor.fetchall()
    except Exception as e:
        connection.rollback()
        print(f"Database error: {e}")
        return None
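Whenever the SQL contains user-supplied values, pass them as driver-bound parameters rather than formatting them into the statement string; the placeholder style differs (%s in pymysql, ? in sqlite3), but the pattern is the same. A self-contained sketch using sqlite3 purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
# Values are bound by the driver, never interpolated into the SQL string
conn.execute("INSERT INTO users VALUES (?, ?)", (1, "alice"))
conn.commit()
rows = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchall()
print(rows)  # [('alice',)]
```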
Redis Cache Interaction
Standardized methods for writing and expiring keys in Redis.
import redis
import json
def cache_data(client, key, value, ttl=3600):
    payload = json.dumps(value) if not isinstance(value, str) else value
    client.set(key, payload)
    client.expire(key, ttl)
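The read path mirrors the write: fetch, then JSON-decode if a structured value was stored. A sketch (fetch_cached and FakeClient are illustrative names; the stub stands in for a real redis-py client, whose set also accepts ex= to write value and TTL in one call):

```python
import json

def fetch_cached(client, key):
    raw = client.get(key)
    if raw is None:
        return None
    if isinstance(raw, bytes):      # redis-py returns bytes by default
        raw = raw.decode("utf-8")
    try:
        return json.loads(raw)      # structured values were stored as JSON
    except json.JSONDecodeError:
        return raw                  # a plain string was stored as-is

# Minimal in-memory stand-in for the Redis client, for demonstration only
class FakeClient:
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ex=None):
        self.store[key] = value.encode("utf-8") if isinstance(value, str) else value

client = FakeClient()
client.set("user:1", json.dumps({"name": "alice"}), ex=3600)
print(fetch_cached(client, "user:1"))  # {'name': 'alice'}
```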
LLM Fine-Tuning Frameworks
LLaMA Factory Configuration
For pre-training or fine-tuning, shell scripts automate the environment setup and DeepSpeed integration.
# Example Training Launch Script
deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
--stage pt \
--model_name_or_path /path/to/base_model \
--do_train \
--dataset my_custom_corpus \
--finetuning_type lora \
--output_dir ./checkpoints/output_model \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--learning_rate 5e-5 \
--num_train_epochs 2.0 \
--fp16 \
--deepspeed ds_config.json
MS-Swift Compatibility Note
When performing Supervised Fine-Tuning (SFT) on Qwen-VL models, specific transformers versions are required. transformers==4.47.3 may cause input_ids errors; downgrading to 4.47.1 is a verified resolution for pipeline stability.