Essential Python Development Patterns and LLM Integration Strategies

Legacy String Handling in Python 2

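In Python 2, text handling requires a careful distinction between byte strings (str) and Unicode objects (unicode). Concatenating the two triggers an implicit ASCII decode of the byte string, which raises UnicodeDecodeError as soon as it contains non-ASCII bytes, so convert everything to one type before combining.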

# Creating a Unicode object with the 'u' prefix
text_unicode = u'\u4e2d\u56fd'
print(type(text_unicode))  # <type 'unicode'>

# Exporting for storage (UTF-8 encoding)
serialized_text = text_unicode.encode('utf-8')
print(type(serialized_text))  # <type 'str'>

Modern Python 3 Utilities

Advanced Slicing Behavior

Slicing never raises IndexError. Out-of-range indices are clamped to the string's bounds, and if the resolved start position is at or beyond the resolved end position (for example, a negative start that lands after a small positive end), Python returns an empty string rather than an error.
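A few concrete cases make the slicing rule easier to see. This is a minimal sketch; the sample string is arbitrary.

```python
word = "python"  # arbitrary sample string

# Start resolves to index 4 and end is 2: start >= end, so the slice is empty.
empty_slice = word[-2:2]

# Out-of-range bounds are clamped rather than raising IndexError.
clamped_slice = word[2:100]

# Indexing (as opposed to slicing) past the end still raises IndexError.
try:
    word[100]
except IndexError:
    index_failed = True
else:
    index_failed = False

print(repr(empty_slice), repr(clamped_slice), index_failed)  # '' 'thon' True
```

The difference between slicing and indexing is the part that most often surprises: only indexing is strict about bounds.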

Debugging and Diagnostics

Trigger an interactive debugger at a specific line of code. On Python 3.7 and later, the built-in breakpoint() call does the same thing by default:

import pdb; pdb.set_trace()

Data Partitioning and Randomization

Splitting a list for machine learning tasks (e.g., training vs. validation) can be handled efficiently using random.shuffle and slicing.

import random

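def partition_dataset(items, split_ratio=0.1, seed_value=42):
    """Shuffle items and return (train_set, val_set); split_ratio is the validation share."""
    # A local generator keeps the split reproducible without reseeding global random state
    rng = random.Random(seed_value)
    shuffled_items = list(items)
    rng.shuffle(shuffled_items)

    split_point = int(len(shuffled_items) * split_ratio)
    if split_point < 1:
        # Too few items for a validation set: everything goes to training
        return shuffled_items, []

    val_set = shuffled_items[:split_point]
    train_set = shuffled_items[split_point:]
    return train_set, val_set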

# Random sampling
population = list(range(100))
sampled_unique = random.sample(population, k=5)  # No replacement
sampled_repeats = random.choices(population, k=5) # With replacement
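The seed matters because it makes the shuffle repeatable, and the two sampling functions differ in more than whether values can repeat. The sketch below illustrates both points with the standard random module; the variable names are illustrative only.

```python
import random

population = list(range(100))

# Two generators built from the same seed produce the same shuffle,
# which is what makes a seeded train/validation split reproducible.
first_rng = random.Random(42)
second_rng = random.Random(42)
first_order = population[:]
second_order = population[:]
first_rng.shuffle(first_order)
second_rng.shuffle(second_order)
print(first_order == second_order)  # True

# sample() never returns the same position twice.
unique_picks = first_rng.sample(population, k=5)
print(len(set(unique_picks)) == 5)  # True

# choices() can draw more items than the population holds; sample() cannot.
small_pool = ["a", "b"]
repeated_picks = first_rng.choices(small_pool, k=5)
print(len(repeated_picks))  # 5
try:
    first_rng.sample(small_pool, k=5)
except ValueError:
    print("sample() refuses k larger than the population")
```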

Formatted Logging

Standardizing timestamp output for logs improves readability and traceability.

from datetime import datetime

def log_with_timestamp(message):
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{timestamp}] - {message}")
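For anything beyond a quick script, the same timestamp layout can be produced with the standard logging module, which also adds levels and configurable destinations. The following is one possible setup, not the only one; the helper name build_timestamped_logger and the logger name "demo" are made up for illustration.

```python
import io
import logging

def build_timestamped_logger(name, stream):
    """Return a logger that writes '[YYYY-MM-DD HH:MM:SS] - message' lines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(fmt="[%(asctime)s] - %(message)s",
                          datefmt="%Y-%m-%d %H:%M:%S")
    )
    logger.handlers.clear()  # avoid duplicate lines if called twice for one name
    logger.addHandler(handler)
    return logger

# Write to an in-memory buffer so the formatted line can be inspected.
buffer = io.StringIO()
demo_logger = build_timestamped_logger("demo", buffer)
demo_logger.info("Job started")
logged_line = buffer.getvalue().strip()
print(logged_line)
```

Passing sys.stderr or an open file as the stream sends the same format elsewhere without touching the call sites.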

Logic Flow: Loop-Else Construction

The else block in a Python loop executes only if the loop completes naturally without hitting a break statement.

def find_element(target, collection):
    for item in collection:
        if item == target:
            print(f"Found: {item}")
            break
    else:
        print("Target not found in collection.")
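The same construct works well when the loop is a search for a counterexample. As a small sketch (the function name is_prime is chosen for illustration), the else branch below runs only when no divisor was found:

```python
def is_prime(number):
    """Return True if number is prime, using for/else to detect 'no divisor found'."""
    if number < 2:
        return False
    for divisor in range(2, int(number ** 0.5) + 1):
        if number % divisor == 0:
            break  # a divisor exists, so the else block is skipped
    else:
        return True  # the loop finished without hitting break
    return False

print([n for n in range(20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Note that the else block also runs when the loop body never executes at all, as happens here for 2 and 3, where the range is empty.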

Robust Exception Handling

Handling multiple error types within a single block allows for cleaner error recovery strategies.

def safe_division():
    try:
        numerator = float(input("Enter numerator: "))
        denominator = float(input("Enter denominator: "))
        result = numerator / denominator
    except ZeroDivisionError:
        print("Error: Division by zero.")
    except ValueError:
        print("Error: Invalid numerical input.")
    else:
        print(f"Calculation successful: {result}")
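Grouping exception types in one tuple is most useful when every failure gets the same recovery. The variant below is a sketch under that assumption: it takes its inputs as arguments instead of reading from input(), so it can be called from other code, and the name divide_text_inputs is hypothetical.

```python
def divide_text_inputs(numerator_text, denominator_text):
    """Parse two strings and divide them; return (result, error_message)."""
    try:
        result = float(numerator_text) / float(denominator_text)
    except (ZeroDivisionError, ValueError) as err:
        # Both failures get the same recovery here, so one handler is enough.
        return None, f"{type(err).__name__}: {err}"
    else:
        return result, None

print(divide_text_inputs("10", "4"))   # (2.5, None)
print(divide_text_inputs("10", "0"))   # error message starts with ZeroDivisionError
print(divide_text_inputs("ten", "4"))  # error message starts with ValueError
```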

Integrating with Large Language Models

Azure OpenAI GPT-4 API Wrapper

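This implementation includes retry logic to handle transient API failures. It targets the legacy openai SDK (versions before 1.0), where ChatCompletion.create accepts an Azure deployment name through the engine parameter.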

import time
import openai
from datetime import datetime

def fetch_gpt_response(messages, config, retries=5, backoff=5):
    attempt = 0
    while attempt < retries:
        try:
            response = openai.ChatCompletion.create(
                engine=config['engine'],
                messages=messages,
                temperature=config.get('temperature', 0.7),
                max_tokens=config.get('max_tokens', 1024)
            )
            return response.choices[0]["message"]["content"]
        except Exception as e:
            attempt += 1
            print(f"[{datetime.now()}] Error: {e} (attempt {attempt}/{retries})")
            if attempt < retries:
                # Wait only when another attempt will follow
                time.sleep(backoff)
    return "[API_ERROR]"
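A fixed delay between attempts is the simplest policy. A common alternative is exponential backoff, where the wait doubles after each failure. The helper below is a generic sketch, not part of the openai SDK: call_with_backoff, flaky_operation, and the injectable sleep parameter are all hypothetical names used for illustration.

```python
import time

def call_with_backoff(operation, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or retries are exhausted.

    The delay doubles after each failure (base_delay, 2x, 4x, ...).
    Returns the operation's result, or re-raises the last exception.
    """
    for attempt in range(retries):
        try:
            return operation()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the real error
            sleep(base_delay * (2 ** attempt))

# Demo with a stand-in operation that fails twice before succeeding.
calls = {"count": 0}

def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"

recorded_delays = []  # capture delays instead of actually sleeping
outcome = call_with_backoff(flaky_operation, sleep=recorded_delays.append)
print(outcome, recorded_delays)  # ok [1.0, 2.0]
```

In real use it is usually better to catch only the exception types that indicate a transient problem (timeouts, rate limits) rather than every Exception.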

Baidu Wenxin (Ernie Bot) API Integration

Interacting with Wenxin requires an access token and specific model endpoint mapping.

import requests
import json

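def get_baidu_token(api_key, secret_key):
    url = "https://aip.baidubce.com/oauth/2.0/token"
    params = {
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    }
    # params= URL-encodes the credentials; the timeout (an arbitrary 10s) prevents an indefinite hang
    res = requests.post(url, params=params, timeout=10)
    res.raise_for_status()
    return res.json().get("access_token")

def call_ernie_bot(prompt, token, model_url):
    headers = {'Content-Type': 'application/json'}
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1
    })
    # The token is sent as a query parameter; HTTP errors raise instead of returning None silently
    response = requests.post(model_url, params={"access_token": token},
                             headers=headers, data=payload, timeout=60)
    response.raise_for_status()
    return response.json().get("result")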
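Requesting a new token before every call is wasteful, since access tokens stay valid for some time. The sketch below caches a token until shortly before it expires. It assumes the token response carries a lifetime in seconds (OAuth 2.0 responses commonly include an expires_in field); the TokenCache class and the fetcher contract are hypothetical, and the demo uses a fake fetcher so nothing touches the network.

```python
import time

class TokenCache:
    """Cache an access token and refetch it shortly before it expires."""

    def __init__(self, fetch_token, safety_margin=60, clock=time.time):
        # fetch_token() must return (token, lifetime_in_seconds).
        self._fetch_token = fetch_token
        self._safety_margin = safety_margin
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + lifetime - self._safety_margin
        return self._token

# Demo with a fake fetcher and a controllable clock (no network access).
fetch_count = {"n": 0}

def fake_fetch():
    fetch_count["n"] += 1
    return f"token-{fetch_count['n']}", 3600

now = {"t": 0.0}
cache = TokenCache(fake_fetch, clock=lambda: now["t"])
first = cache.get()
second = cache.get()   # served from the cache
now["t"] = 3600.0      # jump past the expiry time
third = cache.get()    # refetched
print(first, second, third)  # token-1 token-1 token-2
```

To use this with the function above, get_baidu_token would need to return the lifetime alongside the token, which it does not do as written.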

Infrastructure and Database Operations

MySQL Connection Management

The helper below uses pymysql to run queries against a relational database and pings the connection before each call so that it is re-established after an idle timeout.

import pymysql

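def execute_query(connection, sql_statement, is_write=False, params=None):
    try:
        # Reconnect transparently if the server closed an idle connection
        connection.ping(reconnect=True)
        with connection.cursor() as cursor:
            # Passing params lets the driver escape values (use %s placeholders)
            cursor.execute(sql_statement, params)
            if is_write:
                connection.commit()
                return True
            return cursor.fetchall()
    except Exception as e:
        # Undo any partial write before reporting the failure
        connection.rollback()
        print(f"Database error: {e}")
        return None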
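When a query includes user-supplied values, binding them as parameters rather than formatting them into the SQL string avoids SQL injection. The sketch below uses sqlite3 from the standard library as a stand-in so that it runs without a server; the users table is hypothetical, and pymysql uses %s placeholders where sqlite3 uses ?.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Values travel separately from the SQL text, so quotes in the data
# cannot change the structure of the statement.
tricky_name = "O'Brien'; DROP TABLE users; --"
connection.execute("INSERT INTO users (name) VALUES (?)", (tricky_name,))
connection.commit()

rows = connection.execute(
    "SELECT name FROM users WHERE name = ?", (tricky_name,)
).fetchall()
print(rows)  # one row containing the name exactly as supplied
connection.close()
```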

Redis Cache Interaction

Standardized methods for writing and expiring keys in Redis.

import redis
import json

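def cache_data(client, key, value, ttl=3600):
    # Non-string values are serialized to JSON before storage
    payload = json.dumps(value) if not isinstance(value, str) else value
    # ex=ttl writes the value and its expiry in a single atomic command
    client.set(key, payload, ex=ttl)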
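Reading the data back needs the reverse step: decode JSON where the value was serialized, and pass plain strings through. The helper below is a sketch of that idea. read_cached is a hypothetical name, and InMemoryClient is a stand-in that mimics only the get and set calls so the example runs without a Redis server.

```python
import json

def read_cached(client, key, default=None):
    """Fetch a key and decode JSON when possible; fall back to the raw value."""
    raw = client.get(key)
    if raw is None:  # missing or expired key
        return default
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw  # the value was stored as a plain string

class InMemoryClient:
    """Hypothetical stand-in exposing only get/set, for demonstration."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ex=None):
        # A real client returns bytes unless decode_responses=True is set
        self._store[key] = value.encode("utf-8")

    def get(self, key):
        return self._store.get(key)

client = InMemoryClient()
client.set("user:1", json.dumps({"name": "Ada"}))
client.set("greeting", "hello")
print(read_cached(client, "user:1"))    # {'name': 'Ada'}
print(read_cached(client, "greeting"))  # hello
print(read_cached(client, "missing"))   # None
```

One caveat of this scheme: a plain string that happens to be valid JSON, such as "42", comes back as a number. Serializing every value with json.dumps on the write side removes that ambiguity.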

LLM Fine-Tuning Frameworks

LLaMA Factory Configuration

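For pre-training or fine-tuning, shell scripts automate the environment setup and DeepSpeed integration. In the script below, --stage pt selects continued pre-training; sft would select supervised fine-tuning instead.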

# Example Training Launch Script
deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
    --stage pt \
    --model_name_or_path /path/to/base_model \
    --do_train \
    --dataset my_custom_corpus \
    --finetuning_type lora \
    --output_dir ./checkpoints/output_model \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 5e-5 \
    --num_train_epochs 2.0 \
    --fp16 \
    --deepspeed ds_config.json
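The batch-related flags combine multiplicatively. Assuming plain data parallelism across the GPUs, the number of sequences contributing to each optimizer step can be worked out as follows; the function name is illustrative.

```python
def effective_batch_size(num_gpus, per_device_batch_size, accumulation_steps):
    """Sequences contributing to each optimizer step under data parallelism."""
    return num_gpus * per_device_batch_size * accumulation_steps

# Values taken from the launch script: 8 GPUs, batch size 4, 4 accumulation steps.
print(effective_batch_size(num_gpus=8, per_device_batch_size=4, accumulation_steps=4))  # 128
```

When the GPU count changes, adjusting gradient_accumulation_steps to keep this product constant is a common way to keep the learning rate setting comparable.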

MS-Swift Compatibility Note

When performing Supervised Fine-Tuning (SFT) on Qwen-VL models with MS-Swift, the transformers version matters. Version 4.47.3 may cause input_ids errors; downgrading to 4.47.1 (pip install transformers==4.47.1) is a verified fix.
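A pinned version is easy to lose track of across environments, so a startup check can help. The sketch below uses importlib.metadata from the standard library (Python 3.8+); the helper name installed_version and the constant name are hypothetical, and the pinned value is the one named in the note above.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

REQUIRED_TRANSFORMERS = "4.47.1"  # version named in the compatibility note

found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif found != REQUIRED_TRANSFORMERS:
    print(f"transformers {found} found; {REQUIRED_TRANSFORMERS} is expected")
else:
    print("transformers version matches the pin")
```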
