Implementing Least Squares Fitting with SciPy

Least squares fitting is a fundamental technique in regression analysis used to approximate the solution of overdetermined systems. Given a dataset of experimental points $(x_i, y_i)$ that are expected to follow a specific functional relationship $y = f(x, \mathbf{p})$, the primary goal is to determine the parameter vector $\mathbf{p}$ that minimizes the sum of the squares of the residuals. The residual represents the difference between the observed data and the value predicted by the model.
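
Formally, given $n$ data points, the objective being minimized is

$$S(\mathbf{p}) = \sum_{i=1}^{n} \left[ y_i - f(x_i, \mathbf{p}) \right]^2,$$

where each term inside the brackets is one residual.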

In Python's scientific computing stack, the scipy.optimize module provides the leastsq function, which implements the Levenberg-Marquardt algorithm to solve these non-linear least squares problems. The function requires two main inputs: a callable that calculates the error vector (residuals) and an initial guess for the parameters.

Linear Regression Using Least Squares

To begin with a simple example, consider fitting a linear model $y = kx + b$. The unknown parameters are the slope $k$ and the intercept $b$. The following code demonstrates how to use leastsq to find these parameters by minimizing the sum of the squared vertical distances between the data points and the regression line.

import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt

# Sample experimental data
x_coords = np.array([8.19, 2.72, 6.39, 8.71, 4.70, 2.66, 3.78])
y_coords = np.array([7.01, 2.78, 6.47, 6.71, 4.10, 4.23, 4.05])

def calculate_residuals(params, x, y):
    """
    Computes the error vector between the observed data and the linear model.
    params: tuple/list containing [slope, intercept]
    """
    slope, intercept = params
    return y - (slope * x + intercept)

# Initial parameter guess: [slope=1, intercept=0]
initial_guess = [1, 0]

# Perform the least squares optimization
# args=(x_coords, y_coords) passes the data to the residuals function
result = optimize.leastsq(calculate_residuals, initial_guess, args=(x_coords, y_coords))

fitted_slope, fitted_intercept = result[0]
print(f"Fitted Slope (k): {fitted_slope:.5f}")
print(f"Fitted Intercept (b): {fitted_intercept:.5f}")

# Visualization of the results
plt.figure(figsize=(8, 6))
x_range = np.linspace(0, 10, 100)
plt.plot(x_range, fitted_slope * x_range + fitted_intercept, 'b-', label='Fitted Line')
plt.plot(x_coords, y_coords, 'o', color='green', label='Raw Data')
plt.title('Linear Least Squares Fitting')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.legend()
plt.grid(True)
plt.show()

In this implementation, calculate_residuals returns the array of differences. The leastsq algorithm adjusts the input parameters to minimize the sum of the squares of these returned values.
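
Because this model is linear in its parameters, the result can be cross-checked against a closed-form solver. As a quick sanity check (reusing the x_coords and y_coords arrays from the listing above), NumPy's polyfit with degree 1 solves the same problem directly:

# Closed-form linear fit; degree 1 returns [slope, intercept]
poly_slope, poly_intercept = np.polyfit(x_coords, y_coords, 1)
print(f"polyfit slope: {poly_slope:.5f}, intercept: {poly_intercept:.5f}")
# The values should match the leastsq output to within numerical precision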

Fitting a Sine Wave to Noisy Data

Linear models are straightforward, but least squares is particularly powerful for non-linear models. The following example fits a sine wave function defined as $y = A \sin(2\pi k x + \theta)$, where $A$ is amplitude, $k$ is frequency, and $\theta$ is the phase angle. We will generate synthetic data with added random noise to simulate real-world experimental conditions.

def sine_model(x, amplitude, frequency, phase):
    """The target function: A * sin(2*pi*k*x + theta)"""
    return amplitude * np.sin(2 * np.pi * frequency * x + phase)

def error_function(params, x_measured, y_measured):
    """
    Calculates the difference between experimental data and the model.
    params: list [amplitude, frequency, phase]
    """
    return y_measured - sine_model(x_measured, *params)

# Generate clean data
x_points = np.linspace(0, 2 * np.pi, 100)
true_amplitude, true_freq, true_phase = 10, 0.34, np.pi / 6
y_true = sine_model(x_points, true_amplitude, true_freq, true_phase)

# Add Gaussian noise to simulate measurement errors
np.random.seed(42)
noise_level = 2.0
y_noisy = y_true + noise_level * np.random.randn(len(x_points))

# Initial guess for parameters [A, k, theta]
# Note: Frequency guess is intentionally close to true value to ensure convergence
initial_params = [7, 0.4, 0]

# Execute fitting
optimized_params, _ = optimize.leastsq(error_function, initial_params, args=(x_points, y_noisy))

print(f"True Parameters:     [{true_amplitude}, {true_freq}, {true_phase:.5f}]")
print(f"Fitted Parameters:   {optimized_params}")

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(x_points, y_noisy, 'o', markersize=4, alpha=0.6, label='Noisy Experimental Data')
plt.plot(x_points, y_true, 'k-', linewidth=2, label='True Function')
plt.plot(x_points, sine_model(x_points, *optimized_params), 'r--', linewidth=2, label='Fitted Curve')
plt.legend()
plt.title('Non-Linear Least Squares: Sine Wave Fitting')
plt.show()
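
One detail glossed over above: the second value returned by leastsq (discarded as _) is an integer status flag, and per the SciPy documentation a value of 1, 2, 3, or 4 indicates that a solution was found. A minimal convergence check looks like this:

# Keep the status flag (ier) instead of discarding it
optimized_params, ier = optimize.leastsq(error_function, initial_params, args=(x_points, y_noisy))
if ier not in (1, 2, 3, 4):
    print("Warning: leastsq did not converge to a solution")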

Using curve_fit for Simplified Syntax

While leastsq requires a hand-written residual function, scipy.optimize also offers curve_fit. This function wraps the older leastsq (and other methods) behind a simpler interface: the model function takes the independent variable as its first argument, followed by the parameters, and returns the predicted values; curve_fit constructs and minimizes the residuals internally.

# Using the same data generated in the previous example
# Model signature: f(x, A, k, theta)
def model_curve(x, A, k, theta):
    return A * np.sin(2 * np.pi * k * x + theta)

# curve_fit requires the model function, independent variable, dependent variable, and initial guess
popt, pcov = optimize.curve_fit(model_curve, x_points, y_noisy, p0=[7, 0.4, 0])

print(f"Parameters via curve_fit: {popt}")

Importance of Initial Guesses

Non-linear optimization algorithms are iterative and converge towards a local minimum. Consequently, the initial parameter guess (p0) is critical. If the starting point is too far from the actual solution, the algorithm may converge to an incorrect local minimum or fail to converge altogether.

Consider the sine wave example again. If we initialize the frequency parameter with a value that deviates significantly from the true frequency (e.g., guessing 1.0 instead of 0.34), the optimization may fail to find the correct wave characteristics.

# Poor initial guess for frequency
bad_initial_guess = [10, 1.0, 0] 

# Attempt fitting with bad guess
popt_bad, _ = optimize.curve_fit(model_curve, x_points, y_noisy, p0=bad_initial_guess)

print("True Parameters:", [true_amplitude, true_freq, true_phase])
print("Result with bad guess:", popt_bad)

In this scenario, the fitted frequency and phase will likely differ significantly from the true values. To mitigate this, domain knowledge is often used to estimate parameters before running the algorithm, or global optimization techniques may be employed for a more robust search.
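
As one illustration of the global-search idea, the sketch below minimizes the sum of squared residuals with scipy.optimize.differential_evolution, which searches bounded parameter ranges rather than relying on a single starting point. The bounds used here are assumptions chosen to bracket the true values of this synthetic dataset, not general-purpose settings:

def sum_of_squares(params, x, y):
    """Scalar objective: sum of squared residuals of the sine model."""
    residuals = y - sine_model(x, *params)
    return np.sum(residuals ** 2)

# Bounds for [A, k, theta]; assumed ranges that bracket the true values
bounds = [(0, 20), (0.01, 2.0), (-np.pi, np.pi)]

global_result = optimize.differential_evolution(
    sum_of_squares, bounds, args=(x_points, y_noisy), seed=42
)
print("Global search result:", global_result.x)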
