Vectorized Implementation of Multiple Linear Regression
Multiple Linear Regression with Vector Notation
Multiple linear regression extends single-feature regression by incorporating multiple input features. The model can be expressed using vector notation for compact representation and efficient computation.
Vector Representation
Instead of treating parameters w₁ through wₙ as separate values, we combine them into a parameter vector w. The model function becomes:
f_w,b(x) = w · x + b
where · denotes the dot product between vectors w and x.
Vectorization Benefits
Vectorization provides two key advantages:
- Code Conciseness: a single dot-product expression replaces an explicit loop over the features
- Computational Efficiency: leverages parallel hardware capabilities through libraries like NumPy (a loop-based comparison follows the example below)
NumPy Implementation Example
import numpy as np
# Initialize parameters and features
weight_vector = np.array([0.5, -2.1, 1.8])
bias_term = 4.3
feature_vector = np.array([3.2, 1.5, 2.8])
# Vectorized prediction
prediction = np.dot(weight_vector, feature_vector) + bias_term
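For contrast, here is a non-vectorized sketch of the same prediction, continuing the example above. The explicit loop is both longer and slower, since np.dot dispatches to optimized, parallelized routines:
# Non-vectorized equivalent: loop over each feature explicitly
prediction_loop = bias_term
for j in range(len(weight_vector)):
    prediction_loop += weight_vector[j] * feature_vector[j]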
Gradient Descent for Multiple Features
The gradient descent algorithm updates all parameters simultaneously, meaning every partial derivative is computed before any parameter is changed:
repeat until convergence:
    w_j := w_j - α * ∂J/∂w_j    (for j = 1..n)
    b := b - α * ∂J/∂b
The partial derivatives, where m is the number of training examples and each sum runs over i = 1..m, are:
∂J/∂w_j = (1/m) * Σᵢ (f_w,b(x⁽ⁱ⁾) - y⁽ⁱ⁾) * x_j⁽ⁱ⁾
∂J/∂b = (1/m) * Σᵢ (f_w,b(x⁽ⁱ⁾) - y⁽ⁱ⁾)
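A vectorized sketch of these gradient computations, assuming X is an (m, n) NumPy matrix of features and y an (m,)-length vector of targets (hypothetical names, not defined in the text above):
import numpy as np

def compute_gradients(X, y, w, b):
    m = len(y)
    errors = X @ w + b - y     # f_w,b(x⁽ⁱ⁾) - y⁽ⁱ⁾ for all m examples at once
    dj_dw = X.T @ errors / m   # ∂J/∂w_j: mean of error * x_j over the examples
    dj_db = errors.mean()      # ∂J/∂b: mean of the errors
    return dj_dw, dj_db
Applying w -= α * dj_dw and b -= α * dj_db with the values returned from one call keeps the update simultaneous, since both gradients were computed from the same parameters.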
Feature Scaling Techniques
When features have different ranges, scaling improves gradient descent performance:
Maximum Value Scaling
scaled_x = x / max(x)
Mean Normalization
scaled_x = (x - μ) / (max(x) - min(x))
Z-score Normalization
scaled_x = (x - μ) / σ
where μ is the feature's mean and σ its standard deviation.
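A minimal NumPy sketch of all three techniques applied to one feature column (the values are hypothetical):
import numpy as np

x = np.array([300.0, 500.0, 1200.0, 2000.0])       # hypothetical feature, e.g., house sizes

max_scaled = x / x.max()                           # maximum value scaling: range (0, 1]
mean_norm = (x - x.mean()) / (x.max() - x.min())   # mean normalization: roughly (-1, 1)
z_scored = (x - x.mean()) / x.std()                # z-score normalization: mean 0, std 1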
Polynomial Regression
Linear regression can model non-linear relationships through feature engineering:
import numpy as np
# Hypothetical single feature
x = np.array([1.0, 2.0, 3.0, 4.0])
# Create polynomial features: columns for x, x**2, x**3
X_poly = np.c_[x, x**2, x**3]
# Z-score normalize each column so the engineered features share a similar range
X_poly_scaled = (X_poly - np.mean(X_poly, axis=0)) / np.std(X_poly, axis=0)
Convergence Monitoring
Plot the cost function J(w,b) against the iteration count to verify that gradient descent is working correctly. With an appropriate learning rate, the cost should decrease on every iteration; if it ever increases, the learning rate α is likely too large (or there is a bug in the implementation).
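A minimal plotting sketch, assuming cost_history is a list of J(w,b) values recorded once per iteration during training (a hypothetical variable; the placeholder values below stand in for real recorded costs):
import matplotlib.pyplot as plt

# Placeholder values; substitute the costs recorded by your training loop
cost_history = [100.0 / (i + 1) for i in range(200)]

plt.plot(range(len(cost_history)), cost_history)
plt.xlabel("Iteration")
plt.ylabel("Cost J(w,b)")
plt.title("Gradient descent learning curve")
plt.show()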
Learning Rate Selection
Test multiple learning rate values (e.g., 0.001, 0.003, 0.01, 0.03) and select the largest value that consistently decreases the cost function without causing divergence.
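A self-contained sketch of such a sweep, using vectorized batch gradient descent on hypothetical data (X, y, and the iteration count are all placeholder choices):
import numpy as np

X = np.array([[3.2, 1.5, 2.8], [1.1, 0.4, 2.0], [2.5, 3.3, 0.7]])  # hypothetical features
y = np.array([10.0, 6.0, 8.0])                                      # hypothetical targets

for alpha in [0.001, 0.003, 0.01, 0.03]:
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(1000):
        errors = X @ w + b - y                # same errors for w and b: simultaneous update
        w -= alpha * (X.T @ errors) / len(y)  # ∂J/∂w averaged over the examples
        b -= alpha * errors.mean()            # ∂J/∂b averaged over the examples
    final_cost = ((X @ w + b - y) ** 2).mean() / 2
    print(f"alpha={alpha}: final cost {final_cost:.4f}")
A run that diverges will show the cost growing rather than shrinking, which rules that learning rate out.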