Bayesian Optimization: Surrogate Models, Acquisition Functions, and Practical Implementation

Problem Setting

Consider the task of minimizing (or, symmetrically, maximizing) an expensive black-box function \(f: \mathcal{X}\to\mathbb{R}\) whose analytic form is unknown, so that only point-wise evaluations are available. In machine-learning terms, this is the hyper-parameter tuning problem: given a validation metric \(R^2(\theta)\), find \(\theta^*\in\Theta\) that maximizes it while keeping the number of full training runs small.

Core Loop

Bayesian optimization alternates between two operations:

  1. Surrogate model: Fit a probabilistic regression model on the collected \(\{(x_i, y_i)\}\) pairs.
  2. Acquisition function: Use the posterior of the surrogate to decide the next query point.
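
To make the loop concrete, here is a minimal self-contained sketch on a toy 1-D objective, using a GP surrogate and expected improvement scored over a fixed candidate grid. The objective, bounds, and budget are arbitrary choices for illustration; the surrogate and acquisition are derived in the sections below.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    # Toy "expensive" objective to be maximized.
    return -np.sin(3 * x) - x**2 + 0.7 * x

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=(3, 1))        # small initial design
y = f(X).ravel()

candidates = np.linspace(-1.0, 2.0, 500).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)

for _ in range(15):
    gp.fit(X, y)                                # 1. refit the surrogate
    mu, std = gp.predict(candidates, return_std=True)
    z = (mu - y.max()) / np.maximum(std, 1e-9)
    ei = std * (z * norm.cdf(z) + norm.pdf(z))  # 2. expected improvement
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])                  # evaluate f and grow the history
    y = np.append(y, f(x_next))

print("best x: %.3f, best y: %.3f" % (X[np.argmax(y)].item(), y.max()))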

Surrogate Models

Gaussian Process Regression

A Gaussian process (GP) places a prior over functions:

\[ f(\cdot)\sim\mathcal{GP}\bigl(m(\cdot),k(\cdot,\cdot)\bigr) \]

For simplicity we set the mean function \(m(x)=0\). The kernel \(k\) encodes smoothness assumptions; the squared-exponential (RBF) kernel is a common choice:

\[ k(x_i,x_j)=\sigma^2\exp\!\left(-\frac{\|x_i-x_j\|^2}{2\ell^2}\right) \]

Given noisy observations \(y_i=f(x_i)+\varepsilon_i\) with \(\varepsilon_i\sim\mathcal{N}(0,\sigma_n^2)\), the posterior at a test location \(x_*\) is

\[ \begin{aligned} \mu(x_*)&=\mathbf{k}_*^T(K+\sigma_n^2I)^{-1}\mathbf{y},\\ \sigma^2(x_*)&=k(x_*,x_*)-\mathbf{k}_*^T(K+\sigma_n^2I)^{-1}\mathbf{k}_*, \end{aligned} \]

where \(K_{ij}=k(x_i,x_j)\) and \(\mathbf{k}_*=[k(x_*,x_1),\dots,k(x_*,x_n)]^T\).
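
These two formulas translate almost line-for-line into NumPy. A from-scratch sketch, illustrative only, using `np.linalg.solve` instead of forming the inverse explicitly (the numerically preferred route):

import numpy as np

def rbf(A, B, sigma=1.0, ell=1.0):
    # Squared-exponential kernel k(x_i, x_j) from above.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return sigma**2 * np.exp(-d2 / (2 * ell**2))

def gp_posterior(X, y, X_star, sigma_n=0.1):
    K = rbf(X, X) + sigma_n**2 * np.eye(len(X))  # K + sigma_n^2 I
    k_star = rbf(X, X_star)                      # n x m cross-covariance
    mu = k_star.T @ np.linalg.solve(K, y)        # posterior mean
    v = np.linalg.solve(K, k_star)
    var = np.diag(rbf(X_star, X_star)) - np.sum(k_star * v, axis=0)  # posterior variance
    return mu, var

X = np.random.default_rng(1).uniform(-3, 3, (8, 1))
y = np.sin(X).ravel()
mu, var = gp_posterior(X, y, np.array([[0.0], [1.5]]))
print(mu, var)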

Tree-structured Parzen Estimator (TPE)

Instead of modeling \(p(y|x)\), TPE splits the history into two densities:

\[ p(x\mid y)\approx \begin{cases} \ell(x) & \text{if } y < y^{*},\\ g(x) & \text{if } y \ge y^{*}, \end{cases} \]

where the threshold \(y^{*}\) is a quantile of the observed objective values, so \(\ell\) models the promising configurations and \(g\) the rest. The next query maximizes the ratio \(\ell(x)/g(x)\), which is monotonically related to expected improvement; because the densities are estimated directly over the search space, TPE naturally handles categorical variables and conditional spaces.
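
In practice TPE is used through a library implementation rather than written by hand; one widely used option is Optuna's `TPESampler`. A minimal sketch with a hypothetical mixed search space (the objective here is a toy stand-in):

import math
import optuna

def objective(trial):
    # Mixed space: TPE handles log-scaled and categorical parameters.
    c = trial.suggest_float("C", 1e-2, 1e2, log=True)
    kernel = trial.suggest_categorical("kernel", ["rbf", "linear"])
    score = -(math.log10(c) - 1.0) ** 2          # toy objective, peak at C = 10
    return score - (0.5 if kernel == "linear" else 0.0)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)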

Acquisition Functions

Probability of Improvement (PI)

\[ \alpha_{\text{PI}}(x)=\Phi\!\left(\frac{\mu(x)-f^{+}-\xi}{\sigma(x)}\right),\quad \xi\ge 0 \]

where \(f^{+}\) is the incumbent (best observed) value and larger \(\xi\) encourages exploration.

Expected Improvement (EI)

\[ \alpha_{\text{EI}}(x)=\sigma(x)\!\left[Z\Phi(Z)+\phi(Z)\right],\quad Z=\frac{\mu(x)-f^{+}}{\sigma(x)} \]

where \(\phi,\Phi\) are the standard normal pdf and cdf, and \(\alpha_{\text{EI}}\) is taken to be \(0\) when \(\sigma(x)=0\).

Upper/Lower Confidence Bound

\[ \alpha_{\text{UCB}}(x)=\mu(x)+\beta^{1/2}\sigma(x),\qquad \alpha_{\text{LCB}}(x)=\mu(x)-\beta^{1/2}\sigma(x) \]

For GPs over a finite candidate set \(D\), the schedule \(\beta_t=2\log\bigl(|D|t^{2}\pi^{2}/6\delta\bigr)\) yields the GP-UCB sublinear regret bound with probability at least \(1-\delta\).
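
All three acquisitions are simple closed-form functions of the posterior mean and standard deviation. A direct transcription (illustrative; `f_best` is \(f^{+}\) passed in explicitly, and a small floor on \(\sigma\) avoids division by zero):

import numpy as np
from scipy.stats import norm

def prob_improvement(mu, sigma, f_best, xi=0.01):
    # PI: probability the posterior exceeds the incumbent by at least xi.
    z = (mu - f_best - xi) / np.maximum(sigma, 1e-12)
    return norm.cdf(z)

def expected_improvement(mu, sigma, f_best):
    # EI: closed form sigma * [Z Phi(Z) + phi(Z)].
    z = (mu - f_best) / np.maximum(sigma, 1e-12)
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

def upper_confidence_bound(mu, sigma, beta=4.0):
    # UCB: optimism in the face of uncertainty; beta trades off exploration.
    return mu + np.sqrt(beta) * sigma

# Typical usage with any surrogate exposing a posterior mean/std:
#   mu, sigma = gp.predict(X_candidates, return_std=True)
#   x_next = X_candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]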

End-to-End Example: SVR Hyper-parameter Search

from bayes_opt import BayesianOptimization
from sklearn.svm import SVR
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import r2_score
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

# load_boston was removed from scikit-learn 1.2; California housing is the
# usual drop-in replacement. A subsample keeps each SVR fit cheap.
X, y = fetch_california_housing(return_X_y=True)
X, y = X[:2000], y[:2000]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features so the gamma bounds below are on a sensible scale.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

def cv_score(C, epsilon, gamma):
    folds = KFold(n_splits=5, shuffle=True, random_state=1)
    scores = []
    for tr, va in folds.split(X_train):
        model = SVR(C=C, epsilon=epsilon, gamma=gamma)
        model.fit(X_train[tr], y_train[tr])
        scores.append(r2_score(y_train[va], model.predict(X_train[va])))
    return sum(scores) / len(scores)

pbounds = {'C': (1, 16), 'epsilon': (0, 1), 'gamma': (1e-3, 1)}  # keep gamma strictly positive
opt = BayesianOptimization(cv_score, pbounds, random_state=42)
opt.maximize(init_points=5, n_iter=25)

best = opt.max
print(best)
# e.g. {'target': ..., 'params': {'C': ..., 'epsilon': ..., 'gamma': ...}}

svr_best = SVR(**best['params'])
svr_best.fit(X_train, y_train)
print("Test R²:", r2_score(y_test, svr_best.predict(X_test)))
# held-out score; exact values depend on the dataset, seed, and budget

Visualizing the GP Surrogate

import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X_obs = rng.uniform(-5, 5, 20).reshape(-1, 1)
y_obs = np.sin(X_obs).ravel() + 0.2*rng.randn(20)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              alpha=0.2**2,  # assumed noise variance, matching the 0.2 noise std above
                              normalize_y=True)
gp.fit(X_obs, y_obs)

X_plot = np.linspace(-6, 6, 400).reshape(-1, 1)
mu, std = gp.predict(X_plot, return_std=True)

plt.figure(figsize=(8, 4))
plt.scatter(X_obs, y_obs, c='k', zorder=10, label='Observations')
plt.plot(X_plot, mu, 'b', label='Posterior mean')
plt.fill_between(X_plot.ravel(), mu-1.96*std, mu+1.96*std,
                 color='blue', alpha=0.2, label='95% CI')
plt.legend(); plt.show()

Experimental Design in the Sciences

Bayesian optimization is equally useful when the objective is a wet-lab measurement. A minimal template:

def black_box(a, b, c, d, e):
    # `scaler` and `model` are placeholders for a fitted preprocessing step
    # and a surrogate of the assay (or swap this body for the real measurement).
    x = np.array([[a, b, c, d, e]])
    x_scaled = scaler.transform(x)
    return model.predict(x_scaled)[0]

optimizer = BayesianOptimization(
    f=black_box,
    pbounds={'a': (0, 1), 'b': (0, 1), 'c': (0, 1),
             'd': (0, 1), 'e': (0, 1)},
    random_state=7
)
optimizer.maximize(init_points=30, n_iter=50)
print("Best protocol:", optimizer.max)

Replace black_box with any expensive assay, and the loop will propose the next experimental conditions to run.
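
When the objective truly is a manual experiment, the evaluation cannot happen inside `maximize()`. The package's 1.x API also supports an ask-tell pattern via `suggest`/`register`; treat the sketch below as version-dependent, and note that `run_assay` is a hypothetical placeholder for your measurement step.

from bayes_opt import BayesianOptimization, UtilityFunction

optimizer = BayesianOptimization(
    f=None,                      # no callable: results are registered manually
    pbounds={'a': (0, 1), 'b': (0, 1), 'c': (0, 1),
             'd': (0, 1), 'e': (0, 1)},
    random_state=7,
)
utility = UtilityFunction(kind="ucb", kappa=2.5)

for _ in range(10):
    probe = optimizer.suggest(utility)               # ask: next conditions to try
    result = run_assay(**probe)                      # placeholder: run the wet-lab measurement
    optimizer.register(params=probe, target=result)  # tell: feed the outcome back

print("Best protocol so far:", optimizer.max)

Each `suggest` call refits the surrogate on everything registered so far, so the proposals adapt as results arrive.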
