# Bayesian Optimization: Surrogate Models, Acquisition Functions, and Practical Implementation
## Problem Setting
Consider the task of minimizing (or, symmetrically, maximizing) an expensive black-box function \(f: \mathcal{X}\to\mathbb{R}\) whose analytic form is unknown and for which only point-wise evaluations are available. In machine-learning terms, this is the hyper-parameter tuning problem: given a validation metric such as \(R^2(\theta)\), find \(\theta^*\in\Theta\) that maximizes it while keeping the number of full training runs small.
## Core Loop
Bayesian optimization alternates between two operations, sketched in the loop below:
- Surrogate model: Fit a probabilistic regression model on the collected \(\{(x_i, y_i)\}\) pairs.
- Acquisition function: Use the posterior of the surrogate to decide the next query point.
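A self-contained sketch of this loop, assuming a GP surrogate and the expected-improvement criterion defined formally later in this article; for simplicity the acquisition is maximized over a random candidate pool rather than with a gradient-based optimizer:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt_loop(f, bounds, n_init=5, n_iter=20, seed=0):
    """Minimal BO loop (maximization): GP surrogate + expected improvement."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_init, 1))           # initial design
    y = np.array([f(x[0]) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)  # surrogate
        cand = rng.uniform(lo, hi, (1000, 1))      # cheap candidate pool
        mu, sd = gp.predict(cand, return_std=True)
        z = (mu - y.max()) / np.maximum(sd, 1e-12)
        ei = sd * (z * norm.cdf(z) + norm.pdf(z))  # expected improvement
        x_next = cand[np.argmax(ei)]               # acquisition maximizer
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))             # one expensive evaluation
    return X[np.argmax(y)], y.max()

# Toy usage: maximize a quadratic with its peak at x = 1.
x_best, y_best = bayes_opt_loop(lambda x: -(x - 1.0) ** 2, (-2.0, 2.0))
```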
## Surrogate Models
### Gaussian Process Regression
A Gaussian process (GP) places a prior over functions:
\[ f(\cdot)\sim\mathcal{GP}\bigl(m(\cdot),k(\cdot,\cdot)\bigr) \] For simplicity we set the mean function \(m(x)=0\). The kernel \(k\) encodes smoothness; the squared-exponential (RBF) kernel is common:
\[ k(x_i,x_j)=\sigma^2\exp\!\left(-\frac{\|x_i-x_j\|^2}{2\ell^2}\right) \] Given noisy observations \(y_i=f(x_i)+\varepsilon_i\) with \(\varepsilon_i\sim\mathcal{N}(0,\sigma_n^2)\), the posterior at a test location \(x_*\) is
\[ \begin{aligned} \mu(x_*)&=\mathbf{k}_*^T(K+\sigma_n^2I)^{-1}\mathbf{y},\\ \sigma^2(x_*)&=k(x_*,x_*)-\mathbf{k}_*^T(K+\sigma_n^2I)^{-1}\mathbf{k}_*, \end{aligned} \] where \(K_{ij}=k(x_i,x_j)\) and \(\mathbf{k}_*=[k(x_*,x_1),\dots,k(x_*,x_n)]^T\).
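These two formulas translate directly into a few lines of NumPy; a minimal sketch using the RBF kernel from above and a Cholesky factorization in place of the explicit inverse (here `noise` plays the role of \(\sigma_n^2\)):

```python
import numpy as np

def rbf(A, B, sigma=1.0, ell=1.0):
    # Squared-exponential kernel k(x_i, x_j) defined above.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma**2 * np.exp(-d2 / (2 * ell**2))

def gp_posterior(X, y, X_star, noise=1e-2):
    """Posterior mean and variance at test points X_star (shape (m, d))."""
    K = rbf(X, X) + noise * np.eye(len(X))   # K + sigma_n^2 I
    k_star = rbf(X, X_star)                  # column i is k(x_i, x_*)
    L = np.linalg.cholesky(K)                # stable alternative to inv()
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = k_star.T @ alpha                    # posterior mean
    v = np.linalg.solve(L, k_star)
    var = rbf(X_star, X_star).diagonal() - (v**2).sum(0)  # posterior variance
    return mu, var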
### Tree-structured Parzen Estimator (TPE)
Instead of modeling \(p(y\mid x)\) directly, TPE splits the observation history at a threshold \(y^*\) (e.g., a fixed quantile of the observed objective values) and models two densities over the input space:
\[ p(x\mid y)\approx \begin{cases} \ell(x) & \text{if } y < y^*,\\ g(x) & \text{if } y \ge y^*. \end{cases} \]
The next query maximizes the ratio \(\ell(x)/g(x)\), which is monotonically related to expected improvement; because the densities are estimated directly over the search space, TPE handles categorical variables and conditional (tree-structured) spaces naturally.
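A toy one-dimensional version using kernel density estimates makes the idea concrete. This is a sketch, not the full tree-structured algorithm: it assumes a minimization objective, a continuous scalar input, and enough observations on both sides of the split for the KDEs to be well defined; `gamma` is the quantile that defines "good" points.

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(X, y, gamma=0.25, n_candidates=1000, rng=None):
    """Propose the next query point by maximizing l(x)/g(x)."""
    rng = rng or np.random.default_rng(0)
    y_star = np.quantile(y, gamma)             # split threshold y*
    good, bad = X[y < y_star], X[y >= y_star]  # "good" vs "bad" observations
    l, g = gaussian_kde(good), gaussian_kde(bad)
    cand = rng.uniform(X.min(), X.max(), n_candidates)
    return cand[np.argmax(l(cand) / (g(cand) + 1e-12))]
```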
## Acquisition Functions
### Probability of Improvement (PI)
\[ \alpha_{\text{PI}}(x)=\Phi\!\left(\frac{\mu(x)-f^+-\xi}{\sigma(x)}\right),\quad \xi\ge 0, \]
where \(f^+\) is the best objective value observed so far and \(\xi\) trades exploitation against exploration.
### Expected Improvement (EI)
\[ \alpha_{\text{EI}}(x)=\sigma(x)\!\left[Z\Phi(Z)+\phi(Z)\right],\quad Z=\frac{\mu(x)-f^+}{\sigma(x)}, \]
where \(\phi,\Phi\) are the standard normal pdf and cdf.
### Upper/Lower Confidence Bound
\[ \alpha_{\text{UCB}}(x)=\mu(x)+\beta^{1/2}\sigma(x),\qquad \alpha_{\text{LCB}}(x)=\mu(x)-\beta^{1/2}\sigma(x) \]
For GPs over a finite decision set \(D\), the schedule \(\beta_t=2\log(|D|t^2\pi^2/6\delta)\) yields the GP-UCB regret bound of Srinivas et al.
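Given the posterior \(\mu(x)\) and \(\sigma(x)\), each acquisition function is a direct transcription of its formula (maximization convention; `f_best` stands for \(f^+\)):

```python
import numpy as np
from scipy.stats import norm

def prob_improvement(mu, sigma, f_best, xi=0.01):
    # PI: probability the posterior exceeds f_best by at least xi.
    return norm.cdf((mu - f_best - xi) / np.maximum(sigma, 1e-12))

def expected_improvement(mu, sigma, f_best):
    # EI: sigma * [Z * Phi(Z) + phi(Z)] with Z = (mu - f_best) / sigma.
    z = (mu - f_best) / np.maximum(sigma, 1e-12)
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

def ucb(mu, sigma, beta=2.0):
    # UCB: optimistic bound mu + sqrt(beta) * sigma.
    return mu + np.sqrt(beta) * sigma
```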
## End-to-End Example: SVR Hyper-parameter Search
```python
from bayes_opt import BayesianOptimization
from sklearn.svm import SVR
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import r2_score
from sklearn.datasets import load_diabetes

# load_boston was removed in scikit-learn 1.2; the diabetes regression
# dataset serves as a small drop-in replacement here.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

def cv_score(C, epsilon, gamma):
    """5-fold cross-validated R^2 for one hyper-parameter setting."""
    folds = KFold(n_splits=5, shuffle=True, random_state=1)
    scores = []
    for tr, va in folds.split(X_train):
        model = SVR(C=C, epsilon=epsilon, gamma=gamma)
        model.fit(X_train[tr], y_train[tr])
        scores.append(r2_score(y_train[va], model.predict(X_train[va])))
    return sum(scores) / len(scores)

# gamma must be strictly positive for SVR, so bound it away from zero.
pbounds = {'C': (1, 16), 'epsilon': (0, 1), 'gamma': (0.001, 1)}

opt = BayesianOptimization(f=cv_score, pbounds=pbounds, random_state=42)
opt.maximize(init_points=5, n_iter=25)

best = opt.max
print(best)
# e.g. {'target': ..., 'params': {'C': ..., 'epsilon': ..., 'gamma': ...}}
# (exact values depend on the dataset and random seed)

svr_best = SVR(**best['params'])
svr_best.fit(X_train, y_train)
print("Test R²:", r2_score(y_test, svr_best.predict(X_test)))
```
## Visualizing the GP Surrogate
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Noisy observations of sin(x) on [-5, 5].
rng = np.random.RandomState(0)
X_obs = rng.uniform(-5, 5, 20).reshape(-1, 1)
y_obs = np.sin(X_obs).ravel() + 0.2 * rng.randn(20)

# alpha is the assumed noise variance sigma_n^2 added to the kernel diagonal.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              alpha=0.1**2, normalize_y=True)
gp.fit(X_obs, y_obs)

X_plot = np.linspace(-6, 6, 400).reshape(-1, 1)
mu, std = gp.predict(X_plot, return_std=True)

plt.figure(figsize=(8, 4))
plt.scatter(X_obs, y_obs, c='k', zorder=10, label='Observations')
plt.plot(X_plot, mu, 'b', label='Posterior mean')
plt.fill_between(X_plot.ravel(), mu - 1.96 * std, mu + 1.96 * std,
                 color='blue', alpha=0.2, label='95% CI')
plt.legend(); plt.show()
```
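To see which point the optimizer would query next, we can evaluate EI on the plotting grid. This snippet continues the script above and reuses the `expected_improvement` helper sketched in the acquisition section; \(f^+\) is taken as the best observation so far.

```python
# Continues the visualization script; expected_improvement is defined above.
ei = expected_improvement(mu, std, y_obs.max())  # EI at each grid point
x_next = X_plot[np.argmax(ei), 0]                # acquisition maximizer
print(f"Proposed next query: x = {x_next:.3f}")
```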
## Experimental Design in the Sciences
Bayesian optimization is equally useful when the objective is a wet-lab measurement. A minimal template (here a pre-fitted regression `model` and `scaler` stand in for the real assay):
```python
import numpy as np
from bayes_opt import BayesianOptimization

def black_box(a, b, c, d, e):
    # `scaler` and `model` are assumed to be pre-fitted objects proxying
    # the real assay; in practice this function would run (or queue) an
    # actual experiment and return the measured response.
    x = np.array([[a, b, c, d, e]])
    x_scaled = scaler.transform(x)
    return model.predict(x_scaled)[0]

optimizer = BayesianOptimization(
    f=black_box,
    pbounds={'a': (0, 1), 'b': (0, 1), 'c': (0, 1),
             'd': (0, 1), 'e': (0, 1)},
    random_state=7,
)
optimizer.maximize(init_points=30, n_iter=50)
print("Best protocol:", optimizer.max)
```
Replace `black_box` with any expensive assay, and the loop will propose the next experimental conditions to run.