Fading Coder

Foundations of Neural Networks and Deep Learning

Tech May 18 2

Perceptrons and Logical Operations

A perceptron is a binary classifier that takes multiple inputs and produces a single output. Each input is weighted, and the output is determined by whether the weighted sum exceeds a threshold — yielding 1 (fire) if true, 0 (no fire) otherwise.

Basic Logic Gates

  • AND gate: Outputs 1 only when both inputs are 1.
  • NAND gate: Inverts the AND output.
  • OR gate: Outputs 1 if at least one input is 1.

Implementation with Weights and Bias

The bias term b shifts the decision boundary, while weights w₁, w₂ scale input contributions. A gate can be implemented as:

import numpy as np

def and_gate(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.7
    z = np.dot(x, w) + b
    return 1 if z > 0 else 0

def nand_gate(x1, x2):
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    b = 0.7
    z = np.dot(x, w) + b
    return 1 if z > 0 else 0

def or_gate(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2
    z = np.dot(x, w) + b
    return 1 if z > 0 else 0

Limitation: XOR and Linear Separability

A single-layer perceptron cannot compute XOR, because XOR is not linearly separable — no straight line can separate its truth table outputs. This limitation motivates multi-layer architectures.
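As a sketch of how a second layer gets around this, XOR can be built by composing the gates defined earlier: NAND and OR form the first layer, and AND combines their outputs. The `_perceptron` helper and the `xor_gate` name are introduced here for illustration only; the weights and biases are the ones used above.

```python
import numpy as np

def _perceptron(x1, x2, w, b):
    # Fire (1) when the weighted sum plus bias is positive, else 0.
    return int(np.dot(np.array([x1, x2]), np.array(w)) + b > 0)

def and_gate(x1, x2):
    return _perceptron(x1, x2, (0.5, 0.5), -0.7)

def nand_gate(x1, x2):
    return _perceptron(x1, x2, (-0.5, -0.5), 0.7)

def or_gate(x1, x2):
    return _perceptron(x1, x2, (0.5, 0.5), -0.2)

def xor_gate(x1, x2):
    # Layer 1: two perceptrons look at the raw inputs.
    s1 = nand_gate(x1, x2)
    s2 = or_gate(x1, x2)
    # Layer 2: a third perceptron combines the intermediate signals.
    return and_gate(s1, s2)

truth_table = {(a, b): xor_gate(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```

Each individual gate is still a linear classifier; stacking them is what produces a decision region that no single straight line can.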

Multi-Layer Networks and Activation Functions

Replacing the step function with smooth, differentiable functions transforms a perceptron into a neural network capable of gradient-based learning.

Common Activation Functions

  • Step function: Discontinuous, non-differentiable; used in classical perceptrons.
  • Sigmoid: Smooth S-shaped curve, bounded between 0 and 1.
  • ReLU (Rectified Linear Unit): f(x) = max(0, x); cheap to compute, and its gradient does not vanish for positive inputs.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(x):
    return np.maximum(0, x)

def step(x):
    return (x > 0).astype(int)

x_vals = np.linspace(-4, 4, 1000)
plt.figure(figsize=(8, 4))
plt.plot(x_vals, step(x_vals), label='Step', linestyle='--')
plt.plot(x_vals, sigmoid(x_vals), label='Sigmoid', linestyle='-.')
plt.plot(x_vals, relu(x_vals), label='ReLU')
plt.legend()
plt.grid(True)
plt.show()
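To make the point about differentiability concrete, the sketch below writes out the derivatives that gradient-based learning relies on. The names `sigmoid_grad` and `relu_grad` are illustrative, and treating the ReLU derivative as 0 at exactly x = 0 is a common convention assumed here, not something the text above prescribes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1 - s)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (0 at x == 0 by convention).
    return (x > 0).astype(float)

z = np.array([-2.0, 0.0, 2.0])
h = 1e-5
# Central-difference check of the analytic sigmoid derivative.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(sigmoid_grad(z), numeric))
print(sigmoid_grad(np.array([0.0])))  # [0.25]
print(relu_grad(z))
```

The step function, by contrast, has a derivative of zero everywhere except at the jump, so it gives gradient descent nothing to follow.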

Matrix Operations in Neural Networks

Neural layers perform affine transformations: z = x @ W + b, where x is the input, W is the weight matrix, and b is the bias vector.

  • For a 2D input x of shape (N, D_in) and weight W of shape (D_in, D_out), output shape is (N, D_out).
  • Broadcasting and np.dot() handle batched computation efficiently.

X = np.array([[1.0, 0.5]])  # shape: (1, 2)
W1 = np.array([[0.1, 0.3, 0.5],
               [0.2, 0.4, 0.6]])  # shape: (2, 3)
b1 = np.array([0.1, 0.2, 0.3])   # shape: (3,)

A1 = np.dot(X, W1) + b1  # shape: (1, 3)
Z1 = sigmoid(A1)
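The shape rule can be checked with a batch larger than one. This is a small sketch using the same weights as above; the extra input rows are made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, -1.0],
              [0.5, 0.5]])            # (N=4, D_in=2)
W = np.array([[0.1, 0.3, 0.5],
              [0.2, 0.4, 0.6]])       # (D_in=2, D_out=3)
b = np.array([0.1, 0.2, 0.3])         # (D_out=3,)

# One matrix product handles the whole batch; b is broadcast across rows.
Z = X @ W + b
print(Z.shape)

# The batched result matches processing each sample on its own.
row_by_row = np.stack([x @ W + b for x in X])
print(np.allclose(Z, row_by_row))
```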

Building a Three-Layer Feedforward Network

import numpy as np

def init_params():
    return {
        'W1': np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]]),
        'b1': np.array([0.1, 0.2, 0.3]),
        'W2': np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]]),
        'b2': np.array([0.1, 0.2]),
        'W3': np.array([[0.1, 0.3], [0.2, 0.4]]),
        'b3': np.array([0.1, 0.2])
    }

def forward(params, x):
    a1 = np.dot(x, params['W1']) + params['b1']
    z1 = sigmoid(a1)
    a2 = np.dot(z1, params['W2']) + params['b2']
    z2 = sigmoid(a2)
    a3 = np.dot(z2, params['W3']) + params['b3']
    return a3  # identity output

params = init_params()
x_input = np.array([[1.0, 0.5]])
y_output = forward(params, x_input)
print(y_output)  # [[0.31682708 0.69627909]]

Output Layers: Regression vs Classification

  • Regression: Use identity activation (y = x) — output is a continuous value.
  • Classification: Use softmax to convert logits into probability-like outputs summing to 1.

def softmax(logits):
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([0.3, 2.9, 4.0])
probs = softmax(logits)
print(probs)  # [0.01821127 0.24519181 0.73659691]
print(np.sum(probs))  # 1.0
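The subtraction of the maximum logit in `softmax` is there for numerical stability. The sketch below compares it against a naive version (the name `softmax_naive` is introduced here for illustration) on deliberately large logits.

```python
import numpy as np

def softmax_naive(logits):
    # No max-shift: np.exp overflows to inf for large logits.
    exps = np.exp(logits)
    return exps / np.sum(exps)

def softmax(logits):
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

big = np.array([1010.0, 1000.0, 990.0])

# Silence the overflow warnings; the result is inf / inf = nan.
with np.errstate(over='ignore', invalid='ignore'):
    naive = softmax_naive(big)
stable = softmax(big)

print(naive)
print(stable, stable.sum())

# Shifting all logits by a constant leaves softmax unchanged.
print(np.allclose(stable, softmax(big - 1000.0)))
```

Because softmax only depends on differences between logits, the shifted version returns the same probabilities while keeping every exponent at or below zero.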

Loss Functions and Optimization

Loss Computation

  • Mean Squared Error (MSE) for regression:
    def mse_loss(y_pred, y_true):
        # Halved mean squared error; the 0.5 factor cancels the 2 from differentiating the square.
        return 0.5 * np.mean((y_pred - y_true) ** 2)
    
  • Categorical Cross-Entropy for classification:
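    def cross_entropy_loss(y_pred, y_true):
        # y_true: one-hot encoded; a single sample is promoted to a batch of one
        y_pred, y_true = np.atleast_2d(y_pred), np.atleast_2d(y_true)
        eps = 1e-7  # avoids log(0)
        # Average over the batch dimension, not the number of classes
        return -np.sum(y_true * np.log(y_pred + eps)) / y_true.shape[0]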
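A short usage sketch for both losses follows. The cross-entropy variant here averages over the batch dimension, which is an assumption about the intended behavior; the sample predictions are made up for illustration.

```python
import numpy as np

def mse_loss(y_pred, y_true):
    return 0.5 * np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(y_pred, y_true, eps=1e-7):
    # Promote a single sample to a batch of one, then average per sample.
    y_pred, y_true = np.atleast_2d(y_pred), np.atleast_2d(y_true)
    return -np.sum(y_true * np.log(y_pred + eps)) / y_true.shape[0]

y_true = np.array([[0, 0, 1],
                   [0, 1, 0]])
good = np.array([[0.1, 0.1, 0.8],
                 [0.1, 0.8, 0.1]])   # high probability on the true class
bad = np.array([[0.8, 0.1, 0.1],
                [0.8, 0.1, 0.1]])    # high probability on a wrong class

ce_good = cross_entropy_loss(good, y_true)
ce_bad = cross_entropy_loss(bad, y_true)
print(ce_good, ce_bad)

# MSE is zero for a perfect prediction and grows with the error.
print(mse_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0])))
print(mse_loss(np.array([1.0, 2.0]), np.array([0.0, 0.0])))
```

Only the predicted probability of the true class enters the cross-entropy, so confident correct predictions give a loss near zero and confident wrong ones are penalized heavily.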
    

Numerical Gradient and Gradient Descent

def numerical_gradient(func, x, h=1e-4):
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        original = x[idx]
        x[idx] = original + h
        f_plus = func(x)
        x[idx] = original - h
        f_minus = func(x)
        grad[idx] = (f_plus - f_minus) / (2 * h)
        x[idx] = original
        it.iternext()
    return grad

def gradient_descent(func, x_init, lr=0.01, steps=100):
    x = np.array(x_init, dtype=float)  # float copy, so integer inputs do not break the in-place update
    for _ in range(steps):
        grad = numerical_gradient(func, x)
        x -= lr * grad
    return x
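As a usage sketch, the two functions can be tried on a simple bowl-shaped function whose minimum is known to be the origin. The test function `sphere`, the starting point, and the learning rate are illustrative choices, not values taken from the text above.

```python
import numpy as np

def numerical_gradient(func, x, h=1e-4):
    # Central differences, one coordinate at a time.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        original = x[idx]
        x[idx] = original + h
        f_plus = func(x)
        x[idx] = original - h
        f_minus = func(x)
        grad[idx] = (f_plus - f_minus) / (2 * h)
        x[idx] = original  # restore before moving on
        it.iternext()
    return grad

def gradient_descent(func, x_init, lr=0.01, steps=100):
    x = np.array(x_init, dtype=float)
    for _ in range(steps):
        x -= lr * numerical_gradient(func, x)
    return x

def sphere(x):
    # f(x) = x0^2 + x1^2, minimized at (0, 0); analytic gradient is 2x.
    return np.sum(x ** 2)

grad_at_point = numerical_gradient(sphere, np.array([3.0, 4.0]))
x_min = gradient_descent(sphere, np.array([-3.0, 4.0]), lr=0.1, steps=100)
print(grad_at_point)
print(x_min)
```

With lr=0.1 each step shrinks the point by a factor of about 0.8, so 100 steps land very close to the origin. A much larger learning rate would overshoot and diverge, and a much smaller one would barely move.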

Backpropagation via Computational Graphs

Backpropagation computes gradients using the chain rule, propagating errors backward through operations.

Layer Abstractions

  • Multiplication layer:

    class MulLayer:
        def __init__(self):
            self.x = self.y = None
        def forward(self, x, y):
            self.x, self.y = x, y
            return x * y
        def backward(self, dout):
            return dout * self.y, dout * self.x
    
  • Addition layer:

    class AddLayer:
        def forward(self, x, y):
            return x + y
        def backward(self, dout):
            return dout, dout
    
  • ReLU layer:

    class ReLULayer:
        def __init__(self):
            self.mask = None
        def forward(self, x):
            self.mask = x <= 0
            out = x.copy()
            out[self.mask] = 0
            return out
        def backward(self, dout):
            dx = dout.copy()  # avoid mutating the caller's upstream gradient
            dx[self.mask] = 0
            return dx
    
  • Affine layer (fully connected):

    class AffineLayer:
        def __init__(self, W, b):
            self.W, self.b = W, b
            self.x = self.dW = self.db = None
        def forward(self, x):
            self.x = x
            return x @ self.W + self.b
        def backward(self, dout):
            dx = dout @ self.W.T
            self.dW = self.x.T @ dout
            self.db = np.sum(dout, axis=0)
            return dx
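The multiplication and addition layers can be wired into a small computational graph to see the chain rule at work. The scenario below (a fruit purchase with tax) and all variable names are illustrative assumptions; the layer classes repeat the definitions above so the sketch runs on its own.

```python
class MulLayer:
    def __init__(self):
        self.x = self.y = None
    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y
    def backward(self, dout):
        # d(xy)/dx = y and d(xy)/dy = x
        return dout * self.y, dout * self.x

class AddLayer:
    def forward(self, x, y):
        return x + y
    def backward(self, dout):
        # Addition passes the upstream gradient through unchanged.
        return dout, dout

apple, apple_num = 100.0, 2.0
orange, orange_num = 150.0, 3.0
tax = 1.1

mul_apple, mul_orange = MulLayer(), MulLayer()
add_fruit, mul_tax = AddLayer(), MulLayer()

# Forward pass: price = (apple * apple_num + orange * orange_num) * tax
apple_total = mul_apple.forward(apple, apple_num)
orange_total = mul_orange.forward(orange, orange_num)
subtotal = add_fruit.forward(apple_total, orange_total)
price = mul_tax.forward(subtotal, tax)

# Backward pass: visit the layers in reverse order, starting from d(price)/d(price) = 1.
d_subtotal, d_tax = mul_tax.backward(1.0)
d_apple_total, d_orange_total = add_fruit.backward(d_subtotal)
d_apple, d_apple_num = mul_apple.backward(d_apple_total)
d_orange, d_orange_num = mul_orange.backward(d_orange_total)

print(price)
print(d_apple, d_apple_num, d_orange, d_orange_num, d_tax)
```

Each layer only needs its own cached inputs and the gradient arriving from the layer after it, which is why the same pattern scales to the ReLU and affine layers.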
    

Convolutional Neural Networks (CNNs)

CNNs preserve spatial structure using convolution and pooling.

Core Concepts

  • Convolution: Sliding filter over input to produce feature maps.
  • Padding: Zero-padding controls output spatial dimensions.
  • Stride: Step size between filter applications.
  • Pooling: Downsampling (e.g., max-pooling) reduces resolution and adds translation invariance.
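Padding, stride, and filter size together determine the spatial size of the output. A minimal sketch of the usual formula, (input + 2·pad − filter) / stride + 1, follows; the helper name `conv_output_size` is hypothetical, and rounding down when the stride does not divide evenly is an assumption (many frameworks do this, but not all).

```python
def conv_output_size(input_size, filter_size, pad=0, stride=1):
    # Number of filter positions along one spatial axis.
    span = input_size + 2 * pad - filter_size
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    return span // stride + 1

# A 3x3 filter with pad=1, stride=1 keeps a 28x28 input at 28x28.
conv_out = conv_output_size(28, filter_size=3, pad=1, stride=1)

# A 2x2 pooling window with stride 2 halves each spatial dimension.
pool_out = conv_output_size(conv_out, filter_size=2, pad=0, stride=2)

# With 32 filters, flattening the pooled feature maps gives this many values.
flat_features = 32 * pool_out * pool_out

print(conv_out, pool_out, flat_features)
```

The same formula applies to pooling because a pooling window slides over the input exactly as a filter does.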

Simple CNN Layer Stack

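from collections import OrderedDict
import numpy as np

# ConvLayer, PoolLayer, and SoftmaxWithLoss are not defined in this article;
# they are assumed to expose the same forward/backward interface as the layers above.
class SimpleCNN:
    def __init__(self):
        self.layers = OrderedDict([
            ('conv1', ConvLayer(filter_num=32, filter_size=3, pad=1, stride=1)),
            ('relu1', ReLULayer()),
            ('pool1', PoolLayer(pool_h=2, pool_w=2, stride=2)),
            # 32*14*14 implies 28x28 inputs and a pooled output flattened to (N, 32*14*14);
            # AffineLayer as defined above does not flatten 4D input itself.
            ('affine1', AffineLayer(W=np.random.randn(32*14*14, 100) * 0.01,
                                   b=np.zeros(100))),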
            ('relu2', ReLULayer()),
            ('affine2', AffineLayer(W=np.random.randn(100, 10) * 0.01,
                                   b=np.zeros(10)))
        ])
        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    def loss(self, x, t):
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    def gradient(self, x, t):
        self.loss(x, t)
        dout = 1
        dout = self.last_layer.backward(dout)
        for layer in reversed(list(self.layers.values())):
            dout = layer.backward(dout)
        # Use distinct keys: keying both by layer name alone would let db overwrite dW.
        grads = {k + '_W': v.dW for k, v in self.layers.items() if hasattr(v, 'dW')}
        grads.update({k + '_b': v.db for k, v in self.layers.items() if hasattr(v, 'db')})
        return grads

Optimization Strategies Beyond SGD

  • Momentum: Accumulates velocity to dampen oscillation.
  • AdaGrad: Adapts learning rates per parameter using historical gradient squares.
  • Adam: Combines momentum and adaptive learning rates; default β₁=0.9, β₂=0.999.

class AdamOptimizer:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.m = self.v = self.t = None

    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(v) for k, v in params.items()}
            self.v = {k: np.zeros_like(v) for k, v in params.items()}
            self.t = 0

        self.t += 1
        for k in params:
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * grads[k]
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * (grads[k] ** 2)
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            params[k] -= self.lr * m_hat / (np.sqrt(v_hat) + 1e-7)
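For comparison, the other two strategies in the list can be sketched with the same `update(params, grads)` interface. These are minimal versions under common textbook formulations; the class names, default hyperparameters, and the toy quadratic used to exercise them are assumptions for illustration.

```python
import numpy as np

class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr, self.momentum = lr, momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        for k in params:
            # Velocity is a decaying sum of past gradient steps.
            self.v[k] = self.momentum * self.v[k] - self.lr * grads[k]
            params[k] += self.v[k]

class AdaGrad:
    def __init__(self, lr=0.01, eps=1e-7):
        self.lr, self.eps = lr, eps
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {k: np.zeros_like(p) for k, p in params.items()}
        for k in params:
            # Accumulate squared gradients; large totals shrink the step size.
            self.h[k] += grads[k] ** 2
            params[k] -= self.lr * grads[k] / (np.sqrt(self.h[k]) + self.eps)

def minimize(optimizer, steps=200):
    # Toy problem: f(w) = sum(w^2), whose gradient is 2w.
    params = {'w': np.array([1.0, -2.0])}
    for _ in range(steps):
        grads = {'w': 2 * params['w']}
        optimizer.update(params, grads)
    return float(np.sum(params['w'] ** 2))

loss_momentum = minimize(Momentum(lr=0.1))
loss_adagrad = minimize(AdaGrad(lr=0.5))
print(loss_momentum, loss_adagrad)
```

Adam can be read as keeping both pieces of state at once: a momentum-style running mean of gradients and an AdaGrad-style running mean of squared gradients, each with bias correction.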

Weight Initialization and Regularization

  • Xavier initialization: For sigmoid/tanh; weight variance 1/n_in in the simplified form used here (the original Glorot formulation uses 2/(n_in + n_out)).
  • He initialization: For ReLU — variance scaled by 2/n_in.
  • Weight decay (L2 regularization): Adds penalty λ∑w² to loss.
  • Dropout: Randomly deactivates neurons during training to reduce co-adaptation.
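The items above can be sketched in a few lines of NumPy. The function names are hypothetical, the Xavier variant is the simplified 1/n_in form, and the dropout shown is the "inverted" variant that rescales surviving activations during training, which is one common choice rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(n_in, n_out):
    # Standard deviation sqrt(1 / n_in), suited to sigmoid/tanh layers.
    return rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in)

def he_init(n_in, n_out):
    # Standard deviation sqrt(2 / n_in), suited to ReLU layers.
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)

def l2_penalty(weights, lam):
    # lam * sum of squared weights over all weight matrices.
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(x, p_drop=0.5, train=True):
    if not train:
        return x  # no units are dropped at inference time
    mask = rng.random(x.shape) >= p_drop
    # Rescale so the expected activation matches inference.
    return x * mask / (1.0 - p_drop)

W_xavier = xavier_init(500, 400)
W_he = he_init(500, 400)
print(W_xavier.std(), W_he.std())

penalty = l2_penalty([np.ones((2, 2))], lam=0.1)
activations = np.ones((1000, 100))
dropped = dropout(activations, p_drop=0.5, train=True)
print(penalty, dropped.mean())
```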

Batch Normalization

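Normalizes each feature of a layer's input using the mean and variance of the current mini-batch, then applies a scale (gamma) and shift (beta). At inference time, running averages collected during training replace the batch statistics: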

class BatchNorm:
    def __init__(self, gamma=1.0, beta=0.0, eps=1e-5):
        self.gamma, self.beta, self.eps = gamma, beta, eps
        self.running_mean = self.running_var = None

    def forward(self, x, train=True):
        if train:
            mu = np.mean(x, axis=0)
            var = np.var(x, axis=0)
            if self.running_mean is None:
                self.running_mean = mu
                self.running_var = var
            else:
                self.running_mean = 0.9 * self.running_mean + 0.1 * mu
                self.running_var = 0.9 * self.running_var + 0.1 * var
            x_centered = x - mu
            inv_std = 1 / np.sqrt(var + self.eps)
            x_norm = x_centered * inv_std
        else:
            x_norm = (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
        out = self.gamma * x_norm + self.beta
        return out
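A quick way to see what the training branch does is to feed it a batch with a large mean and spread and inspect the output statistics. The sketch below restates that branch as a standalone function (the name `batchnorm_train` and the random test batch are illustrative assumptions).

```python
import numpy as np

def batchnorm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Per-feature statistics over the batch axis.
    mu = np.mean(x, axis=0)
    var = np.var(x, axis=0)
    x_norm = (x - mu) / np.sqrt(var + eps)
    return gamma * x_norm + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # 64 samples, 4 features

out = batchnorm_train(x)
print(out.mean(axis=0))   # close to 0 for every feature
print(out.std(axis=0))    # close to 1 for every feature

# gamma and beta let the network re-scale and re-center if that helps.
scaled = batchnorm_train(x, gamma=2.0, beta=1.0)
print(scaled.mean(axis=0), scaled.std(axis=0))
```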
