
Foundations of Neural Networks and Deep Learning

Tech May 18 2

Perceptrons and Logical Operations

A perceptron is a binary classifier that takes multiple inputs and produces a single output. Each input is multiplied by a weight, and the output is 1 (the neuron "fires") if the weighted sum exceeds a threshold θ, and 0 (no fire) otherwise. For two inputs, y = 1 if w₁x₁ + w₂x₂ > θ, else y = 0.

Basic Logic Gates

AND gate: Outputs 1 only when both inputs are 1.
NAND gate: Outputs 0 only when both inputs are 1 (the inverse of AND).
OR gate: Outputs 1 if at least one input is 1.

Implementation with Weights and Bias

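The bias term b shifts the decision boundary, while the weights w₁, w₂ scale each input's contribution. Writing the threshold as a negative bias (b = −threshold) turns the firing condition into w₁x₁ + w₂x₂ + b > 0, which is what each gate below tests: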

import numpy as np

def and_gate(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.7
    z = np.dot(x, w) + b
    return 1 if z > 0 else 0

def nand_gate(x1, x2):
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    b = 0.7
    z = np.dot(x, w) + b
    return 1 if z > 0 else 0

def or_gate(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2
    z = np.dot(x, w) + b
    return 1 if z > 0 else 0

Limitation: XOR and Linear Separability

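A single-layer perceptron cannot compute XOR (output 1 when exactly one input is 1), because XOR is not linearly separable: no single straight line in the (x₁, x₂) plane separates the inputs (0, 1) and (1, 0), which map to 1, from (0, 0) and (1, 1), which map to 0. This limitation motivates multi-layer architectures.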
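Stacking perceptrons removes this limitation. One standard construction feeds x₁ and x₂ into a NAND gate and an OR gate, then combines their two outputs with an AND gate. The sketch below reuses the weights and biases from the gates above; the helper `perceptron` is a name introduced here for brevity, not part of the earlier code.

```python
import numpy as np

def perceptron(x, w, b):
    # Fire (1) when the weighted sum plus bias is positive, else 0
    return 1 if np.dot(x, w) + b > 0 else 0

def and_gate(x1, x2):
    return perceptron(np.array([x1, x2]), np.array([0.5, 0.5]), -0.7)

def nand_gate(x1, x2):
    return perceptron(np.array([x1, x2]), np.array([-0.5, -0.5]), 0.7)

def or_gate(x1, x2):
    return perceptron(np.array([x1, x2]), np.array([0.5, 0.5]), -0.2)

def xor_gate(x1, x2):
    # Layer 1: NAND and OR side by side; layer 2: AND on their outputs
    s1 = nand_gate(x1, x2)
    s2 = or_gate(x1, x2)
    return and_gate(s1, s2)

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, '->', xor_gate(x1, x2))  # 0, 1, 1, 0
```

Each unit is still a linear threshold, but the second layer sees the transformed pair (NAND, OR), in which the two XOR classes are linearly separable.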

Multi-Layer Networks and Activation Functions

Stacking perceptrons into layers and replacing the step function with a smooth, differentiable activation function turns the model into a neural network whose weights can be learned by gradient-based optimization.

Common Activation Functions

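Step function: Outputs 1 for x > 0 and 0 otherwise. It is discontinuous at 0 and its derivative is 0 everywhere else, so it gives no gradient signal; used in classical perceptrons.
Sigmoid: σ(x) = 1 / (1 + e^(−x)), a smooth S-shaped curve with outputs strictly between 0 and 1.
ReLU (Rectified Linear Unit): f(x) = max(0, x). It is cheap to compute, and its gradient is 1 for positive inputs, so it avoids vanishing gradients there.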

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(x):
    return np.maximum(0, x)

def step(x):
    return (x > 0).astype(int)

x_vals = np.linspace(-4, 4, 1000)
plt.figure(figsize=(8, 4))
plt.plot(x_vals, step(x_vals), label='Step', linestyle='--')
plt.plot(x_vals, sigmoid(x_vals), label='Sigmoid', linestyle='-.')
plt.plot(x_vals, relu(x_vals), label='ReLU')
plt.legend()
plt.grid(True)
plt.show()
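The practical difference between these functions shows up in their derivatives, which gradient-based learning depends on. The sketch below uses the standard identity σ'(x) = σ(x)(1 − σ(x)) and, by convention, sets the ReLU derivative at exactly 0 to 0 (it is undefined there); the `*_grad` function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1 - s)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (value at x == 0 chosen as 0 by convention)
    return (x > 0).astype(float)

def step_grad(x):
    # The step function is flat on both sides of 0, so its gradient is 0 wherever defined
    return np.zeros_like(x, dtype=float)

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print(sigmoid_grad(x))  # largest at 0, shrinking toward 0 as |x| grows
print(relu_grad(x))     # [0. 0. 0. 1. 1.]
print(step_grad(x))     # [0. 0. 0. 0. 0.]
```

Backpropagation multiplies such derivatives layer by layer; since each sigmoid factor is at most 0.25, gradients can shrink quickly in deep sigmoid networks, while ReLU passes a factor of 1 through every active unit.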

Matrix Operations in Neural Networks

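Each neural layer performs an affine transformation a = x @ W + b, where x is the input, W is the weight matrix, and b is the bias vector; an activation function h is then applied elementwise, z = h(a).

For a 2D input x of shape (N, D_in) and a weight matrix W of shape (D_in, D_out), the output has shape (N, D_out).
The bias b has shape (D_out,) and is broadcast across all N rows, so a single np.dot() (or @) call computes the whole batch.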

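import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[1.0, 0.5]])  # shape: (1, 2)
W1 = np.array([[0.1, 0.3, 0.5],
               [0.2, 0.4, 0.6]])  # shape: (2, 3)
b1 = np.array([0.1, 0.2, 0.3])   # shape: (3,)

A1 = np.dot(X, W1) + b1  # shape: (1, 3); approximately [[0.3 0.7 1.1]]
Z1 = sigmoid(A1)         # approximately [[0.574 0.668 0.750]]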
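The same expression handles a whole batch unchanged, because the matrix product processes every row of the input at once and b1 is broadcast across rows. The sketch below stacks four copies of the input row (an arbitrary choice for illustration) and checks the resulting shape.

```python
import numpy as np

W1 = np.array([[0.1, 0.3, 0.5],
               [0.2, 0.4, 0.6]])       # (D_in, D_out) = (2, 3)
b1 = np.array([0.1, 0.2, 0.3])         # (D_out,), broadcast over the batch

X_batch = np.tile([1.0, 0.5], (4, 1))  # (N, D_in) = (4, 2): four identical samples
A_batch = X_batch @ W1 + b1            # (N, D_out) = (4, 3)

print(A_batch.shape)  # (4, 3)
print(A_batch[0])     # matches the single-sample result, approximately [0.3 0.7 1.1]
```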

Building a Three-Layer Feedforward Network

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def init_params():
    return {
        'W1': np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]]),
        'b1': np.array([0.1, 0.2, 0.3]),
        'W2': np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]]),
        'b2': np.array([0.1, 0.2]),
        'W3': np.array([[0.1, 0.3], [0.2, 0.4]]),
        'b3': np.array([0.1, 0.2])
    }

def forward(params, x):
    a1 = np.dot(x, params['W1']) + params['b1']
    z1 = sigmoid(a1)
    a2 = np.dot(z1, params['W2']) + params['b2']
    z2 = sigmoid(a2)
    a3 = np.dot(z2, params['W3']) + params['b3']
    return a3  # identity output

params = init_params()
x_input = np.array([[1.0, 0.5]])
y_output = forward(params, x_input)
print(y_output)  # [[0.31682708 0.69627909]]

Output Layers: Regression vs Classification

Regression: Use the identity activation (y = x), so the output is an unbounded continuous value.
Classification: Use softmax, yₖ = exp(xₖ) / Σᵢ exp(xᵢ), to turn the logits x into non-negative outputs that sum to 1 and can be read as class probabilities.

def softmax(logits):
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([0.3, 2.9, 4.0])
probs = softmax(logits)
print(probs)  # [0.01821127 0.24519181 0.73659691]
print(np.sum(probs))  # 1.0, up to floating-point rounding
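Subtracting the maximum logit does not change the result, since softmax is unchanged when the same constant is added to every input, but it keeps np.exp from overflowing on large logits. The comparison below uses large logits chosen for illustration; `naive_softmax` is a hypothetical unshifted version written only to show the failure.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def naive_softmax(logits):
    # No max-shift: np.exp(1010) overflows to inf, and inf / inf gives nan
    exps = np.exp(logits)
    return exps / np.sum(exps)

big = np.array([1010.0, 1000.0, 990.0])
with np.errstate(over='ignore', invalid='ignore'):
    naive_out = naive_softmax(big)
stable_out = softmax(big)

print(naive_out)   # [nan nan nan]
print(stable_out)  # finite probabilities that sum to 1
```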

Loss Functions and Optimization

Loss Computation

Mean Squared Error (MSE) for regression, with the conventional factor of ½ that cancels the 2 when differentiating:

def mse_loss(y_pred, y_true):
    return 0.5 * np.mean((y_pred - y_true) ** 2)

Categorical Cross-Entropy for classification:

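def cross_entropy_loss(y_pred, y_true):
    # y_pred: predicted probabilities; y_true: one-hot labels; shape (N, C) or (C,)
    y_pred, y_true = np.atleast_2d(y_pred), np.atleast_2d(y_true)  # a single sample becomes N = 1
    eps = 1e-7  # avoids log(0)
    return -np.sum(y_true * np.log(y_pred + eps)) / y_pred.shape[0]  # mean over the batch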
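With one-hot labels, cross-entropy reduces to −log of the probability assigned to the correct class, so a confident correct prediction costs little and a confident wrong one costs a lot. The example below uses made-up predictions for a 3-class problem and repeats both loss functions so it runs on its own.

```python
import numpy as np

def mse_loss(y_pred, y_true):
    return 0.5 * np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(y_pred, y_true):
    y_pred, y_true = np.atleast_2d(y_pred), np.atleast_2d(y_true)
    eps = 1e-7
    return -np.sum(y_true * np.log(y_pred + eps)) / y_pred.shape[0]

y_true = np.array([0.0, 0.0, 1.0])  # correct class is index 2
good = np.array([0.1, 0.2, 0.7])    # most probability on the correct class
bad = np.array([0.7, 0.2, 0.1])     # most probability on a wrong class

print(cross_entropy_loss(good, y_true))  # about 0.357, i.e. -log(0.7)
print(cross_entropy_loss(bad, y_true))   # about 2.303, i.e. -log(0.1)
print(mse_loss(good, y_true), mse_loss(bad, y_true))  # MSE also ranks "good" lower
```

For a batch of shape (N, C), the same function returns the average of the per-sample losses.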

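Numerical Gradient and Gradient Descent

The numerical gradient estimates each partial derivative with the central difference (f(x + h) − f(x − h)) / 2h, perturbing one element of x at a time and then restoring it, so x must be a float array. Gradient descent repeatedly applies x ← x − lr · ∇f(x).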

def numerical_gradient(func, x, h=1e-4):
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        original = x[idx]
        x[idx] = original + h
        f_plus = func(x)
        x[idx] = original - h
        f_minus = func(x)
        grad[idx] = (f_plus - f_minus) / (2 * h)
        x[idx] = original
        it.iternext()
    return grad

def gradient_descent(func, x_init, lr=0.01, steps=100):
    x = x_init.copy()
    for _ in range(steps):
        grad = numerical_gradient(func, x)
        x -= lr * grad
    return x
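As a usage sketch, the example below applies both functions to f(x₀, x₁) = x₀² + x₁², whose analytic gradient is (2x₀, 2x₁) and whose minimum is at the origin. The functions are repeated in condensed form so the block runs on its own; np.ndindex stands in for the np.nditer loop above, and f, the starting point, and lr are illustrative choices.

```python
import numpy as np

def numerical_gradient(func, x, h=1e-4):
    # Central difference on each element of a float array
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        original = x[idx]
        x[idx] = original + h
        f_plus = func(x)
        x[idx] = original - h
        f_minus = func(x)
        grad[idx] = (f_plus - f_minus) / (2 * h)
        x[idx] = original  # restore the perturbed element
    return grad

def gradient_descent(func, x_init, lr=0.01, steps=100):
    x = x_init.copy()
    for _ in range(steps):
        x -= lr * numerical_gradient(func, x)
    return x

def f(x):
    # f(x0, x1) = x0^2 + x1^2, minimized at (0, 0)
    return np.sum(x ** 2)

x0 = np.array([-3.0, 4.0])
print(numerical_gradient(f, x0.copy()))            # close to the analytic [-6.  8.]
print(gradient_descent(f, x0, lr=0.1, steps=100))  # very close to [0. 0.]
```

Each numerical gradient costs two function evaluations per parameter, which is why it is mainly used to check backpropagation results rather than to train networks.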

Backpropagation via Computational Graphs

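Backpropagation computes gradients with the chain rule: starting from the loss, each node multiplies the gradient arriving from downstream (dout) by its own local derivative and passes the result back to its inputs.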

Layer Abstractions

Multiplication layer:

class MulLayer:
    def __init__(self):
        self.x = self.y = None
    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y
    def backward(self, dout):
        return dout * self.y, dout * self.x

Addition layer:

class AddLayer:
    def forward(self, x, y):
        return x + y
    def backward(self, dout):
        return dout, dout
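Chaining these two layers builds a small computational graph. The sketch below uses made-up numbers (2 items at 100 each plus 3 items at 150 each, then a 1.1 tax multiplier), runs a forward pass, and backpropagates a gradient of 1 from the total; the two classes are repeated so the block runs on its own, and the variable names are illustrative.

```python
class MulLayer:
    def __init__(self):
        self.x = self.y = None
    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y
    def backward(self, dout):
        # d(xy)/dx = y and d(xy)/dy = x
        return dout * self.y, dout * self.x

class AddLayer:
    def forward(self, x, y):
        return x + y
    def backward(self, dout):
        # Addition passes the upstream gradient through unchanged
        return dout, dout

price_a, count_a = 100, 2
price_b, count_b = 150, 3
tax = 1.1
mul_a, mul_b, add, mul_tax = MulLayer(), MulLayer(), AddLayer(), MulLayer()

# Forward pass
cost_a = mul_a.forward(price_a, count_a)  # 200
cost_b = mul_b.forward(price_b, count_b)  # 450
subtotal = add.forward(cost_a, cost_b)    # 650
total = mul_tax.forward(subtotal, tax)    # about 715

# Backward pass, starting from d(total)/d(total) = 1
d_subtotal, d_tax = mul_tax.backward(1)          # about 1.1, 650
d_cost_a, d_cost_b = add.backward(d_subtotal)    # about 1.1, 1.1
d_price_a, d_count_a = mul_a.backward(d_cost_a)  # about 2.2, 110
d_price_b, d_count_b = mul_b.backward(d_cost_b)  # about 3.3, 165
```

Each result is a sensitivity: for example, d_price_a ≈ 2.2 means raising price_a by 1 raises the total by about 2.2.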

ReLU layer:

class ReLULayer:
    def __init__(self):
        self.mask = None
    def forward(self, x):
        self.mask = x <= 0
        out = x.copy()
        out[self.mask] = 0
        return out
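    def backward(self, dout):
        dx = dout.copy()  # copy so the caller's gradient array is not modified in place
        dx[self.mask] = 0  # no gradient flows back through inputs that were <= 0
        return dx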

Affine layer (fully connected):

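class AffineLayer:
    def __init__(self, W, b):
        self.W, self.b = W, b
        self.x = self.x_shape = self.dW = self.db = None
    def forward(self, x):
        # Flatten inputs such as (N, C, H, W) feature maps to (N, features)
        self.x_shape = x.shape
        self.x = x.reshape(x.shape[0], -1)
        return self.x @ self.W + self.b
    def backward(self, dout):
        dx = dout @ self.W.T
        self.dW = self.x.T @ dout
        self.db = np.sum(dout, axis=0)
        return dx.reshape(self.x_shape)  # restore the original input shape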
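Backward passes like these are commonly validated with a gradient check, comparing the analytic gradients against central-difference estimates. The sketch below checks the affine-layer formulas on a scalar loss L = sum((x @ W + b) * G), where G is a fixed random array, so the upstream gradient is exactly G; the shapes, seed, and helper name `num_grad` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))  # batch of 4 samples with 3 features
W = rng.standard_normal((3, 2))
b = rng.standard_normal(2)
G = rng.standard_normal((4, 2))  # fixed weights that define a scalar loss

def loss(W, b):
    # L = sum((x @ W + b) * G), so dL/d(out) is exactly G
    return np.sum((x @ W + b) * G)

# Analytic gradients with the same formulas as AffineLayer.backward
dout = G
dx = dout @ W.T
dW = x.T @ dout
db = np.sum(dout, axis=0)

def num_grad(f, p, h=1e-5):
    # Central-difference estimate of df/dp, perturbing p in place one element at a time
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        orig = p[idx]
        p[idx] = orig + h
        f_plus = f()
        p[idx] = orig - h
        f_minus = f()
        p[idx] = orig
        g[idx] = (f_plus - f_minus) / (2 * h)
    return g

dx_num = num_grad(lambda: loss(W, b), x)
dW_num = num_grad(lambda: loss(W, b), W)
db_num = num_grad(lambda: loss(W, b), b)
for name, analytic, numeric in [('dx', dx, dx_num), ('dW', dW, dW_num), ('db', db, db_num)]:
    print(name, np.max(np.abs(analytic - numeric)))  # each should be close to 0
```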

Convolutional Neural Networks (CNNs)

CNNs preserve spatial structure using convolution and pooling.

Core Concepts

Convolution: A small filter slides over the input and, at each position, takes the weighted sum of the values beneath it, producing a feature map.
Padding: Zero-padding controls output spatial dimensions.
Stride: Step size between filter applications.
Pooling: Downsampling (e.g., max-pooling) reduces resolution and adds translation invariance.
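For an H × W input, an FH × FW filter, padding P, and stride S, the output height is OH = (H + 2P − FH) / S + 1, and likewise for the width. The sketch below implements this formula plus single-channel convolution and 2×2 max pooling with plain loops, for illustration only (practical implementations vectorize these loops); the function names are hypothetical, and the 28×28 input in the last two lines is an assumption inferred from the 32*14*14 size used in the CNN below.

```python
import numpy as np

def conv_output_size(in_size, filter_size, pad, stride):
    # OH = (H + 2P - FH) / S + 1; assumes the division is exact
    return (in_size + 2 * pad - filter_size) // stride + 1

def conv2d(x, k, stride=1, pad=0):
    # Single-channel convolution as used in CNNs (cross-correlation: the filter is not flipped)
    x = np.pad(x, pad)  # zero-padding on all sides
    fh, fw = k.shape
    oh = (x.shape[0] - fh) // stride + 1
    ow = (x.shape[1] - fw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i * stride:i * stride + fh, j * stride:j * stride + fw]
            out[i, j] = np.sum(window * k)
    return out

def max_pool2d(x, size=2, stride=2):
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.max(x[i * stride:i * stride + size, j * stride:j * stride + size])
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(conv2d(x, np.ones((3, 3))))               # [[45. 54.] [81. 90.]]
print(max_pool2d(x))                            # [[ 5.  7.] [13. 15.]]
print(conv2d(x, np.ones((3, 3)), pad=1).shape)  # (4, 4): padding 1 preserves size for a 3x3 filter
print(conv_output_size(28, 3, pad=1, stride=1))  # 28
print(conv_output_size(28, 2, pad=0, stride=2))  # 14
```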

Simple CNN Layer Stack

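from collections import OrderedDict
import numpy as np

# ConvLayer, PoolLayer and SoftmaxWithLoss are not defined in this article.
# affine1 takes 32*14*14 features, which implies 28x28 inputs: conv1 (3x3, pad 1,
# stride 1) keeps 28x28, and pool1 (2x2, stride 2) halves it to 14x14 per filter.
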
class SimpleCNN:
    def __init__(self):
        self.layers = OrderedDict([
            ('conv1', ConvLayer(filter_num=32, filter_size=3, pad=1, stride=1)),
            ('relu1', ReLULayer()),
            ('pool1', PoolLayer(pool_h=2, pool_w=2, stride=2)),
            ('affine1', AffineLayer(W=np.random.randn(32*14*14, 100) * 0.01,
                                   b=np.zeros(100))),
            ('relu2', ReLULayer()),
            ('affine2', AffineLayer(W=np.random.randn(100, 10) * 0.01,
                                   b=np.zeros(10)))
        ])
        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    def loss(self, x, t):
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    def gradient(self, x, t):
        self.loss(x, t)
        dout = 1
        dout = self.last_layer.backward(dout)
        for layer in reversed(list(self.layers.values())):
            dout = layer.backward(dout)
        # Use distinct keys so a layer's bias gradient does not overwrite its weight gradient
        grads = {}
        for name, layer in self.layers.items():
            if hasattr(layer, 'dW'):
                grads[name + '_W'] = layer.dW
                grads[name + '_b'] = layer.db
        return grads

Optimization Strategies Beyond SGD

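Momentum: Accumulates a velocity term from past gradients, which speeds up progress along consistent directions and dampens oscillation across them.
AdaGrad: Adapts the learning rate per parameter by dividing each step by the square root of that parameter's accumulated squared gradients, so parameters that have received large updates take smaller steps.
Adam: Combines momentum with per-parameter adaptive learning rates, using bias-corrected moving averages of the gradient and its square; common defaults are β₁ = 0.9 and β₂ = 0.999.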

class AdamOptimizer:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.m = self.v = self.t = None

    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(v) for k, v in params.items()}
            self.v = {k: np.zeros_like(v) for k, v in params.items()}
            self.t = 0

        self.t += 1
        for k in params:
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * grads[k]
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * (grads[k] ** 2)
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            params[k] -= self.lr * m_hat / (np.sqrt(v_hat) + 1e-7)
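Momentum and AdaGrad can be written with the same update(params, grads) interface as AdamOptimizer. The sketch below uses the common formulations v ← αv − η∇L, W ← W + v for momentum and h ← h + (∇L)², W ← W − η∇L / √h for AdaGrad; the class names, hyperparameters, the 1e-7 term (mirroring the Adam code), and the toy objective f(w) = w² are choices made for this example.

```python
import numpy as np

class MomentumOptimizer:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr, self.momentum = lr, momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        for k in params:
            # Velocity is a decaying sum of past gradient steps
            self.v[k] = self.momentum * self.v[k] - self.lr * grads[k]
            params[k] += self.v[k]

class AdaGradOptimizer:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {k: np.zeros_like(p) for k, p in params.items()}
        for k in params:
            # Accumulated squared gradients shrink the step size over time
            self.h[k] += grads[k] ** 2
            params[k] -= self.lr * grads[k] / (np.sqrt(self.h[k]) + 1e-7)

def minimize(optimizer, steps=200):
    # Minimize f(w) = w^2 (gradient 2w) starting from w = 5
    params = {'w': np.array([5.0])}
    for _ in range(steps):
        grads = {'w': 2 * params['w']}
        optimizer.update(params, grads)
    return params['w'][0]

print(minimize(MomentumOptimizer(lr=0.1, momentum=0.9)))  # close to 0
print(minimize(AdaGradOptimizer(lr=1.0)))                 # close to 0
```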

Weight Initialization and Regularization

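Xavier initialization: For sigmoid/tanh; weights are drawn with variance 1/n_in, where n_in is the number of inputs to the layer (a common simplification of the original 2/(n_in + n_out)).
He initialization: For ReLU; weights are drawn with variance 2/n_in, compensating for ReLU zeroing roughly half of its inputs.
Weight decay (L2 regularization): Adds a penalty λ∑w² to the loss, discouraging large weights.
Dropout: During training, each neuron's output is zeroed at random with a fixed probability, which reduces co-adaptation between neurons; all neurons are used at test time.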
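The two initialization rules differ only in the scale of the random normal weights, and dropout can be written as a random mask. The sketch below draws weights with standard deviation √(1/n_in) (Xavier, simplified form) or √(2/n_in) (He), and implements inverted dropout, a common variant that rescales surviving activations by 1/(1 − p) during training so that no rescaling is needed at test time. The function names, drop ratio, layer sizes, and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_out, method='he'):
    # Xavier: Var(w) = 1/n_in (sigmoid/tanh); He: Var(w) = 2/n_in (ReLU)
    scale = np.sqrt(2.0 / n_in) if method == 'he' else np.sqrt(1.0 / n_in)
    return rng.standard_normal((n_in, n_out)) * scale

def dropout(x, drop_ratio=0.5, train=True):
    # Inverted dropout: zero each unit with probability drop_ratio, rescale the survivors
    if not train:
        return x  # all units active at test time, no rescaling needed
    mask = rng.random(x.shape) >= drop_ratio
    return x * mask / (1.0 - drop_ratio)

W_he = init_weights(1000, 1000, method='he')
W_xavier = init_weights(1000, 1000, method='xavier')
print(W_he.std(), np.sqrt(2 / 1000))      # empirical vs target std, about 0.0447
print(W_xavier.std(), np.sqrt(1 / 1000))  # about 0.0316

x = np.ones((1000, 1000))
out = dropout(x, drop_ratio=0.5)
print((out == 0).mean())  # fraction dropped, close to 0.5
print(out.mean())         # close to 1.0: the expected activation is preserved
```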

Batch Normalization

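For each feature, batch normalization subtracts the mini-batch mean and divides by the mini-batch standard deviation, then applies a learnable scale γ and shift β. At inference time, running averages of the mean and variance collected during training replace the batch statistics: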

class BatchNorm:
    def __init__(self, gamma=1.0, beta=0.0, eps=1e-5):
        self.gamma, self.beta, self.eps = gamma, beta, eps
        self.running_mean = self.running_var = None

    def forward(self, x, train=True):
        if train:
            mu = np.mean(x, axis=0)
            var = np.var(x, axis=0)
            if self.running_mean is None:
                self.running_mean = mu
                self.running_var = var
            else:
                self.running_mean = 0.9 * self.running_mean + 0.1 * mu
                self.running_var = 0.9 * self.running_var + 0.1 * var
            x_centered = x - mu
            inv_std = 1 / np.sqrt(var + self.eps)
            x_norm = x_centered * inv_std
        else:
            x_norm = (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
        out = self.gamma * x_norm + self.beta
        return out
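In training mode the normalized activations have close to zero mean and unit variance per feature before γ and β are applied. The check below uses a condensed, function-style version of the training-mode forward pass (a hypothetical `batch_norm_train`, not the class above) on synthetic data whose scale and offset are arbitrary.

```python
import numpy as np

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Per-feature statistics over the batch axis, as in BatchNorm.forward(train=True)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_norm = (x - mu) / np.sqrt(var + eps)
    return gamma * x_norm + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4)) * 10.0 + 3.0  # 4 features with std 10 and mean 3

out = batch_norm_train(x)
print(out.mean(axis=0))  # each close to 0
print(out.std(axis=0))   # each close to 1 (marginally below, because of eps)

out_scaled = batch_norm_train(x, gamma=2.0, beta=0.5)
print(out_scaled.mean(axis=0), out_scaled.std(axis=0))  # about 0.5 and 2.0 per feature
```

Because γ and β are learned, the network can partly undo the normalization when that helps; the class above additionally tracks running statistics for use when train=False.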

Fading Coder

Copyright © fadingcoder.top
