Building Neural Networks with TensorFlow: From Perceptrons to CNNs
Perceptrons: The Building Blocks
Introduced by Frank Rosenblatt in 1957, the perceptron is a fundamental unit of artificial neural networks. It takes multiple input values, multiplies each by a corresponding weight, sums these weighted inputs, and then applies an activation function to produce an output. This simple mechanism can solve basic logical operations like AND and OR.
The relationship between perceptrons and logistic regression is significant. While both use weighted sums and activation functions, perceptrons typically use a step function, whereas logistic regression uses a sigmoid function for probabilistic outputs.
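As a minimal illustration (plain Python/NumPy, with hand-picked weights rather than learned ones), the sketch below computes a perceptron's output for the AND operation and contrasts the step activation with the sigmoid used in logistic regression:
import numpy as np

def step(z):
    return 1 if z >= 0 else 0        # perceptron activation: hard threshold

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # logistic-regression activation: graded output

# Hand-picked weights and bias that realize logical AND
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = np.dot(w, x) + b             # weighted sum of inputs plus bias
    print(x, step(z), round(sigmoid(z), 2))
# Only (1, 1) crosses the threshold, so step outputs 1 for it and 0 otherwise;
# sigmoid instead gives graded values (~0.18, ~0.38, ~0.38, ~0.62) rather than a hard 0/1.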
Neural Network Fundamentals
Artificial Neural Networks (ANNs) are computational models inspired by biological neural systems, designed to approximate complex functions. They consist of interconnected nodes (neurons) organized into layers.
Neural networks can be categorized by complexity:
- Basic Networks: Single-layer perceptrons, linear networks, and backpropagation networks.
- Advanced Networks: Boltzmann machines, restricted Boltzmann machines, and recurrent neural networks.
- Deep Networks: Deep belief networks, convolutional neural networks (CNNs), and long short-term memory (LSTM) networks.
Key characteristics of neural networks include:
- The length of the input vector matches the number of input neurons.
- Each connection has an associated weight.
- Neurons within the same layer are not connected.
- Networks typically consist of input, hidden, and output layers.
- Consecutive layers are typically fully connected.
The core components of a neural network are:
- Architecture: The structure defining weights and neurons.
- Activation Function: Determines neuron output based on input.
- Learning Rule: Specifies how weights are adjusted over time, typically using backpropagation.
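To make the learning rule concrete, here is a minimal gradient-descent sketch for a single linear neuron with a mean-squared-error loss (NumPy only; the data and learning rate are illustrative). Backpropagation applies this same kind of update to every weight in a multi-layer network:
import numpy as np

np.random.seed(0)
X = np.random.rand(100, 3)                 # 100 samples, 3 input features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                             # targets generated by a known linear rule

w = np.zeros(3)                            # initial weights
lr = 0.1                                   # learning rate
for _ in range(2000):
    y_pred = X @ w                         # forward pass
    grad = X.T @ (y_pred - y) / len(X)     # gradient of the mean squared error w.r.t. w
    w -= lr * grad                         # learning rule: w <- w - lr * dL/dw
print(w)                                   # close to [2.0, -1.0, 0.5]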
TensorFlow Modules Overview
TensorFlow provides several modules for neural network operations:
- tf.nn: Low-level neural network operations including convolutions, pooling, normalization, and loss functions.
- tf.layers: High-level abstractions for building networks, particularly useful for convolutional layers.
- tf.contrib: Experimental features and advanced operations, though less stable than core modules.
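As a rough illustration of the difference in abstraction level (TensorFlow 1.x API; the tensor names here are only examples), the same convolution can be written with either module. With tf.nn you create and manage the filter variables yourself; tf.layers creates them for you:
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 28, 28, 1])

# tf.nn: explicit filter variable and low-level op
filters = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
conv_low = tf.nn.relu(
    tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding='SAME'))

# tf.layers: the layer creates and tracks its own variables
conv_high = tf.layers.conv2d(
    images, filters=32, kernel_size=5, padding='same', activation=tf.nn.relu)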
Shallow Neural Network for MNIST
The MNIST dataset consists of 28x28 grayscale images of handwritten digits (60,000 training and 10,000 test examples). We'll implement a simple classifier for it using softmax regression.
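Softmax regression maps the 10 raw class scores (logits) for an image to a probability distribution over the digit classes. A small NumPy sketch of what tf.nn.softmax computes:
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))    # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))           # approx. [0.659, 0.242, 0.099]; the entries sum to 1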
Data Preparation
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Load MNIST data with one-hot encoding
mnist = input_data.read_data_sets('data/mnist', one_hot=True)
Model Construction
# Define placeholders for input data
X = tf.placeholder(tf.float32, [None, 784]) # 28x28 images flattened
y_true = tf.placeholder(tf.float32, [None, 10]) # 10 classes
# Initialize weights and biases
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Compute logits and apply softmax
logits = tf.matmul(X, W) + b
y_pred = tf.nn.softmax(logits)
# Calculate cross-entropy loss
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
# Use gradient descent optimizer
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
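A note on the loss: tf.nn.softmax_cross_entropy_with_logits takes the raw logits rather than y_pred, because it applies the softmax and the cross-entropy H(y, y_hat) = -sum_i y_i * log(y_hat_i) in one numerically stable step. A naive equivalent, shown only for illustration, would be:
# Illustrative only -- prefer the fused op above for numerical stability
naive_cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y_true * tf.log(y_pred), axis=1))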
Training and Evaluation
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_X, batch_y = mnist.train.next_batch(100)
        sess.run(optimizer, feed_dict={X: batch_X, y_true: batch_y})
    test_accuracy = sess.run(accuracy,
                             feed_dict={X: mnist.test.images, y_true: mnist.test.labels})
    print(f"Test accuracy: {test_accuracy:.4f}")
Convolutional Neural Network for MNIST
CNNs excel at image recognition by leveraging spatial hierarchies. They use convolutional layers to detect features and pooling layers to reduce dimensionality.
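One way to see why the fully connected layer in the model below expects 7 * 7 * 64 inputs: with 'SAME' padding the 5x5 convolutions preserve the 28x28 spatial size, each 2x2 max pool with stride 2 halves it (28 -> 14 -> 7), and the second convolution raises the channel count to 64. A quick sanity check:
size = 28                      # MNIST images are 28x28
for _ in range(2):             # two conv + pool stages
    size //= 2                 # 'SAME' convolution keeps the size; 2x2/stride-2 pooling halves it
flat_features = size * size * 64
print(flat_features)           # 3136 == 7 * 7 * 64, the fan-in of the first fully connected layer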
Model Architecture
def create_cnn_model():
    # Input layer
    X = tf.placeholder(tf.float32, [None, 784])
    y_true = tf.placeholder(tf.float32, [None, 10])
    X_image = tf.reshape(X, [-1, 28, 28, 1])

    # Convolutional layer 1
    W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
    b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))
    h_conv1 = tf.nn.relu(tf.nn.conv2d(X_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
    h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Convolutional layer 2
    W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
    b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
    h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
    h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Fully connected layer
    W_fc1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
    b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Output layer
    W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
    b_fc2 = tf.Variable(tf.constant(0.1, shape=[10]))
    logits = tf.matmul(h_fc1, W_fc2) + b_fc2
    y_pred = tf.nn.softmax(logits)

    return X, y_true, logits, y_pred
# Training and evaluation follow the same pattern as the shallow network above; a sketch is given below.
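For reference, a minimal training loop for this CNN could look like the sketch below (the AdamOptimizer, learning rate, step count, and batch size are illustrative choices, not prescribed above; mnist is the dataset object loaded earlier):
X, y_true, logits, y_pred = create_cnn_model()

# Loss, optimizer, and accuracy, mirroring the shallow network
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(2000):
        batch_X, batch_y = mnist.train.next_batch(50)
        sess.run(train_step, feed_dict={X: batch_X, y_true: batch_y})
    test_accuracy = sess.run(accuracy,
                             feed_dict={X: mnist.test.images, y_true: mnist.test.labels})
    print(f"Test accuracy: {test_accuracy:.4f}")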