TensorFlow Data Pipelines and Neural Network Implementation: From CSV to CNN
Data Loading and Neural Network Foundations
Approaches to Feeding Data into TensorFlow
There are three primary methods to supply data to a TensorFlow program:
- QueueRunner pipeline: a queue-based input pipeline reads data from files at the front of the graph.
- Feeding: Python code supplies data at each step via feed_dict at runtime.
- Preloading: All data is stored as constants or variables in the graph (suitable for small datasets).
File Reading Pipeline
The pipeline consists of three stages:
- Build a filename queue
- Read and decode
- Batch processing
Note: These operations require starting threads that manage the queue operations to ensure smooth enqueue and dequeue during reading.
1. Building the Filename Queue
# Construct a queue of filenames
filename_queue = tf.train.string_input_producer(
    string_tensor,  # 1-D tensor of filenames with paths
    shuffle=True    # randomize order
)
2. Reading and Decoding
Read file contents from the queue and decode them. Each reader typically extracts one sample at a time:
- Text files: one line per sample (tf.TextLineReader).
- Image files: one image per sample (tf.WholeFileReader).
- Binary files: fixed number of bytes per sample (tf.FixedLengthRecordReader).
- TFRecords: one Example protocol buffer per sample (tf.TFRecordReader).
The common read(file_queue) method returns a tuple (key, value) where key is the filename and value is the raw content (one sample).
Decoding converts raw bytes into tensors:
# Decode text (CSV); record_defaults sets each column's type
tf.decode_csv(records, record_defaults)
# Decode JPEG/PNG images
tf.image.decode_jpeg(contents) # -> uint8 tensor [height, width, channels]
tf.image.decode_png(contents) # -> uint8 tensor [height, width, channels]
# Decode raw binary bytes (used with FixedLengthRecordReader)
tf.decode_raw(bytes, out_type=tf.uint8)
Decoded image and raw binary data come out as tf.uint8 tensors. Use tf.cast() to convert to tf.float32 before numeric computation.
3. Batching
After decoding, a single sample is available. To get multiple samples, push them into a new queue for batching:
tf.train.batch(
    tensors,        # list of tensors to batch
    batch_size,     # number of samples per batch
    num_threads=1,  # number of enqueue threads
    capacity=32     # max number of elements in the queue
)
# For shuffled batching (min_after_dequeue sets the minimum buffer
# kept in the queue for shuffling):
tf.train.shuffle_batch(tensors, batch_size, capacity, min_after_dequeue)
Thread Management
The enqueue operations are driven by tf.train.QueueRunner objects. To start their threads and coordinate a clean shutdown, use:
coord = tf.train.Coordinator() # coordinator for threads
threads = tf.train.start_queue_runners(sess=session, coord=coord)
# Stop gracefully
coord.request_stop()
coord.join(threads)
Image Data Fundamentals
Images are represented as tensors with shape [height, width, channels]. Grayscale images have one channel (single value per pixel). Color images have three channels (RGB). For batches, the shape becomes [batch, height, width, channels].
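These shapes can be illustrated with a small NumPy sketch (the pixel values here are placeholder zeros, not real image data):

```python
import numpy as np

# Hypothetical pixel data illustrating the tensor shapes described above.
gray = np.zeros((28, 28, 1), dtype=np.uint8)   # grayscale: one channel
rgb = np.zeros((32, 32, 3), dtype=np.uint8)    # color: three channels (RGB)
batch = np.stack([rgb] * 4)                    # batch of 4 color images

print(gray.shape)   # (28, 28, 1)
print(rgb.shape)    # (32, 32, 3)
print(batch.shape)  # (4, 32, 32, 3)
```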
To standardize image sizes for modeling, use:
tf.image.resize_images(images, size) # size = [new_height, new_width]
Storage uses uint8 to save space; computation uses float32 for precision.
Example: Loading Dog Images
import tensorflow as tf
import os

class ImageLoader:
    def __init__(self):
        self.files = os.listdir('./dog')
        self.paths = [os.path.join('./dog/', f) for f in self.files]

    def load_images(self):
        # 1. Filename queue
        queue = tf.train.string_input_producer(self.paths)
        # 2. Read and decode
        reader = tf.WholeFileReader()
        _, raw = reader.read(queue)  # key ignored
        image = tf.image.decode_jpeg(raw)
        # Resize to uniform shape [200, 200, 3]
        resized = tf.image.resize_images(image, [200, 200])
        resized.set_shape([200, 200, 3])
        # 3. Batch
        batch = tf.train.batch([resized], batch_size=100, capacity=100)
        with tf.Session() as sess:
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)
            images = sess.run(batch)
            print('Batch shape:', images.shape)
            coord.request_stop()
            coord.join(threads)

if __name__ == '__main__':
    loader = ImageLoader()
    loader.load_images()
Binary Data: CIFAR-10 Example
The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 classes. Each binary file contains 10,000 samples, each composed of 1 byte (label) + 3072 bytes (pixels: 1024 red, then 1024 green, then 1024 blue, row‑major).
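The record layout can be made concrete with plain Python byte slicing; the record below is synthetic (label 7, all-zero pixels), not real dataset content:

```python
# One CIFAR-10 record: 1 label byte followed by 3072 pixel bytes
# (1024 red, then 1024 green, then 1024 blue, row-major).
LABEL_BYTES = 1
IMAGE_BYTES = 32 * 32 * 3  # 3072

record = bytes([7]) + bytes(IMAGE_BYTES)  # synthetic record, label = 7

label = record[0]
pixels = record[LABEL_BYTES:LABEL_BYTES + IMAGE_BYTES]
red = pixels[:1024]
green = pixels[1024:2048]
blue = pixels[2048:]

print(label, len(pixels), len(red))  # 7 3072 1024
```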
Pipeline Implementation
import tensorflow as tf
import os

class CifarDataset:
    def __init__(self):
        self.height = 32
        self.width = 32
        self.channels = 3
        self.image_bytes = self.height * self.width * self.channels
        self.label_bytes = 1
        self.record_bytes = self.label_bytes + self.image_bytes

    def read_binary(self, file_list):
        # 1. Filename queue
        queue = tf.train.string_input_producer(file_list)
        # 2. Read fixed-length records
        reader = tf.FixedLengthRecordReader(self.record_bytes)
        _, record = reader.read(queue)  # key ignored
        # Decode raw bytes
        decoded = tf.decode_raw(record, tf.uint8)
        # Split label and image
        label = tf.slice(decoded, [0], [self.label_bytes])
        pixels = tf.slice(decoded, [self.label_bytes], [self.image_bytes])
        # Reshape to [channels, height, width] then transpose to [h, w, c]
        img_reshaped = tf.reshape(pixels, [self.channels, self.height, self.width])
        img = tf.transpose(img_reshaped, [1, 2, 0])
        # Convert to float
        img_float = tf.cast(img, tf.float32)
        # 3. Batch
        label_batch, image_batch = tf.train.batch(
            [label, img_float], batch_size=100, capacity=100)
        with tf.Session() as sess:
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)
            lbl, imgs = sess.run([label_batch, image_batch])
            print('Labels shape:', lbl.shape)
            print('Images shape:', imgs.shape)
            coord.request_stop()
            coord.join(threads)

if __name__ == '__main__':
    data_dir = './cifar-10-batches-bin'
    files = [os.path.join(data_dir, f)
             for f in os.listdir(data_dir) if f.endswith('.bin')]
    cifar = CifarDataset()
    cifar.read_binary(files)
TFRecords Format
TFRecords is a binary format that stores data as tf.train.Example protocol buffers. Each record keeps a sample's image and label together, so no separate label files are needed, and the serialized format is compact and efficient to read.
Writing TFRecords
# Example of writing CIFAR-10 data to TFRecords
with tf.python_io.TFRecordWriter('cifar10.tfrecords') as writer:
    for i in range(100):
        image_bytes = image_batch[i].tostring()
        label_val = int(label_batch[i][0])
        example = tf.train.Example(features=tf.train.Features(feature={
            'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label_val])),
            'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes]))
        }))
        writer.write(example.SerializeToString())
Reading TFRecords
def read_from_tfrecord(self):
    # 1. Filename queue
    queue = tf.train.string_input_producer(['cifar10.tfrecords'])
    # 2. Read and parse Example
    reader = tf.TFRecordReader()
    _, serialized = reader.read(queue)
    features = tf.parse_single_example(serialized, features={
        'label': tf.FixedLenFeature([], tf.int64),
        'image': tf.FixedLenFeature([], tf.string)
    })
    image = tf.decode_raw(features['image'], tf.uint8)
    image = tf.reshape(image, [self.height, self.width, self.channels])
    # 3. Batch
    label_batch, image_batch = tf.train.batch(
        [features['label'], image], batch_size=100, capacity=100)
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        lbl, imgs = sess.run([label_batch, image_batch])
        coord.request_stop()
        coord.join(threads)
Neural Network Basics
An artificial neural network (ANN) mimics biological neural structures. It typically consists of an input layer, one or more hidden layers, and an output layer. Each connection carries a weight, and every neuron outside the input layer applies an activation function to its weighted input. The output layer is usually a fully connected layer.
Perceptron (PLA)
The perceptron is the simplest neural unit: it computes a weighted sum of its inputs plus a bias and passes the result through a step (sign) function. It can solve only linearly separable problems.
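A minimal forward pass in plain Python makes this concrete; the weights below are hand-picked (not learned) to realize logical AND, a linearly separable function:

```python
# Perceptron forward pass: weighted sum plus bias through a sign step.
def perceptron(x, w, b):
    s = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if s >= 0 else -1

w, b = [1.0, 1.0], -1.5  # fires only when both inputs are 1
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, b))  # only (1, 1) yields +1
```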
Softmax Regression and Cross-Entropy Loss
For multi‑class classification, the network often uses a softmax output layer to convert logits into probabilities. The loss is the cross‑entropy between the true labels (one‑hot) and the predicted probabilities:
# logits are the raw, unnormalized scores; the op applies softmax internally
loss_per_sample = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits)
mean_loss = tf.reduce_mean(loss_per_sample)
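As a sanity check on the formula, the same quantity can be computed by hand for one sample using only the math module (the logits and label here are made-up values):

```python
import math

# Softmax cross-entropy for a single 3-class sample.
logits = [2.0, 1.0, 0.1]
y_true = [1.0, 0.0, 0.0]  # one-hot: true class is 0

exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]          # softmax
loss = -sum(t * math.log(p) for t, p in zip(y_true, probs))
print(round(loss, 4))  # 0.417
```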
MNIST Handwritten Digit Recognition
The MNIST dataset contains 28×28 grayscale images of digits 0‑9 (60,000 training, 10,000 test). Images are flattened into 784‑dimensional vectors. Labels arrive as integers 0‑9 and are typically one‑hot encoded for training.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print('Train shape:', x_train.shape, y_train.shape) # (60000, 28, 28) (60000,)
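Since the labels come back as integers, a softmax model needs them expanded to 10-dimensional one-hot vectors; a minimal sketch:

```python
# Expand an integer class label into a one-hot vector of length `depth`.
def one_hot(label, depth=10):
    vec = [0.0] * depth
    vec[label] = 1.0
    return vec

print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```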
A simple linear model (softmax regression) can achieve about 92% accuracy. However, linear models cannot handle non‑linear patterns without feature engineering.
Introduction to Convolutional Neural Networks
Convolutional Neural Networks (CNNs) extend traditional MLPs by adding convolutional and pooling layers before the fully connected layers. This enables effective feature extraction from grid‑like data (e.g., images).
Why CNNs?
Traditional fully connected networks ignore spatial structure and have too many parameters for large images. CNNs use local connectivity, weight sharing, and pooling to reduce parameters and capture hierarchical features.
Convolutional Layer
A convolutional layer applies multiple learnable filters (kernels) to the input. Each filter slides across the input with a given stride and optionally zero‑padding to produce a feature map. The output size is determined by:
output_size = (input_size - filter_size + 2 * padding) / stride + 1
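The formula above can be wrapped in a small helper to check common cases; the filter sizes here are illustrative:

```python
# Spatial output size of a convolution (integer division: partial
# windows at the border are dropped).
def conv_output_size(input_size, filter_size, padding, stride):
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(28, 5, 2, 1))  # 28: padding 2 preserves the size
print(conv_output_size(32, 5, 0, 1))  # 28: no padding shrinks 32 -> 28
```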
Filters at early layers detect edges and corners; deeper layers combine them into higher‑level concepts.
Pooling Layer
Pooling (e.g., max pooling or average pooling) reduces the spatial dimensions, lowering computational load and providing translation invariance. Common pooling size is 2×2 with stride 2.
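A 2×2 max pool with stride 2 written out directly shows how each spatial dimension is halved (the feature map values are arbitrary):

```python
# 2x2 max pooling with stride 2 on a small feature map.
def max_pool_2x2(fmap):
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 3, 2],
        [2, 4, 1, 0]]
print(max_pool_2x2(fmap))  # [[4, 5], [4, 3]]
```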
Typical CNN Architecture
Input → [Conv + ReLU] → Pooling → [Conv + ReLU] → Pooling → Flatten → Fully Connected → Softmax
This structure has been the foundation for breakthroughs in image classification (e.g., AlexNet, VGG, ResNet).