Integrating TensorBoard for Model Training Visualization
To utilize TensorBoard for visualizing training metrics, the SummaryWriter class from torch.utils.tensorboard is employed. This class writes event files to a specified directory, which TensorBoard reads to generate visualizations.
from torch.utils.tensorboard import SummaryWriter
log_writer = SummaryWriter(log_dir="training_logs")
Logging Scalar Metrics
Use the add_scalar method to record single-valued metrics over training steps.
Parameters:
tag: Identifier for the metric (e.g., 'Loss/train').
scalar_value: The value of the metric.
global_step: The training step or epoch number.
Example:
for epoch in range(100):
    loss = compute_loss()  # placeholder for your actual loss computation
    log_writer.add_scalar(tag="Training/Loss", scalar_value=loss, global_step=epoch)
Visualizing Images
The add_image method allows logging images for inspection.
Parameters:
tag: Name for the image.
img_tensor: Image data as a NumPy array or PyTorch tensor.
global_step: Step associated with the image.
dataformats: Format of the input data (e.g., 'CHW', 'HWC').
Example:
import numpy as np
from PIL import Image
from torch.utils.tensorboard import SummaryWriter

log_writer = SummaryWriter(log_dir="visualization_data")
sample_img_path = "data/images/sample_ant.jpg"
pil_image = Image.open(sample_img_path)
np_image = np.asarray(pil_image)
print(f"Image dimensions: {np_image.shape}")
# If the image is HWC format (Height, Width, Channels)
log_writer.add_image(tag="Sample_Ant", img_tensor=np_image, global_step=0, dataformats='HWC')
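Beyond layout, add_image also interprets pixel values by dtype: uint8 arrays are expected in [0, 255], while float arrays are expected in [0, 1]. A minimal sketch of normalizing a uint8 image before logging (the array here is synthetic stand-in data, not the ant photo above):

```python
import numpy as np

# Synthetic stand-in for a loaded photo: a uint8 HWC image in [0, 255]
uint8_image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Convert to float32 in [0, 1], the range add_image expects for float input
float_image = uint8_image.astype(np.float32) / 255.0

print(float_image.dtype)  # float32
```

Either representation can be passed to add_image; mixing them (e.g., float data still scaled to [0, 255]) typically renders as a washed-out image in TensorBoard.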
Launching TensorBoard
After running the script, start the TensorBoard server to view the logs.
In a terminal, execute:
tensorboard --logdir=training_logs
To avoid port conflicts when multiple instances are running:
tensorboard --logdir=training_logs --port=6007
Ensure image data is in the correct format. Common formats are 'CHW' (Channels, Height, Width) and 'HWC' (Height, Width, Channels). add_image assumes 'CHW' by default, so pass the dataformats argument whenever your data is laid out differently; otherwise the channels are misinterpreted and the rendered image is garbled.
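Converting between the two layouts is a single axis reordering. A small NumPy sketch (the shapes here are illustrative):

```python
import numpy as np

# An HWC image: 32 rows, 48 columns, 3 channels
hwc = np.zeros((32, 48, 3), dtype=np.uint8)

# Move the channel axis to the front for APIs that expect CHW
chw = np.transpose(hwc, (2, 0, 1))
print(chw.shape)  # (3, 32, 48)

# And back again
restored = np.transpose(chw, (1, 2, 0))
print(restored.shape)  # (32, 48, 3)
```

For PyTorch tensors, tensor.permute(2, 0, 1) performs the same reordering.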