
Fashion MNIST Dataset with PyTorch: A Step-by-Step Tutorial

  • Aug 30, 2024
  • 6 min read

Updated: Mar 10

Building a neural network for the Fashion MNIST dataset in PyTorch shows how the core pieces of deep learning fit together. From loading and preprocessing data to defining the model, training it, and evaluating its performance, each step reveals how neural networks learn from data.


Understanding this workflow creates a strong foundation for exploring more advanced techniques such as convolutional neural networks (CNNs), transfer learning, and larger deep learning architectures.


Fashion MNIST Dataset with PyTorch

Introduction to Fashion MNIST in PyTorch

The Fashion MNIST dataset is a widely used benchmark for image classification tasks in deep learning. It was introduced as a more challenging replacement for the original MNIST handwritten digit dataset and contains 70,000 grayscale images representing 10 categories of clothing items. Each image is 28×28 pixels, making the dataset lightweight and ideal for experimenting with neural network models.


Fashion MNIST is divided into 60,000 training images and 10,000 test images, allowing developers to train models and evaluate their performance on unseen data. The dataset includes categories such as T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.


Because the images are small and the dataset is well structured, Fashion MNIST is commonly used for learning the fundamentals of image classification with deep learning frameworks like PyTorch. It allows beginners to focus on understanding neural networks, training workflows, and evaluation techniques without the heavy computational requirements of larger image datasets.


Implementing Fashion MNIST Image Classification in PyTorch

After understanding the dataset and the core concepts behind neural networks, the next step is to implement a working model using PyTorch. This involves preparing the dataset, defining the neural network architecture, training the model, and evaluating its performance on unseen data.

In the following sections, we will build a simple neural network to classify images from the Fashion MNIST dataset. The implementation will demonstrate how PyTorch handles data loading, model definition, forward propagation, loss calculation, and parameter updates during training. By the end of this implementation, you will have a fully functioning image classification model trained on Fashion MNIST.


1. Setting Up the Environment and Loading the Fashion MNIST Dataset

Before building the neural network, the Python environment must include PyTorch and the required supporting libraries. The easiest way to install the core packages is through pip.

pip install torch torchvision

The torch library provides the deep learning framework, while torchvision includes utilities for computer vision tasks such as datasets, transformations, and pretrained models.

Once the environment is ready, the next step is loading the Fashion MNIST dataset. PyTorch makes this process straightforward through the torchvision.datasets module. During loading, basic preprocessing steps are applied to convert images into tensors and normalize their pixel values.

import torch
from torchvision import datasets, transforms

# Define transformations for preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))])

# Load the Fashion MNIST dataset
trainset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
testset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders for batching
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

In this step, transforms.ToTensor() converts the images into PyTorch tensors, while transforms.Normalize() scales the pixel values to improve training stability. The DataLoader then organizes the dataset into batches, allowing the neural network to process multiple images at once during training.
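As a quick sanity check, the normalization above can be verified on a synthetic tensor (a random stand-in for a real Fashion MNIST image): Normalize((0.5,), (0.5,)) computes (x - 0.5) / 0.5, which maps pixel values from [0, 1] to [-1, 1].

```python
import torch

# Normalize((0.5,), (0.5,)) computes (x - 0.5) / 0.5, mapping [0, 1] to [-1, 1].
# A random tensor stands in for a real Fashion MNIST image here.
fake_image = torch.rand(1, 28, 28)       # grayscale 28x28, values in [0, 1]
normalized = (fake_image - 0.5) / 0.5    # same arithmetic as transforms.Normalize
print(normalized.min().item() >= -1.0)   # True
print(normalized.max().item() <= 1.0)    # True
```

Centering the inputs around zero in this way tends to keep gradients better behaved during the early stages of training.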

When the dataset is downloaded for the first time, PyTorch retrieves the archive files and extracts them into the ./data directory, printing download and extraction progress to the terminal. With the data loaders in place, the next step is defining the neural network architecture that will classify the Fashion MNIST images.


2. Building the Neural Network in PyTorch

With the dataset prepared, the next step is defining the neural network architecture that will classify the Fashion MNIST images. For this task, a simple feedforward neural network is sufficient. The model consists of fully connected layers that gradually transform the flattened image input into class predictions.

Each image in the dataset is 28 × 28 pixels, so it must first be flattened into a vector of 784 features before passing through the network. The architecture below uses two hidden layers with ReLU activation functions, followed by an output layer that produces logits for the ten clothing categories.

import torch.nn as nn
import torch.nn.functional as F

class FashionMNISTClassifier(nn.Module):
    def __init__(self):
        super(FashionMNISTClassifier, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)
    def forward(self, x):
        x = x.view(-1, 28 * 28)  # Flatten the image
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = FashionMNISTClassifier()

print(model)

The printed output below shows the structure of the neural network, including the dimensions of each fully connected layer:

FashionMNISTClassifier(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=10, bias=True))

This confirms that the model contains three linear layers that progressively reduce the feature dimensions before producing predictions for the 10 Fashion MNIST classes.
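The number of trainable parameters can also be verified by hand: each Linear layer holds in_features × out_features weights plus out_features biases. This matches what sum(p.numel() for p in model.parameters()) would report for the architecture above.

```python
# Parameters per layer: weights (in * out) plus biases (out)
fc1 = 28 * 28 * 512 + 512   # 401,920
fc2 = 512 * 256 + 256       # 131,328
fc3 = 256 * 10 + 10         # 2,570
total = fc1 + fc2 + fc3
print(total)  # 535,818
```

At roughly half a million parameters, the model is small by modern standards, which is part of why Fashion MNIST trains quickly on a CPU.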


3. Training the Model

Once the neural network architecture is defined, the next step is training the model. Training involves calculating prediction errors, computing gradients through backpropagation, and updating the model parameters so that the network gradually improves its predictions.

For this task, CrossEntropyLoss is used as the loss function since the model is solving a multi-class classification problem with ten output classes. The Adam optimizer is chosen because it adapts the learning rate during training and generally performs well for neural network models.

The following code defines the loss function, initializes the optimizer, and implements a training loop that processes the dataset over multiple epochs.

import torch.optim as optim

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
def train_model(trainloader, model, criterion, optimizer, num_epochs=5):
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, labels in trainloader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {running_loss / len(trainloader)}")

train_model(trainloader, model, criterion, optimizer)

During each epoch, the model processes batches of images from the training dataset. For every batch, the network generates predictions through a forward pass, computes the loss, and then updates its parameters using gradient descent.
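To make the per-batch mechanics concrete, here is a minimal sketch of one such update step. A single linear layer and random tensors stand in for the full model and dataset; the sequence of calls is the same as inside the training loop above.

```python
import torch
import torch.nn as nn

# A single linear layer and random data stand in for the full model and dataset.
layer = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

images = torch.randn(4, 784)             # a tiny batch of 4 flattened "images"
labels = torch.randint(0, 10, (4,))      # random class labels

optimizer.zero_grad()                    # clear gradients from the previous step
loss = criterion(layer(images), labels)  # forward pass and loss computation
loss.backward()                          # backpropagation computes gradients
optimizer.step()                         # the optimizer updates the weights
print(loss.item() > 0)                   # True: cross-entropy loss is positive here
```

Calling optimizer.zero_grad() first matters: PyTorch accumulates gradients by default, so skipping it would mix gradients from different batches.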

The following output shows the actual training loss recorded during execution:

Epoch 1, Loss: 0.4830017135913438
Epoch 2, Loss: 0.3657274188231558
Epoch 3, Loss: 0.3273437559318695
Epoch 4, Loss: 0.3029962965706264
Epoch 5, Loss: 0.278847184866222

The decreasing loss across epochs indicates that the neural network is learning meaningful patterns from the Fashion MNIST images and improving its classification performance during training.


4. Evaluating the Model

Once training is complete, the next step is evaluating how well the neural network performs on unseen data. This is done using the test dataset, which the model has not encountered during training. The goal of evaluation is to measure how effectively the model generalizes to new inputs.

During evaluation, gradient calculations are disabled using torch.no_grad(). Since the model is no longer learning at this stage, disabling gradients makes the process faster and reduces memory usage.

The following function runs the trained model on the test dataset and calculates classification accuracy.

def evaluate_model(testloader, model):
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in testloader:
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f"Accuracy: {100 * correct / total}%")

evaluate_model(testloader, model)

The model processes each batch of test images, predicts the most likely class, and compares the predictions with the true labels to determine how many were classified correctly.
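The role of torch.max here can be illustrated with made-up logits: along dim=1 it returns both the maximum values and their indices, and the indices serve as the predicted class ids.

```python
import torch

# Illustrative logits for two samples across three classes
logits = torch.tensor([[0.1, 2.5, -1.0],
                       [1.2, 0.3,  0.9]])

# torch.max along dim=1 returns (max values, argmax indices)
values, predicted = torch.max(logits, 1)
print(predicted.tolist())  # [1, 0]
```

Only the indices are needed for accuracy, which is why the evaluation code discards the values with the underscore convention.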

The following output was produced after running the evaluation:

Accuracy: 87.28%

An accuracy of 87.28% indicates that the neural network correctly classified most images in the test dataset. For a simple fully connected network trained on Fashion MNIST, this level of performance is generally considered a solid baseline and demonstrates that the model has learned meaningful patterns from the training data.


5. Visualizing the Results

Accuracy scores provide a numerical view of model performance, but visualizing predictions can offer deeper insight into how the neural network is classifying images. By displaying sample images from the test dataset along with their predicted and true labels, it becomes easier to observe where the model performs well and where it might make mistakes.

The following code selects a batch of images from the test loader, runs them through the trained model, and displays a few predictions using Matplotlib.

import matplotlib.pyplot as plt

# Class names in Fashion MNIST
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Function to display an image with prediction
def imshow(img, pred_label, true_label):
    img = img / 2 + 0.5  # Undo the normalization for display
    plt.imshow(img.numpy().squeeze(), cmap='gray')
    plt.title(f'Predicted: {classes[pred_label]}\nTrue: {classes[true_label]}')
    plt.axis('off')
    plt.show()

# Get a batch of test images
dataiter = iter(testloader)
images, labels = next(dataiter)

# Run the model on the images
outputs = model(images)
_, predicted = torch.max(outputs, 1)

# Display a few predictions
for i in range(2):
    imshow(images[i], predicted[i], labels[i])

In this example, images from the test dataset are passed through the trained model to generate predictions. Each displayed image shows both the predicted class and the true label, making it easier to see how accurately the model interprets different clothing items.

Visualizing predictions is particularly useful for identifying misclassifications, which can reveal patterns that might require improvements in model architecture, training data, or hyperparameters.
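As a small sketch of how misclassified samples could be located for closer inspection, the comparison below uses synthetic stand-ins for the predicted and labels tensors produced in the evaluation loop:

```python
import torch

# Synthetic stand-ins for one batch of predictions and true labels
predicted = torch.tensor([0, 3, 3, 7, 1])
labels = torch.tensor([0, 3, 5, 7, 2])

# Indices where the prediction disagrees with the true label
wrong = (predicted != labels).nonzero(as_tuple=True)[0]
print(wrong.tolist())  # [2, 4]
```

The resulting indices can be fed into the same imshow routine to display only the images the model got wrong.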



Conclusion

The process of building a neural network to classify images from the Fashion MNIST dataset demonstrates the foundational steps of deep learning and image classification with PyTorch. Starting with data preparation and loading, we've seen how important it is to properly transform and normalize the dataset to ensure effective model training. The simple feedforward neural network used in this tutorial provides a basic yet powerful introduction to image classification, highlighting how layers, activation functions, and loss calculation contribute to the learning process.


Get in touch for customized mentorship, research and freelance solutions tailored to your needs.
