
Implementing Neural Networks from Scratch using PyTorch in Python

  • Writer: Samul Black
  • Oct 20
  • 9 min read

Deep learning frameworks like PyTorch have made building neural networks faster and more intuitive—but real understanding comes when you implement things yourself. In this blog, you’ll go beyond using high-level abstractions and learn how neural networks actually work under the hood. We’ll implement a fully connected neural network from scratch using PyTorch fundamentals such as tensors, parameters, and the autograd engine. By the end, you’ll not only train a model on real data but also gain a solid grasp of what happens during forward propagation, loss computation, backpropagation, and parameter updates.


Introduction to Building Neural Networks in PyTorch

PyTorch has quickly become a go-to deep learning framework for both beginners and researchers due to its intuitive design, Pythonic syntax, and dynamic computation graph. Unlike traditional machine learning workflows, PyTorch allows developers to think in terms of tensors, gradients, and forward-backward propagation—making it ideal for building and experimenting with neural networks from the ground up.

Neural networks consist of interconnected layers of mathematical operations. When building them using PyTorch, we work step by step: defining parameters, performing forward passes, computing errors, and using backpropagation to update weights. This hands-on approach helps you gain a solid understanding of how deep learning truly works under the hood before relying on high-level abstractions.


Why Choose PyTorch for Neural Network Development

There are several deep learning frameworks available today, but PyTorch stands out because:


  1. It feels like Python: The syntax is clean, readable, and easy for Python developers to grasp.

  2. Dynamic computation graph: You can define and modify the neural network structure on the fly, which is great for debugging and experimenting.

  3. Autograd support: PyTorch automatically calculates gradients during backpropagation, reducing manual complexity.

  4. Large ecosystem and community: With libraries like TorchVision, TorchText, and PyTorch Lightning, building end-to-end AI pipelines becomes faster.

  5. Widely used in AI research: Many state-of-the-art deep learning models are prototyped and published using PyTorch.


In short, PyTorch gives you both control and flexibility—ideal for learning and innovation.


What “From Scratch” Really Means in PyTorch

When we say “implementing neural networks from scratch,” we don’t mean manually coding matrix derivatives or writing our own optimizer logic from zero. Instead, it means:


✅ Defining your own layers using low-level features such as torch.nn.Parameter

✅ Manually writing the forward pass

✅ Controlling the training loop yourself (forward → loss → backward → update)

✅ Avoiding prebuilt high-level wrappers as much as possible


This approach helps you understand what happens behind conveniences like nn.Linear() (or a one-line model.fit() in other frameworks), so you gain clarity on how neural networks truly function.


Prerequisites and Environment Setup

Before diving into implementation, you need a properly configured Python environment with PyTorch and supporting libraries installed. Setting up your environment correctly ensures that your code runs smoothly, with optional support for GPU acceleration if available.


Installing PyTorch in Python (CPU/GPU Support)

You can install PyTorch using pip. If you're working on CPU:

pip install torch torchvision

If you have an NVIDIA GPU and CUDA installed, visit the official PyTorch website (pytorch.org) and copy the installation command that matches your CUDA version. For example:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

To verify installation:

import torch
print(torch.__version__)
print("CUDA Available:", torch.cuda.is_available())

Output:
2.8.0+cu126
CUDA Available: True

Required Python Libraries and Version

You should have the following installed:

  • Python 3.8+ – required for modern PyTorch builds

  • torch – core deep learning library

  • torchvision – for datasets like MNIST/CIFAR

  • numpy – helpful for numerical operations

  • matplotlib (optional) – to visualize results

To install the core and supporting libraries in one command:

pip install torch torchvision numpy matplotlib

Once these are set up, you’re ready to start coding your neural network from scratch.


Core Concepts Before Implementation

Before implementing a neural network from scratch in PyTorch, developing a deep understanding of its fundamental components is essential. These core concepts form the backbone of every operation that takes place inside the model—from data movement to gradient updates.


PyTorch Tensors and Operations

Tensors are the fundamental data structure in PyTorch. They are multi-dimensional arrays used to store inputs, outputs, model parameters, and intermediate values computed during training. Think of tensors as a generalization of vectors and matrices to an arbitrary number of dimensions.

Key characteristics:


  • Tensors can represent vectors, matrices, images, or even batches of data.

  • They support mathematical operations like addition, multiplication, dot products, and reshaping.

  • Tensors can move between CPU and GPU for faster computation, enabling efficient deep learning training.


Tensors act as the carriers of data throughout the model, passing through various layers and transformations during forward and backward propagation.
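
A minimal sketch of these ideas in code (the values here are arbitrary):

import torch

# Create a 2x3 tensor (a small matrix) and a vector
a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
b = torch.ones(3)

# Element-wise and matrix operations
print(a + 1)          # broadcasting: add 1 to every element
print(a @ b)          # matrix-vector product -> tensor([ 6., 15.])
print(a.view(3, 2))   # reshape without copying data

# Move to GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
a = a.to(device)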


How PyTorch’s Autograd Engine Powers Backpropagation

Training a neural network involves adjusting weights based on how far the model’s predictions deviate from actual target values. This adjustment process relies on backpropagation, which requires calculating gradients (partial derivatives) of the loss with respect to each parameter.

PyTorch’s Autograd engine automates this process:


  • During the forward pass, PyTorch dynamically builds a computational graph tracking all tensor operations.

  • During the backward pass, it calculates gradients by traversing this graph in reverse.

  • These gradients are then used to update the network’s parameters using optimization algorithms.


This dynamic automatic differentiation allows PyTorch to flexibly support complex model architectures and experimental workflows.
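
As a small illustration (with arbitrary values), autograd tracks operations on tensors created with requires_grad=True and fills in .grad when backward() is called:

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)

# Forward pass: y = sum(x^2); the computational graph is built as the operations run
y = (x ** 2).sum()

# Backward pass: traverse the graph in reverse to get dy/dx = 2x
y.backward()
print(x.grad)   # tensor([4., 6.])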


What Are Weights, Biases, and Activation Functions?

A neural network is composed of layers, each containing parameters that are learned during training. These parameters include weights and biases.


  • Weights determine the strength of the connection between neurons. They act as coefficients that scale the input features.

  • Biases allow each neuron to shift the output independently of the input, improving the network’s ability to learn patterns that don't pass through the origin.


However, without introducing non-linearity, the network would only be able to learn linear relationships. This is where activation functions come into play.

Activation functions are applied after linear transformations to introduce non-linearity into the model’s decision-making process. Common examples include:


  • ReLU (Rectified Linear Unit) – widely used for hidden layers due to its simplicity and effectiveness.

  • Sigmoid – used for outputs in binary classification tasks.

  • Softmax – transforms outputs into probability distributions in multi-class classification.


Together, weights and biases form the trainable structure of the network, while activation functions enable it to model complex, non-linear relationships in data.
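
The sketch below (with made-up numbers) shows a single linear transformation y = Wx + b followed by a ReLU activation:

import torch

x = torch.tensor([1.0, -2.0])              # input features
W = torch.tensor([[ 0.5, -1.0],
                  [-2.0,  0.3]])           # weights: 2 neurons, 2 inputs
b = torch.tensor([0.1, -0.2])              # biases, one per neuron

z = W @ x + b                              # linear transformation
a = torch.relu(z)                          # non-linearity: negatives become 0
print(z)   # tensor([ 2.6000, -2.8000])
print(a)   # tensor([2.6000, 0.0000])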


Dataset Preparation for Neural Network Training

Before training a neural network, it's essential to structure and preprocess the dataset properly. This involves importing the required modules, applying transformations, loading the dataset, and organizing it into batches using DataLoaders.


Importing Required Libraries

To begin, PyTorch and torchvision libraries are imported to access datasets, transformations, and DataLoader utilities.

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

Here, torch provides the core deep learning framework, while datasets and transforms help in loading and preprocessing data. DataLoader is used to batch and shuffle the dataset efficiently.


Applying Transformations (Normalization + Tensor Conversion)

Before feeding images into a neural network, they must be converted to tensors and normalized to a consistent scale for stable learning.

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

This pipeline converts pixel values into tensors and normalizes them with a mean of 0.5 and standard deviation of 0.5, which helps the network train more effectively.
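
As a quick, optional check, you can apply the pipeline defined above to a dummy image; with mean 0.5 and std 0.5, pixel values originally scaled to [0, 1] end up in [-1, 1]:

import numpy as np

# Dummy grayscale "image": 28x28 uint8 pixels covering the full 0-255 range
dummy = np.zeros((28, 28, 1), dtype=np.uint8)
dummy[0, 0, 0] = 255

out = transform(dummy)           # uses the pipeline defined above
print(out.shape)                 # torch.Size([1, 28, 28])
print(out.min(), out.max())      # tensor(-1.) tensor(1.)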


Loading the FashionMNIST Dataset

Once the transformation pipeline is defined, the FashionMNIST training and testing datasets are loaded using torchvision.datasets.

train_dataset = datasets.FashionMNIST(
    root='./data',
    train=True,
    transform=transform,
    download=True
)

test_dataset = datasets.FashionMNIST(
    root='./data',
    train=False,
    transform=transform,
    download=True
)

Here, train=True loads the training split and train=False loads the testing split. The dataset is downloaded automatically if it's not already present.
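
A quick, optional sanity check on the loaded data (run after the code above):

print(len(train_dataset), len(test_dataset))   # 60000 10000

image, label = train_dataset[0]
print(image.shape)   # torch.Size([1, 28, 28]) - one grayscale channel, 28x28 pixels
print(label)         # integer class index between 0 and 9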


Creating DataLoaders for Training and Testing

To enable efficient iteration over the dataset in batches, DataLoaders are created for both training and testing datasets.

train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True
)

test_loader = DataLoader(
    test_dataset,
    batch_size=64,
    shuffle=False
)

The batch_size defines how many samples are processed at once. Shuffling is enabled during training for randomness, while testing keeps data order consistent.
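
Iterating over a DataLoader yields (images, labels) batches; an optional way to confirm the shapes:

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 1, 28, 28]) - batch of 64 images
print(labels.shape)   # torch.Size([64]) - one class index per image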


Implementing a Neural Network from Scratch in PyTorch

Implementing a neural network from scratch helps in understanding how layers, weights, and forward propagation function internally. In PyTorch, this can be done by manually defining parameters, building models step-by-step, and then refining them using higher-level modules like nn.Module.


Creating a Custom Linear Layer with nn.Parameter

A custom linear layer demonstrates how PyTorch stores and updates weights and biases manually using nn.Parameter, which registers tensors as learnable parameters.

import torch
import torch.nn as nn

class CustomLinear(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(CustomLinear, self).__init__()
        # nn.Parameter registers these tensors as learnable parameters of the module.
        # (In practice a scaled initialization such as Kaiming is preferred over plain randn.)
        self.weights = nn.Parameter(torch.randn(output_dim, input_dim))
        self.bias = nn.Parameter(torch.randn(output_dim))

    def forward(self, x):
        # y = x @ W.T + b, the same affine transformation nn.Linear performs
        return torch.matmul(x, self.weights.T) + self.bias

This manually defines a linear transformation where weights and bias are learnable parameters. The forward method performs matrix multiplication similar to a fully connected layer.
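
A quick way to check that the layer behaves like a fully connected layer is to pass a random batch through it (the shapes below are illustrative):

layer = CustomLinear(784, 128)
x = torch.randn(32, 784)        # batch of 32 flattened 28x28 images
out = layer(x)
print(out.shape)                # torch.Size([32, 128])
print(sum(p.numel() for p in layer.parameters()))  # 784*128 + 128 = 100480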


Building a Simple MLP Model Using Custom Layers

Once a custom linear layer is available, it can be used to build a simple Multi-Layer Perceptron (MLP) with activation functions and stacked layers.

class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.layer1 = CustomLinear(784, 128)
        self.layer2 = CustomLinear(128, 64)
        self.layer3 = CustomLinear(64, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)       # Flatten image input
        x = torch.relu(self.layer1(x))  # First hidden layer
        x = torch.relu(self.layer2(x))  # Second hidden layer
        x = self.layer3(x)              # Output layer (logits)
        return x

Here, multiple custom linear layers are combined with ReLU activations to form a basic feedforward neural network suitable for classification.
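
Before training, an optional dummy forward pass confirms the architecture produces one logit per class:

model = SimpleMLP()
dummy = torch.randn(8, 1, 28, 28)   # fake batch of 8 FashionMNIST-sized images
logits = model(dummy)
print(logits.shape)                  # torch.Size([8, 10]) - 10 class logits per image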


Alternative: Using Built-in nn Modules for Cleaner Implementations

Instead of manually creating each parameterized layer, PyTorch provides high-level modules like nn.Linear for cleaner and more efficient model creation.

class MLPWithModules(nn.Module):
    def __init__(self):
        super(MLPWithModules, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.model(x)

Using nn.Sequential with predefined modules like nn.Linear and nn.ReLU leads to a more concise, maintainable, and production-ready implementation.
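
Both versions define the same architecture; comparing parameter counts is one optional way to confirm they are equivalent:

manual = SimpleMLP()
modular = MLPWithModules()

def count_params(m):
    return sum(p.numel() for p in m.parameters())

print(count_params(manual), count_params(modular))   # both print 109386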


Writing the Manual Training Loop (Forward, Loss, Backward, Update)

A manual training loop provides full control over how a neural network learns. It includes performing forward propagation, calculating loss, computing gradients via backpropagation, and updating model parameters.


Model Instantiation (Using Previously Defined Model)

Before defining the training components, the model must be instantiated using one of the previously created architectures; here we use the manually built SimpleMLP (MLPWithModules would work the same way).

model = SimpleMLP()  

Here, the model is created based on the neural network structure defined earlier.


Defining Loss Function and Optimizer

A loss function evaluates prediction errors, while the optimizer updates the model weights based on computed gradients.

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

CrossEntropyLoss is ideal for multi-class classification. The SGD optimizer updates model parameters with a learning rate of 0.01.
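
CrossEntropyLoss expects raw logits and integer class labels; internally it applies log-softmax followed by negative log-likelihood. A tiny illustration using the criterion defined above on made-up values:

logits = torch.tensor([[2.0, 0.5, -1.0]])   # raw scores for 3 classes, batch of 1
target = torch.tensor([0])                   # true class index

print(criterion(logits, target))             # ~0.24, low because class 0 already has the highest score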


Forward Pass and Loss Computation Explained

During training, each batch goes through the model to produce outputs, which are then compared with actual labels via the loss function.

for images, labels in train_loader:
    outputs = model(images)           # Forward pass
    loss = criterion(outputs, labels) # Loss calculation

The model predicts class probabilities, and the loss measures the difference from true labels.


Backpropagation and Parameter Update Using loss.backward()

These lines continue inside the same batch loop: after calculating the loss, gradients are computed and used to update the weights.

    optimizer.zero_grad()    # Reset previous gradients
    loss.backward()          # Compute gradients via backprop
    optimizer.step()         # Update weights

Gradients are cleared, recalculated, and applied to adjust parameters toward minimizing loss.
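
Since the goal is to see what happens under the hood, it helps to know that optimizer.step() for plain SGD is equivalent to a manual update like the sketch below (an illustrative alternative, not part of the training loop above):

learning_rate = 0.01
with torch.no_grad():                         # don't track the update itself in the graph
    for param in model.parameters():
        param -= learning_rate * param.grad   # gradient descent step
        param.grad.zero_()                    # clear gradients for the next batch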


Full Training Loop Code with Accuracy Tracking

The full training process is executed across multiple epochs with accuracy tracking to monitor performance.

num_epochs = 15

for epoch in range(num_epochs):
    total, correct = 0, 0

    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Accuracy: {accuracy:.2f}%")

Output:
Epoch [1/15], Loss: 2.0564, Accuracy: 24.36%
Epoch [2/15], Loss: 2.0657, Accuracy: 26.59%
Epoch [3/15], Loss: 1.8173, Accuracy: 34.73%
Epoch [4/15], Loss: 1.5521, Accuracy: 39.90%
Epoch [5/15], Loss: 1.6673, Accuracy: 41.47%
Epoch [6/15], Loss: 1.3872, Accuracy: 43.17%
Epoch [7/15], Loss: 1.5100, Accuracy: 45.72%
Epoch [8/15], Loss: 1.3530, Accuracy: 46.95%
Epoch [9/15], Loss: 1.2225, Accuracy: 48.08%
Epoch [10/15], Loss: 1.4070, Accuracy: 49.32%
Epoch [11/15], Loss: 1.5774, Accuracy: 50.11%
Epoch [12/15], Loss: 1.8325, Accuracy: 50.99%
Epoch [13/15], Loss: 1.9283, Accuracy: 52.23%
Epoch [14/15], Loss: 1.5970, Accuracy: 53.16%
Epoch [15/15], Loss: 1.0500, Accuracy: 53.64%

This complete loop processes batches, updates weights, and reports accuracy after each epoch, showing how well the model is learning.


Evaluating the Neural Network on Test Data

After training, it’s important to evaluate the model on unseen test data to understand how well it generalizes beyond the training set. This involves disabling gradient calculations, running a forward pass on the test set, and computing overall loss and accuracy.


Calculating Loss and Accuracy on Test Set

The model is evaluated in inference mode using torch.no_grad() to prevent gradient calculation, ensuring faster and memory-efficient evaluation.

model.eval()  # Set model to evaluation mode

test_loss = 0
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()

        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

average_loss = test_loss / len(test_loader)
accuracy = 100 * correct / total

print(f"Test Loss: {average_loss:.4f}, Test Accuracy: {accuracy:.2f}%")

Output:
Test Loss: 1.3999, Test Accuracy: 55.60%

Setting the model to .eval() deactivates features like dropout. Loss and accuracy are calculated over all batches to assess overall performance.


A lower loss indicates better alignment between model predictions and target labels. Higher accuracy reflects correct predictions across the test set. If there’s a significant gap between training and test accuracy, it may indicate overfitting, underfitting, or issues with hyperparameters such as learning rate or batch size.
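
To make the numbers more tangible, you can also inspect a few individual predictions after evaluation. FashionMNIST labels 0-9 correspond to the class names listed below:

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

images, labels = next(iter(test_loader))
with torch.no_grad():
    preds = model(images).argmax(dim=1)

for i in range(5):
    print(f"Predicted: {class_names[preds[i].item()]:12s}  Actual: {class_names[labels[i].item()]}")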


Conclusion

In this guided walkthrough, we built a complete neural network workflow in PyTorch from the ground up. We started by preparing the dataset using torchvision and DataLoaders, ensuring efficient batching and normalization of inputs. Then, we gained deeper insights into neural network internals by implementing a custom linear layer using nn.Parameter, followed by constructing a full Multi-Layer Perceptron (MLP) both manually and using PyTorch’s higher-level nn.Module utilities.

After defining the loss function and optimizer, we manually crafted a training loop that handled forward propagation, loss calculation, gradient backpropagation, and parameter updates. Accuracy tracking during training helped monitor learning progress. Finally, we evaluated the trained model on test data in evaluation mode to measure its generalization ability using loss and accuracy metrics.

By building everything step by step, we developed a strong intuitive understanding of how PyTorch manages layers, parameters, gradients, and training workflows. This foundational approach provides a solid base for exploring more advanced architectures like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers, as well as techniques such as regularization, learning rate scheduling, and transfer learning.

You’ve not only trained a neural network—you’ve understood how it learns.

