Implementing Neural Networks from Scratch using PyTorch in Python
- Oct 20, 2025
- 12 min read
Updated: Mar 9
Deep learning frameworks like PyTorch make it easy to build neural networks using high-level modules. But relying only on ready-made layers can hide how these models actually work under the hood.
This tutorial breaks down the process by building key components step by step. You’ll see how forward propagation, loss calculation, backpropagation, and parameter updates fit together inside a working neural network.
By the end, you’ll have trained a simple neural network in Python while gaining a clearer understanding of how PyTorch manages tensors, gradients, and the overall training workflow.

Why Choose PyTorch for Neural Network Development
There are several deep learning frameworks available today, but PyTorch has gained widespread popularity because of its flexibility and ease of use.
One major advantage of PyTorch is its Pythonic design, which allows developers to write code that feels natural for Python users. The syntax is clean and readable, making it easier to build and debug neural network models.
Another important feature is the dynamic computation graph. Unlike static graph frameworks, PyTorch builds the computation graph during runtime. This allows developers to modify network architectures easily and makes debugging significantly more straightforward.
PyTorch also includes a powerful autograd engine that automatically computes gradients during backpropagation. This removes the need to manually calculate derivatives, which greatly simplifies the training process for neural networks.
In addition, PyTorch benefits from a large ecosystem of tools and libraries such as TorchVision, TorchText, and PyTorch Lightning. These resources help developers build complete machine learning pipelines more efficiently.
Because of these advantages, PyTorch has become a popular framework in both academic research and real-world AI development.
What “From Scratch” Really Means in PyTorch (Environment Setup Included)
When we say “implementing neural networks from scratch”, we don’t mean manually coding every mathematical derivative or building our own optimization algorithms from zero. Instead, it means working closer to the core components of the framework rather than relying entirely on high-level abstractions.
Using PyTorch, implementing a neural network from scratch typically involves defining model parameters manually, writing the forward pass yourself, and controlling the training loop step by step. In practice, this includes defining learnable weights using low-level features such as torch.nn.Parameter, implementing the forward propagation logic, managing the training cycle (forward pass → loss calculation → backward pass → parameter update), and avoiding prebuilt wrappers as much as possible.
Taking this approach helps you understand what actually happens behind functions like model.fit() or layers such as nn.Linear(). Instead of relying on abstraction, you gain a clearer picture of how neural networks compute outputs, measure errors, and update their weights during training.
Before diving into the implementation, you need a properly configured Python environment with PyTorch and a few supporting libraries installed. Setting up the environment correctly ensures that your code runs smoothly and can optionally take advantage of GPU acceleration if it is available.
You can install PyTorch using pip. If you are working on a CPU-only environment, run:
pip install torch torchvision

If you have an NVIDIA GPU with CUDA installed, you should install the version that matches your CUDA configuration. For example:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

After installation, you can verify that PyTorch is correctly installed and check whether CUDA is available by running:
import torch
print(torch.__version__)
print("CUDA Available:", torch.cuda.is_available())

Example output might look like this:
2.8.0+cu126
CUDA Available: True

To follow this tutorial smoothly, you should also have a few additional Python libraries installed. These include numpy for numerical operations and matplotlib if you want to visualize results during training. They can be installed using:

pip install numpy matplotlib

Once these libraries are installed and PyTorch is set up, your environment is ready for building and training a neural network from scratch using Python.
Core Concepts You Must Understand Before Building Neural Networks in PyTorch
Before implementing a neural network from scratch in PyTorch, it is important to understand the fundamental components that power deep learning models. These core ideas explain how data flows through a network, how predictions are generated, and how the model learns by adjusting its parameters. Once these concepts become clear, the process of building and training neural networks in PyTorch becomes much more intuitive.
1. PyTorch Tensors: The Foundation of Deep Learning Computation
At the heart of every PyTorch model lies the tensor. Tensors are multi-dimensional numerical arrays used to store and manipulate data throughout the neural network. They function as the primary data structure in PyTorch and are used for everything from input features and model parameters to intermediate values produced during calculations.
Unlike standard Python lists or NumPy arrays, PyTorch tensors are designed specifically for high-performance numerical computation and deep learning workloads.
Some key characteristics of PyTorch tensors include:
They can represent vectors, matrices, images, or entire batches of training data.
They support a wide range of mathematical operations such as addition, matrix multiplication, reshaping, and broadcasting.
Tensors can seamlessly move between CPU and GPU, allowing models to take advantage of hardware acceleration for faster training.
During the forward pass of a neural network, tensors move through layers and transformations, carrying data while mathematical operations gradually convert raw inputs into meaningful predictions.
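The tensor behaviors described above can be illustrated in a few lines. This is a minimal sketch (the shapes and values here are arbitrary examples, not part of the tutorial's model):

```python
import torch

# A batch of 4 samples with 2 features each (a 2-D tensor)
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])

# Elementwise math and matrix multiplication
doubled = x * 2
w = torch.randn(2, 3)
projected = x @ w                          # shape: (4, 3)

# Reshaping and broadcasting
flat = x.reshape(-1)                       # shape: (8,)
shifted = x + torch.tensor([10.0, 20.0])   # row vector broadcast over the batch

# Move to GPU only if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
x_dev = x.to(device)

print(projected.shape, flat.shape, shifted[0])
```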
2. Automatic Differentiation with PyTorch Autograd
Training a neural network requires adjusting its parameters so that predictions become closer to the correct outputs. This process relies on backpropagation, which calculates gradients describing how each parameter influences the model's error.
Manually computing these gradients would be extremely complex for large neural networks. This is where PyTorch’s Autograd engine becomes essential.
Autograd automatically performs gradient calculations by tracking every tensor operation executed during the forward pass. It constructs a dynamic computational graph that records how outputs are derived from inputs.
When the backward pass is triggered:
PyTorch traverses the computational graph in reverse.
Gradients are calculated for every parameter involved in producing the final loss.
These gradients are then used by optimization algorithms to update model weights.
Because this graph is built dynamically during execution, developers can easily modify model architectures, experiment with new layers, and debug computations without rigid constraints.
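The reverse traversal described above can be observed directly on a tiny computation. In this sketch a single scalar parameter stands in for a full weight matrix, and the gradient PyTorch fills in matches the hand-computed derivative:

```python
import torch

# A learnable scalar parameter tracked by autograd
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)

# Forward pass: autograd records each operation in a dynamic graph
y = w * x                # y = 6
loss = (y - 10) ** 2     # loss = 16

# Backward pass: traverse the graph in reverse and populate w.grad
loss.backward()

# By hand: d(loss)/dw = 2 * (w*x - 10) * x = 2 * (6 - 10) * 2 = -16
print(w.grad)  # tensor(-16.)
```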
3. Weights, Biases, and Activation Functions in Neural Networks
A neural network is composed of multiple layers, each containing parameters that are learned during training. The two most important parameters inside each layer are weights and biases.
Weights determine the strength of the relationship between input features and neurons. They scale input values and control how strongly each feature influences the final prediction.
Biases allow neurons to shift their output independently of the input values. This added flexibility helps the model capture patterns that do not necessarily pass through the origin.
However, if neural networks only performed linear transformations using weights and biases, they would be limited to learning simple linear relationships. Real-world data rarely behaves that way.
This limitation is solved using activation functions, which introduce non-linearity into the network. Activation functions transform the output of a neuron before passing it to the next layer, allowing neural networks to learn complex patterns.
Some commonly used activation functions include:
ReLU (Rectified Linear Unit) – the most widely used activation for hidden layers due to its efficiency and ability to reduce training issues like vanishing gradients.
Sigmoid – often used in binary classification tasks because it compresses outputs into values between 0 and 1.
Softmax – commonly applied in the final layer of multi-class classification models to convert outputs into probability distributions.
Together, weights, biases, and activation functions form the learning mechanism of a neural network. As training progresses, these parameters are gradually adjusted to minimize prediction errors and improve the model’s ability to recognize patterns in data.
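The three activations listed above can be compared side by side on the same tensor. A small illustration (the input values are arbitrary):

```python
import torch

z = torch.tensor([-2.0, 0.0, 3.0])

relu_out = torch.relu(z)                # negatives clipped to zero
sigmoid_out = torch.sigmoid(z)          # each value squashed into (0, 1)
softmax_out = torch.softmax(z, dim=0)   # values sum to 1: a probability distribution

print(relu_out)            # tensor([0., 0., 3.])
print(softmax_out.sum())   # tensor(1.)
```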
Implementing a Neural Network from Scratch in PyTorch
Building a neural network from scratch in PyTorch is one of the best ways to understand how deep learning models actually work. Instead of relying entirely on pre-built layers, developers can manually define model parameters, perform forward computations, and observe how the learning process updates weights during training.
This approach reveals what happens behind the scenes inside modern neural networks. By constructing layers step by step, you gain a clearer understanding of how inputs move through the network, how predictions are generated, and how gradients modify parameters during backpropagation.
1. Creating a Custom Linear Layer Using nn.Parameter
A good starting point for implementing neural networks in PyTorch is creating a custom linear layer. This demonstrates how weights and biases are stored and updated as trainable parameters.
In PyTorch, learnable parameters are registered using nn.Parameter. When a tensor is wrapped with nn.Parameter, PyTorch automatically tracks it during gradient computation and optimization.
import torch
import torch.nn as nn

class CustomLinear(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(CustomLinear, self).__init__()
        self.weights = nn.Parameter(torch.randn(output_dim, input_dim))
        self.bias = nn.Parameter(torch.randn(output_dim))

    def forward(self, x):
        return torch.matmul(x, self.weights.T) + self.bias

A custom layer is created by subclassing nn.Module, which is the base class for all neural network components in PyTorch. The layer defines two learnable parameters:
1. Weights – a matrix that transforms input features into output values.
2. Bias – a vector that shifts the output of the transformation.
During the forward pass, the input tensor is multiplied with the weight matrix using torch.matmul, and the bias term is added to produce the final output. This operation performs the same fundamental computation as a fully connected (dense) layer commonly used in neural networks. Because the parameters are registered using nn.Parameter, PyTorch automatically tracks them during backpropagation and updates them when an optimizer is applied during training.
By implementing layers in this way, developers can see how PyTorch internally manages learnable parameters while still maintaining flexibility to design custom neural network architectures.
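A quick sanity check confirms that the layer transforms shapes as expected and that both parameters were registered automatically. The dimensions below are arbitrary, and the class is repeated here only so the snippet runs on its own:

```python
import torch
import torch.nn as nn

class CustomLinear(nn.Module):  # same layer as defined above
    def __init__(self, input_dim, output_dim):
        super(CustomLinear, self).__init__()
        self.weights = nn.Parameter(torch.randn(output_dim, input_dim))
        self.bias = nn.Parameter(torch.randn(output_dim))

    def forward(self, x):
        return torch.matmul(x, self.weights.T) + self.bias

layer = CustomLinear(input_dim=4, output_dim=2)

x = torch.randn(8, 4)   # batch of 8 samples, 4 features each
out = layer(x)

print(out.shape)        # torch.Size([8, 2])
# nn.Parameter registered both tensors as trainable parameters
print([name for name, _ in layer.named_parameters()])
```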
2. Building a Simple MLP Model Using Custom Layers
After creating a custom linear layer, the next step is assembling these layers into a complete neural network. A common architecture used in many machine learning tasks is the Multi-Layer Perceptron (MLP). An MLP is a feedforward neural network made up of multiple fully connected layers stacked together with activation functions between them.
Each layer receives input from the previous layer, performs a linear transformation, and then passes the result through a non-linear activation function. This layered structure allows the network to gradually learn more complex representations of the data.
Using the previously defined CustomLinear layer, we can construct a simple MLP model in PyTorch.
class SimpleMLP(nn.Module):
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.layer1 = CustomLinear(784, 128)
        self.layer2 = CustomLinear(128, 64)
        self.layer3 = CustomLinear(64, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)        # Flatten image input
        x = torch.relu(self.layer1(x))   # First hidden layer
        x = torch.relu(self.layer2(x))   # Second hidden layer
        x = self.layer3(x)               # Output layer (logits)
        return x

Between the custom layers, the ReLU activation function is applied using torch.relu. ReLU introduces non-linearity into the network, enabling the model to learn complex relationships instead of just linear patterns.
The input tensor is also flattened using x.view(x.size(0), -1), which converts multi-dimensional image data into a one-dimensional vector suitable for fully connected layers.
By stacking custom layers in this way, the neural network gradually transforms raw input data into meaningful predictions. This modular design is a key principle in PyTorch, allowing developers to easily expand models with additional layers, different activation functions, or more advanced architectures.
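Before training, it is worth verifying that a batch flows through the stacked layers and comes out with the expected shape. This sketch feeds a fake batch of MNIST-sized images through the model (the classes are repeated so the snippet is self-contained):

```python
import torch
import torch.nn as nn

class CustomLinear(nn.Module):  # as defined earlier
    def __init__(self, input_dim, output_dim):
        super(CustomLinear, self).__init__()
        self.weights = nn.Parameter(torch.randn(output_dim, input_dim))
        self.bias = nn.Parameter(torch.randn(output_dim))
    def forward(self, x):
        return torch.matmul(x, self.weights.T) + self.bias

class SimpleMLP(nn.Module):  # as defined above
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.layer1 = CustomLinear(784, 128)
        self.layer2 = CustomLinear(128, 64)
        self.layer3 = CustomLinear(64, 10)
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer2(x))
        return self.layer3(x)

model = SimpleMLP()
batch = torch.randn(32, 1, 28, 28)   # fake batch of 28x28 grayscale images
logits = model(batch)

print(logits.shape)  # torch.Size([32, 10])
```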
Alternative: Using nn.Module for Cleaner Implementations
While building custom layers helps developers understand how neural networks operate internally, most real-world PyTorch models rely on the framework’s built-in modules. PyTorch provides a wide range of predefined layers and activation functions that make model development faster, more readable, and easier to maintain.
Instead of manually defining weights and biases with nn.Parameter, developers can use modules such as nn.Linear, nn.ReLU, and nn.Sequential to construct neural networks in a more structured way. These modules automatically handle parameter registration and integrate seamlessly with PyTorch’s training and optimization workflow.
One particularly useful component is nn.Sequential, a container that allows layers to be stacked together in a pipeline-like structure. Each layer processes the input and passes its output directly to the next layer, which significantly simplifies the implementation of feedforward neural networks.
Below is an example of implementing the same Multi-Layer Perceptron (MLP) architecture using PyTorch’s built-in modules.
class MLPWithModules(nn.Module):
    def __init__(self):
        super(MLPWithModules, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.model(x)

The neural network is constructed using predefined PyTorch layers rather than manually implemented components. The input layer transforms flattened feature vectors using nn.Linear, while hidden layers apply the ReLU activation function to introduce non-linearity into the model. The final linear layer produces raw prediction scores, commonly referred to as logits, which are later passed to a loss function during training.
The use of nn.Sequential simplifies the architecture by chaining layers together in a clearly defined order. This approach removes the need to manually call each layer inside the forward method, resulting in cleaner and more maintainable code.
In practice, most deep learning models built with PyTorch follow this approach because it reduces boilerplate code while still allowing developers to design flexible neural network architectures for a wide range of machine learning tasks.
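One way to confirm that nn.Sequential registers parameters exactly as the manual version does is to count them: three Linear layers with the same dimensions yield the same parameter total either way. A quick check (the class is repeated so the snippet runs on its own):

```python
import torch
import torch.nn as nn

class MLPWithModules(nn.Module):  # as defined above
    def __init__(self):
        super(MLPWithModules, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.model(x)

model = MLPWithModules()
logits = model(torch.randn(16, 1, 28, 28))

print(logits.shape)  # torch.Size([16, 10])
# Same parameter count as the CustomLinear version: weights plus biases per layer
print(sum(p.numel() for p in model.parameters()))
```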
Writing a Manual Training Loop in PyTorch
Once a neural network architecture has been defined, the next step is training the model so it can learn patterns from data. In PyTorch, this process is controlled through a training loop, where the model repeatedly processes batches of data, measures prediction errors, computes gradients, and updates its parameters.
A typical training loop includes four essential steps: performing a forward pass through the network, calculating the loss, running backpropagation to compute gradients, and updating the model parameters using an optimizer. Implementing this loop manually provides full control over how the neural network learns.
1. Instantiating the Model
Before training begins, the neural network must be created using one of the architectures defined earlier. For example, the previously implemented SimpleMLP model can be instantiated as follows:
model = SimpleMLP()

This creates a neural network object containing all layers and learnable parameters defined inside the model class.
2. Defining the Loss Function and Optimizer
A loss function is used to measure how far the model’s predictions are from the true labels. The optimizer then adjusts the model parameters based on gradients calculated during backpropagation.
import torch.nn as nn
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

CrossEntropyLoss is commonly used for multi-class classification tasks because it compares predicted class scores with the correct labels. The stochastic gradient descent (SGD) optimizer updates the model parameters using the computed gradients and a learning rate of 0.01.
3. Forward Pass and Loss Calculation
During training, the dataset is usually divided into batches that are loaded using a data loader. Each batch is passed through the model to generate predictions. These predictions are then compared with the true labels to compute the loss.
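The tutorial assumes a train_loader is already available. If you want a runnable stand-in, a DataLoader can be built from random tensors shaped like MNIST images. This is synthetic data purely for exercising the loop; a real experiment would load a dataset from torchvision.datasets instead:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for an image dataset: 256 fake 28x28 images, 10 classes
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))

train_dataset = TensorDataset(images, labels)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Each iteration yields one batch of (images, labels)
for batch_images, batch_labels in train_loader:
    print(batch_images.shape, batch_labels.shape)
    break
```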
for images, labels in train_loader:
    outputs = model(images)             # Forward pass
    loss = criterion(outputs, labels)   # Loss calculation

The forward pass sends input images through the neural network layers to produce output scores for each class. The loss function then evaluates how different those predictions are from the actual labels.
4. Backpropagation and Parameter Updates
After computing the loss, the next step is updating the model parameters. This is done through backpropagation, where gradients of the loss with respect to each parameter are calculated.
optimizer.zero_grad()   # Reset previous gradients
loss.backward()         # Compute gradients via backpropagation
optimizer.step()        # Update model parameters

First, any previously stored gradients are cleared. The loss.backward() call computes gradients by traversing the computational graph in reverse. Finally, optimizer.step() updates the model parameters using the calculated gradients.
5. Complete Training Loop with Accuracy Tracking
In practice, the training loop runs across multiple epochs, where each epoch represents a full pass through the dataset. Accuracy can also be calculated during training to monitor how well the model is learning.
num_epochs = 15
for epoch in range(num_epochs):
    total, correct = 0, 0
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Accuracy: {accuracy:.2f}%")

During each epoch, the model processes batches of training data, calculates prediction errors, updates parameters, and tracks how many predictions match the true labels. As training progresses, the network gradually learns patterns in the dataset, which improves its classification accuracy.
Epoch [1/15], Loss: 2.0564, Accuracy: 24.36%
Epoch [2/15], Loss: 2.0657, Accuracy: 26.59%
Epoch [3/15], Loss: 1.8173, Accuracy: 34.73%
...
Epoch [15/15], Loss: 1.0500, Accuracy: 53.64%

These results show the training progression of the neural network. Over multiple epochs, the loss generally decreases while the accuracy increases, indicating that the model is gradually improving its ability to classify the input data.
This kind of output provides a clear view of how the neural network learns during training and helps developers monitor whether the model is converging or requires further tuning.
6. Evaluating the Neural Network on Test Data
After training a neural network, the next step is evaluating its performance on unseen data. This stage is important because it reveals how well the model generalizes beyond the training dataset. A model that performs well during training but poorly on new data is usually suffering from overfitting.
Evaluation involves running the trained model on the test dataset, calculating prediction errors, and measuring overall accuracy. During this phase, gradient calculations are disabled to make the process faster and more memory efficient.
In PyTorch, this is done using torch.no_grad(), which prevents the framework from storing intermediate values needed for backpropagation. The model is also switched to evaluation mode using model.eval(), which disables training-specific behaviors such as dropout and batch normalization updates.
model.eval()  # Set model to evaluation mode

test_loss = 0
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

average_loss = test_loss / len(test_loader)
accuracy = 100 * correct / total
print(f"Test Loss: {average_loss:.4f}, Test Accuracy: {accuracy:.2f}%")

In this evaluation loop, the model processes batches of test images and produces predictions through a forward pass. The loss function measures how different the predictions are from the true labels, and the total number of correct predictions is tracked to compute accuracy.
The following output was produced during testing:
Test Loss: 1.3999, Test Accuracy: 55.60%

The test loss represents the average prediction error across all test batches, while the test accuracy indicates the percentage of correctly classified samples. Lower loss values generally suggest that the model’s predictions are closer to the true labels, while higher accuracy indicates stronger classification performance.
Comparing training results with test performance helps diagnose potential issues in the model. A large gap between training accuracy and test accuracy may indicate overfitting, where the model memorizes training data instead of learning general patterns. On the other hand, consistently low accuracy across both datasets may suggest underfitting or poorly chosen hyperparameters such as the learning rate, batch size, or model complexity.
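The gap diagnosis described above can be turned into a simple automated check. The thresholds below are arbitrary illustrations chosen for this sketch, not standard values:

```python
def diagnose(train_acc, test_acc, gap_threshold=10.0, low_threshold=60.0):
    """Classify a train/test accuracy pair (in percent) with rough heuristics."""
    if train_acc - test_acc > gap_threshold:
        return "possible overfitting"       # model memorizes training data
    if train_acc < low_threshold and test_acc < low_threshold:
        return "possible underfitting"      # model too weak or poorly tuned
    return "looks reasonable"

# Using the tutorial's final numbers: ~53.6% train, ~55.6% test
print(diagnose(53.64, 55.60))  # possible underfitting
```

With the results above, both accuracies sit well below the threshold and the gap is small, so the heuristic points at underfitting, matching the discussion of learning rate, batch size, and model capacity.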
Evaluating the neural network on test data provides a realistic measure of its effectiveness and ensures the model performs reliably when applied to real-world inputs.
Conclusion
Building a neural network from scratch in PyTorch reveals how deep learning models truly operate beneath high-level abstractions. By implementing custom layers with nn.Parameter, constructing a Multi-Layer Perceptron, and later recreating it with built-in modules like nn.Linear and nn.Sequential, the underlying mechanics of neural networks become much clearer.
The manual training loop demonstrated how forward propagation, loss calculation, backpropagation, and optimizer updates work together during learning. Tracking accuracy across epochs showed how the model gradually improved as it adjusted its weights. Finally, evaluating the model on test data provided a realistic measure of how well it generalizes beyond the training dataset.
Understanding these fundamentals creates a strong foundation for exploring more advanced architectures such as CNNs, RNNs, and Transformers, along with practical techniques like regularization, transfer learning, and learning rate scheduling.





