
Recurrent Neural Networks in Python (RNN)

  • Writer: Samul Black
  • Dec 14, 2025
  • 11 min read

Machine learning systems that understand sequential data — sentences, sensor signals, financial time series, speech patterns — lie at the heart of modern AI applications. Before transformers dominated the landscape, Recurrent Neural Networks (RNNs) were the backbone of sequence modeling and still remain highly relevant in domains where temporal continuity and lightweight deployment matter.

This guide provides a research-oriented, implementation-ready, and academically grounded exploration of RNNs using Python and PyTorch, along with the context needed to understand their successors, LSTMs and GRUs. It is aimed at students, developers, researchers, and startups seeking hands-on fundamentals and practical workflows.


Recurrent Neural Networks - Colabcodes

What Are Recurrent Neural Networks (RNNs)?

Recurrent Neural Networks (RNNs) are a class of neural architectures designed to process sequential data by retaining information from earlier time steps. Unlike traditional feedforward neural networks—where each input is treated independently—RNNs incorporate a form of internal memory that enables them to model temporal continuity and dependencies across a sequence. This ability makes them valuable in applications such as speech recognition, virtual assistants like Siri and Google Voice Search, text generation, machine translation, and time-series forecasting.

In conventional neural networks, each layer contains its own set of parameters. RNNs, however, share the same parameters across all time steps. This weight sharing allows the model to apply the same transformation to every element in a sequence, making it well suited for tasks where the order of inputs matters. During training, gradients flow through these recurrent connections to update the shared weights, enabling the network to learn patterns that evolve over time.
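To see this weight sharing concretely, the short sketch below (a minimal illustration using PyTorch's nn.RNN with arbitrary sizes) inspects the parameters of a recurrent layer: the same input-to-hidden and hidden-to-hidden matrices are applied at every time step, so the parameter count does not grow with sequence length.

import torch
import torch.nn as nn

# A single recurrent layer: the same weights are reused at every time step.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# Parameter shapes are independent of sequence length.
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (16, 8)   -> applied to the input at every step
# weight_hh_l0 (16, 16)  -> applied to the previous hidden state at every step
# bias_ih_l0 (16,), bias_hh_l0 (16,)

# Sequences of different lengths pass through the same four tensors.
out_short, _ = rnn(torch.randn(1, 5, 8))    # (batch, seq_len=5, features)
out_long, _ = rnn(torch.randn(1, 50, 8))    # (batch, seq_len=50, features)
print(out_short.shape, out_long.shape)      # (1, 5, 16) and (1, 50, 16)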


How Recurrent Neural Networks Work

Like traditional neural networks, such as feedforward and convolutional neural networks (CNNs), recurrent neural networks learn from training data. What distinguishes them is their “memory”: information from prior inputs is carried forward and influences the current output. While traditional deep learning networks assume that inputs and outputs are independent of each other, the output of a recurrent neural network depends on the prior elements within the sequence. Although future elements would also be helpful in determining the output at a given position, unidirectional recurrent neural networks cannot account for them in their predictions.

Recurrent Neural Networks are built on the principle that sequential data contains temporal structure—relationships that evolve and accumulate meaning as the sequence unfolds. To capture this structure, an RNN introduces a recurrent loop that allows information from earlier positions in the sequence to be carried forward and intermixed with later inputs. This looping mechanism produces a dynamic form of memory, enabling the model to interpret each time step not in isolation but as a continuation of what preceded it.

At the heart of this mechanism lies the hidden state, denoted by h_t. It functions as a continuously updated summary of all information the network has processed up to time step t. When an input sequence {x_1, x_2, ..., x_T} is fed into the network, the RNN processes each element in order. For every time step t, the model computes the new hidden state by combining the current input vector with the previous hidden state:


h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)


where W_x determines how the network interprets the current input, W_h controls how memory from previous steps influences the present, and b_h is a bias term. The nonlinearity introduced by the tanh function bounds the values of the hidden state and enriches the model with expressive capacity.

Once the hidden state is computed, the model generates an output for that time step using:


y_t = f(W_y h_t + b_y)


where f is typically a softmax function for classification tasks or a linear mapping for regression problems. This output represents the network’s interpretation or prediction based on both the current input and the accumulated memory stored in h_t.
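To make these two update equations concrete, the following sketch (a minimal NumPy illustration with arbitrarily chosen dimensions and random weights) runs a single forward pass over a toy sequence, updating the hidden state and emitting an output at every step.

import numpy as np

np.random.seed(0)

# Dimensions chosen purely for illustration
input_size, hidden_size, output_size, T = 3, 4, 2, 5

# Shared parameters, reused at every time step
W_x = np.random.randn(hidden_size, input_size) * 0.1
W_h = np.random.randn(hidden_size, hidden_size) * 0.1
b_h = np.zeros(hidden_size)
W_y = np.random.randn(output_size, hidden_size) * 0.1
b_y = np.zeros(output_size)

# A toy input sequence {x_1, ..., x_T}
xs = np.random.randn(T, input_size)

h = np.zeros(hidden_size)  # h_0
for t in range(T):
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
    h = np.tanh(W_x @ xs[t] + W_h @ h + b_h)
    # y_t = f(W_y h_t + b_y); here f is the identity, as in a regression head
    y = W_y @ h + b_y
    print(f"t={t+1}, y_t={np.round(y, 3)}")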

The mathematics behind training an RNN further amplifies its theoretical richness. During learning, gradients are propagated backward through time using a procedure known as Backpropagation Through Time (BPTT).

The gradient of the loss L with respect to a parameter such as W_h accumulates contributions from every time step:

∂L/∂W_h = Σ_{t=1}^{T} Σ_{k=1}^{t} (∂L_t / ∂h_t) · (∂h_t / ∂h_k) · (∂h_k / ∂W_h)

Because the hidden state at each time step depends recursively on all previous states, these gradients accumulate across the entire sequence. The chain of derivatives takes the form:

∂h_t / ∂h_k = ∏_{j=k+1}^{t} ∂h_j / ∂h_{j-1} = ∏_{j=k+1}^{t} diag(1 − h_j²) · W_h

where the derivative of the tanh function appears in the diagonal term. When this recursive multiplication spans many time steps, gradients tend to shrink exponentially, leading to the vanishing gradient problem. This is precisely what makes standard RNNs struggle with long-range dependencies: the earlier a signal appears in a sequence, the more likely it is to be diluted before reaching the updates for parameters affecting later predictions.
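The shrinking of this product can be observed numerically. The toy sketch below (an illustrative example, assuming a randomly drawn W_h rescaled to spectral radius 0.9) multiplies the per-step Jacobians diag(1 − h_j²) · W_h over many steps and tracks the norm of the accumulated product, which decays rapidly.

import numpy as np

np.random.seed(1)
hidden_size, T = 16, 60

# Random recurrent weights, rescaled so the spectral radius is below 1
W_h = np.random.randn(hidden_size, hidden_size)
W_h *= 0.9 / max(abs(np.linalg.eigvals(W_h)))

h = np.zeros(hidden_size)
J = np.eye(hidden_size)  # running product of Jacobians, i.e. d h_t / d h_0
for t in range(1, T + 1):
    h = np.tanh(W_h @ h + 0.1 * np.random.randn(hidden_size))
    # One-step Jacobian: diag(1 - h_t^2) @ W_h (the tanh derivative on the diagonal)
    J = np.diag(1.0 - h ** 2) @ W_h @ J
    if t % 15 == 0:
        print(f"t={t}, ||d h_t / d h_0|| = {np.linalg.norm(J):.2e}")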

Conversely, the same structure can cause gradients to grow exponentially, producing the exploding gradient problem, which destabilizes training and must often be controlled with gradient clipping. Despite these challenges, the conceptual design of RNNs remains foundational. The architecture introduced a systematic way of modeling temporal continuity using shared parameters and recursive state updates, anticipating more advanced models such as LSTMs and GRUs.
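In PyTorch, gradient clipping is typically applied between the backward pass and the optimizer step using torch.nn.utils.clip_grad_norm_. The snippet below is a minimal sketch; the layer sizes, sequence length, and max_norm of 1.0 are illustrative choices, not values from this article's experiment.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
criterion = nn.MSELoss()

x = torch.randn(16, 200, 1)   # a long sequence makes unstable gradients more likely
target = torch.randn(16, 1)

optimizer.zero_grad()
out, _ = rnn(x)
loss = criterion(head(out[:, -1, :]), target)
loss.backward()

# Rescale all gradients so their global norm does not exceed max_norm
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()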


To appreciate the operation of an RNN more concretely, imagine a sequence such as a stock time series or a sentence. When the network processes the first value x_1, the resulting hidden state h_1 contains information derived solely from that input. When the network processes x_2, the new hidden state h_2 integrates knowledge from both x_1 and x_2. By the time the model reaches x_t, the hidden state h_t embodies a compressed representation of the entire history {x_1, x_2, ..., x_t}. This cumulative property is what gives RNNs their ability to model phenomena that depend on time, context, or sequential ordering.

Ultimately, the theoretical shortcomings of standard RNNs—particularly their inability to maintain meaningful gradients over long sequences—motivated breakthroughs in recurrent architectures and later paved the way for attention-based models and Transformers. Yet, understanding this mathematical and conceptual foundation is essential for appreciating the full evolution of sequence modeling in modern machine learning.


Types of Recurrent Neural Networks

Recurrent Neural Networks can be categorized based on how input sequences are mapped to outputs and how information flows across time steps. These variations allow RNNs to be adapted to a wide range of sequential learning problems, from time-series forecasting to language modeling and sequence generation.


1. One-to-One RNN

The one-to-one configuration represents the simplest form of neural network and is conceptually included for completeness. In this setting, a single input corresponds to a single output, and there is no explicit temporal dependency between multiple inputs. While this structure does not exploit the recurrent nature of RNNs, it forms the foundation upon which more complex sequence-based architectures are built. Traditional feedforward neural networks used for image classification can be viewed as one-to-one mappings.


2. One-to-Many RNN

In a one-to-many RNN, a single input is used to generate a sequence of outputs over time. This structure is particularly useful in generative tasks where a fixed input must produce a variable-length output sequence. A common example is image captioning, where an image embedding serves as the initial input and the RNN generates a sequence of words forming a descriptive sentence. At each time step, the hidden state evolves by incorporating previously generated tokens, enabling coherent sequence generation.

Mathematically, the initial hidden state is often conditioned on the input vector:


h_0 = f(W_x x + b)


Subsequent outputs are generated as:


y_t = g(W_y h_t + c)


3. Many-to-One RNN

The many-to-one architecture processes a sequence of inputs and produces a single output. This formulation is widely used in tasks where a decision must be made based on the entire sequence, such as sentiment analysis, document classification, and activity recognition. Each input in the sequence contributes to the evolving hidden state, and the final hidden state serves as a compressed representation of the entire input sequence.

Formally, given a sequence {x_1, x_2, ..., x_T}, the prediction is based on the final hidden state h_T:


y = g(W_y h_T + c)
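A minimal many-to-one sketch in PyTorch (hypothetical layer sizes and class count, shown only to illustrate the pattern) feeds the final hidden state into a classification head:

import torch
import torch.nn as nn

class ManyToOneRNN(nn.Module):
    """Reads a whole sequence and predicts a single label from the final hidden state."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):               # x: (batch, seq_len, input_size)
        _, h_T = self.rnn(x)            # h_T: (1, batch, hidden_size), the final hidden state
        return self.fc(h_T.squeeze(0))  # logits: (batch, num_classes)

model = ManyToOneRNN(input_size=8, hidden_size=32, num_classes=3)
x = torch.randn(4, 25, 8)               # four sequences of 25 feature vectors
print(model(x).shape)                   # torch.Size([4, 3])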



4. Many-to-Many RNN

In the many-to-many configuration, both the input and output are sequences. This is the most general and powerful RNN formulation and is extensively used in sequence-to-sequence tasks such as machine translation, speech recognition, and named entity recognition. Each input token influences the corresponding output token while also contributing to future predictions through the recurrent hidden state.


h_t = f(W_x x_t + W_h h_{t-1} + b)

y_t = g(W_y h_t + c)


This architecture enables fine-grained alignment between inputs and outputs across time.
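A minimal many-to-many sketch (hypothetical sizes, in the spirit of a token-tagging model) applies the same output layer to the hidden state at every time step, producing one prediction per input position:

import torch
import torch.nn as nn

class ManyToManyRNN(nn.Module):
    """Emits one prediction per time step by applying the output layer to every hidden state."""
    def __init__(self, input_size, hidden_size, num_tags):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_tags)

    def forward(self, x):      # x: (batch, seq_len, input_size)
        out, _ = self.rnn(x)   # out: (batch, seq_len, hidden_size), one h_t per step
        return self.fc(out)    # logits: (batch, seq_len, num_tags)

model = ManyToManyRNN(input_size=8, hidden_size=32, num_tags=5)
print(model(torch.randn(4, 12, 8)).shape)   # torch.Size([4, 12, 5])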


5. Encoder–Decoder RNN

A specialized form of many-to-many architecture is the encoder–decoder RNN. In this setup, the encoder processes the entire input sequence and compresses it into a fixed-length context vector. The decoder then unfolds this context into an output sequence. This approach was foundational in early neural machine translation systems before the introduction of attention mechanisms and Transformers.

The encoder computes:


h_T = Encoder(x_1, x_2, ..., x_T)


The decoder generates:


y_t = Decoder(h_T, y_{t-1})


Although effective for short sequences, this architecture struggles with long inputs due to information bottlenecks, motivating the development of attention-based models.
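The sketch below is a deliberately simplified encoder–decoder (no attention, a zero start token, greedy feedback of predictions, and a fixed output length, all of which are illustrative assumptions) meant only to show how the encoder's final state conditions the decoder:

import torch
import torch.nn as nn

class EncoderDecoderRNN(nn.Module):
    """Compresses the input sequence into a context vector, then unfolds it into an output sequence."""
    def __init__(self, input_size, hidden_size, output_size, target_len):
        super().__init__()
        self.encoder = nn.RNN(input_size, hidden_size, batch_first=True)
        self.decoder = nn.RNN(output_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.target_len = target_len

    def forward(self, x):                         # x: (batch, src_len, input_size)
        _, context = self.encoder(x)              # context: (1, batch, hidden_size)
        y_prev = torch.zeros(x.size(0), 1, self.fc.out_features)  # all-zero start token
        h = context
        outputs = []
        for _ in range(self.target_len):
            dec_out, h = self.decoder(y_prev, h)  # one decoding step conditioned on the context
            y_t = self.fc(dec_out)                # (batch, 1, output_size)
            outputs.append(y_t)
            y_prev = y_t                          # feed the prediction back in
        return torch.cat(outputs, dim=1)          # (batch, target_len, output_size)

model = EncoderDecoderRNN(input_size=8, hidden_size=32, output_size=6, target_len=10)
print(model(torch.randn(4, 15, 8)).shape)         # torch.Size([4, 10, 6])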


6. Bidirectional RNN

Bidirectional RNNs enhance traditional RNNs by processing sequences in both forward and backward directions. This allows the model to incorporate both past and future context when computing hidden states. Such models are particularly effective in tasks like part-of-speech tagging and named entity recognition, where understanding a word depends on both preceding and succeeding tokens.

The hidden representation at each time step is formed by concatenating forward and backward states. This dual perspective significantly improves contextual understanding but increases computational complexity.
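A minimal bidirectional sketch (with illustrative sizes) shows that the per-step representation doubles in width because the forward and backward hidden states are concatenated:

import torch
import torch.nn as nn

# The sequence is read left-to-right and right-to-left in parallel.
birnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True, bidirectional=True)

x = torch.randn(4, 12, 8)   # (batch, seq_len, features)
out, h_n = birnn(x)

# Each time step concatenates a forward and a backward hidden state.
print(out.shape)   # torch.Size([4, 12, 64])  -> 2 * hidden_size
print(h_n.shape)   # torch.Size([2, 4, 32])   -> one final state per direction

# Downstream layers must therefore expect 2 * hidden_size input features.
tagger = nn.Linear(2 * 32, 5)
print(tagger(out).shape)   # torch.Size([4, 12, 5])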


Implementing a Recurrent Neural Network (RNN) in Python for Sequential Data with PyTorch

This implementation demonstrates how a vanilla RNN learns temporal dependencies using a simple sequence-to-one forecasting task. The objective is to predict the next value in a numerical sequence based on previous observations, walking through sequence preparation, the forward pass, training via backpropagation through time (BPTT), and autoregressive forecasting on a sequential dataset (a synthetic sine wave).


1. Data Generation and GPU-Ready Tensor Preparation

This section focuses on preparing the time-series data and configuring GPU execution. A synthetic sine wave is generated and converted into a supervised learning format by creating fixed-length input sequences paired with their next-step targets. Each input sequence represents a sliding window over the signal, allowing the model to learn temporal dependencies. The data is then converted into PyTorch tensors and reshaped to match the expected RNN input format (samples, sequence length, features). Finally, the tensors are moved to the GPU when available, enabling faster computation during training and inference while maintaining automatic fallback to CPU execution if a GPU is not present.

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np

# -----------------------------
# 1. Generate time-series data
# -----------------------------
np.random.seed(42)
SEQ_LENGTH = 20
NUM_SAMPLES = 1000

def generate_sine_wave(seq_length, num_samples):
    X, y = [], []
    x = np.linspace(0, num_samples + seq_length, num_samples + seq_length)
    data = np.sin(x)
    for i in range(num_samples):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length])
    return np.array(X), np.array(y)

X_np, y_np = generate_sine_wave(SEQ_LENGTH, NUM_SAMPLES)

# Convert to PyTorch tensors
X = torch.tensor(X_np, dtype=torch.float32).unsqueeze(-1)  # shape: (samples, seq_len, 1)
y = torch.tensor(y_np, dtype=torch.float32).unsqueeze(-1)  # shape: (samples, 1)

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X, y = X.to(device), y.to(device)

2. RNN Architecture Definition

This section defines a simple recurrent neural network designed for univariate time-series forecasting. The model consists of a vanilla RNN layer with a tanh activation function, which processes sequential input data and captures temporal dependencies across time steps. The batch_first=True configuration ensures the input tensor follows the intuitive (batch, sequence, features) format. The hidden state produced at the final time step is passed through a fully connected linear layer to generate the next-step prediction. The model is then transferred to the selected computation device, allowing seamless execution on a GPU when available.

# -----------------------------
# 2. Define RNN model
# -----------------------------
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True, nonlinearity='tanh')
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h0=None):
        out, hn = self.rnn(x, h0)
        out = self.fc(out[:, -1, :])  # take last time step
        return out, hn

input_size = 1
hidden_size = 50
output_size = 1
model = SimpleRNN(input_size, hidden_size, output_size).to(device)

3. Training Configuration and Optimization Setup

This section defines the training strategy for the RNN model. Mean Squared Error (MSE) loss is used as the objective function, making it well suited for continuous-valued time-series regression tasks. The Adam optimizer is selected to adaptively adjust learning rates for each parameter, enabling stable and efficient convergence during training. A fixed number of training epochs is specified to control how many full passes the model makes over the dataset, balancing learning progress with computational cost.

# -----------------------------
# 3. Training setup
# -----------------------------
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
epochs = 30

4. Model Training Loop

This section implements the core training process for the recurrent neural network. During each epoch, gradients from the previous iteration are cleared to prevent accumulation, and the entire training dataset is passed through the model in a forward pass to generate predictions. The loss between the predicted values and the true targets is computed using the predefined loss function. Backpropagation is then applied to compute gradients with respect to the model parameters, and the optimizer updates these parameters accordingly. The training loss is printed at each epoch, providing continuous feedback on the model’s learning progress and convergence behavior over time.

# -----------------------------
# 4. Training loop
# -----------------------------
for epoch in range(epochs):
    optimizer.zero_grad()
    outputs, _ = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

5. Autoregressive Forecasting of Future Time Steps

This section performs multi-step time-series forecasting using an autoregressive strategy. The trained model is switched to evaluation mode to disable training-specific behaviors, and an initial input sequence is selected from the dataset. At each prediction step, the model generates the next value based on the current input window and hidden state. This predicted value is then appended to the sequence while the oldest time step is removed, creating a rolling input window for the next iteration. By repeatedly feeding its own predictions back into the model, the RNN produces a sequence of future values that extend beyond the observed data, enabling long-horizon forecasting.

# -----------------------------
# 5. Forecast future values
# -----------------------------
model.eval()
test_input = X[0].unsqueeze(0)  # shape (1, seq_len, 1)
predictions = []

h = None
current_input = test_input.clone()

for _ in range(50):
    with torch.no_grad():
        y_pred, h = model(current_input, h)
        predictions.append(y_pred.item())
        next_input = y_pred.unsqueeze(0)  # reshape to (1, 1, 1) to append along the time dimension
        current_input = torch.cat((current_input[:, 1:, :], next_input), dim=1)

6. Visualization and Prediction Comparison

This section visualizes the forecasting performance of the trained RNN by comparing its predicted values with the true sine wave signal. The model’s future predictions are plotted over a fixed horizon, providing insight into how well the network extrapolates temporal patterns beyond the training window. For reference, the corresponding ground-truth sine wave values are overlaid using a dashed line, making deviations and phase differences easy to observe. This side-by-side visualization helps evaluate the model’s ability to capture periodic behavior and assess prediction accuracy in a clear and intuitive manner.

# -----------------------------
# 6. Visualization with original sine wave
# -----------------------------
plt.figure(figsize=(10, 4))

# Plot RNN predictions
plt.plot(range(50), predictions, label="RNN Prediction")

# Plot the true sine wave values for comparison
true_future = y_np[:50]  # ground-truth next values aligned with the 50 predictions
plt.plot(range(50), true_future, label="Original Sine Wave", linestyle='--')

plt.title("RNN Time-Series Forecasting vs Original Sine Wave - GPU")
plt.xlabel("Time Step")
plt.ylabel("Value")
plt.legend()
plt.show()

Output:
Epoch 0, Loss: 0.5575
Epoch 1, Loss: 0.5393
Epoch 2, Loss: 0.5217
Epoch 3, Loss: 0.5047
Epoch 4, Loss: 0.4880
Epoch 5, Loss: 0.4716
Epoch 6, Loss: 0.4552
Epoch 7, Loss: 0.4387
Epoch 8, Loss: 0.4219
Epoch 9, Loss: 0.4046
Epoch 10, Loss: 0.3868
Epoch 11, Loss: 0.3681
Epoch 12, Loss: 0.3486
Epoch 13, Loss: 0.3278
Epoch 14, Loss: 0.3058
Epoch 15, Loss: 0.2822
Epoch 16, Loss: 0.2569
Epoch 17, Loss: 0.2297
Epoch 18, Loss: 0.2005
Epoch 19, Loss: 0.1693
Epoch 20, Loss: 0.1362
Epoch 21, Loss: 0.1019
Epoch 22, Loss: 0.0678
Epoch 23, Loss: 0.0364
Epoch 24, Loss: 0.0122
Epoch 25, Loss: 0.0005
Epoch 26, Loss: 0.0053
Epoch 27, Loss: 0.0233
Epoch 28, Loss: 0.0423
Epoch 29, Loss: 0.0521
predictions from rnn model - colabcodes

Conclusion

Recurrent Neural Networks represent one of the most important milestones in the evolution of sequence modeling, introducing the idea that neural systems can retain and update memory across time. Through both theory and hands-on implementation, this guide has shown how RNNs process sequential data, how temporal dependencies are encoded via hidden states, and why challenges such as vanishing and exploding gradients naturally arise from their recursive structure. By building a complete RNN pipeline in Python using PyTorch—from data generation and GPU-enabled training to autoregressive forecasting and visualization—you have seen how abstract mathematical concepts translate into practical, working systems.

While modern architectures like LSTMs, GRUs, and Transformers have extended and, in many cases, surpassed vanilla RNNs, the foundational ideas explored here remain essential for understanding sequence learning as a whole. RNNs continue to be valuable in resource-constrained environments, real-time systems, and educational settings where interpretability and architectural simplicity matter. Mastering these fundamentals not only strengthens intuition for advanced models but also equips you to make informed design choices across time-series analysis, natural language processing, and sequential decision-making tasks in real-world machine learning applications.

Get in touch for customized mentorship, research and freelance solutions tailored to your needs.
