Building Stateful AI Workflows with LangGraph in Python
- Samul Black

- 12 minutes ago
- 12 min read
Modern AI systems rarely succeed as single prompt–response interactions. Real-world applications such as research assistants, coding mentors, analytical agents, and decision-support systems demand state, control flow, and reliability across multiple reasoning steps. As large language models grow more capable, the challenge has shifted from generating text to orchestrating intelligence in a structured, auditable, and maintainable way.
In this article, we explore how to build stateful AI workflows with LangGraph, starting from core concepts and progressing to practical design patterns. You will learn how LangGraph supports multi-step reasoning, conditional branching, human-in-the-loop execution, and multi-agent collaboration. The focus remains on engineering robust systems that move beyond demos and toward production-ready AI applications used in research, education, and enterprise environments.

What Is LangGraph and Why It Matters
LangGraph is a framework designed to build stateful, multi-step AI workflows using large language models in a structured and deterministic manner. Instead of relying on linear prompt chains or uncontrolled agent loops, LangGraph represents AI logic as a graph, where each node performs a well-defined operation and transitions govern how execution progresses. This approach allows developers to design complex reasoning systems with clear execution paths, shared state, and predictable behavior, making LangGraph particularly suited for advanced AI engineering, research workflows, and production-grade applications.
What LangGraph Is
At its core, LangGraph provides a graph-based execution layer on top of LLMs. Each node in the graph encapsulates a specific task such as reasoning, tool usage, validation, or decision-making, while edges define how control flows between these tasks. A shared state object persists across nodes, enabling the system to accumulate knowledge, track decisions, and adapt behavior as the workflow progresses. This explicit structure makes AI systems easier to reason about, test, and maintain compared to implicit agent-driven approaches.
Importance of Stateful AI Workflows
Modern AI applications rarely operate in a single interaction. Research assistants, coding mentors, analytical agents, and decision-support systems must remember prior steps, intermediate results, and evolving goals. Stateful workflows allow AI systems to build upon earlier context rather than restarting reasoning at every step. This leads to more coherent outputs, improved reliability, and greater alignment with real-world problem-solving processes, where decisions depend on accumulated information and previous outcomes.
Problems Solved by LangGraph
LangGraph addresses several limitations found in traditional LLM pipelines. It reduces unpredictability by enforcing deterministic control flow, limits hallucinations through validation and review nodes, and simplifies debugging by making execution paths explicit. It also enables clean integration of tools, supports multi-agent collaboration through shared state, and allows human oversight at critical points. Together, these capabilities transform LLM-based systems from experimental prototypes into robust, maintainable, and scalable AI workflows suitable for real-world deployment.
Core Concepts of LangGraph
LangGraph is built around a small set of foundational concepts that together enable reliable, stateful AI workflows. By making execution structure explicit, these concepts help developers design LLM systems that are easier to reason about, extend, and operate in real-world environments.
Nodes and Edges
Nodes represent the individual units of work within a LangGraph workflow. Each node is responsible for a specific task such as generating a response, calling a tool, validating output, or making a decision. Edges define how execution moves from one node to another, forming a directed graph that models the workflow logic. This clear separation of responsibilities encourages modular design and allows complex behavior to emerge from simple, well-defined components.
Shared State and Transitions
A central feature of LangGraph is its shared state, which persists across nodes during execution. This state holds information such as intermediate outputs, decisions, tool results, and contextual data accumulated over time. Transitions between nodes can read from and update this shared state, enabling the workflow to evolve as it progresses. By structuring data flow explicitly, LangGraph ensures consistency across steps and avoids the loss of context common in stateless pipelines.
Execution Flow and Control Logic
LangGraph provides fine-grained control over how workflows execute. Instead of fixed, linear paths, execution can branch based on state values, validation results, or external inputs. This allows conditional logic, retries, fallbacks, and termination conditions to be defined directly within the graph. Such control makes AI behavior more deterministic and predictable, which is essential for debugging, evaluation, and production deployment of LLM-powered systems.
Implementing LangGraph in Python for Stateful Research Workflows
In this section, we implement LangGraph in Python by building a research-focused multi-agent workflow that reflects how complex reasoning tasks are approached in practice. The workflow is composed of specialized agents responsible for literature analysis, hypothesis generation, experiment design, and critical review, all coordinated through a shared state and explicit graph transitions. For language generation, we use the open-source mistralai/Mistral-7B-Instruct-v0.2 model, which provides strong instruction-following capabilities while remaining practical to run on modern GPUs. LangGraph serves as the orchestration layer that governs execution order, state propagation, and iterative revision, allowing the system to evolve dynamically based on reviewer feedback. The following subsections walk through this implementation step by step, covering model setup, agent definitions, graph construction, and execution flow.
1. Environment Setup and Model Initialization
We begin by setting up the core dependencies required to build and execute a LangGraph-based workflow in Python. The TypedDict utility from the typing module is used to define a structured and type-safe shared state that flows across agents, ensuring clarity and consistency as the workflow evolves. LangGraph’s StateGraph and END primitives provide the foundation for defining graph nodes, execution order, and termination conditions. For language generation, we load the open-source mistralai/Mistral-7B-Instruct-v0.2 model using the Hugging Face pipeline API, enabling instruction-following text generation without relying on proprietary services or API keys. The device_map="auto" configuration allows the model to be placed efficiently on available hardware, making the setup suitable for both local environments and GPU-backed notebooks such as Google Colab. This initialization step establishes the runtime context for the research agents defined in the subsequent sections.
from typing import TypedDict
from langgraph.graph import StateGraph, END
from transformers import pipeline
# -----------------------------
# Load Open-Source Model
# -----------------------------
generator = pipeline(
"text-generation",
model="mistralai/Mistral-7B-Instruct-v0.2",
device_map="auto"
)2. Defining the Shared Research State
At the core of the workflow is a shared state object that captures and propagates intermediate outputs across all agents in the graph. Using TypedDict, we define a structured ResearchState that explicitly specifies the data produced and consumed at each stage of the research process. This includes the initial research topic, synthesized literature context, the formulated hypothesis, the designed experiment, and the reviewer’s feedback, along with a boolean flag that signals approval or the need for revision. By enforcing a well-defined state schema, LangGraph ensures that each agent operates on a consistent and predictable set of inputs, reducing coupling between components and making the overall workflow easier to reason about, debug, and extend as additional research steps are introduced.
# -----------------------------
# Shared Research State
# -----------------------------
class ResearchState(TypedDict):
research_topic: str
literature: str
hypothesis: str
experiment: str
review: str
approved: bool3. Implementing Research Agents and Review Logic
With the shared state in place, we define a set of research agents as pure Python functions, each responsible for a distinct stage of the research workflow. The literature_agent gathers relevant prior work based on the research topic, providing contextual grounding for downstream reasoning. Building on this context, the hypothesis_agent formulates a clear and testable hypothesis, which is then passed to the experiment_agent to design an experimental procedure. The reviewer_agent evaluates the proposed experiment for clarity, rigor, and feasibility, explicitly encoding its decision into the shared state through an approval flag. This decision drives conditional routing logic via the review_router, enabling the workflow to either terminate upon approval or enter a revision loop handled by the revision_agent. Together, these agents illustrate how LangGraph supports modular reasoning, explicit feedback cycles, and decision-driven execution, closely resembling how real research processes evolve through iterative review and refinement.
# -----------------------------
# Research Agents
# -----------------------------
def literature_agent(state: ResearchState) -> ResearchState:
prompt = f"Get relevant research and prior work for this research topic:\n{state['research_topic']}"
state["literature"] = generator(prompt, max_new_tokens=150)[0]["generated_text"]
return state
def hypothesis_agent(state: ResearchState) -> ResearchState:
prompt = (
f"Based on this literature, formulate a clear hypothesis:\n"
f"{state['literature']}"
)
state["hypothesis"] = generator(prompt, max_new_tokens=100)[0]["generated_text"]
return state
def experiment_agent(state: ResearchState) -> ResearchState:
prompt = (
f"Design and describe an experiment to test this hypothesis:\n"
f"{state['hypothesis']}"
)
state["experiment"] = generator(prompt, max_new_tokens=180)[0]["generated_text"]
return state
def reviewer_agent(state: ResearchState) -> ResearchState:
prompt = (
f"Review the experiment for clarity, rigor, and feasibility. "
f"Respond with APPROVED or REVISE and justification:\n"
f"{state['experiment']}"
)
review = generator(prompt, max_new_tokens=80)[0]["generated_text"]
state["review"] = review
state["approved"] = "APPROVED" in review.upper()
return state
# -----------------------------
# Conditional Routing
# -----------------------------
def review_router(state: ResearchState):
return "end" if state["approved"] else "revise"
def revision_agent(state: ResearchState) -> ResearchState:
prompt = (
f"Revise the experiment based on reviewer feedback:\n"
f"{state['review']}\n\n"
f"Original experiment:\n{state['experiment']}"
)
state["experiment"] = generator(prompt, max_new_tokens=150)[0]["generated_text"]
return state4. Constructing the LangGraph Workflow
The final step is to assemble the research agents into an executable graph using LangGraph’s StateGraph abstraction. We begin by initializing the graph with the shared ResearchState, ensuring that all nodes operate on a consistent state schema. Each agent is then registered as a named node, making its role in the workflow explicit and reusable. The execution flow is defined through directed edges that connect literature analysis to hypothesis generation, experiment design, and subsequent review. To model iterative refinement, conditional edges are introduced at the reviewer node, allowing the graph to either terminate when the experiment is approved or route execution to a revision step before re-evaluation. Once the graph structure is defined, calling compile() produces a runnable workflow that encodes both the reasoning steps and the control logic in a clear, declarative form.
# -----------------------------
# Graph Construction
# -----------------------------
graph = StateGraph(ResearchState)
graph.add_node("literature", literature_agent)
graph.add_node("hypothesis", hypothesis_agent)
graph.add_node("experiment", experiment_agent)
graph.add_node("reviewer", reviewer_agent)
graph.add_node("revise", revision_agent)
graph.set_entry_point("literature")
graph.add_edge("literature", "hypothesis")
graph.add_edge("hypothesis", "experiment")
graph.add_edge("experiment", "reviewer")
graph.add_conditional_edges(
"reviewer",
review_router,
{
"revise": "revise",
"end": END
}
)
graph.add_edge("revise", "reviewer")
workflow = graph.compile()5. Executing the Workflow and Inspecting Results
Once the graph is compiled, the workflow is executed by invoking it with an initial state that specifies the research topic and initializes all other fields. LangGraph then drives execution automatically, passing the shared state through each agent according to the defined graph structure and applying conditional routing at the review stage until a termination condition is met. After execution completes, the final state contains all intermediate and final research artifacts produced by the agents. To present these results clearly, a lightweight output formatter is used to display only the relevant sections in a structured, human-readable form, along with an explicit approval status. This separation between execution and presentation keeps the workflow logic clean while making the results easy to interpret, share, and include directly in notebooks or blog demonstrations.
# -----------------------------
# Run Workflow
# -----------------------------
final_state = workflow.invoke({
"research_topic": "multi-agent LLM systems",
"literature": "",
"hypothesis": "",
"experiment": "",
"review": "",
"approved": False
})
def print_clean_output(state):
ordered_sections = [
("research topic", "research_topic"),
("Literature Review", "literature"),
("Hypothesis", "hypothesis"),
("Experiment Design", "experiment"),
("Reviewer Feedback", "review"),
]
print("\n" + "=" * 60)
print("RESEARCH WORKFLOW RESULT")
print("=" * 60)
for title, key in ordered_sections:
value = state.get(key, "").strip()
if value:
print(f"\n{title}")
print("-" * len(title))
print(value)
print("\nStatus:", "APPROVED ✅" if state.get("approved") else "REVISION REQUIRED ❌")
print("=" * 60)
print_clean_output(final_state)
Output:
============================================================
RESEARCH WORKFLOW RESULT
============================================================
research topic
--------------
multi-agent LLM systems
Literature Review
-----------------
Get relevant research and prior work for this research topic:
multi-agent LLM systems
- "Multiagent Learning in Large-Scale Mobile Ad-hoc Networks: A Survey" by Li et al. (2016). This paper provides a comprehensive survey of multiagent learning methods used in large-scale mobile ad-hoc networks. It covers various approaches, including cooperative learning, competitive learning, and multi-agent reinforcement learning. The authors also discuss the challenges and future directions of multiagent learning in this context.
- "Multi-Agent Learning for Wireless Network Management: Recent Advances and Challenges" by Al-Anazi et al. (2019). This paper presents a survey of multiagent learning techniques for wireless network management, focusing on the challenges and recent
Hypothesis
----------
Based on this literature, formulate a clear hypothesis:
Get relevant research and prior work for this research topic:
multi-agent LLM systems
- "Multiagent Learning in Large-Scale Mobile Ad-hoc Networks: A Survey" by Li et al. (2016). This paper provides a comprehensive survey of multiagent learning methods used in large-scale mobile ad-hoc networks. It covers various approaches, including cooperative learning, competitive learning, and multi-agent reinforcement learning. The authors also discuss the challenges and future directions of multiagent learning in this context.
- "Multi-Agent Learning for Wireless Network Management: Recent Advances and Challenges" by Al-Anazi et al. (2019). This paper presents a survey of multiagent learning techniques for wireless network management, focusing on the challenges and recent advances. The authors discuss various applications, such as resource allocation and channel selection, and provide an overview of popular algorithms, including Q-learning, deep reinforcement learning, and deep Q-networks.
- "Multi-Agent Deep Reinforcement Learning for Wireless Resource Allocation" by Wang et al. (2019). This paper proposes a multi-agent deep reinforcement learning framework for wireless resource allocation in the Internet of Things (IoT) networks. The authors
Experiment Design
-----------------
Design and describe an experiment to test this hypothesis:
Based on this literature, formulate a clear hypothesis:
Get relevant research and prior work for this research topic:
multi-agent LLM systems
- "Multiagent Learning in Large-Scale Mobile Ad-hoc Networks: A Survey" by Li et al. (2016). This paper provides a comprehensive survey of multiagent learning methods used in large-scale mobile ad-hoc networks. It covers various approaches, including cooperative learning, competitive learning, and multi-agent reinforcement learning. The authors also discuss the challenges and future directions of multiagent learning in this context.
- "Multi-Agent Learning for Wireless Network Management: Recent Advances and Challenges" by Al-Anazi et al. (2019). This paper presents a survey of multiagent learning techniques for wireless network management, focusing on the challenges and recent advances. The authors discuss various applications, such as resource allocation and channel selection, and provide an overview of popular algorithms, including Q-learning, deep reinforcement learning, and deep Q-networks.
- "Multi-Agent Deep Reinforcement Learning for Wireless Resource Allocation" by Wang et al. (2019). This paper proposes a multi-agent deep reinforcement learning framework for wireless resource allocation in the Internet of Things (IoT) networks. The authors use deep neural networks to model the network environment and design decentralized Q-learning algorithms for multiple agents. They evaluate the performance of their approach through simulations and compare it to other methods.
Based on the above literature, the following hypothesis can be formulated:
Hypothesis: Multi-agent deep reinforcement learning (MADRL) outperforms traditional centralized approaches for wireless resource allocation in large-scale mobile ad-hoc networks, leading to improved network efficiency and fairness.
To test this hypothesis, the following experiment can be designed:
1. Set up a large-scale mobile ad-hoc network with multiple nodes, each representing a user or a base station.
2. Implement a centralized resource allocation algorithm, such as water-filling or proportional fairness, as a baseline.
3. Implement a multi
Reviewer Feedback
-----------------
Review the experiment for clarity, rigor, and feasibility. Respond with APPROVED or REVISE and justification:
Design and describe an experiment to test this hypothesis:
Based on this literature, formulate a clear hypothesis:
Get relevant research and prior work for this research topic:
multi-agent LLM systems
- "Multiagent Learning in Large-Scale Mobile Ad-hoc Networks: A Survey" by Li et al. (2016). This paper provides a comprehensive survey of multiagent learning methods used in large-scale mobile ad-hoc networks. It covers various approaches, including cooperative learning, competitive learning, and multi-agent reinforcement learning. The authors also discuss the challenges and future directions of multiagent learning in this context.
- "Multi-Agent Learning for Wireless Network Management: Recent Advances and Challenges" by Al-Anazi et al. (2019). This paper presents a survey of multiagent learning techniques for wireless network management, focusing on the challenges and recent advances. The authors discuss various applications, such as resource allocation and channel selection, and provide an overview of popular algorithms, including Q-learning, deep reinforcement learning, and deep Q-networks.
- "Multi-Agent Deep Reinforcement Learning for Wireless Resource Allocation" by Wang et al. (2019). This paper proposes a multi-agent deep reinforcement learning framework for wireless resource allocation in the Internet of Things (IoT) networks. The authors use deep neural networks to model the network environment and design decentralized Q-learning algorithms for multiple agents. They evaluate the performance of their approach through simulations and compare it to other methods.
Based on the above literature, the following hypothesis can be formulated:
Hypothesis: Multi-agent deep reinforcement learning (MADRL) outperforms traditional centralized approaches for wireless resource allocation in large-scale mobile ad-hoc networks, leading to improved network efficiency and fairness.
To test this hypothesis, the following experiment can be designed:
1. Set up a large-scale mobile ad-hoc network with multiple nodes, each representing a user or a base station.
2. Implement a centralized resource allocation algorithm, such as water-filling or proportional fairness, as a baseline.
3. Implement a multi-agent deep reinforcement learning (MADRL) framework, such as decentralized Q-learning or actor-critic methods, for resource allocation among the nodes.
4. Configure the network environment and initial conditions for both algorithms, ensuring fairness and similar network conditions.
5. Run simulations for both algorithms and collect data on network efficiency (throughput, latency, and
Status: APPROVED ✅
============================================================6. Visualizing the Workflow Graph
To make the execution structure easier to understand, LangGraph provides built-in support for visualizing workflows as graphs. By rendering the compiled workflow using a Mermaid diagram, we can clearly inspect how agents are connected, where conditional routing occurs, and how iterative loops are formed within the system. The visualization highlights the overall control flow—from literature analysis through hypothesis generation and experiment design to reviewer-driven revision—making the orchestration logic immediately apparent. This graphical view is especially useful for debugging, teaching, and documentation, as it transforms abstract execution logic into an explicit, inspectable structure that mirrors how the research workflow operates at runtime.
from IPython.display import Image, display
display(Image(workflow.get_graph().draw_mermaid_png()))Output:

Conclusion
LangGraph demonstrates how structured orchestration transforms LLMs from isolated text generators into reliable, stateful agents capable of handling complex, multi-step reasoning tasks. By representing AI workflows as graphs with explicit nodes, shared state, and conditional transitions, developers can build systems that track context, validate outputs, and support iterative improvement. The research agent example illustrates how literature review, hypothesis formulation, experiment design, and reviewer evaluation can be seamlessly integrated into a single workflow that remains auditable, reproducible, and production-ready.
Using open-source models such as mistralai/Mistral-7B-Instruct-v0.2 within this framework shows that stateful AI workflows are not just theoretical—they can be implemented practically on accessible hardware while maintaining modularity and scalability. Whether applied to research, education, or enterprise AI solutions, LangGraph provides a blueprint for moving beyond ad-hoc prompt chains toward structured, maintainable, and trustworthy multi-agent systems.
In essence, LangGraph shifts the focus from generating outputs to orchestrating intelligence—enabling modern AI systems to reason, iterate, and deliver results in a controlled and explainable manner.




