Functional Modes of Large Language Models (LLMs) – Explained with Gemini API Examples
- Samul Black

- Oct 16
- 10 min read
Large Language Models (LLMs) are no longer limited to generating text. They can reason, code, perceive, plan, and act. Models like Gemini (by Google DeepMind) represent a new generation of multi-functional, multimodal AI systems — capable of operating in diverse modes ranging from text generation and code reasoning to function calling and autonomous agentic behavior.
This article explores the main functional modes of LLMs, focusing on how they work conceptually and how developers can use the Gemini API to implement each mode.
By the end of this blog, you’ll have learned:
The core operational modes of modern LLMs
The theoretical foundations behind each mode
Step-by-step Python code examples for Gemini API
How to integrate multiple modes (like embeddings + multimodal + tool calling)
Real-world use cases and best practices
This tutorial blends deep theoretical explanation with hands-on coding, ideal for developers who want to move from “chatbot-level” AI to system-level intelligent applications.

Theoretical Aspects: Understanding Functional Modes of LLMs
To truly harness the power of modern LLMs, it’s essential to understand how they operate internally. Functional modes define the behavior and capabilities of a model — from generating natural language to executing complex, multi-step tasks. In this section, we’ll explore the theory behind each mode, the architectural principles that enable them, and how these concepts translate into real-world AI applications.
1. What Are Functional Modes?
A functional mode defines how an LLM processes information and interacts with users or systems. It represents a behavioural layer built over the model’s core transformer architecture.
An LLM like Gemini can operate in several modes — for example,
reading and generating natural language (Text Mode),
writing code (Code Mode),
transforming text into numerical embeddings (Embeddings Mode),
understanding images and audio (Multimodal Mode),
producing structured outputs and tool calls (Function Calling Mode), and
reasoning over long documents (Long Context Mode).
Each mode uses the same model weights but activates different pathways of reasoning and input-output formatting.
2. Architecture Behind These Modes
Let’s briefly explore how a model like Gemini supports multiple modes internally:
(a) Shared Transformer Backbone
At the core, Gemini uses a multimodal transformer that processes tokens from text, image, or audio streams in a unified embedding space. Each token — textual, visual, or auditory — is represented as a vector, allowing cross-modal reasoning.
(b) Adapters and Specialized Heads
Different functional modes correspond to specialised “heads” or “adapters” attached to the main transformer. For instance:
Text head → optimized for natural language understanding and generation
Code head → fine-tuned on code corpora (GitHub, StackOverflow)
Embedding head → outputs dense vector representations
Multimodal encoders → handle non-text input like images
(c) Function Calling and Structured Output Layer
Modern LLM APIs support structured outputs using JSON schemas. This allows models to interface with external tools safely, enabling action-taking modes.
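As a concrete illustration, a schema for the email example used later in this article could be expressed as a plain Python dictionary like the one below. The exact field layout here is an assumption for illustration, not a Gemini API requirement; the point is that the model is constrained to emit fields your code can validate.

# Hypothetical JSON schema describing the structured output we want the model
# to produce. The field names ("function", "arguments") mirror the examples later on.
email_action_schema = {
    "type": "object",
    "properties": {
        "function": {"type": "string", "enum": ["send_email"]},
        "arguments": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
    "required": ["function", "arguments"],
}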
3. Gemini vs Traditional LLMs
Feature | Traditional LLM (GPT-3, LLaMA) | Gemini (Multimodal LLM)
--- | --- | ---
Input Types | Text only | Text, Image, Audio, Video
Modes Supported | Text, Code | Text, Code, Embeddings, Multimodal, Function Calling, Long Context
Context Length | Up to 32K tokens | Up to 1 million tokens
Tool Use | Limited | Natively supports structured function calls
Architecture | Text-based transformer | Multimodal transformer + cross-attention layers
4. Developer API Structure
Gemini’s Python API follows a simple structure:
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # authenticate once per process

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Hello Gemini!")
print(response.text)

Different model variants correspond to specific modes:
gemini-1.5-pro → Text, Code, Multimodal
models/embedding-001 → Embeddings generation
Text Mode – The Foundation of Reasoning and Communication
Text mode is the heart of every LLM. It enables language comprehension, generation, summarization, translation, and reasoning.
Gemini’s text mode can handle multi-turn conversations, logical queries, creative writing, and even structured tasks using plain text input.
Code Example
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")
prompt = """
Explain reinforcement learning in simple terms.
Provide a real-world analogy and an example in Python.
"""
response = model.generate_content(prompt)
print(response.text)
Here, Gemini interprets the query, reasons over the topic, and outputs a coherent answer. Internally, the model uses self-attention to weigh different parts of the input and generate contextually consistent text.

Use cases:
Conversational agents and chatbots
Educational tutors
Report summarisation
Email or document generation
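Text mode also covers the multi-turn conversations mentioned above. In the google.generativeai SDK, a chat session carries the running history for you; a minimal sketch:

import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")

# A chat session automatically keeps the conversation history between turns
chat = model.start_chat(history=[])

first = chat.send_message("Give me a one-paragraph overview of reinforcement learning.")
print(first.text)

follow_up = chat.send_message("Now compress that explanation into a single sentence.")
print(follow_up.text)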
Code Mode – LLMs as Intelligent Developers
LLMs trained on large code datasets can act as programming assistants. Gemini’s code mode supports multiple programming languages, algorithmic reasoning, and even explaining existing code.
Code Example
code_request = """
Write a Python function that performs linear regression
from scratch using NumPy. Include comments.
"""
response = model.generate_content(code_request)
print(response.text)
Generated Output Example
import numpy as np

def linear_regression(X, y):
    X_b = np.c_[np.ones((len(X), 1)), X]  # Add bias term
    theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    return theta

# Example use
X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])
print(linear_regression(X, y))
Use Cases
AI-assisted coding
Debugging support
Code documentation generation
Learning programming concepts interactively
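Code mode also works in the other direction: you can paste existing code and ask for an explanation or documentation. A minimal sketch, reusing the model object defined earlier; the snippet itself is just an example:

snippet = """
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
"""

explain_request = f"Explain what this Python function does and add a docstring:\n{snippet}"
response = model.generate_content(explain_request)
print(response.text)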
Embeddings Mode – Semantic Search and Knowledge Integration
Embeddings mode converts text or multimodal input into high-dimensional numeric vectors that represent semantic meaning. This enables similarity search, RAG (Retrieval-Augmented Generation), and clustering.
Code Example
text = "Machine learning models that learn from experience."
embedding = genai.embed_content(model="models/embedding-001", content=text)
print(embedding["embedding"][:10])  # Display first 10 dimensions

Each text input is mapped to a point in an n-dimensional semantic space. Two similar sentences will produce vectors that are close in distance, enabling context retrieval in chatbots or AI assistants.

Use case example: RAG with Gemini
Convert all your documents into embeddings
Store them in a vector database like Pinecone or FAISS
Retrieve the top relevant chunks using cosine similarity (a minimal sketch follows this list)
Feed them back into Gemini for context-aware response generation
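The cosine-similarity step can be sketched with plain NumPy once you have embeddings. The helper below is an illustration, not a Gemini API feature, and assumes the SDK is already configured with an API key:

import numpy as np
import google.generativeai as genai

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

documents = [
    "Gemini supports multimodal inputs such as images and audio.",
    "Cosine similarity measures the angle between two vectors.",
]

# Embed the documents once, then embed the query at question time
doc_vecs = [
    genai.embed_content(model="models/embedding-001", content=d)["embedding"]
    for d in documents
]
query_vec = genai.embed_content(
    model="models/embedding-001", content="Which model can read images?"
)["embedding"]

# Rank documents by similarity and keep the best match as retrieval context
scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
best_doc = documents[int(np.argmax(scores))]
print(best_doc)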
Multimodal Mode – Unified Understanding of Text, Image, and Audio
Multimodal LLMs like Gemini can perceive multiple input types at once. This unlocks applications like image captioning, document parsing, diagram reasoning, and visual Q&A.
Code Example
with open("chart.png", "rb") as f:
image_bytes = f.read()
response = model.generate_content([
{"role": "user", "parts": [
{"text": "Explain what this chart represents:"},
{"file_data": {"mime_type": "image/png", "data": image_bytes}}
]}
])
print(response.text)Gemini uses an internal cross-attention mechanism to connect visual and textual tokens, allowing reasoning like:
“The x-axis shows time, and the y-axis shows sales — the upward trend indicates growth.”
Use Cases
Visual question answering
Automatic document analysis (charts, receipts, PDFs)
Educational visual tutoring
Content moderation and captioning
Function Calling Mode – Connecting Language Models with Real-World Actions
The Function Calling mode is one of the most powerful evolutions in modern LLMs. It allows models like Gemini to go beyond text generation and produce structured, machine-readable outputs — typically in JSON format — that can be used to trigger API calls, database queries, or system actions.
Instead of only returning text like “Sure, I’ll send an email,” the model outputs something like:
{
  "function": "send_email",
  "arguments": {
    "to": "samultechie@gmail.com",
    "subject": "Meeting Rescheduled",
    "body": "The meeting is rescheduled to 4 PM."
  }
}

This structured reasoning capability bridges natural language understanding with programmatic control — enabling AI-driven automation systems and intelligent backend integrations.
At the architecture level, function calling is achieved through structured prompting and fine-tuning with schema-conditioned outputs.
When the model detects that an instruction can be resolved via an API or a system tool, it formats its output according to a JSON schema or tool definition you provide.
For example:
In a booking bot, functions might include book_flight, check_weather, cancel_reservation (see the sketch below).
In a developer assistant, functions might include run_code, search_docs, or analyze_error.
This mode turns an LLM into an intelligent controller for external systems.
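One lightweight way to describe the available tools is to embed their definitions in the prompt as plain Python data. The structure below is a hypothetical sketch using the booking-bot function names from the list above, not an official Gemini tool format:

import json

# Hypothetical tool definitions for a booking bot; in a real system each entry
# would map to an actual backend function.
tools = [
    {"name": "book_flight", "parameters": ["origin", "destination", "date"]},
    {"name": "check_weather", "parameters": ["city", "date"]},
    {"name": "cancel_reservation", "parameters": ["booking_id"]},
]

system_prompt = (
    "You are a booking assistant. When the user asks for an action, reply ONLY "
    "with a JSON object of the form {\"function\": ..., \"arguments\": {...}}, "
    "choosing from these tools:\n"
    f"{json.dumps(tools, indent=2)}"
)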
Gemini API Example: Function Calling
A straightforward way to implement this with the Gemini Python SDK is structured prompting: you instruct Gemini to output JSON objects that your Python app can parse and route to an external function.
import google.generativeai as genai
import json

model = genai.GenerativeModel("gemini-1.5-pro")

prompt = """
You are a task automation assistant.
Whenever I ask to perform an action, respond with a JSON object
containing 'function' and 'arguments'.
User: Send an email to Samul saying the meeting is at 4 PM.
"""

response = model.generate_content(prompt)

try:
    data = json.loads(response.text)
    print(f"Function to execute: {data['function']}")
    print(f"Arguments: {data['arguments']}")
except json.JSONDecodeError:
    print("Raw output:", response.text)

Explanation
The model interprets the natural language command.
It follows your instruction to return structured data.
Your backend parses this JSON and routes it to an execution handler (e.g., email sender or calendar updater).
This forms the backbone of agentic LLM systems, where models act as the “brains” and external functions as the “hands.”
Real-World Use Cases
Task automation bots: Automate calendar scheduling, emails, or Slack updates.
Conversational RPA (Robotic Process Automation): Gemini acts as the reasoning layer on top of RPA workflows.
Customer service orchestration: Dynamically call CRM APIs or ticketing systems.
Data pipelines: Execute SQL queries or API requests based on natural queries.
Example: Email Automation
def send_email(to, subject, body):
    print(f"Sending Email to: {to}\nSubject: {subject}\nBody: {body}")

response_json = {
    "function": "send_email",
    "arguments": {
        "to": "samuel@colabcodes.com",
        "subject": "Project Update",
        "body": "The deployment is complete and live on the server."
    }
}

# Execute dynamically
if response_json["function"] == "send_email":
    args = response_json["arguments"]
    send_email(args["to"], args["subject"], args["body"])

In a production workflow, you would parse Gemini’s JSON output and route it automatically using a dispatcher or orchestrator class, as sketched below.
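A minimal dispatcher sketch, reusing the send_email function and response_json object from the example above; the handlers mapping is hypothetical and would grow with each new tool:

# Map function names returned by the model to real implementations
handlers = {
    "send_email": send_email,
}

def dispatch(action):
    # Route a parsed {"function": ..., "arguments": {...}} object to its handler
    name = action.get("function")
    if name not in handlers:
        raise ValueError(f"Unknown function requested by the model: {name}")
    return handlers[name](**action.get("arguments", {}))

dispatch(response_json)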
Best Practices
Always define a clear schema and instruct Gemini to follow it strictly.
Validate the output with json.loads() before executing.
Use fallback prompts if Gemini returns free text instead of JSON (see the helper sketch after this list).
Combine this mode with Embeddings Mode for context-aware actions.
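The validation and fallback advice can be combined into one small helper. A minimal sketch; the retry prompt wording is just an illustration:

import json

def parse_json_or_retry(model, prompt, retries=1):
    # Ask the model for JSON; if parsing fails, re-prompt asking for JSON only
    response = model.generate_content(prompt)
    for attempt in range(retries + 1):
        try:
            return json.loads(response.text)
        except json.JSONDecodeError:
            if attempt == retries:
                return None  # Give up; the caller handles the free-text case
            response = model.generate_content(
                "Return ONLY a valid JSON object, with no extra text, for this request:\n" + prompt
            )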
Long Context Mode – Understanding and Reasoning Over Massive Inputs
Traditional LLMs could handle a few thousand tokens, which limited their usefulness for document-heavy tasks. Gemini 1.5 revolutionises this with up to 1 million tokens of context, allowing the model to process entire codebases, research papers, or legal documents in a single query.
This is called Long Context Mode. Long Context enables new categories of AI systems:
Research summarization (analyzing multiple academic papers)
Enterprise intelligence (reasoning over reports, contracts, or logs)
Codebase understanding (full repository analysis)
Memory-enhanced chatbots (long-term context retention)
It essentially turns an LLM into a persistent cognitive engine capable of in-depth reasoning across extensive information.
To support long context efficiently, Gemini employs:
Sparse Attention Mechanisms – the model doesn’t attend to every token but learns to focus selectively.
Segmented Memory Encoding – divides long text into sections with hierarchical summaries.
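From the developer's side, using long context is mostly a matter of passing the full document in the prompt and letting the model's context window absorb it. A minimal sketch, assuming a local report.txt file and a 1.5-series model with a large context window:

import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")

# Load a large document (assumed local file) and ask for a grounded summary
with open("report.txt", "r", encoding="utf-8") as f:
    report = f.read()

prompt = (
    "Read the full report below and produce:\n"
    "1. A five-bullet executive summary\n"
    "2. The three biggest risks it mentions\n\n"
    f"REPORT:\n{report}"
)

response = model.generate_content(prompt)
print(response.text)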
Integrating LLM Modes & Real-World AI Projects with Gemini API
By now, you understand the individual functional modes of LLMs like Gemini: Text, Code, Embeddings, Multimodal, Function Calling, Long Context, and Agentic. However, real-world AI systems rarely use just one mode in isolation. The true power of LLMs emerges when multiple modes are orchestrated together.
This section explores:
Mode integration for complex workflows
RAG (Retrieval-Augmented Generation) with Gemini
Multimodal + Function Calling pipelines
Gemini vs GPT: mode capabilities and trade-offs
Mini-project examples to illustrate practical applications
Best practices for production-ready AI systems
Integrating Multiple Functional Modes
Integrating multiple functional modes allows Large Language Models (LLMs) to operate seamlessly across reasoning, generation, coding, and tool-using capabilities within a single workflow. This fusion enables dynamic task handling — for example, a model can analyze data, generate insights, and execute function calls in one coherent interaction.
Using LLMs in multi-step workflows unlocks advanced applications, such as:
AI research assistants that retrieve papers, summarize, and plan follow-ups
Customer support systems that combine text reasoning, knowledge base search, and API-driven actions
Autonomous agents that analyze multimodal data and take structured actions
Each mode acts as a modular component that can be orchestrated in a pipeline.
Example Architecture: RAG + Function Calling + Multimodal
Input: User asks a question, optionally including an image or document
Embeddings Mode: Convert text/document to vectors for semantic search
RAG: Retrieve top relevant documents or knowledge snippets
Long Context Mode: Summarize or reason over retrieved content
Function Calling Mode: Generate structured outputs for action (like sending an email or storing data)
Multimodal Mode: If the input includes an image, process it alongside text
This architecture forms a flexible, multi-modal AI system capable of reasoning, acting, and interacting intelligently.
Python Example: Multi-Mode Pipeline
import google.generativeai as genai
import json

model = genai.GenerativeModel("gemini-1.5-pro")

# Step 1: Convert user query to embedding
query = "Summarize the sales trends from this chart and notify the team."
query_embedding = genai.embed_content(model="models/embedding-001", content=query)
# Step 2: Retrieve relevant documents (mock retrieval)
documents = [
    "Q1 Sales increased by 20%. Q2 Sales show slight decline...",
    "The marketing campaign improved engagement by 15%..."
]
# Step 3: Summarize retrieved documents
context = "\n".join(documents)
summary_prompt = f"Summarize the following data and generate a JSON object for notifications:\n{context}"
summary_response = model.generate_content(summary_prompt)
# Step 4: Parse JSON and trigger function (Function Calling Mode)
try:
    action_data = json.loads(summary_response.text)
    print(action_data)
except json.JSONDecodeError:
    print("Output is not JSON, raw response:", summary_response.text)

This dummy pipeline is an example of how multiple Gemini modes can be chained in a real-world workflow.
Retrieval-Augmented Generation (RAG) with Gemini
RAG combines embedding-based retrieval with LLM reasoning. Gemini supports RAG pipelines by integrating:
Embeddings for vector search
Text summarization / generation
Function calls for structured outputs
Step-by-Step RAG Workflow
Document Embedding: Convert knowledge base content to vectors.
Semantic Retrieval: Use similarity metric to fetch the most relevant documents.
Contextual Generation: Feed the retrieved documents as additional context to Gemini.
Optional Function Calling: Output results in structured format for automation.
RAG Example with Gemini API
# Mock embeddings and retrieval
docs = [
    "Gemini can process text, images, and code.",
    "It supports function calling and agentic reasoning for complex tasks."
]

user_query = "Explain Gemini's functional modes in a structured summary."

# Step 1: Convert query to embedding
query_vector = genai.embed_content(model="models/embedding-001", content=user_query)

# Step 2: Retrieve relevant document (mock retrieval)
retrieved_doc = docs[0]  # Normally, similarity search selects the best match
# Step 3: Generate response with context
rag_prompt = f"""
You are an AI assistant. Use the following context to answer the user query.
Context: {retrieved_doc}
User Query: {user_query}
Provide a structured JSON summary.
"""
response = model.generate_content(rag_prompt)
print(response.text)
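To make step 2 real instead of mocked, embed each document and pick the one closest to the query. This sketch reuses the cosine_similarity helper from the Embeddings section and the query_vector computed above:

# Replace the mock retrieval with an actual similarity search.
# cosine_similarity is the NumPy helper sketched earlier in this article.
doc_vectors = [
    genai.embed_content(model="models/embedding-001", content=d)["embedding"]
    for d in docs
]
scores = [cosine_similarity(query_vector["embedding"], v) for v in doc_vectors]
retrieved_doc = docs[scores.index(max(scores))]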
Multimodal + Function Calling Pipeline
Gemini allows combining images, text, and function calls in one pipeline.
Example: Chart Analysis and Notification
with open("sales_chart.png", "rb") as f:
chart_bytes = f.read()
prompt = [
{"role": "user", "parts": [
{"text": "Analyze this sales chart and suggest actions for the marketing team."},
{"file_data": {"mime_type": "image/png", "data": chart_bytes}}
]}
]
response = model.generate_content(prompt)
print(response.text)The LLM can return a JSON object specifying actionable insights, which your backend can execute.
Conclusion
Large Language Models like Gemini have evolved far beyond simple text generation. Today, they are multi-functional cognitive engines capable of understanding, reasoning, coding, perceiving, planning, and acting. By leveraging different functional modes — Text, Code, Embeddings, Multimodal, Function Calling, Long Context, and Agentic — developers can build AI systems that are not only responsive but also autonomous, context-aware, and action-driven.
The key takeaway is that each mode is a tool, and the true power of modern LLMs lies in integrating these modes to create sophisticated pipelines:
Text Mode powers reasoning and conversation.
Code Mode enables AI-assisted programming.
Embeddings Mode drives semantic search and RAG workflows.
Multimodal Mode allows LLMs to perceive and analyze images, text, and audio together.
Function Calling Mode bridges the model with real-world actions.
Long Context Mode supports in-depth reasoning over massive documents.
Agentic Mode orchestrates multi-step planning and execution.
By combining these capabilities, developers can create research assistants, workflow automation systems, intelligent chatbots, and autonomous AI agents that operate at scale and complexity previously unimaginable.
Gemini API offers a unified, developer-friendly ecosystem for harnessing all these modes — from simple text generation to complex, multi-modal, agentic pipelines. Understanding and leveraging these functional modes allows you to unlock the full potential of modern AI, bridging the gap between human-like reasoning and practical, real-world action.




