
Functional Modes of Large Language Models (LLMs) – Explained with Gemini API Examples

  • Writer: Samul Black
  • Oct 16
  • 10 min read

Large Language Models (LLMs) are no longer limited to generating text. They can reason, code, perceive, plan, and act. Frameworks like Gemini (by Google DeepMind) represent a new generation of multi-functional, multimodal AI systems — capable of operating in diverse modes ranging from text generation and code reasoning to function calling and autonomous agentic behavior.

This article explores the main functional modes of LLMs, focusing on how they work conceptually and how developers can use the Gemini API to implement each mode.

By the end of this blog, you’ll have learned:


  • The core operational modes of modern LLMs

  • The theoretical foundations behind each mode

  • Step-by-step Python code examples for Gemini API

  • How to integrate multiple modes (like embeddings + multimodal + tool calling)

  • Real-world use cases and best practices


This tutorial blends deep theoretical explanation with hands-on coding, ideal for developers who want to move from “chatbot-level” AI to system-level intelligent applications.



Theoretical Aspects: Understanding Functional Modes of LLMs

To truly harness the power of modern LLMs, it’s essential to understand how they operate internally. Functional modes define the behavior and capabilities of a model — from generating natural language to executing complex, multi-step tasks. In this section, we’ll explore the theory behind each mode, the architectural principles that enable them, and how these concepts translate into real-world AI applications.


1. What Are Functional Modes?

A functional mode defines how an LLM processes information and interacts with users or systems. It represents a behavioural layer built over the model’s core transformer architecture.

An LLM like Gemini can operate in several modes — for example,


  1. reading and generating natural language (Text Mode),

  2. writing code (Code Mode),

  3. transforming text into numerical embeddings (Embeddings Mode),

  4. understanding images and audio (Multimodal Mode),

  5. performing structured tasks (Function Calling Mode), and

  6. reasoning over long documents (Long Context Mode).


Each mode uses the same model weights but activates different pathways of reasoning and input-output formatting.


2. Architecture Behind These Modes

Let’s briefly explore how a model like Gemini supports multiple modes internally:


(a) Shared Transformer Backbone

At the core, Gemini uses a multimodal transformer that processes tokens from text, image, or audio streams in a unified embedding space. Each token, whether textual, visual, or auditory, is represented as a vector, allowing cross-modal reasoning.


(b) Adapters and Specialized Heads

Different functional modes correspond to specialised “heads” or “adapters” attached to the main transformer. For instance:


  1. Text head → optimized for natural language understanding and generation

  2. Code head → fine-tuned on code corpora (GitHub, Stack Overflow)

  3. Embedding head → outputs dense vector representations

  4. Multimodal encoders → handle non-text input like images


(c) Function Calling and Structured Output Layer

Modern LLM APIs support structured outputs using JSON schemas. This allows models to interface with external tools safely, enabling action-taking modes.
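
As a rough illustration, a tool is usually described by a JSON schema that names the function and constrains its arguments. The send_email schema below is a hypothetical example written as a plain Python dict, not a definition taken from any particular SDK:

# Hypothetical tool schema (illustration only, not tied to a specific SDK).
SEND_EMAIL_TOOL = {
    "name": "send_email",
    "description": "Send an email on behalf of the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient address"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}

The model is prompted (or natively constrained) to emit arguments that validate against the parameters block, which is what makes the output safe to hand to real code.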


3. Gemini vs Traditional LLMs

  • Input Types: Traditional LLMs (GPT-3, LLaMA) accept text only; Gemini accepts text, images, audio, and video.

  • Modes Supported: Traditional LLMs cover text and code; Gemini adds embeddings, multimodal understanding, function calling, and long context.

  • Context Length: Traditional LLMs typically handle 2K-32K tokens; Gemini 1.5 supports up to 1 million tokens.

  • Tool Use: Limited in traditional LLMs; Gemini natively supports structured function calls.

  • Architecture: Traditional LLMs use a text-based transformer; Gemini uses a multimodal transformer with cross-attention layers.

4. Developer API Structure

Gemini’s Python SDK (the google-generativeai package) follows a simple structure:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # or set the GOOGLE_API_KEY environment variable

model = genai.GenerativeModel("gemini-1.5-pro")

response = model.generate_content("Hello Gemini!")
print(response.text)

Different model variants correspond to specific modes:


  • gemini-1.5-pro → Text, Code, Multimodal

  • models/embedding-001 → Embeddings generation
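
If you are unsure which variants your API key can access, the google-generativeai SDK provides a model listing call; the short sketch below prints each model's name and the generation methods it supports.

import google.generativeai as genai

# Assumes the API key is configured, e.g. via genai.configure(api_key="YOUR_API_KEY")
for m in genai.list_models():
    # supported_generation_methods shows whether a model offers
    # generateContent, embedContent, and so on.
    print(m.name, m.supported_generation_methods)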


Text Mode – The Foundation of Reasoning and Communication

Text mode is the heart of every LLM. It enables language comprehension, generation, summarization, translation, and reasoning.

Gemini’s text mode can handle multi-turn conversations, logical queries, creative writing, and even structured tasks using plain text input.


Code Example

import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")

prompt = """
Explain reinforcement learning in simple terms. 
Provide a real-world analogy and an example in Python.
"""

response = model.generate_content(prompt)
print(response.text)

Here, Gemini interprets the query, reasons over the topic, and outputs a coherent answer. Internally, the model uses self-attention to weigh different parts of the input and generate contextually consistent text. Use cases:


  1. Conversational agents and chatbots

  2. Educational tutors

  3. Report summarisation

  4. Email or document generation


Code Mode – LLMs as Intelligent Developers

LLMs trained on large code datasets can act as programming assistants. Gemini’s code mode supports multiple programming languages, algorithmic reasoning, and even explaining existing code.


Code Example

code_request = """
Write a Python function that performs linear regression 
from scratch using NumPy. Include comments.
"""

response = model.generate_content(code_request)
print(response.text)

Generated Output Example

import numpy as np

def linear_regression(X, y):
    X_b = np.c_[np.ones((len(X), 1)), X]  # Add bias term
    theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    return theta

# Example use
X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])
print(linear_regression(X, y))

Use Cases

  • AI-assisted coding

  • Debugging support

  • Code documentation generation

  • Learning programming concepts interactively


Embeddings Mode – Semantic Search and Knowledge Integration

Embeddings mode converts text or multimodal input into high-dimensional numeric vectors that represent semantic meaning. This enables similarity search, RAG (Retrieval-Augmented Generation), and clustering.


Code Example

import google.generativeai as genai

text = "Machine learning models that learn from experience."
embedding = genai.embed_content(model="models/embedding-001", content=text)
print(embedding["embedding"][:10])  # Display the first 10 dimensions

Each text input is mapped to a point in an n-dimensional semantic space. Two similar sentences will produce vectors that are close in distance, enabling context retrieval in chatbots or AI assistants. Use case example: RAG with Gemini (a minimal retrieval sketch follows these steps):


  1. Convert all your documents into embeddings

  2. Store them in a vector database like Pinecone or FAISS

  3. Retrieve the top relevant chunks using cosine similarity

  4. Feed them back into Gemini for context-aware response generation
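
Here is a minimal, in-memory sketch of that loop, assuming the google-generativeai SDK and NumPy and skipping a real vector database; the documents and query are placeholders:

import numpy as np
import google.generativeai as genai

# Toy corpus; a production system would store these vectors in FAISS or Pinecone.
docs = [
    "Gemini supports text, code, embeddings and multimodal inputs.",
    "Cosine similarity measures how close two embedding vectors are.",
]

def embed(text):
    # genai.embed_content returns a dict with an 'embedding' list of floats
    return np.array(genai.embed_content(model="models/embedding-001", content=text)["embedding"])

doc_vectors = np.array([embed(d) for d in docs])

query = "Which kinds of input can Gemini handle?"
q = embed(query)

# Cosine similarity: dot product of L2-normalised vectors
scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
best_doc = docs[int(np.argmax(scores))]

# Feed the retrieved chunk back to Gemini as context
model = genai.GenerativeModel("gemini-1.5-pro")
answer = model.generate_content(f"Context: {best_doc}\n\nQuestion: {query}")
print(answer.text)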


Multimodal Mode – Unified Understanding of Text, Image, and Audio

Multimodal LLMs like Gemini can perceive multiple input types at once. This unlocks applications like image captioning, document parsing, diagram reasoning, and visual Q&A.


Code Example

with open("chart.png", "rb") as f:
    image_bytes = f.read()

response = model.generate_content([
    "Explain what this chart represents:",
    {"mime_type": "image/png", "data": image_bytes},  # inline image part
])

print(response.text)

Gemini uses an internal cross-attention mechanism to connect visual and textual tokens, allowing reasoning like:

“The x-axis shows time, and the y-axis shows sales — the upward trend indicates growth.”

Use Cases

  • Visual question answering

  • Automatic document analysis (charts, receipts, PDFs)

  • Educational visual tutoring

  • Content moderation and captioning


Function Calling Mode – Connecting Language Models with Real-World Actions

The Function Calling mode is one of the most powerful evolutions in modern LLMs. It allows models like Gemini to go beyond text generation and produce structured, machine-readable outputs — typically in JSON format — that can be used to trigger API calls, database queries, or system actions.

Instead of only returning text like “Sure, I’ll send an email,” the model outputs something like:

{
  "function": "send_email",
  "arguments": {
    "to": "samultechie@gmail.com",
    "subject": "Meeting Rescheduled",
    "body": "The meeting is rescheduled to 4 PM."
  }
}

This structured reasoning capability bridges natural language understanding with programmatic control — enabling AI-driven automation systems and intelligent backend integrations.

At the architecture level, function calling is achieved through structured prompting and fine-tuning with schema-conditioned outputs.

When the model detects that an instruction can be resolved via an API or a system tool, it formats its output according to a JSON schema or tool definition you provide.

For example:


  • In a booking bot, functions might include book_flight, check_weather, cancel_reservation.

  • In a developer assistant, functions might include run_code, search_docs, or analyze_error.


This mode turns an LLM into an intelligent controller for external systems.
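
As a rough sketch, such a tool set can be declared as plain Python data and rendered into the prompt. The function names and parameter schemas below are hypothetical, mirroring the booking-bot example above:

# Hypothetical tool registry for a booking bot (illustration only).
TOOLS = {
    "book_flight": {
        "description": "Book a flight for the user.",
        "parameters": {"origin": "string", "destination": "string", "date": "YYYY-MM-DD"},
    },
    "check_weather": {
        "description": "Return the weather forecast for a city.",
        "parameters": {"city": "string", "date": "YYYY-MM-DD"},
    },
    "cancel_reservation": {
        "description": "Cancel an existing reservation by ID.",
        "parameters": {"reservation_id": "string"},
    },
}

def tools_as_prompt(tools):
    # Render the registry into a system prompt so the model knows which
    # functions it may "call" and with which arguments.
    lines = [
        'Respond with JSON of the form {"function": <name>, "arguments": {...}}.',
        "Available functions:",
    ]
    for name, spec in tools.items():
        lines.append(f"- {name}({', '.join(spec['parameters'])}): {spec['description']}")
    return "\n".join(lines)

print(tools_as_prompt(TOOLS))

If your SDK version supports native tool declarations, the same registry can be adapted to that interface instead of being inlined into the prompt.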


Gemini API Example: Function Calling

Google’s Gemini Python SDK supports native tool declarations as well, but the simplest pattern is structured prompting: you instruct Gemini to output JSON objects that your Python app can parse and route to an external function.

import google.generativeai as genai
import json

model = genai.GenerativeModel("gemini-1.5-pro")

prompt = """
You are a task automation assistant. 
Whenever I ask to perform an action, respond with a JSON object
containing 'function' and 'arguments'.

User: Send an email to Samul saying the meeting is at 4 PM.
"""

response = model.generate_content(prompt)

try:
    data = json.loads(response.text)
    print(f"Function to execute: {data['function']}")
    print(f"Arguments: {data['arguments']}")
except json.JSONDecodeError:
    print("Raw output:", response.text)

Explanation


  1. The model interprets the natural language command.

  2. It follows your instruction to return structured data.

  3. Your backend parses this JSON and routes it to an execution handler (e.g., email sender or calendar updater).


This forms the backbone of agentic LLM systems, where models act as the “brains” and external functions as the “hands.”


Real-World Use Cases


  1. Task automation bots: Automate calendar scheduling, emails, or Slack updates.

  2. Conversational RPA (Robotic Process Automation): Gemini acts as the reasoning layer on top of RPA workflows.

  3. Customer service orchestration: Dynamically call CRM APIs or ticketing systems.

  4. Data pipelines: Execute SQL queries or API requests based on natural queries.


Example: Email Automation

def send_email(to, subject, body):
    print(f"Sending Email to: {to}\nSubject: {subject}\nBody: {body}")

response_json = {
    "function": "send_email",
    "arguments": {
        "to": "samuel@colabcodes.com",
        "subject": "Project Update",
        "body": "The deployment is complete and live on the server."
    }
}

# Execute dynamically
if response_json["function"] == "send_email":
    args = response_json["arguments"]
    send_email(args["to"], args["subject"], args["body"])

In a production workflow, you would parse Gemini’s JSON output and route it automatically using a dispatcher or orchestrator class.
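
A minimal sketch of such a dispatcher is shown below. The class and method names are illustrative assumptions, and it reuses the send_email function defined in the previous example:

# Sketch of a dispatcher that routes parsed model output to registered handlers.
class FunctionDispatcher:
    def __init__(self):
        self.handlers = {}

    def register(self, name, handler):
        self.handlers[name] = handler

    def dispatch(self, payload):
        # payload is the dict parsed from the model's JSON output
        name = payload.get("function")
        args = payload.get("arguments", {})
        if name not in self.handlers:
            raise ValueError(f"No handler registered for '{name}'")
        return self.handlers[name](**args)

dispatcher = FunctionDispatcher()
dispatcher.register("send_email", send_email)  # send_email from the example above

dispatcher.dispatch({
    "function": "send_email",
    "arguments": {"to": "ops@example.com", "subject": "Ping", "body": "Hello"},
})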


Best Practices

  • Always define a clear schema and instruct Gemini to follow it strictly.

  • Validate the output with json.loads() before executing (a defensive parsing helper is sketched after this list).

  • Use fallback prompts if Gemini returns free text instead of JSON.

  • Combine this mode with Embeddings Mode for context-aware actions.
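
Putting the second and third bullets into practice, here is a small defensive parsing helper; it is a sketch of handling for typical failure modes (fenced or chatty JSON output), not an official Gemini utility:

import json
import re

def parse_model_json(raw_text):
    # Strip optional ```json fences, then fall back to the first {...} block.
    cleaned = raw_text.strip()
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", cleaned)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", cleaned, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass
    return None  # caller can re-prompt the model with a stricter instruction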


Long Context Mode – Understanding and Reasoning Over Massive Inputs

Traditional LLMs could handle a few thousand tokens, which limited their usefulness for document-heavy tasks. Gemini 1.5 revolutionises this with up to 1 million tokens of context, allowing the model to process entire codebases, research papers, or legal documents in a single query.

This is called Long Context Mode. Long Context enables new categories of AI systems:


  1. Research summarization (analyzing multiple academic papers)

  2. Enterprise intelligence (reasoning over reports, contracts, or logs)

  3. Codebase understanding (full repository analysis)

  4. Memory-enhanced chatbots (long-term context retention)


It essentially turns an LLM into a persistent cognitive engine capable of in-depth reasoning across extensive information.

To support long context efficiently, Gemini employs:


  1. Sparse Attention Mechanisms – the model doesn’t attend to every token but learns to focus selectively.

  2. Segmented Memory Encoding – divides long text into sections with hierarchical summaries (a simple chunk-and-summarise sketch follows this list).

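
To make the chunk-and-summarise idea concrete, here is a simple hierarchical summarisation sketch using the google-generativeai SDK. This is an application-level pattern, not a description of Gemini's internal attention mechanism, and the chunk size is an arbitrary assumption:

import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")

def summarize_long_document(text, chunk_chars=8000):
    # Naive character-based chunking; a production system would split by tokens or sections.
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partial_summaries = [
        model.generate_content(f"Summarize this section in 3 bullet points:\n{chunk}").text
        for chunk in chunks
    ]
    # Second pass: merge the section summaries into one hierarchical overview.
    combined = "\n".join(partial_summaries)
    return model.generate_content(f"Merge these section summaries into one overview:\n{combined}").text

# Example: print(summarize_long_document(open("report.txt").read()))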


Integrating LLM Modes & Real-World AI Projects with Gemini API

By now, you understand the individual functional modes of LLMs like Gemini: Text, Code, Embeddings, Multimodal, Function Calling, Long Context, and Agentic. However, real-world AI systems rarely use just one mode in isolation. The true power of LLMs emerges when multiple modes are orchestrated together.

This section explores:


  • Mode integration for complex workflows

  • RAG (Retrieval-Augmented Generation) with Gemini

  • Multimodal + Function Calling pipelines

  • Gemini vs GPT: mode capabilities and trade-offs

  • Mini-project examples to illustrate practical applications

  • Best practices for production-ready AI systems


Integrating Multiple Functional Modes

Integrating multiple functional modes allows Large Language Models (LLMs) to operate seamlessly across reasoning, generation, coding, and tool-using capabilities within a single workflow. This fusion enables dynamic task handling — for example, a model can analyze data, generate insights, and execute function calls in one coherent interaction.


Using LLMs in multi-step workflows unlocks advanced applications, such as:


  • AI research assistants that retrieve papers, summarize, and plan follow-ups

  • Customer support systems that combine text reasoning, knowledge base search, and API-driven actions

  • Autonomous agents that analyze multimodal data and take structured actions


Each mode acts as a modular component that can be orchestrated in a pipeline.


Example Architecture: RAG + Function Calling + Multimodal


  1. Input: User asks a question, optionally including an image or document

  2. Embeddings Mode: Convert text/document to vectors for semantic search

  3. RAG: Retrieve top relevant documents or knowledge snippets

  4. Long Context Mode: Summarize or reason over retrieved content

  5. Function Calling Mode: Generate structured outputs for action (like sending an email or storing data)

  6. Multimodal Mode: If the input includes an image, process it alongside text


This architecture forms a flexible, multi-modal AI system capable of reasoning, acting, and interacting intelligently.


Python Example: Multi-Mode Pipeline

import google.generativeai as genai
import json

model = genai.GenerativeModel("gemini-1.5-pro")

# Step 1: Convert user query to embedding
query = "Summarize the sales trends from this chart and notify the team."
query_embedding = genai.embed_content(model="models/embedding-001", content=query)["embedding"]

# Step 2: Retrieve relevant documents (mock retrieval)
documents = [
    "Q1 Sales increased by 20%. Q2 Sales show slight decline...",
    "The marketing campaign improved engagement by 15%..."
]

# Step 3: Summarize retrieved documents
context = "\n".join(documents)
summary_prompt = f"Summarize the following data and generate a JSON object for notifications:\n{context}"
summary_response = model.generate_content(summary_prompt)

# Step 4: Parse JSON and trigger function (Function Calling Mode)
try:
    action_data = json.loads(summary_response.text)
    print(action_data)
except json.JSONDecodeError:
    print("Output is not JSON, raw response:", summary_response.text)

This dummy pipeline is an example of how multiple Gemini modes can be chained in a real-world workflow.


Retrieval-Augmented Generation (RAG) with Gemini

RAG combines embedding-based retrieval with LLM reasoning. Gemini supports RAG pipelines by integrating:


  • Embeddings for vector search

  • Text summarization / generation

  • Function calls for structured outputs


Step-by-Step RAG Workflow


  1. Document Embedding: Convert knowledge base content to vectors.

  2. Semantic Retrieval: Use similarity metric to fetch the most relevant documents.

  3. Contextual Generation: Feed the retrieved documents as additional context to Gemini.

  4. Optional Function Calling: Output results in structured format for automation.


RAG Example with Gemini API

# Mock embeddings and retrieval
docs = [
    "Gemini can process text, images, and code.",
    "It supports function calling and agentic reasoning for complex tasks."
]
user_query = "Explain Gemini's functional modes in a structured summary."

# Step 1: Convert query to embedding
query_vector = genai.embed_content(model="models/embedding-001", content=user_query)["embedding"]

# Step 2: Retrieve relevant document (mock retrieval)
retrieved_doc = docs[0]  # Normally, similarity search selects the best match

# Step 3: Generate response with context
rag_prompt = f"""
You are an AI assistant. Use the following context to answer the user query.
Context: {retrieved_doc}
User Query: {user_query}
Provide a structured JSON summary.
"""

response = model.generate_content(rag_prompt)
print(response.text)

Multimodal + Function Calling Pipeline

Gemini allows combining images, text, and function calls in one pipeline.

Example: Chart Analysis and Notification

with open("sales_chart.png", "rb") as f:
    chart_bytes = f.read()

prompt = [
    "Analyze this sales chart and suggest actions for the marketing team.",
    {"mime_type": "image/png", "data": chart_bytes},  # inline image part
]

response = model.generate_content(prompt)
print(response.text)

The LLM can return a JSON object specifying actionable insights, which your backend can execute.


Conclusion

Large Language Models like Gemini have evolved far beyond simple text generation. Today, they are multi-functional cognitive engines capable of understanding, reasoning, coding, perceiving, planning, and acting. By leveraging different functional modes — Text, Code, Embeddings, Multimodal, Function Calling, Long Context, and Agentic — developers can build AI systems that are not only responsive but also autonomous, context-aware, and action-driven.

The key takeaway is that each mode is a tool, and the true power of modern LLMs lies in integrating these modes to create sophisticated pipelines:


  • Text Mode powers reasoning and conversation.

  • Code Mode enables AI-assisted programming.

  • Embeddings Mode drives semantic search and RAG workflows.

  • Multimodal Mode allows LLMs to perceive and analyze images, text, and audio together.

  • Function Calling Mode bridges the model with real-world actions.

  • Long Context Mode supports in-depth reasoning over massive documents.

  • Agentic Mode orchestrates multi-step planning and execution.


By combining these capabilities, developers can create research assistants, workflow automation systems, intelligent chatbots, and autonomous AI agents that operate at scale and complexity previously unimaginable.

Gemini API offers a unified, developer-friendly ecosystem for harnessing all these modes — from simple text generation to complex, multi-modal, agentic pipelines. Understanding and leveraging these functional modes allows you to unlock the full potential of modern AI, bridging the gap between human-like reasoning and practical, real-world action.


Get in touch for customized mentorship, research and freelance solutions tailored to your needs.
