Overview
“Context” is all the information Zeus sends to the model on every Agent invocation. It is bounded by the model’s context window (token limit). At a high level, the Context is composed of:
- System Prompt (built by Zeus): layered prompts, tool descriptions, skill metadata, runtime info, user profile & memories
- Conversation History: user + assistant messages from the current session
- Tool Calls / Results: tool invocation parameters and return values
- Attachments: user-uploaded files and sandbox data
Context is not the same as Memory. Memory can be persisted to disk and loaded later, while Context is the real-time content within the current model window. See Memory for the full memory architecture.
Architecture
Zeus assembles the Context through BaseService._build_system_prompt(), with each component maintained independently and combined in a fixed order.
- System Prompt: detailed breakdown of each module (CORE, SOUL, TOOLS, WORKFLOW, etc.) and how they are assembled
- Memory: four-layer memory model, profile generation, and how memories are injected into the prompt
What Counts Toward the Context Window
Everything sent to the model counts toward the context window:

| Component | Description | Estimate |
|---|---|---|
| System Prompt | Full layered prompt text (CORE + SOUL + TOOLS + WORKFLOW + MEMORY + MODE) | Varies with enabled tools and mode |
| Tool Descriptions | Dynamically generated tool names, descriptions, parameter schemas (JSON) | Proportional to enabled tool count |
| Connector Skills Prompts | Browser Operator / Desktop instructions | Only injected when enabled |
| MCP Prompts | Business rules/templates from MCP servers | Depends on configuration |
| Resource Files | User-uploaded resource file content | Depends on file size |
| Profile Data | User profile + project context | Typically small (< 500 tokens) |
| Retrieved Memories | Semantically retrieved Top K memory entries | Up to ~500 tokens (max 2000 chars) |
| Conversation History | Recent 30 messages | Grows with conversation |
| Tool Calls + Results | Tool call parameters and return values | Depends on tool usage frequency and result size |
| Attachments | File attachment content | Depends on attachment content |
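A rough way to see where this budget goes is to count tokens per component before the request is sent. The sketch below is illustrative only: it assumes a tiktoken-compatible tokenizer, and the component names mirror the table above rather than actual Zeus identifiers.

```python
# Illustrative per-component token accounting, assuming a tiktoken-compatible
# tokenizer. Component names mirror the table above, not Zeus internals.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def context_budget(system_prompt: str, tool_schemas_json: str,
                   history: list[str], tool_results: list[str]) -> dict[str, int]:
    usage = {
        "system_prompt": count_tokens(system_prompt),
        "tool_schemas": count_tokens(tool_schemas_json),
        "history": sum(count_tokens(m) for m in history),
        "tool_results": sum(count_tokens(r) for r in tool_results),
    }
    usage["total"] = sum(usage.values())
    return usage
```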
System Prompt Assembly
The System Prompt is the largest and most complex component of the Context. It is rebuilt by BaseService._build_system_prompt() on every Agent invocation.
The assembly follows a fixed 15-step pipeline — loading layered prompt modules, injecting dynamic content (time, skills, connectors, MCP prompts, resources), and appending personalization data (profile and memories).
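The sketch below shows what such a fixed-order assembly could look like. It is a hypothetical reconstruction of the pipeline described above: the module file names follow the layer names in this page, but the function and directory layout are assumptions, not the actual BaseService._build_system_prompt() implementation.

```python
# Hypothetical sketch of a fixed-order system prompt assembly, modeled on the
# pipeline described above. File and function names are illustrative.
from datetime import datetime, timezone
from pathlib import Path

PROMPT_DIR = Path("prompts")  # assumed layout: CORE.md, SOUL.md, TOOLS.md, ...

def load_module(name: str) -> str:
    return (PROMPT_DIR / f"{name}.md").read_text(encoding="utf-8")

def build_system_prompt(tools_description: str, skill_summaries: str,
                        connector_prompts: str, mcp_prompts: str,
                        resources: str, profile: str, memories: str,
                        mode: str = "agent") -> str:
    parts = [
        load_module("CORE"),
        load_module("SOUL"),
        load_module("TOOLS").replace("{tools_description}", tools_description),
        load_module("WORKFLOW"),
        load_module("MEMORY"),
        load_module(f"{mode}_mode"),                      # MODE layer
        f"Current time: {datetime.now(timezone.utc).isoformat()}",
        skill_summaries,      # progressive disclosure: metadata summaries only
        connector_prompts,    # only when Browser/Desktop connectors are enabled
        mcp_prompts,
        resources,
        profile,              # personalization appended last
        memories,
    ]
    return "\n\n".join(p for p in parts if p)
```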
See System Prompt — Full Module Breakdown for how each module (CORE, SOUL, TOOLS, WORKFLOW, MEMORY, MODE) is structured, how tool descriptions are injected, and how mode prompts control Agent behavior.
Tool Description Injection
TOOLS.md contains a {tools_description} placeholder that is dynamically filled at build time. The system iterates over all enabled tools, extracts their names, descriptions, and parameter schemas, and generates a formatted tool description list.
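As an illustration of filling that placeholder, the sketch below renders LangChain-style tools into a description list. The exact output format is an assumption, not the actual text Zeus generates.

```python
# Illustrative placeholder filling: iterate enabled tools and render their
# names, descriptions, and parameter schemas. The formatting is an assumption.
import json
from langchain_core.tools import BaseTool

def render_tools_description(tools: list[BaseTool]) -> str:
    blocks = []
    for t in tools:
        blocks.append(
            f"### {t.name}\n"
            f"{t.description}\n"
            f"Parameters (JSON Schema):\n{json.dumps(t.args, indent=2)}"
        )
    return "\n\n".join(blocks)

def fill_tools_prompt(tools_md: str, tools: list[BaseTool]) -> str:
    return tools_md.replace("{tools_description}", render_tools_description(tools))
```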
Tools are loaded by category, each with different enablement conditions:
| Tool Category | Enable Condition | Examples |
|---|---|---|
| Built-in Middleware | Always enabled | TodoList, SubAgent, Filesystem |
| MCP Tools | User-configured | Third-party MCP server tools |
| OAuth Tools | User-authorized | GitHub, Gmail |
| Browser/Desktop | Connector configured | Browser Operator, Desktop Operator |
| Sandbox Tools | When sandbox is enabled | execute_code, create_sandbox |
| RAG Tools | When knowledge base is linked | search_knowledge_base |
| Memory Tools | Default in Agent mode | memory_save, memory_search |
| Web Search Tools | When configured | web_search |
| Skill Tools | Always enabled | load_skill |
Profile & Memory Injection
At the end of the System Prompt assembly, Zeus fetches the user’s profile and semantically relevant memories from the Memory system, then appends them as structured text. This allows the Agent to be aware of user preferences, project context, and historical information without explicit retrieval.

| Parameter | Default | Description |
|---|---|---|
| Top K | 5 | Return the 5 most relevant memories |
| Max Length | 2000 chars | Maximum formatted length |
| Scopes | user + project + session | Multi-scope merged retrieval |
| Ranking | similarity×0.5 + confidence×0.3 + scope_priority×0.2 | Weighted scoring |
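A minimal sketch of the weighted ranking from the table above follows. The field names and the concrete scope-priority values are assumptions; only the weights (0.5 / 0.3 / 0.2), Top K, and the 2000-character cap come from the table.

```python
# Minimal sketch of the weighted memory ranking described above.
# Field names and scope-priority values are assumptions.
from dataclasses import dataclass

SCOPE_PRIORITY = {"session": 1.0, "project": 0.7, "user": 0.5}  # assumed values

@dataclass
class MemoryHit:
    text: str
    similarity: float   # 0..1, from vector search
    confidence: float   # 0..1, stored with the memory
    scope: str          # "user" | "project" | "session"

def score(hit: MemoryHit) -> float:
    return (hit.similarity * 0.5
            + hit.confidence * 0.3
            + SCOPE_PRIORITY.get(hit.scope, 0.0) * 0.2)

def format_top_k(hits: list[MemoryHit], k: int = 5, max_chars: int = 2000) -> str:
    ranked = sorted(hits, key=score, reverse=True)[:k]
    formatted = "\n".join(f"- {h.text}" for h in ranked)
    return formatted[:max_chars]   # hard cap on formatted length
```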
Memory — Retrieval & Ranking Details
How memories are stored, retrieved, ranked, and the complete Memory Gate pipeline
Conversation History
How It’s Built
Conversation history is constructed by _build_chat_history(), which converts raw messages into LangChain message types (a sketch follows the table below):
| Message Type | LangChain Type | Description |
|---|---|---|
| User message | HumanMessage | Text sent by the user |
| Assistant message | AIMessage | Agent’s reply |
| System message | SystemMessage | System-level instructions |
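The conversion can be pictured as below. This sketch assumes each stored message is a dict with "role" and "content" keys, which is an assumption about Zeus's storage format rather than the actual schema.

```python
# Illustrative conversion of stored messages into LangChain message objects,
# assuming each raw message is a dict with "role" and "content" keys.
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage

ROLE_TO_TYPE = {
    "user": HumanMessage,
    "assistant": AIMessage,
    "system": SystemMessage,
}

def build_chat_history(raw_messages: list[dict], max_messages: int = 30) -> list[BaseMessage]:
    recent = raw_messages[-max_messages:]          # keep only the most recent 30
    return [ROLE_TO_TYPE[m["role"]](content=m["content"]) for m in recent]
```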
Truncation Strategy
| Parameter | Default | Description |
|---|---|---|
| max_messages | 30 | Keep the most recent 30 messages |
| Storage | LangGraph Checkpointer | Associated with session via thread_id |
Beyond this message-count truncation, token-level overflow is handled by SummarizationMiddleware (see below).
Token Management
Model Profiles
Zeus ships with 200+ model token-limit configurations. Common examples:

| Model | max_input_tokens | max_output_tokens |
|---|---|---|
| gpt-4-turbo | 128,000 | 4,096 |
| gpt-4o | 128,000 | 16,384 |
| claude-3-5-sonnet | 200,000 | 8,192 |
For models not in the profile list, a default of 64,000 tokens is used.
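A minimal sketch of such a lookup with the 64,000-token fallback is shown below; the profile dictionary here contains only the example models from the table, and its shape is an assumption.

```python
# Illustrative token-limit lookup with a 64,000-token fallback for unknown
# models. Only the example models from the table above are included.
MODEL_PROFILES = {
    "gpt-4-turbo":       {"max_input_tokens": 128_000, "max_output_tokens": 4_096},
    "gpt-4o":            {"max_input_tokens": 128_000, "max_output_tokens": 16_384},
    "claude-3-5-sonnet": {"max_input_tokens": 200_000, "max_output_tokens": 8_192},
}

DEFAULT_MAX_INPUT_TOKENS = 64_000

def max_input_tokens(model: str) -> int:
    return MODEL_PROFILES.get(model, {}).get("max_input_tokens", DEFAULT_MAX_INPUT_TOKENS)
```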
SummarizationMiddleware
When context approaches the model’s token limit, SummarizationMiddleware (provided by the DeepAgents framework) automatically triggers conversation history summarization:
| Parameter | Value | Description |
|---|---|---|
| Trigger threshold | max_input_tokens × 0.85 | Triggers at 85% usage |
| Retention policy | keep_recent | Keep recent messages |
| Retention count | 5 messages | Last 5 messages are not summarized |
| Summarization model | Same as primary model | Uses the same LLM to generate summaries |
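The trigger logic can be pictured as below. This is an illustrative reimplementation of the behavior in the table, not the DeepAgents SummarizationMiddleware API; the token counter and summarizer are passed in as plain callables.

```python
# Illustrative reimplementation of the summarization behavior described above
# (NOT the DeepAgents API): when usage crosses 85% of the input budget, older
# messages are collapsed into one summary and only the last few are kept.
from typing import Callable

def maybe_summarize(messages: list[str],
                    count_tokens: Callable[[str], int],
                    summarize: Callable[[str], str],
                    max_input_tokens: int,
                    keep_recent: int = 5,
                    threshold: float = 0.85) -> list[str]:
    used = sum(count_tokens(m) for m in messages)
    if used < max_input_tokens * threshold or len(messages) <= keep_recent:
        return messages                       # under the threshold: no change
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize("\n".join(old))       # same LLM as the primary model
    return [f"Summary of earlier conversation:\n{summary}", *recent]
```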
Prompt Caching
For Anthropic models, AnthropicPromptCachingMiddleware enables prompt caching to reduce token billing for repeated System Prompt content. The caching mechanism is natively supported by the Anthropic API, caching the static portions of the System Prompt (CORE, SOUL, etc.) and significantly reducing token consumption across multi-turn conversations.
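Under the hood this relies on the Anthropic API's cache_control markers. The minimal raw-API sketch below (outside Zeus) shows the idea; the prompt text and model name are placeholders.

```python
# Minimal sketch of Anthropic's native prompt caching, outside Zeus: the static
# system prompt block is marked with cache_control so repeated turns can reuse
# the cached prefix. Prompt text and model name are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STATIC_SYSTEM_PROMPT = "..."    # CORE + SOUL + other rarely-changing layers

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STATIC_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Hello"}],
)
```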
Complete Data Flow
The following diagram shows the full lifecycle of Context in a single Agent invocation, from assembly to consumption.
Context Changes by Phase
| Phase | Context Content | Token Growth |
|---|---|---|
| Phase 1 | System Prompt + History | Base cost (fixed + history) |
| Phase 2 | + Tool Schemas (JSON) | Tool count × schema size |
| Phase 3 | + Tool Calls + Results | Depends on tool usage frequency and result size |
Mode-Based Context
Different modes produce different Context compositions:

| Component | Agent Mode | Ask Mode | Plan Mode |
|---|---|---|---|
| Core prompts (CORE/SOUL) | Full | Full | Full |
| TOOLS.md | All tools | Read-only tools | Read-only tools |
| WORKFLOW.md | Full | Simplified | Planning-specific |
| MEMORY.md | Full | Simplified | Simplified |
| Mode prompt | agent_mode.md | ask_mode.md | plan_mode.md |
| Memory Tools | Enabled | Restricted | Restricted |
| Write tools | Enabled | Disabled | Disabled |
| RAG Tools | Enabled | Enabled | Enabled |
| Profile injection | Injected | Injected | Injected |
| Memory injection | Injected | Injected | Injected |
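One way to read the table is as a per-mode configuration. The sketch below is a hypothetical encoding of those differences; the config keys and the read_only attribute are assumptions, not Zeus's actual mode handling.

```python
# Hypothetical encoding of the per-mode differences in the table above;
# not Zeus's actual mode handling.
MODE_CONFIG = {
    "agent": {"mode_prompt": "agent_mode.md", "write_tools": True,  "memory_tools": "enabled"},
    "ask":   {"mode_prompt": "ask_mode.md",   "write_tools": False, "memory_tools": "restricted"},
    "plan":  {"mode_prompt": "plan_mode.md",  "write_tools": False, "memory_tools": "restricted"},
}

def tools_for_mode(all_tools: list, mode: str) -> list:
    cfg = MODE_CONFIG[mode]
    if cfg["write_tools"]:
        return all_tools
    # assumed marker attribute on each tool object
    return [t for t in all_tools if getattr(t, "read_only", False)]
```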
Optimization Strategies
1. Progressive Disclosure
Problem: Skills and Connector prompts can be very long; injecting all of them wastes context space.
Solution: Only inject metadata summaries; the Agent loads full content on-demand via tools (a sketch follows this list).
2. Agentic RAG (On-Demand Retrieval)
Problem: Knowledge base content can be massive; pre-injection is impractical.
Solution: The knowledge base is exposed as a tool. The Agent decides when and what to retrieve.
3. Automatic Summarization (SummarizationMiddleware)
Problem: Long conversations cause history to consume large amounts of context space.
Solution: When token usage hits the threshold, older messages are automatically compressed into summaries.
4. History Truncation
Problem: Session messages grow without bound.
Solution: Only the most recent 30 messages are kept, combined with LangGraph Checkpoint for full history persistence.
5. Prompt Caching
Problem: The System Prompt is mostly unchanged across turns but is billed every time.
Solution: Anthropic models use AnthropicPromptCachingMiddleware to cache static portions.