Overview
“Context” is all the information Zeus sends to the model on every Agent invocation. It is bounded by the model’s context window (token limit). At a high level, the Context is composed of:
- System Prompt (built by Zeus): layered prompts, tool descriptions, skill metadata, runtime info, user profile & memories
- Conversation History: user + assistant messages from the current session
- Tool Calls / Results: tool invocation parameters and return values
- Attachments: user-uploaded files and sandbox data
Context is not the same as Memory. Memory can be persisted to disk and loaded later, while Context is the real-time content within the current model window. See Memory for the full memory architecture.
Architecture
Zeus assembles the Context through BaseService._build_system_prompt(), with each component maintained independently and combined in a fixed order.
- System Prompt: detailed breakdown of each module (CORE, SOUL, TOOLS, WORKFLOW, etc.) and how they are assembled
- Memory: four-layer memory model, profile generation, and how memories are injected into the prompt
What Counts Toward the Context Window
Everything sent to the model counts toward the context window:

| Component | Description | Estimate |
|---|---|---|
| System Prompt | Full layered prompt text (CORE + SOUL + TOOLS + WORKFLOW + MEMORY + MODE) | Varies with enabled tools and mode |
| Tool Descriptions | Dynamically generated tool names, descriptions, parameter schemas (JSON) | Proportional to enabled tool count |
| Connector Skills Prompts | Browser Operator / Desktop instructions | Only injected when enabled |
| MCP Prompts | Business rules/templates from MCP servers | Depends on configuration |
| Resource Files | User-uploaded resource file content | Depends on file size |
| Profile Data | User profile + project context | Typically small (< 500 tokens) |
| Retrieved Memories | Semantically retrieved Top K memory entries | Up to ~500 tokens (max 2000 chars) |
| Conversation History | Recent 30 messages | Grows with conversation |
| Tool Calls + Results | Tool call parameters and return values | Depends on tool usage frequency and result size |
| Attachments | File attachment content | Depends on attachment content |
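A rough way to see where this budget goes is to count tokens per component before the request is sent. The sketch below is illustrative only: it assumes a tiktoken-compatible tokenizer, and the component names mirror the table above rather than actual Zeus identifiers.

```python
# Illustrative per-component token accounting, assuming a tiktoken-compatible
# tokenizer. Component names mirror the table above, not Zeus internals.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def context_budget(system_prompt: str, tool_schemas_json: str,
                   history: list[str], tool_results: list[str]) -> dict[str, int]:
    usage = {
        "system_prompt": count_tokens(system_prompt),
        "tool_schemas": count_tokens(tool_schemas_json),
        "history": sum(count_tokens(m) for m in history),
        "tool_results": sum(count_tokens(r) for r in tool_results),
    }
    usage["total"] = sum(usage.values())
    return usage
```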
System Prompt Assembly
The System Prompt is the largest and most complex component of the Context. It is rebuilt by BaseService._build_system_prompt() on every Agent invocation.
The assembly follows a fixed 15-step pipeline — loading layered prompt modules, injecting dynamic content (time, skills, connectors, MCP prompts, resources), and appending personalization data (profile and memories).
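The sketch below shows what such a fixed-order assembly could look like. It is a hypothetical reconstruction of the pipeline described above: the module file names follow the layer names in this page, but the function and directory layout are assumptions, not the actual BaseService._build_system_prompt() implementation.

```python
# Hypothetical sketch of a fixed-order system prompt assembly, modeled on the
# pipeline described above. File and function names are illustrative.
from datetime import datetime, timezone
from pathlib import Path

PROMPT_DIR = Path("prompts")  # assumed layout: CORE.md, SOUL.md, TOOLS.md, ...

def load_module(name: str) -> str:
    return (PROMPT_DIR / f"{name}.md").read_text(encoding="utf-8")

def build_system_prompt(tools_description: str, skill_summaries: str,
                        connector_prompts: str, mcp_prompts: str,
                        resources: str, profile: str, memories: str,
                        mode: str = "agent") -> str:
    parts = [
        load_module("CORE"),
        load_module("SOUL"),
        load_module("TOOLS").replace("{tools_description}", tools_description),
        load_module("WORKFLOW"),
        load_module("MEMORY"),
        load_module(f"{mode}_mode"),                      # MODE layer
        f"Current time: {datetime.now(timezone.utc).isoformat()}",
        skill_summaries,      # progressive disclosure: metadata summaries only
        connector_prompts,    # only when Browser/Desktop connectors are enabled
        mcp_prompts,
        resources,
        profile,              # personalization appended last
        memories,
    ]
    return "\n\n".join(p for p in parts if p)
```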
See System Prompt — Full Module Breakdown for how each module (CORE, SOUL, TOOLS, WORKFLOW, MEMORY, MODE) is structured, how tool descriptions are injected, and how mode prompts control Agent behavior.
Tool Description Injection
TOOLS.md contains a {tools_description} placeholder that is dynamically filled at build time. The system iterates over all enabled tools, extracts their names, descriptions, and parameter schemas, and generates a formatted tool description list.
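As an illustration of filling that placeholder, the sketch below renders LangChain-style tools into a description list. The exact output format is an assumption, not the actual text Zeus generates.

```python
# Illustrative placeholder filling: iterate enabled tools and render their
# names, descriptions, and parameter schemas. The formatting is an assumption.
import json
from langchain_core.tools import BaseTool

def render_tools_description(tools: list[BaseTool]) -> str:
    blocks = []
    for t in tools:
        blocks.append(
            f"### {t.name}\n"
            f"{t.description}\n"
            f"Parameters (JSON Schema):\n{json.dumps(t.args, indent=2)}"
        )
    return "\n\n".join(blocks)

def fill_tools_prompt(tools_md: str, tools: list[BaseTool]) -> str:
    return tools_md.replace("{tools_description}", render_tools_description(tools))
```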
Tools are loaded by category, each with different enablement conditions:
| Tool Category | Enable Condition | Examples |
|---|---|---|
| Built-in Middleware | Always enabled | TodoList, SubAgent, Filesystem |
| MCP Tools | User-configured | Third-party MCP server tools |
| OAuth Tools | User-authorized | GitHub, Gmail |
| Browser/Desktop | Connector configured | Browser Operator, Desktop Operator |
| Sandbox Tools | When sandbox is enabled | execute_code, create_sandbox |
| RAG Tools | When knowledge base is linked | search_knowledge_base |
| Memory Tools | Default in Agent mode | memory_save, memory_search |
| Web Search Tools | When configured | web_search |
| Skill Tools | Always enabled | load_skill |
Profile & Memory Injection
At the end of the System Prompt assembly, Zeus fetches the user’s profile and semantically relevant memories from the Memory system, then appends them as structured text. This allows the Agent to be aware of user preferences, project context, and historical information without explicit retrieval.

| Parameter | Default | Description |
|---|---|---|
| Top K | 5 | Return the 5 most relevant memories |
| Max Length | 2000 chars | Maximum formatted length |
| Scopes | user + project + session | Multi-scope merged retrieval |
| Ranking | similarity×0.5 + confidence×0.3 + scope_priority×0.2 | Weighted scoring |
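A minimal sketch of the weighted ranking from the table above follows. The field names and the concrete scope-priority values are assumptions; only the weights (0.5 / 0.3 / 0.2), Top K, and the 2000-character cap come from the table.

```python
# Minimal sketch of the weighted memory ranking described above.
# Field names and scope-priority values are assumptions.
from dataclasses import dataclass

SCOPE_PRIORITY = {"session": 1.0, "project": 0.7, "user": 0.5}  # assumed values

@dataclass
class MemoryHit:
    text: str
    similarity: float   # 0..1, from vector search
    confidence: float   # 0..1, stored with the memory
    scope: str          # "user" | "project" | "session"

def score(hit: MemoryHit) -> float:
    return (hit.similarity * 0.5
            + hit.confidence * 0.3
            + SCOPE_PRIORITY.get(hit.scope, 0.0) * 0.2)

def format_top_k(hits: list[MemoryHit], k: int = 5, max_chars: int = 2000) -> str:
    ranked = sorted(hits, key=score, reverse=True)[:k]
    formatted = "\n".join(f"- {h.text}" for h in ranked)
    return formatted[:max_chars]   # hard cap on formatted length
```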
Memory — Retrieval & Ranking Details
How memories are stored, retrieved, ranked, and the complete Memory Gate pipeline
Conversation History
How It’s Built
Conversation history is constructed by _build_chat_history(), which converts raw messages into LangChain message types (a sketch follows the table below):
| Message Type | LangChain Type | Description |
|---|---|---|
| User message | HumanMessage | Text sent by the user |
| Assistant message | AIMessage | Agent’s reply |
| System message | SystemMessage | System-level instructions |
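The conversion can be pictured as below. This sketch assumes each stored message is a dict with "role" and "content" keys, which is an assumption about Zeus's storage format rather than the actual schema.

```python
# Illustrative conversion of stored messages into LangChain message objects,
# assuming each raw message is a dict with "role" and "content" keys.
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage

ROLE_TO_TYPE = {
    "user": HumanMessage,
    "assistant": AIMessage,
    "system": SystemMessage,
}

def build_chat_history(raw_messages: list[dict], max_messages: int = 30) -> list[BaseMessage]:
    recent = raw_messages[-max_messages:]          # keep only the most recent 30
    return [ROLE_TO_TYPE[m["role"]](content=m["content"]) for m in recent]
```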
Truncation Strategy
| Parameter | Default | Description |
|---|---|---|
| max_messages | 30 | Keep the most recent 30 messages |
| Storage | LangGraph Checkpointer | Associated with session via thread_id |
Beyond this message-count truncation, token-level overflow is handled by SummarizationMiddleware (see below).
Token Management
Model Profiles
Zeus ships with 200+ model token-limit configurations. Common examples:

| Model | max_input_tokens | max_output_tokens |
|---|---|---|
| gpt-4-turbo | 128,000 | 4,096 |
| gpt-4o | 128,000 | 16,384 |
| claude-3-5-sonnet | 200,000 | 8,192 |
For models not in the profile list, a default of 64,000 tokens is used.
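A minimal sketch of such a lookup with the 64,000-token fallback is shown below; the profile dictionary here contains only the example models from the table, and its shape is an assumption.

```python
# Illustrative token-limit lookup with a 64,000-token fallback for unknown
# models. Only the example models from the table above are included.
MODEL_PROFILES = {
    "gpt-4-turbo":       {"max_input_tokens": 128_000, "max_output_tokens": 4_096},
    "gpt-4o":            {"max_input_tokens": 128_000, "max_output_tokens": 16_384},
    "claude-3-5-sonnet": {"max_input_tokens": 200_000, "max_output_tokens": 8_192},
}

DEFAULT_MAX_INPUT_TOKENS = 64_000

def max_input_tokens(model: str) -> int:
    return MODEL_PROFILES.get(model, {}).get("max_input_tokens", DEFAULT_MAX_INPUT_TOKENS)
```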
SummarizationMiddleware
When context approaches the model’s token limit, SummarizationMiddleware (provided by the DeepAgents framework) automatically triggers conversation history summarization:
| Parameter | Value | Description |
|---|---|---|
| Trigger threshold | max_input_tokens × 0.85 | Triggers at 85% usage |
| Retention policy | keep_recent | Keep recent messages |
| Retention count | 5 messages | Last 5 messages are not summarized |
| Summarization model | Same as primary model | Uses the same LLM to generate summaries |
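The trigger logic can be pictured as below. This is an illustrative reimplementation of the behavior in the table, not the DeepAgents SummarizationMiddleware API; the token counter and summarizer are passed in as plain callables.

```python
# Illustrative reimplementation of the summarization behavior described above
# (NOT the DeepAgents API): when usage crosses 85% of the input budget, older
# messages are collapsed into one summary and only the last few are kept.
from typing import Callable

def maybe_summarize(messages: list[str],
                    count_tokens: Callable[[str], int],
                    summarize: Callable[[str], str],
                    max_input_tokens: int,
                    keep_recent: int = 5,
                    threshold: float = 0.85) -> list[str]:
    used = sum(count_tokens(m) for m in messages)
    if used < max_input_tokens * threshold or len(messages) <= keep_recent:
        return messages                       # under the threshold: no change
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize("\n".join(old))       # same LLM as the primary model
    return [f"Summary of earlier conversation:\n{summary}", *recent]
```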
Prompt Caching
For Anthropic models, AnthropicPromptCachingMiddleware enables prompt caching to reduce token billing for repeated System Prompt content. The caching mechanism is natively supported by the Anthropic API, caching the static portions of the System Prompt (CORE, SOUL, etc.) and significantly reducing token consumption across multi-turn conversations.
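Under the hood this relies on the Anthropic API's cache_control markers. The minimal raw-API sketch below (outside Zeus) shows the idea; the prompt text and model name are placeholders.

```python
# Minimal sketch of Anthropic's native prompt caching, outside Zeus: the static
# system prompt block is marked with cache_control so repeated turns can reuse
# the cached prefix. Prompt text and model name are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STATIC_SYSTEM_PROMPT = "..."    # CORE + SOUL + other rarely-changing layers

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STATIC_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Hello"}],
)
```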
Complete Data Flow
The following diagram shows the full lifecycle of Context in a single Agent invocation, from assembly to consumption.
Context Changes by Phase
| Phase | Context Content | Token Growth |
|---|---|---|
| Phase 1 | System Prompt + History | Base cost (fixed + history) |
| Phase 2 | + Tool Schemas (JSON) | Tool count × schema size |
| Phase 3 | + Tool Calls + Results | Depends on tool usage frequency and result size |
Mode-Based Context
Different modes produce different Context compositions:

| Component | Agent Mode | Ask Mode | Plan Mode |
|---|---|---|---|
| Core prompts (CORE/SOUL) | Full | Full | Full |
| TOOLS.md | All tools | Read-only tools | Read-only tools |
| WORKFLOW.md | Full | Simplified | Planning-specific |
| MEMORY.md | Full | Simplified | Simplified |
| Mode prompt | agent_mode.md | ask_mode.md | plan_mode.md |
| Memory Tools | Enabled | Restricted | Restricted |
| Write tools | Enabled | Disabled | Disabled |
| RAG Tools | Enabled | Enabled | Enabled |
| Profile injection | Injected | Injected | Injected |
| Memory injection | Injected | Injected | Injected |
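One way to read the table is as a per-mode configuration. The sketch below is a hypothetical encoding of those differences; the config keys and the read_only attribute are assumptions, not Zeus's actual mode handling.

```python
# Hypothetical encoding of the per-mode differences in the table above;
# not Zeus's actual mode handling.
MODE_CONFIG = {
    "agent": {"mode_prompt": "agent_mode.md", "write_tools": True,  "memory_tools": "enabled"},
    "ask":   {"mode_prompt": "ask_mode.md",   "write_tools": False, "memory_tools": "restricted"},
    "plan":  {"mode_prompt": "plan_mode.md",  "write_tools": False, "memory_tools": "restricted"},
}

def tools_for_mode(all_tools: list, mode: str) -> list:
    cfg = MODE_CONFIG[mode]
    if cfg["write_tools"]:
        return all_tools
    # assumed marker attribute on each tool object
    return [t for t in all_tools if getattr(t, "read_only", False)]
```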
Optimization Strategies
1. Progressive Disclosure
Problem: Skills and Connector prompts can be very long; injecting all of them wastes context space.
Solution: Only inject metadata summaries; the Agent loads full content on-demand via tools (a sketch follows this list).
2. Agentic RAG (On-Demand Retrieval)
Problem: Knowledge base content can be massive; pre-injection is impractical.
Solution: The knowledge base is exposed as a tool. The Agent decides when and what to retrieve.
3. Automatic Summarization (SummarizationMiddleware)
Problem: Long conversations cause history to consume large amounts of context space.
Solution: When token usage hits the threshold, older messages are automatically compressed into summaries.
4. History Truncation
Problem: Session messages grow without bound.
Solution: Only the most recent 30 messages are kept, combined with LangGraph Checkpoint for full history persistence.
5. Prompt Caching
Problem: The System Prompt is mostly unchanged across turns but is billed every time.
Solution: Anthropic models use AnthropicPromptCachingMiddleware to cache static portions.