Overview

“Context” is all the information Zeus sends to the model on every Agent invocation. It is bounded by the model’s context window (token limit). At a high level, the Context is composed of:
  • System Prompt (built by Zeus): layered prompts, tool descriptions, skill metadata, runtime info, user profile & memories
  • Conversation History: user + assistant messages from the current session
  • Tool Calls / Results: tool invocation parameters and return values
  • Attachments: user-uploaded files and sandbox data
Context is not the same as Memory. Memory can be persisted to disk and loaded later, while Context is the real-time content within the current model window. See Memory for the full memory architecture.

Architecture

Zeus assembles the Context through BaseService._build_system_prompt(), with each component maintained independently and combined in a fixed order.

What Counts Toward the Context Window

Everything sent to the model counts toward the context window:
| Component | Description | Estimate |
| --- | --- | --- |
| System Prompt | Full layered prompt text (CORE + SOUL + TOOLS + WORKFLOW + MEMORY + MODE) | Varies with enabled tools and mode |
| Tool Descriptions | Dynamically generated tool names, descriptions, parameter schemas (JSON) | Proportional to enabled tool count |
| Connector Skills Prompts | Browser Operator / Desktop instructions | Only injected when enabled |
| MCP Prompts | Business rules/templates from MCP servers | Depends on configuration |
| Resource Files | User-uploaded resource file content | Depends on file size |
| Profile Data | User profile + project context | Typically small (< 500 tokens) |
| Retrieved Memories | Semantically retrieved Top K memory entries | Up to ~500 tokens (max 2,000 chars) |
| Conversation History | Most recent 30 messages | Grows with conversation |
| Tool Calls + Results | Tool call parameters and return values | Depends on tool usage frequency and result size |
| Attachments | File attachment content | Depends on attachment content |
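To see how these components add up in practice, a rough token audit can be done client-side. A minimal sketch, assuming the tiktoken library and placeholder component strings (this is not Zeus's actual accounting):

```python
import json
import tiktoken  # pip install tiktoken

# cl100k_base is a reasonable proxy encoding for rough, model-agnostic estimates.
enc = tiktoken.get_encoding("cl100k_base")

def estimate_tokens(text: str) -> int:
    """Rough token count for a single context component."""
    return len(enc.encode(text))

# Placeholder component texts standing in for the real assembled pieces.
components = {
    "system_prompt": "You are Zeus... (CORE + SOUL + TOOLS + WORKFLOW + MEMORY + MODE)",
    "tool_schemas": json.dumps({"name": "web_search", "parameters": {"query": "string"}}),
    "history": "\n".join(f"turn {i}: ..." for i in range(30)),
}

for name, text in components.items():
    print(f"{name:>14}: ~{estimate_tokens(text)} tokens")
print(f"         total: ~{sum(estimate_tokens(t) for t in components.values())} tokens")
```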

System Prompt Assembly

The System Prompt is the largest and most complex component of the Context. It is rebuilt by BaseService._build_system_prompt() on every Agent invocation. The assembly follows a fixed 15-step pipeline — loading layered prompt modules, injecting dynamic content (time, skills, connectors, MCP prompts, resources), and appending personalization data (profile and memories).
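The exact pipeline is internal to Zeus, but its shape can be sketched. In the following illustration the module contents, the XML-ish wrappers, and the function signature are all assumptions for demonstration; only the module names and build order come from the source:

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for the layered prompt modules loaded from disk.
PROMPT_MODULES = {
    "CORE": "You are Zeus, ...",
    "SOUL": "Personality and tone rules ...",
    "TOOLS": "Available tools:\n{tools_description}",
    "WORKFLOW": "How to plan and act ...",
    "MEMORY": "How to use memory tools ...",
    "MODE": "Contents of agent_mode.md ...",
}

def build_system_prompt(tools_description: str, profile: str, memories: list[str]) -> str:
    """Illustrative layered build order; not Zeus's actual 15-step pipeline."""
    parts = [PROMPT_MODULES[m] for m in ("CORE", "SOUL", "TOOLS", "WORKFLOW", "MEMORY", "MODE")]
    # Dynamic injection: fill the tools placeholder and add runtime info.
    parts[2] = parts[2].format(tools_description=tools_description)
    parts.append(f"Current time: {datetime.now(timezone.utc).isoformat()}")
    # Personalization data (profile, then memories) is appended last.
    parts.append(f"<user_profile>\n{profile}\n</user_profile>")
    parts.append("<memories>\n" + "\n".join(memories) + "\n</memories>")
    return "\n\n".join(parts)

print(build_system_prompt("- web_search(query: str)", "Prefers concise answers", ["Works on project X"]))
```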

System Prompt — Full Module Breakdown

How each module (CORE, SOUL, TOOLS, WORKFLOW, MEMORY, MODE) is structured, how tool descriptions are injected, and how mode prompts control Agent behavior

Tool Description Injection

TOOLS.md contains a {tools_description} placeholder that is dynamically filled at build time. The system iterates over all enabled tools, extracts their names, descriptions, and parameter schemas, and generates a formatted tool description list. Tools are loaded by category, each with different enablement conditions:
| Tool Category | Enable Condition | Examples |
| --- | --- | --- |
| Built-in Middleware | Always enabled | TodoList, SubAgent, Filesystem |
| MCP Tools | User-configured | Third-party MCP server tools |
| OAuth Tools | User-authorized | GitHub, Gmail |
| Browser/Desktop | Connector configured | Browser Operator, Desktop Operator |
| Sandbox Tools | When sandbox is enabled | execute_code, create_sandbox |
| RAG Tools | When knowledge base is linked | search_knowledge_base |
| Memory Tools | Default in Agent mode | memory_save, memory_search |
| Web Search Tools | When configured | web_search |
| Skill Tools | Always enabled | load_skill |
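Filling {tools_description} can be pictured as a simple iteration over the enabled tool registry. A minimal sketch with hypothetical tool records (real entries carry full JSON Schemas extracted from the tool definitions):

```python
import json

# Hypothetical enabled-tool records; the shape is illustrative.
ENABLED_TOOLS = [
    {"name": "web_search", "description": "Search the web.",
     "parameters": {"query": {"type": "string"}}},
    {"name": "load_skill", "description": "Load a skill's full instructions.",
     "parameters": {"skill_name": {"type": "string"}}},
]

def render_tools_description(tools: list[dict]) -> str:
    """Format every enabled tool as name / description / parameter schema."""
    blocks = []
    for tool in tools:
        blocks.append(
            f"## {tool['name']}\n"
            f"{tool['description']}\n"
            f"Parameters: {json.dumps(tool['parameters'])}"
        )
    return "\n\n".join(blocks)

tools_md = "Available tools:\n{tools_description}"
print(tools_md.format(tools_description=render_tools_description(ENABLED_TOOLS)))
```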

Profile & Memory Injection

At the end of the System Prompt assembly, Zeus fetches the user’s profile and semantically relevant memories from the Memory system, then appends them as structured text. This allows the Agent to be aware of user preferences, project context, and historical information without explicit retrieval.
| Parameter | Default | Description |
| --- | --- | --- |
| Top K | 5 | Return the 5 most relevant memories |
| Max Length | 2,000 chars | Maximum formatted length |
| Scope | user + project + session | Multi-scope merged retrieval |
| Ranking | similarity × 0.5 + confidence × 0.3 + scope_priority × 0.2 | Weighted scoring |
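The weighted ranking follows directly from the table. A minimal sketch, assuming hypothetical candidate records with the three scores already computed by the Memory system:

```python
# Hypothetical retrieval candidates; in Zeus these come from the Memory system.
candidates = [
    {"text": "Prefers TypeScript", "similarity": 0.91, "confidence": 0.8, "scope_priority": 1.0},
    {"text": "Project uses Postgres", "similarity": 0.85, "confidence": 0.9, "scope_priority": 0.7},
    {"text": "Once asked about Rust", "similarity": 0.60, "confidence": 0.5, "scope_priority": 0.4},
]

def score(m: dict) -> float:
    """Weighted ranking from the table: similarity*0.5 + confidence*0.3 + scope_priority*0.2."""
    return m["similarity"] * 0.5 + m["confidence"] * 0.3 + m["scope_priority"] * 0.2

TOP_K, MAX_CHARS = 5, 2000  # defaults from the table
ranked = sorted(candidates, key=score, reverse=True)[:TOP_K]
formatted = "\n".join(f"- {m['text']}" for m in ranked)[:MAX_CHARS]  # enforce the length cap
print(formatted)
```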

Memory — Retrieval & Ranking Details

How memories are stored, retrieved, ranked, and the complete Memory Gate pipeline

Conversation History

How It’s Built

Conversation history is constructed by _build_chat_history(), converting raw messages into LangChain format:
| Message Type | LangChain Type | Description |
| --- | --- | --- |
| User message | HumanMessage | Text sent by the user |
| Assistant message | AIMessage | Agent's reply |
| System message | SystemMessage | System-level instructions |

Truncation Strategy

| Parameter | Default | Description |
| --- | --- | --- |
| max_messages | 30 | Keep the most recent 30 messages |
| Storage | LangGraph Checkpointer | Associated with the session via thread_id |
Messages beyond the 30-message limit are removed by truncation. Automatic summarization is handled by SummarizationMiddleware (see below).
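Conversion and truncation together can be sketched with LangChain's message classes. The function name _build_chat_history is from the source; the body below is an illustrative stand-in:

```python
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage

MAX_MESSAGES = 30  # default from the table above

ROLE_TO_TYPE = {"user": HumanMessage, "assistant": AIMessage, "system": SystemMessage}

def build_chat_history(raw_messages: list[dict]) -> list[BaseMessage]:
    """Convert raw {role, content} dicts to LangChain messages, keeping the last 30."""
    converted = [ROLE_TO_TYPE[m["role"]](content=m["content"]) for m in raw_messages]
    return converted[-MAX_MESSAGES:]  # simple tail truncation

history = build_chat_history(
    [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]
)
```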

Token Management

Model Profiles

Zeus ships with 200+ model token-limit configurations. Common examples:
| Model | max_input_tokens | max_output_tokens |
| --- | --- | --- |
| gpt-4-turbo | 128,000 | 4,096 |
| gpt-4o | 128,000 | 16,384 |
| claude-3-5-sonnet | 200,000 | 8,192 |
When a model is not in the predefined list, a default of 64,000 tokens is used.
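A profile lookup with fallback is straightforward to sketch. The limits below come from the table; the dict structure and the default output limit are assumptions:

```python
# A tiny slice of the 200+ model profiles; values from the table above.
MODEL_PROFILES = {
    "gpt-4-turbo":       {"max_input_tokens": 128_000, "max_output_tokens": 4_096},
    "gpt-4o":            {"max_input_tokens": 128_000, "max_output_tokens": 16_384},
    "claude-3-5-sonnet": {"max_input_tokens": 200_000, "max_output_tokens": 8_192},
}
# 64,000 input tokens is the documented default; the output value here is assumed.
DEFAULT_PROFILE = {"max_input_tokens": 64_000, "max_output_tokens": 4_096}

def get_profile(model: str) -> dict:
    """Return the model's token limits, falling back to the 64k default."""
    return MODEL_PROFILES.get(model, DEFAULT_PROFILE)

print(get_profile("some-unknown-model"))  # -> the 64,000-token default
```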

SummarizationMiddleware

When context approaches the model’s token limit, SummarizationMiddleware (provided by the DeepAgents framework) automatically triggers conversation history summarization:
| Parameter | Value | Description |
| --- | --- | --- |
| Trigger threshold | max_input_tokens × 0.85 | Triggers at 85% usage |
| Retention policy | keep_recent | Keep recent messages |
| Retention count | 5 messages | Last 5 messages are not summarized |
| Summarization model | Same as primary model | Uses the same LLM to generate summaries |
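The trigger logic reduces to a simple threshold check. A framework-agnostic sketch of the idea (summarize_with_llm is a hypothetical stand-in, not a DeepAgents API, and the token counter here is a crude estimate):

```python
TRIGGER_RATIO = 0.85  # summarize at 85% of max_input_tokens
KEEP_RECENT = 5       # the last 5 messages are never summarized

def count_tokens(messages: list[str]) -> int:
    """Crude estimate (~4 chars per token); a real counter would use the model tokenizer."""
    return sum(len(m) for m in messages) // 4

def maybe_summarize(messages: list[str], max_input_tokens: int) -> list[str]:
    """Compress older messages into a single summary once the threshold is crossed."""
    if count_tokens(messages) < max_input_tokens * TRIGGER_RATIO:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = summarize_with_llm(old)  # hypothetical call to the primary model
    return [f"[Summary of earlier conversation]\n{summary}", *recent]
```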

Prompt Caching

For Anthropic models, AnthropicPromptCachingMiddleware enables prompt caching to reduce token billing for repeated System Prompt content. The caching mechanism is natively supported by the Anthropic API, caching the static portions of the System Prompt (CORE, SOUL, etc.) and significantly reducing token consumption across multi-turn conversations.
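The underlying Anthropic API mechanism marks static system blocks with cache_control. A minimal sketch using the Anthropic Python SDK directly (Zeus wraps this in middleware; the model name and prompt text here are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model alias
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "CORE + SOUL prompt text ...",   # static portion, placeholder here
            "cache_control": {"type": "ephemeral"},  # cached and reused across turns
        },
    ],
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.usage)  # cache_read_input_tokens shows the savings on later turns
```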

Complete Data Flow

The following diagram shows the full lifecycle of Context in a single Agent invocation — from assembly to consumption:

Context Changes by Phase

| Phase | Context Content | Token Growth |
| --- | --- | --- |
| Phase 1 | System Prompt + History | Base cost (fixed + history) |
| Phase 2 | + Tool Schemas (JSON) | Tool count × schema size |
| Phase 3 | + Tool Calls + Results | Depends on tool usage frequency and result size |

Mode-Based Context

Different modes produce different Context compositions:
| Component | Agent Mode | Ask Mode | Plan Mode |
| --- | --- | --- | --- |
| Core prompts (CORE/SOUL) | Full | Full | Full |
| TOOLS.md | All tools | Read-only tools | Read-only tools |
| WORKFLOW.md | Full | Simplified | Planning-specific |
| MEMORY.md | Full | Simplified | Simplified |
| Mode prompt | agent_mode.md | ask_mode.md | plan_mode.md |
| Memory Tools | Enabled | Restricted | Restricted |
| Write tools | Enabled | Disabled | Disabled |
| RAG Tools | Enabled | Enabled | Enabled |
| Profile injection | Injected | Injected | Injected |
| Memory injection | Injected | Injected | Injected |
Ask and Plan modes reduce context consumption by limiting the number of available tools, which in turn reduces Tool Schema size.
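Mode-based filtering can be pictured as a predicate over the tool registry. A hypothetical sketch (the read_only flag and the data structures are illustrative; the mode behavior mirrors the table):

```python
# Hypothetical tool records with a read_only capability flag.
TOOLS = [
    {"name": "search_knowledge_base", "read_only": True},
    {"name": "web_search", "read_only": True},
    {"name": "write_file", "read_only": False},
    {"name": "memory_save", "read_only": False},
]

def tools_for_mode(mode: str) -> list[dict]:
    """Agent mode gets everything; Ask/Plan modes keep only read-only tools."""
    if mode == "agent":
        return TOOLS
    return [t for t in TOOLS if t["read_only"]]

print([t["name"] for t in tools_for_mode("ask")])
# Fewer tools -> smaller tool schemas -> less context consumed.
```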

Optimization Strategies

1. Progressive Disclosure

Problem: Skills and Connector prompts can be very long; injecting all of them wastes context space.
Solution: Only inject metadata summaries; the Agent loads full content on demand via tools.
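One way to picture this: only a one-line summary per skill enters the System Prompt, and the load_skill tool (named in the tool table above) returns the full instructions on demand. A hypothetical sketch:

```python
# Hypothetical skill registry: short metadata is injected; full content stays on disk.
SKILLS = {
    "pdf_report": {"summary": "Generate a formatted PDF report.",
                   "full": "Step 1: ... (several thousand tokens of instructions)"},
}

def skills_metadata_for_prompt() -> str:
    """Only the cheap one-line summaries are injected into the System Prompt."""
    return "\n".join(f"- {name}: {s['summary']}" for name, s in SKILLS.items())

def load_skill(skill_name: str) -> str:
    """Tool the Agent calls to pull in a skill's full instructions on demand."""
    return SKILLS[skill_name]["full"]

print(skills_metadata_for_prompt())
```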

2. Agentic RAG (On-Demand Retrieval)

Problem: Knowledge base content can be massive; pre-injection is impractical.
Solution: The knowledge base is exposed as a tool. The Agent decides when and what to retrieve.
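Exposing the knowledge base as a tool means retrieval cost is only paid when the Agent decides to search. A hypothetical sketch of the tool surface (the search_knowledge_base name appears in the tool table; the backing store and matching logic are stand-ins for a real vector search):

```python
# Stand-in knowledge base; in Zeus this is a vector store behind the RAG tools.
KNOWLEDGE_BASE = {
    "deploy guide": "To deploy, run ...",
    "api limits": "Rate limits are ...",
}

def search_knowledge_base(query: str, top_k: int = 3) -> list[str]:
    """Tool the Agent invokes on demand instead of pre-injecting the whole KB."""
    hits = [text for title, text in KNOWLEDGE_BASE.items() if query.lower() in title]
    return hits[:top_k]

# The Agent decides when to call this, so unsearched content never enters the context.
print(search_knowledge_base("deploy"))
```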

3. Automatic Summarization (SummarizationMiddleware)

Problem: Long conversations cause history to consume large amounts of context space.
Solution: When token usage hits the threshold, older messages are automatically compressed into summaries.

4. History Truncation

Problem: Session messages grow without bound.
Solution: Only the most recent 30 messages are kept in context, while the LangGraph Checkpointer persists the full history.

5. Prompt Caching

Problem: The System Prompt is mostly unchanged across turns but is billed every time.
Solution: Anthropic models use AnthropicPromptCachingMiddleware to cache static portions.

6. Memory Formatting Limits

Problem: Too many retrieved memories can consume excessive space.
Solution: Formatted memory output is capped at 2,000 characters, with retrieval limited to Top K = 5.