The Agent Loop is a complete Agent run: message reception → context assembly → model reasoning → tool execution → streaming response → state persistence. It is the core path that transforms a user message into actions and a final reply. In Zeus, each Loop is a serialized run per Session, emitting lifecycle events and stream events during model reasoning, tool calls, and streaming output.

Entry Points

| Entry | Route | Description |
| --- | --- | --- |
| Web frontend | POST /api/agent/invoke | Next.js API route, proxied to the Python backend |
| Python API | POST /api/agent/invoke | FastAPI route, calls AgentService directly |
| Resume (HITL) | POST /api/agent/resume | Resumes execution after user approval |
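For orientation, here is a minimal sketch of the Python-side entry point, assuming a FastAPI router that streams the Agent Loop output back as Server-Sent Events. Only the route path comes from the table above; the request model, function names, and handler body are illustrative assumptions.

```python
# Hypothetical sketch of the Python entry point. Only the route path is from
# the table above; the request model and streaming stub are assumptions.
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter()

class InvokeRequest(BaseModel):
    session_id: str
    message: str
    chat_history: list[dict] = []

async def run_agent_loop(req: InvokeRequest):
    """Stand-in for AgentService: yields SSE-formatted Agent Loop events."""
    yield 'data: {"type": "complete"}\n\n'

@router.post("/api/agent/invoke")
async def invoke(req: InvokeRequest) -> StreamingResponse:
    # The real handler also validates identity and checks the credit balance
    # before streaming the loop output back to the caller.
    return StreamingResponse(run_agent_loop(req), media_type="text/event-stream")
```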

How It Works (High-level)

  1. Request Reception — Next.js API validates identity, checks credit balance, loads LLM and tool configuration, asynchronously saves user message, forwards to Python backend
  2. Context Assembly — _init_context() sequentially loads tools (MCP + OAuth + Built-in), initializes LLM, retrieves Memory/Profile, activates Skills, builds System Prompt, caches to context_cache
  3. Agent Creation — Creates a LangGraph graph via DeepAgents, assembling LLM, tools, middleware pipeline, Checkpointer, and HITL interrupt configuration
  4. Message Construction — Frontend chat_history is converted to LangChain message types (max 30), with current user message appended
  5. Streaming Execution — Enters _astream_events() core loop; framework events are converted to SSE messages and streamed
  6. Completion — Sends CompleteMessage; Checkpointer automatically saves state
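Condensed into an illustrative sketch, the run looks roughly like this. The _init_context() and _astream_events() names are from this page; the stubs and signatures around them are assumptions.

```python
from typing import AsyncIterator

async def _init_context(session_id: str) -> dict:
    # Step 2: load tools, LLM, Memory/Profile, Skills, System Prompt (stubbed here).
    return {"session_id": session_id}

async def _astream_events(context: dict, messages: list[dict]) -> AsyncIterator[dict]:
    # Step 5: convert DeepAgents/LangGraph framework events into SSE payloads (stubbed).
    yield {"event": "text", "content": "..."}

async def run_once(session_id: str, user_message: str, chat_history: list[dict]):
    context = await _init_context(session_id)                                    # step 2
    messages = chat_history[-30:] + [{"role": "user", "content": user_message}]  # step 4
    async for sse in _astream_events(context, messages):                         # step 5
        yield sse
    yield {"event": "complete"}                                                  # step 6
```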

Context Assembly

After context assembly completes, the assembled context is cached in _context_cache[session_id] so that HITL resume() can reuse it.
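A minimal sketch of that cache, assuming it is a plain in-process dict keyed by session ID (the _context_cache name is from this page; the value shape and helper names are not specified and are illustrative):

```python
# In-process cache of assembled contexts, keyed by session ID (value shape assumed).
_context_cache: dict[str, dict] = {}

def cache_context(session_id: str, context: dict) -> None:
    _context_cache[session_id] = context      # written after _init_context() completes

def get_cached_context(session_id: str) -> dict | None:
    return _context_cache.get(session_id)     # read by resume() after HITL approval
```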

Event Streaming

_astream_events() listens to DeepAgents framework internal events and converts them to standard SSE messages for the frontend:

SSE Event Types

| SSE Event | Trigger | Key Fields |
| --- | --- | --- |
| text | LLM outputs each token | content, role |
| tool_call | LLM decides to call a tool | tool_name, parameters, requires_approval |
| tool_call_result | Tool execution completes | tool_name, result, is_error |
| complete | Agent execution finished | content, summary |
| error | Exception occurred | error, error_code, details |
| token_usage | After an LLM call ends | prompt_tokens, completion_tokens |
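As a hedged sketch, a single framework event might be serialized into an SSE frame like this. The event names and fields follow the table; the exact wire format and helper name are assumptions.

```python
import json

def to_sse(event_type: str, payload: dict) -> str:
    """Serialize one event from the table above into an SSE frame (format assumed)."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"

# Example: a token_usage frame emitted after an LLM call ends.
frame = to_sse("token_usage", {"prompt_tokens": 812, "completion_tokens": 64})
```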

Messages

Learn about the complete message flow, state management, and persistence

Tool Execution

Execution Decision

Approval decisions are based on the Auto-Run mode (Run Everything / Use Allowlist / Ask Everytime). Tool call IDs are matched via FIFO queues grouped by tool name: IDs are enqueued during on_chat_model_end and dequeued during on_tool_end.
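A sketch of that matching, assuming plain per-tool deques (the on_chat_model_end / on_tool_end hook names are from this page; the data structure and helper names are illustrative):

```python
from collections import defaultdict, deque

# Pending tool call IDs, grouped by tool name and matched FIFO.
_pending_tool_calls: dict[str, deque[str]] = defaultdict(deque)

def enqueue_tool_call(tool_name: str, tool_call_id: str) -> None:
    # Called from on_chat_model_end, when the LLM emits the tool call.
    _pending_tool_calls[tool_name].append(tool_call_id)

def dequeue_tool_call(tool_name: str) -> str | None:
    # Called from on_tool_end, to pair the result with the oldest pending call.
    queue = _pending_tool_calls[tool_name]
    return queue.popleft() if queue else None
```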

HITL Interrupt & Recovery

When a tool requires approval, the Agent Loop is suspended and its state is persisted via the Checkpointer. On recovery, a SystemMessage is appended for each rejected tool, explicitly telling the Agent not to retry it.
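A sketch of the rejection path, assuming the messages are built with LangChain's SystemMessage (the import is real; the helper and its wording are illustrative):

```python
from langchain_core.messages import SystemMessage

def rejection_notices(rejected_tools: list[str]) -> list[SystemMessage]:
    # Appended to the conversation on resume so the Agent does not retry these tools.
    return [
        SystemMessage(content=f"The user rejected the call to '{name}'. Do not call it again.")
        for name in rejected_tools
    ]
```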

HITL Details

Complete description of Auto-Run modes, approval UI, and interrupt recovery mechanism

Frontend Processing

The frontend handleStreamMessage() consumes the SSE stream and routes each event to the corresponding state management.

Event Persistence

RealtimeEventSaver batch-persists real-time events:

| Configuration | Value |
| --- | --- |
| Batch size | 3 events |
| Batch interval | 100ms |
| Retry strategy | Exponential backoff, max 3 retries |
| Fallback | LocalStorage backup |

Error Handling

Backend Errors

| Error Type | Detection Condition | User Message |
| --- | --- | --- |
| Input length exceeded | Range of input length / InvalidParameter | Suggest shortening input |
| Context window overflow | context length / token limit | Suggest starting a new session |
| General exception | All other exceptions | Includes traceback details |

Errors are sent via ErrorMessage SSE events, containing error_code and details.
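A sketch of that classification, using the detection conditions from the table above; the error_code and details keys mirror the description, while the helper name and the exact codes and messages are assumptions.

```python
import traceback

def classify_error(exc: Exception) -> dict:
    """Map an exception to an ErrorMessage payload using the conditions above (sketch).

    Call this from within an except block so format_exc() captures the traceback.
    """
    text = str(exc)
    if "Range of input length" in text or "InvalidParameter" in text:
        return {"error_code": "input_too_long", "error": "Please shorten your input."}
    if "context length" in text or "token limit" in text:
        return {"error_code": "context_overflow", "error": "Please start a new session."}
    return {"error_code": "internal_error", "error": text, "details": traceback.format_exc()}
```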

Frontend Errors

| HTTP Status Code | Meaning | Handling |
| --- | --- | --- |
| 401 | Unauthorized | Redirect to login |
| 403 | Insufficient credits | Toast notification |
| 503 | Backend not started | Connection error message |
| 504 | Request timeout | Timeout message |

Stream errors (AbortError, network disconnect, parse errors) all have corresponding exception handling and user notifications.

Timeouts

| Timeout Item | Default | Description |
| --- | --- | --- |
| Agent max execution time | 7200s (2h) | FastAPI single-call limit |
| MCP server | 1800s (30min) | Connection timeout per MCP server |
| HITL tool approval | Configurable | Independently set per tool |
| LangGraph recursion limit | 999 | Maximum iteration count for the Agent loop |
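Expressed as configuration, the limits above might look like this; the constant names are illustrative, while recursion_limit is the standard LangGraph run-config key.

```python
# Illustrative constants mirroring the table above.
AGENT_MAX_EXECUTION_SECONDS = 7200    # 2 h FastAPI single-call limit
MCP_SERVER_TIMEOUT_SECONDS = 1800     # 30 min connection timeout per MCP server
LANGGRAPH_RECURSION_LIMIT = 999       # maximum Agent loop iterations

# The recursion limit is passed per run via the LangGraph config, e.g.
# graph.astream_events(inputs, config={"recursion_limit": LANGGRAPH_RECURSION_LIMIT})
```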

Concurrency & Isolation

  • Each Session has independent Checkpointer state (thread_id isolation)
  • Context Cache is isolated by session_id; resume can only recover the corresponding session
  • Tool execution is serialized (LangGraph guarantees no concurrent tool execution within the same Session)
  • User workspaces are fully isolated by user_id
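A sketch of how those isolation boundaries map onto the per-run config. The thread_id key under configurable is the standard LangGraph/Checkpointer convention; the helper and the user_id key are assumptions.

```python
def session_run_config(session_id: str, user_id: str) -> dict:
    # Checkpointer state is keyed by thread_id, so using the session ID as the
    # thread ID gives each Session its own serialized run and saved state.
    return {
        "configurable": {
            "thread_id": session_id,   # Checkpointer / state isolation per Session
            "user_id": user_id,        # workspace isolation (key name assumed)
        }
    }
```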

Where Things Can End Early

  • Agent timeout — Exceeds 7200s maximum execution time
  • HITL approval timeout — User does not respond within the specified time
  • Frontend disconnect — Network interruption or user closes the page
  • Credit exhaustion — Pre-call check fails
  • Model error — Context window overflow or API exception