Zeus AI Backend is an Agent server built on FastAPI. It receives user requests, orchestrates Agent execution, manages tool calls, and handles node communication. It talks to the frontend over HTTP REST + SSE and to remote nodes (browser extensions, desktop applications) over WebSocket + JSON-RPC 2.0.

Overview

  • A single FastAPI instance hosts both REST API and WebSocket gateway
  • Agent invocations are initiated via /api/agent/invoke and return an SSE stream
  • Node tool calls are routed through the WebSocket gateway as JSON-RPC 2.0 messages in MCP (Model Context Protocol) format
  • Each user has an independent cloud workspace (Supabase Storage) and session state (PostgreSQL Checkpointer)

Components & Data Flow

API Layer (Routing Layer)

The FastAPI application provides the following route modules:
| Route | Description | Protocol |
|---|---|---|
| /api/agent/invoke | Agent conversation invocation | HTTP POST → SSE stream |
| /api/agent/resume | HITL resume execution | HTTP POST → SSE stream |
| /api/tools/* | Tool management CRUD | HTTP REST |
| /api/skill/* | Skill management | HTTP REST |
| /api/knowledge-base/* | Knowledge base management | HTTP REST |
| /api/node/* | Node device queries | HTTP REST |
| /api/scheduled-task/* | Scheduled task management | HTTP REST |
| /ws/extension | Browser extension WebSocket | WebSocket |
| /ws/desktop | Desktop application WebSocket | WebSocket |
| /ws/web | Web client WebSocket | WebSocket |
| /api/feishu/* | Feishu channel webhook | HTTP REST |
All REST endpoints are authenticated via JWT (verify_jwt_token); WebSocket connections pass client_id / user_id via query parameters.
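The internals of verify_jwt_token are not described here. The sketch below shows what such a check typically involves: a signature check plus an expiry check. It assumes HS256-signed tokens and a shared secret, and both of those are assumptions, since the document does not state the algorithm, secret source, or required claims. The names sign_hs256 and verify_hs256 are hypothetical. A production service would normally use a maintained library such as PyJWT instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Hypothetical helper: build an HS256 token (useful for tests)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Hypothetical stand-in for verify_jwt_token: return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # reject alg switching (e.g. "none")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison so timing does not leak signature bytes.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] < current:
        raise ValueError("token expired")
    return claims
```

Pinning the accepted algorithm before verifying is a deliberate choice. It prevents a client from downgrading the check by editing the token header.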

WebSocket Gateway

The gateway manages all WebSocket connections through ConnectionManager. It tracks node and Web client connections and routes tool calls:
  • Extension / Desktop nodes: Each node is uniquely identified by node_id; a user can have multiple nodes
  • Web clients: Managed by user_id; the same user can have multiple Web connections
  • Tool calls: The Agent initiates JSON-RPC requests via call_tool(); the Gateway routes requests to the corresponding node and awaits responses (Future-based)
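The Future-based request/response pattern in the last bullet can be sketched with asyncio. This is a simplified, hypothetical stand-in for the gateway side of call_tool(). The PendingCalls class, the send callback, and the resolve() hook are assumptions. The 60-second default mirrors the tool-call timeout listed under System Invariants.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict


class PendingCalls:
    """Correlates outgoing JSON-RPC requests with incoming responses by id."""

    def __init__(self, send: Callable[[dict], Awaitable[None]]):
        self._send = send  # e.g. writes JSON to the target node's WebSocket
        self._pending: Dict[str, asyncio.Future] = {}

    async def call_tool(self, name: str, arguments: dict, session_id: str,
                        timeout: float = 60.0) -> Any:
        request_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await self._send({
                "jsonrpc": "2.0",
                "id": request_id,
                "method": "tools/call",
                "params": {"name": name, "arguments": arguments, "session_id": session_id},
            })
            # Suspend until resolve() completes the future, or raise on timeout.
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)

    def resolve(self, message: dict) -> None:
        """Called by the WebSocket receive loop for each JSON-RPC response."""
        future = self._pending.get(message.get("id"))
        if future is None or future.done():
            return  # unknown or late response (e.g. after a timeout): drop it
        if "error" in message:
            future.set_exception(RuntimeError(message["error"].get("message", "tool error")))
        else:
            future.set_result(message.get("result"))
```

With one table of pending futures, a single receive loop can serve many concurrent tool calls. A response that arrives after its timeout finds no entry and is dropped.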

Agent Service

The Agent Service is the core orchestration layer, built on the DeepAgents framework (a higher-level wrapper over LangGraph). Service layer responsibilities:
| Service | File | Responsibility |
|---|---|---|
| BaseService | services/base.py | Context initialization, tool assembly, prompt construction, SSE event streaming |
| AgentService | services/agent.py | Agent mode invoke/resume entry point |
| RAGService | services/rag.py | Knowledge base retrieval (vector + BM25 hybrid search) |
| DocumentService | services/document.py | Document processing and chunking |
| FeishuService | services/feishu.py | Feishu channel integration |
| SchedulerService | services/scheduler.py | Scheduled task scheduling |
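The document does not say how RAGService merges vector and BM25 results. One common technique is Reciprocal Rank Fusion (RRF), sketched below as an assumption. The function name, the k=60 constant, and the input format (ranked lists of chunk ids) are illustrative and are not taken from the Zeus code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk ids into one ranking.

    Each list adds 1 / (k + rank) to every id it contains (rank starts at 1),
    so chunks ranked highly by either retriever rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

For example, `reciprocal_rank_fusion([["c3", "c1", "c7"], ["c1", "c9", "c3"]])` returns `["c1", "c3", "c9", "c7"]`. "c1" wins because it sits near the top of both lists. RRF needs only ranks, so it avoids normalizing cosine similarities against BM25 scores, which live on different scales.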

Tool System

Zeus tools are organized into four layers:
| Layer | Source | Registration | Execution Location |
|---|---|---|---|
| Built-in | utils/tools/built_in/ | Registered directly in code | AI Backend (local) |
| MCP | Passed from frontend config | langchain_mcp_adapters | MCP Server (remote) |
| OAuth | Passed from frontend config | Dynamically built LangChain Tool | AI Backend → OAuth API |
| Connector | Reported by WebSocket nodes | Bound via SessionManager | Remote nodes (Extension/Desktop) |
Call chain for Connector tools: Agent → ToolRouter → Gateway → WebSocket → Node, which executes the tool; the result returns along the same path.

Node Management

Node management is handled by three cooperating components:
| Component | Description |
|---|---|
| NodeManager | Node registration/deregistration, heartbeat TTL (60s), periodic cleanup (every 30s) |
| SessionManager | Binds sessions to specific nodes; supports a preferred_node_id |
| ToolRouter | Routes tool calls to the appropriate node based on session binding |
Each user can have up to 10 nodes; nodes that miss heartbeats are automatically marked offline and deregistered.
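The sketch below shows one way the three components could cooperate, kept in memory. It injects a clock so the TTL logic can be tested. The limits (10 nodes per user, 60s heartbeat TTL) come from the text above. The class names match the table, but every method, signature, and the binding priority order are hypothetical. The 30s periodic cleanup would call cleanup() from a background task, which is omitted here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MAX_NODES_PER_USER = 10   # per-user node limit from the text above
HEARTBEAT_TTL_S = 60.0    # heartbeat TTL from the text above


@dataclass
class Node:
    node_id: str
    user_id: str
    tools: List[str]
    last_heartbeat: float


class NodeManager:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self.nodes: Dict[str, Node] = {}

    def register(self, node_id: str, user_id: str, tools: List[str]) -> Node:
        others = [n for n in self.nodes.values()
                  if n.user_id == user_id and n.node_id != node_id]
        if len(others) >= MAX_NODES_PER_USER:
            raise RuntimeError(f"user {user_id} already has {MAX_NODES_PER_USER} nodes")
        node = Node(node_id, user_id, list(tools), self._clock())
        self.nodes[node_id] = node
        return node

    def unregister(self, node_id: str) -> None:
        # Also called immediately when a node's WebSocket disconnects.
        self.nodes.pop(node_id, None)

    def heartbeat(self, node_id: str) -> None:
        node = self.nodes.get(node_id)
        if node is not None:
            node.last_heartbeat = self._clock()

    def cleanup(self) -> List[str]:
        """Deregister nodes whose last heartbeat is older than the TTL."""
        now = self._clock()
        expired = [nid for nid, n in self.nodes.items()
                   if now - n.last_heartbeat > HEARTBEAT_TTL_S]
        for nid in expired:
            self.unregister(nid)
        return expired


class SessionManager:
    """Binds each session to one node, honouring preferred_node_id when usable."""

    def __init__(self, nodes: NodeManager):
        self._nodes = nodes
        self.bindings: Dict[str, str] = {}  # session_id -> node_id

    def _usable(self, node_id: Optional[str], user_id: str, tool: str) -> bool:
        node = self._nodes.nodes.get(node_id) if node_id else None
        return node is not None and node.user_id == user_id and tool in node.tools

    def bind(self, session_id: str, user_id: str, tool: str,
             preferred_node_id: Optional[str] = None) -> str:
        # Hypothetical priority: preferred node, then current binding, then any match.
        candidates = [preferred_node_id, self.bindings.get(session_id), *self._nodes.nodes]
        for node_id in candidates:
            if self._usable(node_id, user_id, tool):
                self.bindings[session_id] = node_id
                return node_id
        raise LookupError(f"no online node of {user_id} provides {tool!r}")


class ToolRouter:
    def __init__(self, sessions: SessionManager):
        self._sessions = sessions

    def route(self, session_id: str, user_id: str, tool: str,
              preferred_node_id: Optional[str] = None) -> str:
        """Return the node_id that should execute `tool` for this session."""
        return self._sessions.bind(session_id, user_id, tool, preferred_node_id)
```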

Storage & State

| Storage | Technology | Purpose |
|---|---|---|
| Checkpointer | PostgreSQL (PostgresSaver) | Session-level state persistence; supports HITL recovery |
| Workspace | Supabase Storage | User files (outputs, uploads, sandbox results) |
| Memory | PostgreSQL + pgvector | Long-term memory (vectors + metadata), three-tier scoping |
| Knowledge Base | PostgreSQL + pgvector + BM25 | RAG document storage and hybrid retrieval |
| Cache | Redis | Workspace cache (5 min), memory cache (10 min) |
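The sketch below illustrates the cache-aside pattern implied by the two cache durations. It assumes they are per-entry TTLs and uses an in-memory stand-in for Redis; with redis-py the equivalent write would be `client.set(key, value, ex=300)`. The TTLCache class, the key format, and the invalidation hook are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

WORKSPACE_TTL_S = 5 * 60   # "Workspace cache (5 min)", assumed to be a TTL
MEMORY_TTL_S = 10 * 60     # "memory cache (10 min)", assumed to be a TTL


class TTLCache:
    """In-memory stand-in for Redis: cache-aside reads with per-entry expiry."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # fresh entry: skip the backing store
        value = loader()           # e.g. list the user's files in Supabase Storage
        self._entries[key] = (now + self.ttl_s, value)
        return value

    def invalidate(self, key: str) -> None:
        # Hypothetical: call after writes so readers don't see a stale listing.
        self._entries.pop(key, None)
```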

Connection Lifecycle

WebSocket Node Connection
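A node's side of this lifecycle can be reconstructed from the message types listed under Communication Protocols. The node sends register on connect and waits for registered. It then sends heartbeat well inside the 60-second TTL, answers ping with pong, and executes tools/call requests. The sketch below is transport-agnostic: feed it gateway frames and send back whatever it returns. The NodeClient class, the node_id and status fields, and the shape of capabilities and tools are assumptions. So is the use of -32602 for an unknown tool, the code used in the MCP specification's unknown-tool example.

```python
import json
from typing import Callable, Dict, Optional


class NodeClient:
    """Hypothetical node logic; the WebSocket transport is left to the caller."""

    def __init__(self, node_id: str, tools: Dict[str, Callable[[dict], str]]):
        self.node_id = node_id
        self.tools = tools
        self.registered = False
        self.current_tasks = 0

    def register_message(self) -> dict:
        # Sent right after the WebSocket opens (client_id/user_id go in the query string).
        return {"type": "register", "node_id": self.node_id,
                "capabilities": {"tools": True},              # hypothetical shape
                "tools": [{"name": name} for name in self.tools]}

    def heartbeat_message(self) -> dict:
        # Must be sent well within the gateway's 60s heartbeat TTL.
        return {"type": "heartbeat", "status": "online", "current_tasks": self.current_tasks}

    def handle(self, frame: str) -> Optional[dict]:
        """Process one gateway frame; return the reply to send, if any."""
        msg = json.loads(frame)
        kind = msg.get("type")
        if kind == "registered":
            self.registered = True
            return None
        if kind == "ping":
            return {"type": "pong"}
        if msg.get("jsonrpc") == "2.0" and msg.get("method") == "tools/call":
            return self._call_tool(msg)
        return None  # heartbeat_ack and unknown messages need no reply

    def _call_tool(self, msg: dict) -> dict:
        params = msg.get("params", {})
        tool = self.tools.get(params.get("name"))
        if tool is None:
            return {"jsonrpc": "2.0", "id": msg["id"],
                    "error": {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}}
        self.current_tasks += 1
        try:
            text = tool(params.get("arguments", {}))
            return {"jsonrpc": "2.0", "id": msg["id"],
                    "result": {"content": [{"type": "text", "text": text}], "isError": False}}
        except Exception as exc:  # report tool failures instead of dropping the socket
            return {"jsonrpc": "2.0", "id": msg["id"],
                    "error": {"code": -32603, "message": str(exc)}}
        finally:
            self.current_tasks -= 1
```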

Agent Invocation Flow

If HITL (Human-in-the-Loop) is enabled, the Agent pauses before executing a sensitive tool and sends an InterruptMessage. The frontend displays an approval UI; once the user confirms, it calls /api/agent/resume to continue execution.

Communication Protocols

HTTP SSE (Agent Response Stream)

Agent invocations return text/event-stream. SSE event types:
| Event Type | data Field | Description |
|---|---|---|
| text | {type, content} | Streaming text tokens |
| tool_call | {type, tool_name, tool_args, tool_call_id} | Agent initiates a tool call |
| tool_call_result | {type, tool_name, result, ...} | Tool execution result |
| interrupt | {type, tool_calls, ...} | HITL interrupt, awaiting user approval |
| complete | {type, finish_reason} | Stream ended |
| error | {type, error} | Error message |
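A client consuming /api/agent/invoke reads the text/event-stream body line by line. The sketch below assumes each event arrives as a `data:` line carrying the JSON object from the table above, including its type field. The table suggests this but does not state it. Named `event:` fields, if the server uses them, would need extra handling. Network I/O is left out, so the parser runs on any iterator of lines, such as the lines of a streaming HTTP response. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event; a blank line ends an event."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":"), event:, id: and retry: fields are ignored in this sketch.
    # An unterminated event at end of stream is discarded, as the SSE format specifies.


def collect_reply(events: Iterable[dict]) -> dict:
    """Fold Agent events into the streamed text plus any pending HITL interrupt."""
    text: List[str] = []
    interrupt: Optional[dict] = None
    for event in events:
        kind = event.get("type")
        if kind == "text":
            text.append(event["content"])
        elif kind == "interrupt":
            interrupt = event  # show the approval UI, then call /api/agent/resume
        elif kind == "error":
            raise RuntimeError(event["error"])
        elif kind == "complete":
            break
    return {"text": "".join(text), "interrupt": interrupt}
```

Because the stream cannot be replayed, a client that disconnects mid-stream cannot re-read missed events. The session has to be recovered from Checkpointer state instead.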

WebSocket JSON-RPC 2.0 (Node Tool Calls)

Node tool calls use the MCP (Model Context Protocol) tools/call message format, with an additional session_id field in params.

Request:
```json
{
  "jsonrpc": "2.0",
  "id": "uuid-string",
  "method": "tools/call",
  "params": {
    "name": "browser_click",
    "arguments": { "selector": "#submit-btn" },
    "session_id": "session_abc123"
  }
}
```
Response (Success):
```json
{
  "jsonrpc": "2.0",
  "id": "uuid-string",
  "result": {
    "content": [
      { "type": "text", "text": "Clicked element successfully" },
      { "type": "image", "data": "base64...", "mimeType": "image/jpeg" }
    ],
    "isError": false
  }
}
```
Response (Error):
```json
{
  "jsonrpc": "2.0",
  "id": "uuid-string",
  "error": {
    "code": -32603,
    "message": "Element not found"
  }
}
```
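On the Agent side, each response has to be turned into something the model can consume. The sketch below unpacks the two response shapes above. The ToolCallError and ToolOutput names, and returning text and decoded images separately, are assumptions. Treating `isError: true` as a failure follows MCP, which reports tool-level failures inside result rather than as a JSON-RPC error.

```python
import base64
from dataclasses import dataclass, field
from typing import List


class ToolCallError(Exception):
    """Raised for JSON-RPC errors and for results flagged with isError."""


@dataclass
class ToolOutput:
    text: str
    images: List[bytes] = field(default_factory=list)      # decoded image bytes
    mime_types: List[str] = field(default_factory=list)


def unpack_response(message: dict) -> ToolOutput:
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        err = message["error"]
        raise ToolCallError(f"{err.get('code')}: {err.get('message')}")
    result = message.get("result", {})
    texts: List[str] = []
    output = ToolOutput(text="")
    for item in result.get("content", []):
        if item.get("type") == "text":
            texts.append(item["text"])
        elif item.get("type") == "image":
            output.images.append(base64.b64decode(item["data"]))
            output.mime_types.append(item.get("mimeType", "application/octet-stream"))
    output.text = "\n".join(texts)
    if result.get("isError"):
        raise ToolCallError(output.text or "tool reported an error")
    return output
```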

WebSocket Message Types (Non-JSON-RPC)

| Direction | type | Description |
|---|---|---|
| Node → Gateway | register | Node registration (capabilities, tools) |
| Gateway → Node | registered | Registration confirmation |
| Node → Gateway | heartbeat | Heartbeat report (status, current_tasks) |
| Gateway → Node | heartbeat_ack | Heartbeat acknowledgment |
| Node ↔ Gateway | ping / pong | Keepalive |
| Web → Gateway | get_workflows | Request workflow list (forwarded to Extension) |
| Web → Gateway | execute_workflow | Execute workflow |
| Node → Gateway | task_complete | Workflow execution completed |
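Typed messages and JSON-RPC frames share one socket, so the gateway has to tell them apart before dispatching. The sketch below shows that classification step. It assumes a frame is JSON-RPC exactly when it carries `"jsonrpc": "2.0"`. The classify_frame name and the action labels are hypothetical. Forwarding between Web clients and Extension nodes is reduced to a return value.

```python
import json
from typing import Optional, Tuple

# Typed messages the gateway answers directly, mapped to the reply type.
DIRECT_REPLIES = {"register": "registered", "heartbeat": "heartbeat_ack", "ping": "pong"}
# Typed messages relayed between Web clients and Extension nodes.
FORWARDED = {"get_workflows", "execute_workflow", "task_complete"}


def classify_frame(frame: str) -> Tuple[str, Optional[dict]]:
    """Return (action, reply) for one incoming text frame.

    action is one of "rpc_response", "reply", "forward", or "ignore".
    """
    try:
        msg = json.loads(frame)
    except json.JSONDecodeError:
        return "ignore", None
    if not isinstance(msg, dict):
        return "ignore", None
    if msg.get("jsonrpc") == "2.0":
        # A response to call_tool(): match it to the pending request by id.
        return "rpc_response", None
    kind = msg.get("type")
    if kind in DIRECT_REPLIES:
        return "reply", {"type": DIRECT_REPLIES[kind]}
    if kind in FORWARDED:
        return "forward", None
    return "ignore", None  # e.g. a pong answering the gateway's own ping
```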

Startup & Lifecycle

Startup Sequence

Key Environment Variables

| Variable | Description |
|---|---|
| DATABASE_URL | PostgreSQL connection string (Checkpointer + pgvector) |
| SUPABASE_URL / SUPABASE_SERVICE_KEY | Supabase Storage configuration |
| NEXTJS_API_URL | Next.js backend API (user data, configuration) |
| OPENAI_API_KEY / OPENAI_BASE_URL | Default LLM configuration |
| REDIS_URL | Redis cache (optional) |
| LANGCHAIN_API_KEY | LangSmith tracing (optional) |
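The sketch below loads these variables at startup. It assumes every variable not marked optional is required, and that an absent optional value simply disables the feature. The Settings class and load_settings() function are hypothetical; the actual service may use a settings library instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: every variable not marked "(optional)" in the table is required.
REQUIRED = ("DATABASE_URL", "SUPABASE_URL", "SUPABASE_SERVICE_KEY",
            "NEXTJS_API_URL", "OPENAI_API_KEY", "OPENAI_BASE_URL")


@dataclass(frozen=True)
class Settings:
    database_url: str
    supabase_url: str
    supabase_service_key: str
    nextjs_api_url: str
    openai_api_key: str
    openai_base_url: str
    redis_url: Optional[str] = None          # assumed: None disables the Redis cache
    langchain_api_key: Optional[str] = None  # assumed: None disables LangSmith tracing


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail fast at startup instead of on the first request that needs the value.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        *(env[name] for name in REQUIRED),   # same order as the dataclass fields
        redis_url=env.get("REDIS_URL") or None,
        langchain_api_key=env.get("LANGCHAIN_API_KEY") or None,
    )
```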

Health Checks

  • GET /health → {"status": "ok"}
  • GET / → {"name": "Zeus Backend API", "version": "1.0.0", "status": "running"}

System Invariants

  • JWT Authentication: All REST APIs require a valid JWT Token
  • Session Isolation: Each session_id has independent Checkpointer state; different sessions do not interfere
  • Node Heartbeat: Nodes that miss heartbeats for over 60 seconds are automatically marked offline; the Gateway immediately deregisters nodes on disconnect
  • Tool Call Timeout: WebSocket tool calls default to 60-second timeout; workflow execution has a 300-second timeout
  • SSE Non-Replay: Agent invocation SSE streams cannot be replayed; if the client disconnects, the session context must be restored from the Checkpointer
  • Single-Instance Gateway: The current ConnectionManager is a per-process singleton; WebSocket connections are not shared across processes

Directory Structure

```text
apps/ai-backend/src/
├── api/                          # FastAPI routing layer
│   ├── main.py                   # Entry point & lifecycle
│   ├── gateway.py                # WebSocket gateway
│   ├── agent.py                  # Agent API
│   ├── tools.py                  # Tool management API
│   ├── skill.py                  # Skill management API
│   ├── knowledge_base.py         # Knowledge base API
│   ├── node.py                   # Node query API
│   ├── scheduled_task.py         # Scheduled task API
│   └── channels/                 # Channels (Feishu)
├── services/                     # Business logic layer
│   ├── base.py                   # BaseService (Agent core)
│   ├── agent.py                  # AgentService
│   ├── rag.py                    # RAG retrieval
│   ├── document.py               # Document processing
│   ├── feishu.py                 # Feishu service
│   └── scheduler.py              # Scheduled tasks
├── repository/                   # Data models & prompts
│   ├── models/                   # Pydantic Models
│   ├── prompts/                  # System Prompts (.md)
│   └── skills/                   # Skill definitions
└── utils/                        # Utility layer
    ├── core/                     # LLM, Memory, Skills, HITL
    ├── infra/                    # Backend, Checkpoint, Node, Redis, Auth
    ├── tools/                    # Tool implementations
    │   ├── built_in/             # Built-in tools
    │   └── oauth/                # OAuth tools
    ├── knowledge_base/           # Vector storage & chunking
    └── channels/                 # Channel integrations
```