Overview
- A single FastAPI instance hosts both REST API and WebSocket gateway
- Agent invocations are initiated via /api/agent/invoke and return an SSE stream
- Node tool calls are routed through the WebSocket gateway in JSON-RPC 2.0 (MCP protocol) format
- Each user has an independent cloud workspace (Supabase Storage) and session state (PostgreSQL Checkpointer)
Components & Data Flow
API Layer (Routing Layer)
The FastAPI application provides the following route modules:
| Route | Description | Protocol |
|---|---|---|
| /api/agent/invoke | Agent conversation invocation | HTTP POST → SSE Stream |
| /api/agent/resume | HITL resume execution | HTTP POST → SSE Stream |
| /api/tools/* | Tool management CRUD | HTTP REST |
| /api/skill/* | Skill management | HTTP REST |
| /api/knowledge-base/* | Knowledge base management | HTTP REST |
| /api/node/* | Node device queries | HTTP REST |
| /api/scheduled-task/* | Scheduled task management | HTTP REST |
| /ws/extension | Browser extension WebSocket | WebSocket |
| /ws/desktop | Desktop application WebSocket | WebSocket |
| /ws/web | Web client WebSocket | WebSocket |
| /api/feishu/* | Feishu channel Webhook | HTTP REST |
REST APIs require a valid JWT Token (verify_jwt_token); WebSocket connections pass client_id / user_id via query parameters.
WebSocket Gateway
The gateway manages all WebSocket connections through ConnectionManager, supporting three node types:
- Extension / Desktop nodes: Each node is uniquely identified by node_id; a user can have multiple nodes
- Web clients: Managed by user_id; the same user can have multiple Web connections
- Tool calls: The Agent initiates JSON-RPC requests via call_tool(); the Gateway routes requests to the corresponding node and awaits responses (Future-based)
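The Future-based request/response pairing described above can be sketched as follows. This is a minimal illustration, not the actual ConnectionManager code: the class name ToolCallGateway, the send callable, and handle_response are hypothetical, and the request shape assumes the standard MCP tools/call method with name and arguments parameters.

```python
import asyncio
import itertools
import json


class ToolCallGateway:
    """Hypothetical stand-in for the gateway side of call_tool()."""

    def __init__(self, send):
        self._send = send              # async callable: (node_id, text) -> None
        self._pending = {}             # JSON-RPC id -> Future awaiting the reply
        self._ids = itertools.count(1)

    async def call_tool(self, node_id, name, arguments, timeout=60.0):
        """Send a JSON-RPC request to a node and wait for the matching response."""
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        request = {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",  # MCP method name; assumed to be used here
            "params": {"name": name, "arguments": arguments},
        }
        try:
            await self._send(node_id, json.dumps(request))
            return await asyncio.wait_for(future, timeout)
        finally:
            # Drop the entry on success, error, and timeout alike.
            self._pending.pop(request_id, None)

    def handle_response(self, raw):
        """Resolve the pending Future for an incoming response; True if matched."""
        message = json.loads(raw)
        future = self._pending.get(message.get("id"))
        if future is None or future.done():
            return False
        if "error" in message:
            detail = message["error"].get("message", "tool error")
            future.set_exception(RuntimeError(detail))
        else:
            future.set_result(message.get("result"))
        return True


async def demo():
    sent = []

    async def fake_send(node_id, text):
        # Pretend to be a node: record the request and reply on the next loop tick.
        request = json.loads(text)
        sent.append((node_id, request))
        reply = {"jsonrpc": "2.0", "id": request["id"], "result": {"clicked": True}}
        asyncio.get_running_loop().call_soon(gateway.handle_response, json.dumps(reply))

    gateway = ToolCallGateway(fake_send)
    result = await gateway.call_tool("node-1", "click", {"selector": "#go"})
    return result, sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keying the pending map by the JSON-RPC id lets several tool calls be in flight on the same connection, and the finally block keeps the map from growing when a node never answers.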
Agent Service
The Agent Service is the core orchestration layer, built on the DeepAgents framework (a higher-level wrapper over LangGraph).
Service Layer Responsibilities:
| Service | File | Responsibility |
|---|---|---|
| BaseService | services/base.py | Context initialization, tool assembly, prompt construction, SSE event streaming |
| AgentService | services/agent.py | Agent mode invoke/resume entry point |
| RAGService | services/rag.py | Knowledge base retrieval (vector + BM25 hybrid search) |
| DocumentService | services/document.py | Document processing and chunking |
| FeishuService | services/feishu.py | Feishu channel integration |
| SchedulerService | services/scheduler.py | Scheduled task scheduling |
Tool System
Zeus tools are organized into four layers:
| Layer | Source | Registration | Execution Location |
|---|---|---|---|
| Built-in | utils/tools/built_in/ | Registered directly in code | AI Backend local |
| MCP | Passed from frontend config | langchain_mcp_adapters | MCP Server (remote) |
| OAuth | Passed from frontend config | Dynamically built LangChain Tool | AI Backend → OAuth API |
| Connector | Reported by WebSocket nodes | Bound via SessionManager | Remote nodes (Extension/Desktop) |
Node Management
Node management is handled by three cooperating components:
| Component | Description |
|---|---|
| NodeManager | Node registration/deregistration, heartbeat TTL (60s), periodic cleanup (30s) |
| SessionManager | Binds sessions to specific nodes, supports preferred_node_id specification |
| ToolRouter | Routes tool calls to the appropriate node based on session binding |
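How the three components might cooperate can be sketched in a single class. This is an illustration under assumptions, not the real NodeManager, SessionManager, or ToolRouter: the class and method names are hypothetical, expired nodes are removed outright rather than marked offline, and the fallback of picking the first online node when no preference is given is invented for the example. The clock is injectable so the 60-second TTL can be exercised without waiting.

```python
import time


class NodeRegistry:
    """Hypothetical merge of node tracking, session binding, and routing."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._last_seen = {}   # node_id -> time of the last heartbeat
        self._bindings = {}    # session_id -> node_id

    def register(self, node_id):
        self._last_seen[node_id] = self._clock()

    def heartbeat(self, node_id):
        """Refresh a node's TTL; returns False for unknown nodes."""
        if node_id not in self._last_seen:
            return False
        self._last_seen[node_id] = self._clock()
        return True

    def deregister(self, node_id):
        """Remove a node and any session bindings that pointed at it."""
        self._last_seen.pop(node_id, None)
        self._bindings = {s: n for s, n in self._bindings.items() if n != node_id}

    def cleanup(self):
        """Drop nodes whose last heartbeat is older than the TTL (run periodically)."""
        now = self._clock()
        expired = [n for n, seen in self._last_seen.items()
                   if now - seen > self.ttl_seconds]
        for node_id in expired:
            self.deregister(node_id)
        return expired

    def bind_session(self, session_id, preferred_node_id=None):
        """Bind a session to the preferred node if online, else keep or pick one."""
        if preferred_node_id is not None and preferred_node_id in self._last_seen:
            node_id = preferred_node_id
        elif self._bindings.get(session_id) in self._last_seen:
            node_id = self._bindings[session_id]
        elif self._last_seen:
            node_id = next(iter(self._last_seen))
        else:
            return None
        self._bindings[session_id] = node_id
        return node_id

    def route(self, session_id):
        """Return the node a session's tool calls should go to, or None."""
        return self._bindings.get(session_id)
```

In a running service, cleanup() would be driven by a background task on the 30-second interval noted in the table.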
Storage & State
| Storage | Technology | Purpose |
|---|---|---|
| Checkpointer | PostgreSQL (PostgresSaver) | Session-level state persistence, supports HITL recovery |
| Workspace | Supabase Storage | User files (outputs, uploads, sandbox results) |
| Memory | PostgreSQL + pgvector | Long-term memory (vectors + metadata), three-tier scoping |
| Knowledge Base | PostgreSQL + pgvector + BM25 | RAG document storage and hybrid retrieval |
| Cache | Redis | Workspace cache (5min), memory cache (10min) |
Connection Lifecycle
WebSocket Node Connection
Agent Invocation Flow
If HITL (Human-in-the-Loop) is enabled, the Agent pauses before executing sensitive tools and sends an InterruptMessage. The frontend displays an approval UI and, after the user confirms, calls /api/agent/resume to continue execution.
Communication Protocols
HTTP SSE (Agent Response Stream)
Agent invocations return text/event-stream. SSE event types:
| Event Type | data Field | Description |
|---|---|---|
| text | {type, content} | Streaming text tokens |
| tool_call | {type, tool_name, tool_args, tool_call_id} | Agent initiates a tool call |
| tool_call_result | {type, tool_name, result, ...} | Tool execution result |
| interrupt | {type, tool_calls, ...} | HITL interrupt, awaiting user approval |
| complete | {type, finish_reason} | Stream ended |
| error | {type, error} | Error message |
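As an illustration of the event shapes in the table, the sketch below serializes events as SSE frames and parses them back. It assumes each event travels as a single data: line whose JSON payload carries the type field; whether the server also sets the SSE event: field is not stated here, so it is left out. The helper names format_sse and parse_sse and the finish_reason value "stop" are hypothetical.

```python
import json


def format_sse(event_type, **fields):
    """Serialize one event as an SSE frame: a data line plus a blank line."""
    payload = {"type": event_type, **fields}
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def parse_sse(stream_text):
    """Split a buffered SSE stream into decoded JSON payloads."""
    events = []
    for frame in stream_text.split("\n\n"):
        data_lines = [line[len("data:"):].strip()
                      for line in frame.split("\n")
                      if line.startswith("data:")]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


stream = (
    format_sse("text", content="Hel")
    + format_sse("text", content="lo")
    + format_sse("complete", finish_reason="stop")
)
events = parse_sse(stream)
# Concatenating the text events rebuilds the streamed message.
message = "".join(e["content"] for e in events if e["type"] == "text")
```

A client would typically keep appending text content until it sees complete, error, or interrupt, and on interrupt hand the pending tool_calls to the approval UI.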
WebSocket JSON-RPC 2.0 (Node Tool Calls)
Node tool calls follow the MCP (Model Context Protocol) specification.
WebSocket Message Types (Non JSON-RPC)
| Direction | type | Description |
|---|---|---|
| Node → Gateway | register | Node registration (capabilities, tools) |
| Gateway → Node | registered | Registration confirmation |
| Node → Gateway | heartbeat | Heartbeat report (status, current_tasks) |
| Gateway → Node | heartbeat_ack | Heartbeat acknowledgment |
| Node ↔ Gateway | ping / pong | Keepalive |
| Web → Gateway | get_workflows | Request workflow list (forwarded to Extension) |
| Web → Gateway | execute_workflow | Execute workflow |
| Node → Gateway | task_complete | Workflow execution completed |
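The request/acknowledgment pairs in the table suggest a simple dispatch on the type field. The sketch below is a guess at that shape, not the gateway's actual handler: the function name, the node_id field, the in-memory nodes dictionary, and the error reply for unknown nodes are all assumptions made for illustration. Only the type values and the capabilities, tools, status, and current_tasks fields come from the table.

```python
import json


def handle_node_message(raw, nodes):
    """Return the reply for one non JSON-RPC message, or None if no reply is due."""
    message = json.loads(raw)
    kind = message.get("type")

    if kind == "register":
        node_id = message["node_id"]  # assumed field name
        nodes[node_id] = {
            "capabilities": message.get("capabilities", []),
            "tools": message.get("tools", []),
            "status": "online",
            "current_tasks": [],
        }
        return {"type": "registered", "node_id": node_id}

    if kind == "heartbeat":
        node = nodes.get(message.get("node_id"))
        if node is None:
            # Hypothetical reply; the real gateway may simply ignore the message.
            return {"type": "error", "error": "unknown node"}
        node["status"] = message.get("status", node["status"])
        node["current_tasks"] = message.get("current_tasks", [])
        return {"type": "heartbeat_ack"}

    if kind == "ping":
        return {"type": "pong"}

    # JSON-RPC traffic and workflow messages are handled elsewhere.
    return None
```

Keeping this dispatch separate from the JSON-RPC path means a message is classified once, by the presence of type versus jsonrpc, before any handler runs.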
Startup & Lifecycle
Startup Sequence
Key Environment Variables
| Variable | Description |
|---|---|
| DATABASE_URL | PostgreSQL connection string (Checkpointer + pgvector) |
| SUPABASE_URL / SUPABASE_SERVICE_KEY | Supabase Storage configuration |
| NEXTJS_API_URL | Next.js backend API (user data, configuration) |
| OPENAI_API_KEY / OPENAI_BASE_URL | Default LLM configuration |
| REDIS_URL | Redis cache (optional) |
| LANGCHAIN_API_KEY | LangSmith tracing (optional) |
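One way to validate these variables at startup is a small settings loader, sketched below. The Settings class and load_settings function are hypothetical, and treating every variable not marked optional in the table as required is an assumption; the real backend may apply defaults instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: everything the table does not mark "(optional)" is required.
REQUIRED = (
    "DATABASE_URL",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_KEY",
    "NEXTJS_API_URL",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    supabase_url: str
    supabase_service_key: str
    nextjs_api_url: str
    openai_api_key: str
    openai_base_url: str
    redis_url: Optional[str] = None          # cache is skipped when unset
    langchain_api_key: Optional[str] = None  # tracing is skipped when unset


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        database_url=env["DATABASE_URL"],
        supabase_url=env["SUPABASE_URL"],
        supabase_service_key=env["SUPABASE_SERVICE_KEY"],
        nextjs_api_url=env["NEXTJS_API_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        openai_base_url=env["OPENAI_BASE_URL"],
        redis_url=env.get("REDIS_URL") or None,
        langchain_api_key=env.get("LANGCHAIN_API_KEY") or None,
    )
```

Passing the environment as a mapping keeps the loader easy to test and reports every missing variable in one error rather than one per restart.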
Health Checks
- GET /health → {"status": "ok"}
- GET / → {"name": "Zeus Backend API", "version": "1.0.0", "status": "running"}
System Invariants
- JWT Authentication: All REST APIs require a valid JWT Token
- Session Isolation: Each session_id has independent Checkpointer state; different sessions do not interfere
- Node Heartbeat: Nodes that miss heartbeats for over 60 seconds are automatically marked offline; the Gateway immediately deregisters nodes on disconnect
- Tool Call Timeout: WebSocket tool calls default to 60-second timeout; workflow execution has a 300-second timeout
- SSE Non-Replay: Agent invocation SSE streams are one-time; after disconnect, context must be restored via Checkpointer
- Single-Instance Gateway: The current ConnectionManager is a per-process singleton; WebSocket connections are not shared across processes