Architecture

Zeus AI Backend is an Agent server built on FastAPI. It receives user requests, orchestrates Agent execution, manages tool calls, and handles node communication. It interacts with the frontend via HTTP REST + SSE, and communicates with remote nodes (browser extensions, desktop applications) via WebSocket + JSON-RPC 2.0.

Overview

- A single FastAPI instance hosts both the REST API and the WebSocket gateway.
- Agent invocations are initiated via /api/agent/invoke and return an SSE stream.
- Node tool calls are routed through the WebSocket gateway in JSON-RPC 2.0 (MCP protocol) format.
- Each user has an independent cloud workspace (Supabase Storage) and session state (PostgreSQL Checkpointer).

Components & Data Flow

API Layer (Routing Layer)

The FastAPI application provides the following route modules:

| Route | Description | Protocol |
| --- | --- | --- |
| `/api/agent/invoke` | Agent conversation invocation | HTTP POST → SSE Stream |
| `/api/agent/resume` | HITL resume execution | HTTP POST → SSE Stream |
| `/api/tools/*` | Tool management CRUD | HTTP REST |
| `/api/skill/*` | Skill management | HTTP REST |
| `/api/knowledge-base/*` | Knowledge base management | HTTP REST |
| `/api/node/*` | Node device queries | HTTP REST |
| `/api/scheduled-task/*` | Scheduled task management | HTTP REST |
| `/ws/extension` | Browser extension WebSocket | WebSocket |
| `/ws/desktop` | Desktop application WebSocket | WebSocket |
| `/ws/web` | Web client WebSocket | WebSocket |
| `/api/feishu/*` | Feishu channel Webhook | HTTP REST |

All REST endpoints are authenticated via JWT (verify_jwt_token); WebSocket connections pass client_id / user_id via query parameters.
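
As a rough illustration of how a node might address the gateway, the sketch below builds a WebSocket URL with the identifiers in the query string. The helper name `build_ws_url` is hypothetical, and sending both `client_id` and `user_id` on every endpoint is an assumption; the source only states that these values travel as query parameters.

```python
from urllib.parse import urlencode

# Endpoint suffixes taken from the route table (/ws/extension, /ws/desktop, /ws/web).
CHANNELS = {"extension", "desktop", "web"}


def build_ws_url(base: str, channel: str, client_id: str, user_id: str) -> str:
    """Build a gateway URL such as ws://host/ws/extension?client_id=...&user_id=...

    Hypothetical helper: the real clients may send only one of the two
    identifiers, depending on the endpoint.
    """
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    query = urlencode({"client_id": client_id, "user_id": user_id})
    return f"{base.rstrip('/')}/ws/{channel}?{query}"
```

Using `urlencode` keeps identifiers with spaces or reserved characters from corrupting the query string.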

WebSocket Gateway

The gateway manages all WebSocket connections through ConnectionManager. It serves three client types (browser extensions, desktop applications, and Web clients) and routes tool calls to nodes:

- Extension / Desktop nodes: Each node is uniquely identified by node_id; a user can have multiple nodes.
- Web clients: Managed by user_id; the same user can have multiple Web connections.
- Tool calls: The Agent initiates JSON-RPC requests via call_tool(); the Gateway routes each request to the corresponding node and awaits the response (Future-based).
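
The Future-based waiting described for call_tool() can be sketched as follows. This is a minimal illustration, not the actual ConnectionManager: the class `PendingCalls`, the `send` callable, and the `on_message` hook are hypothetical names, and the 60-second default mirrors the tool call timeout listed under System Invariants.

```python
import asyncio
import json
import uuid


class PendingCalls:
    """Correlates JSON-RPC responses with in-flight requests by id (sketch)."""

    def __init__(self):
        self._pending = {}  # request id -> asyncio.Future

    async def call_tool(self, send, name, arguments, session_id, timeout=60.0):
        request_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        request = {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments, "session_id": session_id},
        }
        try:
            await send(json.dumps(request))  # `send` stands in for websocket.send_text
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)  # never leak a pending entry

    def on_message(self, raw: str) -> bool:
        """Resolve the matching Future; returns False for unknown or late ids."""
        message = json.loads(raw)
        future = self._pending.get(message.get("id"))
        if future is None or future.done():
            return False
        if "error" in message:
            future.set_exception(RuntimeError(message["error"]["message"]))
        else:
            future.set_result(message["result"])
        return True


async def demo():
    calls = PendingCalls()

    async def fake_node_send(raw):
        # Simulated node: answers on the next event-loop iteration.
        request = json.loads(raw)
        reply = {"jsonrpc": "2.0", "id": request["id"],
                 "result": {"content": [{"type": "text", "text": "ok"}], "isError": False}}
        asyncio.get_running_loop().call_soon(calls.on_message, json.dumps(reply))

    return await calls.call_tool(fake_node_send, "browser_click",
                                 {"selector": "#submit-btn"}, "session_abc123")
```

Removing the pending entry in `finally` means a response that arrives after the timeout is simply ignored rather than resolving a stale Future.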

Agent Service

The Agent Service is the core orchestration layer, built on the DeepAgents framework (a higher-level wrapper over LangGraph).

Service layer responsibilities:

| Service | File | Responsibility |
| --- | --- | --- |
| BaseService | `services/base.py` | Context initialization, tool assembly, prompt construction, SSE event streaming |
| AgentService | `services/agent.py` | Agent mode invoke/resume entry point |
| RAGService | `services/rag.py` | Knowledge base retrieval (vector + BM25 hybrid search) |
| DocumentService | `services/document.py` | Document processing and chunking |
| FeishuService | `services/feishu.py` | Feishu channel integration |
| SchedulerService | `services/scheduler.py` | Scheduled task scheduling |

Tool System

Zeus tools are organized into four layers:

| Layer | Source | Registration | Execution Location |
| --- | --- | --- | --- |
| Built-in | `utils/tools/built_in/` | Registered directly in code | AI Backend local |
| MCP | Passed from frontend config | `langchain_mcp_adapters` | MCP Server (remote) |
| OAuth | Passed from frontend config | Dynamically built LangChain Tool | AI Backend → OAuth API |
| Connector | Reported by WebSocket nodes | Bound via SessionManager | Remote nodes (Extension/Desktop) |

Connector Tools call chain: Agent → ToolRouter → Gateway → WebSocket → Node → Execute → Return via same path.
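
The routing step in that chain depends on knowing which node a session is bound to. The sketch below shows one plausible binding policy; the class `SessionBindings` and its fallback order (preferred node, then existing binding, then first online node) are assumptions made for illustration, not the documented SessionManager behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionBindings:
    """Hypothetical session-to-node binding table."""

    nodes_by_user: Dict[str, List[str]] = field(default_factory=dict)  # online nodes
    bound: Dict[str, str] = field(default_factory=dict)  # session_id -> node_id

    def bind(self, session_id: str, user_id: str,
             preferred_node_id: Optional[str] = None) -> str:
        online = self.nodes_by_user.get(user_id, [])
        if not online:
            raise LookupError(f"no online node for user {user_id}")
        if preferred_node_id in online:
            node_id = preferred_node_id          # caller asked for a specific node
        else:
            node_id = self.bound.get(session_id)  # keep an existing binding if valid
            if node_id not in online:
                node_id = online[0]               # assumed fallback: first online node
        self.bound[session_id] = node_id
        return node_id
```

A router built on top of this would look up `bound[session_id]` and hand the JSON-RPC request to the gateway connection for that node.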

Node Management

Node management is handled by three cooperating components:

| Component | Description |
| --- | --- |
| NodeManager | Node registration/deregistration, heartbeat TTL (60s), periodic cleanup (30s) |
| SessionManager | Binds sessions to specific nodes, supports `preferred_node_id` specification |
| ToolRouter | Routes tool calls to the appropriate node based on session binding |

Each user can have up to 10 nodes. Nodes that miss heartbeats for longer than the 60-second TTL are automatically marked offline and deregistered.
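
The heartbeat TTL and per-user limit can be sketched with a small in-memory registry. The class `NodeRegistry` and its method names are hypothetical; the real NodeManager may store state differently (for example in Redis). A caller would invoke `cleanup()` on the 30-second schedule mentioned in the table.

```python
import time


class NodeRegistry:
    """In-memory sketch of heartbeat tracking with a TTL and a per-user cap."""

    HEARTBEAT_TTL = 60.0      # seconds without a heartbeat before expiry
    MAX_NODES_PER_USER = 10

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the TTL can be tested without sleeping
        self._last_seen = {}  # node_id -> timestamp of the last heartbeat
        self._owner = {}      # node_id -> user_id

    def register(self, node_id: str, user_id: str) -> None:
        owned = [n for n, u in self._owner.items() if u == user_id and n != node_id]
        if len(owned) >= self.MAX_NODES_PER_USER:
            raise RuntimeError(f"user {user_id} already has {len(owned)} nodes")
        self._owner[node_id] = user_id
        self._last_seen[node_id] = self._clock()

    def heartbeat(self, node_id: str) -> bool:
        if node_id not in self._owner:
            return False      # unknown node: it must register first
        self._last_seen[node_id] = self._clock()
        return True

    def cleanup(self) -> list:
        """Deregister every node whose last heartbeat is older than the TTL."""
        now = self._clock()
        expired = [n for n, seen in self._last_seen.items()
                   if now - seen > self.HEARTBEAT_TTL]
        for node_id in expired:
            del self._last_seen[node_id]
            del self._owner[node_id]
        return expired
```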

Storage & State

| Storage | Technology | Purpose |
| --- | --- | --- |
| Checkpointer | PostgreSQL (`PostgresSaver`) | Session-level state persistence, supports HITL recovery |
| Workspace | Supabase Storage | User files (outputs, uploads, sandbox results) |
| Memory | PostgreSQL + pgvector | Long-term memory (vectors + metadata), three-tier scoping |
| Knowledge Base | PostgreSQL + pgvector + BM25 | RAG document storage and hybrid retrieval |
| Cache | Redis | Workspace cache (5min), memory cache (10min) |

Connection Lifecycle

WebSocket Node Connection

Agent Invocation Flow

If HITL (Human-in-the-Loop) is enabled, the Agent pauses before executing sensitive tools and sends an InterruptMessage. The frontend displays an approval UI; after the user confirms, it calls /api/agent/resume to continue execution.
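
The interrupt-and-resume loop can be sketched from the client's point of view. The function below is illustrative only: `invoke`, `resume`, and `approve` are hypothetical callables standing in for the two HTTP endpoints and the approval UI, and the assumptions that the stream ends at an interrupt and that the resume call carries a single decision value are not confirmed by the source.

```python
from typing import Callable, Iterable, List


def run_with_approval(
    invoke: Callable[[], Iterable[dict]],
    resume: Callable[[object], Iterable[dict]],
    approve: Callable[[list], object],
) -> List[str]:
    """Drive one turn, resuming after each HITL interrupt; returns text chunks."""
    chunks: List[str] = []
    stream = invoke()                      # stands in for POST /api/agent/invoke
    while stream is not None:
        next_stream = None
        for event in stream:
            kind = event["type"]
            if kind == "text":
                chunks.append(event["content"])
            elif kind == "interrupt":
                decision = approve(event["tool_calls"])  # approval UI
                next_stream = resume(decision)           # POST /api/agent/resume
                break
            elif kind == "error":
                raise RuntimeError(event["error"])
            elif kind == "complete":
                break
        stream = next_stream
    return chunks
```

Because session state lives in the Checkpointer, the resumed stream is expected to continue from the paused tool call rather than restart the conversation.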

Communication Protocols

HTTP SSE (Agent Response Stream)

Agent invocations return text/event-stream. SSE event types:

| Event Type | `data` Field | Description |
| --- | --- | --- |
| `text` | `{type, content}` | Streaming text tokens |
| `tool_call` | `{type, tool_name, tool_args, tool_call_id}` | Agent initiates a tool call |
| `tool_call_result` | `{type, tool_name, result, ...}` | Tool execution result |
| `interrupt` | `{type, tool_calls, ...}` | HITL interrupt, awaiting user approval |
| `complete` | `{type, finish_reason}` | Stream ended |
| `error` | `{type, error}` | Error message |
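
A client can decode this stream with a few lines of standard-library code. The sketch below assumes each event's `data:` payload is a JSON object carrying the `type` field from the table; whether the server also sends `event:` lines is not stated, so they are ignored here. The function name `iter_sse_events` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from an iterable of raw lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keepalive
        name, _, value = line.partition(":")
        if name == "data":
            # The SSE format strips one optional leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        yield json.loads("\n".join(data_lines))
```

Since the stream cannot be replayed, a consumer would typically stop at `complete` or `error` and rely on the Checkpointer to restore context after a dropped connection.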

WebSocket JSON-RPC 2.0 (Node Tool Calls)

Node tool calls follow the MCP (Model Context Protocol) specification.

Request:

{
  "jsonrpc": "2.0",
  "id": "uuid-string",
  "method": "tools/call",
  "params": {
    "name": "browser_click",
    "arguments": { "selector": "#submit-btn" },
    "session_id": "session_abc123"
  }
}

Response (Success):

{
  "jsonrpc": "2.0",
  "id": "uuid-string",
  "result": {
    "content": [
      { "type": "text", "text": "Clicked element successfully" },
      { "type": "image", "data": "base64...", "mimeType": "image/jpeg" }
    ],
    "isError": false
  }
}

Response (Error):

{
  "jsonrpc": "2.0",
  "id": "uuid-string",
  "error": {
    "code": -32603,
    "message": "Element not found"
  }
}
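
The three message shapes above can be built and validated with small helpers. This is a sketch under stated assumptions: `build_tool_call`, `parse_tool_response`, and `ToolCallError` are hypothetical names, and treating `isError: true` as an exception is one possible policy rather than documented gateway behavior.

```python
import uuid
from typing import Any, Dict, List, Optional


class ToolCallError(Exception):
    """Raised for a JSON-RPC error object or a result flagged with isError."""

    def __init__(self, message: str, code: Optional[int] = None):
        super().__init__(message)
        self.code = code


def build_tool_call(name: str, arguments: Dict[str, Any], session_id: str,
                    request_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a tools/call request in the shape shown above."""
    return {
        "jsonrpc": "2.0",
        "id": request_id or str(uuid.uuid4()),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments, "session_id": session_id},
    }


def parse_tool_response(message: Dict[str, Any], expected_id: str) -> List[dict]:
    """Return the content items of a successful response, or raise."""
    if message.get("jsonrpc") != "2.0" or message.get("id") != expected_id:
        raise ValueError("not a JSON-RPC 2.0 response for this request")
    if "error" in message:
        err = message["error"]
        raise ToolCallError(err["message"], err.get("code"))
    result = message["result"]
    if result.get("isError"):
        texts = [c["text"] for c in result["content"] if c.get("type") == "text"]
        raise ToolCallError("; ".join(texts) or "tool reported an error")
    return result["content"]
```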

WebSocket Message Types (Non JSON-RPC)

| Direction | `type` | Description |
| --- | --- | --- |
| Node → Gateway | `register` | Node registration (capabilities, tools) |
| Gateway → Node | `registered` | Registration confirmation |
| Node → Gateway | `heartbeat` | Heartbeat report (status, current_tasks) |
| Gateway → Node | `heartbeat_ack` | Heartbeat acknowledgment |
| Node ↔ Gateway | `ping` / `pong` | Keepalive |
| Web → Gateway | `get_workflows` | Request workflow list (forwarded to Extension) |
| Web → Gateway | `execute_workflow` | Execute workflow |
| Node → Gateway | `task_complete` | Workflow execution completed |
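
A gateway-side dispatcher for the node-originated messages might look like the sketch below. Only the `type` values and the field names `capabilities`, `tools`, `status`, and `current_tasks` come from the table; the function `handle_node_message`, the `node_id` field, and the reply payloads are assumptions for illustration.

```python
from typing import Dict, Optional


def handle_node_message(message: dict, nodes: Dict[str, dict]) -> Optional[dict]:
    """Return the reply the gateway would send, or None when no reply applies."""
    kind = message.get("type")
    if kind == "register":
        node_id = message["node_id"]  # assumed field name
        nodes[node_id] = {
            "capabilities": message.get("capabilities", []),
            "tools": message.get("tools", []),
            "status": "online",
        }
        return {"type": "registered", "node_id": node_id}
    if kind == "heartbeat":
        node = nodes.get(message.get("node_id"))
        if node is None:
            return None  # heartbeat from a node that never registered
        node["status"] = message.get("status", "online")
        node["current_tasks"] = message.get("current_tasks", 0)
        return {"type": "heartbeat_ack"}
    if kind == "ping":
        return {"type": "pong"}
    return None  # JSON-RPC traffic and workflow messages are handled elsewhere
```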

Startup & Lifecycle

Startup Sequence

Key Environment Variables

| Variable | Description |
| --- | --- |
| `DATABASE_URL` | PostgreSQL connection string (Checkpointer + pgvector) |
| `SUPABASE_URL` / `SUPABASE_SERVICE_KEY` | Supabase Storage configuration |
| `NEXTJS_API_URL` | Next.js backend API (user data, configuration) |
| `OPENAI_API_KEY` / `OPENAI_BASE_URL` | Default LLM configuration |
| `REDIS_URL` | Redis cache (optional) |
| `LANGCHAIN_API_KEY` | LangSmith tracing (optional) |
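
One way to load these variables at startup is to validate them up front and fail fast. The sketch below is hypothetical (`Settings` and `load_settings` are not names from the codebase) and assumes every variable not marked optional in the table is required.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: everything the table does not mark "(optional)" is required.
REQUIRED = (
    "DATABASE_URL", "SUPABASE_URL", "SUPABASE_SERVICE_KEY",
    "NEXTJS_API_URL", "OPENAI_API_KEY", "OPENAI_BASE_URL",
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    supabase_url: str
    supabase_service_key: str
    nextjs_api_url: str
    openai_api_key: str
    openai_base_url: str
    redis_url: Optional[str] = None          # cache disabled when unset
    langchain_api_key: Optional[str] = None  # tracing disabled when unset


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, listing all missing names at once."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    values = {name.lower(): env[name] for name in REQUIRED}
    return Settings(
        redis_url=env.get("REDIS_URL") or None,
        langchain_api_key=env.get("LANGCHAIN_API_KEY") or None,
        **values,
    )
```

Reporting every missing name in a single error avoids the fix-one-restart-repeat cycle during deployment.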

Health Checks

GET /health → {"status": "ok"}
GET / → {"name": "Zeus Backend API", "version": "1.0.0", "status": "running"}

System Invariants

JWT Authentication: All REST APIs require a valid JWT Token
Session Isolation: Each session_id has independent Checkpointer state; different sessions do not interfere
Node Heartbeat: Nodes that miss heartbeats for over 60 seconds are automatically marked offline; the Gateway immediately deregisters nodes on disconnect
Tool Call Timeout: WebSocket tool calls default to 60-second timeout; workflow execution has a 300-second timeout
SSE Non-Replay: Agent invocation SSE streams are one-time; after disconnect, context must be restored via Checkpointer
Single-Instance Gateway: The current ConnectionManager is a per-process singleton; WebSocket connections are not shared across processes

Directory Structure

apps/ai-backend/src/
├── api/                          # FastAPI routing layer
│   ├── main.py                   # Entry point & lifecycle
│   ├── gateway.py                # WebSocket gateway
│   ├── agent.py                  # Agent API
│   ├── tools.py                  # Tool management API
│   ├── skill.py                  # Skill management API
│   ├── knowledge_base.py         # Knowledge base API
│   ├── node.py                   # Node query API
│   ├── scheduled_task.py         # Scheduled task API
│   └── channels/                 # Channels (Feishu)
├── services/                     # Business logic layer
│   ├── base.py                   # BaseService (Agent core)
│   ├── agent.py                  # AgentService
│   ├── rag.py                    # RAG retrieval
│   ├── document.py               # Document processing
│   ├── feishu.py                 # Feishu service
│   └── scheduler.py              # Scheduled tasks
├── repository/                   # Data models & prompts
│   ├── models/                   # Pydantic Models
│   ├── prompts/                  # System Prompts (.md)
│   └── skills/                   # Skill definitions
└── utils/                        # Utility layer
    ├── core/                     # LLM, Memory, Skills, HITL
    ├── infra/                    # Backend, Checkpoint, Node, Redis, Auth
    ├── tools/                    # Tool implementations
    │   ├── built_in/             # Built-in tools
    │   └── oauth/                # OAuth tools
    ├── knowledge_base/           # Vector storage & chunking
    └── channels/                 # Channel integrations

Agent Runtime

Runtime detailed design — Workspace, Session, Modes

Gateway Protocol

Channels & Gateway — Feishu, WebSocket node communication

Tool System

Four-layer tool system — Built-in, MCP, OAuth, Connector

File System

Storage architecture — CloudDriveBackend, Checkpoint
