
RPA Workflow

Zeus Desktop can publish recorded, parameterized browser workflows as standard MCP (JSON-RPC 2.0) tools, so AI Agents can drive browser-based business operations the same way they call ordinary functions.

Overview

Architecture

Design Principles

| Principle | Description |
| --- | --- |
| No Open Ports on Client | All communication goes through client-initiated WebSocket connections |
| Standard Protocol | Strictly follows the JSON-RPC 2.0 / MCP protocol specification |
| Dynamic Registration | Published workflows are automatically registered as callable tools without a restart |
| Silent Execution | The browser can run silently in the background without disturbing the user |
| Strict Flow | Workflows execute steps sequentially; exploratory actions are not allowed |

Publishing a Workflow as a Tool

Publishing Process

  1. Complete Recording: Record browser actions and save as a workflow
  2. Parameterize: Mark values that need dynamic injection as variables (see Recording - Parameterization)
  3. Publish Tool: Click the Publish Tool button in the workflow editor
  4. Configure Metadata:
    • Tool Name (toolName): Identifier the Agent uses when calling the tool, e.g. query_power_data (the generated tool definition exposes it as workflow_query_power_data)
    • Tool Description (toolDescription): Tells the Agent what this tool does
  5. Confirm Publishing: The tool is immediately available after saving

Publish Dialog

The publish dialog automatically extracts all parameterized actions from the workflow and generates a tool parameter list:
| Field | Source | Description |
| --- | --- | --- |
| Parameter Name | variableName | Variable name used as the tool input parameter name |
| Description | paramDescription | Description of the parameter's purpose |
| Default Value | defaultValue | Optional default value |
| Required | No default → required | Parameters without default values are required |
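The mapping in the table above can be sketched as a small function. This is a minimal illustration, not Desktop's actual implementation: the function name build_input_schema is hypothetical, and it assumes every parameter is a string, which is all the examples in this document show.

```python
from typing import Any


def build_input_schema(params: list[dict[str, Any]]) -> dict[str, Any]:
    """Turn parameterized workflow variables into a JSON Schema object.

    Hypothetical helper: each item carries variableName, paramDescription
    and an optional defaultValue, as listed in the publish dialog.
    """
    properties: dict[str, Any] = {}
    required: list[str] = []
    for param in params:
        name = param["variableName"]
        # Assumption: all parameters are strings.
        prop: dict[str, Any] = {
            "type": "string",
            "description": param.get("paramDescription", ""),
        }
        if param.get("defaultValue") is not None:
            prop["default"] = param["defaultValue"]
        else:
            # No default value means the Agent must supply the parameter.
            required.append(name)
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


params = [
    {"variableName": "username", "paramDescription": "Login username"},
    {"variableName": "password", "paramDescription": "Login password"},
    {
        "variableName": "region",
        "paramDescription": "Region code for query",
        "defaultValue": "110000",
    },
]
schema = build_input_schema(params)
```

Here region is left out of the required list because it has a default value, while username and password are required.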

Generated Tool Definition

After publishing, the system automatically generates an MCP-compliant tool definition:
{
  "name": "workflow_query_power_data",
  "description": "Log in to the power system and query electricity usage data for a specified region",
  "inputSchema": {
    "type": "object",
    "properties": {
      "username": {
        "type": "string",
        "description": "Login username"
      },
      "password": {
        "type": "string",
        "description": "Login password"
      },
      "region": {
        "type": "string",
        "description": "Region code for query",
        "default": "110000"
      },
      "profileId": {
        "type": "string",
        "description": "Browser profile ID (optional override)"
      }
    },
    "required": ["username", "password"]
  }
}
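A caller can use the required list and the default values in this definition to prepare arguments before sending them. The sketch below shows one way to do that; resolve_arguments is a hypothetical helper, and whether Desktop applies defaults itself is not stated here.

```python
from typing import Any


def resolve_arguments(schema: dict[str, Any], arguments: dict[str, Any]) -> dict[str, Any]:
    """Check required parameters and fill in schema defaults (hypothetical helper)."""
    missing = [name for name in schema.get("required", []) if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {', '.join(missing)}")
    # Start from the declared defaults, then let explicit arguments override them.
    resolved = {
        name: prop["default"]
        for name, prop in schema.get("properties", {}).items()
        if "default" in prop
    }
    resolved.update(arguments)
    return resolved


input_schema = {
    "type": "object",
    "properties": {
        "username": {"type": "string", "description": "Login username"},
        "password": {"type": "string", "description": "Login password"},
        "region": {
            "type": "string",
            "description": "Region code for query",
            "default": "110000",
        },
    },
    "required": ["username", "password"],
}

resolved = resolve_arguments(
    input_schema, {"username": "operator_001", "password": "secure_pass"}
)
```

Calling it without username or password raises ValueError instead of sending an incomplete request.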

Agent Invocation Flow

Tool Discovery

The Agent retrieves all available tools on the client via the MCP tools/list method:
// Agent → Desktop (JSON-RPC Request)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}

// Desktop → Agent (JSON-RPC Response)
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "desktop_exec",
        "description": "Execute shell commands on desktop"
      },
      {
        "name": "browser_read",
        "description": "Extract data from browser pages"
      },
      {
        "name": "hitl_prompt",
        "description": "Request user input during automation"
      },
      {
        "name": "workflow_query_power_data",
        "description": "Log in to the power system and query electricity usage data for a specified region",
        "inputSchema": { "..." }
      }
    ]
  }
}
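On the Agent side, the response can be reduced to the tool names it needs. The sketch below assumes that published workflows can be recognized by the workflow_ prefix, as in the examples in this document; the function name is hypothetical.

```python
from typing import Any


def workflow_tool_names(response: dict[str, Any]) -> list[str]:
    """Return the names of workflow tools found in a tools/list response."""
    tools = response.get("result", {}).get("tools", [])
    # Assumption: published workflows carry the "workflow_" prefix.
    return [tool["name"] for tool in tools if tool["name"].startswith("workflow_")]


list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {"name": "desktop_exec", "description": "Execute shell commands on desktop"},
            {"name": "browser_read", "description": "Extract data from browser pages"},
            {"name": "hitl_prompt", "description": "Request user input during automation"},
            {
                "name": "workflow_query_power_data",
                "description": "Log in to the power system and query electricity usage data for a specified region",
            },
        ]
    },
}

workflow_names = workflow_tool_names(list_response)
```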

Tool Invocation

When the Agent decides to call a workflow tool, it sends a standard tools/call request:
// Agent → Desktop
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "workflow_query_power_data",
    "arguments": {
      "username": "operator_001",
      "password": "secure_pass",
      "region": "330100"
    }
  }
}
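A request like the one above can be assembled with a few lines of code. This is a sketch with a hypothetical helper name; the only protocol facts it relies on are the jsonrpc, id, method and params fields shown in the example.

```python
import itertools
import json
from typing import Any

# JSON-RPC ids only need to be unique per connection; a counter is enough here.
_request_ids = itertools.count(1)


def build_tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a tools/call request envelope (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "workflow_query_power_data",
    {"username": "operator_001", "password": "secure_pass", "region": "330100"},
)
payload = json.dumps(request)  # text frame to send over the WebSocket connection
```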

Execution Result

// Desktop → Agent
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"success\": true, \"totalActions\": 8, \"completedActions\": 8, \"failedActions\": 0}"
      }
    ]
  }
}
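Note that the execution summary arrives as a JSON string inside a text content item, so the Agent has to decode it a second time. A minimal sketch of that step (the function name is hypothetical, and the error branch assumes a standard JSON-RPC error object):

```python
import json
from typing import Any


def parse_workflow_result(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the execution summary from a tools/call response."""
    if "error" in response:
        error = response["error"]
        raise RuntimeError(f"tool call failed ({error.get('code')}): {error.get('message')}")
    for item in response["result"]["content"]:
        if item.get("type") == "text":
            # The summary is itself JSON, encoded as a string.
            return json.loads(item["text"])
    raise ValueError("no text content in tool result")


call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {
                "type": "text",
                "text": "{\"success\": true, \"totalActions\": 8, \"completedActions\": 8, \"failedActions\": 0}",
            }
        ]
    },
}

summary = parse_workflow_result(call_response)
```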

Built-in Tools

In addition to dynamically registered workflow tools, Desktop also provides the following built-in MCP tools:

desktop_exec

Execute shell commands on the client.
{
  "name": "desktop_exec",
  "arguments": {
    "command": "ls -la /tmp/reports/"
  }
}

browser_read

Extract data from browser pages with multiple extraction methods:
| Action | Description | Return Value |
| --- | --- | --- |
| text | Extract text content from a page or element | Plain text string |
| table | Parse HTML tables into structured data | 2D array (headers + rows) |
| form | Extract current form field values | Field name-value mapping |
| html | Get the HTML of an element or page | HTML string |
| url | Get the current page URL | URL string |
| screenshot | Take a page screenshot | Base64 PNG image |
Example call:
{
  "name": "browser_read",
  "arguments": {
    "action": "table",
    "selector": "#data-table",
    "profileId": "profile-001"
  }
}
Typical scenario: The Agent first executes workflow_query_power_data to navigate to the data page, then uses browser_read to extract table data from the page for analysis.
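For analysis, the 2D array returned by the table action is usually easier to handle as a list of records. The sketch below assumes the first row holds the headers, as the "headers + rows" description suggests; the sample data and function name are invented for illustration.

```python
def table_to_records(table: list[list[str]]) -> list[dict[str, str]]:
    """Convert a headers-plus-rows 2D array into one dict per row."""
    if not table:
        return []
    # Assumption: the first row contains the column headers.
    headers, *rows = table
    return [dict(zip(headers, row)) for row in rows]


# Invented sample data in the shape described for the table action.
sample_table = [
    ["region", "usage_kwh"],
    ["110000", "1520"],
    ["330100", "980"],
]
records = table_to_records(sample_table)
```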

hitl_prompt

Request manual user input during an automated workflow. This is useful for captchas, SMS verification codes, and other values that cannot be obtained automatically.
{
  "name": "hitl_prompt",
  "arguments": {
    "title": "SMS Verification Code",
    "message": "Please enter the 6-digit verification code sent to your phone",
    "inputType": "text",
    "timeout": 120
  }
}
Execution flow:
  1. The system shows a notification to alert the user
  2. An input dialog appears in the Desktop UI
  3. After the user enters a value, it is returned to the Agent
  4. If no input is provided before timeout, a timeout status is returned
Interaction method priority:
| Priority | Method | Description |
| --- | --- | --- |
| 1 | Renderer popup | When the Desktop main window is visible, show a dialog in the renderer process |
| 2 | Native dialog | When the main window is not visible, use an Electron native dialog |
| 3 | Timeout | If no response within the configured time, return a timeout status |
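The priority order and the timeout behavior can be sketched as follows. Desktop itself is an Electron application, so this Python version only illustrates the logic; the function names and the status values ("ok", "timeout") are assumptions, not the actual return format.

```python
import asyncio
from typing import Any


def choose_prompt_method(main_window_visible: bool) -> str:
    """Pick the interaction method by priority (names are illustrative)."""
    return "renderer_popup" if main_window_visible else "native_dialog"


async def wait_for_input(pending: asyncio.Future, timeout: float) -> dict[str, Any]:
    """Wait for the user's answer, or report a timeout status."""
    try:
        value = await asyncio.wait_for(pending, timeout)
    except asyncio.TimeoutError:
        return {"status": "timeout", "value": None}
    return {"status": "ok", "value": value}


async def demo() -> tuple[dict[str, Any], dict[str, Any]]:
    loop = asyncio.get_running_loop()
    # Simulate a user who answers after 10 ms.
    answered = loop.create_future()
    loop.call_later(0.01, answered.set_result, "123456")
    # Simulate a user who never answers.
    unanswered = loop.create_future()
    return (
        await wait_for_input(answered, timeout=1.0),
        await wait_for_input(unanswered, timeout=0.05),
    )


answered_result, timed_out_result = asyncio.run(demo())
```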

Silent Browser Mode

When the Agent remotely calls a workflow tool, the browser launches in silent mode and no browser window appears on the user's screen.

Silent Mode Features

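| Feature | Description |
| --- | --- |
| Window Not Visible | Browser window is positioned off-screen at (-2400, -2400) |
| Fixed Viewport | Window size is fixed at 1280×720 |
| Minimal Resources | GPU, extensions, and notifications are disabled; the popup blocker is turned off (--disable-popup-blocking), so popups opened by a workflow are not blocked |
| Isolated Configuration | Uses the specified browser profile, isolating cookies and state |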

Silent Mode Launch Arguments

--window-position=-2400,-2400
--window-size=1280,720
--disable-gpu
--disable-extensions
--disable-popup-blocking
--disable-notifications
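The argument list above can be generated from the off-screen position and the viewport size. The helper below is a hypothetical sketch; it only rebuilds the flags listed in this section and does not launch a browser.

```python
def silent_launch_args(
    position: tuple[int, int] = (-2400, -2400),
    size: tuple[int, int] = (1280, 720),
) -> list[str]:
    """Build the silent-mode launch arguments (hypothetical helper)."""
    return [
        f"--window-position={position[0]},{position[1]}",  # place the window off-screen
        f"--window-size={size[0]},{size[1]}",  # fixed viewport
        "--disable-gpu",
        "--disable-extensions",
        "--disable-popup-blocking",  # turns the popup blocker off
        "--disable-notifications",
    ]


launch_args = silent_launch_args()
```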

Dynamic Tool Registration

Registration Timing

Desktop syncs tool information to the backend at the following times:
  1. When the WebSocket connection is established: Desktop sends a registration message containing all available tools
  2. When a workflow is published or unpublished: Desktop notifies the backend to update the tool list

Registration Message

{
  "type": "register",
  "deviceId": "device-abc123",
  "capabilities": ["desktop_exec", "browser_control", "workflow_execution"],
  "available_tools": [
    "desktop_exec",
    "browser_read",
    "hitl_prompt",
    "workflow_query_power_data",
    "workflow_generate_report"
  ]
}
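A message of this shape could be assembled as in the sketch below. The function name is hypothetical, and the capabilities list is copied from the example as a fixed value because this document does not describe how it is derived.

```python
from typing import Any

BUILT_IN_TOOLS = ["desktop_exec", "browser_read", "hitl_prompt"]


def build_registration(device_id: str, workflow_tools: list[str]) -> dict[str, Any]:
    """Build the registration message sent when the WebSocket connects."""
    return {
        "type": "register",
        "deviceId": device_id,
        # Assumption: fixed capability list, taken from the example message.
        "capabilities": ["desktop_exec", "browser_control", "workflow_execution"],
        # Built-in tools first, then every currently published workflow tool.
        "available_tools": BUILT_IN_TOOLS + list(workflow_tools),
    }


registration = build_registration(
    "device-abc123",
    ["workflow_query_power_data", "workflow_generate_report"],
)
```

Rebuilding and resending the same structure after a publish or unpublish would keep the backend's tool list current, though the exact update message is not specified here.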

Backend Tool Creation

When the AI Backend (Python) receives the tool list, it dynamically creates a LangChain StructuredTool for each workflow tool:
from langchain_core.tools import StructuredTool
from pydantic import Field, create_model

# Dynamically create a Pydantic schema that mirrors the tool's inputSchema
DynamicModel = create_model(
    "QueryPowerDataInput",
    username=(str, Field(description="Login username")),
    password=(str, Field(description="Login password")),
    region=(str, Field(default="110000", description="Region code for query")),
)

# Create the LangChain tool.
# call_desktop_tool is assumed to be defined elsewhere in the backend;
# it forwards the tool name and arguments to Desktop.
tool = StructuredTool.from_function(
    name="workflow_query_power_data",
    description="Log in to the power system and query electricity usage data for a specified region",
    args_schema=DynamicModel,
    func=lambda **kwargs: call_desktop_tool("workflow_query_power_data", kwargs),
)

Typical Business Scenarios

Scenario 1: Power Data Collection

Scenario 2: Operations Requiring Verification Codes

Scenario 3: Batch Data Processing

The Agent can call workflow tools in a loop to handle batch tasks:
  1. Call workflow_login to log in to the system
  2. Call workflow_query_data in a loop, with different parameters each time, to navigate to each data page
  3. After each query, call browser_read to retrieve the page data
  4. Aggregate and analyze the data, then call workflow_generate_report to produce a report
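The steps above can be sketched as a loop over a tool-calling function. Everything about the interface here is an assumption made for illustration: call_tool stands for whatever function the Agent uses to send tools/call requests, and the argument names passed to workflow_query_data and workflow_generate_report are hypothetical.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], Any]


def collect_regions(
    call_tool: ToolCaller,
    credentials: dict[str, str],
    regions: list[str],
    profile_id: str = "profile-001",
) -> dict[str, Any]:
    """Log in once, query each region, read each table, then request a report."""
    call_tool("workflow_login", credentials)
    tables: dict[str, Any] = {}
    for region in regions:
        # Hypothetical parameter name: the real one depends on the workflow.
        call_tool("workflow_query_data", {"region": region})
        tables[region] = call_tool(
            "browser_read",
            {"action": "table", "selector": "#data-table", "profileId": profile_id},
        )
    # Hypothetical argument: the report workflow's parameters are not documented here.
    call_tool("workflow_generate_report", {"regions": regions})
    return tables


# A stand-in caller that records calls instead of contacting Desktop.
calls: list[str] = []


def fake_call_tool(name: str, arguments: dict[str, Any]) -> Any:
    calls.append(name)
    if name == "browser_read":
        return [["region", "usage_kwh"], ["sample", "0"]]
    return {"success": True}


tables = collect_regions(
    fake_call_tool,
    {"username": "operator_001", "password": "secure_pass"},
    ["110000", "330100"],
)
```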

Error Handling

Error Codes

| Error Code | Description |
| --- | --- |
| -32001 | Tool execution error |
| -32002 | Workflow execution error |
| -32003 | Workflow not found |
| -32004 | Browser not running |
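These codes fall in the range JSON-RPC 2.0 reserves for implementation-defined server errors (-32000 to -32099). The sketch below shows how such a code could be wrapped in a JSON-RPC error response; the helper name and the use of the optional data field for details are assumptions.

```python
from typing import Any, Optional

ERROR_MESSAGES = {
    -32001: "Tool execution error",
    -32002: "Workflow execution error",
    -32003: "Workflow not found",
    -32004: "Browser not running",
}


def error_response(request_id: int, code: int, detail: Optional[str] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 error response for one of the codes above."""
    error: dict[str, Any] = {"code": code, "message": ERROR_MESSAGES[code]}
    if detail is not None:
        # "data" is the optional member JSON-RPC allows for extra information.
        error["data"] = detail
    return {"jsonrpc": "2.0", "id": request_id, "error": error}


not_found = error_response(2, -32003, detail="workflow_query_power_data")
```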

Replay Engine Fault Tolerance

| Mechanism | Parameters | Description |
| --- | --- | --- |
| Element Wait | Max 10s | Wait for the target element to appear in the DOM |
| Action Retry | 3 attempts, 1s interval | Automatically retry failed actions |
| Timeout Protection | 30s per action | Skip to the next step if a single action times out |
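The Action Retry row can be illustrated with a generic retry loop. This is a sketch of the retry policy only (3 attempts, 1s interval), not the replay engine's code; the function name is hypothetical, and the sleep function is injectable so the example runs without real delays.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(
    action: Callable[[], T],
    attempts: int = 3,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run an action, retrying on failure up to the given number of attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                sleep(interval)  # wait before the next attempt
    assert last_error is not None
    raise last_error


# An action that fails twice before succeeding.
state = {"calls": 0}


def flaky_click() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("element not clickable yet")
    return "clicked"


waits: list[float] = []
outcome = run_with_retry(flaky_click, sleep=waits.append)
```

In this run the action succeeds on the third attempt, after two recorded waits of one second each.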

Execution Result

After workflow execution completes, a structured result is returned:
interface WorkflowExecutionResult {
  success: boolean
  totalActions: number
  completedActions: number
  failedActions: number
  results: ActionResult[]
}

Compatibility

Windows 7 Support

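Since some clients run on Windows 7, note the following:
  • Electron 22.x is the last release line that supports Windows 7, so Windows 7 clients must stay on it
  • The Chromium engine version bundled with that Electron release (Chromium 108 for Electron 22) must be used
  • Web APIs introduced after that Chromium version are not available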

Domestic OS Support

Compatibility with Chinese domestic operating systems (UOS / Kylin) and generic Linux:
| Platform | Architecture | Distribution Format |
| --- | --- | --- |
| UOS | x64 / arm64 | .deb |
| Kylin | x64 / arm64 | .rpm / .deb |
| Generic Linux | x64 | .zip (portable) |