RPA Workflow

Zeus Desktop supports publishing recorded and parameterized browser workflows as standard MCP (JSON-RPC 2.0) tools, enabling AI Agents to drive browser-based business operations just like calling regular functions.

Overview

Architecture

Design Principles

Principle	Description
No Open Ports on Client	All communication goes through client-initiated WebSocket connections
Standard Protocol	Strictly follows JSON-RPC 2.0 / MCP protocol specification
Dynamic Registration	Published workflows are automatically registered as callable tools without restart
Silent Execution	Browser can run silently in the background without disturbing the user
Strict Flow	Workflows execute steps sequentially; exploratory actions are not allowed

Publishing a Workflow as a Tool

Publishing Process

1. Complete Recording: Record the browser actions and save them as a workflow.
2. Parameterize: Mark values that need dynamic injection as variables (see Recording - Parameterization).
3. Publish Tool: Click the Publish Tool button in the workflow editor.
4. Configure Metadata:
   - Tool Name (toolName): Identifier the Agent uses when calling the tool, e.g. query_power_data. In the examples on this page, the published tool is exposed with a workflow_ prefix (workflow_query_power_data).
   - Tool Description (toolDescription): Tells the Agent what the tool does.
5. Confirm Publishing: The tool is available immediately after saving.

Publish Dialog

The publish dialog automatically extracts all parameterized actions from the workflow and generates a tool parameter list:

Field	Source	Description
Parameter Name	`variableName`	Variable name used as tool input parameter name
Description	`paramDescription`	Description of parameter purpose
Default Value	`defaultValue`	Optional default value
Required	No default → required	Parameters without default values are required

Generated Tool Definition

After publishing, the system automatically generates an MCP-compliant tool definition:

{
  "name": "workflow_query_power_data",
  "description": "Log in to the power system and query electricity usage data for a specified region",
  "inputSchema": {
    "type": "object",
    "properties": {
      "username": {
        "type": "string",
        "description": "Login username"
      },
      "password": {
        "type": "string",
        "description": "Login password"
      },
      "region": {
        "type": "string",
        "description": "Region code for query",
        "default": "110000"
      },
      "profileId": {
        "type": "string",
        "description": "Browser profile ID (optional override)"
      }
    },
    "required": ["username", "password"]
  }
}
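The mapping from publish-dialog fields to a tool definition can be sketched as follows. This is a minimal illustration, not the Desktop implementation: `build_tool_definition` is a hypothetical helper, every parameter is assumed to be a string, and the `workflow_` prefix is inferred from the names used in this page's examples. The `profileId` property is left out because it does not come from a recorded parameter.

```python
from typing import Any


def build_tool_definition(
    tool_name: str,
    tool_description: str,
    parameters: list[dict[str, Any]],
) -> dict[str, Any]:
    """Build an MCP-style tool definition from publish-dialog parameters.

    Each parameter dict uses the field names from the publish dialog:
    variableName, paramDescription and (optionally) defaultValue.
    """
    properties: dict[str, Any] = {}
    required: list[str] = []
    for param in parameters:
        prop: dict[str, Any] = {
            "type": "string",  # assumption: every recorded value is injected as a string
            "description": param.get("paramDescription", ""),
        }
        if param.get("defaultValue") is not None:
            prop["default"] = param["defaultValue"]
        else:
            # No default value means the Agent must supply this parameter.
            required.append(param["variableName"])
        properties[param["variableName"]] = prop

    return {
        # assumption: the "workflow_" prefix mirrors the names in this page's examples
        "name": f"workflow_{tool_name}",
        "description": tool_description,
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


definition = build_tool_definition(
    "query_power_data",
    "Log in to the power system and query electricity usage data for a specified region",
    [
        {"variableName": "username", "paramDescription": "Login username"},
        {"variableName": "password", "paramDescription": "Login password"},
        {"variableName": "region", "paramDescription": "Region code for query", "defaultValue": "110000"},
    ],
)
```

Here `username` and `password` end up in `required` because they carry no default, while `region` keeps its default of `110000`.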

Agent Invocation Flow

Tool Discovery

The Agent retrieves all available tools on the client via the MCP tools/list method:

// Agent → Desktop (JSON-RPC Request)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}

// Desktop → Agent (JSON-RPC Response)
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "desktop_exec",
        "description": "Execute shell commands on desktop"
      },
      {
        "name": "browser_read",
        "description": "Extract data from browser pages"
      },
      {
        "name": "hitl_prompt",
        "description": "Request user input during automation"
      },
      {
        "name": "workflow_query_power_data",
        "description": "Log in to the power system and query electricity usage data for a specified region",
        "inputSchema": { "..." }
      }
    ]
  }
}

Tool Invocation

When the Agent decides to call a workflow tool, it sends a standard tools/call request:

// Agent → Desktop
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "workflow_query_power_data",
    "arguments": {
      "username": "operator_001",
      "password": "secure_pass",
      "region": "330100"
    }
  }
}
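On the Agent side, a request like the one above could be assembled with a small helper that checks the arguments against the tool's inputSchema before sending. This is only a sketch: `build_tools_call` is a hypothetical name, and the check covers required and unknown parameter names, not value types.

```python
import json
from typing import Any


def build_tools_call(request_id: int, tool: dict[str, Any], arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request after a basic argument check."""
    schema = tool.get("inputSchema", {})
    missing = [name for name in schema.get("required", []) if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    unknown = [name for name in arguments if name not in schema.get("properties", {})]
    if unknown:
        raise ValueError(f"unknown arguments: {unknown}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    })


# Trimmed copy of the tool definition from this page (descriptions omitted).
tool = {
    "name": "workflow_query_power_data",
    "inputSchema": {
        "type": "object",
        "properties": {"username": {}, "password": {}, "region": {}, "profileId": {}},
        "required": ["username", "password"],
    },
}
request = build_tools_call(
    2, tool, {"username": "operator_001", "password": "secure_pass", "region": "330100"}
)
```

Rejecting a call locally when a required parameter is missing saves a round trip to the client and a browser launch that would fail anyway.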

Execution Result

// Desktop → Agent
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"success\": true, \"totalActions\": 8, \"completedActions\": 8, \"failedActions\": 0}"
      }
    ]
  }
}
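Note that the execution summary arrives as a JSON string inside a text content item, so the Agent has to decode it a second time. A minimal sketch, with `parse_workflow_result` as a hypothetical helper name:

```python
import json
from typing import Any


def parse_workflow_result(response: dict[str, Any]) -> dict[str, Any]:
    """Return the decoded execution summary from a tools/call response."""
    if "error" in response:
        error = response["error"]
        raise RuntimeError(f"tool call failed ({error.get('code')}): {error.get('message')}")
    # Collect the text items; the summary is JSON encoded inside the first one.
    texts = [item["text"] for item in response["result"]["content"] if item.get("type") == "text"]
    if not texts:
        raise ValueError("response contains no text content")
    return json.loads(texts[0])


response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {
                "type": "text",
                "text": '{"success": true, "totalActions": 8, "completedActions": 8, "failedActions": 0}',
            }
        ]
    },
}
summary = parse_workflow_result(response)
```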

Built-in Tools

In addition to dynamically registered workflow tools, Desktop also provides the following built-in MCP tools:

desktop_exec

Execute shell commands on the client.

{
  "name": "desktop_exec",
  "arguments": {
    "command": "ls -la /tmp/reports/"
  }
}

browser_read

Extract data from browser pages with multiple extraction methods:

Action	Description	Return Value
`text`	Extract text content from a page or element	Plain text string
`table`	Parse HTML tables into structured data	2D array (headers + rows)
`form`	Extract current form field values	Field name-value mapping
`html`	Get the HTML of an element or page	HTML string
`url`	Get the current page URL	URL string
`screenshot`	Take a page screenshot	Base64 PNG image

Example call:

{
  "name": "browser_read",
  "arguments": {
    "action": "table",
    "selector": "#data-table",
    "profileId": "profile-001"
  }
}

Typical scenario: The Agent first executes workflow_query_power_data to navigate to the data page, then uses browser_read to extract table data from the page for analysis.
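For analysis, the 2D array returned by the `table` action is often easier to handle as one record per row. The sketch below assumes the first row holds the headers, as the return-value description suggests; the sample data and the `table_to_records` helper are invented for illustration.

```python
def table_to_records(table: list[list[str]]) -> list[dict[str, str]]:
    """Turn a headers-plus-rows 2D array into one dict per row."""
    if not table:
        return []
    headers, *rows = table
    return [dict(zip(headers, row)) for row in rows]


# Hypothetical table content, invented for illustration only.
sample = [
    ["region", "month", "usage_kwh"],
    ["330100", "2024-01", "1520"],
    ["330100", "2024-02", "1475"],
]
records = table_to_records(sample)
total = sum(int(record["usage_kwh"]) for record in records)
```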

hitl_prompt

Request manual user input during an automated workflow. This is useful for captchas, SMS verification codes, and other values that cannot be obtained automatically.

{
  "name": "hitl_prompt",
  "arguments": {
    "title": "SMS Verification Code",
    "message": "Please enter the 6-digit verification code sent to your phone",
    "inputType": "text",
    "timeout": 120
  }
}

Execution flow:

1. The system shows a notification to alert the user
2. An input dialog appears in the Desktop UI
3. After the user enters a value, it is returned to the Agent
4. If no input is provided before the timeout, a timeout status is returned

Interaction method priority:

Priority	Method	Description
1	Renderer popup	When the Desktop main window is visible, show a dialog in the renderer process
2	Native dialog	When the main window is not visible, use an Electron native dialog
3	Timeout	If no response within the configured time, return a timeout status
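The priority rule and the timeout fallback can be modelled in a few lines. This is a sketch under stated assumptions: the method labels, the `status` values, and both helper names are hypothetical, and the simulated users stand in for the real dialog.

```python
import asyncio
from typing import Awaitable, Callable


def choose_prompt_method(main_window_visible: bool) -> str:
    """Pick the interaction method by the priority listed above."""
    return "renderer_popup" if main_window_visible else "native_dialog"


async def prompt_with_timeout(ask: Callable[[], Awaitable[str]], timeout: float) -> dict:
    """Wait for user input; report a timeout status if none arrives in time."""
    try:
        value = await asyncio.wait_for(ask(), timeout=timeout)
        return {"status": "ok", "value": value}
    except asyncio.TimeoutError:
        return {"status": "timeout"}


async def _quick_user() -> str:
    return "123456"  # simulated user input


async def _absent_user() -> str:
    await asyncio.sleep(1)  # simulated user who does not answer in time
    return "too late"


answered = asyncio.run(prompt_with_timeout(_quick_user, timeout=0.5))
timed_out = asyncio.run(prompt_with_timeout(_absent_user, timeout=0.05))
```

Returning a status instead of raising lets the Agent decide whether to ask again or abandon the workflow.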

Silent Browser Mode

When the Agent remotely calls a workflow tool, the browser launches in silent mode, without displaying a browser window on the user’s screen:

Silent Mode Features

Feature	Description
Window Not Visible	Browser window is positioned off-screen (-2400, -2400)
Fixed Viewport	Window size is fixed at 1280×720
Minimal Resources	GPU, extensions, and notifications are disabled; the popup blocker is also disabled (--disable-popup-blocking), so popups opened by a page are allowed
Isolated Configuration	Uses the specified browser profile, isolating cookies and state

Silent Mode Launch Arguments

--window-position=-2400,-2400
--window-size=1280,720
--disable-gpu
--disable-extensions
--disable-popup-blocking
--disable-notifications

Dynamic Tool Registration

Registration Timing

Desktop syncs tool information to the backend at the following times:

- When the WebSocket connection is established: Desktop sends a registration message containing all available tools
- When a workflow is published or unpublished: Desktop notifies the backend to update the tool list

Registration Message

{
  "type": "register",
  "deviceId": "device-abc123",
  "capabilities": ["desktop_exec", "browser_control", "workflow_execution"],
  "available_tools": [
    "desktop_exec",
    "browser_read",
    "hitl_prompt",
    "workflow_query_power_data",
    "workflow_generate_report"
  ]
}
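A registration message of this shape could be rebuilt whenever the set of published workflows changes. The following is a rough sketch: `build_registration` is a hypothetical helper, and sorting the workflow names is only a choice made here to keep the output stable.

```python
BUILT_IN_TOOLS = ["desktop_exec", "browser_read", "hitl_prompt"]


def build_registration(device_id: str, published_workflows: list[str]) -> dict:
    """Assemble the registration message sent when the WebSocket connects."""
    return {
        "type": "register",
        "deviceId": device_id,
        "capabilities": ["desktop_exec", "browser_control", "workflow_execution"],
        # Built-in tools first, then the currently published workflow tools.
        "available_tools": BUILT_IN_TOOLS + sorted(published_workflows),
    }


published = {"workflow_query_power_data"}
before = build_registration("device-abc123", list(published))

published.add("workflow_generate_report")  # a second workflow is published
after = build_registration("device-abc123", list(published))
```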

Backend Tool Creation

When the AI Backend (Python) receives the tool list, it dynamically creates a LangChain StructuredTool for each workflow tool:

from langchain_core.tools import StructuredTool
from pydantic import Field, create_model

# Dynamically create Pydantic schema (fields without a default are required)
DynamicModel = create_model(
    "QueryPowerDataInput",
    username=(str, Field(description="Login username")),
    password=(str, Field(description="Login password")),
    region=(str, Field(default="110000", description="Region code for query")),
)

# Create LangChain Tool
# call_desktop_tool is assumed to be the backend helper that forwards the call to Desktop (not shown here)
tool = StructuredTool.from_function(
    name="workflow_query_power_data",
    description="Log in to the power system and query electricity usage data for a specified region",
    args_schema=DynamicModel,
    func=lambda **kwargs: call_desktop_tool("workflow_query_power_data", kwargs),
)
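The example above hard-codes the fields for one tool. To do the same for any published workflow, the backend would need to derive the field definitions from the tool's inputSchema. The sketch below shows one way, using only the standard library; `schema_to_fields` and the type mapping are assumptions, not the backend's actual code. Each resulting tuple could then be passed to `create_model` as `name=(py_type, Field(default, description=description))`.

```python
from typing import Any

# Assumption: only these JSON Schema primitive types appear in published tools.
JSON_TO_PYTHON = {"string": str, "integer": int, "number": float, "boolean": bool}


def schema_to_fields(input_schema: dict[str, Any]) -> dict[str, tuple[type, Any, str]]:
    """Map each schema property to (python_type, default, description).

    Ellipsis (...) marks a required field, matching the convention Pydantic uses.
    """
    required = set(input_schema.get("required", []))
    fields: dict[str, tuple[type, Any, str]] = {}
    for name, prop in input_schema.get("properties", {}).items():
        py_type = JSON_TO_PYTHON.get(prop.get("type", "string"), str)
        default = ... if name in required else prop.get("default")
        fields[name] = (py_type, default, prop.get("description", ""))
    return fields


input_schema = {
    "type": "object",
    "properties": {
        "username": {"type": "string", "description": "Login username"},
        "password": {"type": "string", "description": "Login password"},
        "region": {"type": "string", "description": "Region code for query", "default": "110000"},
        "profileId": {"type": "string", "description": "Browser profile ID (optional override)"},
    },
    "required": ["username", "password"],
}
fields = schema_to_fields(input_schema)
```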

Typical Business Scenarios

Scenario 1: Power Data Collection

Scenario 2: Operations Requiring Verification Codes

Scenario 3: Batch Data Processing

The Agent can call workflow tools in a loop to handle batch tasks:

1. Call workflow_login to log in to the system
2. Call workflow_query_data in a loop (with different parameters) to extract data across multiple pages
3. Call browser_read to retrieve page data
4. Aggregate and analyze, then call workflow_generate_report to produce a report
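The batch steps above can be sketched as a plain loop. Everything specific here is an assumption made for illustration: the argument names passed to `workflow_login`, `workflow_query_data`, and `workflow_generate_report` are not documented on this page, and `fake_call_tool` is a stand-in for a real MCP client.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_batch(call_tool: ToolCaller, credentials: dict[str, str], regions: list[str]) -> dict[str, Any]:
    """Log in once, query each region, read the page, then request a report."""
    call_tool("workflow_login", credentials)
    collected: dict[str, Any] = {}
    for region in regions:
        # Hypothetical parameter name; the real workflow defines its own variables.
        call_tool("workflow_query_data", {"region": region})
        collected[region] = call_tool("browser_read", {"action": "table", "selector": "#data-table"})
    return call_tool("workflow_generate_report", {"regions": ",".join(collected)})


calls: list[str] = []


def fake_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real client: records the call and echoes it back."""
    calls.append(name)
    return {"tool": name, "arguments": arguments}


report = run_batch(
    fake_call_tool,
    {"username": "operator_001", "password": "secure_pass"},
    ["110000", "330100"],
)
```

Passing the caller in as a function keeps the loop testable without a connected Desktop client.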

Error Handling

Error Codes

Error Code	Description
`-32001`	Tool execution error
`-32002`	Workflow execution error
`-32003`	Workflow not found
`-32004`	Browser not running
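On the calling side, these codes arrive in the standard JSON-RPC 2.0 `error` object. A minimal sketch of turning them into exceptions follows; `ToolCallError`, `raise_for_error`, and the choice of which codes count as retryable are assumptions, not documented behavior.

```python
from typing import Any

ERROR_MEANINGS = {
    -32001: "Tool execution error",
    -32002: "Workflow execution error",
    -32003: "Workflow not found",
    -32004: "Browser not running",
}

# Assumption: which errors are worth retrying is a policy choice, not stated by the source.
RETRYABLE = {-32004}


class ToolCallError(Exception):
    """Raised when a JSON-RPC response carries an error object."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = code in RETRYABLE


def raise_for_error(response: dict[str, Any]) -> dict[str, Any]:
    """Return the result of a JSON-RPC response, or raise ToolCallError."""
    error = response.get("error")
    if error is None:
        return response["result"]
    code = error["code"]
    # Fall back to the table above when the response carries no message.
    raise ToolCallError(code, error.get("message") or ERROR_MEANINGS.get(code, "Unknown error"))


try:
    raise_for_error({"jsonrpc": "2.0", "id": 3, "error": {"code": -32003}})
    caught = None
except ToolCallError as error:
    caught = error
```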

Replay Engine Fault Tolerance

Mechanism	Parameters	Description
Element Wait	Max 10s	Wait for target element to appear in the DOM
Action Retry	3 attempts, 1s interval	Automatically retry failed actions
Timeout Protection	30s per action	Skip to the next step if a single action times out
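The Action Retry row can be illustrated with a small wrapper. This is a sketch, not the replay engine's code: it assumes "3 attempts" means three tries in total, retries on any exception, and takes the sleep function as a parameter so the example runs without real delays.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(
    action: Callable[[], T],
    attempts: int = 3,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action, retrying on any exception up to `attempts` total tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # a real engine would likely catch narrower errors
            last_error = error
            if attempt < attempts:
                sleep(interval)  # wait before the next try, but not after the last one
    assert last_error is not None
    raise last_error


state = {"calls": 0}


def flaky_click() -> str:
    """Simulated action that fails twice before succeeding."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("element not ready")
    return "clicked"


waits: list[float] = []
outcome = run_with_retry(flaky_click, sleep=waits.append)
```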

Execution Result

After workflow execution completes, a structured result is returned:

interface WorkflowExecutionResult {
  success: boolean
  totalActions: number
  completedActions: number
  failedActions: number
  results: ActionResult[]
}

Compatibility

Windows 7 Support

Since some clients run on Windows 7, note the following:

- Electron 22.x is the last version to support Windows 7
- The corresponding Chromium engine version must be used
- Some modern Web APIs may not be available

Domestic OS Support

Desktop is also packaged for Chinese domestic operating systems (UOS / Kylin) and generic Linux:

Platform	Architecture	Distribution Format
UOS	x64 / arm64	`.deb`
Kylin	x64 / arm64	`.rpm` / `.deb`
Generic Linux	x64	`.zip` (portable)

Related Documentation

Recording - Record, edit, and parameterize workflows
Desktop Overview - Desktop application overview
Workflow Recording - Video + audio + event multi-layer recording technical design
