Browser Farm Architecture — Mata (Vision) & Tangan (Action)

🌐 Browser Automation • javascript • v1 • 2026-04-01

The Browser Farm uses a Vision-Action pattern: Mata (Eyes) captures screenshots via capturePage(), and Tangan (Hands) executes JavaScript via executeJavaScript(). Both are controlled through WebSocket JSON messages on port 5001.

Mata & Tangan

Flowork's Browser Farm uses a vision-action paradigm modeled on how a person browses: look at the page first, then act on it.

Mata (Eyes) — Vision

Capture what the browser sees:

// Inside an async handler: returns the screenshot as a Base64-encoded PNG data URL
const image = await bView.webContents.capturePage();
return image.toDataURL();
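
The value returned by toDataURL() is a data URL string (a "data:image/png;base64," prefix followed by the encoded image), not bare Base64. A consumer that needs the raw bytes has to strip that prefix first. The following is a minimal sketch of that step; decodeDataUrl is a hypothetical helper name, not part of Flowork or Electron.

```javascript
// Hypothetical helper (not a Flowork or Electron API): split a Base64 data URL
// into its MIME type and the decoded bytes.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error('Expected a base64 data URL');
  }
  return {
    mimeType: match[1],
    bytes: Buffer.from(match[2], 'base64'), // Node.js Buffer holding the image data
  };
}
```

The decoded bytes could then be written to disk or handed to an image model, depending on what the analysis step expects.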

Tangan (Hands) — Action

Interact with the page:

// Click, type, scrape: anything page JavaScript can do. `script` is a string of JS source.
const result = await bView.webContents.executeJavaScript(script);
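
Because executeJavaScript() takes source code as a string, any value spliced into that string (a CSS selector, text to type) has to be escaped, or a stray quote will break the script. The sketch below shows one way to do that with JSON.stringify. The helper name buildClickScript and the shape of the returned object are assumptions made for illustration only.

```javascript
// Hypothetical helper: build a script string that clicks the first element
// matching `selector` and reports whether it found one.
function buildClickScript(selector) {
  // JSON.stringify produces a valid JS string literal, so quotes inside the
  // selector cannot terminate the string early in the generated code.
  const literal = JSON.stringify(selector);
  return `(() => {
    const el = document.querySelector(${literal});
    if (!el) return { ok: false, reason: 'not found' };
    el.click();
    return { ok: true };
  })()`;
}
```

The resulting string would be passed as the script argument above, for example buildClickScript('button[type="submit"]').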

The Golden Rule

Always capture FIRST, analyze, THEN act. Never execute JavaScript blind. The AI should:

1. capture_browser: see the current state
2. Analyze the screenshot: understand what is on screen
3. execute_browser_script: perform the action
4. capture_browser again: verify the result
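
The capture, analyze, act, verify cycle can be sketched as a single function. This is an illustration under assumptions, not Flowork's implementation: send(command, payload) and analyze(screenshot) are hypothetical caller-supplied functions, and the script payload field is a guess at the message shape.

```javascript
// Hypothetical driver for one capture -> analyze -> act -> verify cycle.
// `send` and `analyze` are supplied by the caller; neither is a documented
// Flowork function, and the `script` payload key is an assumption.
async function goldenRuleStep(send, analyze) {
  // 1. See the current state.
  const before = await send('capture_browser', {});
  // 2. Decide what to do based on the screenshot.
  const plan = await analyze(before);
  if (!plan || !plan.script) {
    return { acted: false, before }; // nothing to do, so never act blind
  }
  // 3. Perform the action.
  const result = await send('execute_browser_script', { script: plan.script });
  // 4. Capture again so the caller can verify the outcome.
  const after = await send('capture_browser', {});
  return { acted: true, before, result, after };
}
```

Returning both screenshots leaves the verification policy (retry, compare, give up) to the caller.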

WebSocket Commands

All browser control happens via WebSocket JSON messages on Port 5001:

open_ai_tab — Create browser window
capture_browser — Screenshot
execute_browser_script — Inject JS
list_browsers — Get all tabs
get_console_logs — Read console
ai_navigate — Load URL
scrape_page — Extract text
browser_lifecycle — Back/forward/reload/stop
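
As a rough sketch of how a client might assemble these messages, the helper below validates the command name against the list above and serializes it to JSON. The message schema is not specified here, so the action key and the flat payload layout are assumptions, and buildCommand is a hypothetical name.

```javascript
// Command names taken from the list above.
const BROWSER_COMMANDS = new Set([
  'open_ai_tab', 'capture_browser', 'execute_browser_script', 'list_browsers',
  'get_console_logs', 'ai_navigate', 'scrape_page', 'browser_lifecycle',
]);

// Hypothetical envelope: the `action` key and the payload layout are
// assumptions, not the documented Flowork schema.
function buildCommand(action, payload = {}) {
  if (!BROWSER_COMMANDS.has(action)) {
    throw new Error(`Unknown browser command: ${action}`);
  }
  // Spread the payload first so it cannot overwrite `action`.
  return JSON.stringify({ ...payload, action });
}
```

A client would send the returned string over its WebSocket connection to port 5001, for example buildCommand('ai_navigate', { url: 'https://example.com' }), where the url field is likewise an assumed name.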
