Browser Farm Architecture — Mata (Vision) & Tangan (Action)

🌐 Browser Automation javascript v1

The Browser Farm uses a Vision-Action pattern: Mata (Eyes) captures screenshots via capturePage(), and Tangan (Hands) executes JavaScript via executeJavaScript(). Both are controlled through WebSocket messages on port 5001.

Mata & Tangan

Flowork's Browser Farm uses a vision-action paradigm inspired by human browsing:

Mata (Eyes) — Vision

Capture what the browser sees:
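// Inside an async handler: capturePage() resolves to a NativeImage
const image = await bView.webContents.capturePage();
// toDataURL() returns the screenshot as a Base64-encoded data URL string
return image.toDataURL();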
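
A consumer of the screenshot usually needs the raw image bytes rather than the data URL string. The following is a minimal sketch of that decoding step, assuming the string has the usual `data:<mime>;base64,<payload>` shape. The helper name `decodeDataUrl` is hypothetical and not part of Flowork.

```javascript
// Hypothetical helper (not part of Flowork): split a Base64 data URL
// into its MIME type and raw bytes.
function decodeDataUrl(dataUrl) {
  // Expect "data:<mime>;base64,<payload>" with a standard Base64 alphabet.
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/]*={0,2})$/.exec(dataUrl);
  if (!match) {
    throw new Error('Expected a Base64 data URL');
  }
  return {
    mimeType: match[1],
    bytes: Buffer.from(match[2], 'base64'), // Node.js Buffer with the decoded bytes
  };
}
```

The returned `bytes` buffer could then be written to disk or handed to an image-analysis step.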

Tangan (Hands) — Action

Interact with the page:
// Click, type, scrape: anything JavaScript can do inside the page
// The promise resolves with the result of the executed code
const result = await bView.webContents.executeJavaScript(script);
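
Because the script travels as a string, any value interpolated into it needs escaping. The sketch below shows one way to build a click script safely. `buildClickScript` and the `{ ok, error }` result shape are illustrative assumptions, not part of the Browser Farm API.

```javascript
// Hypothetical helper: build a script string that clicks the first element
// matching a CSS selector. JSON.stringify escapes quotes and backslashes so
// the selector cannot break out of its string literal.
function buildClickScript(selector) {
  const sel = JSON.stringify(selector);
  return `(() => {
    const el = document.querySelector(${sel});
    if (!el) return { ok: false, error: 'not found: ' + ${sel} };
    el.click();
    return { ok: true };
  })()`;
}
```

The resulting string is what would be passed as `script` to executeJavaScript(). Returning a plain object keeps the result easy to inspect on the calling side.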

The Golden Rule

Always capture FIRST, analyze, THEN act. Never execute JavaScript blind. The AI should:

  1. capture_browser: see the current state
  2. Analyze the screenshot: understand what's on screen
  3. execute_browser_script: perform the action
  4. capture_browser again: verify the result
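
The four steps above can be sketched as a single async function. This is an illustration under stated assumptions: `send(command, payload)` and `analyze(screenshot)` are hypothetical caller-supplied functions, and the `browserId` and `script` payload fields are guesses at the message shape, which the source does not specify.

```javascript
// Sketch of the capture -> analyze -> act -> verify loop.
// `send` and `analyze` are hypothetical async functions supplied by the caller;
// the payload field names (browserId, script) are assumptions.
async function seeThenAct(send, analyze, browserId) {
  // 1. See the current state.
  const before = await send('capture_browser', { browserId });

  // 2. Decide what to do based on the screenshot.
  const script = await analyze(before);
  if (!script) {
    // Nothing to do: never run a script without a decision behind it.
    return { acted: false, before };
  }

  // 3. Perform the action.
  const result = await send('execute_browser_script', { browserId, script });

  // 4. Capture again to verify the outcome.
  const after = await send('capture_browser', { browserId });
  return { acted: true, before, result, after };
}
```

Passing `send` in as a parameter keeps the loop independent of the transport, so it can be exercised with a stub before being wired to the real WebSocket connection.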

WebSocket Commands

All browser control happens via WebSocket JSON messages on Port 5001:

  • open_ai_tab — Create browser window
  • capture_browser — Screenshot
  • execute_browser_script — Inject JS
  • list_browsers — Get all tabs
  • get_console_logs — Read console
  • ai_navigate — Load URL
  • scrape_page — Extract text
  • browser_lifecycle — Back/forward/reload/stop
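
As a rough illustration of how a client might prepare one of these messages, the sketch below validates the command name against the list above and serializes it to JSON. The command names come from this page; the envelope shape (a `type` field plus payload fields such as `url`) and the helper name `buildCommand` are assumptions, since the actual message schema is not documented here.

```javascript
// Command names are taken from the list above. The envelope shape
// ({ type, ...payload }) is an assumption, not the documented schema.
const COMMANDS = new Set([
  'open_ai_tab', 'capture_browser', 'execute_browser_script', 'list_browsers',
  'get_console_logs', 'ai_navigate', 'scrape_page', 'browser_lifecycle',
]);

// Hypothetical helper: validate the command name and serialize the message.
function buildCommand(type, payload = {}) {
  if (!COMMANDS.has(type)) {
    throw new Error(`Unknown browser command: ${type}`);
  }
  // `type` is spread last so a payload field cannot override it.
  return JSON.stringify({ ...payload, type });
}
```

The returned string is what a WebSocket client connected to port 5001 would pass to its send() method. Rejecting unknown names early catches typos before they reach the farm.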