Mata & Tangan (Eyes & Hands)
Flowork's Browser Farm uses a vision-action paradigm inspired by human browsing:
Mata (Eyes) — Vision
Capture what the browser sees:
// Returns Base64 screenshot
const image = await bView.webContents.capturePage();
return image.toDataURL();
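A vision model or file writer often wants the raw Base64 payload rather than the full data URL. The following is a minimal sketch, assuming `view` exposes Electron's `webContents.capturePage()`; the helper names `dataUrlToBase64` and `captureAsBase64` are hypothetical and not part of Flowork.

```javascript
// Hypothetical helpers; `view` is assumed to expose Electron's
// webContents.capturePage(), which resolves to a NativeImage.

// Strip the "data:image/png;base64," style prefix from a data URL.
function dataUrlToBase64(dataUrl) {
  const comma = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || comma === -1) {
    throw new Error("not a data URL");
  }
  return dataUrl.slice(comma + 1);
}

// Capture the visible page and return both forms of the screenshot.
async function captureAsBase64(view) {
  const image = await view.webContents.capturePage();
  const dataUrl = image.toDataURL();
  return { dataUrl, base64: dataUrlToBase64(dataUrl) };
}
```

Passing the view in as a parameter keeps the helper independent of how the browser view is created.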
Tangan (Hands) — Action
Interact with the page:
// Click, type, scrape — anything JS can do
const result = await bView.webContents.executeJavaScript(script);
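Injected scripts can throw or return a rejected promise, so it may help to wrap the call and report failures as data. This is a sketch under the assumption that `view` exposes Electron's `webContents.executeJavaScript(code)`; `runScript`, `clickScript`, and the selector are illustrative names only.

```javascript
// Hypothetical wrapper; `view` is assumed to expose Electron's
// webContents.executeJavaScript(code), which returns a Promise.
async function runScript(view, script) {
  try {
    const value = await view.webContents.executeJavaScript(script);
    return { ok: true, value };
  } catch (err) {
    // Page-side exceptions and rejected promises surface here.
    return { ok: false, error: String(err && err.message ? err.message : err) };
  }
}

// Example script: click an element by selector and report whether it existed.
// The selector is illustrative only.
const clickScript = `(() => {
  const el = document.querySelector("button[type=submit]");
  if (!el) return false;
  el.click();
  return true;
})()`;
```

Returning `false` when the element is missing lets the caller distinguish "nothing to click" from a script error.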
The Golden Rule
Always capture first, analyze, and only then act. Never execute JavaScript blind. The AI should:
1. capture_browser — See the current state
2. Analyze the screenshot — Understand what's on screen
3. execute_browser_script — Perform the action
4. capture_browser again — Verify the result
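The four steps above can be sketched as a single cycle. Only the command names come from this document; `send(command, payload)` and `analyze(screenshot)` are hypothetical callbacks the caller would supply, and the payload shape is an assumption.

```javascript
// Sketch of the capture -> analyze -> act -> verify cycle.
// `send` and `analyze` are hypothetical caller-supplied async functions.
async function lookThenAct(send, analyze) {
  // 1. See the current state.
  const before = await send("capture_browser", {});
  // 2. Decide what to do based on what is visible.
  const script = await analyze(before);
  // No decision means no action: never act blind.
  if (!script) return { acted: false, before };
  // 3. Perform the action.
  const result = await send("execute_browser_script", { script });
  // 4. Capture again so the caller can verify the outcome.
  const after = await send("capture_browser", {});
  return { acted: true, before, result, after };
}
```

Returning both screenshots leaves the verification policy (for example, comparing `before` and `after`) to the caller.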
WebSocket Commands
All browser control happens through JSON messages sent over a WebSocket on port 5001:
open_ai_tab — Create browser window
capture_browser — Screenshot
execute_browser_script — Inject JS
list_browsers — Get all tabs
get_console_logs — Read console
ai_navigate — Load URL
scrape_page — Extract text
browser_lifecycle — Back/forward/reload/stop
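The message schema is not given here, so the following is only a sketch of how a client might build these messages. The command names are taken from the list above; the `type` field name, the payload layout, and the `buildCommand` helper are assumptions that should be checked against the actual server.

```javascript
// Command names from the list above; everything else is an assumption.
const COMMANDS = new Set([
  "open_ai_tab",
  "capture_browser",
  "execute_browser_script",
  "list_browsers",
  "get_console_logs",
  "ai_navigate",
  "scrape_page",
  "browser_lifecycle",
]);

// Build a JSON message string. The `type` field name and payload layout
// are hypothetical; the real schema may differ.
function buildCommand(type, payload = {}) {
  if (!COMMANDS.has(type)) {
    throw new Error(`unknown command: ${type}`);
  }
  // Spread the payload first so it cannot overwrite the command name.
  return JSON.stringify({ ...payload, type });
}
```

With a WebSocket client connected to port 5001, the returned string could be passed to the socket's `send` method. Rejecting unknown names locally catches typos before they reach the server.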