opal

Voice-Controlled AI Agent That Browses the Web & Plays Games Through Discord

Overview

Opal is a full-stack AI agent system that lets Discord users command a real Chrome browser by text or voice: browse the web, play browser games, search YouTube, fill forms, and more. Say “opal task go to YouTube and play lo-fi” in a voice channel, and the agent carries it out in the browser.

Built at NexHacks (Jan 2026).

Architecture

The system uses a planner/navigator split agent architecture. A strategic planner provides high-level reasoning every 3 steps, while a multimodal navigator executes DOM + vision actions every step. The Chrome extension streams live tab frames via WebRTC to a VLM relay for real-time screen understanding.
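The cadence described above (plan every 3 steps, act on every step) can be sketched as a small loop. This is a minimal illustration, not the project's actual code: `AgentState`, `run_agent`, and the stub planner/navigator are hypothetical names, and the stubs stand in for the real LLM and VLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List

PLAN_INTERVAL = 3  # planner cadence stated in the text; all names here are illustrative


@dataclass
class AgentState:
    goal: str
    plan: str = ""
    history: List[str] = field(default_factory=list)


def run_agent(
    goal: str,
    planner: Callable[[AgentState], str],
    navigator: Callable[[AgentState], str],
    max_steps: int = 9,
) -> AgentState:
    """Replan every PLAN_INTERVAL steps; let the navigator act on every step."""
    state = AgentState(goal=goal)
    for step in range(max_steps):
        if step % PLAN_INTERVAL == 0:
            state.plan = planner(state)   # high-level reasoning
        action = navigator(state)         # DOM + vision action (stubbed below)
        state.history.append(action)
        if action == "done":
            break
    return state


# Stubs standing in for the real model calls (assumptions for illustration).
plan_calls: List[int] = []


def stub_planner(state: AgentState) -> str:
    plan_calls.append(len(state.history))
    return f"plan after {len(state.history)} actions"


def stub_navigator(state: AgentState) -> str:
    return "done" if len(state.history) == 6 else f"click element {len(state.history)}"


final = run_agent("open YouTube and play lo-fi", stub_planner, stub_navigator)
```

With these stubs the planner runs at steps 0, 3, and 6, while the navigator runs on all seven steps before reporting completion.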

Discord Bot

Receives voice/text commands, queues goals, and forwards them to the CUA backend

CUA Backend

FastAPI service orchestrating the planner/navigator agent loop

Chrome Extension

CDP + WebRTC for DOM indexing, live streaming, and action execution

LiveKit Voice Bridge

Deepgram STT + ElevenLabs TTS for voice channel interaction
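As a rough illustration of how the Discord bot component might queue goals per guild before forwarding them to the backend, here is a minimal sketch. The `Goal` and `GuildGoalQueue` names are hypothetical, and the preemption policy (the interrupted goal is put back at the front of the queue) is an assumption; the source only says that goals are queued per guild and can be preempted.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Goal:
    guild_id: int
    text: str
    source: str = "text"  # "text" or "voice"


class GuildGoalQueue:
    """One FIFO queue per guild, with at most one active goal per guild."""

    def __init__(self) -> None:
        self._pending: Dict[int, Deque[Goal]] = defaultdict(deque)
        self._active: Dict[int, Goal] = {}

    def submit(self, goal: Goal, preempt: bool = False) -> None:
        if preempt and goal.guild_id in self._active:
            # Assumed policy: the interrupted goal goes back to the front of the queue.
            interrupted = self._active.pop(goal.guild_id)
            self._pending[goal.guild_id].appendleft(interrupted)
            self._active[goal.guild_id] = goal
        else:
            self._pending[goal.guild_id].append(goal)

    def next_goal(self, guild_id: int) -> Optional[Goal]:
        """Mark the current goal finished and activate the next queued one."""
        self._active.pop(guild_id, None)
        pending = self._pending[guild_id]
        if not pending:
            return None
        self._active[guild_id] = pending.popleft()
        return self._active[guild_id]

    def active(self, guild_id: int) -> Optional[Goal]:
        return self._active.get(guild_id)


queue = GuildGoalQueue()
queue.submit(Goal(1, "search YouTube for lo-fi"))
queue.submit(Goal(1, "fill in the signup form"))
queue.submit(Goal(2, "play Krunker"))
first = queue.next_goal(1)  # guild 1 starts its first goal
queue.submit(Goal(1, "stop and open the docs", source="voice"), preempt=True)
```

Keeping a separate queue per guild means a long-running goal in one server never blocks another server. A persistent version would store the queues outside process memory, which this sketch does not attempt.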

Key Features

  • Voice-to-browser control: speak a goal in Discord and opal executes it in a real Chrome browser
  • Real-time vision via Overshoot VLM — always-on screen understanding without polling
  • Tactical game mode: auto-detects browser games and switches to a 100–300 ms tick loop (Krunker, CS:GO, League of Legends, Minecraft)
  • 3-layer memory system: working memory, episode compression every 8 steps, persistent session summaries
  • Spectator dashboard with live YOLO bounding box overlays, raycast visualization, and HUD stats
  • Persistent agent queue per Discord guild with goal preemption
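The 3-layer memory system in the list above could be structured roughly as follows. This is a sketch under assumptions: `AgentMemory` and `naive_summarize` are hypothetical names, the summarizer is a placeholder for whatever compression the real system uses, and only the three layers and the 8-step compression interval come from the feature list.

```python
from dataclasses import dataclass, field
from typing import Callable, List

COMPRESS_EVERY = 8  # episode compression interval from the feature list


def naive_summarize(steps: List[str]) -> str:
    """Placeholder compressor; a real system would likely call a model here."""
    return f"{len(steps)} steps: {steps[0]} ... {steps[-1]}"


@dataclass
class AgentMemory:
    summarize: Callable[[List[str]], str] = naive_summarize
    working: List[str] = field(default_factory=list)   # layer 1: recent raw steps
    episodes: List[str] = field(default_factory=list)  # layer 2: compressed episodes
    session_summary: str = ""                          # layer 3: kept across sessions

    def record(self, step: str) -> None:
        """Store a step; compress working memory once it reaches COMPRESS_EVERY steps."""
        self.working.append(step)
        if len(self.working) >= COMPRESS_EVERY:
            self.episodes.append(self.summarize(self.working))
            self.working.clear()

    def close_session(self) -> str:
        """Flush leftover steps and build the session summary from all episodes."""
        if self.working:
            self.episodes.append(self.summarize(self.working))
            self.working.clear()
        self.session_summary = " | ".join(self.episodes)
        return self.session_summary


memory = AgentMemory()
for i in range(10):
    memory.record(f"step {i}")
```

After ten recorded steps, the first eight have been compressed into one episode and the last two remain in working memory until the session closes.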

Tactical Game Mode

When the navigator detects a browser game, opal automatically switches to a fast tick loop (100–300 ms per tick) with game-specific tactical backends. It combines a visual observer with heuristic strategies to play games like Krunker, Skribbl.io, CS:GO browser mirrors, League of Legends, and Minecraft, all in real time through the Chrome extension.
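A fast tick loop of this kind might look like the sketch below. The observe/decide/act split, the function names, and the toy heuristic are assumptions for illustration; only the 100–300 ms tick range is taken from the text. The `sleep` and `clock` parameters are injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable, List

MIN_TICK_S, MAX_TICK_S = 0.100, 0.300  # 100–300 ms range from the text


def clamp_tick(seconds: float) -> float:
    """Keep the requested tick interval inside the allowed range."""
    return max(MIN_TICK_S, min(MAX_TICK_S, seconds))


def run_tick_loop(
    observe: Callable[[], dict],
    decide: Callable[[dict], str],
    act: Callable[[str], None],
    tick_s: float,
    max_ticks: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    tick_s = clamp_tick(tick_s)
    for _ in range(max_ticks):
        started = clock()
        act(decide(observe()))
        # Sleep only for the time left in this tick, so slow frames do not add delay.
        remaining = tick_s - (clock() - started)
        if remaining > 0:
            sleep(remaining)


def heuristic(obs: dict) -> str:
    """Toy strategy standing in for a game-specific tactical backend."""
    return "engage" if obs.get("enemy_visible") else "explore"


frames = iter([{"enemy_visible": False}, {"enemy_visible": True}, {"enemy_visible": False}])
actions: List[str] = []
run_tick_loop(
    observe=lambda: next(frames),
    decide=heuristic,
    act=actions.append,
    tick_s=0.05,             # below the minimum, so it is clamped to 0.1 s
    max_ticks=3,
    sleep=lambda _s: None,   # skip real waiting in this demo
)
```

Using cheap heuristics inside the tick, rather than a full planner call, is what makes a sub-second loop plausible; slower reasoning would have to run outside this loop.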