
opal
Voice-Controlled AI Agent That Browses the Web & Plays Games Through Discord
Overview
Opal is a full-stack AI agent system that lets Discord users command a real Chrome browser by text or voice: browsing the web, playing browser games, searching YouTube, filling in forms, and more. Say “opal task go to YouTube and play lo-fi” in a voice channel, and opal carries it out.
Built at NexHacks (Jan 2026).
Architecture
The system uses a planner/navigator split agent architecture. A strategic planner provides high-level reasoning every 3 steps, while a multimodal navigator executes DOM + vision actions every step. The Chrome extension streams live tab frames via WebRTC to a VLM relay for real-time screen understanding.
- Discord Bot: Receives voice/text commands, queues goals, forwards to CUA backend
- CUA Backend: FastAPI service orchestrating the planner/navigator agent loop
- Chrome Extension: CDP + WebRTC for DOM indexing, live streaming, and action execution
- LiveKit Voice Bridge: Deepgram STT + ElevenLabs TTS for voice channel interaction
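The planner/navigator split described above can be illustrated with a minimal sketch. This is not opal's actual code: `AgentState`, `run_agent_loop`, and the stub planner and navigator are hypothetical names, and the only detail taken from the description is the cadence (planner every 3 steps, navigator every step).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

PLAN_INTERVAL = 3  # planner cadence taken from the architecture description


@dataclass
class AgentState:
    """Hypothetical container for what the loop tracks between steps."""
    goal: str
    plan: Optional[str] = None
    history: List[str] = field(default_factory=list)


def run_agent_loop(
    goal: str,
    planner: Callable[[AgentState], str],
    navigator: Callable[[AgentState], str],
    max_steps: int = 9,
) -> AgentState:
    """Refresh the plan every PLAN_INTERVAL steps; act on every step."""
    state = AgentState(goal=goal)
    for step in range(max_steps):
        if step % PLAN_INTERVAL == 0:
            state.plan = planner(state)   # strategic, high-level reasoning
        action = navigator(state)         # one concrete action per step
        state.history.append(action)
        if action == "done":              # assumed stop signal
            break
    return state


# Stubs standing in for the real model calls (assumptions for illustration).
def stub_planner(state: AgentState) -> str:
    return f"plan@{len(state.history)}"


def stub_navigator(state: AgentState) -> str:
    return f"act[{state.plan}]"


result = run_agent_loop(
    "go to YouTube and play lo-fi", stub_planner, stub_navigator, max_steps=7
)
print(result.history)
```

In this sketch the navigator always sees the most recent plan, so steps 0 to 2 act under the first plan, steps 3 to 5 under the second, and so on.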
Key Features
- Voice-to-browser control: speak a goal in Discord and opal executes it in a real Chrome browser
- Real-time vision via Overshoot VLM: always-on screen understanding without polling
- Tactical game mode: auto-detects browser games and switches to a 100–300 ms tick loop (Krunker, CS:GO, League, Minecraft)
- 3-layer memory system: working memory, episode compression every 8 steps, persistent session summaries
- Spectator dashboard with live YOLO bounding box overlays, raycast visualization, and HUD stats
- Persistent agent queue per Discord guild with goal preemption
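The 3-layer memory system listed above might look roughly like the following sketch. Only the three layers and the 8-step compression interval come from the feature list; the class name `LayeredMemory`, the string-based summarizer, and the in-memory list used for session summaries are assumptions (a real system would presumably summarize with a model and write summaries to durable storage).

```python
from typing import Callable, List

EPISODE_LENGTH = 8  # compression interval taken from the feature list


def default_summarizer(steps: List[str]) -> str:
    """Placeholder summarizer; stands in for a model-generated summary."""
    return f"{len(steps)} steps: {steps[0]} ... {steps[-1]}"


class LayeredMemory:
    """Hypothetical three-layer memory: working, episodes, session summaries."""

    def __init__(self, summarize: Callable[[List[str]], str] = default_summarizer):
        self.working: List[str] = []            # layer 1: raw recent steps
        self.episodes: List[str] = []           # layer 2: compressed episodes
        self.session_summaries: List[str] = []  # layer 3: one entry per session
        self._summarize = summarize

    def record(self, step: str) -> None:
        """Add a step; compress working memory once it holds a full episode."""
        self.working.append(step)
        if len(self.working) >= EPISODE_LENGTH:
            self.episodes.append(self._summarize(self.working))
            self.working = []

    def end_session(self) -> str:
        """Fold episodes and leftover steps into a single session summary."""
        parts = list(self.episodes)
        if self.working:
            parts.append(self._summarize(self.working))
        summary = " | ".join(parts)
        self.session_summaries.append(summary)
        self.working, self.episodes = [], []
        return summary


memory = LayeredMemory()
for i in range(10):
    memory.record(f"step{i}")
print(memory.episodes, memory.working)
```

After ten recorded steps, the first eight have been compressed into one episode and the last two remain in working memory until the next compression or the end of the session.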
Tactical Game Mode
When the navigator detects a browser game, opal automatically switches to a fast tick loop (100–300 ms) with game-specific tactical backends. It combines a visual observer with heuristic strategies to play games like Krunker, Skribbl.io, CS:GO browser mirrors, League of Legends, and Minecraft, all in real time through the Chrome extension.
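A fixed-rate tick loop of the kind described above could be sketched as follows. The 100–300 ms bounds come from the text; everything else is an assumption for illustration, including the function names, the injectable clock, and the policy of resetting the schedule when a tick overruns. Game detection and the tactical backends are not modeled.

```python
import time
from typing import Callable

MIN_TICK_S = 0.100  # lower bound of the tick range given in the text
MAX_TICK_S = 0.300  # upper bound of the tick range given in the text


def clamp_tick(interval_s: float) -> float:
    """Keep a requested tick interval inside the 100-300 ms range."""
    return max(MIN_TICK_S, min(MAX_TICK_S, interval_s))


def run_tick_loop(
    on_tick: Callable[[int], None],
    interval_s: float,
    ticks: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call on_tick at a steady rate, sleeping only for the time left over."""
    interval = clamp_tick(interval_s)
    next_deadline = clock()
    for tick in range(ticks):
        on_tick(tick)                      # observe the screen, pick an action
        next_deadline += interval
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)
        else:
            next_deadline = clock()        # overran: reset instead of bursting


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
seen = []
# A 50 ms request is clamped up to the 100 ms minimum.
run_tick_loop(seen.append, interval_s=0.05, ticks=3, clock=fake.time, sleep=fake.sleep)
print(seen, round(fake.now, 3))
```

Scheduling against a running deadline rather than sleeping a fixed amount after each tick keeps the rate steady even when individual ticks take varying amounts of time.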