flow

speak a concept, step inside it in 3d // spatial learning platform powered by gaussian splatting

Overview

flow converts voice commands into explorable 3D gaussian splat environments. say "show me ancient rome" and walk around inside it. after a ~5-minute generation pipeline chaining 6 APIs, you can first-person explore photorealistic spaces with educational overlays at 60fps. press 't' mid-exploration to ask questions about what you're seeing and get voice responses.

Built at SB Hacks XII (Jan 2026). Won President's Pick and Best Use of ElevenLabs.

How It Works

Voice Capture

Deepgram captures your voice command in real time using streaming STT with the Flux model
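
Streaming STT only works if the audio bytes match the format declared to the service. The sketch below shows one way to prepare browser audio as linear16 PCM at 48kHz mono, the settings named under Challenges Overcome. The helper names (`AUDIO_PARAMS`, `buildAudioQuery`, `floatTo16BitPCM`) are hypothetical and not taken from the project; the parameter names `encoding`, `sample_rate`, and `channels` should be checked against the current Deepgram documentation.

```typescript
// Assumed settings, taken from the fix described in this write-up:
// linear16 PCM, 48 kHz, mono.
const AUDIO_PARAMS: Record<string, string> = {
  encoding: "linear16",
  sample_rate: "48000",
  channels: "1",
};

// Build the query string to append to the streaming endpoint URL.
function buildAudioQuery(params: Record<string, string> = AUDIO_PARAMS): string {
  return Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");
}

// Convert Web Audio float samples in [-1, 1] to 16-bit signed PCM.
// Out-of-range samples are clamped rather than allowed to wrap around.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```

Each converted chunk's underlying buffer can then be sent over the socket as a binary frame. Declaring the format explicitly avoids relying on the service to guess it from raw bytes.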

Content Orchestration

Gemini orchestrates educational content and generates a cinematic image via its 2.0-flash-exp-image-generation model

3D Conversion

Marble API converts the generated image into a 3D gaussian splat environment
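
Image-to-splat conversion takes minutes, so a submit-then-poll loop is one common shape for this kind of async workflow. The following is a minimal sketch under that assumption, not the Marble API's actual contract: `JobStatus`, `pollUntilDone`, and the status fields are hypothetical, and the status check is injected so no real endpoint is implied.

```typescript
// Hypothetical job states; a real API will have its own schema.
type JobStatus =
  | { state: "pending" }
  | { state: "done"; assetUrl: string }
  | { state: "failed"; reason: string };

interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
  // Injectable so callers (and tests) can control waiting.
  sleep?: (ms: number) => Promise<void>;
}

// Repeatedly check a job until it finishes, fails, or attempts run out.
async function pollUntilDone(
  checkStatus: () => Promise<JobStatus>,
  opts: PollOptions,
): Promise<string> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.state === "done") return status.assetUrl;
    if (status.state === "failed") {
      throw new Error(`generation failed: ${status.reason}`);
    }
    // Wait between attempts, but not after the final one.
    if (attempt < opts.maxAttempts) await sleep(opts.intervalMs);
  }
  throw new Error(`gave up after ${opts.maxAttempts} attempts`);
}
```

Running this loop on the server keeps the API key off the client and lets the backend report progress while the job is pending.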

Real-time Rendering

SparkJS renders the .spz file at 60fps with collision detection for immersive exploration
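
Wall sliding usually comes down to one vector operation: when the player sphere touches a surface, drop the part of the movement that points into the surface and keep the part that runs along it. The sketch below illustrates that idea with plain math and no renderer dependency. `Vec3`, `Hit`, and `slideMove` are hypothetical names, and the hits are assumed to come from raycasts that report a distance and a unit surface normal.

```typescript
type Vec3 = { x: number; y: number; z: number };

const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// One raycast result: distance to the surface and its unit normal.
interface Hit {
  distance: number;
  normal: Vec3;
}

// Slide the desired move along every surface closer than the player radius.
function slideMove(move: Vec3, hits: Hit[], radius: number): Vec3 {
  let out: Vec3 = { x: move.x, y: move.y, z: move.z };
  for (const hit of hits) {
    if (hit.distance > radius) continue; // outside the sphere, no contact
    const into = dot(out, hit.normal);
    if (into >= 0) continue; // moving along or away from the surface
    // Subtract the component pointing into the wall; keep the tangent part.
    out = {
      x: out.x - into * hit.normal.x,
      y: out.y - into * hit.normal.y,
      z: out.z - into * hit.normal.z,
    };
  }
  return out;
}

// Walking diagonally into a wall that faces +z leaves only sideways motion.
const slid = slideMove(
  { x: 1, y: 0, z: -1 },
  [{ distance: 0.2, normal: { x: 0, y: 0, z: 1 } }],
  0.3,
);
// slid is { x: 1, y: 0, z: 0 }
```

Projecting against hits one at a time is simple but can reintroduce a small inward component in sharp corners, so a fuller version might repeat the pass or stop the move when two walls conflict.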

Interactive Q&A

Screenshot your view: Gemini Vision analyzes it and ElevenLabs narrates the answer

Key Features

● Voice-controlled world generation with Deepgram streaming STT
● Photorealistic 3D gaussian splat rendering at 60fps using SparkJS
● Real-time pipeline updates via WebSocket with 6-API integration
● Sphere-based collision detection with smooth wall sliding
● Scene library system: checks local files → MongoDB → generates new
● Contextual voice Q&A using Gemini Vision and ElevenLabs TTS
● Rate limiting and admin bypass for production-ready deployment
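
The scene library's lookup order (local files, then MongoDB, then a fresh generation) can be sketched as a tiered lookup that only generates when every cheaper source misses. This is an illustration under assumptions: `Scene`, `SceneSource`, and `resolveScene` are hypothetical names, and the sources are injected as functions so no specific file layout or database schema is implied.

```typescript
interface Scene {
  id: string;
  splatUrl: string;
}

// A named place to look for an existing scene, cheapest first.
interface SceneSource {
  name: string;
  find: (concept: string) => Promise<Scene | null>;
}

// Try each source in order; generate only when every source misses.
async function resolveScene(
  concept: string,
  sources: SceneSource[],
  generate: (concept: string) => Promise<Scene>,
): Promise<{ scene: Scene; from: string }> {
  // Normalize so "Ancient Rome" and " ancient rome " share one entry.
  const key = concept.trim().toLowerCase();
  for (const source of sources) {
    const scene = await source.find(key);
    if (scene) return { scene, from: source.name };
  }
  return { scene: await generate(key), from: "generated" };
}
```

With a generation pipeline that takes about five minutes, a hit in either cache tier turns a long wait into a near-instant load, which is why the expensive step sits last.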

Challenges Overcome

⚠ Deepgram WebSocket dying instantly
✓ Explicitly declared linear16 PCM at 48kHz mono
⚠ Gemini model compatibility issues
✓ Built backend proxy with fallback model chain
⚠ Marble API CORS blocked client calls
✓ Created Express proxy for full async workflow
⚠ Collision detection needed refinement
✓ Implemented multiple raycasts for smooth wall sliding
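
A fallback model chain, as used for the Gemini compatibility fix, can be expressed as "try each model in order and return the first success". The sketch below is one generic way to write that, not the project's actual proxy code: `withModelFallback` is a hypothetical helper, and the model call is injected so no SDK signature is assumed.

```typescript
// Try each model in order; return the first successful result along with
// the model that produced it. Collect errors so a total failure is debuggable.
async function withModelFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
): Promise<{ model: string; result: T }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      return { model, result: await call(model) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${model}: ${message}`);
    }
  }
  throw new Error(`all models failed: ${errors.join("; ")}`);
}
```

Returning the model name alongside the result makes it easy to log which entry in the chain actually served each request.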

What's Next

□ Improve collision mesh processing for more accurate interactions
□ Multi-user collaborative exploration in shared 3D environments
□ VR/AR support for fully immersive spatial learning
□ AI tutoring guide that follows you through scenes
□ Educator tools for creating custom learning experiences
□ Community marketplace for user-generated worlds