flow
speak a concept, step inside it in 3d // spatial learning platform powered by gaussian splatting
Overview
flow converts voice commands into explorable 3D gaussian splat environments. say "show me ancient rome" and walk around inside it. generation takes about 5 minutes and chains 6 APIs; once it finishes, you can explore the photorealistic space in first person at 60fps, with educational overlays. press 't' mid-exploration to ask a question about what you're seeing and get a spoken answer.
Built at SB Hacks XII (Jan 2026). Won President's Pick and Best Use of ElevenLabs.
How It Works
Voice Capture
Deepgram captures your voice command in real time using streaming STT with the Flux model
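Browser microphones typically deliver Float32 samples, while the stream in this project had to be declared as linear16 PCM at 48kHz mono before the socket would stay open. The sketch below shows one way that conversion could look. `floatTo16BitPCM` and `audioQueryString` are hypothetical names, not flow's actual code; the parameter names `encoding`, `sample_rate`, and `channels` follow Deepgram's streaming query parameters, and the endpoint, model, and auth settings are deliberately left out.

```typescript
// Audio format the stream is declared with (values taken from the write-up).
const AUDIO_FORMAT = { encoding: "linear16", sampleRate: 48000, channels: 1 };

// Convert Web Audio Float32 samples in [-1, 1] to 16-bit signed PCM.
// Hypothetical helper for illustration only.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((value, i) => {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, value));
    // Negative and positive ranges differ by one step in two's complement.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  });
  return out;
}

// Build the query-string fragment that declares the audio format up front.
function audioQueryString(): string {
  const f = AUDIO_FORMAT;
  return `encoding=${f.encoding}&sample_rate=${f.sampleRate}&channels=${f.channels}`;
}
```

Each converted chunk would then be sent as a binary frame over the already-open streaming socket.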
Content Orchestration
Gemini orchestrates educational content and generates a cinematic image with the 2.0-flash-exp-image-generation model
3D Conversion
Marble API converts the generated image into a 3D gaussian splat environment
Real-time Rendering
SparkJS renders the .spz file at 60fps with collision detection for immersive exploration
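Wall sliding generally means removing only the part of the movement that pushes into a surface, so the player keeps gliding along it. The sketch below illustrates that idea under a simplifying assumption: walls are infinite planes, whereas flow itself uses raycasts against the scene. All names (`Wall`, `moveWithSliding`) are hypothetical and not part of SparkJS.

```typescript
type Vec3 = { x: number; y: number; z: number };

const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;
const add = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x + b.x, y: a.y + b.y, z: a.z + b.z });
const scale = (a: Vec3, s: number): Vec3 => ({ x: a.x * s, y: a.y * s, z: a.z * s });

// A wall as an infinite plane: `normal` is a unit vector pointing toward the
// walkable side, and points p on the plane satisfy dot(normal, p) = offset.
interface Wall {
  normal: Vec3;
  offset: number;
}

// Move a sphere (the player) by `velocity`, sliding along any wall it would penetrate.
function moveWithSliding(center: Vec3, velocity: Vec3, radius: number, walls: Wall[]): Vec3 {
  let v = velocity;
  for (const wall of walls) {
    const distanceAfterMove = dot(wall.normal, add(center, v)) - wall.offset;
    const intoWall = dot(v, wall.normal); // negative when heading into the wall
    if (distanceAfterMove < radius && intoWall < 0) {
      // Cancel the into-wall component; the tangential part survives.
      v = add(v, scale(wall.normal, -intoWall));
    }
  }
  return add(center, v);
}
```

For example, a sphere of radius 0.5 touching a wall at x = 0 and moving with velocity (-1, 0, 1) ends up displaced by (0, 0, 1): the motion into the wall is dropped and the motion along it is kept. The check is slightly conservative, since it cancels the whole into-wall component even when part of the step could be taken before contact.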
Interactive Q&A
Take a screenshot of your view: Gemini Vision analyzes it and ElevenLabs voices the answer
Key Features
- Voice-controlled world generation with Deepgram streaming STT
- Photorealistic 3D gaussian splat rendering at 60fps using SparkJS
- Real-time pipeline updates via WebSocket with 6-API integration
- Sphere-based collision detection with smooth wall sliding
- Scene library system: checks local files → MongoDB → generates new
- Contextual voice Q&A using Gemini Vision and ElevenLabs TTS
- Rate limiting and admin bypass for production-ready deployment
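Rate limiting with an admin bypass can be sketched as a fixed-window counter per client. This is an illustration under assumptions, not flow's implementation: the write-up does not say which algorithm or storage it uses, and `createRateLimiter`, `RateLimitOptions`, and the admin-key check are hypothetical.

```typescript
interface RateLimitOptions {
  limit: number;       // max requests per window (assumed to be >= 1)
  windowMs: number;    // window length in milliseconds
  adminKeys: string[]; // keys that skip the limit entirely (hypothetical)
}

// Returns an `allow` function that decides whether a request may proceed.
// `now` is passed in so the limiter is easy to test without real clocks.
function createRateLimiter(options: RateLimitOptions) {
  const windows = new Map<string, { start: number; count: number }>();

  return function allow(clientId: string, now: number, adminKey?: string): boolean {
    // Admin bypass: a recognized key is never counted or blocked.
    if (adminKey !== undefined && options.adminKeys.indexOf(adminKey) !== -1) {
      return true;
    }
    const current = windows.get(clientId);
    // First request, or the previous window has expired: start a new window.
    if (current === undefined || now - current.start >= options.windowMs) {
      windows.set(clientId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= options.limit) {
      return false;
    }
    current.count += 1;
    return true;
  };
}
```

In a server, a middleware would call `allow(clientId, Date.now(), keyFromRequest)` and reject the request when it returns false. Given the roughly 5-minute generation cost per scene, a small limit per client is a plausible choice, though the actual values are not stated.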
Challenges Overcome
- ⚠ Deepgram WebSocket dying instantly → ✓ Explicitly declared linear16 PCM at 48kHz mono
- ⚠ Gemini model compatibility issues → ✓ Built backend proxy with fallback model chain
- ⚠ Marble API CORS blocked client calls → ✓ Created Express proxy for full async workflow
- ⚠ Collision detection needed refinement → ✓ Implemented multiple raycasts for smooth wall sliding
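A fallback model chain, as mentioned in the Gemini fix above, can be sketched as trying each model in order and returning the first success. The helper below is a hypothetical illustration: `callWithFallback` is not flow's code, the model names in any usage are placeholders, and the actual call to the model provider is injected rather than shown.

```typescript
// A function that performs one request against a named model.
type ModelCall<T> = (model: string) => Promise<T>;

// Try each model in order; return the first successful result together with
// the model that produced it. If every model fails, throw one combined error.
async function callWithFallback<T>(
  models: string[],
  call: ModelCall<T>
): Promise<{ model: string; result: T }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      const result = await call(model);
      return { model, result };
    } catch (err) {
      // Record the failure and move on to the next model in the chain.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${model}: ${message}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}
```

Running this behind a backend proxy keeps API keys off the client and lets the model order change without redeploying the front end.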
What's Next
- □ Improve collision mesh processing for more accurate interactions
- □ Multi-user collaborative exploration in shared 3D environments
- □ VR/AR support for fully immersive spatial learning
- □ AI tutoring guide that follows you through scenes
- □ Educator tools for creating custom learning experiences
- □ Community marketplace for user-generated worlds