flow

speak a concept, step inside it in 3d // spatial learning platform powered by gaussian splatting

Overview

flow converts voice commands into explorable 3D gaussian splat environments. say "show me ancient rome" and walk around inside it. after a ~5-minute generation pipeline chaining 6 APIs, you can first-person explore photorealistic spaces with educational overlays at 60fps. press 't' mid-exploration to ask questions about what you're seeing and get voice responses.

Built at SB Hacks XII (Jan 2026). Won President's Pick and Best Use of ElevenLabs.

How It Works

1. Voice Capture

Deepgram transcribes your voice command in real time using streaming speech-to-text (STT) with its Flux model.
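Deepgram decodes a raw audio stream correctly only when its encoding is declared. The challenges list below records that the fix was declaring linear16 PCM at 48 kHz mono. The browser's Web Audio API delivers Float32 samples, so each chunk has to be converted before it is sent. Here is a minimal sketch of that conversion. It assumes the capture is already mono at 48 kHz, and the function name is illustrative, not taken from the project:

```typescript
// Converts Web Audio Float32 samples (nominal range [-1, 1]) into 16-bit
// signed little-endian PCM, i.e. the "linear16" encoding.
// Assumption: samples are already mono at 48 kHz (e.g. an AudioContext
// created with { sampleRate: 48000 }, reading a single channel).
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative full scale is -32768, positive full scale is 32767.
    const value = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return buffer;
}
```

Each converted chunk would then be sent as a binary WebSocket frame on a connection that declared the format up front. Deepgram's `encoding` and `sample_rate` query parameters serve this purpose.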

2. Content Orchestration

Gemini orchestrates the educational content and generates a cinematic image of the requested scene via the 2.0-flash-exp-image-generation model.
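The challenges list mentions a backend proxy with a fallback model chain for Gemini compatibility issues. One way to structure that chain is to try each model in order and return the first success. The sketch below does this. `callModel` is a hypothetical wrapper around a single Gemini request, and the project's actual model list is not shown in the source:

```typescript
// Tries each model ID in order and returns the first successful result.
// `callModel` is hypothetical: it stands in for one Gemini request.
async function withModelFallback<T>(
  models: string[],
  callModel: (model: string) => Promise<T>,
): Promise<{ model: string; result: T }> {
  const failures: string[] = [];
  for (const model of models) {
    try {
      return { model, result: await callModel(model) };
    } catch (err) {
      // Remember why this model failed, then move on to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${model}: ${reason}`);
    }
  }
  throw new Error(`all models failed (${failures.join("; ")})`);
}
```

Returning which model succeeded lets the proxy log when it is running on a fallback rather than the preferred image-generation model.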

3. 3D Conversion

Marble API converts the generated image into a 3D gaussian splat environment
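The Marble call runs through an Express proxy as a full async workflow, as the CORS note under Challenges Overcome describes. This step therefore probably involves waiting on a long-running conversion. The sketch below assumes the conversion is exposed as a job that is started and then polled. `getStatus` and the `JobStatus` shape are hypothetical and are not Marble's real endpoints or fields:

```typescript
// Hypothetical job status shape; the real Marble API response is not shown.
type JobStatus<T> =
  | { state: "pending" }
  | { state: "done"; result: T }
  | { state: "failed"; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job finishes, fails, or the timeout elapses.
async function pollUntilDone<T>(
  getStatus: () => Promise<JobStatus<T>>,
  { intervalMs = 5000, timeoutMs = 10 * 60 * 1000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status.state === "done") return status.result;
    if (status.state === "failed") throw new Error(`generation failed: ${status.error}`);
    await sleep(intervalMs); // still pending: wait before asking again
  }
  throw new Error(`timed out after ${timeoutMs} ms`);
}
```

The 10-minute default timeout is an arbitrary margin over the roughly 5-minute pipeline. Each pending poll is also a natural point to push a progress message over the WebSocket.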

4. Real-time Rendering

SparkJS renders the .spz file at 60fps with collision detection for immersive exploration
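The feature list describes the collision detection as sphere-based. A common way to implement that is to treat the first-person camera as a sphere and push it back out of any wall it overlaps on each frame. The sketch below shows this approach as an assumption, not the project's code. Walls are simplified to infinite planes, and the names are hypothetical:

```typescript
type Vec3 = { x: number; y: number; z: number };
// Simplification: each wall is an infinite plane (a point on it plus a unit
// normal facing the walkable side); a real scene would use a collision mesh.
type Wall = { point: Vec3; normal: Vec3 };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// Pushes a camera sphere of the given radius out of every wall it overlaps.
function resolveSphere(center: Vec3, radius: number, walls: Wall[]): Vec3 {
  let c: Vec3 = center;
  for (const { point, normal } of walls) {
    // Signed distance from the sphere's center to the wall plane.
    const dist = dot(sub(c, point), normal);
    if (dist < radius) {
      // Overlapping by (radius - dist): move the center back out along the normal.
      const push = radius - dist;
      c = { x: c.x + normal.x * push, y: c.y + normal.y * push, z: c.z + normal.z * push };
    }
  }
  return c;
}
```

With infinite planes, anything behind a wall counts as overlapping it. This sketch therefore only suits simple box-like test geometry.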

5. Interactive Q&A

Your current view is captured as a screenshot, Gemini Vision analyzes it, and ElevenLabs narrates the answer aloud.
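Before Gemini Vision can analyze a screenshot, the canvas capture has to be turned into an inline image part of a request. This sketch assumes the capture comes from `canvas.toDataURL` and builds a body in the shape of Gemini's REST `generateContent` request, which uses `inline_data` with `mime_type` and `data`. The helper names are hypothetical, and the project's actual prompt is not shown:

```typescript
type InlinePart = { inline_data: { mime_type: string; data: string } };
type TextPart = { text: string };

// Splits a base64 data URL (what canvas.toDataURL returns) into MIME type and payload.
function dataUrlToInlinePart(dataUrl: string): InlinePart {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (match === null) throw new Error("expected a base64 data URL");
  return { inline_data: { mime_type: match[1], data: match[2] } };
}

// Pairs the screenshot with the user's question in one request body.
function buildVisionRequest(
  dataUrl: string,
  question: string,
): { contents: { parts: (InlinePart | TextPart)[] }[] } {
  return { contents: [{ parts: [dataUrlToInlinePart(dataUrl), { text: question }] }] };
}
```

With a WebGL canvas, `toDataURL` can return a blank image unless the context was created with `preserveDrawingBuffer: true` or the capture happens in the same frame as a render.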

Key Features

  • Voice-controlled world generation with Deepgram streaming STT
  • Photorealistic 3D gaussian splat rendering at 60fps using SparkJS
  • Real-time pipeline progress updates via WebSocket across all 6 chained APIs
  • Sphere-based collision detection with smooth wall sliding
  • Scene library system: checks local files → MongoDB → generates a new scene only if neither has it
  • Contextual voice Q&A using Gemini Vision and ElevenLabs TTS
  • Rate limiting and admin bypass for production-ready deployment
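The scene library checks progressively more expensive sources and generates a new scene only when both caches miss. The sketch below expresses that lookup order with each tier injected as a function. The file layout, MongoDB schema, and generation call are not described in the source, so `SceneSources` and its members are hypothetical. Normalizing the spoken topic into a cache key is also an assumption:

```typescript
// Hypothetical tiers; each returns a scene location (path or URL) or null on a miss.
interface SceneSources {
  findLocal: (topic: string) => Promise<string | null>;
  findInDb: (topic: string) => Promise<string | null>;
  generate: (topic: string) => Promise<string>; // the full multi-minute pipeline
}

type SceneHit = { source: "local" | "db" | "generated"; location: string };

// Assumption: "Ancient  Rome " and "ancient rome" should map to the same scene.
function normalizeTopic(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

async function getScene(prompt: string, src: SceneSources): Promise<SceneHit> {
  const topic = normalizeTopic(prompt);
  const local = await src.findLocal(topic);
  if (local !== null) return { source: "local", location: local };
  const stored = await src.findInDb(topic);
  if (stored !== null) return { source: "db", location: stored };
  // Missed both caches: fall back to the expensive generation pipeline.
  return { source: "generated", location: await src.generate(topic) };
}
```

Checking local files first keeps demo scenes instant even without a database connection. The generate tier is where rate limiting would matter most.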

Challenges Overcome

  • Deepgram WebSocket dying instantly
    Explicitly declared linear16 PCM at 48kHz mono
  • Gemini model compatibility issues
    Built backend proxy with fallback model chain
  • Marble API CORS blocked client calls
    Created Express proxy for full async workflow
  • Collision detection needed refinement
    Implemented multiple raycasts for smooth wall sliding
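The collision note credits smooth wall sliding to multiple raycasts. Here is a minimal sketch of that idea. It assumes a hypothetical `castRay` function supplied by the engine, since SparkJS's actual collision API is not described here. It also uses a fan of three rays at 0° and ±30°, which is an arbitrary choice:

```typescript
type Vec3 = { x: number; y: number; z: number };
type Hit = { distance: number; normal: Vec3 };
// Hypothetical raycast against the collision geometry: nearest hit within
// maxDist (with a unit surface normal), or null if nothing is hit.
type CastRay = (origin: Vec3, dir: Vec3, maxDist: number) => Hit | null;

const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// Rotates a direction around the vertical (y) axis by `angle` radians.
function rotateY(v: Vec3, angle: number): Vec3 {
  const c = Math.cos(angle);
  const s = Math.sin(angle);
  return { x: v.x * c + v.z * s, y: v.y, z: -v.x * s + v.z * c };
}

// Casts rays straight ahead and at +/-30 degrees along the intended move.
// Each ray that hits within (move length + radius) removes the part of the
// move pointing into that wall, so the player slides along it instead of stopping.
function slideMove(pos: Vec3, move: Vec3, radius: number, castRay: CastRay): Vec3 {
  const len = Math.hypot(move.x, move.y, move.z);
  if (len === 0) return pos;
  const dir = { x: move.x / len, y: move.y / len, z: move.z / len };
  let m = { ...move };
  for (const angle of [0, Math.PI / 6, -Math.PI / 6]) {
    const hit = castRay(pos, rotateY(dir, angle), len + radius);
    if (hit === null) continue;
    const into = dot(m, hit.normal);
    if (into < 0) {
      // Subtract only the component heading into the wall; keep the tangential part.
      m = { x: m.x - into * hit.normal.x, y: m.y - into * hit.normal.y, z: m.z - into * hit.normal.z };
    }
  }
  return { x: pos.x + m.x, y: pos.y + m.y, z: pos.z + m.z };
}
```

A single forward ray can miss a wall that is approached at a glancing angle. The angled rays help catch those cases before the camera clips into a corner.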

What's Next

  • Improve collision mesh processing for more accurate interactions
  • Multi-user collaborative exploration in shared 3D environments
  • VR/AR support for fully immersive spatial learning
  • AI tutoring guide that follows you through scenes
  • Educator tools for creating custom learning experiences
  • Community marketplace for user-generated worlds