Using StreamMultiDiffusion Online
Real-Time Region-Based Text-to-Image Generation
Real-time image generation from regional text prompts, about 10x faster than rival methods and reaching 1.57 FPS on an RTX 2080 Ti GPU.
Semantic Palette
A new interactive image generation paradigm, the semantic color palette: users generate high-quality images in real time by hand-drawing multiple regions, each carrying a predefined semantic meaning such as "eagle" or "girl".
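As a rough illustration of the palette idea, the sketch below binds each brush color to one text prompt and splits a painted canvas into one binary mask per prompt. The names `canvas_to_region_masks`, `Palette`, and `Canvas`, the color strings, and the prompts are all hypothetical and not taken from the StreamMultiDiffusion code; a real canvas would be an image rather than a grid of color names.

```python
from typing import Dict, List

# Hypothetical types: a palette maps a brush color to its text prompt,
# and a canvas is a grid whose cells hold the brush color painted there.
Palette = Dict[str, str]
Canvas = List[List[str]]


def canvas_to_region_masks(canvas: Canvas, palette: Palette) -> Dict[str, List[List[int]]]:
    """Split a painted canvas into one binary mask per prompt.

    A cell is 1 in a prompt's mask when it was painted with that prompt's
    color; unpainted cells or colors missing from the palette are ignored.
    """
    height, width = len(canvas), len(canvas[0])
    masks = {prompt: [[0] * width for _ in range(height)] for prompt in palette.values()}
    for y, row in enumerate(canvas):
        for x, color in enumerate(row):
            prompt = palette.get(color)
            if prompt is not None:
                masks[prompt][y][x] = 1
    return masks


# Usage: two semantic brushes painted on a tiny 2x3 canvas ("." = unpainted).
palette = {"red": "an eagle", "blue": "a girl"}
canvas = [
    ["red", "red", "."],
    [".", "blue", "blue"],
]
masks = canvas_to_region_masks(canvas, palette)
```

Each resulting mask, paired with its prompt, is the kind of per-region input a region-based diffusion pipeline consumes.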
Acceleration and Stability
The framework stabilizes MultiDiffusion with three techniques (latent pre-averaging, mask-centering bootstrapping, and quantized masks), making it compatible with fast inference methods such as LCM LoRA and thereby enabling fast, controllable text-to-image synthesis.
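The core MultiDiffusion operation that these techniques build on fuses the per-prompt latents at each step by mask-weighted averaging: at every position p the result is sum_i m_i(p) x_i(p) / sum_i m_i(p). The minimal sketch below shows only that averaging on toy 2D grids. It does not implement mask-centering bootstrapping or quantized masks. Our reading, which should be treated as an assumption, is that latent pre-averaging applies this same average to the denoised estimates before a few-step scheduler such as LCM re-injects noise. The function name `masked_average` is hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def masked_average(latents: Sequence[Grid], masks: Sequence[Grid], eps: float = 1e-8) -> Grid:
    """Fuse per-prompt latents into one latent by mask-weighted averaging.

    At every position p: sum_i m_i(p) * x_i(p) / sum_i m_i(p).
    Positions covered by no mask are left at 0.0; a background prompt
    whose mask covers the whole canvas avoids such gaps.
    """
    height, width = len(latents[0]), len(latents[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weight = sum(m[y][x] for m in masks)
            if weight > eps:
                total = sum(m[y][x] * lat[y][x] for m, lat in zip(masks, latents))
                fused[y][x] = total / weight
    return fused


# Usage: a background latent covering everything plus an "eagle" region
# that only covers the top-left cell.
background_latent = [[1.0, 1.0], [1.0, 1.0]]
eagle_latent = [[5.0, 5.0], [5.0, 5.0]]
background_mask = [[1.0, 1.0], [1.0, 1.0]]
eagle_mask = [[1.0, 0.0], [0.0, 0.0]]
fused = masked_average([background_latent, eagle_latent], [background_mask, eagle_mask])
```

Where the eagle mask is active, the two latents are averaged; elsewhere the background latent passes through unchanged. Because the masks are plain weights, soft or quantized masks could be substituted without changing the averaging itself.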