Using StreamMultiDiffusion Online

Real-Time Region-Based Text-to-Image Generation

StreamMultiDiffusion generates images in real time from regional text prompts. It runs 10x faster than existing solutions, reaching 1.57 FPS on an RTX 2080 Ti GPU.

Semantic Palette

A new interactive image generation paradigm: the semantic color palette. Users generate high-quality images in real time by hand-drawing multiple regions, each carrying a predefined semantic meaning such as "eagle" or "girl".
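To make the palette idea concrete, here is a minimal sketch of how a painted canvas could be turned into one binary mask per prompt. It is an illustration only: `PaletteEntry`, `masks_from_canvas`, and the tiny 2x3 canvas are hypothetical names and data invented for this example, not part of StreamMultiDiffusion's actual interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


@dataclass(frozen=True)
class PaletteEntry:
    """Hypothetical palette entry: a brush color tied to a text prompt."""
    color: Color
    prompt: str


def masks_from_canvas(
    canvas: List[List[Color]], palette: List[PaletteEntry]
) -> Dict[str, List[List[int]]]:
    """Return one binary mask per prompt: 1 where the pixel has that entry's color."""
    return {
        entry.prompt: [
            [1 if pixel == entry.color else 0 for pixel in row]
            for row in canvas
        ]
        for entry in palette
    }


# Assumed example data: two brushes and a 2x3 hand-drawn canvas.
RED, BLUE, BLANK = (255, 0, 0), (0, 0, 255), (255, 255, 255)
palette = [PaletteEntry(RED, "eagle"), PaletteEntry(BLUE, "girl")]
canvas = [
    [RED, RED, BLANK],
    [BLANK, BLUE, BLUE],
]
masks = masks_from_canvas(canvas, palette)
```

Each resulting mask marks where its prompt should apply; unpainted pixels belong to no region and would fall back to whatever background prompt the application uses.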

Acceleration and Stability

The framework stabilizes MultiDiffusion with three techniques: latent pre-averaging, mask-centering bootstrapping, and quantized masks. Together they make MultiDiffusion compatible with fast inference techniques such as LCM LoRA, enabling fast, controllable text-to-image synthesis.
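As a rough illustration of two of the ideas named above, the sketch below thresholds a soft mask at several levels (one simple reading of "quantized masks") and blends per-prompt latents by a mask-weighted average, which is the general aggregation idea behind MultiDiffusion. This is not the framework's implementation: the function names, the threshold values, and the single-channel toy "latents" are all assumptions, and latent pre-averaging and mask-centering bootstrapping are not reproduced here.

```python
from typing import List

Grid = List[List[float]]


def quantize_mask(soft_mask: Grid, threshold: float) -> Grid:
    """Binarize a soft mask in [0, 1]: 1.0 where the value reaches the threshold."""
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in soft_mask]


def blend_latents(latents: List[Grid], masks: List[Grid], eps: float = 1e-8) -> Grid:
    """Mask-weighted average of per-prompt latents at every position.

    Positions covered by no mask are set to 0.0.
    """
    height, width = len(latents[0]), len(latents[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weight = sum(m[y][x] for m in masks)
            total = sum(m[y][x] * l[y][x] for l, m in zip(latents, masks))
            out[y][x] = total / weight if weight > eps else 0.0
    return out


# Assumed toy data: a 1x3 soft foreground mask and three illustrative thresholds.
soft_mask = [[0.9, 0.6, 0.2]]
thresholds = [0.8, 0.5, 0.1]  # hypothetical per-step levels
step_masks = [quantize_mask(soft_mask, t) for t in thresholds]

foreground = [[2.0, 2.0, 2.0]]       # toy latent for the foreground prompt
background = [[0.0, 0.0, 0.0]]       # toy latent for the background prompt
background_mask = [[1.0, 1.0, 1.0]]  # background applies everywhere

blended = blend_latents([foreground, background], [step_masks[1], background_mask])
```

In this toy setup, lowering the threshold grows the foreground region from one pixel to all three, and the blend averages foreground and background wherever both masks are active.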
