A realistic fantasy portrait of a woman with honey-blonde hair and emerald eyes, looking upward with a tear on her cheek, wearing silver earrings and a necklace.
Four Neo-Victorian heroines in a sunlit attic conservatory with swirling hair and magical sparks, set against a cityscape with airships at golden hour.

Recommended Parameters

  • steps: 10 - 20

  • resolution: 1024x1024

Tips

Use the 3.6 billion parameter version of Stage C for best results as the main finetuning was done on it.

Use the 1.5 billion parameter variant for Stage B to excel at reconstructing small and fine details.

The model is well suited to efficient training and inference because of its smaller latent space, and it supports extensions such as finetuning, LoRA, ControlNet, IP-Adapter, and LCM.

The model is intended for research purposes only and should not be used to generate factual representations or violate Stability AI's Acceptable Use Policy.

Faces and people may not be generated properly, and the model's autoencoding is lossy.


Stable Cascade

This model is built upon the Würstchen architecture. Its main difference from other models like Stable Diffusion is that it works in a much smaller latent space. Why is this important? The smaller the latent space, the faster you can run inference and the cheaper training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, so a 1024x1024 image is encoded to 128x128. Stable Cascade achieves a compression factor of 42, meaning that a 1024x1024 image can be encoded to 24x24 while maintaining crisp reconstructions. The text-conditional model is then trained in the highly compressed latent space. Previous versions of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5.

This kind of model is therefore well suited for uses where efficiency is important. Furthermore, all known extensions such as finetuning, LoRA, ControlNet, IP-Adapter and LCM are possible with this method as well.
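The arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch, not code from the Stable Cascade repository: the helper name `latent_side` is hypothetical, and rounding down to a whole number of latent positions is an assumption made for illustration.

```python
def latent_side(image_side: int, compression_factor: int) -> int:
    """Side length of the latent grid for a square image.

    Rounding down is an assumption made for this illustration.
    """
    return image_side // compression_factor


sd_side = latent_side(1024, 8)        # Stable Diffusion: compression factor 8
cascade_side = latent_side(1024, 42)  # Stable Cascade: compression factor 42

print(sd_side, cascade_side)  # 128 24

# Ratio of spatial latent positions between the two grids.
print(round(sd_side**2 / cascade_side**2, 1))  # 28.4
```

Under these assumptions, the 24x24 grid has roughly 28 times fewer spatial positions than the 128x128 grid. This counts positions only and says nothing about channel counts or actual compute cost.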

Model Details

Model Description

Stable Cascade is a diffusion model trained to generate images given a text prompt.

  • Developed by: Stability AI

  • Funded by: Stability AI

  • Model type: Generative text-to-image model

Model Sources

For research purposes, we recommend our StableCascade Github repository (https://github.com/Stability-AI/StableCascade).

Model Overview

Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade to generate images, hence the name "Stable Cascade".

Stage A & B are used to compress images, similar to the job of the VAE in Stable Diffusion. However, with this setup, a much higher compression of images can be achieved. While the Stable Diffusion models use a spatial compression factor of 8, encoding an image with a resolution of 1024 x 1024 to 128 x 128, Stable Cascade achieves a compression factor of 42. This encodes a 1024 x 1024 image to 24 x 24 while still being able to decode the image accurately. This comes with the great benefit of cheaper training and inference. Stage C, in turn, is responsible for generating the small 24 x 24 latents given a text prompt. The following picture shows this visually.

For this release, we are providing two checkpoints for Stage C, two for Stage B and one for Stage A. Stage C comes in a 1 billion and a 3.6 billion parameter version, but we highly recommend using the 3.6 billion version, as most work was put into its finetuning. The two versions for Stage B amount to 700 million and 1.5 billion parameters. Both achieve great results; however, the 1.5 billion parameter version excels at reconstructing small and fine details. Therefore, you will achieve the best results if you use the larger variant of each. Lastly, Stage A contains 20 million parameters and is fixed due to its small size.
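To make the checkpoint options concrete, the sketch below adds up the parameter counts quoted above for the two possible combinations. The `Checkpoint` class and the `LARGE`/`SMALL` names are hypothetical, introduced only for this illustration, and the counts are the rounded figures from the text rather than exact values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    stage: str
    params_millions: int  # rounded figure quoted in the text


# Recommended combination: the larger variant of Stage C and Stage B.
LARGE = [Checkpoint("C", 3600), Checkpoint("B", 1500), Checkpoint("A", 20)]
# Smaller combination: the smaller variant of Stage C and Stage B.
SMALL = [Checkpoint("C", 1000), Checkpoint("B", 700), Checkpoint("A", 20)]


def total_params_millions(checkpoints):
    """Sum the parameter counts (in millions) across all three stages."""
    return sum(c.params_millions for c in checkpoints)


print(total_params_millions(LARGE))  # 5120
print(total_params_millions(SMALL))  # 1720
```

With these rounded figures, the recommended combination totals about 5.1 billion parameters and the smaller one about 1.7 billion.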

Evaluation

According to our evaluation, Stable Cascade performs best in both prompt alignment and aesthetic quality in almost all comparisons. The above picture shows the results from a human evaluation using a mix of parti-prompts and aesthetic prompts. Specifically, Stable Cascade (30 inference steps) was compared against Playground v2 (50 inference steps), SDXL (50 inference steps), SDXL Turbo (1 inference step) and Würstchen v2 (30 inference steps).

Code Example

⚠️ Important: For the code below to work, you have to install diffusers from this branch while the PR is WIP.

pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

device = "cuda"
num_images_per_prompt = 2

# The prior pipeline (Stage C) turns the text prompt into compressed image embeddings.
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to(device)
# The decoder pipeline (Stages B and A) turns those embeddings into full-resolution images.
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.float16).to(device)

prompt = "Anthropomorphic cat dressed as a pilot"
negative_prompt = ""

prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20
)

# The prior runs in bfloat16 and the decoder in float16, so cast the embeddings with .half().
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images

# Now decoder_output is a list of PIL images
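As a follow-up, the list of PIL images in `decoder_output` can be written to disk. The helper below is a minimal sketch and not part of the original example: the function name `save_images`, the `outputs` directory and the `cascade` file stem are all hypothetical. It relies only on each image exposing a `save(path)` method, which PIL images do.

```python
from pathlib import Path


def save_images(images, out_dir="outputs", stem="cascade"):
    """Save each image as <out_dir>/<stem>_<index>.png and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(images):
        path = out_path / f"{stem}_{index}.png"
        image.save(path)  # PIL images accept a file path here
        paths.append(path)
    return paths


# Usage with the list returned by the decoder in the example above:
# saved = save_images(decoder_output)
```

With `num_images_per_prompt = 2`, this would write two PNG files, one per generated image.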

Uses

Direct Use

The model is intended for research purposes for now. Possible research areas and tasks include:

  • Research on generative models.

  • Safe deployment of models which have the potential to generate harmful content.

  • Probing and understanding the limitations and biases of generative models.

  • Generation of artworks and use in design and other artistic processes.

  • Applications in educational or creative tools.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

Limitations and Bias

Limitations

  • Faces and people in general may not be generated properly.

  • The autoencoding part of the model is lossy.

Recommendations

The model is intended for research purposes only.

How to Get Started with the Model

Check out https://github.com/Stability-AI/StableCascade


Model Details

  • Model type: Checkpoint

  • Base model: Stable Cascade

  • Model version: base

  • Model hash: 0d28c8562d

