SD XL - v1.0 VAE fix

7/1/2025

1:41:38 PM

Related Keywords & Tags

base model,checkpoint,latent diffusion model,official,sd xl,sdxl,sdxl 1.0,stability ai,stable diffusion xl,text-to-image generation,v1.0 vae fix

Aerial view of a long rectangular cabin situated on a green valley floor, surrounded by dense trees and towering mountains under a dark, rain-filled sky with heavy clouds.

Modern front-inclined square cabin in a forest at night, featuring a large glass wall with an indoor black sofa, plants, warm incandescent lighting, and outdoor porch seating.

Two-storey wooden greenhouse cabin elevated on columns on a sloped forest surface surrounded by pine trees, mist, and yellow grass.

Twin modern concrete cabins with large glass windows stacked perpendicularly on concrete columns, situated on a snow-covered rocky mountain slope with a mountainous backdrop.

Rectangular white-painted modern cabin with glass windows and indoor lighting, elevated above forest trees on two slanted concrete columns under a grey sky.

A forged carbon mask with glowing orange eyes surrounded by vibrant flames on a dark background.

Portrait of a young woman resembling Zelda with blonde hair, elf ears, blue eyes, wearing a golden tiara and deep purple medieval dress with ornate gold details.

A black cat with glowing orange eyes sits amidst intense flames inside an ancient temple with pillars, surrounded by fire and smoke.

A dark fantasy styled autumn occult altar featuring steaming coffee in a cup, lit candle with rising smoke, small pumpkins on plates, a teapot, and an open book with aged watercolor textures.

Watercolor painting of a large crashed sci-fi ship wreckage on a desert-like landscape with a stranded pilot standing nearby, created with bold lines, expressive colorful sketch style, and high-contrast lighting.

Watercolor painting depicting a flooded city street lined with intricate ruined buildings, featuring two figures in dynamic poses, illuminated by warm, high-contrast lighting.

Recommended Negative Prompts

(deformed iris, deformed pupils), text, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, (extra fingers), (mutated hands), poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, (fused fingers), (too many fingers), long neck, camera
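The parentheses in the prompt above look like the emphasis syntax used by AUTOMATIC1111-style web UIs, where each enclosing pair of parentheses multiplies a term's attention weight by 1.1. Whether this page's backend applies the same rule is an assumption. Under that assumption, a minimal sketch of how such a prompt breaks down into weighted terms follows; the function name `parse_weighted_prompt` is hypothetical, and explicit weights such as "(term:1.3)" are not handled.

```python
from typing import List, Tuple


def parse_weighted_prompt(prompt: str, emphasis: float = 1.1) -> List[Tuple[str, float]]:
    """Split a comma-separated prompt into (term, weight) pairs.

    Assumption: each enclosing pair of parentheses multiplies the weight
    by `emphasis`. Explicit weights and escaped parentheses are ignored.
    """
    terms: List[Tuple[str, float]] = []
    buffer: List[str] = []
    depth = 0  # current parenthesis nesting level

    def flush() -> None:
        # Emit the buffered term at the current nesting depth, if any.
        text = "".join(buffer).strip()
        buffer.clear()
        if text:
            terms.append((text, round(emphasis ** depth, 4)))

    for ch in prompt:
        if ch == "(":
            flush()
            depth += 1
        elif ch == ")":
            flush()
            depth = max(depth - 1, 0)
        elif ch == ",":
            flush()
        else:
            buffer.append(ch)
    flush()
    return terms


if __name__ == "__main__":
    sample = "(deformed iris, deformed pupils), text, (extra fingers)"
    for term, weight in parse_weighted_prompt(sample):
        print(f"{weight:.2f}  {term}")
```

Note that a parenthesized group can contain commas, so splitting on commas alone would separate "(deformed iris" from "deformed pupils)"; tracking nesting depth avoids that.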

Recommended Parameters

samplers: Euler

steps:

cfg:

resolution: 525x525

Tips

The model is intended for research purposes, including artwork generation, educational and creative tools, and the safe deployment of models that can generate harmful content.

It is not intended to generate factual or true depictions of people or events.

Limitations include imperfect photorealism, inability to render legible text, challenges with compositional prompts, and possible improper face generation.

The model uses two pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.

The two-step pipeline includes base latent generation followed by high-resolution refinement using SDEdit (img2img).

Creator Sponsors

Originally posted to Hugging Face and shared here with permission from Stability AI.

SDXL consists of a two-step pipeline for latent diffusion: First, we use a base model to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img") to the latents generated in the first step, using the same prompt.
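In SDEdit-style refinement, the second stage does not start from pure noise: the first-stage result is noised only part of the way, and denoising resumes from that intermediate point. Many img2img implementations expose this as a "strength" value between 0 and 1. The sketch below shows the step arithmetic under that convention; the function name `sdedit_schedule` and the `strength` parameter are illustrative assumptions, not part of the SDXL release described here.

```python
from typing import Tuple


def sdedit_schedule(num_inference_steps: int, strength: float) -> Tuple[int, int]:
    """Return (first_step_index, steps_to_run) for an img2img refinement pass.

    Assumption: strength = 1.0 means "start from pure noise and run every
    step", and strength = 0.0 means "add no noise and run no steps".
    """
    if num_inference_steps < 0:
        raise ValueError("num_inference_steps must be non-negative")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")

    # Only the last `steps_to_run` steps of the schedule are executed.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    first_step_index = num_inference_steps - steps_to_run
    return first_step_index, steps_to_run


if __name__ == "__main__":
    first, count = sdedit_schedule(num_inference_steps=40, strength=0.25)
    print(f"skip the first {first} steps, run the remaining {count}")
```

A low strength keeps the refiner close to the base model's composition and mostly adjusts fine detail; a high strength lets the second stage change more of the image.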

Model Description

Developed by: Stability AI
Model type: Diffusion-based text-to-image generative model
Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
Resources for more information: GitHub Repository (https://github.com/Stability-AI/generative-models).

Model Sources

Repository: https://github.com/Stability-AI/generative-models
Demo: https://clipdrop.co/stable-diffusion

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render legible text.
The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.
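The lossy autoencoding point can be made concrete with some arithmetic. The sketch below assumes the usual Stable Diffusion VAE layout, 8x spatial downsampling per side and 4 latent channels; those two constants are assumptions not stated on this page, and the function names are hypothetical.

```python
from typing import Tuple

DOWNSAMPLE = 8        # assumed spatial downsampling factor per side
LATENT_CHANNELS = 4   # assumed number of latent channels
RGB_CHANNELS = 3


def latent_shape(width: int, height: int) -> Tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a pixel size."""
    if width % DOWNSAMPLE or height % DOWNSAMPLE:
        raise ValueError(
            f"{width}x{height} is not a multiple of {DOWNSAMPLE} on both sides"
        )
    return LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE


def compression_ratio(width: int, height: int) -> float:
    """Ratio of pixel-space values to latent-space values."""
    channels, lat_h, lat_w = latent_shape(width, height)
    return (RGB_CHANNELS * width * height) / (channels * lat_h * lat_w)


if __name__ == "__main__":
    print(latent_shape(1024, 1024))
    print(compression_ratio(1024, 1024))
```

Under these assumptions the latent holds far fewer values than the image it decodes to, which is why fine detail cannot be reconstructed exactly. It also means a size that is not a multiple of 8 on each side, such as 525, does not map onto a whole latent grid.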

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.

The chart above evaluates user preference for SDXL (with and without refinement) over Stable Diffusion 1.5 and 2.1. The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance.

Contributor

Kate Thompson

I'm the gallery editor at Diffus and I write blogs on topics related to AI art. With expertise in Midjourney, DALL·E 3, and Stable Diffusion, I actively contribute to Reddit, Facebook, and Discord communities. I meticulously curate top AI-generated content, ensuring our gallery's excellence.
