SD XL - v1.0

Kate Thompson

7/8/2025

4:54:42 AM

Related Keywords & Tags

base model, checkpoint, sd xl, sdxl, sdxl 1.0, stability ai, stable diffusion xl, text-to-image generation, v1.0

Animated movie poster featuring a pretty young expressive woman in motion, set against a bright city metropolis background with sketch-like concept art style.

Closeup of a mischievous animated dog leaning on a wooden table with an open book, set in a cozy room with wooden furnishings and warm lighting.

Closeup portrait of a brave maiden explorer with long sweeping brown hair and bright blue eyes holding a friendly spotted leopard kitten, set against a lush green background with a waterfall.

A dramatic image of a crow flying with spread wings displaying fiery orange feathers against a moody sky background.

A robotic terminator covered in dice patterns stands on a glowing lava floor surrounded by scattered dice in a surrealistic hellish cave.

Close-up black and white image of female parted lips with teeth visible, overlaid by abstract interference patterns.

Black and white cityscape showing silhouettes of people walking through a foggy urban environment with tall buildings in the background.

Close-up view of a curious alien with large reflective eyes, detailed alien skin texture, standing among alien flora with mountains in the background under a wide angle lens and film grain effect.

Close-up view of a highly detailed alien face with large reflective eyes showing an alien landscape, captured by an interstellar probe with film grain effect.

A hyper realistic portrait of a sculptural young redhead woman with curly hair, outdoors in a dreamy panorama with a blurred barren landscape in the background.

Close-up hyper realistic image of a green eye surrounded by freckles, with red ginger hair and black painted lips.

Portrait of a woman with long red hair, freckles on white skin, light green eyes, black lips, and intricate detailing in a hyper-realistic style.

Recommended Parameters

resolution

525x525

Tips

The model is intended for research purposes, including artwork generation, educational or creative tools, and research on the safe deployment of models that can generate harmful content.

It is not intended to generate factual or true depictions of people or events.

Limitations include imperfect photorealism, inability to render legible text, difficulty with compositional prompts, and faces that may not be generated properly.

The model uses two pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.

The two-step pipeline includes base latent generation followed by high-resolution refinement using SDEdit (img2img).

Originally posted to Hugging Face and shared here with permission from Stability AI.

SDXL consists of a two-step pipeline for latent diffusion: First, we use a base model to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img") to the latents generated in the first step, using the same prompt.
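As a rough illustration of the second step, the sketch below shows how an SDEdit-style (img2img) pass can decide which denoising steps to run: the input latents are noised only part of the way, so the sampler runs just the tail of its schedule. The function name `sdedit_step_indices` and the `strength` parameter are assumptions made for this example, following a convention common in img2img implementations; they are not taken from the SDXL code.

```python
from typing import List


def sdedit_step_indices(num_inference_steps: int, strength: float) -> List[int]:
    """Indices of the sampler steps an SDEdit-style pass would run.

    strength = 1.0 noises the input completely and runs every step;
    strength = 0.0 adds no noise and runs no steps at all.
    """
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")

    # Number of denoising steps that are actually executed.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    # Skip the earliest (noisiest) part of the schedule.
    first_step = num_inference_steps - steps_to_run
    return list(range(first_step, num_inference_steps))


if __name__ == "__main__":
    # Hypothetical settings: a 40-step schedule refined at strength 0.25.
    steps = sdedit_step_indices(40, 0.25)
    print(len(steps), steps[0], steps[-1])  # prints "10 30 39"
```

A low strength keeps the refinement close to the base output, since only the last few steps are re-run; a strength of 1.0 would discard the base latents entirely.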

Model Description

Developed by: Stability AI
Model type: Diffusion-based text-to-image generative model
Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
Resources for more information: see the GitHub repository listed under Model Sources.
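One way to picture how two text encoders can feed a single model is to join their per-token outputs along the feature axis. The sketch below does this with plain Python lists. The feature widths (768 for CLIP-ViT/L, 1280 for OpenCLIP-ViT/G), the 77-token context length, and the helper name `concat_token_embeddings` are assumptions for illustration and are not stated on this page.

```python
from typing import List

CLIP_VIT_L_DIM = 768       # assumption: feature width of CLIP-ViT/L
OPENCLIP_VIT_G_DIM = 1280  # assumption: feature width of OpenCLIP-ViT/G
MAX_TOKENS = 77            # assumption: CLIP-style context length

Embedding = List[List[float]]  # one feature vector per token


def concat_token_embeddings(emb_l: Embedding, emb_g: Embedding) -> Embedding:
    """Join two encoders' outputs token by token along the feature axis."""
    if len(emb_l) != len(emb_g):
        raise ValueError("both encoders must produce the same number of tokens")
    return [vec_l + vec_g for vec_l, vec_g in zip(emb_l, emb_g)]


if __name__ == "__main__":
    # Placeholder embeddings standing in for real encoder outputs.
    emb_l = [[0.0] * CLIP_VIT_L_DIM for _ in range(MAX_TOKENS)]
    emb_g = [[1.0] * OPENCLIP_VIT_G_DIM for _ in range(MAX_TOKENS)]
    context = concat_token_embeddings(emb_l, emb_g)
    print(len(context), len(context[0]))  # prints "77 2048"
```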

Model Sources

Repository: https://github.com/Stability-AI/generative-models
Demo: https://clipdrop.co/stable-diffusion

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render legible text.
The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.
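To give a feel for why the autoencoder is lossy, the following sketch compares the number of values in an RGB image with the number of values in its latent representation. The downsampling factor of 8 and the 4 latent channels are assumptions typical of Stable Diffusion-style autoencoders, not figures given on this page, and the function names are hypothetical.

```python
from typing import Tuple


def latent_shape(height: int, width: int,
                 downsample: int = 8, latent_channels: int = 4) -> Tuple[int, int, int]:
    """Shape (channels, height, width) of the latent for a given image size."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height: int, width: int, image_channels: int = 3,
                      downsample: int = 8, latent_channels: int = 4) -> float:
    """How many image values are represented by each latent value."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (height * width * image_channels) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(1024, 1024))       # prints "(4, 128, 128)"
    print(compression_ratio(1024, 1024))  # prints "48.0"
```

Under these assumptions each latent value stands in for 48 image values, so fine detail cannot always be reconstructed exactly when the latents are decoded.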

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.

A user preference evaluation compared SDXL (with and without refinement) against Stable Diffusion 1.5 and 2.1. The SDXL base model performs significantly better than the previous variants, and the base model combined with the refinement module achieves the best overall performance.

Contributor

Kate Thompson

I'm the gallery editor at Diffus, and I write blog posts on topics related to AI art. With expertise in Midjourney, DALL-E 3, and Stable Diffusion, I actively contribute to Reddit, Facebook, and Discord communities. I meticulously curate top AI-generated content to keep our gallery's quality high.
