Aerial view of a long rectangular cabin situated on a green valley floor, surrounded by dense trees and towering mountains under a dark, rain-filled sky with heavy clouds.
Modern front-inclined square cabin in a forest at night, featuring a large glass wall with an indoor black sofa, plants, warm incandescent lighting, and outdoor porch seating.
Two-storey wooden greenhouse cabin elevated on columns on a sloped forest surface surrounded by pine trees, mist, and yellow grass.
Twin modern concrete cabins with large glass windows stacked perpendicularly on concrete columns, situated on a snow-covered rocky mountain slope with a mountainous backdrop.
Rectangular white-painted modern cabin with glass windows and indoor lighting, elevated above forest trees on two slanted concrete columns under a grey sky.
A forged carbon mask with glowing orange eyes surrounded by vibrant flames on a dark background.
Profile of a floating woman with a detailed face, her skin and hair flowing with vivid, swirling colorful paint strokes against a dark background.
Portrait of a young woman resembling Zelda with blonde hair, elf ears, blue eyes, wearing a golden tiara and deep purple medieval dress with ornate gold details.
A black cat with glowing orange eyes sits amidst intense flames inside an ancient temple with pillars, surrounded by fire and smoke.
A dark fantasy styled autumn occult altar featuring steaming coffee in a cup, lit candle with rising smoke, small pumpkins on plates, a teapot, and an open book with aged watercolor textures.
Watercolor painting of a large crashed sci-fi ship wreckage on a desert-like landscape with a stranded pilot standing nearby, created with bold lines, expressive colorful sketch style, and high-contrast lighting.
Watercolor painting depicting a flooded city street lined with intricate ruined buildings, featuring two figures in dynamic poses, illuminated by warm, high-contrast lighting.

Recommended Negative Prompts

(deformed iris, deformed pupils), text, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, (extra fingers), (mutated hands), poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, (fused fingers), (too many fingers), long neck, camera
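The negative prompt above wraps some terms in parentheses for emphasis. This syntax is read by the front end, not by the SDXL checkpoint. The minimal sketch below assumes AUTOMATIC1111-style emphasis, where each enclosing pair of parentheses multiplies a term's attention weight by 1.1, and splits the list into (term, weight) pairs. `parse_terms` is a hypothetical helper. Explicit weights such as `(term:1.3)` are not handled.

```python
def parse_terms(prompt: str, factor: float = 1.1) -> list[tuple[str, float]]:
    """Split a comma-separated prompt into (term, weight) pairs.

    Assumes A1111-style emphasis: every enclosing '(' ... ')' multiplies the
    term's weight by `factor`. Commas inside parentheses still separate terms.
    """
    terms: list[tuple[str, float]] = []
    depth = 0             # current parenthesis nesting level
    buf: list[str] = []   # characters of the term being read
    term_depth = 0        # nesting level at the term's first visible character

    def flush() -> None:
        text = "".join(buf).strip()
        if text:
            terms.append((text, round(factor ** term_depth, 4)))
        buf.clear()

    for ch in prompt:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif ch == ",":
            flush()
        else:
            if not buf and not ch.isspace():
                term_depth = depth  # first visible character fixes the level
            if buf or not ch.isspace():
                buf.append(ch)      # skip leading spaces, keep inner ones
    flush()
    return terms


if __name__ == "__main__":
    negative = "(deformed iris, deformed pupils), text, worst quality, (extra fingers)"
    for term, weight in parse_terms(negative):
        print(f"{weight:.2f}  {term}")
```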

Recommended Parameters

  • Sampler: Euler

  • Steps: 50

  • CFG scale: 8

  • Resolution: 525x525
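A small config makes these settings concrete. Two points below are assumptions, not statements from this page. First, Stable Diffusion pipelines (diffusers' SDXL pipeline among them) typically expect width and height to be multiples of 8, because the autoencoder downsamples by 8. So 525x525 is snapped here to the nearest multiple, 528x528. Second, `cfg` is treated as the classifier-free guidance scale w in `uncond + w * (cond - uncond)`. In common front ends, the negative prompt supplies the `uncond` branch. `GenerationParams`, `snap_to_multiple`, and `apply_cfg` are hypothetical names.

```python
from dataclasses import dataclass, replace


def snap_to_multiple(value: int, multiple: int = 8) -> int:
    """Round value to the nearest multiple (ties to even, per Python's round)."""
    return max(multiple, round(value / multiple) * multiple)


def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance, element-wise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


@dataclass(frozen=True)
class GenerationParams:
    # Defaults copied from the recommended parameters above.
    sampler: str = "Euler"
    steps: int = 50
    cfg: float = 8.0
    width: int = 525
    height: int = 525

    def snapped(self) -> "GenerationParams":
        """Return a copy whose width/height are multiples of 8 (assumed VAE factor)."""
        return replace(self, width=snap_to_multiple(self.width),
                       height=snap_to_multiple(self.height))


if __name__ == "__main__":
    params = GenerationParams().snapped()
    print(params)
    # Toy two-element "noise predictions": guidance amplifies the difference
    # between the conditional and unconditional (negative-prompt) branches.
    print(apply_cfg([0.0, 1.0], [1.0, 1.0], params.cfg))
```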

Tips

The model is intended for research purposes, including artwork generation, educational and creative tools, and the safe deployment of models that can generate harmful content.

It is not intended to generate factual or true depictions of people or events.

Limitations include imperfect photorealism, an inability to render legible text, difficulty with compositional prompts, and faces that may not be generated properly.

The model uses two pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.

The two-step pipeline includes base latent generation followed by high-resolution refinement using SDEdit (img2img).

Originally posted to Hugging Face and shared here with permission from Stability AI.

SDXL consists of a two-step pipeline for latent diffusion: First, we use a base model to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img") to the latents generated in the first step, using the same prompt.
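The SDEdit step can be sketched in two parts. First, the base latents are pushed partway back toward noise with the forward process `sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps`. Second, only the remaining denoising steps are run. The step arithmetic below follows the `strength` convention used by diffusers' img2img pipelines, and the noise schedule is the toy linear DDPM schedule (beta from 1e-4 to 0.02 over 1000 steps), not SDXL's actual schedule. `img2img_steps`, `alpha_bar`, and `noise_latent` are hypothetical helpers, and the latents are plain lists rather than tensors.

```python
import math
import random


def img2img_steps(num_inference_steps: int, strength: float) -> range:
    """Indices of the sampling steps the second (refinement) stage actually runs.

    strength=1.0 starts from pure noise (all steps); strength=0.3 runs only the
    last 30% of steps, so most of the base latents' structure is preserved.
    """
    init = min(int(num_inference_steps * strength), num_inference_steps)
    start = max(num_inference_steps - init, 0)
    return range(start, num_inference_steps)


def alpha_bar(t: int, num_train_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_i) for i <= t, toy linear beta schedule."""
    prod = 1.0
    for i in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_latent(x0: list[float], t: int, rng: random.Random) -> list[float]:
    """Sample from q(x_t | x_0) = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    steps = img2img_steps(num_inference_steps=50, strength=0.3)
    print(f"refinement runs steps {steps.start}..{steps.stop - 1} ({len(steps)} of 50)")
    base_latents = [0.5, -1.2, 0.3]  # stand-in for the base model's output
    print(noise_latent(base_latents, t=300, rng=rng))  # toy intermediate timestep
```

A lower `strength` keeps more of the base image and lets the refiner mostly add high-frequency detail. A higher value gives it more freedom to change the composition.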

Model Description

  • Developed by: Stability AI

  • Model type: Diffusion-based text-to-image generative model

  • Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).

  • Resources for more information: GitHub Repository.
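The SDXL paper describes combining the two text encoders by concatenating their penultimate-layer outputs along the channel axis. It also adds a pooled OpenCLIP embedding, which is not shown here. The sketch below shows only the shape bookkeeping. The widths (768 for CLIP-ViT/L, 1280 for OpenCLIP-ViT/G) and the 77-token context are assumptions taken from the public model configuration, not from this page. The helper names are hypothetical.

```python
SEQ_LEN = 77           # assumed CLIP context length (tokens)
CLIP_L_DIM = 768       # assumed hidden width of CLIP-ViT/L's text encoder
OPENCLIP_G_DIM = 1280  # assumed hidden width of OpenCLIP-ViT/G's text encoder


def dummy_hidden_states(dim: int, fill: float, seq_len: int = SEQ_LEN) -> list[list[float]]:
    """Stand-in for one encoder's per-token outputs, shape [seq_len][dim]."""
    return [[fill] * dim for _ in range(seq_len)]


def concat_channels(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Concatenate two [seq_len][dim] embeddings along the channel (last) axis."""
    if len(a) != len(b):
        raise ValueError("both encoders must produce the same number of tokens")
    return [row_a + row_b for row_a, row_b in zip(a, b)]


if __name__ == "__main__":
    context = concat_channels(dummy_hidden_states(CLIP_L_DIM, 0.0),
                              dummy_hidden_states(OPENCLIP_G_DIM, 1.0))
    print(len(context), len(context[0]))  # tokens, channels per token
```

Under these assumptions, the UNet's cross-attention sees 77 tokens of 2048 channels each.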

Model Sources

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

  • Generation of artworks and use in design and other artistic processes.

  • Applications in educational or creative tools.

  • Research on generative models.

  • Safe deployment of models which have the potential to generate harmful content.

  • Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Limitations and Bias

Limitations

  • The model does not achieve perfect photorealism.

  • The model cannot render legible text.

  • The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.

  • Faces and people in general may not be generated properly.

  • The autoencoding part of the model is lossy.
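One way to see why the autoencoder is lossy is to count values. The sketch assumes the SDXL autoencoder downsamples by 8 in each spatial dimension and produces 4 latent channels, which matches the public configuration but is not stated on this page. Under that assumption, a 1024x1024 RGB image (3,145,728 values) maps to a 4x128x128 latent (65,536 values), so some fine detail cannot be recovered exactly on decoding. `latent_shape` and `compression_ratio` are hypothetical helpers.

```python
def latent_shape(width: int, height: int, factor: int = 8,
                 channels: int = 4) -> tuple[int, int, int]:
    """(channels, height, width) of the latent for an RGB image, assuming an
    autoencoder that downsamples by `factor` and outputs `channels` channels."""
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not divisible by {factor}")
    return (channels, height // factor, width // factor)


def compression_ratio(width: int, height: int, factor: int = 8,
                      channels: int = 4) -> float:
    """Number of pixel values (3 per RGB pixel) per latent value."""
    c, h, w = latent_shape(width, height, factor, channels)
    return (3 * width * height) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(1024, 1024))       # (4, 128, 128)
    print(compression_ratio(1024, 1024))  # 48.0
```

With these defaults the ratio is 48 for any size divisible by 8, since it reduces to 3 * 8 * 8 / 4.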

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.

The chart above evaluates user preference for SDXL (with and without refinement) over Stable Diffusion 1.5 and 2.1. The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance.


Model Details

  • Model type: Checkpoint

  • Base model: SDXL 1.0

  • Model version: v1.0 VAE fix

  • Model hash: e6bb9ea85b
