Grumpy white duck with an orange beak standing in front of a height chart under dramatic spotlight, holding a black mugshot nameplate.
A grumpy anthropomorphic broccoli character standing in a rainy meadow under a heavy rain cloud with volumetric lighting and wet shiny surfaces.
A girl with flowing multicolored hair and blue eyes wearing a black lace dress and a golden crown, surrounded by vibrant blooming flowers indoors with volumetric lighting.
A cyborg geisha demon with a glowing skeletal face, crouching on one knee in bloody red armor and a golden cape, surrounded by skulls in a mysterious castle garden.
Illustration of an angel with grey hair and one wing leaning over an open book, featuring a dripping red halo above their head and a menacing aura in manga style.
Close-up portrait of a gaunt girl with wild messy hair covering dark eyes, a wide sinister smile with bloody mouth, spiked collar, and visible skeletal chest details in lineart style.
Closeup portrait of an anime girl with short brown hair and freckles, wearing a green dress and fairy wings, surrounded by a detailed, softly lit nighttime background with light rays and particles.
A petite girl with short blonde hair and circle glasses, wearing a yellow hoodie and striped socks, is sitting on the floor surrounded by pillows and ferns, reading a green book in a cozy bedroom with rustic windows and a bookshelf.
Vibrant impressionist oil painting of a blue and an orange wolf silhouetted against contrasting backgrounds with glow effect
An anime-style blonde girl wearing a pink military uniform and red boots is in a dynamic fighting stance, aiming a gun inside a high-tech spacecraft corridor with vivid lighting and retro futuristic details.
A young woman with purple eyes and a black glamorous cocktail dress passionately singing into a vintage microphone on a dimly lit classic jazz club stage with warm volumetric lighting and musical instruments in the background.
Anime-style girl with black medium hair and yellow eyes wearing a blue jacket, red plaid skirt, and blue gloves firing an AR-15 rifle inside a room with broken windows and carrying duffle bags filled with money.

Tips

This model is a LoRA fine-tuned checkpoint.

The training used 4,000 prompts for 10 epochs.

Step-by-step Preference Optimization allows fine-grained visual improvements at each step, improving aesthetics effectively.

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

arXiv Paper: arXiv:2406.04314

Github Code

Project Page: https://rockeycoss.github.io/spo.github.io/

Abstract

Generating visually appealing images is fundamental to modern text-to-image generation models. A potential solution to better aesthetics is direct preference optimization (DPO), which has been applied to diffusion models to improve general image quality, including prompt alignment and aesthetics. Popular DPO methods propagate preference labels from clean image pairs to all the intermediate steps along the two generation trajectories. However, the preference labels in existing datasets blend layout and aesthetic opinions, so they can disagree with purely aesthetic preference. Even if aesthetic labels were provided (at substantial cost), it would be hard for the two-trajectory methods to capture nuanced visual differences at different steps.

To improve aesthetics economically, this paper uses existing generic preference data and introduces step-by-step preference optimization (SPO), which discards the propagation strategy and allows fine-grained image details to be assessed. Specifically, at each denoising step, we 1) sample a pool of candidates by denoising from a shared noise latent, 2) use a step-aware preference model to find a suitable win-lose pair to supervise the diffusion model, and 3) randomly select one candidate from the pool to initialize the next denoising step. This strategy ensures that diffusion models focus on subtle, fine-grained visual differences instead of layout aspects. We find that aesthetics can be significantly enhanced by accumulating these improved minor differences.
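The three steps above can be sketched as a toy loop in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: `denoise_candidates` and `preference_score` are hypothetical stand-ins for the diffusion sampler and the step-aware preference model, latents are short lists of floats, and the win-lose pair is simply taken as the highest- and lowest-scoring candidates.

```python
import random

def denoise_candidates(x_t, t, k, rng):
    """Hypothetical stand-in: draw k stochastic denoising samples from the same latent x_t."""
    return [[v * 0.9 + rng.gauss(0.0, 0.1) for v in x_t] for _ in range(k)]

def preference_score(x, t):
    """Hypothetical stand-in for a step-aware preference model (higher is better)."""
    return -sum(abs(v) for v in x)

def spo_collect_pairs(x_T, num_steps, k=4, seed=0):
    """Collect one (step, win, lose) pair per denoising step."""
    rng = random.Random(seed)
    x_t = list(x_T)
    pairs = []
    for t in range(num_steps, 0, -1):
        # 1) Sample a pool of candidates from the shared latent x_t.
        pool = denoise_candidates(x_t, t, k, rng)
        # 2) Rank the pool and keep the best and worst as the win-lose pair
        #    (assumed selection rule for this sketch).
        ranked = sorted(pool, key=lambda x: preference_score(x, t))
        lose, win = ranked[0], ranked[-1]
        pairs.append((t, win, lose))
        # 3) Randomly pick one candidate to initialize the next step.
        x_t = rng.choice(pool)
    return pairs

pairs = spo_collect_pairs([1.0, -2.0, 0.5], num_steps=5)
```

In a real training loop, each collected pair would feed a preference loss on the diffusion model at that step. Because both members of a pair come from the same latent, they differ in fine detail and not in overall layout.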

When fine-tuning Stable Diffusion v1.5 and SDXL, SPO yields significant improvements in aesthetics compared with existing DPO methods while not sacrificing image-text alignment compared with vanilla models. Moreover, SPO converges much faster than DPO methods due to the step-by-step alignment of fine-grained visual details. Code and model: https://rockeycoss.github.io/spo.github.io/

Model Description

This model is fine-tuned from stable-diffusion-xl-base-1.0 on 4,000 prompts for 10 epochs. This checkpoint is a LoRA checkpoint. For more information, see the project page linked in the abstract above.
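As a usage hint, LoRA checkpoints are commonly activated in Stable Diffusion web UI by adding a tag of the form `<lora:NAME:WEIGHT>` to the prompt. The small helper below builds such a prompt. The function name `with_lora_tag` is hypothetical, and the default file name is an assumption taken from this page's model name, so adjust it to match the file you actually installed.

```python
def with_lora_tag(prompt, lora_name="SPO-SDXL_4k-p_10ep_LoRA_webui", weight=1.0):
    """Append a web UI style LoRA tag (<lora:NAME:WEIGHT>) to a prompt.

    lora_name is assumed to match the LoRA file name without its extension.
    """
    return f"{prompt} <lora:{lora_name}:{weight:g}>"

example_prompt = with_lora_tag("A grumpy white duck holding a mugshot nameplate", weight=0.8)
```

Lowering `weight` below 1 reduces the LoRA's influence relative to the SDXL base model.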

Citation

If you find our work useful, please consider giving us a star and citing our work.

@article{liang2024step,
  title={Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization},
  author={Liang, Zhanhao and Yuan, Yuhui and Gu, Shuyang and Chen, Bohan and Hang, Tiankai and Cheng, Mingxi and Li, Ji and Zheng, Liang},
  journal={arXiv preprint arXiv:2406.04314},
  year={2024}
}

Model Details

Model type: LoRA
Base model: SDXL 1.0
Model version: v1.0
Model hash: b6c2c16f3e

Model Collection - SPO-SDXL_4k-p_10ep_LoRA_webui