We define the latent code of a stochastic diffusion model as the concatenation of all Gaussian noises used in its denoising process. This definition is straightforward, yet we show that it enables zero-shot image editing and guidance for stochastic diffusion models.
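To make the definition concrete, here is a minimal numpy sketch (not the paper's implementation): a toy DDPM-style sampler whose latent code z concatenates the initial noise x_T with every per-step noise. The `denoiser`, fixed `sigma`, and dimensions are illustrative stand-ins for a trained model; the point is that fixing z makes sampling deterministic.

```python
import numpy as np

def ddpm_sample(denoiser, z, T=10, dim=4):
    # z concatenates x_T and the T-1 per-step Gaussian noises:
    # z = (x_T, eps_T, ..., eps_2); total length T * dim.
    x = z[:dim]
    for t in range(T, 1, -1):
        eps = z[(T - t + 1) * dim:(T - t + 2) * dim]
        mean = denoiser(x, t)   # toy stand-in for the model's posterior mean
        sigma = 0.1             # toy fixed per-step noise scale
        x = mean + sigma * eps
    return denoiser(x, 1)       # final step is deterministic

# Toy "denoiser": shrink toward zero (stands in for a trained network).
toy = lambda x, t: 0.9 * x

rng = np.random.default_rng(0)
z = rng.standard_normal(10 * 4)
out1 = ddpm_sample(toy, z)
out2 = ddpm_sample(toy, z)
# Same latent code => same sample, so z fully determines the output.
```

Because z captures all the stochasticity, the otherwise stochastic sampler behaves like a deterministic decoder of z, which is what lets us treat z as a latent code.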
Based on the above definition of latent space, we propose CycleDiffusion, a method for zero-shot and unpaired image editing using stochastic diffusion models.
CycleDiffusion is capable of zero-shot image editing with a pre-trained text-to-image diffusion model (e.g., Stable Diffusion). The editing is specified by a source prompt and a target prompt.
CycleDiffusion is also compatible with attention manipulation techniques such as Cross Attention Control (CAC). With CAC, CycleDiffusion can better preserve the structure of the input image during editing.
CycleDiffusion is also capable of unpaired image editing with two pre-trained diffusion models (e.g., a dog diffusion model and a cat diffusion model).
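The editing recipe above can be sketched in toy form: infer a latent code for the input image under the source model, then decode that same code with the target model (the target model could equally be the same text-to-image model run with a different prompt). Everything below is a minimal numpy sketch under toy assumptions; `encode`, `decode`, `mean_src`, and `mean_tgt` are illustrative names, not the paper's API.

```python
import numpy as np

def encode(x0, mean_src, T=10, sigma=0.1, seed=0):
    # Toy encoder sketch: sample a trajectory x_T, ..., x_1 that ends at the
    # input x0, then read off the noises the source model would need to
    # reproduce that trajectory.
    rng = np.random.default_rng(seed)
    traj = {1: x0}
    for t in range(2, T + 1):  # diffuse x0 back toward noise
        traj[t] = traj[t - 1] + sigma * rng.standard_normal(x0.shape)
    noises = [(traj[t - 1] - mean_src(traj[t], t)) / sigma
              for t in range(T, 1, -1)]
    return traj[T], noises  # latent code z = (x_T, eps_T, ..., eps_2)

def decode(xT, noises, mean_fn, T=10, sigma=0.1):
    x = xT
    for eps, t in zip(noises, range(T, 1, -1)):
        x = mean_fn(x, t) + sigma * eps
    return x

mean_src = lambda x, t: 0.9 * x          # stands in for the source model
mean_tgt = lambda x, t: 0.9 * x + 0.01   # stands in for the target model

x0 = np.ones(4)
xT, noises = encode(x0, mean_src)
recon = decode(xT, noises, mean_src)     # source model reconstructs x0
edited = decode(xT, noises, mean_tgt)    # target model yields a nearby edit
```

Decoding with the source model reconstructs the input exactly, while decoding the same latent code with the target model produces an output that stays close to the input, which is the intuition behind content preservation in CycleDiffusion.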
Defining a latent space for stochastic diffusion models allows us to guide them in the same way that StyleGANs are guided. For instance, we can guide a pre-trained diffusion model with off-the-shelf image understanding models such as CLIP; in this case, we can sub-sample a generative model of faces guided by attributes such as "eyeglasses".
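The sub-sampling idea can be sketched as rejection sampling over latent codes: treat the generator as a black box over z and keep only samples that an off-the-shelf scorer accepts. In this toy numpy sketch, `generate` and `score` are stand-ins for the diffusion decoder and a CLIP score against a text query like "eyeglasses"; none of these names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
generate = lambda z: z.mean()   # toy "generator": latent code -> image statistic
score = lambda img: img         # toy "CLIP score" of the query on the image
threshold = 0.2                 # illustrative acceptance threshold

kept = []
for _ in range(2000):
    z = rng.standard_normal(16)  # latent code = concatenated Gaussian noises
    img = generate(z)
    if score(img) > threshold:   # accept only samples the scorer likes
        kept.append(img)
# `kept` is a sub-sampled population that scores high on the attribute.
```

This is the simplest form of guidance; because z is an ordinary latent vector, gradient-based guidance in latent space (as done for StyleGANs) is possible as well.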
@inproceedings{cyclediffusion,
  title     = {A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance},
  author    = {Chen Henry Wu and Fernando De la Torre},
  booktitle = {ICCV},
  year      = {2023},
}