Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models

Robotics Institute, Carnegie Mellon University


Generative models (e.g., GANs and diffusion models) learn the underlying data distribution in an unsupervised manner. However, many applications of interest require sampling from a particular region of the output space or sampling evenly over a range of characteristics. For efficient sampling in these scenarios, we propose Generative Visual Prompt (PromptGen), a framework for distributional control over pre-trained generative models by incorporating knowledge of other off-the-shelf models. PromptGen defines control as energy-based models (EBMs) and samples images in a feed-forward manner by approximating the EBM with invertible neural networks, which avoids optimization at inference. Our experiments show that PromptGen can efficiently sample from several unconditional generative models (e.g., StyleGAN2, StyleNeRF, diffusion autoencoder, NVAE) in a controlled and/or de-biased manner using various off-the-shelf models: (1) with the CLIP model as control, PromptGen can sample images guided by text; (2) with image classifiers as control, PromptGen can help de-bias generative models across a set of attributes or attribute combinations; and (3) with inverse graphics models as control, PromptGen can sample images of the same identity in different poses. (4) Finally, PromptGen reveals that the CLIP model shows a "reporting bias" when used as control, and PromptGen can further de-bias this controlled distribution in an iterative manner.
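At a high level, PromptGen composes the generator's latent prior with one or more control energies into a single EBM. A minimal sketch of that composition, with the control term stubbed out and all function names hypothetical (the paper's actual losses and architecture are not reproduced here):

```python
import numpy as np

def control_energy(image: np.ndarray) -> float:
    # Stub for an off-the-shelf model's score, e.g. a CLIP
    # text-image distance or a classifier's negative log-probability.
    return float(np.mean(image ** 2))

def ebm_energy(image: np.ndarray, latent: np.ndarray, lam: float = 1.0) -> float:
    # Energy of the controlled distribution: the generator's latent
    # prior (a standard Gaussian here) plus a weighted control term.
    # Lower energy = more likely under the controlled distribution.
    prior_nll = 0.5 * float(np.sum(latent ** 2))
    return prior_nll + lam * control_energy(image)
```

PromptGen then trains an invertible neural network to approximate this EBM, so sampling at inference is a single feed-forward pass rather than a per-image optimization.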

CLIP Guidance

Given a pre-trained CLIP model and a text description, PromptGen yields a subspace of a pre-trained generator that matches the text description. Below are a few examples using StyleGAN2 models trained on different datasets (Cat, FFHQ, etc.).

text description: Photo of a cat with closed eyes

text description: Photo of a British shorthair

text description: Photo of a happy person
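For CLIP guidance, the control energy is essentially a text-image distance in CLIP's embedding space. A minimal sketch of such an energy on pre-computed embeddings (the embeddings are placeholders; this is not the real CLIP API):

```python
import numpy as np

def clip_energy(image_embed: np.ndarray, text_embed: np.ndarray) -> float:
    # Cosine distance between L2-normalized embeddings, in [0, 2].
    # Lower energy = the image better matches the text prompt.
    a = image_embed / np.linalg.norm(image_embed)
    b = text_embed / np.linalg.norm(text_embed)
    return 1.0 - float(a @ b)
```

Minimizing this energy over the generator's output space steers samples toward the prompt, which is what the examples above illustrate.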

Pose Control

Using the inverse graphics model DECA as control, PromptGen changes the pose of StyleGAN2 samples while preserving the identity.

De-biasing a Generator

Biases in the training data leak into generative models. For instance, CLIP-guided PromptGen with the sentence "Photo of a person without makeup" yields mostly images of women. To counter this undesired effect, PromptGen can add a classifier as control and make the generator fairer with respect to gender.
text description: Photo of a person w/o makeup - before de-biasing

text description: Photo of a person w/o makeup - after de-biasing
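Classifier-based de-biasing can be understood as reweighting samples so that each class (e.g. each gender label) carries equal mass under the controlled distribution. A simple importance-reweighting sketch of that idea (not the paper's exact loss):

```python
import numpy as np

def debias_weights(class_probs: np.ndarray) -> np.ndarray:
    # class_probs: (n_samples, n_classes) soft classifier outputs.
    # Returns per-sample weights such that, after reweighting, every
    # class carries equal total mass.
    freq = class_probs.mean(axis=0)              # observed class mass
    target = np.full_like(freq, 1.0 / len(freq)) # uniform target
    w = class_probs @ (target / freq)            # per-sample weights
    return w / w.mean()                          # normalize to mean 1
```

For example, if 3 of 4 samples are classified as one gender, that class's samples get down-weighted and the minority class's samples get up-weighted until both classes contribute equally.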


Citation

@inproceedings{wu2022generative,
  title={Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models},
  author={Chen Henry Wu and Saman Motamed and Shaunak Srivastava and Fernando De la Torre},
  booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
  year={2022}
}