A design company wants to introduce a generative AI solution that takes a text description as input and generates new images matching it, in order to streamline the creation of advertising materials. Which type of model is MOST appropriate for this use case?

Correct answer: B

Explanation

Identify the model type that generates images from text.

  • 1. "generates new images matching a text description as input" → generation conditioned on text (text-to-image)
  • 2. "text description as input" → a diffusion model is suited to image generation
A (Incorrect)

Transformer-based language model

This is incorrect. Transformer-based language models process text using self-attention, a mechanism that computes how strongly each word in a sentence relates to every other word in the same sentence and weights those words accordingly to capture context. They are mainly used for text generation and translation and are not optimized for generating images.
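
To make the self-attention definition concrete, here is a minimal Python sketch of scaled dot-product self-attention. It assumes toy, hand-picked 3-dimensional word vectors and skips the learned query/key/value projections a real transformer applies; the point is only that each word is scored against every other word, and both the input and the output are token representations, not images.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification (assumption): queries, keys, and values are the token
    vectors themselves; real transformers first apply learned projections.
    """
    d = len(tokens[0])
    weights, outputs = [], []
    for q in tokens:
        # Score this token against every token in the same sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Context-aware output: attention-weighted sum of all token vectors.
        outputs.append([sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)])
    return outputs, weights

# Hypothetical embeddings for a three-word sentence.
sentence = {"generate": [1.0, 0.0, 1.0], "new": [0.0, 1.0, 0.0], "images": [1.0, 0.0, 0.9]}
outputs, weights = self_attention(list(sentence.values()))
for word, row in zip(sentence, weights):
    print(word, [round(x, 2) for x in row])
```

Each row of weights sums to 1; "generate" attends more to "images" than to "new" because their vectors point in similar directions.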

B (Correct)

Diffusion model

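This is correct. A diffusion model starts from random noise and generates an image by removing that noise step by step; when the model is conditioned on a text prompt, the denoising is steered toward images that match the description. Diffusion models are widely used for text-description-based image generation (text-to-image).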
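
The step-by-step denoising loop can be sketched in Python as a toy illustration under strong assumptions: the "image" is just four pixel values, the noise schedule is a common linear DDPM-style schedule, and `predict_noise` is a hypothetical oracle that already knows the clean image for each prompt. In a real diffusion model that role is played by a trained, text-conditioned neural network; the `PROMPT_TO_IMAGE` table is likewise hypothetical.

```python
import math
import random

T = 1000  # number of denoising steps
# Common linear noise schedule from 1e-4 to 0.02.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []  # cumulative products of alphas
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

# Hypothetical "text conditioning": each prompt maps to the clean 4-pixel image.
# A real model learns this relationship from data; here it is hard-coded.
PROMPT_TO_IMAGE = {
    "red square": [1.0, 0.0, 0.0, 1.0],
    "blue circle": [0.0, 0.0, 1.0, 0.5],
}

def predict_noise(x_t, t, prompt):
    """Hypothetical oracle standing in for a trained text-conditioned noise predictor."""
    x0 = PROMPT_TO_IMAGE[prompt]
    ab = alpha_bars[t]
    return [(xi - math.sqrt(ab) * x0i) / math.sqrt(1.0 - ab) for xi, x0i in zip(x_t, x0)]

def generate(prompt, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t, prompt)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Remove the predicted noise (DDPM reverse-step mean).
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a small amount of fresh noise except at the final step.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x

print([round(v, 3) for v in generate("red square", seed=1)])
```

With this oracle the loop lands on the target pixels for the given prompt regardless of the seed; with a learned predictor, different seeds would give different images that all fit the prompt.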

C (Incorrect)

Embedding model

This is incorrect. An embedding model converts text or images into numerical vectors that capture their meaning. These vectors are useful for search and similarity comparison, but the model cannot generate new images.
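
A minimal Python sketch of how embedding vectors are typically used: ranking items by cosine similarity to a query. The vectors below are hypothetical stand-ins for what an embedding model would return (real embeddings usually have hundreds or thousands of dimensions). Note that the result is a ranking over existing items; nothing new is generated.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of existing advertising assets.
catalog = {
    "summer beach poster": [0.9, 0.1, 0.0],
    "winter ski banner": [0.1, 0.9, 0.2],
    "tropical vacation flyer": [0.8, 0.2, 0.1],
}
query = [0.9, 0.1, 0.02]  # hypothetical embedding of the query "sunny seaside ad"

# Rank existing items by similarity to the query (search, not generation).
ranked = sorted(catalog, key=lambda name: cosine_similarity(query, catalog[name]), reverse=True)
print(ranked)
```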

D (Incorrect)

Image classification model

This is incorrect. An image classification model is a discriminative model that assigns an input image to one of a set of predefined categories. It takes images as input and interprets them, but it cannot generate images from text.
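
The "image in, category out" contract of a discriminative classifier can be sketched in Python as the final step of such a model: converting raw scores (logits) into a single label. The category list and logit values are hypothetical, and the trained network that would produce the logits from an image is not shown.

```python
import math

CATEGORIES = ["logo", "product photo", "banner"]  # hypothetical predefined labels

def softmax(logits):
    """Numerically stable softmax: raw scores -> probabilities summing to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return (label, probability) for the highest-scoring predefined category.

    `logits` stands in for a trained image classifier's final-layer output.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CATEGORIES[best], probs[best]

label, p = classify([0.2, 2.5, 1.0])
print(label, round(p, 3))
```

Whatever the input image, the output is always one of the fixed labels, which is why this model type cannot produce a new image.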

Key Takeaway

Distinguish model types by 'what goes in, what comes out.'
- Diffusion model: Generates images (and more) through denoising; the representative choice for text → image.
- Transformer language model: Primarily generates text.
- Embedding model: Outputs vectors (does not generate).
- Image classification model: Outputs a category — discriminative (does not generate).
'Create new images' → diffusion model.