Examples
Still not sure where to start? The example prompts and completions below may help.
OCR
Our multimodal models can transcribe text in images into machine-readable text, whether it is handwritten or printed. Simply upload your image and enter a text prompt below it, like so:
The text says: "God grant me the serenity to accept the things I cannot change, the courage to change the things I can, and the wisdom to know the difference."
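The layout described above, an image followed by a short text cue, can be sketched as a plain data structure. This is only an illustration: `ImageItem`, `TextItem`, `build_ocr_prompt`, and the file name are hypothetical names invented for this sketch and are not part of any client library.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ImageItem:
    """Hypothetical stand-in for an uploaded image (not a real client class)."""
    path: str


@dataclass(frozen=True)
class TextItem:
    """Hypothetical stand-in for a piece of prompt text."""
    text: str


PromptItem = Union[ImageItem, TextItem]


def build_ocr_prompt(image_path: str, cue: str = "The text says:") -> List[PromptItem]:
    """Put the image first and the text cue below it, as in the OCR example."""
    return [ImageItem(image_path), TextItem(cue)]


# Assumed file name, for illustration only.
ocr_prompt = build_ocr_prompt("handwritten_note.jpg")
```

The model is then expected to continue the text cue with the transcribed content of the image.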
Object Detection
You can use our multimodal functionality to detect objects in images, for example by using the following prompt:
The items in this room are: desk, bed, computer, lamp, bookshelf, and guitar.
Context Q&A
Just as with our text-only prompts, multimodal input can be processed using the "Q: Some question? A: Model answer." scheme. This method can robustly extract the content of a picture into textual form. Beyond recognizing what can be seen in an image, our multimodal models can also "understand" that information in context and offer high-level information about it. This lets them perform two tasks at once: image recognition and image interpretation.
Q: What is known about the structure in the upper part of this picture?
A: The Milky Way is a large, spiral galaxy.
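The text part of such a Q&A prompt can be assembled with a small helper. This is a minimal sketch: `qa_prompt` is a hypothetical function written for this page, and it only builds the text that goes below the uploaded image, leaving the answer open for the model to complete.

```python
from typing import Iterable, Tuple


def qa_prompt(question: str, examples: Iterable[Tuple[str, str]] = ()) -> str:
    """Format optional solved Q/A pairs followed by an open question."""
    lines = []
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # left open so the model supplies the answer
    return "\n".join(lines)


text = qa_prompt("What is known about the structure in the upper part of this picture?")
print(text)
```

Passing a few solved pairs through `examples` turns the same helper into a few-shot prompt.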
Artwork Titles
AI is often thought to be bad at creativity. And, sure enough, no AI to date can be thought of as truly creative. However, this does not mean that there are no tricks to get a large language model to perform "creative" work, such as artwork naming. Using a prompt-based approach, we can try to get our multimodal model into a "dreamy state". In addition, let's increase the temperature to raise our chance of generating less likely, but potentially more "creative", completions. Take a look at the example below:
I had this crazy dream last night. Flashing lights and a rush of ecstasy. And then, as if I had always known, I knew that the title of my artwork had to be: The First Snow of Spring.
We ran this experiment a couple of times, and here are some other AI suggestions:
- A Frozen Forest
- Winter Dreams
- The Snow White Trees
- The Winter Palms
- Palm Trees and White Ashes
Impressive, isn't it?
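To see why a higher temperature makes less likely completions more probable, here is a minimal sketch of the textbook temperature-scaled softmax. The logits are made-up values for three candidate tokens, and it is an assumption that the model's sampler follows exactly this formulation.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores: the first token is the most likely one.
logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, 0.5)
high = softmax_with_temperature(logits, 1.5)

# At the higher temperature the distribution is flatter, so the
# least likely token receives a larger share of the probability.
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

A flatter distribution means repeated runs are more likely to yield different titles, which matches the variety of suggestions listed above.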
Image Comparison
To compare two or more images, simply instruct the model to do so in natural language.

Compare the two images: one shows a dog on a skateboard, the other shows a cat on a skateboard.
You could try different variations of this prompt, such as...
Q: What do the two images have in common?
A: They have two things in common: animals and skateboarding.
or even...
The images are different, because the animals are different.
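A comparison prompt can be sketched as all images followed by one natural-language instruction. The helper below is hypothetical, as are the file names; it represents prompt items as simple `(kind, value)` tuples rather than any real client type.

```python
from typing import List, Sequence, Tuple


def build_comparison_prompt(
    image_paths: Sequence[str], instruction: str
) -> List[Tuple[str, str]]:
    """List every image first, then the text instruction that refers to them."""
    if len(image_paths) < 2:
        raise ValueError("a comparison needs at least two images")
    items = [("image", path) for path in image_paths]
    items.append(("text", instruction))
    return items


# Assumed file names, for illustration only.
comparison = build_comparison_prompt(
    ["dog_skateboard.jpg", "cat_skateboard.jpg"],
    "Q: What do the two images have in common?\nA:",
)
```

Swapping the instruction string is enough to try the prompt variations shown above.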