Still not sure where to start? Find some example prompts below.
You can use our multimodal functionality to detect objects in images, for example by using the following prompt:
The items in this room are: desk, bed, computer, lamp, bookshelf, and guitar.
Just as with text-only prompts, multimodal input can be processed using the "Q: Some question? A: Model answer." scheme. This method can robustly extract the content of a picture into textual form. Our multimodal models do more than recognize what can be seen in an image, however: they can also "understand" that information in context and offer high-level information about it. This allows two tasks to be performed at once: image recognition and image interpretation.
Q: What is known about the structure in the upper part of this picture? A: The Milky Way is a large, spiral galaxy.
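If you assemble such prompts programmatically, a small helper keeps the "Q: ... A:" layout consistent. The following is a minimal sketch, not part of any Luminous client library: the function name `build_qa_prompt` and its parameters are hypothetical, and it only produces the text part of the prompt (the image is supplied separately).

```python
from typing import List, Optional, Tuple


def build_qa_prompt(
    question: str,
    examples: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Format a question in the "Q: ... A:" scheme.

    `examples` is an optional list of (question, answer) pairs that are
    placed before the final question as few-shot demonstrations.
    """
    lines = []
    for example_question, example_answer in examples or []:
        lines.append(f"Q: {example_question.strip()} A: {example_answer.strip()}")
    # The last answer is left open so that the model completes it.
    lines.append(f"Q: {question.strip()} A:")
    return "\n".join(lines)


prompt_text = build_qa_prompt(
    "What is known about the structure in the upper part of this picture?"
)
print(prompt_text)
```

Ending the prompt with an open "A:" is what invites the model to answer rather than to continue the question.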
We can even get multimodal Luminous models to perform "creative" work, such as naming an artwork. Using a prompt-based approach, we can try to get our multimodal model into a "dreamy state". In addition, let's raise the temperature to increase our chance of generating less likely, but potentially more "creative", completions. Take a look at the example below:
I had this crazy dream last night. Flashing lights and a rush of ecstasy. And then, as if I had always known, I knew that the title of my artwork had to be: The First Snow of Spring.
We ran this experiment a couple of times, and here are some other suggestions from the model:
- A Frozen Forest
- Winter Dreams
- The Snow White Trees
- The Winter Palms
- Palm Trees and White Ashes
Impressive, isn't it?
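To see why a higher temperature makes less likely completions more probable, it helps to look at how temperature is commonly applied when sampling tokens: the model's scores (logits) are divided by the temperature before being turned into probabilities. The sketch below uses this standard softmax-with-temperature formulation with made-up scores; that Luminous applies temperature in exactly this way is an assumption.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(scaled)
    exponentials = [math.exp(value - highest) for value in scaled]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Made-up scores for three candidate tokens (illustrative only).
logits = [2.0, 1.0, 0.1]
for temperature in (0.5, 1.0, 1.5):
    probabilities = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

As the temperature rises, the probabilities move closer together, so tokens that would rarely be picked at a low temperature are sampled more often.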
Optical Character Recognition
To some degree, Luminous can even transcribe text in images into machine-readable text, whether it is handwritten or printed. Simply upload your image and enter a prompt below it, like so:
The text says: "God grant me the serenity to accept the things I cannot change, the courage to change the things I can, and the wisdom to know the difference."
To compare two or more images, simply instruct the model to do so in natural language.
Compare the two images: one shows a dog on a skateboard, the other shows a cat on a skateboard.
You could try different variations of this prompt, such as:
Q: What do the two images have in common? A: They have two things in common: animals and skateboarding.
The images are different, because the animals are different.
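When comparing images programmatically, the prompt becomes an ordered sequence: the images first, then the natural-language instruction. The sketch below shows one way to assemble such a sequence in plain Python. The item layout (dictionaries with `type` and `data` keys, images as base64 strings) and the helper names are hypothetical and chosen for illustration only; check the API documentation for the request format your client actually expects.

```python
import base64
from typing import Dict, List


def image_item(image_bytes: bytes) -> Dict[str, str]:
    """Wrap raw image bytes as a base64-encoded prompt item (hypothetical layout)."""
    return {"type": "image", "data": base64.b64encode(image_bytes).decode("ascii")}


def text_item(text: str) -> Dict[str, str]:
    """Wrap an instruction as a text prompt item (hypothetical layout)."""
    return {"type": "text", "data": text}


def build_comparison_prompt(images: List[bytes], instruction: str) -> List[Dict[str, str]]:
    """Place all images first, followed by the instruction that refers to them."""
    if len(images) < 2:
        raise ValueError("a comparison needs at least two images")
    return [image_item(image) for image in images] + [text_item(instruction)]


# Placeholder bytes stand in for real image files in this sketch.
first_image = b"placeholder bytes for the first image"
second_image = b"placeholder bytes for the second image"
prompt_items = build_comparison_prompt(
    [first_image, second_image],
    "Q: What do the two images have in common? A:",
)
print([item["type"] for item in prompt_items])
```

Keeping the instruction after the images mirrors the examples above, where the prompt text follows the uploaded pictures.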