
· 3 min read
Niklas Finken

We are happy to announce that we have improved our luminous-control models. These new models are more instructable and perform better across a variety of tasks.

The new model versions are:

  • luminous-base-control-20240215
  • luminous-extended-control-20240215
  • luminous-supreme-control-20240215

You can access the old models at:

  • luminous-base-control-20230501
  • luminous-extended-control-20230501
  • luminous-supreme-control-20230501

Until March 4th, the default luminous-*-control will continue to point to the old models. Thereafter, you will automatically access the updated models.

While we see improved performance across the board, you can verify this for your use case by switching to the latest model name and trying it out. You can also experiment with a smaller model size to see whether it performs just as well, or even better, while offering faster response times. If the performance is not as expected, you can pin to the old model name to maintain the current behavior, as sketched below.
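
As a rough sketch with the Python client (also used throughout the examples later in this changelog), pinning simply means passing the dated model name explicitly; the prompt below is just a placeholder:

import os
from aleph_alpha_client import Client, Prompt, CompletionRequest

client = Client(token=os.getenv("AA_TOKEN"))
request = CompletionRequest(prompt=Prompt.from_text("{your prompt}\n\n### Response:"), maximum_tokens=64)

# Pin to the previous model version to keep the current behavior ...
old_response = client.complete(request=request, model="luminous-base-control-20230501")
# ... or target the updated version explicitly to compare results before the switch on March 4th.
new_response = client.complete(request=request, model="luminous-base-control-20240215")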

What's New

These new models are even better at following instructions. We achieved this by fine-tuning on high-quality instruction samples.

Simply prompt these new models like so:

{all the content and instructions relevant to your query}

### Response:

These models are particularly good at taking a given document into account during generation, which is helpful for question-answering and summarization use cases. They are significantly less prone to hallucinations when supplied with the proper context.

Question: Who was commander of the Russian army?
Answer the question using the Source. If there's no answer, say "NO ANSWER IN TEXT".

Source: The Battle of Waterloo was fought on Sunday 18 June 1815, near Waterloo (at that time in the United Kingdom of the Netherlands, now in Belgium). A French army under the command of Napoleon was defeated by two of the armies of the Seventh Coalition. One of these was a British-led coalition consisting of units from the United Kingdom, the Netherlands, Hanover, Brunswick, and Nassau, under the command of the Duke of Wellington (referred to by many authors as the Anglo-allied army or Wellington's army). The other was composed of three corps of the Prussian army under the command of Field Marshal von Blücher (the fourth corps of this army fought at the Battle of Wavre on the same day). The battle marked the end of the Napoleonic Wars. The battle was contemporaneously known as the Battle of Mont Saint-Jean (France) or La Belle Alliance ("the Beautiful Alliance" – Prussia).

### Response:
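
Sent through the Python client, the request for the example above might look like this (a sketch; maximum_tokens and stop_sequences are illustrative choices, and the Source text is shortened here):

import os
from aleph_alpha_client import Client, Prompt, CompletionRequest

client = Client(token=os.getenv("AA_TOKEN"))

prompt_text = """Question: Who was commander of the Russian army?
Answer the question using the Source. If there's no answer, say "NO ANSWER IN TEXT".

Source: The Battle of Waterloo was fought on Sunday 18 June 1815, near Waterloo ...

### Response:"""

request = CompletionRequest(
    prompt=Prompt.from_text(prompt_text),
    maximum_tokens=32,
    stop_sequences=["\n"],
)
response = client.complete(request=request, model="luminous-extended-control-20240215")
print(response.completions[0].completion)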

· 10 min read

In the last few weeks we introduced a number of features to improve your experience with our models. We hope they will make it easier for you to test, develop, and productionize the solutions built on top of Luminous. In this changelog we want to inform you about the following changes:

  • You can directly access the tokenizer we use in our models. This allows you to count the number of tokens in your prompt more accurately.

  • Previously, you had to use separate methods for different image sources (URLs or local file paths). With our new method Image.from_image_source, you can just pass in the URL, local file path, or bytes array, and we will do the heavy lifting in the background.

  • You can use the completion_bias_inclusion parameter to limit the output tokens to a set of allowed keywords.

  • We added a minimum tokens parameter. It guarantees that the completion will always contain at least the specified number of tokens and not end too soon.

  • Sometimes our models tend to repeat the input prompt or previously generated words in the completion. We are introducing multiple new parameters to address this problem.

  • We added the echo parameter, which returns not only the completion but also the prompt. This can be useful for benchmarking.

  • Embeddings can now come in normalized form.

New Python Client Features

Local Tokenizer

You can now directly access the tokenizer we use in our models. This allows you to count the number of tokens in your prompt more accurately.

import os
from aleph_alpha_client import Client

client = Client(token=os.getenv("AA_TOKEN"))

tokenizer = client.tokenizer("luminous-supreme")
text = "Friends, Romans, countrymen, lend me your ears;"

tokens = tokenizer.encode(text)
and_back_to_text = tokenizer.decode(tokens.ids)

print("Tokens:", tokens.ids)
print("Back to text from ids:", and_back_to_text)
Tokens: [37634, 15, 51399, 15, 6326, 645, 15, 75938, 489, 867, 47317, 30]
Back to text from ids: Friends, Romans, countrymen, lend me your ears;
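
Since the token ids are available locally, counting tokens is just the length of the encoding. A quick sketch for checking whether a prompt leaves room for the completion (the 2048-token context size used here is only an assumption for illustration; check the model documentation for the actual value):

n_prompt_tokens = len(tokenizer.encode(text).ids)
maximum_tokens = 64

# Assumed context window of 2048 tokens; prompt and completion have to fit into it together.
assert n_prompt_tokens + maximum_tokens <= 2048, "Prompt too long for the requested completion length"
print("Prompt tokens:", n_prompt_tokens)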

Unified method to load images

Previously, you had to use separate methods for different image sources (URLs or local file paths). With our new method Image.from_image_source, you can just pass in a URL, local file path, or bytes array, and we will do the heavy lifting in the background.

import os
from aleph_alpha_client import Client, Prompt, CompletionRequest, Image

client = Client(token=os.getenv("AA_TOKEN"))

# This method accepts many types of objects: a local path str, a local path Path object, a bytes array, or a URL
# image_source = "/path/to/my/image.png"
# image_source = Path("/path/to/my/image.png")
image_source = "https://docs.aleph-alpha.com/assets/images/starlight_lake-296ab6aa851c37de66e6b5afe046f12e.jpg"

image = Image.from_image_source(image_source=image_source)

prompt = Prompt(
    items=[
        image,
        "Q: What is known about the structure in the upper part of this picture?\nA:",
    ]
)
params = {
    "prompt": prompt,
    "maximum_tokens": 16,
    "stop_sequences": [".", ",", "?", "!"],
}
request = CompletionRequest(**params)
response = client.complete(request=request, model="luminous-extended")
completion = response.completions[0].completion

print(completion)
It is a star cluster

New Completion Parameters

Limit the model outputs to a set of allowed words

If you want to limit the model's output to a pre-defined set of words, use the completion_bias_inclusion parameter. One possible use case for this is classification, where you want to limit the model’s output to a set of pre-defined classes. The most basic use case would be a simple binary classification (e.g., "Yes" and "No"), but it can be extended to more advanced classification problems like the following:

import os
from typing import List, Optional
from aleph_alpha_client import Client, Prompt, CompletionRequest

client = Client(token=os.getenv("AA_TOKEN"))


def classify(prompt_text: str, key: str, values: Optional[List[str]]) -> str:
    # Insert the key we want to extract into the prompt template.
    prompt = Prompt.from_text(prompt_text.format(key=key))
    completion_bias_request = CompletionRequest(
        prompt=prompt,
        maximum_tokens=20,
        stop_sequences=["\n"],
        completion_bias_inclusion=values,
    )

    completion_bias_result = client.complete(
        completion_bias_request, model="luminous-extended"
    )
    return completion_bias_result.completions[0].completion.strip()


prompt_text = """Extract the correct information from the following text:
Text: I was running late for my meeting in the Bavarian capital and it was really hot that day, especially as the leaves were already starting to fall. I never thought the year 2022 would be this crazy.
{key}:"""


key = "temperature"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["low", "medium", "high"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)

print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)
print()


key = "Venue"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["Venice", "Munich", "Stockholm"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)

print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)
print()

key = "Season"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["Spring", "Summer", "Autumn", "Winter"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)

print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)
print()

key = "Year"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["2019", "2020", "2021", "2022", "2023"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)

print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)

If we limit the model to the allowed set of classes, we get useful results: a sensible classification for the given text. If we instead let the model use any tokens, it starts with a newline (\n) token and generation stops right away, without producing any useful output.

Inclusion bias temperature classification: high
temperature classification without the inclusion bias:

Inclusion bias Venue classification: Munich
Venue classification without the inclusion bias:

Inclusion bias Season classification: Autumn
Season classification without the inclusion bias:

Inclusion bias Year classification: 2022
Year classification without the inclusion bias:

Minimum tokens


We added a minimum_tokens parameter to the completion API. It guarantees that the completion will always contain at least the specified number of tokens and not end too early. This comes in handy if you want to nudge the model toward more elaborate answers instead of finishing the completion prematurely, which is especially useful for multimodal inputs with MAGMA. In the following example, we show how this impacts the model's behaviour and produces a better completion.

import os
from aleph_alpha_client import Client, Prompt, CompletionRequest, Image

client = Client(token=os.getenv("AA_TOKEN"))

image_source = "https://docs.aleph-alpha.com/assets/images/starlight_lake-296ab6aa851c37de66e6b5afe046f12e.jpg"
image = Image.from_url(url=image_source)

prompt = Prompt(
    items=[
        image,
        "An image description focusing on the features and amenities:",
    ]
)

no_minimum_tokens_params = {
    "prompt": prompt,
}
no_minimum_tokens_request = CompletionRequest(**no_minimum_tokens_params)
no_minimum_tokens_response = client.complete(request=no_minimum_tokens_request, model="luminous-extended")
no_minimum_tokens_completion = no_minimum_tokens_response.completions[0].completion


print("Completion with minimum_tokens not set: ", no_minimum_tokens_completion)


minimum_tokens_params = {
    "prompt": prompt,
    "minimum_tokens": 16,
    "stop_sequences": ["."],
}
minimum_tokens_request = CompletionRequest(**minimum_tokens_params)
minimum_tokens_response = client.complete(request=minimum_tokens_request, model="luminous-extended")
minimum_tokens_completion = minimum_tokens_response.completions[0].completion.strip()
print("Completion with minimum_tokens set: ", minimum_tokens_completion)

When minimum_tokens is not set, the model starts with an End-Of-Text (EOT) token and produces an empty completion. With minimum_tokens in place, we get a proper description of the image.

Completion with minimum_tokens not set:  
Completion with minimum_tokens set: The Milky Way over a lake in the mountains

Preventing repetitions

Sometimes our models tend to repeat the input prompt or previously generated words in the completion. With a combination of multiple new parameters (sequence_penalty, sequence_penalty_min_length, use_multiplicative_sequence_penalty, repetition_penalties_include_prompt, repetition_penalties_include_completion) you can prevent such repetitions. To learn more about them, check out the API documentation.

To use the feature, just add the parameters to your request and get rid of unwanted repetitions. An example use case for this feature is summarization.

import os
from aleph_alpha_client import Client, Prompt, CompletionRequest


client = Client(token=os.getenv("AA_TOKEN"))

prompt_text = """Summarize each text.
###
Text: The text describes coffee, a brown to black, psychoactive, diuretic, and caffeinated beverage made from roasted and ground coffee beans and hot water. Its degree of roasting and grinding varies depending on the preparation method. The term "coffee beans" does not mean that the coffee is still unground, but refers to the purity of the product and distinguishes it from substitutes made from ingredients such as chicory, malted barley, and others. Coffee is a beverage that people enjoy and contains the vitamin niacin. The name "coffee" comes from the Arabic word "qahwa," which means "stimulating drink."
Summary: Coffee is a psychoactive and diuretic beverage made from roasted and ground coffee beans that is enjoyed for its caffeine content and prepared with varying degrees of roasting and grinding.
###
Text: Marc Antony was a Roman politician and military leader who played an important role in the transformation of the Roman Republic into the autocratic Roman Empire. He was a follower and relative of Julius Caesar and served as one of his generals during the conquest of Gaul and the Civil War. While Caesar eliminated his political opponents, Antony was appointed governor of Italy. After Caesar's assassination, Antony joined forces with Marcus Aemilius Lepidus and Octavian, Caesar's grandchild and adopted son, to form the Second Triumvirate, a three-man dictatorship. The Triumvirate defeated Caesar's murderous liberators at the Battle of Philippi and divided the republic's administration between them. Antony was given control of Rome's eastern provinces, including Egypt, ruled by Cleopatra VII Philopator, and was charged with leading Rome's war against Parthia.
Summary:"""

no_repetition_management_params = {
    "prompt": Prompt.from_text(prompt_text),
    "stop_sequences": ["\n", "###"]
}
no_repetition_management_request = CompletionRequest(**no_repetition_management_params)
no_repetition_management_response = client.complete(request=no_repetition_management_request, model="luminous-extended")
no_repetition_management_response = no_repetition_management_response.completions[0].completion

print("No sequence penalties:")
print(no_repetition_management_response.strip())
print()

repetition_management_params = {
    "prompt": Prompt.from_text(prompt_text),
    "repetition_penalties_include_prompt": True,
    "repetition_penalties_include_completion": True,
    "sequence_penalty": 0.7,
    "stop_sequences": ["\n", "###"]
}

repetition_management_request = CompletionRequest(**repetition_management_params)
repetition_management_response = client.complete(request=repetition_management_request, model="luminous-extended")
repetition_management_response = repetition_management_response.completions[0].completion

print("Sequence penalties:")
print(repetition_management_response.strip())

When the sequence penalty is not applied, models tend to repeat the first sentence of the text they are supposed to summarize ("Marc Antony was a Roman politician …"). When applying the sequence penalty parameter, this is no longer the case. The output that we get is better, more creative, and, most importantly, isn’t just a repetition of the first sentence.

No sequence penalties:
Marc Antony was a Roman politician and military leader who played an important role in the transformation of the Roman Republic into the autocratic Roman Empire.

Sequence penalties:
Marc Anthony was an important Roman politician who served as a general under Julius Caesar

Echo for completions

We added the echo parameter. When it is set to True, the response contains not only the completion but also the prompt. If you run benchmarks on our models, this allows you to get log probs for input as well as output tokens, which makes us compatible with other APIs that do not provide an evaluate endpoint.

import os
from aleph_alpha_client import Client, Prompt, CompletionRequest

client = Client(token=os.getenv("AA_TOKEN"))
prompt_text = "If it were so, it was a grievous fault,\nAnd"
params = {
    "prompt": Prompt.from_text(prompt_text),
    "maximum_tokens": 16,
    "stop_sequences": ["."],
    "echo": True
}
request = CompletionRequest(**params)
response = client.complete(request=request, model="luminous-extended")
completion_with_echoed_prompt = response.completions[0].completion

print(completion_with_echoed_prompt)
If it were so, it was a grievous fault,
And grievously hath Caesar answer’d it
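
To also get the log probabilities mentioned above, you can combine echo with the log_probs parameter from the completion API. A sketch of how this might look (the exact shape of the returned log probs is best checked against the API documentation):

params = {
    "prompt": Prompt.from_text(prompt_text),
    "maximum_tokens": 16,
    "echo": True,
    "log_probs": 0,  # return the log probability of each token, including the echoed prompt tokens
}
request = CompletionRequest(**params)
response = client.complete(request=request, model="luminous-extended")
print(response.completions[0].log_probs)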

Embeddings

Embeddings normalization

We now provide you with the option to normalize our (semantic) embeddings, meaning that the vector norm of the resulting embedding is 1.0. This can be useful in applications where you need to calculate the cosine similarity. To use it, just set the normalize parameter to True.

import os
from aleph_alpha_client import Client, Prompt, SemanticRepresentation, SemanticEmbeddingRequest
import numpy as np

text = "I come to bury Caesar, not to praise him."

client = Client(token=os.getenv("AA_TOKEN"))
non_normalized_params = {
    "prompt": Prompt.from_text(text),
    "representation": SemanticRepresentation.Symmetric,
    "compress_to_size": 128,
    "normalize": False
}
non_normalized_request = SemanticEmbeddingRequest(**non_normalized_params)
non_normalized_response = client.semantic_embed(request=non_normalized_request, model="luminous-base")


normalized_params = {
    "prompt": Prompt.from_text(text),
    "representation": SemanticRepresentation.Symmetric,
    "compress_to_size": 128,
    "normalize": True
}
normalized_request = SemanticEmbeddingRequest(**normalized_params)
normalized_response = client.semantic_embed(request=normalized_request, model="luminous-base")


print("Normalized vector:", normalized_response.embedding[:10])
print(f"Non-normalized embedding norm: {np.linalg.norm(non_normalized_response.embedding):.2f}")
print(f"Normalized embedding norm: {np.linalg.norm(normalized_response.embedding):.2f}")
Normalized vector: [0.08886719, -0.018188477, -0.06591797, 0.014587402, 0.07421875, 0.064941406, 0.13183594, 0.079589844, 0.10986328, -0.12158203]
Non-normalized embedding norm: 27.81
Normalized embedding norm: 1.00
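
Because the normalized embeddings have a norm of 1.0, the cosine similarity between two texts reduces to a plain dot product. A minimal sketch building on the example above (the second sentence is just an arbitrary example):

text_2 = "The evil that men do lives after them."
normalized_params_2 = {
    "prompt": Prompt.from_text(text_2),
    "representation": SemanticRepresentation.Symmetric,
    "compress_to_size": 128,
    "normalize": True
}
normalized_response_2 = client.semantic_embed(
    request=SemanticEmbeddingRequest(**normalized_params_2), model="luminous-base"
)

# For normalized vectors, cosine similarity is simply the dot product.
cosine_similarity = np.dot(normalized_response.embedding, normalized_response_2.embedding)
print(f"Cosine similarity: {cosine_similarity:.2f}")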

· 5 min read
Ben Brandt

We're excited to announce that we have added async support for our Python client! You can now upgrade to v2.5.0, and import AsyncClient to get started making requests to our API in async contexts.

When using simple scripts or a Jupyter notebook to experiment with our API, the default synchronous client is easy enough to use. But for many production use cases, async support unlocks a way to make concurrent requests and to integrate with frameworks that take advantage of async (e.g., FastAPI's async def syntax for path operation functions).

We built AsyncClient on top of aiohttp, so you should be able to use it within any Python async runtime without any blocking I/O.
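
A rough sketch of what this can look like, adapted from the synchronous examples above (check the client documentation for the exact AsyncClient interface):

import os
import asyncio
from aleph_alpha_client import AsyncClient, Prompt, CompletionRequest


async def main():
    # Using AsyncClient as an async context manager ensures the underlying aiohttp session is closed properly.
    async with AsyncClient(token=os.getenv("AA_TOKEN")) as client:
        request = CompletionRequest(
            prompt=Prompt.from_text("An apple a day"),
            maximum_tokens=16,
        )
        response = await client.complete(request, model="luminous-base")
        print(response.completions[0].completion)


asyncio.run(main())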

· 2 min read
Ben Brandt

We are extremely grateful that so many of you trust us with your AI needs. We constantly strive to improve the speed and reliability of our API, and the last thing we want is for your requests to start getting rejected because you ran out of credits and didn't notice.

With our latest release, not only will you be notified when you run out of credits, you can also choose which credit balance triggers the notification.

· 2 min read

Token Management List

What's New

We now support the creation of multiple API tokens for each user account. Many of our users either have many clients, teams, or projects, and now you can have dedicated tokens for each of them.

You can create multiple tokens for different use cases, instead of the single one we provided before. Separating tokens also allows you to securely revoke a token without affecting the API tokens of other projects or teams. The newly created tokens offer additional security because they are not stored after creation, so remember to copy each token since you won't be able to see it again.

· 6 min read
Ben Brandt

Back in April, we released an initial version of our Summarization endpoint, which allowed for summarizing text using our language models.

We quickly noticed, however, as did some of you, that it had a few problems when we tried integrating it into projects:

  • Document length was limited to the context size of the model (only ~2000 tokens available for the document).
  • As the documents grew larger, the endpoint became much more likely to just return the first sentence of the document.

In the meantime, we released some improvements to our Q&A endpoint that resolved similar problems for that use case. Particularly for Q&A, we can:

  • efficiently process really large documents, returning the best answers across the entire document.
  • process Docx and Text documents, not just Prompts.

With these improvements to our document handling, we were able to come back to Summarization with new approaches and better results.