In the last few weeks we introduced a number of features to improve your experience with our models. We hope they will make it easier for you to test, develop, and productionize the solutions built on top of Luminous. In this changelog we want to inform you about the following changes:
- You can directly access the tokenizer we use in our models. This allows you to count the number of tokens in your prompt more accurately.
- Previously, you had to use separate methods for different image sources (URLs or local file paths). With our new method Image.from_image_source, you can just pass in a URL, local file path, or bytes array, and we will do the heavy lifting in the background.
- You can use the completion_bias_inclusion parameter to limit the output tokens to a set of allowed keywords.
- We added a minimum_tokens parameter. It guarantees that the answer will always have at least k tokens and not end too soon.
- Sometimes our models tend to repeat the input prompt or previously generated words in the completion. We are introducing multiple new parameters to address this problem.
- We added the echo parameter, which returns not only the completion but also the prompt. This might be useful for benchmarking.
- Embeddings can now come in normalized form.
New Python Client Features
Local Tokenizer
You can now directly access the tokenizer we use in our models. This allows you to count the number of tokens in your prompt more accurately.
import os
from aleph_alpha_client import Client
client = Client(token=os.getenv("AA_TOKEN"))
tokenizer = client.tokenizer("luminous-supreme")
text = "Friends, Romans, countrymen, lend me your ears;"
tokens = tokenizer.encode(text)
and_back_to_text = tokenizer.decode(tokens.ids)
print("Tokens:", tokens.ids)
print("Back to text from ids:", and_back_to_text)
Tokens: [37634, 15, 51399, 15, 6326, 645, 15, 75938, 489, 867, 47317, 30]
Back to text from ids: Friends, Romans, countrymen, lend me your ears;
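Since the main benefit of the local tokenizer is accurate token counting, here is a minimal sketch (reusing the tokenizer object from the snippet above) of how you might count the tokens in a prompt before sending a request:
# Count the tokens in a prompt before sending it to the API
prompt_text = "Friends, Romans, countrymen, lend me your ears;"
token_count = len(tokenizer.encode(prompt_text).ids)
print("Number of tokens in the prompt:", token_count)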
Unified method to load images
Previously, you had to use separate methods for different image sources (URLs or local file paths). With our new method Image.from_image_source, you can just pass in a URL, local file path, or bytes array, and we will do the heavy lifting in the background.
import os
from aleph_alpha_client import Client, Prompt, CompletionRequest, Image
client = Client(token=os.getenv("AA_TOKEN"))
# Image.from_image_source accepts many source types: a local path as str, a pathlib.Path object, a bytes array, or a URL str
# image_source = "/path/to/my/image.png"
# image_source = Path("/path/to/my/image.png")
image_source = "https://docs.aleph-alpha.com/assets/images/starlight_lake-296ab6aa851c37de66e6b5afe046f12e.jpg"
image = Image.from_image_source(image_source=image_source)
prompt = Prompt(
items=[
image,
"Q: What is known about the structure in the upper part of this picture?\nA:",
]
)
params = {
"prompt": prompt,
"maximum_tokens": 16,
"stop_sequences": [".", ",", "?", "!"],
}
request = CompletionRequest(**params)
response = client.complete(request=request, model="luminous-extended")
completion = response.completions[0].completion
print(completion)
It is a star cluster
New Completion Parameters
Limit the model outputs to a set of allowed words
If you want to limit the model's output to a pre-defined set of words, use the completion_bias_inclusion parameter. One possible use case for this is classification, where you want to limit the model's output to a set of pre-defined classes. The most basic use case would be a simple binary classification (e.g., "Yes" and "No"), but it can be extended to more advanced classification problems like the following:
import os
from typing import List
from aleph_alpha_client import Client, Prompt, CompletionRequest
client = Client(token=os.getenv("AA_TOKEN"))
def classify(prompt_text: str, key: str, values: List[str]) -> str:
    # Insert the key we want to extract into the prompt template
    prompt = Prompt.from_text(prompt_text.format(key=key))
    completion_bias_request = CompletionRequest(
        prompt=prompt,
        maximum_tokens=20,
        stop_sequences=["\n"],
        completion_bias_inclusion=values,
    )
    completion_bias_result = client.complete(
        completion_bias_request, model="luminous-extended"
    )
    return completion_bias_result.completions[0].completion.strip()
prompt_text = """Extract the correct information from the following text:
Text: I was running late for my meeting in the Bavarian capital and it was really hot that day, especially as the leaves were already starting to fall. I never thought the year 2022 would be this crazy.
{key}:"""
key = "temperature"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["low", "medium", "high"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)
print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)
print()
key = "Venue"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["Venice", "Munich", "Stockholm"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)
print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)
print()
key = "Season"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["Spring", "Summer", "Autumn", "Winter"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)
print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)
print()
key = "Year"
text_classification_inclusion_bias = classify(
prompt_text=prompt_text, key=key, values=["2019", "2020", "2021", "2022", "2023"]
)
text_classification_standard = classify(prompt_text=prompt_text, key=key, values=None)
print(f"Inclusion bias {key} classification: {text_classification_inclusion_bias}")
print(
f"{key} classification without the inclusion bias: {text_classification_standard}"
)
If we limit the model to the allowed set of classes, we get nice results: a sensible classification for each key in the given text. In this example, if we instead let the model use any tokens, it starts with a \n token and stops immediately (because \n is a stop sequence), producing no useful output.
Inclusion bias temperature classification: high
temperature classification without the inclusion bias:
Inclusion bias Venue classification: Munich
Venue classification without the inclusion bias:
Inclusion bias Season classification: Autumn
Season classification without the inclusion bias:
Inclusion bias Year classification: 2022
Year classification without the inclusion bias:
Minimum tokens
We added a minimum_tokens parameter to the completion API. Now we can guarantee that the answer will always have at least k tokens and not end too early. This can come in handy if you want to incentivize the model to provide more elaborate answers and not finish the completion prematurely. We hope this improves your experience with multimodal inputs and MAGMA by generating more elaborate completions. In the following example, we show you how this parameter impacts the model's behaviour and produces a better completion.
import os
from aleph_alpha_client import Client, Prompt, CompletionRequest, Image
client = Client(token=os.getenv("AA_TOKEN"))
image_source = "https://docs.aleph-alpha.com/assets/images/starlight_lake-296ab6aa851c37de66e6b5afe046f12e.jpg"
image = Image.from_url(url=image_source)
prompt = Prompt(
items=[
image,
"An image description focusing on the features and amenities:",
]
)
no_minimum_tokens_params = {
"prompt": prompt,
}
no_minimum_tokens_request = CompletionRequest(**no_minimum_tokens_params)
no_minimum_tokens_response = client.complete(request=no_minimum_tokens_request, model="luminous-extended")
no_minimum_tokens_completion = no_minimum_tokens_response.completions[0].completion
print("Completion with minimum_tokens not set: ", no_minimum_tokens_completion)
minimum_tokens_params = {
"prompt": prompt,
"minimum_tokens": 16,
"stop_sequences": ["."],
}
minimum_tokens_request = CompletionRequest(**minimum_tokens_params)
minimum_tokens_response = client.complete(request=minimum_tokens_request, model="luminous-extended")
minimum_tokens_completion = minimum_tokens_response.completions[0].completion.strip()
print("Completion with minimum_tokens set: ", minimum_tokens_completion)
When not using the minimum_tokens parameter, the model starts with an End-Of-Text (EOT) token and produces an empty completion. With minimum_tokens in place, we get a proper description of the image.
Completion with minimum_tokens not set:
Completion with minimum_tokens set: The Milky Way over a lake in the mountains
Preventing repetitions
Sometimes our models tend to repeat the input prompt or previously generated words in the completion. With a combination of multiple new parameters (repetition_penalties_include_prompt, repetition_penalties_include_completion, sequence_penalty_min_length, use_multiplicative_sequence_penalty) you can prevent such repetitions. To learn more about them, check out the API documentation.
To use the feature, just add the parameters to your request and get rid of unwanted repetitions. An example use case for this feature is summarization.
import os
from aleph_alpha_client import Client, Prompt, CompletionRequest
client = Client(token=os.getenv("AA_TOKEN"))
prompt_text = """Summarize each text.
###
Text: The text describes coffee, a brown to black, psychoactive, diuretic, and caffeinated beverage made from roasted and ground coffee beans and hot water. Its degree of roasting and grinding varies depending on the preparation method. The term "coffee beans" does not mean that the coffee is still unground, but refers to the purity of the product and distinguishes it from substitutes made from ingredients such as chicory, malted barley, and others. Coffee is a beverage that people enjoy and contains the vitamin niacin. The name "coffee" comes from the Arabic word "qahwa," which means "stimulating drink."
Summary: Coffee is a psychoactive and diuretic beverage made from roasted and ground coffee beans that is enjoyed for its caffeine content and prepared with varying degrees of roasting and grinding.
###
Text: Marc Antony was a Roman politician and military leader who played an important role in the transformation of the Roman Republic into the autocratic Roman Empire. He was a follower and relative of Julius Caesar and served as one of his generals during the conquest of Gaul and the Civil War. While Caesar eliminated his political opponents, Antony was appointed governor of Italy. After Caesar's assassination, Antony joined forces with Marcus Aemilius Lepidus and Octavian, Caesar's grandchild and adopted son, to form the Second Triumvirate, a three-man dictatorship. The Triumvirate defeated Caesar's murderous liberators at the Battle of Philippi and divided the republic's administration between them. Antony was given control of Rome's eastern provinces, including Egypt, ruled by Cleopatra VII Philopator, and was charged with leading Rome's war against Parthia.
Summary:"""
no_repetition_management_params = {
"prompt": Prompt.from_text(prompt_text),
"stop_sequences": ["\n", "###"]
}
no_repetition_management_request = CompletionRequest(**no_repetition_management_params)
no_repetition_management_response = client.complete(request=no_repetition_management_request, model="luminous-extended")
no_repetition_management_response = no_repetition_management_response.completions[0].completion
print("No sequence penalties:")
print(no_repetition_management_response.strip())
print()
print("Repetition management:")
repetition_management_params = {
"prompt": Prompt.from_text(prompt_text),
"repetition_penalties_include_prompt": True,
"repetition_penalties_include_completion": True,
"sequence_penalty":.7,
"stop_sequences": ["\n", "###"]
}
repetition_management_request = CompletionRequest(**repetition_management_params)
repetition_management_response = client.complete(request=repetition_management_request, model="luminous-extended")
repetition_management_response = repetition_management_response.completions[0].completion
print("Sequence penalties:")
print(repetition_management_response.strip())
When the sequence penalty is not applied, models tend to repeat the first sentence of the text they are supposed to summarize ("Marc Antony was a Roman politician …"). When applying the sequence penalty parameter, this is no longer the case. The output that we get is better, more creative, and, most importantly, isn’t just a repetition of the first sentence.
No sequence penalties:
Marc Antony was a Roman politician and military leader who played an important role in the transformation of the Roman Republic into the autocratic Roman Empire.
Sequence penalties:
Marc Anthony was an important Roman politician who served as a general under Julius Caesar
Echo for completions
We added the echo parameter. Setting this parameter will return not only the completion but also the prompt. If you run benchmarks on our models, this allows you to get log probs for input and output tokens, which makes us compatible with other APIs that do not provide an evaluate endpoint. To use the functionality, set the echo parameter to True.
import os
from aleph_alpha_client import Client, Prompt, CompletionRequest
client = Client(token=os.getenv("AA_TOKEN"))
prompt_text = "If it were so, it was a grievous fault,\nAnd"
params = {
"prompt": Prompt.from_text(prompt_text),
"maximum_tokens": 16,
"stop_sequences": ["."],
"echo": True
}
request = CompletionRequest(**params)
response = client.complete(request=request, model="luminous-extended")
completion_with_echoed_prompt = response.completions[0].completion
print(completion_with_echoed_prompt)
If it were so, it was a grievous fault,
And grievously hath Caesar answer’d it
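To actually retrieve those log probs, you can combine echo with the log_probs parameter of the CompletionRequest. The following is a minimal sketch, reusing the client and prompt_text from the snippet above; the log_probs value shown here is an assumption (0 requests only the log prob of each chosen token):
params_with_log_probs = {
    "prompt": Prompt.from_text(prompt_text),
    "maximum_tokens": 16,
    "stop_sequences": ["."],
    "echo": True,
    "log_probs": 0,  # assumption: 0 returns only the log prob of each chosen token
}
request_with_log_probs = CompletionRequest(**params_with_log_probs)
response_with_log_probs = client.complete(request=request_with_log_probs, model="luminous-extended")
# Each entry maps a token to its log probability, covering both prompt and completion tokens
print(response_with_log_probs.completions[0].log_probs)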
Embeddings
Embeddings normalization
We now provide you with the option to normalize our (semantic) embeddings, meaning that the vector norm of the resulting embedding is 1.0. This can be useful in applications where you need to calculate the cosine similarity. To use it, just set the normalize parameter to True.
import os
from aleph_alpha_client import Client, Prompt, SemanticRepresentation, SemanticEmbeddingRequest
import numpy as np
text = "I come to bury Caesar, not to praise him."
client = Client(token=os.getenv("AA_TOKEN"))
non_normalized_params = {
"prompt": Prompt.from_text(text),
"representation": SemanticRepresentation.Symmetric,
"compress_to_size": 128,
"normalize": False
}
non_normalized_request = SemanticEmbeddingRequest(**non_normalized_params)
non_normalized_response = client.semantic_embed(request=non_normalized_request, model="luminous-base")
normalized_params = {
"prompt": Prompt.from_text(text),
"representation": SemanticRepresentation.Symmetric,
"compress_to_size": 128,
"normalize": True
}
normalized_request = SemanticEmbeddingRequest(**normalized_params)
normalized_response = client.semantic_embed(request=normalized_request, model="luminous-base")
print("Normalized vector:", normalized_response.embedding[:10])
print(f"Non-normalized embedding sum: {np.linalg.norm(non_normalized_response.embedding):.2f}")
print(f"Normalized embedding sum: {np.linalg.norm(normalized_response.embedding):.2f}")
Normalized vector: [0.08886719, -0.018188477, -0.06591797, 0.014587402, 0.07421875, 0.064941406, 0.13183594, 0.079589844, 0.10986328, -0.12158203]
Non-normalized embedding norm: 27.81
Normalized embedding norm: 1.00
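One nice consequence of normalization: the cosine similarity of two unit-length vectors is simply their dot product. Below is a minimal sketch reusing the client and the normalized embedding from above, with a hypothetical second sentence for comparison:
# Cosine similarity between two normalized embeddings reduces to a dot product
second_text = "The evil that men do lives after them."  # hypothetical second sentence
second_params = {
    "prompt": Prompt.from_text(second_text),
    "representation": SemanticRepresentation.Symmetric,
    "compress_to_size": 128,
    "normalize": True
}
second_request = SemanticEmbeddingRequest(**second_params)
second_response = client.semantic_embed(request=second_request, model="luminous-base")
cosine_similarity = np.dot(normalized_response.embedding, second_response.embedding)
print(f"Cosine similarity: {cosine_similarity:.2f}")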