
Aleph Alpha API (1.0)

Download OpenAPI specification

Access and interact with Aleph Alpha models and functionality over HTTP endpoints.

Current API version

Will return the version number of the API that is deployed to this environment.

Responses

Authenticate with the server

Will return a token that must be used in an Authorization: Bearer <token> header for further requests.

Request Body schema: application/json
email
required
string
password
required
string

Responses

Request samples

Content type
application/json
{
  • "email": "string",
  • "password": "string"
}

Response samples

Content type
application/json
{
  • "token": "string",
  • "role": "client"
}
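
A minimal sketch of calling this endpoint from Python with the requests library. The base URL and login path are assumptions for illustration (they are not stated in this excerpt); substitute the values for your environment.

import requests

BASE_URL = "https://api.aleph-alpha.com"  # assumed base URL, adjust for your deployment
LOGIN_PATH = "/users/me/token"            # placeholder path, not confirmed by this excerpt

response = requests.post(
    BASE_URL + LOGIN_PATH,
    json={"email": "user@example.com", "password": "secret"},
)
response.raise_for_status()
token = response.json()["token"]

# The token is then sent as a Bearer token on all further requests.
headers = {"Authorization": f"Bearer {token}"}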

Get a list of issued API tokens

Will return a list of API tokens that are registered for this user (only token metadata is returned, not the actual tokens)

Authorizations:
token

Responses

Response samples

Content type
application/json
[
  { }
]
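
A short, hedged sketch of listing token metadata with requests; the path is an assumption, and <token> stands in for a valid Bearer token.

import requests

response = requests.get(
    "https://api.aleph-alpha.com/users/me/tokens",   # assumed path
    headers={"Authorization": "Bearer <token>"},
)
print(response.json())  # list of token metadata objects, never the tokens themselves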

Create a new API token

Create a new token for authenticating against the API (the actual API token is only returned when calling this endpoint)

Authorizations:
token
Request Body schema: application/json
description
required
string

a simple description to remember the token by

Responses

Request samples

Content type
application/json
{
  • "description": "token used on my laptop"
}

Response samples

Content type
application/json
{
  • "metadata": {
    },
  • "token": "string"
}
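
A hedged sketch of creating a new API token; the path is an assumption. The actual token is only contained in this one response, so store it right away.

import requests

response = requests.post(
    "https://api.aleph-alpha.com/users/me/tokens",   # assumed path
    headers={"Authorization": "Bearer <token>"},
    json={"description": "token used on my laptop"},
)
body = response.json()
api_token = body["token"]    # returned only by this endpoint, keep it somewhere safe
print(body["metadata"])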

Delete an API token

Authorizations:
token
path Parameters
token_id
required
integer <int32>

API token ID

Responses
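
A hedged sketch of deleting a token by its ID; the path is an assumption, and 42 is just an example token_id taken from the token metadata list.

import requests

response = requests.delete(
    "https://api.aleph-alpha.com/users/me/tokens/42",   # assumed path, example token_id
    headers={"Authorization": "Bearer <token>"},
)
response.raise_for_status()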

Currently available models

Will return all currently available models.

Authorizations:
token

Responses

Response samples

Content type
application/json
[
  { }
]
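
The path GET /models_available is named throughout this documentation; only the base URL below is an assumption.

import requests

response = requests.get(
    "https://api.aleph-alpha.com/models_available",
    headers={"Authorization": "Bearer <token>"},
)
for model in response.json():
    print(model)   # model metadata, e.g. name and supported features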

Completes a prompt.

Will complete a prompt using a specific model. To obtain a valid model, use GET /models_available.

Authorizations:
token
Request Body schema: application/json
model
required
string

Name of the model to use. A model name refers to a model architecture (among other things, the number of parameters). The latest version of the model is always used; the model output contains information about the model version.

hosting
string or null (Hosting)
Enum: "aleph-alpha" null

Optional parameter that determines in which datacenters the request may be processed. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

required
Text Prompt (string) or (Array of Multimodal (Text (object) or Image (object) or Token Ids (object))) (Prompt)
maximum_tokens
required
integer

The maximum number of tokens to be generated. Completion will terminate after the maximum number of tokens is reached.

Increase this value to generate longer texts. A text is split into tokens, and there are usually more tokens than words. The maximum supported sum of prompt tokens and maximum_tokens depends on the model (for luminous-base, it may not exceed 2048 tokens).

temperature
number or null
Default: 0

A higher sampling temperature encourages the model to produce less probable outputs ("be more creative"). Values are expected in a range from 0.0 to 1.0. Try high values (e.g. 0.9) for a more "creative" response and the default 0.0 for a well-defined and repeatable answer. It is recommended to use either temperature, top_k, or top_p, but not all at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p last.

top_k
integer or null
Default: 0

Introduces random sampling for generated tokens by randomly selecting the next token from the k most likely options. A value larger than 1 encourages the model to be more creative. Set to 0 if repeatable output is to be produced. It is recommended to use either temperature, top_k, or top_p, but not all at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p last.

top_p
number or null
Default: 0

Introduces random sampling for generated tokens by randomly selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds the probability top_p. Set to 0.0 if repeatable output is to be produced. It is recommended to use either temperature, top_k, or top_p, but not all at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p last.
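
The interplay of temperature, top_k and top_p can be pictured with a small sketch: logits are first rescaled by temperature, then restricted to the k most likely tokens, then to the smallest nucleus whose cumulative probability exceeds top_p. This is only an illustrative approximation in NumPy, not the server's actual implementation.

import numpy as np

def sample_next_token(logits, temperature=0.0, top_k=0, top_p=0.0):
    if temperature == 0 and top_k == 0 and top_p == 0:
        return int(np.argmax(logits))                    # defaults: repeatable argmax output
    logits = logits / max(temperature, 1e-9)             # 1. rescale logits with temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]
    keep = np.ones_like(probs, dtype=bool)
    if top_k > 0:                                        # 2. keep the k most likely tokens
        keep[order[top_k:]] = False
    if top_p > 0:                                        # 3. keep the smallest nucleus exceeding top_p
        cumulative = np.cumsum(probs[order])
        keep[order[np.searchsorted(cumulative, top_p) + 1:]] = False
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))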

presence_penalty
number or null
Default: 0

The presence penalty reduces the likelihood of generating tokens that are already present in the text. The presence penalty is independent of the number of occurrences. Increase the value to produce text that does not repeat the input. An operation like the following is applied:

logits[t] -> logits[t] - 1 * penalty

where logits[t] is the logits for any given token. Note that the formula is independent of the number of times that a token appears in context_tokens.

frequency_penalty
number or null
Default: 0

The frequency penalty reduces the likelihood of generating tokens that are already present in the text. Unlike the presence penalty, the frequency penalty depends on the number of occurrences of a token. An operation like the following is applied:

logits[t] -> logits[t] - count[t] * penalty

where logits[t] is the logits for any given token and count[t] is the number of times that token appears in context_tokens
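
A rough sketch of how the two penalties modify the logits, following the formulas above (illustration only in the additive form, not the server implementation):

from collections import Counter

def apply_penalties(logits, context_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    # logits: dict mapping every vocabulary token id -> logit
    # context_tokens: token ids already present in the text
    counts = Counter(context_tokens)
    for t in counts:
        logits[t] -= 1 * presence_penalty           # independent of how often t occurs
        logits[t] -= counts[t] * frequency_penalty  # scales with the number of occurrences
    return logits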

repetition_penalties_include_prompt
boolean or null
Default: false

Flag deciding whether the presence penalty or frequency penalty is applied to both the prompt and the completion (True) or only to the completion (False)

use_multiplicative_presence_penalty
boolean or null
Default: false

Flag deciding whether presence penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for presence and frequency penalty.

penalty_bias
string or null
Default: null

If set, all tokens in this text will be used in addition to the already penalized tokens for repetition penalties. These consist of the already generated completion tokens and the prompt tokens, if repetition_penalties_include_prompt is set to true.

Potential use case for a chatbot-based completion:

Instead of using repetition_penalties_include_prompt, construct a new string with only the chatbot's responses included. You would leave out any tokens you use for stop sequences (i.e. \nChatbot:), and all user messages.

With this bias, if you turn up the repetition penalties, you can avoid having your chatbot repeat itself without penalizing it for mirroring language provided by the user.

penalty_exceptions
Array of strings or null
Default: null

List of strings that may be generated without penalty, regardless of other penalty settings.

This is particularly useful for any completion that uses a structured few-shot prompt. For example, if you have a prompt such as:

I want to travel to a location, where I can enjoy both beaches and mountains.

- Lake Garda, Italy. This large Italian lake in the southern alps features gravel beaches and mountainside hiking trails.
- Mallorca, Spain. This island is famous for its sandy beaches, turquoise water and hilly landscape.
- Lake Tahoe, California. This famous lake in the Sierra Nevada mountains offers an amazing variety of outdoor activities.
-

You could set penalty_exceptions to ["\n-"] to not penalize the generation of a new list item, while still increasing other penalty settings to encourage new list items that do not repeat earlier ones.

By default, we will also include any stop_sequences you have set, since completion performance can be degraded if expected stop sequences are penalized. You can disable this behavior by setting penalty_exceptions_include_stop_sequences to false.

penalty_exceptions_include_stop_sequences
boolean or null
Default: true

By default, we include any stop_sequences in penalty_exceptions, to not penalize the presence of stop sequences that are present in few-shot prompts to provide structure to your completions.

You can set this to false if you do not want this behavior.

See the description of penalty_exceptions above for more information on what penalty_exceptions are used for.

best_of
integer or null
Default: null

best_of completions are created server-side. The completion with the highest log probability per token is returned. If the parameter n is greater than 1, the n best completions are returned. best_of must be strictly greater than n.

n
integer or null
Default: 1

Number of completions to be returned. If only argmax sampling is used (temperature, top_k, and top_p all at their defaults), the same completion will be produced n times. This parameter should only be increased if random sampling is used.

logit_bias
object or null
Default: null
stop_sequences
Array of strings or null
Default: null

List of strings that will stop generation if they are generated. Stop sequences can be helpful in structured texts. Example: in a question answering scenario, a text may consist of lines starting with either "Question: " or "Answer: " (alternating). After producing an answer, the model is likely to generate "Question: ". "Question: " may therefore be used as a stop sequence to keep the model from generating more questions and to restrict generation to the answers.

tokens
boolean or null
Default: false

Flag indicating whether individual tokens of the completion are to be returned (True) or whether solely the generated text (i.e. the completion) is sufficient (False).

disable_optimizations
boolean or null
Default: false

We continually research optimal ways to work with our models. By default, we apply these optimizations to both your prompt and completion for you. Our goal is to improve your results while using our API. But you can always pass disable_optimizations: true and we will leave your prompt and completion untouched.

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-base",
  • "prompt": "An apple a day",
  • "maximum_tokens": 64
}

Response samples

Content type
application/json
{
  • "completions": [
    ],
  • "model_version": "2021-12",
  • "optimized_prompt": "An apple a day"
}
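
A hedged request example from Python; the /complete path and base URL are assumptions, and the sampling settings are just one plausible configuration.

import requests

response = requests.post(
    "https://api.aleph-alpha.com/complete",          # assumed path
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "luminous-base",
        "prompt": "An apple a day",
        "maximum_tokens": 64,
        "temperature": 0.9,                          # random sampling for a more "creative" text
        "stop_sequences": ["\n"],                    # stop at the end of the line
    },
)
body = response.json()
print(body["model_version"])
print(body["completions"])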

Embeds a text

Embeds a text using a specific model. The resulting vectors can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers). To obtain a valid model, use GET /models_available.

Authorizations:
token
Request Body schema: application/json
model
required
string

Name of the model to use. A model name refers to a model architecture (among other things, the number of parameters). The latest version of the model is always used; the model output contains information about the model version.

hosting
string or null (Hosting)
Enum: "aleph-alpha" null

Optional parameter that determines in which datacenters the request may be processed. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

required
Text Prompt (string) or (Array of Multimodal (Text (object) or Image (object) or Token Ids (object))) (Prompt)
layers
Array of integers

A list of layer indices from which to return embeddings.

- Index 0 corresponds to the word embeddings used as input to the first transformer layer

- Index 1 corresponds to the hidden state as output by the first transformer layer, index 2 to the output of the second layer etc.

- Index -1 corresponds to the last transformer layer (not the language modelling head), index -2 to the second last
tokens
boolean or null

Flag indicating whether the tokenized prompt is to be returned (True) or not (False)

pooling
Array of strings

Pooling operation to use. Pooling operations include:

- mean: aggregate token embeddings across the sequence dimension using an average

- weighted_mean: position weighted mean across sequence dimension with latter tokens having a higher weight

- max: aggregate token embeddings across the sequence dimension using a maximum

- last_token: just use the last token

- abs_max: aggregate token embeddings across the sequence dimension using a maximum of absolute values
type
string or null

Explicitly sets the embedding type to be passed to the model. Unstable and experimental feature.

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-base",
  • "prompt": "An apple a day keeps the doctor away.",
  • "layers": [
    ],
  • "tokens": false,
  • "pooling": [
    ],
  • "type": "default"
}

Response samples

Content type
application/json
{
  • "model_version": "2021-12",
  • "embeddings": {
    },
  • "tokens": null
}
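
A hedged embedding request; the /embed path is an assumption. The example asks for the last layer's hidden states pooled with a mean over the sequence.

import requests

response = requests.post(
    "https://api.aleph-alpha.com/embed",             # assumed path
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "luminous-base",
        "prompt": "An apple a day keeps the doctor away.",
        "layers": [-1],                              # last transformer layer
        "pooling": ["mean"],                         # average over the sequence dimension
    },
)
print(response.json()["embeddings"])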

Semantic Embeddings

Embeds a prompt using a specific model and semantic embedding method. The resulting vectors can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers). To obtain a valid model, use GET /models_available.

Authorizations:
token
Request Body schema: application/json
model
required
string

Name of the model to use. A model name refers to a model architecture (among other things, the number of parameters). The latest version of the model is always used; the model output contains information about the model version.

hosting
string or null (Hosting)
Enum: "aleph-alpha" null

Optional parameter that determines in which datacenters the request may be processed. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

required
Text Prompt (string) or (Array of Multimodal (Text (object) or Image (object) or Token Ids (object))) (Prompt)
representation
required
string
Enum: "symmetric" "document" "query"

Type of embedding representation to embed the prompt with.

"symmetric" is useful for comparing prompts to each other, in use cases such as clustering, classification, similarity, etc. "symmetric" embeddings should be compared with other "symmetric" embeddings.

"document" and "query" are used together in use cases such as search where you want to compare shorter queries against larger documents.

"query" embeddings are optimized for shorter texts, such as questions or keywords.

"document" embeddings are optimized for larger pieces of text to compare queries against.

compress_to_size
integer or null
Value: 128

The default behavior is to return the full embedding, but you can optionally request an embedding compressed to a smaller set of dimensions.

Full embedding sizes for supported models:

  • luminous-base: 5120

The 128 size is expected to have a small drop in accuracy performance (4-6%), with the benefit of being much smaller, which makes comparing these embeddings much faster for use cases where speed is critical.

The 128 size can also perform better if you are embedding really short texts or documents.

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-base",
  • "prompt": "An apple a day keeps the doctor away.",
  • "representation": "symmetric",
  • "compress_to_size": 128
}

Response samples

Content type
application/json
{
  • "model_version": "2021-12",
  • "embedding": [
    ]
}
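
A hedged sketch of the typical search pattern: embed a query with the "query" representation, a document with the "document" representation, and compare the two with cosine similarity. The /semantic_embed path is an assumption.

import requests
import numpy as np

def semantic_embed(prompt, representation):
    response = requests.post(
        "https://api.aleph-alpha.com/semantic_embed",    # assumed path
        headers={"Authorization": "Bearer <token>"},
        json={
            "model": "luminous-base",
            "prompt": prompt,
            "representation": representation,
            "compress_to_size": 128,
        },
    )
    return np.array(response.json()["embedding"])

query = semantic_embed("What keeps the doctor away?", "query")
document = semantic_embed("An apple a day keeps the doctor away.", "document")
similarity = float(query @ document / (np.linalg.norm(query) * np.linalg.norm(document)))
print(similarity)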

Evaluate likelihood of text

Evaluates the model's likelihood to produce a completion given a prompt.

Authorizations:
token
Request Body schema: application/json
model
required
string

Name of the model to use. A model name refers to a model architecture (among other things, the number of parameters). The latest version of the model is always used; the model output contains information about the model version.

hosting
string or null (Hosting)
Enum: "aleph-alpha" null

Optional parameter that determines in which datacenters the request may be processed. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

required
Text Prompt (string) or (Array of Multimodal (Text (object) or Image (object) or Token Ids (object))) (Prompt)
completion_expected
required
string

The text to be completed. Unconditional completion can be used with an empty string (default). The prompt may contain a zero-shot or few-shot task.

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-base",
  • "prompt": "An apple a day",
  • "completion_expected": "keeps the doctor away."
}

Response samples

Content type
application/json
{
  • "model_version": "2021-12",
  • "result": {
    }
}
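
A hedged sketch of comparing how likely the model considers two candidate completions of the same prompt; the /evaluate path is an assumption.

import requests

for candidate in ["keeps the doctor away.", "keeps the dentist away."]:
    response = requests.post(
        "https://api.aleph-alpha.com/evaluate",          # assumed path
        headers={"Authorization": "Bearer <token>"},
        json={
            "model": "luminous-base",
            "prompt": "An apple a day",
            "completion_expected": candidate,
        },
    )
    print(candidate, response.json()["result"])          # result contains the likelihood scores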

Tokenize a prompt

Tokenize a prompt for a specific model. To obtain a valid model, use GET /models_available.

Authorizations:
token
Request Body schema: application/json
model
required
string
prompt
required
string
tokens
required
boolean
token_ids
required
boolean

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-base",
  • "prompt": "An apple a day keeps the doctor away.",
  • "tokens": true,
  • "token_ids": true
}

Response samples

Content type
application/json
{
  • "tokens": [
    ],
  • "token_ids": [
    ]
}
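
A hedged tokenization example; the /tokenize path is an assumption.

import requests

response = requests.post(
    "https://api.aleph-alpha.com/tokenize",              # assumed path
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "luminous-base",
        "prompt": "An apple a day keeps the doctor away.",
        "tokens": True,
        "token_ids": True,
    },
)
body = response.json()
print(body["tokens"])
print(body["token_ids"])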

Detokenize a list of tokens

Detokenize a list of tokens into a string. To obtain a valid model, use GET /models_available.

Authorizations:
token
Request Body schema: application/json
model
required
string
token_ids
required
Array of integers

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-base",
  • "token_ids": [
    ]
}

Response samples

Content type
application/json
{
  • "result": " An apple a day keeps the doctor away."
}
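
A hedged sketch of the inverse operation; the /detokenize path is an assumption, and token_ids would typically be the list returned by the tokenize endpoint above.

import requests

def detokenize(token_ids):
    response = requests.post(
        "https://api.aleph-alpha.com/detokenize",        # assumed path
        headers={"Authorization": "Bearer <token>"},
        json={"model": "luminous-base", "token_ids": token_ids},
    )
    return response.json()["result"]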

Answers a question about a prompt.

Will answer a question about a prompt. To obtain a valid model, use GET /models_available and look for a model that returns qa_support=true.

Authorizations:
token
Request Body schema: application/json
model
required
string

Name of the model to use. A model name refers to a model architecture (among other things, the number of parameters). The latest version of the model is always used; the model output contains information about the model version.

hosting
string or null (Hosting)
Enum: "aleph-alpha" null

Optional parameter that determines in which datacenters the request may be processed. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

query
required
string

The question to be answered about the prompt by the model. The prompt may not contain a valid answer.

required
Array of Docx (object) or Text (object) or Prompt (object) (Document)

A list of documents. Valid document formats for tasks like Q&A and Summarization.

These can be one of the following formats:

  • Docx: A base64 encoded Docx file
  • Text: A string of text
  • Prompt: A multimodal prompt, as is used in our other tasks like Completion

Documents of types Docx and Text are usually preferred, and will have optimizations (such as chunking) applied to work better with the respective task that is being run.

Prompt documents are assumed to be used for advanced use cases, and will be left as-is.

maximum_tokens
required
integer

The maximum number of tokens to be generated. Completion will terminate after the maximum number of tokens is reached.

Increase this value to generate longer texts. A text is split into tokens, and there are usually more tokens than words. The maximum supported sum of prompt tokens and maximum_tokens depends on the model (for luminous-base, it may not exceed 2048 tokens).

max_chunk_size
integer or null
Default: 175

Long documents will be split into chunks if they exceed max_chunk_size. The splitting will be done along the following boundaries until all chunks are shorter than max_chunk_size or all splitting criteria have been exhausted; a rough sketch of the procedure follows the list. The splitting boundaries are, in the given order:

  1. Split first by double newline (assumed to mark the boundary between 2 paragraphs).
  2. Split paragraphs that are still too long by their median sentence as long as we can still find multiple sentences in the paragraph.
  3. Split each remaining chunk of a paragraph or sentence further along white spaces until each chunk is smaller than max_chunk_size or until no whitespace can be found anymore.
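
A rough sketch of that splitting strategy, to make the order of the boundaries concrete. It is an approximation only: the real service measures chunk size in model tokens, while this sketch simply counts whitespace-separated words.

import re

def split_into_chunks(text, max_chunk_size):
    def too_long(piece):
        return len(piece.split()) > max_chunk_size       # stand-in for a token count
    if not too_long(text):
        return [text]
    parts = [p for p in text.split("\n\n") if p.strip()] # 1. paragraph boundaries
    if len(parts) == 1:
        sentences = re.split(r"(?<=[.!?])\s+", text)
        if len(sentences) > 1:                           # 2. split at the median sentence
            middle = len(sentences) // 2
            parts = [" ".join(sentences[:middle]), " ".join(sentences[middle:])]
        else:                                            # 3. fall back to whitespace
            words = text.split()
            if len(words) < 2:
                return [text]
            middle = len(words) // 2
            parts = [" ".join(words[:middle]), " ".join(words[middle:])]
    chunks = []
    for part in parts:
        chunks.extend(split_into_chunks(part, max_chunk_size))
    return chunks
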
disable_optimizations
boolean or null
Default: false

We continually research optimal ways to work with our models. By default, we apply these optimizations to your query, documents, and answers for you. Our goal is to improve your results while using our API. But you can always pass disable_optimizations: true and we will leave your query, documents, and answers untouched.

max_answers
integer
Default: 0

The upper limit on the number of answers to return.

min_score
integer
Default: 0

The lower limit on the score of every returned answer.

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-extended",
  • "query": "Who likes Pizza?",
  • "documents": [
    ],
  • "maximum_tokens": 64
}

Response samples

Content type
application/json
{
  • "answers": [
    ],
  • "model_version": "2021-12"
}
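
A hedged Q&A example with a single Text document; the /qa path and the {"text": ...} document shape are assumptions based on the formats listed above.

import requests

response = requests.post(
    "https://api.aleph-alpha.com/qa",                    # assumed path
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "luminous-extended",
        "query": "Who likes Pizza?",
        "documents": [{"text": "Andreas likes Pizza."}], # assumed shape of a Text document
        "maximum_tokens": 64,
    },
)
for answer in response.json()["answers"]:
    print(answer)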

Summarizes a document

Will summarize a document using a specific model. To obtain a valid model, use GET /models_available.

Authorizations:
token
Request Body schema: application/json
model
required
string

Name of the model to use. A model name refers to a model architecture (among other things, the number of parameters). The latest version of the model is always used; the model output contains information about the model version.

hosting
string or null (Hosting)
Enum: "aleph-alpha" null

Optional parameter that determines in which datacenters the request may be processed. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.

Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

required
Docx (object) or Text (object) or Prompt (object) (Document)

Valid document formats for tasks like Q&A and Summarization.

These can be one of the following formats:

  • Docx: A base64 encoded Docx file
  • Text: A string of text
  • Prompt: A multimodal prompt, as is used in our other tasks like Completion

Documents of types Docx and Text are usually preferred, and will have optimizations (such as chunking) applied to work better with the respective task that is being run.

Prompt documents are assumed to be used for advanced use cases, and will be left as-is.

disable_optimizations
boolean or null
Default: false

We continually research optimal ways to work with our models. By default, we apply these optimizations to your document and summary for you. Our goal is to improve your results while using our API. But you can always pass disable_optimizations: true and we will leave your document and summary untouched.

Responses

Request samples

Content type
application/json
{
  • "model": "luminous-extended",
  • "document": {
    }
}

Response samples

Content type
application/json
{
  • "summary": "All people love food",
  • "model_version": "2021-12"
}
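
A hedged summarization example; the /summarize path and the {"text": ...} document shape are assumptions based on the formats listed above.

import requests

response = requests.post(
    "https://api.aleph-alpha.com/summarize",             # assumed path
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "luminous-extended",
        "document": {"text": "Andreas loves pizza. Maria loves pasta. Everyone loves food."},
    },
)
print(response.json()["summary"])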