Completion

POST /complete

Completes a prompt using the specified model. To obtain a list of valid models, use GET /models_available.

Request

Query Parameters

nice boolean

Setting this to true signals to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones.
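As a minimal sketch of how the query parameter and the JSON body fit together, the following builds (but does not send) a request with the Python standard library. The host `https://api.example.com`, the helper name `build_complete_request`, and the model name `luminous-base` are placeholders chosen for illustration; authentication is not covered here.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host (assumption); substitute the base URL of your API deployment.
BASE_URL = "https://api.example.com"


def build_complete_request(model, prompt, maximum_tokens, nice=False):
    """Build, but do not send, a POST /complete request."""
    # `nice` travels as a query parameter, not inside the JSON body.
    query = urllib.parse.urlencode({"nice": "true" if nice else "false"})
    body = {"model": model, "prompt": prompt, "maximum_tokens": maximum_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/complete?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "luminous-base" is an illustrative model name (assumption).
request = build_complete_request("luminous-base", "An apple a day", 5, nice=True)
print(request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, since it needs a real host and credentials.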

application/json

Body

required

model string required

The name of the model from the Luminous model family. Models and their respective architectures can differ in parameter size and capabilities. The most recent version of the model is always used. The model output contains information as to the model version.

hosting Hosting nullable

Possible values: [aleph-alpha, null]

Optional parameter that specifies which datacenters may process the request. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximum availability.

Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

prompt object required

This field is used to send prompts to the model. A prompt can either be a text prompt or a multimodal prompt. A text prompt is a string of text. A multimodal prompt is an array of prompt items. It can be a combination of text, images, and token ID arrays.

In the case of a multimodal prompt, the prompt items will be concatenated and a single prompt will be used for the model.

Tokenization:

Token ID arrays are used as-is.
Text prompt items are tokenized using the tokenizers specific to the model.
Each image is converted into 144 tokens.

oneOf

Text Prompt
Multimodal

string

Array [

oneOf

Text
Image
Token Ids

type string required

Possible values: [text]

data string required

controls object[]

Array [

start integer required

Starting character index to apply the factor to.

length integer required

The number of characters to apply the factor to.

factor number required

Factor to apply to the given token in the attention matrix.

0 <= factor < 1 => Suppress the given token
factor == 1 => identity operation, no change to attention
factor > 1 => Amplify the given token

token_overlap string

Possible values: [partial, complete]

Default value: partial

What to do if a control partially overlaps with a text token.

If set to "partial", the factor will be adjusted proportionally to the share of the token that the control overlaps. For example, a factor of 2.0 on a control that covers only 2 of 4 token characters would be adjusted to 1.5. (It always moves closer to 1, since 1 is an identity operation for control factors.)

If set to "complete", the full factor will be applied as long as the control overlaps with the token at all.

]
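The "partial" adjustment described above can be sketched as a linear interpolation between 1 (no change) and the requested factor. The linear form is an assumption that reproduces the documented example (2.0 over half a token gives 1.5); the function name `adjusted_factor` is hypothetical and not part of the API.

```python
def adjusted_factor(factor, overlap_fraction, token_overlap="partial"):
    """Estimate the factor applied to one token.

    overlap_fraction is the share of the token covered by the control (0..1).
    Assumes a linear adjustment toward 1, which matches the documented example.
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")
    if overlap_fraction == 0.0:
        return 1.0  # the control does not touch this token at all
    if token_overlap == "complete":
        return factor  # any overlap applies the full factor
    # "partial": move from the identity (1) toward the factor in proportion to overlap.
    return 1.0 + (factor - 1.0) * overlap_fraction


print(adjusted_factor(2.0, 2 / 4))
print(adjusted_factor(2.0, 2 / 4, token_overlap="complete"))
```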

type string required

Possible values: [image]

data string required

An image sent as part of a prompt to a model. The image is represented as a base64-encoded string.

Note: The models operate on square images. All non-square images are center-cropped before going to the model, so portions of the image may not be visible.

You can supply specific cropping parameters if you like, to choose a different area of the image than a center-crop. Or, you can always transform the image yourself to a square before sending it.

x integer

x-coordinate of top left corner of cropping box in pixels

y integer

y-coordinate of top left corner of cropping box in pixels

size integer

Size of the cropping square in pixels
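To see which part of a non-square image a default center-crop would keep, the crop box can be estimated as below. This is a sketch under the assumption that the crop is the largest centered square with integer rounding toward the top left; the service's exact rounding is not documented here, and `center_crop_box` is a hypothetical helper.

```python
def center_crop_box(width, height):
    """Return x, y, size of the largest centered square inside a width x height image."""
    size = min(width, height)
    return {
        "x": (width - size) // 2,   # left edge of the square, in pixels
        "y": (height - size) // 2,  # top edge of the square, in pixels
        "size": size,
    }


print(center_crop_box(640, 480))
```

The returned keys match the x, y, and size cropping parameters, so the same dictionary shape can be used to describe a custom crop instead.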

controls object[]

Array [

rect object required

Bounding box in logical coordinates from 0 to 1, relative to the entire image, with (0,0) being the upper left corner.

Keep in mind that non-square images are center-cropped by default before going to the model. (You can specify a custom cropping if you want.) Since control coordinates are relative to the entire image, all or a portion of your control may be outside the "model visible area".

left number required

x-coordinate of top left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the left edge and 1 is the right edge.

top number required

y-coordinate of top left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the top pixel row and 1 is the bottom row.

width number required

Width of the control bounding box. Must be a value between 0 and 1, where 1 means the full width of the image.

height number required

Height of the control bounding box. Must be a value between 0 and 1, where 1 means the full height of the image.

factor number required

Factor to apply to the given token in the attention matrix.

0 <= factor < 1 => Suppress the given token
factor == 1 => identity operation, no change to attention
factor > 1 => Amplify the given token

token_overlap string

Possible values: [partial, complete]

Default value: partial

What to do if a control partially overlaps with an image token.

If set to "partial", the factor will be adjusted proportionally to the share of the token that the control overlaps. For example, a factor of 2.0 on a control that covers only half of the image "tile" would be adjusted to 1.5. (It always moves closer to 1, since 1 is an identity operation for control factors.)

If set to "complete", the full factor will be applied as long as the control overlaps with the token at all.

]

type string required

Possible values: [token_ids]

data integer[] required

controls object[]

Array [

index integer required

Index of the token, relative to the list of token IDs in the current prompt item.

factor number required

Factor to apply to the given token in the attention matrix.

0 <= factor < 1 => Suppress the given token
factor == 1 => identity operation, no change to attention
factor > 1 => Amplify the given token

]
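The three prompt item types can be combined into one multimodal prompt array. The sketch below only assembles the JSON-compatible structure using the field names documented above; the helper names, the placeholder image bytes, and the token IDs are illustrative assumptions.

```python
import base64


def text_item(data, controls=None):
    """Build a text prompt item, optionally with attention controls."""
    item = {"type": "text", "data": data}
    if controls:
        item["controls"] = controls
    return item


def image_item(image_bytes, x=None, y=None, size=None):
    """Build an image prompt item from raw bytes, with optional cropping."""
    item = {"type": "image", "data": base64.b64encode(image_bytes).decode("ascii")}
    # Cropping parameters are optional; only include the ones that were given.
    for key, value in (("x", x), ("y", y), ("size", size)):
        if value is not None:
            item[key] = value
    return item


def token_ids_item(token_ids):
    """Build a token ID prompt item; the IDs are used as-is."""
    return {"type": "token_ids", "data": list(token_ids)}


# Placeholder bytes standing in for the contents of a real image file.
fake_image = b"not-a-real-image"

prompt = [
    text_item(
        "Describe the picture: ",
        controls=[{"start": 0, "length": 8, "factor": 2.0, "token_overlap": "partial"}],
    ),
    image_item(fake_image, x=0, y=0, size=256),
    token_ids_item([1, 2, 3]),  # arbitrary IDs for illustration
]
```

The resulting list would be passed as the prompt field of the request body; the items are concatenated in order into a single prompt.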

maximum_tokens integer required

The maximum number of tokens to be generated. Completion will terminate after the maximum number of tokens is reached.

Increase this value to generate longer texts. A text is split into tokens. Usually there are more tokens than words. The sum of input tokens and maximum_tokens may not exceed 2048.
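A quick budget check can help avoid requests that exceed the 2048-token limit. The sketch below assumes you already know how many tokens your text and token ID items use (obtaining that count requires the model's tokenizer, which is outside this example) and uses the documented cost of 144 tokens per image. `largest_maximum_tokens` is a hypothetical helper.

```python
MAX_TOTAL_TOKENS = 2048  # documented limit on input tokens plus maximum_tokens
TOKENS_PER_IMAGE = 144   # documented token cost of one image prompt item


def largest_maximum_tokens(text_and_id_tokens, num_images=0):
    """Return the largest maximum_tokens value that still fits the limit."""
    prompt_tokens = text_and_id_tokens + num_images * TOKENS_PER_IMAGE
    remaining = MAX_TOTAL_TOKENS - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already uses the whole token budget")
    return remaining


print(largest_maximum_tokens(100, num_images=2))
```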

minimum_tokens integer

Generate at least this number of tokens before an end-of-text token is generated.

echo boolean

Default value: false

Echo the prompt in the completion. This may be especially helpful when log_probs is set to return logprobs for the prompt.

temperature number nullable

A higher sampling temperature encourages the model to produce less probable outputs ("be more creative"). Values are expected in a range from 0.0 to 1.0. Try high values (e.g., 0.9) for a more "creative" response and the default 0.0 for a well defined and repeatable answer. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.

top_k integer nullable

Introduces random sampling for generated tokens by randomly selecting the next token from the k most likely options. A value larger than 1 encourages the model to be more creative. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.

top_p number nullable

Introduces random sampling for generated tokens by randomly selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds the probability top_p. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.
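The documented order of operations (temperature first, then top_k, then top_p) can be illustrated with a small pure-Python sketch. It is not the service's implementation: treating 0 as "filter disabled" for top_k and top_p, and treating a temperature of 0 as argmax, are assumptions made for this example.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Return token -> probability after temperature, top_k, and top_p, in that order."""
    if temperature == 0:
        # Assumption: temperature 0 means deterministic argmax.
        best = max(logits, key=logits.get)
        return {best: 1.0}

    # 1. Rescale logits with the temperature, then exponentiate (softmax weights).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

    # 2. Keep only the k most likely tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # 3. Keep the smallest prefix whose cumulative probability exceeds top_p.
    if top_p > 0:
        total = sum(weight for _, weight in ranked)
        kept, cumulative = [], 0.0
        for tok, weight in ranked:
            kept.append((tok, weight))
            cumulative += weight / total
            if cumulative > top_p:
                break
        ranked = kept

    # Renormalize the surviving tokens so their probabilities sum to 1.
    total = sum(weight for _, weight in ranked)
    return {tok: weight / total for tok, weight in ranked}


example_logits = {"a": 2.0, "b": 1.0, "c": 0.0}
print(sampling_distribution(example_logits, temperature=1.0, top_k=2))
```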

presence_penalty number nullable

The presence penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or in the prompt (repetition_penalties_include_prompt=true). The presence penalty is independent of the number of occurrences. Increase the value to reduce the likelihood of repeating text. An operation like the following is applied:

logits[t] -> logits[t] - 1 * penalty

where logits[t] is the logit of any given token. Note that the formula is independent of the number of times that a token appears.

frequency_penalty number nullable

The frequency penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or in the prompt (repetition_penalties_include_prompt=true). The frequency penalty depends on the number of occurrences of a token. An operation like the following is applied:

logits[t] -> logits[t] - count[t] * penalty

where logits[t] is the logit of any given token and count[t] is the number of times that token appears.
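The two additive formulas can be compared side by side in a short sketch. It applies both penalties to a dictionary of logits based on previously seen tokens; the function name and the toy data are assumptions for illustration, and the multiplicative variants are not covered.

```python
from collections import Counter


def apply_additive_penalties(logits, previous_tokens,
                             presence_penalty=0.0, frequency_penalty=0.0):
    """Apply the additive presence and frequency penalties to a copy of logits."""
    counts = Counter(previous_tokens)
    penalized = dict(logits)
    for token, count in counts.items():
        if token in penalized:
            # Presence: a flat penalty, independent of how often the token appeared.
            penalized[token] -= 1 * presence_penalty
            # Frequency: the penalty grows with the number of occurrences.
            penalized[token] -= count * frequency_penalty
    return penalized


logits = {"a": 1.0, "b": 1.0}
print(apply_additive_penalties(logits, ["a", "a"],
                               presence_penalty=0.5, frequency_penalty=0.25))
```

Which tokens count as "previous" depends on repetition_penalties_include_prompt and repetition_penalties_include_completion; here the caller decides by choosing what to pass in.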

sequence_penalty number

Increasing the sequence penalty reduces the likelihood of reproducing token sequences that already appear in the prompt (if repetition_penalties_include_prompt is True) and prior completion.

sequence_penalty_min_length integer

Default value: 2

Minimum number of tokens to be considered a sequence.

repetition_penalties_include_prompt boolean nullable

Default value: false

Flag deciding whether the presence penalty and frequency penalty are updated from tokens in the prompt.

repetition_penalties_include_completion boolean

Default value: true

Flag deciding whether the presence penalty and frequency penalty are updated from tokens in the completion.

use_multiplicative_presence_penalty boolean nullable

Default value: false

Flag deciding whether presence penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for presence penalty.

use_multiplicative_frequency_penalty boolean

Default value: false

Flag deciding whether frequency penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for frequency penalty.

use_multiplicative_sequence_penalty boolean

Default value: false

Flag deciding whether sequence penalty is applied multiplicatively (True) or additively (False).

penalty_bias string nullable

All tokens in this text will be used in addition to the already penalized tokens for repetition penalties. These consist of the already generated completion tokens and the prompt tokens, if repetition_penalties_include_prompt is set to true.

penalty_exceptions string[] nullable

List of strings that may be generated without penalty, regardless of other penalty settings. By default, we will also include any stop_sequences you have set, since completion performance can be degraded if expected stop sequences are penalized. You can disable this behavior by setting penalty_exceptions_include_stop_sequences to false.

penalty_exceptions_include_stop_sequences boolean nullable

Default value: true

By default we include all stop_sequences in penalty_exceptions, so as not to penalise the presence of stop sequences that are present in few-shot prompts to give structure to your completions.

You can set this to false if you do not want this behaviour.

See the description of penalty_exceptions for more information on what penalty_exceptions are used for.
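The interaction between penalty_exceptions, stop_sequences, and this flag can be sketched as a simple list merge. This illustrates the documented behavior only; how the service combines the lists internally is not specified, and `effective_penalty_exceptions` is a hypothetical helper.

```python
def effective_penalty_exceptions(penalty_exceptions=None, stop_sequences=None,
                                 include_stop_sequences=True):
    """Return the strings that would be exempt from repetition penalties."""
    exceptions = list(penalty_exceptions or [])
    if include_stop_sequences:
        # Stop sequences join the exceptions unless the flag is set to false.
        for sequence in stop_sequences or []:
            if sequence not in exceptions:
                exceptions.append(sequence)
    return exceptions


print(effective_penalty_exceptions(["Q:"], ["\n", "Q:"]))
print(effective_penalty_exceptions(["Q:"], ["\n"], include_stop_sequences=False))
```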

best_of integer nullable

Possible values: <= 100

Default value: 1

If a value is given, best_of completions are generated on the server side, and the completion with the highest log probability per token is returned. If the parameter n is greater than 1, n completions are returned. best_of must be strictly greater than n.

n integer nullable

Default value: 1

The number of completions to return. If argmax sampling is used (temperature, top_k, and top_p are all left at their defaults), all completions will be identical. Increase this parameter only if random sampling is used.
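How best_of and n relate can be illustrated by ranking candidate completions and keeping the top n. The sketch interprets "highest log probability per token" as the mean of the token log probabilities, which is an assumption; the candidate data and the name `select_best` are made up for the example.

```python
def select_best(candidates, n=1):
    """Pick the n candidates with the highest mean log probability per token.

    candidates is a list of (text, token_log_probs) pairs, standing in for the
    best_of completions generated on the server side.
    """
    def mean_log_prob(candidate):
        _, log_probs = candidate
        return sum(log_probs) / max(len(log_probs), 1)

    ranked = sorted(candidates, key=mean_log_prob, reverse=True)
    return [text for text, _ in ranked[:n]]


candidates = [
    ("x", [-0.1, -0.2]),          # mean -0.15
    ("y", [-1.0]),                # mean -1.0
    ("z", [-0.05, -0.05, -0.05]), # mean -0.05
]
print(select_best(candidates, n=2))
```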

logit_bias object nullable

log_probs integer nullable

Number of top log probabilities for each token generated. Log probabilities can be used in downstream tasks or to assess the model's certainty when producing tokens. No log probabilities are returned if set to null. Log probabilities of generated tokens are returned if set to 0. Log probabilities of generated tokens and top n log probabilities are returned if set to n.

stop_sequences string[] nullable

List of strings that will stop generation if they're generated. Stop sequences may be helpful in structured texts.

tokens boolean nullable

Default value: false

Flag indicating whether individual tokens of the completion should be returned (True) or whether solely the generated text (i.e. the completion) is sufficient (False).

raw_completion boolean

Default value: false

Setting this parameter to true forces the raw completion of the model to be returned. For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the CompletionResponse. The raw completion, if returned, will contain the un-optimized completion. Setting tokens to true or log_probs to any value will also trigger the raw completion to be returned.

disable_optimizations boolean nullable

Default value: false

We continually research optimal ways to work with our models. By default, we apply these optimizations to both your prompt and completion for you. Our goal is to improve your results while using our API. But you can always pass disable_optimizations: true and we will leave your prompt and completion untouched.

completion_bias_inclusion string[]

Default value: ``

Bias the completion to only generate options within this list; all other tokens are disregarded at sampling

Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa

completion_bias_inclusion_first_token_only boolean

Default value: false

Only consider the first token for the completion_bias_inclusion

completion_bias_exclusion string[]

Default value: ``

Bias the completion to NOT generate options within this list; all other tokens are unaffected in sampling

Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa

completion_bias_exclusion_first_token_only boolean

Default value: false

Only consider the first token for the completion_bias_exclusion
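Because strings in the inclusion list must not be prefixes of strings in the exclusion list (and vice versa), it can be useful to check both lists before sending a request. The following client-side check is a sketch; the function name is hypothetical and the service may validate differently.

```python
def find_prefix_conflicts(inclusion, exclusion):
    """Return (inclusion, exclusion) pairs where one string is a prefix of the other."""
    conflicts = []
    for included in inclusion:
        for excluded in exclusion:
            if included.startswith(excluded) or excluded.startswith(included):
                conflicts.append((included, excluded))
    return conflicts


print(find_prefix_conflicts(["yes", "no"], ["not"]))
print(find_prefix_conflicts(["yes"], ["maybe"]))
```

An empty result means the two lists satisfy the prefix rule as stated above.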

contextual_control_threshold number nullable

If set to null, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.
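The thresholding described above can be sketched with toy embeddings: a control set on one token spreads to every token whose embedding has at least the given cosine similarity. The vectors and helper names are illustrative assumptions; real token embeddings come from the model and are not exposed by this endpoint.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tokens_sharing_control(controlled_index, embeddings, threshold):
    """Return indices of tokens that would receive the control."""
    if threshold is None:
        # null threshold: only the explicitly controlled token is affected.
        return [controlled_index]
    reference = embeddings[controlled_index]
    return [
        index
        for index, embedding in enumerate(embeddings)
        if index == controlled_index
        or cosine_similarity(reference, embedding) >= threshold
    ]


toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(tokens_sharing_control(0, toy_embeddings, threshold=0.9))
print(tokens_sharing_control(0, toy_embeddings, threshold=None))
```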

control_log_additive boolean

Default value: true

true: apply controls on prompt items by adding the log(control_factor) to attention scores.
false: apply controls on prompt items by (attention_scores - -attention_scores.min(-1)) * control_factor

Responses

application/json

Schema

model_version string

model name and version (if any) of the model used for inference

completions object[]

list of completions; may contain only one entry if no more are requested (see parameter n)

Array [

log_probs object nullable

list with a dictionary for each generated token. Each dictionary maps tokens (the keys) to their respective log probabilities. This field is only returned if requested with the parameter "log_probs".

completion string

generated completion on the basis of the prompt

raw_completion string nullable

For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the CompletionResponse. The raw completion, if returned, will contain the un-optimized completion. Setting the parameter raw_completion in the CompletionRequest to true forces the raw completion of the model to be returned. Setting tokens to true or log_probs to any value will also trigger the raw completion to be returned.

completion_tokens string[]

completion split into tokens. This field is only returned if requested with the parameter "tokens".

finish_reason string nullable

reason for termination of generation. This may be a stop sequence or maximum number of tokens reached.

]

optimized_prompt object[]

Describes the prompt after optimizations. This field is only returned if the disable_optimizations flag is not set and the prompt has actually changed.

Array [

oneOf

Text
Image
Token Ids

type string

Possible values: [text]

data string

type string

Possible values: [image]

data string

base64 encoded image

type string

Possible values: [token_ids]

data integer[]

]

num_tokens_prompt_total integer

Number of prompt tokens combined across all completion tasks.

In particular, if you set best_of or n to a number larger than 1 then we report the combined prompt token count for all best_of or n tasks.

Tokenization:

Token ID arrays are used as-is.
Text prompt items are tokenized using the tokenizers specific to the model.
Each image is converted into a fixed amount of tokens that depends on the chosen model.

num_tokens_generated integer

Number of generated tokens combined across all completion tasks. If multiple completions are returned or best_of is set to a value greater than 1, this value contains the combined generated token count.

Example (from schema)

{
  "completions": [
    {
      "completion": "keeps the doctor away,",
      "finish_reason": "maximum_tokens"
    }
  ],
  "model_version": "2021-12",
  "optimized_prompt": "An apple a day",
  "num_tokens_prompt_total": 4,
  "num_tokens_generated": 5
}
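
A response shaped like the example above can be read with the standard `json` module. The sketch below parses that example and pulls out the fields most callers need; the helper name `summarize_response` is hypothetical, and the sum of the two token counts is computed here for convenience, not returned by the API.

```python
import json

# The example response from above, as it might arrive over the wire.
RESPONSE_TEXT = """
{
  "completions": [
    {"completion": "keeps the doctor away,", "finish_reason": "maximum_tokens"}
  ],
  "model_version": "2021-12",
  "optimized_prompt": "An apple a day",
  "num_tokens_prompt_total": 4,
  "num_tokens_generated": 5
}
"""


def summarize_response(response_text):
    """Extract the first completion and token counts from a response body."""
    payload = json.loads(response_text)
    first = payload["completions"][0]
    return {
        "completion": first["completion"],
        # finish_reason is nullable, so fall back to None if it is absent.
        "finish_reason": first.get("finish_reason"),
        "total_tokens": payload["num_tokens_prompt_total"]
        + payload["num_tokens_generated"],
    }


summary = summarize_response(RESPONSE_TEXT)
print(summary["completion"], summary["finish_reason"], summary["total_tokens"])
```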
