Evaluate

POST /evaluate

Evaluates the model's likelihood to produce a completion given a prompt.

Request

Query Parameters

nice boolean

Setting this to true signals to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones.

application/json

Body

required

model string required

Name of the model to use. A model name refers to a model architecture (number of parameters, among other things). The latest version of the model is always used; the response reports the model version in the model_version field.

hosting Hosting nullable

Possible values: [aleph-alpha, null]

Optional parameter that specifies which datacenters may process the request. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximum availability.

Setting it to "aleph-alpha" restricts processing of the request to our own datacenters. Choose this option for maximal data privacy.

prompt object required

This field is used to send prompts to the model. A prompt can either be a text prompt or a multimodal prompt. A text prompt is a string of text. A multimodal prompt is an array of prompt items. It can be a combination of text, images, and token ID arrays.

In the case of a multimodal prompt, the prompt items will be concatenated and a single prompt will be used for the model.

Tokenization:

Token ID arrays are used as-is.
Text prompt items are tokenized using the tokenizers specific to the model.
Each image is converted into 144 tokens.
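As a minimal sketch of how the prompt forms fit together, the following Python builds a multimodal prompt as plain dictionaries and counts the tokens that do not depend on the tokenizer (144 per image, one per token ID). The image string and token IDs are placeholders, and the helper names build_multimodal_prompt and tokenizer_independent_count are hypothetical, not part of any client library.

```python
import json
from typing import Dict, List

IMAGE_TOKENS = 144  # each image is converted into 144 tokens

def build_multimodal_prompt(text: str, image_b64: str, token_ids: List[int]) -> List[Dict]:
    """Assemble a multimodal prompt; the items are concatenated in this order."""
    return [
        {"type": "text", "data": text},
        {"type": "image", "data": image_b64},
        {"type": "token_ids", "data": token_ids},
    ]

def tokenizer_independent_count(prompt: List[Dict]) -> int:
    """Tokens contributed by image and token_ids items.

    Text items are skipped: their length depends on the model's tokenizer.
    """
    total = 0
    for item in prompt:
        if item["type"] == "image":
            total += IMAGE_TOKENS
        elif item["type"] == "token_ids":
            total += len(item["data"])  # token IDs are used as-is
    return total

# Placeholder image data and token IDs, for illustration only.
prompt = build_multimodal_prompt("Describe the image:", "<base64-encoded image>", [1, 2, 3])
print(json.dumps(prompt, indent=2))
print(tokenizer_independent_count(prompt))  # 147 (144 for the image + 3 token IDs)
```

A plain string is still a valid prompt on its own; the array form is only needed once images or token IDs are involved.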

oneOf

Text Prompt
Multimodal

string

Array [

oneOf

Text
Image
Token Ids

Text item:

type string required

Possible values: [text]

data string required

controls object[]

Array [

start integer required

Starting character index to apply the factor to.

length integer required

The number of characters to apply the factor to.

factor number required

Factor to apply to the given token in the attention matrix.

0 <= factor < 1 => Suppress the given token
factor == 1 => identity operation, no change to attention
factor > 1 => Amplify the given token

token_overlap string

Possible values: [partial, complete]

Default value: partial

What to do if a control partially overlaps with a text token.

If set to "partial", the factor is adjusted in proportion to how much of the token the control overlaps. For example, a control with a factor of 2.0 that covers only 2 of a token's 4 characters is adjusted to 1.5. (The adjusted factor always moves closer to 1, since 1 is the identity operation for control factors.)

If set to "complete", the full factor will be applied as long as the control overlaps with the token at all.

]
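For text controls, the "partial" rule can be sketched as a linear interpolation between the identity factor 1 and the requested factor, which reproduces the worked example (2.0 over 2 of 4 characters gives 1.5). The Python below is a sketch under that assumption; char_overlap_fraction and adjusted_factor are hypothetical helpers, and the character span follows the control's start/length convention.

```python
def char_overlap_fraction(control_start: int, control_length: int,
                          token_start: int, token_end: int) -> float:
    """Fraction of a token's characters [token_start, token_end) covered by a control."""
    overlap = min(control_start + control_length, token_end) - max(control_start, token_start)
    return max(overlap, 0) / (token_end - token_start)

def adjusted_factor(factor: float, overlap_fraction: float,
                    token_overlap: str = "partial") -> float:
    """Effective factor for a token the control overlaps (assumed linear rule)."""
    if not 0.0 < overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be in (0, 1]")
    if token_overlap == "complete":
        return factor  # full factor whenever the control touches the token
    if token_overlap == "partial":
        # Move from 1 (identity) toward the requested factor in proportion to the overlap.
        return 1.0 + (factor - 1.0) * overlap_fraction
    raise ValueError(f"unknown token_overlap: {token_overlap!r}")

# A token spanning characters 4..7 and a control starting at character 6:
fraction = char_overlap_fraction(control_start=6, control_length=10, token_start=4, token_end=8)
print(fraction)                                   # 0.5
print(adjusted_factor(2.0, fraction))             # 1.5
print(adjusted_factor(2.0, fraction, "complete")) # 2.0
print(adjusted_factor(0.5, fraction))             # 0.75 (suppression is also weakened toward 1)
```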

Image item:

type string required

Possible values: [image]

data string required

An image sent as part of a prompt to a model. The image is represented as a base64-encoded string.

Note: The models operate on square images. All non-square images are center-cropped before going to the model, so portions of the image may not be visible.

You can supply cropping parameters (x, y, size) to select a different area of the image than the center crop, or you can transform the image into a square yourself before sending it.

x integer

x-coordinate of the top-left corner of the cropping box, in pixels.

y integer

y-coordinate of the top-left corner of the cropping box, in pixels.

size integer

Size of the cropping square in pixels

controls object[]

Array [

rect object required

Bounding box in logical coordinates, ranging from 0 to 1, with (0,0) being the upper-left corner, relative to the entire image.

Keep in mind that non-square images are center-cropped by default before going to the model (you can specify a custom crop instead). Since control coordinates are relative to the entire image, all or part of your control may lie outside the "model visible area".

left number required

x-coordinate of the top-left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the left edge and 1 is the right edge of the image.

top number required

y-coordinate of the top-left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the top pixel row and 1 is the bottom row.

width number required

Width of the control bounding box. Must be a value between 0 and 1, where 1 means the full width of the image.

height number required

Height of the control bounding box. Must be a value between 0 and 1, where 1 means the full height of the image.

factor number required

Factor to apply to the given token in the attention matrix.

0 <= factor < 1 => Suppress the given token
factor == 1 => identity operation, no change to attention
factor > 1 => Amplify the given token

token_overlap string

Possible values: [partial, complete]

Default value: partial

What to do if a control partially overlaps with an image token.

If set to "partial", the factor is adjusted in proportion to how much of the token the control overlaps. For example, a control with a factor of 2.0 that covers only half of an image "tile" is adjusted to 1.5. (The adjusted factor always moves closer to 1, since 1 is the identity operation for control factors.)

If set to "complete", the full factor will be applied as long as the control overlaps with the token at all.

]
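The image rules above (base64 data, a default center crop, an optional x/y/size crop, and control rectangles expressed relative to the entire image) can be sketched in Python to check how much of a control the model will actually see. This is a sketch under assumptions: the exact rounding of the service's center crop is not documented here, so center_crop_box only illustrates the idea, and Rect, image_item, visible_rect, and visible_fraction are hypothetical helpers.

```python
import base64
from typing import NamedTuple, Optional, Tuple

class Rect(NamedTuple):
    """Bounding box in logical (0..1) coordinates relative to the entire image."""
    left: float
    top: float
    width: float
    height: float

def center_crop_box(width: int, height: int) -> Tuple[int, int, int]:
    """Pixel (x, y, size) of the largest centered square (rounding is an assumption)."""
    size = min(width, height)
    return (width - size) // 2, (height - size) // 2, size

def image_item(image_bytes: bytes, crop: Optional[Tuple[int, int, int]] = None) -> dict:
    """Image prompt item with base64 data and an optional explicit cropping square."""
    item = {"type": "image", "data": base64.b64encode(image_bytes).decode("ascii")}
    if crop is not None:
        x, y, size = crop
        item.update({"x": x, "y": y, "size": size})
    return item

def visible_rect(width: int, height: int, crop: Optional[Tuple[int, int, int]] = None) -> Rect:
    """Model-visible area in logical coordinates; defaults to the center crop."""
    x, y, size = crop if crop is not None else center_crop_box(width, height)
    return Rect(x / width, y / height, size / width, size / height)

def visible_fraction(control: Rect, visible: Rect) -> float:
    """Share of a control's area that falls inside the model-visible area."""
    w = min(control.left + control.width, visible.left + visible.width) - max(control.left, visible.left)
    h = min(control.top + control.height, visible.top + visible.height) - max(control.top, visible.top)
    area = control.width * control.height
    if w <= 0 or h <= 0 or area == 0:
        return 0.0
    return (w * h) / area

# A 1200 x 800 image: the default center crop is the 800 x 800 square at x=200.
print(center_crop_box(1200, 800))  # (200, 0, 800)

# A control over the left quarter of the image is only about one third visible.
left_quarter = Rect(left=0.0, top=0.0, width=0.25, height=1.0)
print(visible_fraction(left_quarter, visible_rect(1200, 800)))

# Cropping the left-most square instead makes the whole control visible.
item = image_item(b"placeholder image bytes", crop=(0, 0, 800))
print(visible_fraction(left_quarter, visible_rect(1200, 800, crop=(0, 0, 800))))  # 1.0
```

Computing the visible fraction client-side is one way to catch controls that silently fall outside the cropped square before sending a request.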

Token Ids item:

type string required

Possible values: [token_ids]

data integer[] required

controls object[]

Array [

index integer required

Index of the token, relative to the list of token IDs in the current prompt item.

factor number required

Factor to apply to the given token in the attention matrix.

0 <= factor < 1 => Suppress the given token
factor == 1 => identity operation, no change to attention
factor > 1 => Amplify the given token

]

completion_expected string required

The completion you expect the model to produce for the prompt; its likelihood is what gets evaluated. Use an empty string (the default) for unconditional completion. The prompt may contain a zero-shot or few-shot task.
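Putting the query parameter and the required body fields together, the Python sketch below builds (but does not send) a POST /evaluate request with the standard library. The base URL, the bearer-token Authorization header, and the model name "luminous-base" are assumptions not stated on this page; build_evaluate_request is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: the base URL and bearer-token auth are assumptions, not part of this page.
BASE_URL = "https://api.aleph-alpha.com"
API_TOKEN = "YOUR_API_TOKEN"

def build_evaluate_request(model, prompt, completion_expected, hosting=None, nice=False):
    """Build (but do not send) a POST /evaluate request."""
    body = {"model": model, "prompt": prompt, "completion_expected": completion_expected}
    if hosting is not None:
        body["hosting"] = hosting  # "aleph-alpha": process only in Aleph Alpha datacenters
    url = f"{BASE_URL}/evaluate"
    if nice:
        url += "?" + urllib.parse.urlencode({"nice": "true"})  # de-prioritize this request
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {API_TOKEN}"},
        method="POST",
    )

req = build_evaluate_request(
    model="luminous-base",  # hypothetical model name
    prompt="An apple a day",
    completion_expected=" keeps the doctor away.",
    hosting="aleph-alpha",
    nice=True,
)
print(req.full_url)  # https://api.aleph-alpha.com/evaluate?nice=true
# To send it: json.load(urllib.request.urlopen(req))
```

Leaving hosting unset omits the field entirely, which the API treats the same as null.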

contextual_control_threshold number nullable

If set to null, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.
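To illustrate the thresholding rule, the Python sketch below uses toy two-dimensional vectors in place of the model's token embeddings (an assumption: the real embeddings are internal to the model) and returns the tokens a control on one token would reach for a given contextual_control_threshold. The function name tokens_receiving_control is hypothetical, and the exact comparison the service performs may differ.

```python
import math
from typing import List, Optional, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def tokens_receiving_control(controlled: int, embeddings: List[Sequence[float]],
                             threshold: Optional[float]) -> List[int]:
    """Indices of tokens that a control set on token `controlled` would apply to."""
    if threshold is None:
        return [controlled]  # null: only the explicitly controlled token
    anchor = embeddings[controlled]
    return [i for i, emb in enumerate(embeddings)
            if cosine_similarity(anchor, emb) >= threshold]

embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy embeddings, not real model values
print(tokens_receiving_control(0, embeddings, None))  # [0]
print(tokens_receiving_control(0, embeddings, 0.9))   # [0, 1]
```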

control_log_additive boolean

Default value: true

true: apply controls on prompt items by adding log(control_factor) to the attention scores.

false: apply controls on prompt items by (attention_scores - -attention_scores.min(-1)) * control_factor.
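With control_log_additive set to true, a control adds log(control_factor) to a token's attention score. Assuming these scores are the pre-softmax logits, adding log(f) multiplies the token's unnormalized weight exp(score) by f, which is why a factor of 0 removes the token and a factor of 1 changes nothing. The Python sketch below shows this on a toy score vector for a single query; it is not the model's attention implementation and does not cover the false mode.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_log_additive_control(scores: List[float], index: int, factor: float) -> List[float]:
    """Add log(factor) to one token's score; factor 0 maps to -inf (full suppression)."""
    if factor < 0:
        raise ValueError("control factors must be >= 0")
    adjusted = list(scores)
    adjusted[index] += math.log(factor) if factor > 0 else -math.inf
    return adjusted

scores = [1.0, 1.0, 1.0]
base = softmax(scores)                                         # each about 0.333
boosted = softmax(apply_log_additive_control(scores, 0, 2.0))  # about [0.5, 0.25, 0.25]
muted = softmax(apply_log_additive_control(scores, 0, 0.0))    # [0.0, 0.5, 0.5]
print(base, boosted, muted)
```

Doubling a token's factor doubles its weight relative to the others before renormalization, so the boosted token ends up with twice the attention of each untouched token.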

Responses

application/json


Schema

model_version string

Model name and version (if any) of the model used for inference.

result object

Dictionary of result metrics for the evaluation.

log_probability number nullable

Log probability of producing the expected completion given the prompt. This metric refers to all tokens and therefore depends on the tokenizer used; it cannot be directly compared across models with different tokenizers.

log_perplexity number nullable

Log perplexity of the expected completion given the prompt. This metric refers to all tokens and therefore depends on the tokenizer used; it cannot be directly compared across models with different tokenizers.

log_perplexity_per_token number nullable

Log perplexity of the expected completion given the prompt, normalized by the number of tokens. Because this metric is a per-token average, it depends on the tokenizer used; it cannot be directly compared across models with different tokenizers.

log_perplexity_per_character number nullable

Log perplexity of the expected completion given the prompt, normalized by the number of characters. This metric is independent of the tokenizer and can be directly compared across models with different tokenizers.

correct_greedy boolean nullable

Flag indicating whether a greedy completion would have produced the expected completion.

token_count integer nullable

Number of tokens in the expected completion.

character_count integer nullable

Number of characters in the expected completion.

completion string nullable

Argmax completion given the input consisting of the prompt and the expected completion. This may be used as an indicator of what the model would have produced. Because only a single forward pass is performed, the text may be incoherent, especially for long expected completions.

Example (from schema):

{
  "model_version": "2021-12",
  "result": {
    "log_probability": -1.2281955,
    "log_perplexity": 1.2281955,
    "log_perplexity_per_token": 0.24563909,
    "log_perplexity_per_character": 1.2281955,
    "correct_greedy": true,
    "token_count": 5,
    "character_count": 1,
    "completion": " keeps the doctor away."
  }
}
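The example values are consistent with log_perplexity being the negated log_probability, and with the per-token and per-character variants dividing it by token_count and character_count (1.2281955 / 5 = 0.2456391, matching 0.24563909 up to rounding). The Python sketch below recomputes the fields under that assumption, which is inferred from the example rather than stated in the schema; derived_metrics is a hypothetical helper.

```python
import math
from typing import Optional

# Values copied from the example response above.
example_result = {
    "log_probability": -1.2281955,
    "log_perplexity": 1.2281955,
    "log_perplexity_per_token": 0.24563909,
    "log_perplexity_per_character": 1.2281955,
    "token_count": 5,
    "character_count": 1,
}

def derived_metrics(result: dict) -> Optional[dict]:
    """Recompute the perplexity fields from log_probability and the counts.

    Assumes log_perplexity = -log_probability and that the per-token and
    per-character variants divide by token_count and character_count.
    Returns None if an input is null or a count is zero (all fields are nullable).
    """
    lp = result.get("log_probability")
    tokens, chars = result.get("token_count"), result.get("character_count")
    if lp is None or not tokens or not chars:
        return None
    log_perplexity = -lp
    return {
        "log_perplexity": log_perplexity,
        "log_perplexity_per_token": log_perplexity / tokens,
        "log_perplexity_per_character": log_perplexity / chars,
    }

for name, value in derived_metrics(example_result).items():
    # Each recomputed value should match the reported one up to rounding.
    print(name, value, math.isclose(value, example_result[name], rel_tol=1e-6))
```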
