Embeddings
POST /embed
Embeds a text using a specific model. The resulting vectors can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers). To obtain a valid model name, use GET /models_available.
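A minimal request sketch in Python using the requests library. The base URL, API token, and model name below are placeholders for illustration, not values confirmed by this page; use GET /models_available to find valid model names.

```python
import requests

API_BASE = "https://api.aleph-alpha.com"  # assumed base URL
TOKEN = "YOUR_API_TOKEN"                  # placeholder

response = requests.post(
    f"{API_BASE}/embed",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "model": "luminous-base",  # hypothetical; check GET /models_available
        "prompt": "An apple a day keeps the doctor away.",
        "layers": [0, 1],
        "pooling": ["max"],
    },
)
response.raise_for_status()
result = response.json()
print(result["model_version"], result["num_tokens_prompt_total"])
```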
Request
Query Parameters
nice boolean
Setting this to true signals to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones.
- application/json
Body required
- Text Prompt
- Multimodal
- Array [
- Text
- Image
- Token Ids
- ]
model string required
Name of the model to use. A model name refers to a model architecture (among other things, the number of parameters). The latest version of the model is always used; the model output contains information about the model version.
hosting string nullable
Possible values: [aleph-alpha, null]
Optional parameter that specifies which datacenters may process the request. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).
Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximum availability.
Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.
prompt object required
This field is used to send prompts to the model. A prompt can either be a text prompt or a multimodal prompt. A text prompt is a string of text. A multimodal prompt is an array of prompt items. It can be a combination of text, images, and token ID arrays.
In the case of a multimodal prompt, the prompt items will be concatenated and a single prompt will be used for the model.
Tokenization:
- Token ID arrays are used as-is.
- Text prompt items are tokenized using the tokenizers specific to the model.
- Each image is converted into 144 tokens.
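A sketch of a multimodal prompt combining all three item types. The field names for the image payload and the token IDs ("data", "token_ids") are assumptions, and the base64 string is a placeholder.

```python
# Hypothetical multimodal prompt; item field names "data" and "token_ids"
# are assumptions, and the base64 payload is a placeholder.
prompt = [
    {"type": "text", "text": "A picture of "},
    {"type": "image", "data": "<base64-encoded image>"},
    {"type": "token_ids", "token_ids": [1, 4, 9]},
]
```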
type string
Possible values: [text]
controls object[]
start integer required
Starting character index to apply the factor to.
length integer required
The number of characters to apply the factor to.
factor number required
Factor to apply to the given token in the attention matrix.
- 0 <= factor < 1 => Suppress the given token
- factor == 1 => identity operation, no change to attention
- factor > 1 => Amplify the given token
token_overlap string
Possible values: [partial, complete]
Default value: partial
What to do if a control partially overlaps with a text token.
If set to "partial", the factor will be adjusted proportionally to the amount of the token it overlaps. So a factor of 2.0 on a control that covers only 2 of 4 token characters would be adjusted to 1.5. (It always moves closer to 1, since 1 is the identity operation for control factors.)
If set to "complete", the full factor will be applied as long as the control overlaps with the token at all.
type string
Possible values: [image]
An image sent as part of a prompt to a model, represented as a base64-encoded string.
Note: The models operate on square images. All non-square images are center-cropped before going to the model, so portions of the image may not be visible.
If you want a different area of the image than the center crop, you can supply specific cropping parameters. Alternatively, you can transform the image to a square yourself before sending it.
x integer
x-coordinate of the top left corner of the cropping box, in pixels
y integer
y-coordinate of the top left corner of the cropping box, in pixels
size integer
Size of the cropping square, in pixels
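As a sketch of what the default center-crop implies (the precise server-side behavior beyond "center-cropped" is an assumption), the equivalent cropping box would be the largest centered square:

```python
def default_center_crop(width: int, height: int) -> tuple[int, int, int]:
    """Cropping box (x, y, size) for the assumed default center-crop:
    the largest square centered in the image."""
    size = min(width, height)
    return (width - size) // 2, (height - size) // 2, size

# A 640x480 image is cropped to a centered 480x480 square.
assert default_center_crop(640, 480) == (80, 0, 480)
```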
controls object[]
rect object required
Bounding box in logical coordinates, from 0 to 1, with (0,0) being the upper left corner and relative to the entire image.
Keep in mind that non-square images are center-cropped by default before going to the model (you can specify a custom cropping if you want). Since control coordinates are relative to the entire image, all or a portion of your control may be outside the "model visible area".
x-coordinate of the top left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the left corner and 1 is the right corner.
y-coordinate of the top left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the top pixel row and 1 is the bottom row.
Width of the control bounding box. Must be a value between 0 and 1, where 1 means the full width of the image.
Height of the control bounding box. Must be a value between 0 and 1, where 1 means the full height of the image.
factor number required
Factor to apply to the given token in the attention matrix.
- 0 <= factor < 1 => Suppress the given token
- factor == 1 => identity operation, no change to attention
- factor > 1 => Amplify the given token
token_overlap string
Possible values: [partial, complete]
Default value: partial
What to do if a control partially overlaps with an image token.
If set to "partial", the factor will be adjusted proportionally to the amount of the token it overlaps. So a factor of 2.0 on a control that covers only half of the image "tile" would be adjusted to 1.5. (It always moves closer to 1, since 1 is the identity operation for control factors.)
If set to "complete", the full factor will be applied as long as the control overlaps with the token at all.
type string
Possible values: [token_ids]
controls object[]
index integer required
Index of the token, relative to the list of token IDs in the current prompt item.
factor number required
Factor to apply to the given token in the attention matrix.
- 0 <= factor < 1 => Suppress the given token
- factor == 1 => identity operation, no change to attention
- factor > 1 => Amplify the given token
layers integer[]
A list of layer indices from which to return embeddings.
- Index 0 corresponds to the word embeddings used as input to the first transformer layer
- Index 1 corresponds to the hidden state as output by the first transformer layer, index 2 to the output of the second layer etc.
- Index -1 corresponds to the last transformer layer (not the language modelling head), index -2 to the second last
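A sketch of how these indices line up, for a hypothetical model with 30 transformer layers:

```python
def resolve_layer_index(index: int, num_transformer_layers: int) -> int:
    """Map a (possibly negative) layer index to its absolute position:
    0 = input word embeddings, 1..n = outputs of transformer layers 1..n,
    -1 = last transformer layer, -2 = second to last, and so on."""
    return index if index >= 0 else num_transformer_layers + 1 + index

assert resolve_layer_index(-1, 30) == 30  # output of the last layer
assert resolve_layer_index(-2, 30) == 29  # output of the second-to-last layer
```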
tokens boolean
Flag indicating whether the tokenized prompt is to be returned (true) or not (false).
pooling string[]
Pooling operation to use. Pooling operations include:
- mean: Aggregate token embeddings across the sequence dimension using an average.
- weighted_mean: Position-weighted mean across the sequence dimension, with later tokens having a higher weight.
- max: Aggregate token embeddings across the sequence dimension using a maximum.
- last_token: Use the last token.
- abs_max: Aggregate token embeddings across the sequence dimension using a maximum of absolute values.
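A sketch of these operations over a (sequence length x hidden size) matrix of token embeddings. The exact weighting scheme of weighted_mean is an assumption (linearly increasing weights, normalized to sum to 1).

```python
import numpy as np

def pool(token_embeddings: np.ndarray, operation: str) -> np.ndarray:
    """Pool a (seq_len, hidden_size) matrix across the sequence axis."""
    if operation == "mean":
        return token_embeddings.mean(axis=0)
    if operation == "weighted_mean":
        # Assumed scheme: linearly increasing weights, later tokens heavier.
        weights = np.arange(1, len(token_embeddings) + 1, dtype=float)
        return (weights / weights.sum()) @ token_embeddings
    if operation == "max":
        return token_embeddings.max(axis=0)
    if operation == "last_token":
        return token_embeddings[-1]
    if operation == "abs_max":
        # Per dimension, keep the value with the largest magnitude (with sign).
        idx = np.abs(token_embeddings).argmax(axis=0)
        return token_embeddings[idx, np.arange(token_embeddings.shape[1])]
    raise ValueError(f"unknown pooling operation: {operation}")
```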
type string
Explicitly sets the embedding type to be passed to the model. This parameter was created to allow for semantic_embed embeddings and will be deprecated. Please use the semantic_embed endpoint instead.
normalize boolean
Default value: false
Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric.
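The saving comes from the fact that the cosine similarity of unit vectors is just their dot product; a minimal sketch:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_normalized(a_unit: np.ndarray, b_unit: np.ndarray) -> float:
    # With normalize=true, embeddings already have unit norm, so the
    # norm computations above can be skipped for every comparison.
    return float(a_unit @ b_unit)
```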
contextual_control_threshold number nullable
If set to null, attention control parameters only apply to those tokens that have explicitly been set in the request.
If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.
control_log_additive boolean
Default value: true
true: apply controls on prompt items by adding log(control_factor) to attention scores.
false: apply controls on prompt items by (attention_scores - attention_scores.min(-1)) * control_factor.
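A sketch of the two modes, directly mirroring the formulas above, applied to a single row of pre-softmax attention scores:

```python
import numpy as np

def apply_control(attention_scores: np.ndarray, control_factor: float,
                  log_additive: bool) -> np.ndarray:
    """Apply a control factor to a row of pre-softmax attention scores."""
    if log_additive:
        # control_log_additive = true
        return attention_scores + np.log(control_factor)
    # control_log_additive = false
    return (attention_scores - attention_scores.min(-1)) * control_factor
```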
Responses
- 200
OK
- application/json
- Schema
- Example (from schema)
model_version string
Model name and version (if any) of the model used for inference.
embeddings object
A dict with layer names as keys and pooling outputs as values. A pooling output is a dict with the pooling operation as key and a pooled embedding (a list of floats) as value.
num_tokens_prompt_total integer
Number of tokens in the prompt.
Tokenization:
- Token ID arrays are used as-is.
- Text prompt items are tokenized using the tokenizers specific to the model.
- Each image is converted into a fixed amount of tokens that depends on the chosen model.
{
"model_version": "2021-12",
"embeddings": {
"layer_0": {
"max": [
-0.053497314,
0.0053749084,
0.06427002,
0.05316162,
-0.0044059753,
"..."
]
},
"layer_1": {
"max": [
0.14086914,
-0.24780273,
1.3232422,
-0.07055664,
1.2148438,
"..."
]
}
},
"tokens": null,
"num_tokens_prompt_total": 42
}
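A sketch of navigating this structure, assuming result holds the parsed JSON response from the request example above:

```python
# embeddings is a dict keyed by layer name ("layer_0", "layer_1", ...),
# each value a dict keyed by pooling operation ("max", "mean", ...).
for layer_name, pooled in result["embeddings"].items():
    for operation, vector in pooled.items():
        print(layer_name, operation, len(vector))
```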