
Embeddings

POST /embed

Embeds a text using a specific model. The resulting vectors can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers). To obtain a valid model name, use GET /model-settings.

Request

Query Parameters

    nice boolean

    Setting this to true signals to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones.

Body required

    model string required

    Name of the model to use. A model name refers to a model architecture (number of parameters, among other properties). The latest version of the model is always used. The response reports the model version that was used.

    hosting Hosting (string) nullable

    Optional parameter that specifies which datacenters may process the request. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).

    Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximum availability.

    Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.

    Possible values: [aleph-alpha, null]

    prompt object required

    This field is used to send prompts to the model. A prompt can either be a text prompt or a multimodal prompt. A text prompt is a string of text. A multimodal prompt is an array of prompt items. It can be a combination of text, images, and token ID arrays.

    In the case of a multimodal prompt, the prompt items will be concatenated and a single prompt will be used for the model.

    Tokenization:

    • Token ID arrays are used as-is.
    • Text prompt items are tokenized using the tokenizers specific to the model.
    • Each image is converted into 144 tokens.
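
The tokenization rules above can be sketched as a small token counter. This is a minimal illustration, not the API's implementation: the `(kind, value)` tuples are a hypothetical in-memory representation (not the wire format of prompt items), `tokenize` is a stand-in for the model-specific tokenizer, and `tokens_per_image` defaults to the 144 tokens stated above.

```python
def count_prompt_tokens(items, tokenize, tokens_per_image=144):
    """Estimate the token count of a multimodal prompt.

    items: list of (kind, value) tuples; a hypothetical representation,
           where kind is "text", "image", or "token_ids".
    tokenize: callable that turns a string into a list of tokens
              (stand-in for the model-specific tokenizer).
    """
    total = 0
    for kind, value in items:
        if kind == "token_ids":
            total += len(value)            # token ID arrays are used as-is
        elif kind == "text":
            total += len(tokenize(value))  # text goes through the tokenizer
        elif kind == "image":
            total += tokens_per_image      # each image maps to a fixed number of tokens
        else:
            raise ValueError(f"unknown prompt item kind: {kind!r}")
    return total
```

Because the prompt items are concatenated into a single prompt, the total is simply the sum over all items.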
    oneOf
    string

    The text to be completed. Unconditional completion can be started with an empty string (default). The prompt may contain a zero shot or few shot task.

    layers integer[]

    A list of layer indices from which to return embeddings.

    • Index 0 corresponds to the word embeddings used as input to the first transformer layer.
    • Index 1 corresponds to the hidden state as output by the first transformer layer, index 2 to the output of the second layer, etc.
    • Index -1 corresponds to the last transformer layer (not the language modelling head), index -2 to the second last.
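
The index convention above can be illustrated with a small helper that maps a requested layer index to a non-negative position. This is a sketch under assumptions: `resolve_layer_index` and `num_layers` are hypothetical names, and negative indices are treated like Python list indices into the `num_layers + 1` available states (word embeddings plus one output per transformer layer).

```python
def resolve_layer_index(index, num_layers):
    """Map a layer index from the request to a non-negative position.

    Position 0 is the word embeddings; positions 1..num_layers are the
    outputs of the transformer layers. Negative indices count from the
    end, so -1 is the last transformer layer (assumed Python-style).
    """
    num_states = num_layers + 1  # word embeddings + one state per layer
    if not -num_states <= index < num_states:
        raise IndexError(f"layer index {index} out of range for {num_layers} layers")
    return index if index >= 0 else num_states + index
```

For a hypothetical model with 12 transformer layers, `resolve_layer_index(-1, 12)` and `resolve_layer_index(12, 12)` refer to the same hidden state.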

    tokens boolean nullable

    Flag indicating whether the tokenized prompt is to be returned (True) or not (False)

    pooling string[]

    Pooling operation to use. Pooling operations include:

    • mean: Aggregate token embeddings across the sequence dimension using an average.
    • weighted_mean: Position-weighted mean across the sequence dimension, with later tokens having a higher weight.
    • max: Aggregate token embeddings across the sequence dimension using a maximum.
    • last_token: Use the last token.
    • abs_max: Aggregate token embeddings across the sequence dimension using a maximum of absolute values.
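
To make the pooling operations concrete, here is a minimal pure-Python sketch that reduces a list of token embeddings to one vector. It is an illustration, not the service's implementation. Two details are assumptions: `weighted_mean` uses linearly increasing weights 1, 2, ..., n (the exact weighting is not specified here), and `abs_max` returns the largest absolute value per dimension, following the wording above literally.

```python
def pool(token_embeddings, operation):
    """Reduce a sequence of token embeddings (list of equal-length lists
    of floats) to a single embedding along the sequence dimension."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n = len(token_embeddings)
    # One tuple per embedding dimension, holding that dimension's value for every token.
    columns = list(zip(*token_embeddings))
    if operation == "mean":
        return [sum(col) / n for col in columns]
    if operation == "weighted_mean":
        # Assumption: linear position weights, so later tokens count more.
        weights = range(1, n + 1)
        total = sum(weights)
        return [sum(w * v for w, v in zip(weights, col)) / total for col in columns]
    if operation == "max":
        return [max(col) for col in columns]
    if operation == "last_token":
        return list(token_embeddings[-1])
    if operation == "abs_max":
        # Assumption: literal "maximum of absolute values" per dimension.
        return [max(abs(v) for v in col) for col in columns]
    raise ValueError(f"unknown pooling operation: {operation!r}")
```

Since `pooling` is an array, several operations can be requested at once; each one yields its own pooled vector per requested layer.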

    type string nullable

    Explicitly set embedding type to be passed to the model. This parameter was created to allow for semantic_embed embeddings and will be deprecated. Please use the semantic_embed-endpoint instead.

    normalize boolean

    Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric.

    Default value: false
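
The compute saving mentioned for `normalize` comes from the fact that, for unit-length vectors, cosine similarity reduces to a plain dot product. The following sketch shows the equivalence; the helper names are illustrative only.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity for arbitrary (non-normalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

With embeddings that are already normalized, `dot(a, b)` alone gives the cosine similarity, so the two norm computations can be skipped for every comparison.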

    contextual_control_threshold number nullable

    If set to null, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.

    Default value: null
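
The behaviour described for `contextual_control_threshold` can be sketched as follows. This is a simplified model, not the service's implementation: `spread_controls`, the index-to-factor dict, and the rule that the first matching explicit control wins when several apply are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two token embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def spread_controls(token_embeddings, explicit_controls, threshold):
    """Return a dict of token index -> control factor.

    explicit_controls: dict of token index -> factor set in the request.
    threshold: None keeps only the explicit controls; otherwise each
               control is also applied to every token whose embedding has
               at least this cosine similarity to the controlled token.
    """
    controls = dict(explicit_controls)
    if threshold is None:
        return controls
    for source, factor in explicit_controls.items():
        for index, embedding in enumerate(token_embeddings):
            if index in controls:
                continue  # assumption: existing controls are not overwritten
            if cosine(token_embeddings[source], embedding) >= threshold:
                controls[index] = factor
    return controls
```

With a threshold of null (`None` here), only the explicitly controlled tokens are affected; lowering the threshold widens the set of tokens that inherit a control.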

    control_log_additive boolean

    • true: apply controls on prompt items by adding log(control_factor) to the attention scores.
    • false: apply controls on prompt items by (attention_scores - -attention_scores.min(-1)) * control_factor

    Default value: true

Responses

OK

Schema
    model_version string

    Model name and version (if any) of the model used for inference.

    embeddings object nullable

    Pooled embeddings: a dict with layer names as keys and pooling outputs as values. A pooling output is a dict with the pooling operation as key and a pooled embedding (list of floats) as value.
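
Given the nesting described above (layer name, then pooling operation, then a list of floats), a response can be flattened as in the sketch below. The helper name and the `"layer_0"` and `"layer_1"` keys in the usage example are illustrative assumptions; check an actual response for the exact layer names.

```python
def iter_pooled_embeddings(response):
    """Yield (layer_name, pooling_operation, vector) for every pooled
    embedding in a parsed /embed response (a dict)."""
    embeddings = response.get("embeddings")
    if embeddings is None:  # the field is nullable
        return
    for layer_name, pooling_output in embeddings.items():
        for operation, vector in pooling_output.items():
            yield layer_name, operation, vector


# Usage with a made-up response; values and key names are placeholders.
sample = {
    "model_version": "example-version",
    "embeddings": {
        "layer_0": {"max": [0.1, 0.2]},
        "layer_1": {"max": [0.3, 0.4]},
    },
    "tokens": None,
}
for layer, op, vec in iter_pooled_embeddings(sample):
    print(layer, op, len(vec))
```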

    tokens string[] nullable

    num_tokens_prompt_total integer

    Number of tokens in the prompt.

    Tokenization:

    • Token ID arrays are used as-is.
    • Text prompt items are tokenized using the tokenizers specific to the model.
    • Each image is converted into a fixed amount of tokens that depends on the chosen model.

Authorization: http

name: token
type: http
scheme: bearer
description: Can be generated in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile)

using System;
using System.Net.Http;

// Build a POST request to the /embed endpoint.
var client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Post, "https://docs.aleph-alpha.com/embed");
request.Headers.Add("Accept", "application/json");
// Replace <token> with your API token (bearer authentication).
request.Headers.Add("Authorization", "Bearer <token>");
// JSON request body; see the Body section for the available fields.
var content = new StringContent("{\n \"model\": \"luminous-base\",\n \"prompt\": \"An apple a day keeps the doctor away.\",\n \"layers\": [\n 0,\n 1\n ],\n \"tokens\": false,\n \"pooling\": [\n \"max\"\n ],\n \"type\": \"default\"\n}", null, "application/json");
request.Content = content;
// Send the request and throw if the status code is not a success code.
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());

Request body (required)

{
  "model": "luminous-base",
  "prompt": "An apple a day keeps the doctor away.",
  "layers": [
    0,
    1
  ],
  "tokens": false,
  "pooling": [
    "max"
  ],
  "type": "default"
}
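
The request body above can also be sent from Python. The sketch below only builds the request with the standard library and leaves sending to the caller. Assumptions to adjust: `base_url` is whatever host serves the API for you, the helper name `build_embed_request` is illustrative, and the `nice` query parameter is encoded as the strings `true` or `false`.

```python
import json
import urllib.parse
import urllib.request


def build_embed_request(base_url, token, body, nice=None):
    """Build (but do not send) a POST /embed request.

    base_url: API host (assumption: supplied by the caller).
    token: API token, sent as a bearer token.
    body: dict with the request fields, e.g. model, prompt, layers, pooling.
    nice: optional bool for the `nice` query parameter.
    """
    url = base_url.rstrip("/") + "/embed"
    if nice is not None:
        # Assumption: booleans are passed as "true"/"false" in the query string.
        url += "?" + urllib.parse.urlencode({"nice": "true" if nice else "false"})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


body = {
    "model": "luminous-base",
    "prompt": "An apple a day keeps the doctor away.",
    "layers": [0, 1],
    "tokens": False,
    "pooling": ["max"],
    "type": "default",
}
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the JSON response, for example `json.load(urllib.request.urlopen(build_embed_request(base_url, token, body)))`.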