Chat

POST /chat/completions

Retrieves one or multiple chat completions for a given prompt

Request

Query Parameters

    nice boolean

    Setting this to true signals to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones, e.g. POST /chat/completions?nice=true.

Body required

    messages object[] required

    A list of messages comprising the conversation so far.

  • Array [
  • role string required

    The role of the current message.

    Only one optional "system" message is allowed at the beginning of the conversation. The remaining conversation:

    • Must alternate between "user" and "assistant" messages.
    • Must begin with a "user" message.
    • Must end with a "user" message.

    A minimal valid conversation is sketched below, after the message fields.

    Possible values: [system, user, assistant]

    content string required

    The content of the current message.

    name deprecated

    This parameter is unsupported and will be ignored.

  • ]
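
    To make the alternation rules concrete, here is a minimal valid messages array: an optional system message first, then user and assistant messages alternating, ending with a user message. The message contents are purely illustrative.

    [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "What is the capital of France?" },
      { "role": "assistant", "content": "Paris." },
      { "role": "user", "content": "And of Germany?" }
    ]
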
  • model string required

    The ID of the model to query.

    The requested model must be eligible for chat completions.

    frequency_penalty number

    When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.

    The penalty is cumulative. The more a token is mentioned in the completion, the more its probability will decrease.

    Possible values: >= -2 and <= 2

    logit_bias

    When specified, the provided hash map will affect the likelihood of the specified token IDs (!) appearing in the completion.

    Mathematically, the bias is added to the logits generated by the model prior to sampling. Values between -1 and 1 should decrease or increase likelihood of selection while values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

    Note that since JSON does not support integer keys, the token IDs are represented as strings.

    property name* number
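
    For illustration, the request fragment below bans one token and mildly favors another. The token IDs are made up; real IDs depend on the model's tokenizer.

    "logit_bias": {
      "12345": -100,
      "67890": 0.8
    }
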
    logprobs boolean

    When set to true, the model will return the log probabilities of the sampled tokens in the completion.

    top_logprobs integer

    When specified, the model will return the log probabilities of the top n tokens in the completion.

    Possible values: <= 20
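
    As a sketch, the following request fragment asks for log probabilities of the sampled tokens plus the five most likely alternatives at each position. The resulting data is returned under choices[].logprobs in the response (see the response schema below).

    "logprobs": true,
    "top_logprobs": 5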

    max_tokens integer

    The maximum number of tokens to generate in the completion. The model will stop generating tokens once it reaches this length.

    The maximum value for this parameter depends on the specific model and the length of the input prompt. When no value is provided, the highest possible value will be used.

    Possible values: >= 1

    n integer

    The number of completions to generate for each prompt. The model will generate this many completions and return all of them.

    When no value is provided, one completion will be returned.

    Possible values: >= 1

    presence_penalty number

    When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.

    The penalty is not cumulative. Mentioning a token more than once will not increase the penalty further.

    Possible values: >= -2 and <= 2

    response_format deprecated

    This parameter is unsupported and will be rejected.

    seed deprecated

    This parameter is unsupported and will be ignored.

    service_tier deprecated

    This parameter is unsupported and will be ignored.

    stop object

    When specified, sequence generation will stop when the model generates this token.

    oneOf
    string
    stream boolean

    When set to true, the model will transmit all completion tokens as soon as they become available via the server-sent events protocol.

    stream_options object

    Additional options to affect the streaming behavior.

    include_usage boolean

    If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array.
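
    For example, a streaming request that also wants the final usage statistics could include the fragment below. Per the description above, the usage then arrives in one extra chunk with an empty choices array, streamed just before the data: [DONE] message.

    "stream": true,
    "stream_options": {
      "include_usage": true
    }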

    temperature number

    Controls the randomness of the model. Lower values will make the model more deterministic and higher values will make it more random.

    Mathematically, the temperature is used to divide the logits before sampling. A temperature of 0 will always return the most likely token.

    When no value is provided, the default value of 1 will be used.

    Possible values: <= 2
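
    For intuition, assuming the usual convention of a softmax over temperature-scaled logits, the sampling probability of token i is

    p_i = exp(logit_i / temperature) / sum_j exp(logit_j / temperature)

    so temperatures below 1 sharpen the distribution, temperatures above 1 flatten it, and 0 corresponds to always picking the most likely token.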

    top_p number

    "nucleus" parameter to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities. It specifies a probability threshold, below which all less likely tokens are filtered out.

    When no value is provided, the default value of 1 will be used.

    Possible values: <= 1

    steering_concepts SteeringConcept (string)[]

    Specifies how the output of the model should be steered. The output is steered in the direction given by the positive examples associated with the steering concept and away from its negative examples.

    Possible values: Value must match regular expression ^_worker/[a-zA-Z0-9-_]{1,64}$

    Default value: []
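
    For example, a request fragment referencing a single steering concept might look like the following. The concept ID is hypothetical; it merely matches the required pattern.

    "steering_concepts": [
      "_worker/my-steering-concept"
    ]
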
    tools deprecated

    This parameter is unsupported and will be rejected.

    tool_choice deprecated

    This parameter is unsupported and will be rejected.

    parallel_tool_calls deprecated

    This parameter is unsupported and will be rejected.

    user deprecated

    This parameter is unsupported and will be ignored.

Responses

OK

Schema
  • Array [
  • id string

    An ID that is unique throughout the given request. When multiple chunks are returned using server-sent events, this ID will be the same for all of them.

    choices object[]

    A list of chat completion choices. Can be more than one if n is greater than 1.

  • Array [
  • finish_reason string

    The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, or length if the maximum number of tokens specified in the request was reached. If the API is unable to understand the stop reason emitted by one of the workers, content_filter is returned.

    When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.

    Possible values: [stop, length, content_filter]

    index integer required

    The index of the current chat completion in the conversation. Use this parameter to associate chunks with the correct message stream as chunks might arrive out of order. This is mostly relevant when streaming is enabled and multiple completions are requested.

    message object

    Chat completion generated by the model when streaming is disabled.

    role string required

    The role of the current chat completion. Will be assistant.

    Possible values: [assistant]

    content string required

    The content of the current chat completion.

    delta object

    Chat completion chunk generated by the model when streaming is enabled.

    role string

    The role of the current chat completion. Will be assistant for the first chunk of every completion stream and missing for the remaining chunks.

    Possible values: [assistant]

    content string required

    The content of the current chat completion. Will be empty for the first chunk of every completion stream and non-empty for the remaining chunks.

    logprobs object

    Log probability information for the choice. null if this is the end of a completion stream.

    content object[]

    A list of message content tokens with log probability information.

  • Array [
  • token string required

    The token.

    logprob number required

    The log probability of the token. If the log probability is not returned by the worker, -9999.0 is used as a fallback.

    bytes integer[] required

    A list of integers representing the UTF-8 byte representation of the token.

    top_logprobs object[] required

    A list of the most likely tokens and their log probabilities at this token position. In rare cases, fewer than the requested number of top_logprobs may be returned.

  • Array [
  • token string required

    The token.

    logprob number required

    The log probability of the token.

    bytes integer[] required

    A list of integers representing the UTF-8 byte representation of the token.

  • ]
  • ]
  • ]
  • created integer

    The Unix timestamp (in seconds) of when the chat completion was created.

    model string

    The ID of the model that generated the completion.

    system_fingerprint string

    The specific version of the model that generated the completion. This field can be used to track inconsistencies between calls to different deployments of otherwise identical models.

    When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.

    object string

    Will be chat.completion by default and chat.completion.chunk when streaming is enabled.

    Possible values: [chat.completion, chat.completion.chunk]

    usage object

    Usage statistics for the completion request.

    When streaming is enabled, this field will be null by default. To include an additional usage-only message in the response stream, set stream_options.include_usage to true.

    completion_tokens integer required

    Number of tokens in the generated completion.

    prompt_tokens integer required

    Number of tokens in the prompt.

    total_tokens integer required

    Total number of tokens used in the request (prompt + completion).

  • ]
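
Putting the response schema together, a non-streaming response could look roughly like this. The ID, timestamp, model name, and token counts are illustrative placeholders, and logprobs is null because log probabilities were not requested.

{
  "id": "example-request-id",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "logprobs": null
    }
  ],
  "created": 1700000000,
  "model": "<model-id>",
  "system_fingerprint": "<model-version>",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 3,
    "prompt_tokens": 25,
    "total_tokens": 28
  }
}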

Authorization: http

name: token
type: http
scheme: bearer
description: Can be generated in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile)
var client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Post, "https://docs.aleph-alpha.com/chat/completions");
request.Headers.Add("Accept", "application/json");
request.Headers.Add("Authorization", "Bearer <token>");
var content = new StringContent("{\n \"messages\": [\n {\n \"role\": \"system\",\n \"content\": \"string\"\n }\n ],\n \"model\": \"string\",\n \"frequency_penalty\": 0,\n \"logprobs\": true,\n \"top_logprobs\": 0,\n \"max_tokens\": 0,\n \"n\": 0,\n \"presence_penalty\": 0,\n \"stop\": \"string\",\n \"stream\": true,\n \"stream_options\": {\n \"include_usage\": true\n },\n \"temperature\": 0,\n \"top_p\": 0,\n \"steering_concepts\": [\n \"string\"\n ]\n}", null, "application/json");
request.Content = content;
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());
Example request body:
{
  "messages": [
    {
      "role": "system",
      "content": "string"
    }
  ],
  "model": "string",
  "frequency_penalty": 0,
  "logprobs": true,
  "top_logprobs": 0,
  "max_tokens": 0,
  "n": 0,
  "presence_penalty": 0,
  "stop": "string",
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "temperature": 0,
  "top_p": 0,
  "steering_concepts": [
    "string"
  ]
}