Chat

POST /chat/completions

Retrieves one or multiple chat completions for a given prompt

Request

Query Parameters

    nice boolean

    Setting this to true signals to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones, e.g. POST /chat/completions?nice=true.

Body required

    messages object[] required

    A list of messages comprising the conversation so far.

  • Array [
  • role string required

    The role of the current message.

    Only one optional "system" message is allowed at the beginning of the conversation. The remaining conversation:

    • Must alternate between "user" and "assistant" messages.
    • Must begin with a "user" message.
    • Must end with a "user" message.

    A minimal valid conversation is sketched below, after the message fields.

    Possible values: [system, user, assistant]

    content string required

    The content of the current message.

    name deprecated

    This parameter is unsupported and will be ignored.

  • ]
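
    To make the alternation rules concrete, here is a minimal valid messages array: an optional system message first, then user and assistant messages alternating, ending with a user message. The message contents are purely illustrative.

    [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "What is the capital of France?" },
      { "role": "assistant", "content": "Paris." },
      { "role": "user", "content": "And of Germany?" }
    ]
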
  • model string required

    The ID of the model to query.

    The requested model must be eligible for chat completions.

    frequency_penalty number

    When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.

    The penalty is cumulative. The more a token is mentioned in the completion, the more its probability will decrease.

    Possible values: >= -2 and <= 2

    logit_bias

    When specified, the provided hash map will affect the likelihood of the specified token IDs (!) appearing in the completion.

    Mathematically, the bias is added to the logits generated by the model prior to sampling. Values between -1 and 1 should decrease or increase likelihood of selection while values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

    Note that since JSON does not support integer keys, the token IDs are represented as strings.

    property name* number
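
    For illustration, the request fragment below bans one token and mildly favors another. The token IDs are made up; real IDs depend on the model's tokenizer.

    "logit_bias": {
      "12345": -100,
      "67890": 0.8
    }
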
    logprobs boolean

    When set to true, the model will return the log probabilities of the sampled tokens in the completion.

    top_logprobs integer

    When specified, the model will return the log probabilities of the top n tokens in the completion.

    Possible values: <= 20
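
    As a sketch, the following request fragment asks for log probabilities of the sampled tokens plus the five most likely alternatives at each position. The resulting data is returned under choices[].logprobs in the response (see the response schema below).

    "logprobs": true,
    "top_logprobs": 5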

    max_tokens integer

    The maximum number of tokens to generate in the completion. The model will stop generating tokens once it reaches this length.

    The maximum value for this parameter depends on the specific model and the length of the input prompt. When no value is provided, the highest possible value will be used.

    Possible values: >= 1

    n integer

    The number of completions to generate for each prompt. The model will generate this many completions and return all of them.

    When no value is provided, one completion will be returned.

    Possible values: >= 1

    presence_penalty number

    When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.

    The penalty is not cumulative. Mentioning a token more than once will not increase the penalty further.

    Possible values: >= -2 and <= 2

    response_format deprecated

    This parameter is unsupported and will be rejected.

    seed deprecated

    This parameter is unsupported and will be ignored.

    service_tier deprecated

    This parameter is unsupported and will be ignored.

    stop object

    When specified, sequence generation will stop when the model generates this token.

    oneOf
    string
    stream boolean

    When set to true, the model will transmit all completion tokens as soon as they become available via the server-sent events protocol.

    stream_options object

    Additional options to affect the streaming behavior.

    include_usage boolean

    If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array.
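
    For example, a streaming request that also wants the final usage statistics could include the fragment below. Per the description above, the usage then arrives in one extra chunk with an empty choices array, streamed just before the data: [DONE] message.

    "stream": true,
    "stream_options": {
      "include_usage": true
    }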

    temperature number

    Controls the randomness of the model. Lower values will make the model more deterministic and higher values will make it more random.

    Mathematically, the temperature is used to divide the logits before sampling. A temperature of 0 will always return the most likely token.

    When no value is provided, the default value of 1 will be used.

    Possible values: <= 2
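
    For intuition, assuming the usual convention of a softmax over temperature-scaled logits, the sampling probability of token i is

    p_i = exp(logit_i / temperature) / sum_j exp(logit_j / temperature)

    so temperatures below 1 sharpen the distribution, temperatures above 1 flatten it, and 0 corresponds to always picking the most likely token.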

    top_p number

    "nucleus" parameter to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities. It specifies a probability threshold, below which all less likely tokens are filtered out.

    When no value is provided, the default value of 1 will be used.

    Possible values: <= 1

    steering_concepts SteeringConcept (string)[]

    Specifies how the output of the model should be steered. The output is steered in the direction given by the positive examples associated with the steering concept and away from its negative examples.

    Possible values: Value must match regular expression ^_worker/[a-zA-Z0-9-_]{1,64}$

    Default value: []
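
    For example, a request fragment referencing a single steering concept might look like the following. The concept ID is hypothetical; it merely matches the required pattern.

    "steering_concepts": [
      "_worker/my-steering-concept"
    ]
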
    tools deprecated

    This parameter is unsupported and will be rejected.

    tool_choice deprecated

    This parameter is unsupported and will be rejected.

    parallel_tool_calls deprecated

    This parameter is unsupported and will be rejected.

    user deprecated

    This parameter is unsupported and will be ignored.

Responses

OK

Schema
  • Array [
  • id string

    An ID that is unique throughout the given request. When multiple chunks are returned using server-sent events, this ID will be the same for all of them.

    choices object[]

    A list of chat completion choices. Can be more than one if n is greater than 1.

  • Array [
  • finish_reason string

    The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, or length if the maximum number of tokens specified in the request was reached. If the API is unable to understand the stop reason emitted by one of the workers, content_filter is returned.

    When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.

    Possible values: [stop, length, content_filter]

    index integer required

    The index of the current chat completion in the conversation. Use this parameter to associate chunks with the correct message stream as chunks might arrive out of order. This is mostly relevant when streaming is enabled and multiple completions are requested.

    message object

    Chat completion generated by the model when streaming is disabled.

    role string required

    The role of the current chat completion. Will be assistant.

    Possible values: [assistant]

    content string required

    The content of the current chat completion.

    delta object

    Chat completion chunk generated by the model when streaming is enabled.

    role string

    The role of the current chat completion. Will be assistant for the first chunk of every completion stream and missing for the remaining chunks.

    Possible values: [assistant]

    content string required

    The content of the current chat completion. Will be empty for the first chunk of every completion stream and non-empty for the remaining chunks.

    logprobs object

    Log probability information for the choice. null if this is the end of a completion stream.

    content object[]

    A list of message content tokens with log probability information.

  • Array [
  • token string required

    The token.

    logprob number required

    The log probability of the token. If the log probability is not returned by the worker, -9999.0 is used as a fallback.

    bytes integer[] required

    A list of integers representing the UTF-8 byte representation of the token.

    top_logprobs object[] required

    A list of the most likely tokens and their log probabilities at this token position. In rare cases, fewer than the requested number of top_logprobs may be returned.

  • Array [
  • token string required

    The token.

    logprob number required

    The log probability of the token.

    bytes integer[] required

    A list of integers representing the UTF-8 byte representation of the token.

  • ]
  • ]
  • ]
  • created integer

    The Unix timestamp (in seconds) of when the chat completion was created.

    model string

    The ID of the model that generated the completion.

    system_fingerprint string

    The specific version of the model that generated the completion. This field can be used to track inconsistencies between calls to different deployments of otherwise identical models.

    When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.

    object string

    Will be chat.completion by default and chat.completion.chunk when streaming is enabled.

    Possible values: [chat.completion, chat.completion.chunk]

    usage object

    Usage statistics for the completion request.

    When streaming is enabled, this field will be null by default. To include an additional usage-only message in the response stream, set stream_options.include_usage to true.

    completion_tokens integer required

    Number of tokens in the generated completion.

    prompt_tokens integer required

    Number of tokens in the prompt.

    total_tokens integer required

    Total number of tokens used in the request (prompt + completion).

  • ]
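
Putting the response schema together, a non-streaming response could look roughly like this. The ID, timestamp, model name, and token counts are illustrative placeholders, and logprobs is null because log probabilities were not requested.

{
  "id": "example-request-id",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "logprobs": null
    }
  ],
  "created": 1700000000,
  "model": "<model-id>",
  "system_fingerprint": "<model-version>",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 3,
    "prompt_tokens": 25,
    "total_tokens": 28
  }
}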

Authorization: http

name: token
type: http
scheme: bearer
description: Can be generated in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile)
var client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Post, "https://docs.aleph-alpha.com/chat/completions");
request.Headers.Add("Accept", "application/json");
request.Headers.Add("Authorization", "Bearer <token>");
var content = new StringContent("{\n \"messages\": [\n {\n \"role\": \"system\",\n \"content\": \"string\"\n }\n ],\n \"model\": \"string\",\n \"frequency_penalty\": 0,\n \"logprobs\": true,\n \"top_logprobs\": 0,\n \"max_tokens\": 0,\n \"n\": 0,\n \"presence_penalty\": 0,\n \"stop\": \"string\",\n \"stream\": true,\n \"stream_options\": {\n \"include_usage\": true\n },\n \"temperature\": 0,\n \"top_p\": 0,\n \"steering_concepts\": [\n \"string\"\n ]\n}", null, "application/json");
request.Content = content;
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());
Example request body:
{
  "messages": [
    {
      "role": "system",
      "content": "string"
    }
  ],
  "model": "string",
  "frequency_penalty": 0,
  "logprobs": true,
  "top_logprobs": 0,
  "max_tokens": 0,
  "n": 0,
  "presence_penalty": 0,
  "stop": "string",
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "temperature": 0,
  "top_p": 0,
  "steering_concepts": [
    "string"
  ]
}