Chat
POST /chat/completions
Generates one or more chat completions for a given prompt.
Request
- application/json
Body required
messages object[]required
A list of messages comprising the conversation so far.
role
Possible values: [system, user, assistant]
The role of the current message.
Only one optional "system" message is allowed at the beginning of the conversation. The remaining conversation:
- Must alternate between "user" and "assistant" messages.
- Must begin with a "user" message.
- Must end with a "user" message.
content
The content of the current message.
This parameter is unsupported and will be ignored.
model
The ID of the model to query. The requested model must be eligible for chat completions.
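For orientation, here is a minimal request sketch in Python. The base URL, bearer-token authentication scheme, API key, and model name are placeholders, not values defined by this reference; substitute the values for your deployment.

```python
import requests

BASE_URL = "https://api.example.com/v1"   # placeholder base URL
API_KEY = "YOUR_API_KEY"                  # placeholder credential

payload = {
    "model": "example-model",             # any model eligible for chat completions
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},  # optional, at most one, first
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
        {"role": "user", "content": "And of Germany?"},                 # conversation must end with "user"
    ],
}

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
# The schema example below shows an array of completion objects; take the first.
completion = data[0] if isinstance(data, list) else data
print(completion["choices"][0]["message"]["content"])
```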
frequency_penalty
Possible values: >= -2 and <= 2
When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.
The penalty is cumulative: the more often a token appears in the completion, the more its probability will decrease.
logit_bias
When specified, the provided hash map will affect the likelihood of the specified token IDs appearing in the completion.
Mathematically, the bias is added to the logits generated by the model prior to sampling. Values between -1 and 1 should decrease or increase the likelihood of selection, while values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
Note that since JSON does not support integer keys, the token IDs are represented as strings.
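A short sketch of a request fragment using logit_bias. The token IDs below are made up for illustration; real IDs depend on the model's tokenizer, and the model name is a placeholder.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Name a color."}],
    "logit_bias": {
        "1234": -100,   # effectively ban this token (illustrative ID)
        "5678": 0.5,    # nudge this token slightly upward (illustrative ID)
    },
}
```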
logprobs
When set to true, the model will return the log probabilities of the sampled tokens in the completion.
top_logprobs
Possible values: <= 20
When specified, the model will return the log probabilities of the top n tokens in the completion.
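The sketch below requests per-token log probabilities and then walks the choices[].logprobs.content array documented in the response schema further down. The model name is a placeholder.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Say hi."}],
    "logprobs": True,
    "top_logprobs": 3,                                          # at most 20
}

def print_token_logprobs(completion: dict) -> None:
    """completion: a parsed, non-streaming chat completion object."""
    for entry in completion["choices"][0]["logprobs"]["content"]:
        alternatives = {alt["token"]: alt["logprob"] for alt in entry["top_logprobs"]}
        print(entry["token"], entry["logprob"], alternatives)
```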
max_tokens
Possible values: >= 1
The maximum number of tokens to generate in the completion. The model will stop generating tokens once it reaches this length.
The maximum value for this parameter depends on the specific model and the length of the input prompt. When no value is provided, the highest possible value will be used.
n
Possible values: >= 1
The number of completions to generate for each prompt. The model will generate this many completions and return all of them.
When no value is provided, one completion will be returned.
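A brief sketch combining max_tokens and n: three completions, each capped at 64 generated tokens, read back from the choices array. The values and the model name are arbitrary placeholders.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Suggest a project name."}],
    "max_tokens": 64,                                           # cap each completion at 64 generated tokens
    "n": 3,                                                     # ask for three completions
}

def print_all_choices(completion: dict) -> None:
    """completion: a parsed, non-streaming chat completion object."""
    for choice in completion["choices"]:
        print(choice["index"], choice["message"]["content"])
```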
presence_penalty
Possible values: >= -2 and <= 2
When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.
The penalty is not cumulative: mentioning a token more than once will not increase the penalty further.
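A request fragment using both penalties for comparison; the numeric values are arbitrary examples, not recommendations.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "List some fruits."}],
    "frequency_penalty": 0.5,   # cumulative: repeated tokens get progressively less likely
    "presence_penalty": 0.3,    # not cumulative: flat penalty once a token has appeared
}
```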
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be ignored.
This parameter is unsupported and will be ignored.
stop object
When specified, sequence generation will stop when the model generates this token.
string
string
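A request fragment with a stop sequence. A single string is shown; the second string variant listed in the schema above may indicate that an array of sequences is also accepted, which is an assumption here.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Count upwards from 1."}],
    "stop": "11",               # generation halts once this sequence is produced
}
```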
stream
When set to true, the model will transmit all completion tokens as soon as they become available via the server-sent events protocol.
stream_options object
Additional options to affect the streaming behavior.
include_usage
If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array.
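The sketch below consumes a streamed completion, assuming each server-sent event line has the form data: <json> and the stream ends with data: [DONE] as described above. The base URL, authentication scheme, API key, and model name are placeholders.

```python
import json
import requests

BASE_URL = "https://api.example.com/v1"   # placeholder base URL
API_KEY = "YOUR_API_KEY"                  # placeholder credential

payload = {
    "model": "example-model",             # placeholder
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": True,
    "stream_options": {"include_usage": True},  # request a final usage-only chunk
}

with requests.post(f"{BASE_URL}/chat/completions",
                   headers={"Authorization": f"Bearer {API_KEY}"},
                   json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    text = []
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw or not raw.startswith("data: "):
            continue
        data = raw[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"]:                          # ordinary chunk: append the delta content
            text.append(chunk["choices"][0]["delta"].get("content", ""))
        elif chunk.get("usage"):                      # usage-only chunk: choices is empty
            print("usage:", chunk["usage"])
    print("".join(text))
```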
temperature
Possible values: <= 2
Controls the randomness of the model. Lower values will make the model more deterministic and higher values will make it more random.
Mathematically, the temperature is used to divide the logits before sampling. A temperature of 0 will always return the most likely token.
When no value is provided, the default value of 1 will be used.
top_p
Possible values: <= 1
The "nucleus" parameter dynamically adjusts the number of choices for each predicted token based on the cumulative probabilities. It specifies a probability threshold, below which all less likely tokens are filtered out.
When no value is provided, the default value of 1 will be used.
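As an illustration of the two descriptions above (logits divided by the temperature, then the least likely tokens filtered by the cumulative-probability threshold), the following sketch reproduces the arithmetic on made-up logits. It is only a worked example, not the service's implementation.

```python
import math

def sample_distribution(logits, temperature=1.0, top_p=1.0):
    """Return the sampling distribution after temperature scaling and nucleus filtering."""
    if temperature == 0:                       # degenerate case: always the most likely token
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [l / temperature for l in logits]           # temperature divides the logits
    z = sum(math.exp(l) for l in scaled)
    probs = [math.exp(l) / z for l in scaled]
    # nucleus filtering: keep the smallest set of tokens whose cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

print(sample_distribution([2.0, 1.0, 0.1], temperature=0.5, top_p=0.9))
```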
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be ignored.
Responses
- 200
OK
- application/json
Schema
id
An ID that is unique throughout the given request. When multiple chunks are returned using server-sent events, this ID will be the same for all of them.
choices object[]
A list of chat completion choices. Can be more than one if n is greater than 1.
finish_reason
Possible values: [stop, length, content_filter]
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, or length if the maximum number of tokens specified in the request was reached. If the API is unable to understand the stop reason emitted by one of the workers, content_filter is returned.
When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.
index
The index of the current chat completion in the conversation. Use this parameter to associate chunks with the correct message stream as chunks might arrive out of order. This is mostly relevant when streaming is enabled and multiple completions are requested.
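A small helper sketch that groups already-parsed streaming chunks by choices[].index, which is how chunks are matched to their completion when n is greater than 1.

```python
from collections import defaultdict

def group_stream_by_index(chunks):
    """chunks: iterable of parsed chat.completion.chunk dicts, in arrival order."""
    texts = defaultdict(list)
    for chunk in chunks:
        for choice in chunk["choices"]:
            delta = choice.get("delta", {})
            if delta.get("content"):
                texts[choice["index"]].append(delta["content"])
    return {index: "".join(parts) for index, parts in texts.items()}
```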
message object
Chat completion generated by the model when streaming is disabled.
role
Possible values: [assistant]
The role of the current chat completion. Will be assistant.
content
The content of the current chat completion.
delta object
Chat completion chunk generated by the model when streaming is enabled.
role
Possible values: [assistant]
The role of the current chat completion. Will be assistant for the first chunk of every completion stream and missing for the remaining chunks.
content
The content of the current chat completion. Will be empty for the first chunk of every completion stream and non-empty for the remaining chunks.
logprobs object
Log probability information for the choice. null if this is the end of a completion stream.
content object[]
A list of message content tokens with log probability information.
token
The token.
logprob
The log probability of the token. If the log probability is not returned by the worker, -9999.0 is used as a fallback.
bytes
A list of integers representing the UTF-8 bytes representation of the token.
top_logprobs object[]required
List of the most likely tokens and their log probability, at this token position. In rare cases, there may be fewer than the number of requested top_logprobs returned.
token
The token.
logprob
The log probability of the token.
bytes
A list of integers representing the UTF-8 bytes representation of the token.
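A sketch that reassembles text from the bytes field, which is useful when a token does not decode cleanly on its own (for example when a multi-byte character is split across tokens).

```python
def text_from_logprob_content(entries):
    """entries: the choices[].logprobs.content array of a parsed response."""
    raw = bytearray()
    for entry in entries:
        if entry.get("bytes"):
            raw.extend(entry["bytes"])                  # UTF-8 bytes of the token
        else:
            raw.extend(entry["token"].encode("utf-8"))  # fall back to the token string
    return raw.decode("utf-8", errors="replace")
```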
created
The Unix timestamp (in seconds) of when the chat completion was created.
model
The ID of the model that generated the completion.
system_fingerprint
The specific version of the model that generated the completion. This field can be used to track inconsistencies between calls to different deployments of otherwise identical models.
When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.
object
Possible values: [chat.completion, chat.completion.chunk]
Will be chat.completion by default and chat.completion.chunk when streaming is enabled.
usage object
Usage statistics for the completion request.
When streaming is enabled, this field will be null by default. To include an additional usage-only message in the response stream, set stream_options.include_usage to true.
completion_tokens
Number of tokens in the generated completion.
prompt_tokens
Number of tokens in the prompt.
total_tokens
Total number of tokens used in the request (prompt + completion).
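A small sanity-check sketch over the usage block of a parsed completion object, reflecting that total_tokens is the sum of prompt and completion tokens.

```python
def check_usage(completion: dict) -> None:
    """completion: a parsed chat completion object."""
    usage = completion.get("usage")
    if usage is None:    # e.g. a streamed response without stream_options.include_usage
        print("no usage reported")
        return
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    print(f'{usage["prompt_tokens"]} prompt + {usage["completion_tokens"]} completion '
          f'= {usage["total_tokens"]} total tokens')
```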
Example (from schema)
[
{
"id": "string",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "string"
},
"delta": {
"role": "assistant",
"content": "string"
},
"logprobs": {
"content": [
{
"token": "string",
"logprob": 0,
"bytes": [
0
],
"top_logprobs": [
{
"token": "string",
"logprob": 0,
"bytes": [
0
]
}
]
}
]
}
}
],
"created": 0,
"model": "string",
"system_fingerprint": "string",
"object": "chat.completion",
"usage": {
"completion_tokens": 0,
"prompt_tokens": 0,
"total_tokens": 0
}
}
]