Chat
POST /chat/completions
Generates one or more chat completions for a given prompt.
Request
- application/json
Body required
messages object[]required
A list of messages comprising the conversation so far.
role
Possible values: [system, user, assistant]
The role of the current message.
Only one optional "system" message is allowed at the beginning of the conversation. The remaining conversation:
- Must alternate between "user" and "assistant" messages.
- Must begin with a "user" message.
- Must end with a "user" message.
content
The content of the current message.
This parameter is unsupported and will be ignored.
model
The ID of the model to query. The requested model must be eligible for chat completions.
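For orientation, here is a minimal request sketch in Python. The base URL, bearer-token authentication scheme, API key, and model name are placeholders, not values defined by this reference; substitute the values for your deployment.

```python
import requests

BASE_URL = "https://api.example.com/v1"   # placeholder base URL
API_KEY = "YOUR_API_KEY"                  # placeholder credential

payload = {
    "model": "example-model",             # any model eligible for chat completions
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},  # optional, at most one, first
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
        {"role": "user", "content": "And of Germany?"},                 # conversation must end with "user"
    ],
}

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
# The schema example below shows an array of completion objects; take the first.
completion = data[0] if isinstance(data, list) else data
print(completion["choices"][0]["message"]["content"])
```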
frequency_penalty
Possible values: >= -2 and <= 2
When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.
The penalty is cumulative: the more often a token appears in the completion, the more its probability will decrease.
logit_bias
When specified, the provided hash map will affect the likelihood of the specified token IDs appearing in the completion.
Mathematically, the bias is added to the logits generated by the model prior to sampling. Values between -1 and 1 should decrease or increase the likelihood of selection, while values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
Note that since JSON does not support integer keys, the token IDs are represented as strings.
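A short sketch of a request fragment using logit_bias. The token IDs below are made up for illustration; real IDs depend on the model's tokenizer, and the model name is a placeholder.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Name a color."}],
    "logit_bias": {
        "1234": -100,   # effectively ban this token (illustrative ID)
        "5678": 0.5,    # nudge this token slightly upward (illustrative ID)
    },
}
```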
logprobs
When set to true, the model will return the log probabilities of the sampled tokens in the completion.
top_logprobs
Possible values: <= 20
When specified, the model will return the log probabilities of the top n tokens in the completion.
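The sketch below requests per-token log probabilities and then walks the choices[].logprobs.content array documented in the response schema further down. The model name is a placeholder.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Say hi."}],
    "logprobs": True,
    "top_logprobs": 3,                                          # at most 20
}

def print_token_logprobs(completion: dict) -> None:
    """completion: a parsed, non-streaming chat completion object."""
    for entry in completion["choices"][0]["logprobs"]["content"]:
        alternatives = {alt["token"]: alt["logprob"] for alt in entry["top_logprobs"]}
        print(entry["token"], entry["logprob"], alternatives)
```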
max_tokens
Possible values: >= 1
The maximum number of tokens to generate in the completion. The model will stop generating tokens once it reaches this length.
The maximum value for this parameter depends on the specific model and the length of the input prompt. When no value is provided, the highest possible value will be used.
n
Possible values: >= 1
The number of completions to generate for each prompt. The model will generate this many completions and return all of them.
When no value is provided, one completion will be returned.
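A brief sketch combining max_tokens and n: three completions, each capped at 64 generated tokens, read back from the choices array. The values and the model name are arbitrary placeholders.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Suggest a project name."}],
    "max_tokens": 64,                                           # cap each completion at 64 generated tokens
    "n": 3,                                                     # ask for three completions
}

def print_all_choices(completion: dict) -> None:
    """completion: a parsed, non-streaming chat completion object."""
    for choice in completion["choices"]:
        print(choice["index"], choice["message"]["content"])
```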
presence_penalty
Possible values: >= -2 and <= 2
When specified, this number will decrease (or increase) the likelihood of repeating tokens that were mentioned earlier in the completion.
The penalty is not cumulative: mentioning a token more than once will not increase the penalty further.
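A request fragment using both penalties for comparison; the numeric values are arbitrary examples, not recommendations.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "List some fruits."}],
    "frequency_penalty": 0.5,   # cumulative: repeated tokens get progressively less likely
    "presence_penalty": 0.3,    # not cumulative: flat penalty once a token has appeared
}
```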
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be ignored.
This parameter is unsupported and will be ignored.
stop object
When specified, sequence generation will stop when the model generates this token.
string
string
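A request fragment with a stop sequence. A single string is shown; the second string variant listed in the schema above may indicate that an array of sequences is also accepted, which is an assumption here.

```python
payload = {
    "model": "example-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Count upwards from 1."}],
    "stop": "11",               # generation halts once this sequence is produced
}
```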
stream
When set to true, the model will transmit all completion tokens as soon as they become available via the server-sent events protocol.
stream_options object
Additional options to affect the streaming behavior.
include_usage
If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array.
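The sketch below consumes a streamed completion, assuming each server-sent event line has the form data: <json> and the stream ends with data: [DONE] as described above. The base URL, authentication scheme, API key, and model name are placeholders.

```python
import json
import requests

BASE_URL = "https://api.example.com/v1"   # placeholder base URL
API_KEY = "YOUR_API_KEY"                  # placeholder credential

payload = {
    "model": "example-model",             # placeholder
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": True,
    "stream_options": {"include_usage": True},  # request a final usage-only chunk
}

with requests.post(f"{BASE_URL}/chat/completions",
                   headers={"Authorization": f"Bearer {API_KEY}"},
                   json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    text = []
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw or not raw.startswith("data: "):
            continue
        data = raw[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"]:                          # ordinary chunk: append the delta content
            text.append(chunk["choices"][0]["delta"].get("content", ""))
        elif chunk.get("usage"):                      # usage-only chunk: choices is empty
            print("usage:", chunk["usage"])
    print("".join(text))
```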
temperature
Possible values: <= 2
Controls the randomness of the model. Lower values will make the model more deterministic and higher values will make it more random.
Mathematically, the temperature is used to divide the logits before sampling. A temperature of 0 will always return the most likely token.
When no value is provided, the default value of 1 will be used.
top_p
Possible values: <= 1
The "nucleus" parameter dynamically adjusts the number of choices for each predicted token based on the cumulative probabilities. It specifies a probability threshold, below which all less likely tokens are filtered out.
When no value is provided, the default value of 1 will be used.
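As an illustration of the two descriptions above (logits divided by the temperature, then the least likely tokens filtered by the cumulative-probability threshold), the following sketch reproduces the arithmetic on made-up logits. It is only a worked example, not the service's implementation.

```python
import math

def sample_distribution(logits, temperature=1.0, top_p=1.0):
    """Return the sampling distribution after temperature scaling and nucleus filtering."""
    if temperature == 0:                       # degenerate case: always the most likely token
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [l / temperature for l in logits]           # temperature divides the logits
    z = sum(math.exp(l) for l in scaled)
    probs = [math.exp(l) / z for l in scaled]
    # nucleus filtering: keep the smallest set of tokens whose cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

print(sample_distribution([2.0, 1.0, 0.1], temperature=0.5, top_p=0.9))
```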
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be rejected.
This parameter is unsupported and will be ignored.
Responses
- 200
OK
- application/json
Schema
id
An ID that is unique throughout the given request. When multiple chunks are returned using server-sent events, this ID will be the same for all of them.
choices object[]
A list of chat completion choices. Can be more than one if n is greater than 1.
finish_reason
Possible values: [stop, length, content_filter]
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, or length if the maximum number of tokens specified in the request was reached. If the API is unable to understand the stop reason emitted by one of the workers, content_filter is returned.
When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.
index
The index of the current chat completion in the conversation. Use this parameter to associate chunks with the correct message stream as chunks might arrive out of order. This is mostly relevant when streaming is enabled and multiple completions are requested.
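A small helper sketch that groups already-parsed streaming chunks by choices[].index, which is how chunks are matched to their completion when n is greater than 1.

```python
from collections import defaultdict

def group_stream_by_index(chunks):
    """chunks: iterable of parsed chat.completion.chunk dicts, in arrival order."""
    texts = defaultdict(list)
    for chunk in chunks:
        for choice in chunk["choices"]:
            delta = choice.get("delta", {})
            if delta.get("content"):
                texts[choice["index"]].append(delta["content"])
    return {index: "".join(parts) for index, parts in texts.items()}
```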
message object
Chat completion generated by the model when streaming is disabled.
role
Possible values: [assistant]
The role of the current chat completion. Will be assistant.
content
The content of the current chat completion.
delta object
Chat completion chunk generated by the model when streaming is enabled.
role
Possible values: [assistant]
The role of the current chat completion. Will be assistant for the first chunk of every completion stream and missing for the remaining chunks.
content
The content of the current chat completion. Will be empty for the first chunk of every completion stream and non-empty for the remaining chunks.
logprobs object
Log probability information for the choice. null if this is the end of a completion stream.
content object[]
A list of message content tokens with log probability information.
token
The token.
logprob
The log probability of the token. If the log probability is not returned by the worker, -9999.0 is used as a fallback.
bytes
A list of integers representing the UTF-8 bytes representation of the token.
top_logprobs object[]required
List of the most likely tokens and their log probability, at this token position. In rare cases, there may be fewer than the number of requested top_logprobs returned.
token
The token.
logprob
The log probability of the token.
bytes
A list of integers representing the UTF-8 bytes representation of the token.
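A sketch that reassembles text from the bytes field, which is useful when a token does not decode cleanly on its own (for example when a multi-byte character is split across tokens).

```python
def text_from_logprob_content(entries):
    """entries: the choices[].logprobs.content array of a parsed response."""
    raw = bytearray()
    for entry in entries:
        if entry.get("bytes"):
            raw.extend(entry["bytes"])                  # UTF-8 bytes of the token
        else:
            raw.extend(entry["token"].encode("utf-8"))  # fall back to the token string
    return raw.decode("utf-8", errors="replace")
```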
created
The Unix timestamp (in seconds) of when the chat completion was created.
model
The ID of the model that generated the completion.
system_fingerprint
The specific version of the model that generated the completion. This field can be used to track inconsistencies between calls to different deployments of otherwise identical models.
When streaming is enabled, the value is only set in the last chunk of a completion and null otherwise.
object
Possible values: [chat.completion, chat.completion.chunk]
Will be chat.completion by default and chat.completion.chunk when streaming is enabled.
usage object
Usage statistics for the completion request.
When streaming is enabled, this field will be null by default. To include an additional usage-only message in the response stream, set stream_options.include_usage to true.
completion_tokens
Number of tokens in the generated completion.
prompt_tokens
Number of tokens in the prompt.
total_tokens
Total number of tokens used in the request (prompt + completion).
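A small sanity-check sketch over the usage block of a parsed completion object, reflecting that total_tokens is the sum of prompt and completion tokens.

```python
def check_usage(completion: dict) -> None:
    """completion: a parsed chat completion object."""
    usage = completion.get("usage")
    if usage is None:    # e.g. a streamed response without stream_options.include_usage
        print("no usage reported")
        return
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    print(f'{usage["prompt_tokens"]} prompt + {usage["completion_tokens"]} completion '
          f'= {usage["total_tokens"]} total tokens')
```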
Example (from schema)
[
{
"id": "string",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "string"
},
"delta": {
"role": "assistant",
"content": "string"
},
"logprobs": {
"content": [
{
"token": "string",
"logprob": 0,
"bytes": [
0
],
"top_logprobs": [
{
"token": "string",
"logprob": 0,
"bytes": [
0
]
}
]
}
]
}
}
],
"created": 0,
"model": "string",
"system_fingerprint": "string",
"object": "chat.completion",
"usage": {
"completion_tokens": 0,
"prompt_tokens": 0,
"total_tokens": 0
}
}
]