JSON Completion
POST https://api.aleph-alpha.com/complete/json
Completes a prompt using a specific model.
In contrast to the /complete endpoint, guardrails are applied that steer the model towards generating a completion in valid JSON format, even if JSON is not requested explicitly.
This is achieved by appending a built-in helper prompt to the user prompt and by using a specialized sampling method that favors JSON-compatible tokens.
To obtain a valid model, use GET /models_available.
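As a minimal sketch of how a request to this endpoint could be assembled, the following Python builds the POST request with the standard library only and does not send it. The token is a placeholder, the model name and plain-string prompt are copied from the sample request on this page, and the helper name `build_request` and the `maximum_tokens` value of 64 are illustrative assumptions.

```python
import json
import urllib.request

API_URL = "https://api.aleph-alpha.com/complete/json"


def build_request(token: str, model: str, prompt: str, maximum_tokens: int) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for the JSON completion endpoint."""
    body = {"model": model, "prompt": prompt, "maximum_tokens": maximum_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # bearer token from your profile
        },
        method="POST",
    )


# Placeholder token; replace it before sending.
request = build_request("<token>", "llama-3.1-8b-instruct", "An apple a day", 64)
# Sending would be: urllib.request.urlopen(request).read()
```

Keeping request construction separate from sending makes it easy to inspect the body and headers before any network call is made.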
Request
Query Parameters
Setting this to True signals to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones.
Body (required, application/json)
The name of the model from the Luminous model family. Models and their respective architectures can differ in parameter size and capabilities. The most recent version of the model is always used. The model output contains information as to the model version.
Optional parameter that specifies which datacenters may process the request.
You can either set the parameter to "aleph-alpha" or omit it (defaulting to null).
Not setting this value, or setting it to null, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximum availability.
Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.
Possible values: [aleph-alpha, null]
prompt object (required)
The maximum number of tokens to be generated. Completion will terminate after the maximum number of tokens is reached.
Increase this value to generate longer texts. A text is split into tokens. Usually there are more tokens than words. The sum of input tokens and maximum_tokens may not exceed the model's context window size.
Generate at least this number of tokens before an end-of-text token is generated.
0
Echo the prompt in the completion. This may be especially helpful when log_probs is set to return logprobs for the prompt.
false
A higher sampling temperature encourages the model to produce less probable outputs ("be more creative"). Values are expected in a range from 0.0 to 1.0. Try high values (e.g., 0.9) for a more "creative" response and the default 0.0 for a well defined and repeatable answer. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.
0
Introduces random sampling for generated tokens by randomly selecting the next token from the k most likely options. A value larger than 1 encourages the model to be more creative. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.
0
Introduces random sampling for generated tokens by randomly selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds the probability top_p. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last.
0
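The stated order of operations (temperature rescaling first, then top_k, then top_p) can be illustrated with a small, self-contained sketch. This is not the service's implementation: the function name, the toy logits, the treatment of 0 as "disabled", and the choice to accumulate top_p over the probabilities computed before truncation are all assumptions made for illustration.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Return {token: probability} after temperature, then top_k, then top_p."""
    # Step 1: rescale logits with temperature (assumed to be skipped when 0).
    if temperature > 0:
        logits = {tok: value / temperature for tok, value in logits.items()}

    # Softmax over the rescaled logits, sorted from most to least likely.
    peak = max(logits.values())
    weights = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)

    # Step 2: keep only the k most likely tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Step 3: keep the smallest set whose cumulative probability exceeds top_p.
    if top_p > 0:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative > top_p:
                break
        ranked = kept

    # Renormalize the surviving candidates so they sum to 1.
    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0}  # hypothetical values
candidates = filter_distribution(toy_logits, top_k=2)
```

A sampler would then draw the next token from the returned distribution; with all three parameters left at their defaults the API instead uses argmax sampling, which this sketch does not model.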
The presence penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or in the prompt (repetition_penalties_include_prompt=true).
Presence penalty is independent of the number of occurrences. Increase the value to reduce the likelihood of repeating text.
An operation like the following is applied:
logits[t] -> logits[t] - 1 * penalty
where logits[t] is the logit for any given token. Note that the formula is independent of the number of times that a token appears.
0
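The additive presence penalty formula can be sketched in a few lines. The function name and the dictionary representation of logits are assumptions for illustration; the sketch covers only the additive form and only tokens that have already been seen.

```python
def apply_presence_penalty(logits, seen_tokens, penalty):
    """Subtract `penalty` once from the logit of every token already seen."""
    seen = set(seen_tokens)  # occurrence counts are deliberately ignored
    return {tok: (value - penalty if tok in seen else value)
            for tok, value in logits.items()}


# "a" appears twice but is penalized only once.
penalized = apply_presence_penalty({"a": 1.0, "b": 2.0}, ["a", "a"], 0.5)
```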
The frequency penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or in the prompt (repetition_penalties_include_prompt=true).
Frequency penalty is dependent on the number of occurrences of a token.
An operation like the following is applied:
logits[t] -> logits[t] - count[t] * penalty
where logits[t] is the logit for any given token and count[t] is the number of times that token appears.
0
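For comparison, the additive frequency penalty scales with the number of occurrences. The following is a minimal sketch under the same assumptions as any toy model of logits (a plain dictionary and a hypothetical function name), not the service's implementation.

```python
from collections import Counter


def apply_frequency_penalty(logits, seen_tokens, penalty):
    """Subtract count[t] * penalty from the logit of each token t."""
    counts = Counter(seen_tokens)  # unseen tokens have a count of 0
    return {tok: value - counts[tok] * penalty for tok, value in logits.items()}


# "a" appears twice, so it is penalized twice as strongly as a single occurrence.
penalized = apply_frequency_penalty({"a": 1.0, "b": 2.0}, ["a", "a"], 0.25)
```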
Increasing the sequence penalty reduces the likelihood of reproducing token sequences that already appear in the prompt (if repetition_penalties_include_prompt is True) and in the prior completion.
0
Minimum number of tokens to be considered a sequence.
2
Flag deciding whether presence penalty or frequency penalty are updated from tokens in the prompt
false
Flag deciding whether presence penalty or frequency penalty are updated from tokens in the completion
true
Flag deciding whether presence penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for presence penalty.
false
Flag deciding whether frequency penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for frequency penalty.
false
Flag deciding whether sequence penalty is applied multiplicatively (True) or additively (False).
false
All tokens in this text will be used in addition to the already penalized tokens for repetition penalties. These consist of the already generated completion tokens and the prompt tokens, if repetition_penalties_include_prompt is set to true.
null
List of strings that may be generated without penalty, regardless of other penalty settings.
By default, we will also include any stop_sequences you have set, since completion performance can be degraded if expected stop sequences are penalized.
You can disable this behavior by setting penalty_exceptions_include_stop_sequences to false.
By default we include all stop_sequences in penalty_exceptions, so as not to penalise the presence of stop sequences that are present in few-shot prompts to give structure to your completions.
You can set this to false if you do not want this behaviour.
See the description of penalty_exceptions for more information on what penalty_exceptions are used for.
true
If a value is given, best_of completions will be generated on the server side. The completion with the highest log probability per token is returned. If the parameter n is greater than 1, n completions will be returned. best_of must be strictly greater than n.
Possible values: <= 100
1
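Selecting the completion with the highest log probability per token can be sketched as follows. The sketch assumes that "log probability per token" means the mean of the token log probabilities; the function name and the candidate data are hypothetical.

```python
def pick_best(candidates):
    """Return the candidate with the highest mean token log probability.

    `candidates` is a list of (text, token_log_probs) pairs.
    """
    return max(candidates, key=lambda cand: sum(cand[1]) / len(cand[1]))


# Two made-up server-side candidates, as if best_of were 2.
candidates = [
    ("keeps the doctor away.", [-0.1, -0.2, -0.3]),
    ("is a fruit.", [-0.5, -0.5]),
]
best_text, best_log_probs = pick_best(candidates)
```

Averaging per token avoids favoring short completions, which a plain sum of log probabilities would do.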
The number of completions to return. If argmax sampling is used (temperature, top_k, top_p are all default) the same completions will be produced. This parameter should only be increased if random sampling is used.
1
Number of top log probabilities for each token generated. Log probabilities can be used in downstream tasks or to assess the model's certainty when producing tokens. No log probabilities are returned if set to None. Log probabilities of generated tokens are returned if set to 0. Log probabilities of generated tokens and top n log probabilities are returned if set to n.
Possible values: <= 20
null
List of strings that will stop generation if they're generated. Stop sequences may be helpful in structured texts. Say the user has specified "tor away" as one of the requested stop sequences and the model has generated the following sequence of tokens ["An", " apple", " a", " day", " keeps", " the", " doctor", " away"]. The user will see "An apple a day keeps the" as the model's response, omitting the last two tokens which contain the stop sequence. Note that even though " doc" is not part of the stop sequence "tor away", it won't appear in the user output since it is part of the token " doctor" which contains part of the stop sequence.
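The token-level truncation described in this example can be reproduced with a short sketch. It assumes that any token overlapping the stop sequence is dropped whole, as the " doctor" case suggests; the function name is hypothetical and this is not the service's implementation.

```python
def truncate_at_stop(tokens, stop_sequences):
    """Drop every token from the first one that overlaps a stop sequence."""
    text = "".join(tokens)
    # Character positions at which a stop sequence starts in the full text.
    hits = [text.find(stop) for stop in stop_sequences if stop and stop in text]
    if not hits:
        return text
    cut = min(hits)

    kept, position = [], 0
    for token in tokens:
        # A token that extends past the cut point contains part of the stop sequence.
        if position + len(token) > cut:
            break
        kept.append(token)
        position += len(token)
    return "".join(kept)


tokens = ["An", " apple", " a", " day", " keeps", " the", " doctor", " away"]
print(truncate_at_stop(tokens, ["tor away"]))  # prints "An apple a day keeps the"
```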
Flag indicating whether individual tokens of the completion should be returned (True) or whether solely the generated text (i.e. the completion) is sufficient (False).
false
Setting this parameter to true forces the raw completion of the model to be returned.
For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the CompletionResponse.
The raw completion, if returned, will contain the un-optimized completion.
Setting tokens to true or log_probs to any value will also trigger the raw completion to be returned.
false
We continually research optimal ways to work with our models. By default, we apply these optimizations to both your prompt and completion for you.
Our goal is to improve your results while using our API. But you can always pass disable_optimizations: true and we will leave your prompt and completion untouched.
false
Bias the completion to only generate options within this list; all other tokens are disregarded at sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa.
[]
Only consider the first token for the completion_bias_inclusion
false
Bias the completion to NOT generate options within this list; all other tokens are unaffected in sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa.
[]
Only consider the first token for the completion_bias_exclusion
false
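Because strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa, it can be useful to check both lists on the client before sending a request. The helper below is a hypothetical sketch of such a check; how the API itself reacts to a conflict is not covered here.

```python
def find_prefix_conflicts(inclusion, exclusion):
    """Return (inclusion, exclusion) pairs where one string is a prefix of the other."""
    conflicts = []
    for included in inclusion:
        for excluded in exclusion:
            if included.startswith(excluded) or excluded.startswith(included):
                conflicts.append((included, excluded))
    return conflicts


# "no" is a prefix of "not", so this pair conflicts.
conflicts = find_prefix_conflicts(["yes", "no"], ["not"])
```

An empty result means the two lists satisfy the prefix rule stated for completion_bias_inclusion and completion_bias_exclusion.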
If set to null, attention control parameters only apply to those tokens that have explicitly been set in the request.
If set to a non-null value, we apply the control parameters to similar tokens as well.
Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter.
The similarity score is the cosine similarity of token embeddings.
null
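To make the similarity criterion concrete, the sketch below computes the cosine similarity of toy embedding vectors and lists the tokens that reach a given threshold. The two-dimensional embeddings, the token names, and the function names are invented for illustration; real token embeddings come from the model and are not exposed by this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def tokens_above_threshold(controlled_token, embeddings, threshold):
    """List the other tokens whose similarity to the controlled token is at least `threshold`."""
    reference = embeddings[controlled_token]
    return [tok for tok, vec in embeddings.items()
            if tok != controlled_token and cosine_similarity(reference, vec) >= threshold]


# Hypothetical embeddings; "apples" is close to "apple", "doctor" is orthogonal.
embeddings = {"apple": [1.0, 0.0], "apples": [0.8, 0.6], "doctor": [0.0, 1.0]}
similar = tokens_above_threshold("apple", embeddings, 0.75)
```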
true: apply controls on prompt items by adding the log(control_factor) to attention scores.
false: apply controls on prompt items by (attention_scores - -attention_scores.min(-1)) * control_factor
true
When set to true, the model will transmit all completion tokens as soon as they become available via the server-sent events protocol.
false
Specifies how the output of the model should be steered. This steers the output in the direction given by the positive examples associated with the steering concept and away from the negative examples.
Possible values: Value must match regular expression ^_worker/[a-zA-Z0-9-_]{1,64}$
[]
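The regular expression above can be used to validate steering concept names on the client before a request is sent. This is a small sketch; the function name and the sample names are hypothetical.

```python
import re

# Pattern copied from the parameter description.
CONCEPT_PATTERN = re.compile(r"^_worker/[a-zA-Z0-9-_]{1,64}$")


def is_valid_concept(name: str) -> bool:
    """Check a steering concept name against the documented pattern."""
    return CONCEPT_PATTERN.fullmatch(name) is not None


print(is_valid_concept("_worker/casual-tone_v1"))  # prints True
print(is_valid_concept("worker/casual-tone"))      # prints False (missing leading underscore)
```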
Responses
- 200 OK (application/json)
Schema
model name and version (if any) of the used model for inference
completions object[]
optimized_prompt object[]
Number of tokens combined across all completion tasks.
In particular, if you set best_of or n to a number larger than 1 then we report the combined prompt token count for all best_of or n tasks.
Tokenization:
- Token ID arrays are used as-is.
- Text prompt items are tokenized using the tokenizers specific to the model.
- Each image is converted into a fixed amount of tokens that depends on the chosen model.
Number of tokens combined across all completion tasks. If multiple completions are returned or best_of is set to a value greater than 1 then this value contains the combined generated token count.
Example:
{
  "completions": [
    {
      "completion": "keeps the doctor away.",
      "finish_reason": "maximum_tokens"
    }
  ],
  "model_version": "2021-12",
  "optimized_prompt": "An apple a day",
  "num_tokens_prompt_total": 4,
  "num_tokens_generated": 5
}
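A response of this shape could be handled on the client as sketched below. The field names are taken from the example response; the helper names are hypothetical. The endpoint steers the model towards valid JSON, but a completion cut off by maximum_tokens may still be incomplete, so the sketch parses defensively.

```python
import json

RESPONSE_TEXT = """
{
  "completions": [
    {"completion": "keeps the doctor away.", "finish_reason": "maximum_tokens"}
  ],
  "model_version": "2021-12",
  "optimized_prompt": "An apple a day",
  "num_tokens_prompt_total": 4,
  "num_tokens_generated": 5
}
"""


def first_completion(response_text):
    """Return (completion, finish_reason) of the first completion in a response body."""
    payload = json.loads(response_text)
    first = payload["completions"][0]
    return first["completion"], first["finish_reason"]


def try_parse_json(completion):
    """Parse the completion as JSON, returning None if it is not valid JSON."""
    try:
        return json.loads(completion)
    except json.JSONDecodeError:
        return None


completion, finish_reason = first_completion(RESPONSE_TEXT)
parsed = try_parse_json(completion)  # None here, since the sample text is not JSON
```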
Authorization: http
name: token
type: http
scheme: bearer
description: Can be generated in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile)
Example request in C# (HttpClient):
using System;
using System.Net.Http;

// Build the POST request for the JSON completion endpoint.
var client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Post, "https://api.aleph-alpha.com/complete/json");
request.Headers.Add("Accept", "application/json");
// Replace <token> with your API token.
request.Headers.Add("Authorization", "Bearer <token>");
// Minimal request body: model name and prompt.
var content = new StringContent("{\n \"model\": \"llama-3.1-8b-instruct\",\n \"prompt\": \"An apple a day\"\n}", null, "application/json");
request.Content = content;
// Send the request, fail on a non-success status code, and print the response body.
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());