Model settings
This article describes all the supported settings for models in PhariaStudio.
- Maximum tokens (int, optional, default none)
- Temperature (float, optional, default 0.0)
- Top K (int, optional, default 0)
- Top P (float, optional, default 0.0)
- Repetition penalties include completion (bool, optional, default true)
- Repetition penalties include prompt (bool, optional, default false)
- Presence penalty (float, optional, default 0.0)
- Frequency penalty (float, optional, default 0.0)
- Use multiplicative presence penalty (bool, optional, default true)
- Penalty bias (string, optional)
- Penalty exceptions (List[str], optional)
- Stop sequences (List[str], optional, default none)
- Penalty exceptions include stop sequences (bool, optional, default true)
- Disable optimisations (bool, optional, default false)
- Minimum tokens (int, default 0)
- Echo (bool, default false)
- Use multiplicative frequency penalty (bool, default false)
- Sequence penalty (float, default 0.0)
- Sequence penalty min length (int, default 2)
- Use multiplicative sequence penalty (bool, default false)
- Completion bias inclusion (List[str], default [])
- Completion bias inclusion first token only (bool, default false)
- Completion bias exclusion (List[str], default [])
- Completion bias exclusion first token only (bool, default false)
- Contextual control threshold (float, default 0)
- Control log additive (bool, default true)
- Raw completion (bool, default false)
Maximum tokens (int, optional, default none)
The maximum number of tokens to be generated. Prompt completion terminates when the maximum number of tokens is reached. Increase this value to generate longer texts.
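All of the settings on this page are parameters of a completion request. As a point of reference, the following sketch sets Maximum tokens through the Aleph Alpha Python client (aleph_alpha_client); the API token and the model name are placeholders, and using the Python client rather than the PhariaStudio interface is an assumption made for illustration.
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

request = CompletionRequest(
    prompt=Prompt.from_text("Write a one-sentence summary of photosynthesis:"),
    maximum_tokens=64,  # generation stops after at most 64 tokens
)
response = client.complete(request, model="luminous-base")  # placeholder model name
print(response.completions[0].completion)
```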
Temperature (float, optional, default 0.0)
A higher sampling temperature encourages the model to be "more creative", that is, to produce less probable outputs. Values are expected in the range from 0.0 to 1.0. Try a high value (such as 0.9) for a more "creative" response; the default of 0.0 usually produces a well-defined and repeatable answer.
We recommend adjusting temperature, Top K, or Top P (see below), but not all at the same time. If a combination is used, the logits are first rescaled with temperature, then Top K is applied, and finally Top P.
Top K (int, optional, default 0)
Top K introduces randomness into sampling by selecting the next token from the k most likely options. A value greater than 1 encourages the model to be more "creative"; set it to 0 to produce repeatable output.
The recommendation under Temperature applies here as well: adjust temperature, Top K, or Top P, but not all at the same time.
Top P (float, optional, default 0.0)
Top P introduces randomness into sampling by selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds Top P. Set it to 0.0 to produce repeatable output.
As noted under Temperature, adjust temperature, Top K, or Top P, but not all at the same time.
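As a sketch, the request below raises only the temperature and leaves Top K and Top P at their defaults, in line with the recommendation above (same assumptions as the previous example: aleph_alpha_client with placeholder token and model name).
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

request = CompletionRequest(
    prompt=Prompt.from_text("Suggest a name for a coffee shop:"),
    maximum_tokens=16,
    temperature=0.9,  # high value for a more "creative" response
    top_k=0,          # default: no Top K sampling
    top_p=0.0,        # default: no Top P sampling
)
response = client.complete(request, model="luminous-base")  # placeholder model name
print(response.completions[0].completion)
```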
Repetition penalties include completion (bool, optional, default true)
This option determines whether tokens from the generated completion text are taken into account when the presence and/or frequency penalties (see below) are applied.
Repetition penalties include prompt (bool, optional, default false)
This option determines whether tokens from the prompt text are taken into account when the presence and/or frequency penalties (see below) are applied.
Presence penalty (float, optional, default 0.0)
The presence penalty reduces the probability of generating tokens that are already present in the generated text (if Repetition penalties include completion is true) or in the prompt (if Repetition penalties include prompt is true).
The presence penalty is independent of the number of occurrences. Increase the value to produce text that does not repeat the input.
Frequency penalty (float, optional, default 0.0)
The frequency penalty reduces the probability of generating tokens that are already present in the generated text (if Repetition penalties include completion is true) or in the prompt (if Repetition penalties include prompt is true).
Unlike the presence penalty, the frequency penalty grows with the number of occurrences of a token. Increase the value to reduce repetition in the generated text.
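For instance, here is a sketch (same aleph_alpha_client and placeholder assumptions as above) that penalises repetition within the completion while leaving the prompt out of the penalty calculation:
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

request = CompletionRequest(
    prompt=Prompt.from_text("List ten different animals:\n1."),
    maximum_tokens=50,
    presence_penalty=0.5,   # flat penalty for every token already seen
    frequency_penalty=0.5,  # penalty that grows with each occurrence
    repetition_penalties_include_completion=True,  # default: count completion tokens
    repetition_penalties_include_prompt=False,     # default: ignore prompt tokens
)
response = client.complete(request, model="luminous-base")  # placeholder model name
print(response.completions[0].completion)
```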
Use multiplicative presence penalty (bool, optional, default true)
This option determines whether the presence penalty is applied multiplicatively (true) or additively (false). This changes the formula for presence and frequency penalties.
Penalty bias (string, optional)
If set, all tokens in this text are penalised for repetition, in addition to the tokens that are already penalised: the generated completion tokens (if Repetition penalties include completion is true) and the prompt tokens (if Repetition penalties include prompt is true).
Penalty exceptions (List[str], optional)
You can provide a list of strings that can be generated without penalty, regardless of other penalty settings.
This option is particularly useful for a completion that uses a structured few-shot prompt.
Stop sequences (List[str], optional, default none)
You can provide a list of strings that stop generation if they are themselves generated.
Stop sequences are useful in structured texts.
Penalty exceptions include stop sequences (bool, optional, default true)
By default, stop sequences are included in the penalty exceptions. This avoids penalising the presence of stop sequences in few-shot prompts to provide structure to your completions.
Set this to false to exclude stop sequences from the penalty exceptions.
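The following sketch combines stop sequences and penalty exceptions in a structured few-shot prompt (same aleph_alpha_client and placeholder assumptions as the earlier examples); the "Q:"/"A:" markers are exempt from penalties so the structure itself is never suppressed:
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

few_shot = (
    "Q: What is the capital of France?\nA: Paris\n"
    "Q: What is the capital of Japan?\nA: Tokyo\n"
    "Q: What is the capital of Italy?\nA:"
)
request = CompletionRequest(
    prompt=Prompt.from_text(few_shot),
    maximum_tokens=20,
    stop_sequences=["\nQ:"],          # stop before the model writes a new question
    presence_penalty=0.7,
    penalty_exceptions=["Q:", "A:"],  # structural markers stay penalty-free
    penalty_exceptions_include_stop_sequences=True,  # default
)
response = client.complete(request, model="luminous-base")  # placeholder model name
print(response.completions[0].completion)
```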
Disable optimisations (bool, optional, default false)
We continually research optimal ways to work with our models. By default, we apply these optimisations to both your prompt and completion. This helps to improve your results while using our API.
Set this option to true to keep your prompt and completion unaffected by any optimisations.
Minimum tokens (int, default 0)
Generate at least this number of tokens before an end-of-text token is generated.
Echo (bool, default false)
Include the prompt in the completion. This can be helpful when log_probs is set, because the response then includes log probabilities for the prompt tokens as well.
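A sketch of this combination, assuming that log_probs set to 0 returns the log probability of each chosen token (placeholder token and model name as before):
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

request = CompletionRequest(
    prompt=Prompt.from_text("The quick brown fox"),
    maximum_tokens=1,
    echo=True,    # include the prompt in the completion
    log_probs=0,  # assumed: 0 returns only the chosen tokens' log probabilities
)
response = client.complete(request, model="luminous-base")  # placeholder model name
# With echo set to true, log probabilities cover the prompt tokens as well.
print(response.completions[0].log_probs)
```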
Use multiplicative frequency penalty (bool, default false)
This option determines whether the frequency penalty is applied multiplicatively (true) or additively (false). This changes the formula for presence and frequency penalties.
Sequence penalty (float, default 0.0)
A higher sequence penalty reduces the probability of reproducing token sequences that already appear in the prompt (if Repetition penalties include prompt is true) or in previous completions (if Repetition penalties include completion is true).
Sequence penalty min length (int, default 2)
This option defines the minimum number of tokens that is considered a sequence. The value must be 2 or greater.
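A sketch that discourages the model from reproducing whole phrases of three or more tokens (same placeholder assumptions as the earlier examples):
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

request = CompletionRequest(
    prompt=Prompt.from_text("Continue the story:\nThe rain kept falling. "),
    maximum_tokens=80,
    sequence_penalty=0.5,           # discourage repeating whole token sequences
    sequence_penalty_min_length=3,  # only sequences of 3+ tokens count
    use_multiplicative_sequence_penalty=False,  # default: additive
)
response = client.complete(request, model="luminous-base")  # placeholder model name
print(response.completions[0].completion)
```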
Use multiplicative sequence penalty (bool, default false)
This option determines whether the sequence penalty is applied multiplicatively (true) or additively (false).
Completion bias inclusion (List[str], default [])
You can bias the completion results by allowing only the strings in this list to be generated. All other tokens are disregarded during sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list, and vice versa.
Completion bias inclusion first token only (bool, default false)
This option applies the Completion bias inclusion list only to the first token.
Completion bias exclusion (List[str], default [])
You can bias the completion results by preventing the strings in this list from being generated. All other tokens are unaffected during sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list, and vice versa.
Completion bias exclusion first token only (bool, default false)
This option applies the Completion bias exclusion list only to the first token.
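Completion bias is convenient for classification-style prompts. The sketch below restricts the first generated token to one of two answers (placeholder token and model name; the leading spaces assume a tokeniser that attaches a space to the start of a word):
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

request = CompletionRequest(
    prompt=Prompt.from_text(
        "Is the following review positive?\n"
        "Review: The battery died after two days.\n"
        "Answer:"
    ),
    maximum_tokens=1,
    completion_bias_inclusion=[" Yes", " No"],        # only these may be generated
    completion_bias_inclusion_first_token_only=True,  # restrict the first token only
)
response = client.complete(request, model="luminous-base")  # placeholder model name
print(response.completions[0].completion)
```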
Contextual control threshold (float, default 0)
When set to zero, attention control parameters apply only to the tokens that have explicitly been set in the request. If set to a non-zero value, the control parameters are applied to similar tokens as well: a control applied to one token is applied to every other token whose similarity score is at least this threshold. The similarity score is the cosine similarity of the token embeddings.
Control log additive (bool, default true)
When set to true, control is applied by adding the log(control_factor) to the attention scores.
When set to false, control is applied by (attention_scores - attention_scores.min(-1)) * control_factor.
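Attention controls are attached to the prompt itself. A sketch, assuming the Text and TextControl types of the aleph_alpha_client package (the span, factor, and threshold are illustrative values):
```python
from aleph_alpha_client import (
    Client,
    CompletionRequest,
    Prompt,
    Text,
    TextControl,
)

client = Client(token="YOUR_API_TOKEN")  # placeholder token

text = "The hotel was noisy, but the breakfast was excellent. In summary, the stay was"
prompt = Prompt(
    [
        Text(
            text,
            controls=[
                # Up-weight attention on the word "breakfast".
                TextControl(
                    start=text.index("breakfast"),
                    length=len("breakfast"),
                    factor=1.5,
                )
            ],
        )
    ]
)
request = CompletionRequest(
    prompt=prompt,
    maximum_tokens=10,
    contextual_control_threshold=0.8,  # also control tokens with cosine similarity >= 0.8
    control_log_additive=True,         # default: add log(control_factor) to attention scores
)
response = client.complete(request, model="luminous-base")  # placeholder model name
print(response.completions[0].completion)
```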
Raw completion (bool, default false)
When set to true, this option forces the raw completion of the model to be returned. For some models, we may optimise the completion generated by the model and return the optimised completion in the completion field of the CompletionResponse. The raw completion, if requested, contains the non-optimised completion.
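A sketch that requests both fields, assuming the response exposes a raw_completion attribute next to completion (placeholder token and model name as before):
```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(token="YOUR_API_TOKEN")  # placeholder token

request = CompletionRequest(
    prompt=Prompt.from_text("An apple a day"),
    maximum_tokens=10,
    raw_completion=True,  # also return the non-optimised completion
)
response = client.complete(request, model="luminous-base")  # placeholder model name
result = response.completions[0]
print(result.completion)      # possibly optimised
print(result.raw_completion)  # assumed field: the model's unmodified output
```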