Model Parameter

Model

The name of the model to be used. The model name refers to the architecture of the model (e.g. the number of parameters). The latest version of the model is always used.

Maximum Tokens

Different use cases demand completions of varying lengths. By tweaking the maximum tokens parameter, you can influence the length of the model's response. Computing expenses also increase with the number of input and output tokens, so try to select an appropriate maximum number of tokens to be generated. If you simply want to generate “True” or “False”, for example, you may want to reduce the maximum number of tokens to something like 3. If, on the other hand, you wish to generate an extensive summary, pick a higher maximum token value. Generating more tokens requires more computing capacity and thus slightly longer computation time and possibly higher costs.
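
As a rough sketch of how this cap behaves, the following loop stops generating once the limit is reached. Note that `generate_next_token` and the end-of-text marker are hypothetical stand-ins for illustration, not part of any real API.

```python
def generate(prompt_tokens, generate_next_token, maximum_tokens, end_of_text="<|endoftext|>"):
    """Generate at most `maximum_tokens` tokens, stopping early at end-of-text."""
    completion = []
    for _ in range(maximum_tokens):
        token = generate_next_token(prompt_tokens + completion)
        if token == end_of_text:      # the model decided it is finished
            break
        completion.append(token)
    return completion

# Dummy "model" that always answers " True"; the cap limits the output to 3 tokens.
print(generate(["Is", " water", " wet", "?"], lambda context: " True", maximum_tokens=3))
```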

Stop Sequences

To explain stop sequences, let's think back to the vacation example from the Prompting section. Left unchecked, the model may keep generating ideas for potential destinations. However, we may only want one potential destination to be returned. In this case, we can make use of a stop sequence. If you press the enter key in the stop sequence field of the playground, generation will stop at the first line break. Any string of characters can be entered as a stop sequence. Be aware, however, that this feature is still a work in progress. Currently, simply entering a stop string may not yield a satisfactory result. As a temporary workaround, try starting your stop sequence with a whitespace character, for example “ plant”. Our engineers are currently working on fixing this issue!
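
Conceptually, the output is cut off at the first occurrence of any stop sequence. A minimal sketch of that behavior on already generated text (an illustration, not the actual implementation):

```python
def apply_stop_sequences(text, stop_sequences):
    """Truncate `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        position = text.find(stop)
        if position != -1:
            cut = min(cut, position)
    return text[:cut]

# Using a line break as the stop sequence returns only one destination idea.
ideas = "The beaches of Bali\nThe Swiss Alps\nKyoto in autumn"
print(apply_stop_sequences(ideas, ["\n"]))   # -> "The beaches of Bali"
```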

Temperature

Tweaking the temperature allows you to encourage the model to be more or less “creative”. When the temperature is 0, the most likely token is chosen every time. With a temperature ≠ 0, the model samples from the probability distribution over tokens. At low temperatures, this distribution is heavily skewed towards the most likely result(s). However, as the temperature increases towards 1, less likely results become more probable. In practice, this means that a higher temperature makes unlikely words more likely to be generated, whereas a lower temperature means the model will tend to choose its most confident guess every time. Low temperatures are beneficial when you need consistent, predictable output, while higher temperatures may be better for tasks like poetry or creative writing.
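
The effect can be illustrated with a small sampling sketch. The token scores below are made up, and this is a conceptual model of temperature sampling rather than the exact implementation:

```python
import math
import random

def sample_with_temperature(scores, temperature):
    """Pick a token from softmax-normalized scores, rescaled by the temperature."""
    if temperature == 0:                              # greedy: always the most likely token
        return max(scores, key=scores.get)
    weights = {token: math.exp(score / temperature) for token, score in scores.items()}
    total = sum(weights.values())
    return random.choices(list(weights), [w / total for w in weights.values()])[0]

scores = {" away": 2.0, " from": 0.5, " happy": 0.1}   # made-up model scores
print(sample_with_temperature(scores, 0))              # always " away"
print(sample_with_temperature(scores, 0.9))            # sometimes picks a less likely token
```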

Top K

Applying top K introduces random sampling from the k most likely tokens. If top K is equal to 0, it is not applied. A low top K means that the next token will be chosen from among the few most likely options. For example, if you wish to sample only from the three most likely tokens, enter top K = 3. As top K increases, the set of options to be sampled from approaches the complete set of possible options.
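
A minimal sketch of top K filtering with made-up probabilities (an illustration, not the actual implementation):

```python
import random

def sample_top_k(probabilities, k):
    """Sample from the k most likely tokens; k = 0 means the filter is not applied."""
    if k == 0:
        candidates = list(probabilities.items())
    else:
        candidates = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens, weights = zip(*candidates)
    return random.choices(tokens, weights)[0]

probabilities = {" might": 0.35, " can": 0.20, " may": 0.20, " will": 0.15, " must": 0.10}
print(sample_top_k(probabilities, 3))   # samples only from " might", " can" and " may"
```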

Top P

The top P parameter introduces random sampling from the largest set of most likely tokens whose cumulative probability does not exceed P. For example, if the tokens “can” and “may” are the most likely to be generated, each with a probability of 20%, and “will” is in third place at 15%, then a P of 0.5 implies that the model samples only from “can” and “may”. Set top P to 0 if you do not wish to apply it. Similarly to top K, top P limits the set of options to be sampled from to some number of the most likely options. In the example above, the parameter must be set to at least 0.55 to include “will”.
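
A sketch of the behavior described above, using the probabilities from the example. This is an illustration rather than the actual implementation, and it assumes P is at least as large as the probability of the most likely token:

```python
import random

def sample_top_p(probabilities, p):
    """Sample from the most likely tokens whose cumulative probability does not exceed p."""
    if p == 0:                                         # top P = 0 means the filter is not applied
        candidates = list(probabilities.items())
    else:
        candidates, cumulative = [], 0.0
        for token, prob in sorted(probabilities.items(), key=lambda item: item[1], reverse=True):
            if cumulative + prob > p:                  # adding this token would exceed p
                break
            candidates.append((token, prob))
            cumulative += prob
    tokens, weights = zip(*candidates)
    return random.choices(tokens, weights)[0]

probabilities = {" can": 0.20, " may": 0.20, " will": 0.15, " shall": 0.12,
                 " might": 0.12, " must": 0.11, " could": 0.10}
print(sample_top_p(probabilities, 0.5))    # samples only from " can" and " may"
# With p = 0.55, " will" would be included as well.
```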

Best of

Generates up to ten completions server-side, but returns only the best. The “best” completion is always the most likely one as judged by the model. Let’s say we generate two completions for the prompt “An apple a day”. We get: 1) “keeps the doctor away.” and 2) “is healthy.”. Technically speaking, both completions are valid. However, the first one is far more likely, as becomes evident when multiplying the likelihoods of its individual tokens. To visualize this, tick Show Probabilities. Because all completions would be identical without sampling, the Best of parameter should only be touched when sampling is used (i.e. when temperature, top K and/or top P are applied). Note that a Best of larger than one multiplies the cost of your request by the chosen value.
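
To make “most likely as judged by the model” concrete, here is a small sketch that multiplies made-up per-token probabilities (via log-probabilities) into an overall likelihood for each candidate completion:

```python
import math

def completion_log_likelihood(token_probabilities):
    """Sum of log-probabilities, equivalent to multiplying the per-token probabilities."""
    return sum(math.log(p) for p in token_probabilities)

# Made-up per-token probabilities for two completions of "An apple a day":
candidates = {
    " keeps the doctor away.": [0.9, 0.8, 0.9, 0.95, 0.9],
    " is healthy.":            [0.05, 0.4, 0.7],
}
best = max(candidates, key=lambda completion: completion_log_likelihood(candidates[completion]))
print(best)   # -> " keeps the doctor away."
```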

Presence Penalty

When applying a positive presence penalty, the model is discouraged from repeating tokens that are already present in the text, independent of how often they occur. Each time a new token is generated, a penalty is applied to all tokens that already occur at least once, lowering their chance of being generated again. A negative value can be chosen to encourage repetition instead. For a modest effect, low penalty values (between 0 and 1) are usually sufficient.
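
A minimal sketch of the idea, with made-up scores; only whether a token is present matters, not how often it occurs. This illustrates the concept and is not the actual implementation:

```python
def apply_presence_penalty(scores, generated_tokens, penalty):
    """Subtract the penalty from every token that already occurs at least once."""
    present = set(generated_tokens)            # only presence matters, not the count
    return {token: score - penalty if token in present else score
            for token, score in scores.items()}

scores = {" plant": 1.2, " tree": 1.0, " flower": 0.8}   # made-up scores
print(apply_presence_penalty(scores, [" plant", " plant", " tree"], 0.5))
# " plant" and " tree" are each penalized once, " flower" is untouched.
```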

Frequency Penalty

Similar to the presence penalty, the frequency penalty reduces the chance of sampling tokens that are already present in the text. Unlike the presence penalty, however, the frequency penalty grows with each additional occurrence of a token. Thus, tokens are penalized more heavily the more often they appear.
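
The same sketch, adapted so the penalty scales with the number of occurrences (again with made-up scores, not the actual implementation):

```python
from collections import Counter

def apply_frequency_penalty(scores, generated_tokens, penalty):
    """Subtract the penalty once per occurrence, so frequent tokens are hit harder."""
    counts = Counter(generated_tokens)
    return {token: score - penalty * counts[token] for token, score in scores.items()}

scores = {" plant": 1.2, " tree": 1.0, " flower": 0.8}   # made-up scores
print(apply_frequency_penalty(scores, [" plant", " plant", " tree"], 0.5))
# " plant" is penalized twice, " tree" once, " flower" not at all.
```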

Penalty includes prompt

If this option is enabled, tokens from the prompt are taken into account when penalizing the next token, in addition to tokens from the completion. Tick this option to prevent the model from repeating tokens from the prompt.
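
In terms of the sketches above, this option simply widens the set of tokens that the penalties look at (an illustration, not the actual implementation):

```python
def tokens_to_penalize(prompt_tokens, completion_tokens, penalty_includes_prompt):
    """Return the tokens that the presence and frequency penalties are applied to."""
    if penalty_includes_prompt:
        return prompt_tokens + completion_tokens   # prompt tokens are penalized as well
    return completion_tokens                       # only what has been generated so far

print(tokens_to_penalize([" An", " apple", " a", " day"], [" keeps"],
                         penalty_includes_prompt=True))
```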

Multiplicative Presence Penalty

Currently, this feature only applies to the presence penalty; our engineers are actively working on implementing multiplicative frequency penalties as well. If ticked, each penalized token’s score is multiplied by (1 - penalty) before generation. For example, a presence penalty of 0.3 implies that a token’s score is multiplied by 1 - 0.3 = 0.7. If unchecked, the penalty is subtracted from the score instead.
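
A small sketch contrasting the two modes, with a made-up score:

```python
def penalize_score(score, penalty, multiplicative):
    """Penalize the score of a token that is already present in the text."""
    if multiplicative:
        return score * (1 - penalty)   # e.g. penalty 0.3 -> score * 0.7
    return score - penalty             # default: the penalty is subtracted

print(penalize_score(2.0, 0.3, multiplicative=True))    # -> 1.4
print(penalize_score(2.0, 0.3, multiplicative=False))   # -> 1.7
```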

note

If a parameter is non-zero, it is applied during the completion. The parameters are always applied in the following order (a combined sketch follows the list):

  1. Presence and frequency penalty
  2. Top K
  3. Top P
  4. Temperature
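
Putting the pieces together, here is a rough end-to-end sketch of that order. The helper logic and scores are made up for illustration and do not mirror the actual implementation:

```python
import math
import random
from collections import Counter

def sample_next_token(scores, generated, presence_penalty=0.0, frequency_penalty=0.0,
                      top_k=0, top_p=0.0, temperature=0.0):
    """Apply the parameters in the documented order, then pick the next token."""
    counts = Counter(generated)

    # 1. Presence and frequency penalty
    scores = {token: score
              - (presence_penalty if counts[token] else 0.0)
              - frequency_penalty * counts[token]
              for token, score in scores.items()}

    # Turn the penalized scores into probabilities for the top K / top P filters.
    total = sum(math.exp(score) for score in scores.values())
    candidates = sorted(((t, math.exp(s) / total) for t, s in scores.items()),
                        key=lambda item: item[1], reverse=True)

    # 2. Top K: keep only the k most likely tokens (0 = not applied)
    if top_k:
        candidates = candidates[:top_k]

    # 3. Top P: keep the most likely tokens whose cumulative probability stays within p
    if top_p:
        kept, cumulative = [], 0.0
        for token, prob in candidates:
            if cumulative + prob > top_p:
                break
            kept.append((token, prob))
            cumulative += prob
        candidates = kept or candidates[:1]   # always keep at least the most likely token

    # 4. Temperature: 0 means greedy, otherwise sample from the remaining candidates
    if temperature == 0:
        return candidates[0][0]
    weights = [prob ** (1 / temperature) for _, prob in candidates]
    return random.choices([token for token, _ in candidates], weights)[0]

scores = {" away": 2.0, " from": 1.0, " happy": 0.5, " sad": 0.1}   # made-up scores
print(sample_next_token(scores, [" away"], presence_penalty=0.5,
                        top_k=3, top_p=0.9, temperature=0.7))
```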

Show Probabilities

Display alternative tokens and their respective probabilities of being generated. This function is particularly useful if you are trying to figure out why the model acted the way it did.