In essence, transformer-based language models, such as ours, are fancy autocomplete machines: all they do is find the most plausible continuation for a given input. For example, given the input "The sky is", a natural continuation would be "blue".
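To make this concrete, here is a minimal sketch of that autocomplete step, assuming the Hugging Face `transformers` library and the openly available GPT-2 model (both are stand-ins for illustration, not something the text above prescribes):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The sky is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, sequence_length, vocab_size)

# The scores at the last position rank every vocabulary entry as a
# possible continuation; the top-ranked one is the model's "autocomplete".
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))
```

Taking the argmax is the simplest decoding choice; in practice, models usually sample from the score distribution instead of always picking the single most likely token.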
Because our models always attempt to find the best continuation for a given input, certain tasks can be solved by simply providing a natural-language instruction, with no worked examples at all. Completing a task this way is called zero-shot learning. Let's illustrate this with an example.
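As a sketch of zero-shot prompting, the snippet below sends a bare instruction through the `transformers` pipeline API; the choice of `google/flan-t5-small`, an openly available instruction-tuned model, is an assumption for illustration, not the model the text refers to:

```python
from transformers import pipeline

# Assumed model for illustration; any instruction-tuned model would do.
generator = pipeline("text2text-generation", model="google/flan-t5-small")

# Zero-shot: the prompt is a plain instruction, with zero examples.
prompt = (
    "Classify the sentiment of this review as positive or negative: "
    "I loved every minute of this film."
)
print(generator(prompt)[0]["generated_text"])  # e.g. "positive"
```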
Language models can only work with data in a format they can digest. In essence, such models consist of many large matrices of floating-point numbers, and matrix operations work on numbers, not on characters. Therefore, sequences of characters are first "translated" into sequences of integers, so-called tokens.
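The snippet below sketches this character-to-integer translation, using the GPT-2 tokenizer from the `transformers` library as a stand-in for whichever tokenizer a given model ships with:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# encode() maps a string of characters to a list of integer token IDs ...
ids = tokenizer.encode("Language models run on numbers.")
print(ids)  # a list of integers, one per token

# ... and decode() maps those integers back to the original characters.
print(tokenizer.decode(ids))
```

Note that a token does not correspond to a single character or word: common words typically map to one token, while rarer words are split into several sub-word pieces.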