📄️ Attention Manipulation (AtMan)
AtMan is our method for manipulating the attention a model pays to parts of its input sequence (a single token, a word, or even a whole sentence) in order to steer the model's prediction in a different contextual direction. AtMan works in both directions: you can either suppress or amplify the selected part of the input. If you would like to know more about the technical details of AtMan, you can refer to the paper we published.
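To make the idea concrete, here is a minimal sketch of the kind of manipulation AtMan performs inside a single attention layer: the attention scores of the selected input positions are scaled by a factor, where a factor below 1 suppresses them and a factor above 1 amplifies them. The tensor shapes and the `manipulate_attention` helper are illustrative assumptions, not the product API; the exact formulation is given in the AtMan paper.

```python
import torch

def manipulate_attention(scores: torch.Tensor, positions: list, factor: float) -> torch.Tensor:
    """Scale the pre-softmax attention scores of selected input positions.

    scores:    raw attention scores of shape (num_heads, seq_len, seq_len); illustrative
    positions: indices of the input tokens to suppress (factor < 1) or amplify (factor > 1)
    factor:    multiplicative modifier applied to the selected key positions
    """
    modified = scores.clone()
    # Every query now attends less (or more) strongly to the manipulated key positions.
    modified[:, :, positions] = modified[:, :, positions] * factor
    return modified

# Example: suppress the tokens at positions 3-5 of the input sequence.
num_heads, seq_len = 4, 8
raw_scores = torch.randn(num_heads, seq_len, seq_len)
suppressed = manipulate_attention(raw_scores, positions=[3, 4, 5], factor=0.1)
attention_weights = torch.softmax(suppressed, dim=-1)
```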
📄️ Explainability
In the previous section, we explained how you can steer the attention of our models and either suppress or amplify parts of the input sequence. Very roughly speaking, our explainability method uses AtMan to suppress individual parts of a prompt (at the level of granularity you choose) and measures how much each suppression changes the log-probabilities of the already generated completion relative to the others. In the following example, we will investigate which part of the prompt influenced the completion the most.
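The sketch below illustrates this loop conceptually: each prompt token is suppressed in turn, the log-probability of the fixed completion is recomputed, and the drop relative to the unsuppressed baseline serves as that token's importance score. The `model` callable and its `suppress`/`factor` parameters are assumptions for illustration, not the client API.

```python
import torch
import torch.nn.functional as F

def explain_completion(model, prompt_ids, completion_ids, factor=0.1):
    """Rank prompt tokens by how much suppressing them changes the
    log-probability of the already generated completion.

    model:          hypothetical causal LM returning logits of shape
                    (1, seq_len, vocab_size) and accepting positions to suppress
    prompt_ids:     token ids of the prompt,     shape (prompt_len,)
    completion_ids: token ids of the completion, shape (completion_len,)
    """
    full = torch.cat([prompt_ids, completion_ids]).unsqueeze(0)
    offset = prompt_ids.shape[0]

    def completion_log_prob(suppress_positions=None):
        # Assumed hook: the model scales down attention to `suppress_positions`,
        # as AtMan does internally.
        logits = model(full, suppress=suppress_positions, factor=factor)
        log_probs = F.log_softmax(logits, dim=-1)
        total = 0.0
        for i, token in enumerate(completion_ids):
            # The logit at position t predicts the token at position t + 1.
            total += log_probs[0, offset + i - 1, token].item()
        return total

    baseline = completion_log_prob()
    # A larger drop from the baseline means the suppressed token mattered more.
    return {
        int(pos): baseline - completion_log_prob(suppress_positions=[int(pos)])
        for pos in range(offset)
    }
```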