Attention manipulation (AtMan)

Attention manipulation (AtMan) is Aleph Alpha's method for steering the attention a model pays to an input sequence through suppression or amplification.

What is attention manipulation?

Attention manipulation can be applied to parts of an input sequence, such as a token, word, or sentence, to steer the model's prediction in a different contextual direction.

With AtMan, you can manipulate attention in both directions, either suppressing or amplifying an input sequence. This opens up many opportunities when designing your prompt.

To learn about the technical details of AtMan, see our paper, "AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation".

Suppressing an input sequence

AtMan can suppress the attention given to a token, or a set of tokens, in an input. For example, the completion for the following prompt without any attention manipulation looks like this:

Hello, my name is Lucas. I like soccer and basketball. Today I will play soccer.

With AtMan, you can suppress any part of the text in your prompt to obtain a different completion. In our example, we suppress "soccer":

Hello, my name is Lucas. I like soccer and basketball. Today I will play basketball with my friends.

The suppression of "soccer" led to a different completion.
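In the Python client, this kind of suppression can be expressed by attaching a control to the prompt. Below is a minimal sketch, assuming the aleph_alpha_client package and its TextControl type; the API token and model name are placeholders.

```python
from aleph_alpha_client import Client, CompletionRequest, Prompt, Text, TextControl

client = Client(token="AA_TOKEN")  # placeholder API token

text = "Hello, my name is Lucas. I like soccer and basketball. Today I will play"

# Suppress the attention paid to "soccer": a factor below 1 down-weights the
# covered span, and factor=0.0 removes its influence almost entirely.
suppress_soccer = TextControl(
    start=text.index("soccer"),
    length=len("soccer"),
    factor=0.0,
)

request = CompletionRequest(
    prompt=Prompt([Text(text, controls=[suppress_soccer])]),
    maximum_tokens=16,
)

response = client.complete(request, model="luminous-base")  # model name is an assumption
print(response.completions[0].completion)
# With "soccer" suppressed, the completion tends toward basketball instead.
```

Values between 0 and 1 suppress the span more gently than a factor of 0.0.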

Amplifying an input sequence

AtMan also allows you to amplify the attention given to a token. The completion for the following prompt without any attention manipulation looks like this:

I bought a game and a party hat. Tonight I will be wearing the party hat while playing the game.

Let’s say that we really want to play the game tonight. In this case, we amplify the attention paid to "game":

I bought a game and a party hat. Tonight I will be playing games with my friends.

Again, the attention manipulation led to a different completion.
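Amplification uses the same mechanism with a factor above 1. Here is a sketch under the same assumptions as above (the aleph_alpha_client package, with a placeholder token and model name):

```python
from aleph_alpha_client import Client, CompletionRequest, Prompt, Text, TextControl

client = Client(token="AA_TOKEN")  # placeholder API token

text = "I bought a game and a party hat. Tonight I will be"

# Amplify the attention paid to "game": a factor above 1 up-weights the span.
amplify_game = TextControl(
    start=text.index("game"),
    length=len("game"),
    factor=1.5,
)

request = CompletionRequest(
    prompt=Prompt([Text(text, controls=[amplify_game])]),
    maximum_tokens=16,
)

response = client.complete(request, model="luminous-base")  # model name is an assumption
print(response.completions[0].completion)
# With "game" amplified, the completion leans toward playing the game.
```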

AtMan for embeddings, evaluation, and multimodal input

AtMan can be used not only for text completions, but also for completions over multimodal input, (semantic) embeddings, and evaluation calls.
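Because controls are attached to the prompt itself rather than to a specific endpoint, the same pattern carries over to other request types. The following is a hedged sketch of a semantic embedding call with a suppressed span; it again assumes the aleph_alpha_client package, and the model name is a placeholder.

```python
from aleph_alpha_client import (
    Client,
    Prompt,
    SemanticEmbeddingRequest,
    SemanticRepresentation,
    Text,
    TextControl,
)

client = Client(token="AA_TOKEN")  # placeholder API token

text = "I like soccer and basketball."

# Down-weight "soccer" so the embedding reflects the remaining content more strongly.
suppress_soccer = TextControl(start=text.index("soccer"), length=len("soccer"), factor=0.0)

request = SemanticEmbeddingRequest(
    prompt=Prompt([Text(text, controls=[suppress_soccer])]),
    representation=SemanticRepresentation.Symmetric,
)

response = client.semantic_embed(request, model="luminous-base")  # model name is an assumption
print(len(response.embedding))  # embedding of the text with "soccer" suppressed
```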