Steering
Large language models (LLMs) generate text based on patterns that they have learned from vast amounts of data. In many use cases, however, we need to influence how the model responds. We can do this with a technique called steering.
What is steering?
Steering is a technique that nudges a model’s responses in a particular direction without changing the model itself. Instead of describing the desired change in the prompt, which takes up valuable context space, steering works by identifying underlying patterns in the model’s internal representations.
By providing a steering concept, consisting of a set of positive and negative examples, we can compute a direction that subtly guides the model’s responses towards the desired style or behaviour.
For example, we can coax the model to speak more formally, use slang, or adopt a specific tone. This approach offers an efficient and flexible way to control a model’s responses in a user-defined manner.
Note that a steering concept does not depend on the model you intend to use it with. You can use the same steering concept with different models.
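The idea of computing a direction from positive and negative examples can be illustrated with a minimal sketch. It assumes a difference-of-means approach over hidden activations, which is one common way to derive a steering direction. The source does not state how the direction is actually computed, so the function names, the toy vectors, and the `strength` parameter below are all hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def steering_direction(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of means: points from the negative examples towards the positive ones."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(hidden: Vector, direction: Vector, strength: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction (assumed additive update)."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "activations" standing in for real hidden states.
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]  # e.g. from slang examples
negative_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]  # e.g. from formal examples

direction = steering_direction(positive_acts, negative_acts)
steered = apply_steering([0.5, 0.5, 0.5], direction, strength=0.5)
print(direction)
print(steered)
```

In a scheme like this, the concept itself is only the set of examples. The activations, and therefore the direction, would be obtained separately for each model, which is consistent with reusing one concept across models.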
Defining steering concepts
You can define steering concepts in two different ways:
- User-defined steering concepts: You create the steering concepts using the PhariaInference API. It is enabled by default for llama-3.1-8b-instruct.
- Worker-defined steering concepts: You load the examples from text files during worker startup. Note that all worker-defined steering concepts need to be prefixed with _worker/ in completion and chat requests.
See Configuring steering for further details.
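The prefix rule for worker-defined concepts can be sketched as a small helper. Only the `_worker/` prefix and the model name come from the text above. The request layout, including the `model`, `prompt`, and `steering_concepts` field names, is an assumption made for illustration and should be checked against the PhariaInference API reference.

```python
import json
from typing import Any, Dict, List

WORKER_PREFIX = "_worker/"


def worker_concept_id(name: str) -> str:
    """Return the identifier of a worker-defined concept, adding the prefix at most once."""
    if not name:
        raise ValueError("concept name must not be empty")
    return name if name.startswith(WORKER_PREFIX) else WORKER_PREFIX + name


def build_completion_request(model: str, prompt: str, concepts: List[str]) -> Dict[str, Any]:
    """Assemble a request body. All field names here are illustrative assumptions."""
    return {
        "model": model,
        "prompt": prompt,
        "steering_concepts": concepts,  # hypothetical field name
    }


body = build_completion_request(
    model="llama-3.1-8b-instruct",
    prompt='Reply only with a paraphrased version of the following sentence:\n"You are an honest man."',
    concepts=[worker_concept_id("slang")],
)
print(json.dumps(body, indent=2))
```

Keeping the prefix logic in one helper avoids sending an unprefixed name for a worker-defined concept by mistake.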
Example: A 'slang' steering concept
Consider a slang steering concept. To define it, we would provide two text files containing negative and positive examples. The negative examples might include very formal phrases, while the positive examples would be their paraphrased slang counterparts.
Once this steering concept is created, we can apply it to any future chat or completion request, specifying when we want our response to be influenced by it.
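To make the pairing of negative and positive examples concrete, here is a minimal sketch that writes a slang concept to two text files. The example sentences, the file names, and the one-example-per-line layout are assumptions for illustration only. The format the worker actually expects is described in Configuring steering.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Each pair is (negative / formal, positive / slang). The sentences are made up.
PAIRS: List[Tuple[str, str]] = [
    ("I appreciate your assistance.", "Thanks a ton, you rock."),
    ("That is an excellent idea.", "That idea is fire."),
    ("I am very tired today.", "I'm totally wiped today."),
]


def write_concept_files(
    pairs: List[Tuple[str, str]], directory: Path, name: str
) -> Tuple[Path, Path]:
    """Write negative and positive examples to two files, one per line, in matching order."""
    negative_path = directory / f"{name}_negative.txt"  # hypothetical file name
    positive_path = directory / f"{name}_positive.txt"  # hypothetical file name
    negative_path.write_text("\n".join(neg for neg, _ in pairs) + "\n", encoding="utf-8")
    positive_path.write_text("\n".join(pos for _, pos in pairs) + "\n", encoding="utf-8")
    return negative_path, positive_path


with tempfile.TemporaryDirectory() as tmp:
    neg_file, pos_file = write_concept_files(PAIRS, Path(tmp), "slang")
    negative_lines = neg_file.read_text(encoding="utf-8").splitlines()
    positive_lines = pos_file.read_text(encoding="utf-8").splitlines()

print(list(zip(negative_lines, positive_lines)))
```

Keeping the two files in the same order means that line N of the positive file paraphrases line N of the negative file, which matches the idea of slang counterparts for formal phrases.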
Paraphrase a sentence
- The prompt:
"Reply only with a paraphrased version of the following sentence:
"You are an honest man.""
- Non-steered completion response (formal language)
Your integrity is evident in your words and actions.
- Steered completion response (slang)
You're a straight shooter, no games.
Write an email
- The prompt:
"Write an email to a friend inviting him to a barbecue and telling him to bring his dog."
- Non-steered chat response (formal language)
Here is an email to a friend inviting him to a barbecue and asking him to bring his dog:
Subject: Join us for a BBQ this weekend!
Hey [Friend's Name],
Hope you're doing well! I was thinking of firing up the grill this weekend and I'd love for you to come over for a barbecue. The weather is supposed to be great, so it'll be a perfect opportunity to get some sun and enjoy some good food and company.
I'd also love for you to bring your furry friend along! We'd love to have [Dog's Name] join us for the afternoon. We'll make sure to have plenty of treats and belly rubs on hand.
Let me know if you can make it and what time works best for you. I'm planning on starting to cook around 2 pm, but feel free to come over whenever you'd like.
Looking forward to seeing you and [Dog's Name] on Saturday!
Best,
[Your Name]
- Steered chat response (slang)
Here's a casual email to a friend:
Hey buddy!
It's finally time to fire up the grill and get this summer started! I'm having a backyard BBQ on Saturday and I'd love for you to come on over. And don't even think about leaving your furry sidekick at home - bring Max along too! We'll have plenty of space for him to run around and get some treats.
I'll be throwing some burgers, dogs, and veggies on the grill, and we'll have all the fixins' for a good time. It's gonna be a chill afternoon, just hangin' out and enjoying the sunshine. You know, the usual.
So, you and Max come on over around 2? Can't wait to see you!
Cheers,
[Your Name]