
Steering

Large language models (LLMs) generate text based on patterns they’ve learned from vast amounts of data, but sometimes we want to influence how they respond. Steering is a technique that nudges a model’s responses in a particular direction without changing the model itself. Instead of manually inserting examples into the prompt, which takes up valuable context space, this method works by identifying underlying patterns in the model’s internal representations. By providing a set of positive and negative examples, we can compute a direction that subtly guides the model’s responses toward the desired style or behavior, such as speaking more formally, using slang, or adopting a specific tone. This approach offers an efficient and flexible way to control an LLM’s responses in a user-defined manner.

Note that this is an experimental feature that will be refined in future versions and is currently only supported for the llama-3.1-8b-instruct model.
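To make the idea of "computing a direction" from positive and negative examples more concrete, here is a minimal toy sketch. It assumes a difference-of-means approach, which is one common way to derive a steering direction; this page does not state how the feature computes its direction internally, so treat the method, the function names (steering_direction, steer), the strength parameter, and the made-up three-dimensional vectors as illustrative assumptions only.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of means: points from the negative examples toward the positive ones.

    Assumption: this is a generic illustration, not the product's actual algorithm.
    """
    pos_mean, neg_mean = mean(positive), mean(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden: Vector, direction: Vector, strength: float = 1.0) -> Vector:
    """Shift a hidden representation along the steering direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Hypothetical toy "activations" for slang (positive) and formal (negative) examples.
slang_examples = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
formal_examples = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = steering_direction(slang_examples, formal_examples)
steered = steer([0.5, 0.5, 0.5], direction, strength=0.5)

print(direction)  # [2.0, -2.0, 0.0]
print(steered)    # [1.5, -0.5, 0.5]
```

In this sketch the third coordinate is identical in both example sets, so it contributes nothing to the direction; only the features that differ between the positive and negative examples are shifted.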

Let's see how this works in practice by creating and using a slang steering concept. As outlined in the configuration setup for steering, we first provide two .txt files containing the positive and negative examples that define the concept. In this case, the negative examples might be very formal phrases, while the positive examples would be their slang paraphrases. Once the steering concept is created, we can apply it to any future chat or completion request, specifying for each request whether the response should be influenced by it. An important detail: when submitting a chat or completion request, we must reference the newly created steering concept by prefixing its name with _worker/. For example, if our concept is named slang, we refer to it as _worker/slang. The following code snippets illustrate this in practice.

If you are unsure how completion and chat requests work, please consult the associated links.
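Because forgetting the _worker/ prefix is an easy mistake, a small helper can build the reference string from the concept name. This is only a convenience sketch: the function name steering_concept_ref and its validation behavior are hypothetical and not part of the client library; only the _worker/ prefix convention comes from the text above.

```python
WORKER_PREFIX = "_worker/"


def steering_concept_ref(name: str) -> str:
    """Return the reference string for a steering concept, e.g. 'slang' -> '_worker/slang'.

    Hypothetical helper: leaves an already-prefixed name unchanged and rejects empty names.
    """
    name = name.strip()
    if not name:
        raise ValueError("steering concept name must not be empty")
    if name.startswith(WORKER_PREFIX):
        return name
    return WORKER_PREFIX + name


print(steering_concept_ref("slang"))          # _worker/slang
print(steering_concept_ref("_worker/slang"))  # _worker/slang
```

The returned string is what would be placed in the steering_concepts list of a request, as in the snippets that follow.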

Completion Requests

import os
from aleph_alpha_client import Client, CompletionRequest, Prompt

client = Client(
    token=os.environ["AA_TOKEN"],
    host=f"https://{os.environ['YOUR_CONFIGURED_DOMAIN']}",
)

prompt = "What is some good music I can listen to?"

# Not steered
non_steered_completion_request = CompletionRequest(
    prompt=Prompt.from_text(prompt),
    maximum_tokens=40,
)
non_steered_completion_response = client.complete(
    non_steered_completion_request, model="llama-3.1-8b-instruct"
)
print("------------ Non steered completion response ------------")
print(f"{non_steered_completion_response.completions[0].completion}\n\n")

# Steered: reference the concept with the _worker/ prefix
steered_completion_request = CompletionRequest(
    prompt=Prompt.from_text(prompt),
    maximum_tokens=40,
    steering_concepts=["_worker/slang"],
)
steered_completion_response = client.complete(
    steered_completion_request, model="llama-3.1-8b-instruct"
)
print("-------------- Steered completion response --------------")
print(f"{steered_completion_response.completions[0].completion}\n\n")

Chat Requests

import os
from aleph_alpha_client import Client
from aleph_alpha_client.chat import (
    ChatRequest,
    Message,
    Role,
)

client = Client(
    token=os.environ["AA_TOKEN"],
    host=f"https://{os.environ['YOUR_CONFIGURED_DOMAIN']}",
)

prompt = "Write an email to a friend inviting him to a barbecue and telling him to bring his dog."

# Not steered
non_steered_chat_request = ChatRequest(
    messages=[Message(role=Role.User, content=prompt)],
    model="llama-3.1-8b-instruct",
    maximum_tokens=150,
)
non_steered_chat_response = client.chat(
    non_steered_chat_request, model="llama-3.1-8b-instruct"
)
print("------------ Non steered chat response ------------")
print(f"{non_steered_chat_response.message.content}\n\n")

# Steered: reference the concept with the _worker/ prefix
steered_chat_request = ChatRequest(
    messages=[Message(role=Role.User, content=prompt)],
    model="llama-3.1-8b-instruct",
    maximum_tokens=150,
    steering_concepts=["_worker/slang"],
)
steered_chat_response = client.chat(
    steered_chat_request, model="llama-3.1-8b-instruct"
)
print("-------------- Steered chat response --------------")
print(f"{steered_chat_response.message.content}\n\n")

To illustrate the impact of steering, here is an example of how using a slang steering concept can influence responses:

Terminal Output

------------ Non steered chat response ------------
Here is an email to a friend inviting him to a barbecue and asking him to bring his dog:

Subject: Join us for a BBQ this weekend!

Hey [Friend's Name],

Hope you're doing well! I was thinking of firing up the grill this weekend and I'd love for you to come over for a barbecue. The weather is supposed to be great, so it'll be a perfect opportunity to get some sun and enjoy some good food and company.

I'd also love for you to bring your furry friend along! We'd love to have [Dog's Name] join us for the afternoon. We'll make sure to have plenty of treats and belly rubs on hand.

Let me know if you can make it and what time works best for you. I'm planning on starting to cook around 2 pm, but feel free to come over whenever you'd like.

Looking forward to seeing you and [Dog's Name] on Saturday!

Best,
[Your Name]


-------------- Steered chat response --------------
Here's a quick email:

Hey bro!

You free this weekend? I'm throwin' a BBQ at my place and I'm gonna fire up the grill. You gotta come thru! Bring your pup too, I wanna meet 'em! We'll chill, eat some good food, and get this party started!

Hit me back and let me know you're comin'!