Steering

Large language models (LLMs) generate text based on patterns that they have learned from vast amounts of data. In many use cases, however, we need to influence how an LLM responds.

Steering is a technique that nudges a model’s responses in a particular direction without changing the model itself. Instead of describing the desired change in the prompt, which takes up valuable context space, steering works by identifying underlying patterns in the model’s internal representations.

By providing a steering concept consisting of a set of positive and negative examples, we can compute a direction that subtly guides the model’s responses towards the desired style or behaviour. For example, we can coax it to speak more formally, use slang, or adopt a specific tone. This approach offers an efficient and flexible way to control LLMs' responses in a user-defined manner.
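
To make the idea of "computing a direction" from positive and negative examples more concrete, here is a minimal sketch in plain Python. It assumes a difference-of-means approach, which is one common way to derive a steering direction; this page does not state how PhariaInference computes it internally, so the functions `mean_vector`, `steering_direction`, and `apply_steering`, as well as the toy activation values, are purely illustrative.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def steering_direction(positive: Sequence[Vector], negative: Sequence[Vector]) -> Vector:
    """Difference of means: points from the negative examples towards the positive ones."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(hidden: Vector, direction: Vector, strength: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction (assumed mechanism)."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "activations"; real hidden states are far larger.
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = steering_direction(positive_acts, negative_acts)
steered = apply_steering([0.5, 0.5, 0.5], direction, strength=0.5)

print(direction)  # [2.0, -2.0, 0.0]
print(steered)    # [1.5, -0.5, 0.5]
```

In this toy setting, the third dimension is identical for both example sets, so it contributes nothing to the direction; only the dimensions in which the positive and negative examples differ are shifted.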

Note: User-defined steering has been enabled by default for llama-3.1-8b-instruct since feature set 250600. For other options, or to use another model, please refer to the configuration setup for steering.

Let's see how this works in practice by creating and using a slang steering concept. In this case, the negative examples might be very formal phrases, while the positive examples would be their paraphrased slang counterparts. Once this steering concept is created, we can apply it to any future chat or complete requests, specifying when we want our response to be influenced by it.

For details on the usage of the completion endpoints, please refer to the PhariaInference API reference documentation for complete and chat.

Setup

To keep the other examples concise, we list the necessary imports and set up the client only once. Furthermore, llama-3.1-8b-instruct requires a prompt template, which we'll wrap in a small helper function.

import os
from aleph_alpha_client import Client, ChatRequest, CompletionRequest, Prompt
from aleph_alpha_client.chat import Message, Role
from aleph_alpha_client.steering import (
    SteeringConceptCreationRequest,
    SteeringPairedExample,
)

# Read the API token from the environment rather than hard-coding it.
client = Client(
    token=os.environ["PHARIA_TOKEN"],
    host="https://inference-api.pharia.example.com",
)
model = "llama-3.1-8b-instruct"

def llama_prompt(text: str) -> Prompt:
    # Wrap the user text in the Llama 3.1 instruct prompt template.
    return Prompt.from_text(
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

Creating a steering concept

We use just two negative and two positive examples for brevity. You can find the complete list of examples for a more robust slang steering concept here.

Creating a steering concept returns a unique ID that is later used to steer completion and chat requests. Note that a steering concept is not tied to a particular model: you can use the same steering concept with different models.

request = SteeringConceptCreationRequest(
    examples=[
        SteeringPairedExample(
            negative="I appreciate your valuable feedback on this matter.",
            positive="Thanks for the real talk, fam.",
        ),
        SteeringPairedExample(
            negative="The financial projections indicate significant growth potential.",
            positive="Yo, these numbers are looking mad stacked!",
        ),
    ]
)

steering_concept_id = client.create_steering_concept(request).id
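
With a longer example list, it may be worth checking the pairs before sending them. The following is a minimal sketch of such a check; `pair_examples` is a hypothetical helper, not part of aleph_alpha_client, and the rules it enforces (equal list lengths, non-empty strings, two different phrasings per pair) are assumptions about what makes a sensible pair rather than documented API constraints.

```python
from typing import List, Sequence, Tuple


def pair_examples(negatives: Sequence[str], positives: Sequence[str]) -> List[Tuple[str, str]]:
    """Zip parallel lists into (negative, positive) pairs, rejecting malformed input."""
    if len(negatives) != len(positives):
        raise ValueError("negatives and positives must have the same length")
    pairs = []
    for negative, positive in zip(negatives, positives):
        negative, positive = negative.strip(), positive.strip()
        if not negative or not positive:
            raise ValueError("examples must not be empty")
        if negative == positive:
            raise ValueError("a pair must contrast two different phrasings")
        pairs.append((negative, positive))
    return pairs


pairs = pair_examples(
    negatives=[
        "I appreciate your valuable feedback on this matter.",
        "The financial projections indicate significant growth potential.",
    ],
    positives=[
        "Thanks for the real talk, fam.",
        "Yo, these numbers are looking mad stacked!",
    ],
)

# Each tuple could then be turned into
# SteeringPairedExample(negative=negative, positive=positive).
print(len(pairs))  # 2
```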

Usage in completion and chat requests

Both CompletionRequest and ChatRequest accept a list of steering concept IDs via the steering_concepts field. At the moment, only one steering concept per request is supported.
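
Because steering is opted into per request, one way to switch it on and off is to build the extra keyword arguments in a single place. The sketch below does that and also guards against passing more than one concept. `steering_kwargs` is a hypothetical helper and "example-concept-id" is a placeholder, not a real ID; only the `steering_concepts` field name comes from this page.

```python
from typing import Dict, List, Sequence


def steering_kwargs(concept_ids: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Return request keyword arguments for zero or one steering concept."""
    ids = list(concept_ids)
    if len(ids) > 1:
        raise ValueError("only one steering concept per request is supported at the moment")
    # An empty dict leaves the request unsteered.
    return {"steering_concepts": ids} if ids else {}


unsteered = steering_kwargs()
steered = steering_kwargs(["example-concept-id"])  # placeholder ID

print(unsteered)  # {}
print(steered)    # {'steering_concepts': ['example-concept-id']}
```

The result could then be unpacked into a request, for example `CompletionRequest(prompt=prompt, maximum_tokens=40, **steering_kwargs([steering_concept_id]))`.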

Completion

prompt = llama_prompt(
    "Reply only with a paraphrased version of the following sentence: "
    "You are an honest man."
)

non_steered_completion_response = client.complete(
    CompletionRequest(prompt=prompt, maximum_tokens=40),
    model=model,
)
print("------------------- Non steered completion response -------------------")
print(f"{non_steered_completion_response.completions[0].completion}\n")

steered_completion_response = client.complete(
    CompletionRequest(
        prompt=prompt,
        maximum_tokens=40,
        steering_concepts=[steering_concept_id],
    ),
    model=model,
)
print("--------------------- Steered completion response ---------------------")
print(f"{steered_completion_response.completions[0].completion}\n")

Output:

------------------- Non steered completion response -------------------
Your integrity is evident in your words and actions.

--------------------- Steered completion response ---------------------
You're a straight shooter, no games.

Chat

prompt = (
    "Write an email to a friend inviting him to a barbecue and "
    "telling him to bring his dog."
)

non_steered_chat_request = ChatRequest(
    messages=[Message(role=Role.User, content=prompt)],
    model=model,
    maximum_tokens=150,
)
non_steered_chat_response = client.chat(non_steered_chat_request, model=model)
print("---------------------- Non steered chat response ----------------------")
print(f"{non_steered_chat_response.message.content}\n")

steered_chat_request = ChatRequest(
    messages=[Message(role=Role.User, content=prompt)],
    model=model,
    maximum_tokens=150,
    steering_concepts=[steering_concept_id],
)
steered_chat_response = client.chat(steered_chat_request, model=model)
print("------------------------ Steered chat response ------------------------")
print(f"{steered_chat_response.message.content}\n")

Output:

---------------------- Non steered chat response ----------------------
Here is an email to a friend inviting him to a barbecue and asking him
to bring his dog:

Subject: Join us for a BBQ this weekend!

Hey [Friend's Name],

Hope you're doing well! I was thinking of firing up the grill this
weekend and I'd love for you to come over for a barbecue. The weather
is supposed to be great, so it'll be a perfect opportunity to get some
sun and enjoy some good food and company.

I'd also love for you to bring your furry friend along! We'd love to
have [Dog's Name] join us for the afternoon. We'll make sure to have
plenty of treats and belly rubs on hand.

Let me know if you can make it and what time works best for you. I'm
planning on starting to cook around 2 pm, but feel free to come over
whenever you'd like.

Looking forward to seeing you and [Dog's Name] on Saturday!

Best,
[Your Name]

------------------------ Steered chat response ------------------------
Here's a casual email to a friend:

Hey buddy!

It's finally time to fire up the grill and get this summer started! I'm
having a backyard BBQ on Saturday and I'd love for you to come on over.
And don't even think about leaving your furry sidekick at home - bring
Max along too! We'll have plenty of space for him to run around and get
some treats.

I'll be throwing some burgers, dogs, and veggies on the grill, and
we'll have all the fixins' for a good time. It's gonna be a chill
afternoon, just hangin' out and enjoying the sunshine. You know, the
usual.

So, you and Max come on over around 2? Can't wait to see you!

Cheers,
[Your Name]