Steering Configuration
Large language models (LLMs) generate text based on patterns they've learned from vast amounts of data, but sometimes we want to influence how they respond. Steering is a technique that nudges a model's responses in a particular direction without changing the model itself. Instead of manually inserting examples into the prompt, which takes up valuable context space, this method works by identifying underlying patterns in the model's internal representations. By providing a set of positive and negative examples, we can compute a direction that subtly guides the model's responses toward the desired style or behavior, such as making it speak more formally, use slang, or adopt a specific tone. This approach offers an efficient and flexible way to control LLM responses in a user-defined manner. Note that steering is an experimental feature that will be refined in future versions and is currently only supported for the llama-3.1-8b-instruct model.
Before steering can be applied to model responses, the desired steering concepts must first be defined and configured. This involves two key steps:
1. Creating the steering configuration – define the necessary files and organize them in a new folder.
2. Updating the worker configuration – modify the helm chart to set `steering_config_dirpath` to reference the newly created folder.
At a high level, these steps ensure that the steering configuration is properly loaded during the worker’s startup phase. As a result, whenever a new steering concept is added or an existing one is modified, the corresponding workers must be restarted for the changes to take effect.
1. Creating the steering configuration
Let's explore this in practice by creating a slang steering concept. Please note that all steering concept names must match the regex `^[a-zA-Z0-9-_]{1,64}$`, which ensures that the entire string is 1 to 64 characters long and contains only letters, digits, hyphens, and underscores. First, we create a new directory to store steering configurations: `/steering`.
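As a quick illustration (this helper is not part of the product tooling), the naming rule above can be checked with Python's standard `re` module before a concept is registered:

```python
# Illustrative sketch: validate a proposed steering concept name against
# the required pattern ^[a-zA-Z0-9-_]{1,64}$ before adding it to the config.
import re

CONCEPT_NAME_RE = re.compile(r"^[a-zA-Z0-9-_]{1,64}$")

def is_valid_concept_name(name: str) -> bool:
    """Return True if `name` is 1-64 chars of letters, digits, hyphens, underscores."""
    return CONCEPT_NAME_RE.match(name) is not None

print(is_valid_concept_name("slang"))          # True
print(is_valid_concept_name("formal german"))  # False: contains a space
```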
This directory will contain the following files:
config.yml
slang-negative.txt
slang-positive.txt
A steering concept is defined by specifying a set of paired negative and positive examples. In this case, the negative examples might be very formal phrases, while the positive examples would be their paraphrased slang counterparts. The examples in a given row of both files should correspond semantically, i.e., they should be ordered in the same way.
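Because the pairing is positional, a simple sanity check (shown here as a minimal sketch, not part of the product) is to confirm that both files contain the same number of non-empty lines:

```python
# Illustrative sketch: row i of the negative file is paired with row i of
# the positive file, so both files should have the same number of lines.
from pathlib import Path
import tempfile

def paired_line_counts_match(negative_path, positive_path) -> bool:
    neg = [l for l in Path(negative_path).read_text().splitlines() if l.strip()]
    pos = [l for l in Path(positive_path).read_text().splitlines() if l.strip()]
    return len(neg) == len(pos)

# Temporary files standing in for slang-negative.txt / slang-positive.txt:
d = tempfile.mkdtemp()
Path(d, "slang-negative.txt").write_text("Hello, sir.\nGoodbye, sir.\n")
Path(d, "slang-positive.txt").write_text("Yo!\nPeace out!\n")
print(paired_line_counts_match(Path(d, "slang-negative.txt"),
                               Path(d, "slang-positive.txt")))  # True
```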
config.yml
This file defines the steering strength and the registered concepts. The `strength` parameter should be set to 0.062 for the llama-3.1-8b-instruct model.
```yaml
{
  "version": ".unknown.",
  "steering_config": {
    "strength": 0.062,
    "concepts": ["slang"]  # List[string]
  }
}
```
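Since the example config uses JSON-compatible syntax (which is valid YAML), it can be sanity-checked with Python's standard-library `json` module once the trailing comment is removed. This is an illustrative sketch, not an official validation tool:

```python
# Illustrative sketch: parse the (comment-free) JSON-style config and check
# the fields the worker expects: a float strength and a list of concept names.
import json

raw = '''
{
  "version": ".unknown.",
  "steering_config": {
    "strength": 0.062,
    "concepts": ["slang"]
  }
}
'''

config = json.loads(raw)
steering = config["steering_config"]
assert isinstance(steering["strength"], float)
assert all(isinstance(c, str) for c in steering["concepts"])
print(steering["strength"], steering["concepts"])  # 0.062 ['slang']
```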
slang-negative.txt
Contains a set of N counterexamples that we want to steer the model away from.
I appreciate your valuable feedback on this matter.
The financial projections indicate significant growth potential.
Please ensure all documentation is submitted by the deadline.
The restaurant's ambiance was sophisticated and the cuisine was exceptional.
Your assistance in this project has been invaluable.
The meteorological forecast predicts inclement weather conditions.
This academic paper requires substantial revision before submission.
The theatrical performance exceeded all expectations.
I encountered significant traffic during my morning commute.
The cellular device appears to be malfunctioning.
Our quarterly earnings have surpassed initial projections.
The real estate market shows signs of increasing stability.
Your conduct during the meeting was highly unprofessional.
I require additional time to complete this assignment.
The automobile requires immediate maintenance attention.
This establishment's customer service is subpar.
I must depart from this gathering immediately.
The compensation package appears quite competitive.
Your argumentative position lacks sufficient evidence.
The social gathering was extremely enjoyable.
Please coordinate with relevant stakeholders regarding this matter.
The apartment's condition has deteriorated significantly.
I found the cinematographic experience rather disappointing.
Your sartorial choices are quite impressive today.
The technological interface requires optimization.
This culinary creation is absolutely magnificent.
The musical composition was incredibly moving.
Please refrain from excessive noise after designated quiet hours.
The romantic relationship has reached its natural conclusion.
The examination results were less than satisfactory.
slang-positive.txt
Contains a set of N examples that showcase the desired style or theme we want the model to follow.
Thanks for the real talk, fam.
Yo, these numbers are looking mad stacked!
Get those papers in ASAP or it's gonna be big yikes.
That spot was mad fancy and the food was straight fire!
You're the real MVP on this one, no cap.
Heads up, weather's gonna be straight trash.
This paper needs major work before it's gucci.
That show was bussin' fr fr!
Traffic was stupid thick this morning, ngl.
My phone's acting mad sus rn.
We're making bank, way more than we thought!
The housing game's finally chilling out.
You were wildin' in that meeting, not gonna lie.
I need a min to get this done, dawg.
Whip's acting up, needs a mechanic ASAP.
This place's service be straight garbage.
Gotta bounce, no cap.
This bag they're offering is pretty lit.
Your receipts ain't adding up, fam.
That party was straight vibing!
Hit up the crew about this real quick.
This crib's gotten mad sketchy.
That movie was mid af.
You drippin' today, no cap!
This app needs mad work fr fr.
This food slaps so hard!
That track hit different, on god.
Keep it down after hours or it's gonna be beef.
We ain't a thing no more, it's done done.
These grades ain't it, chief.
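The rows are paired so that a direction can be computed from them. As a purely conceptual sketch (the worker's actual computation is internal and operates on model activations, not on these toy vectors), a steering direction can be thought of as the mean difference between representations of the positive and negative examples:

```python
# Conceptual sketch only, NOT the worker's actual implementation: a steering
# direction estimated as mean(positive vectors) - mean(negative vectors).

def steering_direction(pos_vecs, neg_vecs):
    """Elementwise mean of positives minus elementwise mean of negatives."""
    assert len(pos_vecs) == len(neg_vecs), "examples must be paired"
    dim = len(pos_vecs[0])
    n = len(pos_vecs)
    return [
        sum(p[i] for p in pos_vecs) / n - sum(q[i] for q in neg_vecs) / n
        for i in range(dim)
    ]

# Toy 3-dimensional "embeddings" standing in for real model activations.
positive = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
negative = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
print(steering_direction(positive, negative))  # [1.0, -1.0, 0.0]
```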
To add another steering concept, such as formalgerman, modify the `concepts` field in `config.yml` to be `["slang", "formalgerman"]`. Additionally, create two new files:
- `formalgerman-negative.txt` – should contain a collection of informal phrases in German.
- `formalgerman-positive.txt` – should contain a collection of formal phrases in German.
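Assuming the same JSON-style layout shown earlier, the updated config.yml would then look like:

```yaml
{
  "version": ".unknown.",
  "steering_config": {
    "strength": 0.062,
    "concepts": ["slang", "formalgerman"]
  }
}
```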
2. Updating the worker configuration
We now create a ConfigMap from the newly created directory containing all the files:

```shell
kubectl create configmap steering-llama-3-1-8b-instruct --from-file=steering -n <pharia-ai-install-namespace>
```
As outlined here, we then need to edit `values-override.yaml` in the `inference-worker.checkpoints` section and add two new lines:

```yaml
steering_config_dirpath: "/steering"
steeringConfigMap: "steering-llama-3-1-8b-instruct"
```
```yaml
inference-worker:
  checkpoints:
    - generator:
        type: "luminous"
        pipeline_parallel_size: 1
        tensor_parallel_size: 1
        tokenizer_path: "luminous-base-2022-04/alpha-001-128k.json"
        weight_set_directories: ["luminous-base-2022-04"]
      queue: "luminous-base"
      replicas: 1
      modelVolumeClaim: "pharia-ai-models-luminous-base"
    - generator:
        type: "luminous"
        pipeline_parallel_size: 1
        tensor_parallel_size: 1
        tokenizer_path: "Meta-Llama-3.1-8B-Instruct/tokenizer.json"
        weight_set_directories: ["Meta-Llama-3.1-8B-Instruct"]
        steering_config_dirpath: "/steering"
      queue: "llama-3.1-8b-instruct"
      replicas: 1
      modelVolumeClaim: "pharia-ai-models-llama-3.1-8b-instruct"
      steeringConfigMap: "steering-llama-3-1-8b-instruct"
```
The helm chart must then be redeployed for the changes to take effect and for the worker to read these files during its startup phase.