Using the models

This article describes how to use the Pharia-1-LLM-7B models.


Running inference

To run inference with the Pharia-1-LLM-7B models, you need to install the Scaling library. Follow the installation instructions provided in the repository’s README file. After installation, download the model weights and use the Scaling inference module to load the checkpoint, vocabulary, and configuration files:

from pathlib import Path
from scaling.transformer.inference import TransformerInferenceModule
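# Load the checkpoint, vocabulary, and configuration from the model directory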
inference_model = TransformerInferenceModule.from_checkpoint(
    checkpoint_dir=Path("path/to/Pharia-1-LLM-7B-control-aligned"),
)
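# Prompt in the Pharia-1 chat format (see Prompt formatting below)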
input_text = """<|start_header_id|>user<|end_header_id|>

When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
generation = inference_model.generate(max_tokens=100, input_text=input_text)
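# Print the generated completion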
print(generation.completion_text)

Prompt formatting

The prompt format used for Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned is identical. It is a derivative of the Llama prompt format, and we strongly recommend using it when prompting the models to ensure optimal performance. The format uses the following special tokens:

  • <|begin_of_text|>: Specifies the start of the prompt.

  • <|start_header_id|> / <|end_header_id|>: Enclose the role for a particular message; possible values are system, user, and assistant (see Roles).

  • <|eot_id|>: End of turn; must be appended to each message.

  • <|endoftext|>: End of text; generated when the model has finished generating.

Roles

Both Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned support three different roles:

  • system: Sets the context in which to interact with the AI model. It typically includes rules, guidelines, or necessary information that helps the model respond effectively.

  • user: Represents the human interacting with the model. It includes the inputs, commands, and questions to the model.

  • assistant: Represents the response generated by the AI model based on the context provided in the system and user prompts.

Best practice

To achieve the best results, we recommend the following:

  • Use a system prompt to steer the model, such as "You are a helpful assistant. You give engaging, well-structured answers to user inquiries."

  • Include two newlines before each message and end the prompt with two newlines, as in the sketch below.
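
The following is a minimal Python sketch of a prompt builder that applies these rules. The format_prompt helper is hypothetical, shown for illustration only, and is not part of the Scaling library:

def format_prompt(messages: list[tuple[str, str]]) -> str:
    """Build a prompt from (role, content) pairs; roles are system, user, or assistant."""
    prompt = "<|begin_of_text|>"
    for role, content in messages:
        # Two newlines before each message, per the recommendation above
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    # End on an assistant header followed by two newlines so the model
    # generates the next assistant reply
    return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"

# Example: single-turn prompt with a system message
prompt = format_prompt([
    ("system", "You are a helpful assistant. You give engaging, well-structured answers to user inquiries."),
    ("user", "When was Rome founded?"),
])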

Multi-turn interaction

The Pharia-1-LLM-7B models support multi-turn interactions. The following is an example of such an interaction with a system prompt:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>

When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Rome was founded on April 21, 753 BC, according to traditional stories. However, it is difficult to determine the exact date of its founding with certainty.<|eot_id|><|start_header_id|>user<|end_header_id|>

Who was its founder?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
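
The same conversation can be assembled with the hypothetical format_prompt helper sketched under Best practice:

prompt = format_prompt([
    ("system", "You are a helpful assistant. You give engaging, well-structured answers to user inquiries."),
    ("user", "When was Rome founded?"),
    ("assistant", "Rome was founded on April 21, 753 BC, according to traditional stories. However, it is difficult to determine the exact date of its founding with certainty."),
    ("user", "Who was its founder?"),
])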

Instructions with long contexts

When providing a longer context with the prompt, we recommend specifying the instructions at the end of the prompt. For example:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>

"Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students."

Based on the information provided in quotes above: How many people live in Heidelberg?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
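
With the hypothetical format_prompt helper from above, the quoted context and the trailing instruction go into a single user message:

context = (
    '"Heidelberg is a city in the German state of Baden-Württemberg, '
    "situated on the river Neckar in south-west Germany. As of the 2016 "
    "census, its population was 159,914, of which roughly a quarter "
    'consisted of students."'
)
prompt = format_prompt([
    ("system", "You are a helpful assistant. You give engaging, well-structured answers to user inquiries."),
    ("user", context + "\n\nBased on the information provided in quotes above: How many people live in Heidelberg?"),
])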