
How to use Pharia-1-LLM-7B?

Inference

To run inference with the model, you'll first need to install the Scaling library. Follow the installation instructions in the repository's README file. After installation, download the model weights and use the Scaling inference module to load the checkpoint, vocabulary, and configuration files.
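
The weights are distributed as a Scaling checkpoint. A download step could look like the following sketch, assuming the weights are hosted on the Hugging Face Hub under the repo id shown; that repo id is an assumption, so check the model card for the authoritative location.

from pathlib import Path

from huggingface_hub import snapshot_download

# Assumed repo id; verify against the official model card before use.
checkpoint_dir = Path(
    snapshot_download(repo_id="Aleph-Alpha/Pharia-1-LLM-7B-control-aligned")
)

With the checkpoint on disk, the loading and generation example below applies unchanged.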

from pathlib import Path

from scaling.transformer.inference import TransformerInferenceModule

# Load the checkpoint, vocabulary, and configuration files.
inference_model = TransformerInferenceModule.from_checkpoint(
    checkpoint_dir=Path("path/to/Pharia-1-LLM-7B-control-aligned"),
)

# Prompt in the Pharia-1 format (see "Prompt formatting" below); note the
# blank line after each header and the two newlines ending the prompt.
input_text = """<|start_header_id|>user<|end_header_id|>

When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

generation = inference_model.generate(max_tokens=100, input_text=input_text)
print(generation.completion_text)

Prompt formatting

The prompt format used for Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned is identical and is derived from the Llama prompt format. We highly recommend using it when prompting the model to ensure optimal performance.

  • <|begin_of_text|>: Specifies the start of the prompt

  • <|start_header_id|> / <|end_header_id|>: These tokens enclose the role for a particular message. Possible values are: [system, user, assistant]

  • <|eot_id|>: End of turn. Should be appended after each message.

  • <|endoftext|>: End of text. Generated when the model has finished its completion.

Both Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned support three different roles:

  • system: Sets the context in which to interact with the AI model. It typically includes rules, guidelines, or necessary information that helps the model respond effectively.

  • user: Represents the human interacting with the model. It includes the inputs, commands, and questions to the model.

  • assistant: Represents the response generated by the AI model based on the context provided in the system and user prompts.

To achieve the best results, we recommend the following (a prompt-building helper that applies these rules is sketched after this list):

  • utilizing a system prompt to steer the model, such as "You are a helpful assistant. You give engaging, well-structured answers to user inquiries."

  • including two newlines before each message and ending the prompt on two newlines.
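
To avoid assembling these tokens by hand, the rules above can be wrapped in a small helper. The following is an illustrative sketch, not part of the Scaling library; the name format_prompt is hypothetical.

# Illustrative sketch (not part of the Scaling library): renders a list of
# role/content messages into the prompt format described above.
def format_prompt(messages: list[dict[str, str]]) -> str:
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Two newlines separate each header from its message body.
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # End on an assistant header, again followed by two newlines,
    # so the model generates the assistant's reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt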

Multi-turn interaction

The Pharia-1-LLM-7B models support multi-turn interactions. Here is an example of such an interaction with a system prompt:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>

When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Rome was founded on April 21, 753 BC, according to traditional stories. However, it is difficult to determine the exact date of its founding with certainty.<|eot_id|><|start_header_id|>user<|end_header_id|>

Who was its founder?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

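
Reusing the hypothetical format_prompt helper sketched above, together with the inference_model from the inference example, the same conversation can be assembled programmatically:

# Assemble the multi-turn prompt shown above and generate the next reply.
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. You give engaging, "
        "well-structured answers to user inquiries.",
    },
    {"role": "user", "content": "When was Rome founded?"},
    {
        "role": "assistant",
        "content": "Rome was founded on April 21, 753 BC, according to "
        "traditional stories. However, it is difficult to determine the "
        "exact date of its founding with certainty.",
    },
    {"role": "user", "content": "Who was its founder?"},
]

generation = inference_model.generate(
    max_tokens=100, input_text=format_prompt(messages)
)
print(generation.completion_text)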

Instructions with long contexts

When providing a longer context in the prompt, we recommend placing the instructions at the end of the prompt, after the context.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>

"Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students."

Based on the information provided in quotes above: How many people live in Heidelberg?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
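
In code, this amounts to placing the context before the question within a single user message. A sketch, again using the hypothetical format_prompt helper from above:

# Context first, instruction last, as recommended for longer contexts.
context = (
    '"Heidelberg is a city in the German state of Baden-Württemberg, '
    "situated on the river Neckar in south-west Germany. As of the 2016 "
    "census, its population was 159,914, of which roughly a quarter "
    'consisted of students."'
)
question = (
    "Based on the information provided in quotes above: "
    "How many people live in Heidelberg?"
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. You give engaging, "
        "well-structured answers to user inquiries.",
    },
    {"role": "user", "content": f"{context}\n\n{question}"},
]

generation = inference_model.generate(
    max_tokens=100, input_text=format_prompt(messages)
)
print(generation.completion_text)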