PhariaFinetuning Guided Walk-through
In this tutorial, we'll walk through the full finetuning workflow: downloading a dataset, converting it into the required format, uploading it, choosing hyperparameters, starting a finetuning job, and deploying the resulting model.
Dataset
For the dataset, we will use the LDJnr/Pure-Dove dataset from Hugging Face, which contains over 3.8K multi-turn examples.
Putting the data in the right format
Since the dataset's format differs slightly from the one required by the finetuning service, we will adapt it with the following script.
```python
import json

from datasets import load_dataset


def convert_to_messages_format(conversation):
    """Convert a conversation with multiple turns to the desired message format."""
    messages = []
    # Process each turn in the conversation
    for turn in conversation:
        # Add user message
        messages.append({"role": "user", "content": turn["input"]})
        # Add assistant message
        messages.append({"role": "assistant", "content": turn["output"]})
    # Create the final dictionary structure
    return {"messages": messages}


def convert_dataset_to_jsonl(output_file="formatted_data.jsonl"):
    """Convert the entire dataset to JSONL format."""
    # Load the dataset
    ds = load_dataset("LDJnr/Pure-Dove")
    # Open the output file in write mode
    with open(output_file, "w", encoding="utf-8") as f:
        # Process each example in the training set
        for example in ds["train"]:
            # Convert the conversation to the desired format
            formatted_data = convert_to_messages_format(example["conversation"])
            # Write the formatted data as a JSON line
            json.dump(formatted_data, f, ensure_ascii=False)
            f.write("\n")


if __name__ == "__main__":
    # Convert the dataset and save to formatted_data.jsonl
    convert_dataset_to_jsonl()
    print("Conversion completed! Check formatted_data.jsonl for results.")
```
Now you have a file called `formatted_data.jsonl` that is in the desired chat format.
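For reference, each line of the file is a single JSON object in the chat format produced by the script above (contents abridged here):

```json
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```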
Uploading the data
The finetuning service allows users to upload data via the API before starting their finetuning jobs.
You can upload the data through the Swagger UI at `https://pharia-finetuning-api.<YOUR_CONFIGURED_URL_POSTFIX>/docs#/`, where `<YOUR_CONFIGURED_URL_POSTFIX>` is the URL postfix configured during the installation of the Pharia-finetuning Helm chart.
To authenticate, you first need a bearer token:
- Go to the PhariaStudio page and log in if necessary.
- In the upper right corner, click on your profile.
- In the popup, click on "Copy Bearer Token".
Once you have the token:
- Click on the "Authorize" button in the top-right corner of the Swagger UI.
- Paste your token to authenticate.
- After authorization, you can safely close the popup window.
Upload the dataset using the `/api/v1/finetuning/datasets` route in the Swagger UI.
You will get the following response:
```json
{
  "dataset_id": "<your-dataset-id>", # example "2dbac66e-b405-498e-8e1c-7c284d700266"
  "validation_dataset_id": null,
  "limit_samples": null
}
```
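If you prefer to script the upload rather than use the Swagger UI, a minimal sketch with `requests` could look like the following. The multipart field name `file` is an assumption; check the route's schema in the Swagger UI for the exact request shape.

```python
import requests

BASE_URL = "https://pharia-finetuning-api.<YOUR_CONFIGURED_URL_POSTFIX>"  # same host as the Swagger UI
TOKEN = "<your-bearer-token>"  # copied from PhariaStudio

# NOTE: the multipart field name ("file") is an assumption; verify it
# against the /api/v1/finetuning/datasets schema in the Swagger UI.
with open("formatted_data.jsonl", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/api/v1/finetuning/datasets",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"file": ("formatted_data.jsonl", f, "application/jsonl")},
    )

response.raise_for_status()
dataset_id = response.json()["dataset_id"]
print(f"Uploaded dataset: {dataset_id}")
```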
Now that we have uploaded our data, we are ready to put it to good use and finetune our model.
Finetuning
Job Submission
Follow these steps to submit a finetuning job:
- Click on the `/api/v1/finetuning/jobs` endpoint to start a new job.
- Click "Try it out" to enter your parameters.
- Fill in the request body with the necessary parameters:
You can choose any Llama model; for example, insert `meta-llama/Llama-3.1-8B-Instruct` in the `model_name` field.
In the `dataset_id` field, put the `dataset_id` returned by the upload step above.
The finetuning type lets you choose between full Supervised Finetuning (SFT) and Low-Rank Adaptation (LoRA), a Parameter-Efficient Finetuning (PEFT) alternative. In this case, we will perform a full finetuning by choosing `full` instead of `lora`.
You can change the hyperparameters `n_epochs`, `learning_rate_multiplier`, and `batch_size`, or leave them as is.
- Once submitted, you will receive a `submission_id` in the response under `job_id`. This serves as the unique identifier for your job.
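To script the submission instead, a sketch along these lines could work. The payload mirrors the fields described above, but the exact schema (nesting and field names) should be verified in the Swagger UI; the hyperparameter values are purely illustrative.

```python
import requests

BASE_URL = "https://pharia-finetuning-api.<YOUR_CONFIGURED_URL_POSTFIX>"
TOKEN = "<your-bearer-token>"  # copied from PhariaStudio

# NOTE: field names and nesting are assumptions based on the description
# above; confirm them against the /api/v1/finetuning/jobs schema.
payload = {
    "model_name": "meta-llama/Llama-3.1-8B-Instruct",
    "dataset_id": "<your-dataset-id>",  # from the upload step
    "finetuning_type": "full",  # or "lora" for PEFT
    "n_epochs": 3,
    "learning_rate_multiplier": 1.0,
    "batch_size": 8,
}

response = requests.post(
    f"{BASE_URL}/api/v1/finetuning/jobs",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
response.raise_for_status()
job_id = response.json()["job_id"]
print(f"Submitted finetuning job: {job_id}")
```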
Job Status
You can use this `submission_id` to check the status of your job via the `/api/v1/finetuning/jobs/{job_id}` route.
- Since this is a GET route, click "Try it out".
- Enter the `job_id` (it's the same as `submission_id`) you want details for and click "Execute".
- The response will be a single JSON object containing job details.
Now you can see the status of your training; the same route also reports the loss and evaluation metrics such as perplexity.
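If you want to wait for completion programmatically, a simple polling sketch might look like this. The status strings checked below are assumptions; inspect the response of a live job for the actual values.

```python
import time

import requests

BASE_URL = "https://pharia-finetuning-api.<YOUR_CONFIGURED_URL_POSTFIX>"
TOKEN = "<your-bearer-token>"
job_id = "<your-job-id>"  # returned by the job submission

while True:
    response = requests.get(
        f"{BASE_URL}/api/v1/finetuning/jobs/{job_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    response.raise_for_status()
    job = response.json()
    print(job.get("status"))
    # NOTE: "pending" and "running" are assumed status values.
    if job.get("status") not in ("pending", "running"):
        break
    time.sleep(60)
```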
Once the training has finished, you have a finetuned model, congratulations! 🎉
Let's now move to the deployment part.
Deploying
Moving the weights to the worker
The first step is to make the model available to the worker so that it can pick it up and serve it.
To do this, you need to add the following configuration to the `values.yaml` file for the installation of the `pharia-ai-models` Helm chart:
```yaml
models:
  - name: models-<your-model-name> # must be lowercase
    pvcSize: 100Gi
    weights:
      - s3:
          # this is the first part of the checkpoints field in the finetuning API Job object
          endpoint: <your-storage-endpoint>
          # this is the second part of the checkpoints field in the finetuning API Job object
          folder: <path for your model weights inside your storage> # has to end with checkpoint.ckpt
          targetDirectory: <your-model-name>
          s3Credentials: # use the same credentials you use in the pharia-ai helm chart values in pharia-finetuning
            accessKeyId: ""
            secretAccessKey: ""
            profile: "" # can be left empty
            region: ""
```
Further information on downloading model weights from object storage can be found here.
To trigger the download of your finetuned model, you need to re-deploy the models Helm chart. Further information on how to deploy the changes can be found here.
This makes the model available to be served by inference workers which are configured in the next step.
Now that we have moved the weights and the worker can see them, we need to configure the worker.
Configuring the worker
To configure the worker, we add the following configuration to the `values.yaml` file that we use to install the `pharia-ai` Helm chart:
```yaml
inference-worker:
  ...
  checkpoints:
    ...
    - generator:
        type: vllm
        pipeline_parallel_size: 1
        tensor_parallel_size: 1
        model_path: /models/<your-model-name>
      queue: <your-model-name>
      replicas: 1
      modelVolumeClaim: models-<your-model-name>
```
Among other things, this configuration determines whether to shard your model (via the parallel sizes) and how many replicas of it should be deployed.
Further information on worker deployment can be found here.
The final step now is to tell the scheduler that we have a newly configured worker ready to start.
Make the scheduler aware of the new worker
We can achieve this by adding the following configuration to the `values.yaml` file, the same one from the previous step:
```yaml
inference-api:
  ...
  modelsOverride:
    ...
    <your-model-name>:
      checkpoint: <your-model-name>
      experimental: false
      multimodal_enabled: false
      completion_type: full
      embedding_type: null
      maximum_completion_tokens: 8192
      adapter_name: null
      bias_name: null
      softprompt_name: null
      description: Your description here
      aligned: false
      chat_template:
        template: |-
          {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
          ' + message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>
          ' }}
        bos_token: <|begin_of_text|>
        eos_token: <|endoftext|>
      worker_type: vllm # this needs to be the same worker type as defined in the previous step
      prompt_template: |-
        <|begin_of_text|>{% for message in messages %}<|start_header_id|>{{message.role}}<|end_header_id|>
        {% promptrange instruction %}{{message.content}}{% endpromptrange %}<|eot_id|>{% endfor %}<|start_header_id|>assistant<|end_header_id|>
        {% if response_prefix %}{{response_prefix}}{% endif %}
```
Now you are good to go: you have successfully downloaded, formatted, and uploaded new data, finetuned a model, and deployed it.
You can now see the finetuned model in PhariaStudio, and you can verify that it has been finetuned by asking it this question from the dataset:
Write a sentence with spelling mistakes.
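As a sketch of how such a check could be scripted against the inference API, the following uses the `aleph-alpha-client` Python package. The host URL and model name are assumptions about your installation; the model name should match the queue/checkpoint name configured above.

```python
from aleph_alpha_client import Client, CompletionRequest, Prompt

# NOTE: host, token, and model name are placeholders; adapt them to your
# PhariaAI installation.
client = Client(
    token="<your-bearer-token>",
    host="https://inference-api.<YOUR_CONFIGURED_URL_POSTFIX>",
)

request = CompletionRequest(
    prompt=Prompt.from_text("Write a sentence with spelling mistakes."),
    maximum_tokens=64,
)
response = client.complete(request, model="<your-model-name>")
print(response.completions[0].completion)
```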