Storing an evaluation dataset in PhariaStudio

Prerequisites

Before you create and store an evaluation dataset, follow the instructions in Creating examples for an evaluation dataset to create a list of examples. The code below expects that list in a variable named examples.

Import the required dependencies

from pharia_studio_sdk.connectors import StudioClient
from pharia_studio_sdk.evaluation import (
    Example,
    StudioDatasetRepository
)

Submit the dataset using code

Initialise the PhariaStudio client and link it to an existing project:

# "Test Evaluation" is the name of the existing PhariaStudio project
studio_client = StudioClient("Test Evaluation")

Once the client is initialised and points to a project, you can submit the dataset:

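# The repository stores datasets in the project the client points to
studio_dataset_repo = StudioDatasetRepository(studio_client=studio_client)

# examples is the list of Example objects created in the prerequisites
studio_dataset = studio_dataset_repo.create_dataset(
    examples=examples,
    dataset_name="Jokes",
    metadata={"description": "This is an extensive list of jokes"},
)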

View the dataset in PhariaStudio

After you submit the dataset, you can view it in the Datasets section of PhariaStudio:

PhariaStudio - dataset list

You need the dataset ID to create a benchmark object. To copy the ID, click the kebab menu icon (three vertical dots) and select Copy ID:

PhariaStudio - copy dataset ID

Click a row in the Datasets table to display the content of that dataset:

PhariaStudio - dataset details

Upload the dataset using the PhariaStudio portal

The dataset file must be in the JSON Lines format (one JSON object per line), and each example must contain a unique ID.
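As a rough illustration of the file requirement above, the following sketch writes a list of records to a JSON Lines file and rejects duplicate IDs. It uses only the Python standard library. The helper write_jsonl, the key name id, and the field names input and expected_output are assumptions made for this example, not a confirmed PhariaStudio schema; check the fields your project expects before uploading.

```python
import json
import tempfile
import uuid
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line and return the number of lines written.

    Records without an "id" key get a generated UUID; a repeated ID raises
    ValueError so the resulting file never contains duplicates.
    """
    seen_ids = set()
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            row = dict(record)  # copy so the caller's dictionaries stay unchanged
            row.setdefault("id", str(uuid.uuid4()))
            if row["id"] in seen_ids:
                raise ValueError(f"Duplicate example ID: {row['id']}")
            seen_ids.add(row["id"])
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(seen_ids)


# Hypothetical records; the field names are placeholders for illustration only.
records = [
    {"id": "joke-1", "input": "Tell me a joke about cats.", "expected_output": "A cat pun."},
    {"input": "Tell me a joke about dogs.", "expected_output": "A dog pun."},
]

# Written to the system temp directory here; any writable path works.
output_path = Path(tempfile.gettempdir()) / "jokes.jsonl"
count = write_jsonl(records, output_path)
print(f"Wrote {count} examples to {output_path}")
```

The second record has no ID, so the helper generates one. Failing on a duplicate, rather than silently renaming it, makes ID collisions visible before the upload.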

You can upload a new dataset in the PhariaStudio portal, as follows:

  1. In the Evaluate menu in the sidebar, select Datasets.
    If the current project has no datasets, PhariaStudio displays a code snippet for creating a dataset using code.

  2. Click Upload Dataset.

  3. Upload a dataset file, or drag and drop one into the popup.