Uploading data for PhariaFinetuning
To upload a dataset for PhariaFinetuning, open PhariaStudio and click Finetune in the sidebar. Then click the "Click to upload" area and select the dataset file.
The following sections describe the dataset format and limitations that apply to PhariaFinetuning.
Supported data format
Your dataset must be in JSON Lines (JSONL) format, with one JSON object per line and the following structure:
{ "messages":[ { "role":"user", "content":"user_content" }, { "role":"assistant", "content":"assistant_content" } ] }
{ "messages":[ { "role":"user", "content":"user_content2" }, { "role":"assistant", "content":"assistant_content2" } ] }
...
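As an illustration of this format, the following minimal Python sketch serializes conversations into JSONL text using only the standard library. The helper name to_jsonl, the sample conversations, and the file name dataset.jsonl are hypothetical and chosen for this example; they are not part of PhariaFinetuning.

```python
import json


def to_jsonl(conversations):
    """Serialize conversations to JSONL text: one JSON object per line.

    Each conversation is a list of {"role": ..., "content": ...} dicts.
    """
    return "".join(
        # json.dumps never emits raw newlines, so each object stays on one line.
        json.dumps({"messages": messages}, ensure_ascii=False) + "\n"
        for messages in conversations
    )


# Hypothetical sample data that mirrors the structure shown above.
conversations = [
    [
        {"role": "user", "content": "user_content"},
        {"role": "assistant", "content": "assistant_content"},
    ],
    [
        {"role": "user", "content": "user_content2"},
        {"role": "assistant", "content": "assistant_content2"},
    ],
]

jsonl_text = to_jsonl(conversations)

if __name__ == "__main__":
    # "dataset.jsonl" is a hypothetical file name for the upload.
    with open("dataset.jsonl", "w", encoding="utf-8") as f:
        f.write(jsonl_text)
```

Building each line with json.dumps rather than by hand keeps quotes and line breaks inside the content properly escaped, so every conversation remains a single valid line.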
A conversation can also include a system message, in which case the messages typically follow the order system → user → assistant:
{ "messages":[ { "role":"system", "content":"system_content" }, { "role":"user", "content":"user_content" }, { "role":"assistant", "content":"assistant_content" } ] }
{ "messages":[ { "role":"user", "content":"user_content2" }, { "role":"assistant", "content":"assistant_content2" } ] }
...
When a conversation includes a system message, it is the first message and has the role system; the remaining messages alternate between the user and assistant roles.
Note that although the general structure is validated, the specific order of roles is not.
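Because the order of roles is not validated on upload, it can be useful to check it locally beforehand. The following Python sketch is one possible way to do that, assuming the expected order is an optional system message followed by alternating user and assistant messages. The function names check_role_order and find_role_order_problems are hypothetical, and this is not the validation that PhariaFinetuning itself performs.

```python
import json


def check_role_order(messages):
    """Return None if roles follow [system,] user, assistant, user, ...

    Otherwise return a short description of the first problem found.
    """
    roles = [message.get("role") for message in messages]
    # An optional system message is only accepted in the first position.
    offset = 1 if roles and roles[0] == "system" else 0
    turns = roles[offset:]
    if not turns:
        return "no user or assistant messages"
    expected = ("user", "assistant")
    for i, role in enumerate(turns):
        if role != expected[i % 2]:
            return f"message {i + offset}: expected {expected[i % 2]!r}, got {role!r}"
    return None


def find_role_order_problems(jsonl_text):
    """Return (line_number, problem) pairs for conversations with unexpected order."""
    problems = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        problem = check_role_order(record.get("messages", []))
        if problem is not None:
            problems.append((line_number, problem))
    return problems
```

Calling find_role_order_problems on the contents of a dataset file returns an empty list when every conversation follows the assumed order, and otherwise lists the 1-based line numbers that need attention.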