Example: Configuring a model
Configuring a model consists of two parts: downloading the model weights and deploying a worker to serve them. This article shows how to configure both parts for an example model.
Ensure you specify tolerations according to your node configuration, if required.
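For example, if your GPU nodes carry a taint, the worker pods need a matching toleration. The snippet below is a sketch using the standard Kubernetes toleration fields; the key and effect are placeholders that must match your cluster's taints, and where the block belongs in your values file depends on the chart you deploy with:

tolerations:
  - key: nvidia.com/gpu      # placeholder: match your node taint
    operator: Exists
    effect: NoSchedule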
Pharia-1 with 256-dimensional embedding head
First, we download both the base model and adapter to the same volume:
models:
  - name: models-pharia-1-embedding-256-control
    pvcSize: 20Gi
    weights:
      - repository:
        fileName: Pharia-1-Embedding-256-control.tar
        targetDirectory: pharia-1-embedding-256-control
      - repository:
        fileName: Pharia-1-Embedding-256-control-adapter.tar
        targetDirectory: pharia-1-embedding-256-control-adapter
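Once the downloads complete, the volume holds one directory per weight set. The layout below is illustrative; the actual contents depend on the tarballs, but the tokenizer file must end up where the checkpoint's tokenizer_path expects it:

pharia-1-embedding-256-control/
  vocab.json
  ...
pharia-1-embedding-256-control-adapter/
  ...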
The worker checkpoint mounts that volume via modelVolumeClaim, loads both weight set directories, and exposes the adapter that produces 256-dimensional embeddings:
checkpoints:
  - generator:
      type: luminous
      tokenizer_path: pharia-1-embedding-256-control/vocab.json
      pipeline_parallel_size: 1
      tensor_parallel_size: 1
      weight_set_directories:
        - pharia-1-embedding-256-control
        - pharia-1-embedding-256-control-adapter
      cuda_graph_caching: true
      memory_safety_margin: 0.1
      task_returning: true
    queue: pharia-1-embedding-256-control
    tags: []
    replicas: 1
    version: 0
    modelVolumeClaim: models-pharia-1-embedding-256-control
models:
  pharia-1-embedding-256-control:
    experimental: false
    multimodal_enabled: false
    completion_type: none
    embedding_type: instructable
    maximum_completion_tokens: 0
    adapter_name: embed-256
    bias_name: null
    softprompt_name: null
    description: Pharia-1-Embedding-256-control. Fine-tuned for instructable embeddings. Has an extra down projection layer to provide 256-dimensional embeddings.
    aligned: false
    chat_template: null
    worker_type: luminous
    prompt_template: |-
      {% promptrange instruction %}{{instruction}}{% endpromptrange %}
      {% if input %}
      {% promptrange input %}{{input}}{% endpromptrange %}
      {% endif %}
    embedding_head: pooling_only
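The prompt_template shows how an embedding request becomes a model prompt: the instruction comes first, followed by the optional input, each wrapped in a promptrange block (which marks a region of the prompt, for example for attribution, rather than emitting literal text). With the illustrative values instruction = "Represent the user query for retrieving supporting documents" and input = "What is a rainbow?", the rendered prompt would be:

Represent the user query for retrieving supporting documents
What is a rainbow?

With embedding_head set to pooling_only and the embed-256 adapter providing the extra down projection described above, the worker returns 256-dimensional embeddings for such prompts.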