Example: Configuring a model

Configuring a model consists of two parts: downloading the model weights and deploying a worker that serves them. This article provides an example showing how to configure both parts for a given model.

If your nodes require them, ensure you specify tolerations that match your node configuration.
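
Tolerations follow the standard Kubernetes schema; where exactly they go depends on your chart's values layout, so the top-level key and the taint key below are assumptions for illustration:

```yaml
# Hypothetical values fragment -- the key name, nesting, and taint key
# depend on your chart and node setup.
tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
```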

Pharia-1 with 256-dimensional embedding head

First, download both the base model and the embedding adapter to the same volume:

models:
  - name: models-pharia-1-embedding-256-control
    pvcSize: 20Gi
    weights:
      - repository:
          fileName: Pharia-1-Embedding-256-control.tar
          targetDirectory: pharia-1-embedding-256-control
      - repository:
          fileName: Pharia-1-Embedding-256-control-adapter.tar
          targetDirectory: pharia-1-embedding-256-control-adapter
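
Each downloaded tarball is unpacked into its targetDirectory on the shared volume, so both weight sets end up side by side under one persistent volume claim. A minimal Python sketch of that unpacking step (the function name and exact behavior are assumptions for illustration, not the actual download job):

```python
import pathlib
import tarfile


def unpack_weights(volume: pathlib.Path, tar_path: pathlib.Path,
                   target_directory: str) -> pathlib.Path:
    """Unpack one weights tarball into its target directory on the volume."""
    dest = pathlib.Path(volume) / target_directory
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest)
    return dest
```

Running this once per `weights` entry reproduces the layout the worker expects: the base weights and the adapter in sibling directories on the same volume.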

Next, the worker checkpoint loads both weight set directories and exposes the adapter that produces 256-dimensional embeddings:

checkpoints:
- generator:
    type: luminous
    tokenizer_path: pharia-1-embedding-256-control/vocab.json
    pipeline_parallel_size: 1
    tensor_parallel_size: 1
    weight_set_directories:
    - pharia-1-embedding-256-control
    - pharia-1-embedding-256-control-adapter
    cuda_graph_caching: true
    memory_safety_margin: 0.1
    task_returning: true
  queue: pharia-1-embedding-256-control
  tags: []
  replicas: 1
  version: 0
  modelVolumeClaim: models-pharia-1-embedding-256-control
  models:
    pharia-1-embedding-256-control:
      experimental: false
      multimodal_enabled: false
      completion_type: none
      embedding_type: instructable
      maximum_completion_tokens: 0
      adapter_name: embed-256
      bias_name: null
      softprompt_name: null
      description: Pharia-1-Embedding-256-control. Fine-tuned for instructable embeddings. Has an extra down projection layer to provide 256-dimensional embeddings.
      aligned: false
      chat_template: null
      worker_type: luminous
      prompt_template: |-
        {% promptrange instruction %}{{instruction}}{% endpromptrange %}
        {% if input %}
        {% promptrange input %}{{input}}{% endpromptrange %}
        {% endif %}
      embedding_head: pooling_only