Configuring model weights downloaders

To download models you use the models Helm chart.

In this article, we first explain how to specify the credentials. Then we show how to define a PVC, and finally we dive deeper into how to configure the different supported model sources.

Credentials

Each model source you download from requires credentials for that source.

You can provide them either by referencing an existing cluster secret or by specifying the credentials directly. The keys expected in a secret can be renamed with the corresponding {name}Key setting. For example, for modelCredentials use passwordKey to set the key that holds the password in the secret, or use tokenKey for huggingFaceCredentials.

Repository

  • Existing cluster secret:
    modelCredentials:
      existingSecret: ""
    By default, the secret is expected to contain the keys username and password.
  • Specifying the credentials directly (recommended only for local setups):
    modelCredentials:
      username: ""
      password: ""
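
As an illustration, an existing secret with the default keys could be created from a manifest like the following. This is a minimal sketch: the secret name model-registry-credentials is hypothetical, and only the key names username and password come from the defaults described above.

```yaml
# Hypothetical secret; only the key names follow the documented defaults.
apiVersion: v1
kind: Secret
metadata:
  name: model-registry-credentials   # assumed name, choose your own
  namespace: <pharia-ai-install-namespace>
type: Opaque
stringData:
  username: <username>
  password: <password>
```

The chart would then reference this secret by setting modelCredentials.existingSecret to model-registry-credentials.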

Hugging Face

  • Existing cluster secret:
    huggingFaceCredentials:
      existingSecret: ""
    By default, the secret is expected to contain the key token.
  • Specifying the credentials directly (recommended only for local setups):
    huggingFaceCredentials:
      token: ""
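
If an existing secret stores the token under a different key, the tokenKey setting mentioned above can point the chart at it. The following is a sketch under assumptions: the secret name hf-credentials and the key name HF_TOKEN are hypothetical, and tokenKey is assumed to sit next to existingSecret.

```yaml
huggingFaceCredentials:
  existingSecret: hf-credentials   # hypothetical secret name
  tokenKey: HF_TOKEN               # hypothetical key inside that secret
```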

Object store

  • Existing cluster secret:
    s3Credentials:
      existingSecret: ""
    By default, the secret is expected to contain the keys accessKeyId, secretAccessKey, profile and region. The profile key can usually be left empty.
  • Specifying the credentials directly (recommended only for local setups):
    s3Credentials:
      accessKeyId: ""
      secretAccessKey: ""
      profile: ""
      region: ""
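
A matching secret for the object store might look like the sketch below. The secret name model-s3-credentials is hypothetical; the four key names are the defaults listed above, with profile left empty.

```yaml
# Hypothetical secret carrying the four default keys for s3Credentials.
apiVersion: v1
kind: Secret
metadata:
  name: model-s3-credentials   # assumed name, choose your own
  namespace: <pharia-ai-install-namespace>
type: Opaque
stringData:
  accessKeyId: <access key ID>
  secretAccessKey: <secret access key>
  profile: ""                  # left empty
  region: <region>
```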

PVC configuration

A persistent volume claim (PVC) is created for each entry under models. Specify a name, a size, and one or more weights to download onto this PVC:

models:
  - name: <name>
    pvcSize: <size, e.g. 40Gi>
    weights:
      - <source 1>
      - <source 2>

    # Optional settings:
    persistence:
      storageClass: <custom storage class>
    resources:
      requests:
        memory: <custom memory request, e.g. 16Gi>
      limits:
        memory: <custom memory limit, e.g. 16Gi>

Every source must define a targetDirectory on the PVC, and you need to make sure that these directories do not conflict with each other. How to define the targetDirectory is described below. Every targetDirectory must also be listed in the worker's weight_set_directories; otherwise the worker cannot load the weights.
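
For instance, two weights can share one PVC as long as their target directories differ. The following sketch reuses the source examples from the Sources section below; the name and size are placeholders, and the indentation is an assumption about the chart's value layout.

```yaml
models:
  - name: <name>
    pvcSize: <size>
    weights:
      # Each weight gets its own, non-overlapping directory on the PVC.
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
      - huggingFace:
          model: meta-llama/Meta-Llama-3.1-8B-Instruct
          targetDirectory: meta-llama-3.1-8b-instruct
```

Both luminous-base and meta-llama-3.1-8b-instruct would then need to appear in the worker's weight_set_directories.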

You may need to set the storage class of the persistent volume that is created. This is done via persistence.storageClass, either globally or for each model. For k3s, for example, you need to set it to "local-path".
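
The two levels could be combined as in the following sketch. It assumes that the global setting lives at the top level of the values and that the per-model setting sits inside the model entry, as in the optional settings shown above; the placeholders are not real values.

```yaml
# Global default for all model PVCs (k3s example from the text).
persistence:
  storageClass: local-path

models:
  - name: <name>
    pvcSize: <size>
    weights:
      - <source>
    # Per-model setting, assumed to take precedence for this PVC only.
    persistence:
      storageClass: <custom storage class>
```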

After downloading, you can find the layout of the final volume in the log output of the downloader job.

Below, we consider only the individual source configurations. The examples do not contain the full PVC definition, only the single weight entry of the respective type.

Note: To restrict the model download to persistent volumes in a dedicated availability zone, define the respective Kubernetes node tolerations or node selectors (cf. values.yaml in the models Helm chart).

Sources

Repository

The registry base URL can be specified by using modelCredentials.registry. The default is alephalpha.jfrog.io/artifactory/model-weights-origin, so you can usually leave it out of the configuration.

The tarball to download is given by {modelCredentials.registry}/{fileName}.

- repository:
    fileName: luminous-base.tar.gz
    targetDirectory: luminous-base
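
If the weights are served from a different registry, the base URL can be overridden next to the credentials. A minimal sketch, assuming a hypothetical mirror and a pre-created secret; all values in angle brackets are placeholders:

```yaml
modelCredentials:
  registry: <registry host>/<repository path>   # hypothetical mirror; omit to use the default
  existingSecret: <secret name>

models:
  - name: <name>
    pvcSize: <size>
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
```

With these values, the downloader would fetch <registry host>/<repository path>/luminous-base.tar.gz.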

Hugging Face

- huggingFace:
    model: meta-llama/Meta-Llama-3.1-8B-Instruct
    targetDirectory: meta-llama-3.1-8b-instruct

Object storage

The files to download are given by {folder}/*:

- s3:
    endpoint: https://object.storage.eu01.onstackit.cloud
    folder: <folder path in the bucket>
    targetDirectory: pharia-1-llm-7b-control

Complete example

models:
  - name: luminous-base
    pvcSize: 100Gi
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base-2022-04
  - name: models-llama-3.3-70b-instruct
    pvcSize: 450Gi
    weights:
      - huggingFace:
          model: meta-llama/Llama-3.3-70B-Instruct
          targetDirectory: llama-3.3-70b-instruct-hf
          postProcess: "convert-to-luminous llama-3.3-70b-instruct-hf llama-3.3-70b-instruct"
    resources:
      limits:
        cpu: 800m
        memory: 16Gi
      requests:
        cpu: 400m
        memory: 16Gi

How to deploy it

First, set the credentials for our registry to get access to the models Helm chart.

export AA_REGISTRY_USERNAME="<username>" # the account provided to you
export AA_REGISTRY_PASSWORD="<password>" # your generated token for Helm

Once your credentials are set, authenticate your local Helm client with the registry. This step ensures that Helm has the access it needs to fetch the models chart.

helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"

The installation step assumes that the target namespace already exists in your Kubernetes cluster. If it does not exist yet, you can create it like this:

kubectl create namespace <pharia-ai-install-namespace>

Then you can install the chart:

helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
  --set modelCredentials.username="$AA_REGISTRY_USERNAME" \
  --set modelCredentials.password="$AA_REGISTRY_PASSWORD" \
  -n <pharia-ai-install-namespace>
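
Instead of passing settings with --set, the same configuration can be kept in a values file. The following is a sketch, assuming a hypothetical file name values.yaml and a secret that was created beforehand; the model entry is taken from the complete example above.

```yaml
# values.yaml (hypothetical file name)
modelCredentials:
  existingSecret: <name of a secret with the keys username and password>

models:
  - name: luminous-base
    pvcSize: 100Gi
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base-2022-04
```

Such a file is passed to the install command with Helm's -f option, for example by adding -f values.yaml in place of the two --set arguments.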

Troubleshooting

PVCs are not created due to wrong storage class

If the PVCs are not created, the cause might be a wrong storage class. You can change the storage class by overriding persistence.storageClass in the values.yaml file. By default it is set to "", as shown here:

persistence:
  storageClass: ""
...