How can the model weights downloader be configured?

To download models, use the models Helm chart. This page first covers how to specify credentials, then how to define a PVC, and finally how to configure each supported model source.

Credentials

Each source you download from requires credentials for that source.

You can provide them either by referencing an existing cluster secret or by specifying the credentials directly. The keys expected in the secret can be renamed with the matching {name}Key option: for example, for modelCredentials use passwordKey to set the key that holds the password, or for Hugging Face use tokenKey to set the key that holds the token.
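
As a sketch of the key renaming described above, the following values fragment points both credential sections at existing secrets whose entries are stored under non-default keys. The secret names and key names are hypothetical placeholders, and the fragment assumes that existingSecret and the {name}Key options sit directly under the respective credentials section.

```yaml
# Hypothetical values fragment; names are placeholders, not chart defaults.
modelCredentials:
  existingSecret: "my-registry-secret"   # hypothetical secret name
  passwordKey: "registry-password"       # key in the secret that holds the password
huggingFaceCredentials:
  existingSecret: "my-hf-secret"         # hypothetical secret name
  tokenKey: "hf-token"                   # key in the secret that holds the token
```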

Repository

  • Existing cluster secret:
    modelCredentials:
      existingSecret: ""
    By default, the secret is expected to contain the keys username and password.
  • Specifying the credentials directly (only recommended locally):
    modelCredentials:
      username: ""
      password: ""
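
For illustration, a Kubernetes Secret with the default keys username and password could look like the sketch below. The secret name is hypothetical; it is the value you would then set in modelCredentials.existingSecret, and the secret would presumably need to live in the namespace where the chart is installed.

```yaml
# Hypothetical Secret; replace the name and values with your own.
apiVersion: v1
kind: Secret
metadata:
  name: my-registry-secret   # hypothetical; referenced via modelCredentials.existingSecret
type: Opaque
stringData:
  username: "<username>"     # default key expected for the repository source
  password: "<password>"     # default key expected for the repository source
```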

HuggingFace

  • Existing cluster secret:
    huggingFaceCredentials:
      existingSecret: ""
    By default, the secret is expected to contain the key token.
  • Specifying the credentials directly (only recommended locally):
    huggingFaceCredentials:
      token: ""

Object store

  • Existing cluster secret:
    s3Credentials:
      existingSecret: ""
    By default, the secret is expected to contain the keys accessKeyId, secretAccessKey, profile and region. The value for profile can probably be left empty.
  • Specifying the credentials directly (only recommended locally):
    s3Credentials:
      accessKeyId: ""
      secretAccessKey: ""
      profile: ""
      region: ""
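
A matching Secret for the object store, using the four default keys listed above, might look like this sketch. The secret name is hypothetical, and profile is left empty as the text suggests is likely acceptable.

```yaml
# Hypothetical Secret for the object store source; values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-secret         # hypothetical; referenced via s3Credentials.existingSecret
type: Opaque
stringData:
  accessKeyId: "<access key id>"
  secretAccessKey: "<secret access key>"
  profile: ""                # left empty, see note above
  region: "<region>"
```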

PVC configuration

A PVC (PersistentVolumeClaim) is created by specifying a name and a list of weights to download onto it:

models:
  - name: <name>
    pvcSize: <size, e.g. 40Gi>
    weights:
      - <source 1>
      - <source 2>

Every source has a targetDirectory on the PVC; make sure these directories do not clash. How to define the targetDirectory is described below. Every targetDirectory must be included in the worker's weight_set_directories, otherwise the worker will not load the weights.

After downloading, you can find the layout of the final volume in the log output of the downloader job.

The sections below cover only the individual source configurations. The examples omit the full PVC definition and show a single weight entry of the respective type.

Sources

Repository

The registry base URL can be set with modelCredentials.registry. The default is alephalpha.jfrog.io/artifactory/model-weights-origin, so you can usually omit it from the configuration.

The tarball to download is located at {modelCredentials.registry}/{fileName}.

- repository:
    fileName: luminous-base.tar.gz
    targetDirectory: luminous-base

HuggingFace

- huggingFace:
    model: meta-llama/Meta-Llama-3.1-8B-Instruct
    targetDirectory: meta-llama-3.1-8b-instruct

Object storage

The files to download are given by {folder}/*.

- s3:
    endpoint: https://object.storage.eu01.onstackit.cloud
    folder: <folder path in the bucket>
    targetDirectory: pharia-1-llm-7b-control
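
Putting the PVC definition and the source types together, a complete models entry might look like the following sketch. The PVC name and size are hypothetical, and the nesting of fileName, model and targetDirectory under each source key is an assumption about the chart's schema. The two target directories differ, so they do not clash on the volume.

```yaml
# Hypothetical example: one PVC holding weights from two different sources.
models:
  - name: models-example              # hypothetical PVC name
    pvcSize: 40Gi                     # hypothetical size; choose one that fits all weights
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
      - huggingFace:
          model: meta-llama/Meta-Llama-3.1-8B-Instruct
          targetDirectory: meta-llama-3.1-8b-instruct
```

Both luminous-base and meta-llama-3.1-8b-instruct would then need to appear in the worker's weight_set_directories.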

How to deploy it

First, set the credentials for our registry to gain access to the models Helm chart.

export AA_REGISTRY_USERNAME=<username> # the account provided to you
export AA_REGISTRY_PASSWORD=<password> # your generated token for the Helm registry

Once your credentials are set, authenticate your local Helm client with the repository. This step ensures Helm has the necessary access to fetch the PhariaAI chart.

helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"

The following step assumes that the target namespace already exists within your Kubernetes cluster. You can create it like this:

kubectl create namespace <pharia-ai-install-namespace>

Then you can install the chart:

helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
  --set modelCredentials.username="$AA_REGISTRY_USERNAME" \
  --set modelCredentials.password="$AA_REGISTRY_PASSWORD" \
  -n <pharia-ai-install-namespace>
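
As an alternative to passing credentials with --set, the configuration can be kept in a values file and passed to helm install with Helm's -f (--values) flag. The sketch below assumes a hypothetical file name my-values.yaml, a hypothetical pre-created secret, and the same nesting of source fields as in the examples above.

```yaml
# my-values.yaml (hypothetical file name)
modelCredentials:
  existingSecret: "my-registry-secret"   # hypothetical secret with username and password keys
models:
  - name: models-luminous-base           # hypothetical PVC name
    pvcSize: 40Gi                        # hypothetical size
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
```

The file would then be supplied by adding -f my-values.yaml to the helm install command in place of the two --set flags.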

Troubleshooting

PVCs are not created due to wrong storage class

If the PVCs are not created, the cause might be a wrong storage class. You can change the storage class by overriding its value in the values.yaml file. By default it is set to "" as shown here:

persistence:
  storageClass: ""
...
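
As a sketch, an override that selects a specific storage class could look like the fragment below. The class name standard is hypothetical; use one of the names reported by kubectl get storageclass for your cluster.

```yaml
# Hypothetical override; "standard" is a placeholder class name.
persistence:
  storageClass: "standard"   # must match a StorageClass that exists in your cluster
```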