Configuring model weights downloaders
To download model weights, use the models Helm chart.
In this article, we first explain how to specify the credentials. Then we show how to define a PVC, and finally we dive deeper into how to configure the different supported model sources.
Credentials
Each source you use requires credentials for that source.
You can provide them either by referencing an existing cluster secret or by specifying the credentials directly.
The keys expected in these secrets can be renamed with the corresponding {name}Key setting. For example, for modelCredentials use passwordKey to set the secret key that holds the password, or use tokenKey for Hugging Face.
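For illustration, here is a values.yaml fragment that points to existing secrets whose keys differ from the defaults. The secret names my-registry-secret and my-hf-secret and the key names api-token and hf-token are hypothetical. The sketch also assumes that passwordKey and tokenKey sit next to existingSecret in the respective credentials block:

```yaml
# values.yaml fragment (sketch; secret and key names are hypothetical)
modelCredentials:
  existingSecret: my-registry-secret
  # The password is stored under the key "api-token" instead of "password".
  passwordKey: api-token
huggingFaceCredentials:
  existingSecret: my-hf-secret
  # The Hugging Face token is stored under "hf-token" instead of "token".
  tokenKey: hf-token
```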
Repository
- Existing cluster secret. By default, the secret is expected to contain the keys username and password:
  modelCredentials:
    existingSecret: ""
- Specifying the credentials directly (only recommended for local setups):
  modelCredentials:
    username: ""
    password: ""
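As a sketch, a Kubernetes Secret that uses the default keys could look like the following. The name model-registry-credentials is hypothetical. Kubernetes only lets pods read secrets from their own namespace, so the secret presumably has to live in the namespace the chart is installed into:

```yaml
# Sketch of a Secret for modelCredentials.existingSecret (name is hypothetical).
apiVersion: v1
kind: Secret
metadata:
  name: model-registry-credentials
  namespace: <pharia-ai-install-namespace>
type: Opaque
stringData:
  # Default keys expected by modelCredentials
  username: <username>
  password: <password>
```

You would then reference it by setting existingSecret: model-registry-credentials under modelCredentials.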
Hugging Face
- Existing cluster secret. By default, the secret is expected to contain the key token:
  huggingFaceCredentials:
    existingSecret: ""
- Specifying the credentials directly (only recommended for local setups):
  huggingFaceCredentials:
    token: ""
Object store
- Existing cluster secret. By default, the secret is expected to contain the keys accessKeyId, secretAccessKey, profile, and region. The profile key can usually be left empty.
  s3Credentials:
    existingSecret: ""
- Specifying the credentials directly (only recommended for local setups):
  s3Credentials:
    accessKeyId: ""
    secretAccessKey: ""
    profile: ""
    region: ""
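A matching Secret for the object store might look like the sketch below. The name model-s3-credentials is hypothetical, and the values are placeholders:

```yaml
# Sketch of a Secret for s3Credentials.existingSecret (name is hypothetical).
apiVersion: v1
kind: Secret
metadata:
  name: model-s3-credentials
  namespace: <pharia-ai-install-namespace>
type: Opaque
stringData:
  # Default keys expected by s3Credentials
  accessKeyId: <access key id>
  secretAccessKey: <secret access key>
  profile: ""          # can usually be left empty
  region: <region>
```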
PVC configuration
A persistent volume claim (PVC) is created by specifying a name, a size, and one or more weights to download onto it:
models:
  - name: <name>
    pvcSize: <size, e.g. 40Gi>
    weights:
      - <source 1>
      - <source 2>
    # Optional settings:
    persistence:
      storageClass: <custom storage class>
    resources:
      requests:
        memory: <custom memory request, e.g. 16Gi>
      limits:
        memory: <custom memory limit, e.g. 16Gi>
Every source must specify a targetDirectory on the PVC; make sure that these directories do not conflict.
How to define the targetDirectory for each source type is described below.
Every targetDirectory must also be included in the worker's weight_set_directories; otherwise, the worker cannot load the weights.
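To illustrate non-conflicting target directories, here is a sketch of one PVC that holds two weights from different sources. The PVC name, the PVC size, and the second directory name are hypothetical:

```yaml
models:
  - name: llama-3.1-8b            # hypothetical PVC name
    pvcSize: 100Gi                # assumption: size this to the combined weights
    weights:
      - huggingFace:
          model: meta-llama/Meta-Llama-3.1-8B-Instruct
          targetDirectory: meta-llama-3.1-8b-instruct
      - s3:
          endpoint: https://object.storage.eu01.onstackit.cloud
          folder: <folder path in the bucket>
          # Sibling directory, so it cannot collide with the one above
          targetDirectory: meta-llama-3.1-8b-instruct-finetuned
```

With this layout, both meta-llama-3.1-8b-instruct and meta-llama-3.1-8b-instruct-finetuned would need to be covered by the worker's weight_set_directories.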
You may need to set the storage class of the persistent volume to be created. Use persistence.storageClass for this, either globally or for each model.
For example, on k3s set it to "local-path".
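A sketch of both variants follows: a global default and a per-model setting. The class name fast-ssd is hypothetical. The sketch also assumes that the per-model value takes precedence over the global one, which is worth confirming in the chart's values.yaml:

```yaml
# Global default for all PVCs created by the chart
persistence:
  storageClass: local-path       # e.g. on k3s
models:
  - name: luminous-base
    pvcSize: 100Gi
    # Per-model setting (assumed to override the global default)
    persistence:
      storageClass: fast-ssd     # hypothetical storage class
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
```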
After downloading, you can find the layout of the final volume in the log output of the downloader job.
Below, we consider only the different source configurations. The examples do not contain the full PVC definition, only a single weight of the respective type.
Note: To restrict the model download to persistent volumes in a dedicated availability zone, define the corresponding Kubernetes node tolerations and node selectors (see values.yaml in the models Helm chart).
Sources
Repository
The registry base URL can be set with modelCredentials.registry.
The default is alephalpha.jfrog.io/artifactory/model-weights-origin, so you can usually omit this setting.
The tarball to download is {modelCredentials.registry}/{fileName}.
- repository:
    fileName: luminous-base.tar.gz
    targetDirectory: luminous-base
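If you mirror the weights in your own registry, a sketch of overriding the base URL could look like this. The host artifactory.example.com, the path model-weights-mirror, and the secret name are hypothetical:

```yaml
modelCredentials:
  existingSecret: model-registry-credentials   # hypothetical Secret name
  # Hypothetical mirror; replaces the default
  # alephalpha.jfrog.io/artifactory/model-weights-origin
  registry: artifactory.example.com/artifactory/model-weights-mirror
models:
  - name: luminous-base
    pvcSize: 100Gi
    weights:
      - repository:
          # Resolves to {registry}/{fileName}, i.e.
          # artifactory.example.com/artifactory/model-weights-mirror/luminous-base.tar.gz
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
```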
Hugging Face
- huggingFace:
    model: meta-llama/Meta-Llama-3.1-8B-Instruct
    targetDirectory: meta-llama-3.1-8b-instruct
Object storage
All files matching {folder}/* in the bucket are downloaded:
- s3:
    endpoint: https://object.storage.eu01.onstackit.cloud
    folder: <folder path in the bucket>
    targetDirectory: pharia-1-llm-7b-control
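For comparison with the PVC template above, here is a sketch of a complete model entry that uses the object store source together with credentials from an existing secret. The secret name is hypothetical, and the PVC size is only a placeholder to adjust to the model:

```yaml
s3Credentials:
  existingSecret: model-s3-credentials   # hypothetical Secret name
models:
  - name: pharia-1-llm-7b-control
    pvcSize: 40Gi                        # placeholder; size this to the model
    weights:
      - s3:
          endpoint: https://object.storage.eu01.onstackit.cloud
          folder: <folder path in the bucket>
          targetDirectory: pharia-1-llm-7b-control
```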
Complete example
models:
  - name: luminous-base
    pvcSize: 100Gi
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base-2022-04
  - name: models-llama-3.3-70b-instruct
    pvcSize: 450Gi
    weights:
      - huggingFace:
          model: meta-llama/Llama-3.3-70B-Instruct
          targetDirectory: llama-3.3-70b-instruct-hf
        postProcess: "convert-to-luminous llama-3.3-70b-instruct-hf llama-3.3-70b-instruct"
    resources:
      limits:
        cpu: 800m
        memory: 16Gi
      requests:
        cpu: 400m
        memory: 16Gi
How to deploy it
First, set the credentials for our registry to get access to the models Helm chart.
export AA_REGISTRY_USERNAME=<username> # the account provided to you
export AA_REGISTRY_PASSWORD=<password> # the token you generated for Helm
Once your credentials are set, authenticate your local Helm client with the registry. This step ensures that Helm has the access it needs to fetch the models chart.
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
The following step assumes that the target namespace already exists within your Kubernetes cluster. You can create it like this:
kubectl create namespace <pharia-ai-install-namespace>
Then you can install the chart:
helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
  --set modelCredentials.username="$AA_REGISTRY_USERNAME" \
  --set modelCredentials.password="$AA_REGISTRY_PASSWORD" \
  -n <pharia-ai-install-namespace>
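The command above sets only the registry credentials. Unless the chart's defaults already cover the models you need, the models list presumably comes from a values file. A sketch of such a file follows, with the hypothetical name models-values.yaml. You could pass it with Helm's -f flag by adding -f models-values.yaml to the helm install command:

```yaml
# models-values.yaml (hypothetical file name): the models to download.
# Registry credentials are still passed via --set on the command line.
models:
  - name: luminous-base
    pvcSize: 100Gi
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base-2022-04
```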
Troubleshooting
PVCs are not created due to wrong storage class
If the PVCs are not created, the storage class may be wrong. You can change it by overriding persistence.storageClass in your values.yaml file.
By default, it is set to "" as shown here:
persistence:
  storageClass: ""
...