How can the model weights downloader be configured?
To download models, use the models Helm chart.
First, we'll go over how to specify the credentials, then explain how to define a PVC, and finally dive deeper into how to configure the different supported model sources.
Credentials
Each source you use requires credentials for that source.
You can provide them either by referencing an existing cluster secret or by specifying the credentials directly.
The keys expected in these secrets can be renamed with the corresponding {name}Key setting, e.g. for modelCredentials use passwordKey to specify the key that holds the password in the secret, or tokenKey for the Hugging Face credentials.
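As a minimal sketch of the key renaming, assume an existing secret called my-registry-secret that stores the password under the key registry-token instead of password. Both names are hypothetical, and placing passwordKey directly under modelCredentials is inferred from the description above rather than confirmed against the chart.

```yaml
# Hypothetical example: the secret name and the key name are placeholders.
modelCredentials:
  existingSecret: "my-registry-secret"
  # The secret stores the password under "registry-token" instead of "password".
  passwordKey: "registry-token"
```

Keys that are not renamed keep their default names, so the secret in this sketch would still be expected to contain a username key.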
Repository
- Existing cluster secret:
    modelCredentials:
      existingSecret: ""
  By default, a secret with the keys username and password is expected.
- Specifying the credentials directly (only recommended locally):
    modelCredentials:
      username: ""
      password: ""
HuggingFace
- Existing cluster secret:
    huggingFaceCredentials:
      existingSecret: ""
  By default, a secret with the key token is expected.
- Specifying the credentials directly (only recommended locally):
    huggingFaceCredentials:
      token: ""
Object store
- Existing cluster secret:
    s3Credentials:
      existingSecret: ""
  By default, a secret with the keys accessKeyId, secretAccessKey, profile and region is expected. The key profile can probably be left empty.
- Specifying the credentials directly (only recommended locally):
    s3Credentials:
      accessKeyId: ""
      secretAccessKey: ""
      profile: ""
      region: ""
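For the existing-secret variant, a matching Kubernetes secret could be sketched as follows. The secret name s3-credentials is an assumption and the values are placeholders. The key names are the defaults listed above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials            # hypothetical name
type: Opaque
stringData:                       # plain-text values; Kubernetes stores them base64-encoded
  accessKeyId: "<access key id>"
  secretAccessKey: "<secret access key>"
  profile: ""                     # left empty, as noted above
  region: "<region>"
```

The chart would then reference this secret by setting s3Credentials.existingSecret to "s3-credentials". The secret has to live in the namespace the chart is installed into.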
PVC configuration
A PVC (persistent volume claim) is created by specifying a name and the weights to download onto this PVC:
models:
  - name: <name>
    pvcSize: <size, e.g. 40Gi>
    weights:
      - <source 1>
      - <source 2>
Every source has a targetDirectory on the PVC, and you need to make sure that these do not clash.
We'll describe below how to define the targetDirectory.
Every targetDirectory has to be included in the worker's weight_set_directories, or the worker will not load the weights.
After downloading, you can find the layout of the final volume in the log output of the downloader job.
Below, we only go over the different source configurations. The examples do not contain the full PVC definition, only a single weights entry of the respective type.
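For orientation, a complete PVC definition with two weights could look like the sketch below. It combines the repository and Hugging Face source types described in the Sources section. The PVC name and size are illustrative, and nesting targetDirectory under the source key is an assumption about the chart's schema.

```yaml
models:
  - name: example-models            # hypothetical PVC name
    pvcSize: 100Gi                  # illustrative; size it to fit all weights
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
      - huggingFace:
          model: meta-llama/Meta-Llama-3.1-8B-Instruct
          targetDirectory: meta-llama-3.1-8b-instruct
```

The two targetDirectory values differ, so the downloads do not overwrite each other on the shared volume.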
Sources
Repository
The registry base URL can be specified with modelCredentials.registry.
The default is alephalpha.jfrog.io/artifactory/model-weights-origin, so you can usually leave it out of the configuration.
The tarball to download is given by {modelCredentials.registry}/{fileName}.
- repository:
    fileName: luminous-base.tar.gz
    targetDirectory: luminous-base
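To illustrate how the download location is composed, the sketch below overrides the registry with a hypothetical mirror, registry.example.com/model-weights. The mirror, the PVC name and the size are placeholders, and the nesting under repository is an assumption.

```yaml
modelCredentials:
  # Hypothetical mirror; omit this key to use the default registry.
  registry: registry.example.com/model-weights
models:
  - name: example-models            # hypothetical PVC name
    pvcSize: 40Gi
    weights:
      - repository:
          fileName: luminous-base.tar.gz
          targetDirectory: luminous-base
# Following {modelCredentials.registry}/{fileName}, the tarball location is:
#   registry.example.com/model-weights/luminous-base.tar.gz
```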
HuggingFace
- huggingFace:
    model: meta-llama/Meta-Llama-3.1-8B-Instruct
    targetDirectory: meta-llama-3.1-8b-instruct
Object storage
The files to download are given by {folder}/*.
- s3:
    endpoint: https://object.storage.eu01.onstackit.cloud
    folder: <folder path in the bucket>
    targetDirectory: pharia-1-llm-7b-control
How to deploy it
First, set the credentials for our registry to get access to the models Helm chart.
export AA_REGISTRY_USERNAME=<username> # the account provided to you
export AA_REGISTRY_PASSWORD=<password> # your generated token for Helm
Once your credentials are set, authenticate your local Helm client with the repository. This step ensures Helm has the necessary access to fetch the PhariaAI chart.
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
The following step assumes that the target namespace already exists within your Kubernetes cluster. You can create it like this:
kubectl create namespace <pharia-ai-install-namespace>
Then you can install the chart:
helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
  --set modelCredentials.username="$AA_REGISTRY_USERNAME" \
  --set modelCredentials.password="$AA_REGISTRY_PASSWORD" \
  -n <pharia-ai-install-namespace>
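As an alternative to --set flags, the configuration can be kept in a values file and passed to helm install with -f. The sketch below assumes a file named values.yaml and two existing secrets, my-registry-secret and s3-credentials. All three names are hypothetical, as are the PVC name and size.

```yaml
# values.yaml (hypothetical file name)
modelCredentials:
  existingSecret: "my-registry-secret"   # expected keys: username, password
s3Credentials:
  existingSecret: "s3-credentials"       # expected keys: accessKeyId, secretAccessKey, profile, region
models:
  - name: example-models                 # hypothetical PVC name
    pvcSize: 40Gi
    weights:
      - s3:
          endpoint: https://object.storage.eu01.onstackit.cloud
          folder: <folder path in the bucket>
          targetDirectory: pharia-1-llm-7b-control
```

Referencing secrets instead of inline credentials keeps passwords out of the values file and out of the shell history.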
Troubleshooting
PVCs are not created due to wrong storage class
If the PVCs are not created, the cause might be a wrong storage class. You can change the storage class by overriding persistence.storageClass in the values.yaml file.
By default, it is set to "", as shown here:
persistence:
  storageClass: ""
...
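A minimal override might look like the sketch below. The class name standard is only a placeholder; the classes actually available in a cluster can be listed with kubectl get storageclass.

```yaml
persistence:
  # Placeholder name; use a storage class that exists in your cluster.
  storageClass: "standard"
```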