Introduction
This document covers:
- Prerequisites - the requirements for running PhariaAI
- The installation process, which includes:
  - Model Weights - how to download and configure model weights
  - Installation - how to install PhariaAI for the first time
  - Upgrade - how to upgrade an existing PhariaAI installation to the latest release
Prerequisites
Access to our software
The PhariaAI stack is bundled as a single Helm chart, containing all relevant dependencies as Helm sub-charts for the entire stack. Upon signing your contract, we will provide you with accounts to our Software Self-Service Artifactory.
You will need a username and an access token, which Helm uses to download the bundled chart from our registry and which must also be provided as a docker-registry secret in your cluster. The same credentials can be used to download the respective model data.
You must log in to our Artifactory to create the access token (see Registry Credential Setup).
Hardware Requirements
- 3 NVIDIA GPUs of Ampere, Ada Lovelace, or Hopper generation, with 40 GB VRAM each.
- 24 CPU cores, 128 GB RAM
Software Requirements
- Kubernetes version >= 1.29
- The cluster should have an ingress controller to allow external access to the PhariaAI services, and a cert-manager setup to support access via TLS.
- For installing the inference stack, the Kubernetes cluster requires GPU nodes to run the respective application Pods.
- We recommend using the NVIDIA GPU Operator to prepare your GPU nodes with a recent NVIDIA driver version and the necessary libraries. We have tested our stack on clusters using version >= 24 of the GPU Operator with default settings.
- We require persistent volumes that are available to all GPU nodes in the cluster to store model weights. That is, download jobs store the model weights on persistent volumes, and the Worker deployment later reads them from there. This can fail if persistent volumes are bound to availability zones and the download job and the Worker deployment run in different zones.
- Helm version >= 3.0
- While we provide databases as part of our Helm dependencies, we recommend that you provide your own databases for production setups.
- We require PostgreSQL version >= 14
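A quick way to sanity-check these requirements before installing is via kubectl and helm. Note that the nvidia.com/gpu.present label below is set by the NVIDIA GPU Operator's feature discovery and is an assumption about your cluster's node labeling:
kubectl version          # server version should be >= 1.29
helm version             # should be >= 3.0
kubectl get nodes -l nvidia.com/gpu.present=true   # GPU nodes, assuming GPU Operator labeling
kubectl get storageclass                           # verify persistent volume provisioning is available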
How to operate PhariaAI
PhariaAI can be operated on any suitable Kubernetes cluster using the Helm chart provided in this repository: https://alephalpha.jfrog.io/artifactory/helm/pharia-ai/. The Helm chart will install the necessary components to run the PhariaAI models on your cluster.
Registry Credential Setup
If you have not already done so, create an access token with your account on the Software Self-Service Artifactory under the respective artifact path.
Click the "Set me Up" button to generate a token.
For the purposes of these instructions, the credentials are exported to environment variables:
export AA_REGISTRY_USERNAME=<username> # the account provided to you
export AA_REGISTRY_PASSWORD=<password> # the access token you generated
Once your credentials are set, authenticate your local Helm client with the repository. This step ensures Helm has the necessary access to fetch the PhariaAI chart.
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
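To keep the token out of your shell history, you can alternatively pipe it to Helm via --password-stdin:
echo "$AA_REGISTRY_PASSWORD" | helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" --password-stdin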
How to download the model weights for the large language models
The model weights are available in our JFrog Artifactory instance and must be downloaded before they can be used by the inference stack of the PhariaAI installation.
We have prepared a separate Helm chart for downloading the model weights to persistent volumes in your cluster. The Helm chart deploys persistent volume claims and Kubernetes jobs for triggering the download.
By default, the chart deployment downloads the model weights for luminous-base and pharia1-llm-7b-control. If you want to download only those default models, run the following command:
helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
--set modelCredentials.username=$AA_REGISTRY_USERNAME \
--set modelCredentials.password=$AA_REGISTRY_PASSWORD \
-n <pharia-ai-install-namespace>
If you want to download additional models, you can configure the models to download in a separate values.yaml file like this:
models:
  - name: luminous-base
    check_directory: luminous-base-2022-04
    download: luminous-base.tar.gz
    pvcSize: 100Gi
  - name: Pharia-1-LLM-7B-control
    check_directory: Pharia-1-LLM-7B-control
    download: Pharia-1-LLM-7B-control.tar
    pvcSize: 30Gi
Note that the pvcSize currently has to be at least twice the (unzipped) size of the model.
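For example, a model whose unpacked weights occupy roughly 40 GB needs a pvcSize of at least 80Gi, presumably because the downloaded archive and the extracted weights reside on the volume at the same time.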
To run the model download with the additional models, run the following command:
helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
--set modelCredentials.username=$AA_REGISTRY_USERNAME \
--set modelCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml \
-n <pharia-ai-install-namespace>
Note: Restricting the model download to persistent volumes in a dedicated availability zone can be achieved by defining respective K8s node tolerations / node selectors (cf. the values.yaml of the models Helm chart).
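As a sketch of such an override (the exact keys should be verified against the values.yaml of the models Helm chart; the zone label value is an example):
nodeSelector:
  topology.kubernetes.io/zone: eu-central-1a   # example zone; pin downloads to one AZ
tolerations:
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Exists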
Whether you download the default models or additional models, you can check the status of the download job by running:
kubectl get jobs -n <pharia-ai-install-namespace>
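To follow the progress of an individual download, you can tail the logs of the corresponding Job (the Job name is a placeholder; take it from the output above):
kubectl logs -f job/<download-job-name> -n <pharia-ai-install-namespace>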
Note: An incorrect Helm configuration might result in Pod errors in the download K8s Jobs. Since K8s Jobs are immutable once created, adapting the config and upgrading the Helm deployment might require the prior deletion of the involved Jobs.
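A minimal sketch of that recovery path, assuming the release name pharia-ai-models from above (the label selector is an assumption; verify it via kubectl get jobs --show-labels):
kubectl delete jobs -l app.kubernetes.io/instance=pharia-ai-models -n <pharia-ai-install-namespace>
helm upgrade pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
--set modelCredentials.username=$AA_REGISTRY_USERNAME \
--set modelCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml \
-n <pharia-ai-install-namespace>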
The names of the created persistent volume claims are required for the Helm config of the PhariaAI chart and can be obtained via:
kubectl get pvc -n <pharia-ai-install-namespace>
Once the download job is completed, you can proceed with the installation of the PhariaAI Helm chart.
Note: To utilize any features of PhariaAI that depend on embedding models, such as Assistant Q&A or document indexing, the luminous-base model is essential; the Pharia 1 LLM 7B models do not currently support embedding functionality.
Installation of PhariaAI
Before you can install the PhariaAI Helm chart from Software Self-Service, you need to provide your access credentials to Helm. If you have not already done so, see Registry Credential Setup.
Download the chart
The following commands download the pharia-ai Helm chart:
- Login
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
- Pull the specific chart version you want
helm pull oci://alephalpha.jfrog.io/pharia-ai-helm/pharia-ai --version <chart_version>
- Unzip the chart contents
tar -xvzf pharia-ai-<chart_version>.tgz
- Change into the chart directory
The previous tar command will have created a pharia-ai directory containing all the dependencies and the default values.yaml file. Change into the directory:
cd pharia-ai
Helm Chart Configuration
The Helm chart configuration is provided via a respective Helm values.yaml file. The initial values in the bundled values.yaml are suitable for a default installation, and they may be modified to meet your specific configuration needs.
You will find additional comments and documentation on suitable config overrides directly in the respective sections of the bundled values.yaml file.
Instead of modifying the default values.yaml, make a copy called values-override.yaml and apply your configuration changes there.
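For example:
cp values.yaml values-override.yaml
# edit values-override.yaml, then pass both files to Helm;
# with --values values.yaml --values values-override.yaml the later file takes precedence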
Ingress Configuration
External access to the respective PhariaAI services with API or UI endpoints is provided via Kubernetes Ingress resources.
The main ingress configuration is provided globally for all sub-charts simultaneously:
global:
  # Global config for all ingress resources
  ingress:
    # -- The ingressClassName globally defined for all ingress resources.
    # See also: https://kubernetes.io/docs/concepts/services-networking/ingress/#the-ingress-resource
    ingressClassName: "nginx"
    # -- Domain for external access / ingress to Pharia AI services via {service}.{domain}
    # e.g. {service}.pharia-ai.example.com
    ingressDomain: "pharia-ai.local"
    # -- Additional annotations globally defined for all ingress resources. This can be used to add ingress controller specific annotations.
    additionalAnnotations: {}
Specifically, the following entries might require custom overrides in your values-override.yaml:
- global.ingress.additionalAnnotations: annotations added globally to the dependency-specific ingress annotations. These might be needed to allow automated certificate generation for TLS support (cf. https://cert-manager.io/docs/usage/ingress/).
- global.ingress.ingressClassName: relates to the installed Kubernetes ingress controller in the deployment target cluster (cf. https://kubernetes.io/docs/concepts/services-networking/ingress/#the-ingress-resource).
For each dependency, specific ingress configuration is provided individually in the respective section of the values-override.yaml file:
<sub-chart>:
  ingress:
    enabled: true
    # -- Hostname for the ingress (without domain). The domain is read from global.ingress.ingressDomain.
    # This needs to be changed if multiple instances are deployed to the same cluster using the same domain.
    hostname: "<sub-chart>"
    # -- Annotations for the ingress resource. This can be used to add ingress controller specific annotations.
    annotations: {}
    tls:
      # -- Enable TLS configuration for this Ingress
      enabled: false
      # -- The name of the secret containing the TLS certificate.
      # See also: https://kubernetes.io/docs/concepts/services-networking/ingress/#tls
      secretName: "<sub-chart>-tls"
Specifically, the following entries might require custom overrides:
- <sub-chart>.ingress.tls.enabled: enables TLS for the specific ingress host.
- <sub-chart>.ingress.tls.secretName: name of the secret containing the TLS certificates, or used for certificate generation via an installed cert-manager.
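Putting both together, a values-override.yaml that enables TLS with automated certificate generation might look like this (the cluster-issuer name letsencrypt-prod is only an example and must match an issuer installed in your cluster):
global:
  ingress:
    ingressClassName: "nginx"
    ingressDomain: "pharia-ai.example.com"
    additionalAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod" # example issuer name
<sub-chart>:
  ingress:
    tls:
      enabled: true
      secretName: "<sub-chart>-tls"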
Configuring which models to run
Our default set of models is luminous-base
and pharia1-llm-7b-control
. To be able to use these models, you have to configure the PhariaAI Helm chart by adding the following to the values-override.yaml
, into the inference-worker.checkpoints
section:
inference-worker:
  checkpoints:
    - generator:
        type: "luminous"
        pipeline_parallel_size: 1
        tensor_parallel_size: 1
        tokenizer_path: "luminous-base-2022-04/alpha-001-128k.json"
        weight_set_directories: ["luminous-base-2022-04"]
      queue: "luminous-base"
      replicas: 1
      modelVolumeClaim: "pharia-ai-models-luminous-base"
    - generator:
        type: "luminous"
        pipeline_parallel_size: 1
        tensor_parallel_size: 1
        tokenizer_path: "Pharia-1-LLM-7B-control/vocab.json"
        weight_set_directories: ["Pharia-1-LLM-7B-control"]
      queue: "pharia-1-llm-7b-control"
      replicas: 1
      modelVolumeClaim: "pharia-ai-models-pharia-1-llm-7b-control"
Note: Each checkpoint requires the correct reference to the persistent volume claim (PVC) which relates to the volume (PV) where the model weights are stored (cf. model download).
The model to be used in Pharia Assistant must be set in your values-override.yaml file based on the queue name used above, e.g.:
pharia-assistant-api:
  env:
    ...
    QA_MODEL_NAME: pharia-1-llm-7b-control
    SAFETY_MODEL_NAME: pharia-1-llm-7b-control
    SUMMARY_MODEL_NAME: luminous-base
Scheduling on GPU Nodes
For installing the inference stack, the Kubernetes cluster requires GPU nodes (node pool) to run the respective application Pods (relevant for the PhariaAI sub-charts inference-worker and pharia-translate).
The scheduling of the worker and translate deployments to the GPU nodes can be achieved via node taints and tolerations (cf. https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/). The tolerations config can be applied via overrides in the values-override.yaml file.
inference-worker:
  ...
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu # key used for node taints
      operator: Exists
...
pharia-translate:
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu # key used for node taints
      operator: Exists
Tolerations can also be specified for individual worker checkpoints in order to assign worker Pods to different node pools depending on the respective model (e.g. large models to nodes with multi-GPU support).
inference-worker:
  checkpoints:
    - ...
      tolerations:
        - effect: NoSchedule
          key: nvidia.com/gpu # key used for node taints
          operator: Exists
The total number of GPUs needed for each worker deployment is calculated from the checkpoint config entries pipeline_parallel_size and tensor_parallel_size and automatically added to the resources section of the worker K8s Deployment:
resources:
  limits:
    nvidia.com/gpu: <number-of-gpus>
This config also controls the scheduling to GPU nodes with the respective number of available GPUs.
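For example, assuming the total is the product of the two settings, a checkpoint with pipeline_parallel_size: 2 and tensor_parallel_size: 4 would result in nvidia.com/gpu: 8 for each worker replica.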
IAM Configuration
You may provide custom credentials for the initial user account in the section pharia-iam.config.
This user account is used for user management in the PhariaAI stack via PhariaOS.
If you want to enable self-sign-up for your users, you need to create an extra Zitadel admin account in the section pharia-iam.zitadel.humanUser.
Zitadel is the internal identity provider used in the PhariaAI stack.
For further information about the self-sign-up, see section 5.
Use Case Deployment
In the PhariaAI stack, you can deploy custom use cases. However, the PhariaAI installation does not include a built-in container registry for hosting these use case images. To deploy a custom use case, you must provide an OCI registry, and the credentials for this registry must be supplied via the namespace configuration.
Namespace Config
PhariaAI manages use cases in namespaces. You must configure in a config file which use cases to deploy to which namespace. For each namespace, you must provide the path to this file and the name of the environment variable used to access it, set the matching environment variable, and specify a registry and repository to pull the images from.
In this example, one namespace assistant is configured. The configuration file is hosted on GitLab and is accessed with the token specified in the NAMESPACE_CONFIG_ACCESS_TOKEN environment variable. The use cases are pulled from registry.acme.com, which is accessed through BasicAuth with the configurable environment variables ASSISTANT_REGISTRY_USER and ASSISTANT_REGISTRY_PASSWORD.
namespaces:
  assistant:
    config_url: "https://gitlab.acme.com/api/v4/projects/42/repository/files/assistant.toml/raw?ref=main"
    config_access_token_env_var: "NAMESPACE_CONFIG_ACCESS_TOKEN"
    registry: "registry.acme.com"
    repository: "engineering/pharia-ai-skills/assistant"
    user_env_var: "ASSISTANT_REGISTRY_USER"
    password_env_var: "ASSISTANT_REGISTRY_PASSWORD"
env:
  - name: NAMESPACE_CONFIG_ACCESS_TOKEN
    valueFrom:
      secretKeyRef:
        name: pharia-kernel-secrets
        key: skillRegistryPassword
  - name: ASSISTANT_REGISTRY_USER
    valueFrom:
      secretKeyRef:
        name: pharia-kernel-secrets
        key: skillRegistryUser
  - name: ASSISTANT_REGISTRY_PASSWORD
    valueFrom:
      secretKeyRef:
        name: pharia-kernel-secrets
        key: skillRegistryPassword
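The example above reads the credentials from a Kubernetes secret named pharia-kernel-secrets. If that secret does not exist yet, it could be created along these lines (secret name and keys taken from the example above; the literal values are placeholders):
kubectl create secret generic pharia-kernel-secrets \
--from-literal=skillRegistryUser=<registry-user> \
--from-literal=skillRegistryPassword=<registry-token> \
-n <pharia-ai-install-namespace>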
Install the Helm chart
The Helm chart is installed using helm upgrade --install. For the Helm install, a respective target Kubernetes namespace <pharia-ai-install-namespace> should be chosen.
You must provide the access credentials for the image registry. There are two recommended options.
Option 1. Set the credentials directly by passing them to Helm
helm upgrade --install pharia-ai . \
--set imagePullCredentials.username=$AA_REGISTRY_USERNAME \
--set imagePullCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>
This command assumes that the default registry value imagePullCredentials.registry: "alephalpha.jfrog.io" is used. You can override the registry via --set imagePullCredentials.registry=<private-registry>.
During the installation, the Kubernetes (image-pull) secrets with the names defined at global.imagePullSecretName and global.imagePullOpaqueSecretName are generated in the install namespace.
Option 2. If you already have a Docker secret in your Kubernetes cluster, you can pass the secret name to Helm
helm upgrade --install pharia-ai . \
--set global.imagePullSecretName=<secretName> \
--set global.imagePullOpaqueSecretName=<opaqueSecretName> \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>
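If you still need to create such a secret, a standard docker-registry secret can be created as follows (the secret name is your choice; the layout of the opaque secret is chart-specific and not covered here):
kubectl create secret docker-registry <secretName> \
--docker-server=alephalpha.jfrog.io \
--docker-username=$AA_REGISTRY_USERNAME \
--docker-password=$AA_REGISTRY_PASSWORD \
-n <pharia-ai-install-namespace>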
Post Installation Steps
After installation, navigate to https://login.<YOUR_CONFIGURED_DOMAIN>/ and log in with the initial user account (configured in the Helm chart values pharia-iam.config) to complete the setup of the initial user credentials.
If you did not provide a custom initial user account password, you can display the autogenerated password with the following command:
kubectl get secret pharia-iam-admin-password -n <pharia-ai-install-namespace> -o jsonpath="{.data.password}" | base64 -d
Upgrade of PhariaAI
Ensure that your registry credentials are up to date and that Helm has access (see Registry Credential Setup).
Thanks to Helm's idempotent operations, the upgrade instructions are the same as for installation; only the new pharia-ai version has to be referenced.
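A minimal upgrade sketch, assuming the same values files as for the initial install and a placeholder <new_chart_version>:
helm pull oci://alephalpha.jfrog.io/pharia-ai-helm/pharia-ai --version <new_chart_version>
tar -xvzf pharia-ai-<new_chart_version>.tgz
cd pharia-ai
helm upgrade --install pharia-ai . \
--set imagePullCredentials.username=$AA_REGISTRY_USERNAME \
--set imagePullCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>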
Next Steps
After installing PhariaAI, refer to the How to create users guide to set up the initial users required for using Pharia Assistant.
For setting up namespaces, creating collections, and uploading documents for indexing to facilitate Q&A in Pharia Assistant, consult the How to create knowledge bases guide.
Known Issues
Change of ingress domain needs IAM reset
A change of the ingress domain global.ingress.ingressDomain is currently not supported out of the box and needs manual intervention for the IAM component.
Please start from a clean database for the pharia-iam sub-chart before changing the ingressDomain value.
This also implies that all user data will be lost.
A change of the ingressDomain will be supported in future Helm chart versions.