Introduction
This document covers:
- Prerequisites - the requirements for running PhariaAI
- The installation process, which includes:
  - Model Weights - how to download and configure model weights
  - Installation - how to install PhariaAI for the first time
  - Upgrade - how to upgrade an existing PhariaAI installation to the latest release
Prerequisites
Access to our software
The PhariaAI stack is bundled as a single Helm chart, containing all relevant dependencies as Helm sub-charts for the entire stack. Upon signing your contract, we will provide you with accounts to our Software Self-Service Artifactory.
You will need a username and an access token, which Helm uses to download the bundled chart from our registry and which must also be provided as a docker-registry secret in your cluster. The same credentials can be used to download the respective model data.
You must log in to our Artifactory to create the access token (see Registry Credential Setup).
Hardware Requirements
- 3 NVIDIA GPUs of Ampere, Ada Lovelace, or Hopper generation, with 40 GB VRAM each.
- 24 CPU cores, 128 GB RAM
Software Requirements
- Kubernetes version >= 1.29
- The cluster should have an ingress controller to allow external access to the PhariaAI services, and a cert-manager setup to support access via TLS.
- For installing the inference stack, the Kubernetes cluster requires GPU nodes to run the respective application Pods.
- We recommend using the NVIDIA GPU Operator to prepare your GPU nodes with a recent NVIDIA driver version and the necessary libraries. We have tested our stack on clusters using version >= 24 of the GPU Operator with default settings.
- We require persistent volumes that are available to all GPU nodes in the cluster to store model weights. That is, download jobs store the model weights on persistent volumes, and the Worker deployment later reads them from there. This can fail if persistent volumes are bound to availability zones and the download job and the Worker deployment run in different zones.
- Helm version >= 3.0
- While we provide databases as part of our Helm dependencies, we recommend that you provide your own databases for production setups.
- We require PostgreSQL version >= 14
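A quick way to sanity-check these requirements before installing is via kubectl and helm. Note that the nvidia.com/gpu.present label below is set by the NVIDIA GPU Operator's feature discovery and is an assumption about your cluster's node labeling:
kubectl version          # server version should be >= 1.29
helm version             # should be >= 3.0
kubectl get nodes -l nvidia.com/gpu.present=true   # GPU nodes, assuming GPU Operator labeling
kubectl get storageclass                           # verify persistent volume provisioning is available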
How to operate PhariaAI
PhariaAI can be operated on any suitable Kubernetes cluster using the Helm chart provided in this repository: https://alephalpha.jfrog.io/artifactory/helm/pharia-ai/. The Helm chart will install the necessary components to run the PhariaAI models on your cluster.
Registry Credential Setup
If you have not already done so, create an access token with your account on the Software Self-Service Artifactory under the respective artifact path.
Click the "Set me Up" button to generate a token.
For the purposes of these instructions, the credentials are exported to environment variables:
export AA_REGISTRY_USERNAME=<username> # the account provided to you
export AA_REGISTRY_PASSWORD=<password> # the access token you generated
Once your credentials are set, authenticate your local Helm client with the repository. This step ensures Helm has the necessary access to fetch the PhariaAI chart.
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
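To keep the token out of your shell history, you can alternatively pipe it to Helm via --password-stdin:
echo "$AA_REGISTRY_PASSWORD" | helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" --password-stdin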
How to download the model weights for the large language models
The model weights are available in our JFrog Artifactory instance and must be downloaded before they can be used by the inference stack of the PhariaAI installation.
We have prepared a separate Helm chart for downloading the model weights to persistent volumes in your cluster. The Helm chart deploys persistent volume claims and Kubernetes jobs for triggering the download.
By default, the chart deployment downloads the model weights for luminous-base and pharia1-llm-7b-control. If you want to download only those default models, run the following command:
helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
--set modelCredentials.username=$AA_REGISTRY_USERNAME \
--set modelCredentials.password=$AA_REGISTRY_PASSWORD \
-n <pharia-ai-install-namespace>
If you want to download additional models, you can configure the models to download in a separate values.yaml file like this:
models:
  - name: luminous-base
    check_directory: luminous-base-2022-04
    download: luminous-base.tar.gz
    pvcSize: 100Gi
  - name: Pharia-1-LLM-7B-control
    check_directory: Pharia-1-LLM-7B-control
    download: Pharia-1-LLM-7B-control.tar
    pvcSize: 30Gi
Note that the pvcSize currently has to be at least twice the (unzipped) size of the model.
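For example, a model whose unpacked weights occupy roughly 40 GB needs a pvcSize of at least 80Gi, presumably because the downloaded archive and the extracted weights reside on the volume at the same time.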
To run the model download with the additional models, run the following command:
helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
--set modelCredentials.username=$AA_REGISTRY_USERNAME \
--set modelCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml \
-n <pharia-ai-install-namespace>
Note: Restricting the model download to persistent volumes in a dedicated availability zone can be achieved by defining respective K8s node tolerations / node selectors (cf. the values.yaml of the models Helm chart).
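As a sketch of such an override (the exact keys should be verified against the values.yaml of the models Helm chart; the zone label value is an example):
nodeSelector:
  topology.kubernetes.io/zone: eu-central-1a   # example zone; pin downloads to one AZ
tolerations:
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Exists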
Whether you download the default models or additional models, you can check the status of the download job by running:
kubectl get jobs -n <pharia-ai-install-namespace>
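To follow the progress of an individual download, you can tail the logs of the corresponding Job (the Job name is a placeholder; take it from the output above):
kubectl logs -f job/<download-job-name> -n <pharia-ai-install-namespace>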
Note: An incorrect Helm configuration might result in Pod errors in the download K8s Jobs. Since K8s Jobs are immutable once created, adapting the config and upgrading the Helm deployment might require the prior deletion of the involved Jobs.
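A minimal sketch of that recovery path, assuming the release name pharia-ai-models from above (the label selector is an assumption; verify it via kubectl get jobs --show-labels):
kubectl delete jobs -l app.kubernetes.io/instance=pharia-ai-models -n <pharia-ai-install-namespace>
helm upgrade pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
--set modelCredentials.username=$AA_REGISTRY_USERNAME \
--set modelCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml \
-n <pharia-ai-install-namespace>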
The names of the created persistent volume claims are required for the Helm config of the PhariaAI chart and can be obtained via:
kubectl get pvc -n <pharia-ai-install-namespace>
Once the download job is completed, you can proceed with the installation of the PhariaAI Helm chart.
Note: To utilize any features of PhariaAI that depend on embedding models, such as Assistant Q&A or document indexing, the luminous-base model is essential; the Pharia 1 LLM 7B models do not currently support embedding functionality.
Installation of PhariaAI
Before you can install the PhariaAI Helm chart from Software Self-Service, you need to provide your access credentials to Helm. If you have not already done so, see Registry Credential Setup.
Download the chart
The following commands download the pharia-ai Helm chart:
- Login
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
- Pull the specific chart version you want
helm pull oci://alephalpha.jfrog.io/pharia-ai-helm/pharia-ai --version <chart_version>
- Unzip the chart contents
tar -xvzf pharia-ai-<chart_version>.tgz
- Change into the chart directory
The previous tar command will have created a pharia-ai directory containing all the dependencies and the default values.yaml file. Change into the directory:
cd pharia-ai
Helm Chart Configuration
The Helm chart configuration is provided via a respective Helm values.yaml file. The initial values in the bundled values.yaml are suitable for a default installation, and they may be modified to meet your specific configuration needs.
You will find additional comments and documentation on suitable config overrides directly in the respective sections of the bundled values.yaml file.
Instead of modifying the default values.yaml, make a copy called values-override.yaml and apply your configuration changes there.
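For example:
cp values.yaml values-override.yaml
# edit values-override.yaml, then pass both files to Helm;
# with --values values.yaml --values values-override.yaml the later file takes precedence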
Ingress Configuration
External access to the respective PhariaAI services with API or UI endpoints is provided via Kubernetes Ingress resources.
The main ingress configuration is provided globally for all sub-charts simultaneously:
global:
  # Global config for all ingress resources
  ingress:
    # -- The ingressClassName globally defined for all ingress resources.
    # See also: https://kubernetes.io/docs/concepts/services-networking/ingress/#the-ingress-resource
    ingressClassName: "nginx"
    # -- Domain for external access / ingress to Pharia AI services via {service}.{domain}
    # e.g. {service}.pharia-ai.example.com
    ingressDomain: "pharia-ai.local"
    # -- Additional annotations globally defined for all ingress resources. This can be used to add ingress controller specific annotations.
    additionalAnnotations: {}
Specifically, the following entries might require custom overrides in your values-override.yaml:
- global.ingress.additionalAnnotations: annotations added globally to the dependency-specific ingress annotations. These might be needed to allow automated certificate generation for TLS support (cf. https://cert-manager.io/docs/usage/ingress/).
- global.ingress.ingressClassName: relates to the installed Kubernetes ingress controller in the deployment target cluster (cf. https://kubernetes.io/docs/concepts/services-networking/ingress/#the-ingress-resource).
For each dependency, specific ingress configuration is provided individually in the respective section of the values-override.yaml file:
<sub-chart>:
  ingress:
    enabled: true
    # -- Hostname for the ingress (without domain). The domain is read from global.ingress.ingressDomain.
    # This needs to be changed if multiple instances are deployed to the same cluster using the same domain.
    hostname: "<sub-chart>"
    # -- Annotations for the ingress resource. This can be used to add ingress controller specific annotations.
    annotations: {}
    tls:
      # -- Enable TLS configuration for this Ingress
      enabled: false
      # -- The name of the secret containing the TLS certificate.
      # See also: https://kubernetes.io/docs/concepts/services-networking/ingress/#tls
      secretName: "<sub-chart>-tls"
Specifically, the following entries might require custom overrides:
- <sub-chart>.ingress.tls.enabled: enables TLS for the specific ingress host.
- <sub-chart>.ingress.tls.secretName: name of the secret containing the TLS certificates, or used for certificate generation via an installed cert-manager.
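Putting both together, a values-override.yaml that enables TLS with automated certificate generation might look like this (the cluster-issuer name letsencrypt-prod is only an example and must match an issuer installed in your cluster):
global:
  ingress:
    ingressClassName: "nginx"
    ingressDomain: "pharia-ai.example.com"
    additionalAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod" # example issuer name
<sub-chart>:
  ingress:
    tls:
      enabled: true
      secretName: "<sub-chart>-tls"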
Configuring which models to run
Our default set of models is luminous-base
and pharia1-llm-7b-control
. To be able to use these models, you have to configure the PhariaAI Helm chart by adding the following to the values-override.yaml
, into the inference-worker.checkpoints
section:
inference-worker:
  checkpoints:
    - generator:
        type: "luminous"
        pipeline_parallel_size: 1
        tensor_parallel_size: 1
        tokenizer_path: "luminous-base-2022-04/alpha-001-128k.json"
        weight_set_directories: ["luminous-base-2022-04"]
      queue: "luminous-base"
      replicas: 1
      modelVolumeClaim: "pharia-ai-models-luminous-base"
    - generator:
        type: "luminous"
        pipeline_parallel_size: 1
        tensor_parallel_size: 1
        tokenizer_path: "Pharia-1-LLM-7B-control/vocab.json"
        weight_set_directories: ["Pharia-1-LLM-7B-control"]
      queue: "pharia-1-llm-7b-control"
      replicas: 1
      modelVolumeClaim: "pharia-ai-models-pharia-1-llm-7b-control"
Note: Each checkpoint requires the correct reference to the persistent volume claim (PVC) which relates to the volume (PV) where the model weights are stored (cf. model download).
The model to be used in Pharia Assistant must be set in your values-override.yaml file based on the queue name used above, e.g.:
pharia-assistant-api:
  env:
    ...
    QA_MODEL_NAME: pharia-1-llm-7b-control
    SAFETY_MODEL_NAME: pharia-1-llm-7b-control
    SUMMARY_MODEL_NAME: luminous-base
Scheduling on GPU Nodes
For installing the inference stack, the Kubernetes cluster requires GPU nodes (node pool) to run the respective application Pods (relevant for the PhariaAI sub-charts inference-worker and pharia-translate).
The scheduling of the worker and translate deployments to the GPU nodes can be achieved via node taints and tolerations (cf. https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/). The tolerations config can be applied via overrides in the values-override.yaml file.
inference-worker:
  ...
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu # key used for node taints
      operator: Exists
...
pharia-translate:
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu # key used for node taints
      operator: Exists
Tolerations can also be specified for individual worker checkpoints in order to assign worker Pods to different node pools depending on the respective model (e.g. large models to nodes with multi-GPU support).
inference-worker:
  checkpoints:
    - ...
      tolerations:
        - effect: NoSchedule
          key: nvidia.com/gpu # key used for node taints
          operator: Exists
The total number of GPUs needed for each worker deployment is calculated from the checkpoint config entries pipeline_parallel_size and tensor_parallel_size and automatically added to the resources section of the worker K8s Deployment:
resources:
  limits:
    nvidia.com/gpu: <number-of-gpus>
This config also controls the scheduling to GPU nodes with the respective number of available GPUs.
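For example, assuming the total is the product of the two settings, a checkpoint with pipeline_parallel_size: 2 and tensor_parallel_size: 4 would result in nvidia.com/gpu: 8 for each worker replica.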
IAM Configuration
You may provide custom credentials for the initial user account in the section pharia-iam.config.
This user account is used for user management in the PhariaAI stack via PhariaOS.
If you want to enable self-sign-up for your users, you need to create an extra Zitadel admin account in the section pharia-iam.zitadel.humanUser.
Zitadel is the internal identity provider used in the PhariaAI stack.
For further information about the self-sign-up, see section 5.
Use Case Deployment
In the PhariaAI stack, you can deploy custom use cases. However, the PhariaAI installation does not include a built-in container registry for hosting these use case images. To deploy a custom use case, you must provide an OCI registry, and the credentials for this registry must be supplied via the namespace configuration.
Namespace Config
PhariaAI manages use cases in namespaces. You must configure in a config file which use cases to deploy to which namespace. For each namespace, you must provide the path to this file and the name of the environment variable used to access it, set the matching environment variable, and specify a registry and repository to pull the images from.
In this example, one namespace assistant is configured. The configuration file is hosted on GitLab and is accessed with the token specified in the NAMESPACE_CONFIG_ACCESS_TOKEN environment variable. The use cases are pulled from registry.acme.com, which is accessed through BasicAuth with the configurable environment variables ASSISTANT_REGISTRY_USER and ASSISTANT_REGISTRY_PASSWORD.
namespaces:
  assistant:
    config_url: "https://gitlab.acme.com/api/v4/projects/42/repository/files/assistant.toml/raw?ref=main"
    config_access_token_env_var: "NAMESPACE_CONFIG_ACCESS_TOKEN"
    registry: "registry.acme.com"
    repository: "engineering/pharia-ai-skills/assistant"
    user_env_var: "ASSISTANT_REGISTRY_USER"
    password_env_var: "ASSISTANT_REGISTRY_PASSWORD"
env:
  - name: NAMESPACE_CONFIG_ACCESS_TOKEN
    valueFrom:
      secretKeyRef:
        name: pharia-kernel-secrets
        key: skillRegistryPassword
  - name: ASSISTANT_REGISTRY_USER
    valueFrom:
      secretKeyRef:
        name: pharia-kernel-secrets
        key: skillRegistryUser
  - name: ASSISTANT_REGISTRY_PASSWORD
    valueFrom:
      secretKeyRef:
        name: pharia-kernel-secrets
        key: skillRegistryPassword
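The example above reads the credentials from a Kubernetes secret named pharia-kernel-secrets. If that secret does not exist yet, it could be created along these lines (secret name and keys taken from the example above; the literal values are placeholders):
kubectl create secret generic pharia-kernel-secrets \
--from-literal=skillRegistryUser=<registry-user> \
--from-literal=skillRegistryPassword=<registry-token> \
-n <pharia-ai-install-namespace>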
Install the Helm chart
The Helm chart is installed using helm upgrade --install. For the Helm install, a respective target Kubernetes namespace <pharia-ai-install-namespace> should be chosen.
You must provide the access credentials for the image registry. There are two recommended options.
Option 1. Set the credentials directly by passing them to Helm
helm upgrade --install pharia-ai . \
--set imagePullCredentials.username=$AA_REGISTRY_USERNAME \
--set imagePullCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>
This command assumes that the default registry value imagePullCredentials.registry: "alephalpha.jfrog.io" is used. You can override the registry via --set imagePullCredentials.registry=<private-registry>.
During the installation, the Kubernetes (image-pull) secrets with the names defined at global.imagePullSecretName and global.imagePullOpaqueSecretName are generated in the install namespace.
Option 2. If you already have a Docker secret in your Kubernetes cluster, you can pass the secret name to Helm
helm upgrade --install pharia-ai . \
--set global.imagePullSecretName=<secretName> \
--set global.imagePullOpaqueSecretName=<opaqueSecretName> \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>
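If you still need to create such a secret, a standard docker-registry secret can be created as follows (the secret name is your choice; the layout of the opaque secret is chart-specific and not covered here):
kubectl create secret docker-registry <secretName> \
--docker-server=alephalpha.jfrog.io \
--docker-username=$AA_REGISTRY_USERNAME \
--docker-password=$AA_REGISTRY_PASSWORD \
-n <pharia-ai-install-namespace>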
Post Installation Steps
After installation, navigate to https://login.<YOUR_CONFIGURED_DOMAIN>/ and log in with the initial user account (configured in the Helm chart values pharia-iam.config) to complete the setup of the initial user credentials.
If you did not provide a custom initial user account password, you can display the autogenerated password with the following command:
kubectl get secret pharia-iam-admin-password -n <pharia-ai-install-namespace> -o jsonpath="{.data.password}" | base64 -d
Upgrade of PhariaAI
Ensure that your registry credentials are up to date and that Helm has access (see Registry Credential Setup).
Thanks to Helm's idempotent operations, the upgrade instructions are the same as for installation; only the new pharia-ai version has to be referenced.
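A minimal upgrade sketch, assuming the same values files as for the initial install and a placeholder <new_chart_version>:
helm pull oci://alephalpha.jfrog.io/pharia-ai-helm/pharia-ai --version <new_chart_version>
tar -xvzf pharia-ai-<new_chart_version>.tgz
cd pharia-ai
helm upgrade --install pharia-ai . \
--set imagePullCredentials.username=$AA_REGISTRY_USERNAME \
--set imagePullCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>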
Next Steps
After installing PhariaAI, refer to the How to create users guide to set up the initial users required for using Pharia Assistant.
For setting up namespaces, creating collections, and uploading documents for indexing to facilitate Q&A in Pharia Assistant, consult the How to create knowledge bases guide.
Known Issues
Change of ingress domain needs IAM reset
A change of the ingress domain global.ingress.ingressDomain is currently not supported out of the box and needs manual intervention for the IAM component.
Please start from a clean database for the pharia-iam sub-chart before changing the ingressDomain value.
This also implies that all user data will be lost.
A change of the ingressDomain will be supported in future Helm chart versions.