Installation process
The installation process requires familiarity with Kubernetes and Helm.
Our documentation is written assuming you will be using Linux or macOS for your installation, but this is not required.
Prerequisites
Make sure you have completed the prerequisites before starting.
How to operate PhariaAI
PhariaAI can be operated on any suitable Kubernetes cluster using the Helm chart provided in this repository: https://alephalpha.jfrog.io/artifactory/helm/pharia-ai/. The Helm chart will install the necessary components to run the PhariaAI models on your cluster.
Set up the registry credentials
If you have not already done so, you can create a token with your account on Software Self-Service for the artifact path given above: click the "Generate an Identity Token" button on your profile page to generate a token.
The provided credentials must be authorized to read from the registry via API.
For the purposes of these instructions, the credentials are exported as environment variables.
export AA_REGISTRY_USERNAME=<username> # the account provided to you
export AA_REGISTRY_PASSWORD=<password> # the identity token you generated
Once your credentials are set, authenticate your local Helm client with the repository. This step ensures Helm has the necessary access to fetch the PhariaAI chart.
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
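To avoid the token showing up in your shell history, Helm also accepts the password on stdin; a minimal alternative:
echo "$AA_REGISTRY_PASSWORD" | helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" --password-stdin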
Create the target namespace
All setup steps following this section assume the target namespace already exists within your Kubernetes cluster. You can create it like this:
kubectl create namespace <pharia-ai-install-namespace>
Download the model weights for the LLMs
Depending on the model, the weights are available in our Software Self-Service instance or on Hugging Face and must be downloaded before they can be used by the inference stack of the PhariaAI installation.
We have prepared a separate Helm chart for downloading the model weights to persistent volumes in your cluster. The Helm chart deploys persistent volume claims and Kubernetes jobs for triggering the download.
By default, the chart deployment downloads the model weights for luminous-base, llama-3.1-8b-instruct, llama-3.3-70b-instruct and llama-guard-3-8b. Models hosted on Hugging Face additionally require an access token, which the command below assumes is exported as HUGGINGFACE_TOKEN. If you want to download only those default models, run the following commands:
helm install pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
--set modelCredentials.username=$AA_REGISTRY_USERNAME \
--set modelCredentials.password=$AA_REGISTRY_PASSWORD \
--set huggingFaceCredentials.token=$HUGGINGFACE_TOKEN \
-n <pharia-ai-install-namespace>
If you want to download additional models, see Configuring model weights downloaders.
Whether you download the default models or additional models, you can check the status of the download job by running:
kubectl get jobs -n <pharia-ai-install-namespace>
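If you prefer to block until all download jobs have finished, a sketch along these lines should work (the job name in the logs command is a placeholder):
kubectl wait --for=condition=complete job --all -n <pharia-ai-install-namespace> --timeout=24h
kubectl logs job/<download-job-name> -n <pharia-ai-install-namespace>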
Note: An incorrect Helm configuration might result in Pod errors for the download Kubernetes Jobs. Before adapting the config and upgrading the Helm deployment, you might first need to delete the affected Jobs.
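For example, a minimal recovery sketch (the job name is illustrative; list the actual names with kubectl get jobs):
# delete the failed download job so the upgrade can recreate it
kubectl delete job <download-job-name> -n <pharia-ai-install-namespace>
# re-apply the corrected configuration
helm upgrade pharia-ai-models oci://alephalpha.jfrog.io/inference-helm/models \
  --set modelCredentials.username=$AA_REGISTRY_USERNAME \
  --set modelCredentials.password=$AA_REGISTRY_PASSWORD \
  --set huggingFaceCredentials.token=$HUGGINGFACE_TOKEN \
  -n <pharia-ai-install-namespace>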
The names of the created persistent volume claims are required for the Helm config of the PhariaAI chart and can be obtained using:
kubectl get pvc -n <pharia-ai-install-namespace>
Once the download job is completed, you can proceed with the installation of the PhariaAI Helm chart.
Note: To use any features of PhariaAI that depend on embedding models, such as PhariaAssistant Chat or document indexing, the luminous-base model is essential. The Pharia-1-LLM-7B models do not currently support embedding functionality.
How to install PhariaAI
Before you can install the PhariaAI Helm chart from Software Self-Service, you need to provide your access credentials to Helm. If you have not already done so, see Registry Credential Setup.
Download the Helm chart
The following steps download the PhariaAI Helm chart:
Step 1: Login
helm registry login https://alephalpha.jfrog.io -u "$AA_REGISTRY_USERNAME" -p "$AA_REGISTRY_PASSWORD"
Step 2: Pull and unpack the latest chart version
helm pull oci://alephalpha.jfrog.io/pharia-ai-helm/pharia-ai --untar
Step 3: Change into the chart directory
The previous pull command will have created a pharia-ai directory containing all the dependencies and the default values.yaml file. Change into the directory with cd pharia-ai.
Configure the Helm chart
The Helm chart configuration is provided by a Helm values.yaml file. The initial values in the bundled values.yaml are suitable for a default installation, but they can be modified to meet your specific configuration needs. You will find additional comments and documentation on suitable config overrides directly in the respective sections of the bundled values.yaml file.
Instead of modifying the default values.yaml, you can make a copy called values-override.yaml where you will make changes to the default configuration.
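For example, from within the pharia-ai chart directory:
cp values.yaml values-override.yaml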
Configure Ingress
External access to PhariaAI services with API or UI endpoints is provided using Kubernetes Ingress resources.
Most of the Ingress configuration is provided globally for all subcharts at once:
global:
  # Global config for all ingress resources
  ingress:
    # -- The ingressClassName globally defined for all ingress resources.
    # See also: https://kubernetes.io/docs/concepts/services-networking/ingress/#the-ingress-resource
    ingressClassName: "nginx"
    # -- Domain for external access / ingress to Pharia AI services via {service}.{domain}
    # e.g. {service}.pharia-ai.example.com
    ingressDomain: "pharia-ai.local"
    # -- Additional annotations globally defined for all ingress resources. This can be used to add ingress controller specific annotations.
    additionalAnnotations: {}
Specifically, the following entries may require custom overrides in your values-override.yaml:
global.ingress.additionalAnnotations: annotations added globally to the dependency-specific Ingress annotations. They may be needed to allow automated certificate generation for TLS support (cf. https://cert-manager.io/docs/usage/ingress/).
global.ingress.ingressClassName: relates to the installed Kubernetes Ingress controller in the deployment target cluster (cf. https://kubernetes.io/docs/concepts/services-networking/ingress/#the-ingress-resource).
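As an illustration, a values-override.yaml for a cert-manager based setup might look like this (the domain and the cluster-issuer name are hypothetical):
global:
  ingress:
    ingressClassName: "nginx"
    ingressDomain: "pharia-ai.example.com"
    additionalAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod" # hypothetical issuer name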
For each dependency, specific Ingress configuration is provided individually in the respective section of the values-override.yaml
file:
<sub-chart>:
  ingress:
    enabled: true
    # -- Hostname for the ingress (without domain). The domain is read from global.ingress.ingressDomain.
    # This needs to be changed if multiple instances are deployed to the same cluster using the same domain.
    hostname: "<sub-chart>"
    # -- Annotations for the ingress resource. This can be used to add ingress controller specific annotations.
    annotations: {}
    tls:
      # -- Enable TLS configuration for this Ingress
      enabled: false
      # -- The name of the secret containing the TLS certificate.
      # See also: https://kubernetes.io/docs/concepts/services-networking/ingress/#tls
      secretName: "<sub-chart>-tls"
Specifically, the following entries may require custom overrides:
<sub-chart>.ingress.tls.enabled: enable TLS for the specific Ingress host.
<sub-chart>.ingress.tls.secretName: name of the secret containing the TLS certificates, or used for certificate generation via an installed cert-manager.
Configure database connections
Several PhariaAI applications require PostgreSQL databases as a persistence layer. For a production PhariaAI installation, we highly recommend the use of external (managed) database instances.
By default, in-cluster PostgreSQL instances are enabled. For each database configuration, you can either provide the necessary values directly during the Helm installation (in values-override.yaml) or reference an existing Kubernetes secret that stores the required values.
The bundled database deployments are automatically wired to their client applications. While a PostgreSQL deployment is enabled by default for each dependency, you must define a password in values-override.yaml:
<sub-chart>:
  postgresql:
    # -- This is used to indicate whether the internal PostgreSQL should be used or not.
    enabled: true
    auth:
      # -- If the internal PostgreSQL is used, a dedicated password has to be provided for the setup of application authentication
      password: ""
Make sure to set an initial password using Helm values to enable authentication between the application and the database instance.
External managed databases
We recommend using external database instances for production environments. The connection configuration and credential setup for each PhariaAI dependency can be managed with Helm chart values:
<sub-chart>:
  postgresql:
    # -- Disable the built-in PostgreSQL chart
    enabled: false
  databaseConfig:
    # -- Default secret name is used to create a secret if `external.existingSecret` is not provided.
    defaultSecret: default-secret-name
    secretKeys:
      # -- The key in the secret that contains the host of the database
      hostKey: "host"
      # -- The key in the secret that contains the port of the database
      portKey: "port"
      # -- The key in the secret that contains the user of the database
      userKey: "user"
      # -- The key in the secret that contains the password of the database
      passwordKey: "password"
      # -- The key in the secret that contains the database name
      databaseNameKey: "databaseName"
    # -- Provide an existing database if you want to use an external database
    external:
      # -- Set this value if a k8s Secret with PostgreSQL values already exists. Make sure that all the keys exist in the secret with valid values.
      existingSecret: ""
      # -- The host of the database
      host: ""
      # -- The port of the database
      port: ""
      # -- The user of the database
      user: ""
      # -- The password of the database
      password: ""
      # -- The name of the database
      databaseName: ""
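For illustration, a sketch of pointing one dependency at a managed PostgreSQL instance by passing the values directly (all connection values are hypothetical):
<sub-chart>:
  postgresql:
    enabled: false
  databaseConfig:
    external:
      host: "postgres.example.internal" # hypothetical managed instance
      port: "5432"
      user: "pharia"
      password: "change-me"
      databaseName: "pharia"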
Configuring the PhariaAssistant API
The PhariaAssistant API requires a Redis service. We recommend using an external Redis instance (see the next section). However, by default, an internal Redis instance is provided with the built-in Helm chart and enabled automatically; you must define a password in values-override.yaml:
redis:
  # -- Indicate whether the internal Redis should be used.
  enabled: true
  auth:
    # -- Redis Password
    password: ""
External PhariaAssistant API Redis
As noted above, we recommend that you use an external Redis instance. To do this, you must disable the built-in Redis service in values-override.yaml and configure the external connection settings.
The following is an example configuration for using an external Redis instance:
redis:
  # -- Indicate whether the internal Redis should be used.
  enabled: false
redisConfig:
  external:
    existingSecret: ""
    host: "my-redis"
    port: "6379"
    password: "redispassword"
Configuring which models are used by PhariaAssistant
Model configuration is done through environment variables in your Helm values. Example configuration for the summarization, QA and generation models:
pharia-assistant-api:
  env:
    QA_MODEL_NAME: "llama-3.1-8b-instruct"
    SUMMARY_MODEL_NAME: "llama-3.1-8b-instruct"
    GENERATE_MODEL_NAME: "llama-3.1-8b-instruct"
For guidance on selecting appropriate models for different tasks and hardware configurations, see Models recommendations for PhariaAssistant.
Configuring which collections are visible to PhariaAssistant
To configure which collections are visible to PhariaAssistant, you need to set the RETRIEVER_QA_INDEX_NAME environment variable. Only collections indexed using this index name will be visible to PhariaAssistant:
pharia-assistant-api:
  env:
    RETRIEVER_QA_INDEX_NAME: "assistant-index-name"
For optimal performance, see Recommended index configuration for our advice on which index configuration to use.
Configuring the PhariaData API
The PhariaData API requires a RabbitMQ service. We recommend using an external RabbitMQ instance (see the next section). However, by default, an internal RabbitMQ instance is provided with the built-in Helm chart and enabled automatically; you must define a password in values-override.yaml:
pharia-data-api:
  rabbitmq:
    # Enable or disable the internal RabbitMQ service.
    enabled: true
    auth:
      # Set the RabbitMQ application username.
      username: user
      # Set the RabbitMQ application password.
      password: ""
External RabbitMQ instance
For production environments, we recommend that you use an external RabbitMQ instance. To do this, you must disable the built-in RabbitMQ service in values-override.yaml and configure the external connection settings.
The following is an example configuration for using an external RabbitMQ instance:
pharia-data-api:
  rabbitmq:
    enabled: false
  rabbitmqConfig:
    # Default secret name used to create a secret if `external.existingSecret` is not provided.
    defaultSecret: pharia-data-api-rabbitmq-secret
    # The load definitions secret must hold the RabbitMQ topology configuration.
    defaultLoadDefinitionsSecret: pharia-data-api-rabbitmq-load-definitions-secret
    secretKeys:
      # The key in the secret that contains the host of RabbitMQ.
      hostKey: "rabbitmq-host"
      # The key in the secret that contains the port of RabbitMQ.
      portKey: "rabbitmq-port"
      # The key in the secret that contains the user of RabbitMQ.
      userKey: "rabbitmq-username"
      # The key in the secret that contains the password of RabbitMQ.
      userPasswordKey: "rabbitmq-password"
    external:
      # Set this value if a Kubernetes Secret with RabbitMQ values already exists. Ensure all keys exist in the secret with valid values.
      existingSecret: ""
      # The user of RabbitMQ.
      rabbitmqUser: ""
      # The password of the RabbitMQ user.
      rabbitmqUserPassword: ""
      # The load definitions secret name.
      loadDefinitionsSecret: ""
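For example, an existing secret matching the secretKeys above could be created like this (the secret name is illustrative) and then referenced via rabbitmqConfig.external.existingSecret:
kubectl create secret generic my-rabbitmq-secret \
  --from-literal=rabbitmq-host=<host> \
  --from-literal=rabbitmq-port=<port> \
  --from-literal=rabbitmq-username=<user> \
  --from-literal=rabbitmq-password=<password> \
  -n <pharia-ai-install-namespace>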
Configuring which models to run
Our default set of models is luminous-base and llama-3.1-8b-instruct. To change this, you have to override inference-worker.checkpoints (see Worker Deployment for more info).
Note: Each checkpoint requires the correct reference to the persistent volume claim (PVC) which relates to the volume (PV) in which the model weights are stored (cf. model download).
The model to be used in PhariaAssistant must be set in your values-override.yaml file based on the queue name used above. For example:
pharia-assistant-api:
  env:
    ...
    QA_MODEL_NAME: llama-3.1-8b-instruct
    SAFETY_MODEL_NAME: llama-3.1-8b-instruct
    SUMMARY_MODEL_NAME: llama-3.1-8b-instruct
Scheduling on GPU Nodes
For installing the inference stack, the Kubernetes cluster requires GPU nodes (a node pool) to run the respective application Pods (relevant for the PhariaAI sub-charts inference-worker and pharia-translate).
The scheduling of the worker and translate Deployments to the GPU nodes can be achieved via node taints and tolerations (cf. https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/). The tolerations config can be applied using overrides in the values-override.yaml file:
inference-worker:
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu # key used for node taints
      operator: Exists
pharia-translate:
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu # key used for node taints
      operator: Exists
Tolerations can also be specified for individual worker checkpoints in order to assign worker Pods to different node pools depending on the respective model (for example, large models to nodes with multi-GPU support).
inference-worker:
  checkpoints:
    - ...
      tolerations:
        - effect: NoSchedule
          key: nvidia.com/gpu # key used for node taints
          operator: Exists
The total number of required GPUs for each worker deployment is calculated from the checkpoint config entries pipeline_parallel_size and tensor_parallel_size, and is automatically added to the resources section of the worker's Kubernetes Deployment:
resources:
  limits:
    nvidia.com/gpu: <number-of-gpus>
This config also controls the scheduling to GPU nodes with the respective number of available GPUs.
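For example, assuming the GPU count is the product of the two parallelism settings, a checkpoint with pipeline_parallel_size: 2 and tensor_parallel_size: 4 would yield:
resources:
  limits:
    nvidia.com/gpu: 8 # 2 (pipeline) x 4 (tensor)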
Configuring the PhariaInference API
The Helm config of the inference-api dependency requires the initial setup of certain credentials. Secrets can be passed directly as Helm values or referenced from existing Kubernetes secrets already available in the cluster.
Add the required configurations to your values-override.yaml; for example:
inference-api:
  inferenceApiServices:
    # -- Name of an existing inferenceApiServices secret.
    # If you want to provide your own secret, set this to the name of your secret.
    # Keep in mind to set global.inferenceApiServicesSecretRef to the same name if an existing secret is used.
    # The secret is expected to have a key-value pair with key `secret`.
    existingSecret: ""
    # -- Manually added services secret
    # If no existing external secret is provided via inferenceApiServices.existingSecret, a secret value has to be applied during installation
    secret: ""
  jwt:
    # -- Name of an existing jwt secret to use
    # The secret is expected to have a key-value pair with key `secret`.
    existingSecret: ""
    # -- Manually added jwt secret
    # If no existing external secret is provided via jwt.existingSecret, a secret value has to be applied during installation
    secret: ""
  admin:
    # -- Email of the admin user to create on startup
    email: "tools@aleph-alpha.com"
    # -- Initial password of the admin user. If no existing external secret is provided via admin.existingSecret, a password value has to be applied during installation
    password: ""
    # -- Existing secret to use instead of email/password.
    existingSecret: ""
    # -- The email key in the secret
    emailKey: "email"
    # -- The password key in the secret
    passwordKey: "password"
Take care to set global.inferenceApiServicesSecretRef to the same name if an existing secret is used for inference-api.inferenceApiServices.existingSecret.
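As a sketch, such a secret (with the expected key `secret`) could be created like this; the secret name and the use of openssl to generate a random value are illustrative:
kubectl create secret generic my-inference-api-services \
  --from-literal=secret=$(openssl rand -hex 32) \
  -n <pharia-ai-install-namespace>
You would then set both inference-api.inferenceApiServices.existingSecret and global.inferenceApiServicesSecretRef to my-inference-api-services.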
Configuring the finetuning service
If you want to use the finetuning service to finetune models on custom data, additional GPU and CPU resources are required (see Finetuning service resource requirements).
We recommend setting up a separate GPU node pool for finetuning jobs and attaching a custom taint to it, so that finetuning does not interfere with the GPU workloads needed for the PhariaInference API. Since GPUs for finetuning are not occupied constantly, but only while finetuning jobs are running, we recommend enabling autoscaling for this node pool to free GPUs when they are not needed and reduce costs.
Additionally, we recommend that you connect the finetuning service to an external S3 storage bucket. While we ship PhariaAI with a built-in storage solution, we cannot guarantee persistence of your finetuning artifacts this way.
To configure the finetuning service to use an external storage bucket, first create the bucket and generate credentials for access. Then, create a Kubernetes secret in the namespace where you will install PhariaAI as follows:
apiVersion: v1
kind: Secret
data:
  bucketName: <base64 encoded name of the created storage bucket>
  bucketPassword: <base64 encoded password to your bucket>
  bucketUser: <base64 encoded username>
  endpointUrl: <base64 encoded endpoint URL of your S3 storage, e.g. https://object.storage.eu01.onstackit.cloud>
  region: <base64 encoded region of your bucket, e.g. EU01>
metadata:
  name: <your secret name>
type: Opaque
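As an alternative to base64-encoding the values by hand, the same secret can be created with kubectl, which encodes the values for you (the values mirror the examples above):
kubectl create secret generic <your secret name> \
  --from-literal=bucketName=<bucket name> \
  --from-literal=bucketUser=<username> \
  --from-literal=bucketPassword=<password> \
  --from-literal=endpointUrl=https://object.storage.eu01.onstackit.cloud \
  --from-literal=region=EU01 \
  -n <pharia-ai-install-namespace>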
With this, you can now configure the following values in your values-override.yaml file to allow you to finetune models:
pharia-finetuning:
  rayCluster:
    workerGroups:
      gpu-group:
        # -- Tolerations matching the taints of the GPU nodes you want to use for finetuning
        tolerations:
          - effect: NoSchedule
            key: nvidia.com/gpu # key used for node taints
            operator: Exists
          - effect: NoSchedule
            key: pharia-finetuning # key used for node taints
            operator: Exists
  minio:
    # -- For production installations, we highly recommend disabling the built-in MinIO service and configuring an external storage backend via the `storageConfig` section
    enabled: false
  # -- See the reference of Helm chart values for detailed information on the `storageConfig` section.
  storageConfig:
    fromSecret:
      secretName: <your secret name>
Disabling the finetuning service
If you are not planning to use PhariaAI to finetune models on custom data, you can disable the finetuning service by adding the following to your values-override.yaml file:
pharia-finetuning:
  enabled: false
IAM Configuration
You can provide custom credentials for the initial user account in the section pharia-iam.config. This user account is used for user management in the PhariaAI stack via PhariaOS.
pharia-iam:
  config:
    # -- Initial password of the initial user account. It must have 10-70 characters, including at least one uppercase letter, one lowercase letter, and one digit. The user will need to change this password on first login.
    adminPassword:
If you want to enable extra sign-up options such as self-sign-up or sign-up with SSO, you need to enable the rights to configure Zitadel, which is the internal identity provider used in the PhariaAI stack.
Enable the flag pharia-iam.config.adminEnableZitadelManagement; this grants the rights to configure sign-up options to your initial user account.
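A minimal sketch of the corresponding override in values-override.yaml:
pharia-iam:
  config:
    adminEnableZitadelManagement: true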
For further information about the sign-up options, see How to Configure Sign-Up Options.
Install the Helm chart
The Helm chart is installed using helm upgrade --install. For the Helm installation, you must choose a target Kubernetes namespace <pharia-ai-install-namespace>.
The access credentials for the image registry must be provided.
There are two recommended options:
Option 1. Set the credentials directly by passing them to Helm
helm upgrade --install pharia-ai . \
--set imagePullCredentials.username=$AA_REGISTRY_USERNAME \
--set imagePullCredentials.password=$AA_REGISTRY_PASSWORD \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>
This command assumes that the default value for the registry, imagePullCredentials.registry: "alephalpha.jfrog.io", is used. You can override the registry with --set imagePullCredentials.registry=<private-registry>.
During the installation, the Kubernetes (image-pull) secrets with the names defined at global.imagePullSecretName and global.imagePullOpaqueSecretName are generated in the install namespace.
Option 2. If you already have a Docker secret in your Kubernetes cluster, you can pass the secret name to Helm
helm upgrade --install pharia-ai . \
--set global.imagePullSecretName=<secretName> \
--set global.imagePullOpaqueSecretName=<opaqueSecretName> \
--values values.yaml --values values-override.yaml \
-n <pharia-ai-install-namespace>
The credentials are expected to be set with the following keys:
registryUser
registryPassword
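As a sketch, assuming the Docker secret is a standard docker-registry secret and the opaque secret carries the two keys above, the pair could be created like this:
kubectl create secret docker-registry <secretName> \
  --docker-server=alephalpha.jfrog.io \
  --docker-username=$AA_REGISTRY_USERNAME \
  --docker-password=$AA_REGISTRY_PASSWORD \
  -n <pharia-ai-install-namespace>
kubectl create secret generic <opaqueSecretName> \
  --from-literal=registryUser=$AA_REGISTRY_USERNAME \
  --from-literal=registryPassword=$AA_REGISTRY_PASSWORD \
  -n <pharia-ai-install-namespace>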
Post-installation steps
After installation, navigate to https://login.<YOUR_CONFIGURED_DOMAIN>/ and log in with the initial user account (configured in the Helm chart values pharia-iam.config) to complete the setup of the initial user credentials.
If you did not provide a custom initial user account password, you can display the autogenerated password with the following command:
kubectl get secret pharia-iam-admin-password -n <pharia-ai-install-namespace> -o jsonpath="{.data.password}" | base64 -d
Next Steps
After installing PhariaAI, see Configuring user and login options to set up the initial users required for using PhariaAssistant.
For setting up namespaces, creating collections, and uploading documents for indexing to facilitate Chat in PhariaAssistant, see the Putting files into collections guide.