Prerequisites
> **Important:** The installation process requires familiarity with Kubernetes and Helm.
Credentials
You need a user account with access to the Aleph Alpha Artifactory; we will provide the credentials to you.
On your local machine
> **Note:** The documentation assumes you are using Linux or macOS for your installation, but this is not a requirement.
| Aspect | Requirements |
|---|---|
| Container Orchestration Platform | Kubernetes client v1.29 and above • You can check this using kubectl version • Check your connectivity using kubectl get nodes |
| Package Manager | Helm v3.0 and above • You can check this using helm version |
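The checks in the table above can be scripted. The following is a minimal sketch; the helper name `check_tool` is illustrative and not part of any PhariaAI tooling:

```shell
#!/bin/sh
# Sketch of a client-side prerequisite check.
# check_tool reports whether a required CLI is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_tool kubectl   # if ok, run: kubectl version   (expect client v1.29+)
check_tool helm      # if ok, run: helm version      (expect v3.0+)
```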
On your Kubernetes Cluster
| Aspect | Criteria | Minimum Requirements |
|---|---|---|
| Hardware | GPU | Minimal setup: 2 GPUs (with MIG) or 3 GPUs (without MIG). Recommended setup: 6 GPUs (with MIG) or 7 GPUs (without MIG). Note: the actual number of GPUs depends on the models selected for deployment. Type: NVIDIA Ampere, Lovelace, or Hopper generation. Currently, only NVIDIA GPUs are supported; support for other vendors may be added in the future. GPU nodes: your Kubernetes cluster must include GPU nodes to run the inference stack application pods. Finetuning models requires additional GPUs; see Finetuning Service Resource Requirements. |
| | CPU & Memory | 24 CPU cores, 128 GB RAM. The exact requirements depend on the number of users and on which components of the stack you intend to use. See Resource requirements DataPlatform & Document Index and Resource requirements PhariaAssistant. |
| | Object Storage | Quantity: 3x. Type: MinIO or any other S3-compatible backend, for PhariaData and PhariaFinetuning. Input & output operations (IOPS): 1000 or above. Throughput: 100 Mb/s or above. |
| | Persistent Volumes | Persistent volumes accessible by all GPU nodes in the cluster are essential for storing model weights. Ensure your persistent volumes are accessible across availability zones if applicable in your environment. |
| Software | Networking | Installed in a single namespace with open communication between all services in the namespace. |
| | NVIDIA GPU Operator | We strongly recommend using the NVIDIA GPU Operator v24 and above with default settings to manage NVIDIA drivers and libraries on your GPU nodes. More details on the GPU Operator setup can be found at GPU Operator Setup. |
| | Ingress controller & domain | The cluster must include an ingress controller to enable external access to the PhariaAI service. A certificate manager must also be configured to support secure access via TLS (Transport Layer Security). A dedicated domain must be assigned to the Kubernetes cluster so that each service can host its application under a subdomain of this domain (e.g., https://<service-name>.<ingress-domain>). |
| | Relational Database Management | PostgreSQL v14.0 and above. Quantity: 1x (Large): storage 800 GB, CPU 8 cores, memory 16 GB. |
| | Network Access & Whitelisting | Not required if the networking requirements are met. If you require multiple namespaces, please discuss this with our Product Support team. |
| | Artifact Management | Ability to pull the pharia-ai-helm Helm chart and container images from an external artifact repository manager, such as JFrog. Credentials for this will be provided to you. |
| | Monitoring & Observability | No fixed requirements, but we recommend Prometheus and Grafana. |
| | Cert manager | cert-manager is required to provision webhook certificates for the Dynamic Model Management feature. |
| | ClusterRole | PhariaOS requires a ClusterRole for hardware discovery and model management. By default, the chart will create the necessary ClusterRole and ClusterRoleBinding. For detailed configuration, refer to PhariaOS Manager Settings and How to use existing cluster role. |
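Once you have kubectl access to the target cluster, several of the requirements above can be spot-checked from the command line. The following is a sketch; pod and resource names vary by distribution, so adjust the patterns to your environment:

```shell
#!/bin/sh
# Cluster-side spot checks (sketch; requires kubectl access to the target cluster).
command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found; skipping cluster checks"; exit 0; }

# GPUs advertised per node by the NVIDIA device plugin (via the GPU Operator):
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu'

# Storage classes available for persistent volumes:
kubectl get storageclass

# Ingress controller and cert-manager pods (names vary by distribution):
kubectl get pods --all-namespaces | grep -Ei 'ingress|cert-manager' \
  || echo "no ingress/cert-manager pods found"
```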