Resource requirements for PhariaAssistant
This article describes how to plan the resources required to use PhariaAssistant and its API.
Preliminary resource considerations
PhariaAssistant uses several other components of the PhariaAI stack. Depending on which PhariaAssistant features are used, these can include:
- PhariaInference for language model completions
- PhariaEngine for hosting AI skills
- PhariaData for document processing
- PhariaOS for hosting PhariaAI applications
Make sure these components are enabled and configured with sufficient resources to ensure smooth operation.
To keep response times low in scenarios with multiple concurrent users, scale PhariaInference resources up accordingly.
For PhariaAssistant to serve 100 concurrent requests with a good user experience, we recommend two PhariaInference workers per model. (Depending on your user base, 100 concurrent requests translates to between 1,000 and 10,000 users.)
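The guidance above implies a simple sizing heuristic: roughly two PhariaInference workers per model for every 100 concurrent requests, where 100 concurrent requests corresponds to somewhere between 1,000 and 10,000 users. A minimal sketch of that arithmetic (the function name and the conservative 1,000-users-per-100-concurrent default are assumptions for illustration, not part of the product):

```python
import math

def pharia_inference_workers(expected_users, models,
                             users_per_100_concurrent=1_000,
                             workers_per_100_concurrent=2):
    """Rough sizing heuristic from the guidance above: ~2 PhariaInference
    workers per model for every 100 concurrent requests, where 100
    concurrent requests maps to roughly 1,000-10,000 users (the
    conservative 1,000 is used as the default here)."""
    concurrent = 100 * expected_users / users_per_100_concurrent
    blocks = math.ceil(concurrent / 100)  # round up to whole 100-request blocks
    return blocks * workers_per_100_concurrent * models

# Conservative estimate for 1,000 users on a single model:
print(pharia_inference_workers(1_000, models=1))  # 2
```

Treat the result as a starting point for capacity planning, not a guarantee; measured load tests with your actual user base should drive the final worker count.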
We recommend reviewing the Models recommendations for PhariaAssistant article for guidance on selecting appropriate models.
Quick reference
| Component | Minimum CPU | Minimum memory | Recommended storage |
| --- | --- | --- | --- |
| PhariaAssistant (UI) | 100m cores | 256Mi | - |
| PhariaAssistant API | 500m cores | 4Gi | - |
| Database (PostgreSQL) | 1 CPU | 1Gi | 60 GiB |
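The minimums above translate directly into Kubernetes resource requests. A minimal sketch for the PhariaAssistant API container (the container name is a placeholder, and the exact keys depend on how your PhariaAI Helm charts expose resources):

```yaml
# Sketch only: "pharia-assistant-api" is a hypothetical container name;
# your deployment tooling may expose these settings differently.
containers:
  - name: pharia-assistant-api
    resources:
      requests:
        cpu: 500m     # minimum CPU from the table above
        memory: 4Gi   # minimum memory from the table above
```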
Detailed requirements
PhariaAssistant (UI)
- CPU: 100m cores
- Memory: 256Mi
- Scaling: Horizontal scaling recommended with increasing user base. In our experience, a single pod with these specs can serve 100 concurrent requests.
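Horizontal scaling of the UI can be automated with a standard Kubernetes HorizontalPodAutoscaler. A minimal sketch, assuming the UI runs as a Deployment named `pharia-assistant-ui` (the name and the replica/utilization figures are illustrative, not prescribed by the product):

```yaml
# Sketch only: the Deployment name and thresholds are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pharia-assistant-ui
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pharia-assistant-ui
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Since a single pod with the specs above handles roughly 100 concurrent requests, size `maxReplicas` to your expected peak concurrency.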