Assistant Resource Requirements Guide
This guide helps you plan the resources needed for using Assistant and its API.
Assistant relies on several other components of the PhariaAI stack. Depending on which features of Assistant are used, these include:
- Inference (for LM completions)
- Kernel (hosting AI skills)
- Data Platform & Document Index (for any document processing)
- OS (hosting PhariaAI applications)
Please make sure these components are enabled and provisioned with sufficient resources for smooth operation.
In most scenarios with multiple concurrent users where quick response times are desired, inference resources will need to be scaled up accordingly.
For Assistant to serve 100 concurrent requests with a good user experience, we recommend two inference workers per model. Depending on usage patterns, 100 concurrent requests typically corresponds to between 1,000 and 10,000 users.
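As a rough illustration of the figures above, the inference worker count can be estimated from the expected user base. Note that the concurrency ratio below (the share of users active at any moment) is an illustrative assumption, not a measured value; the two-workers-per-100-requests figure comes from the recommendation above.

```python
import math

def inference_workers(expected_users: int,
                      concurrency_ratio: float = 0.05,
                      workers_per_100_requests: int = 2) -> int:
    """Estimate inference workers needed per model.

    Uses the guide's figure of two workers per model for every
    100 concurrent requests. The concurrency ratio is an
    illustrative assumption; measure your own traffic.
    """
    concurrent_requests = expected_users * concurrency_ratio
    return math.ceil(concurrent_requests / 100) * workers_per_100_requests

print(inference_workers(2000))  # 2,000 users * 0.05 = 100 concurrent -> 2 workers
```

Treat this as a starting point for capacity planning; actual concurrency depends heavily on how your users interact with Assistant.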
We strongly recommend reviewing our model recommendations for PhariaAssistant for guidance on selecting appropriate models.
Quick Reference
| Component | Minimum CPU | Minimum Memory | Recommended Storage |
|---|---|---|---|
| PhariaAssistant (UI) | 100m | 256Mi | - |
| PhariaAssistant API | 500m | 4Gi | - |
| Database (PostgreSQL) | 1 CPU | 1Gi | 60 GiB |
Detailed Requirements
PhariaAssistant (UI)
- CPU: 100m cores
- Memory: 256Mi
- Scaling: Horizontal scaling is recommended as the user base grows. In our experience, a single pod with these specs can serve 100 concurrent requests.
PhariaAssistant API
- CPU: 500m cores
- Memory: 4Gi
- Scaling: Horizontal scaling is recommended as the user base grows. In our experience, a single pod with these specs can serve 100 concurrent requests.
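Based on the ~100 concurrent requests a single pod has handled in our experience, a minimal replica estimate can be sketched as follows. This is a planning aid, not an autoscaling policy; the per-pod figure is taken from the observation above and may differ in your environment.

```python
import math

def replicas_needed(peak_concurrent_requests: int,
                    requests_per_pod: int = 100) -> int:
    """Rough pod count for the PhariaAssistant API (the same formula
    applies to the UI), assuming ~100 concurrent requests per pod.
    Always keeps at least one replica."""
    return max(1, math.ceil(peak_concurrent_requests / requests_per_pod))

print(replicas_needed(250))  # -> 3
```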
Database (PostgreSQL)
- CPU: 1-2 cores
- Memory: 1-2Gi
- Storage: We recommend allocating at least 1 MiB of storage per user per day for traces, feedback, and other application data. (Less storage is needed if trace collection is disabled.)
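The storage rule of thumb above can be turned into a quick sizing estimate. The retention period in the example is an illustrative assumption; only the 1 MiB per user per day figure comes from this guide.

```python
def db_storage_gib(users: int, retention_days: int,
                   mib_per_user_per_day: float = 1.0) -> float:
    """Estimate PostgreSQL storage from the guide's rule of thumb of
    at least 1 MiB per user per day (traces, feedback, app data)."""
    return users * retention_days * mib_per_user_per_day / 1024

# e.g. 2,000 users with 30 days of data retained:
print(round(db_storage_gib(2000, 30), 1))  # -> 58.6 GiB
```

This lands close to the 60 GiB recommended in the quick-reference table for a mid-sized deployment; scale the inputs to match your user count and retention policy.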