Resource requirements for PhariaAssistant

This article describes how to plan the resources required to use PhariaAssistant and its API.


Preliminary resource considerations

PhariaAssistant uses several other components of the PhariaAI stack. Depending on which PhariaAssistant features are used, these can include:

  • PhariaInference for language model completions

  • PhariaEngine for hosting AI skills

  • PhariaData for document processing

  • PhariaOS for hosting PhariaAI applications

Make sure these components are enabled and configured with sufficient resources to ensure smooth operation.

To ensure a quick response time in scenarios with multiple concurrent users, PhariaInference resources need to be scaled up accordingly.

For PhariaAssistant to serve 100 concurrent requests with a good user experience, we recommend two PhariaInference workers per model. (Depending on your user base, 100 concurrent requests translates to between 1,000 and 10,000 users.)
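As a back-of-envelope sizing aid, the guidance above (two PhariaInference workers per model for every 100 concurrent requests, and a concurrency ratio of roughly 1-10% of the total user base) can be sketched as follows. The function name and the 5% default ratio are illustrative assumptions, not part of any PhariaAI tooling:

```python
import math

def required_inference_workers(total_users, concurrency_ratio=0.05,
                               requests_per_worker_pair=100):
    """Estimate PhariaInference workers needed per model.

    Assumes two workers per model for every 100 concurrent requests,
    and that 1-10% of users (default: 5%) are active concurrently.
    """
    concurrent_requests = total_users * concurrency_ratio
    worker_pairs = math.ceil(concurrent_requests / requests_per_worker_pair)
    return max(1, worker_pairs) * 2

# 2,000 users at 5% concurrency -> 100 concurrent requests -> 2 workers
print(required_inference_workers(2000))
```

Adjust the concurrency ratio to match your own usage patterns; an internal tool with short sessions will sit near the low end, a chat-heavy deployment near the high end.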

We recommend reviewing the Models recommendations for PhariaAssistant article for guidance on selecting appropriate models.

Quick reference

Component               Minimum CPU   Minimum memory   Recommended storage
PhariaAssistant (UI)    100m cores    256Mi            -
PhariaAssistant API     500m cores    4Gi              -
Database (PostgreSQL)   1 CPU         1Gi              60 GiB

Detailed requirements

PhariaAssistant (UI)

  • CPU: 100m cores

  • Memory: 256Mi

  • Scaling: Horizontal scaling recommended with increasing user base. In our experience, a single pod with these specs can serve 100 concurrent requests.
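In Kubernetes terms, these figures translate into resource requests on the UI deployment. The snippet below is an illustrative sketch, not a verbatim excerpt from the PhariaAI Helm charts; key names and structure are assumptions to be adapted to your actual values file:

```yaml
# Illustrative resource requests for the PhariaAssistant UI pod
# (structure is an assumption; adapt to your deployment manifests).
resources:
  requests:
    cpu: 100m
    memory: 256Mi
```

The PhariaAssistant API pod follows the same pattern with `cpu: 500m` and `memory: 4Gi` per the figures below.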

PhariaAssistant API

  • CPU: 500m cores

  • Memory: 4Gi

  • Scaling: Horizontal scaling recommended with increasing user base. In our experience, a single pod with these specs can serve 100 concurrent requests.

Database (PostgreSQL)

  • CPU: 1-2 cores

  • Memory: 1-2Gi

  • Storage: We recommend allocating at least 1 MiB of storage per user per day for storing traces, feedback, and other application data. (This can be less if trace collection is disabled.)
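The 1 MiB/user/day guidance lends itself to a simple sizing calculation. The function name and retention-period parameter below are illustrative assumptions:

```python
def recommended_storage_gib(users, retention_days, mib_per_user_per_day=1):
    """Estimate PostgreSQL storage in GiB, per the 1 MiB/user/day guidance."""
    total_mib = users * retention_days * mib_per_user_per_day
    return total_mib / 1024  # 1024 MiB per GiB

# 1,000 users with data retained for 60 days -> ~59 GiB, in line with
# the 60 GiB recommended in the quick-reference table
print(round(recommended_storage_gib(1000, 60), 1))
```

Size the volume for your expected retention window, and leave headroom for growth; shrinking a PostgreSQL volume later is considerably harder than expanding it.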