Resource requirements for PhariaAssistant

This article describes how to plan the resources required to use PhariaAssistant and its API.


Preliminary resource considerations

PhariaAssistant uses several other components of the PhariaAI stack. Depending on which PhariaAssistant features are used, these can include:

  • PhariaInference for language model completions

  • PhariaEngine for hosting AI skills

  • PhariaData for document processing

  • PhariaOS for hosting PhariaAI applications

Make sure these components are enabled and configured with sufficient resources to ensure smooth operation.

To ensure a quick response time in scenarios with multiple concurrent users, PhariaInference resources need to be scaled up accordingly.

For PhariaAssistant to serve 100 concurrent requests with a good user experience, we recommend two PhariaInference workers per model. (Depending on your user base, 100 concurrent requests translates to between 1,000 and 10,000 users.)
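As a back-of-envelope sizing aid, the guidance above (two PhariaInference workers per model for every 100 concurrent requests, and a concurrency ratio of roughly 1-10% of the total user base) can be sketched as follows. The function name and the 5% default ratio are illustrative assumptions, not part of any PhariaAI tooling:

```python
import math

def required_inference_workers(total_users, concurrency_ratio=0.05,
                               requests_per_worker_pair=100):
    """Estimate PhariaInference workers needed per model.

    Assumes two workers per model for every 100 concurrent requests,
    and that 1-10% of users (default: 5%) are active concurrently.
    """
    concurrent_requests = total_users * concurrency_ratio
    worker_pairs = math.ceil(concurrent_requests / requests_per_worker_pair)
    return max(1, worker_pairs) * 2

# 2,000 users at 5% concurrency -> 100 concurrent requests -> 2 workers
print(required_inference_workers(2000))
```

Adjust the concurrency ratio to match your own usage patterns; an internal tool with short sessions will sit near the low end, a chat-heavy deployment near the high end.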

We recommend reviewing the Models recommendations for PhariaAssistant article for guidance on selecting appropriate models.

Quick reference

Component               Minimum CPU   Minimum memory   Recommended storage
PhariaAssistant (UI)    100m cores    256Mi            -
PhariaAssistant API     500m cores    4Gi              -
Database (PostgreSQL)   1 CPU         1Gi              60 GiB

Detailed requirements

PhariaAssistant (UI)

  • CPU: 100m cores

  • Memory: 256Mi

  • Scaling: Horizontal scaling recommended with increasing user base. In our experience, a single pod with these specs can serve 100 concurrent requests.
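In Kubernetes terms, these figures translate into resource requests on the UI deployment. The snippet below is an illustrative sketch, not a verbatim excerpt from the PhariaAI Helm charts; key names and structure are assumptions to be adapted to your actual values file:

```yaml
# Illustrative resource requests for the PhariaAssistant UI pod
# (structure is an assumption; adapt to your deployment manifests).
resources:
  requests:
    cpu: 100m
    memory: 256Mi
```

The PhariaAssistant API pod follows the same pattern with `cpu: 500m` and `memory: 4Gi` per the figures below.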

PhariaAssistant API

  • CPU: 500m cores

  • Memory: 4Gi

  • Scaling: Horizontal scaling recommended with increasing user base. In our experience, a single pod with these specs can serve 100 concurrent requests.

Database (PostgreSQL)

  • CPU: 1-2 cores

  • Memory: 1-2Gi

  • Storage: We recommend allocating at least 1 MiB of storage per user per day for storing traces, feedback, and other application data. (This can be less if trace collection is disabled.)
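The 1 MiB/user/day guidance lends itself to a simple sizing calculation. The function name and retention-period parameter below are illustrative assumptions:

```python
def recommended_storage_gib(users, retention_days, mib_per_user_per_day=1):
    """Estimate PostgreSQL storage in GiB, per the 1 MiB/user/day guidance."""
    total_mib = users * retention_days * mib_per_user_per_day
    return total_mib / 1024  # 1024 MiB per GiB

# 1,000 users with data retained for 60 days -> ~59 GiB, in line with
# the 60 GiB recommended in the quick-reference table
print(round(recommended_storage_gib(1000, 60), 1))
```

Size the volume for your expected retention window, and leave headroom for growth; shrinking a PostgreSQL volume later is considerably harder than expanding it.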