Resource Requirements Guide
This guide helps you plan the resources needed for your document processing and search infrastructure.
Quick Reference
| Component | Minimum CPU | Minimum Memory | Recommended Storage |
| --- | --- | --- | --- |
| Pharia Search | 2 CPU | 1Gi | - |
| PostgreSQL | 2 CPU | 1Gi | 128Gi |
| Qdrant | 2 CPU | 64Gi | 96Gi |
| Pharia Data API | 800m | 1Gi | - |
| ETL Service | 500m | 1.5Gi | - |
Detailed Requirements
Pharia Search
- CPU: 2-4 cores
- Memory: 1-2Gi
- Scaling Factor:
documentBatchSize * averageDocumentSize * 2 + 256Mi
💡 Tip: For large documents or high concurrency, consider increasing CPU to 4 cores.
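As a quick sanity check, the scaling rule above can be sketched in Python. The function name and the example batch values are illustrative, not part of the product:

```python
MI = 1024 ** 2  # one mebibyte in bytes

def search_memory_bytes(document_batch_size: int, average_document_size: int) -> int:
    """Pharia Search memory rule: documentBatchSize * averageDocumentSize * 2 + 256Mi."""
    return document_batch_size * average_document_size * 2 + 256 * MI

# e.g. batches of 100 documents averaging 1MiB each:
# 200Mi of in-flight data plus the 256Mi baseline
print(search_memory_bytes(100, 1 * MI) / MI)  # 456.0
```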
Database (PostgreSQL)
- CPU: 2-4 cores
- Memory: 1-2Gi
- Storage: Calculate as
rawDocumentData * 1.1
Example: A 2.8 million document corpus (like German Wikipedia) needs about 10GB storage.
Vector Database (Qdrant)
Resource requirements per replica (default is 3 replicas):
- CPU: 2-4 cores
- Memory: 64Gi recommended
- Storage: 96Gi minimum
Memory usage depends on:
- Number of documents
- Chunk size and overlap
- Search features enabled
Memory Calculator
For each index, estimate:
- Base:
4Gi * (512 / chunkSize) * overlapFactor
where overlapFactor = 1 / (1 - overlap); 50% overlap doubles the number of chunks, so its overlapFactor is 2.
- With BM25: Add roughly 100% of the 512-token base (4Gi * overlapFactor)
- With FilterIndex: Add
numberOfChunks * (512 / chunkSize) * overlapFactor * metadataSize
Example: For 2.8M documents (about 6M chunks at 512 tokens) with:
- 256 token chunks (512/256 = 2x)
- 50% overlap (doubles the chunk count: 2x)
- BM25 enabled
- String metadata (20 chars average)
Total: 4Gi * 2 * 2 (base) + 8Gi (BM25) + 6M * 2 * 2 * 20B (FilterIndex) ≈ 16Gi + 8Gi + 0.5Gi ≈ 25Gi
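The calculator can be sketched as a single function. This is an interpretation, not the product's exact accounting: the overlap factor is taken as 1 / (1 - overlap) so that 50% overlap doubles the chunk count, matching the factor of 2 in the worked example, and the BM25 surcharge is modeled as 100% of the 512-token base:

```python
GI = 1024 ** 3  # one gibibyte in bytes

def qdrant_index_memory_gib(chunk_size: int, overlap: float, bm25: bool,
                            num_chunks_512: int, metadata_bytes: int) -> float:
    """Per-index Qdrant memory estimate in GiB (names are illustrative)."""
    chunk_factor = 512 / chunk_size           # smaller chunks -> more chunks
    overlap_factor = 1 / (1 - overlap)        # 50% overlap doubles the chunk count
    base = 4 * chunk_factor * overlap_factor  # 4Gi baseline per 512-token index
    bm25_extra = 4 * overlap_factor if bm25 else 0.0
    filter_extra = num_chunks_512 * chunk_factor * overlap_factor * metadata_bytes / GI
    return base + bm25_extra + filter_extra

# Worked example: 256-token chunks, 50% overlap, BM25, 20-byte metadata,
# ~6M chunks at 512 tokens for the 2.8M-document corpus
print(round(qdrant_index_memory_gib(256, 0.5, True, 6_000_000, 20), 1))  # 24.4 (~25Gi)
```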
Pharia Data API
- CPU: 800m-1 core
- Memory: 1-2Gi
- Scaling: Horizontal scaling recommended for high concurrency
ETL Service
- CPU: 500m-1 core per replica
- Memory: 1.5-3Gi per replica
- Replicas: 3 recommended
Best Practices
- Start Conservative: Begin with recommended minimums
- Monitor Usage: Watch resource utilization during initial operations
- Scale Gradually: Increase resources based on actual usage patterns
- Consider Growth: Plan for 30% headroom above current needs
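The headroom rule translates directly into a one-liner; a sketch, reusing the 64Gi Qdrant memory figure as an example input:

```python
def with_headroom(current: float, headroom: float = 0.30) -> float:
    """Capacity target: current usage plus the guide's recommended 30% headroom."""
    return current * (1 + headroom)

# e.g. a replica currently sized at 64Gi should be planned at ~83Gi
print(with_headroom(64))  # 83.2
```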