Resource Requirements Guide

This guide helps you plan the resources needed for your document processing and search infrastructure.

Quick Reference

Component         Minimum CPU   Minimum Memory   Recommended Storage
Pharia Search     2 CPU         1Gi              -
PostgreSQL        2 CPU         1Gi              128Gi
Qdrant            2 CPU         64Gi             96Gi
Pharia Data API   800m          1Gi              -
ETL Service       500m          1.5Gi            -

Detailed Requirements

Pharia Search

  • CPU: 2-4 cores
  • Memory: 1-2Gi
  • Scaling Factor: documentBatchSize * averageDocumentSize * 2 + 256Mi

💡 Tip: For large documents or high concurrency, consider increasing CPU to 4 cores.
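
The scaling factor above can be sketched as a small calculation. This is a minimal illustration that assumes the formula describes memory in bytes; the function name and the sample workload (100 documents of 1Mi each) are hypothetical and not taken from the product.

```python
MIB = 1024 ** 2  # bytes in one mebibyte (Mi)


def search_memory_bytes(document_batch_size: int, average_document_size: int) -> int:
    """Apply documentBatchSize * averageDocumentSize * 2 + 256Mi (sizes in bytes)."""
    return document_batch_size * average_document_size * 2 + 256 * MIB


# Hypothetical workload: batches of 100 documents averaging 1Mi each.
estimate = search_memory_bytes(100, 1 * MIB)
print(f"{estimate // MIB}Mi")  # 100 * 1Mi * 2 + 256Mi = 456Mi
```

An estimate that lands above the 1-2Gi range suggests reducing the batch size or raising the memory limit.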

Database (PostgreSQL)

  • CPU: 2-4 cores
  • Memory: 1-2Gi
  • Storage: Calculate as rawDocumentData * 1.1

Example: A corpus of 2.8 million documents (such as German Wikipedia) needs about 10 GB of storage.
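
The storage rule can be sketched as follows. The function name is hypothetical, and the 9 GB raw size is an assumed input chosen only to show the arithmetic; the guide does not state the raw size of the example corpus.

```python
import math
from fractions import Fraction

GB = 10 ** 9  # bytes in one decimal gigabyte
OVERHEAD = Fraction(11, 10)  # the 1.1 factor from the storage rule


def postgres_storage_bytes(raw_document_bytes: int) -> int:
    """Apply rawDocumentData * 1.1, rounded up to whole bytes."""
    return math.ceil(raw_document_bytes * OVERHEAD)


# Assumed raw corpus size of 9 GB (illustrative only).
needed = postgres_storage_bytes(9 * GB)
print(f"{needed / GB} GB")  # 9.9 GB
```

Using an exact fraction instead of the float 1.1 avoids small rounding errors when working with byte counts.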

Vector Database (Qdrant)

Resource requirements per replica (default is 3 replicas):

  • CPU: 2-4 cores
  • Memory: 64Gi recommended
  • Storage: 96Gi minimum

Memory usage depends on:

  1. Number of documents
  2. Chunk size and overlap
  3. Search features enabled

Memory Calculator

For each index, estimate:

  • Base: 4Gi * (512/chunkSize) * (1 + overlapPercentage)
  • With BM25: Add 100% more for 512 token chunks
  • With FilterIndex: Add numberOfChunks * chunkSize * (1 + overlapPercentage) * metadataSize

Example: For 2.8M documents with:

  • 256 token chunks
  • 50% overlap
  • BM25 enabled
  • String metadata (20 chars average)

Total: 4Gi * 2 * 2 + 8Gi + (6M * 2 * 2 * 20B) = ~25Gi
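
The arithmetic in the total above can be checked term by term. This sketch takes the multipliers and the 6M chunk count exactly as they appear in the total line rather than re-deriving them from the formulas; the function name and dictionary keys are hypothetical.

```python
import math

GIB = 1024 ** 3  # bytes in one gibibyte (Gi)


def qdrant_example_terms() -> dict[str, int]:
    """Return each term of the worked total in bytes, as written in the example."""
    return {
        "base": 4 * GIB * 2 * 2,                 # 4Gi * 2 * 2
        "bm25": 8 * GIB,                         # + 8Gi
        "filter_index": 6_000_000 * 2 * 2 * 20,  # 6M * 2 * 2 * 20B
    }


terms = qdrant_example_terms()
total = sum(terms.values())
for name, size in terms.items():
    print(f"{name:>12}: {size / GIB:.2f}Gi")
# Round up to whole Gi, matching the ~25Gi figure in the example.
print(f"{'total':>12}: {math.ceil(total / GIB)}Gi")
```

The filter index term is well under 1Gi here, so the base and BM25 terms dominate the estimate.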

Pharia Data API

  • CPU: 800m-1 core
  • Memory: 1-2Gi
  • Scaling: Horizontal scaling recommended for high concurrency

ETL Service

  • CPU: 500m-1 core per replica
  • Memory: 1.5-3Gi per replica
  • Replicas: 3 recommended

Best Practices

  1. Start Conservative: Begin with recommended minimums
  2. Monitor Usage: Watch resource utilization during initial operations
  3. Scale Gradually: Increase resources based on actual usage patterns
  4. Consider Growth: Plan for 30% headroom above current needs
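
The 30% headroom rule can be applied to a measured peak as in this minimal sketch. The function name and the sample peak of 1000Mi are hypothetical.

```python
def with_headroom(current_need_mib: int, headroom_percent: int = 30) -> int:
    """Return current need plus percentage headroom in Mi, rounded up."""
    scaled = current_need_mib * (100 + headroom_percent)
    return (scaled + 99) // 100  # integer ceiling division by 100


# Hypothetical observed peak of 1000Mi.
print(f"{with_headroom(1000)}Mi")  # 1300Mi
```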