Model recommendations for PhariaAssistant

At Aleph Alpha, we continuously strive to provide you with access to the best and most suitable AI models available. We regularly evaluate and test the latest language models, ensuring you benefit from highly effective, cost-efficient solutions tailored to your specific needs.

Our selection currently includes several variants of the Llama family models, chosen to accommodate different hardware capacities and usage requirements. In our tests, we found the Llama 3.1-8B-Instruct model particularly effective for summarisation tasks, especially when dealing with large documents that exceed the model's context window. In contrast, the larger Llama 3.3-70B-Instruct and Llama 3.1-70B-Instruct models excel in question-answering tasks due to their precise instruction-following capabilities and alignment with source documents.
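For documents larger than the context window, a common approach is recursive (map-reduce) summarisation: split the document into chunks, summarise each chunk, then summarise the concatenated summaries until the result fits. The sketch below illustrates the idea; `complete` is a hypothetical callable wrapping a completion request to the model, and the chunk sizes are illustrative, not tuned values.

```python
def summarise_recursively(text, complete, chunk_size=8000, max_final_len=8000):
    """Recursively summarise text that exceeds the model's context window.

    `complete` is a hypothetical function that sends a prompt to the
    model (e.g. Llama 3.1-8B-Instruct) and returns its completion.
    Sizes are measured in characters here for simplicity; a real
    implementation would count tokens.
    """
    if len(text) <= max_final_len:
        return complete(f"Summarise the following text:\n\n{text}")
    # Split into chunks, summarise each, then summarise the summaries.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partials = [complete(f"Summarise the following text:\n\n{c}") for c in chunks]
    return summarise_recursively("\n\n".join(partials), complete,
                                 chunk_size, max_final_len)
```

Because each chunk is summarised independently before the final pass, a smaller, cheaper model such as Llama 3.1-8B-Instruct can handle arbitrarily long documents.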

Llama 3.3-70B-Instruct and Llama 3.1-70B-Instruct share the same architecture and thus have identical hardware requirements. Llama 3.3-70B-Instruct is optimised specifically for dialogue applications and consistently outperforms Llama 3.1-70B-Instruct across multiple benchmark tasks. Given these advantages, we recommend Llama 3.3-70B-Instruct for most scenarios requiring high-quality responses and nuanced understanding.

The following comparison highlights the strengths, best uses, limitations, and hardware requirements of each model.

Llama 3.1-8B-Instruct

Strengths:

  • Efficient resource utilisation

  • Effective summarisation

  • Good performance when recursive summarisation is needed

Best for:

  • Document summarisation

  • Processing large documents

  • Deployments with resource constraints

Limitations:

  • Less sophisticated instruction-following

  • Less precise for complex QA tasks

  • May generate less nuanced responses

Hardware requirements: 40 GB of VRAM required. For example, you can achieve this with:

  • 1 x L40 (48 GB)

  • 1 x A100 / H100 (80 GB) split 50% using MIG mode

Inference is faster, however, on a full A100 / H100 or similar.

Llama 3.1-70B-Instruct

Strengths:

  • Strong instruction-following

  • Precise alignment with source documents

  • High-quality responses across languages

Best for:

  • Complex question-answering tasks

  • Applications needing detailed responses

  • Cases where accuracy is critical

Limitations:

  • High computational requirements

  • Expensive to deploy and run

  • Slower inference speeds

Hardware requirements: 320 GB of VRAM recommended, for example 4 x A100 / H100 or equivalent.

Llama 3.3-70B-Instruct

Strengths:

  • Advanced instruction-following

  • Optimised for dialogue and contextual understanding

  • Improved coherence and nuanced responses

  • Most recent knowledge cutoff (Dec 2024)

Best for:

  • Mission-critical QA applications

  • Applications benefiting from nuanced dialogue and recent knowledge

Limitations:

  • High computational requirements

  • Expensive to deploy

  • May be excessive for simpler tasks

Hardware requirements: 320 GB of VRAM recommended, for example 4 x A100 / H100 or equivalent.
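As a rough rule of thumb (a back-of-the-envelope estimate, not an official sizing formula), the memory needed for model weights alone is the parameter count times the bytes per parameter; the recommendations above are higher because they also leave room for the KV cache, activations, and runtime overhead:

```python
def weight_memory_gb(n_params_billion, bytes_per_param=2):
    """Approximate VRAM needed just for model weights, in GB.

    bytes_per_param=2 assumes 16-bit (bf16/fp16) weights. This is a
    rough estimate and ignores the KV cache, activations, and runtime
    overhead, which is why the recommendations above exceed it.
    """
    return n_params_billion * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(8))   # 16 GB of weights for the 8B model
print(weight_memory_gb(70))  # 140 GB of weights for the 70B models
```

Quantised weights (e.g. 1 byte per parameter for 8-bit) reduce the weight footprint proportionally, at some cost in quality.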

This table is intended to assist you in selecting the most appropriate model based on your specific use case, balancing performance needs with computational resources and cost considerations.
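The guidance above can be sketched as a small selection helper. This is illustrative only: the model identifier strings and the VRAM threshold are assumptions for the example, not PhariaAI configuration values.

```python
def recommend_model(task, available_vram_gb):
    """Illustrative model selection following the comparison above.

    The identifiers below are placeholders, not official PhariaAI
    model names; the 320 GB threshold mirrors the recommendation
    for the 70B models in this article.
    """
    if task == "summarisation" or available_vram_gb < 320:
        # The 8B model handles summarisation well and fits in 40 GB.
        return "llama-3.1-8b-instruct"
    # For QA and dialogue with sufficient hardware, prefer the newer
    # 70B model, which outperforms 3.1-70B across benchmarks.
    return "llama-3.3-70b-instruct"
```

For example, a question-answering workload on a single 80 GB GPU would fall back to the 8B model, while a four-GPU deployment would get the 3.3-70B recommendation.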

To install models in your PhariaAI environment, see Deploying workers and Configuring model weights downloaders.

The Prerequisites article in the installation guide provides an overview of the available hardware options.

To configure these models for use in PhariaAssistant, see Configuring models for PhariaAssistant.