Model recommendations for PhariaAssistant
At PhariaAssistant, we continuously strive to provide users with access to the best and most suitable AI models available. Our selection currently includes several variants of the Llama family models, chosen to accommodate different hardware capacities and usage requirements.
We regularly evaluate and test the latest language models, ensuring our customers benefit from highly effective, cost-efficient solutions tailored to their specific needs. As customer demands evolve, we will continue expanding our support for additional models.
In our tests, the Llama 3.1-8B-Instruct model proved particularly effective for summarization tasks, especially for large documents that exceed the model's context window and must therefore be split into chunks and summarized recursively. Conversely, the larger Llama 3.3-70B-Instruct and Llama 3.1-70B-Instruct models excel at question-answering tasks thanks to their precise instruction-following capabilities and close alignment with source documents.
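The recursive approach mentioned above can be sketched as follows. This is a minimal illustration, not PhariaAssistant code: `call_model` is a hypothetical stand-in for a real summarization call to Llama 3.1-8B-Instruct, and here it simply keeps the first sentence of each chunk so the example runs on its own.

```python
def call_model(text: str) -> str:
    # Placeholder: a real implementation would send `text` to the model
    # with a summarization prompt and return its completion.
    return text.split(". ")[0].strip().rstrip(".") + "."

def chunk(text: str, max_chars: int) -> list[str]:
    # Split the document into pieces small enough to fit the context window.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def recursive_summarize(text: str, max_chars: int = 2000) -> str:
    # Base case: the text already fits into a single model call.
    if len(text) <= max_chars:
        return call_model(text)
    # Summarize each chunk, then recursively summarize the joined summaries.
    partial = [call_model(c) for c in chunk(text, max_chars)]
    return recursive_summarize(" ".join(partial), max_chars)
```

In a real deployment, `max_chars` would be derived from the model's token-based context window rather than a character count.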
Llama 3.3-70B-Instruct and Llama 3.1-70B-Instruct share the same architecture and thus have identical hardware requirements. Llama 3.3-70B-Instruct is optimized specifically for dialogue applications and consistently outperforms Llama 3.1-70B-Instruct across multiple benchmark tasks. Given these advantages, we recommend Llama 3.3-70B-Instruct for most scenarios requiring high-quality responses and nuanced understanding.
Here's a comparative table highlighting strengths, best uses, and limitations of each model:
Model | Strengths | Best for | Limitations |
---|---|---|---|
Llama 3.1-8B-Instruct | • Efficient resource utilization • Effective summarization • Good performance when recursive summarization needed | • Document summarization • Processing large documents • Deployments with resource constraints | • Less sophisticated instruction-following • Less precise for complex QA tasks • May generate less nuanced responses |
Llama 3.1-70B-Instruct | • Strong instruction-following • Precise alignment with source documents • High-quality responses across languages | • Complex question-answering tasks • Applications needing detailed responses • Cases where accuracy is critical | • High computational requirements • Expensive to deploy and run • Slower inference speeds |
Llama 3.3-70B-Instruct | • Advanced instruction-following • Optimized for dialogue and contextual understanding • Improved coherence and nuanced responses • Most recent knowledge cutoff (Dec 2024) | • Mission-critical QA applications • Applications benefiting from nuanced dialogue and recent knowledge | • High computational requirements • Expensive to deploy • May be excessive for simpler tasks |
This table should assist you in selecting the most appropriate model based on your specific use case, balancing performance needs with computational resources and cost considerations.
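As a rough illustration of the table's guidance, the selection logic could be encoded like this. The task labels and model identifiers are assumptions for the sketch, not PhariaAssistant configuration values.

```python
# Recommended model per task, following the comparison table above.
RECOMMENDED_MODEL = {
    "summarization": "llama-3.1-8b-instruct",       # efficient, handles large documents
    "question-answering": "llama-3.3-70b-instruct", # precise instruction-following
    "dialogue": "llama-3.3-70b-instruct",           # optimized for dialogue
}

def recommend(task: str, resource_constrained: bool = False) -> str:
    # Fall back to the smaller model when hardware is limited.
    if resource_constrained:
        return "llama-3.1-8b-instruct"
    # Default to Llama 3.3-70B-Instruct, the general recommendation.
    return RECOMMENDED_MODEL.get(task, "llama-3.3-70b-instruct")
```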
To install models in your Pharia AI environment, see Worker Deployment and Model Weights Downloaders. The Prerequisites section there gives an overview of the required hardware.
To configure these models for use in PhariaAssistant, see Configuring which models are used by PhariaAssistant.