Meta has recently released version 3.1 of its Llama family of language models. As of worker version api-worker-luminous:2024-08-15-0cdc0 of our inference stack, we support these models as well. Note that, contrary to our usual practice, we do not provide the model weights in our JFrog Artifactory; instead, we ask you to download them from Hugging Face, where Meta provides them directly.

To use the new models, follow these steps:
- Download the model weights from Hugging Face, for example with this command (the repository is gated, so you may need to authenticate first; see the sketch after this list):

  ```shell
  huggingface-cli download --local-dir /path/to/Meta-Llama-3.1-8B-Instruct meta-llama/Meta-Llama-3.1-8B-Instruct
  ```
- Configure your worker with our new configuration format:

  ```toml
  edition = 1

  [queue]
  url = "<your API URL>"
  token = "<your API token>"
  checkpoint_name = "llama-3.1-8B-instruct"

  [monitoring]
  metrics_port = 4000
  tcp_probes = []

  [generator]
  type = "luminous"
  pipeline_parallel_size = 1
  tensor_parallel_size = 1
  huggingface_model_directory = "/path/to/Meta-Llama-3.1-8B-Instruct"
  tokenizer_path = "/path/to/Meta-Llama-3.1-8B-Instruct/tokenizer.json"
  weight_set_directories = []
  ```
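One caveat for the download step above: Meta's Llama 3.1 repositories on Hugging Face are gated, so you need to accept the license on the model page and authenticate with an access token before downloading. If you prefer to script the download, here is a minimal Python sketch using the huggingface_hub library; the HF_TOKEN environment variable and the target path are placeholders for your own values.

```python
# Sketch: scripted download via huggingface_hub (pip install huggingface_hub).
# Assumes you have accepted the Llama 3.1 license on the model page and that
# the HF_TOKEN environment variable holds a valid Hugging Face access token.
import os

from huggingface_hub import login, snapshot_download

# Authenticate, since the meta-llama repositories are gated.
login(token=os.environ["HF_TOKEN"])

# Fetch the full model repository into the directory that the worker config's
# huggingface_model_directory field will point at.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    local_dir="/path/to/Meta-Llama-3.1-8B-Instruct",
)
```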
Note that huggingface_model_directory must point to the directory where you downloaded the model weights. This field is only supported in the new configuration format, which was introduced in this previous blog post.
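Before starting the worker, it can be worth sanity-checking that the download completed and that the paths in your configuration are valid. Below is a minimal Python sketch, assuming the example paths from the configuration above and the tokenizers package; adjust the paths to your setup.

```python
# Sketch: sanity-check the downloaded files before starting the worker.
# Assumes the example paths from the configuration above.
from pathlib import Path

from tokenizers import Tokenizer  # pip install tokenizers

model_dir = Path("/path/to/Meta-Llama-3.1-8B-Instruct")

# The file referenced by tokenizer_path must exist and load cleanly.
tokenizer = Tokenizer.from_file(str(model_dir / "tokenizer.json"))
print("vocab size:", tokenizer.get_vocab_size())

# The weights typically ship as one or more safetensors shards alongside it.
shards = sorted(model_dir.glob("*.safetensors"))
print(f"found {len(shards)} safetensors shard(s)")
```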