📄️ Improved Summarization Endpoint
Back in April, we released an initial version of our Summarization endpoint, which let you summarize text using our language models.
📄️ New Token Management
Token Management List
📄️ Low Credit Balance Notifications
We are extremely grateful that so many of you trust us with your AI needs. We constantly strive to improve the speed and reliability of our API, and the last thing we want is for your requests to start getting rejected because you ran out of credits and didn't notice.
📄️ Python Client v2.5 - Async Support
We're excited to announce that we have added async support to our Python client! You can now upgrade to v2.5.0 and import AsyncClient to start making requests to our API in async contexts.
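As a minimal sketch (the prompt and model name here are placeholders), an async completion request looks like this:

```python
import asyncio

from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt


async def main() -> None:
    # AsyncClient is an async context manager, so the underlying HTTP
    # session is closed cleanly when the block exits.
    async with AsyncClient(token="YOUR_API_TOKEN") as client:
        request = CompletionRequest(
            prompt=Prompt.from_text("An apple a day"),
            maximum_tokens=32,
        )
        response = await client.complete(request, model="luminous-base")
        print(response.completions[0].completion)


asyncio.run(main())
```

Because the client is async, many such requests can be issued concurrently with asyncio.gather instead of blocking on each one in turn.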
📄️ March 2023 API updates
In the last few weeks we introduced a number of features to improve your experience with our models. We hope they will make it easier for you to test, develop, and productionize solutions built on top of Luminous. In this changelog we want to inform you about the following changes:
📄️ Control Model Updates
We are happy to announce that we have improved our luminous-control models.
📄️ Verify your on-premise installation and measure its performance
To verify that your installation works, we provide a script that uses the Aleph Alpha Python client to check whether your system has been configured correctly. This script reports which models are currently available and provides some basic performance measurements for those models.
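The shipped script covers more than this, but its core checks look roughly like the sketch below; the host, token, and the /models_available response shape are assumptions modeled on the public HTTP API:

```python
import time

import requests

API_URL = "https://your-inference-host"  # placeholder on-premise host
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

# Which models does the installation currently serve?
# (Response shape is an assumption: a list of objects with a "name" field.)
models = requests.get(f"{API_URL}/models_available", headers=HEADERS).json()
print("Available models:", [m["name"] for m in models])

# Basic performance check: time a small completion against each model.
for model in models:
    payload = {"model": model["name"], "prompt": "Hello", "maximum_tokens": 8}
    start = time.perf_counter()
    response = requests.post(f"{API_URL}/complete", headers=HEADERS, json=payload)
    elapsed = time.perf_counter() - start
    print(f"{model['name']}: HTTP {response.status_code} in {elapsed:.2f}s")
```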
📄️ Introducing API-scheduler-worker interface deprecation time frame
We have now introduced a two-week deprecation time frame for compatibility between the API-scheduler and workers.
📄️ Intelligence Layer Release 1.0.0
We're happy to announce the public release of our Intelligence Layer SDK.
📄️ Introducing paged attention and dynamic batching to our LLM workers
Batching is a natural way to improve the throughput of transformer-based large language models.
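As a toy illustration of the dynamic-batching idea (this is not our worker's implementation), the loop below groups requests that arrive within a short window and serves them with a single batched forward pass:

```python
import queue
import threading
import time

requests_q: "queue.Queue[str]" = queue.Queue()


def model_forward(batch: list[str]) -> list[str]:
    # Stand-in for one batched forward pass of the LLM.
    return [f"completion for {prompt!r}" for prompt in batch]


def batching_loop(max_batch: int = 8, window_s: float = 0.01) -> None:
    while True:
        batch = [requests_q.get()]  # block until the first request arrives
        deadline = time.monotonic() + window_s
        # Keep collecting until the batch is full or the window closes.
        while len(batch) < max_batch and time.monotonic() < deadline:
            try:
                batch.append(requests_q.get(timeout=max(deadline - time.monotonic(), 0)))
            except queue.Empty:
                break
        for result in model_forward(batch):  # one pass serves many requests
            print(result)


threading.Thread(target=batching_loop, daemon=True).start()
for prompt in ["a", "b", "c"]:
    requests_q.put(prompt)
time.sleep(0.1)  # give the batcher time to drain the queue
```

Paged attention complements this by storing the KV cache in fixed-size blocks, so requests of different lengths can share GPU memory within a batch.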
📄️ Intelligence Layer Release 3.0.0
What's new with version 3.0.0
📄️ Introducing CUDA graph caching
With version api-worker-luminous:2024-06-06-04729 of our luminous inference workers, we support CUDA graph caching.
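For intuition, this sketch uses PyTorch's public CUDA graph API (it is not our worker code): the kernel launch sequence is captured once, then replayed for later requests without re-launching each kernel from Python:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
static_input = torch.zeros(8, 1024, device="cuda")

# PyTorch requires a few warm-up iterations on a side stream before capture.
side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(side)

# Capture one forward pass into a CUDA graph.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

# Replay: copy new data into the captured input tensor and rerun the
# recorded kernels; static_output is updated in place.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
graph.replay()
print(static_output.sum().item())
```

Caching captured graphs (e.g. per batch shape) avoids both the capture cost and the per-kernel launch overhead on subsequent requests.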
📄️ Intelligence Layer Release 4.0.1
What's new with version 4.0.1
📄️ Intelligence Layer Release 4.1.0
What's new with version 4.1.0
📄️ Intelligence Layer Release 5.0.0
What's new with version 5.0.0
📄️ Intelligence Layer Release 5.0.1
What's new with version 5.0.1
📄️ Introducing tensor parallel inference and CUDA graph caching for adapter-based models
With worker version api-worker-luminous:2024-07-08-0d839 of our luminous inference workers, we now support tensor parallelism for all of our supported models and CUDA graph caching for adapter-based models.
📄️ Introducing chat endpoint in Aleph Alpha inference stack
With version api-scheduler:2024-07-25-0b303 of our inference stack API-scheduler, we now support a /chat/completions endpoint. It can be used to prompt a chat-capable LLM with a conversation history and have it generate a continuation of the conversation. The endpoint is available for all models that support the chat capability and is compatible with OpenAI's /chat/completions endpoint.
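Because the request and response follow the OpenAI-compatible schema, a call can be as simple as this sketch (the host and model name are placeholders):

```python
import requests

API_URL = "https://your-inference-host"  # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

payload = {
    "model": "your-chat-capable-model",  # any model with the chat capability
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize paged attention in one sentence."},
    ],
}
response = requests.post(f"{API_URL}/chat/completions", headers=HEADERS, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```

The OpenAI-compatible schema also means existing client libraries that speak that protocol can typically be pointed at this endpoint by changing their base URL.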
📄️ Announcing new unified worker configuration file format
With worker version api-worker-luminous:2024-08-15-0cdc0 of our inference stack worker, we introduce a new unified and versioned configuration format for our workers. Instead of two configuration files, the worker can now be configured with a single one.
📄️ Announcing support for Llama 3.1 models in our inference stack
Meta has recently released their version 3.1 of the Llama family of language models.
📄️ Announcing release of Pharia embedding model
We are happy to bring you our new Pharia embedding model (Pharia-1-Embedding-4608-control), which builds on our latest Pharia LLM. The model is trained with adapters on top of (frozen) Pharia LLM weights and can therefore be served on the same worker for both completion and embedding requests (see the figure in the full post). You can read more about the training details and evaluations of the embedding model in our model card.
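With the Python client, an embedding request against the new model might look like the sketch below (assuming your client version exposes semantic embeddings and that the model is served under the name above):

```python
from aleph_alpha_client import (
    Client,
    Prompt,
    SemanticEmbeddingRequest,
    SemanticRepresentation,
)

client = Client(token="YOUR_API_TOKEN")

request = SemanticEmbeddingRequest(
    prompt=Prompt.from_text("What is the capital of France?"),
    representation=SemanticRepresentation.Query,  # or Document / Symmetric
)
response = client.semantic_embed(request, model="Pharia-1-Embedding-4608-control")
print(len(response.embedding))  # dimensionality of the returned vector
```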
📄️ Announcing support for numerous additional open-source models through vLLM-based worker
Today we are happy to announce support for more open-source models in the Aleph Alpha stack.
📄️ Announcing constrained decoding to ensure JSON format
Overview
📄️ Improvements in AtMan speed
With version api-worker-luminous:2024-10-30-094b5 of our luminous inference workers, we've improved inference speed when running with our Attention Manipulation (AtMan) mechanism.
📄️ Announcing token stream support for complete endpoint and Python Client
In version api-scheduler:2024-10-01-00535 of our inference stack API-scheduler, we added a new stream property to the /complete endpoint to enable streamed token generation.
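A streamed request might look like the following sketch; the host is a placeholder, and the server-sent-events framing shown is an assumption, so consult the endpoint documentation for the exact event format:

```python
import json

import requests

API_URL = "https://your-inference-host"  # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

payload = {
    "model": "luminous-base",
    "prompt": "An apple a day",
    "maximum_tokens": 32,
    "stream": True,  # the new property: tokens arrive as they are generated
}
with requests.post(
    f"{API_URL}/complete", headers=HEADERS, json=payload, stream=True
) as response:
    for line in response.iter_lines():
        if not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":  # common SSE end-of-stream sentinel (assumed)
            break
        print(json.loads(chunk), flush=True)
```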