📄️ Improved Summarization Endpoint
Back in April, we released an initial version of our Summarization endpoint, which let you summarize text using our language models.
📄️ New Token Management
Token Management List
📄️ Low Credit Balance Notifications
We are extremely grateful that so many of you trust us with your AI needs. We constantly strive to improve the speed and reliability of our API, and the last thing we want is for your requests to start getting rejected because you ran out of credits and didn't notice.
📄️ Python Client v2.5 - Async Support
We're excited to announce that we have added async support to our Python client! You can now upgrade to v2.5.0 and import AsyncClient to start making requests to our API in async contexts.
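As a minimal sketch (the prompt and model name here are placeholders), an async completion request looks like this:

```python
import asyncio

from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt


async def main() -> None:
    # AsyncClient is an async context manager, so the underlying HTTP
    # session is closed cleanly when the block exits.
    async with AsyncClient(token="YOUR_API_TOKEN") as client:
        request = CompletionRequest(
            prompt=Prompt.from_text("An apple a day"),
            maximum_tokens=32,
        )
        response = await client.complete(request, model="luminous-base")
        print(response.completions[0].completion)


asyncio.run(main())
```

Because the client is async, many such requests can be issued concurrently with asyncio.gather instead of blocking on each one in turn.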
📄️ March 2023 API updates
In the last few weeks we introduced a number of features to improve your experience with our models. We hope they will make it easier for you to test, develop, and productionize solutions built on top of Luminous. In this changelog we want to inform you about the following changes:
📄️ Control Model Updates
We are happy to announce that we have improved our luminous-control models.
📄️ Verify your on-premise installation and measure its performance
To verify that your installation works, we provide a script that uses the Aleph Alpha Python client to check whether your system has been configured correctly. This script reports which models are currently available and provides some basic performance measurements for those models.
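The shipped script covers more than this, but its core checks look roughly like the sketch below; the host, token, and the /models_available response shape are assumptions modeled on the public HTTP API:

```python
import time

import requests

API_URL = "https://your-inference-host"  # placeholder on-premise host
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

# Which models does the installation currently serve?
# (Response shape is an assumption: a list of objects with a "name" field.)
models = requests.get(f"{API_URL}/models_available", headers=HEADERS).json()
print("Available models:", [m["name"] for m in models])

# Basic performance check: time a small completion against each model.
for model in models:
    payload = {"model": model["name"], "prompt": "Hello", "maximum_tokens": 8}
    start = time.perf_counter()
    response = requests.post(f"{API_URL}/complete", headers=HEADERS, json=payload)
    elapsed = time.perf_counter() - start
    print(f"{model['name']}: HTTP {response.status_code} in {elapsed:.2f}s")
```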
📄️ Introducing API-scheduler-worker interface deprecation time frame
We have now introduced a two-week deprecation time frame for compatibility between the API-scheduler and workers.
📄️ Intelligence Layer Release 1.0.0
We're happy to announce the public release of our Intelligence Layer SDK.
📄️ Introducing paged attention and dynamic batching to our LLM workers
Batching is a natural way to improve the throughput of transformer-based large language models.
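As a toy illustration of the dynamic-batching idea (this is not our worker's implementation), the loop below groups requests that arrive within a short window and serves them with a single batched forward pass:

```python
import queue
import threading
import time

requests_q: "queue.Queue[str]" = queue.Queue()


def model_forward(batch: list[str]) -> list[str]:
    # Stand-in for one batched forward pass of the LLM.
    return [f"completion for {prompt!r}" for prompt in batch]


def batching_loop(max_batch: int = 8, window_s: float = 0.01) -> None:
    while True:
        batch = [requests_q.get()]  # block until the first request arrives
        deadline = time.monotonic() + window_s
        # Keep collecting until the batch is full or the window closes.
        while len(batch) < max_batch and time.monotonic() < deadline:
            try:
                batch.append(requests_q.get(timeout=max(deadline - time.monotonic(), 0)))
            except queue.Empty:
                break
        for result in model_forward(batch):  # one pass serves many requests
            print(result)


threading.Thread(target=batching_loop, daemon=True).start()
for prompt in ["a", "b", "c"]:
    requests_q.put(prompt)
time.sleep(0.1)  # give the batcher time to drain the queue
```

Paged attention complements this by storing the KV cache in fixed-size blocks, so requests of different lengths can share GPU memory within a batch.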
📄️ Intelligence Layer Release 3.0.0
What's new with version 3.0.0
📄️ Introducing CUDA graph caching
With version api-worker-luminous:2024-06-06-04729 of our luminous inference workers, we support CUDA graph caching.
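For intuition, this sketch uses PyTorch's public CUDA graph API (it is not our worker code): the kernel launch sequence is captured once, then replayed for later requests without re-launching each kernel from Python:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
static_input = torch.zeros(8, 1024, device="cuda")

# PyTorch requires a few warm-up iterations on a side stream before capture.
side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(side)

# Capture one forward pass into a CUDA graph.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

# Replay: copy new data into the captured input tensor and rerun the
# recorded kernels; static_output is updated in place.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
graph.replay()
print(static_output.sum().item())
```

Caching captured graphs (e.g. per batch shape) avoids both the capture cost and the per-kernel launch overhead on subsequent requests.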
📄️ Intelligence Layer Release 4.0.1
What's new with version 4.0.1
📄️ Intelligence Layer Release 4.1.0
What's new with version 4.1.0
📄️ Intelligence Layer Release 5.0.0
What's new with version 5.0.0
📄️ Intelligence Layer Release 5.0.1
What's new with version 5.0.1
📄️ Introducing tensor parallel inference and CUDA graph caching for adapter-based models
With worker version api-worker-luminous:2024-07-08-0d839 of our luminous inference workers, we now support tensor parallelism for all of our supported models and CUDA graph caching for adapter-based models.
📄️ Introducing chat endpoint in Aleph Alpha inference stack
With version api-scheduler:2024-07-25-0b303 of our inference stack API-scheduler, we now support a /chat/completions endpoint. It can be used to prompt a chat-capable LLM with a conversation history and have it generate a continuation of the conversation. The endpoint is available for all models that support the chat capability and is compatible with OpenAI's /chat/completions endpoint.
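Because the request and response follow the OpenAI-compatible schema, a call can be as simple as this sketch (the host and model name are placeholders):

```python
import requests

API_URL = "https://your-inference-host"  # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

payload = {
    "model": "your-chat-capable-model",  # any model with the chat capability
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize paged attention in one sentence."},
    ],
}
response = requests.post(f"{API_URL}/chat/completions", headers=HEADERS, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```

The OpenAI-compatible schema also means existing client libraries that speak that protocol can typically be pointed at this endpoint by changing their base URL.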
📄️ Announcing new unified worker configuration file format
With worker version api-worker-luminous:2024-08-15-0cdc0 of our inference stack worker, we introduce a new unified and versioned configuration format for our workers. Instead of two configuration files, the worker can now be configured with a single one.
📄️ Announcing support for Llama 3.1 models in our inference stack
Meta has recently released their version 3.1 of the Llama family of language models.
📄️ Announcing release of Pharia embedding model
We are happy to bring you our new Pharia embedding model (Pharia-1-Embedding-4608-control), which builds on our latest Pharia LLM. The model is trained with adapters on top of (frozen) Pharia LLM weights and can therefore be served on the same worker for both completion and embedding requests (see the figure in the full post). You can read more about the training details and evaluations of the embedding model in our model card.
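With the Python client, an embedding request against the new model might look like the sketch below (assuming your client version exposes semantic embeddings and that the model is served under the name above):

```python
from aleph_alpha_client import (
    Client,
    Prompt,
    SemanticEmbeddingRequest,
    SemanticRepresentation,
)

client = Client(token="YOUR_API_TOKEN")

request = SemanticEmbeddingRequest(
    prompt=Prompt.from_text("What is the capital of France?"),
    representation=SemanticRepresentation.Query,  # or Document / Symmetric
)
response = client.semantic_embed(request, model="Pharia-1-Embedding-4608-control")
print(len(response.embedding))  # dimensionality of the returned vector
```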
📄️ Announcing support for numerous additional open-source models through vLLM-based worker
Today we are happy to announce support for more open-source models in the Aleph Alpha stack.
📄️ Announcing constrained decoding to ensure JSON format
Overview
📄️ Improvements in AtMan speed
With version api-worker-luminous:2024-10-30-094b5 of our luminous inference workers, we've improved inference speed when running with our Attention Manipulation (AtMan) mechanism.
📄️ Announcing token stream support for complete endpoint and Python Client
In version api-scheduler:2024-10-01-00535 of our inference stack API-scheduler, we added a new stream property to the /complete endpoint to enable streamed token generation.
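A streamed request might look like the following sketch; the host is a placeholder, and the server-sent-events framing shown is an assumption, so consult the endpoint documentation for the exact event format:

```python
import json

import requests

API_URL = "https://your-inference-host"  # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

payload = {
    "model": "luminous-base",
    "prompt": "An apple a day",
    "maximum_tokens": 32,
    "stream": True,  # the new property: tokens arrive as they are generated
}
with requests.post(
    f"{API_URL}/complete", headers=HEADERS, json=payload, stream=True
) as response:
    for line in response.iter_lines():
        if not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":  # common SSE end-of-stream sentinel (assumed)
            break
        print(json.loads(chunk), flush=True)
```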