Model Card Luminous
The Luminous series is a family of large language models. Large language models are powerful technological tools that can process and produce text. These capabilities emerge during model “training”, in which the model is exposed to large amounts of human text data. Much like a person who deliberately absorbs information by reading an entire library and half of the internet, large language models acquire a structural understanding of language (though not necessarily knowledge) along with accumulated information about the world.
The Luminous family currently consists of three vanilla models, which differ in complexity and ability. They are, from the smallest to the largest, luminous-base, luminous-extended and luminous-supreme. All Luminous models are trained in the five most commonly spoken European languages: English, German, French, Italian and Spanish.
This model card amalgamates information for all foundation models of the Luminous family for ease of reading and comparison. Much of the information provided is the same for every model. Where information differs between models, this model card allows easy comparison across the different model sizes. Unless explicitly stated otherwise, the information presented here applies universally to all Luminous models.
Please note that this model card describes the foundational large language models of the Luminous family: fine-tuned versions of the models, including extensions (e.g., luminous-control and luminous-explore fine-tunings), are not detailed in this model card.
- Developed by: Aleph Alpha GmbH
- Model type: Autoregressive (causal, decoder only) transformer language model
- Language(s) (NLP): English, German, French, Spanish, Italian
- Multi-modality: luminous-base and luminous-extended provide multi-modal input capabilities (a prompt may consist of any combination of images and text). The multi-modal extension increases the parameter count (see the table below).
- Model versions: The API will serve the model versions described in this model card until further notice. No restrictions, customizations or specializations are applied. The same models are made available to all users regardless of country, geographic location and input language, subject only to sanction regimes, technology export regulations and other restrictions that may apply. In effect, the same offering is provided to all countries within and outside the European Union where no legal restrictions apply.
|Model|Parameter count|Description|
|---|---|---|
|luminous-base|~13B (~15B with multi-modality)|luminous-base is the smallest model of the Luminous family, which makes it the fastest and cheapest to run. It is therefore suited for applications where speed is important and costs should be low. It is well suited for tasks like classification and labelling but may struggle with more complex tasks requiring deeper understanding.|
|luminous-extended|~30B (~42B with multi-modality)|luminous-extended is our second-largest model. It is well suited for tasks like information extraction and language simplification. It performs better on a wide range of tasks than luminous-base and is faster and cheaper than luminous-supreme.|
|luminous-supreme|~70B|luminous-supreme is the largest and most capable model in the Luminous family. It can solve all natural language tasks that the smaller models can solve and is especially well suited to creative text writing applications or applications where deeper text understanding is required.|
- On premise installation or AI-As-A-Service: contact us for options to deploy the Luminous models in your environment.
Please refer to the changelog for updates to the models served and the API interface.
No prompt data is stored when using the API or playground. No logging or other use (e.g. for further training) takes place on user provided data.
While Luminous models may be used for text generation or to explore the characteristics of a foundation model, they are intended to be deployed as AI modules in an ecosystem of components that at least incorporates adequate prompting to accomplish a downstream task (see downstream use). A plain model is unlikely to respond in the manner anticipated for a given use case.
Luminous models are intended as foundation models to be called with well designed prompts and included in an AI application. Use cases include but are not limited to:
- Text generation
- Question Answering
Out-of-Scope Use and Limitations
Bias, Risks, and Limitations with related recommendations
- Harmful language: language models may produce output that is harmful to a use case (undesired model generations with regards to insults, inappropriate tonality and style, systematic bias, instructions or recommendations for illegal behaviour, output of wrong or obsolete information, generation of (age) inappropriate content). Such output may be avoided by:
- Adequate prompt design
- Using a fine-tuned (control) model that follows instructions to rely on explicitly provided information
- Using a fine-tuned (control) model aimed at an appropriate tonality and style (incl. avoidance of insults)
- Checks using explainability to provide an audit trail on the application layer
- Performing other validations on the application layer
- Systematic biases: language models obtain world knowledge from their pre-training data and as such may exhibit the same systematic biases that are present in the data. Differing deployment scenarios (including differing cultural contexts) can render systematic biases problematic in differing ways. We acknowledge the cultural diversity of communities and users inside and outside the EU. For larger deployments, we encourage users to track systematic biases relevant to their use case, and we are happy to consult on bespoke fine-tunings to alleviate such biases.
- Outdated world knowledge: pre-training is performed on a fixed dataset, created at a fixed date in the past. Accordingly, the world knowledge of foundation models is limited to the information contained in its training data. More recent information may not be known to the model or misunderstood when presented as input during live usage. This risk may be mitigated by:
- Prompt design and injection of context, where relevant
- Personally identifiable information: the models are not trained to provide personally identifiable information, but may appear to do so. Such output does not necessarily imply the presence of this information in the training data, as it may be a plausible hallucination. Users are required to avoid this unintended behaviour by:
- Performing validations on the application layer
- Prompt design and injection of context, where relevant
- Avoidance of use cases targeted at retrieval of personally identifiable information
- Generation of unintended, irrelevant or repetitive outputs: such outputs, including incorrect information, may occur and can be mitigated by:
- Performing validations on the application layer
- Using the repetition penalty or other parameters available in the API (see documentation )
- Prompt design
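As an illustration of how a repetition penalty can discourage repeated tokens, the following sketch uses the commonly used CTRL-style formulation; the exact formula applied by the API is not specified here and may differ:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Penalise tokens that already occur in the generated sequence.

    CTRL-style penalty (an assumption, not necessarily the API's exact
    formula): positive logits are divided by the penalty, negative
    logits are multiplied, so repeated tokens become less likely.
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty
        else:
            adjusted[tok] *= penalty
    return adjusted

# Toy vocabulary of 4 tokens; token 2 was already generated.
logits = [1.0, 0.5, 2.0, -1.0]
adjusted = apply_repetition_penalty(logits, generated_ids=[2], penalty=2.0)
# Token 2's logit is halved, making repetition less likely.
```

A penalty of 1.0 leaves the distribution unchanged; larger values suppress repetition more aggressively.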
- Political bias: the Luminous family has not been optimized to represent a political opinion or take a specific point of view. It may generate outputs that contradict a user's opinion or expectation (e.g. produce hateful, violent or inappropriate, biased or discriminatory content). Such behaviour may be addressed by:
- Performing validations on the application layer (for example, via Red-Teaming or by semantic comparison of model outputs to undesired topics and tonalities, with our semantic search model, luminous-explore. This has the advantage of making the list of disallowed topics and tonalities configurable to differing cultural norms at the time of deployment.)
- Prompt design
- Mistaken for a human: users may attribute human traits to AI models. It is required to:
- Inform end users that they are interacting with or reading output of an AI
- Use luminous-*-control models which are more likely to include statements like "as an AI model" in the output
- Design an AI system in a way that mitigates the impact of unintended interpretation of output
- Other errors: Any AI module can produce errors, even after implementing all the recommended measures. When integrating foundation language models into an application one should:
- Be aware of the risk of (harmful) failure cases
- Implement the use case in a way that mitigates such risks
- Avoid the unsupervised use in high-stake environments
- Validate output with adequate measures dependent on the use case
- Deployment in high-stake settings: Language models are not agents and not optimized for prescriptive actions. The use of language models in high-stake environments, for critical decisions or to support a user's wellbeing is discouraged.
- Reproducibility: some inference parameters (e.g. temperature) lead to random sampling of outputs, which precludes exact reproducibility. Even when such parameters are not in use, outputs may diverge slightly at a numeric level for technical reasons. The following measure may be implemented if needed:
- Log and recall past model outputs on the application layer if exact recall is required. Note that Aleph Alpha does not store any data and/or use any data provided in prompts for the training of its LLMs.
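The effect of the temperature parameter on reproducibility can be sketched with a minimal sampling loop; this illustrates standard temperature sampling, not the API's internal implementation:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample a token id from logits at the given temperature.

    temperature == 0 degenerates to greedy argmax (deterministic);
    temperature > 0 samples from the softmax distribution, so repeated
    calls can return different tokens.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]
# Greedy decoding always picks the same (highest-logit) token.
greedy = [sample_token(logits, 0, random.Random(i)) for i in range(5)]
```

With temperature 0 the output is fully determined by the logits; with temperature above 0, reproducibility requires logging outputs (or fixing the random seed, where the serving stack allows it).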
- This list of risks, biases and limitations may not be complete. The research community is continuously improving behaviour and understanding of language models. This model card will be updated.
The Luminous family has been trained on a dataset compiled from sources in English, German, French, Spanish and Italian. While other languages may be represented in the data, Luminous models are not evaluated on them and performance is likely to be worse. The explicit goal of the dataset is to provide an adequately balanced share of text in the different languages across different styles and formats. Pre-training of the Luminous models is fully self-supervised: no supervised datasets were included during pre-training. Please refer to the luminous-*-control and luminous-explore fine-tunings for more task-specific variants. Note that we deliberately omit computer code from the training data.
The following table provides a summary of the included training data.
|Dataset|Description|Percentage|Total Size (Tokenized)|
|---|---|---|---|
|Web Crawls|Large web-scrape corpora (e.g. Common Crawl) containing various styles and sources|71%|2.77 TB|
|Books|Fiction and non-fiction literature providing well-structured and coherent text on various topics|20%|0.79 TB|
|Political and Legal Sources|Data provided by the EU parliament, legislation and speeches|5%|0.18 TB|
|Wikipedia|Wikipedia provides well-structured and mostly factual information|2%|0.07 TB|
|News|News articles from various journals|2%|0.06 TB|
|Other|Collection of smaller, more specialized datasets (e.g. philosophy texts)|1%|0.02 TB|
All models of the Luminous family were trained on the same data seed (order and mix), although for differing token counts.
All data was subjected to a data cleaning pipeline employing a ruleset to filter for data quality and otherwise unintended content. Given the amount of data and the variability of use cases, the possibility of the training data containing undesirable content cannot be excluded entirely. Undesirable content may refer to anything hateful, harmful, factually wrong or otherwise detrimental to the use case at hand. Please refer to the section "Bias, Risks, and Limitations with related recommendations" for implications.
As Aleph Alpha we acknowledge and abide by copyright and related legislation. Text and data mining is performed in full compliance with such legislation, including Article 4 of Directive (EU) 2019/790 (adopted by Germany in Sec. 44b German Copyright Act (Urheberrechtsgesetz)) and its provisions on reservations by rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.
The Luminous tokenizer is a learned subword tokenizer and has the following properties.
- We use the byte-level Byte Pair Encoding (BPE) algorithm
- We apply NFC normalization
- A prefix whitespace is added
- Vocabulary size of 128000
- Tokens were learnt from a language-balanced sample of the training data
Text can be tokenized and detokenized both via the API, and on local compute. Please refer to the related documentation for examples.
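The byte-level BPE principle behind such a tokenizer can be illustrated with a toy merge loop in pure Python. This is not the Luminous tokenizer itself, only the core merge step, preceded by the NFC normalization mentioned above:

```python
import unicodedata
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# NFC normalization (as stated above), then start from raw UTF-8 bytes,
# so every possible input is representable without unknown tokens.
text = unicodedata.normalize("NFC", " lower lower")
tokens = [bytes([b]) for b in text.encode("utf-8")]
for _ in range(3):  # three BPE merge steps on the toy corpus
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
```

A real tokenizer repeats this merge step until the vocabulary reaches its target size (128000 here) and records the merge order for later use; the prefix-whitespace convention is why frequent merged tokens begin with a leading space.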
The following table shows the number of training iterations and related token counts for the Luminous models. Efficiency is measured in TFlops in accordance with the Bloom implementation, to provide comparability with the related paper. Training jobs on different topologies (different GPU counts) have been accounted for by a weighted average. Language pre-training used an order of magnitude more compute than the subsequent multi-modal pre-training. Accordingly, the luminous-base and -extended architectures were optimised to make efficient use of A100 GPUs during language pre-training, which leads to slightly lower efficiency for the multi-modal extensions, where the architecture is extended (see the MAGMA paper).
|Model name|Parameter count|Iterations|Training tokens|TFlops|
|---|---|---|---|---|
|luminous-base multi-modal extension|~13B|60,000|~31B|133|
|luminous-extended multi-modal extension|~30B|100,000|~58B|121|
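For rough context, total training compute can be approximated with the common rule of thumb of ~6 FLOPs per parameter per token from the scaling-law literature. This approximation is an assumption for illustration only; the TFlops figures above are measured throughput, not derived from this formula:

```python
def approx_training_flops(params, tokens):
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token.

    A common approximation from the scaling-law literature; used here
    only as an illustrative estimate, not as the measurement method
    behind the table above.
    """
    return 6 * params * tokens

# luminous-base multi-modal extension: ~13B parameters, ~31B tokens.
flops = approx_training_flops(13e9, 31e9)
# Roughly 2.4e21 FLOPs for that training stage under this approximation.
```

Dividing such an estimate by sustained hardware throughput (TFlops per GPU) gives an order-of-magnitude GPU-hour figure.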
Aleph Alpha’s evaluations comprise accuracy-based metrics. In addition to accuracy-based metrics, HELM also evaluates calibration, robustness, fairness, general information, bias, toxicity and summarization. Results can be viewed in direct comparison with comparable models.
The Aleph Alpha API implements the explainable deep-learning algorithm AtMan for the explanation of outputs from the Luminous models. AtMan is applicable to any combination of image and text in the input, and functions by systematically manipulating the attention mechanisms of transformers to produce relevant sensitivity / heat maps across the input. This allows every-day users to understand which aspects of the input had most effect on a given aspect of the output, and in more sophisticated use cases can be used, for example, in detecting hallucinations. Please refer to the documentation on explainability and related code examples for more detail.
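AtMan works by perturbing attention inside the transformer; the underlying suppression idea can be conveyed with a much simpler occlusion-style sketch. This is a toy illustration of input-suppression sensitivity, not the AtMan algorithm:

```python
def sensitivity_map(tokens, score):
    """Occlusion-style relevance: suppress each input token in turn and
    measure how much the model's score for the target output drops.

    AtMan instead scales attention inside the transformer, which avoids
    re-encoding the input for every token; this sketch only conveys the
    suppression intuition.
    """
    baseline = score(tokens)
    return [baseline - score(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

# Toy "model": the target score depends only on the word "Paris".
def toy_score(tokens):
    return 1.0 if "Paris" in tokens else 0.0

relevance = sensitivity_map(["The", "capital", "is", "Paris"], toy_score)
# Only suppressing "Paris" changes the score, so it gets all the relevance.
```

In the hallucination-detection use case mentioned above, a generated span whose relevance map shows no strong connection to any part of the input is a candidate hallucination.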
The Aleph Alpha data centre runs on 100% renewable energy, such that no CO2 emissions are incurred for any inference job executed through the API. Training was run on a mixture of the data centre (zero emissions) and, previously, cloud providers. The relevant training time results in the following CO2 emissions for each model. For the calculation of CO2 emissions, we make the following assumptions:
- The extension of a baseline model with multi-modal capability is included
- Datacentre efficiency and approximations as to CO2 emissions are used as reported by the cloud provider where applicable.
We report both the carbon emitted by the GPUs during runtime (“Carbon emitted”) and the fractional contribution to the emissions of the whole data centre according to Power Usage Effectiveness (“Carbon emitted accounting for PUE”).
|Model|Hardware type|GPU hours used|Carbon emitted|Carbon emitted accounting for PUE|Note|
|---|---|---|---|---|---|
|luminous-base|A100 40GB|~95,000 h|~3.17 tons|~5.65 tons|Includes extension to multi-modality|
|luminous-extended|A100 40GB|~360,000 h|~11.95 tons|~16.85 tons|Includes extension to multi-modality|
|luminous-supreme|A100 40GB / A100 80GB|~839,000 h|~6.45 tons|~8.65 tons|Carbon emissions are low relative to the number of GPU hours used due to extensive use of our own data centre, which runs on renewable energy|
These numbers may be put into context, e.g. by reference to the paper “Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model”.
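For orientation, converting GPU hours to emissions is simple arithmetic; the power draw, PUE and grid carbon intensity below are illustrative assumptions, not the figures used for the table above:

```python
def estimate_co2_tons(gpu_hours, gpu_watts, pue, kg_co2_per_kwh):
    """Convert GPU hours to metric tons of CO2.

    energy (kWh) = hours * watts / 1000, scaled by the data centre's PUE,
    then multiplied by the grid's carbon intensity. All parameter values
    used below are illustrative assumptions.
    """
    kwh = gpu_hours * gpu_watts / 1000 * pue
    return kwh * kg_co2_per_kwh / 1000  # kg -> metric tons

# Example: 95,000 A100 hours at an assumed 300 W average draw,
# an assumed PUE of 1.3 and an assumed 0.1 kg CO2 per kWh.
tons = estimate_co2_tons(95_000, 300, 1.3, 0.1)
```

A fully renewable supply corresponds to a carbon intensity near zero, which is why the data-centre share of training contributes no emissions in the table above.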
Please direct questions, inquiries, suggestions or other feedback to: