
Improved Summarization Endpoint

· 6 min read
Ben Brandt

Back in April, we released an initial version of our Summarization endpoint, which allowed for summarizing text using our language models.

We quickly noticed, however, as did some of you, that it had a few problems when we tried integrating it into projects:

  • Document length was limited to the context size of the model (only ~2000 tokens available for the document).
  • As the documents grew larger, the endpoint became much more likely to just return the first sentence of the document.

In the meantime, we released some improvements to our Q&A endpoint that resolved similar problems for that use case. Particularly for Q&A, we can:

  • efficiently process really large documents, returning the best answers across the entire document.
  • process Docx and Text documents, not just Prompts.

With these improvements to our document handling, we were able to come back to Summarization with new approaches and better results.

What's New

New Document Formats

Summarization now supports all of our Document formats:

  • Docx: A base64 encoded Docx file
  • Text: A string of text
  • Prompt: A multimodal prompt, as used in our other tasks like Completion

Docx and Text documents are usually preferred, as we apply optimizations to them for the respective task they are being used for. Importantly for Q&A and Summarization, this includes some task-specific section chunking logic that allows us to process larger documents.
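To give a feel for what section chunking means in general, here is a deliberately naive sketch. It is not the logic we run on our servers: the function name `chunk_paragraphs`, the blank-line paragraph split, and the word budget (used as a rough stand-in for a token budget) are all assumptions made for illustration.

```python
def chunk_paragraphs(text: str, max_words: int = 300) -> list[str]:
    """Group consecutive paragraphs into chunks of roughly max_words words.

    Hypothetical helper for illustration only. Paragraphs are assumed to be
    separated by blank lines, and words stand in for model tokens.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0

    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        n_words = len(paragraph.split())
        # Close the current chunk if this paragraph would push it over budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        # A single paragraph longer than max_words still becomes its own chunk.
        current.append(paragraph)
        current_words += n_words

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk could then be summarized on its own and the partial summaries presented in document order. A real implementation would count tokens rather than words and would also split oversized paragraphs.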

Prompt documents are the most flexible, but for that reason we apply fewer optimizations to them, which leaves room for more advanced use cases where a user wants full control of their document. Currently, this is also the only way to experiment with summarizing documents that contain images. We are looking to expand this support to Docx as well (currently we only extract the text of the file), but for now any experimentation with multimodal documents should be done with Prompt documents.

Examples of how you can summarize the different document types using our Python client:

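import os
from aleph_alpha_client import (
    AlephAlphaClient,
    AlephAlphaModel,
    Document,
    ImagePrompt,
    SummarizationRequest,
)

# host is left empty here; set it to the API base URL you are using
model = AlephAlphaModel(
    AlephAlphaClient(host="", token=os.getenv("AA_TOKEN")),
    model_name="luminous-extended",
)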
# Docx document
document = Document.from_docx_file("./sample.docx")
request = SummarizationRequest(document)

result = model.summarize(request)

# Text document
document = Document.from_text("In imperative programming, a computer program is a sequence of instructions in a programming language that a computer can execute or interpret.")
request = SummarizationRequest(document)

result = model.summarize(request)

# Multimodal Prompt document (image and text)
image = ImagePrompt.from_url("")
text = "Blockbuster's beginnings can be traced back to another company, Cook Data Services, founded by David Cook in 1978.[3][20] The company's primary goal was to supply software services to the oil and gas industries throughout Texas, but it was very unsuccessful.[20] Sandy Cook, David's wife, wanted to get into the video business, and her husband would soon study the industry and future prospects.[21] Using profit he made from the sale of David P. Cook & Associates, the subsidiary of his company, he decided to buy into a video store franchise in Dallas known as Video Works. When Video Works would not allow him to decorate the interior of his store with a blue-and-yellow design, he departed the franchise and opened the first Blockbuster Video in 1985 under his own company Blockbuster Video Inc.[22][23] When he realized the potential in video rentals, Cook abandoned the oil industry and began franchising the Blockbuster store."

document = Document.from_prompt([image, text])
request = SummarizationRequest(document)

result = model.summarize(request)

You can also try this out in our Summarization Playground, and submit your own Docx or Text documents.

Larger Documents

In moving to the shared document formats, we are now able to process much larger documents for the Summarization task.

There are currently a few limitations on document length:

  • Maximum request size of 10MB (keep in mind that base64 encoding a Docx file will increase the file size)
  • We will also reject documents that we aren't confident we can process in our request time limit (3 minutes) with our current capacity.
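Because base64 turns every 3 bytes into 4 characters, a Docx file grows by roughly a third once encoded. The sketch below estimates whether a file is likely to fit under the request limit before you send it. The helper names, the reading of "10MB" as 10 MiB, and the 1 KiB allowance for the rest of the request body are assumptions, so treat the result as a rough pre-check only.

```python
import base64

# Assumption: "10MB" is interpreted as 10 MiB; the exact server-side
# threshold may differ.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def base64_size(raw_size: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def fits_request_limit(raw_size: int, overhead: int = 1024) -> bool:
    """Estimate whether a file of raw_size bytes fits in one request.

    overhead is a hypothetical allowance for the JSON around the document.
    """
    return base64_size(raw_size) + overhead <= MAX_REQUEST_BYTES


# Sanity check of the size formula against the standard library encoder.
sample = b"x" * 1000
assert len(base64.b64encode(sample)) == base64_size(len(sample))
```

With these assumptions, a 7 MiB Docx file passes the check while an 8 MiB one does not, so the practical ceiling for the raw file sits at around 7.5 MiB.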

Even with these limitations, however, this has already vastly increased the usability of the endpoint for our own use cases.

Improved Formatting of the Summary Result

Previously, since we could only process small documents, we just returned the summary block of text, usually a single sentence.

Now that we can process larger documents, the output needed to adapt for the summary to still be readable. For documents that have more to summarize, we now return the summary as a string of bullet points, each bulleted item being a summary of a section of the document, in the same order as the sections appeared in the document.

The output is still a string, so there are no API changes. You will simply notice the improvement when rendering the result.
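If you want to render each bullet yourself, for example as an HTML list, you can split the summary string into items. The sketch below is one way to do that. The exact bullet marker in the returned string is not specified here, so the markers in `BULLET_MARKERS` and the one-item-per-line layout are assumptions, and `summary_to_items` is a hypothetical helper rather than part of the client.

```python
# Assumption: each bullet sits on its own line and starts with one of
# these markers. Adjust to match the output you actually receive.
BULLET_MARKERS = ("- ", "* ", "• ")


def summary_to_items(summary: str) -> list[str]:
    """Split a summary string into a list of items, one per bullet.

    A summary without bullets (e.g. a single sentence) yields one item.
    """
    items: list[str] = []
    for line in summary.splitlines():
        line = line.strip()
        if not line:
            continue
        for marker in BULLET_MARKERS:
            if line.startswith(marker):
                # Drop the marker and keep only the text of the item.
                line = line[len(marker):].strip()
                break
        items.append(line)
    return items
```

Since the items keep the order of the sections in the document, the resulting list can be rendered directly without re-sorting.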

What's Next?

Summarization is by no means a solved problem, and there are still more improvements to be made. However, we felt the current improvements were already valuable and worth releasing so you could try it out yourself.

We greatly value your feedback, so please let us know how well Summarization is working for your use case, what you like, and what could be improved.

Other changes you might have missed

Luminous-Supreme is here!

We are happy to announce that Luminous-Supreme is now available!

After Luminous-Base and Luminous-Extended, Luminous-Supreme is the newest and most powerful generation of our multilingual language models.

You can try it out yourself in our:

  • Playground: select "luminous-supreme" as a model on the top left
  • Python Client: specify "luminous-supreme" as the model_name in your AlephAlphaModel
  • HTTP API: specify "model": "luminous-supreme" in your request body

Hosting: specify where a request is processed

We've also added a new hosting parameter to our main request endpoints. It is an optional parameter which determines which data centers the request may be processed in. You can either set the parameter to "aleph-alpha" or omit it (defaulting to null/None).

Not setting this value, or setting it to null/None, gives us maximal flexibility in processing your request in our own data centers and on our servers hosted with other cloud providers. Choose this option for maximal availability.

Setting it to "aleph-alpha" restricts processing of the request to our own data centers. Choose this option for maximal data privacy.
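As a small illustration of the two options, the sketch below builds a JSON request body with and without the hosting parameter. Only the "model" and "hosting" fields and their values come from this post. The `build_request_body` helper, the "prompt" field, and the overall body shape are assumptions for illustration, so check the API documentation for the endpoint you are calling.

```python
import json
from typing import Optional


def build_request_body(
    model: str, prompt: str, hosting: Optional[str] = None
) -> dict:
    """Build a request body, adding "hosting" only when it is set.

    Hypothetical helper: the "prompt" field is an assumed placeholder for
    whatever payload the target endpoint expects.
    """
    if hosting not in (None, "aleph-alpha"):
        raise ValueError('hosting must be "aleph-alpha" or None')
    body = {"model": model, "prompt": prompt}
    if hosting is not None:
        # Restrict processing to Aleph Alpha's own data centers.
        body["hosting"] = hosting
    return body


# Maximal availability: the hosting key is omitted entirely.
flexible = build_request_body("luminous-supreme", "Hello")

# Maximal data privacy: processing is limited to our own data centers.
private = build_request_body("luminous-supreme", "Hello", hosting="aleph-alpha")

print(json.dumps(private))
```

Omitting the key and sending null are described above as equivalent, so the helper simply leaves it out when no value is given.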