Announcing token stream support for complete endpoint and Python Client

In version api-scheduler:2024-10-01-00535 of our inference stack, we added a new stream property to the /complete endpoint that enables streamed token generation.
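Concretely, enabling streaming amounts to setting the new property in the request body. A minimal sketch of such a payload follows; apart from the stream property, the field names and values (model, prompt, maximum_tokens) are illustrative assumptions, not taken from this announcement:

```python
# Illustrative request body for the /complete endpoint.
# Only "stream" is confirmed by the announcement; the other
# fields are assumptions for the sake of the sketch.
payload = {
    "model": "example-model",     # hypothetical model name
    "prompt": "An apple a day",
    "maximum_tokens": 32,
    "stream": True,               # new: request streamed token generation
}
```

Without the stream property, the endpoint returns the full completion in a single response once generation has finished.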

With streamed token generation, each token is transmitted as soon as it has been computed. The user therefore receives immediate feedback and can cancel the transmission if the result turns out to be unsatisfactory.
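On the client side, consuming such a stream and cancelling early can be sketched as follows. The wire format assumed here (server-sent-event style "data:" lines carrying a JSON object with a token field, terminated by a [DONE] sentinel) is an illustration only; the endpoint documentation linked below describes the actual format. The response lines are simulated so the sketch runs without a server:

```python
import json

def iter_tokens(lines):
    """Parse SSE-style 'data: {...}' lines into completion tokens.

    The format (JSON with a "token" field, "[DONE]" sentinel) is an
    assumption for this sketch, not the documented wire format.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue
        body = line[len("data: "):]
        if body == "[DONE]":
            return
        yield json.loads(body)["token"]

# Simulated streamed response so the example is self-contained.
simulated = [
    'data: {"token": "Hello"}',
    'data: {"token": ","}',
    'data: {"token": " world"}',
    "data: [DONE]",
]

received = []
for token in iter_tokens(simulated):
    received.append(token)
    if token == ",":   # the user judges the result unsatisfactory
        break          # cancelling is simply stopping consumption

print(received)
```

Because tokens arrive incrementally, cancellation needs no extra API call in this sketch: the client just stops reading the stream.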

Demo: completion stream in the Python Client

Documentation for the updated endpoint can be found at https://docs.aleph-alpha.com/api/complete/.