Announcing token stream support for the /complete endpoint and the Python client
In version api-scheduler:2024-10-01-00535
of our inference stack, API-scheduler, we added a new stream
property to the /complete
endpoint to enable streamed token generation.
When using streamed token generation, tokens are transmitted as soon as they have been computed. This gives the user instant feedback and the option to cancel the transmission if the result turns out to be unsatisfactory.
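As a rough illustration, a client can consume the stream by keeping the HTTP connection open and processing each chunk as it arrives. The sketch below uses the requests library; the request payload shape, the URL, and the per-chunk JSON format (a "completion" field) are assumptions for illustration and are not taken from the announcement, so check the endpoint documentation for the actual wire format.

```python
import json
import requests

API_URL = "https://api.aleph-alpha.com/complete"  # assumed URL, see docs


def parse_stream_line(line: bytes):
    """Extract the text from one streamed chunk.

    Assumes each chunk is a JSON object with a "completion" field;
    this format is a hypothetical placeholder, not confirmed by the
    announcement. Returns None for keep-alive / empty lines.
    """
    if not line:
        return None
    payload = json.loads(line)
    return payload.get("completion")


def stream_completion(prompt: str, token: str):
    """Request a completion with the new stream property and yield
    tokens as they arrive (request shape is hypothetical)."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        json={"prompt": prompt, "stream": True},
        stream=True,  # keep the connection open and read incrementally
    )
    for line in response.iter_lines():
        text = parse_stream_line(line)
        if text is not None:
            yield text
```

Because the tokens arrive through a generator, the caller can simply stop iterating (or close the response) to cancel generation early, which is exactly the use case streaming enables.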
Documentation for the updated endpoint can be found at https://docs.aleph-alpha.com/api/complete/.