
Python Client v2.5 - Async Support

Ben Brandt · 5 min read

We're excited to announce that we have added async support to our Python client! Upgrade to v2.5.0 and import AsyncClient to start making requests to our API in async contexts.

When experimenting with our API in simple scripts or a Jupyter notebook, it is easy enough to use the default, synchronous client. But for many production use cases, async support unlocks concurrent requests and integration with frameworks that take advantage of async (e.g. FastAPI's async def syntax for path operation functions).

We built AsyncClient on top of aiohttp, so you should be able to use it within any Python async runtime without any blocking I/O.
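As a teaser, here is a minimal sketch of what that FastAPI integration can look like. The endpoint, prompt handling, and response attribute access are illustrative assumptions, not part of this release; the client usage itself is covered in detail below.

import os
from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt
from fastapi import FastAPI

app = FastAPI()

# Hypothetical endpoint: completes whatever text the caller passes in
@app.get("/complete")
async def complete(text: str):
    # A per-request client keeps the sketch short; for reusing one
    # client across requests, see "Explicit Client Lifecycle" below
    async with AsyncClient(token=os.environ["AA_TOKEN"]) as client:
        request = CompletionRequest(
            prompt=Prompt.from_text(text),
            maximum_tokens=64,
        )
        response = await client.complete(request, model="luminous-base")
    return {"completion": response.completions[0].completion}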

How to use AsyncClient

Context-Managed Client

When you are within an async context and just want to quickly create a client and make a request, the async context manager approach is ideal as it takes care of the setup and teardown for you.

Within an async function, you can enter the context manager and get a client that can make a request for a completion, embedding, evaluation, or any of our other API endpoints.

import os
from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt

# Can enter context manager within an async function
async with AsyncClient(token=os.environ["AA_TOKEN"]) as client:
    request = CompletionRequest(
        prompt=Prompt.from_text("Request"),
        maximum_tokens=64,
    )
    response = await client.complete(request, model="luminous-base")
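If you are starting from a plain script rather than an already-async context, one way to run the example above is a small asyncio driver like this sketch:

import asyncio
import os
from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt

async def main() -> None:
    async with AsyncClient(token=os.environ["AA_TOKEN"]) as client:
        request = CompletionRequest(
            prompt=Prompt.from_text("Request"),
            maximum_tokens=64,
        )
        response = await client.complete(request, model="luminous-base")

# asyncio.run creates the event loop and closes it when main() returns
asyncio.run(main())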

Explicit Client Lifecycle

While the context manager approach is simple, there are cases where the overhead of creating a new client for every request isn't ideal. You can do plenty of work within a single context manager, but if you are using the client in your own API, you may want to take advantage of the connection pool over the lifetime of your application.

When you want to reuse the client, you can create it outside of a context manager; you just have to close it yourself at the end of your script or application. Requests are made with the client in exactly the same way.

import os
from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt

# Creation of client should always happen within an async function
client = AsyncClient(token=os.environ["AA_TOKEN"])

request = CompletionRequest(
    prompt=Prompt.from_text("Request"),
    maximum_tokens=64,
)
response = await client.complete(request, model="luminous-base")

# Make sure to close at the end of your script/app to close the
# underlying connection pool
await client.close()
Caution:

If you are creating a client outside of a context manager, make sure you still create it within a running event loop.

From the aiohttp docs, "Why is creating a ClientSession outside of an event loop dangerous?":

Short answer is: life-cycle of all asyncio objects should be shorter than life-cycle of event loop.

Full explanation is longer. All asyncio object should be correctly finished/disconnected/closed before event loop shutdown. Otherwise user can get unexpected behavior. In the best case it is a warning about unclosed resource, in the worst case the program just hangs, awaiting for coroutine is never resumed etc.
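One pattern that satisfies this (a sketch, not the only option) is to create and close the client inside your top-level coroutine, so its entire life-cycle sits inside the running event loop:

import asyncio
import os
from aleph_alpha_client import AsyncClient

async def main() -> None:
    # The event loop is already running here, so creation is safe
    client = AsyncClient(token=os.environ["AA_TOKEN"])
    try:
        ...  # make requests with the client
    finally:
        # Closed before the event loop shuts down
        await client.close()

asyncio.run(main())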

Advanced Use Cases

Concurrent Requests

We know many of you are building workflows on top of several of our lower-level requests, which sometimes means firing off multiple requests before moving on to the next step.

Within an async runtime, this no longer requires creating a thread pool; you can just use normal asyncio code.

import asyncio
import os
from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt

# Within an async function
async with AsyncClient(token=os.environ["AA_TOKEN"]) as client:
    # You have several prompts you need to generate completions for
    requests = (
        CompletionRequest(prompt=Prompt.from_text(prompt), maximum_tokens=64)
        for prompt in ("Fewshot prompt 1", "Fewshot prompt 2")
    )
    # await the requests together
    responses = await asyncio.gather(
        *(client.complete(req, model="luminous-base") for req in requests)
    )

Handling Large Volumes of Concurrent Requests

We also know that several of you like to send us large numbers of requests 😀 We do limit how many concurrent requests you can make to the API, but you can still handle large workloads using asyncio primitives.

import asyncio
import os
from aleph_alpha_client import AsyncClient, CompletionRequest, Prompt

# Helper for limiting the number of requests in flight at once
# Based on: https://blog.jonlu.ca/posts/async-python-http
async def gather_with_concurrency(n, *tasks):
    semaphore = asyncio.Semaphore(n)

    async def sem_task(task):
        async with semaphore:
            return await task

    return await asyncio.gather(*(sem_task(task) for task in tasks))

# Within an async function
async with AsyncClient(token=os.environ["AA_TOKEN"]) as client:
    # Lots of requests to execute
    requests = (
        CompletionRequest(
            prompt=Prompt.from_text(f"Prompt {i}"),
            maximum_tokens=64,
        )
        for i in range(1000)
    )
    # Correct number dependent on use case (e.g. model, size of task...)
    conc_req = 40
    responses = await gather_with_concurrency(
        conc_req,
        *(client.complete(req, model="luminous-base") for req in requests),
    )
Note:

Because we try to be fair to all of our customers, the API will start rejecting requests if you send too many in parallel.

The ideal number of concurrent requests depends on the model you are using and on how large the prompts and completions are. Larger models typically take longer than smaller ones.

By default, the client will retry certain error codes up to three times before raising an exception. But you will still want to handle these exceptions in your production environment.
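For example, one way to keep a single rejected request from failing an entire batch is asyncio.gather's return_exceptions flag. This sketch assumes the client and requests from the example above; how you handle the failures (retry, log, drop) is up to your application.

# Failed requests come back as exception objects instead of raising
responses = await asyncio.gather(
    *(client.complete(req, model="luminous-base") for req in requests),
    return_exceptions=True,
)
successes = [r for r in responses if not isinstance(r, Exception)]
failures = [r for r in responses if isinstance(r, Exception)]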