Async Jobs
By default, POST /v1/responses is synchronous: the connection stays open until the model finishes. For long-running requests (large reasoning models, many tool calls, complex multi-step tasks), async jobs let you submit the request, disconnect, and fetch the result later.
How It Works
Add "background": true to your request:
- The API returns 202 Accepted immediately with a request ID (req_…).
- Processing continues on the server.
- You poll GET /v1/responses/{req_id} to check status.
- Once complete, the response contains the permanent resp_… ID and full output.
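The two ID prefixes in the steps above can be told apart with a plain prefix check. The sketch below is illustrative only: `id_kind` is a hypothetical helper, not part of the API or the SDK, and it assumes `req_` and `resp_` are the only prefixes in use.

```python
def id_kind(response_id: str) -> str:
    """Classify an ID by its prefix (hypothetical helper, not part of any SDK)."""
    if response_id.startswith("resp_"):
        return "response"  # permanent ID, present once the job has completed
    if response_id.startswith("req_"):
        return "request"   # handle returned with 202 Accepted at submission
    raise ValueError(f"unrecognized ID prefix: {response_id!r}")
```

A caller could use this to decide whether an ID is ready to be passed as previous_response_id.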
Submitting a Background Job
curl:
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "Write a short poem about distributed systems.",
    "background": true
  }'
Response (202 Accepted):
{
  "id": "req_87e91719...",
  "object": "response",
  "status": "in_progress",
  "output": [],
  "background": true
}
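Python (OpenAI SDK):
import os
from openai import OpenAI

# Assumes the same BASE_URL and AA_TOKEN environment variables as the curl example
client = OpenAI(base_url=os.environ["BASE_URL"] + "/v1", api_key=os.environ["AA_TOKEN"])

response = client.responses.create(
    model="qwen3-32b-tool",
    input="Write a short poem about distributed systems.",
    background=True,
)
print(response.id)      # "req_87e91719..."
print(response.status)  # "in_progress"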
Polling for Completion
Poll GET /v1/responses/{req_id} until the status changes from in_progress:
curl:
# Poll until status changes
curl $BASE_URL/v1/responses/req_87e91719... \
  -H "Authorization: Bearer $AA_TOKEN"
Python:
import time

req_id = response.id
deadline = time.time() + 120  # timeout after 2 minutes
while time.time() < deadline:
    result = client.responses.retrieve(req_id)
    print(f"Status: {result.status}, ID: {result.id}")
    if result.status != "in_progress":
        break
    time.sleep(2)
else:
    # Runs only if the loop ended without break, i.e. the deadline passed
    raise TimeoutError(f"Job {req_id} did not complete within 120s")
print(result.output_text)
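For jobs that run longer than a few seconds, a fixed 2-second interval produces many redundant requests. The loop above can be wrapped in a reusable function with capped exponential backoff. This is a minimal sketch, not an SDK feature: `wait_for_job` and its parameters are hypothetical names, and the default intervals are illustrative values rather than limits documented by the API.

```python
import time
from typing import Any, Callable


def wait_for_job(
    retrieve: Callable[[str], Any],
    req_id: str,
    timeout: float = 120.0,
    interval: float = 2.0,
    max_interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll retrieve(req_id) until the status is no longer "in_progress".

    Hypothetical helper: sleep and clock are injectable so the function
    can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = retrieve(req_id)
        if result.status != "in_progress":
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"Job {req_id} did not complete within {timeout}s")
        sleep(min(interval, remaining))             # never sleep past the deadline
        interval = min(interval * 2, max_interval)  # exponential backoff, capped
```

With the SDK client from the earlier example, usage would look like `result = wait_for_job(client.responses.retrieve, response.id)`.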
Terminal Statuses
| Status | Meaning |
|---|---|
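| completed | Job finished successfully; the response contains the permanent resp_… ID and the full output. |
| failed | Job encountered an error; inspect the error object. |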
When a job fails, the response includes an error object:
{
  "id": "req_87e91719...",
  "object": "response",
  "created_at": 1711000000,
  "model": "qwen3-32b-tool",
  "status": "failed",
  "error": {
    "type": "server_error",
    "message": "Request failed"
  },
  "background": true,
  "output": [],
  "usage": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
}
The error.type is "guardrail_violation" when the input was rejected by the safety guardrail, or "server_error" for all other failures.
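Client code will often want to treat the two error types differently, for example by surfacing a guardrail rejection to the user while logging other failures. One way to do that is to map the error object to typed exceptions. The sketch below works on the raw JSON payload as a dict; the exception classes and `raise_for_status` are hypothetical names introduced for illustration.

```python
class GuardrailViolation(Exception):
    """The input was rejected by the safety guardrail."""


class JobFailed(Exception):
    """Any other failure, reported with type "server_error"."""


def raise_for_status(job: dict) -> dict:
    """Return the payload unchanged unless its status is "failed"."""
    if job["status"] != "failed":
        return job
    error = job.get("error") or {}
    message = error.get("message", "unknown error")
    if error.get("type") == "guardrail_violation":
        raise GuardrailViolation(message)
    raise JobFailed(message)
```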
Continuing from a Background Response
Background responses participate in multi-turn conversations just like synchronous ones. Pass the completed resp_… ID as previous_response_id:
curl:
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "Now translate the poem into German.",
    "previous_response_id": "resp_82559b5f...",
    "background": true
  }'
Python (OpenAI SDK):
# Submit follow-up as another background job
follow_up = client.responses.create(
    model="qwen3-32b-tool",
    input="Now translate the poem into German.",
    previous_response_id=result.id,
    background=True,
)
# Poll for completion...
Important: You must wait until the previous background job has reached the completed status before chaining from it. Referencing an in_progress job as previous_response_id returns an error.
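A client-side check can catch this mistake before the request is sent. The following is a sketch under stated assumptions: `follow_up_params` is a hypothetical helper, and `previous` is any object exposing `id` and `status` attributes, such as the result of a retrieve call.

```python
from typing import Any


def follow_up_params(previous: Any, model: str, text: str) -> dict:
    """Build keyword arguments for a follow-up request (hypothetical helper)."""
    if previous.status != "completed":
        raise ValueError(
            f"cannot chain from {previous.id}: "
            f"status is {previous.status!r}, expected 'completed'"
        )
    return {
        "model": model,
        "input": text,
        "previous_response_id": previous.id,
        "background": True,
    }
```

The returned dict can be unpacked into the SDK call, for example `client.responses.create(**follow_up_params(result, "qwen3-32b-tool", "Now translate the poem into German."))`.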