Async Jobs

By default, POST /v1/responses is synchronous: the connection stays open until the model finishes. For long-running requests (large reasoning models, many tool calls, complex multi-step tasks), async jobs let you submit the request, disconnect, and fetch the result later.

How It Works

Add "background": true to your request:

  1. The API returns 202 Accepted immediately with a request ID (req_…).

  2. Processing continues on the server.

  3. You poll GET /v1/responses/{req_id} to check status.

  4. Once the job completes, the polled response carries the permanent resp_… ID and the full output.


Submitting a Background Job

curl:

curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "Write a short poem about distributed systems.",
    "background": true
  }'

Response (202 Accepted):

{
  "id": "req_87e91719...",
  "object": "response",
  "status": "in_progress",
  "output": [],
  "background": true
}

Python (OpenAI SDK):

import os
from openai import OpenAI

# Assumption: the SDK is pointed at this API using the same BASE_URL and
# AA_TOKEN environment variables as the curl example.
client = OpenAI(base_url=f"{os.environ['BASE_URL']}/v1", api_key=os.environ["AA_TOKEN"])

response = client.responses.create(
    model="qwen3-32b-tool",
    input="Write a short poem about distributed systems.",
    background=True,
)

print(response.id)      # "req_87e91719..."
print(response.status)  # "in_progress"

ID Transition

The initial req_… ID is a temporary job handle used only for polling. Once the job completes, GET /v1/responses/{req_id} returns the finished object with its permanent resp_… ID. Use the resp_… ID for all follow-up operations (delete, continue, etc.).
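Client code can make the handle-versus-permanent-ID distinction explicit. The following is a minimal sketch, not part of the API: both helper names are hypothetical, and it assumes the two kinds of ID can be told apart purely by the req_ and resp_ prefixes described above.

```python
def is_job_handle(response_id: str) -> bool:
    """Return True for a temporary polling handle (assumed "req_" prefix)."""
    return response_id.startswith("req_")


def require_permanent_id(response_id: str) -> str:
    """Return the ID unchanged if it looks permanent, otherwise raise.

    Intended as a guard before follow-up operations such as delete or continue.
    """
    if not response_id.startswith("resp_"):
        raise ValueError(
            f"{response_id!r} is not a permanent resp_ ID; "
            "poll the job until it completes first"
        )
    return response_id
```

Calling require_permanent_id(result.id) just before a follow-up request surfaces a mistakenly reused job handle as a local error rather than a failed API call.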


Polling for Completion

Poll GET /v1/responses/{req_id} until the status changes from in_progress:

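curl:

# Check the job status; repeat until it is no longer in_progress
curl $BASE_URL/v1/responses/req_87e91719... \
  -H "Authorization: Bearer $AA_TOKEN"

Python:

import time

# Continues from the submission example: client and response are defined there.
req_id = response.id
deadline = time.monotonic() + 120  # give up after 2 minutes

while time.monotonic() < deadline:
    result = client.responses.retrieve(req_id)
    print(f"Status: {result.status}, ID: {result.id}")

    if result.status != "in_progress":
        break  # terminal status reached
    time.sleep(2)
else:
    # The loop ran out of time without hitting break.
    raise TimeoutError(f"Job {req_id} did not complete within 120s")

if result.status == "completed":
    print(result.output_text)
else:
    print(f"Job ended with status {result.status}: {result.error}")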
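For jobs that may run for a long time, a fixed two-second interval can generate many unnecessary requests. The sketch below is one possible variation that polls with exponential backoff. The function name wait_for_job and its parameters are hypothetical; it only assumes that the retrieve callable returns an object with a status attribute, as client.responses.retrieve does in the example above.

```python
import time
from typing import Any, Callable


def wait_for_job(
    retrieve: Callable[[str], Any],
    job_id: str,
    timeout: float = 120.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll retrieve(job_id) until the status leaves "in_progress".

    The delay between attempts doubles each time, capped at max_delay.
    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        result = retrieve(job_id)
        if result.status != "in_progress":
            return result
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"Job {job_id} did not complete within {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

With the client from the submission example, this would be called as result = wait_for_job(client.responses.retrieve, response.id). The sleep parameter exists only so the delay can be replaced in tests.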

Terminal Statuses

Status      Meaning
completed   Job finished successfully; output is populated
failed      Job encountered an error; inspect the error field

When a job fails, the response includes an error object:

{
  "id": "req_87e91719...",
  "object": "response",
  "created_at": 1711000000,
  "model": "qwen3-32b-tool",
  "status": "failed",
  "error": {
    "type": "server_error",
    "message": "Request failed"
  },
  "background": true,
  "output": [],
  "usage": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
}

The error.type is "guardrail_violation" when the input was rejected by the safety guardrail, or "server_error" for all other failures.
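The two error types usually call for different handling: a guardrail rejection points at the input, while a server error may be worth retrying. The sketch below turns a failed job into a Python exception. It works on the response body as a plain dict, shaped like the JSON above; the exception classes and the function name are hypothetical.

```python
class GuardrailViolation(Exception):
    """The input was rejected by the safety guardrail."""


class JobFailed(Exception):
    """The job failed for any other reason (error.type == "server_error")."""


def raise_for_failure(body: dict) -> None:
    """Raise an exception if the response body describes a failed job."""
    if body.get("status") != "failed":
        return
    error = body.get("error") or {}
    message = error.get("message", "unknown error")
    if error.get("type") == "guardrail_violation":
        raise GuardrailViolation(message)
    raise JobFailed(message)
```

A caller could catch GuardrailViolation to report the problem to the user and treat JobFailed as a candidate for resubmission.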


Continuing from a Background Response

Background responses participate in multi-turn conversations just like synchronous ones. Pass the completed resp_… ID as previous_response_id:

curl:

curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "Now translate the poem into German.",
    "previous_response_id": "resp_82559b5f...",
    "background": true
  }'

Python (OpenAI SDK):

# Submit follow-up as another background job.
# result is the completed response from the polling loop, so result.id is its resp_ ID.
follow_up = client.responses.create(
    model="qwen3-32b-tool",
    input="Now translate the poem into German.",
    previous_response_id=result.id,
    background=True,
)

# Poll for completion...

Important: You must wait until the previous background job has reached completed before chaining from it. Referencing an in_progress job as previous_response_id returns an error.
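A small guard on the client side can catch this mistake before the request is sent. The helper below is a hypothetical sketch: it assumes only that the polled result exposes status and id attributes, as in the Python examples above.

```python
def chainable_id(result) -> str:
    """Return result.id if the job is completed, otherwise raise.

    Use the return value as previous_response_id for the follow-up request.
    """
    if result.status != "completed":
        raise RuntimeError(
            f"Cannot chain from job {result.id}: status is {result.status!r}, "
            "expected 'completed'"
        )
    return result.id
```

The follow-up call would then pass previous_response_id=chainable_id(result), which fails locally for jobs that are still in_progress or that ended as failed.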


When to Use Background vs. Synchronous

Scenario                                Recommended Mode
Short prompts, interactive UIs          Synchronous or Streaming
Long reasoning tasks, many tool calls   Background
Batch processing many requests          Background (submit all, poll in parallel)
Fire-and-forget jobs                    Background (check result later)
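For the batch scenario, one way to track many jobs at once is sketched below. It uses a single-threaded round-robin loop as a simple stand-in for truly parallel polling; the function name poll_all and its parameters are hypothetical, and retrieve is assumed to behave like client.responses.retrieve in the earlier examples.

```python
import time
from typing import Any, Callable, Dict, Iterable


def poll_all(
    retrieve: Callable[[str], Any],
    job_ids: Iterable[str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll several jobs until each one leaves "in_progress".

    Returns a dict mapping each submitted job ID to its final response,
    whether that response is completed or failed.
    """
    deadline = time.monotonic() + timeout
    pending = list(job_ids)
    finished: Dict[str, Any] = {}
    while pending:
        still_pending = []
        for job_id in pending:
            result = retrieve(job_id)
            if result.status == "in_progress":
                still_pending.append(job_id)
            else:
                finished[job_id] = result
        pending = still_pending
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{len(pending)} job(s) still in progress after {timeout}s"
            )
        sleep(interval)
    return finished
```

After submitting every request with background=True and collecting the returned req_ IDs in a list, poll_all(client.responses.retrieve, ids) would wait for the whole batch. Each finished job is checked only once, so the request rate drops as the batch drains.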