Multi-Turn Conversations

The Responses API manages conversation history for you. Instead of resending the full message history on every request, you pass previous_response_id and the server reconstructs the context automatically.

How It Works

Each response has a unique id. To continue a conversation, pass that id as previous_response_id in the next request. The server:

  1. Walks backward through the response chain

  2. Reconstructs the full conversation history in chronological order

  3. Appends your new input

  4. Forwards the complete context to the LLM

Turn 1:  input="My name is Alice"                                    → resp_001
Turn 2:  input="What is my name?" + previous_response_id=resp_001   → resp_002
Turn 3:  input="Thanks!"          + previous_response_id=resp_002   → resp_003

You only store the latest response.id; the server handles the rest.

Two-Turn Conversation

  • curl

  • Python (OpenAI SDK)

  • Python (PydanticAI)

  • Python (LangGraph)

# Turn 1: Ask a question
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "What are the three laws of robotics?",
    "instructions": "You are a concise assistant. Keep answers brief."
  }'
# Response: {"id": "resp_abc123", ...}

# Turn 2: Follow up, no need to resend conversation history
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "Who created them?",
    "previous_response_id": "resp_abc123"
  }'
# Turn 1
response1 = client.responses.create(
    model="qwen3-32b-tool",
    input="What are the three laws of robotics?",
    instructions="You are a concise assistant. Keep answers brief.",
)
print(response1.output_text)

# Turn 2: chain from Turn 1
response2 = client.responses.create(
    model="qwen3-32b-tool",
    input="Who created them?",
    previous_response_id=response1.id,
)
print(response2.output_text)  # Knows we're talking about the laws of robotics

PydanticAI manages conversation state within a single agent.run() call. Each independent run() is a separate conversation; the framework handles tool-calling loops internally but does not chain across multiple run() calls by default.

agent = Agent(
    model=OpenAIResponsesModel("qwen3-32b-tool", provider=provider),
    system_prompt="You are a concise assistant. Keep answers brief.",
)

# Each run() is an independent request
result1 = await agent.run("What are the three laws of robotics?")
print(result1.output)

result2 = await agent.run("What is the capital of France?")
print(result2.output)

Assumes graph is built as in Getting Started. The graph stores previous_response_id in its state, so chaining is just feeding it back in.

from langchain_core.messages import HumanMessage, SystemMessage

# Turn 1
state = graph.invoke({
    "messages": [
        SystemMessage("You are a concise assistant. Keep answers brief."),
        HumanMessage("What are the three laws of robotics?"),
    ],
    "previous_response_id": None,
})
print(state["messages"][-1].text)

# Turn 2: chain via previous_response_id; no need to resend the system
# prompt or earlier messages.
state = graph.invoke({
    "messages": [HumanMessage("Who created them?")],
    "previous_response_id": state["previous_response_id"],
})
print(state["messages"][-1].text)

Three-Turn Conversation

Chains can be as long as you need. The server traverses the full chain on every request.

  • curl

  • Python (OpenAI SDK)

  • Python (LangGraph)

# Turn 1
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{"model": "qwen3-32b-tool", "input": "I have 5 apples."}'
# → resp_001

# Turn 2
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{"model": "qwen3-32b-tool", "input": "I give away 2.", "previous_response_id": "resp_001"}'
# → resp_002

# Turn 3: requires full history to answer correctly
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{"model": "qwen3-32b-tool", "input": "How many do I have now?", "previous_response_id": "resp_002"}'
# → "3"
r1 = client.responses.create(
    model="qwen3-32b-tool",
    input="I have 5 apples.",
)

r2 = client.responses.create(
    model="qwen3-32b-tool",
    input="I give away 2.",
    previous_response_id=r1.id,
)

r3 = client.responses.create(
    model="qwen3-32b-tool",
    input="How many do I have now?",
    previous_response_id=r2.id,
)

print(r3.output_text)  # "3"
from langchain_core.messages import HumanMessage

state = graph.invoke({
    "messages": [HumanMessage("I have 5 apples.")],
    "previous_response_id": None,
})
state = graph.invoke({
    "messages": [HumanMessage("I give away 2.")],
    "previous_response_id": state["previous_response_id"],
})
state = graph.invoke({
    "messages": [HumanMessage("How many do I have now?")],
    "previous_response_id": state["previous_response_id"],
})

print(state["messages"][-1].text)  # "3"

Instructions Inheritance

When you set instructions in a request, they are stored with the response. Subsequent turns that use previous_response_id automatically inherit those instructions, so you don’t need to resend them.

  • curl

  • Python (OpenAI SDK)

  • Python (LangGraph)

# Turn 1: Set instructions
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "Hello",
    "instructions": "You are a helpful math tutor. Always relate things to numbers."
  }'
# → resp_001

# Turn 2: Instructions inherited, no need to resend
curl -X POST $BASE_URL/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AA_TOKEN" \
  -d '{
    "model": "qwen3-32b-tool",
    "input": "Tell me about yourself",
    "previous_response_id": "resp_001"
  }'
# → The model responds as a math tutor
# Turn 1: Set instructions
r1 = client.responses.create(
    model="qwen3-32b-tool",
    input="Hello",
    instructions="You are a helpful math tutor. Always relate things to numbers.",
)

# Turn 2: Instructions inherited automatically
r2 = client.responses.create(
    model="qwen3-32b-tool",
    input="Tell me about yourself",
    previous_response_id=r1.id,
)

print(r2.output_text)  # Responds as a math tutor
from langchain_core.messages import HumanMessage, SystemMessage

# Turn 1: Set instructions via SystemMessage
state = graph.invoke({
    "messages": [
        SystemMessage(
            "You are a helpful math tutor. Always relate things to numbers."
        ),
        HumanMessage("Hello"),
    ],
    "previous_response_id": None,
})

# Turn 2: Instructions inherited from the chain, no SystemMessage needed
state = graph.invoke({
    "messages": [HumanMessage("Tell me about yourself")],
    "previous_response_id": state["previous_response_id"],
})

print(state["messages"][-1].text)  # Responds as a math tutor

Instructions Priority

When resolving instructions, the server uses this priority order:

  1. prompt.id: Renders a stored prompt template with variables (highest priority)

  2. Explicit instructions in the current request

  3. Inherited from the previous_response_id chain

  4. Default system prompt from server configuration

This means you can override inherited instructions on any turn by providing new instructions or a prompt.id explicitly.