Multi-Turn Conversations
The Responses API manages conversation history for you. Instead of resending the full message history on every request, you pass previous_response_id and the server reconstructs the context automatically.
How It Works
Each response has a unique id. To continue a conversation, pass that id as previous_response_id in the next request. The server:
-
Walks backward through the response chain
-
Reconstructs the full conversation history in chronological order
-
Appends your new input
-
Forwards the complete context to the LLM
Turn 1: input="My name is Alice" → resp_001 Turn 2: input="What is my name?" + previous_response_id=resp_001 → resp_002 Turn 3: input="Thanks!" + previous_response_id=resp_002 → resp_003
You only store the latest response.id; the server handles the rest.
Two-Turn Conversation
-
curl
-
Python (OpenAI SDK)
-
Python (PydanticAI)
-
Python (LangGraph)
# Turn 1: Ask a question
curl -X POST $BASE_URL/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AA_TOKEN" \
-d '{
"model": "qwen3-32b-tool",
"input": "What are the three laws of robotics?",
"instructions": "You are a concise assistant. Keep answers brief."
}'
# Response: {"id": "resp_abc123", ...}
# Turn 2: Follow up, no need to resend conversation history
curl -X POST $BASE_URL/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AA_TOKEN" \
-d '{
"model": "qwen3-32b-tool",
"input": "Who created them?",
"previous_response_id": "resp_abc123"
}'
# Turn 1
response1 = client.responses.create(
model="qwen3-32b-tool",
input="What are the three laws of robotics?",
instructions="You are a concise assistant. Keep answers brief.",
)
print(response1.output_text)
# Turn 2: chain from Turn 1
response2 = client.responses.create(
model="qwen3-32b-tool",
input="Who created them?",
previous_response_id=response1.id,
)
print(response2.output_text) # Knows we're talking about the laws of robotics
PydanticAI manages conversation state within a single agent.run() call. Each independent run() is a separate conversation; the framework handles tool-calling loops internally but does not chain across multiple run() calls by default.
agent = Agent(
model=OpenAIResponsesModel("qwen3-32b-tool", provider=provider),
system_prompt="You are a concise assistant. Keep answers brief.",
)
# Each run() is an independent request
result1 = await agent.run("What are the three laws of robotics?")
print(result1.output)
result2 = await agent.run("What is the capital of France?")
print(result2.output)
|
Assumes |
from langchain_core.messages import HumanMessage, SystemMessage
# Turn 1
state = graph.invoke({
"messages": [
SystemMessage("You are a concise assistant. Keep answers brief."),
HumanMessage("What are the three laws of robotics?"),
],
"previous_response_id": None,
})
print(state["messages"][-1].text)
# Turn 2: chain via previous_response_id; no need to resend the system
# prompt or earlier messages.
state = graph.invoke({
"messages": [HumanMessage("Who created them?")],
"previous_response_id": state["previous_response_id"],
})
print(state["messages"][-1].text)
Three-Turn Conversation
Chains can be as long as you need. The server traverses the full chain on every request.
-
curl
-
Python (OpenAI SDK)
-
Python (LangGraph)
# Turn 1
curl -X POST $BASE_URL/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AA_TOKEN" \
-d '{"model": "qwen3-32b-tool", "input": "I have 5 apples."}'
# → resp_001
# Turn 2
curl -X POST $BASE_URL/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AA_TOKEN" \
-d '{"model": "qwen3-32b-tool", "input": "I give away 2.", "previous_response_id": "resp_001"}'
# → resp_002
# Turn 3: requires full history to answer correctly
curl -X POST $BASE_URL/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AA_TOKEN" \
-d '{"model": "qwen3-32b-tool", "input": "How many do I have now?", "previous_response_id": "resp_002"}'
# → "3"
r1 = client.responses.create(
model="qwen3-32b-tool",
input="I have 5 apples.",
)
r2 = client.responses.create(
model="qwen3-32b-tool",
input="I give away 2.",
previous_response_id=r1.id,
)
r3 = client.responses.create(
model="qwen3-32b-tool",
input="How many do I have now?",
previous_response_id=r2.id,
)
print(r3.output_text) # "3"
from langchain_core.messages import HumanMessage
state = graph.invoke({
"messages": [HumanMessage("I have 5 apples.")],
"previous_response_id": None,
})
state = graph.invoke({
"messages": [HumanMessage("I give away 2.")],
"previous_response_id": state["previous_response_id"],
})
state = graph.invoke({
"messages": [HumanMessage("How many do I have now?")],
"previous_response_id": state["previous_response_id"],
})
print(state["messages"][-1].text) # "3"
Instructions Inheritance
When you set instructions in a request, they are stored with the response. Subsequent turns that use previous_response_id automatically inherit those instructions, so you don’t need to resend them.
-
curl
-
Python (OpenAI SDK)
-
Python (LangGraph)
# Turn 1: Set instructions
curl -X POST $BASE_URL/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AA_TOKEN" \
-d '{
"model": "qwen3-32b-tool",
"input": "Hello",
"instructions": "You are a helpful math tutor. Always relate things to numbers."
}'
# → resp_001
# Turn 2: Instructions inherited, no need to resend
curl -X POST $BASE_URL/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AA_TOKEN" \
-d '{
"model": "qwen3-32b-tool",
"input": "Tell me about yourself",
"previous_response_id": "resp_001"
}'
# → The model responds as a math tutor
# Turn 1: Set instructions
r1 = client.responses.create(
model="qwen3-32b-tool",
input="Hello",
instructions="You are a helpful math tutor. Always relate things to numbers.",
)
# Turn 2: Instructions inherited automatically
r2 = client.responses.create(
model="qwen3-32b-tool",
input="Tell me about yourself",
previous_response_id=r1.id,
)
print(r2.output_text) # Responds as a math tutor
from langchain_core.messages import HumanMessage, SystemMessage
# Turn 1: Set instructions via SystemMessage
state = graph.invoke({
"messages": [
SystemMessage(
"You are a helpful math tutor. Always relate things to numbers."
),
HumanMessage("Hello"),
],
"previous_response_id": None,
})
# Turn 2: Instructions inherited from the chain, no SystemMessage needed
state = graph.invoke({
"messages": [HumanMessage("Tell me about yourself")],
"previous_response_id": state["previous_response_id"],
})
print(state["messages"][-1].text) # Responds as a math tutor
Instructions Priority
When resolving instructions, the server uses this priority order:
-
prompt.id: Renders a stored prompt template with variables (highest priority) -
Explicit
instructionsin the current request -
Inherited from the
previous_response_idchain -
Default system prompt from server configuration
This means you can override inherited instructions on any turn by providing new instructions or a prompt.id explicitly.