
Structured Output with Chat Completions

Introduction

Many applications, e.g., use cases that invoke external APIs or control machines, require responses that follow a specific, predefined JSON schema. Most models respond with the desired JSON when explicitly prompted for it and given a schema description, but not always. Even a few mismatches can be enough to break applications or trigger undesired responses.

Structured output allows you to constrain the model's responses to follow a specific JSON schema format. Instead of receiving free-form text, the JSON responses are guaranteed to match the defined structure. This is particularly useful for building applications that need predictable, parseable responses for further processing.

Note: If the maximum number of tokens is reached before the response is complete, the output may still be invalid JSON.
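
One way to catch truncated output on the client side is to check the finish_reason before parsing. The following sketch assumes an OpenAI-compatible /chat/completions response that has already been deserialized into a Python dict named response_body; adapt the field access to your client of choice:

import json

# Assumption: response_body is a dict deserialized from a /chat/completions reply.
choice = response_body["choices"][0]
if choice.get("finish_reason") == "length":
    raise ValueError("Completion hit the token limit; the JSON is likely incomplete.")

try:
    data = json.loads(choice["message"]["content"])
except json.JSONDecodeError as err:
    raise ValueError(f"Model returned invalid JSON: {err}") from err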

For best results, include the desired JSON format in the prompt in addition to the structured output configuration. This encourages natural responses instead of forcing the model into an unnatural answer through structured output alone.

Overview

  1. Deployment - configure the worker
  2. Usage - different ways to invoke structured output, e.g., with the Aleph Alpha client (using Pydantic models or a plain JSON schema), the OpenAI client, or direct API calls

Deployment

Currently, structured output with JSON schema is only available for the worker type vllm.

Structured output with JSON schema needs to be added as a capability in the worker config. In addition, because structured output is invoked via the /chat/completions endpoint, the chat capability must be enabled as well.

Therefore, add the following to the worker's config.toml for structured output:

[generator.structured_output]
supported_types = ["json_schema"]

and for the chat task

[queue.models."your-model".chat_task]
supported = true

The whole config could look like this:

edition = 1

[generator]
type = "vllm"
model_path = "/path/to/your-model/"
max_model_len = 8192
max_num_seqs = 64

[generator.structured_output]
supported_types = ["json_schema"]

[queue]
url = "https://inference-api.pharia.example.com"
token = "worker-token"
checkpoint_name = "your-model"
version = 2
tags = []
http_request_retries = 7
service_name = "worker"
service_role = "Worker"

[queue.models."your-model"]
worker_type = "vllm"
checkpoint = "your-model"
description = "Very structured model"
maximum_completion_tokens = 8192
multimodal_enabled = false

[queue.models."your-model".chat_task]
supported = true

[monitoring]
metrics_port = 4000
tcp_probes = []

Usage

There are four ways to use structured output with the Inference API via the /chat/completions endpoint. All of them produce the same kind of output.

  1. Aleph Alpha Client with Pydantic - Use Python classes with type hints for easy validation
  2. Aleph Alpha Client with JSON Schema - Define schemas manually with full control over validation rules
  3. OpenAI Client Compatibility - Use the OpenAI client interface with parse() method
  4. Direct API Calls - Make raw HTTP requests with JSON schema in the payload

Let's say you would like to get a description of a fish with the fields name, species, color and size_cm. An example response would look like this:

{
  "name": "Nemo",
  "species": "Clownfish",
  "color": "Orange with white stripes",
  "size_cm": 7.5
}

Aleph Alpha Client with Pydantic

The most Pythonic approach uses Pydantic models for type safety and automatic validation:

import os
from aleph_alpha_client import Client
from aleph_alpha_client.chat import ChatRequest, Message, Role
from pydantic import BaseModel

client = Client(
    host="https://inference-api.pharia.example.com",
    token=os.environ["PHARIA_TOKEN"]
)

model = "<your-model>"

class FamousFish(BaseModel):
    name: str
    species: str
    color: str
    size_cm: float

request = ChatRequest(
    messages=[
        Message(role=Role.System, content="You are a helpful assistant that responds in JSON format."),
        Message(
            role=Role.User,
            content="Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish.",
        ),
    ],
    model=model,
    response_format=FamousFish
)

response = client.chat(request, model=model)
print(response)
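
To get a typed object back on the client side, the returned text can be validated against the same Pydantic model. This is a minimal sketch, assuming the completion text is exposed as response.message.content (the exact attribute may differ between client versions):

# Assumption: response.message.content holds the raw JSON string returned by the model.
fish = FamousFish.model_validate_json(response.message.content)
print(fish.name, fish.size_cm)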

Aleph Alpha Client with JSON Schema

For more control over validation rules and constraints:

import os
from aleph_alpha_client import Client
from aleph_alpha_client.chat import ChatRequest, Message, Role
from aleph_alpha_client.structured_output import JSONSchema

client = Client(
    host="https://inference-api.pharia.example.com",
    token=os.environ["PHARIA_TOKEN"]
)

model = "<your-model>"

famous_fish_schema = {
    'type': 'object',
    'title': 'Famous Fish',
    'properties': {
        'name': {
            'type': 'string',
            'title': 'Fish name',
            'description': 'Name of the fish'
        },
        'species': {
            'type': 'string',
            'title': 'Species',
            'description': 'The species of the fish (e.g., Clownfish, Goldfish)'
        },
        'color': {
            'type': 'string',
            'title': 'Color',
            'description': 'Primary color of the fish'
        },
        'size_cm': {
            'type': 'number',
            'title': 'Size in centimeters',
            'description': 'Length of the fish in centimeters',
            'minimum': 0.1,
            'maximum': 100.0
        }
    },
    'required': ['name', 'species', 'color', 'size_cm']
}

request = ChatRequest(
    messages=[
        Message(role=Role.System, content="You are a helpful assistant that responds in JSON format."),
        Message(
            role=Role.User,
            content="Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish.",
        ),
    ],
    model=model,
    response_format=JSONSchema(
        schema=famous_fish_schema,
        name="aquarium",
        description="Describe a famous fish",
        strict=True,
    ),
)

response = client.chat(request=request, model=model)
print(response)
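
If you want to verify the returned text against the schema on the client side as well, the jsonschema package can be used. This is a sketch under the same assumption as above, namely that the completion text is available as response.message.content:

import json
from jsonschema import ValidationError, validate  # third-party package, installed separately

content = response.message.content  # assumed attribute; adjust to your client version
try:
    validate(instance=json.loads(content), schema=famous_fish_schema)
except (json.JSONDecodeError, ValidationError) as err:
    print(f"Response does not match the schema: {err}")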

OpenAI Client Compatibility

Using the familiar OpenAI client interface with the parse() method:

import os
import openai
from pydantic import BaseModel

class FamousFish(BaseModel):
    name: str
    species: str
    color: str
    size_cm: float

model = "<your-model>"

openai_client = openai.OpenAI(
    base_url="https://inference-api.pharia.example.com",
    api_key=os.environ["PHARIA_TOKEN"]
)

completion = openai_client.chat.completions.parse(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in JSON format."},
        {"role": "user", "content": "Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish."},
    ],
    max_tokens=1000,
    response_format=FamousFish,
)

print(completion.choices[0].message.parsed)
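
Here, completion.choices[0].message.parsed contains a FamousFish instance, so fields such as size_cm are available as typed attributes without an extra json.loads step.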

Direct API Calls with cURL

Making raw HTTP requests with JSON schema specification:

curl -L -X POST "https://inference-api.pharia.example.com/chat/completions" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H "Authorization: Bearer $PHARIA_TOKEN" \
  -d '{
    "model": "<model>",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that responds in JSON format."
      },
      {
        "role": "user",
        "content": "Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "aquarium",
        "strict": false,
        "schema": {
          "type": "object",
          "title": "Famous Fish",
          "properties": {
            "name": {
              "type": "string",
              "title": "Fish name",
              "description": "Name of the fish"
            },
            "species": {
              "type": "string",
              "title": "Species",
              "description": "The species of the fish"
            },
            "color": {
              "type": "string",
              "title": "Color",
              "description": "Primary color of the fish"
            },
            "size_cm": {
              "type": "number",
              "title": "Size in centimeters",
              "description": "Length of the fish in centimeters"
            }
          },
          "required": ["name", "species", "color", "size_cm"]
        },
        "description": "Describe a famous fish"
      }
    }
  }'