Structured output with chat completions

In many applications it is useful to constrain the model's responses with a JSON schema, especially when the responses must be parsed for further processing. This section describes how to produce structured output in chat completions.


Why use structured output?

For many applications, such as those that invoke external APIs or instruct machines, you may want to apply a predefined JSON schema to the output. Most models respond with the desired JSON when explicitly prompted for it and given a schema description, but not always. Even a few mismatches can break applications or trigger undesired responses.

Structured output allows you to constrain the model’s responses to follow a specific JSON schema format. Instead of receiving free-form text, the JSON responses are guaranteed to match the defined structure. This is particularly useful for building applications that need predictable, parseable responses for further processing.

(Note, however, that if the maximum number of tokens is reached before the JSON output is complete, the response may still be invalid JSON.)

For best results, include the desired JSON format in the prompt in addition to the structured output configuration. Describing the format in the prompt encourages the model to produce the structure naturally, rather than being forced into it by the output constraints alone.

To produce structured output, you first need to configure the worker. Then your application can interact with the PhariaInference API using Pydantic models, JSON schema, the OpenAI client, or with raw HTTP calls.

Example: Description of fish structured response

Consider a case where you want a description of a fish with the fields name, species, color, and size_cm. An example response might look like the following:

{
  "name": "Nemo",
  "species": "Clownfish",
  "color": "Orange with white stripes",
  "size_cm": 7.5
}

Configuring the worker

Currently, structured output with JSON schema is only supported for worker type vllm.

Structured output with JSON schema needs to be added as a capability in the worker configuration. In addition, as structured output is invoked with the /chat/completions endpoint, you also need to enable chat capabilities.

Add the following to the config.toml file of the worker for structured output:

[generator.structured_output]
supported_types = ["json_schema"]

Add the following for the chat task:

[queue.models."your-model".chat_task]
supported = true

A complete worker configuration typically looks like the following:

edition = 1

[generator]
type = "vllm"
model_path = "/path/to/your-model/"
max_model_len = 8192
max_num_seqs = 64

[generator.structured_output]
supported_types = ["json_schema"]

[queue]
url = "https://inference-api.pharia.example.com"
token = "worker-token"
checkpoint_name = "your-model"
version = 2
tags = []
http_request_retries = 7
service_name = "worker"
service_role = "Worker"

[queue.models."your-model"]
worker_type = "vllm"
checkpoint = "your-model"
description = "Very structured model"
maximum_completion_tokens = 8192
multimodal_enabled = false

[queue.models."your-model".chat_task]
supported = true

[monitoring]
metrics_port = 4000
tcp_probes = []

Producing structured output in chat completions

There are four ways to interact with the PhariaInference API to produce structured output using the /chat/completions endpoint; they all produce the same output:

  • Aleph Alpha client with Pydantic: Uses Python classes with type hints for easy validation.

  • Aleph Alpha client with JSON schema: Defines schemas manually with full control over validation rules.

  • OpenAI client compatibility: Uses the OpenAI client interface with the parse() method.

  • Direct API calls: Makes raw HTTP requests with JSON schema in the payload.

Aleph Alpha client with Pydantic

This method is the most Pythonic approach. It uses Pydantic models for type safety and automatic validation:

import os
from aleph_alpha_client import Client
from aleph_alpha_client.chat import ChatRequest, Message, Role
from pydantic import BaseModel

client = Client(
    host="https://inference-api.pharia.example.com",
    token=os.environ["PHARIA_TOKEN"]
)

model = "<your-model>"

class FamousFish(BaseModel):
    name: str
    species: str
    color: str
    size_cm: float

request = ChatRequest(
    messages=[
        Message(role=Role.System, content="You are a helpful assistant that responds in JSON format."),
        Message(
            role=Role.User,
            content="Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish.",
        ),
    ],
    model=model,
    response_format=FamousFish
)

response = client.chat(request, model=model)
print(response)
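The assistant message content can then be validated against the same Pydantic model. The snippet below is a self-contained sketch using a sample payload in place of a live response; how you access the content string depends on the client's response object.

```python
from pydantic import BaseModel

class FamousFish(BaseModel):
    name: str
    species: str
    color: str
    size_cm: float

# In practice, this string comes from the assistant message in the response.
content = '{"name": "Nemo", "species": "Clownfish", "color": "Orange with white stripes", "size_cm": 7.5}'

# Raises a ValidationError if the response does not match the model.
fish = FamousFish.model_validate_json(content)
print(fish.size_cm)
```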

Aleph Alpha client with JSON schema

This method offers more control over validation rules and constraints:

import os
from aleph_alpha_client import Client
from aleph_alpha_client.chat import ChatRequest, Message, Role
from aleph_alpha_client.structured_output import JSONSchema

client = Client(
    host="https://inference-api.pharia.example.com",
    token=os.environ["PHARIA_TOKEN"]
)

model = "<your-model>"

famous_fish_schema = {
    'type': 'object',
    'title': 'Famous Fish',
    'properties': {
        'name': {
            'type': 'string',
            'title': 'Fish name',
            'description': 'Name of the fish'
        },
        'species': {
            'type': 'string',
            'title': 'Species',
            'description': 'The species of the fish (e.g., Clownfish, Goldfish)'
        },
        'color': {
            'type': 'string',
            'title': 'Color',
            'description': 'Primary color of the fish'
        },
        'size_cm': {
            'type': 'number',
            'title': 'Size in centimeters',
            'description': 'Length of the fish in centimeters',
            'minimum': 0.1,
            'maximum': 100.0
        }
    },
    'required': ['name', 'species', 'color', 'size_cm']
}

request = ChatRequest(
    messages=[
        Message(role=Role.System, content="You are a helpful assistant that responds in JSON format."),
        Message(
            role=Role.User,
            content="Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish.",
        ),
    ],
    model=model,
    response_format=JSONSchema(
        schema=famous_fish_schema,
        name="aquarium",
        description="Describe a famous fish",
        strict=True,
    ),
)

response = client.chat(request=request, model=model)
print(response)
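With a hand-written schema, you can also re-check the constraints client-side. The function below is a minimal hand-rolled sketch mirroring the required fields and size bounds of the schema above; for full validation, a dedicated library such as the third-party jsonschema package is the usual choice.

```python
import json

REQUIRED_FIELDS = ["name", "species", "color", "size_cm"]

def check_fish(payload: str) -> bool:
    """Minimal client-side check mirroring the schema's required fields and size_cm bounds."""
    data = json.loads(payload)
    if not all(field in data for field in REQUIRED_FIELDS):
        return False
    # Mirror the schema's numeric constraints on size_cm (minimum 0.1, maximum 100.0).
    return isinstance(data["size_cm"], (int, float)) and 0.1 <= data["size_cm"] <= 100.0

print(check_fish('{"name": "Nemo", "species": "Clownfish", "color": "Orange", "size_cm": 7.5}'))
```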

OpenAI client compatibility

This method uses the familiar OpenAI client interface with the parse() method:

import os
import openai
from pydantic import BaseModel

class FamousFish(BaseModel):
    name: str
    species: str
    color: str
    size_cm: float

model = "<your-model>"

openai_client = openai.OpenAI(
    base_url="https://inference-api.pharia.example.com",
    api_key=os.environ["PHARIA_TOKEN"]
)

completion = openai_client.chat.completions.parse(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in JSON format."},
        {"role": "user", "content": "Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish."},
    ],
    max_tokens=1000,
    response_format=FamousFish,
)

print(completion.choices[0].message.parsed)

Direct API calls with cURL

This method sends raw HTTP requests with a JSON schema specification:

curl -L -X POST "https://inference-api.pharia.example.com/chat/completions" \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H "Authorization: Bearer $PHARIA_TOKEN" \
-d '{
  "model": "<model>",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that responds in JSON format."
    },
    {
      "role": "user",
      "content": "Please provide information about a famous fish in JSON format with fields: name, species, color, and size_cm (size in centimeters). Tell me about this famous aquarium fish."
    }
  ],
  "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "aquarium",
            "strict": false,
            "schema": {
                "type": "object",
                "title": "Famous Fish",
                "properties": {
                    "name": {
                        "type": "string",
                        "title": "Fish name",
                        "description": "Name of the fish"
                    },
                    "species": {
                        "type": "string",
                        "title": "Species",
                        "description": "The species of the fish"
                    },
                    "color": {
                        "type": "string",
                        "title": "Color",
                        "description": "Primary color of the fish"
                    },
                    "size_cm": {
                        "type": "number",
                        "title": "Size in centimeters",
                        "description": "Length of the fish in centimeters"
                    }
                },
                "required": ["name", "species", "color", "size_cm"]
            },
            "description": "Describe a famous fish"
        }
    }
}'
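The endpoint returns the standard chat completions response shape, with the structured JSON as the assistant message content. Extracting and decoding it in Python looks like the following sketch, where a sample payload stands in for a real HTTP response body:

```python
import json

# Sample response body in the chat completions shape; a real application
# would obtain this from the HTTP response.
body = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": '{"name": "Nemo", "species": "Clownfish", "color": "Orange with white stripes", "size_cm": 7.5}'
            }
        }
    ]
}

# The structured output is itself a JSON string inside the message content.
content = body["choices"][0]["message"]["content"]
fish = json.loads(content)
print(fish["species"])  # → Clownfish
```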