The PhariaInference API

The PhariaInference API allows you to access and interact with Aleph Alpha models and PhariaAI functionality.

The API operates across the entire PhariaAI stack by providing access to the backends of various Aleph Alpha products. Its endpoints let you programmatically generate completions, embeddings, transcriptions, and other results.


Authentication

To use the API, you need an authentication token, which you create in PhariaOS. See Managing credentials and tokens.
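
For illustration, here is a minimal sketch of passing such a token as an HTTP Bearer header using Python's requests library; the host name and token value are placeholders for your deployment's values:

```python
import requests

# Placeholders: substitute your deployment's inference host and a token
# created in PhariaOS.
BASE_URL = "https://inference.example.pharia.ai"

session = requests.Session()
session.headers.update({"Authorization": "Bearer YOUR_TOKEN"})
# Every request sent through this session now carries the token.
```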

Endpoints

The PhariaInference API provides the following endpoints:

/model-settings

Use this endpoint to list the models that are available to the client.
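
A sketch of listing the available models; the HTTP method and the response shape are assumptions, so check the API reference for your deployment:

```python
import requests

resp = requests.get(
    "https://inference.example.pharia.ai/model-settings",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
)
resp.raise_for_status()
print(resp.json())  # assumed: a listing of models and their settings
```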

/complete

Use this endpoint to complete a prompt using a specific model.
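
A sketch of a completion request; the request and response fields follow the public Aleph Alpha completion API, and the host and model name are illustrative:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/complete",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "llama-3.1-8b-instruct",  # illustrative model name
        "prompt": "An apple a day",
        "maximum_tokens": 32,
    },
)
resp.raise_for_status()
# Response shape follows the public Aleph Alpha API and may differ.
print(resp.json()["completions"][0]["completion"])
```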

/complete/json

Use this endpoint to generate a completion in valid JSON format, even if the prompt does not explicitly request JSON. JSON completion is currently only available for Luminous workers.
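
A sketch of a JSON completion request, assuming it accepts the same fields as /complete:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/complete/json",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",  # illustrative; JSON completion targets Luminous workers
        "prompt": "List three primary colors with their hex codes.",
        "maximum_tokens": 128,
    },
)
resp.raise_for_status()
print(resp.json())  # the completion text should parse as valid JSON
```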

/chat/completions

Use this endpoint to retrieve one or more chat completions for a given prompt. The endpoint generates completions in a conversational style and supports multi-turn conversations, that is, conversations that contain follow-up questions to intermediate responses.
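
A sketch of a multi-turn chat request, assuming the widely used chat-completions message format; host and model name are illustrative:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/chat/completions",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "llama-3.1-8b-instruct",  # illustrative model name
        "messages": [
            {"role": "user", "content": "What is the tallest mountain?"},
            {"role": "assistant", "content": "Mount Everest, at 8,849 m."},
            {"role": "user", "content": "And the second tallest?"},  # follow-up turn
        ],
    },
)
resp.raise_for_status()
# Field names assume an OpenAI-compatible response shape.
print(resp.json()["choices"][0]["message"]["content"])
```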

/embed

Use this endpoint to embed a text using a specific model. This results in vectors that can be used for downstream tasks (such as semantic similarity) or models (such as classifiers). See also Embedding.
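
A sketch of an embedding request; the layers and pooling fields follow the public Aleph Alpha embedding API and are assumptions for PhariaInference:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/embed",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",  # illustrative model name
        "prompt": "The beaver is a rodent.",
        "layers": [-1],            # assumed: which layers to embed
        "pooling": ["mean"],       # assumed: pooling over token embeddings
    },
)
resp.raise_for_status()
print(resp.json())  # contains the embedding vectors
```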

/semantic_embed

Use this endpoint to embed a prompt using a specific model and semantic embedding method. See also Embedding.
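
A sketch of a semantic embedding request; the representation field follows the public Aleph Alpha API, where it selects the embedding method (for example, symmetric for similarity, or document/query for retrieval):

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/semantic_embed",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",       # illustrative model name
        "prompt": "How tall is Mount Everest?",
        "representation": "symmetric",  # assumed: the semantic embedding method
    },
)
resp.raise_for_status()
vector = resp.json()["embedding"]  # field name follows the public Aleph Alpha API
```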

/batch_semantic_embed

Use this endpoint to embed multiple prompts using a specific model and semantic embedding method.
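
A sketch of a batched request, assuming it mirrors /semantic_embed with a list of prompts:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/batch_semantic_embed",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",       # illustrative model name
        "prompts": ["First text.", "Second text."],
        "representation": "document",   # assumed, as for /semantic_embed
    },
)
resp.raise_for_status()
vectors = resp.json()["embeddings"]  # assumed: one vector per input prompt
```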

/instructable_embed

Use this endpoint to embed the input using an instruction and a specific model.
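
A sketch of an instructable embedding request; the input and instruction field names are assumptions based on the description above, so consult the API reference for the actual schema:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/instructable_embed",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "pharia-1-embedding-4608-control",  # illustrative model name
        "input": "How tall is Mount Everest?",       # assumed field name
        "instruction": "Represent the question for retrieving supporting documents.",
    },
)
resp.raise_for_status()
print(resp.json())
```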

/evaluate

Use this endpoint to evaluate the probability that the model will produce an expected completion given a prompt. This is useful if you already know the output you expect, or you want to test the probability of a given output. Note that the evaluate endpoint is significantly faster than the complete endpoint.
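
A sketch of an evaluation request; completion_expected follows the public Aleph Alpha API and names the output whose probability you want to score:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/evaluate",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",          # illustrative model name
        "prompt": "The capital of France is",
        "completion_expected": " Paris",   # the output to score
    },
)
resp.raise_for_status()
print(resp.json())  # includes the log-probability of the expected completion
```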

/explain

Use this endpoint to better understand the source of a completion. The endpoint returns how much the log-probabilities of the generated completion would change if we suppress individual parts (based on a configurable granularity) of a prompt. This reveals how much each section of a prompt impacts each token of the completion. See also Explainability.
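
A sketch of an explanation request; the target and prompt_granularity fields follow the public Aleph Alpha explain API:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/explain",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",        # illustrative model name
        "prompt": "An apple a day keeps the",
        "target": " doctor away",        # the completion to explain
        "prompt_granularity": "word",    # how the prompt is segmented
    },
)
resp.raise_for_status()
print(resp.json())  # per-segment impact scores for each part of the prompt
```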

/tokenize

Use this endpoint to tokenize a prompt for a specific model.
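
A sketch of a tokenize request; the tokens and token_ids flags follow the public Aleph Alpha API and select what the response contains:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/tokenize",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",  # illustrative model name
        "prompt": "An apple a day",
        "tokens": True,            # return token strings
        "token_ids": True,         # return numeric token IDs
    },
)
resp.raise_for_status()
print(resp.json())
```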

/detokenize

Use this endpoint to detokenize a list of tokens into a string.
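
A sketch of a detokenize request, turning token IDs (for example, ones returned by /tokenize) back into text:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/detokenize",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "luminous-base",         # illustrative model name
        "token_ids": [1730, 16870, 247],  # illustrative IDs, e.g. from /tokenize
    },
)
resp.raise_for_status()
print(resp.json())  # the reconstructed string
```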

/translate

Use this endpoint to translate input text from one language to a specified target language. For a list of supported languages, see the documentation for your selected model.
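
A hedged sketch of a translation request; the field names here are illustrative guesses, so consult the API reference for the actual schema:

```python
import requests

resp = requests.post(
    "https://inference.example.pharia.ai/translate",  # hypothetical host
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "model": "translation-model",  # illustrative model name
        "source": "Guten Morgen!",     # assumed field names; check the API
        "target_language": "en",       # reference for your deployment
    },
)
resp.raise_for_status()
print(resp.json())
```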

/transcribe

Use this endpoint to transcribe an audio file using a specified transcription model.
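
A hedged sketch of a transcription request, assuming the audio file is sent as a multipart upload; field names and the model name are illustrative:

```python
import requests

with open("meeting.wav", "rb") as audio:
    resp = requests.post(
        "https://inference.example.pharia.ai/transcribe",  # hypothetical host
        headers={"Authorization": "Bearer YOUR_TOKEN"},
        data={"model": "whisper-large-v3"},  # illustrative model name
        files={"file": audio},               # assumed multipart upload
    )
resp.raise_for_status()
print(resp.json())  # the transcript
```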