
Getting Started with Collections

Pharia Assistant includes a Q&A capability that lets users ask questions about their knowledge base and receive answers. Documents to be queried can be uploaded on the fly by users in Pharia Assistant or pre-uploaded to your instance by an admin via the Document Index API.

This guide explains how to create namespaces and collections and how to add documents to your knowledge base so that they are available for Q&A in Pharia Assistant.

Prerequisites

  • The luminous-base model is deployed to your cluster for embedding your documents. Refer to How to setup for detailed instructions.
  • Document Index is deployed in your Pharia AI instance, and its API is reachable at https://document-index.{ingressDomain}.
  • The Assistant app is reachable in your browser at https://assistant.{ingressDomain}.

How to obtain a user token for accessing the Document Index API

To obtain a user token, follow these steps:

  1. Log in to the Assistant app with your admin credentials at https://assistant.{ingressDomain}.
  2. Open the Network tab in your browser's developer tools.
  3. Find any authenticated request, for example /api/resources.
  4. Copy the value of the Authorization header. It should be a string that starts with "Bearer " followed by the token.
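
As a minimal sketch of reusing the copied value in a script, the Python helper below normalizes it into a request header. The name build_auth_headers is hypothetical and not part of any Pharia API. The sketch assumes you may have copied either the full header value or only the token itself.

```python
def build_auth_headers(copied_value: str) -> dict[str, str]:
    """Turn the value copied from the browser into an Authorization header.

    Hypothetical helper: accepts either "Bearer <token>" or the bare token.
    """
    parts = copied_value.strip().split(None, 1)
    if parts and parts[0].lower() == "bearer":
        # Full header value was copied; keep only the token part.
        token = parts[1].strip() if len(parts) > 1 else ""
    else:
        # Only the token was copied (or the input was empty).
        token = copied_value.strip()
    if not token:
        raise ValueError("no token found in the copied Authorization value")
    return {"Authorization": f"Bearer {token}"}
```

Rejecting an empty token early avoids sending a request that the API would refuse anyway.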

How to add documents to your knowledge base

The Document Index service handles the chunking and embedding of your documents and organizes them into namespaced collections for retrieval. This turns your raw data into a knowledge base that can be queried from the Assistant.

Adding documents to your knowledge base requires creating a namespace and collections, configuring indexes, and then uploading your documents.

This can be accomplished in two ways:

  1. Using the Aleph Alpha Intelligence Layer SDK
  2. Using the Document Index API

Note: The AssistantUser role only has access to a namespace called Assistant. You must therefore create a namespace called Assistant and add all collections and documents to it for them to be accessible to the Assistant.

Using the Aleph Alpha Intelligence Layer SDK

Prerequisites

To use the Aleph Alpha Intelligence Layer SDK, open the Jupyter notebook at https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/src/documentation/document_index.ipynb and follow the instructions in the "Upload documents to the Document Index" section. You need to provide the Document Index client token, the base URL, and the other required parameters to upload your documents.

Using the Document Index API

The Document Index API, accessible at https://document-index.{ingressDomain}, provides a RESTful interface for creating namespaces, collections, and uploading documents. It also offers an OpenAPI specification for its API, available at https://document-index.{ingressDomain}/openapi.yaml.
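
To illustrate how the endpoint URLs are composed from the ingress domain, here is a small Python sketch. The function name document_index_url and the domain example.com are assumptions made for illustration. Percent-encoding each path segment is a general HTTP precaution, not something this guide states the API requires.

```python
from urllib.parse import quote


def document_index_url(ingress_domain: str, *segments: str) -> str:
    """Join path segments onto the Document Index base URL (hypothetical helper)."""
    base = f"https://document-index.{ingress_domain}"
    # Encode each segment separately so a name containing "/" or a space
    # remains a single path segment.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base}/{path}" if path else base


# URL of the OpenAPI specification and of a collection in the Assistant namespace.
print(document_index_url("example.com", "openapi.yaml"))
print(document_index_url("example.com", "collections", "Assistant", "handbook"))
```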

The Document Index API requires a user token. Refer to How to obtain a user token for accessing the Document Index API.

To index documents, follow the steps below. Details on each endpoint can be found in the OpenAPI specification at https://document-index.{ingressDomain}/openapi.yaml.

  1. Create a namespace using the PUT /namespaces/{namespace} endpoint.

     # Example namespace creation
     curl -X 'PUT' \
       'https://document-index.{ingressDomain}/namespaces/{namespace}' \
       -H 'Authorization: Bearer {your-token}'

  2. Create a collection using the PUT /collections/{namespace}/{collection} endpoint.

     # Example collection creation
     curl -X 'PUT' \
       'https://document-index.{ingressDomain}/collections/{namespace}/{collection}' \
       -H 'Authorization: Bearer {your-token}'

  3. Define one or more index configurations in your namespace using the PUT /indexes/{namespace}/{index} endpoint.

     # Example index configuration
     curl -X 'PUT' \
       'https://document-index.{ingressDomain}/indexes/{namespace}/{index}' \
       -H 'Authorization: Bearer {your-token}' \
       -H 'Content-Type: application/json' \
       -d '{
         "chunk_size": 384,
         "chunk_overlap": 0,
         "embedding_type": "asymmetric"
       }'

  4. Assign an index configuration to your collection using the PUT /collections/{namespace}/{collection}/indexes/{index} endpoint.

     # Example index assignment
     curl -X 'PUT' \
       'https://document-index.{ingressDomain}/collections/{namespace}/{collection}/indexes/{index}' \
       -H 'Authorization: Bearer {your-token}'

  5. Configure this index in the Pharia Assistant API via an environment variable.

     # values.yml
     pharia-assistant-api:
       env:
         RETRIEVER_QA_INDEX_NAME: {index}

  6. Insert documents into the collection using the PUT /collections/{namespace}/{collection}/docs/{name} endpoint.

     # Example document insertion
     curl -X 'PUT' \
       'https://document-index.{ingressDomain}/collections/{namespace}/{collection}/docs/{name}' \
       -H 'Authorization: Bearer {your-token}' \
       -H 'Content-Type: application/json' \
       -d '{
         "schema_version": "V1",
         "contents": [
           {
             "modality": "text",
             "text": "{document-content}"
           }
         ],
         "metadata": [
           {
             "url": "https://example.com/external-uri"
           }
         ]
       }'
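
The HTTP steps above can also be scripted. The Python sketch below only prepares the PUT requests in the order listed, without sending them, using the standard library. The function name build_indexing_requests, the example names, and the percent-encoding of path segments are assumptions made for illustration. The values.yml step is not an HTTP call and is left out, as is the optional metadata field.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_indexing_requests(ingress_domain, token, namespace, collection, index, documents):
    """Prepare the PUT requests from the steps above, in order (hypothetical helper).

    `documents` maps a document name to its text content.
    """
    base = f"https://document-index.{ingress_domain}"
    ns, col, idx = (quote(part, safe="") for part in (namespace, collection, index))
    # Same index configuration values as in the curl example.
    index_config = {"chunk_size": 384, "chunk_overlap": 0, "embedding_type": "asymmetric"}

    steps = [
        (f"/namespaces/{ns}", None),                        # create namespace
        (f"/collections/{ns}/{col}", None),                 # create collection
        (f"/indexes/{ns}/{idx}", index_config),             # define index configuration
        (f"/collections/{ns}/{col}/indexes/{idx}", None),   # assign index to collection
    ]
    for name, text in documents.items():
        body = {"schema_version": "V1", "contents": [{"modality": "text", "text": text}]}
        steps.append((f"/collections/{ns}/{col}/docs/{quote(name, safe='')}", body))

    requests = []
    for path, body in steps:
        headers = {"Authorization": f"Bearer {token}"}
        data = None
        if body is not None:
            headers["Content-Type"] = "application/json"
            data = json.dumps(body).encode("utf-8")
        requests.append(Request(base + path, data=data, headers=headers, method="PUT"))
    return requests


if __name__ == "__main__":
    for req in build_indexing_requests(
        "example.com", "your-token", "Assistant", "handbook", "qa-index", {"intro": "Hello"}
    ):
        print(req.get_method(), req.full_url)
```

Each prepared request could then be sent in order with urllib.request.urlopen, which raises HTTPError on 4xx and 5xx responses. Building the requests separately from sending them makes it easy to inspect the URLs before anything is written to your instance.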