Getting Started with Collections
Pharia Assistant features a Q&A capability that lets users ask questions about their knowledge base and receive answers. The documents to be queried can be uploaded on the fly by users in Pharia Assistant or pre-uploaded to your instance by an admin via the Document Index API.
This guide explains how to create namespaces and collections and how to add documents to your knowledge base so that they are available for Q&A in Pharia Assistant.
Prerequisites
- Ensure that the luminous-base model is deployed to your cluster for embedding your documents. Refer to How to setup for detailed instructions.
- Document Index is deployed in your Pharia AI instance. You can access the API at https://document-index.{ingressDomain}
- You can access the Assistant app in your browser at https://assistant.{ingressDomain}
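Both URLs follow the same pattern, so they can be derived from the ingress domain once and reused in later commands. The following is a minimal sketch; `INGRESS_DOMAIN`, `service_url`, `DI_BASE_URL`, `ASSISTANT_URL`, and the domain `pharia.example.com` are illustrative assumptions, not names defined by Pharia AI.

```shell
#!/bin/sh
# INGRESS_DOMAIN is a placeholder value; use your cluster's ingress domain.
INGRESS_DOMAIN="pharia.example.com"

# service_url NAME: print the base URL of a service under the ingress domain.
# (Hypothetical helper, used only in this sketch.)
service_url() {
  printf 'https://%s.%s\n' "$1" "$INGRESS_DOMAIN"
}

DI_BASE_URL="$(service_url document-index)"
ASSISTANT_URL="$(service_url assistant)"

echo "Document Index API: $DI_BASE_URL"
echo "Assistant app:      $ASSISTANT_URL"
```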
How to obtain a user token for accessing the Document Index API.
To obtain a user token for accessing the Document Index API, follow these steps:
- Log in to the Assistant app with your admin credentials at https://assistant.{ingressDomain}
- Open the Network tab in your browser's developer tools.
- Find any authenticated request, e.g., /api/resources
- Copy the value of the Authorization header. It should be a string starting with "Bearer".
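The copied header value includes the "Bearer" prefix, while the curl examples in this guide add that prefix themselves. As a small sketch of one way to handle this, the helper below strips the prefix before the token is stored. `strip_bearer` and `DI_TOKEN` are hypothetical names chosen for this example, and `{your-token}` remains a placeholder.

```shell
#!/bin/sh
# strip_bearer VALUE: remove a leading "Bearer " prefix, if present, so the raw
# token can be reused in "Authorization: Bearer ..." headers.
# (Hypothetical helper, not part of any Pharia AI tooling.)
strip_bearer() {
  case "$1" in
    "Bearer "*) printf '%s\n' "${1#Bearer }" ;;
    *)          printf '%s\n' "$1" ;;
  esac
}

# Paste the copied header value here (placeholder, not a real token).
DI_TOKEN="$(strip_bearer 'Bearer {your-token}')"
export DI_TOKEN
```

Keeping the token in an environment variable avoids repeating it in every command. Treat it like a password and avoid committing it to scripts or shell history.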
How to add documents to your knowledge base.
The Document Index service handles the chunking and embedding of your documents and organizes them into namespaced collections for retrieval. This turns your raw data into a knowledge base that the Assistant can query.
Adding documents to your knowledge base requires creating a namespace and collections, configuring indexes, and then uploading your documents.
This can be accomplished in two ways: using the Aleph Alpha Intelligence Layer SDK, or using the Document Index API directly.
Note: The AssistantUser role only has access to a namespace called Assistant. This means that you must create a namespace called Assistant and add all collections and documents to this namespace for them to be accessible to the assistant.
Using the Aleph Alpha Intelligence Layer SDK
Prerequisites
Document Index Client token: Refer to How to obtain a user token for accessing the Document Index API.
Document Index Client base URL: This is the URL of the Document Index API in your Pharia AI cluster, typically formatted as:
https://document-index.{ingressDomain}
To utilize the Aleph Alpha Intelligence Layer SDK, you can access the Jupyter notebook provided here: https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/src/documentation/document_index.ipynb and follow the instructions in the "Upload documents to the Document Index" section. Provide the Document Index Client token, base URL, and other required parameters to successfully upload your documents.
Using the Document Index API
The Document Index API, accessible at https://document-index.{ingressDomain}, provides a RESTful interface for creating namespaces, collections, and uploading documents. Its OpenAPI specification is available at https://document-index.{ingressDomain}/openapi.yaml
To use the Document Index API, you need a user token. Refer to How to obtain a user token for accessing the Document Index API.
Details on the API endpoints can be found in the Document Index API OpenAPI specification at https://document-index.{ingressDomain}/openapi.yaml
To index documents, follow these steps:
- Create a namespace using the PUT /namespaces/{namespace} endpoint.
# Example namespace creation
curl -X 'PUT' \
  'https://document-index.{ingressDomain}/namespaces/{namespace}' \
  -H 'Authorization: Bearer {your-token}'
- Create a collection using the PUT /collections/{namespace}/{collection} endpoint.
# Example collection creation
curl -X 'PUT' \
  'https://document-index.{ingressDomain}/collections/{namespace}/{collection}' \
  -H 'Authorization: Bearer {your-token}'
- Define one or more index configurations in your namespace using the PUT /indexes/{namespace}/{index} endpoint.
# Example index configuration
curl -X 'PUT' \
  'https://document-index.{ingressDomain}/indexes/{namespace}/{index}' \
  -H 'Authorization: Bearer {your-token}' \
  -H 'Content-Type: application/json' \
  -d '{
    "chunk_size": 384,
    "chunk_overlap": 0,
    "embedding_type": "asymmetric"
  }'
- Assign an index configuration to your collection using the PUT /collections/{namespace}/{collection}/indexes/{index} endpoint.
# Example index assignment
curl -X 'PUT' \
  'https://document-index.{ingressDomain}/collections/{namespace}/{collection}/indexes/{index}' \
  -H 'Authorization: Bearer {your-token}'
- Configure this index in the Pharia Assistant API via an environment variable.
# values.yml
pharia-assistant-api:
  env:
    RETRIEVER_QA_INDEX_NAME: {index}
- Insert documents into the collection using the PUT /collections/{namespace}/{collection}/docs/{name} endpoint.
# Example document insertion
curl -X 'PUT' \
  'https://document-index.{ingressDomain}/collections/{namespace}/{collection}/docs/{name}' \
  -H 'Authorization: Bearer {your-token}' \
  -H 'Content-Type: application/json' \
  -d '{
    "schema_version": "V1",
    "contents": [
      {
        "modality": "text",
        "text": "{document-content}"
      }
    ],
    "metadata": [
      {
        "url": "https://example.com/external-uri"
      }
    ]
  }'
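The API calls above can be collected into a small script so that they run in the correct order and stop at the first failure. The following is a sketch under stated assumptions, not an official client. The function names (`di_put`, `json_escape`, `doc_body`, `setup_collection`, `upload_doc`), the variables `DI_BASE_URL`, `DI_TOKEN`, and `DRY_RUN`, and the default domain are all hypothetical. The endpoints and request bodies are the ones shown in the steps above, except that the `metadata` field is left out of the document body for brevity.

```shell
#!/bin/sh
# Assumed defaults; the domain below is a placeholder, not a real cluster.
: "${DI_BASE_URL:=https://document-index.pharia.example.com}"
DI_TOKEN="${DI_TOKEN:-}"

# di_put PATH [JSON_BODY]: send a PUT request, or with DRY_RUN=1 only print it.
di_put() {
  url="$DI_BASE_URL/$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'PUT %s\n' "$url"
    return 0
  fi
  if [ "$#" -ge 2 ]; then
    curl --fail --silent --show-error -X PUT "$url" \
      -H "Authorization: Bearer $DI_TOKEN" \
      -H 'Content-Type: application/json' \
      -d "$2"
  else
    curl --fail --silent --show-error -X PUT "$url" \
      -H "Authorization: Bearer $DI_TOKEN"
  fi
}

# json_escape TEXT: escape backslashes and double quotes.
# Limitation of this sketch: handles single-line text only.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# doc_body TEXT: build the document JSON (metadata omitted in this sketch).
doc_body() {
  printf '{"schema_version": "V1", "contents": [{"modality": "text", "text": "%s"}]}\n' \
    "$(json_escape "$1")"
}

# setup_collection NAMESPACE COLLECTION INDEX: create the namespace, the
# collection, and the index configuration, then assign the index.
setup_collection() {
  di_put "namespaces/$1" &&
    di_put "collections/$1/$2" &&
    di_put "indexes/$1/$3" \
      '{"chunk_size": 384, "chunk_overlap": 0, "embedding_type": "asymmetric"}' &&
    di_put "collections/$1/$2/indexes/$3"
}

# upload_doc NAMESPACE COLLECTION NAME TEXT: insert one text document.
upload_doc() {
  di_put "collections/$1/$2/docs/$3" "$(doc_body "$4")"
}
```

For example, `DRY_RUN=1 setup_collection Assistant handbook my-index` prints the four PUT URLs without sending anything, which is a quick way to check the paths before running against a cluster. Here `handbook` and `my-index` are made-up names. Document names containing spaces or slashes would need URL encoding, which this sketch does not do.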