Andreas Hartel · 4 min read

With version api-worker-luminous:2024-08-15-0cdc0 of our inference stack worker, we introduce a new unified and versioned configuration format for our workers. Instead of two configuration files, the worker can now be configured with a single one.

How the worker used to be configured

Previously, our worker needed to be configured with two separate configuration files, usually called env.toml and cap.toml. The idea behind this split was to have one file describing the environment the worker is running in, and another file describing the capabilities of the worker. That way, only the cap.toml file needed to be updated or duplicated when new models were added in the same environment.

You would start a worker by calling:

docker run ... api-worker-luminous -e env.toml -c cap.toml

How the worker is configured now

The latest worker versions can still be configured as described above, and that configuration method will always remain supported. However, we recommend using the new configuration format, which is described below.

To make migration easier, when you start a worker of the above-mentioned version (or newer) in the usual way, it will print its configuration in the new format to stdout. You can take that output, save it to a file called worker_config.toml, and start the worker with the new configuration format:

docker run ... api-worker-luminous --config worker_config.toml
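
One simple way to capture the printed configuration is to redirect the worker's stdout into worker_config.toml. This is only a sketch: stdout may also contain regular log lines that would have to be stripped from the file before using it.

docker run ... api-worker-luminous -e env.toml -c cap.toml > worker_config.toml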

What has changed

Below is an example of how a configuration should be migrated. The basic idea is that you merge all existing sections into a single file. There are a few caveats, however:

  • The checkpoint section is now called generator.
  • The diagnostics flag is no longer supported; it is replaced by the LOG_LEVEL environment variable, which sets the log level (see the sketch after this list).
  • The checkpoint_name field has moved to the queue section.
  • The gpu_model_name field has been removed. The fingerprint is now generated from the generator section.
  • In the generator section, the fields tokenizer_filename and directory are no longer supported. Instead, we expect tokenizer_path and weight_set_directories.
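
As an example of the new logging setup, the diagnostics = true flag from env.toml could be replaced by passing LOG_LEVEL into the container when starting the worker. The concrete level names (such as debug or info) are an assumption here, not something documented in this post:

docker run -e LOG_LEVEL=debug ... api-worker-luminous --config worker_config.toml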

Previous configuration files

env.toml:

# A default worker configuration intended for documenting the options. The intention is that this
# file contains configuration about the environment of the worker, rather than configuration about
# the model it serves. As such, a single file can be shared by multiple workers.

# Emit more log diagnostics, including potentially sensitive information like prompts and
# completions.
diagnostics = true

[queue]
# http://localhost:8080 is the default if you run the scheduler locally. Suitable production
# settings are either `https://api.aleph-alpha.com` or `https://test.api.aleph-alpha.com`
url = "http://localhost:8080"

# API token used to authenticate fetch batch requests. Replace this with your API token for local
# development, and with a worker token in production.
token = "dummy-queue-token"

# Configure an optional list of supported hostings. The default is an empty list, which means only
# cloud hosting is supported. Cloud hosting is always supported and does not need to be listed explicitly.
# hostings = ["aleph-alpha"]

cap.toml:

# Name of the model served by the worker. The model must be registered with the queue, as it is used
# for distributing tasks to workers. All workers with the same model name should serve the same
# checkpoint and have the same capabilities.
checkpoint_name = "luminous-base"

# GPU model name that is used to generate a fingerprint that
# will be sent to the scheduler upon registration. It determines
# the task count distribution that will be selected for this worker
gpu_model_name = "A100-40GB"

# Configuration for a deepspeed checkpoint
[checkpoint]
type = "luminous"
# Filename of the tokenizer file (must be stored in the checkpoint directory configured via `directory`).
# The tokenizer name (as reported to the API) is derived from it by stripping the suffix.
tokenizer_filename = "tokenizer.json"
# Location of the checkpoint in the file system
directory = "/path/to/checkpoint"
# Number of GPUs used for pipeline parallel inference
pipeline_parallel_size = 1
# Number of GPUs used for model parallel inference
tensor_parallel_size = 1

New configuration file

Here is an example of what the new configuration should look like:

edition = 1

# Configuration for the checkpoint served by the worker (previously the `checkpoint` section)
[generator]
type = "luminous"
# Number of GPUs used for pipeline parallel inference
pipeline_parallel_size = 1
# Number of GPUs used for model parallel inference
tensor_parallel_size = 1
# Full path to the tokenizer file (previously `tokenizer_filename`, relative to the checkpoint directory)
tokenizer_path = "/path/to/checkpoint/tokenizer.json"
# Locations of the checkpoint in the file system (previously the single `directory` field)
weight_set_directories = ["/path/to/checkpoint"]
auto_memory_config = true
memory_safety_margin = 0.05

[queue]
# http://localhost:8080 is the default if you run the scheduler locally. Suitable production
# settings are either `https://api.aleph-alpha.com` or `https://test.api.aleph-alpha.com`
url = "http://localhost:8080"
# API token used to authenticate fetch batch requests
token = "XXXXXXXX"
# Name of the model served by the worker (previously a top-level field in cap.toml)
checkpoint_name = "luminous-base"
tags = []
http_request_retries = 7

[monitoring]
metrics_port = 4000
tcp_probes = []

[generator.unstable]
skip_checkpoint_load = false