Troubleshooting
Infrastructure-related issues (e.g. network issues, k8s scheduling issues, ...) are beyond the scope of this part of the documentation.
The worker logs show an error when trying to register with the inference-api
MODEL_PACKAGE_CONFLICT
This error most often comes with the error code MODEL_PACKAGE_CONFLICT.
A model package, that is, the combination of some basic model info and the models to be exposed by the inference-api, is immutable.
If you try to change one anyway, you will receive this error.
We distinguish the following cases:
- The change was made by mistake: The error message contains a hint about what conflicts.
- The change was intended: To change a model package, you have to increment the version in the worker deployment of the worker that runs the model. Afterwards, the worker will be allowed to connect to the inference-api, and the model will receive a new queue. All workers still connected to the old queue of this model will finish the tasks in that queue and then become stale. They will not receive any new tasks until they are restarted.
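The immutability rule can be illustrated with a toy in-memory registry. This is only a sketch of the behaviour described above, not the inference-api's actual implementation: the class ToyRegistry, the exception ModelPackageConflict, and the shape of the package dictionary are hypothetical names chosen for illustration.

```python
class ModelPackageConflict(Exception):
    """Hypothetical stand-in for the MODEL_PACKAGE_CONFLICT error code."""


class ToyRegistry:
    """Toy model of immutable model packages (not the real inference-api)."""

    def __init__(self):
        # Maps (model name, version) to the package content registered first.
        self._packages = {}

    def register(self, name, version, package):
        key = (name, version)
        existing = self._packages.get(key)
        if existing is None:
            # First registration of this version: store a copy of the package.
            self._packages[key] = dict(package)
            return "registered"
        if existing != package:
            # Same version, different content: report which fields conflict.
            changed = sorted(
                field
                for field in set(existing) | set(package)
                if existing.get(field) != package.get(field)
            )
            raise ModelPackageConflict(
                f"{name} version {version} differs in: {', '.join(changed)}"
            )
        # Same version, same content: reconnecting is allowed.
        return "unchanged"
```

In this sketch, registering changed content under an existing version raises the conflict, while registering the same content under an incremented version succeeds, which mirrors the two cases listed above.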
INVALID_REQUEST
Another possible error code is INVALID_REQUEST. In this case, check the error message.
If the error message you see is "Version x of given model package is less than the already registered version y", you will have to modify (at least) the version in the worker deployment:
- If you want to overwrite the existing model package, use a version higher than y.
- More likely, you want to match the version y.
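The version comparison behind this message can be sketched as follows. The function classify_worker_version is a hypothetical helper, and the sketch assumes versions are plain integers, which the documentation does not state.

```python
def classify_worker_version(worker_version: int, registered_version: int) -> str:
    """Classify a worker's version x against the registered version y.

    Toy illustration only; integer versions are an assumption.
    """
    if worker_version < registered_version:
        # x < y: the registration is rejected with INVALID_REQUEST.
        return "INVALID_REQUEST"
    if worker_version == registered_version:
        # x == y: the worker matches the existing model package.
        return "match"
    # x > y: the worker overwrites the existing model package.
    return "overwrite"
```

For example, with a registered version of 3, a worker configured with version 2 is rejected, version 3 matches the existing package, and version 4 overwrites it.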