With version api-worker-luminous:2024-10-30-094b5
of our luminous inference workers, we've sped up inference when running with our Attention Manipulation (AtMan) mechanism.
This improves tokens-per-second throughput for all models, with both the contextual and the non-contextual AtMan settings.
Measured improvements range from a 2.5x speedup for smaller batch sizes up to a 6x speedup for larger batch sizes.
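For readers unfamiliar with the mechanism: at its core, Attention Manipulation dampens the influence of selected input tokens by scaling their attention scores before normalization. The sketch below is a simplified conceptual illustration of that idea, not the worker implementation; the function name, the single-factor scaling, and the pre-softmax placement are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def suppress_attention(scores, suppress, factor=0.1):
    """Conceptual sketch (hypothetical helper, not the AtMan worker code):
    scale the raw attention scores of selected key positions by `factor`
    before the softmax, reducing those tokens' influence on the output.

    scores:   array of shape (queries, keys) with raw attention scores
    suppress: list of key positions whose influence is dampened
    factor:   multiplier applied to the suppressed positions' scores
    """
    modified = scores.copy()
    modified[:, suppress] *= factor
    return softmax(modified)
```

With a factor below 1 (on positive scores), the suppressed positions end up with smaller post-softmax weights, so the model attends to them less; the production workers apply this kind of manipulation inside the transformer's attention layers, which is where the optimized code path delivers the speedups described above.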