With version api-worker-luminous:2024-10-30-094b5
of our luminous inference workers, we've sped up inference when running with our Attention Manipulation (AtMan) mechanism.
This improves tokens-per-second throughput for all models, with both the contextual and the non-contextual AtMan settings.
Measured improvements range from a 2.5x speedup for smaller batch sizes up to a 6x speedup for larger batch sizes.
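For readers unfamiliar with the mechanism: at its core, Attention Manipulation dampens the influence of selected input tokens by scaling their attention scores before normalization. The sketch below is a simplified conceptual illustration of that idea, not the worker implementation; the function name, the single-factor scaling, and the pre-softmax placement are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def suppress_attention(scores, suppress, factor=0.1):
    """Conceptual sketch (hypothetical helper, not the AtMan worker code):
    scale the raw attention scores of selected key positions by `factor`
    before the softmax, reducing those tokens' influence on the output.

    scores:   array of shape (queries, keys) with raw attention scores
    suppress: list of key positions whose influence is dampened
    factor:   multiplier applied to the suppressed positions' scores
    """
    modified = scores.copy()
    modified[:, suppress] *= factor
    return softmax(modified)
```

With a factor below 1 (on positive scores), the suppressed positions end up with smaller post-softmax weights, so the model attends to them less; the production workers apply this kind of manipulation inside the transformer's attention layers, which is where the optimized code path delivers the speedups described above.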