The OctoML platform has added two powerful model compilation technologies to help you get the best possible optimizations for your models!

  • We've added Metaschedule, TVM's most advanced model tuning technology (see the TVM PR here). Compared to the previous generation of automatic tuners (known as Autoscheduler/Ansor), Metaschedule achieves lower inference latency for many common model/hardware combinations (for example, 32% speedups on ARM64 hardware such as Graviton3). Metaschedule also tunes more quickly in most cases, meaning less waiting for your model tuning to complete.

  • We've upgraded our ONNX Runtime implementation with OpenVINO, a toolkit built specifically for Intel hardware targets that can (depending on your model) deliver speedups of up to 200% compared to ONNX Runtime's standard defaults.

These upgrades mean that, for every packaging request, the OctoML platform automatically explores these cutting-edge optimization techniques on your behalf.