The OctoML platform has added two powerful model compilation technologies to help you get the best possible optimizations for your models!
We've added Metaschedule, TVM's most advanced model tuning technology (see the TVM PR here). Compared with the previous generation of automatic tuners (known as Autoscheduler/Ansor), Metaschedule can achieve lower inference latency for many common model and hardware combinations (for example, 32% speedups on ARM64 hardware such as Graviton3). In addition, Metaschedule tunes more quickly in most cases, so you spend less time waiting for model tuning to complete.
We've upgraded our ONNX Runtime implementation with OpenVINO, a library made specifically for Intel hardware targets that can (depending on your model) deliver speedups as high as 200% compared to ONNX Runtime's standard defaults.
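For readers curious what this looks like outside the platform: ONNX Runtime exposes OpenVINO as an execution provider that you request when creating a session. The sketch below is only an illustration of that general mechanism, not a description of how the OctoML platform is wired internally. The helper names `pick_providers` and `make_session` and the preference order are assumptions made for this example, and it assumes an ONNX Runtime build that includes OpenVINO support.

```python
from typing import List, Sequence

# Provider names as used by ONNX Runtime. The preference order is an
# assumption for this sketch: try OpenVINO first, then the default CPU provider.
PREFERRED = ("OpenVINOExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in PREFERRED if p in available]
    # Fall back to the default CPU provider if nothing preferred was reported.
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path: str):
    """Create an inference session, using OpenVINO when the build offers it.

    Hypothetical helper: onnxruntime is imported lazily so that the
    selection logic above can be used without the package installed.
    """
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a machine without an OpenVINO-enabled build, `pick_providers` simply returns the CPU provider, so the same code runs with ONNX Runtime's standard defaults.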
These upgrades ensure that, for every packaging request, the OctoML platform will try every possible cutting-edge optimization technique.