The TVM Execution Provider (EP) for ONNX Runtime enables ONNX Runtime users to leverage Apache TVM model optimizations. TVM optimizes machine learning models through an automated tuning process that produces model variants specific to targeted hardware architectures. This process generates a shared library containing the tuned model, an ro-file with the model topology description, and the model weights, which the TVM EP relies on to maximize model performance. Users can generate these tuning files for a wide variety of hardware targets on the OctoML Platform, without managing any infrastructure or learning the open-source TVM stack, by taking the following steps:
1. Log in to the OctoML Platform at https://app.octoml.ai. If you do not have an account, please request one here.
2. Create a project and upload models to it. Any model can be accelerated on your desired hardware target via the ‘Package’ button, and the results can be compared with each other:
3. Several options control the acceleration: the package name, the input tensor shapes (determined automatically for models with a fixed topology), and the hardware targets. Multiple hardware targets can be selected for model tuning. The ‘Extended acceleration’ mode gives better performance but requires more tuning time. Once the options are set, press the ‘Package’ button:
4. After tuning has finished, download the ‘Linux SO’ package with the best TVM performance result. It is a folder containing the model shared library, the ro-file, the weights, and other auxiliary files.
Once you have the folder for your optimized model, you can run it with the TVM EP by following the instructions here. In practice, the path to this folder is passed through the ‘so_folder’ provider option, as sketched below.
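As a rough illustration, the snippet below shows how the folder path could be wired into an ONNX Runtime session via the ‘so_folder’ option using the Python API. It assumes a TVM-enabled build of ONNX Runtime; the model path, folder path, and the input name and shape are placeholders for your own values, not values taken from the original instructions.

```python
import numpy as np
import onnxruntime

# Placeholder paths: point these at your ONNX model and the folder
# downloaded from the OctoML Platform (shared library, ro-file, weights).
model_path = "model.onnx"
so_folder = "/path/to/octoml_package"

# The 'so_folder' provider option tells the TVM EP where to find the
# precompiled tuning artifacts instead of compiling the model itself.
provider_options = [dict(so_folder=so_folder)]

session = onnxruntime.InferenceSession(
    model_path,
    providers=["TvmExecutionProvider"],
    provider_options=provider_options,
)

# The input name and shape are model-specific; shown here for illustration only.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": dummy_input})
```

If inference unexpectedly falls back to the default CPU provider, calling session.get_providers() is a quick way to confirm whether the TVM EP was actually registered in your build.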