The Standalone TVM (STVM) Execution Provider (EP) for ONNX Runtime lets ONNX Runtime users leverage Apache TVM model optimizations. TVM optimizes machine learning models through an automated tuning process that produces model variants specific to the targeted hardware architecture. This process also generates 'tuning logs' that the STVM EP relies on to maximize model performance. You can use the OctoML Platform to generate tuning logs for a wide variety of hardware targets, without managing any infrastructure or learning the OSS TVM stack, by taking the following steps:

  • Log in to the OctoML Platform at https://app.octoml.ai. If you do not have an account, please request one here.

  • Upload a model and ‘accelerate’ it to run automated tuning on your desired hardware target.

  • Get a token from the "Manage your Account/Settings" page.

  • Export your token as an environment variable in your command line to authenticate your session with the OctoML platform.

export OCTOMIZER_API_TOKEN="YOURTOKEN"

  • To access the tuning logs for your accelerated model, request the octoml_log_fetcher.py file from us using the messenger in the lower right-hand corner of this screen. This file uses the token you set above and requires the Workflow UUID (not the Variant ID) of the acceleration workflow you ran in the OctoML Platform for your chosen hardware. For example, the workflow UUID for the Skylake acceleration used in this example is 9da8b893-d46e-4df7-ace0-dddb4ebd431a.

  • Using this UUID, you can download the logs for this optimization into your local environment by executing the command below, where 9da8b893-d46e-4df7-ace0-dddb4ebd431a is the workflow UUID and centerface_skylake.json is the name you would like to give the log file.

python3 octoml_log_fetcher.py -u "9da8b893-d46e-4df7-ace0-dddb4ebd431a" -f "centerface_skylake.json"

  • Once you have the logs for your specific optimized model, you can run the TVM EP using the instructions here. Remember to use tuning logs from a model optimization completed on the same target hardware on which you are running the EP.
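
The steps above can be wrapped in a few pre-flight checks. The following is a minimal Python sketch, not part of the OctoML tooling: the helper names (token_is_set, is_well_formed_uuid, build_fetch_command, log_looks_downloaded) are hypothetical and introduced only for illustration. It assumes the fetcher script sits in the current directory and accepts the -u and -f flags shown above. Note that the UUID check only catches typos; it cannot tell a Workflow UUID from a Variant ID.

```python
import os
import uuid
from pathlib import Path

# Variable name taken from the export step above.
TOKEN_VAR = "OCTOMIZER_API_TOKEN"


def token_is_set(env=None):
    """Return True if the API token variable holds a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get(TOKEN_VAR, "").strip())


def is_well_formed_uuid(value):
    """Return True if value is a UUID in canonical hyphenated form.

    This only catches typos; it cannot distinguish a Workflow UUID
    from a Variant ID.
    """
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def build_fetch_command(workflow_uuid, log_name, script="octoml_log_fetcher.py"):
    """Build the argument list for the fetcher command shown above.

    The script location is an assumption: it is expected in the
    current working directory.
    """
    if not is_well_formed_uuid(workflow_uuid):
        raise ValueError(f"not a well-formed UUID: {workflow_uuid!r}")
    return ["python3", script, "-u", workflow_uuid, "-f", log_name]


def log_looks_downloaded(log_name):
    """Return True if the log file exists and is not empty."""
    path = Path(log_name)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    cmd = build_fetch_command(
        "9da8b893-d46e-4df7-ace0-dddb4ebd431a", "centerface_skylake.json"
    )
    if not token_is_set():
        print(f"{TOKEN_VAR} is not set; export it before fetching logs.")
    print("Command to run:", " ".join(cmd))
```

The list returned by build_fetch_command can be passed to subprocess.run(cmd, check=True), and log_looks_downloaded("centerface_skylake.json") can then confirm that a non-empty log file was written before you point the EP at it.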
