The Standalone TVM (STVM) Execution Provider for ONNX Runtime enables ONNX Runtime users to leverage Apache TVM model optimizations. TVM optimizes machine learning models through an automated tuning process that produces model variants specific to targeted hardware architectures. This process also generates "tuning logs" that the STVM EP relies on to maximize model performance. You can use the OctoML Platform to generate tuning logs for a wide variety of hardware targets, without managing any infrastructure or learning the OSS TVM stack, by taking the following steps:
Upload a model and "accelerate" it to run automated tuning on your desired hardware target.
Get a token from the "Manage your Account/Settings" page.
Export your token as an environment variable in your command line to authenticate your session with the OctoML platform.
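The excerpt does not name the environment variable the fetcher script reads, so the name below is a placeholder; as an illustration, assuming a variable called OCTOML_ACCESS_TOKEN, the export looks like:

```shell
# Export the API token for this shell session.
# OCTOML_ACCESS_TOKEN is a placeholder name; use whatever variable
# the fetcher script actually expects.
export OCTOML_ACCESS_TOKEN="<paste-your-token-here>"

# Confirm it is visible to child processes such as the fetcher script
echo "$OCTOML_ACCESS_TOKEN"
```

Note that `export` only applies to the current shell session; add the line to your shell profile if you want it to persist.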
To access the tuning logs for your accelerated model, request the octoml_log_fetcher.py file from us using the messenger in the lower right-hand corner of this screen. This file uses the token you set above and requires the workflow UUID (not the Variant ID) of the acceleration workflow you ran in the OctoML Platform for your chosen hardware. For example, the workflow UUID below is 9da8b893-d46e-4df7-ace0-dddb4ebd431a for this Skylake acceleration:
Using this UUID, you can download the logs for this optimization into your local environment by executing python3 octoml_log_fetcher.py -u "9da8b893-d46e-4df7-ace0-dddb4ebd431a" -f "centerface_skylake.json", where 9da8b893-d46e-4df7-ace0-dddb4ebd431a is the workflow UUID and centerface_skylake.json is the name you would like to give the log file.
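After the download finishes, a quick sanity check can confirm the file is intact. TVM tuning logs are typically JSON-lines (one record per line); the helper below is a sketch that only verifies each non-empty line parses as JSON — the record schema itself differs between AutoTVM and auto-scheduler (Ansor) logs, so this does not validate the contents.

```python
import json

def count_tuning_records(path):
    """Basic sanity check on a downloaded tuning log.

    Assumes the log is JSON-lines (one JSON record per line), which is
    the usual TVM tuning-log layout. Raises json.JSONDecodeError on a
    truncated or corrupt line.
    """
    count = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                json.loads(line)  # fails fast on a partial download
                count += 1
    return count
```

For example, count_tuning_records("centerface_skylake.json") returning a positive number suggests the download completed, while a json.JSONDecodeError points at a truncated file.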
Once you have the logs for your specific optimized model, you can run the TVM EP using the instructions here. Remember to use tuning logs from a model optimization completed on the same target hardware on which you are running the EP.
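As a sketch of that last step: an ONNX Runtime build that includes the TVM EP accepts the tuning log through provider options. The provider and option names below (StvmExecutionProvider, target, tuning_file_path, tuning_type) are assumptions based on the TVM EP's documented interface and may differ across ONNX Runtime versions — verify them against the instructions linked above.

```python
# Provider configuration for the STVM/TVM EP. All key names here are
# assumptions; check the TVM EP docs for your ONNX Runtime version.
provider_options = [{
    "target": "llvm -mcpu=skylake",                 # must match the tuned hardware
    "tuning_file_path": "centerface_skylake.json",  # the log downloaded above
    "tuning_type": "AutoTVM",                       # or "Ansor", per how tuning ran
}]

# Session creation requires an ONNX Runtime build with the TVM EP
# compiled in, so it is shown commented out here:
# import onnxruntime as ort
# sess = ort.InferenceSession(
#     "centerface.onnx",
#     providers=["StvmExecutionProvider"],
#     provider_options=provider_options,
# )
```

Keeping the target string consistent with the hardware the log was tuned on matters for the same reason noted above: logs from one target will not accelerate (and may fail to apply to) a different one.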