Auto-scheduler - Next-Generation TVM
We have incorporated a new TVM acceleration technique, known in the Apache TVM community as Ansor or Auto-scheduler, into the OctoML Platform. This is now the default TVM technique used in the platform, replacing TVM's first-generation capability known as AutoTVM.
Auto-scheduler improves model performance significantly, by as much as 25-50 percent, compared to AutoTVM across a wide variety of model types and hardware targets.
Users can still select AutoTVM as the acceleration technique in "TVM Options", because in some use cases, such as certain model/GPU target combinations, AutoTVM outperforms Auto-scheduler.
Read more about Auto-scheduler here.
Accessing Auto-scheduler via the SDK
In the SDK, users can now select either AutoTVM or Auto-scheduler as their TVM acceleration engine.
If the user specifies neither the tuning option nor the advanced parameters, the system defaults to Auto-scheduler. If the user does not specify the tuning option but does pass explicit advanced parameters (i.e. kernel_trials or early_stopping_threshold), the system invokes AutoTVM.
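The defaulting rules above can be sketched as a small decision function. This is only an illustration of the described behavior, not the SDK's actual implementation: the function name resolve_tuning_engine and the string labels "autoscheduler" and "autotvm" are hypothetical, and the sketch assumes that an explicitly chosen tuning option always takes precedence.

```python
from typing import Optional


def resolve_tuning_engine(
    tuning_option: Optional[str] = None,
    kernel_trials: Optional[int] = None,
    early_stopping_threshold: Optional[int] = None,
) -> str:
    """Hypothetical helper mirroring the defaulting rules described above.

    This is NOT part of the SDK; names and return values are illustrative.
    """
    # Assumption: an explicitly chosen engine always wins.
    if tuning_option is not None:
        return tuning_option
    # No engine chosen, but explicit advanced parameters were passed:
    # the text says the system invokes AutoTVM in this case.
    if kernel_trials is not None or early_stopping_threshold is not None:
        return "autotvm"
    # Nothing specified at all: Auto-scheduler is the default.
    return "autoscheduler"
```

Under these assumptions, calling resolve_tuning_engine() with no arguments selects Auto-scheduler, while resolve_tuning_engine(kernel_trials=2000) selects AutoTVM.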
To select Auto-scheduler, use the instructions below. Note that the example settings are for testing only; for best performance, set trials_per_kernel=1000 and early_stopping_threshold=250.
    octomize_workflow = model_variant.octomize(
        platform,
        tuning_options=AutoschedulerOptions(
            trials_per_kernel=3,
            early_stopping_threshold=1,
        ),
    )
To select AutoTVM, use the instructions below. Note that the example settings are for testing only; for best performance, set kernel_trials=2000 and early_stopping_threshold=500.
    octomize_workflow = model_variant.octomize(
        platform,
        tuning_options=AutoTVMOptions(
            kernel_trials=3,
            early_stopping_threshold=1,
        ),
    )