The OctoML Platform includes a TVM acceleration technique, known in the Apache TVM community as Ansor or Auto-scheduler. This is now the default TVM technique for most model-hardware combinations in the platform, replacing TVM's first-generation capability known as AutoTVM.
Auto-scheduler uses machine-generated schedules to conduct kernel trial searches in TVM. It also more intelligently applies kernel trials to tuning tasks so that more complex tasks receive more kernel trials while less complex tasks receive fewer.
Auto-scheduler improves model performance significantly, by as much as 25-50 percent, compared to AutoTVM across a wide variety of model types and hardware targets.
AutoTVM is still used in some cases, however, such as certain model/GPU target combinations where it outperforms Auto-scheduler.
Recommended Settings for Auto-scheduler
Kernel trials for Auto-scheduler are allocated dynamically by task, rather than applied uniformly across all tasks as AutoTVM does. Because of Auto-scheduler's more efficient search, users should see higher performance for a much lower number of kernel trials. Therefore, the platform includes a default setting of 1,000 kernel trials for Auto-scheduler and a default setting of 2,000 kernel trials for AutoTVM.
Users should keep this 1:2 ratio (Auto-scheduler kernel trials to AutoTVM kernel trials) in mind when adjusting settings away from the defaults. Raising the kernel trial setting for Auto-scheduler can significantly lengthen optimization job times, but may also lead to higher performance, particularly on GPUs.
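As a rough illustration of the 1:2 rule of thumb, the sketch below converts a kernel-trial budget from one engine to the other. The helper names are hypothetical and are not part of the platform or the SDK; the only values taken from the text above are the defaults of 1,000 and 2,000.

```python
# Hypothetical helpers for illustration only; not part of the OctoML SDK.
AUTOSCHEDULER_DEFAULT_TRIALS = 1000  # platform default stated above
AUTOTVM_DEFAULT_TRIALS = 2000        # platform default stated above


def autotvm_to_autoscheduler_trials(autotvm_trials: int) -> int:
    """Halve an AutoTVM budget (1:2 rule of thumb), never going below 1."""
    return max(1, autotvm_trials // 2)


def autoscheduler_to_autotvm_trials(autoscheduler_trials: int) -> int:
    """Double an Auto-scheduler budget to get a comparable AutoTVM budget."""
    return autoscheduler_trials * 2


print(autotvm_to_autoscheduler_trials(AUTOTVM_DEFAULT_TRIALS))
print(autoscheduler_to_autotvm_trials(AUTOSCHEDULER_DEFAULT_TRIALS))
```

The ratio is only a starting point; the best setting for a given model and hardware target may differ.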
Auto-scheduler in the SDK
In the SDK, users can now select either AutoTVM or Auto-scheduler as their TVM acceleration engine.
If the user specifies neither the tuning option nor the advanced parameters, the system defaults to Auto-scheduler. If the user does not specify the tuning option but does pass explicit advanced parameters (i.e., kernel_trials or early_stopping_threshold), the system invokes AutoTVM.
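The defaulting rules can be modeled as a small decision function. This is only a sketch of the behavior described above, not SDK code: the function name, its string return values, and the flat parameter list are all assumptions made for illustration. It reads the second case as falling back to AutoTVM, since kernel_trials is an AutoTVM setting.

```python
from typing import Optional


def select_engine(
    tuning_option: Optional[str] = None,
    kernel_trials: Optional[int] = None,
    early_stopping_threshold: Optional[int] = None,
) -> str:
    """Hypothetical model of which engine the system picks."""
    # An explicit tuning option always wins.
    if tuning_option is not None:
        return tuning_option
    # No tuning option, but explicit advanced parameters: AutoTVM.
    if kernel_trials is not None or early_stopping_threshold is not None:
        return "autotvm"
    # Nothing specified at all: Auto-scheduler is the default.
    return "autoscheduler"


print(select_engine())
print(select_engine(kernel_trials=2000))
```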
The Auto-scheduler SDK setting 'trials_per_kernel' plays the same role as the AutoTVM setting 'kernel_trials': both control how many experiments the system runs to optimize your model. However, when using Auto-scheduler, 'trials_per_kernel' is multiplied by the total number of kernels to produce an overall experiment budget, which is then distributed dynamically, not necessarily evenly, across the tasks, as noted above.
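To make the arithmetic concrete, the sketch below computes the overall Auto-scheduler experiment budget. The function name and the kernel count of 20 are hypothetical; how the platform then splits the budget across tasks is not modeled here.

```python
def total_autoscheduler_experiments(trials_per_kernel: int, num_kernels: int) -> int:
    """Overall experiment budget: trials_per_kernel times the number of kernels."""
    return trials_per_kernel * num_kernels


# Hypothetical model with 20 kernels, using the recommended trials_per_kernel.
budget = total_autoscheduler_experiments(trials_per_kernel=1000, num_kernels=20)
print(budget)
```

With these assumed numbers the budget is 20,000 experiments in total; an individual task may receive more or fewer than 1,000 of them.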
To select Auto-scheduler:
Note: the example settings below are for testing only; for best performance, set trials_per_kernel=1000 and early_stopping_threshold=250.
octomize_workflow = model_variant.octomize(
    platform,
    tuning_options=AutoschedulerOptions(
        trials_per_kernel=3,
        early_stopping_threshold=1,
    ),
)
To select AutoTVM:
Note: the example settings below are for testing only; for best performance, generally set kernel_trials=2000 and early_stopping_threshold=500.
octomize_workflow = model_variant.octomize(
    platform,
    tuning_options=AutoTVMOptions(
        kernel_trials=3,
        early_stopping_threshold=1,
    ),
)