We've just released new improvements!
TensorFlow model upload is now available on our Platform, and benchmarking is now available on CPUs (GPU benchmarks coming soon)!
Users are now able to upload models in Keras SavedModel or TensorFlow GraphDef formats.
These models can be benchmarked using the TVM Runtime or ONNX-RT on any cloud target we offer and on our 64-bit Arm Cortex-A CPU edge devices, as well as in TensorFlow's Runtime on CPUs. The same support will be extended to GPUs and the rest of our edge devices in the coming weeks.
TensorFlow is a vast ecosystem of model formats, frontends, backends, and additional code libraries and extensible configurations. We currently support the Keras SavedModel and TensorFlow GraphDef formats for models trained with TF 2.0-2.2, and we support TensorFlow operators that are convertible to ONNX opset version 11 or 13, listed here.
We plan to add support for models trained in TF versions up to 2.6 in the coming weeks. If you have a question or request about a specific capability, please reach out via chat or directly to our Customer Success team.
To view more details and tutorials on how to use our TensorFlow feature, see here.
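As a quick orientation, here is one way to produce an upload-ready Keras SavedModel with TF 2.x; the model architecture and output path are illustrative, and the upload step itself happens in the Platform UI or SDK as described above.

```python
import tensorflow as tf

# A minimal illustrative Keras model; any trained tf.keras model
# (TF 2.0-2.2 for current Platform support) exports the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(2),
])

# Export in the SavedModel format. The resulting directory
# (saved_model.pb plus a variables/ subdirectory) can then be
# archived and uploaded to the Platform.
tf.saved_model.save(model, "my_saved_model")
```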
Performance improvement of 10-25% on the vast majority of models from NHWC layout transformations with the TVM auto-scheduler
OctoML regularly introduces the most cutting-edge TVM capabilities into our Deployment Platform to help users attain better performance on their models. This week, we're releasing NHWC layout transformations for the auto-scheduler, which yield a performance improvement of 10-25% on the vast majority of publicly available models.
To access the performance improvements, simply run a new TVM workflow on your models and compare the benchmarked runtime against previous results in the Workflows tab.
On Broadwell CPUs, Cascade Lake CPUs, and T4 and K80 GPUs, we saw performance improvements using the auto-scheduler on models such as gpt2, mobilenetv1, mobilenetv2, resnet50v1, resnet50v2, ssd, vgg19, and yolov3.
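For readers unfamiliar with layout transformations, the sketch below shows what NHWC means at the tensor level; the shapes are illustrative, and the actual transformation is applied automatically inside the TVM compilation pipeline.

```python
import numpy as np

# A batch of activations in NCHW layout: (batch, channels, height, width).
x_nchw = np.zeros((1, 3, 224, 224), dtype=np.float32)

# The NHWC layout moves channels to the innermost dimension,
# (batch, height, width, channels), which typically gives generated
# kernels contiguous, vector-friendly access along the channel axis.
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))

print(x_nhwc.shape)  # (1, 224, 224, 3)
```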
Automatic packaging in both Python wheel and .so formats
We will now automatically package your TVM-accelerated model in both Python wheel and .so formats whenever you run the Accelerate UX in the web UI or the octomize() function in the SDK. If you wish to override this default behavior in the SDK, specify create_package=False when calling the octomize() function. This feature does not affect how users are billed.
Benchmarking on 64-bit Arm Cortex-A72 Raspberry Pis in ONNX-RT and TensorFlow's Runtime
OctoML is continuing to extend support for edge devices. In addition to TVM autotuning, benchmarking, and packaging capabilities, users can now run benchmark workflows in TensorFlow's Runtime on the 64-bit Raspberry Pis for models uploaded in Keras SavedModel or TensorFlow GraphDef format.
ONNX-RT benchmarking is also available on the 64-bit Raspberry Pis for models uploaded in ONNX format.
The same support is already available on all cloud targets and will be extended to our other edge devices in the coming weeks.