We've just released new improvements!

TFLite and ONNX-RT benchmarking are now available

  • TFLite and ONNX-RT benchmarking are available on x86 cloud devices and Arm Cortex-A 64-bit CPUs for models uploaded in TFLite format.

  • For TFLite benchmarking, the platform sets the number of threads to vCPU_count / 2 on hyperthreading targets and to vCPU_count on non-hyperthreading targets. We have found that this setting utilizes a machine's threads better than TFLite's default and thus generally yields better performance results. It also matches the default number of threads used for TVM benchmarking within our product (tvm_num_threads).

  • The platform does not enable XNNPACK. TFLite's roadmap includes plans to enable XNNPACK by default in the future, at which point we will support the same behavior in our product.

  • We support TF 2.0–2.6. TFLite operators listed here that are convertible to ONNX opset version 11 or 13 are supported.

  • Models uploaded in non-TFLite formats (e.g., ONNX, TensorFlow) are not available for TFLite benchmarking.

  • If you wish to access TFLite benchmarking via the SDK, please upgrade your SDK version to the latest by running:

python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade
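The thread-count rule above can be sketched as follows. This is a minimal illustration of the policy, not platform code; the function name and the example vCPU counts are our own.

```python
def tflite_num_threads(vcpu_count: int, hyperthreading: bool) -> int:
    """Thread count the platform uses for TFLite benchmarking:
    vCPU_count / 2 on hyperthreading targets, vCPU_count otherwise.
    """
    return vcpu_count // 2 if hyperthreading else vcpu_count

# Hypothetical targets: a 96-vCPU cloud instance with hyperthreading,
# and an 8-core Arm Cortex-A device without it.
print(tflite_num_threads(96, hyperthreading=True))   # -> 48
print(tflite_num_threads(8, hyperthreading=False))   # -> 8
```

On a hyperthreading machine, two vCPUs share one physical core, so pinning one thread per physical core (vCPU_count / 2) avoids oversubscribing the cores.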

We just launched a new feature called "Projects" in the Platform that helps you organize your models into folders and makes it easier to compare performance results.

  • For example, you can create a project for your object detection task and add multiple models such as MobileNet and ResNet into the project. On the landing page of the web UI, you will now see a tab for projects you’ve created, projects created by your teammates, as well as “Detached Models,” which are models not associated with any project.

  • Models in the "Detached Models" tab can be moved into an existing project or into a new one.

  • The projects feature is also available in the SDK. For examples of how to use this feature programmatically, please see https://app.octoml.ai/docs/sdk.html

We also updated our web UI to make it easier to compare performance results.

  • Performance results can be compared across training frameworks and hardware targets!

  • After running acceleration workflows on any model, go to the "Compare" tab for the model. There, you'll see a graph with latency on the y-axis and inference engines along the x-axis. Different hardware targets are represented by different colors.

  • In the SqueezeNet example below, the bar with the lowest latency corresponds to a TVM workflow on an AWS c5.24xlarge. On this hardware, TVM acceleration improved latency by 4.5x compared to the TensorFlow baseline.
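A speedup figure like the 4.5x above is simply the ratio of baseline latency to accelerated latency. A minimal sketch, with hypothetical latency numbers chosen to match that ratio:

```python
def speedup(baseline_latency_ms: float, accelerated_latency_ms: float) -> float:
    """Speedup = baseline latency divided by accelerated latency."""
    return baseline_latency_ms / accelerated_latency_ms

# Hypothetical measurements: a 9.0 ms TensorFlow baseline vs. a 2.0 ms TVM result.
print(speedup(9.0, 2.0))  # -> 4.5
```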
