OctoML Accelerated Model Hub
Prediction times for the same TensorFlow or PyTorch model can range from under a millisecond to around a second depending on the hardware it runs on. With hundreds of possible hardware targets to choose from across multiple clouds and edge devices, the choice can seem bewildering. But an incorrectly chosen hardware target can be costly and delay your project. So how do you pick the right one?
WHY SIGN UP?
Demystify your ML hardware selection
FEATURE COMPARISON
What makes the Accelerated Model Hub different?
See how your model performs on over 70 CPUs, GPUs and accelerators across multiple clouds and edge devices, including hardware from Intel, NVIDIA, Arm and AMD.
MODEL HUB INTRODUCTION
How do you use the Accelerated Model Hub?
Almost all state-of-the-art models are now adapted from a select few foundation models such as GPT-2, BERT and T5 for language tasks or ResNet and YOLO for vision applications. The rise of foundation models that are trained on massive data at scale and then adapted or fine-tuned to a wide range of downstream tasks has created a paradigm shift in the field of artificial intelligence.
OctoML’s Accelerated Model Hub was created for the era of foundation models.
Benchmarking that gives you a realistic understanding of how fast a model runs on specific hardware targets at the start of your project
Fast local inference, with none of the network round-trip lag of remote inference APIs.
Automatically accelerated foundation ML models ready for download
Use of multiple acceleration engines to ensure you’re getting the best speed
100+ inference benchmarks
Featuring models with sub-millisecond inference
Download and integrate any of our hub models into your DevOps and MLOps pipelines (see the sketch below)
OctoML uses Apache TVM, ONNX Runtime, TensorRT, and TensorFlow Lite (with OpenVINO & CUDA support on the way)
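To make the download-and-integrate step concrete, here is a minimal sketch of timing local inference on a hub model with ONNX Runtime. The file name, input handling, and run counts are illustrative assumptions, not actual hub artifacts or an official OctoML workflow:

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical file name; substitute the accelerated model you downloaded.
session = ort.InferenceSession("accelerated_model.onnx",
                               providers=["CPUExecutionProvider"])

# Build a random input; replace any symbolic/dynamic dims with 1 for this rough test.
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
data = np.random.rand(*shape).astype(np.float32)

# Warm up, then time repeated runs to estimate steady-state local latency.
for _ in range(10):
    session.run(None, {inp.name: data})

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, {inp.name: data})
print(f"mean latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```

The same loop also makes the "no network round-trip" point measurable: everything runs in-process, so what you time is the model itself.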
Unlock inference results for all 12 foundation models
Get a free PDF with inference results for all 12 foundation models, along with a 5-page introduction to the Accelerated Model Hub and foundation models.
Here's how it works:
1. Pick the ideal ML model and hardware target combination using the Accelerated Model Hub
2. Fine-tune the model using your custom data sets, then accelerate it using OctoML (see the sketch below)
3. Deploy the accelerated model to your hardware
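As a loose illustration of step 2, here is a minimal fine-tuning sketch using the Hugging Face transformers Trainer with BERT-base-uncased, one of the hub's featured models. OctoML's own fine-tuning workflow isn't described on this page, so the dataset, hyperparameters, and output path are assumptions for demonstration:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# The public IMDB dataset stands in here for "your custom data sets".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb", split="train[:2000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-bert",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("finetuned-bert")  # hand the result off for acceleration
```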
Explore the Accelerated Model Hub to unlock 100+ realistic inference benchmarks and download sub-millisecond models accelerated with Apache TVM, ONNX Runtime, TensorRT, and TensorFlow Lite (with OpenVINO & CUDA support coming later this year).
You can quickly integrate any of our Hub models or your custom models into your DevOps and MLOps pipelines.
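For a model accelerated with Apache TVM, dropping it into a pipeline might look like the following minimal sketch. The library file name, input tensor name, and shape are hypothetical, and the code assumes a model compiled with TVM's Relay flow and exported as a shared library:

```python
import numpy as np
import tvm
from tvm.contrib import graph_executor

# Hypothetical artifact name; substitute the compiled library you downloaded.
lib = tvm.runtime.load_module("resnet50_accelerated.so")
dev = tvm.cpu(0)  # or tvm.cuda(0) for a GPU target

# Relay's standard packaging exposes the compiled graph under the "default" key.
module = graph_executor.GraphModule(lib["default"](dev))

# "data" and the NCHW shape are assumptions; match your model's real input.
module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
output = module.get_output(0).numpy()
print(output.shape)
```

Models downloaded in ONNX form can instead go straight into an onnxruntime.InferenceSession, as in the benchmarking sketch above.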
12 foundation models for your language and vision use cases
At launch we are featuring 7 foundation language models and 5 foundation vision models, accelerated with both Apache TVM and ONNX Runtime across 11 hardware platforms. Featured models include:
GPT-2
BERT-base-uncased
RoBERTa-base
MobileNetv2
YOLOv5
ResNet-50v1
Choose the best ML model and hardware for the job
The OctoML Accelerated Model Hub shows you the inference performance for state-of-the-art deep learning models across CPUs, GPUs and accelerators.
Picking the best ML model and hardware target for your AI application has never been easier.
Get the PDF to see inference times for all 12 foundation models