OctoML Accelerated Model Hub

Prediction times for the same TensorFlow or PyTorch model can vary from under a millisecond to around a second depending on the hardware it runs on. With hundreds of possible hardware targets to choose from across multiple clouds and edge devices, the choice can seem bewildering, and an incorrectly chosen hardware target can be costly and delay your project. So how do you pick the right one?

Demystify your ML hardware selection

What makes the Accelerated Model Hub different?

FEATURE COMPARISON

See how your model performs on over 70 CPUs, GPUs and accelerators across multiple clouds and edge devices, including hardware from Intel, NVIDIA, Arm and AMD.

MODEL HUB INTRODUCTION

How do you use the Accelerated Model Hub?

Almost all state-of-the-art models are now adapted from a select few foundation models such as GPT-2, BERT and T5 for language tasks or ResNet and YOLO for vision applications. The rise of foundation models that are trained on massive data at scale and then adapted or fine-tuned to a wide range of downstream tasks has created a paradigm shift in the field of artificial intelligence.

OctoML’s Accelerated Model Hub was created for the era of foundation models.

Unlike other model hubs and zoos, the Accelerated Model Hub gives you:

  • Benchmarking that gives you a realistic understanding of how fast a model runs on specific hardware targets, at the start of your project
  • Fast local inference, with no network round-trip lag from remote inference APIs
  • Automatically accelerated foundation ML models, ready for download
  • Multiple acceleration engines, to ensure you're getting the best speed

  • 100+ inference benchmarks
  • Models featuring sub-millisecond inference
  • Download and integrate any of our Hub models into your DevOps and MLOps pipelines

OctoML uses Apache TVM, ONNX Runtime, TensorRT, and TensorFlow Lite (with OpenVINO & CUDA support on the way).
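
To give a feel for what one of those engines does, here is a minimal compilation sketch with Apache TVM. The model file, input name, and shape are assumptions for illustration, not artifacts from the Hub:

```python
# Minimal sketch: compiling an ONNX model with Apache TVM, one of the engines
# named above. Filename, input name, and shape are illustrative assumptions.
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("resnet50_v1.onnx")        # assumed model file
shape_dict = {"data": (1, 3, 224, 224)}           # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a generic x86 CPU; swap target="cuda" for NVIDIA GPUs.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

lib.export_library("resnet50_tvm.so")             # shared library for deployment
```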

12 foundation models for your language and vision use cases

At launch we are featuring 7 foundation language models and 5 foundation vision models, accelerated with both Apache TVM and ONNX Runtime across 11 hardware platforms.

  • GPT-2
  • BERT-base-uncased
  • RoBERTa-base
  • ResNet-50v1
  • YOLOv5
  • MobileNetv2
[Inference latency charts for each of the six models above; lower is better]
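
As a rough illustration of what such a latency benchmark measures, here is a minimal timing sketch using ONNX Runtime. The model file, input shape, and iteration counts are assumptions, not OctoML's benchmarking harness:

```python
# Illustrative latency measurement with ONNX Runtime. The model file,
# input shape, and iteration counts below are assumptions.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet50_v1.onnx", providers=["CPUExecutionProvider"])
inputs = {sess.get_inputs()[0].name: np.random.rand(1, 3, 224, 224).astype(np.float32)}

for _ in range(10):                          # warm-up runs, excluded from timing
    sess.run(None, inputs)

latencies_ms = []
for _ in range(100):
    start = time.perf_counter()
    sess.run(None, inputs)
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"mean {np.mean(latencies_ms):.2f} ms, p90 {np.percentile(latencies_ms, 90):.2f} ms")
```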

WHY SIGN UP?

Unlock inference results for all 12 foundation models

Get a free PDF with inference results for all 12 foundation models, along with a 5-page introduction to the Accelerated Model Hub and foundation models.

Here's how it works:

1. Pick the ideal ML model and hardware target combination using the Accelerated Model Hub.
2. Fine-tune the model using your custom data sets, then accelerate it using OctoML.
3. Deploy the accelerated model to your hardware, as in the sketch below.
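
For step 3, a deployment can be as small as loading the compiled artifact and running it. The sketch below assumes the TVM-compiled shared library from the earlier example and a hypothetical input named "data":

```python
# Hypothetical deployment snippet: load a TVM-compiled module and run one
# inference with the graph executor. File and input names are assumptions.
import numpy as np
import tvm
from tvm.contrib import graph_executor

lib = tvm.runtime.load_module("resnet50_tvm.so")   # compiled library (assumed name)
dev = tvm.cpu(0)                                   # use tvm.cuda(0) on an NVIDIA GPU
module = graph_executor.GraphModule(lib["default"](dev))

module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
output = module.get_output(0).numpy()
print(output.shape)
```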

Request a Demo

Explore the Accelerated Model Hub to unlock 100+ realistic inference benchmarks and download sub-millisecond models accelerated with Apache TVM, ONNX Runtime, TensorRT, and TensorFlow Lite (with OpenVINO & CUDA support coming later this year). 

You can quickly integrate any of our Hub models, or your custom models, into your DevOps and MLOps pipelines.
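
As one example of that kind of integration, a CI pipeline might gate merges on a latency budget. This is a hedged sketch, not an OctoML API; the model path and the 5 ms budget are made-up values:

```python
# Sketch of a pytest-style latency gate for a CI pipeline. The model path,
# input shape, and latency budget are illustrative assumptions.
import time
import numpy as np
import onnxruntime as ort

LATENCY_BUDGET_MS = 5.0                      # assumed service-level target

def p90_latency_ms(model_path: str, runs: int = 100) -> float:
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inputs = {sess.get_inputs()[0].name:
              np.random.rand(1, 3, 224, 224).astype(np.float32)}
    for _ in range(10):                      # warm-up runs, excluded from timing
        sess.run(None, inputs)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        sess.run(None, inputs)
        samples.append((time.perf_counter() - start) * 1000)
    return float(np.percentile(samples, 90))

def test_accelerated_model_meets_budget():
    assert p90_latency_ms("models/resnet50_v1.onnx") <= LATENCY_BUDGET_MS
```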

Choose the best ML model and hardware for the job

The OctoML Accelerated Model Hub shows you the inference performance for state-of-the-art deep learning models across CPUs, GPUs and accelerators.

Picking the best ML model and hardware target for your AI application has never been easier.

Request a demo to see how your deep learning model performs on more than 70 CPUs, GPUs and accelerators across multiple clouds and edge devices, including hardware from Intel, NVIDIA, Arm and AMD.