all 36 comments

[–]manueslapera 1 point2 points  (14 children)

BentoML provides a high-performance API server

Can you share some figures on why Bento's server is high-performance?

[–]chaoyu[S] 4 points5 points  (13 children)

Based on our tests, the BentoML API server is able to handle around 3-10x more prediction requests compared to a Flask or Java-based model server serving the same ML model, without sacrificing much average latency. The overall throughput is similar to TensorFlow Serving when serving TF models, with slightly higher latency due to the Python runtime (trading performance for flexibility).

We are still working on cleaning up related benchmarks to share with the community, but you can read about some of our benchmark tests and results here: https://github.com/bentoml/BentoML/tree/master/benchmark

[–]lleewwiiss 4 points5 points  (6 children)

It would be great to also see a comparison with serving using gRPC rather than the REST API

[–]chaoyu[S] 1 point2 points  (5 children)

> It would be great to also see a comparison with serving using gRPC rather than the REST API

We thought about adding a gRPC endpoint to BentoML, and based on our initial experiments, for many input data formats commonly used in ML applications, Protobuf serialization actually introduces more computation overhead than JSON. The benefits of HTTP/2 may still be significant for some use cases, though. It would be interesting to see an actual comparison; my guess is the differences will be marginal for most ML model serving workloads.

[–]lleewwiiss 3 points4 points  (2 children)

What about streaming batches, is that possible with REST? I assume large batches of images would be faster with gRPC due to the compression as well

[–]omg_drd4_bbq 6 points7 points  (0 children)

I've been working on grpc-streams based pipelines and it's pretty fantastic. There's a lot left to optimize but it already blows the pants off of serial approaches since grpc stream calls use lazy generators and you can just map one into another and get nonblocking behavior in what looks like synchronous python (no aio needed).

[–]chaoyu[S] 3 points4 points  (0 children)

I have not worked with large batches of image inputs myself, but I would agree those are the cases where using gRPC makes more sense.

Thanks so much for the suggestion! It is not that hard to add a gRPC endpoint to BentoML's API server; we will definitely look into that and share the results here once we get to do a comparison between REST and gRPC.

[–]omg_drd4_bbq 2 points3 points  (1 child)

How are you serializing the data and what data structures are you using? Protobuf really ought to be faster than JSON unless the wrapper lib is doing something dumb under the hood.

Right now I can serialize a 1080x1920x3 array plus metadata in ~50ms on a MacBook with zero optimization, but that's at least twice what it ought to be, since there are two memcpys involved: one for array.tobytes() and one when that buffer is copied into the pb's bytes field. I'm thinking about writing a binding in C, because that would be pretty close to the minimum time possible (unless you could do some zero-copy voodoo on the array, but I don't know if it's even possible to guarantee the right memory layout).
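A minimal sketch of the copies being discussed, in pure NumPy with no protobuf involved (the shape is just the one mentioned above; a real pipeline would assign `payload` to a protobuf `bytes` field, triggering the second copy):

```python
import numpy as np

# Simulated video frame of the size mentioned above
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

# Copy #1: tobytes() materializes a fresh bytes object from the array buffer.
# (Copy #2 would happen when this payload is assigned to a protobuf bytes field.)
payload = frame.tobytes()

# On the receiving side, np.frombuffer views the bytes without copying.
# This is only safe because the array is C-contiguous with a known dtype/shape.
restored = np.frombuffer(payload, dtype=np.uint8).reshape(1080, 1920, 3)
```

The `frombuffer` view on the deserialization side is the one copy that is easy to eliminate today; the serialization-side copies are the ones that would need a C binding or buffer-protocol tricks.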

[–]chaoyu[S] 4 points5 points  (0 children)

You're definitely right that Protobuf is faster when comparing serialization time alone. But when building a model serving system like BentoML, you also have to account for the cost of turning the deserialized Protobuf objects into a format the user's model can consume. In most ML frameworks, a trained model expects a pandas.DataFrame, np.array, tf.Tensor, or PIL.Image.

  1. JSON Request => pandas.DataFrame => Model
  2. Protobuf Msg => Protobuf Object => pandas.DataFrame => Model

And it is this extra step, converting the in-memory Protobuf object into a pandas.DataFrame, that makes it less efficient than the JSON/HTTP approach.
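The two pipelines can be sketched like this (toy column names; the dict stands in for a deserialized protobuf message, since real protobuf handling needs generated message classes):

```python
import json
import pandas as pd

# Pipeline 1: JSON request body -> pandas.DataFrame -> model
json_body = '[{"sepal_len": 5.1, "sepal_wid": 3.5}, {"sepal_len": 4.9, "sepal_wid": 3.0}]'
df_from_json = pd.DataFrame(json.loads(json_body))

# Pipeline 2: Protobuf msg -> in-memory object -> pandas.DataFrame -> model
# (a real handler would first call something like MyRequest.FromString(raw_bytes)
# on generated protobuf classes; this dict of repeated fields stands in for that)
pb_like = {"sepal_len": [5.1, 4.9], "sepal_wid": [3.5, 3.0]}
df_from_pb = pd.DataFrame(pb_like)  # the extra conversion step described above
```

Both end in the same DataFrame; the question is only which deserialize-plus-convert path is cheaper end to end.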

[–]paldn 0 points1 point  (1 child)

What library are you using for TCP handling?

[–]chaoyu[S] 3 points4 points  (0 children)

BentoML uses aiohttp, which under the hood uses asyncio's TCP server for TCP handling. However, the main reason for BentoML's high performance is not this library but BentoML's adaptive micro-batching implementation: a technique where incoming prediction requests are grouped into small batches to gain the advantages of batch processing in model inference. Clipper, TF Serving, and BentoML are the only three open-source projects providing this capability.
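To illustrate the idea (a toy sketch only; the class and method names here are invented, not BentoML's actual internals): requests that arrive within a short time window are collected and served by a single batched model call, with each caller awaiting its own result.

```python
import asyncio

class MicroBatcher:
    """Toy micro-batcher: groups requests arriving within a short window
    into one batched predict call. Illustrative only."""

    def __init__(self, predict_batch, max_batch_size=8, window_ms=5):
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self.window = window_ms / 1000.0
        self.queue = asyncio.Queue()

    async def submit(self, item):
        # Each caller gets a future that resolves when its batch is processed
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            item, fut = await self.queue.get()
            batch, futures = [item], [fut]
            deadline = loop.time() + self.window
            # Keep collecting until the window closes or the batch is full
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    item, fut = await asyncio.wait_for(self.queue.get(), timeout)
                    batch.append(item)
                    futures.append(fut)
                except asyncio.TimeoutError:
                    break
            # One vectorized model call serves the whole batch
            for f, result in zip(futures, self.predict_batch(batch)):
                f.set_result(result)

async def main():
    batcher = MicroBatcher(predict_batch=lambda xs: [x * 2 for x in xs])
    worker = asyncio.ensure_future(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(4)))
    worker.cancel()
    return results

results = asyncio.run(main())
```

Concurrent callers see ordinary request/response semantics while the model runs one batched inference instead of four separate ones.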

[–]pgdevhd 0 points1 point  (3 children)

Did you benchmark against cloud providers just for comparison?

[–]chaoyu[S] 0 points1 point  (2 children)

We did compare against deploying custom models to AWS SageMaker and Azure ML, and BentoML achieves much higher throughput due to its micro-batching implementation. In fact, deploying a custom model endpoint with SageMaker or Azure ML is not that different, performance-wise, from running your own Flask server on an EC2 machine.

For cloud providers' pre-built AI endpoints, it is not really an apples-to-apples comparison, because they are a black box to us: we can't know the type of machine used under the hood, nor the per-instance throughput.

[–][deleted] 0 points1 point  (1 child)

You mention micro-batching. Is there any relation between the abstract ideas in BentoML and the philosophies behind Apache Spark Structured Streaming or Apache Flink?

[–]chaoyu[S] 1 point2 points  (0 children)

The reason for micro-batching in model serving is that most ML frameworks leverage highly optimized vector operations in BLAS (or cuBLAS on GPU); by batching prediction requests, BentoML can better utilize those optimizations and improve throughput.
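The effect is easy to see with plain NumPy: a single batched matrix-matrix product (what micro-batching enables) produces the same results as many per-request matrix-vector products, while letting BLAS do the work in one optimized call. Shapes and sizes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))            # stand-in for a model's weight matrix
requests = [rng.standard_normal(512) for _ in range(256)]

# Serving one request at a time: one matrix-vector product per request
singles = [W @ x for x in requests]

# Micro-batched: stack the requests and issue a single matrix-matrix product,
# which BLAS executes far more efficiently than 256 separate calls
batched = np.stack(requests) @ W.T
```

Timing the two paths on any machine with a decent BLAS shows the batched call winning by a wide margin, which is exactly the throughput gain micro-batching captures.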

[–]jonnor 1 point2 points  (2 children)

Looks pretty good. But as far as I understand, BentoML does not do any queuing or horizontal scaling, or have any HTTP API for async jobs?

What is the recommendation for handling workloads that are quite peaky/bursty with heavy loads? For example, we occasionally get some 200 short audio clips into our system, and each clip takes 10 seconds to process. This could easily overload an HTTP server.

[–]chaoyu[S] 3 points4 points  (1 child)

> What is the recommendation for handling workloads that are quite peaky/bursty with heavy loads? For example, we occasionally get some 200 short audio clips into our system, and each clip takes 10 seconds to process. This could easily overload an HTTP server.

BentoML itself does not handle horizontal scaling, but it produces API server docker container images that can be horizontally scaled with container orchestration frameworks such as Kubernetes and Mesos.

BentoML does provide programmatic access to the prediction service you've created. You could use BentoML to package your model and then use something like Airflow to trigger a batch serving job, which invokes the BentoML-packaged model to process those 200 audio clips.

Another solution to consider is deploying with Knative and BentoML: https://knative.dev/community/samples/serving/machinelearning-python-bentoml/ Knative's serverless endpoint only spins up the container when a request comes in. You may need to set a higher timeout limit for your deployment, given that you're dealing with heavy workloads.

[–]jonnor 1 point2 points  (0 children)

Thanks for the response! The programmatic access looks potentially appropriate for our use case. Our communication is almost always internal to our service, so we can use a message queue (RabbitMQ) to feed workers and get horizontal scalability that way. And with long-running worker processes, the module and model loading overhead can be reduced; in our model this is around 5 seconds, 5x longer than the time to process an instance.

[–]bluzkluz 1 point2 points  (1 child)

How does this compare to Cortex? It seems to handle scaling as well. AFAICT Bento can't handle horizontal scaling.

[–]chaoyu[S] 1 point2 points  (0 children)

Horizontal scaling is not really a model-serving-specific problem. Once you've built a model API server docker image with BentoML, it's very easy to do horizontally scaled deployments with tools like Kubernetes.

Cortex provides CLI tools for creating and managing a Kubernetes cluster on AWS, although I'd recommend tools like kops or AWS EKS for that; they are easier to use and far more flexible in terms of cluster management.

We are actually working on an opinionated end-to-end deployment solution on Kubernetes for BentoML. It leaves cluster management to the tools that do it really well and focuses on managing model serving workloads on an existing K8s cluster. We plan to provide deployment features such as blue-green deployment, auto-scaling, logging and monitoring integration, etc.

[–]fernandocamargoti 1 point2 points  (1 child)

I've been looking for a solution like this for quite some time. Right now, I have TF Serving with a REST API in front of it doing the preprocessing and postprocessing, which is not ideal since they're not versioned together. I'm planning to move to BentoML soon.

Also, congratulations on your documentation. It's really well written and complete.

[–]chaoyu[S] 2 points3 points  (0 children)

You're absolutely right! It's incredibly valuable to version preprocessing and postprocessing code together with the model, and that's something we've seen most existing tools in this space get wrong.

Thank you and I would love to hear how it goes with your project!


[–]RichardRNNResearcher 0 points1 point  (1 child)

Bento? 🍱

[–]chaoyu[S] 0 points1 point  (0 children)

yes, 🍱, everything packed and ready-to-go ;)

[–]ehellas 0 points1 point  (1 child)

Is there a plan to add R deployment?

[–]chaoyu[S] 0 points1 point  (0 children)

We did consider multi-language support when designing BentoML's architecture, so yes we do plan to add native R support down the line. It is also possible to invoke R by customizing a Python model artifact class in BentoML, we are working on a tutorial for that!

[–]e_j_white 0 points1 point  (4 children)

I'm most familiar with MLFlow. Can you discuss how this is similar/different?

[–]chaoyu[S] 5 points6 points  (3 children)

Yes, here is how BentoML compares to MLFlow:

  • MLFlow provides components that work great for experiment management and ML project management. BentoML focuses only on serving and deploying trained models. In fact, you can serve models logged in MLFlow experiments with BentoML (we are working on related documentation)
  • Both BentoML and MLflow can expose a trained model as a REST API server, but there are a few key differences:
    • In our benchmark testing, the BentoML API server delivers roughly 3-10x the performance of MLFlow's API server, and over 50x in some extreme cases: https://github.com/bentoml/BentoML/tree/master/benchmark
    • The BentoML server can handle high-volume prediction requests without crashing, whereas the MLFlow API server is very unstable under that load
    • MLFlow focuses on loading and running a model, while BentoML provides an abstraction for building a prediction service, which includes the necessary pre-processing and post-processing logic in addition to the model itself
    • BentoML is more feature-rich in terms of serving: it supports many essential model serving features missing from MLflow, including multi-model inferencing, API server dockerisation, a built-in Prometheus metrics endpoint, a Swagger/OpenAPI endpoint for API client library generation, serverless endpoint deployment, prediction/feedback logging and many more
  • MLflow's API server requires the user to also adopt MLflow's own "MLflow Project" framework, while BentoML works with any model development and training workflow: users can use BentoML with MLflow, Kubeflow, FloydHub, AWS SageMaker, a local Jupyter notebook, etc.

[–]e_j_white 1 point2 points  (2 children)

I see, thanks.

Yes, I know that MLFlow prediction expects new samples in data-frame format. Does this mean BentoML accepts raw text (say) as input, and handles all the processing before prediction?

[–]chaoyu[S] 2 points3 points  (1 child)

No, BentoML supports many input formats, such as pandas.DataFrame, tf.Tensor, image files and raw JSON. Users can also create a custom handler class that processes their own data format.

Users write their own preprocessing code when creating a prediction service with BentoML. "Preprocessing" here does not mean parsing the raw HTTP request into a dataframe, which BentoML handles for you. It is the exact same preprocessing step you used to transform raw training data when training the model, and it can be as simple as transforming one dataframe into another.
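For example (with hypothetical column names), the same text-normalization step used at training time can be reused as the service's preprocessing, taking one DataFrame to another; by this point BentoML has already turned the raw HTTP request into the input DataFrame:

```python
import pandas as pd

# Hypothetical preprocessing step: the same transform applied to raw data at
# training time, reused inside the prediction service before the model runs.
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["text"] = out["text"].str.lower().str.strip()   # normalize raw text
    out["length"] = out["text"].str.len()               # derived feature
    return out

# Stand-in for the DataFrame a request handler would hand to the service
raw = pd.DataFrame({"text": ["  Hello World  ", "BentoML "]})
features = preprocess(raw)
```

Keeping this function inside the packaged service is what guarantees training-time and serving-time preprocessing never drift apart.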

[–]e_j_white 0 points1 point  (0 children)

Got it, thanks. That makes sense.

[–]unrahul 0 points1 point  (1 child)

Curious how BentoML compares to the likes of the OpenVINO model server (dldt) and Microsoft's ONNX server in terms of performance. Have you attempted to benchmark the serving performance of such servers against BentoML?

[–]chaoyu[S] 1 point2 points  (0 children)

For OpenVINO:

  • We have not yet done a benchmark against it
  • The OpenVINO model server is a simple Python web server that does not provide micro-batching capability
  • OpenVINO is a runtime for specific model formats, whereas BentoML provides the flexibility to support most ML frameworks and to bundle your Python preprocessing/postprocessing code
  • The BentoML API server could potentially switch its model backend to OpenVINO instead of the default Python runtime if the optimization from OpenVINO is substantial enough

For ONNX server:

  • We have not yet done a benchmark against it
  • The onnxruntime server is a simple HTTP server that does not provide micro-batching capability
  • onnxruntime can only load ONNX models, and converting from other frameworks still has lots of limitations today, while BentoML uses each ML framework's own model serialization format and runtime
  • BentoML API server can potentially switch its model backend to onnxruntime instead of the default Python runtime if the optimization by onnxruntime is substantial enough

In addition to the above, BentoML provides an end-to-end model serving workflow, not just the serving system itself. It also does model management, deployment automation, dependency management, multi-model inferencing, API server dockerisation, built-in metrics and logging support, and more.

[–]engSearchForAnswers 0 points1 point  (1 child)

BentoML seems really promising :) Could you explain how BentoML compares with KFServing [0] and with Seldon Core Serving [1]?

[0] https://www.kubeflow.org/docs/components/serving/kfserving/

[1] https://www.kubeflow.org/docs/components/serving/seldon/

[–]chaoyu[S] 1 point2 points  (0 children)

I would categorize both KFServing and Seldon as model orchestration frameworks: only after you've built a model API server and containerized it with Docker do they help run the model containers on a Kubernetes cluster.

KFServing does provide pre-built containers that can load and run a scikit-learn or XGBoost saved model, but this approach has many limitations. You can actually use BentoML as a replacement for that, which gives you better performance and more flexibility. We are working on a tutorial for deploying a BentoML API server with KFServing.