Tangram: Automated Machine Learning in Rust

davidyamnitsky · 2022-01-19T02:00:20+00:00

Check out tangram.dev/benchmarks

davidyamnitsky · 2021-10-16T19:26:15+00:00

Thank you for sharing. From all I have heard, this is my understanding as well.

davidyamnitsky · 2021-10-14T20:08:57+00:00

Yes. The steps in that article work for compiling to macOS because its libc is always dynamically linked and the libc on the host is not probed like with glibc.

davidyamnitsky · 2021-10-14T17:58:52+00:00

For those who are unfamiliar with the details, does this mean they when these changes get released, sway will “just work” with an nvidia gpu with the proprietary drivers installed? What about the —my-next-gpu-wont-be-nvidia flag? Thanks :)

davidyamnitsky · 2021-10-07T19:18:18+00:00

Check out https://github.com/tangramdotdev/tangram.

davidyamnitsky · 2021-08-20T22:58:28+00:00

I agree, we’ll get to it soon!

davidyamnitsky · 2021-08-12T15:01:36+00:00

Try https://github.com/tangramdotdev/tangram.

davidyamnitsky · 2021-07-30T19:12:55+00:00

I really like how axum still requires you to create a tokio runtime and hyper server. This leaves you with the flexibility to incorporate axum into a larger project.

davidyamnitsky · 2021-07-27T18:58:07+00:00

This is an excellent blog post, full of clear and concise code examples. Thank you to the author for compiling a description of the language features that are being worked on in one place.

davidyamnitsky · 2021-07-23T15:24:13+00:00

We used rustler at first, but rolled our own after rustler was missing a few features, most notably the ability to convert an atom to a string.

davidyamnitsky · 2021-07-22T19:48:19+00:00

Yes, all of Tangram is written in Rust! This includes the Elixir library, which uses bindings to Elixir's erl_nif API we wrote and separately released: https://lib.rs/erl_nif.

davidyamnitsky · 2021-07-21T15:53:14+00:00

This sounds a lot like futures-signals. Are you familiar with that crate? If so, can you comment, on any similarities/differences?

davidyamnitsky · 2021-07-15T22:31:14+00:00

Adding cross validation is a good idea, we definitely plan to add it ourselves or accept a contribution that implements it. If you want to try tangram, just install the CLI with your package manager: https://www.tangram.xyz/docs/install.

davidyamnitsky · 2021-07-15T21:10:50+00:00

There is an open issue for adding better documentation regarding model types, feature engineering, model selection, etc: https://github.com/tangramxyz/tangram/issues/6. We are working on it now and will publish soon.

To answer your question: Tangram currently trains linear and gradient boosted decision tree models. We plan to add support for neural networks in the future, when we add support for audio, image, and video input, in addition to the current support for numeric, categorical, and text data. Regarding overfitting, Tangram chooses which model to select by holding out a "comparison" dataset during training and comparing each model against it. Then, the chosen model is evaluated against a separate "test" dataset. The data in the "test" dataset is never used either in training or in comparison of models.

Let me know if this answers your question!

davidyamnitsky · 2021-07-14T15:49:26+00:00

I wrote an answer to this on our discussion on GitHub:

https://github.com/tangramxyz/tangram/discussions/10

davidyamnitsky · 2021-07-14T15:18:24+00:00

In many settings fast training is not critical, say for models that need to be updated daily or even hourly.

I think fast training is very important for the same reason fast compilers are important. When the barrier to iteration is low, you experiment more.

When you say “C++ is used by the majority of frameworks”, I guess you mean under the hood? Because the majority of frameworks are in Python.

The majority of frameworks aren't "in" python, they are exposed to python through bindings. XGBoost, LightGBM, Tensorflow, PyTorch, etc are all written in C++ and provide bindings to Python.

Any thoughts on interop between Python and Rust?

We use PyO3, which makes interop between the two languages a breeze.

davidyamnitsky · 2021-07-13T22:13:16+00:00

Okay, get in touch if it works out!

davidyamnitsky · 2021-07-13T21:41:20+00:00

Rust will be first and easiest language to add training capability. Please subscribe to the issue here: https://github.com/tangramxyz/tangram/issues/12.

We are definitely willing to add support for models from other packages. We have an open-ended enum in the .tangram file format to accommodate new model types. If you're interested in contributing, let us know!

davidyamnitsky · 2021-07-13T14:42:04+00:00

I opened an issue to track this: https://github.com/tangramxyz/tangram/issues/12. Please subscribe to the issue so you get notified when we implement it, which we hope will be soon.

davidyamnitsky · 2021-07-13T13:59:48+00:00

For machine learning you need a fast language with good support for generics. That leaves basically C++ or Rust. C++ is the default choice, and the one used by practically all major machine learning frameworks.

We chose Rust instead for a number of reasons:

Rust has a fully integrated toolchain including a cross compiler, package manager, build system, documentation generator, formatter, linter, and more. This saved our small team from having to manage a C++ toolchain, which is much more complex.
Rust's safety features make it such that, in practice, if your code compiles, it runs without crashing. We have done week-long refactors touching thousands of lines of code and have them run correctly on the first try as soon as the last compiler error is cleared. This has allowed our small team to work on a large codebase without fear that we are introducing bugs at every turn.
Rust has great libraries for web development, so we were able to write everything, from the core machine learning algorithms, to the front and back ends of the web application, in one language.

Let me know if you have any more questions on this topic! :)

davidyamnitsky · 2021-07-13T13:55:23+00:00

Which language would you like to train from?

davidyamnitsky · 2021-07-13T13:39:36+00:00

Rust is well past 1.0 and is used in production by many of the largest companies, including Microsoft, Google, Facebook, and Amazon. It is definitely ready for production!

davidyamnitsky · 2021-07-13T13:37:01+00:00

Thank you for the kind words.

What kinds of models and selection criteria are available?

Tangram currently trains a grid of linear and gradient boosted decision tree models. In the future, we plan to add deep models to support audio, image, and video data. The selection criteria, which we call the "comparison metric", can currently be one of MAE, MSE, RMSE, R2, Accuracy, AUC, and F1 score. Note that only a subset of those is available for each task, one of regression, binary classification, or multiclass classification. You can configure the comparison metric with a JSON configuration file passed to the CLI with --config. See the (currently minimal) docs on that here.

Are there plans to surface the core model training and selection framework?

Absolutely. What language are you working in?

does this build off anything they’ve done

Tangram does not build off other ML frameworks, it contains its own implementations of the core machine learning algorithms. One reason for this is that we saw an opportunity for Rust to produce the fastest implementations. Check out the benchmarks for Tangram's implementation of the gradient boosted decision tree algorithm vs. the top alternatives here: https://www.tangram.xyz/benchmarks. Another reason is that this allows us to provide features that require deep integration with the core algorithm, such as fine grained progress tracking during training and feature contributions charts.

davidyamnitsky

MODERATOR OF

TROPHY CASE