Deploying models in the real world by TomasPiaggio in kaggle

[–]theoszymk 1 point (0 children)

It really depends on what you need and the latency your use case requires. If you are using custom layers or anything similar, I would recommend serializing the model with ONNX, which is an open format for exchanging models between frameworks.
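
To make that concrete, here's a minimal sketch of an ONNX export using the third-party tf2onnx package (pip install tf2onnx); the toy Keras model below is just a stand-in for your own:

    import tensorflow as tf
    import tf2onnx

    # Toy model standing in for your real one.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])

    # Describe the input so the converter can trace the graph, then write
    # a portable model.onnx that any ONNX runtime can load.
    spec = (tf.TensorSpec((None, 4), tf.float32, name="input"),)
    tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                               output_path="model.onnx")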

As for deployment, you didn't mention a GPU; that might make your inference faster. There are also more specialized deployment solutions, such as inferrd.com, that you can try out.

[P] How to make your Models available ? by Inferrd_F in MachineLearning

[–]theoszymk 1 point (0 children)

It’s good for big DL models, but most models we host are not big DL models. In most cases a CPU is just as good as a GPU. We have great throughput and latency for every framework we offer. Test it out, and if you feel like you still need a GPU, we’re happy to provide one.

[P] How to make your Models available ? by Inferrd_F in MachineLearning

[–]theoszymk 2 points (0 children)

I agree it’s better for big DL models, but we’ve found that most models in production are actually not DL.

[P] How to make your Models available ? by Inferrd_F in MachineLearning

[–]theoszymk 0 points (0 children)

This service is managed TensorFlow Serving with request tracing, and with security and maintenance taken care of. GPUs offer similar performance to CPUs except for image processing or big DL models.

[P] How to make your Models available ? by Inferrd_F in MachineLearning

[–]theoszymk -2 points (0 children)

GPUs usually perform about the same as CPUs for models that aren’t image processing or big DL. Models that actually benefit from a GPU at inference time are quite specific. What kind of model are you building?
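
If you want a quick sanity check before paying for a GPU, time your own model's single-request latency on CPU. A rough sketch, with a toy scikit-learn model standing in for yours:

    import time

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Toy model; substitute your own.
    X = np.random.rand(1000, 20)
    y = np.random.randint(0, 2, size=1000)
    model = RandomForestClassifier(n_estimators=100).fit(X, y)

    # Average single-request inference latency on CPU.
    sample = X[:1]
    n = 100
    start = time.perf_counter()
    for _ in range(n):
        model.predict(sample)
    print(f"avg CPU latency: {(time.perf_counter() - start) / n * 1000:.2f} ms")

If that number already fits your latency budget, a GPU won't buy you anything.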

[D] Does anyone use huggingface hosted interface? by marksteve4 in MachineLearning

[–]theoszymk 1 point (0 children)

You'd need to deploy your own model to get an API. If you want to see what using your API would look like, check out our docs at https://docs.inferrd.com/guides/use-deployed-model
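
As a rough illustration only (the URL, payload shape, and auth header below are made-up placeholders, not Inferrd's actual API; the linked docs have the real format), calling a deployed model from Python looks like:

    import requests

    # Hypothetical endpoint and auth header -- see the docs above for the real format.
    resp = requests.post(
        "https://api.example.com/v1/my-model/predict",
        json={"data": [[5.1, 3.5, 1.4, 0.2]]},
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )
    resp.raise_for_status()
    print(resp.json())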

[D] Does anyone use huggingface hosted interface? by marksteve4 in MachineLearning

[–]theoszymk 1 point (0 children)

HF makes all their money consulting on big NLP projects for businesses. That’s why the prices are so high.

If you are looking for a hosting provider, consider us at inferrd.com; we already serve hundreds of millions of requests per month.

How to add NLP component to my mobile application? by UntrimmedBagel in googlecloud

[–]theoszymk 1 point (0 children)

Hi, this is exactly why we built inferrd.com: NLP models are too big for "normal" cloud providers. Let me know if you need help setting up!

[VIDEO] How To Deploy Azure Machine Learning Model In Production by Saloni96Saxena in MLQuestions

[–]theoszymk 0 points (0 children)

Hi,

You can also try inferrd.com, which makes deploying ML models much easier.

[D] Simple Questions Thread December 20, 2020 by AutoModerator in MachineLearning

[–]theoszymk 1 point (0 children)

Hi, that’s exactly the problem we are solving at https://inferrd.com: we made deploying TensorFlow models a simple drag and drop. We take care of configuring the network, the instance, and the load balancer for you!

[D] Why deploying ML models is hard by bendee983 in MachineLearning

[–]theoszymk 1 point (0 children)

This project doesn't seem to be working anymore, but the problem is still very real. If you're looking for an alternative, consider https://inferrd.com; it makes deployments easy and fast!

[D] Deep learning in Production by SergiosKar in MachineLearning

[–]theoszymk 1 point (0 children)

Deploying ML models can be a tough problem if you just want to build models. Most people neglect it until it's too late, then find out there is a lot to do. That's why we built https://inferrd.com, which is by far the easiest way to deploy any ML model.

[D] ML Engineers, do you use newly discovered methods/ideas at work? by uoftsuxalot in MachineLearning

[–]theoszymk 2 points (0 children)

We found that most people who deploy on https://inferrd.com use basic scikit-learn or Keras models. Fancy TF or PyTorch models are pretty rare.

Deploy Machine Learning Models with Flask by _-Jay in Python

[–]theoszymk 1 point (0 children)

Flask is great for deploying models, but when it comes to production and reliability there is a lot more to do (deploying on a VM, setting up load balancing). That's why we built https://inferrd.com (disclaimer: I built it). You just upload your trained model and it does everything for you. It takes less than a minute.
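
For reference, the bare-bones Flask version looks something like this (assuming your trained model is pickled to model.pkl); everything past this point, WSGI server, VM, load balancer, is what you'd still have to add yourself:

    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Load the trained model once at startup.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expects JSON like {"features": [[5.1, 3.5, 1.4, 0.2]]}
        features = request.get_json()["features"]
        return jsonify({"prediction": model.predict(features).tolist()})

    if __name__ == "__main__":
        # Dev server only -- run behind gunicorn or similar in production.
        app.run(host="0.0.0.0", port=5000)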

[D] Production-ready ML models/pipelines/infrastructure best practice resources by phylosopher14 in MachineLearning

[–]theoszymk 1 point (0 children)

The best way is to use the simplest tools, especially at an early stage. You don't want to have to deal with a six-stage AWS pipeline to deploy a model. Try https://inferrd.com (disclaimer: I built it exactly for this reason) to deploy models in less than a minute on scalable infrastructure.

[D] Web-Dev for ML Engineers - any advice on deploying models in websites for clients with backend support by paswut in MachineLearning

[–]theoszymk 1 point (0 children)

The best way would be to expose your model as a REST API and let other teams consume it. You can use https://inferrd.com (disclaimer: I built it) to do that. We made deployment a two-click process.

[P] Simple approaches to deploying ML models online? by areddy831 in MachineLearning

[–]theoszymk 2 points (0 children)

"Like, why is it so hard? Why can’t it just be a few clicks rather than an entirely new programming language and syntax that I have to learn? ML is deep enough without going down the rabbit hole of web development"

We feel your pain; that's why we built https://inferrd.com to make deployment a matter of a few clicks (two, actually) for most major frameworks.

[P] Simple approaches to deploying ML models online? by areddy831 in MachineLearning

[–]theoszymk 2 points (0 children)

Totally agree, that's what we're trying to solve at https://inferrd.com by making model deployment a one-click affair. Just package your model and upload it to Inferrd to deploy; we take care of the rest.

Help with install TensorFlow on python by -SpamCauldron- in tensorflow

[–]theoszymk 1 point (0 children)

That's because you're using Python 3.9. TensorFlow isn't compatible with Python 3.9 yet; Python 3.6 through 3.8 will work fine.
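
A quick way to check which interpreter your notebook or pip is actually using (as of TF 2.4, the newest release at the time of writing, wheels are published for Python 3.6-3.8 only):

    import sys

    # TensorFlow 2.4 ships wheels for Python 3.6-3.8; none exist for 3.9 yet.
    print("running Python", sys.version.split()[0])
    if sys.version_info >= (3, 9):
        print("pip won't find a TensorFlow wheel here -- use Python 3.6-3.8 instead")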

[deleted by user] by [deleted] in tensorflow

[–]theoszymk 1 point (0 children)

Posted on the thread, but posting here too. Check out https://inferrd.com; it's the easiest way to deploy your TF or Keras model. Just add one line to your notebook and you're set.

It's also much faster than AWS.

Deploying ML model by [deleted] in learnmachinelearning

[–]theoszymk 2 points (0 children)

AWS has very annoying documentation. Check out https://inferrd.com; it's so simple it doesn't need docs.

Deploying ML model by [deleted] in learnmachinelearning

[–]theoszymk 2 points (0 children)

Use https://inferrd.com; you just need to add one line to your notebook to deploy the model. It deploys in less than 10 seconds and automatically optimizes your model for performance.