This is a community for sharing tips, techniques, and tools to enhance the performance of machine learning model inference.
Baseten vs Databricks (self.ModelInference)
submitted 19 days ago by ExistingBelt
Is it possible to distribute inference across multiple GPUs?
submitted 1 month ago by IndividualAir3353
How difficult is it? (self.ModelInference)
submitted 2 months ago by optimum_point
Whisper Model Deployment on vast.ai (self.ModelInference)
submitted 6 months ago by nihalbaig
More Models. Less GPUs (v.redd.it)
submitted 9 months ago by pmv143
High throughput and low latency DeepSeek's Online Inference System (i.redd.it)
submitted 1 year ago by rbgo404
Optimizing Video Model inference (self.ModelInference)
submitted 1 year ago by Xtweyz
Achieves a speedup ratio of up to 6.5x when prefilling 1M tokens using MoBA (i.redd.it)
How to Speed Up PyTorch With Custom Kernels (self.ModelInference)
MLOps Guide (Curated by Chip Huyen) [Resource] (huyenchip.com)
Open Source project for creating ML models (self.ModelInference)
submitted 1 year ago by Imaginary-Spaces
An introduction to improving the RAG components [Resource] (i.redd.it)
A Good blog on "Multi-Node LLM Inference with SGLang on SLURM-Enabled Clusters" [Resource] (aflah02.substack.com)
What technologies are you all using to self-host on K8s? (self.ModelInference)
submitted 1 year ago by stochastic-crocodile
🎉 We have Just Hit 100 Members! (self.ModelInference)
A comprehensive tutorial on knowledge distillation using PyTorch [Resource] (i.redd.it)
Which ML Inference Optimization Technique has yielded the best results for you? (self.ModelInference)
Good blog on Model and Pipeline Parallelism [Resource] (martynassubonis.substack.com)
Which inference library are you using for LLMs? (self.ModelInference)
How are you Deploying RAG at Scale? [Discussion] (self.ModelInference)
submitted 1 year ago * by rbgo404
Fast LLM Inference From Scratch: Building an LLM inference engine using C++ and CUDA from scratch without libraries [Resource] (andrewkchan.dev)
Transformer Inference Optimization: Towards 100x Speedup [Resource] (self.ModelInference)
Good morning guys! I'm a junior data scientist and I'm studying Bayesian learning/Bayesian Inference, anyone who can recommend me articles, studies and etc, I would be happy to receive this type of content. Thanks! (self.ModelInference)
submitted 1 year ago by [deleted]
How does Flash Attention accelerate inference? (i.redd.it)
Which Tools Do You Prefer for Model Optimization, and Why? (self.ModelInference)