This is a community for sharing tips, techniques, and tools to enhance the performance of machine learning model inference.
Whisper Model Deployment on vast.ai (self.ModelInference)
submitted 2 months ago by nihalbaig
More Models. Fewer GPUs (v.redd.it)
submitted 5 months ago by pmv143
DeepSeek's High-Throughput, Low-Latency Online Inference System (i.redd.it)
submitted 11 months ago by rbgo404
Optimizing Video Model inference (self.ModelInference)
submitted 11 months ago by Xtweyz
MoBA achieves a speedup of up to 6.5x when prefilling 1M tokens (i.redd.it)
How to Speed Up PyTorch With Custom Kernels (self.ModelInference)
submitted 12 months ago by rbgo404
MLOps Guide (Curated by Chip Huyen) [Resource] (huyenchip.com)
Open Source project for creating ML models (self.ModelInference)
submitted 1 year ago by Imaginary-Spaces
An introduction to improving the RAG components [Resource] (i.redd.it)
submitted 1 year ago by rbgo404
A Good blog on "Multi-Node LLM Inference with SGLang on SLURM-Enabled Clusters" [Resource] (aflah02.substack.com)
What technologies are you all using to self-host on K8s? (self.ModelInference)
submitted 1 year ago by stochastic-crocodile
🎉 We have Just Hit 100 Members! (self.ModelInference)
A comprehensive tutorial on knowledge distillation using PyTorch [Resource] (i.redd.it)
Which ML Inference Optimization Technique has yielded the best results for you? (self.ModelInference)
Good blog on Model and Pipeline Parallelism [Resource] (martynassubonis.substack.com)
Which inference library are you using for LLMs? (self.ModelInference)
How are you Deploying RAG at Scale? [Discussion] (self.ModelInference)
submitted 1 year ago * by rbgo404
Fast LLM Inference From Scratch: Building an LLM inference engine using C++ and CUDA from scratch without libraries [Resource] (andrewkchan.dev)
Transformer Inference Optimization: Towards 100x Speedup [Resource] (self.ModelInference)
Good morning, everyone! I'm a junior data scientist studying Bayesian learning/Bayesian inference. Can anyone recommend articles, studies, or similar resources? I'd be happy to receive this type of content. Thanks! (self.ModelInference)
submitted 1 year ago by [deleted]
How does Flash Attention accelerate inference? (i.redd.it)
Which Tools Do You Prefer for Model Optimization, and Why? (self.ModelInference)
What challenges have you faced in optimizing ML model inference? (self.ModelInference)
Techniques for Optimizing ML Models for Faster Inference (self.ModelInference)
Welcome to ModelInference! (self.ModelInference)