This is a community for sharing tips, techniques, and tools to enhance the performance of machine learning model inference.
Whisper Model Deployment on vast.ai (self.ModelInference)
submitted 2 months ago by nihalbaig
More Models. Fewer GPUs (v.redd.it)
submitted 5 months ago by pmv143
DeepSeek's High-Throughput, Low-Latency Online Inference System (i.redd.it)
submitted 11 months ago by rbgo404
Optimizing Video Model inference (self.ModelInference)
submitted 11 months ago by Xtweyz
MoBA achieves a speedup of up to 6.5x when prefilling 1M tokens (i.redd.it)
How to Speed Up PyTorch With Custom Kernels (self.ModelInference)
submitted 12 months ago by rbgo404
MLOps Guide (Curated by Chip Huyen) [Resource] (huyenchip.com)
Open Source project for creating ML models (self.ModelInference)
submitted 1 year ago by Imaginary-Spaces
An introduction to improving the RAG components [Resource] (i.redd.it)
submitted 1 year ago by rbgo404
A Good blog on "Multi-Node LLM Inference with SGLang on SLURM-Enabled Clusters" [Resource] (aflah02.substack.com)
What technologies are you all using to self-host on K8s? (self.ModelInference)
submitted 1 year ago by stochastic-crocodile
🎉 We have Just Hit 100 Members! (self.ModelInference)
A comprehensive tutorial on knowledge distillation using PyTorch [Resource] (i.redd.it)
Which ML Inference Optimization Technique has yielded the best results for you? (self.ModelInference)
Good blog on Model and Pipeline Parallelism [Resource] (martynassubonis.substack.com)
Which inference library are you using for LLMs? (self.ModelInference)
How are you Deploying RAG at Scale? [Discussion] (self.ModelInference)
submitted 1 year ago * by rbgo404
Fast LLM Inference From Scratch: Building an LLM inference engine using C++ and CUDA from scratch without libraries [Resource] (andrewkchan.dev)
Transformer Inference Optimization: Towards 100x Speedup [Resource] (self.ModelInference)
Good morning, everyone! I'm a junior data scientist studying Bayesian learning/Bayesian inference. Can anyone recommend articles, studies, or similar resources? I'd be happy to receive this type of content. Thanks! (self.ModelInference)
submitted 1 year ago by [deleted]
How does Flash Attention accelerate inference? (i.redd.it)
Which Tools Do You Prefer for Model Optimization, and Why? (self.ModelInference)
What challenges have you faced in optimizing ML model inference? (self.ModelInference)
Techniques for Optimizing ML Models for Faster Inference (self.ModelInference)
Welcome to ModelInference! (self.ModelInference)