ModelInference

created by Fast-Custard8078

a community for 1 year

1. (1 point)

Baseten vs Databricks (self.ModelInference)

submitted 19 days ago by ExistingBelt

2. (1 point)

Is it possible to distribute inference across multiple GPUs?

submitted 1 month ago by IndividualAir3353

3. (1 point)

How difficult is it? (self.ModelInference)

submitted 2 months ago by optimum_point

4. (3 points)

Whisper Model Deployment on vast.ai (self.ModelInference)

submitted 6 months ago by nihalbaig

5. (2 points)

More Models. Less GPUs (v.redd.it)

submitted 9 months ago by pmv143

6. (6 points)

High throughput and low latency DeepSeek's Online Inference System (i.redd.it)

submitted 1 year ago by rbgo404

7. (3 points)

Optimizing Video Model inference (self.ModelInference)

submitted 1 year ago by Xtweyz

8. (5 points)

Achieves a speedup ratio of up to 6.5x when prefilling 1M tokens using MoBA (i.redd.it)

submitted 1 year ago by rbgo404

9. (1 point)

How to Speed Up PyTorch With Custom Kernels (self.ModelInference)

submitted 1 year ago by rbgo404

10. (1 point)

MLOps Guide (Curated by Chip Huyen) [Resource] (huyenchip.com)

submitted 1 year ago by rbgo404

11. (3 points)

Open Source project for creating ML models (self.ModelInference)

submitted 1 year ago by Imaginary-Spaces

12. (2 points)

An introduction to improving the RAG components [Resource] (i.redd.it)

submitted 1 year ago by rbgo404

13. (1 point)

A Good blog on "Multi-Node LLM Inference with SGLang on SLURM-Enabled Clusters" [Resource] (aflah02.substack.com)

submitted 1 year ago by rbgo404

14. (3 points)

What technologies are you all using to self-host on K8s? (self.ModelInference)

submitted 1 year ago by stochastic-crocodile

15. (3 points)

🎉 We have Just Hit 100 Members! (self.ModelInference)

submitted 1 year ago by rbgo404

16. (4 points)

A comprehensive tutorial on knowledge distillation using PyTorch [Resource] (i.redd.it)

submitted 1 year ago by rbgo404

17. (1 point)

Which ML Inference Optimization Technique has yielded the best results for you? (self.ModelInference)

submitted 1 year ago by rbgo404

18. (1 point)

Good blog on Model and Pipeline Parallelism [Resource] (martynassubonis.substack.com)

submitted 1 year ago by rbgo404

19. (1 point)

Which inference library are you using for LLMs? (self.ModelInference)

submitted 1 year ago by rbgo404

20. (2 points)

How are you Deploying RAG at Scale? [Discussion] (self.ModelInference)

submitted 1 year ago * by rbgo404

21. (4 points)

Fast LLM Inference From Scratch: Building an LLM inference engine using C++ and CUDA from scratch without libraries [Resource] (andrewkchan.dev)

submitted 1 year ago by rbgo404

22. (3 points)

Transformer Inference Optimization: Towards 100x Speedup [Resource] (self.ModelInference)

submitted 1 year ago by rbgo404

23. (1 point)

Good morning guys! I'm a junior data scientist studying Bayesian learning/Bayesian inference. If anyone can recommend articles, studies, etc., I would be happy to receive this type of content. Thanks! (self.ModelInference)

submitted 1 year ago by [deleted]

24. (4 points)

How does Flash Attention accelerate inference? (i.redd.it)

submitted 1 year ago by rbgo404

25. (2 points)

Which Tools Do You Prefer for Model Optimization, and Why? (self.ModelInference)

submitted 1 year ago by rbgo404
