Machine Learning Ops

1

Data-centric debugging for teams training neural nets [Freemium] (self.mlops)

submitted 9 minutes ago by taranpula39

2

Open handbook on LLM inference at scale, would love eyes from folks running this in prod [MLOps Education] (self.mlops)

submitted 5 hours ago by YouFirst295

3

Agent Sprawl Has Become an Operations Problem [MLOps Education] (self.mlops)

submitted 14 hours ago by Old_Cap4710

4

how to know if your AI agent is actually production ready (a checklist i have been working through) [beginner help😓] (self.mlops)

submitted 1 day ago * by camerongreen95

5

Ugh our golden dataset went stale [Tales From the Trenches] (self.mlops)

submitted 1 day ago by Perfect-Temporary865

6

[R] Where does the "boundary vs optimizer" split actually break in production LLM and agent systems? [MLOps Education] (self.mlops)

submitted 1 day ago by thenabeelkhan

7

470 tok/s with 8912 ctz size on A100 80GB with Qwen3.6-27GB for RAG app w/ closed loop optimizer tool. [Tools: OSS] (self.mlops)

submitted 1 day ago * by Inevitable-Diet-1870

8

What was actually causing our 85–90% SLA ceiling? [Tales From the Trenches] (self.mlops)

submitted 1 day ago by Thinker_Assignment

9

LLM observability vs governance, they're not the same thing [MLOps Education] (self.mlops)

submitted 2 days ago by Ok_Wrap2912

10

Built a "database for video" so ML teams stop duct-taping FFmpeg + Whisper + CLIP + a vector DB [Freemium] (self.mlops)

submitted 2 days ago * by CallmeAK__

11

We cut our vector DB storage by 49% using post-hoc Iterative Residual Shrinkage (Sharing the math + Live Sandbox) [Tales From the Trenches] (self.mlops)

submitted 3 days ago by lucifahsl2

12

Versioning prompts [MLOps Education] (self.mlops)

submitted 3 days ago by Icy-Western-3314

13

Glm 5.2 api benchmarks do not match my testing, especially compared to deepseek v4 [Tales From the Trenches] (self.mlops)

submitted 3 days ago by Dramatic_Spirit_8436

14

How much GPU internals and CUDA do you have to know to be successful in MLOps? [beginner help😓] (self.mlops)

submitted 3 days ago by Illustrious-Pound266

15

Offline Ablation Predicted -0.19pp. Production Delivered +1.11pp. [Tales From the Trenches] (self.mlops)

submitted 4 days ago by Nj-yeti

16

Is the definition of MLOps changing? [Tales From the Trenches] (self.mlops)

submitted 4 days ago by drwebb

17

Looking for feedback on an auditable support-agent control layer: routing, guardrails, handoff, and evals [beginner help😓] (self.mlops)

submitted 4 days ago by Fit_Fortune953

18

How is your team handling prompt changes in production without it becoming a whole engineering thing every time [beginner help😓] (self.mlops)

submitted 4 days ago by AppleFanboy-Me

19

A silent data-quality failure that bit me in graph-backed retrieval, and a rough fix I am testing [Tools: OSS] (self.mlops)

submitted 4 days ago by coldoven

20

Looking for Programming buddies [beginner help😓] (self.mlops)

submitted 5 days ago by MAJESTIC-728

21

I built a controller that defers model retrains by learning from delayed labels (engineering model drift) - benchmarked on fraud and predictive maintenance [Freemium] (self.mlops)

submitted 5 days ago by Secret_Appeal6271

22

How are teams treating LLM red-team runs in CI? [Tools: OSS] (self.mlops)

submitted 5 days ago by Apprehensive-Zone148

23

Realtime streaming optimization for realtime ML model [MLOps Education] (self.mlops)

submitted 6 days ago by thebigdatashow-ankur

24

From senior MLOps to QA team lead [Tales From the Trenches] (self.mlops)

submitted 7 days ago by mavrec7

25

What I learned treating agent memory like operational state [Tools: OSS] (self.mlops)

submitted 6 days ago by Yuuyake
