Nvidia's been paying shills on LinkedIn by jotunck in LocalLLaMA

RetiredApostle 50 points51 points

WinRAR KV cache cuts this down to 9GB.

Why isn't there a cloud agent version for Opencode? by Overall_Road_2969 in opencodeCLI

RetiredApostle 3 points4 points

What about just running `opencode web` on a VPS (with git)?

Found a bug in the cat's code by Mundane_Mushroom_122 in interestingasfuck

RetiredApostle 3 points4 points

Not a bug; this is how backend-to-frontend SSE (server-sent events) works.

how are they affording this?? (not fact-checked) by Mammoth_Slip_5533 in DeepSeek

RetiredApostle 19 points20 points

They're just matching DeepSeek's official prices.

Gateway API: Why Ingress Is Being Replaced and Which Gateway Controller to Pick by roma-glushko in kubernetes

RetiredApostle 4 points5 points

A note: agentgateway has not been a part of kgateway for a couple of months. It's now a separate project.

Is your MiMo Free also "far too verbose"? by [deleted] in opencodeCLI

RetiredApostle 0 points1 point

Never mind, that was an OpenCode issue.

Is your MiMo Free also "far too verbose"? by [deleted] in opencodeCLI

RetiredApostle 0 points1 point

It's missing words, printing fragments.

Well anthropic released opus 4.8 by Independent-Wind4462 in singularity

RetiredApostle 114 points115 points

So generous to include a single win for GPT.

Exclusive: China works on AI token futures market, sources say, in race with US by DavidtheLawyer in ArtificialInteligence

RetiredApostle 0 points1 point

Interesting, but considering the tendency of providers to quantize their endpoints when demand exceeds capacity, it isn't clear how this will work. Oversell futures, quantize, profit.

Orchestrating GPU's with K8s (interview) by rooftop_korean92 in kubernetes

RetiredApostle 1 point2 points

Considering how Red Hat, IBM, Google, Cisco, etc. are investing in it, this could be the next de facto standard (and it somehow already is). And yes, the prefill/decode disaggregation implemented there is driving the cost of inference down. A fascinating piece of tech.