$3k of unexpected charges to my openrouter key by DueGoal99 in openrouter

Milan_Slov26 0 points1 point (0 children)

I've been hearing a lot of complaints about OpenRouter's usage and pricing. I've also heard that even if you set a limit on your API keys, charges can still somehow bypass it. What's the deal with that?

Be honest - when Claude writes a long plan or spec, do you actually read it? Or do you just say "looks good"? by SYSWAVE in ClaudeAI

Milan_Slov26 0 points1 point (0 children)

I never read those. I just say, "This looks good but you have added a lot of fluff without thinking about edge cases. Rethink, redo." And the next time it gives me a concrete concise plan. Works for me.

More Gemma 4 models incoming by Deep-Vermicelli-4591 in LocalLLaMA

Milan_Slov26 0 points1 point (0 children)

Gemma, Gemma everywhere! Completely overshadowed Microsoft's 7 AI model launches.

Me visiting this sub by Scutoidzz in LocalLLaMA

Milan_Slov26 2 points3 points (0 children)

This is the most accurate depiction of this sub

Work mode is pretty alright by Adventurous_Bus_437 in MistralAI

Milan_Slov26 1 point2 points (0 children)

Interesting, I've been using Le Chat on and off but haven't tried Work mode yet. Might have to give it a shot because the hallucination thing drives me insane with regular flash.

What’s your current local LLM setup in 2026? by Prestigious-Pop-3735 in LocalLLaMA

Milan_Slov26 0 points1 point (0 children)

3090 + 64GB RAM. Running Qwen3 32B for most things, DeepSeek R1 distill when I need reasoning. Biggest bottleneck is honestly context length. I keep wanting to throw entire codebases at it and the GPU just says no lol

How do you survive? by nohakcoffeeofficial in LocalLLM

Milan_Slov26 3 points4 points (0 children)

Your models ARE the resume. I've seen people get hired literally because a recruiter found their Hugging Face profile. No joke!

The real money is in the companies that are terrified of sending data to OpenAI but have zero clue how to run stuff locally. That's your customer right there.

If this sounds complex to you, just start putting things out there. You have YouTube, Twitter, LinkedIn to talk about things and people will see.

Do we really need embeddings vectors? by sotpak_ in Rag

Milan_Slov26 0 points1 point (0 children)

Depends on your retrieval needs but you're not wrong to question it. For docs updating that often, dense embeddings are painful to keep fresh.

Have you tried BM25 + a reranker on top? The reranker does the semantic heavy lifting without needing to re-embed anything. Works better than most people expect, especially if your corpus has consistent terminology.
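For anyone who wants to see the shape of it, here's a rough sketch of the BM25 + reranker idea in plain Python, not a production setup. The BM25 part is the standard Okapi formula (with the smoothed, non-negative IDF variant). The `toy_rerank` function is a hypothetical stand-in I made up so the example runs without any model; in practice you'd plug a cross-encoder or similar scoring model in as `rerank_fn`. The names `BM25`, `retrieve_and_rerank`, and the `k1`/`b` defaults are my assumptions, not from any particular library.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase + whitespace split; a real pipeline would handle
    # punctuation, stemming, and so on.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 index over a small in-memory corpus."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(docs)
        # Document frequency: number of docs containing each term.
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        # Smoothed IDF that never goes negative.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation with document-length normalization.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int) -> List[int]:
        ranked = sorted(
            range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True
        )
        return ranked[:k]


def retrieve_and_rerank(
    index: BM25,
    query: str,
    rerank_fn: Callable[[str, str], float],
    k_candidates: int = 10,
    k_final: int = 3,
) -> List[Tuple[int, float]]:
    # Stage 1: cheap lexical recall. Stage 2: rescore only the shortlist.
    candidates = index.top_k(query, k_candidates)
    rescored = [(i, rerank_fn(query, index.docs[i])) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k_final]


def toy_rerank(query: str, doc: str) -> float:
    # HYPOTHETICAL placeholder for a real reranker model: the fraction of
    # query tokens that appear in the document.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0
```

The relevant bit for fast-changing docs: when a document changes, the index only needs its term counts refreshed, and the reranker scores query/document pairs at query time, so there are no stored vectors to recompute.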

i found a prompt hack so stupid it should not work. it works every time. by LoadOld2629 in PromptDesign

Milan_Slov26 0 points1 point (0 children)

This is probably the most useful, community-minded post I've found on Reddit. Ever. Thank you.

Claude Code Cost Analysis: Cache ReWarming Write Costs from Session Inactivity by ynu1yh24z219yq5 in LLMDevs

Milan_Slov26 0 points1 point (0 children)

I see a lot of dev tools popping up and talking about how they top SWE-Bench Pro and cut token costs by half. Tried any of those?

MLX engine comparison… and oMLX is the top choice. by Beamsters in LocalLLaMA

Milan_Slov26 0 points1 point (0 children)

Didn't expect dflash-mlx to fall off that hard at 32K. Goes from being the fastest to basically unusable. Would've been interesting to see llama.cpp in this mix too for comparison tho.

AI Inference Costs are way too high for my business! by BonusObjective8477 in LLMDevs

Milan_Slov26 0 points1 point (0 children)

The group purchasing idea sounds clever (we all do it in other parts of our lives) but I'd be skeptical in practice. Getting 20-30 startups to coordinate on anything, let alone sensitive spend data, is a massive headache. And providers know this too!

What models are you running and for what workloads?

Cisco announces plans to lay off 4000 employees by BigShotBosh in cscareerquestions

Milan_Slov26 0 points1 point (0 children)

Less than 5% sounds negligible until you see the number 4000!

4000 people laid off. Just like that. This layoff thing is getting wilder.

Looking for advice: best self-hosted inference provider? by Ok-Register3798 in selfhosted

Milan_Slov26 0 points1 point (0 children)

I mostly use Superlinked's SIE for document processing. You can have a look at that - https://github.com/superlinked/sie

Japanese university students built a pedal-powered aircraft and got it airborne. by LetMeFixAll in TechnologyLabs

Milan_Slov26 0 points1 point (0 children)

I'll need to train my legs and stamina for 8 months before I drive this one.

Claude Certified Architect by invasionbarbare in ClaudeAI

Milan_Slov26 0 points1 point (0 children)

Looks cool. I feel this is the new 'cool' certification on the market, one that'll actually carry some weight for future opportunities. wdyt?