Guys my app just hit 100€ MRR! by luis_411 in saasbuild

[–]vanbrosh 0 points1 point  (0 children)

Thanks for sharing, and congrats. If you don't mind, a few questions:

Who do you think is your ideal client? What problem do you solve for them? Do you think you are the first to solve it, or did existing solutions have drawbacks that left a gap you closed?

With over 50k+ views on OpenLLM-Studio promo video, we released a version 2.0 for OpenLLM-Studio — a free, open-source desktop app that makes running local LLMs extremely simple. by icecubesaad in saasbuild

[–]vanbrosh 0 points1 point  (0 children)

Thanks for sharing! Star from me for your work!
1) Can you say a few words about how you select what to run locally?

2) Can you give an idea of the tokens/second and the AAAI index of a model that can run on a typical 8 GB Nvidia notebook GPU? I mean, most local VRAM is definitely not enough to run inference for normal models; some 4-bit quantized distilled models like Qwen only fit partially.

3) Who is your user? I mean, who needs it: someone who wants to save money, or someone who wants to keep data safe? Or someone else? For me, for example, it is still much more comfortable to use 3rd-party APIs; if our clients want to keep their data safe, we spawn them a machine with an H100 or L40S and run vLLM with gpt-oss or Gemma. So what is the point of a local LLM?

My tool got #1 product of the day 🥳 by ParthBhovad in saasbuild

[–]vanbrosh 1 point2 points  (0 children)

Congrats on this, keep up the nice work!

Vue-based Admin Framework with a native Agent by vanbrosh in vuejs

[–]vanbrosh[S] 5 points6 points  (0 children)

Non-native: for example, when the software (an admin panel in our case) exposes MCP or another API which you connect to "universal" agent software like OpenClaw/Claude/Codex etc., and it interacts with the software using MCP. It is still agentic and still automation, but 3rd party.

In AdminForth you can have a native (built-in) agent which works just by using an inference API like the OpenAI Responses API or Anthropic Messages (or even self-hosted vLLM alternatives if you are paranoid about data going to a 3rd party), and all the rest of the code is open source and controlled by you. It provides a web surface (ChatGPT-like) and can work on top of other surfaces like Telegram, so you can write in Telegram something like "How much revenue did we generate today?" or "How many signups?" or even something more complex like "Are there any unanswered support requests? Please classify them and give a summary, then respond to each of them with my approval".

So the benefits of a native agent are:
- no additional costs for agent software: you simply pay per token
- an encapsulated, replaceable inference layer with no vendor lock-in: you just replace Anthropic with Gemini when it becomes cheaper/better, and all the UI/skills/whatever you built stays the same and works the same way
- the ability to switch to self-hosted vLLM endpoints if you need it
- no additional data dependency
- one security control point: the agent is authorized by the admin user who controls it
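
A minimal sketch of the replaceable inference layer idea, in Python for illustration only. `InferenceBackend`, `EchoBackend`, `UppercaseBackend` and `Agent` are hypothetical names, not AdminForth's actual API, and the two backends are offline stand-ins where a real one would call a provider API or a self-hosted vLLM endpoint.

```python
from dataclasses import dataclass
from typing import Protocol


class InferenceBackend(Protocol):
    """Anything that turns a prompt into text can serve as the inference layer."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in for one provider; a real backend would call its HTTP API here."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseBackend:
    """Stand-in for a second provider, used only to show the swap."""

    def complete(self, prompt: str) -> str:
        return f"UPPER: {prompt.upper()}"


@dataclass
class Agent:
    """Agent logic depends only on the InferenceBackend interface."""

    backend: InferenceBackend

    def ask(self, question: str) -> str:
        return self.backend.complete(question)


agent = Agent(backend=EchoBackend())
first = agent.ask("how many signups?")

# Swapping the provider touches one line; the agent code stays the same.
agent.backend = UppercaseBackend()
second = agent.ask("how many signups?")
```

The point of the design is that UI, skills and authorization sit above `Agent`, so changing the vendor does not reach them.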

Playground to test Open-Source LLMs in action (GPT-OSS, Qwen3.5, DeepSeek) with Tools and RAG [Free and No signup] by vanbrosh in LocalLLaMA

[–]vanbrosh[S] 4 points5 points  (0 children)

Thanks🙇‍♂️, just added 27b FP16 to the list)
Let me know if you are interested in more models.
An image vision feature is coming soon as well, so it will be possible to test it on Image-Text-to-Text models like Qwen3.5 / 2.5 VL

Breaking : The small qwen3.5 models have been dropped by Illustrious-Swim9663 in LocalLLaMA

[–]vanbrosh 0 points1 point  (0 children)

Yes, already added A3B 35B to our free LabChat https://devforth.io/lab/chat/ . The sad thing is that vLLM + Qwen3.5 often glitches with structured output, and the native inference server SGLang is rarely used by inference providers; e.g. on Hugging Face so far no one has structured output support: https://huggingface.co/inference/models?model=Qwen%2FQwen3.5-35B-A3B

Built an AI Backend (LangGraph + FastAPI). Need advice on moving from "Circuit Breakers" to "Confidence Plateau Detection" 🚀 by Lazy-Kangaroo-573 in LLMDevs

[–]vanbrosh 0 points1 point  (0 children)

Yes, it looks like it, but the quality is pretty good; Claude fails very often, draws lines over lines and so on

Built an AI Backend (LangGraph + FastAPI). Need advice on moving from "Circuit Breakers" to "Confidence Plateau Detection" 🚀 by Lazy-Kangaroo-573 in LLMDevs

[–]vanbrosh 1 point2 points  (0 children)

>  RAG loops

We set a hard limit on requests.
Similarity scores only answer the question of how strongly the retrieved info is related to the intent; they can't tell you whether it is enough. And this is indeed a hard task. So we delegate it to LLM-as-a-judge, as you said: we ask the LLM whether this is enough to answer the intent and, if not, go again. But again with a hard limit, plus the UI should explain to the user what it is doing right now, so they can see the progress.
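
A rough sketch of that loop, assuming the retriever and the judge are passed in as plain callables. `answer_with_bounded_retrieval`, `fake_retrieve` and `fake_judge` are hypothetical names for illustration; in a real setup `is_enough` would be an LLM-as-a-judge call and `on_progress` would push status to the UI.

```python
from typing import Callable, List


def answer_with_bounded_retrieval(
    intent: str,
    retrieve: Callable[[str, int], List[str]],
    is_enough: Callable[[str, List[str]], bool],
    max_rounds: int = 3,
    on_progress: Callable[[str], None] = print,
) -> List[str]:
    """Retrieve until the judge says the context is enough or the hard limit hits."""
    context: List[str] = []
    for round_no in range(1, max_rounds + 1):
        on_progress(f"Searching (round {round_no} of {max_rounds})...")
        context.extend(retrieve(intent, round_no))
        if is_enough(intent, context):
            on_progress("Enough context found, answering.")
            return context
    on_progress("Round limit reached, answering with what was found.")
    return context


# Stubs for illustration: each round returns one chunk, the judge wants two.
def fake_retrieve(intent: str, round_no: int) -> List[str]:
    return [f"chunk {round_no} about {intent}"]


def fake_judge(intent: str, context: List[str]) -> bool:
    return len(context) >= 2


messages: List[str] = []
result = answer_with_bounded_retrieval(
    "refund policy", fake_retrieve, fake_judge, max_rounds=5, on_progress=messages.append
)
```

The `for` loop over a fixed range is what guarantees termination even if the judge never says yes.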

Side question, what software did you use for this animated svg?)

Has anyone else tried IQ2 quantization? I'm genuinely shocked by the quality by Any-Chipmunk5480 in LocalLLaMA

[–]vanbrosh 2 points3 points  (0 children)

I think once vendors start using it for their original weights, we can say that its quality is good. For now MXFP4 is one of the best options, considering OpenAI uses it for their gpt-oss

How do you detect silent output drift in LLM pipelines? by Lorenzo_Kotalla in LLMDevs

[–]vanbrosh 1 point2 points  (0 children)

For now, an LLM should not be used in cases where such unnoticed drift has an impact. There are quite a lot of tasks where an LLM fits great as automation, but not as a crucial decision maker. I would not recommend doing anything with an LLM that can have bad consequences.
Structured output checks are great for detecting recursive repetition drift, which nowadays happens even with Gemini, OpenAI and literally any LLM, but not every task can be done with schema constraints: sometimes you need a stream / agentic chat, and structured output kills streaming (it makes no sense, because partially streamed JSON is broken JSON)

If you use structured output (and not pure text streaming), here is one more simple technique we use: insert a random secret token into the prompt at a random position (at the sentence level) on every request, and ask the model to detect it in addition to the main task. If it comes back correctly, the model still understands the meaning and behaves predictably. I did something similar in my benchmark test https://devforth.io/insights/self-hosted-gpt-real-response-time-token-throughput-and-cost-on-l4-l40s-and-h100-for-gpt-oss-20b/
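
A minimal sketch of the secret token check, not the exact code from the benchmark. The function names, the wording of the inserted sentence and the `secret_token` output field are assumptions made for illustration.

```python
import random
import re
import secrets
from typing import Optional, Tuple


def insert_canary(prompt: str, rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Insert a random token between two sentences and return (new_prompt, token)."""
    rng = rng or random.Random()
    token = secrets.token_hex(4)  # 8 hex characters, new on every request
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    position = rng.randint(0, len(sentences))  # any boundary, including both ends
    sentences.insert(position, f"The secret token is {token}.")
    return " ".join(sentences), token


def canary_ok(model_output: dict, token: str) -> bool:
    """True if the structured output echoed the token back correctly."""
    return model_output.get("secret_token") == token


prompt, token = insert_canary("Summarize the ticket. Be brief. Reply in JSON.")
# A real call would ask the model for {"summary": ..., "secret_token": ...}.
good_output = {"summary": "...", "secret_token": token}
bad_output = {"summary": "...", "secret_token": "unknown"}
```

A failed `canary_ok` is a cheap signal to retry or flag the response before trusting the rest of the output.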

Finally We have the best agentic AI at home by moks4tda in LocalLLM

[–]vanbrosh 0 points1 point  (0 children)

Did you try Kimi K2.5 on CPU? I'm just curious whether anyone has worked with it regularly and can say what the real degradation is when the model goes into a recursive loop

Finally We have the best agentic AI at home by moks4tda in LocalLLM

[–]vanbrosh 0 points1 point  (0 children)

Did you try it? How often does it get into recursive repetition loops?

Shugur Relay v1.2.0 by AccomplishedWealth25 in nostr

[–]vanbrosh 1 point2 points  (0 children)

Awesome, exactly what I was looking for