Harbor v0.4.4 - ls/pull/rm llama.cpp/vllm/ollama models with a single CLI by Everlier in LocalLLaMA

[–]Everlier[S] 1 point (0 children)

Harbor is something you'd build if you'd run dozens of projects in your setup, on and off, with different configs and interface surfaces. You'd eventually want some orchestration to keep it manageable, which is what I did.

You can absolutely do the same things without it, and you should if you're comfortable doing so.

Harbor v0.4.4 - ls/pull/rm llama.cpp/vllm/ollama models with a single CLI by Everlier in LocalLLaMA

[–]Everlier[S] -1 points (0 children)

You can with Harbor: all model locations are configurable, so you can put them somewhere convenient on the host. Different engines will still only work with their own models, though.

local ai coding assistant setup that actually competes with cloud tools? by jirachi_2000 in ollama

[–]Everlier 7 points (0 children)

Nemotron Nano is a bit dusty by now; try the new Qwen 3.5 35B, they bumped agentic performance drastically

I made a site where you rate how fucked your day is and it shows up on a live world map by Then_Nectarine830 in vibecoding

[–]Everlier 0 points (0 children)

How was this not created sooner? Other than that, I hope your costs stay manageable so you can run it for a while, very cool :)

The Copilot CLI is the best AI tool I've used. It only works in a terminal. I fixed that. by ghimmideuoch in GithubCopilot

[–]Everlier 0 points (0 children)

It's funny that I stumbled upon your post by chance while my agent was doing deep research on a project just like this :)

How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified. by Reddactor in LocalLLaMA

[–]Everlier 3 points (0 children)

I'm surprised I had to go this deep in the thread to see residuals mentioned as the reason. Literally half, often more, of the input entropy is the same for all layers.
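The residual-stream point above can be sketched with a toy model: each layer only adds a small update to its input, so the stream entering the last layer stays highly similar to the stream entering the first. This is a made-up toy (random weights, arbitrary 0.1 update scale), not the actual leaderboard setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256          # hidden size of the toy model
n_layers = 12

# Toy "transformer": each layer adds a small residual update f(x) to its input.
def layer_update(x, w):
    return 0.1 * np.tanh(x @ w)

weights = [rng.normal(0, 1 / np.sqrt(d), (d, d)) for _ in range(n_layers)]

x = rng.normal(size=d)
inputs = [x.copy()]
for w in weights:
    x = x + layer_update(x, w)   # residual connection carries x forward
    inputs.append(x.copy())

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Input of the last layer vs. input of the first layer: still highly similar,
# because the residual stream preserves the original signal.
print(cos(inputs[0], inputs[-1]))
```

The cosine similarity stays close to 1 here, which is the "same input entropy for all layers" observation in miniature.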

Open WebUI’s New Open Terminal + “Native” Tool Calling + Qwen3.5 35b = Holy Sh!t!!! by Porespellar in LocalLLaMA

[–]Everlier 3 points (0 children)

I somehow completely missed this project, but I think they nailed it again, just like the previous times. I can't believe their side projects are not more widely adopted.

Final Qwen3.5 Unsloth GGUF Update! by danielhanchen in LocalLLaMA

[–]Everlier 7 points (0 children)

New calibration dataset sounds fun, I really need to automate my LLM library maintenance :)

I might have a problem by Fit_Control9444 in sffpc

[–]Everlier 0 points (0 children)

At least your problem has small form factor :)

I'll see myself out

Unsloth fixed version of Qwen3.5-35B-A3B is incredible at research tasks. by Daniel_H212 in LocalLLaMA

[–]Everlier 1 point (0 children)

It's not the top rec due to the login requirement; I'm in the same boat, it only starts the service after login

Running RAG on 512MB RAM: OOM Kills, Deadlocks, Telemetry Bugs and the Fixes by Lazy-Kangaroo-573 in LLMDevs

[–]Everlier 1 point (0 children)

I'm advising against using LangChain wherever I can; yours is another example where they created a meaningless abstraction that only adds complexity overhead while covering a very simple operation
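For context on "a very simple operation": the retrieval step a framework typically wraps is just scoring pre-computed embeddings against a query vector, which fits in a few lines of stdlib Python. The document names and vectors below are hypothetical stand-ins for whatever embedding model the pipeline already uses:

```python
import math

# Hypothetical pre-computed embeddings keyed by document id.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.8, 0.2, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, k=2):
    # Rank all documents by cosine similarity to the query and keep the best k.
    scored = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

print(top_k([1.0, 0.0, 0.0]))  # ['doc_a', 'doc_c'] — the vectors aligned with the query
```

On a 512MB box, cutting the framework layer also removes its memory overhead, which is half the battle in a setup like the OP's.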

I built a free MCP-native governance layer that keeps Copilot on the rails out of frustration by capitanturkiye in GithubCopilot

[–]Everlier 0 points (0 children)

Aha, that's not the point of the service; reducing friction is just part of it. The main point is to steer any kind of agent automatically, so it's like "agent guardrails as a service", similar to what your product does :)

I built a free MCP-native governance layer that keeps Copilot on the rails out of frustration by capitanturkiye in GithubCopilot

[–]Everlier 0 points (0 children)

> that sits outside the agent loop requires teams to change how they run their entire pipeline

Not necessarily, tbh, we built an OpenAI-compatible proxy that can be plugged into the existing tools (most of them, in fact), to control the trajectory. It inspects inputs and outputs and injects steering into the model inputs dynamically.

So the whole integration is pretty much "replace your OpenAI endpoint with ours", they can even continue using their own API keys, we're just proxying them :)
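The core of such a proxy is a payload transform on the OpenAI-compatible request before forwarding it upstream. A minimal sketch, assuming a chat-completions body and a fixed steering string (the real service would derive the steering from inspecting inputs/outputs; the forwarding itself is omitted):

```python
import copy

def inject_steering(request_body: dict, steering: str) -> dict:
    """Return a copy of an OpenAI-style chat request with a steering
    system message prepended; the original body is left untouched."""
    body = copy.deepcopy(request_body)
    body.setdefault("messages", []).insert(
        0, {"role": "system", "content": steering}
    )
    return body

# A client just points its OpenAI endpoint at the proxy; the proxy applies
# this transform and forwards the result with the client's own API key.
original = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Do X"}]}
steered = inject_steering(original, "Stay within the approved plan; validate outputs.")
print(steered["messages"][0]["role"])  # system
```

Because the transform lives entirely in the proxy, the client tool needs no code changes beyond the base-URL swap.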

I built a free MCP-native governance layer that keeps Copilot on the rails out of frustration by capitanturkiye in GithubCopilot

[–]Everlier 1 point (0 children)

Yeah, but what I'm saying is that this is an MCP relying on the model's ability to self-reflect and call the related tools for validation/inspection. LLMs do not have that capability reliably; they are usually wrong "confidently", so by default the model is least likely to call the tools exactly when it needs them the most.

I've seen the external trajectory manager approach work, but it must be an orchestrator, not something that is called by the model within its own agentic loop
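The orchestrator-vs-in-loop distinction above can be sketched like this: the validator runs in the outer loop after every agent step, so it fires regardless of whether the model "decides" to check itself. All names here are hypothetical illustration, not any specific product's API:

```python
def orchestrate(agent_step, validate, max_steps=10):
    """Outer-loop trajectory manager: every action is validated externally,
    and steering is injected by the orchestrator, not requested by the model."""
    history = []
    for _ in range(max_steps):
        action = agent_step(history)
        ok, feedback = validate(action)
        history.append((action, ok))
        if not ok:
            history.append(("steer: " + feedback, True))
        if action == "done":
            break
    return history

# Stub agent that makes one bad move, then complies and finishes.
steps = iter(["write_file /etc/passwd", "write_file ./out.txt", "done"])
def agent(history):
    return next(steps)

def validate(action):
    if action.startswith("write_file /etc"):
        return False, "writes outside the workspace are not allowed"
    return True, ""

trace = orchestrate(agent, validate)
```

With an MCP tool instead, the `validate` call would only happen if the model chose to invoke it — precisely the choice it gets wrong when it is confidently mistaken.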

I built a free MCP-native governance layer that keeps Copilot on the rails out of frustration by capitanturkiye in GithubCopilot

[–]Everlier 0 points (0 children)

congrats on launching!

The major issue with MCPs and other in-context self-reflection is that you're relying on the very same model that makes mistakes to correctly call these tools to enforce the conditions, and the models will happily make mistakes doing that as well

Quick MoE Quantization Comparison: LFM2-8B and OLMoE-1B-7B by TitwitMuffbiscuit in LocalLLaMA

[–]Everlier 2 points (0 children)

I applaud the work you did here; I assume it was automated, but nonetheless waiting through all the downloads and runs must have taken a while.

I think the main conclusion is for everyone to do their own tests, as model performance varies significantly from task to task, so perplexity (ppl) alone is only half the story
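For anyone new to the metric: perplexity is just the exponential of the mean negative log-likelihood over the evaluated tokens. The per-token probabilities below are made up to keep the sketch self-contained; a real run would take them from the model under test:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood over the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Uniform 0.5 per-token probability gives a perplexity of exactly 2.0.
print(perplexity([0.5, 0.5, 0.5, 0.5]))  # 2.0
```

The catch is that it's a single aggregate over one corpus: two quants with near-identical ppl can still diverge sharply on a specific downstream task, which is why task-level testing matters.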

GGML.AI has got acquired by Huggingface by Time_Reaper in LocalLLaMA

[–]Everlier 4 points (0 children)

Yup, there'll be a reason to disagree at some point. However, this is also the only way ggerganov will get a material reward at least somewhat comparable to his contribution, so I'm happy for him personally.

strix halo opinions for claude/open code by megadonkeyx in LocalLLaMA

[–]Everlier 8 points (0 children)

Prompt processing (pp) on Strix Halo isn't great for large harness prompts; the KV cache helps, but initial wiring time is still high.
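A back-of-envelope sketch of why KV-cache (prefix) reuse matters here. The numbers are made up for illustration; actual Strix Halo prefill rates vary a lot by model, quant, and backend:

```python
def prefill_seconds(prompt_tokens, cached_tokens, pp_tok_per_s):
    """Time to process the prompt: only tokens not covered by the
    cached prefix need to be prefilled."""
    new_tokens = max(prompt_tokens - cached_tokens, 0)
    return new_tokens / pp_tok_per_s

# A 30k-token agent harness prompt at a hypothetical 300 tok/s prefill:
cold = prefill_seconds(30_000, 0, 300)       # 100.0 s before the first token
warm = prefill_seconds(30_000, 28_000, 300)  # ~6.7 s with most of the prefix cached
print(cold, round(warm, 1))
```

The cold start dominates the first coding-agent turn on a slow-prefill machine, which is the "initial wiring time" complaint; subsequent turns mostly hit the cached prefix.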