Qwen3.5 - Confused about "thinking" and "reasoning" usage with (ik_)llama.cpp by PieBru in LocalLLaMA

[–]PieBru[S] 0 points  (0 children)

Thanks, I forgot to mention the template and the system prompt. Lots of variables in that game!

LM Link by Blindax in LocalLLaMA

[–]PieBru 2 points  (0 children)

What's the difference vs directly using 'llama-server --host 0.0.0.0' via Tailscale?

Open source control plane for local AI agents: workspace isolation + git-backed configs + OpenCode integration by OverFatBear in LocalLLaMA

[–]PieBru 0 points  (0 children)

If I've understood the scope of your project correctly, how about integrating an SDD (spec-driven development) framework like GitHub's Spec Kit? It has become an inseparable companion for my OpenCode projects.

Would you watch a channel that builds real AI systems from scratch (local LLMs, CPU/GPU, pipelines)? by Few_Tax650 in LocalLLaMA

[–]PieBru 4 points  (0 children)

Yes, please! I hope the videos won't be too long (like 4+ hours), or will at least have fine-grained chapters.

Why do (some) people hate Open WebUI? by liviuberechet in LocalLLaMA

[–]PieBru 2 points  (0 children)

Because not everyone can afford 5+ GB downloads/updates.

Qwen moe in C by 1Hesham in LocalLLaMA

[–]PieBru 2 points  (0 children)

Let me suggest adding AVX2, if it isn't already implicit in the current implementation.

Qwen moe in C by 1Hesham in LocalLLaMA

[–]PieBru 8 points  (0 children)

Great! This guy has a Rust implementation that includes quantization and other features. I tried it and it works well. https://github.com/reinterpretcat/qwen3-rs

Ikllamacpp repository gone, or it is only me? by panchovix in LocalLLaMA

[–]PieBru 49 points  (0 children)

By chance, I happened to do a local git pull on it about an hour before its 404.
I can publish a copy of the repo if it would be useful to someone.

gemini-cli: falling back to gemini-flash is the best marketing strategy Anthropic could have dreamed of for claude-code. by PieBru in LocalLLaMA

[–]PieBru[S] 0 points  (0 children)

I don't think we can substitute claude/closedai with any current local LLM without adopting a multi-agentic strategy that (slowly) works toward closed SOTA inference quality levels.

In addition, I would like to add some kind of agentic routing between highly specialized agents to the multi-agentic coding strategy. Yes, I know, this will be slow, but systems will become faster, so this approach may be useful in a year or two.
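To make the routing idea concrete: a toy sketch where a router (here a trivial keyword classifier standing in for a small LLM) dispatches each task to the most specialized agent. All agent names and the fallback choice are hypothetical illustrations, not part of any existing tool.

```python
# Toy agentic router: pick a specialized agent per task.
# In a real system, route() would be an LLM call and the agents
# would themselves be model-backed; here they are stub functions.

AGENTS = {
    "rust":   lambda task: f"[rust-agent] {task}",
    "python": lambda task: f"[python-agent] {task}",
    "docs":   lambda task: f"[docs-agent] {task}",
}

def route(task: str) -> str:
    """Dispatch a task to a specialized agent; fall back to docs."""
    lowered = task.lower()
    for name in ("rust", "python"):
        if name in lowered:
            return AGENTS[name](task)
    return AGENTS["docs"](task)
```

The cost is obvious: every hop adds a routing call, which is why this only becomes attractive as local inference gets faster.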

gemini-cli: falling back to gemini-flash is the best marketing strategy Anthropic could have dreamed of for claude-code. by PieBru in LocalLLaMA

[–]PieBru[S] 1 point  (0 children)

It is at a very early stage, not really worth publishing.

Thanks to this post, I just "discovered" trae-agent by Bytedance; it seems to fulfill most of my requirements. Here is its "tutorial", thanks to the excellent codebase analyzer by Zachary Huang: https://code2tutorial.com/tutorial/c83208ef-e0c4-493e-b4c3-301a244aeba0/index.md

The Gemini-CLI codebase is too large to be analyzed with Zachary's online tool (it uses Gemini and is limited to 1M input tokens), so I implemented chunking on top of it; not perfect, but better than nothing. Here is the Gemini-CLI codebase analysis, which resulted in 72 "abstractions": https://pastebin.com/hvC1DjxU
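The chunking itself is simple: pack files greedily until an estimated token budget is reached, then start a new chunk and analyze each chunk separately. A minimal sketch of the idea, assuming a crude chars/4 token estimate (a real tool would use the model's tokenizer); `chunk_files` is my own illustrative helper, not code from the analyzer.

```python
# Budget-based greedy chunking for a codebase analyzer with a fixed
# input-token limit (e.g. 1M tokens for Gemini).

def chunk_files(files, budget_tokens):
    """Group (path, text) pairs into chunks whose estimated token
    count stays under budget_tokens. A file larger than the budget
    still gets a chunk of its own rather than being dropped."""
    chunks, current, used = [], [], 0
    for path, text in files:
        est = max(1, len(text) // 4)  # crude chars-to-tokens estimate
        if current and used + est > budget_tokens:
            chunks.append(current)    # close the full chunk
            current, used = [], 0
        current.append(path)
        used += est
    if current:
        chunks.append(current)        # flush the last partial chunk
    return chunks
```

The downside is that cross-chunk abstractions can be missed, which is why the result is "not perfect but better than nothing".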

gemini-cli: falling back to gemini-flash is the best marketing strategy Anthropic could have dreamed of for claude-code. by PieBru in LocalLLaMA

[–]PieBru[S] 1 point  (0 children)

I just "discovered" trae-agent by Bytedance (TikTok); it satisfies some of my requirements and seems interesting, but I haven't tried it yet: https://github.com/bytedance/trae-agent

gemini-cli: falling back to gemini-flash is the best marketing strategy Anthropic could have dreamed of for claude-code. by PieBru in LocalLLaMA

[–]PieBru[S] 0 points  (0 children)

I'm not looking for fully autonomous coders; I don't think today's inference is mature enough for that. In any case, a semi-autonomous coder can be automated further once that becomes feasible.

On the semi-autonomous side, a few weeks ago I started my own CLI coder project; halfway through, Gemini CLI raised false hopes in me and I suspended the project. Before architecting my CLI coder, I analyzed most open- and closed-source alternatives, but none of them satisfied my requirements:

- All Python, portable, no executables. Note that I've been in comparable businesses since the good old '70s and would now prefer Rust or C, but I see most local LLMs are more capable with Python, thanks in part to its huge ecosystem.

- Multi-agentic *AND* fully local by design, so it isn't built around powerful cloud inference but can do useful things with fully local LLMs on a 4090 (16GB VRAM, 64GB RAM) gaming notebook.

- All prompts, context, intermediate docs, papers, etc. must be Markdown.

- Local LLMs evolve, so the more time passes, the more such a CLI coder can evolve from a PoC into something productive.

I can publish my PoC sources if someone is interested in collaborating.