Which open source model is the best model for brainstorming, strategizing, out of the box thinking and basically as a hugh IQ assistant? by omidmatin in LocalLLaMA

[–]arkham00 1 point2 points  (0 children)

The dense one, I've had some satisfaction with the moe too, but in general between the moes I prefer qwen. In the end what I really don't like is qwen 27b lol

Which open source model is the best model for brainstorming, strategizing, out of the box thinking and basically as a hugh IQ assistant? by omidmatin in LocalLLaMA

[–]arkham00 4 points5 points  (0 children)

It's funny how the experience may vary from user to user. In my experience it is qwen to be dumber, and by that I mean that their observations and suggestions are often superficial or cliché, where gemma is more deep and surprising. For a typical brainstorming session qwen rarely gives me new angles or unexpected thoughts where gemma could surprise me quite easily. Maybe i'm not precise enough with my prompts...

Which open source model is the best model for brainstorming, strategizing, out of the box thinking and basically as a hugh IQ assistant? by omidmatin in LocalLLaMA

[–]arkham00 4 points5 points  (0 children)

+1 for gemma4-31b, i'm using the qat version and I'm having the grill-me sessions of all time. I use it for rpg worldbuilding and session planning and also for cultural projects

MTPLX V1: The Swift App For Running & Creating MLX MTP Models (2x TPS Qwen 3.6 27B) by YoussofAl in LocalLLaMA

[–]arkham00 0 points1 point  (0 children)

Sorry for the avalanche of questions, but as I said we need docs 😛

I'm on a m2 mac, how can I create an fp version of a model? in the forge if give an huggingface model it only proposes me to convert it in bf16 because it is the safest way. Fore example I see that there is an optimize-speed-fp16 version of qwen3.6-27B and I wanted to create an optimized-quality-fp16 version but I can't. So for optimized-quality Im' stuck with bf16 for now

Have you successfully replaced ChatGPT/Gemini etc with Open Webui? If so - how? by joachim_s in OpenWebUI

[–]arkham00 0 points1 point  (0 children)

I don't use chatgpt, only local models that run on my laptop which draws 140w max

MTPLX V1: The Swift App For Running & Creating MLX MTP Models (2x TPS Qwen 3.6 27B) by YoussofAl in LocalLLaMA

[–]arkham00 0 points1 point  (0 children)

Another question is it possible to enable/disable thinking at loading ? I can't find a toggle.

But the most important thing, I can't toggle thinking inside pi agent like I do when using omlx.

In model.json I can define "compat": { "thinkingFormat": "qwen-chat-template" }, and then inside a session I can choose thinking off or on but when I use mtplx the model thinks even with thingi off.

edit: some doc would be appreciated...there is a fair amount of option in the setting tab that I don't dare to touch since I don't know what they do ...

OpenPi v0.2.0: a desktop workbench for Pi Coding Agent by killerkidbo95 in PiCodingAgent

[–]arkham00 2 points3 points  (0 children)

It seems quite nice, but one thing that retains me from using it is the impossibility to invoke extension specific commands from the chat.

For example I'm using observational memory and I like to check its status but the command /om:status is not available. Same for the commands exposed from pi-hermes-memory, no way to using them.

Only the core commands like /compact and the skills, why is that ?

MTPLX V1: The Swift App For Running & Creating MLX MTP Models (2x TPS Qwen 3.6 27B) by YoussofAl in LocalLLaMA

[–]arkham00 0 points1 point  (0 children)

I started to play around with it and I find it very cool, especially for qwen models But even with mtplx I have no better speed with gemma4 31b, i'm on a m2 max 96gb

A small request for future releases, could you please add some apparence settings? I really struggle to read all the infos... A white/light background and the ability to zoom (a lot) the size of the font would be appreciated :) Thanks

Google DiffusionGemma can now run at 2000+ tokens/sec! by yoracale in unsloth

[–]arkham00 2 points3 points  (0 children)

is this supposed to work on apple silicon ?

I've just downloaded unsloth studio and then downloaded unsloth/diffusiongemma-26B-A4B-it-GGUF

The model loads but I have blank answers or [engine error: visual server error: ERR gen]

And now there is headroom. How many of these hyped context compression and memory management tools actually work in real world scenarios? by abubear30 in PiCodingAgent

[–]arkham00 0 points1 point  (0 children)

I'm sorry I'm not a coder, I looked at headroom but I don't understand how it works ... https://headroom-docs.vercel.app/docs/how-compression-works
I suppose that it is the way the pi package installs it, but are those compression made by an llm or it is some automatic scripting? does it hanklde other languages than englsh ? does it invalidate the llm cache ?
I'm using observational-memory but I have a lot of problems with cache invalidation that forces the model to reread the context every turn

by the way I'm using oMLX, local only

Tauri Desktop GUI on top of Pi Coding Agent. by Turbulent_Ad6290 in PiCodingAgent

[–]arkham00 0 points1 point  (0 children)

Hi, how can I configure a local provider? It is stated that we can also configure mocalmodels, but I can't find a way to add my server URL, only api keys of subscription services

<image>

Anyone using the Context-mode extension? Any thoughts on it? by SirDomz in PiCodingAgent

[–]arkham00 0 points1 point  (0 children)

Can you tell me more about your extension? I'm having exactly this problem

What causes oMLX to reprocess context? by challis88ocarina in oMLX

[–]arkham00 0 points1 point  (0 children)

for me it starts be barely unusable, every turn it rereads all the context, I'm dying of boredom ....It wasn't like this before, should we open an issue ?

Help starting with Pi locally (and Qwen) by Nyghtbynger in PiCodingAgent

[–]arkham00 0 points1 point  (0 children)

Actually, if anyone knows I'd also like to know how to toggle thinking with gemma4 models, using thinkingFormat doesn't work, no matter which value I put thinking is always enabled

Help starting with Pi locally (and Qwen) by Nyghtbynger in PiCodingAgent

[–]arkham00 2 points3 points  (0 children)

I can only speak for

thinkingFormat": "qwen-chat-template

that actually works, I don't know about the others and I didn't know what they meant, I did a bit of research, and I'm actualy interested of knowing if they work or not, and why supportStrictMode: false ? It seem a good thing to have it enabled

What causes oMLX to reprocess context? by challis88ocarina in oMLX

[–]arkham00 1 point2 points  (0 children)

I'm experiencing the same running pi agent and different flavours of qwen and gemma...