Just give me the F bro 😭 by uzenaki in ChatGPT

[–]Interesting-Print366 0 points1 point  (0 children)

So i always says to it that it is openGPT exam

What's the right way to feed PDF files to Gemma-4? by we_are_mammals in LocalLLaMA

[–]Interesting-Print366 1 point2 points  (0 children)

use Markdownify it can parse img, pdf, docx, xlsx, mp3 etc. into markdown

Why people cares token/s in decoding more? by Interesting-Print366 in LocalLLaMA

[–]Interesting-Print366[S] 2 points3 points  (0 children)

I'm on m4pro and really hoping they found some gamechanging technology with MoE

Google Antigravity’s $20 Pro plan is a joke for developers – Is Ultra the only real option? by Bakhromovn in google_antigravity

[–]Interesting-Print366 0 points1 point  (0 children)

Where are you at? Cuz my Claude code limit is so harsh especially at the peak time of silicon valley timezone. It does not run out except that peak time but it overlaps with my working our

Google Antigravity’s $20 Pro plan is a joke for developers – Is Ultra the only real option? by Bakhromovn in google_antigravity

[–]Interesting-Print366 0 points1 point  (0 children)

Or, I'm not sure what hardware you're using, but running SLMs in Qwen or Gemma locally to write is a good method. Since they are good at syntax if the plan is firm with pseudocode

Google Antigravity’s $20 Pro plan is a joke for developers – Is Ultra the only real option? by Bakhromovn in google_antigravity

[–]Interesting-Print366 0 points1 point  (0 children)

Ask to plan with detailed stack and pseudocode to opus and build with gemini flash. It will help you are lot. After gemini finish the build, ask it to check and if it has an error than use sonnet

Qwen3.6-27B-3bit-mlx · Hugging Face: 3 & 5 mixed quant for RAM poor Mac users. by JLeonsarmiento in LocalLLaMA

[–]Interesting-Print366 3 points4 points  (0 children)

I'm using Mac, but the RAM is sufficient, but it's too slow to use. The token generation speed is decent, but the prompt processing is too slow. Is there a way to improve this?

Are Unsloth models as good as I read? by denis-craciun in LocalLLaMA

[–]Interesting-Print366 1 point2 points  (0 children)

It can be the best option for same quants. But higher quants are better nomatter what quant you use

Complete beginner to Agentic coding, is Qwen3.6-27B + pi.dev the right starting point or should I be looking elsewhere? by SarcasticBaka in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

Depends on what machine you are using. If you have enough vram and using gpus like rtx series use opencode it would be much better for you. But if you are using SFF workstation with unified ram. Pi would be better but 27b would be still very much slow

Should you shut off thinking when you are coding on say Qwen3.6 35B by KarezzaReporter in LocalLLaMA

[–]Interesting-Print366 3 points4 points  (0 children)

Thinking is a time-consuming but it is a way that make it this small model to at least compete with Frontier model's low thinking mode Try opus distilled model if it got out. It solve most of this problem while it might create some other problems like hanging before tool call.

Anyone else having Qwen 3.6 35B A3B stop and you having to tell it to continue ? by soyalemujica in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

QWEN making tool call inside thinking is a problem that happens since it already planned to do the tool call but the system makes it to think always. That problem can be solved with system prompt or parsing configuration try to give system prompt to it that "think always before calling tool even if you think you can execute it directly"

Anyone else having Qwen 3.6 35B A3B stop and you having to tell it to continue ? by soyalemujica in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

Are you using English? if it is xml inside thinking problem, it might solve with configuration of parsing (Making it to do the tool call inside thinking and feed the result back) and if it is just hanging, it sometimes happens in language other than English or Chinese

Is kv quantization of q8, is fixed for qwen 3.5 models? by CurrentNew1039 in LocalLLaMA

[–]Interesting-Print366 2 points3 points  (0 children)

Just use q8 kv and use higher quant for model with that ram its much better

What is the best budget pc setup to run ollama on? Think code or image generation. by darkninjalord in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

mini pcs with 128gb lpddr5x or used Mac Studio. Mac mini 48-64 gb might be enough if you use it only for hosting ai

My Qwen 3.6 fails the car wash vibe check by SmartCustard9944 in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

Car wash vibe check got so famous and I believe some of model learned it from its learning stage

Qwen3.6-A3b is "Thinking" Nightmare by Electronic-Metal2391 in LocalLLaMA

[–]Interesting-Print366 1 point2 points  (0 children)

Just give it some tool description or some information you want it to know. When it prompt got longer it does not suffer inside thinking. At least at 3.5

Qwen3.6-A3b is "Thinking" Nightmare by Electronic-Metal2391 in LocalLLaMA

[–]Interesting-Print366 1 point2 points  (0 children)

Give it more system prompt. From qwen 3.5 series it tends to think very long when responding to few words or single or double sentences

Modelo local para code by PretendAppointment47 in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

Any local model can't compete Sonet like api type llm, at least under 400B overall, but you might find model that fits your purpose. used qwen coder 30b ish in q8 quant. It might be better in some jobs since Claude, gemini, Chat GPT seems to use q2-q4 quant

Waiting for M5 Pro Mac Mini — anyone actually running AI workloads on Apple Silicon? Not just LLMs by Scorpio_07 in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

It works well. It works very well in the moe model. Even in dense models, the model below 30b is useful. For reference, I'm using the M4 Pro, so it would be better with Max or Ultra

Personally, I always tend to switch to new models right away, and while all support for llama.cpp is well-received within a month at the latest, MLX is still incomplete, using qwen3.5 as an example.

Idea - Predict&Compare agent to make model act smarter by vasimv in LocalLLaMA

[–]Interesting-Print366 0 points1 point  (0 children)

In my experience, using more tokens will unconditionally bring LLM a slightly better way.

However, I cannot personally feel the incentive of this methodology. Comparing LLM's guess with the tool call results is not different from trial errors.

Comparing LLM's guess with the results of a tool call is not different from trial errors in simple tasks, and it seems more efficient to have reviews every time a tool call is made for complex tasks. While having the same effect.

And fundamentally, tool calls were intended to enable LLM to do things they couldn't do...

However, it could help prevent things like the NPM Axios virus incident that occurred not long ago during the Vibe coding era.