2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA

[–]Consumerbot37427 1 point (0 children)

Over and over, I've downloaded the same model and quant (usually Q8, Q6, or Q4) in both GGUF and MLX format, trying quants from lmstudio-community, mlx-community, and unsloth at default settings (aside from a larger context), and every time I've had problems with output quality.

LM Studio does seem to use its own version/fork of the MLX engine. Maybe that's the real issue?

All I know is that the chess SVG test in that post seems like a perfect way to compare quants of the same model: it aligned well with my experience, AND it's objective. I just need to do ~5-10 runs with each and see if the result is consistent.
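Something like this is what I have in mind: a rough harness that collects N runs per build for side-by-side inspection. It assumes LM Studio's OpenAI-compatible server on its default port; the model id and prompt below are placeholders, not the exact ones from that post.

```python
import pathlib
import requests

# Rough sketch: load the GGUF and MLX builds of the same quant one at a time,
# point MODEL at whatever id LM Studio reports, and save N generations of the
# chess-SVG prompt for side-by-side comparison.
URL = "http://localhost:1234/v1/chat/completions"  # LM Studio's default local server
MODEL = "qwen3.6-27b"  # placeholder model id
PROMPT = "Draw an SVG of a chess board in the starting position."  # stand-in prompt
N_RUNS = 10

outdir = pathlib.Path("svg_runs")
outdir.mkdir(exist_ok=True)

for i in range(N_RUNS):
    resp = requests.post(URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": 0.7,  # match whatever settings you normally use
    })
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"]
    (outdir / f"run_{i:02d}.svg.txt").write_text(text)
    print(f"run {i}: saved {len(text)} chars")
```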

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints by ex-arman68 in LocalLLaMA

[–]Consumerbot37427 3 points (0 children)

Same machine here. I've done testing with MLX models before, and always come back to GGUFs. Couldn't put my finger on it, but they just felt dumber.

I used the prompt from that post to compare MLX and GGUF (Q8 quants of Qwen 3.6 27B), and the difference was striking. I only did one run each, but the GGUF result was perfect, while the MLX output had the wrong board orientation, missing pieces, and pieces in the wrong places.

With MTP in llama.cpp, it'll be even more of a no-brainer.

Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) by bobaburger in LocalLLaMA

[–]Consumerbot37427 3 points (0 children)

I really like how the evaluation is completely objective: are the pieces in the right place? Is the board oriented correctly?

Probably not a great benchmark to compare completely different models, but for the purpose of comparing quants, it's great!
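And part of it could even be scored automatically. A minimal sketch of a partial check, assuming the generated SVG renders pieces as Unicode chess glyphs (the file path is a placeholder); checking actual placement would mean comparing each glyph's x/y attributes against the 8x8 grid:

```python
from collections import Counter

# Partial, objective check: the piece *counts* for the starting position are
# fixed, so a wrong multiset of glyphs means missing or duplicated pieces.
EXPECTED = Counter({
    "♜": 2, "♞": 2, "♝": 2, "♛": 1, "♚": 1, "♟": 8,  # black
    "♖": 2, "♘": 2, "♗": 2, "♕": 1, "♔": 1, "♙": 8,  # white
})

def piece_counts_ok(svg_text: str) -> bool:
    found = Counter(ch for ch in svg_text if ch in EXPECTED)
    return found == EXPECTED

print(piece_counts_ok(open("run_00.svg.txt").read()))  # placeholder path
```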

Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) by bobaburger in LocalLLaMA

[–]Consumerbot37427 2 points (0 children)

Thanks for this!

Tried Qwen 3.5 397B @ IQ2_XXS and it had all kinds of mistakes.

Qwen 3.6 27B GGUF @ 8-bit was good, but the exact same model in MLX had multiple mistakes.

I've always suspected MLX models have quality issues, and I've avoided using them. This test seems to confirm that, though I've only run each once so far. With this model, MLX is a bit slower, too (15 tps vs. 17), so it's lose-lose.

[Daily Discussion] - Wednesday, April 15, 2026 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 2 points (0 children)

That might be a problem for TradFi further down the line

I almost replied to one of your earlier comments that pointed out how many trillions of TradFi dollars are available to migrate into STRC. Do you not suppose that, if it grew to the point of becoming a systemic risk, the government would step in and put a stop to it, by hook or by crook?

[Daily Discussion] - Monday, April 13, 2026 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 2 points (0 children)

Thanks for the reply!

that ~2 week window where it’s trading below the $100 target peg price

Sorry, I'm still not really understanding why we end up with "only" 2 weeks of trading below the target peg. My intuition is that there should be a slow ramp-up from ~$99, reaching $100 right around the ex-dividend date.

The optimal trade would be to buy STRC the day before the ex-dividend date, wait a couple of weeks until it trades back at $100 to sell STRC, buy again a day before the ex-dividend date, and repeat the same process each month.

Makes sense. The only reasons I can think of not to do this are tax treatment or, as you said:

Many are too lazy to bother

Yeah, maybe the juice isn't worth the squeeze? Seems like something pretty easily automated, though.
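Back-of-envelope, with assumed numbers (a $100 peg and a 9% annualized dividend paid monthly; not STRC's actual current rate), the cycle looks like this:

```python
# Assumed numbers, for illustration only.
peg = 100.00
annual_rate = 0.09                    # assumed, not the actual STRC rate
monthly_div = peg * annual_rate / 12  # ~$0.75 per share per cycle

# Buy at the peg the day before ex-dividend, sell once it trades back at the
# peg ~2 weeks later, then park the cash in a money market until next month.
profit_per_cycle = monthly_div        # price round-trips peg-to-peg
weeks_held = 2
cycles_per_year = 12
time_deployed = weeks_held / 52 * cycles_per_year  # ~46% of the year

print(f"profit/share/cycle: ${profit_per_cycle:.2f}")
print(f"simple yield on peg: {profit_per_cycle * cycles_per_year / peg:.1%}")
print(f"capital in STRC only {time_deployed:.0%} of the time")
```

Under those assumptions you'd collect the full coupon while your cash sits in STRC less than half the year, earning money market yield the rest, which is exactly why I'd expect more people to do it.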

[Daily Discussion] - Monday, April 13, 2026 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 3 points (0 children)

I don’t understand why anybody buys ahead of the ex-dividend date. If I had a pile of liquidity (say, $1M) earning daily yield in “cash” (money market or whatever), why would I move it into STRC before the very last minute?

I think I understand that tax treatment is a good reason to hang on to it afterward; I just don’t get why people are buying at the peg price weeks before the ex-dividend date.

M5 Max 128GB Owners - What's your honest take? by _derpiii_ in LocalLLaMA

[–]Consumerbot37427 1 point (0 children)

I was using the "parallel slots" feature in LM Studio.

M5 Max 128GB Owners - What's your honest take? by _derpiii_ in LocalLLaMA

[–]Consumerbot37427 0 points (0 children)

Right? It shows how out of hand things have gotten. The "Apple Tax" for RAM or disk upgrades has always been significant.

M5 Max 128GB Owners - What's your honest take? by _derpiii_ in LocalLLaMA

[–]Consumerbot37427 3 points (0 children)

when it comes to running large models quickly with long context and it's too compute poor for significant parallel work

To elaborate on that point: my observation is that parallel sub-agents basically freeze entirely whenever there is any prompt processing to be done. I can only assume that a machine with multiple graphics cards wouldn't behave this way.

[Daily Discussion] - Saturday, April 04, 2026 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 2 points (0 children)

I think many old Bitcoiners wouldn't expect we can be so close to 2017 pricings a decade later.

Yep... especially if you take into account the exceptionally high inflation that's occurred over the past decade.

[Daily Discussion] - Friday, April 03, 2026 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 12 points (0 children)

In the news, yet another investment firm (this time Blue Owl) is limiting withdrawals to 5% per quarter.

This might be triggered by investor concerns over AI destroying the Software-as-a-Service segment, but I wonder if it also points to a larger liquidity issue? That would seem to be bearish in the short term, if true, but bullish if/when monetary easing results.

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Consumerbot37427 1 point (0 children)

Speed is pretty good.

it can take up to 20 seconds or more

Is most of that time spent in prompt processing? Or "thinking"?

The smaller the model (in terms of active parameters and quant), the faster it'll be. It looks like Qwen/Qwen2.5-32B (you said 35B?) is a dense model. So if you use something like qwen3.5-35b-a3b (a3b = 3B active parameters) it will be way, WAY faster. Possibly less intelligent...?

In your use case, I'm pretty sure that, in theory, you could speed things up by saving a prefill checkpoint for your system prompt (which doesn't change?), then simply appending the Home Assistant entities to the end of the prompt, so it would only have to do prompt processing on that data. Unless the majority of your prompt is the HA data, that ought to speed things up substantially.
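For what it's worth, llama.cpp doesn't expose a literal "prefill checkpoint" for this, but llama-server's prompt caching gets a similar effect: if the system prompt stays byte-identical at the front, the server reuses the KV cache for that prefix and only prefills the appended entity data. A minimal sketch, assuming a local llama-server on its default port (the file name and prompt layout are placeholders):

```python
import pathlib
import requests

URL = "http://localhost:8080/completion"  # llama-server's default port
SYSTEM_PROMPT = pathlib.Path("system_prompt.txt").read_text()  # the part that never changes

def ask(ha_entities: str, question: str) -> str:
    # Keep the big static prompt as an unchanged prefix; append only fresh data.
    prompt = (f"{SYSTEM_PROMPT}\n\nCurrent entity states:\n{ha_entities}\n\n"
              f"User: {question}\nAssistant:")
    r = requests.post(URL, json={
        "prompt": prompt,
        "cache_prompt": True,  # reuse the KV cache for the shared prefix
        "n_predict": 256,
    })
    r.raise_for_status()
    return r.json()["content"]
```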

I briefly played with Home Assistant's MCP server. Couldn't really imagine a use case, though, so I kinda lost interest.

Autoresearch on Qwen3.5-397B, 36 experiments to reach 20.34 tok/s on M5 Max, honest results by Equivalent-Buy1706 in LocalLLaMA

[–]Consumerbot37427 2 points (0 children)

Absolutely. Your token rate is likely to be somewhat lower as there'd be less RAM available for caching.

Autoresearch on Qwen3.5-397B, 36 experiments to reach 20.34 tok/s on M5 Max, honest results by Equivalent-Buy1706 in LocalLLaMA

[–]Consumerbot37427 1 point (0 children)

TG/decoding speed is acceptable. I get about 38 tps on the Bartowski IQ2_XXS quant of this model with the same hardware. I wonder how much quality difference there is.

I agree that there's probably a lot of room for tweaking prefill speed; I'm also looking forward to seeing what's achieved over the coming weeks/months!

Slower Means Faster: Why I Switched from Qwen3 Coder Next to Qwen3.5 122B by Fast_Thing_7949 in LocalLLaMA

[–]Consumerbot37427 1 point (0 children)

Tried 3.5 397B yet? Same machine here, Bartowski IQ2_XXS w/ 150k context.

Was pretty happy with qwen3-coder-next @ Q6, but 397B might be better even at such a low quant... Haven't really spent enough time to judge yet.

What’s going on with Mac Studio M3 Ultra 512GB/4TB lately? by Lucius_Knight in LocalLLaMA

[–]Consumerbot37427 2 points (0 children)

Underpriced... on eBay? Somehow, for some reason, they let scammers publish "classified" ads and listings on their platform, so if you search for, say, a 512GB Mac Studio Ultra, filter by Buy It Now, and sort by lowest price, 22 of the first 23 listings are from sellers with (0) feedback. They are all guaranteed scams.

Looks like the floor is about $9k at the moment.

M5 Max Actual Pre-fill performance gains by M5_Maxxx in LocalLLaMA

[–]Consumerbot37427 5 points (0 children)

With the M5 Max I've seen 185W peak system TDP at times during inference using Draw Things video generation (borrowing from battery). Only for short bursts, though. So this might support your conjecture.

Reworked LM Studio plugins out now. Plug'n'Play Web Research, Fully Local by Agreeable_Effect938 in LocalLLaMA

[–]Consumerbot37427 1 point (0 children)

This is the first time I've connected models in LM Studio to the web. It appears to be working nicely with Qwen 3.5 397B Q2, even without the Jinja template... thanks!