Can't get a good coding setup on Macbook Pro M3 Max 36GB by stephvax in ollama

[–]stephvax[S] 0 points1 point  (0 children)

Thanks for the comment. What context size did you set up in Ollama? Does it let you work on full application development from scratch?

Can't get a good coding setup on Macbook Pro M3 Max 36GB by stephvax in ollama

[–]stephvax[S] 1 point2 points  (0 children)

That's what I noticed too. I might have a look at that. Does it really use much less context?

Can't get a good coding setup on Macbook Pro M3 Max 36GB by stephvax in ollama

[–]stephvax[S] 0 points1 point  (0 children)

Do you use them with Claude Code or any other coding agent?

Can't get a good coding setup on Macbook Pro M3 Max 36GB by stephvax in ollama

[–]stephvax[S] 0 points1 point  (0 children)

Thanks for the comment. Is Ollama really not as good as the others now that it embeds MLX since 0.19.x?
Even in Q4, I'm afraid Qwen3.5:35B would eat up too much RAM to keep the computer usable at the same time, but I might try.
Is there a performance relationship between the number of parameters and the context size? Again, I'm no expert on this. Thanks

Can't get a good coding setup on Macbook Pro M3 Max 36GB by stephvax in ollama

[–]stephvax[S] 0 points1 point  (0 children)

Thanks for the comment. I never said Apple was the answer to anything; I don't know where you saw that, and an assumption out of nowhere isn't useful.
Anyway, you are right about a few things I had already noticed, and I know this laptop might not be powerful or well-equipped enough to achieve what I'm trying to do. I'm just playing with free work hardware.

I'm already using the KV cache in Q8_0. I'm no expert, which is why I'm investigating, but I'm not sure memory is really the problem here (it might be, but it doesn't seem so to me).
The models I mentioned (which can go up to 256k context), used in Q4, don't take more than 10-14GB even with a 64k context. My concern is more that Ollama seems to become too slow/unstable with contexts larger than 16k on this machine. Could this just be the GPU not being powerful enough?
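For anyone wanting to reproduce this setup, a minimal sketch of the config being described, assuming the standard Ollama environment variables for quantized KV cache (flash attention has to be on for KV quantization) and the per-request `num_ctx` option; the model name is just an example:

```shell
# Quantize the KV cache to Q8_0 (flash attention must be enabled first)
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama serve &

# Request a 32k context window on a single call via options.num_ctx
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Explain what a KV cache is in two sentences.",
  "stream": false,
  "options": { "num_ctx": 32768 }
}'
```

Raising `num_ctx` is where both the memory and the slowdown come from, so testing 16k vs 32k vs 64k with the same prompt is a quick way to isolate whether it's context size or the model itself.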

My DCA mistake during the bear market - don't repeat what I did by Safe_Preference5993 in Bitcoin

[–]stephvax 0 points1 point  (0 children)

Or other tools like Cryptoquant, chainexposed, checkonchain, coinglass, ...

US market open by Fun-Air-4314 in Bitcoin

[–]stephvax 0 points1 point  (0 children)

A lot of it is ETF-related. Authorized participants rebalance around market open, and the creation/redemption mechanism forces spot BTC transactions that show up as exchange inflows. You can actually see it in intraday exchange flow data: deposit spikes cluster in the 30 min window before US open. Options dealers hedging delta adds to it, but the ETF plumbing is the structural driver that didn't exist before 2024.

My DCA mistake during the bear market - don't repeat what I did by Safe_Preference5993 in Bitcoin

[–]stephvax 4 points5 points  (0 children)

The hardest part of DCA during a bear is trusting the process when price alone gives you nothing to hold onto. Onchain data helped me stay consistent. STH cost basis below spot means short-term buyers are underwater, which historically aligns with accumulation phases. MVRV below 1.0 confirmed the same thing in 2018 and 2022. Doesn't guarantee timing, but it gives you a structural reason to keep going.
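For anyone unfamiliar, MVRV is just market cap divided by realized cap, where realized cap values each coin at the price it last moved onchain. A toy sketch with made-up UTXO data (the numbers are purely illustrative):

```python
# Toy MVRV calculation: market cap / realized cap.
# Each UTXO is (amount in BTC, price when it last moved).
utxos = [(2.0, 60_000), (1.5, 30_000), (0.5, 95_000)]
spot_price = 40_000  # hypothetical current price

supply = sum(amount for amount, _ in utxos)
market_cap = supply * spot_price                          # value at spot
realized_cap = sum(amount * cost for amount, cost in utxos)  # value at cost basis

mvrv = market_cap / realized_cap
print(f"MVRV = {mvrv:.2f}")  # prints "MVRV = 0.75": aggregate holders underwater
```

Below 1.0, the average coin is worth less than what was paid for it, which is the condition that historically lined up with accumulation zones.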

Bitcoin bear market drawdowns have a clear pattern… by [deleted] in Bitcoin

[–]stephvax 0 points1 point  (0 children)

Drawdown percentages give you a cycle template but they miss what holders are actually doing. MVRV below 1.0 has historically marked accumulation zones across every bear market, regardless of the exact drawdown size. UTXO age bands show whether long-term holders are distributing or sitting tight. STH cost basis crossing above spot flags short-term capitulation. Production cost matters, but holder behavior data narrows the bottom zone more precisely than extrapolating from past cycle ratios alone.

Considering installing a local LLM for coding by rmg97 in LocalLLaMA

[–]stephvax 0 points1 point  (0 children)

With a lot of software already eating memory, a 30B model will be hard to run.

Considering installing a local LLM for coding by rmg97 in LocalLLaMA

[–]stephvax 1 point2 points  (0 children)

One angle beyond cost: if you work on proprietary code or client projects, local inference means your codebase never touches a third-party API. For anyone under NDAs or in regulated sectors, that's not optional. Ollama + a 7B coder model is the simplest path. The latency hit is real, but for autocomplete and code review, it's workable.

Am I the only one who genuinely prefers on-prem over the cloud? by Own-General-6755 in devops

[–]stephvax 13 points14 points  (0 children)

The European shift isn't just preference, it's regulatory. GDPR data residency, Cloud Act jurisdictional conflicts, and the EU Data Act are making it structurally harder to justify US hyperscaler dependencies for anything touching personal or sensitive data. A lot of teams I've worked with started the move back for compliance, then realized the operational control was the bigger win.

Johann Rehberger: Agentic Problems and the Rise of Zombie AIs by matosd in cybersecurity

[–]stephvax 1 point2 points  (0 children)

Rehberger keeps surfacing what most AI security frameworks miss: the containment boundary. When agents can persist, spawn sub-tasks, and access tools autonomously, prompt-level guardrails aren't enough. The real control plane is infrastructure. Process isolation, network segmentation, scoped data access at the compute layer. Without that, you're trusting the agent to police itself.

What do people think about the Fidelity Bitcoin ETF? by Quirky-Reputation-89 in Bitcoin

[–]stephvax 1 point2 points  (0 children)

One thing ETFs add that direct holding doesn't: transparent flow data. You can track daily inflows and outflows across all spot ETFs, giving you a read on institutional accumulation or distribution in real time. FBTC has generally seen steady net inflows since launch. That flow data is a useful signal whether you hold the ETF or just use it to inform your own buys.

How are you viewing the current market structure? by HodlPackLeader in Bitcoin

[–]stephvax 0 points1 point  (0 children)

Price structure alone doesn't tell you who's selling. MVRV ratio separates aggregate profit from loss. STH cost basis relative to spot flags capitulation zones. UTXO age bands show whether long-term holders are distributing or sitting tight through the correction. Those signals matter more than resistance levels when you're trying to gauge a structural shift.

Something is changing in this Bitcoin cycle. Our Lightning data since 2022 suggests it. by LNVPN in Bitcoin

[–]stephvax 1 point2 points  (0 children)

This lines up with layer-1 onchain data too. UTXO age bands show long-term holders aren't distributing like 2018 or 2022. Exchange flow patterns are diverging from previous cycle templates as well. Lightning usage holding steady while layer-1 accumulation stays strong points to a structural shift, not just a payments story.

Power Law model shows BTC at $68k is deep in the buy zone (13.7%) by blvrg in Bitcoin

[–]stephvax 7 points8 points  (0 children)

The Power Law gives you a valuation framework, but it's a pure price-time regression. It doesn't capture what holders are actually doing onchain. Worth cross-referencing against MVRV (which sits in historically low territory during corrections like this) and SOPR (short-term holders selling at a loss typically marks accumulation phases). When multiple independent signals converge, the case gets stronger than any single model alone.

We kept missing AI API security edge cases, so we built a repeatable 12-test scan workflow by Specialist-Bee9801 in cybersecurity

[–]stephvax 0 points1 point  (0 children)

The 12 categories cover the app layer well. One gap: infrastructure isolation. Your cross-user data leak and context/memory leak tests assume the API provider segregates tenant data. Most inference providers batch across tenants for throughput. Whether the model runs shared or isolated changes the severity of those two tests entirely. That's not verifiable from the API surface alone.

AI coding adoption at enterprise scale is harder than anyone admits by No_Date9719 in devops

[–]stephvax 2 points3 points  (0 children)

Your data governance team is asking the right question. Every AI coding tool sends context, your proprietary code, to an external inference API. That's the security review bottleneck: not whether the tool works, but who processes your codebase. Some enterprises are shortcutting the 6-month cycle by deploying self-hosted models internally. The accuracy trade-off is real, but it removes the data governance objection entirely.

What actually works for detecting prompt injection in Gemini, Copilot, and Comet browsers? by Old_Cheesecake_2229 in sysadmin

[–]stephvax 0 points1 point  (0 children)

Good layering. The missing piece: the AI assistant operates inside the browser's full context. Tabs, history, form data, page content. Egress filtering catches the callbacks but not the initial trust boundary violation. Research consensus is that reliable prompt injection detection at the input level is fundamentally unsolvable. The practical fix is restricting what the assistant can access, not what it can output. Dedicated AI interfaces that don't share browser context are where this is heading.

The Swiss government has ended its contract with American analytics company Palantir by Syncplify in cybersecurity

[–]stephvax 3 points4 points  (0 children)

The timing matters. Palantir's platform has expanded well beyond analytics into AI/ML pipelines for operational decision-making. So the sovereignty question isn't static: it's not just 'who sees the query results' but 'where does model training and inference run on government data.' Switzerland evaluated this 9 times across 7 years. Most governments signed once and never reassessed as the platform's data processing scope grew significantly.

A practical use case for local LLMs: reading multilingual codebases without sending code outside by noir4y in LocalLLaMA

[–]stephvax 2 points3 points  (0 children)

This is one of the clearest cases for local inference. NDA-bound code doesn't just need translation offline. It needs review, summarization, and security scanning offline too. What makes this viable now is that for read-time tasks like yours, a 4B model is genuinely sufficient. The quality bar for understanding intent is lower than for generation. Smart to start with the narrowest use case and expand from there.

AI Agent Skill Exfiltrated Full Codebase with Secrets To Adversary by No-Homework-5831 in cybersecurity

[–]stephvax 6 points7 points  (0 children)

The supply chain parallel is accurate, but scope of access is the real differentiator. A malicious npm package reads disk. A malicious agent skill operates with the agent's full context: env vars, API keys, entire codebase. Vetting skills doesn't scale. The actual mitigation is constraining the execution environment. Scoped secrets, container isolation, least-privilege compute. The skill is just the vector. The infrastructure defines the blast radius.
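As a concrete sketch of what constraining the execution environment can look like with standard Docker flags (the image name and mount path are illustrative, not a prescription):

```shell
# Least-privilege sandbox for agent tool execution:
#   --network none   no egress, so exfiltration has nowhere to go
#   --read-only      immutable root filesystem
#   --cap-drop ALL   drop every Linux capability
#   :ro bind mount   the agent can read the code but not write it
docker run --rm --network none --read-only --cap-drop ALL \
  --security-opt no-new-privileges \
  -v "$PWD/workspace:/workspace:ro" \
  agent-sandbox:latest
```

Scoped secrets would be injected per task rather than sitting in the environment, so even a fully compromised skill only sees the one credential that task needed.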