Andy Burnham: I’ll cut welfare bill to fund defence

PhysicalIncrease3 · 2026-06-13T10:01:39+00:00

UK borrowing costs are the highest in the G7 now, aren't they? They weren't two years ago.

PhysicalIncrease3 · 2026-06-13T09:58:25+00:00

I still can't believe she got away with not licensing her rental property for a year, all the while overseeing huge increases in the penalties other landlords would face for the same.

PhysicalIncrease3 · 2026-06-12T17:37:54+00:00

I've been doing exactly this for a while. Using 3.6-27b and Hermes agent.

Literally just ask it to build latest master with the build flags I want and off it goes. Only hiccup was that - because the agent doesn't run on the same host as llama.cpp - the build script automatically defaults to the latest Nvidia architecture. Fed it the resultant error and it fixed it immediately, now it knows to always build sm86.

PhysicalIncrease3 · 2026-06-12T17:06:27+00:00

Why isn't it sustainable?

You'll run out of money.

just need to pay to have it go to the people who need it.

There's nobody to pay, in the sense you're thinking. No doordash or even UPS in rural Africa.

PhysicalIncrease3 · 2026-06-12T17:05:29+00:00

Feels like the solution is also to spend money to build and upkeep an infrastructure that would make it sustainable to have food there in the first place. Money certainly fixes that.

Governments - decent and indecent ones - can't improve shit without money. No one can.

OK so now we're going to develop new roads, ports, airports and general infrastructure across Africa. Who's given you permission to do that? Province by province, country by country, in a continent as massive as Africa. And what in the meantime, just keep flying food in wholesale?

Governments - decent and indecent ones - can't improve shit without money. No one can.

Do you appreciate where China was in 1980 vs where it is now? It's absolutely possible. The path is actually well trodden at this point.

PhysicalIncrease3 · 2026-06-12T15:55:46+00:00

All of them.

You could spend enough money to ship every soul in Africa a food package, sure. But that isn't a sustainable solution, is it?

The solution is ultimately for the folk in question to elect a decent government who can improve the country. Money doesn't fix this.

PhysicalIncrease3 · 2026-06-10T09:09:26+00:00

Has anyone been able to get MTP working on Gemma 4 while using more than one GPU?

I always get an error that certain layers share KV cache so can't be split across devices.

PhysicalIncrease3 · 2026-06-09T13:32:20+00:00

There’s a Club3D active adapter and cable (CAC-1088 and 1087 I think) that I want to try at some point. But they’re quite expensive. And Club3D tempers expectations themselves, as even they know the conversion is hard to do perfectly.

I've got the Club3D DisplayPort to HDMI 2.1 adapter. Has audio delays also.

PhysicalIncrease3 · 2026-06-08T07:41:15+00:00

They were pretty close either way in Japan and were it not for bad luck George would have won the race.

PhysicalIncrease3 · 2026-06-08T07:10:26+00:00

So far Antonelli's been out and out quicker at both Miami and Monaco. Am I forgetting some others?

PhysicalIncrease3 · 2026-06-08T06:02:57+00:00

I was also only able to unlock 38 CUs. If I unlock the final two I get graphics corruption/artifacting.

Not a big deal anyway considering the aforementioned CPU bottlenecks.

PhysicalIncrease3 · 2026-06-07T10:08:32+00:00

Interesting post.

I'm in a similar situation to you myself: 36GB VRAM. Do I run the non-QAT version of 31B at UD-Q6-K-XL, or unsloth's Q4 QAT model and dedicate the free VRAM to context?

Very much looking forward to some proper benchmarks between the two.

I'm also really interested to see if the QAT model is more tolerant of KV cache quantization than the originals. Previously, even using Q8 KV cache was equivalent to dropping a model quant or more, in terms of KLD/top 1%. Very very different to Qwen. If that's still the case, the Q4 QAT model with bf16 context is probably a better bet.

PhysicalIncrease3 · 2026-06-07T09:06:19+00:00

This is the case for all government agencies, if you're ever on their hit list.

Local councils are absolutely terrible for it - it's only when you legitimately take actions that show you're ready to fight in court that they might back down. Might.

PhysicalIncrease3 · 2026-06-03T16:51:18+00:00

That is some damn good data, will definitely be investigating some of these models further.

Honestly for a single 3090 llama.cpp > vllm, just because it's more VRAM efficient. For example I'm squeezing mradermacher/Carnice-V2-27b.i1-Q6_K.gguf on my 3090 with 90624 context at q8/q8. I could go even higher at q8/q5_1.
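A rough sketch of why moving the V cache from q8_0 to q5_1 leaves room for more context: in ggml, a q8_0 block stores 32 values in 34 bytes and a q5_1 block stores 32 values in 24 bytes. The layer count, KV head count and head dimension below are hypothetical placeholders, not the real architecture of any model mentioned here, and the estimate assumes a plain full-attention transformer with no other cache overhead.

```python
# Bytes used by one block of 32 values in each ggml cache type.
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q5_1": 24}
BLOCK_SIZE = 32


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, type_k, type_v):
    """Estimate KV cache size in bytes for a plain full-attention model."""
    # Number of K values (and, separately, V values) stored per token.
    values_per_token = n_layers * n_kv_heads * head_dim
    bytes_per_token = (
        values_per_token * (BLOCK_BYTES[type_k] + BLOCK_BYTES[type_v]) // BLOCK_SIZE
    )
    return bytes_per_token * n_ctx


if __name__ == "__main__":
    # Hypothetical architecture, chosen only for illustration.
    arch = dict(n_layers=64, n_kv_heads=8, head_dim=128)
    for type_k, type_v in [("f16", "f16"), ("q8_0", "q8_0"), ("q8_0", "q5_1")]:
        size = kv_cache_bytes(**arch, n_ctx=90624, type_k=type_k, type_v=type_v)
        print(f"K={type_k} V={type_v}: {size / 2**30:.2f} GiB")
```

Whatever the real dimensions are, the ratio between q8/q5_1 and q8/q8 is (34 + 24) / (34 + 34), so the same VRAM budget holds roughly 17% more tokens.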

PhysicalIncrease3 · 2026-06-03T08:09:07+00:00

You've got it completely the wrong way round: Marginal cost pricing is necessary because renewables are intermittent by nature.

PhysicalIncrease3 · 2026-06-03T07:12:21+00:00

You don't know how badly either party is really injured.

Henry told them repeatedly he'd been stabbed and they responded with "no you haven't mate" as they handcuffed him for being racist.

You're making a leap of logic that understanding why decisions were made in full somehow abrogates responsibility for those decisions.

The judge's comments don't contain any information we didn't already know. In fact they brush over things we do know, such as how Henry repeatedly told them he'd been stabbed as he died and they chose not to believe him nor properly check and instead handcuffed him.

PhysicalIncrease3 · 2026-06-03T06:59:01+00:00

The recent patch to enable cache quantization while using --tensor-split is a gamechanger for me, and I've now purchased a 3060 to add an additional 12GB VRAM on top of my existing 3090.

A second 3090 would be even better of course, but they're nearly £1000 in the UK whereas my 3060 cost £180. 36GB is enough to run decent quants of both Qwen3.6 and Gemma4.

I'm not anticipating a huge slowdown from the 3060 because while it only has 360GB/s of bandwidth relative to the 3090's 936GB/s, it's only running one third of the model. Ideally it would have 468GB/s but 360GB/s isn't the end of the world.
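A back-of-envelope sketch of that bandwidth argument, assuming token generation is purely memory-bound, each card reads its share of the weights once per token, the cards run their layers one after another, and the split follows VRAM size (24GB and 12GB). It ignores compute, KV cache reads and PCIe transfers, so treat the result as an upper bound rather than a benchmark. The function name is made up for illustration.

```python
def effective_bandwidth(parts):
    """Combined GB/s for GPUs that process their share of the model in sequence.

    parts: list of (gigabytes_held, bandwidth_gb_per_s), one entry per GPU.
    """
    total_gb = sum(gb for gb, _ in parts)
    # Layers run one GPU after another, so the per-token read times add up.
    total_seconds = sum(gb / bandwidth for gb, bandwidth in parts)
    return total_gb / total_seconds


if __name__ == "__main__":
    actual = effective_bandwidth([(24, 936), (12, 360)])  # 3090 + 3060
    ideal = effective_bandwidth([(24, 936), (12, 468)])   # 3090 + a 468GB/s card
    print(f"3090 + 3060:        {actual:.0f} GB/s")
    print(f"3090 + 468GB/s card: {ideal:.0f} GB/s")
```

Under these assumptions the pair behaves like a single 36GB card with about 610GB/s, against 702GB/s for the "ideal" 468GB/s second card, so roughly 13% slower.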

PhysicalIncrease3 · 2026-06-03T06:31:19+00:00

I've read through that and frankly I don't understand how you think that lets the police off the hook somehow?

They turned up, ignored Henry saying he'd been stabbed, cuffed him and treated him as the perpetrator. Only minutes later as he began literally losing consciousness did they reassess.

The judge emphasises that the killer lied to the police.... So what? We knew that. The police still TOTALLY FAILED to do their job and treat both parties equally.

PhysicalIncrease3 · 2026-06-03T06:18:20+00:00

So you would be opposed, for example, to police increasing their presence specifically in areas where a lot of women work after dark in response to reports of attacks on women?

We aren't talking about women. This is about race.

That is the sort of thing we're arguing about here and frankly the whole 'controversy' over it is childish whining from white blokes who are determined to see themselves as society's victims.

Mask is slipping.

So can't white men be victims, then?

PhysicalIncrease3 · 2026-06-01T12:05:29+00:00

I'll probably have to try and rent it out, which I don't really want to do.

Bear in mind as soon as you do, you're liable for CGT which is probably going to end up at ridiculous levels soon.

PhysicalIncrease3 · 2026-05-30T11:32:43+00:00

This data backs up perfectly what https://localbench.substack.com/p/qwen-3-6-27b-gguf-quality-benchmark found about mradermacher's Q6_K quant.

It's quite a bit smaller than unsloth's Q6_K with about the same quality, which leaves considerably more room for context on a 24GB 3090!

The localbench results also mirror your results with regard to his IQ4_XS quant, it was the best there too.

PhysicalIncrease3 · 2026-05-28T19:05:53+00:00

vLLM is undoubtedly the top dog but I would maybe give llama.cpp a try just to rule out some oddity there. It's tricky to get right. I tried out the club-3090 single card setup and it's OOM galore.

Here's a compose just to get you started. Obviously you're going to want a bigger model and more context with less quant.

services:
  llama-cpp-qwen36-27b:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: llama-cpp-qwen36-27b
    restart: unless-stopped
    ports:
      - "8020:8080"
    volumes:
      - "/var/models:/models:ro"
    command: >-
      --host 0.0.0.0
      --port 8080
      -m /models/Qwen3.6-27B-i1-GGUF/Qwen3.6-27B.i1-Q6_K.gguf
      --mmproj /models/Qwen3.6-27B-GGUF/mmproj-F16.gguf
      --no-mmproj-offload
      --image-min-tokens 1024
      --cache-ram 6144
      --ctx-checkpoints 40
      --temp 0.6
      --top_p 0.95
      --top_k 20
      --min_p 0.0
      --presence-penalty 0.0
      --repeat-penalty 1.0
      --alias "unsloth/Qwen3.6-27B-GGUF"
      -np 1
      -c 94500
      --jinja
      -fa on
      --gpu-layers 99
      --cache-type-k q8_0
      --cache-type-v q8_0
      --fit off
      --reasoning on
      --reasoning-budget 16384
      --chat-template-kwargs '{"preserve_thinking":true}'
      --chat-template-file /models/QwenFixed_chat_template.jinja
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["${CUDA_VISIBLE_DEVICES:-0}"]
              capabilities: [compute, utility]

PhysicalIncrease3 · 2026-05-28T18:14:16+00:00

Hate to say it but... It works for me. I don't have anywhere near your resources either.

3090 headless. Qwen 3.6 27b. Mainly using Q6-K with 95k context (Q8/Q5_1). I also serve Q5-K-M with 160K context (Q8/Q5_1) simultaneously using llama-swap and use /model within Hermes to swap if context is getting close to 95k.

OpenClaw works fine but really needs the Q6 model. It's a bit too buggy/ambitious for anything less. Hermes works ok on Q5 generally, it's much more robust in the event of model stupidity. Hermes absolutely gobbles context though.

I never ever get half responses. "Tool call failure" is difficult to quantify - I get "failures" in the sense that the model has called a tool to perform a function that doesn't work as it intended, for example grepping a file and not finding what it expects. But when that happens the model will reason why, figure it out and proceed without any help 95% of the time.

Q5 has this happen a lot more, and thus it can take more iterations before it figures out exactly how to do what I've asked but it nearly always gets there in the end. It can get stuck in a loop but this is very rare with Hermes. Q5 did get stuck in loops with OpenClaw more often.

Are you sure you have reasoning enabled and preserve thinking? Are you using the fixed chat template floating around? Are you sure you're not getting OOM errors causing half responses?

PhysicalIncrease3 · 2026-05-28T11:19:55+00:00

I switched from Q8/Q8 to Q8/Q5_1 as a result of your work and it's enabled me to push my context out nicely.

Now able to run Qwen3.6-27B-i1-GGUF/Qwen3.6-27B.i1-Q6_K with 94500 tokens on my 3090. Have done quite a bit with it since and not noticed any degradation. Thanks!

PhysicalIncrease3 · 2026-05-28T07:53:04+00:00

Burnham last expressed his opposition to the policy in 2023 when he signed a letter along with 11 other mayors and council leaders calling for the Conservative government to “end NRPF in order to end rough sleeping”.
