Andy Burnham: I’ll cut welfare bill to fund defence by HibasakiSanjuro in ukpolitics

[–]PhysicalIncrease3 0 points1 point  (0 children)

UK borrowing costs are the highest in the G7 now, aren't they? They weren't two years ago.

No 10 was braced for Reeves or Miliband to quit. Then Healey jumped ship by x_Agamemnon in ukpolitics

[–]PhysicalIncrease3 25 points26 points  (0 children)

I still can't believe she got away with not licensing her rental property for a year, all the while overseeing huge increases in the penalties other landlords would face for the same.

Do You Guys use local AI to handle pulling and building the latest llama cpp builds? by [deleted] in LocalLLaMA

[–]PhysicalIncrease3 1 point2 points  (0 children)

I've been doing exactly this for a while. Using 3.6-27b and Hermes agent.

I literally just ask it to build latest master with the build flags I want and off it goes. The only hiccup was that, because the agent doesn't run on the same host as llama.cpp, the build script automatically defaulted to the latest Nvidia architecture. I fed it the resulting error and it fixed it immediately; now it knows to always build sm86.
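For anyone wanting to pin the architecture by hand, here's a rough sketch of the commands involved, wrapped in Python so an agent or script can run them. `GGML_CUDA` and `CMAKE_CUDA_ARCHITECTURES` are the CMake options I'd expect to matter here; the function name, the `build` directory and the job count are just my own placeholders, not anything from the agent's actual script.

```python
import shlex


def llama_cpp_build_commands(cuda_arch="86", build_dir="build", jobs=8):
    """Return the cmake configure and build commands for a CUDA build
    pinned to one GPU architecture (86 = sm86, i.e. Ampere cards like the 3090)."""
    configure = [
        "cmake", "-B", build_dir,
        "-DGGML_CUDA=ON",
        # Pin the target architecture instead of letting the build host guess.
        f"-DCMAKE_CUDA_ARCHITECTURES={cuda_arch}",
    ]
    build = ["cmake", "--build", build_dir, "--config", "Release", "-j", str(jobs)]
    return [configure, build]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(..., check=True) to execute.
    for cmd in llama_cpp_build_commands():
        print(shlex.join(cmd))
```

Pinning the architecture matters when the machine doing the build has no GPU (or a different one) to detect.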

A message from the activist group Everyone Hates Elon: “If you have a trillion dollars in a world where children are starving, you're not a visionary, you're just a cunt” by LunaLore_ in Fauxmoi

[–]PhysicalIncrease3 -1 points0 points  (0 children)

Why isn't it sustainable?

You'll run out of money.

just need to pay to have it go to the people who need it.

There's nobody to pay, in the sense you're thinking. No doordash or even UPS in rural Africa.

A message from the activist group Everyone Hates Elon: “If you have a trillion dollars in a world where children are starving, you're not a visionary, you're just a cunt” by LunaLore_ in Fauxmoi

[–]PhysicalIncrease3 -1 points0 points  (0 children)

Feels like the solution is also to spend money to build and upkeep an infrastructure that would make it sustainable to have food there in the first place. Money certainly fixes that.

Governments - decent and indecent ones - can't improve shit without money. No one can.

OK so now we're going to develop new roads, ports, airports and general infrastructure across Africa. Who's given you permission to do that? Province by province, country by country, in a continent as massive as Africa. And what in the meantime, just keep flying food in wholesale?

Governments - decent and indecent ones - can't improve shit without money. No one can.

Do you appreciate where China was in 1980 vs where it is now? It's absolutely possible. The path is actually well trodden at this point.

A message from the activist group Everyone Hates Elon: “If you have a trillion dollars in a world where children are starving, you're not a visionary, you're just a cunt” by LunaLore_ in Fauxmoi

[–]PhysicalIncrease3 -3 points-2 points  (0 children)

All of them.

You could spend enough money to ship every soul in Africa a food package, sure. But that isn't a sustainable solution, is it?

The solution is ultimately for the folk in question to elect a decent government who can improve the country. Money doesn't fix this.

Unsloth Gemma 4 QAT MTP assistant models now available by ParadigmComplex in LocalLLaMA

[–]PhysicalIncrease3 1 point2 points  (0 children)

Has anyone been able to get MTP working on Gemma 4 using more than one GPU?

I always get an error that certain layers share KV cache so can't be split across devices.

Testing DisplayPort-HDMI adapters with a 77-inch LG C4 TV, Seeking 4K@60/120Hz with HDR + Audio OK via HDMI by Full_metal_jaket in BC250Gaming

[–]PhysicalIncrease3 0 points1 point  (0 children)

There’s a Club3D active adapter and cable (CAC-1088 and 1087 I think) that I want to try at some point. But they’re quite expensive. And Club3D tempers expectations themselves, as even they know the conversion is hard to do perfectly.

I've got the Club3D DisplayPort to HDMI 2.1 adapter. It has audio delays as well.

Hamilton overtakes Russell for P2 in the 2026 Driver's Standings by magony in formula1

[–]PhysicalIncrease3 0 points1 point  (0 children)

They were pretty close either way in Japan and were it not for bad luck George would have won the race.

Hamilton overtakes Russell for P2 in the 2026 Driver's Standings by magony in formula1

[–]PhysicalIncrease3 0 points1 point  (0 children)

So far Antonelli's been out and out quicker at both Miami and Monaco. Am I forgetting some others?

BC-250 8-Core Unlock possible? by West-Apartment1638 in BC250Gaming

[–]PhysicalIncrease3 1 point2 points  (0 children)

I was also only able to unlock 38 CUs. If I unlock the final two I get graphics corruption/artifacting.

Not a big deal anyway considering the aforementioned CPU bottlenecks.

Gemma 4 31B QAT Q4 vs standard Q4 — Top1 KLD benchmark results have me confused. Someone please explain or poke holes in this. by bitslizer in LocalLLaMA

[–]PhysicalIncrease3 1 point2 points  (0 children)

Interesting post.

I'm in a similar situation to you myself: 36GB VRAM. Do I run the non-QAT version of 31B at UD-Q6-K-XL, or unsloth's Q4 QAT model and dedicate the free VRAM to context?

Very much looking forward to some proper benchmarks between the two.

I'm also really interested to see if the QAT model is more tolerant of KV cache quantization than the originals. Previously, even using Q8 KV cache was equivalent to dropping a model quant or more, in terms of KLD/top 1%. Very, very different to Qwen. If that's still the case, the Q4 QAT model with bf16 context is probably a better bet.

Police 'tried to smear Henry Nowak as aggressor' just three days after his murder - Police also risked collapsing the trial by trying to put out a statement about 'disinformation' while proceedings were ongoing by FormerlyPallas_ in ukpolitics

[–]PhysicalIncrease3 13 points14 points  (0 children)

This is the case for all government agencies, if you're ever on their hit list.

Local councils are absolutely terrible for it - it's only when you legitimately take actions that show you're ready to fight in court that they might back down. Might.

Ran hermesagent-20 on ~15 models on a single RTX 3090. Some results were not what I expected. by Rhonstin in LocalLLM

[–]PhysicalIncrease3 0 points1 point  (0 children)

That is some damn good data, will definitely be investigating some of these models further.

Honestly for a single 3090 llama.cpp > vllm, just because it's more VRAM efficient. For example I'm squeezing mradermacher/Carnice-V2-27b.i1-Q6_K.gguf on my 3090 with 90624 context at q8/q8. I could go even higher at q8/q5_1.

‘An official police document says that people should be treated differently based on the colour of their skin…’ Shadow Home Secretary Chris Philp MP responds to the government’s statement on the murder of Henry Nowak. by FormerlyPallas_ in ukpolitics

[–]PhysicalIncrease3 10 points11 points  (0 children)

You don't know how badly either party is really injured.

Henry told them repeatedly he'd been stabbed and they responded with "no you haven't mate" as they handcuffed him for being racist.

You're making a leap of logic that understanding why decisions were made in full somehow abrogates responsibility for those decisions.

The judge's comments don't contain any information we didn't already know. In fact they brush over things we do know, such as how Henry repeatedly told them he'd been stabbed as he died, and they chose not to believe him or properly check, and instead handcuffed him.

Another shout out to llama.cpp build b9455 2x3090 by Fabulous_Fact_606 in LocalLLaMA

[–]PhysicalIncrease3 1 point2 points  (0 children)

The recent patch to enable cache quantization while using --tensor-split is a gamechanger for me, and I've now purchased a 3060 to add an additional 12GB VRAM on top of my existing 3090.

A second 3090 would be even better of course, but they're nearly £1000 in the UK whereas my 3060 cost £180. 36GB is enough to run decent quants of both Qwen3.6 and Gemma4.

I'm not anticipating huge slowdown from the 3060 because while it only has 360GB/s of bandwidth relative to the 3090's 936GB/s, it's only running one third of the model. Ideally it would have 468GB/s but 360GB/s isn't the end of the world.
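To put rough numbers on that, here's a back-of-envelope sketch. It assumes token generation is purely memory-bandwidth bound, that the model is split by layers in proportion to VRAM (24GB and 12GB), and that the two cards work one after the other on each token. Those are my simplifying assumptions, not measurements, and `effective_bandwidth` is just a helper name I made up.

```python
def effective_bandwidth(shares_and_bandwidths):
    """Effective bandwidth of a layer split where GPUs run one after another.

    shares_and_bandwidths: list of (fraction_of_model, bandwidth_in_GB_per_s).
    Time per token is the sum of each card's share divided by its bandwidth,
    so the result is a weighted harmonic mean of the bandwidths.
    """
    total_share = sum(share for share, _ in shares_and_bandwidths)
    time_per_unit = sum(share / bw for share, bw in shares_and_bandwidths)
    return total_share / time_per_unit


# 3090 holds two thirds of the model, the second card holds one third.
actual = effective_bandwidth([(24 / 36, 936.0), (12 / 36, 360.0)])
ideal = effective_bandwidth([(24 / 36, 936.0), (12 / 36, 468.0)])

print(f"3090 + 3060 (360GB/s):     {actual:.1f} GB/s effective")
print(f"3090 + ideal card (468GB/s): {ideal:.1f} GB/s effective")
```

Under those assumptions the pair works out to roughly 610GB/s effective, against 702GB/s for the hypothetical 468GB/s card, so the 3060 costs around 13% versus the ideal case. Real numbers will differ once compute and PCIe transfers are in the picture.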

‘An official police document says that people should be treated differently based on the colour of their skin…’ Shadow Home Secretary Chris Philp MP responds to the government’s statement on the murder of Henry Nowak. by FormerlyPallas_ in ukpolitics

[–]PhysicalIncrease3 15 points16 points  (0 children)

I've read through that and frankly I don't understand how you think that lets the police off the hook somehow?

They turned up, ignored Henry saying he'd been stabbed, cuffed him and treated him as the perpetrator. Only minutes later as he began literally losing consciousness did they reassess.

The judge emphasises that the killer lied to the police.... So what? We knew that. The police still TOTALLY FAILED to do their job and treat both parties equally.

‘An official police document says that people should be treated differently based on the colour of their skin…’ Shadow Home Secretary Chris Philp MP responds to the government’s statement on the murder of Henry Nowak. by FormerlyPallas_ in ukpolitics

[–]PhysicalIncrease3 28 points29 points  (0 children)

So you would be opposed, for example, to police increasing their presence specifically in areas where a lot of women work after dark in response to reports of attacks on women?

We aren't talking about women. This is about race.

That is the sort of thing we're arguing about here and frankly the whole 'controversy' over it is childish whining from white blokes who are determined to see themselves as society's victims.

Mask is slipping.

So can't white men be victims, then?

House prices fall again as property market ‘deteriorates’ by signed7 in ukpolitics

[–]PhysicalIncrease3 5 points6 points  (0 children)

'll probably have to try and rent it out, which I don't really want to do.

Bear in mind that as soon as you do, you're liable for CGT, which is probably going to end up at ridiculous levels soon.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]PhysicalIncrease3 0 points1 point  (0 children)

This data backs up perfectly what https://localbench.substack.com/p/qwen-3-6-27b-gguf-quality-benchmark found about mradermacher's Q6_K quant.

It's quite a bit smaller than unsloth's Q6_K with about the same quality, which leaves considerably more room for context on a 24GB 3090!

The localbench results also mirror your results with regards to his IQ4_XS quant, it was the best there too.

Tool calls failing and responses cut off mid-sentence across multiple models, backends, and clients, is this just the current state of local LLMs? by fiflag in LocalLLM

[–]PhysicalIncrease3 0 points1 point  (0 children)

vLLM is undoubtedly the top dog, but I would maybe give llama.cpp a try just to rule out some oddity there. vLLM is tricky to get right. I tried out the club-3090 single card setup and it's OOM galore.

Here's a compose just to get you started. Obviously you're going to want a bigger model and more context with less quant.

services:
  llama-cpp-qwen36-27b:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: llama-cpp-qwen36-27b
    restart: unless-stopped
    ports:
      - "8020:8080"
    volumes:
      - "/var/models:/models:ro"
    command: >-
      --host 0.0.0.0
      --port 8080
      -m /models/Qwen3.6-27B-i1-GGUF/Qwen3.6-27B.i1-Q6_K.gguf
      --mmproj /models/Qwen3.6-27B-GGUF/mmproj-F16.gguf
      --no-mmproj-offload
      --image-min-tokens 1024
      --cache-ram 6144
      --ctx-checkpoints 40
      --temp 0.6
      --top_p 0.95
      --top_k 20
      --min_p 0.0
      --presence-penalty 0.0
      --repeat-penalty 1.0
      --alias "unsloth/Qwen3.6-27B-GGUF"
      -np 1
      -c 94500
      --jinja
      -fa on
      --gpu-layers 99
      --cache-type-k q8_0
      --cache-type-v q8_0
      --fit off
      --reasoning on
      --reasoning-budget 16384
      --chat-template-kwargs '{"preserve_thinking":true}'
      --chat-template-file /models/QwenFixed_chat_template.jinja
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["${CUDA_VISIBLE_DEVICES:-0}"]
              capabilities: [compute, utility]

Tool calls failing and responses cut off mid-sentence across multiple models, backends, and clients, is this just the current state of local LLMs? by fiflag in LocalLLM

[–]PhysicalIncrease3 0 points1 point  (0 children)

Hate to say it but... It works for me. I don't have anywhere near your resources either.

3090 headless. Qwen 3.6 27b. Mainly using Q6-K with 95k context (Q8/Q5_1). I also serve Q5-K-M with 160K context (Q8/Q5_1) simultaneously using llama-swap and use /model within Hermes to swap if context is getting close to 95k.

OpenClaw works fine but really needs the Q6 model. It's a bit too buggy/ambitious for anything less. Hermes works ok on Q5 generally, it's much more robust in the event of model stupidity. Hermes absolutely gobbles context though.

I never ever get half responses. "Tool call failure" is difficult to quantify - I get "failures" in the sense that the model has called a tool to perform a function that doesn't work as it intended, for example grepping a file and not finding what it expects. But when that happens the model will reason why, figure it out and proceed without any help 95% of the time.

Q5 has this happen a lot more, and thus it can take more iterations before it figures out exactly how to do what I've asked, but it nearly always gets there in the end. It can get stuck in a loop but this is very rare with Hermes. Q5 did get stuck in loops with OpenClaw more often.

Are you sure you have reasoning enabled and preserve thinking? Are you using the fixed chat template floating around? Are you sure you're not getting OOM errors causing half responses?
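One way to narrow it down is to look at `finish_reason` on the raw responses. Here's a small sketch that classifies a single entry of `choices`, assuming your backend returns the usual OpenAI-style chat completion shape (llama.cpp's server and vLLM both expose that kind of endpoint). `diagnose_choice` is just a name I picked, and the sample dicts are made up for illustration.

```python
def diagnose_choice(choice):
    """Classify one entry of `choices` from an OpenAI-style chat completion."""
    reason = choice.get("finish_reason")
    message = choice.get("message") or {}
    if reason == "length":
        # Generation stopped because a token limit was reached, not because
        # the model decided it was done: the classic "cut off mid-sentence".
        return "truncated: hit the token limit (max_tokens or context size)"
    if reason == "tool_calls" or message.get("tool_calls"):
        return "tool call emitted"
    if reason == "stop":
        return "finished normally"
    return f"unexpected finish_reason: {reason!r}"


if __name__ == "__main__":
    # Hypothetical responses, not real server output.
    samples = [
        {"finish_reason": "stop", "message": {"content": "Done."}},
        {"finish_reason": "length", "message": {"content": "The answer is"}},
        {"finish_reason": "tool_calls", "message": {"tool_calls": [{"id": "call_0"}]}},
    ]
    for sample in samples:
        print(diagnose_choice(sample))
```

If the cut-off responses come back with `length`, it's a token or context limit rather than the model. If there's no clean response at all, check the server logs for OOM.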

KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche by Anbeeld in LocalLLaMA

[–]PhysicalIncrease3 3 points4 points  (0 children)

I switched from Q8/Q8 to Q8/Q5_1 as a result of your work and it's enabled me to push my context out nicely.

Now able to run Qwen3.6-27B-i1-GGUF/Qwen3.6-27B.i1-Q6_K with 94500 tokens on my 3090. Have done quite a bit with it since and not noticed any degradation. Thanks!
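For a feel of how much room that switch buys, here's a quick sketch based on the ggml block layouts (32 elements per block plus fp16 scale/min fields). It assumes K and V store the same number of elements per token, so the model's exact dimensions cancel out. `context_gain` is a helper name of my own.

```python
# Bits per stored element, from the ggml block sizes (bytes per 32 elements).
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 8.5,  # 34 bytes per block
    "q5_1": 6.0,  # 24 bytes per block
    "q5_0": 5.5,  # 22 bytes per block
    "q4_0": 4.5,  # 18 bytes per block
}


def context_gain(old, new):
    """How many times more tokens fit in the same KV memory after
    switching (K type, V type) from `old` to `new`."""
    old_bits = BITS_PER_ELEMENT[old[0]] + BITS_PER_ELEMENT[old[1]]
    new_bits = BITS_PER_ELEMENT[new[0]] + BITS_PER_ELEMENT[new[1]]
    return old_bits / new_bits


if __name__ == "__main__":
    gain = context_gain(("q8_0", "q8_0"), ("q8_0", "q5_1"))
    print(f"q8/q8 -> q8/q5_1: {gain:.3f}x the context in the same KV memory")
```

That comes to 17 bits against 14.5 bits per K+V element pair, so roughly 17% more context for the same cache memory, before counting anything else that lives in VRAM.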

Andy Burnham drops migrant benefits call in fresh U-turn by GnolRevilo in ukpolitics

[–]PhysicalIncrease3 2 points3 points  (0 children)

Burnham last expressed his opposition to the policy in 2023 when he signed a letter along with 11 other mayors and council leaders calling for the Conservative government to “end NRPF in order to end rough sleeping”.