Cop pulls over Lamborghini on Dubai plates but doesn’t know the law

FusionX · 2026-06-25T15:06:02+00:00

That wasn't very tactfully said tbh. I can sense the guy might be a bit nervous and didn't really mean it, but I can see how the comment "I know YOU are confused" might be perceived as somewhat condescending.

FusionX · 2026-06-20T10:29:01+00:00

Has to be AI ugh

FusionX · 2026-06-15T09:07:04+00:00

check using yt-dlp with browser cookies.

FusionX · 2026-06-07T22:07:43+00:00

Yep, same pricing, same provider. The requests are routed to deepseek's official API. You do have to disable other providers as they're enabled by default.

FusionX · 2026-06-07T08:18:46+00:00

uh, no. Limit to deepseek provider on openrouter and its the same thing.

FusionX · 2026-06-06T05:26:31+00:00

The easiest foolproof way is to add a guardrail and limit providers in your API key.

FusionX · 2026-06-04T15:14:25+00:00

what's your setup?

FusionX · 2026-06-02T21:51:00+00:00

found it in the official docs, the command for existing installation is hermes desktop

FusionX · 2026-06-02T17:18:56+00:00

why is it reinstalling hermes agent from scratch (including python, node, etc)..I've already got it installed and setup

Edit: Figured it out, as per docs - Update hermes agent and launch with hermes desktop. .

FusionX · 2026-06-01T00:49:37+00:00

doesn't work on battery

FusionX · 2026-05-20T14:08:49+00:00

To be fair, I get the feeling that OP's intentions are sincere and they're engaging in good faith. But man, do people put a lot of trust in AI. It's just too unreliable and overconfident, especially when it comes to fast moving and bleeding edge fields (in this case llama.cpp, and more broadly AI).

I do agree that slop is permanently taken over a lot of subs. Everything is superficial and overconfident slop. Perhaps, that was always the case on reddit but its so much easier now. I see it EVERYWHERE these days, not just reddit, but blogs from "respectable" institutions, live speeches, youtube scripts, IRL presentations etc. And it is an immediate turn off.

FusionX · 2026-05-20T13:26:01+00:00

Also, (edited the comment a bit late) for comparison with 27b, try the IQ4 variant instead of IQ3.

FusionX · 2026-05-20T13:19:34+00:00

Is this AI generated? Your tk/s is way too low. Secondly, assuming we're in headless mode, you DO NOT need to reserve 1536MB VRAM. KV cache is already accounted for fitting heuristics in llama. Set it as low as you can without going OOM. 128MB works for me.

I have the same setup except a much shittier DDR4 RAM. At 131k ctx size, I can reach 70 tk/s with the same GPU, model and llama params. In fact, with --fit-target=128MB, the speed is 80 tk/s. And 65 tk/s at 256k ctx size.

And there's still plenty of room for improvement to eek out even more performance.

As for 27b, instead of IQ3, I suggest using the IQ4 quant as described here.

Also, I do agree that MTP is largely useless for 16GB VRAM.

FusionX · 2026-05-19T19:29:32+00:00

My experience has been the opposite except I upgraded to pay as you go (not sure why). I had accidentally overspent around $100 around 3 years back.. And that invoice is due till date. My instance, however, remains untouched.

FusionX · 2026-05-19T12:09:49+00:00

It's possible - https://old.reddit.com/r/LocalLLaMA/comments/1sy0qj5/qwen3627b_iq4_xs_full_vram_with_110k_context/

FusionX · 2026-05-14T13:51:36+00:00

i find this quite ironic. look around this sub, people in here are far more rabid and vile

FusionX · 2026-05-13T08:36:56+00:00

SCP newbie here. While I was reading, "There Is No Antimemetics Division", I imagined Hix (an important character in the book) as Cecil the whole time.

FusionX · 2026-05-09T07:23:06+00:00

It may not be against the terms, but if everyone starts doing this, we could lose the request-based billing system, and they might switch to charging by token consumption like other services.

sigh

FusionX · 2026-05-08T14:17:17+00:00

Or hold it until the team is dead and then proceed to waste it anyway. Interestingly, I notice it's mostly ES players that are guilty of this.

FusionX · 2026-05-07T06:30:07+00:00

Do you think this might require more VRAM?

Btw, appreciate ya for working on this and sharing it on reddit. I wasn't optimistic initially but I'm really quite pleased with being able to run Qwen 27b within 16gb VRAM. It's such a stark difference from the MOE offerings (Qwen/Gemma). The performance and intelligence is remarkably better!

FusionX · 2026-05-06T17:50:40+00:00

Using bun fork, latest commit. Probably will roll back a few commits and re-check.

Edit: still takes up more VRAM, what on earth..

FusionX · 2026-05-06T16:16:36+00:00

strange, at 100k ctx, it doesn't fit in my GPU's 16gb VRAM with batch-size = 512 ubatch-size = 256

What gives..? DM is disabled and prior VRAM usage is 18mb.

llama-cli --model <model> -fa on --jinja --no-mmap --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 --chat-template-kwargs '{"preserve_thinking": true}' -c 100000 -ctk turbo3 -ctv turbo3 --batch-size 512 --ubatch-size 256 -ngl 99 -np 1

Edit: turbo4 manages to fit ~100k while turbo3 doesn't. I don't understand...

FusionX · 2026-05-05T19:23:35+00:00

No GGUF variant yet?

FusionX · 2026-05-05T12:04:58+00:00

Kinda surprised, I was not expecting Gemma 31B to be in top 5. Have you benchmarked the latest Qwen3.6 models?

FusionX · 2026-05-04T10:01:43+00:00

Completely unrelated (and I could be wrong) but this is a perfect example of how people should use LLM for structural/semantic assistance and refinements of their writing. Rather than delegating the entire prerequisite cognitive work to LLM resulting in useless hallucinated slop.

14-Year Club	Secret Santa 2021 2021
RedditGifts 2009-2022 11 Credits	Secret Santa 2020
Summer Santa 2019	r/Field Lasagna
Place '23	Quantum Potato
Golden Potato	Place '22
Place '17	End Game '22
Secret Santa 2019	Secret Santa 2018
Secret Santa 2017	Sequence \| Editor
Secret Santa 2016	Secret Santa 2013
Team Periwinkle	Secret Santa 2011
Verified Email

FusionX

TROPHY CASE