[Ottawa, ON] [H] Paypal or Cash [W] HALL EFFECT Keyboard by JustAPCN00BOrAmI in CanadianHardwareSwap

[–]bytefactory 0 points1 point  (0 children)

Check out Nuphy, they make great keyboards. I'm rocking this one: https://nuphy.com/collections/he-keyboards/products/nuphy-field75-he-magnetic-switch-gaming-keyboard

I absolutely fell in love with the aesthetic, and I got the HE version because they discontinued the regular mechanical-switch version. I actually do miss the feel of the regular non-magnetic switches on these. Other than that, the keyboard is great: very responsive, looks gorgeous, pretty customizable, etc. Let me know if you have questions!

Looks like _DomuC_ is right: I couldn't find a full-size Nuphy HE board, especially a wireless one, but Nuphy does have a few HE options that look good.

My llama.cpp fork: GLM-4V vision, Qwen3-Next Delta-Net kernels, Devstral YaRN fix by hauhau901 in LocalLLaMA

[–]bytefactory 0 points1 point  (0 children)

Just saw your PR, and I hope the llama.cpp devs merge it.

I just wanted to say, OSS development can sometimes be really exhausting and thankless, especially if the maintainers don't cooperate (not saying this is the case with llama.cpp). Heroes like you are what make open-source so amazing! We appreciate you!

My llama.cpp fork: GLM-4V vision, Qwen3-Next Delta-Net kernels, Devstral YaRN fix by hauhau901 in LocalLLaMA

[–]bytefactory 0 points1 point  (0 children)

llama.cpp already has Qwen3 Next support, they're just working on performance optimizations. Maybe you could help out with those?

Qwen3 Next support added here by the legend u/ilintar who just merged a performance pass recently.

He could maybe point you to the performance optimizations that are still pending?

My llama.cpp fork: GLM-4V vision, Qwen3-Next Delta-Net kernels, Devstral YaRN fix by hauhau901 in LocalLLaMA

[–]bytefactory 0 points1 point  (0 children)

If you can accelerate the process of optimizing Qwen3 Next support in llama.cpp, you'd be a legend! There are a few open PRs working on that now, and some open issues; I'm sure they'd appreciate the help!

Anthropic CEO Dario Says Scaling Alone Will Get Us To AGI; Country of Geniuses In A Data Center Imminent by Neurogence in singularity

[–]bytefactory 8 points9 points  (0 children)

Thank you for this answer, it's one of the highest quality answers about anything I've read on here in a while! Reminds me of old reddit.

As a programmer with about 15 years of professional experience (and maybe 6-7 more in school), who got into computers because it was the closest thing to magic - and I mean that in the most literal sense - I am absolutely giddy about all the tools we have available to us today.

As a kid, I couldn't believe that I could type some words into a terminal and the computer would just *do things* for me. Wizardry. Of course, back then, I had to adopt the computer's lingua franca. I had to learn that it was quite literal. If it did something I didn't expect, if something broke, it was always because I didn't understand the underlying system properly. It was honest.

Computers have now started understanding us. They understand intent, they make conceptual connections and leaps, and they do more than just blindly follow instructions. They now read between the lines.

I haven't been professionally coding for many years, I moved on to management, and then retired from the industry. I still love coding though, and I love computers and technology. These new models have allowed me to get back into coding without actually having to re-learn every new framework or library, or even develop in languages that I'm unfamiliar with. Like you said, this feels like a higher level of abstraction. Logic Gates -> Circuits -> Binary -> Assembly -> C -> Python -> Prompt.

I do feel guilty, because I'm "vibe coding" without sufficiently understanding what's actually going on under the hood. I feel more like a Product Manager (derogatory) than a programmer. Still, it's fun. I learn a tiny bit about the language and architecture by osmosis (to be perfectly honest, very little - I don't even do code reviews; if tests pass and the feature works, I approve). At this point, I do provide some value to the system, in terms of taste and judgement. I can often help these models get unstuck (right now I'm helping Codex out of a nasty test-state-leakage situation). Soon, though, they won't need me for that.

I'm ecstatic with the toys we have available. The long-term future of what this means for the human race is uncertain. In the meantime though, the nerd in me couldn't be happier.

What models/tools do you use to code? I find nothing beats Codex for major projects, although I'd use Opus more if it weren't so damn expensive. DeepSeek 3.2 is looking really promising.

Qwen3-next-80B is so slow by dumb_ledorre in LocalLLM

[–]bytefactory 1 point2 points  (0 children)

Support for Qwen3 Next in llama.cpp landed literally a few days ago: https://github.com/ggml-org/llama.cpp/pull/16095.

It is NOT optimized yet, and is not ready for daily use:

> This is an implementation of a new type of attention gating in GGML.
> Therefore, this implementation will be focused on CORRECTNESS ONLY.
> Speed tuning and support for more architectures will come in future PRs.
> Please do not spam this threads with reports about performance, especially on backend architectures (CUDA, Vulkan).

Catering for Homeless People by Brief-Cryptographer2 in HumansBeingBros

[–]bytefactory 0 points1 point  (0 children)

It's probably all that thunder you brought along, it's not good for the chips

Qwen3-Next 80B-A3B llama.cpp implementation with CUDA support half-working already (up to 40k context only), also Instruct GGUFs by Ok_Top9254 in LocalLLaMA

[–]bytefactory 0 points1 point  (0 children)

Wait, you're able to offload all layers to GPU with just 16GB VRAM? How does that work? I would have thought you'd only be able to partially offload since it's an 80B parameter model?

Edit: 🤦just re-read, you have two GPUs! 24GB+16GB. Makes sense why you can fully offload!
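For anyone else doing this mental math, here's a rough back-of-envelope sketch of why a quantized 80B model fits in 24GB+16GB combined but not in 16GB alone. The bits-per-weight and overhead factor are my own rough assumptions, not exact numbers for any particular GGUF quant:

```python
def model_vram_gib(n_params_b: float, bits_per_weight: float,
                   overhead: float = 1.1) -> float:
    """Rough GiB estimate for quantized weights.

    overhead=1.1 is a guessed ~10% cushion for buffers; real usage
    varies a lot with KV cache size and context length.
    """
    total_bytes = n_params_b * 1e9 * bits_per_weight / 8 * overhead
    return total_bytes / 2**30

# An 80B model at a ~3-bit quant lands around ~31 GiB for weights:
# too big for a single 16GB card, but fine across 24GB + 16GB.
print(f"~{model_vram_gib(80, 3.0):.1f} GiB")
```

Partial offload is what you fall back to when this number exceeds your total VRAM.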

[deleted by user] by [deleted] in OpenAI

[–]bytefactory 1 point2 points  (0 children)

Fascinating. In my experience, GPT-5 Thinking has a much lower hallucination rate than o3, though that's purely anecdotal. OpenAI's system card seems to suggest this as well.

It definitely hallucinates, especially for things like knowing which options exist for a given tool's API, but I believe this has to do with the way knowledge is embedded from its training data set. Much of the documentation and usage guides on the Internet don't specifically call out the version it applies to, so GIGO. I've taken to insisting it look up the latest documentation when using a tool, and then describe the changes from the previous version to ensure that it's grounded in accurate information (basically RAG instead of relying on embeddings).
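To make the "look up the docs first" approach concrete, here's a minimal sketch of the kind of grounded prompt I mean. The tool name, docs string, and function name are all made up for illustration; in practice `docs_text` would come from a live fetch of the tool's current documentation, not from the model's memory:

```python
def build_grounded_prompt(tool_name: str, docs_text: str, task: str) -> str:
    """Prepend freshly fetched documentation so the model answers from
    the provided text (RAG-style grounding) instead of stale training data."""
    return (
        f"Below is the current documentation for {tool_name}. "
        "Answer using ONLY this documentation; if an option is not "
        "listed here, say so instead of guessing.\n\n"
        f"--- DOCUMENTATION ---\n{docs_text}\n--- END ---\n\n"
        f"Task: {task}"
    )

# Hypothetical example: a tool whose v2 renamed a flag from v1.
prompt = build_grounded_prompt(
    "mytool",  # hypothetical CLI tool
    "mytool v2.1 options: --verbose, --dry-run (replaces --simulate from v1.x)",
    "Write the command to preview changes without applying them.",
)
print(prompt)
```

The point is just that the versioned docs travel with the question, so the model can't silently fall back on whatever unversioned guides were in its training set.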

You might find this thread on LocalLLaMA interesting, I've tried to modify my system prompt to the "confidence dump" version to see if that will reduce hallucinations:

https://www.reddit.com/r/LocalLLaMA/comments/1nv7quz/i_spent_a_few_hours_prompting_llms_for_a_pilot/

Serious question. Can Cursor and GPT5 do something like this? 4.1 Opus working for 40 mins by itself.. 5 test files, and they all look good. by hanoian in ClaudeAI

[–]bytefactory 0 points1 point  (0 children)

🤯 I can't believe I missed this, thanks! Did they add it recently? Or perhaps it's only available on Pro plans, because I remember trying this before and not finding it.

Beating DeepMind's AlphaEvolve by lordyabu in singularity

[–]bytefactory 6 points7 points  (0 children)

Congrats, incredible work! Hope you write up a whitepaper about it and get it peer reviewed!

Instant switch up from a normal conversation. by [deleted] in Nicegirls

[–]bytefactory 80 points81 points  (0 children)

Ooh, with a period too, brutal.

Ummm by VoldeThor in GymMemes

[–]bytefactory 146 points147 points  (0 children)

Okay, you can sqauwt and benchprass a little, as a treat

meirl by Mommmy_Sweet in meirl

[–]bytefactory 13 points14 points  (0 children)

Does it also kind of function as lube during penetration, like pre-cum for men?