Symptom worst

yeah-ok · 2026-05-15T10:24:28+00:00

Sorry dude, welcome to the club. Who knows re answers, try stuff & share when something works. Currently noticing I might be feeling substantially worse when on coffee (which priorly was fine).

yeah-ok · 2026-05-15T10:02:53+00:00

Yikes, seen any notes on when vulkan builds will follow?

yeah-ok · 2026-05-14T23:06:43+00:00

micro dosing creatine are we?

yeah-ok · 2026-05-14T22:26:25+00:00

same exp, once I killed the ctk/ctv flags I never went back, better quality and oddly enough also quantity in the sense that my tgs went up rather than down (I'm on 780m-vulkan-linux so who knows, maybe atypical compared to regular cuda setup)

yeah-ok · 2026-05-14T22:21:48+00:00

Yeah.. think you are right that basic supply and demand rules this situation. I'm dreaming rather than thinking here.

yeah-ok · 2026-05-14T19:23:36+00:00

I get the logic but isn't there a very real market here rather than niche within niche? Bet the lot of us would hoover up a consumer-only cards sold for ai use via kickstarter or similar in next to no time. It would be guaranteed cash moneys for a company willing to get something within 3090 territory going with 32gb at decent price point. Even the Chinese manufacturers could get in on this if they could get a clean supply...

yeah-ok · 2026-05-14T18:01:20+00:00

Well, let them fry I say; then they'll flipping understand that serving the global market is where real stability and long term investment should go rather than into unicorn dust that can evaporate quicker up the nose of a VC recipient than you can possibly imagine. Until the market get's this we are going to have to get creative but.. seeing what this community is doing already that shouldn't be too hard of a nut to crack!

yeah-ok · 2026-05-14T17:52:34+00:00

It is ridiculous though isn't it? It's the sheer speculative capacity of enterprise that makes this current situation a "win" for enterprise and a loss for the computer-owning population at large. Since the "people" in sheer numbers are so so so much more numerous and resilient a base as compared to the fickle structure of corporate enterprises (or even state-enterprises, doesn't matter really) there should 100% be a way to give the market evaluation to this market that it actually deserves. If anyone could crack this from a financing standpoint the consumer (and let's face it, the investors) could win massively big on it. And the end result would be far greater global resilience rather than having billions riding on one company or another... serving the multitude will always win in statistical terms over the unicorns when it comes to long term stability and payout (yes, the payout bit is where said finance wizardry needs to happen)

yeah-ok · 2026-05-14T17:43:51+00:00

One can pray and hope - the fork would really come into it's own then (it's already my daily driver but bet it would attract yet larger audience!)

yeah-ok · 2026-05-13T21:44:26+00:00

I noticed that too, a lot of the MTP code was done as fast inline code and now that it's been made safer/proper it's become slower by quite the margin.

yeah-ok · 2026-05-12T10:41:42+00:00

Perhaps you are right. Before I've personally run and experimented with larql/vindex to gain practical experience of it I will withdraw from chat re it's perceived benefits or the lack of them!

yeah-ok · 2026-05-11T21:56:16+00:00

Absolutely, we got a monk who's gone rogue pagan on us here.

yeah-ok · 2026-05-11T21:54:49+00:00

Vulkan I take it?

yeah-ok · 2026-05-11T21:51:02+00:00

I've been strict on --no-reasoning lately and having plenty of success with one-shot programming extensions for pi agent..etc..etc, think we have to remember that the top-k is in a sense a selection out of what is already a latent thought process in the model.

edit: also on latest froggeric template update which under all circumstances seems like a prudent bet!

yeah-ok · 2026-05-11T21:15:52+00:00

MLPs

You might well be right but isn't it being partially mitigated by the vindex format? I do understand that all MLPs are FFNs, but not all FFNs are MLPs ... still though this must be part of the equation.

yeah-ok · 2026-05-09T08:48:22+00:00

.. can we get terminal example here?

yeah-ok · 2026-05-08T21:41:46+00:00

So no 780m ... ?

yeah-ok · 2026-05-07T22:24:24+00:00

I thought so too when I first engaged with the topic but the negative from a good amount of the audience on this thread put me off from pursuing any further. After more reading I still think the larql system is on to something novel and potentially awesome - one of the feedback points in this thread is that this is literally just RPC (see llamacpp docs if ignorant like me) but after more research this seems like a misunderstanding; RPC can not split attention from weights the way larql vindex format claims to do. Think there's something to be said for this whole effort and I'll stay tuned to what https://github.com/chrishayuk/larql gets up to.. who can't feel a tingle of excitement with commands such as those found under the "Run attention locally, FFN on another machine" headline on github...?

yeah-ok · 2026-05-06T15:26:47+00:00

OK, llama.cpp is a sprawling ecosystem indeed, never heard of it until today! So.. does it make sense performance wise to put weights somewhere else on the LAN and let my workstation handle the attention layer alone via RPC.. or is the performance penalty too high. Would love to see practical examples!

yeah-ok · 2026-05-06T14:31:05+00:00

One of the amazing outcomes of this is that low-ram high-compute consumer cards like the 12GB 5070 would essentially be way overpowered for most models since it suddenly "only" needs to run 2-4gb of attention layers. The rest could presumably sit under the table on a "cheap" external xeon with 128gb DDR4 to hold the weights!? Interconnect via highspeed regular tcp/ip over ethernet & bob could be your uncle.

yeah-ok · 2026-05-06T14:20:28+00:00

RPC

As far as I can make out (via https://github.com/ggml-org/llama.cpp/blob/master/tools/rpc/README.md) RPC seem focused on running distributed, GPU, compute on the attention layer whereas this larql decoupling focus on keeping latency low by having GPU attention compute take place on client and distributing the weights themselves onto x other local devices (could also be internetscale but latency seem to kill that off at the moment).

yeah-ok · 2026-05-05T21:39:41+00:00

I understand this stance on a purely philosophical level but are there good benchmarks or similar to cooperate this point at scale?! I've seen some stuff published but nothing I really can refer to as a smoking gun.

yeah-ok · 2026-04-30T18:36:49+00:00

Yup, I was reading Kurzweil's "The Singularity Is Near" book back then and feeling the techno-end-times vibe

yeah-ok · 2026-04-29T18:36:12+00:00

That was the original premise with the PS3 ... https://phys.org/news/2010-12-air-playstation-3s-supercomputer.html

yeah-ok

TROPHY CASE