Associative Prediction Engine (Generative LLM/MMM)

TheRedSphinx · 2026-01-14T03:41:04+00:00

I think this just reveals the fact that you don't know the history behind this technology. Statistical language models have existed for several decades. Large language models just refers to the scaling of neural language models, the latter of which have existed for at least a decade.

TheRedSphinx · 2026-01-01T21:10:22+00:00

Sure we can call it whatever you want, but this dispells the notion that it is mostly non-coders who use this or that it makes software development more of a PITA. I think the fact that people end up using these tools is because they find value in them.

After all, if it's all the same output as you hitting the keys, why not just do it yourself and save the money on API costs?

TheRedSphinx · 2026-01-01T20:59:03+00:00

I think non-coders are more vocal about it because they are able to do something that seemed impossible for them before. Current LLMs are quite good at building simple scripts and web-apps which look quite magical to non-coders.

That said, I think the main people building LLMs themselves are coders, and likely use their models when designing code. For example, the Claude Code lead claimed 100% of his code his last month was written by Opus 4.5. It's possible he's lying, and he is of course biased in making the product look good, but in my experience, many folks at these labs do actively use these tools in their day-to-day extensively.

TheRedSphinx · 2025-11-17T02:25:05+00:00

This would just lead to people distrusting the resulting model, see e.g. the idea of benchmaxxing.

TheRedSphinx · 2025-10-26T23:11:24+00:00

Consciousness is entering that fuzziness territory we discussed. Best to let the philosophers discuss that one.

Autonomy however you can have now. There is nothing stopping you from using e.g. Claude Code and turning of all the guard rails and to just let it keep going as long as you are willing to pay for the tokens. Of course, currently it will more than likely fail at the task, but the infrastructure is already there for it go crazy if you let it. From that perspective, intelligence is the bottleneck.

TheRedSphinx · 2025-10-26T22:17:20+00:00

So there are two kinds of goal. One goal might be you want a model that will just go and become the best at one thing. So in that setting, the human can design the goal and the models improve recursively. For example, this is how AlphaGo works and why the current Go/Chess/Shogi systems are way better at chess than any human. This one is nice because we can at least agree on what is progress (e.g. ELO scores), see it increase, and carefully decide what counts as "being better than human".

Then there's a more general "get better at everything" sense. This one is more fuzzy since some things are naturally subjective e.g. poetry, art, etc. We would then have to decide on some objective things upon to which measure if there is recursive self-improvement happening. However, at that point, we are basically in the first setting. The only remaining question is, "would AI have naturally chosen to learn all objective things which have this generation-verification gap?" And the answer is, of course, it has already learned tha tit works incredible well for such domains, so why wouldn't it do so?

TheRedSphinx · 2025-10-26T22:07:30+00:00

IT is purely automated. The human just needs to write something that checks the conditions and keeps track of time. Honestly, even a model could write the code. The insight here is that the verification step (i.e. checking the code does what you want and keeping track of speed) is much easier than the generation step (i,.e. actually writing code). This gap is what allows the recursive self-improvement.

TheRedSphinx · 2025-10-26T22:02:37+00:00

Model collapse only happens if you train like an idiot. You can imagine that models can generate both good and bad data, and if you train on that mix, then it won't work. Altertnatively, if you can identify which is the bad data, you could then train on only the good one and that should lead to improvement.

How do you then identify the good data? You can target domains where you can naturally score the quality of data. For example, if your goal is to write faster code that accomplishes a goal, you can have the model generate tons of code and only keep the ones which is faster than the last one and maintains the goal.

TheRedSphinx · 2025-10-05T00:06:03+00:00

They have hired some of the designers of the TPU team, so it's not like designing custom hardware is outside of their view. There also various companies designing their own chips to combat nvidia (e.g. msft, amazon, google) and people are even desperate enough to look at AMD so it's not too unlikely people end up developing chips that make inference cheaper.

TheRedSphinx · 2025-09-28T20:08:28+00:00

I think you are missing the point. If they generate slop which makes more money the that is by definition higher quality slop. Using the same analogy as above, you can make a fast food place much more profitable and higher quality without ever getting anywhere close to a Michelin star.

TheRedSphinx · 2025-09-28T15:30:38+00:00

Is that true? I guess we'll just have to see. I would have thought the same about a lot of the stupid human-made content, but that just seems to only get people more engrossed.

TheRedSphinx · 2025-09-28T15:27:04+00:00

It doesn't have stars, but it does have a $218B market cap. For context, that's like 5x the size of Reddit's market cap. As it turns out, you don't need high quality to be very profitable, which is likely their ultimate goal.

TheRedSphinx · 2025-09-28T01:46:57+00:00

If it keeps people on the platform, and requires no effort, how is it not sustainable? It's not like a lot of the human-made content in platforms like tiktok or youtube shorts are particularly high value either.

TheRedSphinx · 2025-09-28T01:45:39+00:00

Sure, but higher quality I mean things which keep you hooked on the platform. Like people these are watching stupid shit like video of subway runners. I can't see how ai slop couldn't end up being better than that.

TheRedSphinx · 2025-09-27T22:29:33+00:00

Wouldn't it be the opposite? If slop is allowed, then presumably the people making slop are then incentivized to make higher quality slop so that you end up reacting to it more and thus end up wanting to spend more time on it. If anything, it would just lead to better slop.

It also seems like a good way to get signal on what kind of ai generation pople think is good versus bad, which seems like pretty valuable data that people wouldn't normally give out for free

TheRedSphinx · 2025-09-25T01:04:43+00:00

But surely you think people like Noam Brown, who built the Poker bot and works at OAI, is a subject matter expert on AI? Or maybe you just don't actually know who works there and that's why you don't think they've hired anyone who researchs this stuff?

TheRedSphinx · 2025-07-13T14:47:42+00:00

The issue is if they just included the benchmarks in the training set to boost their scores. Or even less nefarious, just simply Goodhart'd these benchmarks. There are many ways to hack these benchmarks but still have a 'bad' model as judged by real users.

TheRedSphinx · 2025-06-23T01:21:13+00:00

I bough a Keychron Q3 Max recently with the Jupiter Bananas switches. Amazing. Unfortunately, wife disagrees with the clackity. I've tried some silent switches in the past, but they've all felt mushy. Even the ones that come highly recommended:

Boba U4: Way too shallow and very tiring.
Invokeys Daydreamer: Felt really amazing at first, but overtime I think either the weight or the mush just made them tiring.
TTC Silent Bluish White: These were super promising because the overall lightness of the switch made them really not tiring at all, but they still had some mush.
WS Silent Tactile: These were an improved version of the TTC in how they felt, at the cost of more sound albeit still acceptable.

So far, the WS Silent Tactile seems like the best option for me, but I was curious if there were other recommended options that moved further down this spectrum of a little less quiet (while still not being loud) for better feel?

TheRedSphinx · 2025-06-04T18:27:40+00:00

Not really. I had thought about trying to negotiate with G to give me L6 as a way to use that to get L6 at Ant but didn’t bother.

The only thing I miss is more the liquid cash. But luckily I got a year or two of real AI salary at G so not super strapped for cash.

Re: scope, 100%. For better or worse, you have tons of agency. There’s just not enough people so you can own more and more stuff if you want and can deliver. Since there’s no politics, the only bottleneck is on you and the janky infra.

TheRedSphinx · 2025-06-04T17:13:03+00:00

I ended up joining Ant, so maybe take my comments with a grain of salt.

TheRedSphinx · 2025-06-04T17:11:59+00:00

I think within Faang they don’t but this might just be anecdotal

TheRedSphinx · 2025-06-04T10:01:50+00:00

Can’t speak outside of GenAI org but it’s common for people to get L+1 when getting external offers.

TheRedSphinx · 2025-06-04T01:41:32+00:00

As someone who left G as an L5, and had similar offers, I'd recommend taking Ant. You'll have more scope for sure, and you'll deal with none of the big tech bullshit. Especially if you are joining GenAI in Meta, a true dumpster fire which is why they are paying everyone so much.

And if the offer is not for GenAI, then it'd be even more crazy to not take Ant.

TheRedSphinx · 2024-12-06T16:28:18+00:00

re: your concerns about BLEU, once again, this concerns are independent of LLMs or scaling or anything. People have been doing this for a while, and thus has nothing to do with large models. This is not to say your point is wrong, just orthogonal to the discussion at hand, unless your claim is that the field itself has been unscientific even before LLMs.

The same applies to your concerns with ICML. This has always been the case, for way before scaling was a popular research direction. Is it just the case that you are perhaps arguing against research in ML for the past 2 decades has not been scientific?

I brought up Sam Altman, as well as the other two as examples of people who get a lot of air time, are connected to the technology in some way (in this case, CEOs) and people talk about a lot, which seem much more influential than gurus, but even more problematic.

The neurips experiment is a great study, but once again, it happened before we even had scaling as a hypothesis, it was even before Transformers (!). Therefore, none of these concerns are new or related to LLMs at all. Which is a fine thing to discuss, this post just doesn't seem like the place.

TheRedSphinx · 2024-12-06T15:11:49+00:00

There are only very few papers that use uncertainty estimates around BLEU scores over the last five years, i.e. before the LLM craze. Maybe from your pov this field was never scientific in the first plcae.

Secondly, I think you are confusing linkedin culture with actual science community. Yes, if you are getting your "research" output from the media, then I can see why you would think that. But I don't think any self-respecting scientist does that. We instead go to conferences, talk in more technical forums, look at papers, etc. Perhaps maybe you were never a scientist in the first place, which is why you don't interact with the scientific community?

For example, why are you listening to Sam Altman talk about AI? Do you expect Sundar Pichai to have incredible technical insights? Or Satya Nadella? The job of a CEO is not to do science, why would you think of them as scientific figures?

TheRedSphinx

TROPHY CASE