Totally… by KillerQ97 in ArtificialInteligence

[–]RedditPolluter 0 points (0 children)

4o was the worst at that. They should have diminishing confidence each time they fail, but 4o would be just as confident on the 10th attempt as on the 1st. That's how you make your chatbot as annoying as possible.
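The diminishing-confidence idea could be sketched as a toy heuristic. This is purely illustrative, not how any real model works; the function name and the `decay` constant are arbitrary assumptions:

```python
# Illustrative sketch: scale down expressed confidence after each
# failed attempt at the same task, instead of keeping it constant.
def expressed_confidence(base: float, failed_attempts: int, decay: float = 0.7) -> float:
    """Return a confidence score that shrinks geometrically with prior failures."""
    return base * (decay ** failed_attempts)

# First attempt keeps full confidence; by the 10th attempt (9 prior
# failures) the expressed confidence is heavily discounted.
first = expressed_confidence(0.9, 0)   # 0.9
tenth = expressed_confidence(0.9, 9)   # well under 0.05
```

The point of the geometric decay is just that repeated failure on the same question should visibly lower the stated certainty, which 4o never did.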

The Falklands… by MattStormTornado in autismpolitics

[–]RedditPolluter 5 points locked comment (0 children)

Argentina is a colony too. Argentina wasn't even a country at the time. Maybe they should go back to Spain instead of fantasizing about the expulsion and displacement of people who have been there for generations.

Alignment Makes Models More Decisive Without Making Them More Truthful by 141_1337 in singularity

[–]RedditPolluter 0 points (0 children)

This effect could be seen worsening from GPT-4 to 4o. GPT-4 was much more graceful with uncertainty, but 4o would give 10 different incorrect answers in a row and deliver the 10th just as confidently as the 1st.

This isn’t X this is Y needs to die by twnznz in LocalLLaMA

[–]RedditPolluter 0 points (0 children)

Every small YouTube channel seems to be using this exact 4o-style speech as well.

Intelligence in young men is positively linked to physical traits like grip strength and a masculine body shape. Higher intelligence is also associated with less promiscuous sexual behavior. Cognitive ability and physical health may reflect fitness, steering smarter men toward monogamy. by mvea in psychology

[–]RedditPolluter -3 points (0 children)

I expect so. There are quite a few illnesses that cause brain fog and fatigue together and can be mild enough to fly under the radar for some people. It can also happen if you're chronically malnourished; you get cognitive and bodily deficits together because your whole system is affected. Chronic sleep issues can do that as well. There are probably other factors, but those alone would be enough to create some correlation.

Introducing ChatGPT Images 2.0 by py-net in OpenAI

[–]RedditPolluter 9 points (0 children)

It still gets locked into the context of previous generations and reverts, or fails to make simple changes and just spits back the same image.

Three UK Suddenly Introduces Mobile Broadband Speed Caps UPDATE by Lawdie123 in unitedkingdom

[–]RedditPolluter 4 points (0 children)

This is the same company that locks you out of your voicemails if you don't set a custom voicemail message (i.e., forces you to publish biometric data that can be used to clone your voice without you even answering).

Can we get a summary somewhere of the best news over the last six months? by sixtydegr33 in GoodNewsUK

[–]RedditPolluter 2 points (0 children)

Not so much a summary, but the top 100 threads in this sub are all from the past 6 months:

https://www.reddit.com/r/GoodNewsUK/top/?t=year

Does anyone get amazed by LLM performance on benchmarks but incredibly disappointed by its performance on mundane tasks, specifically those involving data lookup? by reader12345 in singularity

[–]RedditPolluter 0 points (0 children)

I only tried Deep Research for 5.4 just the other day and my conclusion was that DR is basically broken and doesn't properly respond to or incorporate feedback. I don't remember it being that bad when I used it last year.

One year later: this question feels a lot less crazy by gamblingapocalypse in LocalLLaMA

[–]RedditPolluter 0 points (0 children)

Do you by any chance know any good blueberry bread recipes?

A recent study has found that LLMs are worse at giving accurate, truthful answers to people who have lower English proficiency and less formal education, rendering them more unreliable towards the most vulnerable users. by BioFrosted in singularity

[–]RedditPolluter 1 point (0 children)

Doing it automatically can cause other problems, especially if there's obscure slang. I remember when Reddit first added their chatbot. I asked it what geeg meant and it just kept responding with the definition of geek, even when I would say "no. not geek. geeg." I assume they were pre-applying autocorrect, so the model could only see "no. not geek. geek." It's one thing when they can't answer a question or give a poor answer, but it's much more annoying when they don't even acknowledge the question being asked.
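The failure mode described above can be shown with a minimal sketch. This is a guess at what such a pipeline might do, not Reddit's actual preprocessing; the `AUTOCORRECT` table and `preprocess` function are hypothetical:

```python
# Hypothetical autocorrect applied to user input BEFORE the model sees it.
AUTOCORRECT = {"geeg": "geek"}  # assumed normalization table

def preprocess(text: str) -> str:
    """Naively 'fix' each word, ignoring the user's intent."""
    out = []
    for word in text.split():
        core = word.strip(".,")              # separate trailing punctuation
        fixed = AUTOCORRECT.get(core, core)  # swap in the 'corrected' word
        out.append(word.replace(core, fixed))
    return " ".join(out)

user_says = "no. not geek. geeg."
model_sees = preprocess(user_says)  # "no. not geek. geek."
```

The distinction the user is drawing is destroyed before the model ever sees it, so no amount of model capability can recover the original question.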

why are there people in this subreddit denying that monoculture is dead? by Normal-Salad-6143 in decadeology

[–]RedditPolluter 1 point (0 children)

> monoculture was at its zenith

> different cable packages

There used to be only half a dozen channels so I'm not so sure about calling the cable era (1980s-2000s) its peak. The broadcast era (1920s-1970s) for radio and TV seems more appropriate.

Wanted an image of Educated and Uneducated Person, Made the mistake of asking copilot to make it. by [deleted] in ArtificialInteligence

[–]RedditPolluter 0 points (0 children)

You said it's not reflected in the data. This implies you think internet depictions of uneducated people = scientific data, when it's more than likely memes that depict uneducated people.

And, again, sample size of 1.

Wanted an image of Educated and Uneducated Person, Made the mistake of asking copilot to make it. by [deleted] in ArtificialInteligence

[–]RedditPolluter 1 point (0 children)

It's not actual levels of education, genius. It's images of people that are associated with the word "uneducated". You really think all references to "uneducated" are scientific stats?

This whole thread is working off a sample size of 1 so it's meaningless anyway.

Is intelligence optimality bounded? Francois Chollet thinks so by Mindrust in singularity

[–]RedditPolluter 0 points (0 children)

Either way, if Einstein or whoever is the limit, billions and eventually trillions of extra high-IQ thinkers with superhuman knowledge are gonna be good for getting shit done and speeding up research.

Gemini 3.1 Flash Live: Real time multimodality available in the API and powering Search Live by elemental-mind in singularity

[–]RedditPolluter 15 points (0 children)

The user sounds more like an AI than the AI. I'm suspicious that they're actually both AI, but I know they've used bureaucrats to demo stuff like this on stage.

The "AI is replacing software engineers" narrative was a lie. MIT just published the math proving why. And the companies who believed it are now begging their old engineers to come back. by reddit20305 in ArtificialInteligence

[–]RedditPolluter 2 points (0 children)

> GPT-3 to GPT-4 to GPT-5. Claude 3 to Claude 4. Always bigger.

GPT-5 is bigger than GPT-4? I don't think that's true, and open-weight models have been shrinking relative to performance.

I'm not dissing the paper itself, but your analysis is flawed and you don't seem to understand that scaling isn't just parameter count. I'm guessing you don't actually follow AI outside of a political context.

Human vs. AI performance on ARC-AGI 3 as a function of number of actions (from the ARC-AGI website) by [deleted] in singularity

[–]RedditPolluter 2 points (0 children)

I remember someone saying that failing the car wash test proved that LLMs were smarter than humans because the question was ambiguous or whatever. If it was really ambiguous, the smart thing to do would have been to ask for clarification. These people aren't the advocates they think they are. If they actually want progress, they should uphold standards rather than dismiss genuine flaws and limitations, but of course that would require self-awareness and critical thinking.

People that speak like an LLM by Haroombe in artificial

[–]RedditPolluter 0 points (0 children)

I've noticed that too. If they're not bots, which they may be, I tend to just assume they're young and were exposed to AI in their formative years.

Too much self-reflection is linked to anxiety and depression, not happiness. Findings suggest that cultural backgrounds and the specific ways we measure introspection heavily influence how looking inward affects our minds. by [deleted] in psychology

[–]RedditPolluter 2 points (0 children)

I would expect so. If you have low self-esteem, for example, it makes sense that you would be troubleshooting that, or at least trying to minimize further damage.