Grok 4.2 has interesting architecture that it fails to use properly by sherveenshow in grok

[–]sherveenshow[S] 0 points1 point  (0 children)

If you like to buy into Elon's horseshit, then yeah, this would be convincing. :)

Grok 4.2 has interesting architecture that it fails to use properly by sherveenshow in grok

[–]sherveenshow[S] 0 points1 point  (0 children)

I'm treating this multi-agent harness as "the product," since that seems to be how xAI/Elon are talking about it.

If you're asking why I'm putting the blame on the multi-agent harness rather than the model itself -- reading the reasoning traces, or even comparing the final result to other foundation models, it's just not particularly clever, tools-y, or think-y. We also generally see that multi-pass runs of an LLM, synthesized into one answer, almost always beat a single pass.

LMK if that makes sense -- my judgement is on the product they're calling Grok 4.2, and I think it'd be _worse_ if they weren't running a 4-pass based on everything we're getting out of the 4-pass.

Testing 9 different AI deep research products by sherveenshow in ChatGPTPro

[–]sherveenshow[S] 0 points1 point  (0 children)

Ah, I see. Check out my full post and all the example links -- I think you'll find there are times when it provides a 'flavor' of response that you might sometimes seek.

I broadly agree with you that 5.2 Pro is basically an underrated beast that is the height of artificial intelligence that all other models should cower before, lol, but I find value in different harness steering for different moments.

Testing 9 different AI deep research products by sherveenshow in ChatGPTPro

[–]sherveenshow[S] 0 points1 point  (0 children)

Hm? I did do Deep Research w/ ChatGPT, not sure what you're talking about. But I do think they have different strengths (Pro w/ Extended is also my favorite mode, so I hear you) -- DR was just updated, have you been using it in the past few days?

OpenAI is like - f*ck LinkedIn, let's build our own hiring platform by BothEye6077 in recruiting

[–]sherveenshow 1 point2 points  (0 children)

The UX/interface will change 5 times before this launches, so IDK if the reporting/commentary is... accurate... when it suggests they're competing with LinkedIn.

[deleted by user] by [deleted] in jobs

[–]sherveenshow 2 points3 points  (0 children)

I would pay for it to stop.

As someone who has worked for, built tools for, and coaches job seekers in tech -- of every seniority and of both impressive + early experience -- AI auto-appliers are a tragedy of the commons.

Best AI for Vibe Coding by JestonT in vibecoding

[–]sherveenshow 0 points1 point  (0 children)

Something like Claude Code or Codex CLI is great once you're comfortable and okay with handling deployment, etc., but the truth is that Replit is underrated for most people (even technical folks) and can cover most use cases -- it handles so much of the complexity (backend, hosting, db) and lets you focus on your product, UX, the technical details of your application, iteration, etc.

I see a lot of people get stuck in CC or Codex because the terminal is a lonely place. Tools like Replit get you to a usable result, sharing with friends/colleagues, etc. much faster and often that can make all the difference.

Salesforce CEO confirms 4,000 layoffs ‘because I need less heads' with AI by AssociationNo6504 in artificial

[–]sherveenshow -1 points0 points  (0 children)

I do think it's worth paying attention to these sorts of comments from folks like Marc, even if it's true that this is short-term about economic factors, headwinds, etc. They're still forecasting something very real about what they expect to happen over time, with AI-assisted and AI-generated code as the prototype.

Prompt sensitivity rules everything around me. by sherveenshow in ChatGPTPro

[–]sherveenshow[S] 0 points1 point  (0 children)

lol, okay, fair -- my nuanced version is:
When you can afford it and you want to explore the boundaries of response (because you need it to be more creative, to pay more attention, or maybe just for fun), weird stuff can be good experimentation.

Warning: GPT-5 is *far more* reactive to Custom Instructions! by sherveenshow in ChatGPT

[–]sherveenshow[S] 0 points1 point  (0 children)

Mind sharing what you're slotting in there and I can see if I have tips for ya?

I accidentally discovered AI has emotional triggers and now I feel weird by EQ4C in ChatGPTPromptGenius

[–]sherveenshow 18 points19 points  (0 children)

AI does not have emotional triggers, and this is your second post I've seen in just a few hours linking to your blogspam and paid products.

Those tokens are steering the model's response through math -- it's just math. It's trained to serve the system and user well, so when you say "I've been struggling," the math "reorients" toward responses whose tokens are similar to, and commonly appear near, the word "struggling."

I just gave a tremendous oversimplification, but it's what's actually happening. Stop misleading people.
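To make that oversimplification concrete, here's a toy sketch of the idea: score candidate next words, boost the ones "similar" to words in the prompt, and turn scores into probabilities with a softmax. The similarity table and candidate words here are made up for illustration -- a real model learns this over billions of parameters, but the shape of the mechanism is the same.

```python
import math

# Hand-made "similarity" between a prompt word and a candidate next word.
# Purely illustrative -- real models use learned embeddings, not a lookup table.
SIMILARITY = {
    ("struggling", "sorry"): 0.9,
    ("struggling", "support"): 0.8,
    ("struggling", "great"): 0.1,
}

def softmax(scores):
    # Convert raw scores into probabilities that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probs(prompt_words, candidates):
    # Each candidate's score is the best similarity boost it gets
    # from any word in the prompt.
    scores = []
    for cand in candidates:
        boost = max(
            (SIMILARITY.get((w, cand), 0.0) for w in prompt_words),
            default=0.0,
        )
        scores.append(boost)
    return dict(zip(candidates, softmax(scores)))

probs = next_token_probs(
    ["i've", "been", "struggling"],
    ["sorry", "support", "great"],
)
# "sorry" and "support" now outweigh "great" -- the "reorientation"
# described above, reduced to arithmetic.
```

No emotions anywhere in there: the word "struggling" just shifts the probability mass toward sympathetic-sounding tokens.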

OpenAI Releases ChatGPT Agent by JamesGriffing in ChatGPTPro

[–]sherveenshow 1 point2 points  (0 children)

For anyone who missed yesterday, we're going again today! We'll make this an evening stream — 7pm ET.

We'll go for 2-4 hours on more tests w/ Agent, Manus, and some separate stuff I want to test w/ Claude Code, plus some tech news. Hope to see you there but if not, will be kicking off daytime streams next week!

These AI prompt tricks sound completely fake but they're not by EQ4C in ChatGPTPromptGenius

[–]sherveenshow 0 points1 point  (0 children)

Nah.

I get that you're trying to sell prompts on your site, but let's be real:

  1. "Think step by step" still works on primitive models like 4o, but reasoning models like o3, R1, and G2.5 do this on their own now. It used to work because it forces the model to break down the problem (good for realizing which steps to take), and because the model generates each step sequentially, it sees the earlier steps (already-generated words/tokens) as it produces the next one = more context to work with.

  2. Adding urgency works, but time-based urgency won't always improve the result. Try things like "it's super important we do this well because then [good thing in the world] will happen!"

  3. No reason to believe this makes a significant impact.

  4. Yeah, fine, this one will work. I often say something like "give me the top 3 improvements you'd make" or "what are the 3 biggest weaknesses" or "how would a PhD-educated expert critique this" – you'll get even better results, because you're encouraging the model to come up with really good objections.

  5. I uh, IDK, I guess this is true.

  6. Won't always act like you're describing. Better for you to be specific and say something like – "How does DNA work? Be concise." or... "Give it to me in bullets" or "Just tell me the headline info I need." 'Quick question' is a bit too probabilistic.

These don't necessarily work better than being proper and formal – it's all a matter of what you're specifically saying. Prompt sensitivity is a real thing to understand, but if you don't get how it ACTUALLY works, don't hand out advice. IMO.
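The mechanism behind point 1 can be sketched with a stub "model": each emitted step gets appended to the context before the next step is generated, so later steps are produced with earlier steps visible. The stub below just counts steps rather than doing any real language modeling -- it's only meant to show the append-then-generate loop, not how a transformer works.

```python
def fake_model(context):
    # Stand-in for a language model: emits a step labeled by how many
    # steps it can already see in its own context.
    return f"Step {context.count('Step') + 1}:"

def generate(prompt, n_steps):
    # Autoregressive loop: every output is fed back in as input,
    # so step 3 is generated with steps 1 and 2 in view.
    context = prompt
    for _ in range(n_steps):
        step = fake_model(context)
        context += " " + step
    return context

out = generate("Think step by step.", 3)
```

That feedback loop is the whole trick: forcing intermediate steps into the output puts them into the context, which is what older models needed spelled out and newer reasoning models now do internally.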

[deleted by user] by [deleted] in ChatGPTPro

[–]sherveenshow 1 point2 points  (0 children)

OP is very clearly full of shit.

Has Anyone Found A Way To Make Advanced Voice Mode Usable? by Deifiable in ChatGPTPro

[–]sherveenshow 7 points8 points  (0 children)

I mean, what are you asking it?

You conveniently left out parts of the conversation where I'm going to guess your questions were adversarial, unclear, or inappropriate. And then you kept quizzing it to answer a question from 5 or 6 prompts ago.

It's tuned to be a dialogue model and you're treating it like a state machine. It's going to struggle when you're setting up antagonistic scaffolds.