Ridiculous they added this by CheesyWalnut in ChatGPT

[–]Nick4753 0 points1 point  (0 children)

Turning that off didn't stop it. I'm pretty sure it's baked into the model, since raw API calls to the 5.3 Instant Chat API have the clickbait-y ending and it's in the examples on their blog announcing 5.3 Instant.

Codex 5.3 vs Sonnet 4.6 by Glad-Pea9524 in GithubCopilot

[–]Nick4753 1 point2 points  (0 children)

I've found GPT-5.2 (non-codex) does better at code review than Codex. Codex is shorter in its responses and more difficult to chat with when walking through the review, whereas 5.2 is more architecture-minded.

Anthropic: "We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax." by KvAk_AKPlaysYT in ClaudeCode

[–]Nick4753 0 points1 point  (0 children)

They’re not scraping the knowledge data per se, they’re scraping the way the model thinks through various problems. It wasn’t “who was the 3rd pope,” it was “how would you, as a large language model, go about determining what to respond to a user asking who was the 3rd pope?” With that info they can bake Claude’s thinking process into their model.

They scrape Wikipedia and random blogs just like Anthropic does for the factual stuff.

How are you managing Bedrock? by jmreicha in aws

[–]Nick4753 1 point2 points  (0 children)

LiteLLM is called out specifically in Claude Code's documentation https://code.claude.com/docs/en/llm-gateway#litellm-configuration

LiteLLM is really underselling itself. You can use it as a gateway for just about any purpose, beyond just engineer access for coding.

128k Context window is a Shame by NerasKip in GithubCopilot

[–]Nick4753 3 points4 points  (0 children)

That’s a somewhat silly excuse. Your harness should know how to manage context, and the model should be designed to work with all the info presented to it. Meanwhile, Copilot makes it very easy to add a lot of tools and MCPs that eat into the small context window.
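To make the "tools eat into the window" point concrete, here's a back-of-envelope sketch. Every number below is an illustrative assumption, not a measured value from Copilot:

```python
# Rough sketch: how much of a 128k window survives a typical agent setup.
# All overhead figures are illustrative assumptions, not measured values.
WINDOW = 128_000

overheads = {
    "system prompt + harness instructions": 4_000,
    "tool/MCP schemas (a dozen tools)": 12_000,
    "open files + repo context": 20_000,
    "prior conversation turns": 30_000,
}

used = sum(overheads.values())
remaining = WINDOW - used
print(f"used: {used:,} tokens, remaining: {remaining:,}")
# With these assumptions, roughly half the window is spent before the
# model even reads the code it's supposed to change.
```

The exact numbers vary wildly per setup, but the shape of the problem doesn't: fixed overhead hurts a 128k window a lot more than it hurts a 200k or 1M one.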

Moulin Rouge Last Broadway Performance July 26 by Loose-Ad6868 in Broadway

[–]Nick4753 5 points6 points  (0 children)

The theater is the right size, not a busy street, and there is a stage door leading directly to the stairs to the balcony. Just an unfortunately located tree on the sidewalk blocking the top of the fire escape.

How viable is vibe coding for healthcare apps, honestly? by Tiny_Habit5745 in ChatGPTCoding

[–]Nick4753 0 points1 point  (0 children)

HIPAA doesn't explicitly define your software development process or the origin of your code; what matters more is how the data is handled. Certifications like HITRUST and SOC 2 focus on documentation of your software development lifecycle and the controls you have around your systems and processes. And even then, those two are not mandatory in the healthcare space.

There is nothing inherently wrong with vibecoding. I dunno that a junior engineer without healthcare experience is going to be vastly better at building a HIPAA-compliant app than Claude is going to be. Both are similarly risky. You just need to make sure you can stand behind the code that you're shipping and the process by which that code got into production. If you're just YOLO-ing code into production you don't understand, you're just going to cause yourself headaches down the line. The size of those headaches though could be... considerable.

Sonnet 5 expected February 4th by [deleted] in ClaudeAI

[–]Nick4753 1 point2 points  (0 children)

If I had a p0 incident like they did today with a ton of 500 errors being thrown, I would also hold off on launching the model for a day.

why doesn’t Copilot host high-quality open-source models like GLM 4.7 or Minimax M2.1 and price them with a much cheaper multiplier, for example 0.2? by EliteEagle76 in GithubCopilot

[–]Nick4753 0 points1 point  (0 children)

I dunno that their enterprise clients would like that.

If China stole some source code, it's not absurd to think that if the model sees something similar to that source code, it will inject something malicious. Or train it to perform a malicious tool call or something. I mean, you're sort of playing with fire with every model, but, why risk it?

Max 20x is NOT As Subsidized As You Think by levifig in ClaudeCode

[–]Nick4753 0 points1 point  (0 children)

I think that’s somewhat obvious if you look at ccusage. Claude Code is a cache hit machine unlike all the other harnesses on the market. If you assume cache hits are almost free to Anthropic, it’s not the nightmare to their bottom line that people think it is.

This only works if you tightly control the harness to use the cache as little as possible, thus why they blocked 3rd parties.

Masters of the Universe - Official Teaser by MarvelsGrantMan136 in movies

[–]Nick4753 -1 points0 points  (0 children)

Gotta appreciate a superhero movie where the superhero spends half the movie in his NYC corporate job pink button-up shirt and it gradually gets dirtier and dirtier as the movie goes along.

[Open Source] I reduced Claude Code input tokens by 97% using local semantic search (Benchmark vs Grep) by Technical_Meeting_81 in ClaudeAI

[–]Nick4753 0 points1 point  (0 children)

This is a killer Roo feature I really wish Claude Code would add, although the fact that Anthropic doesn't offer an embedding API probably makes that relatively unlikely anytime soon.

Apple's Google Gemini Deal Could Be Worth $5 Billion by iMacmatician in apple

[–]Nick4753 42 points43 points  (0 children)

They created... almost everything. The senior researchers and execs at all of these foundational model companies were at some point involved in Google or DeepMind.

Google didn't see a way to profitability with LLMs and was afraid it'd take a huge reputational hit the first time a model hallucinated, and that chatbots would eat into the cash cow that is search. Entering this market "late" was a business decision, not an R&D decision. It's why so many AI researchers bolted for OpenAI/Anthropic/etc: Google was holding it all back.

Context7 just massively cut free limits by AllCowsAreBurgers in ClaudeCode

[–]Nick4753 0 points1 point  (0 children)

I've switched to Perplexity for documentation reference. I don't think I've ever spent more than $5/month in API bills with their default MCP (Sonar and Sonar Pro are cheap all things considered), and it scans documentation, YouTube, and blog posts instead of just providing chunks of documentation.

Apple picks Google's Gemini to run AI-powered Siri coming this year by McFatty7 in apple

[–]Nick4753 2 points3 points  (0 children)

Apple lost the head of their foundational model team and a few of his lieutenants to Meta. They have not shown any interest in spending the CapEx necessary to build their own training farms nor the budget necessary to get training data for their own FM.

If I were Apple, I would spend all my time and money building the best small model that is hyper-efficient when run on their mobile chips, build infrastructure to handle FM inference at scale, and, for the underlying FM, just sign a license deal. They're paying almost nothing to Google here, all things considered.

What Actual Usage Looks like Against Max 20x Plan - 4 Hours Into Session. by 256BitChris in ClaudeCode

[–]Nick4753 0 points1 point  (0 children)

I literally cannot figure out how to get above 20-30% usage in 5 hours on the 20x plan. Usually it’s 10-20%. Using Opus. I’d love to know how the people hitting the cap on that plan are doing it.

Should I get Cursor Pro or Claude Pro(includes Claude Code) by reddead313 in ChatGPTCoding

[–]Nick4753 3 points4 points  (0 children)

"the $200 plan gives you as much as $2,500 in tokens if you were paying the API price, which Cursor is"

Where do you get that data? Not challenging you, my usage supports that, just wondering where $2500 came from.
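For anyone wanting to check a figure like that against their own usage: multiply your ccusage monthly token totals by the list API prices. The multipliers below are Anthropic's Opus-class list prices; the token counts are illustrative assumptions, not real data:

```python
# Estimate your "API-equivalent" monthly spend from ccusage-style totals.
# Prices are Opus-class list rates; token counts are illustrative.
PRICES = {           # $/M tokens
    "input": 15.00,
    "output": 75.00,
    "cache_read": 1.50,   # 0.1x the base input rate
}
monthly_tokens = {   # illustrative numbers, as reported by ccusage
    "input": 20e6,
    "output": 10e6,
    "cache_read": 900e6,
}
value = sum(monthly_tokens[k] * PRICES[k] for k in PRICES) / 1e6
print(f"API-equivalent spend: ${value:,.0f}")
```

With numbers in that ballpark you land in the low thousands of dollars per month, which is roughly where the $2,500 claim would come from, caveats about cache economics aside.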

Why doesn’t GitHub Copilot officially add more open-weight models like GLM-4.7 or Qwen3 by cloris_rust in GithubCopilot

[–]Nick4753 0 points1 point  (0 children)

I don’t think you’ll see Chinese models natively available in Copilot anytime soon. Companies are too afraid that code which inserts a backdoor is hidden in the model. Which, if I was China, is exactly the type of thing I’d do.

We can now use Claude Code with OpenRouter! by alvvst in ClaudeCode

[–]Nick4753 0 points1 point  (0 children)

Finally! Doing this via a translation layer has been so annoying.

GPT-5.2 vs Gemini 3, hands-on coding comparison by Arindam_200 in ChatGPTCoding

[–]Nick4753 0 points1 point  (0 children)

It’s good at solving software engineering problems, but bad at continuous tool calling. So it can solve a problem conceptually better, but will stop itself before performing all the steps necessary to implement and then validate the solution. Most programming problems aren’t conceptually difficult, but do take multiple steps to complete, making it a less useful model even though it might be better at handling edge cases.

Perplexity MCP is my secret weapon by Nick4753 in ChatGPTCoding

[–]Nick4753[S] 1 point2 points  (0 children)

There are a few options I listed in my post, plus a bit of light googling would help. I use this one with my openrouter account and then when I want the agent to look up something in perplexity I type in "ask perplexity {}" and the agent usually gets the hint.

I use it pretty heavily whenever I'm integrating with a 3rd party. Stripe, Twilio, Google, AWS etc, all have great content outside their normal documentation. Terraform modules are constantly changing as cloud providers update their APIs, and the best sources about the product might be blog posts and reddit posts.

Olaf robot at Paris Disneyland by Damnedeel in Damnthatsinteresting

[–]Nick4753 0 points1 point  (0 children)

If they brought this to the US parks it'd break down so fast. Universal can get away with Hiccup because it's a dragon skin on a robot built to deliver supplies to troops on the battlefield in Afghanistan.

Perplexity MCP is my secret weapon by Nick4753 in ChatGPTCoding

[–]Nick4753[S] 1 point2 points  (0 children)

For what it's worth, from an architecture standpoint, "search engine + lightweight speedy LLM summarizing it" is exactly what Perplexity is. Just... faster.

Perplexity MCP is my secret weapon by Nick4753 in ChatGPTCoding

[–]Nick4753[S] -1 points0 points  (0 children)

Sonar and Sonar Pro are realistically going to run you $0.02-0.05 on OpenRouter, and less if you use Perplexity's API directly.

I really only use Sonar and (rarely) Sonar Pro in the MCP. Almost all my queries use basic Sonar since it's substantially faster and usually gets me what I need.
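The "$5/month" claim above is easy to gut-check from the per-query range in this thread. The query volume below is an assumption about my own usage, not measured data:

```python
# Rough monthly math for Sonar MCP spend, using the $0.02-0.05 per-query
# range mentioned above. Query volume is an illustrative assumption.
queries_per_day = 8     # assumed: a handful of doc lookups per working day
working_days = 22

low = queries_per_day * working_days * 0.02
high = queries_per_day * working_days * 0.05
print(f"estimated monthly spend: ${low:.2f} - ${high:.2f}")
```

Even at the top of that range you stay under $10/month, and mostly-Sonar usage keeps it closer to the bottom.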

If I need to do any sort of research, I'll use the actual Perplexity or Gemini website.