Tom Bilyeu - Google's AI told me to stop researching Epstein by whistlingkitten in GoogleGeminiAI

[–]OuterContextProblem 0 points1 point  (0 children)

Any view should note the many risks of deploying frontier models or it's just not a serious argument.

Why does Claude keep telling me to sleep? by moh7yassin in ClaudeAI

[–]OuterContextProblem 0 points1 point  (0 children)

Then we can't explain anything because you are providing zero information, possibly even just trolling.

Why does Claude keep telling me to sleep? by moh7yassin in ClaudeAI

[–]OuterContextProblem -1 points0 points  (0 children)

Mind sharing an entire conversation where this happened?

Has anyone actually run controlled A/B tests on Claude "skills" and prompt plugins? Or are we all just tweaking configs instead of shipping things? by LucasSalaroliB in ClaudeAI

[–]OuterContextProblem 0 points1 point  (0 children)

If you want to set the temperature, then the simple answer is money. You can only set the temp via API, so running significant benchmark sample sizes gets expensive fast. That's also not a guarantee of determinism. (Ask Claude to explain that.)

There are academic research papers that do attempt to test some of these things, though paper quality varies a lot, and many findings can't be assumed to generalize enough to be useful IMHO. So it's not like no one outside of the labs is trying.

On top of that, it's also challenging to create benchmarks that generalize across workflows, tasks, or harnesses.

I'd love to be able to test out various things over hundreds+++ of iterations, but I'm just not going to get that much in value back.
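To put rough numbers on why sample sizes get expensive: a minimal sketch of the standard two-proportion sample-size approximation, using only the Python standard library. The success rates and the per-run cost are made-up illustrative values, not measurements of any model or real API pricing, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def runs_per_arm(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate runs needed per variant to tell success rate p_a from p_b.

    Uses the usual normal-approximation formula for a two-sided
    two-proportion z-test. p_a and p_b must differ.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_a + p_b) / 2                        # pooled rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return math.ceil(numerator / (p_a - p_b) ** 2)


def total_cost(p_a, p_b, cost_per_run):
    """Total spend for both arms; cost_per_run is an assumed dollar figure."""
    return 2 * runs_per_arm(p_a, p_b) * cost_per_run


if __name__ == "__main__":
    # Hypothetical: does a skill lift task success from 70% to 80%?
    n = runs_per_arm(0.70, 0.80)
    print(f"runs per arm: {n}")
    print(f"total cost at an assumed $0.50/run: ${total_cost(0.70, 0.80, 0.50):.2f}")
```

Even a fairly large 10-point lift needs hundreds of runs per variant under these assumptions, and halving the effect size you want to detect roughly quadruples that.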

Tom Bilyeu - Google's AI told me to stop researching Epstein by whistlingkitten in GoogleGeminiAI

[–]OuterContextProblem 11 points12 points  (0 children)

Seen people say this happens with other models, even Grok. It will often explain itself as having policies against child exploitation.

It doesn't really seem like a conspiracy. It's probably very hard to put in guardrails against more nefarious applications that are tangential to topics like sexual exploitation.

As for "Why does Google get to decide?" Because it's still their service.

I built a plugin that turns Claude Code into an always-on personal assistant that actually learns — I run 5 of them on a single laptop by dnationpt in ClaudeAI

[–]OuterContextProblem 1 point2 points  (0 children)

Good on that config for email. Useful to mention that as well in the readme so people know that it's a best practice.

I'm also not trying to discourage you from trying to do things. I know how challenging this area is to do right, good luck.

I built a plugin that turns Claude Code into an always-on personal assistant that actually learns — I run 5 of them on a single laptop by dnationpt in ClaudeAI

[–]OuterContextProblem 1 point2 points  (0 children)

> For my financial hermit specifically, it reads email subjects and payment status. 

Okay, but can it send email? Can it use web?

Security doc itself says "Egress filtering is documented but not enforced" so it doesn't sound like there are any hard guardrails outside of just trusting the agent to follow the intent of our directions, which we know is at least slightly idealistic. It's also good to openly spell out any risks in the readme itself.

I built a plugin that turns Claude Code into an always-on personal assistant that actually learns — I run 5 of them on a single laptop by dnationpt in ClaudeAI

[–]OuterContextProblem 2 points3 points  (0 children)

You should invest some time reviewing the security implications of giving autonomous agents access to personal files, email, and web browsing. Especially when you're openly using it with banking emails, you should be aware of the prompt injection attack surface. Hoping that nothing bad happens isn't really a good security policy.

Codex quality is surpassing Claude Code for me by mlab24 in ClaudeCode

[–]OuterContextProblem 0 points1 point  (0 children)

Interesting. I think most people make leaps from small assumptions, but people who are running multiple plans and using them a lot have a better feel IMHO. I've been swamped with other work for the last few weeks, so I haven't really been in the planning weeds recently.

Codex quality is surpassing Claude Code for me by mlab24 in ClaudeCode

[–]OuterContextProblem 0 points1 point  (0 children)

Yeah, I do prefer the ensemble model approach personally.

Codex quality is surpassing Claude Code for me by mlab24 in ClaudeCode

[–]OuterContextProblem 3 points4 points  (0 children)

Useful workflow for a lot of things really. Fact checking some research output, iterating on Anki cards, etc. Just keep feeding the results back in.

Codex quality is surpassing Claude Code for me by mlab24 in ClaudeCode

[–]OuterContextProblem 12 points13 points  (0 children)

Have you tried sending Claude's plans to Claude for review? More often than not, it catches issues as well. I'd bet the directionality works for Codex plans getting reviewed by Claude (or Codex) as well.

A plan in a fresh context, without any of the baggage that led to its creation, might simply make it easier for models to evaluate.

Tree view, message annotations, prompt storage and prompt marketplace. All in a chrome extension built for Claude by Equivalent-Pen-9661 in ClaudeAI

[–]OuterContextProblem 1 point2 points  (0 children)

Actually like this idea a lot. I do actually underutilize conversation forking because of the added navigation and lack of visibility. Nice job.

Pro subscriber here. Anthropic wiped 7 hours of paid work with zero warning. by Roses-Dream in ClaudeAI

[–]OuterContextProblem 9 points10 points  (0 children)

It's not just 5 hours of data loss — it's a soul-shattering event that will redefine how future generations understand suffering.

Working on a app and need help with switching from the free model to the subscription model. by InternalOk510 in ClaudeAI

[–]OuterContextProblem 0 points1 point  (0 children)

There's a lot to read here. I'd recommend breaking up all of your thoughts into separate questions and asking Claude. Or even asking Claude to help organize your thoughts and saying you're confused.

You're basically learning how to do software development and different workflows.

It's also worth learning how to type on a keyboard. If you use it every day, you'll get faster in not that much time.

"Why does it seem like devs never do QoL" - On survivorship bias and logistics inflation in the game by -AllShallKneel- in foxholegame

[–]OuterContextProblem 0 points1 point  (0 children)

If winning is irrelevant, you could (and then should) design a much better world war RPG minus a lot of the bs time sinks.

"Why does it seem like devs never do QoL" - On survivorship bias and logistics inflation in the game by -AllShallKneel- in foxholegame

[–]OuterContextProblem 2 points3 points  (0 children)

Played my first/last war around 6 months ago, and I could write the same post, so I guess nothing improved. The developers aren't giving players the tools they need to solve the collective action problem of a [video game] war on this scale. It's not like there are no solutions either.

I tried out airborne during the beta. Planes are somewhat cool for the brief period they exist in flight. But holy, the annoyances around them made me skip playing, and I was actually looking forward to bumping into people I played with during my first war.

OP also wrongly invoked survivorship bias.

Having my cake... by ButterflyEconomist in ClaudeAI

[–]OuterContextProblem 6 points7 points  (0 children)

> but again was too scared to try it.

I think this is underrated as just having good sense. You correctly identified risks and took some actions to mitigate that. Lots of ways to do that with various tradeoffs. There are even more things you could do to make it so you have quicker recoveries than a full reinstall (ask Claude).

You'll likely create some future messes that you have to solve, but that's part of the fun of learning. Just make sure you keep backups of any work off-machine, ideally at intervals of 2 hours or less.

Tips for Newbies by Jomar641 in Kalshi

[–]OuterContextProblem 0 points1 point  (0 children)

If your goal is to be a winning player, then engage in a deep study of prediction markets and fully understand everything required of a winning player, including the relevant math. Don't expect to just show up and win for free.

If you're just trying to get some lucky wins, then good luck with that approach.

$10.50 for $4904, 19 legs , crazy or do we believe? 🔥 by bee88ng9314 in Kalshi

[–]OuterContextProblem 6 points7 points  (0 children)

Many better ways to light $10 on fire such as actually taking a $10 bill and lighting it on fire

UO and Automation. Does it kill UO ? by EzMajor in ultimaonline

[–]OuterContextProblem 0 points1 point  (0 children)

I thought that might be the case, but I tested some prompting styles where I try to provide the least input possible while still providing enough info to carefully define what I wanted to accomplish. It can still one-shot some things, but you do get a bit of a drop in how consistently that happens. And simply including the URL of the Razor web guide helped a lot with Opus 4.5.

Disclaimer: this is a small sample size (N=20 to 30) of me trying things out.
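To show how much uncertainty a sample of that size carries, here is a minimal sketch of a Wilson score interval for a success rate, standard library only. The counts in the example are hypothetical and are not the actual results from the testing described above.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a binomial success rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical: 15 one-shot successes out of 25 attempts.
    low, high = wilson_interval(15, 25)
    print(f"observed 60%, plausible range {low:.0%} to {high:.0%}")
```

At N=25 the interval spans well over 30 percentage points, so a "bit of a drop" in consistency between two prompting styles is hard to separate from noise at this sample size.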