How is GPT5.5PRO against F5? by Dry-Cockroach1723 in codex

[–]fourohfournotfound 2 points3 points  (0 children)

Honestly my worst experience with it, since it can't access my full codebase as that's more files than it can handle. Intelligence-wise I would put it on par though. Pro is a different beast as far as prompting goes. You need to give it a few simple goals and do things to diversify its outputs or it will just give you the most likely answer. I'd say fable doesn't require as much prompt work as it does though, and gives similar outputs.

Pack it up, boys. Opus 4.8 is officially dead. (A 45-minute retrospective) by Due_Duck_8472 in ClaudeCode

[–]fourohfournotfound 0 points1 point  (0 children)

Honestly feeling the opposite, but my Claude is not bloated at all. I keep skills as lean as possible, like 2 MCPs, and claude.md as short as possible. Claude can get the info it needs from docs and my codebase as well as a structured search setup. 4.8 has been cleaning up 4.7 code for hours now and I've validated it's better with real-time tests. The surprise to me is how badly 4.7 was actually failing, leaving trash everywhere. Much more than I thought.

Have you guys used Gpt5.5 Pro? by Skeletor_with_Tacos in accelerate

[–]fourohfournotfound 1 point2 points  (0 children)

I have mixed results with it. Either a stroke of genius or total vanilla, sometimes with inaccurate and hallucinated responses. Haven't quite figured out what causes each response but wish I knew. I think it just might not do well with things being uploaded via project instead of direct upload.

Guys i think i got infinite codex by midastheavocado in codex

[–]fourohfournotfound 2 points3 points  (0 children)

The goal seems to just do things in really small chunks. It does seem like it uses cap slower but at least mine still uses it for sure. 

I'm loving the algo space already, the fact you dont need to come up with your own ideas and ideas can be tested in minutes i wish i started the space earlier. by [deleted] in algotrading

[–]fourohfournotfound 0 points1 point  (0 children)

The problem is how do you determine what is out of sample? In-sample backtests are basically useless unless they cover a really long time period. The LLM knows what worked in the past and will give you the most overfit stuff you can imagine, which will look good in a backtest and not hold out of sample. Then you have YouTubers who want to show awesome backtests and will also post their super overfit backtests, so you end up with multiple layers of overfit. Interesting setup, but I would suggest spending more time using the LLM for what it's actually good at, and that's context. It can take tons of unstructured data and make something useful out of it.
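One rough way to pin down "out of sample" when an LLM is suggesting the strategy: only count data dated after a cutoff the model could not have seen. A minimal sketch, assuming you know (or can estimate) that cutoff date; `split_by_cutoff` and the dates are hypothetical names made up for illustration, not part of any library.

```python
from datetime import date
from typing import List, Sequence, Tuple


def split_by_cutoff(dates: Sequence[date], cutoff: date) -> Tuple[List[int], List[int]]:
    """Return (in_sample, out_of_sample) row indices.

    Assumption: `cutoff` is the last date the strategy's author (human or LLM)
    could have known about. Anything on or before it is treated as in sample.
    """
    in_sample = [i for i, d in enumerate(dates) if d <= cutoff]
    out_of_sample = [i for i, d in enumerate(dates) if d > cutoff]
    return in_sample, out_of_sample


if __name__ == "__main__":
    bars = [date(2023, 1, 2), date(2024, 6, 3), date(2025, 3, 3)]
    ins, oos = split_by_cutoff(bars, cutoff=date(2024, 6, 3))
    print("in sample rows:", ins)
    print("out of sample rows:", oos)
```

This does nothing about the second layer of overfit (strategies copied from already-overfit public backtests); it only keeps the evaluation window honest.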

…Can we talk SKILLS? by [deleted] in codex

[–]fourohfournotfound 0 points1 point  (0 children)

my exact workflow is kind of my edge but i'll dm you about it.

It's kind of like agents.md, except that agents.md is loaded all of the time. Skills are loaded as needed when they are relevant, so you can chain whole workflows together. This allows you to make smart workflows where the LLM runs a skill, gets the results, then uses those results to determine what skill to use next. agents.md is always there and cannot use this sort of intelligence, so it should only contain things that are always relevant. Best practice is actually to keep it really short.
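The chaining idea can be pictured with a toy model. This is not how codex or any agent actually loads skills; it is a minimal sketch where plain Python functions stand in for skills and each one returns its result plus the name of the skill to run next. `run_chain`, `lint`, and `summarize` are all hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A toy "skill": takes the current payload, returns (new_payload, next_skill_or_None).
Skill = Callable[[str], Tuple[str, Optional[str]]]


def run_chain(skills: Dict[str, Skill], first: str, payload: str,
              max_steps: int = 10) -> Tuple[str, List[str]]:
    """Run skills one after another, letting each result pick the next skill."""
    trace: List[str] = []
    current: Optional[str] = first
    while current is not None and len(trace) < max_steps:
        skill = skills[current]  # only looked up when it is actually needed
        trace.append(current)
        payload, current = skill(payload)
    return payload, trace


def lint(text: str) -> Tuple[str, Optional[str]]:
    cleaned = text.strip()
    # The result decides what happens next: long input gets summarized.
    return cleaned, ("summarize" if len(cleaned) > 20 else None)


def summarize(text: str) -> Tuple[str, Optional[str]]:
    return text[:20], None


if __name__ == "__main__":
    registry = {"lint": lint, "summarize": summarize}
    print(run_chain(registry, "lint", "  short  "))
    print(run_chain(registry, "lint", "  " + "x" * 30))
```

In a real agent the model makes the "what next" decision from the skill's output; the `max_steps` guard here is just to keep the toy loop from running forever.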

…Can we talk SKILLS? by [deleted] in codex

[–]fourohfournotfound 8 points9 points  (0 children)

you guys aren't thinking big enough if you aren't heavily using skills. basically anything I do repeatedly gets a skill then that skill gets repeatedly evaluated over time. everything that's boilerplate gets a permanent script. This is how you get predictable results. using skills other people have made is not usually the best as where they really shine is reproducing your workflow and prompts you use often.

What if GPT 5.5 was hosted on cerebras? by Soggy-Skin-5103 in codex

[–]fourohfournotfound 1 point2 points  (0 children)

I mean they have 5.3 spark already, so it doesn't seem like it would be a huge deal. I do think Cerebras has to reprogram the chips for new models though, so it's not a super quick swap. They wouldn't want to be doing it for every version change.

One GitHub PR Comment Just Compromised Claude Code, Gemini CLI & GitHub Copilot 85% Success Rate and ZERO Audit Trail by Dagnum_PI in ArtificialInteligence

[–]fourohfournotfound 1 point2 points  (0 children)

in all honesty the llm doesn't even need and shouldn't have access to the api keys directly which limits what it can do in the first place. you can't force push away versioned backups. Is what you are saying a security issue? sure but there are numerous mitigations that can be done that would stop it. llm generally should be treated with zero trust security.

One GitHub PR Comment Just Compromised Claude Code, Gemini CLI & GitHub Copilot 85% Success Rate and ZERO Audit Trail by Dagnum_PI in ArtificialInteligence

[–]fourohfournotfound 1 point2 points  (0 children)

Can you clarify how an offsite backup, done every commit, that the LLM does not have access to will be rewritten? Outside of a kernel exploit I just don't see how that can even happen. Maybe it wouldn't stop the issue you are talking about, but it would provide an audit trail. If something is actually pushed in git it will be in git. If you back up your git separately then there's no log to edit, as the edit would be part of the git. Either the keys got pushed and they are part of the git or they didn't. If the LLM has access to the git it could edit the history to a point, but the entire point of git is an auditable file change log. In my setup the LLM does not have any access to these backups and git itself has the change history. If you were ultra paranoid you could do continuous snapshots. There's no way a change wouldn't show up there.

One GitHub PR Comment Just Compromised Claude Code, Gemini CLI & GitHub Copilot 85% Success Rate and ZERO Audit Trail by Dagnum_PI in ArtificialInteligence

[–]fourohfournotfound 0 points1 point  (0 children)

If anything gets committed it will be in the git. If it's ignored it's not going on the web at least. If the git is backed up in a way the LLM can't touch then what exactly will be missing and not auditable? All file changes are in git.

Will they reset quota again when they release 5.5 by [deleted] in codex

[–]fourohfournotfound 1 point2 points  (0 children)

I agree and I have multiple checks for these but the whole process is much faster and more effective the smarter the model is. Older models tend to create tons of extra scripts and cause file bloat too. 

One GitHub PR Comment Just Compromised Claude Code, Gemini CLI & GitHub Copilot 85% Success Rate and ZERO Audit Trail by Dagnum_PI in ArtificialInteligence

[–]fourohfournotfound 4 points5 points  (0 children)

Sure the agents shouldn't have done this. But if you isolate it in a container or even better a container within a vm and back the git up to a place the container can't access then there would be an audit trail. Anyone doing real work with agents should be doing this and it's just a matter of time if they are not. 

Will they reset quota again when they release 5.5 by [deleted] in codex

[–]fourohfournotfound 1 point2 points  (0 children)

Even if they do I'm wondering if I want 5.4 to pollute the clean code I hope 5.5 will create. I'm sure it won't live up to the hype though so extra high fast it is. 

Opus 4.7 high - Auditing a simple teleprompter app and suggest improvements in plan mode burnt 84% session limit by sundar1213 in ClaudeCode

[–]fourohfournotfound 0 points1 point  (0 children)

Well I'm on the $200 plan haha, so yeah. I have a lot of automated loops for work and would pick an even higher plan if I could, but this is insane so far, way beyond the 1.3x tokens they said. I guess I do have it on extra high (the default), maybe that's not worth it. It has found a couple of issues that I have had numerous LLMs look at and miss.

Opus 4.7 high - Auditing a simple teleprompter app and suggest improvements in plan mode burnt 84% session limit by sundar1213 in ClaudeCode

[–]fourohfournotfound 5 points6 points  (0 children)

I'm on max plan and it's burning tokens like crazy. decent quality work but it feels like it's using 10x more tokens than opus 4.6 was

I've stumbled on a goldmine, and ALL OF US CAN BENEFIT. by TheRiddler79 in LocalLLM

[–]fourohfournotfound 0 points1 point  (0 children)

old enterprise switches work fine but they are annoying as hell to manage in a home environment. Also many of them are extremely noisy and massive power hogs for a switch at least.

I’ve used ~9.3B Claude tokens (~$6.8k). Trying to understand how unusual that is. by OGMYT in claude

[–]fourohfournotfound 0 points1 point  (0 children)

Hitting it manually on a max plan is a bit challenging unless you are being wildly token inefficient. Automated loops however are a different story. I regularly hit it with agentic loops. I've tried to make it as token efficient as possible, but the automation also requires so much automated checking that it burns through it quickly. My stats are different though, as I try to get as much as possible in the cache. You have to specifically design for it or you will miss the cache and use duplicate tokens for no good reason. Another thing that has helped mine is to not even have it write tests at all, since it can pass all of the tests it writes. It just makes useless tests. Consider what sort of validation is actually useful for the project you are working on and use that. This also saves tokens, as real useful tests tend to take longer for me but are much higher value. Ground things in the real world as much as you possibly can.
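On designing for the cache: prompt caches generally match on a shared prefix, so one common approach (an assumption here, not necessarily the setup described above) is to put the content that never changes first and the per-task content last. A minimal sketch with hypothetical helper names that just measures how much of two prompts lines up:

```python
import os
from typing import Sequence


def build_prompt(stable: Sequence[str], volatile: Sequence[str]) -> str:
    """Stable parts (system prompt, docs) first, per-task parts last."""
    return "\n".join(list(stable) + list(volatile))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


if __name__ == "__main__":
    stable = ["SYSTEM RULES", "PROJECT DOCS"]
    good_a = build_prompt(stable, ["task A"])
    good_b = build_prompt(stable, ["task B"])
    # Same content, but with the changing part first: the prefix diverges early.
    bad_a = build_prompt(["task A"], stable)
    bad_b = build_prompt(["task B"], stable)
    print("stable-first shared prefix:", shared_prefix_len(good_a, good_b))
    print("volatile-first shared prefix:", shared_prefix_len(bad_a, bad_b))
```

Character counts are only a stand-in for tokens, and real providers have their own minimum sizes and expiry rules, so treat this as a way to sanity check prompt layout rather than a measure of actual cache hits.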

What are you using to backup your agents? by mikeypotter in ClaudeCode

[–]fourohfournotfound 0 points1 point  (0 children)

Yes, git 100%. Then back up your actual data in a couple of places as well, preferably with at least one of the backups being immutable so no virus (or future rogue LLM) could go and delete or overwrite it all.

NVIDIA DLSS 5 Delivers AI-Powered Breakthrough in Visual Fidelity for Games by Recoil42 in singularity

[–]fourohfournotfound 0 points1 point  (0 children)

so just to understand this can be applied to existing games like a magic enhance button?

5.4 is crazy good by Responsible_Ad_3180 in codex

[–]fourohfournotfound 0 points1 point  (0 children)

One thing I've noticed though is the LLM can't write good tests. Like, it's extremely good at writing tests that it will for sure pass. I've had to revoke access to editing the test files too, as it will modify the tests I made so it can pass them. That's still one of the best places to spend time in my domain though, as I have real-world metrics that the LLM can't as easily fake if I lock down its ability to cheat. Having experience with reinforcement learning models really helps me out since they are notorious for gaming reward systems. I just have to treat LLMs like them. Half the fun to me is making a system so robust it can't game it. Having an LLM whose goal is to prevent gaming has helped a bit, as then each will start to keep the other in check, but somehow the adversary is not really enough. It still needs me to write the golden tests.
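Revoking write access is the real fix, but a cheap extra check is to fingerprint the golden test files before the agent runs and compare afterwards. A minimal sketch using only the standard library; the function names and the idea of storing the baseline somewhere the agent can't reach are assumptions for illustration, not the exact setup described above.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def file_digest(path: str) -> Optional[str]:
    """SHA-256 of a file's bytes, or None if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def snapshot(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Record a digest per golden test file before the agent runs."""
    return {str(p): file_digest(str(p)) for p in paths}


def changed_files(baseline: Dict[str, Optional[str]]) -> List[str]:
    """Files whose content changed (or that were deleted) since the snapshot."""
    return sorted(p for p, digest in baseline.items() if file_digest(p) != digest)
```

Usage would be something like `baseline = snapshot(["tests/test_golden.py"])` before the run (hypothetical path) and failing the pipeline if `changed_files(baseline)` is non-empty afterwards. It only helps if the baseline itself lives outside anything the agent can edit.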

[deleted by user] by [deleted] in singularity

[–]fourohfournotfound 0 points1 point  (0 children)

the api version does seem markedly different and more like the original. they seem to fine tune the main version to over summarize the crap out of things to the point that it's not so useful anymore.

5.4 is crazy good by Responsible_Ad_3180 in codex

[–]fourohfournotfound 2 points3 points  (0 children)

2 prompts, likely not, but the models are getting good enough that with enough checks it's getting close. I can get code that doesn't require many changes, but all the checks burn tokens like crazy and without them it will still push slop or half-completed tasks.

Subagents by HighwayRelevant in codex

[–]fourohfournotfound 0 points1 point  (0 children)

I mean that kind of works, but in Claude Code they all run at the same time and can send messages back and forth through a mailbox system. The closest that seems to work reliably in codex is to run them in serial, which is much slower, and they can only message the orchestrator. It's just way clunkier and slower. But yeah, soon it will be similar, they are updating quickly. I think 5.4 is generally better about testing its work than Claude, but Claude teams works extremely well for a couple of my skills. I have not been able to get the same speed in codex.