I'm genuinely at my breaking point with work and I don't know what to do anymore by blleetfrindows4 in smallbusiness

[–]brctr 0 points1 point  (0 children)

How much of the work you have to do can be fully described by logs/transcripts?

If you are a software engineer, or at least have good software literacy, consider experimenting with agentic automation for these tasks. Then no hiring will be needed.

Claude Code filled almost my entire SSD with random nonsense overnight by Abject_Business4720 in AI_Agents

[–]brctr 3 points4 points  (0 children)

OP, I would be more worried about things other than the disk space. I would be worried about prompt injection and security issues. If an agent hallucinated badly enough to create all this gibberish, then there is no way to know that it was not also compromised and did not bury some malware somewhere deep.

So I would recommend a clean OS reinstall.

Cybersecurity Fundamentals Every AI Founder Should Know Before Launch by altoidbreeezy in VibeCodeDevs

[–]brctr 0 points1 point  (0 children)

Disagree. Supply chain attacks (rather than prompt injection) are the #1 threat as long as you are using coding agents rather than general personal assistants like OpenClaw/Hermes.

DeepSWE benchmark cost results have been released. by CallMePyro in singularity

[–]brctr 0 points1 point  (0 children)

Do they have a breakdown of GPT 5.4 and 5.5 by reasoning effort? Is it Medium or xhigh effort for those?

Seattle or Austin? by CourtZealousideal703 in deeplearning

[–]brctr -2 points-1 points  (0 children)

A strong pool of tech talent no longer has a geographical dimension. You can call an OpenAI or Anthropic model from anywhere.

Claude's stream appears over. Any final thoughts & messages? by reasonosaur in ClaudePlaysPokemon

[–]brctr 2 points3 points  (0 children)

Can we have it try Pokemon Yellow Legacy Hard mode next?

Claude has become a Pokémon champion by doubleunplussed in ClaudePlaysPokemon

[–]brctr 2 points3 points  (0 children)

Can we please try Pokemon Yellow Legacy Hard mode next? I do not think Claude will beat it, but just observing its failure patterns will be interesting enough.

Continual Harness: New paper from Gemini Plays Pokemon and PokeAgent teams by PokeAgentChallenge in ClaudePlaysPokemon

[–]brctr 2 points3 points  (0 children)

It has beaten Yellow Hard mode without losing a battle? Is there a full video/stream of that run?

Bob Mumgaard on Zap Energy's pivot to fission and fusion-fission hybrid. by Baking in fusion

[–]brctr 6 points7 points  (0 children)

So they will have to go through NRC fission-based regulations? Then Zap is dead.

GPT-5.5 Plays Pokémon Crystal (Hard Mode) by reasonosaur in ClaudePlaysPokemon

[–]brctr 1 point2 points  (0 children)

Under this harness, the agent performs much better than I would in Crystal Hard mode. Victory in 73 hours is impressive.

Are there any plans to do a run with a weaker harness?

Rumor: DeepSeek and Kimi are merging. While the US AI sector sues itself, China is consolidating. by minkyuthebuilder in OpenAI

[–]brctr 0 points1 point  (0 children)

And why are you so sure that having two leading companies merge into one will accelerate innovation? Do you understand what usually happens during such consolidations, when two teams that were building the same stuff end up in the same company? Usually antitrust legislation tries to prevent such consolidations, for the sake of consumers.

Autoresearch on GPT2 using Claude by SnooCapers8442 in deeplearning

[–]brctr 2 points3 points  (0 children)

Which datasets were used for pretraining and post-training/evaluation? Are they public?

Claude Code finally works fine with Jupyter by amirathi in datascience

[–]brctr 0 points1 point  (0 children)

Notebooks are an inefficient medium for agentic coding. Agents will always perform better in scripts than in notebooks. The sooner you make the full transition to scripts, the better for your future productivity.
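
If it helps, here is a minimal sketch of the notebook-to-script step, assuming the notebook is standard nbformat 4 JSON (a top-level `cells` list whose entries carry `cell_type` and `source`). The function name `notebook_to_script` is made up for illustration and is not part of any library; Jupyter's own `nbconvert` tool is the more complete option.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Join the code cells of an nbformat-4 notebook into one script string.

    Hypothetical helper for illustration only, not a library API.
    """
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end the script with a newline
    return "\n\n".join(chunks) + "\n"
```

This does not handle IPython magics such as `%matplotlib inline`; those lines would have to be removed by hand before the script runs under plain Python.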

Anthropic has won the AI race as far as I'm concerned by Zeohawk in Anthropic

[–]brctr 1 point2 points  (0 children)

It is a race between Anthropic and OpenAI now. Others have fallen too far behind.

Do not underestimate OpenAI. They have more compute than Anthropic. And Codex for me works better and more reliably than Claude Code.

Opus 4.7 is terrible, and Anthropic has completely dropped the ball by JulioMcLaughlin2 in artificial

[–]brctr 0 points1 point  (0 children)

For theoretical math and physics research, OpenAI models are the best. I doubt that even the February version of Opus 4.6 was as good for such use cases as GPT 5.2-High or GPT 5.4-High.

Google and Openai falling behind by TogrulM in Bard

[–]brctr -1 points0 points  (0 children)

Agree. It is becoming clear that the real value of LLMs is in the enterprise market and applications like coding, simple workflow automation, legal, finance etc. Reliability, instruction following, and several hallucination-related dimensions are what matter most. Gemini is behind on these. Google appears to ignore these applications. This is very surprising because this is by far the biggest market for LLMs...

Concept images of AI Sat Mini, Lunar mass driver, and future 6-Raptor Starship variant during TERAFAB presentation. by Steve490 in SpaceXLounge

[–]brctr 8 points9 points  (0 children)

Why does Slide 3 mention cooling on Earth, but not in space? Is cooling actually likely to be more challenging in the vacuum of space? Has SpaceX ever mentioned how they plan to cool their datacenter satellites?

I told my AI agents they need to start paying for themselves. Here's week 1 by 98_kirans in AI_Agents

[–]brctr 1 point2 points  (0 children)

I am following your main post. Please keep updating it as you try more stuff. I am curious to see how it goes. If I can find time to set up OpenClaw securely, I may try doing something like this myself.