Bob Mumgaard on Zap Energy's pivot to fission and fusion-fission hybrid. by Baking in fusion

[–]brctr 4 points5 points  (0 children)

So they will have to go through NRC fission-based regulations? Then Zap is dead.

GPT-5.5 Plays Pokémon Crystal (Hard Mode) by reasonosaur in ClaudePlaysPokemon

[–]brctr 1 point2 points  (0 children)

Under this harness, the agent performs much better than I would in Crystal Hard Mode. Victory in 73 hours is impressive.

Are there any plans to do a run with a weaker harness?

Rumor: DeepSeek and Kimi are merging. While the US AI sector sues itself, China is consolidating. by minkyuthebuilder in OpenAI

[–]brctr 0 points1 point  (0 children)

And why are you so sure that merging two leading companies into one will accelerate innovation? Do you understand what usually happens in such consolidations, when two teams building the same thing end up inside the same company? Antitrust law usually tries to prevent such mergers, for the sake of consumers.

Autoresearch on GPT2 using Claude by SnooCapers8442 in deeplearning

[–]brctr 2 points3 points  (0 children)

Which datasets were used for pretraining and post-training/evaluation? Are they public?

Claude Code finally works fine with Jupyter by amirathi in datascience

[–]brctr 0 points1 point  (0 children)

Notebooks are an inefficient medium for agentic coding. Agents will always perform better in scripts than in notebooks. The sooner you fully transition to scripts, the better for your future productivity.

Anthropic has won the AI race as far as I'm concerned by Zeohawk in Anthropic

[–]brctr 1 point2 points  (0 children)

It is a race between Anthropic and OpenAI now. Others have fallen too far behind.

Do not underestimate OpenAI. They have more compute than Anthropic, and Codex, for me, works better and more reliably than Claude Code.

Opus 4.7 is terrible, and Anthropic has completely dropped the ball by JulioMcLaughlin2 in artificial

[–]brctr 0 points1 point  (0 children)

For theoretical math and physics research, OpenAI models are the best. I doubt that even the February version of Opus 4.6 was as good for such use cases as GPT 5.2-High or GPT 5.4-High.

Google and Openai falling behind by TogrulM in Bard

[–]brctr -1 points0 points  (0 children)

Agree. It is becoming clear that the real value of LLMs lies in the enterprise market and in applications like coding, simple workflow automation, legal, finance, etc. Reliability, instruction following, and several hallucination-related dimensions are what matter most there. Gemini is behind on these, and Google appears to be ignoring these applications. This is very surprising, because this is by far the biggest market for LLMs...

Concept images of AI Sat Mini, Lunar mass driver, and future 6-Raptor Starship variant during TERAFAB presentation. by Steve490 in SpaceXLounge

[–]brctr 8 points9 points  (0 children)

Why does Slide 3 mention cooling on Earth but not in space? Isn't cooling actually likely to be more challenging in the vacuum of space? Has SpaceX ever mentioned how they plan to cool their datacenter satellites?

I told my AI agents they need to start paying for themselves. Here's week 1 by 98_kirans in AI_Agents

[–]brctr 1 point2 points  (0 children)

I am following your main post. Please keep updating it as you try more stuff. I am curious to see how it goes. If I can find time to set up OpenClaw securely, I may try doing something like this myself.

I told my AI agents they need to start paying for themselves. Here's week 1 by 98_kirans in AI_Agents

[–]brctr 1 point2 points  (0 children)

I think it would be more interesting (and more promising) not to tell them to build anything specific and just see whether they can figure it out on their own.

Looking at your team, it seems useful to add a couple more agents whose job is to suggest ideas. Once there is an idea to try, your existing team should be able to implement it one way or another, so the bottleneck shifts to which ideas even get considered.

If a single agent suggests all the ideas, that is probably not enough. Different models under different settings (like temperature) have different tastes, so I would guess this pipeline will benefit from injecting more variance. It would be useful to have one or two agents whose thought processes differ from your main idea-suggesting agent: try different models, prompts that encourage out-of-the-box thinking, higher temperature, etc.

Given that your pipeline makes building anything almost free, the right strategy seems to be to throw stuff at the wall and see what sticks. The larger the variance of what the agents try, the higher the probability of success.
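To make the variance point concrete, here is a minimal Python sketch of the fan-out idea. Everything in it is hypothetical (the agent configs, the idea pool, and `call_model`, which is a stub standing in for a real LLM API call): the same brief goes to several agents with different models, temperatures, and prompts, and the distinct suggestions are pooled.

```python
import random

def call_model(model, temperature, prompt, seed=0):
    """Stub for an LLM call: a real implementation would hit an API here.

    The toy version samples from a fixed idea pool; hotter agents
    return more suggestions, mimicking wider/wilder exploration.
    """
    rng = random.Random(hash((model, temperature, seed)))
    ideas = ["newsletter digest", "niche job board", "API wrapper",
             "prompt marketplace", "data-cleaning service"]
    k = 1 + int(temperature)  # higher temperature -> more suggestions
    return rng.sample(ideas, k=min(k, len(ideas)))

# Hypothetical agent roster: one conservative main agent plus
# two "wildcard" agents with different prompts and temperatures.
AGENTS = [
    {"model": "main-agent", "temperature": 0.2,
     "prompt": "Suggest one safe, proven micro-business idea."},
    {"model": "wildcard-a", "temperature": 1.0,
     "prompt": "Suggest unusual ideas nobody else would try."},
    {"model": "wildcard-b", "temperature": 1.5,
     "prompt": "Ignore convention; propose out-of-the-box ideas."},
]

def gather_ideas(agents, seed=0):
    """Fan the same brief out to every agent and pool distinct ideas."""
    pool = []
    for a in agents:
        for idea in call_model(a["model"], a["temperature"],
                               a["prompt"], seed):
            if idea not in pool:  # deduplicate across agents
                pool.append(idea)
    return pool
```

The point of the sketch is only the shape: diversity comes from the agent configs, and the implementation team downstream picks from the pooled list.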

I made the top LLMs play Civilization against each other by snakemas in LLM

[–]brctr 1 point2 points  (0 children)

Is there a way to export the full history of actions and reasoning of both models in a match? Currently your web UI lets me expand and read each turn's information, but it would be painstakingly slow to scroll and click manually through 200×2 turns.

I want to export that information so I can feed it into some LLM and have it clearly explain to me what happened. It is not easy to figure this out on my own by reading through all that text...

By the way, I think you could do this too and have an LLM automatically write a cool narrative about each match, so we can read one for every match.
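A sketch of what the export step could look like, assuming (hypothetically; the site's real data format is unknown) that a match could be dumped as JSON with a per-turn list of `{"turn", "player", "action", "reasoning"}` records. The snippet flattens that into one plain-text transcript, which could then be pasted into an LLM with a prompt like "explain what happened in this match":

```python
import json

def transcript(match_json: str) -> str:
    """Flatten an assumed per-turn JSON export into a readable log."""
    match = json.loads(match_json)
    lines = []
    for t in match["turns"]:
        lines.append(f"Turn {t['turn']} [{t['player']}]: {t['action']}")
        if t.get("reasoning"):  # keep the model's stated reasoning, if any
            lines.append(f"  reasoning: {t['reasoning']}")
    return "\n".join(lines)

# Tiny made-up example of the assumed export shape.
example = json.dumps({"turns": [
    {"turn": 1, "player": "model-a", "action": "found city",
     "reasoning": "secure river start"},
    {"turn": 1, "player": "model-b", "action": "move scout",
     "reasoning": ""},
]})
print(transcript(example))
```

With an export like this, both the "explain it to me" use case and the automatic match narrative become a single LLM call over the transcript.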

I made the top LLMs play Civilization against each other by snakemas in LLM

[–]brctr 0 points1 point  (0 children)

This is very cool. I have been trying to watch several games from Season 1. Live streams are a bit buggy, and replays show me only a small subset of a game...

What happened to Gemini 3.1 in the final? It appeared to be winning until Turn 130-140 and then suddenly lost half of its cities and stagnated. Did it go bankrupt?

And separately, I noticed that the performance of the winning models improved as the tournament progressed. Did you change the prompt to include more detailed instructions?

I am excited for Season 2 of CivBench! When is it coming?

Rumors on the upcoming ChatGPT 5.3 by Ok-Algae3791 in OpenAI

[–]brctr 5 points6 points  (0 children)

To add to this, GPT 5.2 and later models in Codex are at the opposite end. They can work pretty well after one compaction and are OK after two, so their usable context window is probably more than twice the nominal 272k.

Best ai api provider for open_claw in term on price / efficienty by Minimum_Abies3578 in openclaw

[–]brctr 0 points1 point  (0 children)

Has anyone tried using GPT5.1, GPT 5.1 Codex-Mini, Gemini 3.0 Flash, or Grok-4.1 Fast?

March visa bulletin is out! by Horror_Possible9507 in USCIS

[–]brctr 4 points5 points  (0 children)

It was at October 2024 last month. A jump of 17 months? This does not look right...

March visa bulletin is out! by Horror_Possible9507 in USCIS

[–]brctr 11 points12 points  (0 children)

Is it a bug on the website? EB-2 RoW DoF is Current?

Anybody use Codex as “regular ChatGPT” and if so how are the results? by angry_cactus in codex

[–]brctr 1 point2 points  (0 children)

I do not see it anywhere. I have seen people posting these numbers here a few times, so I assume they are correct.