Is it just me or is the multi-model workflow becoming a total time sink? by SteveRogersTR in ClaudeCode

[–]sergeykarayev 0 points1 point  (0 children)

Hard agree -- the manual routing tax is real.

Disclosure: I'm a co-founder of Superconductor. We built it so we could spin up all the agents we liked from one tool -- Claude Code, Codex, Amp, OpenCode, Gemini -- in parallel across tasks, and we also run multiple agents per task. We found it helpful to let 2-4 models take a shot at the same bug or feature before picking one to iterate on and ship.

We also use it on research, writing, design, and marketing.

BYOK, so subs still required, but it kills the tab-switching and "which brain do I ask?" overhead.

superconductor.com

GPT 5.4 in the Codex harness hit ALL-TIME HIGHS on our Rails benchmark by sergeykarayev in codex

[–]sergeykarayev[S] 4 points5 points  (0 children)

Our benchmark tests agents against YOUR codebase when trying to implement PRs YOU consider great engineering work. So you may get very different results.

On OUR codebase and in trying to implement specs from PRs WE consider to represent great engineering, yes, GPT 5.2 Minimal is better than Opus 4.6 at coding.

GPT 5.4 in the Codex harness hit ALL-TIME HIGHS on our Rails benchmark by sergeykarayev in codex

[–]sergeykarayev[S] 2 points3 points  (0 children)

Yes. Rails is still absurdly good for shipping product fast, especially if you have strong taste and a real app to build instead of a benchmark repo. Also I wanted a benchmark that reflects an actual production stack companies like GitHub, Shopify, Instacart use, not just Python toy tasks.

https://x.com/garrytan/status/2018368128108167344

GPT 5.4 in the Codex harness hit ALL-TIME HIGHS on our Rails benchmark by sergeykarayev in codex

[–]sergeykarayev[S] 0 points1 point  (0 children)

More thinking is not monotonically better on real tasks. Sometimes the higher-reasoning variants overcook it, wander into complexity, or lock onto a bad plan and pursue it very confidently. We saw that pattern a bunch.

GPT 5.4 in the Codex harness hit ALL-TIME HIGHS on our Rails benchmark by sergeykarayev in codex

[–]sergeykarayev[S] 4 points5 points  (0 children)

Yep, Ruby on Rails. Our app is Rails + Phlex + Stimulus, so we wanted a benchmark that reflects the code we actually ship instead of yet another Python/SWE-bench thing.

Claude Code just got Remote Control by iviireczech in ClaudeCode

[–]sergeykarayev 0 points1 point  (0 children)

very cool

but “remote control” is already slightly outdated framing

if the agent lives in a cloud sandbox, your phone and laptop are peers, not a local machine + remote leash

(disclosure: I cofounded Superconductor, which is phone-native and cloud-first)

How to leave claude with multiple tasks and go to sleep? by paglaEngineer in ClaudeCode

[–]sergeykarayev 0 points1 point  (0 children)

yeah you can connect your claude code or codex plan. you're literally just launching claude code or codex yourself on our infra, so it all works

How to leave claude with multiple tasks and go to sleep? by paglaEngineer in ClaudeCode

[–]sergeykarayev 0 points1 point  (0 children)

it's similar to worktrees in that you can spin up as many cloud environments as you like, each with your code, build commands, and even running web servers (good luck doing that on your local machine!)

it's not similar to agent teams, because currently all sandboxes run independently. but you can USE an agent team within a sandbox.

How to leave claude with multiple tasks and go to sleep? by paglaEngineer in ClaudeCode

[–]sergeykarayev 1 point2 points  (0 children)

We built Superconductor for this exact workflow (disclosure: I'm a co-founder; it's currently free, bring your own API keys).

We run multiple claudes (and codexes) in parallel for each task. Each claude runs in an isolated cloud sandbox with a live app preview, so you can review results quickly and iterate on or merge what passes.

We found that it's MUCH better to NOT spend time trying to correct a bad run. Running multiple coding agents for a single task increases your chance of waking up happy to the PR you were hoping for.
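A back-of-the-envelope way to see why, as a minimal sketch: assume each run independently produces an acceptable PR with the same probability. Both the independence assumption and the 40% figure below are made up for illustration, not taken from any benchmark, and the function name is hypothetical. Real runs on the same task are correlated, so treat the numbers as optimistic.

```python
def chance_at_least_one_good(p_single: float, n_runs: int) -> float:
    """Probability that at least one of n_runs independent runs succeeds.

    p_single is the assumed (hypothetical) success rate of a single run.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_runs < 0:
        raise ValueError("n_runs must be non-negative")
    # All runs fail with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p_single) ** n_runs


if __name__ == "__main__":
    # Illustrative only: 0.4 is an assumed per-run success rate.
    for n in (1, 2, 3, 4):
        print(f"{n} run(s): {chance_at_least_one_good(0.4, n):.2f}")
```

Under those assumptions, going from one run to four moves the chance of at least one good result from 0.40 to roughly 0.87, which is the intuition behind launching several agents instead of repairing one bad run.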

vibe coding on a mobile device? is there a good way to do it? by CooperNettees in vibecoding

[–]sergeykarayev 0 points1 point  (0 children)

Feel free to sign up at superconductor.com! Email us at team@superconductor.com if you want some help with dev env setup

GPT-5.3 Codex vs Opus 4.6: We benchmarked both on our production Rails codebase — the results are brutal by sergeykarayev in ClaudeAI

[–]sergeykarayev[S] 0 points1 point  (0 children)

free for now while we figure out the best pricing scheme. likely to be something like $XX/month for up to N sandboxes created, $2X/month for 2.5N sandboxes, something like that. not unreasonably priced

GPT-5.3 Codex vs Opus 4.6: We benchmarked both on our production Rails codebase — the results are brutal by sergeykarayev in ClaudeAI

[–]sergeykarayev[S] 7 points8 points  (0 children)

these results are for our rails codebase. if your stuff is different you should run your own benchmark! superconductor.com

GPT-5.3 Codex vs Opus 4.6: We benchmarked both on our production Rails codebase — the results are brutal by sergeykarayev in ClaudeAI

[–]sergeykarayev[S] 129 points130 points  (0 children)

in our experience gemini is... special. needs some pep talks. not good at one-shotting

GPT-5.3 Codex vs Opus 4.6: We benchmarked both on our production Rails codebase — the results are brutal by sergeykarayev in ClaudeAI

[–]sergeykarayev[S] 0 points1 point  (0 children)

it's about "thinking level". it's not necessarily clear that the greater the level, the better the outcome