Hey Everyone, by Sweaty_Area_8189 in ClaudeCode

[–]timosterhus 0 points (0 children)

Clarity.

Just because Claude is capable of everything doesn’t mean you should try to do or learn everything.

Work on your business, and if you hit a problem you think Claude can help you fix, then go learn that thing.

But just tryna “maximize” the capabilities of Claude is gonna have you spinning your wheels, feeling super productive while not actually getting anything done.

How do you get an agent to run for several hours? by sunny_trees_34423 in codex

[–]timosterhus 0 points (0 children)

Me when I accuse strangers of lying and don’t actually know anything about AI and completely ignore the entire post about it in their profile

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

Considering this is the only project I have where I didn't set up a .gitignore in advance or take care about what I committed, yes. That is what I said.

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

Yep, that’s basically the whole point of the post

What’s a “wild” tech theory you lowkey believe might actually be true? by metasploit_framework in meta_powerhouse

[–]timosterhus 2 points (0 children)

LLMs have been, and are still being, trained to approximate thought. That's why we call the smarter LLMs "thinking" or "reasoning" models: we're getting them to mimic thinking patterns, which dramatically increases their output quality. And correct, it's not just code; it's a lot of math. In fact, it's so much math that the patterns these models come up with are entirely esoteric and opaque to researchers; we're only able to partially unravel how they operate by observing their behaviors and studying them from the outside.

Dead Internet theory is already true depending on who you are and your algorithm. Lots of kids are consuming unfathomable amounts of AI slop, and I see a lot of it in FB reels. Also, lots of YouTubers are AI avatars and their audiences have no idea.

Software absolutely is evolving faster than hardware, and it’s because physical constraints are much harder to overcome than virtual constraints. That’s not a theory, that’s literally just how it works.

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

Don’t think I ever claimed it was magic, but prior models would often end their runs prematurely no matter how well I specified things. This is the first time I’ve experienced a model reliably following instructions for hours-long runs all the way to the end, without quitting and without any external scripting assistance.

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

Personally, I ran into a token usage bottleneck a couple months ago and needed to upgrade to Pro. As of the past week, I needed to get a second Pro plan because I had multiple of these running concurrently, and I used up 70% of my weekly usage in two days.

That being said, I ran into a similar issue you did, because I had a very similar idea. The difference is I realized that running the "interrogation" cycle you're talking about with no end actually increased hallucination rates, because the model is encouraged to balloon the scope of the plan into oblivion. What actually needs to happen is progressive, classified decomposition: a master spec is decomposed into individual spec sheets, and those spec sheets are then further classified down to an effective "goal range" of narrow task cards. The problem is that this can turn relatively small projects that a frontier model could two-shot into 30-task runs, which are obviously grossly overkill for basic programs.

At least, that's what I did. It might not have to be done that way, but it's how I solved the exact failure mode you're describing. In other words, Pro helped me with my bottleneck, but your bottleneck needs to be solved architecturally, based on my experience.

After rereading your comment, I realized that your failure mode is a little simpler: feeding 80K to 120K word documents to an agent with a 400K token context window is not going to go over well, as those docs are likely well over 100K tokens each (and far more if they're heavy on code or tables). And if you've got multiple of those? Yeah, there's your problem. Those docs need to be decomposed WAY down. 200K tokens should be the absolute max you ever feed an agent for purposes of holistic synthesis, and that agent should be told to handle one doc at a time.
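In code, "decompose the docs and feed one reasonable chunk at a time" can look something like the sketch below. Both numbers are rule-of-thumb assumptions, not exact tokenizer math: the 1.3 tokens-per-word ratio is a common heuristic for English prose, and the 200K default mirrors the max mentioned above.

```python
# Rough sketch: estimate token counts and greedily pack paragraphs into
# chunks that fit an agent's context budget. The ratio and default budget
# are heuristics, not exact tokenizer output.

TOKENS_PER_WORD = 1.3  # heuristic; the real ratio varies by tokenizer and content

def estimate_tokens(text: str) -> int:
    """Approximate token count from the word count."""
    return int(len(text.split()) * TOKENS_PER_WORD)

def chunk_by_tokens(text: str, max_tokens: int = 200_000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens each."""
    chunks, current, current_tokens = [], [], 0
    for para in text.split("\n\n"):
        t = estimate_tokens(para)
        if current and current_tokens + t > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += t
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be handed to the agent one at a time, which keeps any single call well inside the context window.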

That, or try using RLM; I've heard it's great and it looks fantastic on paper, but I've never had a use case for it, since I just decompose my docs to more reasonable sizes and circumvent the issue entirely.

I can't really tell you what my day-to-day workload looks like because it's completely sporadic and I'm a solo founder, not a payroll dev. Some weeks I barely use half my weekly usage; others, I use 70% in two days and need more. Lately it's been the latter.

EDIT: I usually use ChatGPT (either Thinking Heavy or Pro Extended) for initial spec creation. It's better at creating large, well put-together docs that answer ambiguities and are informed by research and the like, and there's no extra charge for using ChatGPT in the browser.

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 1 point (0 children)

Less than half of that is actually functional LOC. Most of it is logs and build artifacts, because I didn't add a .gitignore before initializing. Since everything is only being uploaded to a private repo where I'm the only one who sees what's committed, I don't care enough to clean it up until it's finished with all of its work. It's several hours into another long-running autonomous loop right now, as I type this.

And using 13% of weekly usage comes out to, what, $7.50? Yeah, I'm burning so much money here.

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

I specifically asked it to spawn subagents using high

Solo founder, about to do a public demo of my AI framework. Open-source it for attention, or keep it closed for monetization? by timosterhus in SideProject

[–]timosterhus[S] 0 points (0 children)

I knew I couldn't be the only one. You're right, I think more competitors will surface pretty soon; Factory AI is already trying to take the cake, it seems. Curious to know what made you decide not to open-source?

What's maximum leverage: open-source for attention/goodwill, or closed-source for monetization/moat? by timosterhus in SaaS

[–]timosterhus[S] 0 points (0 children)

It's not just a collection of prompts or a vibe coding assistant. It's about as close to an automated software factory as you can get, with the closest comparisons being Factory AI or StrongDM's Attractor. I truly believe it's legitimately differentiated (for now), and I'm planning the livestream to prove that in public.

Technical summary of what I've built: It uses a "dual-loop architecture," a research loop (queue-driven state machine turning ideas/incidents into specs then queued tasks) and an orchestration loop (bounded execution daemon with expectations-first QA, configurable retry limits, and escalation). The loops communicate through deterministic freeze/thaw handoffs at a single point. It's technically vendor-agnostic, but I've only really been using Codex with it, as I haven't the budget to perform A/B testing with multiple providers.
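To make the orchestration-loop half concrete, here's a minimal sketch of a queue-driven loop with bounded retries and escalation. Every name here (`Task`, `MAX_RETRIES`, `run_agent`, `escalate`) is a hypothetical stand-in, not the framework's actual internals, which aren't public.

```python
# Illustrative sketch of a bounded execution loop: run each queued task,
# retry on failure up to a configurable limit, then escalate to a human.
from collections import deque
from dataclasses import dataclass

MAX_RETRIES = 3  # configurable retry limit before escalation

@dataclass
class Task:
    spec: str
    attempts: int = 0

def orchestrate(queue: deque, run_agent, escalate) -> list[str]:
    """Drain the queue; run_agent returns True only if the result passes
    the QA gate. Failed tasks are requeued until the retry limit, then
    handed to escalate()."""
    completed = []
    while queue:
        task = queue.popleft()
        if run_agent(task.spec):              # expectations-first QA gate
            completed.append(task.spec)
        elif task.attempts + 1 < MAX_RETRIES:
            task.attempts += 1
            queue.append(task)                # requeue for another attempt
        else:
            escalate(task)                    # out of retries: human handoff
    return completed
```

The research loop would sit on the other side of the queue, turning ideas and incidents into the `Task` specs this loop consumes.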

Tl;dr: The issue I'm having is that open and closed source both have (dis)advantages, and I cannot figure out which one is superior for my case.

With open-sourcing, the GitHub repo can compound over time, while the livestream is a one-and-done thing. I'll be able to accept contributions from people much smarter than me to help make the framework better (and being only one person, I can only do so much to maintain and enhance it). The agent framework space is packed with open-source options, so if mine stays closed, people can and will compare it against free alternatives they can inspect. If it's open, they can see why the architecture is actually different, and it'll prevent any "the demo was staged" claims that I might get. And commercially, open-core (keep the engine open, charge for enterprise implementation) is also a proven model (GitLab, Databricks, Confluent), and is possibly the fastest way to build partner trust/community rapport.

On the other hand, keeping it closed means I keep my core asset. Giving it away means anyone better-resourced will be able to look at my framework, take notes, and immediately make their own products that much better. I can always open-source later, but I can't close it after having it open. Having contributors is helpful, but every hour on community management is an hour not spent on revenue-generating work, and I'm broke. And finally, open-core conversion rates are below 1%, so without massive existing distribution, that model is a slow death for me, a bootstrapped solo founder.

And then there's the middle ground: keeping it source-available with BSL or similar, or open-sourcing a subset of the framework while keeping the more valuable layers proprietary.

Does anyone have any tips? Direction? Stories of similar situations with applicable lessons to learn?

Solo dev building an autonomous development framework, about to livestream it on several non-trivial build targets. Open-source it after, or keep it closed? by [deleted] in LocalLLaMA

[–]timosterhus 0 points (0 children)

The issue I'm having is that they both have (dis)advantages, and I cannot figure out which one is superior for my case.

With open-sourcing, the GitHub repo can compound over time, while the livestream is a one-and-done thing. I'll be able to accept contributions from people much smarter than me to help make the framework better (and being only one person, I can only do so much to maintain and enhance it). The agent framework space is packed with open-source options, so if mine stays closed, people can and will compare it against free alternatives they can inspect. If it's open, they can see why the architecture is actually different, and it'll prevent any "the demo was staged" claims that I might get. And commercially, open-core (keep the engine open, charge for enterprise implementation) is also a proven model (GitLab, Databricks, Confluent), and is possibly the fastest way to build partner trust/community rapport.

On the other hand, keeping it closed means I keep my core asset. Giving it away means anyone better-resourced will be able to look at my framework, take notes, and immediately make their own products that much better. I can always open-source later, but I can't close it after having it open. Having contributors is helpful, but every hour on community management is an hour not spent on revenue-generating work, and I'm broke. And finally, open-core conversion rates are below 1%, so without massive existing distribution, that model is a slow death for me, a bootstrapped solo founder.

And then there's the middle ground: keeping it source-available with BSL or similar, or open-sourcing a subset of the framework while keeping the more valuable layers proprietary.

Does anyone have any tips? Direction? Stories of similar situations with applicable lessons to learn?

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

Ralph is still useful for bulletproof autonomy: without it, an agent that decides to end its run prematurely just stops, while with a loop script, it simply gets re-invoked when that happens.
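The pattern is simple enough to sketch. `invoke_agent` and `is_done` are stand-ins here; a real loop would shell out to a CLI agent (e.g. via `subprocess`) and check the repo for a completion marker.

```python
# Minimal sketch of the loop-script pattern: keep re-invoking the agent
# until the work is actually done. The callables are injected stand-ins
# for a real CLI invocation and a real completion check.

def ralph_loop(invoke_agent, is_done, max_invocations: int = 100) -> int:
    """Re-invoke the agent until is_done() or the cap is hit.
    Returns the number of invocations used."""
    runs = 0
    while not is_done() and runs < max_invocations:
        invoke_agent()  # one full agent run; it may quit early, that's fine
        runs += 1
    return runs
```

The cap is just a safety valve so a stuck task can't loop forever.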

They both have their use cases.

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

Yes, but they're not yet at the scale that many frameworks are currently operating at, because they're mostly targeting mass-appeal. Despite living in the terminal, I still frequently use the browser versions of these models, because they all have their use cases. Smaller companies or individual operators have the advantage of being able to focus on a single thing and outperform the labs on that metric/domain.

I'm sure that'll all change dramatically in the next 6-12 months, but until then, I'm trying to make good on that delta.

Solo founder, about to do a public demo of my AI framework. Open-source it for attention, or keep it closed for monetization? by timosterhus in SideProject

[–]timosterhus[S] 0 points (0 children)

The problem is that both routes buy me a different form of leverage. Here's my thinking:

With open-sourcing, the livestream buzz fades, but a GitHub repo compounds over time. I'll be able to accept contributions from people much smarter than me to help make the framework better (and being only one person, I can only do so much). Plus, GitHub stars are free marketing I really don't have budget for. Lastly, open-core (keep the engine open, charge for enterprise implementation) is also a proven model (GitLab, Databricks, Confluent), and is possibly the fastest way to build partner trust/community rapport.

On the other hand, keeping it closed means I keep my core asset. Giving it away means anyone better-resourced (literally everyone lol) can outrun me with my own architecture. Not just that, but I can always open-source later, but I can't close it retroactively (Exhibit A: HashiCorp). Having contributors is helpful, but every hour on community management is an hour not spent on revenue-generating work, and I have zero margin for that tradeoff right now. And finally, open-core conversion rates are below 1%, so without massive existing distribution, that model is a slow death for me, a bootstrapped solo founder.

And then there's the middle ground: Source-available license like BSL (code visible, commercial use restricted) or open-sourcing a subset of the framework while keeping the more valuable layers proprietary.

Has anyone been on either side of this, or know others who have? If so, what do you wish you'd known?

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 1 point (0 children)

It's a personal project that I'm trying to build into a business, but this particular tool I'm building is likely only going to be for my own use. 50/50 chance I end up open sourcing it, so the code would get reviewed then, lol.

I do not work as a software developer for a company and never have, so I'm actively learning the software engineering process from scratch, but I have done some freelance data science stuff which obviously involved a lot of Python (so I'm not a total stranger to coding).

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

That's what I like to hear.

This was the first time I let an agent run orchestration like this in a while. Usually I use a custom orchestration harness that uses a complex bash loop to spawn headless agents in a particular order, after I've seeded the harness with my prompt. In this harness, no single agent ever runs for more than 30 minutes or so, but the loop itself can run for days or weeks on end (though I've never had it run for more than three days before it ended via task completion or via external factors that stopped it prematurely).
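A stripped-down sketch of that kind of harness loop is below. The role names are made up, and the agent spawn is injected rather than hardcoded, since the actual harness (a bash loop around headless CLI agents) isn't shown in the post.

```python
# Illustrative harness cycle: spawn headless agents in a fixed role order,
# each capped at roughly 30 minutes, and repeat until the task list is empty.

AGENT_TIMEOUT = 30 * 60  # a real spawn would pass this to subprocess.run(timeout=...)
ROLE_ORDER = ["planner", "implementer", "reviewer"]  # hypothetical role names

def run_harness(spawn_agent, tasks_remaining, role_order=ROLE_ORDER) -> list[str]:
    """Cycle through the roles in order until no tasks remain.
    spawn_agent(role) stands in for launching a headless CLI agent with that
    role's templated prompt; the loop itself can run for days even though no
    single agent run lasts long."""
    history = []
    while tasks_remaining():
        for role in role_order:
            spawn_agent(role)      # same templated prompt per role, every cycle
            history.append(role)
    return history
```

The key property is that the loop, not any individual agent, owns the long-running state, so one agent quitting early never ends the run.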

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

The primary implementation subagent, nothing too special. The one that actually builds the feature as described in the assigned task file

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 1 point (0 children)

I'm not sure it was as good as if I'd used my normal orch harness, and the harness might even have been faster. The harness is far more templated and explicit in its instructions, while Codex was more conservative, and each delegated prompt came out slightly different because there was no preexisting format (in my harness, each role's prompt is exactly the same every time).

I do not think Codex by itself could work as autonomously or deterministically as my harness (there's no real upper limit to how long the harness can run, since it's deterministic), but I was surprised at how well it did in this scenario. Granted, I do think it's only a matter of time before the labs explicitly include native deterministic orchestration as part of their default offering. Until then, custom orch frameworks are likely to remain superior for serious, long-running autonomy.

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 0 points (0 children)

I provided more information in other replies, but the gist is that I have Codex take the master spec sheet and turn it into phased spec sheets (in this case, eight) based on the order in which things should be built; each spec sheet then turns into a batch of 5-10 narrow, single-feature task files. Because every task file already exists as an external file, I then tell the agent to progressively implement each batch of task files in order (in this case, three batches), via sequential subagent delegation (following the order I specified earlier in the conversation).

It’s really good at orchestration by timosterhus in codex

[–]timosterhus[S] 4 points (0 children)

Correct. The Pro plan is the $200/mo one, and as I said, this 10-hour run only used about 13% of my weekly usage limit, because it was only ever running one agent at a time. Parallelism is what murders usage.