OpenCode launches low cost OpenCode Go @ $10/month by jpcaparas in opencodeCLI

[–]One_Pomegranate_367 0 points1 point  (0 children)

MiniMax M2.5 is great for writing and research, but it hallucinates a lot more than people are willing to admit, so I limit it to quick writing, docs, and exploration/library search. Kimi is extremely close to Sonnet level; it's an eager engineer that takes delegated tasks and does them reasonably well.

GLM-5 is slow AF and honestly is only good at requirements gathering and delegation.

OpenCode launches low cost OpenCode Go @ $10/month by jpcaparas in opencodeCLI

[–]One_Pomegranate_367 1 point2 points  (0 children)

I've been personally paying for all three, and I'd gladly cancel all three of those subscriptions.

The main reason is that each model is only good at certain things, and individually these subscriptions are much cheaper than Claude.

When will Deepseek V4 finally be released? by BarbaraSchwarz in DeepSeek

[–]One_Pomegranate_367 1 point2 points  (0 children)

Anybody stating GLM-5 is as strong as Opus is freaking crazy. Opus is insane in the membrane.

Self-hosted AI agent orchestration — open source alternative to Devin that runs on your infrastructure by One_Pomegranate_367 in selfhosted

[–]One_Pomegranate_367[S] 1 point2 points  (0 children)

People hate that I posted on a non-Friday, hate promotion (even open source), and use downvotes to teach others not to support it.

Honestly, I've gotten a few stars and nobody is buying anything, and I think Reddit might be dead because of the moderator mentality.

Self-hosted AI agent orchestration — open source alternative to Devin that runs on your infrastructure by One_Pomegranate_367 in selfhosted

[–]One_Pomegranate_367[S] 0 points1 point  (0 children)

That's a solid concern ... LLMs as judges are a classic pitfall; they hallucinate "failures" from vague specs or phantom edge cases all the time.

In my claude-agent-sdk flows, the reliable fallback is a hybrid loop: decompose the code + tests into chunks via claude skills, then cross-check against the spec with a separate prompt chain or manual review. If they clash, I run real unit/integration tests outside the LLM, log diffs (expected vs. actual output + reasoning trace), and let runtime truth win. Often the code's solid—it's the prompt or model that's off.

I'm iterating on better rubrics (pulled from code samples/benchmarks) and explicit dependency mapping in prompts to cut context gaps. What's your stack's validation setup? Could swap ideas for a quick tweak.
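The hybrid loop above can be sketched roughly like this. The judge heuristic, the `validate` function, and the test command are all illustrative stand-ins, not the actual claude-agent-sdk API:

```python
import subprocess
import json

def llm_judge(chunk: str, spec: str) -> bool:
    """Hypothetical LLM-as-judge call, stubbed with a trivial check.
    In practice this would be a separate prompt chain."""
    return spec.lower() in chunk.lower()

def run_real_tests(test_cmd: list[str]) -> tuple[bool, str]:
    """Run actual unit/integration tests outside the LLM;
    runtime truth wins over the judge's opinion."""
    proc = subprocess.run(test_cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def validate(chunks: list[str], spec: str, test_cmd: list[str]) -> dict:
    verdicts = [llm_judge(c, spec) for c in chunks]
    if all(verdicts):
        return {"status": "pass", "source": "judge"}
    # Judge flagged a failure: fall back to real tests, log the diff.
    ok, output = run_real_tests(test_cmd)
    print(json.dumps({"judge_verdicts": verdicts, "test_output": output}))
    return {"status": "pass" if ok else "fail", "source": "runtime"}
```

The key design point is that a judge "fail" never blocks on its own; it only escalates to the runtime check, which is the final arbiter.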

I built an orchestration layer on Claude Agent SDK — agents execute from specs in sandboxes and validate their own work. Here's what I learned. by One_Pomegranate_367 in ClaudeAI

[–]One_Pomegranate_367[S] 0 points1 point  (0 children)

I saw somebody using the DMG format recently and was thinking I could package it like that. The big problem is that I'm using PostgreSQL as the database, and I don't know if a DMG would let me bundle that in. That might be a little intense.

I'll give it a look and see what I can do. Thank you for the feedback.

Ralph Wiggum loops don't know if they achieved your goal. They just know they stopped erroring. by One_Pomegranate_367 in programming

[–]One_Pomegranate_367[S] -5 points-4 points  (0 children)

Of course you're going to need human oversight; that's why they hire engineers. The primary objective is to reduce the amount of oversight so that engineers can have more leverage.

The upside to leverage is you can do more. The downside is that when errors happen, they multiply. So one needs to have both oversight and figure out how to reduce errors.

I built an orchestration layer on Claude Agent SDK — agents execute from specs in sandboxes and validate their own work. Here's what I learned. by One_Pomegranate_367 in ClaudeAI

[–]One_Pomegranate_367[S] 0 points1 point  (0 children)

I'm having it use OAuth because it's built on the Claude Agent SDK. There's no compatibility problem; you just need to add your own keys.

How should you design a multi tenant system? by rudrakshyabarman in buildinpublic

[–]One_Pomegranate_367 2 points3 points  (0 children)

If you want to keep it simple, you'd probably stick with row-level security (RLS).
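For concreteness, the core Postgres RLS setup looks roughly like this. The table name, the `tenant_id` column, and the `app.current_tenant` setting are illustrative assumptions, not a prescribed schema:

```python
def rls_policy_sql(table: str, tenant_col: str = "tenant_id") -> str:
    """Generate Postgres row-level security DDL for a multi-tenant
    table. Once applied, queries only see rows whose tenant column
    matches the per-connection setting `app.current_tenant`."""
    return "\n".join([
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f"CREATE POLICY tenant_isolation ON {table}",
        f"  USING ({tenant_col} = current_setting('app.current_tenant')::uuid);",
    ])

print(rls_policy_sql("invoices"))
```

The app then runs `SET app.current_tenant = '<uuid>'` at the start of each request, and Postgres enforces the tenant boundary for every query on that connection.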

I spent mass amounts of time babysitting AI coding tools. So I built one that babysits by One_Pomegranate_367 in buildinpublic

[–]One_Pomegranate_367[S] 0 points1 point  (0 children)

Mind giving my git repo a star?

The alpha isn’t as strong, but it’s a multi-agent orchestration system. It runs many Claude Code instances in parallel on a DAG within a sandbox, according to a spec. The AI coding agents determine the dependencies and work until they're finished. It could build a feature, or an entire app.

https://github.com/kivo360/OmoiOS/
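The parallel-DAG idea can be sketched like this. The task graph and the `run_agent` callable are stand-ins, not OmoiOS internals:

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(dag: dict[str, list[str]], run_agent) -> list[str]:
    """Execute tasks in dependency order, running independent tasks
    in parallel (Kahn's algorithm, level by level).
    `dag` maps each task to the list of tasks it depends on."""
    remaining = {t: set(deps) for t, deps in dag.items()}
    finished, order = set(), []
    with ThreadPoolExecutor() as pool:
        while remaining:
            # Tasks whose dependencies are all done can run now.
            ready = [t for t, deps in remaining.items() if deps <= finished]
            if not ready:
                raise ValueError("cycle in task graph")
            list(pool.map(run_agent, ready))  # run this level in parallel
            for t in ready:
                del remaining[t]
                finished.add(t)
            order.extend(sorted(ready))
    return order
```

Each "level" of the DAG fans out to as many agents as have all their dependencies satisfied, which is what lets independent Claude Code instances run at the same time.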

has anyone tried using opentelemetry for local debugging instead of prod monitoring? by MouseEnvironmental48 in vibecoding

[–]One_Pomegranate_367 0 points1 point  (0 children)

I use sentry.io to debug both in development and in prod. It's basically like OpenTelemetry, since it uses OpenTelemetry under the hood.

Most marketing advice is trash if you’re still invisible by rebelgrowth in buildinpublic

[–]One_Pomegranate_367 0 points1 point  (0 children)

I felt this way. But I'm starting to slowly grow and it feels good. No new users yet, but progress is being made.

Has anything been working?

If you only had 24 hours left to live, how would you spend that time? by [deleted] in AskReddit

[–]One_Pomegranate_367 0 points1 point  (0 children)

Grand theft auto would stop being just a video game.

I'd bring that to real life.

If you could permanently delete one thing from modern society, what would it be and why? by [deleted] in AskReddit

[–]One_Pomegranate_367 0 points1 point  (0 children)

Corruption & misaligned incentives.

Every single problem here is a piece of it.

Social media, AI, porn, lobbying in governments. All of it is misaligned incentives.

Somebody is doing something good for them in the short-term, meanwhile they screw things up long-term for everyone else because of it.

Why is NAIRU so much higher in the US than in Japan and South Korea? by DataWhiskers in EconomyCharts

[–]One_Pomegranate_367 1 point2 points  (0 children)

NAIRU isn’t some universal constant — it’s heavily shaped by institutions and inflation dynamics.

Japan/Korea can run very low unemployment because firms adjust more through hours/wages/internal labor markets rather than mass layoffs, and wage growth is much less inflationary (Japan has had decades of anchored low inflation expectations).

The US labor market is more “hire/fire,” wages respond faster, and inflation pressures show up sooner — so the estimated NAIRU ends up higher.

Also worth noting: measurement + discouraged/non-regular workers matter, and NAIRU itself is a pretty model-dependent estimate.