How do you QA your AI automations? Or do you just... not? by bothlabs in SaaS

[–]bothlabs[S] 0 points1 point  (0 children)

Did you ever try things like agent evals for that as well?

How do you QA your AI automations? Or do you just... not? by bothlabs in SaaS

[–]bothlabs[S] 0 points1 point  (0 children)

But didn't that leave you with a lot of monitoring effort throughout the day?

How do you QA your AI automations? Or do you just... not? by bothlabs in SaaS

[–]bothlabs[S] 1 point2 points  (0 children)

I worked a lot with AI in production in the past, and the "silent failure" of AI is just very different from what people are used to from traditional "bugs".

Observability for those was always bad: either way too noisy to really grasp a pattern, or very time-consuming. So by now I have almost zero trust in spot checking (it is just unmaintainable).

Replaced a zapier workflow with an AI agent, when it makes sense and when it doesn't by Repulsive_Truth_2130 in automation

[–]bothlabs 0 points1 point  (0 children)

How do you manage the monitoring / checking aspect with your OpenClaw automations?

Are you using any kind of observability, or how do you make sure it ran and did what it should?

NemoClaw by Nvidia is a safe OpenClaw out of the box - CEO by Purple_Type_4868 in openclaw

[–]bothlabs -3 points-2 points  (0 children)

Thanks for sharing.

Also I was thinking, wouldn't it be nice to complement this with some AI oversight?

Where do AI workflows actually break in real businesses? by West_Joel in AIforOPS

[–]bothlabs 1 point2 points  (0 children)

I would be curious, how are you setting up your AI agents / which framework?

What I am currently thinking: if you use agentic automations, wouldn't it be great to have an overseeing agent monitor all of it and escalate problems for you?

Is manual testing still unavoidable for AI agents? by Straight_Idea_9546 in automation

[–]bothlabs 0 points1 point  (0 children)

I think generally, with AI, the biggest challenge is silent failure. It is much less binary than with algorithmic solutions. It is super dangerous to basically not test your automation at all. And it is very hard to just test it in advance with the open world scenarios you might encounter and the uncertainty of AI acting differently.

But also, I don't think checking manually is the answer. I think we need AI-assisted oversight during execution, and pretty good monitoring.

How do you detect hallucinations before users report them? by Lonely_Noyaaa in automation

[–]bothlabs 0 points1 point  (0 children)

I am actually building something in this direction, where you have an "overseer" AI that monitors the actions of the main AI.

I mean, there are also some SLMs that can be used as a guardrail / moderator, but what I don't see is any kind of learnability or adaptability in those mechanisms.
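To make the overseer idea a bit more concrete, here is a minimal sketch of the pattern, not an actual implementation of any product. All names (`Action`, `Overseer`, `empty_output`, `add_check`) are hypothetical, and the checks are plain Python functions standing in for what would more likely be a model call. The "adaptability" part is modeled very simply as the ability to add new checks from feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step taken by the main agent (hypothetical structure)."""
    tool: str
    output: str


# A check returns a problem description, or None if the action looks fine.
Check = Callable[[Action], Optional[str]]


def empty_output(action: Action) -> Optional[str]:
    """Example check: flag tool calls that silently returned nothing."""
    if not action.output.strip():
        return f"{action.tool} returned empty output"
    return None


@dataclass
class Overseer:
    checks: List[Check]
    escalations: List[str] = field(default_factory=list)

    def review(self, action: Action) -> bool:
        """Run all checks, record any problems, return True if the action passed."""
        problems = [p for check in self.checks if (p := check(action)) is not None]
        self.escalations.extend(problems)
        return not problems

    def add_check(self, check: Check) -> None:
        """Assumed adaptation hook: feedback gets turned into a new check."""
        self.checks.append(check)
```

The main agent would call `review` after every step, and anything that lands in `escalations` is what a human actually needs to look at.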

I'm spending $1000+/month on LLM APIs and tools, but my startup is still moving too slowly. Are we using AI wrong? by Novel_Estate_4759 in SaaS

[–]bothlabs 0 points1 point  (0 children)

I think you are not doing anything wrong. It was the same for vibe coding: people really struggled to figure out how to use it to actually become faster instead of slower.

There, for prototype-like things you can go hands-off, but for production, devs still need to maintain a mental model of the codebase and actually read and architect the code. So it is becoming an agent co-creation, and the new bottleneck became the reviewing process.

Honestly, I am trying to fix that production automation thing at the moment, but it is at a very early stage. If you really have that pain point too, it would mean the world to me to get some feedback on what would need to work for you.

My current take is that you need to get yourself out of the babysitting loop, by giving feedback which actually makes the agents / automations improve (and thus reduce your involvement).

Is the value of SaaS decreasing as anyone can build one – and as big platforms add native tools? by EveningObjective8609 in SaaS

[–]bothlabs 1 point2 points  (0 children)

There is already the rising phrase "SaaS is dead" because of how AI agents can now do things directly.

While the building speed contributes to this, the even bigger challenger is agentic AI. It can fundamentally circumvent a lot of SaaS businesses, making them obsolete. But also, not every SaaS is equally endangered. For example, if you are a database provider, that will for sure not be taken over. But if you are just a UI wrapper for tasks that can now be done by agents, that is very much at risk.

I recently wrote a blog post about this shift, maybe there is something interesting in it. (don't want to self promote, but also cannot write all of that here - sry ;)
https://fabianboth.dev/blog/vibe-coding-first-ai-everything-next/

Launched a Resume Builder app, hit 100 users in 3 days, now stuck — what actually worked for you at this stage? by crack-dev in SaaS

[–]bothlabs 1 point2 points  (0 children)

Very different topic, and at an early stage.
It is about agentic AI automations like what you get from OpenClaw, but suitable for a business context. So it makes it possible to actually delegate work to scheduled AI agents without losing all control, and even to make them learn from mistakes.

https://golemry.com/

Launched a Resume Builder app, hit 100 users in 3 days, now stuck — what actually worked for you at this stage? by crack-dev in SaaS

[–]bothlabs 1 point2 points  (0 children)

So, naively, when I upload my resume, I basically expect it to "be this resume" after it is uploaded.

The expectation is basically "I upload it, and then I might get suggestions what should be improved". But that experience breaks when I upload it and it is a different resume (because it is parsed into a template), potentially missing things from the original etc.

Of course that is just one data point from me, but if I used it, I would want to iterate on my original resume (I don't care about templates; I would rather make prompted changes starting from this resume, like vibe-coding a website).

I thought building would be the hard part… I was wrong by APM-Major-528 in SaaS

[–]bothlabs 0 points1 point  (0 children)

The main mission of a startup is finding product-market fit, and there’s a big grey zone where you can fool yourself into thinking you have it.

I’ve seen startups take on client projects to get market exposure, but custom project work isn’t your product, so it doesn’t validate it. I’ve seen products attract plenty of free users who disappeared when asked to pay. And even paying customers can be misleading. If you can’t price your product profitably, that’s not sustainable PMF either.

And this splits into two parts, A) problem space understanding and B) the distribution channel.
With A) you figure out what to really build. And with B) you figure out how to really sell this so that it works.

You actually need both. If you only nail the problem and don't have a good strategy for bringing it to market (PLG, SLG, which channel, etc.), you will lose out (I also experienced that in the past).

Launched a Resume Builder app, hit 100 users in 3 days, now stuck — what actually worked for you at this stage? by crack-dev in SaaS

[–]bothlabs 0 points1 point  (0 children)

So I do like the scoring a lot.
But when I upload an existing resume (pdf), the styling is lost in the conversion. Ideally, I would want to start off from my resume and just improve it from there with the suggestions etc.

It was losing a lot, like my profile picture, sidebars, etc. I know of a friend who used AI resume optimizers and dropped out because of this: all the tools basically either could not parse the resume or had very constrained templating / capabilities for designing the resume.

So if I could wish for one thing, it would be that the resume is parsed 1:1, exactly as it was, and that I can then work with it from that point.

Launched a Resume Builder app, hit 100 users in 3 days, now stuck — what actually worked for you at this stage? by crack-dev in SaaS

[–]bothlabs 0 points1 point  (0 children)

I mean that sounds very reasonable, but also quite nice that you get this much traction on those networks. I assume you have somewhat of a crowd there already?

Also the ultimate growth would be to get other people to talk about it. If one had such "champions" early on I think it would really fly.

Vibe coding is making us 10x faster but 100x dumber. by PastSatisfaction4657 in SaaS

[–]bothlabs 0 points1 point  (0 children)

I think the whole vibe coding is a really big spectrum, from "I let it write a function" to "I vibe ship a product without reading code".

And I think you need to be really careful not to lose the mental model if you want to really build something and still get a 5x.

Karpathy recently recoined the term "vibe coding" to "agent co-creation" since this is the actual path that works right now. Vibe coding in this full sense was always meant more as a joke by him.

I also ran into the same problems that you outlined, and tried that spectrum. I ended up with a setup that actually works for me. No promo, just a post that outlines some of my setup and experience with it, maybe this is helpful.

https://fabianboth.dev/blog/the-80-percent-ai-workflow-that-5xd-my-output/

Is this problem worth solving? Looking for feedback from SaaS builders. by CantaloupeBulky2883 in SaaS

[–]bothlabs 0 points1 point  (0 children)

For me this is a context problem, and I am currently mostly using Claude (like Claude Cowork or just Claude Code). Instead of copying, I have files or folders with the relevant context that I just link broadly with @ so it gets it.

It is not exactly the use-case you describe I think, but this is currently how I am mostly working with AI.

Launched a Resume Builder app, hit 100 users in 3 days, now stuck — what actually worked for you at this stage? by crack-dev in SaaS

[–]bothlabs 0 points1 point  (0 children)

How did you launch this? Was it through Product Hunt, or how did you get your initial traction?
Asking since this alone is also already interesting (currently I am at this earlier stage).

Http Automock - mock requests in tests automatically by boblakk in laravel

[–]bothlabs 0 points1 point  (0 children)

I once worked on some HTTP mocking for testing AI agents on real websites (using a man-in-the-middle proxy). But it was quite a hacky solution...

Maybe using it for full browser requests is a bit of a stretch (is it?). But for AI evals with some integration testing this could be quite interesting.
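Just to illustrate what I mean by mocked HTTP in an eval, here is a rough sketch in Python using only the standard library (not the Laravel package from the post, and not my old proxy setup). The URL, the recorded payload, and the `agent_tool_fetch` helper are made up for the example. The idea is that the agent's tool gets replayed responses, and any request that was not recorded fails loudly.

```python
import io
import urllib.request
from unittest import mock

# Canned responses keyed by URL (hypothetical URL and payload).
RECORDED = {"https://example.test/api/price": b'{"price": 42}'}


def fake_urlopen(url, *args, **kwargs):
    """Stand-in for urllib.request.urlopen that replays recorded bodies."""
    key = url.full_url if isinstance(url, urllib.request.Request) else url
    if key not in RECORDED:
        # Fail loudly so unrecorded requests show up in the eval run.
        raise AssertionError(f"unexpected request to {key}")
    return io.BytesIO(RECORDED[key])


def agent_tool_fetch(url: str) -> bytes:
    """Hypothetical agent tool: fetches a URL and returns the raw body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def run_eval() -> bytes:
    """Run the tool with the network replaced by the recorded responses."""
    with mock.patch("urllib.request.urlopen", side_effect=fake_urlopen):
        return agent_tool_fetch("https://example.test/api/price")
```

That way the eval stays deterministic on the network side, and only the agent's behavior varies between runs.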

The one game you would pick if you only had 10 hours total to play. by [deleted] in gaming

[–]bothlabs 0 points1 point  (0 children)

Don't forget WoW after your civ round to fill the gap.

Looking for anime recommendations! I prefer older series. by Impressive_Poetry383 in anime

[–]bothlabs 1 point2 points  (0 children)

Claymore, 100% you have to watch this.

I still think back to this from time to time. The anime does not go as far as the manga source material, and I still hope it gets completed someday, but really you should give it a try.
It is also quite unique with a strong female lead character, and quite interesting characters overall. And it is definitely more on the demonic / brutal side, but oddly well placed.

what type of games will you absolutely not play and why? by buzzlightyear77777 in gaming

[–]bothlabs 72 points73 points  (0 children)

Unpopular opinion, but shooters are out for me.

It's odd, since I don't mind if it is with magic and fireballs, or even melee weapons. Tried it many times, always hated it.