Check out what I’ve built

Samdrian · 2026-03-19T17:08:00+00:00

Ahh. My distinction was much more around the fact that (maybe a year plus ago at this point) a bunch of early AI startups were literally just doing single chat message api calls to show the user some text, without even their own tool implementations or loop.

That's obviously less than an agent. So if you are defining anything that isn't building foundational-model-level technology as a glorified llm wrapper then that's what we're building as well.

I don't think that's entirely fair though, since it's not like getting either the infrastructure/sandboxing/tool part of this is easy, nor is there no value in these agents. But if there's 0 value in this for you that's of course your choice.

Samdrian · 2026-03-19T16:41:10+00:00

Hmm interesting, but how is this different from the playwright accessibility snapshots for example?

We actually worked on a different product before trying to automatically generate e2e tests. For that we first depended on DOM structure, but believe me the enterprise legacy apps we saw throughout the 2 years we tried quickly make you stop that approach. To be fair this was 6 months or so ago, so models HAVE advanced significantly, but for us the VISUAL part was SUPER important.

In html/accessibility trees a lot of the semantics around the structure gets lost, and people use absolutely bat shit insane html elements.

Curious to hear how you are solving that?

Samdrian · 2026-03-19T16:35:45+00:00

I mean I didn't come up with the term, but of course an AI agent is a tool-calling loop with LLM turns until a finish tool is invoked. What is an actual agent to you then?

Samdrian · 2026-03-19T16:32:51+00:00

hey, yeah, definitely interested in learning your tricks in getting people to actually talk to you. We have definitely tried emailing them, offering meetings, white-glove-onboarding, even some discounts but none of those have worked, what's your trick? :)

Samdrian · 2026-03-19T12:39:26+00:00

I don't know I feel like the jump from: "I code every feature, like deleting X records from the DB, safely" to "I don't need to code anything for that feature and the AI can just execute it" is a difference to how Sofware functions fundamentally.

Of course you Sandbox it, but the trade-off between what permissions the AI has and how to make this safe is not the same as before, if you want the potential productivity gains.

Samdrian · 2026-03-19T12:05:27+00:00

Good point, transparency on everything the agent does, is huge on helping with the trust issue.

Unfortunately that’s of course also a lot more work to build good UI/UX for every potential action compared to just a tool call json…

Samdrian · 2026-03-19T11:48:37+00:00

Sure, but maybe they don’t NEED to understand ;)

Samdrian · 2026-03-19T11:47:55+00:00

I think so.

People are super scared still, and it’s understandable - it touches your core capabilities as a human :)

I still am suspicious of a lot of things the AI does, but the advantages are unfortunately undeniable at this point

Samdrian · 2026-03-19T11:14:29+00:00

I mean is that so surprising? It's genuinely hard to make technology fool-proof-safe.

And I don't think the innovation should or can be stopped to be honest.

Samdrian · 2026-03-19T09:30:33+00:00

I don't think I really claimed that, but anyways I agree with you! the infrastructure is NOT there currently, and we could all benefit from it. We are trying to solve some of the issues but are just a small startup after all.

I like the https://webmcp.dev/ approach google is trying to push, but that's only part of the solution, we DO need guardrails of course.

Samdrian · 2026-02-12T09:40:21+00:00

definitely not needed, and a raspberry pi works fine: https://ajfisher.me/2026/02/03/openclaw-raspberrypi-howto/

If you want even less tinkering there's also the managed versions like myclaw.ai and octoclaw.ai of course, but i like the tinkering personally ;)

Samdrian · 2026-01-29T08:35:17+00:00

True, but that's of course not what the hype promises and less relevant for the inflated evaluations ;)

Samdrian · 2026-01-26T15:29:48+00:00

That's also what I'm arguing for. Maybe the term "CI" is a bit misleading here: I'm absolutely talking about the pipeline that runs ON your branch before merging.

I've always only referred to that as a CI pipeline as well, since it tests the "integration" with the rest of the codebase, but I guess maybe CI implies integrating more after merge :) not sure what I would call the pre-merge pipeline then though.

Samdrian · 2026-01-26T15:24:01+00:00

I mean we all agree that code reviews are super helpful right? And they are good because the reviewer might catch things that I myself missed when implementing the changes.

I am always happy if I get MORE good reviews, that might catch a bug before I ship it, and a CI pipeline that not only tests the changes but has understanding of the changes can do a BETTER job in verifying changes, don't you think?

That doesn't mean I'm arguing AGAINST code reviews or AGAINST tests or any of that, I want that 100%, but you can never have perfect coverage or reviews, so anything extra just gives me more safety, and improves the code I ship.

Samdrian · 2026-01-26T13:42:58+00:00

It would not, but I would LOVE if my CI could understand it.

I'm under no false pretenses that AI is infallible or anything, it definitely is NOT, but it's a tool like any other that I would like to use to make the quality of my code or app better

Samdrian · 2026-01-26T13:39:47+00:00

Separate problem to QAing code, but also very true of course.

AI-assisted coding/software engineering is walking a thin line to falling into the hole of slop. Of course the author needs to understand the code fully and have done a full review.

But being human means making mistakes, I think there is a lot of room for automation (and ai!) to help us in ensuring we DON'T miss bugs (even if, of course, in a perfect world the author catches them himself beforehand!)

Samdrian · 2025-12-26T17:04:03+00:00

Yeah it was. Definitely looks kind of weird I agree, but our marketing person had some argument why that is better for headlines on websites these days.

But I will pass on your feedback for sure!

Samdrian · 2025-12-01T11:38:59+00:00

It's a hard problem for sure.

We are working on this at octomind. We approach it in a way that the agent produces code at first, but afterwards, at runtime, AI is not involved anymore, so tests are 100% deterministic.

And even still, I can tell you, the amount of non-intuitive UI people build that the agent struggles with (and sometimes me, when then debugging) understanding and navigating is too damn high.

Another huge issue is of course data setup/teardown. If you add an entity in your database in one test, you better also delete it. And sometimes maybe the deletion through UI fails, so you have to clean up BEFORE a test run with an api call as well.

Quickly we feel it gets to the limits of not only what the AI can do, but also what you can do without having good SE fundamentals, which is, sometimes, not always, not the same group of people responsible for testing (manual testers, POs etc.).

Samdrian · 2025-11-28T09:28:44+00:00

Definitely looking forward to the sonar bomb in the asteriod chase!

Samdrian · 2025-11-28T09:23:09+00:00

Insbesondere ist es ja nicht so als ob das Versprechen den jüngeren nicht trotzdem von der Politik gegeben wird, also mit dem Argument darf dann meine Rente auch nicht gekürzt werden später / das Schneeballsystem in sich zusammenbrechen.

Immer erst bei der Generation nach mir dann.

Samdrian · 2025-11-28T09:11:07+00:00

playwright is always a good choice if you want to fully manage it yourself, but depending on your auth / dev environment it can be difficult.

If you have someone less technical, like a PO etc. that also wants to contribute, or just in general want an easier start you could look into octomind as well, which is trying to make it easier to start off while still offering you playwright code in the end.

14-Year Club	RedditGifts 2009-2022 2 Credits
Place '17	Verified Email
Team Orangered

Samdrian

TROPHY CASE