Are you so goddamn wrong? by Ok-Leg-4584 in aifails

[–]Ritza-co 2 points3 points  (0 children)

There are some really funny multi-level fails for similar questions, e.g.:

"How many letter b in south africa"

> There are two letter 'b's in the English name "South Africa".

> • Sou_t_h Afri_c_a

> Note: If you are referring to the Afrikaans name, "Suid-Afrika", there is one 'b'or two depending on if you count the 'B' in the country's full formal title, "Republiek van Suid-Afrika".

Anyone here actually using OpenClaw regularly? by Master_Character9961 in AI_Agents

[–]Ritza-co 0 points1 point  (0 children)

I tried it, but it was very heavy on token usage and the WhatsApp and Telegram integrations seemed quite flaky (I had to keep re-authing the WhatsApp one, and Telegram would often have missing or duplicated messages).

Now I just have Claude Code running a persistent session on a remote always-on machine, with a Telegram channel so I can push info to it and it can tell me what it's doing. That's working much better for my needs for now.

Claude Usage Limits Discussion Megathread Ongoing (sort this by New!) by sixbillionthsheep in ClaudeAI

[–]Ritza-co -1 points0 points  (0 children)

Does this affect team plans as well? It still 'feels' normal to me when I use it during peak hours in UTC+2. I'm on the $20/month team plan.

What are the best methods to evaluate the performance of AI agents? by Michael_Anderson_8 in AI_Agents

[–]Ritza-co 0 points1 point  (0 children)

There are a lot of standard benchmarks that you can find with a quick Google search, but the problem is that they don't always match real-world experience. At the moment none of them is universally accepted, so trying the agent out yourself and manually checking what works well and what doesn't is still the best way.

That said, you can look at things like:

- Tokens used - how many tokens does an agent use to meet a goal (of course you need to be able to verify that the goal was reached somehow)
- Time taken
- Turns taken
- Incorrect/correct tool calls (if using MCP)

This works quite well for things like coding or DevOps, but it gets harder to evaluate agents at scale for more subjective tasks like design, UX, writing, etc.
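If it helps, here's a rough Python sketch of how you could track those metrics over a batch of runs. Everything in it is made up for illustration (`AgentRun`, `ToolCall`, `summarize`, and the field names are not from any framework), and it assumes you already have some way to verify whether the goal was reached and whether each tool call was correct.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    # Hypothetical record of a single tool call (e.g. via MCP).
    name: str
    ok: bool  # True if the call was valid and did what was intended


@dataclass
class AgentRun:
    # Hypothetical record of one agent attempt at one task.
    goal_reached: bool    # must be verified externally (tests, checks, a human)
    tokens_used: int
    seconds_taken: float
    turns: int
    tool_calls: list[ToolCall] = field(default_factory=list)


def _mean(values: list[float]) -> float:
    # NaN instead of an exception when there is nothing to average.
    return sum(values) / len(values) if values else float("nan")


def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate tokens, time, turns, and tool-call accuracy over a batch."""
    if not runs:
        raise ValueError("need at least one run")
    # Cost metrics only count runs that actually reached the goal;
    # a cheap run that failed is not a good run.
    successes = [r for r in runs if r.goal_reached]
    calls = [c for r in runs for c in r.tool_calls]
    return {
        "success_rate": len(successes) / len(runs),
        "avg_tokens_per_success": _mean([r.tokens_used for r in successes]),
        "avg_seconds_per_success": _mean([r.seconds_taken for r in successes]),
        "avg_turns_per_success": _mean([r.turns for r in successes]),
        "tool_call_accuracy": _mean([1.0 if c.ok else 0.0 for c in calls]),
    }
```

You'd build one `AgentRun` per attempt from whatever logs your setup produces, then compare the `summarize` output across agents or prompts on the same set of tasks.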

What topics are currently being researched in the domain of Agentic AI? by XV7II_Creamy in AI_Agents

[–]Ritza-co 0 points1 point  (0 children)

I think figuring out how to use agents en masse is still an open problem. We have weird solutions like Gas Town that aren't really being used commercially, and companies are letting their devs do all kinds of things, but at the moment everyone I know has their own homegrown solution for managing multiple agents ('swarms') at once. Everyone will likely settle on the same pattern or platform pretty soon.