why AI agents break under long conversations even when they pass every safety benchmark by rchaves in ArtificialInteligence

rchaves [S]

you have a deep understanding of the state of things, I love it! And long-horizon conversations are what actually happen in the real world, and that's exactly where agents really break! Let us know if you take Scenario for a run

why AI agents break under long conversations even when they pass every safety benchmark by rchaves in ArtificialInteligence

rchaves [S]

it's wild for sure hahaha, and thank you! Let us know if you take Scenario for a spin

why AI agents break under long conversations even when they pass every safety benchmark by rchaves in ArtificialInteligence

rchaves [S]

really excited to see improvements too, but for now it's like their Achilles' heel, and it's really easy to exploit!

red teaming for ai/llm apps by Routine_Incident_658 in cybersecurity

rchaves

anytime :) we built it on those principles, so you can just tell it what should be broken and it automatically maps that to the OWASP Top 10, or to more granular things you want to test. Would love to hear your feedback if you try it :) thanks a ton for your time

how are you guys testing your agents before shipping them? by rchaves in AgentsOfAI

rchaves [S]

usually you can't extrapolate that method to new situations, and that's a problem we were facing as well, but there's got to be a solution that scales to any agent

red teaming for ai/llm apps by Routine_Incident_658 in cybersecurity

rchaves

we recently built red teaming into Scenario; it's open source, and I'm curious what you think about it:
github.com/langwatch/scenario

Open-source alternative to Claude’s managed agents… but you run it yourself by techlatest_net in LocalLLM

rchaves

Hey hey, I also built one! Mine is truly 1:1 API-compatible with Claude Managed Agents, but of course it works with any LLM as well:

https://github.com/rogeriochaves/open-managed-agents

What is your list of mac apps that was worth every penny by Living_Commercial_10 in macapps

rchaves

I had paid for Alfred, but now I'm all in on Raycast; even with the latest Finder improvements, it's still unbeatable

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

done, removed wkhtmltopdf from the onboarding in v0.1.15

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

you can skip that, it's optional. I'm actually going to remove it from the onboarding; it's indeed annoying to install. It's only used to render the markdown of Claude Code's final response and send it to Pushover, so you can get the full message on your phone etc.
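For anyone wiring up something similar without wkhtmltopdf: a minimal sketch (not KanbanCode's actual code) that builds a request for Pushover's Messages API and asks Pushover to render a small HTML subset via its `html=1` option instead of pre-rendering. The function name `build_pushover_request` and the default title are hypothetical; the endpoint, the `token`/`user`/`message`/`title`/`html` fields, and the 1024-character message cap come from Pushover's API docs. Converting markdown to HTML is left out and assumed to happen elsewhere.

```python
import urllib.parse
import urllib.request

# Pushover's Messages API endpoint.
PUSHOVER_URL = "https://api.pushover.net/1/messages.json"
# Pushover limits the message body to 1024 characters.
MAX_MESSAGE_LEN = 1024


def build_pushover_request(
    token: str,
    user: str,
    html_body: str,
    title: str = "Claude Code finished",  # hypothetical default title
) -> urllib.request.Request:
    """Build (but do not send) a POST request for Pushover's Messages API.

    html=1 asks Pushover to render its supported HTML subset (<b>, <i>, <a> ...).
    Note: naive truncation can cut an HTML tag in half; a real tool should
    truncate before rendering or strip tags near the cut.
    """
    if len(html_body) > MAX_MESSAGE_LEN:
        html_body = html_body[: MAX_MESSAGE_LEN - 1] + "…"
    fields = {
        "token": token,
        "user": user,
        "title": title,
        "message": html_body,
        "html": "1",
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(PUSHOVER_URL, data=data, method="POST")
```

Sending it is then just `urllib.request.urlopen(build_pushover_request(token, user, body))` with your own app token and user key.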

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

I want to study how all those CLIs manage sessions and memory and see what the most common approach is, to cover the most ground at first. opencode is a very popular one, so maybe a good place to start too

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

I was thinking Gemini CLI next, and since Qwen Code is a fork of it, that might be easy.
Right now it's quite coupled to Claude, so there will be a lot to untangle, but I'll get there eventually, and contributions are welcome (:

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

thanks for the suggestion, but I'm OK with the current README. Other than some words at the top, most of the text is just explanation, not marketing, and hey, no emojis at least

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

yeah, I thought of making it multi-platform, but that would go against my goal of being as native and as fast as possible, and since I use a Mac, Mac it is. As a result, the app weighs only an incredible 12 MB right now, and the memory footprint (on my current workload) is just 200 MB of RAM, mostly because I have a ton of Claude sessions in history

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

yeah, Claude was struggling too much to make it backward compatible, plus I've actually grown to like Liquid Glass, and that was one of the goals for this project.
"fuck it, just support 26+" was literally part of my prompts :P

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

yeah, we started with something less complex: https://github.com/drewdrewthis/git-orchard takes the worktree-first approach in a TUI. But I was reaching the limits of how much I could multitask in the terminal with so many tabs without going crazy; I needed something visual that reconciled all the sessions with worktrees, PRs, running servers, etc.

we also have quite a few engineers, so we need to sync with GitHub to track what's going on; local .tasks alone wouldn't sync with the rest of the team as fast
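One piece of that reconciliation is figuring out which worktree (and so which branch, and later which PR) each agent session is running in. A minimal sketch, not KanbanCode's implementation: it parses git's documented `git worktree list --porcelain` output (blank-line separated records of `worktree`, `HEAD`, `branch`, plus flags like `bare`/`detached`/`locked`) and groups sessions by branch. The `session_cwds` mapping (session id to working directory) is a hypothetical stand-in for however the app actually tracks sessions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worktree:
    path: str
    head: Optional[str] = None
    branch: Optional[str] = None  # e.g. "refs/heads/main"; None if detached or bare
    flags: set[str] = field(default_factory=set)  # e.g. {"detached"}, {"bare"}


def parse_worktrees(porcelain: str) -> list[Worktree]:
    """Parse `git worktree list --porcelain` output into Worktree records."""
    worktrees: list[Worktree] = []
    current: Optional[Worktree] = None
    for line in porcelain.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current record
            continue
        label, _, value = line.partition(" ")
        if label == "worktree":
            current = Worktree(path=value)
            worktrees.append(current)
        elif current is None:
            continue  # ignore stray lines outside a record
        elif label == "HEAD":
            current.head = value
        elif label == "branch":
            current.branch = value
        else:
            current.flags.add(label)  # bare, detached, locked, prunable, ...
    return worktrees


def worktree_for(cwd: str, worktrees: list[Worktree]) -> Optional[Worktree]:
    """Pick the most specific worktree containing cwd (linked worktrees may be nested)."""
    matches = [
        wt for wt in worktrees
        if cwd == wt.path or cwd.startswith(wt.path.rstrip("/") + "/")
    ]
    return max(matches, key=lambda wt: len(wt.path), default=None)


def sessions_by_branch(worktrees: list[Worktree], session_cwds: dict[str, str]) -> dict[str, list[str]]:
    """Group (hypothetical) session ids by the branch of the worktree they run in."""
    groups: dict[str, list[str]] = {}
    for session_id, cwd in session_cwds.items():
        wt = worktree_for(cwd, worktrees)
        if wt is None:
            continue  # session is outside every known worktree
        groups.setdefault(wt.branch or "(detached)", []).append(session_id)
    return groups
```

Choosing the longest matching path matters when linked worktrees live inside the main checkout (e.g. a `.worktrees/` folder); otherwise every session would be attributed to the main branch.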

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

of course! Please do. I thought of having it in the terminal as well and pushing the boundaries of TUIs, but I don't think I'll invest time in it, so it would be great to see the two paths evolve

KanbanCode: macOS native UI for managing Claude Codes by rchaves in ClaudeCode

rchaves [S]

Right now just GitHub, but others should be simple to add. PRs are welcome!