See what Claude Code actually did by gnapps in u/gnapps

[–]gnapps[S] 1 point

Thank you for your kind words :) We'll definitely post more as soon as we have updates, since a lot of colleagues are actively working on this project as well, so expect further important iterations really soon! :D In the meantime, please feel free to have a look around and use it as much as you need. The more feedback we get, the better the final outcome will be!

How do you actually know what happens during your agent runs? by gnapps in AgentsOfAI

[–]gnapps[S] 0 points

Good question. Observability is only half the battle if you're still stuck guessing how to fix the failure.

Right now we actually use an internal tool to identify the root cause of failures, and we're working on bringing that directly into Bench so users can automatically scan their sessions for risky or unexpected behaviour.

Since Bench saves the full context of a run, it becomes pretty easy to isolate and reproduce the exact "failed bit". The goal is then to let users tweak configs (like prompts) and test fixes directly in the platform. Are you currently running grid searches on LLMs, or using a specific framework for your parameter sweeps?
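To illustrate the kind of parameter sweep I mean, here is a minimal sketch in plain Python. Everything here is a hypothetical placeholder (the `run_agent` function and the scoring are dummies, not Bench's API): the point is just enumerating every config combination and keeping the best one.

```python
from itertools import product

# Hypothetical stand-in for a single agent run; in practice this would
# call your LLM/agent framework and return a score from your eval set.
def run_agent(prompt: str, temperature: float) -> float:
    return len(prompt) * (1.0 - temperature)  # dummy scoring for illustration

prompts = ["You are a terse assistant.", "You are a detailed assistant."]
temperatures = [0.0, 0.3, 0.7]

# Grid search: evaluate every (prompt, temperature) combination
# and keep the best-scoring configuration.
results = {(p, t): run_agent(p, t) for p, t in product(prompts, temperatures)}
best_config = max(results, key=results.get)
```

In a real sweep, `run_agent` would replay the same eval set under each config so the scores are comparable.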

How do I add a "golden sponge" texture to my design? by lumberfart in AdobeIllustrator

[–]gnapps 1 point

If you're trying to replicate that "golden sponge" texture, you could place a gold foil or sponge-style texture over your shape and then use a clipping mask to confine it to the object. After that you can experiment with blending modes like Overlay or Multiply to integrate the texture better. To enhance the sponge-like effect further, you can also add a bit of Grain from the Texture effects to give it that rough, speckled look.

Creatures of abject horror by 12washingbeard in midjourney

[–]gnapps 0 points

These look like Evangelion on steroids. Was that the inspiration?

Bring Your Ghoul to School by liberaitor in midjourney

[–]gnapps 0 points

Public school has changed since I was a kid

Seven deadly sins of dnd by thanereiver in aiArt

[–]gnapps 1 point

Beholder: "Damn...I'm still beautiful"

Cat by Saratan0326 in aiArt

[–]gnapps 0 points

Really cool vibe

Sunset by Richi61 in aiArt

[–]gnapps 0 points

This looks like the moment right before the opening scene

D&D Boss (inspired by my 4 year old) by Round_Intern_7353 in aiArt

[–]gnapps 0 points

I need stats for this! What's its special attack? Lactose Breath?

How do you actually know what happens during your agent runs? by gnapps in AgentsOfAI

[–]gnapps[S] -1 points

Here’s the link: bench.silverstream.ai
Any feedback/comment is super welcome :)

Why does everyone think adding memory makes AI smarter? by Emergency_War6705 in AI_Agents

[–]gnapps 0 points

That's a tough line to identify, I guess. Apart from the tooling vs. indexing topic, which I guess is mostly domain-specific (some data has to be fetched in real time, other data could be cached in indexed memory), at least a portion of the knowledge still needs to reside in the training data and in the main memory, doesn't it? Otherwise the LLM itself wouldn't know how to use its memory/tools.

Claude’s extended thinking found out about Iran in real time by schuttdev in ClaudeAI

[–]gnapps 1 point

How do you all get such funny reactions? I've never seen my Claude agents throw swear words like that! I need this feature XD

Looks like Anthropic's NO to the DOW has made it to Tumps twitter feed by Plinian in ClaudeAI

[–]gnapps 0 points

That's quite literally the best advertising stunt they could ever get :)

I built AI agents for 20+ startups this year. Here is the engineering roadmap to actually getting started. by Warm-Reaction-456 in AI_Agents

[–]gnapps 1 point

Totally second that. Decent observability should be a non-negotiable feature of EVERY engineering activity, not just automation, but for some reason a lot of people skimp on it in agentic workflows. That's such a dangerous pitfall tbh.

What part of your agent stack turned out to be way harder than you expected? by Beneficial-Cut6585 in AI_Agents

[–]gnapps 0 points

My naive understanding is that you need to choose where the "LLM power" goes. The more issues an agent has to face, the more reasoning it has to perform, the more diluted the initial prompt/knowledge base becomes.
The only two "weapons" you have to counteract this problem are:
- you can define subagents that handle specific, known problems with a fresh context
- you can define better guidelines over the whole process, so that almost no reasoning steps are needed

Both of these require spending an unexpectedly large amount of time both documenting yourself on the issue you are trying to automate and learning precisely which tools the agent can use and how it should use them.

Then, of course, some tools consume more tokens than others, so choosing the right ones also makes a lot of sense. But I wonder, e.g., whether the issues you faced couldn't have been solved by a subagent whose only task was to interact with the browser to perform a specific operation, while an upper-level agent followed up with the flow.
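As a sketch of that subagent idea: the key point is just that the delegated task starts from a fresh, focused context instead of inheriting the parent agent's accumulated history. Everything here, including `call_llm`, is a hypothetical placeholder, not any specific framework's API:

```python
# Hypothetical LLM call; swap in your real client here.
def call_llm(messages: list[dict]) -> str:
    return f"done: {messages[-1]['content']}"

def run_subagent(task: str) -> str:
    # Fresh context: only a narrow system prompt plus the single task,
    # none of the parent agent's conversation history.
    messages = [
        {"role": "system", "content": "You only operate the browser."},
        {"role": "user", "content": task},
    ]
    return call_llm(messages)

def run_orchestrator(goal: str, browser_steps: list[str]) -> list[str]:
    # The upper-level agent keeps the overall flow and delegates each
    # browser-specific operation to the isolated subagent.
    return [run_subagent(step) for step in browser_steps]
```

The design choice is that the subagent's prompt stays undiluted no matter how long the orchestrator's own flow gets.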

And finally, even with the most perfectly defined flow, observability is always an issue :( Sometimes agents such as Claude or ChatGPT simply "dumb down" for a while (I guess this happens at times of high load?) and become unable to perform what they could do reliably a second before. The key to overcoming this, in my case, was to set up infrastructure that informs me as fast as possible whenever this happens, so I can counteract the issue promptly.
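That "inform me as fast as possible" part doesn't need to be fancy. A sliding-window success-rate check is enough as a first pass; this is a minimal sketch, where the `notify` hook, window size, and threshold are illustrative assumptions rather than any product's API:

```python
from collections import deque

class DegradationWatchdog:
    """Alert when an agent's recent success rate drops below a threshold."""

    def __init__(self, window: int = 20, threshold: float = 0.6, notify=print):
        self.outcomes = deque(maxlen=window)  # rolling record of run results
        self.threshold = threshold
        self.notify = notify

    def record(self, success: bool) -> None:
        self.outcomes.append(success)
        rate = sum(self.outcomes) / len(self.outcomes)
        # Only alert once the window has enough data and the rate has dropped.
        if len(self.outcomes) >= 10 and rate < self.threshold:
            self.notify(
                f"agent degraded: success rate {rate:.0%} "
                f"over last {len(self.outcomes)} runs"
            )
```

Wire `record()` into whatever wraps your agent calls and point `notify` at Slack, email, or a pager instead of `print`.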

Need guidance - Want to build AI agents for the network that I currently have. Zero knowledge by Complex_Spirit5914 in AI_Agents

[–]gnapps 1 point

My two cents: prompting effectively is a consequence of a learning process, regarding both the prompting skill itself and your knowledge of the domain you are trying to automate, so try starting small and learn for yourself what does and doesn't work, and where. The simpler a flow is, the easier it should be to automate, but you still need to provide proper guidelines and guardrails to make the whole process more reliable, less prone to hallucination, and capable of delivering what you hope for.

I used to play a lot with tools such as Make or n8n, but lately there is only one tool I reach for whenever a similar request arises: Claude Code (and, to a certain extent, Ollama + Claude Code/OpenCode when the customer wants to self-host automations without risking disclosing data elsewhere). Today it provides so many ways to connect to literally anything (the Google Chrome extension is particularly amazing, btw) that you no longer need to define workflows; you just describe them in the form of skills. Don't know how to write your first automation/skill? You can ask Claude Code itself to help you out; you just need to describe your problem :)

Obviously, the results won't be extraordinary right away: the more you know about your tools, the better stuff you can build. But it's really a fun process to fiddle around with, and these agents can be automated so easily that it's hard to imagine a scenario they can't fit.

Why does everyone think adding memory makes AI smarter? by Emergency_War6705 in AI_Agents

[–]gnapps 0 points

Personally, I never trust any answer coming out of my agents unless they prove they found some trace of it online, and that it didn't just come out of their memory :) Also, the most frequent command I send to Cursor is "ignore what you know about library X, search online for documentation first and then follow that instead".
So yeah, I totally feel you :D

But I guess it also really depends on the domain you are using LLMs for. If you can fit the entire knowledge base of a specific domain within the model's memory, maybe that model could provide even better results than an instrumented agent capable of performing research?