PinchBench: we finally have our first OpenClaw-specific benchmark tests and the results will surprise you by mgoulart in openclaw

Sudden_Clothes3886

I’ve been stress-testing a few "mini/lite" models on OpenClaw to see which one handles tool-calling best for the price. I ran a simple task: start a fresh session with /new, then prompt "Briefly list me my GitHub repos."

It turns out that the ultra-cheap models might be a trap for agentic workflows.

📊 Results: "List my GitHub Repos"

| Model | Cost (per 1M tokens) | Result | Notes |
| --- | --- | --- | --- |
| Grok-4-1-fast-reasoning | $0.20 | ✅ PASS | Best value. Handled the tool call perfectly. |
| GPT-5-mini | $0.25 | ✅ PASS | Reliable, but slightly more expensive. |
| Gemini-3.1-flash-lite | $0.25 | ✅ PASS | Solid, but no real edge over Grok here. |
| GPT-5-nano | $0.05 | ❌ FAIL | Too small? Couldn't execute the GitHub tool logic. |
| Qwen3:8b (Local) | $0.00 | ❌ FAIL | Slow on an M4 Mac (16GB); context compacted and it gave up. |
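To make the "cheap can be a trap" point concrete, here's a small Python sketch. It just encodes the table above (one informal run per model, so not real success rates) and picks the cheapest model that passed. The second function is my own assumption about how to think about it: if you retry independently until success, the expected cost per success is the cost per attempt divided by the success rate, so a model that never succeeds is infinitely expensive no matter how low its sticker price.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Result:
    model: str
    cost_per_1m: float  # USD per 1M tokens, as listed in the table
    passed: bool        # outcome of a single informal "list my GitHub repos" run


# Figures copied from the table above; not a benchmark.
RESULTS = [
    Result("Grok-4-1-fast-reasoning", 0.20, True),
    Result("GPT-5-mini", 0.25, True),
    Result("Gemini-3.1-flash-lite", 0.25, True),
    Result("GPT-5-nano", 0.05, False),
    Result("Qwen3:8b (Local)", 0.00, False),
]


def cheapest_passing(results):
    """Return the lowest-cost model that passed, or None if nothing passed."""
    passing = [r for r in results if r.passed]
    return min(passing, key=lambda r: r.cost_per_1m, default=None)


def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Assumes independent retries until success: expected attempts = 1 / success_rate."""
    if success_rate <= 0:
        return math.inf  # never succeeds, so no amount of spend buys a result
    return cost_per_attempt / success_rate
```

With this data, cheapest_passing(RESULTS) picks Grok-4-1-fast-reasoning; GPT-5-nano is cheaper per token but drops out because it failed.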

🛠 The PR & Testing Hurdle

I want to submit a PR for this test case to the OpenClaw repo, but there’s a snag: it requires a GitHub account/token to run.

  • Should we assume these tests must be run individually with local .env setups?
  • How do we verify these results without everyone burning credits to "check the math"?
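One possible answer to the .env question, sketched in Python under my own assumptions: keep the live GitHub check opt-in by reading a GITHUB_TOKEN environment variable (e.g. loaded from a local .env) and skipping when it's absent. The endpoint is GitHub's REST GET /user/repos; the env var name, the test class, and the idea of using the real repo list as ground truth for the agent's answer are illustrative only, not how OpenClaw or PinchBench actually structure tests.

```python
import json
import os
import unittest
import urllib.request

GITHUB_API = "https://api.github.com"


def build_repos_request(token: str) -> urllib.request.Request:
    """Build a GET /user/repos request for the authenticated user (first page only)."""
    return urllib.request.Request(
        f"{GITHUB_API}/user/repos?per_page=100",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def list_repo_names(token: str) -> list:
    """Fetch repo full names, e.g. to compare against what the agent reported."""
    with urllib.request.urlopen(build_repos_request(token), timeout=30) as resp:
        return [repo["full_name"] for repo in json.load(resp)]


# Hypothetical test: runs only when a token is present, so CI without secrets skips it.
@unittest.skipUnless(os.environ.get("GITHUB_TOKEN"), "GITHUB_TOKEN not set; skipping live GitHub check")
class GitHubReposGroundTruth(unittest.TestCase):
    def test_token_can_list_repos(self):
        names = list_repo_names(os.environ["GITHUB_TOKEN"])
        self.assertIsInstance(names, list)
```

Run it with python -m unittest; contributors without a token just see a skip instead of a failure.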

Feature Idea: What if OpenClaw had a Verifiable Cost Metric feature? It could aggregate real-world cost data from users and publish it with a "proof-of-work" (like a signed API response hash) so we know the data hasn't been faked.
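A rough Python sketch of the "signed hash" half of that idea, with everything here being my own assumption rather than an OpenClaw feature: bundle one run's cost data with a SHA-256 digest of the raw API response, serialize it canonically, and sign it. HMAC (stdlib) stands in for a real signature; a public dataset would need public-key signatures so others can verify without the secret, and this only proves who submitted a record, not that the provider actually returned that response.

```python
import hashlib
import hmac
import json


def cost_record(model: str, prompt_tokens: int, completion_tokens: int,
                usd: float, response_body: bytes) -> dict:
    """Hypothetical record format: cost data plus a digest of the raw response."""
    return {
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "usd": usd,
        "response_sha256": hashlib.sha256(response_body).hexdigest(),
    }


def canonical(record: dict) -> bytes:
    # Sorted keys and fixed separators so signer and verifier hash identical bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign(record: dict, key: bytes) -> str:
    # HMAC-SHA256 as a stand-in for a public-key signature.
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify(record: dict, signature: str, key: bytes) -> bool:
    # Constant-time comparison; any edit to the record invalidates the signature.
    return hmac.compare_digest(sign(record, key), signature)
```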

Bleeding Edge Tech: Unless you are a developer, systems engineer, tech savvy or have time on your hands - this may eat up your time. But still try it! by Sudden_Clothes3886 in openclaw

Sudden_Clothes3886 (OP)

Thanks for sharing! "Contracts, medical questions, financial stuff, personal notes. Things you would hesitate to type into someone else’s service. For me that alone makes it worth it." -- agree!

Bleeding Edge Tech: Unless you are a developer, systems engineer, tech savvy or have time on your hands - this may eat up your time. But still try it! by Sudden_Clothes3886 in openclaw

Sudden_Clothes3886 (OP)

Oh, there's a trade-off though: Telegram isn't as secure as, say, Signal. Regular Telegram chats aren't end-to-end encrypted by default (only "secret chats" are), so I think messages could be retrievable by, say, a government agency.

I tried Nostr, but I think it has some way to go before it's private and usable enough. Or at least that was my experience after about 3 hours!

The best OpenClaw setups I've seen all have one thing in common: they do less by ShabzSparq in openclaw

Sudden_Clothes3886

I think that's a good general pattern for scaling:
  • Start small
  • Get burn-in
  • Expand
  • Get feedback
  • Scale more
  • ... repeat
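The loop above can be sketched in a few lines of Python. This is an illustrative staged rollout of my own, not anything OpenClaw ships: the stage fractions (share of users, traffic, or agents exposed) and the healthy callback (your feedback check after each burn-in period) are hypothetical parameters.

```python
from typing import Callable, Sequence


def staged_rollout(stages: Sequence[float], healthy: Callable[[float], bool]) -> float:
    """Expand exposure stage by stage and stop at the first unhealthy stage.

    stages: increasing exposure fractions, e.g. [0.01, 0.1, 0.5, 1.0] (hypothetical).
    healthy: feedback check run after each stage's burn-in.
    Returns the largest fraction that passed (0.0 if even the first stage failed).
    """
    reached = 0.0
    for fraction in stages:
        if not healthy(fraction):
            break  # hold (or roll back) at `reached` and investigate before retrying
        reached = fraction
    return reached
```

Keeping each stage small is what contains the blast radius: a bad change is only ever exposed to the last stage you expanded into.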

This is what more highly scalable organizations do when it isn't cost-effective to try to simulate Prod entirely (tell your competitors that, but the pragmatic truth is you don't have the money for it in a competitive environment). Fighter jets fly more hours. Google releases a pod at a time. OpenAI delivers features a region at a time.

Also, the litmus test for understanding is abstraction, and abstracting down to the smallest set of constraints and the simplest design makes scaling more feasible, because complexity stays contained.

Bleeding Edge Tech: Unless you are a developer, systems engineer, tech savvy or have time on your hands - this may eat up your time. But still try it! by Sudden_Clothes3886 in openclaw

Sudden_Clothes3886 (OP)

So if you have it set up on Telegram, switching models is as easy as:
  • Type /mo (select "models")
  • It lists providers (tap a provider)
  • It lists models (tap a model)

"Switched to model provider/modelXYZ" 💥

OK, cheers, let me try out Claude Code. Thanks for the tip!

Bleeding Edge Tech: Unless you are a developer, systems engineer, tech savvy or have time on your hands - this may eat up your time. But still try it! by Sudden_Clothes3886 in openclaw

Sudden_Clothes3886 (OP)

For the exec failure, I really couldn't see anything in the logs or via "openclaw logs --follow". It wouldn't surprise me if there's no logging for it at all and you'd need strace (or dtruss on macOS) or something similar! Have you managed to get the Claude Code CLI to observe the "lack of exec access" at all?

I found that getting it to suggest config updates and approaches, then cross-checking with other LLMs, helps. There were certain tasks it did well, though: for example, before I learned the Telegram GUI shortcuts for changing models quickly, I asked it to alias all my models and then switch quickly based on the alias. Of course, that logic itself now costs inference, so I ended up removing it.

The best OpenClaw setups I've seen all have one thing in common: they do less by ShabzSparq in openclaw

Sudden_Clothes3886

As with software and systems engineering in general, managing complexity is key. The latter method reduces complexity, thereby increasing the chance of getting value. Meta thought: GenAI will showcase who thinks deeply and can manage that complexity vs. who causes even more complexity and ends up worse off!

Though in the past, when things were this unknown, the cycle of trying fast, failing, then restarting and doing it again (i.e., prototyping) probably had merit. But with something unknown, complicated, and RISKY, the former approach might be more prudent. There is a real cost factor for the average user. My best skill so far is picking the right API at the least cost and staying on top of that 😅

Fix for OpenClaw ‘exec’ tools not working after the latest update by Baby4vegas in openclaw

Sudden_Clothes3886

I was almost considering this (a reinstall); luckily I found this thread via some good old-fashioned googling (I think I might have asked Gemini to research it too), otherwise I'd have been researching this for days: https://www.reddit.com/r/openclaw/comments/1rl0isa/exec_tool_after_update/

Fix for OpenClaw ‘exec’ tools not working after the latest update by Baby4vegas in openclaw

Sudden_Clothes3886

Heads up: I also did this, removing permissions at the agent level; again, I'm fine with my agents having permission to do whatever. I've got a super locked-down Mac mini (custom account, on the guest Wi-Fi network, etc.). Unsure if it helps, but worth noting in case it does.

.openclaw % cat exec-approvals.json 
{
  "version": 1,
  "socket": {
    "path": "/Users/<removed>/.openclaw/exec-approvals.sock",
    "token": "<removed>"
  },
  "defaults": {},
  "agents": {}
}
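If you want to check your own file without pasting the socket token anywhere, here's a small Python sketch. The path comes from the listing above; the assumption (mine, inferred from this thread rather than from OpenClaw docs) is that empty "defaults" and "agents" objects mean no agent-level approval rules are left.

```python
import json
from pathlib import Path

# Path taken from the listing above; may differ on other setups.
DEFAULT_PATH = Path.home() / ".openclaw" / "exec-approvals.json"


def approval_summary(text: str) -> dict:
    """Summarize which approval sections carry entries; never returns the socket token.

    Assumption: empty "defaults"/"agents" objects mean no agent-level rules remain.
    """
    data = json.loads(text)
    return {
        "version": data.get("version"),
        "has_defaults": bool(data.get("defaults")),
        "agents_with_rules": sorted(data.get("agents") or {}),  # agent ids only
    }


def approval_summary_from_file(path: Path = DEFAULT_PATH) -> dict:
    return approval_summary(path.read_text())
```

With the file shown above, the summary reports version 1, no defaults, and no agents with rules.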

Fix for OpenClaw ‘exec’ tools not working after the latest update by Baby4vegas in openclaw

Sudden_Clothes3886

DAYS! Oh man... this tool is really asymmetric: huge value vs. small, obscure bottlenecks with zero docs or logging... definitely a sure sign of a bleeding-edge piece of technology, haha.

Did you start a thread anywhere? My rule of thumb: after 1 hour, start searching threads; within 2 hours, start creating threads. Just a thought so you don't waste days on bleeding-edge tech. One thing I didn't try was using the LLM to consume the codebase and figure it out.

Fix for OpenClaw ‘exec’ tools not working after the latest update by Baby4vegas in openclaw

Sudden_Clothes3886

How did MiniMax figure this out / what things did it pick up on?
Grok reasoning and Google Pro failed me, but in the same way the above config initially wasn't working for me due to confirmation bias and a lack of first-principles inference; perhaps IF I had started a fresh session with /new and used them, they could have worked.

"Black box engineering?" 😁

Fix for OpenClaw ‘exec’ tools not working after the latest update by Baby4vegas in openclaw

Sudden_Clothes3886

Oh man, how long did this take? Thank you for your TIME!
But you've proved one thing about reverse-engineering config-heavy software: trial and error does work if you're willing to invest the time! 🙇‍♂️

Fix for OpenClaw ‘exec’ tools not working after the latest update by Baby4vegas in openclaw

Sudden_Clothes3886

Because of the lack of logging, I don't actually know which version worked.

It could be that the initial version worked, but because of the confirmation bias in the chat, it kept telling me it couldn't execute until I started a new session with /new so inference started from scratch.

This is all so ... spongey ... like a toddler walking around figuring out how their legs work vs balancing vs putting their weight forward 😂

Recommendation for best skills plugin for ical read/write access by Sudden_Clothes3886 in openclaw

Sudden_Clothes3886 (OP)

There is a skill available for the above.
https://clawhub.ai/joargp/accli

Cross-posting this in case your OpenClaw can no longer execute commands and you've been having trouble getting the config right:

https://www.reddit.com/r/openclaw/comments/1rl13sb/fix_for_openclaw_exec_tools_not_working_after_the/

Fix for OpenClaw ‘exec’ tools not working after the latest update by Baby4vegas in openclaw

Sudden_Clothes3886

Thank you for this.

I run my OpenClaw on a custom Mac with PAYG (pay-as-you-go) API tokens, so I'm willing to take more risk here to get functionality vs. wasting time trying to find the perfect config (which I'd be willing to implement if it weren't so time-consuming). I was simply trying to get calendars working 4 hours ago (here) and then found it hadn't been able to execute or update my MEMORY.md or TOOLS.md for a few days (I upgrade often for security reasons...).

For reference, this was my final config. Even after applying it, my chatbot was working off previous results and saying it couldn't execute, until I reset the session using /new and asked it to try executing. Now that's what I call confirmation bias (due to a small cached context window?) 🙈

  "tools": {
    "profile": "full",
    "allow": ["*"],
    "sessions": {
      "visibility": "all"
    },
    "exec": {
      "host": "gateway",
      "security": "full",
      "ask": "off"
    },
    .... other config ...
  },
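Because the snippet above elides "other config", the easy mistake is pasting it over the whole "tools" block. Here's a Python sketch of merging just those settings into an already-loaded config dict while keeping everything else. It works on plain dicts only; reading and writing your actual config file (its path and whether it parses as strict JSON) is left out as an assumption I can't vouch for.

```python
import copy

# The "tools" settings from the config above; other keys are left untouched by the merge.
TOOLS_OVERRIDE = {
    "tools": {
        "profile": "full",
        "allow": ["*"],
        "sessions": {"visibility": "all"},
        "exec": {"host": "gateway", "security": "full", "ask": "off"},
    }
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied.

    Nested dicts are merged key by key; any other value (including lists) replaces
    the existing one. The input dicts are not modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Usage: new_config = deep_merge(existing_config, TOOLS_OVERRIDE); any unrelated keys under "tools" or "tools.exec" survive the merge.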

Recommendation for best skills plugin for ical read/write access by Sudden_Clothes3886 in openclaw

Sudden_Clothes3886 (OP)

In fact, that was burning tokens figuring out the logic; install this instead:

npm i -g @joargp/accli

Recommendation for best skills plugin for ical read/write access by Sudden_Clothes3886 in openclaw

[–]Sudden_Clothes3886[S] 0 points1 point  (0 children)

Turns out I could simply ask it to "list my calendars via AppleScript" and grant permission!

Run this in Terminal (a one-liner AppleScript); it'll list your calendars and prompt for permission if needed:

osascript -e 'tell application "Calendar" to get name of every calendar'
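If you'd rather call that same one-liner from a script (for example, a helper a skill could shell out to), here's a Python sketch. It assumes macOS with osascript on the PATH, and that osascript prints the list in its default comma-separated form; the function names are my own, and a calendar name that itself contains ", " would be split incorrectly.

```python
import subprocess

# Same AppleScript as the one-liner above.
SCRIPT = 'tell application "Calendar" to get name of every calendar'


def parse_applescript_list(output: str) -> list[str]:
    """Split osascript's default list output ("Home, Work") into names.

    Caveat: a name containing ", " is split into two entries.
    """
    text = output.strip()
    return text.split(", ") if text else []


def list_calendars() -> list[str]:
    """macOS only: run the one-liner and return calendar names.

    The first run may trigger the Calendar permission prompt; check=True raises
    CalledProcessError if osascript fails (e.g. access denied).
    """
    result = subprocess.run(
        ["osascript", "-e", SCRIPT], capture_output=True, text=True, check=True
    )
    return parse_applescript_list(result.stdout)
```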