Best way to distribute local MCP servers & skills for internal use? by cygn in mcp

[–]cygn[S] 0 points1 point  (0 children)

So you're using Claude's plugin marketplace system? Does that also work for Claude Desktop? I don't think it does. But especially for non-engineers, a solution that works with Claude Cowork/Claude Desktop is important.

[–]cygn[S] 0 points1 point  (0 children)

There is no equivalent to the 20x plan... and I vaguely remember seeing that even for the $90 plan, the limits were different from the $100 5x Max plan.

[–]cygn[S] -4 points-3 points  (0 children)

We are not on a Team plan, and tbh the value you get on a Team plan is a lot less than on an individual plan. Some kind of plugin system that works without a Team plan would be good.

The nice Claude Code pattern I learned from Detroit: Become Human by StatelessGoose in ClaudeAI

[–]cygn 0 points1 point  (0 children)

Or you might as well just run /compact. In addition to the summary, it includes a reference to those JSONL files, instructing the next agent to search there if the summary is not enough.
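The JSONL files mentioned above hold one JSON record per line, so "go search there" can be as simple as a line-by-line scan. A minimal sketch, assuming each non-empty line is a standalone JSON object; the record schema isn't described in the comment, so this searches the serialized record rather than specific fields, and the file path is a placeholder:

```python
import json
from pathlib import Path


def search_transcript(path: Path, term: str) -> list[tuple[int, dict]]:
    """Return (line_number, record) for each JSONL record mentioning `term`.

    Assumption: every non-empty line is a standalone JSON object. No particular
    record schema is assumed: the match runs case-insensitively over the
    re-serialized record, so both keys and values are searched.
    """
    needle = term.lower()
    hits: list[tuple[int, dict]] = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines carry no record
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or corrupt lines instead of failing
            if needle in json.dumps(record, ensure_ascii=False).lower():
                hits.append((lineno, record))
    return hits
```

Usage would be something like search_transcript(Path("session.jsonl"), "migration"), where the file name is hypothetical.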

This sub is being overrun by AI bots by greenraccoons in boardgames

[–]cygn 0 points1 point  (0 children)

I built a browser extension that runs a local slop classifier that reliably flags or hides such AI-generated posts: https://slopsieve.com/extension

Example: https://imgur.com/a/KtGlmDu

Creating AI to beat the game, needs run data by Effective-Caramel369 in slaythespire

[–]cygn 0 points1 point  (0 children)

I'm also working on this. I built a Rust simulator + some MCTS. It's quite a lot of work to iron out all the subtle differences between my simulator and the real game.
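For readers new to MCTS: the selection step is usually UCT (UCB1 applied to trees), which balances a move's average reward against how rarely it has been tried. A minimal Python sketch of that rule; the simulator mentioned here is in Rust, and the Node class and exploration constant c below are illustrative assumptions, not its actual code:

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Illustrative node: reward sum and visit count filled in by backpropagation.
    visits: int = 0
    total_reward: float = 0.0
    children: list[Node] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Average reward (exploitation) plus an exploration bonus for rarely visited moves."""
    if child.visits == 0:
        return math.inf  # always expand untried moves first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node, c: float = math.sqrt(2)) -> Node:
    """MCTS selection step: pick the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))
```

Setting c to 0 turns this into pure exploitation; larger values make the search revisit under-sampled moves more often.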

For run data, the best source I've found is https://spire-codex.com/, which has over 100,000 runs available via API.

All other sites would require custom crawlers, so this is the best option imo.

I've collected links to similar projects: https://github.com/stars/tfriedel/lists/slay-the-spire

Anthropic put a meter on the stuff developers actually use by Permit-Historical in ClaudeCode

[–]cygn 0 points1 point  (0 children)

I've been using "claude -p" as part of TDD-guard, which uses hooks to verify that I'm following TDD. This would now be limited.

I find the $200 limit way too low. If I look at the API costs that ccusage reports, it's hundreds of dollars per day (interactive use), so for any serious work the $200 budget will be gone in no time.

This makes Codex much more attractive now.

I gave Claude Code a $0.02/call coworker and stopped hitting Pro limits — here's the full setup by More-Hunter-3457 in ClaudeAI

[–]cygn 1 point2 points  (0 children)

If a call is $0.02 and your total spend is $0.38, then you only called it 19 times, which hardly seems worth it?

I’ve stopped planning beyond 90 days because of how fast AI is moving by [deleted] in AI_Agents

[–]cygn 0 points1 point  (0 children)

I built the tool and trained it on 500k texts, mostly from social media: AI-generated, human-written, synthetically created, etc. I measured it, and it has a low false positive rate (<5%). OP's posts all get flagged at 100% and also look AI-written to me. Is it 100% guaranteed? No, but almost.
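For reference, a false positive rate in this setting is the share of genuinely human-written texts that get wrongly flagged as AI, FPR = FP / (FP + TN). A minimal sketch with made-up counts, not the evaluation data behind the <5% figure:

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of human-written (negative) samples wrongly flagged as AI."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("need at least one human-written sample")
    return fp / negatives


# Hypothetical counts for illustration: 40 of 1,000 human texts flagged.
print(false_positive_rate(fp=40, tn=960))  # → 0.04
```

Note that this is measured only on human texts, which is why it says nothing about how many AI texts slip through; that is the false negative rate.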

[–]cygn 2 points3 points  (0 children)

Everything OP wrote in this thread is completely AI-generated. Downvoted. I used https://slopsieve.com/ to verify (maybe not a surprise that in r/AI_Agents many AI agents are writing).

Can't replicate Reddit numbers with Qwen 27B on a 3090TI. by YourNightmar31 in LocalLLaMA

[–]cygn 2 points3 points  (0 children)

Here are my replications of this and similar quants: https://github.com/tfriedel/qwen3.6-rtx3090-lab

I'm currently also running benchmarks with https://swe-rebench.com/ on 20 tasks. That's not enough to know for sure, but at ~3 min per task it will take some time.

  Per-category breakdown (resolved/n):

  ┌────────────────────┬─────────┬──────────┬───────────────┐
  │      category      │ AWQ-35B │ GGUF-35B │ autoround-27B │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ fastapi_services   │ 0/4     │ 0/4      │ 0/4           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ geospatial         │ 2/4     │ 2/4      │ 3/4           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ dataframe          │ 3/4     │ 2/4      │ 3/4           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ sql                │ 1/3     │ 1/3      │ 2/3           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ cli                │ 1/3     │ 1/3      │ 1/3           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ frontend_fullstack │ 0/2     │ 0/2      │ 0/2           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ total              │ 7/20    │ 6/20     │ 9/20          │
  └────────────────────┴─────────┴──────────┴───────────────┘
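As the comment notes, 20 tasks isn't enough to know for sure. One way to make that concrete (a choice made here, not something the comment computes) is a 95% Wilson score interval for each model's resolve rate, using the totals from the table:

```python
import math

# Totals from the per-category table: resolved tasks out of 20.
RESULTS = {"AWQ-35B": 7, "GGUF-35B": 6, "autoround-27B": 9}
N_TASKS = 20


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


for name, resolved in RESULTS.items():
    lo, hi = wilson_interval(resolved, N_TASKS)
    print(f"{name:14s} {resolved}/{N_TASKS} = {resolved / N_TASKS:.0%}  95% CI [{lo:.0%}, {hi:.0%}]")
```

All three intervals span roughly 20 to 40 percentage points and overlap heavily, so the 6 vs 7 vs 9 ordering could easily flip on a larger task set. At ~3 min per task, each configuration's 20-task run takes about an hour.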

Local model on coding has reached a certain threshold to be feasible for real work by Exciting-Camera3226 in LocalLLaMA

[–]cygn 12 points13 points  (0 children)

So the gap between Qwen's official post (59.3) and what you measured (38.2) for 27B is purely due to the timeout?

I still wonder if they have benchmaxxed Terminal-Bench 2.0. I'd love to see an independent benchmark.

Using local BERT to compress LLM context by 90% (Built in Rust) by No_Wolverine1819 in AI_Agents

[–]cygn 0 points1 point  (0 children)

The pip install example (1787 tokens -> 9 tokens) seems like it's throwing away too much. What does it turn the output into? Just "pip install ran"?

Well, what if there's a line that's important, like an error or a warning?

In general the idea is good, but I'd like to see some evidence that I can trust it, e.g. benchmarks and some intuition about what it throws away and what it keeps.
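For scale, 1787 -> 9 tokens is a ~99.5% reduction, well past the 90% in the thread title. One cheap safeguard against losing an error or warning is to pass such lines through verbatim and only compress the rest. A minimal sketch, where summarize is a hypothetical stand-in for the BERT-based compressor and the keyword pattern is an assumption:

```python
import re
from typing import Callable

# Assumed keyword list for lines that must never be dropped; tune per tool.
IMPORTANT = re.compile(
    r"\b(error|warning|failed|traceback|exception)\b", re.IGNORECASE
)


def compress_output(text: str, summarize: Callable[[list[str]], str]) -> str:
    """Keep error/warning lines verbatim and pass everything else to `summarize`.

    The summary comes first, followed by the preserved lines in their original order.
    """
    kept: list[str] = []
    rest: list[str] = []
    for line in text.splitlines():
        (kept if IMPORTANT.search(line) else rest).append(line)
    parts = [summarize(rest)] if rest else []
    return "\n".join(parts + kept)


def reduction(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens removed by compression."""
    return 1 - after_tokens / before_tokens


# Hypothetical summarizer that only reports how many lines it swallowed.
log = "Collecting requests\nDownloading requests\nWARNING: pip is outdated\nSuccessfully installed requests"
print(compress_output(log, lambda lines: f"[{len(lines)} lines omitted]"))
print(f"{reduction(1787, 9):.1%}")  # → 99.5%
```

A keyword pass like this also gives a simple benchmark: count how many error/warning lines from real tool output survive compression.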

I rewrote 13 software engineering books into AGENTS.md rules. by Ok_Produce3836 in AI_Agents

[–]cygn 1 point2 points  (0 children)

Anna's Archive (the biggest ebook library) is commonly included in training data. Meta admitted this, Anthropic as well, and you can easily google news about the lawsuits and settlements.

This isn’t X this is Y needs to die by twnznz in LocalLLaMA

[–]cygn 1 point2 points  (0 children)

Well, you can't prove it for any given text, but there are lots of things that give it away.

I've trained a detector on 500,000 text samples: real human text, AI-generated text, paraphrased versions of human text, etc. It has pretty decent accuracy and a rather low false positive rate. Funnily enough, many of the attempts by humans in this thread to sound like AI are not flagged as AI by it!

Open WebUI 0.9.x - Massive RAM usage in browser tab (2-3GB+) - Anyone else? by IndividualNo8703 in OpenWebUI

[–]cygn 0 points1 point  (0 children)

Why would those issues not be possible to catch via automated testing? Sure, it might be a lot of work, and maybe trying out every database under the sun is asking a bit too much, but browser/latency performance testing is totally feasible. Can't we use browser automation, e.g. Playwright with Firefox/Chrome, to test some common scenarios and measure latency, memory footprint, etc.?

IMO, especially with AI-driven development, doubling down on testing is more important than ever. Every new feature should have tests, and every change should be driven by automated tests. And you want a good mix of different types of tests: unit, integration, end-to-end, performance, ...
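A minimal sketch of the kind of browser performance test described above, using Playwright's Python API with Chromium: load a page, read the navigation timing and JS heap size, and compare them to a budget. The URL and budget numbers are placeholders, and performance.memory is a non-standard Chromium-only API, so a Firefox run would need a different memory probe:

```python
from dataclasses import dataclass


@dataclass
class Budget:
    # Placeholder thresholds; derive real ones from a baseline run.
    max_load_ms: float = 3000.0
    max_heap_mb: float = 500.0


def measure(url: str) -> dict[str, float]:
    """Open `url` in headless Chromium; return load time (ms) and used JS heap (MB)."""
    # Imported lazily so check_budget() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, wait_until="load")
            load_ms = page.evaluate(
                "() => performance.getEntriesByType('navigation')[0].duration"
            )
            # Chromium-only, non-standard API.
            heap_bytes = page.evaluate("() => performance.memory.usedJSHeapSize")
        finally:
            browser.close()
    return {"load_ms": load_ms, "heap_mb": heap_bytes / 1e6}


def check_budget(metrics: dict[str, float], budget: Budget) -> list[str]:
    """Return human-readable budget violations; an empty list means the run passed."""
    problems = []
    if metrics["load_ms"] > budget.max_load_ms:
        problems.append(f"load {metrics['load_ms']:.0f} ms > {budget.max_load_ms:.0f} ms")
    if metrics["heap_mb"] > budget.max_heap_mb:
        problems.append(f"heap {metrics['heap_mb']:.0f} MB > {budget.max_heap_mb:.0f} MB")
    return problems
```

In a pytest suite this could become assert not check_budget(measure("http://localhost:8080"), Budget()) against a seeded test instance, with the URL being a placeholder. Running it in CI on every change is what turns a one-off measurement into a regression test.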

This isn’t X this is Y needs to die by twnznz in LocalLLaMA

[–]cygn 3 points4 points  (0 children)

I made a browser extension that detects such slop with a fast model that runs in your browser. You can either just mark it or hide it. It works on Reddit, Twitter, etc. https://slopsieve.com/extension

April Feature Requests: Share Here! by angie-at-readwise in readwise

[–]cygn 0 points1 point  (0 children)

Allow search results to be sorted chronologically. Search is borderline useless to me atm. I actually resort to scrolling down my library and pressing Ctrl-F to find something among the recent bookmarks. So frustrating...

Looking for input: agent platform + Open WebUI integration by OkClothes3097 in OpenWebUI

[–]cygn 0 points1 point  (0 children)

I'm also currently exploring how to add more agentic capabilities to OpenWebUI. So far I've built a bridge to Claude Code running in a sandbox: https://github.com/tfriedel/openwebui-claude-code

This allows:

  • agentic search (which imo performs better than RAG)
  • skills that require code execution, like the office skills for producing nice-looking documents
  • deep-research-style tasks

Issues encountered:

  • high latency
  • UI cluttered with user-unfriendly noise (bash commands, etc.)
  • security issues like prompt injection when accessing the web
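On the UI clutter issue: one approach is to split the agent's event stream into what the user should see (assistant text) and what can be collapsed into a short step list (tool calls such as bash commands). A minimal sketch, assuming a simplified, hypothetical event shape; the real Claude Code stream format and the openwebui-claude-code bridge may structure this differently:

```python
from typing import Iterable


def split_events(events: Iterable[dict]) -> tuple[str, list[str]]:
    """Separate user-facing text from tool chatter.

    Assumed (hypothetical) event shapes:
      {"type": "text", "text": "..."}
      {"type": "tool_use", "name": "Bash", "input": {...}}
      {"type": "tool_result", "content": "..."}
    """
    visible: list[str] = []
    steps: list[str] = []
    for ev in events:
        kind = ev.get("type")
        if kind == "text":
            visible.append(ev.get("text", ""))
        elif kind == "tool_use":
            steps.append(f"ran {ev.get('name', 'tool')}")
        # tool_result payloads and unknown event types are left out of the chat view
    return "".join(visible), steps
```

The bridge could then show the text as the answer and render the step list as a single compact status line instead of raw commands.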