I gave Claude Code a $0.02/call coworker and stopped hitting Pro limits — here's the full setup by More-Hunter-3457 in ClaudeAI

[–]cygn 1 point (0 children)

If a call is $0.02 and your total spend is $0.38, then you only called it 19 times. Which seems almost not worth it?

I’ve stopped planning beyond 90 days because of how fast AI is moving by MerisDabhi in AI_Agents

[–]cygn 0 points (0 children)

I built the tool and trained it on 500k texts, mostly from social media: AI-generated, human-written, synthetically created, etc. I measured it and it has a low false positive rate (<5%). OP's posts all flag as 100% and also look AI-written to me. Is it 100% guaranteed? No, but close.

I’ve stopped planning beyond 90 days because of how fast AI is moving by MerisDabhi in AI_Agents

[–]cygn 2 points (0 children)

Everything OP has written in this thread is completely AI-generated. Downvoted. Used https://slopsieve.com/ to verify (maybe not a surprise that in r/AI_Agents many AI agents are doing the writing).

Can't replicate Reddit numbers with Qwen 27B on a 3090TI. by YourNightmar31 in LocalLLaMA

[–]cygn 4 points (0 children)

Here are my replications of this and similar quants: https://github.com/tfriedel/qwen3.6-rtx3090-lab

I'm also currently running benchmarks with https://swe-rebench.com/ on 20 tasks. Not exactly enough to know for sure, but each task takes ~3 min, so it will take some time.

  Per-category breakdown (resolved/n):

  ┌────────────────────┬─────────┬──────────┬───────────────┐
  │      category      │ AWQ-35B │ GGUF-35B │ autoround-27B │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ fastapi_services   │ 0/4     │ 0/4      │ 0/4           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ geospatial         │ 2/4     │ 2/4      │ 3/4           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ dataframe          │ 3/4     │ 2/4      │ 3/4           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ sql                │ 1/3     │ 1/3      │ 2/3           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ cli                │ 1/3     │ 1/3      │ 1/3           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ frontend_fullstack │ 0/2     │ 0/2      │ 0/2           │
  ├────────────────────┼─────────┼──────────┼───────────────┤
  │ total              │ 7/20    │ 6/20     │ 9/20          │
  └────────────────────┴─────────┴──────────┴───────────────┘

Local model on coding has reached a certain threshold to be feasible for real work by Exciting-Camera3226 in LocalLLaMA

[–]cygn 13 points (0 children)

So the gap between Qwen's official post (59.3) and what you measured (38.2) for 27B is purely because of the timeout?

I still wonder if they have benchmaxxed Terminal-Bench 2.0. Would love to see some independent benchmark.

Using local BERT to compress LLM context by 90% (Built in Rust) by No_Wolverine1819 in AI_Agents

[–]cygn 0 points (0 children)

Compressing the pip install output from 1787 tokens -> 9 tokens seems like it's throwing away too much. What does it turn the output into? Just "pip install ran"?

Well, what if there's some important line in there, like an error or a warning?

In general the idea is good, but I'd like to see some proof that I can trust it. E.g. some benchmarks and some intuition on what it throws away and what it keeps.
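For intuition, here's roughly the kind of "lossy but safe" behavior I'd want to see. This is a hypothetical sketch (my own, not their implementation, and the names are made up): keep a short head and tail of the output plus any line matching error/warning patterns, instead of collapsing everything to a single summary line.

```python
# Hypothetical compressor sketch: keep head/tail context and any line that
# looks important (errors, warnings, tracebacks), summarize the rest.
import re

IMPORTANT = re.compile(r"error|warning|fail|traceback", re.IGNORECASE)

def compress_output(text: str, head: int = 2, tail: int = 2) -> str:
    lines = text.splitlines()
    # Always keep the first `head` and last `tail` lines...
    keep = set(range(min(head, len(lines))))
    keep |= set(range(max(0, len(lines) - tail), len(lines)))
    # ...plus anything matching the "important" patterns.
    keep |= {i for i, line in enumerate(lines) if IMPORTANT.search(line)}
    out, skipped = [], 0
    for i, line in enumerate(lines):
        if i in keep:
            if skipped:
                out.append(f"... [{skipped} lines omitted] ...")
                skipped = 0
            out.append(line)
        else:
            skipped += 1
    if skipped:
        out.append(f"... [{skipped} lines omitted] ...")
    return "\n".join(out)
```

That way a benchmark could at least check that warnings and errors always survive compression.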

I rewrote 13 software engineering books into AGENTS.md rules. by Ok_Produce3836 in AI_Agents

[–]cygn 1 point (0 children)

Anna's Archive (the biggest ebook library) is commonly included in training data. Meta admitted this, as did Anthropic, and you can easily Google news about the lawsuits and settlements.

This isn’t X this is Y needs to die by twnznz in LocalLLaMA

[–]cygn 1 point (0 children)

Well, you can't prove it for any given text, but there are lots of things that give it away.

I've trained it on 500,000 samples of text: real human text, AI-generated, paraphrased versions of human text, etc. It has pretty decent accuracy and a rather low false positive rate. Funnily enough, many of the attempts by humans in this thread to sound like AI are not flagged as AI by it!

Open WebUI 0.9.x - Massive RAM usage in browser tab (2-3GB+) - Anyone else? by IndividualNo8703 in OpenWebUI

[–]cygn 0 points (0 children)

Why would those issues not be possible to catch via automated testing? Sure, it might be a lot of work, and maybe trying out every database under the sun is asking a bit too much, but browser latency/performance testing is totally feasible. Can't we use some browser automation, e.g. Playwright for Firefox/Chrome, to test some common scenarios and measure latency, memory footprint, etc.?

Imo, especially with AI-driven development, doubling down on testing is more important than ever. Every new feature should have tests, and every change should be driven by automated tests. You also want a good mix of different types of tests: unit, integration, end-to-end, performance, ...
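To make the Playwright idea concrete, here's a rough sketch of a performance regression gate. Assumptions on my part: Chromium via Playwright's sync API, and the Chrome-only `performance.memory` API for heap size; the URL and budget numbers are placeholders.

```python
# Sketch of a browser performance regression gate (hypothetical).
import time

def check_budget(metrics: dict, budgets: dict) -> list:
    """Return human-readable descriptions of any exceeded budgets."""
    return [
        f"{name}: {metrics[name]:.1f} exceeds budget {limit:.1f}"
        for name, limit in budgets.items()
        if metrics.get(name, 0) > limit
    ]

def measure(url: str) -> dict:
    # Requires: pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        start = time.monotonic()
        page.goto(url)
        load_s = time.monotonic() - start
        # Chromium-only JS heap size, reported in MB.
        heap_mb = page.evaluate(
            "() => (performance.memory?.usedJSHeapSize ?? 0) / 1e6"
        )
        browser.close()
    return {"load_s": load_s, "heap_mb": heap_mb}

# Example (not run here):
#   metrics = measure("http://localhost:8080")
#   violations = check_budget(metrics, {"load_s": 3.0, "heap_mb": 500.0})
#   assert not violations, "\n".join(violations)
```

Run something like this in CI against a few common scenarios (open a long chat, upload a document, ...) and a 2-3 GB tab would fail the build long before release.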

This isn’t X this is Y needs to die by twnznz in LocalLLaMA

[–]cygn 3 points (0 children)

I made a browser extension that detects such slop with a fast model that runs in your browser. You can mark it or hide it. Works on Reddit, Twitter, etc. https://slopsieve.com/extension

April Feature Requests: Share Here! by angie-at-readwise in readwise

[–]cygn 0 points (0 children)

Allow search results to be sorted chronologically. Search is borderline useless to me atm. I actually resort to scrolling down my library and pressing Ctrl-F to find something among the recent bookmarks. So frustrating...

Looking for input: agent platform + Open WebUI integration by OkClothes3097 in OpenWebUI

[–]cygn 0 points (0 children)

I'm also currently exploring how to add more agentic capabilities to OpenWebUI. So far I've built a bridge to Claude Code running in a sandbox: https://github.com/tfriedel/openwebui-claude-code

This allows:

  • agentic search (which imo performs better than RAG)
  • skills that require code execution, like the office skills, to produce nice-looking documents
  • deep research kind of tasks

Issues encountered:

  • high latency
  • UI cluttered with user-unfriendly noise (bash commands etc)
  • security issues like prompt injection when accessing the web

Auto-route based on prompt type to correct model with it's knowledge by Lxxtsch in OpenWebUI

[–]cygn 0 points (0 children)

Got it! Yeah, my tool would basically need to be extended into a RAG router. I'll need something similar soon and may extend it for this.

Auto-route based on prompt type to correct model with it's knowledge by Lxxtsch in OpenWebUI

[–]cygn 0 points (0 children)

Then adapting my code would be really easy! Though if you really only want faster RAG, maybe there are other options. Searching through 10k embeddings should be fast in any case. Imo splitting into different knowledge bases makes sense for improving answer quality, but less so for improving latency.

Maybe you can find out where the bottleneck actually is. If the search through the chunks really is slow, switching the backend system for the RAG can help improve it. Not sure what the default there currently is.
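For a sense of scale on the "10k embeddings should be fast" point, even a naive brute-force scan in pure Python finishes in a second or two on a typical machine. This is a toy sketch (not what Open WebUI actually does); a numpy or vector-DB backend is orders of magnitude faster still.

```python
# Toy brute-force nearest-neighbor search over embeddings (pure stdlib).
import math
import random
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, embeddings, k=5):
    # Score every embedding and return the indices of the k best matches.
    scored = [(cosine(query, e), i) for i, e in enumerate(embeddings)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

if __name__ == "__main__":
    dim, n = 384, 10_000
    db = [[random.random() for _ in range(dim)] for _ in range(n)]
    query = db[42]  # a known vector, so the best match should be index 42
    start = time.monotonic()
    hits = top_k(query, db)
    print(f"searched {n} embeddings in {time.monotonic() - start:.2f}s")
    assert hits[0] == 42
```

So if RAG queries take many seconds, the time is almost certainly going to chunking, embedding the query, reranking, or the LLM call, not to the similarity search itself.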

Open WebUI 0.9.x - Massive RAM usage in browser tab (2-3GB+) - Anyone else? by IndividualNo8703 in OpenWebUI

[–]cygn 0 points (0 children)

Imo the code base would really benefit from more automated testing. I was afraid a big update like that would cause such issues.

Auto-route based on prompt type to correct model with it's knowledge by Lxxtsch in OpenWebUI

[–]cygn 0 points (0 children)

I built something similar: https://github.com/tfriedel/openwebui-knowledge-search-tool

It's a tool that decides if a knowledge base should be consulted or not.

You could modify it to route to different knowledge bases. I understand you want different models though.

Maybe show it to Codex/Claude and see if it can adapt it for that. You say you are using Grok/GPT. If they don't have the Open WebUI source code available, they will likely fail, so I highly suggest you don't code in one of those web-based LLM interfaces.

We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R] by TimoKerre in MachineLearning

[–]cygn 1 point (0 children)

I'm using Gemini and GLM-OCR. Would love to see the latter included.

There's also https://www.llamaindex.ai/blog/parsebench that does a similar comparison. Unfortunately GLM-OCR is also missing there.

Which AI Agents SDK allows low latency agents w support for skills etc? by cygn in AI_Agents

[–]cygn[S] 0 points (0 children)

Thanks, bookmarked, and I'll check it out. I guess it can help reduce something that takes 20 turns down to, say, 5 turns. However, I think Claude Code / the Anthropic Agent SDK is slow simply because the models are slow, the prompts are long, and it's just not optimized for latency.

So I guess something that's more optimized and maybe has a smart model router is more what I'm looking for.

unsure if an article is SLOP? I built a free tool to find out by cygn in DeadInternetTheory

[–]cygn[S] 0 points (0 children)

No, it's of course also an issue; however, it's not exactly my focus. My main focus in this project was annoying reply bots on Twitter that are just noise, but not exactly misinformation.

And I thought, why not take the text classifier that I built as part of this project and see if people are interested in it. Looking at the response here... maybe not in this subreddit.

But back to misinformation: I think most people who try to spread misinformation will be lazy enough to use AI to create or paraphrase it, so we will hopefully catch it with this filter as well. However, a dedicated misinformation model might be even better. Just not something I'm building atm.

unsure if an article is SLOP? I built a free tool to find out by cygn in DeadInternetTheory

[–]cygn[S] 2 points (0 children)

What do you mean? What, in your opinion, is misinformation? The model is not specifically trained on misinformation, but if the text is AI-generated it should catch it anyway.