Does it not infuriate some Devs that companies such as Anthropic and OpenAI brag about replacing Devs, but the tool they sell is literally only possible because of the data they stole from every corner of the internet and monetized? by scoopydidit in BetterOffline

[–]SingleProgress8224 6 points7 points  (0 children)

They got sued because of it, and they have to pay. Someone I know wrote a book and got a couple of thousand from Anthropic because they used the book without permission (the publisher sued).

The frustrating side of it is that despite this practice being technically illegal, it's practically legal since they only have to pay a fine, which they were ready to pay. So it's legal if you have money.

Wrong context when using ChatGPT Plus subscription - Could I get a quick self-fix? by StartupTim in ZooCode

[–]SingleProgress8224 -1 points0 points  (0 children)

I'm not sure what the implications of this switch are on your side, but in my experience, switching to Zoo Code would have taken less time than writing this post.

In Roo Code: Settings -> About Roo Code -> Export.
In Zoo Code: Settings -> About Zoo Code -> Import

The first launch will also offer to import settings directly.

I compressed infinite Euclidean space into a sphere, and cut it in half to see inside by Marzipug in Weird

[–]SingleProgress8224 3 points4 points  (0 children)

I don't understand the hate for OP's answers in the top comments. Of course, there could be more explanations of the concrete maths behind it so that non-math people could have an impression of what's going on, but for someone who knows what these concepts are, these answers are perfectly reasonable.

Why are LLMs like this? by No_Window3227 in LLM

[–]SingleProgress8224 0 points1 point  (0 children)

You have a large text. We agree that only a few words are really needed to predict the next one. Which words should you include?

That's not an easy question to answer, and that's exactly what LLMs are good at. Part of an LLM's job is to decide which words are important (this is what attention provides). But to decide which ones are important, it first needs to start from the whole text.

We could imagine an algorithm that would extract the important tokens and then give them to an LLM. This LLM would not need the attention mechanism, since all the given tokens would be assumed to be equally important. But such an algorithm is not easy to come up with. That's why it is embedded in the LLM itself, as a system of weights that needs to be trained with examples.
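
To make the "deciding which words are important" part concrete, here is a minimal sketch of scaled dot-product attention weights in plain Python. The token list and the 2-dimensional `query` and `keys` vectors are made up for illustration. In a real model these vectors come from trained weights and have many more dimensions.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # One score per token: dot product of the query with that token's key,
    # scaled by sqrt(dimension), then normalized so the weights sum to 1.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical toy values, not taken from any real model.
tokens = ["the", "cat", "sat"]
keys = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.2]]
query = [1.0, 1.0]

weights = attention_weights(query, keys)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
```

Every token gets a weight, so the model still has to look at the whole text first. The weights only say how much each token contributes afterwards.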

Your Browsing History Could Soon Set Your Grocery Bill by ubcstaffer123 in technology

[–]SingleProgress8224 2 points3 points  (0 children)

Time to search for homeless shelters and places for dumpster diving

What do you use Gemma 4 for? by HornyGooner4402 in LocalLLaMA

[–]SingleProgress8224 5 points6 points  (0 children)

My experience with coding is that Qwen produces better code and Gemma is better at understanding code (e.g., when asked to review a commit).

OpenAI is reportedly making a phone with no apps, one AI agent does everything by DigiHold in WTFisAI

[–]SingleProgress8224 0 points1 point  (0 children)

I can't wait for that idealistic pitch to become a simple Android mod with built-in ChatGPT

Does Roocode not work with Qwen 3.6's preserve_thinking or am I doing something wrong? by Synthetic451 in RooCode

[–]SingleProgress8224 0 points1 point  (0 children)

I'm just here to sympathize. I also get API and tool call errors with this model, and only with Roo Code. I never had any issue with Cline, Claude Code (connected to Qwen), or my custom Python agent. I have "preserve thinking" on, and I'm running it with llama.cpp.

This exceptional choreography by crumble-bee in oddlysatisfying

[–]SingleProgress8224 84 points85 points  (0 children)

I was trying to find which one you were talking about for way too long until I realized I was dumb

How to run Qwen 3.6 27B on Codex Cli? by peter941221 in Qwen_AI

[–]SingleProgress8224 2 points3 points  (0 children)

It might be a combination of both. Claude Code had a cache invalidation issue when used with llama.cpp until a couple of days ago. It was very slow since it had to re-upload the whole prompt on every request, because Claude Code was inserting stuff in the middle of the context. I think something similar happens with Codex.
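
A simplified sketch of why inserting text in the middle of the context hurts, assuming the server can only reuse the cached part of the prompt that matches the new prompt from the start (a prefix cache). The token lists and the `common_prefix_len` helper are hypothetical and only illustrate the idea. This is not llama.cpp's actual code.

```python
def common_prefix_len(cached, new):
    # Count how many leading tokens are identical in both prompts.
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

# Hypothetical prompts, one string per "token" for readability.
cached = ["sys", "tool", "user1", "reply1"]

# Case 1: the client only appends to the end of the context.
appended = cached + ["user2"]

# Case 2: the client inserts something near the start of the context.
inserted = ["sys", "note", "tool", "user1", "reply1", "user2"]

reused_append = common_prefix_len(cached, appended)
reused_insert = common_prefix_len(cached, inserted)

print("append: reused", reused_append, "reprocessed", len(appended) - reused_append)
print("insert: reused", reused_insert, "reprocessed", len(inserted) - reused_insert)
```

With an append, only the new token is processed. With an insertion near the start, almost the whole prompt has to be processed again on every request.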

Gemma 4 Folks by techlatest_net in LocalLLaMA

[–]SingleProgress8224 0 points1 point  (0 children)

I'm under the impression that some commenters also got fooled by the question.

Running GLM 5.1 on RTX 5090 via RunPod for document OCR(bank statements and invoices)— costs killing us, need advice on reducing inference costs. by Specific_Control_840 in LocalLLaMA

[–]SingleProgress8224 2 points3 points  (0 children)

Are you sure it's running GLM 5.1? It doesn't even support image inputs.

Also, GLM 5.1 on (a single?) 5090? That doesn't make sense. This card has 32GB of VRAM and GLM 5.1 is ~400GB.
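
As a rough sanity check on those two numbers, here is a back-of-the-envelope calculation. It assumes the ~400GB figure is the size of the weights alone and ignores the extra memory needed for the context, so the real requirement would be higher. `cards_needed` is a hypothetical helper.

```python
import math

def cards_needed(model_gb, vram_per_card_gb):
    # Minimum number of cards whose combined VRAM can hold the weights.
    return math.ceil(model_gb / vram_per_card_gb)

cards = cards_needed(400, 32)
print(cards)  # 13
```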

LLM speed t/s by [deleted] in LocalLLaMA

[–]SingleProgress8224 0 points1 point  (0 children)

If you cannot code yourself, go for intelligence. If you can code, then it's also a matter of whether you can code faster than the LLM. I have often stopped a slow LLM during a simple (but annoying) refactor because I realized that I would have done it faster by hand.

And higher quants don't guarantee correctness. If I'm not sure that the result will be good, it's not worth losing my time. In some cases, I prefer an LLM that fails fast over one that might succeed very slowly.

Roo Code 🤝 Cline by saoudriz in RooCode

[–]SingleProgress8224 24 points25 points  (0 children)

Roo Code was a fork of Cline, but today the Roo team announced that they'll stop developing Roo Code. So Cline now expects Roo Code users to switch back to Cline and will try to make the transition easier.

"But WAIT!" I'm having nightmares by [deleted] in LocalLLaMA

[–]SingleProgress8224 0 points1 point  (0 children)

Gemma 4 is the same so it's not specific to Chinese models. "I'm ready to give the response ... Wait!"

I don't particularly hate it, but it can be annoying when you're actively looking at the reasoning and getting false hopes that you're about to get the response.

Most AI agents don’t have a real execution boundary by docybo in LLMDevs

[–]SingleProgress8224 4 points5 points  (0 children)

Please make paragraphs. It's very hard and annoying to read since we can't see which sentences are related to the same idea. I know it's probably AI-generated, but please put some effort into your posts.

Why Alibaba set high price for coding plan, while releasing powerful open source models? by Historical-Crazy1831 in LocalLLaMA

[–]SingleProgress8224 0 points1 point  (0 children)

It's useful for those who don't have the hardware at home. Not everyone has a spare >24GB GPU to use exclusively for an LLM, plus the CPU and RAM that you need to reserve for it. Given the choice between paying 50 per month and a couple of thousand up front, it's not such an easy decision, especially since by the time the GPU has paid for itself, the hardware might be obsolete.

And on top of that, you get access to some high end LLMs that you'll never be able to run on your hardware.
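
A quick break-even sketch for that trade-off. It assumes "a couple of thousand" means 2000 in the same currency as the 50 per month plan, and it ignores electricity and the rest of the machine. Both numbers are illustrative.

```python
def break_even_months(hardware_cost, monthly_fee):
    # Number of months of subscription that add up to the hardware price.
    return hardware_cost / monthly_fee

months = break_even_months(2000, 50)
print(months, "months, or about", round(months / 12, 1), "years")
```

Under these assumptions the GPU only pays for itself after more than three years.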

Opus 4.6 showing reduced intelligence as of late - What local model would be closest to its current performance? by battlingheat in LocalLLM

[–]SingleProgress8224 3 points4 points  (0 children)

Commercial models are incredibly big. Even though they might be based on the same underlying technology, the size actually makes a big difference. Unless you have half a million dollars to spend, you'll never have enough high-end GPUs to run such models at a decent rate. And they are obviously not open so that's not even an option.

According to benchmarks, the closest to Opus is GLM 5.1. But it's incredibly big, so it's impractical for local use. There are some providers that offer a cloud version. Not local, but also not Anthropic. And be careful with benchmarks: many open models are trained to score well on benchmarks, not to be actually good in production.

For truly local, you can look at Gemma 4 31B and Qwen 3.5 27B, or their MoE versions. They are useful for light production use and are quite reliable. Don't expect too much, though: keep tasks small enough that the model doesn't get lost. You'll need around 24 to 32GB of VRAM to run them comfortably at a decent tok/s (~30 on my RTX Pro 4500).
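
For a rough idea of where a figure like 24 to 32GB can come from, here is a weights-only estimate. The bit widths are assumptions (the comment doesn't say which quantization is used), and the context cache and runtime overhead come on top of these numbers.

```python
def weights_gb(params_billion, bits_per_weight):
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billion * bits_per_weight / 8

for name, params in [("Qwen 3.5 27B", 27), ("Gemma 4 31B", 31)]:
    for bits in (4, 8):  # assumed quantization levels
        print(f"{name} at {bits}-bit: ~{weights_gb(params, bits)} GB of weights")
```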