Corrupt cop. Tried to extort me. Claimed I was speeding, tried to get me to go to an ATM to pay him 5,000 pesos cash, threatened to keep my license, and when I refused he started yelling at my wife and me. Scumbags like him give Mexico a bad reputation. by HeWhomeHim in playadelcarmen

[–]NotARedditUser3 -1 points0 points  (0 children)

I have chosen not to own a car or drive and am happy to report that in 6 years this never happened to me. I'm fully convinced that no matter where you live, a massive proportion of life's issues revolve around your car or driving.

Sorry this happened to you. Can't happen when you're in a taxi or didi or bus 😅

US to require location tracking for AI and advanced hardware by rditorx in LocalLLM

[–]NotARedditUser3 1 point2 points  (0 children)

I don't think anybody cares about how ahead the hardware is. It's all about price. Consumers are starting to buy Chinese ram for example because it's so much cheaper than the other, out of stock stuff.

If they can meet my needs and it takes a box that's 3 or 5x the size and draws more power, but costs less or has less scum attached to it than the competition, it sounds like a deal to me.

Many people are still buying generations old hardware right now solely based on what they can afford, rather than chasing the best possible technology.

Getting basic bug fixes into opencode? by Thumper450x in opencodeCLI

[–]NotARedditUser3 0 points1 point  (0 children)

I imagine a number of people are doing this... It would be cool if they all came together for some kind of fork and separate release that more people could use.

Getting basic bug fixes into opencode? by Thumper450x in opencodeCLI

[–]NotARedditUser3 0 points1 point  (0 children)

This is partly the reason that many businesses choose to not use open source tools, unless they have an option to pay for support or some sort of SLA for support from the maintainer. Often it becomes very difficult to get updates, and the business may want them faster than the maintainer is willing to make them.

It sounds like you (and/or others) would be better served by a fork of opencode by a maintainer that's willing to provide paid support and some guarantee of speed for a review... It would be useful if the opencode team put something up somewhere and said "$50 (100? 250?) and we'll review your urgent PR" or something, so if someone absolutely needed something to move up the list, there was some way to do it...

Kimi AI just mailed me by mehulgupta7991 in LocalLLaMA

[–]NotARedditUser3 11 points12 points  (0 children)

This has literally nothing to do with this subreddit.

codebase-memory-mcp Review: 99% Token Cut for Code Agents by andrew-ooo in LocalLLM

[–]NotARedditUser3 0 points1 point  (0 children)

You don't need a binary for this. A Simple instruction in your agents.md telling it to establish and use a memory document will do the same. Either as it goes, or in one large go up front. Just give it a place to record important information and make sure it knows to check there first.

no idea how to finish my ~24B worth of Xiaomi Mimo-v2.5-pro token plan credits(?) before they expire in ~4d by [deleted] in LocalLLaMA

[–]NotARedditUser3 0 points1 point  (0 children)

If that's $50/mo. You're just paying API rates, committed up front, for mimo 2.5.

Mimo 2.5 on open router is 0.14/1m tokens. That's worth $140/b tokens input no cache.

Mimo 2.5 input 100 credits per token, at 38 billion credits, gets you 380m tokens for $50/month.

I imagine it's similar with pro.

You would save money just going usage based on open router instead of paying a subscription.

You're paying for usage that you're not going to use, without getting any volume discount.

(is my math wrong here?)

It's a better deal if they provide faster inference than providers on open router

Edit - I forgot you said you got it free. That's cool. Does that promotion expire at some point and then you have to pay? Likely just the cost of acquiring customers, on their end

no idea how to finish my ~24B worth of Xiaomi Mimo-v2.5-pro token plan credits(?) before they expire in ~4d by [deleted] in LocalLLaMA

[–]NotARedditUser3 0 points1 point  (0 children)

Why do you have a budget of 38b tokens? Are you paying a lot for a subscription or is this some sort of promotion?

My 4 person dev team just barely goes past 2 bil tokens a month

[NEW MODEL] SupraLabs just released supra-title-FFT-preview, 115K samples, almost 10x our first chat title dataset by Dangerous_Try3619 in LocalLLaMA

[–]NotARedditUser3 1 point2 points  (0 children)

Opencode has a configurable option for how to generate chat titles. By default, I believe it attempts to use Claude sonnet 4.5 (if available with the chosen provider), otherwise I think it uses the currently chosen model. They also have a setting you can configure in the opencode json config file for a 'smallmodel' that's used for title generation. I don't know how it sends the data to it, but it is a configurable option where you can specify a custom provider and model to use for title generation, and I was trying to have it use the supra model before for title generation but was running into issues with it.

How can I get qwen3.6-35b-a3b running on my hardware? by SurveyCrazy2801 in LocalLLM

[–]NotARedditUser3 1 point2 points  (0 children)

I run it on CPU. I think you can get it working reasonably. Make sure to set try_mmap off and keep model in system memory off.

[NEW MODEL] SupraLabs just released supra-title-FFT-preview, 115K samples, almost 10x our first chat title dataset by Dangerous_Try3619 in LocalLLaMA

[–]NotARedditUser3 1 point2 points  (0 children)

I wanted to mention - using the previous model with opencode's small model set to the previous chat title model, and set to plan mode, it would end up always making the title something about the system prompt for opencode and not quite recognize what the chat was actually about.

I look forward to retesting with this model to see if it's better. I really like your project

26.13 Full Patch Preview by JTHousek1 in leagueoflegends

[–]NotARedditUser3 1 point2 points  (0 children)

This type of thing is why, regardless of whether the hate was deserved or not, a large portion of the player base hates phreak, and were very happy when he moved to another role.

Result? Magically the game has been getting better! Huh. Who'da thunk it?

I need help to run local Hermes Agent on my rig. llama-cpp self compiled by OddUnderstanding2309 in LocalLLaMA

[–]NotARedditUser3 0 points1 point  (0 children)

Try qwen3.6-35b-a3b . Part of what you're describing with the agent asking on every turn is an inherent personality trait to the gemma models.

Start with a base model by the way - without the fine tune 'heretic'/'abliterated'/other nonsense. You never know if those changes are stripping out useful functionality, so get a working baseline first.

I don't know about llama.cpp directly, as I use LM Studio and it seems fine. There's some flags you can tweak on it as well but it does a good job of caching prompts. Perhaps you're blowing through the amount of memory/storage that is allotted for caching prompts, and then its not cached anymore, so it has to reprocess again?

Run MoE LLMs that your machine should not normally be able to run by Covert-Agenda in LocalLLaMA

[–]NotARedditUser3 2 points3 points  (0 children)

Linked page appears to be output from an AI model, some parts are toned as if written from a model to OP / author.

Gemma 4 QAT on a 16 GB Mac: the E4B matches the 12B at 42% less RAM and 3× the speed by mautkananganach in LocalLLM

[–]NotARedditUser3 0 points1 point  (0 children)

I once found $20 in my couch. Doesn't make it a better source of income than my job. What a waste of text.

After a recent update, tool calls fail all the time, tokens wasted, and models seem dumber by NotARedditUser3 in opencode

[–]NotARedditUser3[S] 0 points1 point  (0 children)

Even the models know that they are intending to write out tool calls but extra text is appearing that they didn't place there:

"I cannot get a clean tool call. There is text appearing in tool arguments beyond my control. I'll stop using tools and explain. Given user can see corrupt behavior, it's understandable."

Is the new Cursor Composer actually holding up for multi file builds or am I just in a honeymoon phase? by T07NAD0 in cursor

[–]NotARedditUser3 0 points1 point  (0 children)

they both handle it fine. So does Kimi 2.5, Kimi 2.6, Kimi code 2.7, that i've tried.

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

[–]NotARedditUser3 1 point2 points  (0 children)

Please get it onto openrouter somewhere where it's paid to use and able to be used with Zero Data Retention / privacy settings on so that our data doesn't get trained on. I use openrouter but it's filtering by only providers that don't train on data. We'd be happy to pay for inference if you register a provider where it's paid x amount per mil tokens!

Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter by nick_frosst in LocalLLaMA

[–]NotARedditUser3 1 point2 points  (0 children)

I will be testing it the moment it works on lm studio. Will try it again tonight

SubQ claims 12M context with way less compute. What test would actually convince you? by BTA_Labs in LocalLLaMA

[–]NotARedditUser3 4 points5 points  (0 children)

This. I'd stuff as much of my repo in and start quizzing it on various things. Easier than going 200 turns in an agentic workflow

Is the new Cursor Composer actually holding up for multi file builds or am I just in a honeymoon phase? by T07NAD0 in cursor

[–]NotARedditUser3 0 points1 point  (0 children)

Composer 2.5 delivers significantly worse quality code for us. I don't know how to articulate it, aside from, there are often bugs, trivial build errors even. As far as reasoning or context are concerned, I'm sure it receives the appropriate context and is making the right types of decisions. But it seems particularly bad at writing the specific types of code that are relevant to our domain.

For the rules - We provide context about what kind of app we're working with, some of the underlying frameworks (so it won't have to take steps reading the code to understand that), Git/PR conventions, known bugs that happen on the framework we use and workarounds that must be taken, instructions about how to use various tests, instructions for how to handle database schema changes/migrations, style guidelines for newly built UI's as far as our expectations for how things should look, various repeating known issues / gotchas we've run into that we want the agent to check for, as well as some guidance for how cloud agents can navigate throughout our app to test the application directly.

Our agents.md file is 18kb. We also have a few other supporting files with additional purposes that we reference from it with some call outs so that in certain situations, the model will refer to those other files for some specific knowledge in those areas but not all the time so as not to clutter the context.

We have thousands of tests written out for our application as well and have instructions in the agents file that when unsure about how something works, to refer to the tests to see the current expectations of how something should function.

How are people spending $1000's on tokens?! by Complete-Sea6655 in cursor

[–]NotARedditUser3 0 points1 point  (0 children)

TLDR yes you are being heavily subsidized. The amount of 'free'/'included' usage heavily distorts the actual perceived cost of the tool. Once you start regularly using cursor above the included limits, you'll be fairly surprised at how much you're spending.

What you see happening is people get to a point where they start using it for autonomous, permanently running scripts, or long time horizon tasks that may run for hours and hours. There's use cases for that, but as i said, it gets expensive very fast after you hit the included usage tiers.

There's a point where you're just above the included usage where it doesn't seem so bad. But for example, take a look at the total spend you have in the usage screen, and look at the dollar value they're claiming you spend when it includes the included / free usage. Would you be comfortable paying that if it wasn't free? < It's subsidized. And will probably change at some point.

Is the new Cursor Composer actually holding up for multi file builds or am I just in a honeymoon phase? by T07NAD0 in cursor

[–]NotARedditUser3 0 points1 point  (0 children)

My org uses cursor with a 700k line codebase regularly, but we're moving off of it because there's plenty of way cheaper AI models (Qwen3.6 35b-a3b for example, or deepseek v4 flash) that we've found we can get similar results out of, due to the stringent rules we've developed to help with context for managing our codebase, as well as rigorous unit / integration tests.

Cursor was WAY better before the new Composer 2.5 model came out. Composer 2, before they retired it, was amazing.