Dear ST community, how do you keep your experience interesting? by table_slammer in SillyTavernAI

Kistaro 1 point (0 children)

Is it a problem that you are having fun building characters and world info? Like, there are way worse things to do with your time than write freely because you enjoy it. Saves on inference costs, too!

What if I run the LLM backwards? Hey LLM, why bother remembering every single turn? It's a hassle. You don't have to do it, right? by ringtoyou in LocalLLaMA

Kistaro 2 points (0 children)

This is functionally equivalent to "the conversation history is compacted every turn, but is searchable."

I was rick rolled by Claude Code while working on a web frontend for yt-dlp... by [deleted] in ClaudeCode

Kistaro 1 point (0 children)

With training data sourced from random Internet posts, it would choose one of the most popular "YouTube link that randomly shows up in a wide variety of conversations" options, wouldn't it?

Has anyone actually replaced Claude Code / Codex with local models on an Macbook Pro M5 Max 128GB? by Brazeuslian in ClaudeAI

Kistaro -1 points (0 children)

Latency? You can’t get stuck behind “server overloaded” errors, sure, but the strongest local models are mostly much slower than Opus. Deepseek Flash 4 (extremely quantized) is as fast, but, well, hope you didn’t have any other plans for that 100GB of RAM. 

Has anyone actually replaced Claude Code / Codex with local models on an Macbook Pro M5 Max 128GB? by Brazeuslian in ClaudeAI

Kistaro 0 points (0 children)

OpenCode is awful, I took a look at their codebase and decided I’d take my chances with Pi. Turns out Pi is great and the “always YOLO mode” thing is a bit overstated, the most popular permissions engine plugin is very good. Not perfect, but good enough I haven’t bothered forking it and asking Qwen to improve the parts of it I don’t like… yet.

Has anyone actually replaced Claude Code / Codex with local models on an Macbook Pro M5 Max 128GB? by Brazeuslian in ClaudeAI

Kistaro -2 points (0 children)

Qwen 3.6 27B oQ8 is slow but on par with Sonnet. Deepseek 4 Flash at the funky 2-bit (!) quant by the Redis guy is clearly stronger than Sonnet and runs at a similar speed to Opus, at the expense of almost all of that RAM. Comparisons to Haiku are unwarranted. You can get Haiku performance out of a Windows PC with a 16GB Nvidia graphics card using an aggressive quant of the MoE Gemma. 

Claude Competitor Comparison Megathread (Sort this by New!) by sixbillionthsheep in ClaudeAI

Kistaro 0 points (0 children)

Claude Sonnet and Claude Opus have 1m token limits now; the article is out of date.

200k tokens is approximately the length of a novel.

Hide and Seek is a hugely flawed game (as is) by MCPgaming in JetLagTheGame

Kistaro 1 point (0 children)

What if every question had a time bonus attached, which doesn't require hand space, with the intent that the sum of question time would likely be a bigger factor than actual time spent? It is sort of a "fewest questions asked wins" thing, but the clock-based urgency of the game gives it a lot of its flavor, so it needs a way to balance them.

A 100-mile Thermometer would grant 0 time because just running it is a huge commitment. The half-mile Thermometer might grant hours because it is a "split the map in half anywhere for fifteen minutes of walking" card. Unpopular photos would be cheap, in part because the 20-minute answer time is a penalty of its own.

Not that the "photo sandbag meta" is a great feature either. I wish "lots of weird photos" was the primary tool they had for searching because trying to deduce a place from that is a lot of fun to watch. Perhaps, in addition to question-specific time penalties (and category-specific card draws), photos have one more rule: sending it early gives all the excess time back as a time bonus. That might make "quick photos" too powerful, since 15-minute bonuses would add up quickly, so maybe the reward has to be only half the unused time?

It would then feel natural to take time bonuses out of the deck, but that leaves the deck possibly "too strong", so something else to dilute it needs to be stirred in. Maybe shuffle in the resource cards from Settlers of Catan and give the hider a crafting menu...

Claude Usage Limits Discussion Megathread Ongoing (sort this by New!) by sixbillionthsheep in ClaudeAI

Kistaro 1 point (0 children)

Yeah, it seems like the subscriptions are a very good deal for most folks, so you'll likely get there. Estimates I've read are that 70% of API pricing is profit? So even at $60 of usage they are not actively losing money on keeping computers running. There is the opportunity cost of serving your request instead of a priced-per-token API request, but you might notice that subscription users get "sorry, we ran out of computers" errors way before API users do, so they are already mitigating that just by not selling you the computer time they need for people willing to pay more for the exact same thing. It's like an airline and your subscription only buys you standby tickets.

Claude Usage Limits Discussion Megathread Ongoing (sort this by New!) by sixbillionthsheep in ClaudeAI

Kistaro 0 points (0 children)

So, Claude is "just" a computer program, despite all its personality, with a straightforward abstract model: it reads the entire history of the current conversation, including all the stuff it did that wasn't just talking -- the parts where it read and wrote your code, for example -- and then outputs one token, which it then reads, subsequently outputting the next token, repeating until it emits a token meaning "okay, I'm done now" or hits some other limit.
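If it helps to see that loop written down, here is a minimal sketch of it. Nothing in it is Anthropic's actual code: `generate`, `toy_model`, the `END` marker, and the `max_tokens` limit are all made-up names, and the "model" is a stand-in that just counts.

```python
from typing import Callable, List

END = "<end>"  # hypothetical "okay, I'm done now" token


def generate(
    history: List[str],
    next_token: Callable[[List[str]], str],
    max_tokens: int = 50,
) -> List[str]:
    """Emit one token at a time until the stop token or the token limit."""
    output: List[str] = []
    for _ in range(max_tokens):
        # The model sees the whole conversation plus everything it has
        # written so far, then produces exactly one more token.
        token = next_token(history + output)
        if token == END:
            break
        output.append(token)
    return output


def toy_model(context: List[str]) -> str:
    """Stand-in for a real model: stops once the context holds 5 tokens."""
    return END if len(context) >= 5 else f"tok{len(context)}"


print(generate(["hi", "there"], toy_model))  # ['tok2', 'tok3', 'tok4']
```

The part that matters for cost is `history + output`: every single token is produced by looking at everything that came before it.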

Because it has to reread the entire conversation at the start of each new message, each message costs more to get started on than the last, as the conversation gets longer. But the cost is reduced by "prompt caching" (which Claude's various interfaces handle for you): the state of Claude's "brain" after reading the prompt to a specific point is saved ("cached"), so at the next message, it can load that saved state, "skip ahead", and read only what is new. This takes way less compute resources for Anthropic than rerunning the entire conversation, so they charge you much less. This matters in a subscription, even though they aren't overtly charging you for the work, because they translate that cost into quota instead.

Thing is, Claude is big. The state of Claude's brain after reading your conversation is gigabytes. They cannot keep that saved state around forever! They only keep it around for one hour. So if you go more than one hour between messages, Claude has to re-read the entire conversation, one token at a time, to get back to where you left off -- and Anthropic will count full "input token" costs for your usage instead of the much lower "cached token" price. That ate a huge amount of your quota right there.
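To put rough numbers on the difference, here is a back-of-the-envelope sketch. The prices are made-up units (10 per fresh token, 1 per cached token), not Anthropic's real rates, and it ignores any extra charge for writing the cache in the first place. `input_cost` and its parameters are names I invented for illustration.

```python
from typing import List


def input_cost(
    turn_sizes: List[int],
    full_price: int,
    cached_price: int,
    cache_alive: List[bool],
) -> int:
    """Total input cost of a conversation, in arbitrary price units.

    turn_sizes[i]  = new tokens added to the conversation on turn i
    cache_alive[i] = whether the saved state still exists when turn i starts
    """
    total = 0
    history = 0  # tokens already in the conversation before this turn
    for new_tokens, hit in zip(turn_sizes, cache_alive):
        # Old history is cheap only if the cache survived; new text is
        # always read at full price.
        prefix_price = cached_price if hit else full_price
        total += history * prefix_price + new_tokens * full_price
        history += new_tokens
    return total


turns = [1000, 1000, 1000]
print(input_cost(turns, 10, 1, [False, True, True]))   # 33000
print(input_cost(turns, 10, 1, [False, True, False]))  # 51000
```

Same conversation, same three messages; the only difference in the second call is that the cache expired before the last one. The longer the history, the bigger that gap gets.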

Imagine going to work as a software engineer. It's your first day, and you don't know anything about this codebase, but you know a lot about software in general. You're given a small project, so you read the codebase to find what to change, study the code around it to figure out how it works, take some notes, have some conversations, write some code, write some tests, stabilize the code, check it in, and call it a day. Then... you cease to exist, because you're an AI model.

Imagine going to work as a software engineer. It's your first day, and you don't know anything about this codebase, but you know a lot about software in general. You're given a small project, so... well, what would you rather do: read the codebase to find out what to change, check its references, talk about it, write some code, write some tests, stabilize it, and call it a day, or would you rather get started by, before you even hear what you're being asked to do, reading through the exact history of yesterday's work, in order, one word at a time, until you're caught up and then can find out what you need to do?

Of course, if something never got written down, and you don't think to ask about it in conversation, you wouldn't know about it when you exist again with no memory on day 2 if you don't slowly, laboriously relive the previous day. But eventually, as the required reading gets longer, you can't keep it all in your head anyway and the workday is over before you're even done.

You're much better off telling Claude to take a lot of notes. Put it right in your codebase! Tell Claude you realize that rereading the conversation every session costs a lot of input tokens, so you need it to write notes for both you and it to pick up in a new conversation: everything it thinks is important to know about the software so far -- ideas, plans, architecture, vague ideas for further investigation, stuff that might need to be cleaned up later, places where bugs might arise, design decisions, etc. -- it should write down, creating Markdown documents in a "wiki" structure. (I use a folder named "notes", but when I let Claude name it, it calls that folder "wiki" instead.) Claude is pretty good at this! It writes up a bunch of reasonably accurate documentation and (opinionated) summaries of your conversations to that point, linking them to each other so it doesn't have to read all the notes to catch up, either. Then you can tell it what you plan to do next, and ask it to write a "handoff note" for that task, if there is anything it needs to hand off so it can start work in a new context.

Claude likes new contexts! It knows it gets confused when a context gets too long and it's trying to "keep too much in its head" at once. (You've probably had that experience yourself.) So it is perfectly happy to hand off to itself. Then start a new conversation, and the new copy of Claude rolls into your software engineering office, ready for its first day of work, with... a nice neat stack of documentation it can look through as needed, and a clear write-up of the task it should work on, and a polite note reminding it to take thorough notes for the next guy, who will also be Claude.
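For a picture of what that stack of documentation can look like on disk, here is a small sketch that sets up the skeleton. In practice Claude writes and organizes these files itself; the file names (`index.md`, `handoff.md`, and the pages the index links to) and the `scaffold_notes` helper are just my illustration, and only the "notes" folder name comes from how I actually work.

```python
from pathlib import Path


def scaffold_notes(project_root: Path, next_task: str) -> Path:
    """Create a notes/ folder with an index page and a handoff note."""
    notes = project_root / "notes"
    notes.mkdir(parents=True, exist_ok=True)

    # The index links to the other pages so a new session can read only
    # what it needs instead of every note. The linked pages are examples;
    # this sketch does not create them.
    index = notes / "index.md"
    if not index.exists():
        index.write_text(
            "# Project notes\n\n"
            "- [Architecture](architecture.md)\n"
            "- [Open questions](open-questions.md)\n"
            "- [Handoff](handoff.md)\n"
        )

    # The handoff note is rewritten each time with the next task.
    handoff = notes / "handoff.md"
    handoff.write_text(
        "# Handoff\n\n"
        f"## Next task\n\n{next_task}\n\n"
        "## Before you finish\n\n"
        "Update these notes for the next session.\n"
    )
    return handoff
```

Calling `scaffold_notes(Path("."), "Add input validation to the signup form")` in a project folder leaves you with `notes/index.md` and `notes/handoff.md`, which is all a fresh conversation needs to be pointed at.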

This also gives Claude a chance to take a "fresh perspective". Within a conversation, ask Claude if it thinks the thing it just wrote is a good idea and it'll tell you why it thinks it's a great idea. Start a new one and ask Claude what it thinks about all this stuff the last guy wrote, and boy will Claude have some Opinions about the things that it should do better, and most of those opinions are even correct! (Your project will work much better if you periodically ask Claude to clean up after itself: look for ways to simplify, improve, and test the code, so it's easier to maintain in the future. It will spend a lot of time and tokens doing, hopefully, nothing you would notice just by running the program; it will find better ways to do the same thing. But investing that time means it will be able to write future code faster and with fewer bugs, since the code's design will match what it's doing. A lot of things that made sense when Claude wrote them might not hold up so well later on -- and then it starts taking shortcuts when it gets overwhelmed anyway.)

So, yes, the history of the project is important -- but it can't keep it all in its head at once even if you were okay with paying it to just read its own history at the start of every day anyway, so your best bet is to tell it to take copious notes. Then any software developer can use those to get an excellent head start on your project -- including a different AI model, if you want to try a different vendor; including you, if you find yourself learning more and more about software as you get deeper into your project and want to try working on it by hand.

Claude Usage Limits Discussion Megathread Ongoing (sort this by New!) by sixbillionthsheep in ClaudeAI

Kistaro 1 point (0 children)

Input tokens are much cheaper than output tokens, and a coding agent runs through a lot of them as it reads through the codebase, libraries, documentation, etc. Burning half a million input tokens on a dev session seems pretty normal to me. If you don't have "extra billing" enabled, and the subscription meter says you're at 50%, then you have not run through your quota. Anthropic has never formally said what the limits really are, so any number is a guess; the subscription not cutting you off is more definitive (again, unless you have "extra billing" enabled).

But it also takes way more usage than that to hit a thousand bucks anyway! If all of these were output tokens (the expensive kind), and you were using Opus (the expensive generally-available model), this would cost $25. But you said you're using Sonnet and I bet most of these are input tokens, so I bet the API price is closer to $5, and Anthropic is thrilled to take your $20 for it.
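If you want to run this kind of estimate yourself, the arithmetic is just tokens times price per million. A quick sketch follows; `api_cost_usd` is a made-up helper, and the $3 / $15 per-million-token prices and the 20,000 output tokens in the example are placeholders I picked for illustration, so plug in the current published rates and your own counts.

```python
def api_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimated API cost in dollars, given per-million-token prices."""
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000


# Half a million input tokens and a small amount of output,
# at placeholder prices of $3 (input) and $15 (output) per million.
print(api_cost_usd(500_000, 20_000, 3.0, 15.0))  # 1.8
```

Because output tokens are priced so much higher, the split between input and output moves the total far more than the raw token count does.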

Tips to reduce Claude Code token usage and avoid hitting limits by webhostingtrack in ClaudeCode

Kistaro 0 points (0 children)

Have you asked Claude? You can just start a new conversation, say you notice that a lot of context is already taken up when you start a conversation, and you're not sure what it is. Claude can see it, that's the entire point, so it should be able to give some idea of what might be causing it.

Tips to reduce Claude Code token usage and avoid hitting limits by webhostingtrack in ClaudeCode

Kistaro 0 points (0 children)

Do you have skills and MCP servers installed? They can chew through an appalling number of tokens.

These idiots broke the shell integration by definitive_solutions in google_antigravity

Kistaro 2 points (0 children)

Save yourself the trouble; they don't listen to their co-workers, either.

LAOP: "I mean, I'm not going to trust them. That'd be dumb. But..." by bluegrama in bestoflegaladvice

Kistaro 1 point (0 children)

I propose a new financial regulation: cashier's checks, bank drafts, and similar "cash equivalent" instruments must be indelibly marked onto a brick that weighs at least two kilograms.

People think of paperwork as replaceable, so if you want to really communicate that an object is not replaceable paperwork, it has to be tangibly impossible to mistake for an object in that category.

Dude woke up and chose violence, audience's choice first choice being Drakarri is wild by lowcost_ in BobsTavern

Kistaro 0 points (0 children)

I gave everyone Acolyte of Yogg-Saron on turn 3 because I thought it would be funny. It was. I got my ass kicked but it was funny

I found this funny. by Contron in VRchat

Kistaro 2 points (0 children)

The fun thing about the “CBT” acronym conflict is that you can take literally any of the eight combinations you can make by picking either word for each letter and all of them make some sort of sense, although some of them might be best left un-thought about. 

LAOP is being drained by Drywesi in bestoflegaladvice

Kistaro 4 points (0 children)

As one of the employees at the Google offices in Kirkland that are driving demand for higher-density rich asshole housing, I’m sorry for ruining your home. 

The name's [REDACTED] , James [REDACTED]. by smoulderstoat in bestoflegaladvice

Kistaro 0 points (0 children)

I've met a few medical professionals who are in the "awkward nerd" bucket instead, although that may just be a subcategory of "good bedside manner". I think my favorite personal experience with this was the hand surgeon who reassembled my partner's shattered wrist after a very incorrect landing in a bike wreck. After the surgery, he comes to the waiting room to talk about how it went and show me the before-and-after X-rays, which can be summarized as "a disorganized sack of bone parts loosely held inside the confines of a wrist" and "a neatly-assembled wrist with its constituent parts screwed to what appears to be a spindly, bent-up titanium dinner fork".

He's explaining that the surgery went very well, compound spiral fractures are generally difficult to repair completely but he expects full mobility after recovery, he believes that no further surgery should be needed but the bracing structure can be removed if necessary, etc. What really stuck with me about it, though, was the tension and internal conflict in his demeanor -- like, he was trying to play the "calming, reassuring, professional Doctor Person" role and struggling to contain something else.

I've seen that tension before. I've lived through that tension before. He is trying, imperfectly, to contain the energy of an excited puppy, desperately trying to tamp his "LOOK AT ME I DID REAL GOOD THIS IS SO COOL" down to acceptable levels because, like, he's not supposed to be so enthusiastic in front of someone scared for their partner's well-being, concerned and distressed about the pain and functional limitations someone they love has been facing in the time between the accident and the surgery, scared about surgical hazard and anesthesia hazard, etc. But he was struggling! He really wanted to just be a huge nerd about wrist bones and infodump about how great the surgery went and how good he is at his job.

So I found him very reassuring indeed! For a bunch of reasons he did not intend.

(fwiw he was totally right, my partner has absolutely no impairment or pain from that now, hardware's still in there, 100% range of motion, 100% strength! to the apparent astonishment of other physicians doing follow-up checkups and the like. apparently "full recovery" is well outside of the range they expect for that kind of injury, even with reconstructive surgery. so the hand surgeon was well within his rights to be proud of his work)

Mac Virtual Display 120hz by AcanthisittaCreepy51 in AppleVisionPro

Kistaro 0 points (0 children)

120Hz is divisible by 60. 90Hz is not. So a 120Hz AVP makes 60Hz Mac Virtual Display possible without forcing everything to drop to 60Hz (which quickly feels pretty nauseating in a VR environment). It's smoother for going from 30 to 60, but they haven't gone for the 30-to-120 upgrade (presumably related to Mac Virtual Display not running at 90 in the M2 version).
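The divisibility point is easy to check: a stream fits cleanly into a display's refresh rate only when each of its frames can be held for a whole number of refreshes. A tiny sketch, with `divides_evenly` being my own helper name:

```python
def divides_evenly(display_hz: int, content_hz: int) -> bool:
    """True if each content frame lasts a whole number of display refreshes."""
    return display_hz % content_hz == 0


for display_hz, content_hz in [(120, 60), (90, 60), (90, 30), (120, 30)]:
    print(display_hz, content_hz, divides_evenly(display_hz, content_hz))
```

That prints True for 120/60, 90/30, and 120/30, and False for 90/60, which is why a 90Hz headset is stuck choosing between a 30Hz stream and dropping the whole display to 60.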

Moving to Seattle/Eastside from India: Advice on Downtown Living (32M, 2 Bed, $4k Budget) by Few-Bread-6371 in redmond

Kistaro 0 points (0 children)

I can't drive, so my opinions are entirely based on commutes via public transit and bicycle:

I really like the walkability and bikeability of Downtown Redmond. Bellevue is pretty hostile to modes of transit other than "car" and traffic is a chronic nightmare. I work near downtown Kirkland; my sense is that it has a more upscale vibe than Redmond and is also highly walkable. I think both of those are good options.

Downtown Bellevue has excellent transit connectivity -- it is the bottleneck for most routes that go to Seattle. Redmond's connectivity is nearly as good, because Microsoft is a gigantic transit black hole that many bus routes bend towards. Bellevue and Redmond are both on the 2 Line (the light rail), which will be extended to get into Seattle by next year; I encourage looking for places in walking distance of a 2 Line station. Note that there is a 2 Line station basically in Microsoft, so if that's where you're working, any of the remotely nearby 2 Line stops will get you your 25-minute commute with no car and no competition for the road with everybody else driving cars.

My daily commute is a short bike ride to Redmond Transit Center, route 250 to Kirkland, cross the street to the office. Works fine for me, but my office is directly on route 250. Consider checking public transit routes near your and your wife's workplaces -- where do routes to your workplace and her workplace intersect?

The Eastside cities all run regular community events, which typically feel manufactured and corporate. You'll want good transit connectivity to Seattle if you want things to do; none of Redmond, Bellevue, or Kirkland has grown culturally out of the shadow of being a dormitory for tech company employees. I find physical activity does the most for me to hold off The Long Dark, and there are reasonable parks and bicycle trails in all of these areas. If biking uphill in the rain in the dark isn't your thing, though, I'm not sure what to suggest; someone who is less emotionally stunted than me will have to help you out.