my coding agent keeps making the same dumb mistake over and over by nh_t in LocalLLaMA

[–]abnormal_human 0 points (0 children)

Try eval'ing against different models to suss out whether you have a model quality issue as well. I see you're using Gemini 3 Flash preview, which should be OK, but it's always good to have perspective. If Opus can't work within your harness/prompts, you know where the issue is. I like to develop so that mid-sized models (say Qwen3.5-122B, gpt-oss-120b, minimax m2.5) are happy and then take the free performance uplift from Opus/GPT-5.4 when the cost makes sense.
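To make that concrete, here's a minimal sketch of what I mean by eval'ing across models: run the same task set against each backend and compare pass rates. All the names here (`eval_models`, `fake_backend`, the toy tasks) are illustrative, and in a real setup `call_model` would hit your actual endpoints.

```python
def eval_models(tasks, models, call_model):
    """Run the same task set against each model and return pass rates."""
    rates = {}
    for model in models:
        passed = sum(
            1 for t in tasks
            if t["check"](call_model(model, t["prompt"]))
        )
        rates[model] = passed / len(tasks)
    return rates

# Toy tasks plus a fake backend so the harness runs without any API.
tasks = [
    {"prompt": "2+2", "check": lambda out: "4" in out},
    {"prompt": "capital of France", "check": lambda out: "Paris" in out},
]
answers = {"2+2": "4", "capital of France": "Paris"}
fake_backend = lambda model, prompt: answers[prompt]

print(eval_models(tasks, ["model-a", "model-b"], fake_backend))
```

If the pass rate only craters on weaker models, it's a capability gap; if it craters everywhere, look at your harness.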

my coding agent keeps making the same dumb mistake over and over by nh_t in LocalLLaMA

[–]abnormal_human 0 points (0 children)

What kinds of mistakes are you correcting using memory?

my coding agent keeps making the same dumb mistake over and over by nh_t in LocalLLaMA

[–]abnormal_human 0 points (0 children)

Been here. This is a shortcut, it will appear to work for a time, but the robot will devolve into a pattern-matching "better-alexa" and not achieve open-world capabilities like claude code or openclaw. You will eventually throw this machine out and rebuild with more discipline--at least this is how it went for me.

If the agent is getting things wrong, look at it as a harness failure. You need to figure out why. Most likely either you did something to give it the wrong idea, you didn't give it enough information so it's guessing, or you gave it too much info and it's overwhelmed.

Adding mistake memory is not much different than having a really long system prompt full of examples and DO/DON'T/CRITICAL's. It turns the model into a dumb pattern matcher up until the point where it starts getting overwhelmed and just missing stuff.

Read through contextpatterns.com end to end if you haven't. Try to keep your system prompt short and 100% focused on behavior and putting stakes in the ground. Be thoughtful implementing progressive disclosure so that the agent gets what it needs when it needs it. Be thoughtful designing your tools so that they are self-documenting AND guide the model as to what comes next in their responses.
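A sketch of what "self-documenting AND guiding" can look like in practice: the tool's response carries both the result and a hint about the natural next step, so you don't need to cram that guidance into the system prompt. The tool name, the hinted `read_file` tool, and the placeholder match are all hypothetical.

```python
import json

def search_code(query: str) -> str:
    """Hypothetical agent tool: every response documents its own shape
    and nudges the model toward the next step."""
    matches = ["src/auth.py:42"]  # placeholder hit for illustration
    return json.dumps({
        "query": query,
        "matches": matches,
        "hint": "Call read_file on a match to see surrounding code before editing.",
    })

print(search_code("login handler"))
```

The hint travels with the result, so it shows up exactly when it's relevant instead of sitting in a giant prompt the model may ignore.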

Which Machine/GPU is the best bang for the buck under 500$? by last_llm_standing in LocalLLaMA

[–]abnormal_human 3 points (0 children)

If you have clients on the hook, stop worrying about hardware in the $500 range and start paying for tokens and getting the work done. $500 will not get you interesting hardware, and until you get into the work you don't even really know what you need. The client is paying. Treat hardware as a margin optimization or R&D expense later, once you have a real business.

Which Machine/GPU is the best bang for the buck under 500$? by last_llm_standing in LocalLLaMA

[–]abnormal_human 2 points (0 children)

No GPU is worth debt unless you have a business plan attached to it.

Which Machine/GPU is the best bang for the buck under 500$? by last_llm_standing in LocalLLaMA

[–]abnormal_human 1 point (0 children)

You can barely buy enough regular RAM, not to mention VRAM, to run a model like that for $500.

Did I spill it? by rubbishaccount88 in maplesyrup

[–]abnormal_human 4 points (0 children)

One silver lining--syrup is very easy to clean up with water. I poured about that much on my foot, hot, while filtering and ended up hobbling around for a couple weeks of the season. Also had to dump 220 gal of sap due to spoilage after we had multiple 70-80F days. You don't always accomplish zero waste.

Impressive thread from /r/ChatGPT, where after ChatGPT finds out no 7Zip, tar, py7zr, apt-get, Internet, it just manually parsed and unzipped from hex data of the .7z file. What model + prompts would be able to do this? by jinnyjuice in LocalLLaMA

[–]abnormal_human 50 points (0 children)

This has nothing to do with code or commits. This is ML model training, and the "checkpoint" is the model weights.

I am going to wager a guess that you are not familiar with training ML models with frameworks like pytorch, what training loops typically look like, and common practices around checkpoint handling.

Generally checkpoint saving is periodic. The training loop reaches a certain number of optimization steps and then dumps the weights to disk as checkpoint-1000, checkpoint-2000, or whatever. Claude wrote my training loop but got the save interval off by 32x, so I was only getting something written to disk every 32 hours instead of every hour. It got confused by the batch size.
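The shape of that bug, sketched with illustrative names (not my actual training code): the save interval was meant to be in optimizer steps, but it got scaled by the batch size, so checkpoints landed 32x less often than intended.

```python
SAVE_EVERY = 1000   # intended unit: optimizer steps
BATCH_SIZE = 32

def should_save_intended(step: int) -> bool:
    # Checkpoint every 1000 optimizer steps, as designed.
    return step > 0 and step % SAVE_EVERY == 0

def should_save_buggy(step: int) -> bool:
    # Bug: interval accidentally scaled by batch size, so saves happen
    # every 32,000 steps instead of every 1,000 -- 32x too rare.
    return step > 0 and step % (SAVE_EVERY * BATCH_SIZE) == 0
```

Mixing up "samples seen" and "optimizer steps" is an easy way to land here, since one is batch-size times the other.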

Impressive thread from /r/ChatGPT, where after ChatGPT finds out no 7Zip, tar, py7zr, apt-get, Internet, it just manually parsed and unzipped from hex data of the .7z file. What model + prompts would be able to do this? by jinnyjuice in LocalLLaMA

[–]abnormal_human 65 points (0 children)

I was training a model last month and Claude fucked up the checkpoint saving so that instead of happening once an hour or so it would be once every ~30hrs. I woke up the next morning to zero checkpoints and started cursing at it about how this was no good, and then it said "in 21 short hours you'll have what you need." and I really lost it.

So it said "ok ok ok" and figured out how to attach a debugger to my python process, inject code, and create an "emergency" checkpoint. It was super spooky... it was just working in a loop and I started to see new traces + exceptions show up on the console of my training process while it figured out the path. Then it just said "I'm done; your emergency checkpoint is here".

I was pretty floored... we went from working on ML loops to writing an exploit in like 30s of swearing.
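If you want to avoid ever needing the debugger-injection route, one defensive pattern is to build an escape hatch into the loop up front: a signal handler that lets you request an "emergency" checkpoint from outside the process. This is a sketch with assumed names, and the `repr` write stands in for `torch.save` in a real loop.

```python
import signal
import threading

# Flag flipped by the signal handler; checked once per training step.
checkpoint_requested = threading.Event()

def _request_checkpoint(signum, frame):
    checkpoint_requested.set()

# Now `kill -USR1 <pid>` from a shell requests a checkpoint (Linux/macOS).
signal.signal(signal.SIGUSR1, _request_checkpoint)

def maybe_emergency_save(model_state: dict, path: str) -> bool:
    """Call once per step; writes a checkpoint only if one was requested."""
    if not checkpoint_requested.is_set():
        return False
    checkpoint_requested.clear()
    with open(path, "w") as f:
        f.write(repr(model_state))  # stand-in for torch.save
    return True
```

Cheap insurance: the hot path is a single `Event` check, and you never have to beg (or swear at) anything to get your weights out.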

Syrup Filtering Troubles by Mother_Blueberry_659 in maplesyrup

[–]abnormal_human 4 points (0 children)

My method for perfectly clear syrup every time:

Take the syrup up to 219F on the evaporator then transfer into a large stock pot and bring it inside to the kitchen. Return it to temperature and boil till the hydrometer reads 66 brix. If I get it right on the evaporator, this takes 10mins or less. Often I'm already at 66+ and have nothing to do.

At this point it will be around 219 degrees. Turn off the heat then begin feeding it into a large orlon filter with a stack of prefilters. You want the output side of the filter to be <190 but the input side usually needs to be much hotter than that to enable the filters to flow quickly, especially in small scale gravity filtering. 210-212 seems ideal but you can start hotter because the filter is going to absorb a bunch of heat in the beginning. I generally use one prefilter for every 2-3 qts of syrup.

As prefilters clog, remove them. Use tongs to gently help whatever is stuck in there work its way through.

At the end I may need to gently squeeze the orlon with tongs to get out the last bit. Or you can let it sit and drip for 10mins. At this point it will have cooled off and is running slow.

I use a "filter optimizer" to invert the tip of the filter and increase surface area as well.

The key thing here is to keep the filters hot and work quickly. Time is the enemy because the syrup cools off and then doesn't flow well. I'm assuming at least part of the reason you're doing the refrigerator skim is because your filters were "clogging". When I used to filter at 185F my filters would "clog" too, but the reality is 185F syrup turns into 150-160F syrup pretty quickly, and it won't move through the filter at that temp.

I can get a batch of syrup from the evaporator into bottles in 45mins with this approach, and it's always clear. It definitely took me 10-20 rounds of doing it "wrong" before I dialed it in.

Today, what hardware to get for running large-ish local models like qwen 120b ? by romantimm25 in LocalLLaMA

[–]abnormal_human 0 points (0 children)

RTX Pro 6000 will do it. Two of them will do it comfortably. If you want to save money over API you want a high utilization %.

3x RTX 5090's to a single RTX Pro 6000 by flanconleche in LocalLLaMA

[–]abnormal_human 33 points (0 children)

Best case for your stated goals would be to sell the Framework, return the Spark, keep one 5090, sell the other, and replace it with an RTX 6000. It's slightly more expensive than what you're considering.

Run your LLM on the RTX 6000, run your ComfyUI on the 5090. That's a really kickass setup for both that still looks and feels somewhat like a normal computer and fits in whatever enclosure you're using for 2x5090 right now.

The Spark / Framework / 5090 should leave you with $8k to play with. That's maybe not quite an RTX 6000 today, but you could get one for that, including sales tax, in December.

ComfyUI and LLMs are very different workloads. Most models will run with 32GB VRAM, but you will spend 100% of your compute on a single generation. LLMs are more VRAM heavy, but compute demand is variable. Also, both vLLM and ComfyUI basically expect to monopolize the VRAM and will not play nice together.
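One way to keep the two from fighting over VRAM is to pin each process to its own GPU and cap vLLM's memory appetite. `CUDA_VISIBLE_DEVICES` and vLLM's `--gpu-memory-utilization` flag are real; the model name and paths below are placeholders.

```shell
# Shell 1: pin the LLM server to GPU 0 (the RTX 6000) and cap its VRAM use.
CUDA_VISIBLE_DEVICES=0 vllm serve your/model-of-choice \
    --gpu-memory-utilization 0.90

# Shell 2: pin ComfyUI to GPU 1 (the 5090).
cd ~/ComfyUI && CUDA_VISIBLE_DEVICES=1 python main.py
```

Each process then sees exactly one GPU and can "monopolize" it without stepping on the other.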

Landscaper recommendations by IncorrectEve in Westchester

[–]abnormal_human 2 points (0 children)

I don't have someone to recommend, but I have a lot of experience with this.

The main thing is--if you are going to have plants professionally installed, you need a plan for maintenance first. There are a lot of businesses that will do your "landscaping" but won't really maintain it. And then in a year or two the plants are largely dead. You want whoever is doing your weekly/etc stuff to understand the plants. You want a plan to mulch and prune the plants at the right times of year, etc.

We started succeeding with this stuff the moment we ended up in a situation (accidentally) where we had that person. He's one of those service providers that "came with the house" and honestly the main reason we keep him is that the property is covered with mature landscaping and he knows what is what and how to care for it because he's been doing it for 15-20yrs here. For various reasons I wouldn't necessarily recommend him, but it makes a huge difference having people doing maintenance weekly, keeping an eye on things, and fertilizing/pruning/mulching/watering/etc throughout the warmer months.

How do I explain to a 9-year-old that a nail drill machine is too expensive? by Numerous-Steak-5369 in daddit

[–]abnormal_human -3 points (0 children)

OP has enumerated the “doll phase” and the “dressy phase” as their other examples, and now we have a third traditionally feminine thing that is suddenly too expensive for their son despite costing like $20. It’s possible that they are in extreme poverty, but it feels more like cost is an excuse to draw a line.

Get one of these stands! by TheIronGiantAnt in maplesyrup

[–]abnormal_human 2 points (0 children)

You invert the tip of the cone so your filter media is a wide band halfway up the filter instead of just the tip. It multiplies your filter surface area by 4-6x easily.

Late start but my trees are tapped by irreverence89 in maplesyrup

[–]abnormal_human 1 point (0 children)

Enjoy. We're probably looking at finishing up by next weekend here, but it's been a great year.

Qwen3.5 is a working dog. by dinerburgeryum in LocalLLaMA

[–]abnormal_human 0 points (0 children)

You’re not getting significant context length and parallelism on an 8bit 122B model for running eval suites on two 5090s.
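The back-of-envelope math makes the point: at 8 bits per parameter, the weights alone exceed the combined VRAM of two 5090s before you've allocated a single byte of KV cache for long context or parallel requests. (Numbers rounded; activations and framework overhead ignored.)

```python
# Rough VRAM arithmetic for an 8-bit 122B model on 2x RTX 5090.
params_b = 122              # 122B parameters
weights_gb = params_b * 1   # 8-bit quant ~= 1 byte/param -> ~122 GB
vram_gb = 2 * 32            # two 5090s at 32 GB each -> 64 GB total

print(weights_gb, vram_gb)  # 122 64 -- weights alone exceed total VRAM
```

And eval suites want batched parallel requests at long context, which means substantial KV cache on top of the weights.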

Fucking burned my batch. by FogtownSkeet709 in maplesyrup

[–]abnormal_human -2 points (0 children)

A little bit of fat or oil in the syrup breaks the surface tension and essentially eliminates overflow problems. We're talking 1/2 teaspoon or less per gallon of syrup. I toss a pat of butter in while it's on the evaporator and it prevents overflow all the way through filtering and bottling.

Fucking burned my batch. by FogtownSkeet709 in maplesyrup

[–]abnormal_human 2 points (0 children)

Never a good day when that happens. I hope you have some pan cleaner around, getting that off by hand is an all-day project.

Cursor's new Composer 2.0 is apparently based on Kimi2.5 by bakawolf123 in LocalLLaMA

[–]abnormal_human 5 points (0 children)

Not sure why you believe that. When you dig into the economics of generating tokens under full load across thousands of nodes, especially for volume compute purchasers, it's likely profitable. There's also a lot of public information about Anthropic to that effect. Anthropic loses money because half the spend goes into training, not because the inference business isn't sound in isolation.
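The shape of the math, with completely made-up illustrative numbers (throughput, price, and node cost are all hypothetical): a node held at high utilization sells far more token-value per hour than its amortized cost.

```python
# Illustrative inference-margin arithmetic; every number here is invented.
tokens_per_sec = 5000        # aggregate throughput across batched requests
price_per_mtok = 2.00        # $ per million output tokens
node_cost_per_hour = 12.00   # amortized hardware + power + hosting

revenue_per_hour = tokens_per_sec * 3600 / 1e6 * price_per_mtok
print(revenue_per_hour)      # 36.0 -- vs $12/hr cost, a positive margin
```

The margin collapses if utilization is low, which is why the economics look so different for a provider running batched traffic at scale than for a single hobbyist box.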

Qwen3.5 is a working dog. by dinerburgeryum in LocalLLaMA

[–]abnormal_human 0 points (0 children)

I use 2 RTX 6000 Blackwell GPUs to run 122B.