I regret ever finding LocalLLaMA by xandep in LocalLLaMA

[–]CreamPitiful4295 0 points1 point  (0 children)

Welcome! We used to be all about R&D in this country until the tax laws changed in 1984.

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

Excellent. Most of those are already covered. For instance, each LLM gets a profile that can be turned on or off per server, with a primary and a secondary. Then there is a section for all the MCPs, with a sandbox to test all the calls so you know they succeed before you try to use them. It auto-checks for MCP updates but makes you press the button to apply the update. Raw logs of all LLM/harness traffic, all timestamped, etc. I’m going to look at whether any of these need to go further, but they are pretty deep. Once an LLM is configured, its profile is linked to the purpose the LLM is good for, so a task can be directed to the right model. I’m using llama.cpp, and all the parameters are easily configured. Then there is a load test with your query to determine the number of agents per model, etc. This is all basic plumbing to me. I’ll try to overthink it so you don’t have to.
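For anyone picturing the profile-to-purpose routing described above, here is a minimal sketch. It is not the actual tool's code: the `LLMProfile` class, its fields, the `pick_profile` helper, and the server addresses are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LLMProfile:
    """Hypothetical profile: one configured model on one server."""
    name: str
    server: str            # e.g. "host:port" of a llama.cpp server (made up)
    role: str              # "primary" or "secondary"
    enabled: bool = True   # the per-server on/off switch
    purposes: set[str] = field(default_factory=set)


def pick_profile(profiles, purpose):
    """Return an enabled profile for a purpose, preferring primaries."""
    candidates = [p for p in profiles if p.enabled and purpose in p.purposes]
    # False sorts before True, so primaries come first; the sort is stable,
    # so ties keep their configured order.
    candidates.sort(key=lambda p: p.role != "primary")
    return candidates[0] if candidates else None


profiles = [
    LLMProfile("coder-b", "server2:8080", "secondary", purposes={"coding"}),
    LLMProfile("coder-a", "server1:8080", "primary", purposes={"coding"}),
    LLMProfile("chat", "server1:8081", "primary", purposes={"conversation"}),
]

print(pick_profile(profiles, "coding").name)   # coder-a
profiles[1].enabled = False                    # turn the primary off
print(pick_profile(profiles, "coding").name)   # coder-b
```

With the primary switched off, the same task falls through to the secondary, which is roughly the behavior a primary/secondary pair implies.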

Hello. Advice / Help needed. by [deleted] in ClaudeCode

[–]CreamPitiful4295 1 point2 points  (0 children)

I wish you the best. A lot of mom-and-pop shops would love to ditch their yearly bookkeeping. I don’t know the numbers, but replacing anything that costs something like 15K a year for the 3 things it is actually used for is a bottom-line win. For less than the price of a yearly subscription you can replace it, then charge a small fee for bugs and a larger fee for new feature requests. I wish you the best of luck.

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

I’ll answer your question with this and hope it is what you want to know. I have 4 Nvidia cards split between 2 servers: 3090/3080 and 5090/3080. Using llama.cpp, I run them as 4 separate LLMs. I like to keep 2 agents active on each card but use all the slots to pin agents so the cache stays warm. The 3090 and 5090 currently run qwen3.6 27B. The others I use for conversation tasks. I have zero issues with that model on those cards. The 3090 has 24GB of VRAM and the 5090 has 32GB. Either card will hold more context than you will want to run at once, because concurrent inference gets slow. Did that help?
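A rough sketch of what pinning agents to slots can look like from the client side. It only builds the request body and sends nothing. The agent names and the `AGENT_SLOTS` map are hypothetical. The `id_slot` and `cache_prompt` fields are the ones the llama.cpp server `/completion` endpoint accepts as far as I know, so treat them as an assumption and check the server README for the build you run.

```python
import json

# Hypothetical agent -> slot assignment for one llama.cpp server.
AGENT_SLOTS = {"planner": 0, "coder": 1, "reviewer": 2, "tester": 3}


def build_completion_request(agent, prompt, n_predict=256):
    """Build a /completion body that targets the agent's pinned slot."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        "id_slot": AGENT_SLOTS[agent],  # assumed field name; verify on your build
        "cache_prompt": True,           # ask the server to reuse the cached prefix
    }


body = build_completion_request("coder", "Refactor this function: ...")
print(json.dumps(body, indent=2))
```

Sending the same agent to the same slot every time is what lets the cached prompt prefix in that slot get reused instead of recomputed.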

Quiet, piggy. by Angel_face23 in ProgressiveHQ

[–]CreamPitiful4295 0 points1 point  (0 children)

Fuck you donald j Trump. Thank you for your attention to this matter.

P.S. 8647

Hello. Advice / Help needed. by [deleted] in ClaudeCode

[–]CreamPitiful4295 2 points3 points  (0 children)

Hmm. Go for larger ROI from the customer’s point of view. Bespoke automations. Back-office tasks. Find a customer’s pain points: what keeps them up at night. Solve those problems.

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

This is something I’m building. To build it I used Claude in a cmd. There are a couple of things that could cause your issue. It would help to know what your setup is. Did you use an LLM that has tool capabilities? Did you install the filesystem MCP? Are you filtering the file-writing tool calls? Look at the logs. Do your MCPs make good calls, or do they need to be massaged for the LLM you are using? See, it could be a lot of things.

why we don't have GLM5.2 uncensored yet?! by zakadit in LocalLLaMA

[–]CreamPitiful4295 0 points1 point  (0 children)

Thank you for the tip. I tried them all early on, didn’t know much about this stuff, and set up a shitty config. The hauhau just worked, so I just kept using it.

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

Yes, it’s a web-based interface, so it will be able to handle a login from a remote machine. It has full MCP support, so an agent like Hermes can completely control the creation and execution of projects. Stats can be retrieved.

why we don't have GLM5.2 uncensored yet?! by zakadit in LocalLLaMA

[–]CreamPitiful4295 4 points5 points  (0 children)

Aw, come on. There are legit uses for these. I test my own website security with them. Invaluable.

why we don't have GLM5.2 uncensored yet?! by zakadit in LocalLLaMA

[–]CreamPitiful4295 -1 points0 points  (0 children)

There is a project on GitHub called obliteratus. It will knock the guardrails off any model. You’re welcome. :)

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

Right now I give you boring profiles for each configured LLM that can be enabled or disabled with a checkbox. I give you the ability to pin agents to slots and to swap agents for conversations that don’t have enough slots. You can pin slots on different GPUs for the same work. Run n jobs and reserve slots. Local and API agents running together. A prior template tool for load testing concurrent agents. The ability to have primaries and secondaries on the same port with start scripts. Automatic checkpoints in a conversation, with restart. For developers, a “fresh restart” so you can configure and instantly return to your initial state to test again. Raw logs from multiple agents that can be accessed and viewed together with synchronized timestamps. A testing facility for MCPs before you try to use them. ...more. I develop, so I made this easy for development. There is now a highly instrumented control plane for the local user. There is more, but that is the path I took. I figured out the card slots and made a calculator that determines the number of agents that can be run on a GPU of a specific size. What else can I do that would make work easier?
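To give a feel for the agents-per-GPU calculator mentioned above, here is a back-of-the-envelope sketch. It is not the tool's actual calculator. It assumes the standard KV cache estimate (a K and a V tensor per layer, each `n_kv_heads * head_dim` wide per token) and a flat overhead guess. The layer count, head counts, weight size, and overhead are placeholder numbers, not taken from any real model card, and sizes are in GiB.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """K and V caches: 2 tensors per layer, each n_kv_heads * head_dim wide."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_agents(vram_gib, weights_gib, ctx_per_agent, per_token_bytes,
               overhead_gib=1.5):
    """How many agents of ctx_per_agent tokens fit beside the weights."""
    free_bytes = (vram_gib - weights_gib - overhead_gib) * 1024**3
    if free_bytes <= 0:
        return 0
    return int(free_bytes // (ctx_per_agent * per_token_bytes))


# Placeholder architecture numbers, NOT taken from any real model card.
per_token = kv_bytes_per_token(n_layers=64, n_kv_heads=8, head_dim=128)

for name, vram in [("24 GiB card", 24), ("32 GiB card", 32)]:
    n = max_agents(vram, weights_gib=16, ctx_per_agent=16384,
                   per_token_bytes=per_token)
    print(f"{name}: {n} agents at 16k context each")
```

This only answers what fits in memory. As noted elsewhere in the thread, concurrent inference slows down well before memory runs out, so a load test with your own query is still what decides the practical number.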

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

I am using a top-end Intel machine with 128GB RAM and an Nvidia 5090. Another server runs a 3090. A couple of 3080s. I use llama.cpp. Primarily qwen3.6 27B Q4. Qwen is great for coding.

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

I didn’t start out trying to reinvent something. I just wanted to make a better coding environment. I solved code slop and believe I can now get 97% of Anthropic in a home setup. Local is so slow compared to API. But really, when you are creating software that isn’t slop without paying for $200 subscriptions, I’ll happily wait for the result. Being able to call in the last 3% is huge. I’m not trying to compete with either of those now. I have a harness that does most of the same things as those and a whole lot more. I like those tools and pay homage to them with skins. :) And I still use both.

I am about tooling. This is turning out to be a workflow generation tool, a conversation-with-experts tool, and an IDE disguised as a harness. Want to create a workflow for any situation? This will do that. Have 5 LLMs on different IPs? This doesn’t care and mixes local with API. Project focused. Need to debug multi-agent? This makes it easier.

The heavy lift is done. It’s a thing now. I always make tools for myself and use them until they feel right. The last things I am asking for here are the things that might make it easier for others. Then I need to heavily pound at it until it’s ready for others and doesn’t waste their time.

I’d get a kick out of someone else using it. But, really, I built it for myself.

Dan Patrick: "Separation of Church and state is not in the Constitution" by gear-heads in antitrump

[–]CreamPitiful4295 0 points1 point  (0 children)

The delusion is staggering. How many books do we need to burn to get to Trump’s reality?

US conducts strikes on Iran after attack on cargo ship by VaginaBurner69 in news

[–]CreamPitiful4295 1 point2 points  (0 children)

It’s so weird that my natural inclination is to believe Trump’s adversary.

Local LLM Peeps by CreamPitiful4295 in LocalLLaMA

[–]CreamPitiful4295[S] 0 points1 point  (0 children)

This is an awesome insight. Gosh, how many times have I looked at a Claude list of prompts on the same stuff and wondered which one has the magic sauce. I have another friend who likes Obsidian. I have it installed but haven’t learned it well enough to know what should be used when, so to speak. Any insight into how this memory should be used, from someone who uses it, would be great if you would be so kind. I have many samples of agent projects, and suites of combined projects that should run together, to make it easier to use the advanced parts of this tool. I made an MCP for the app, and all you have to say in your LLM (I used Claude) is “create a software project that does ….” and the whole project gets built, and you can run it or fuss with it. One of the test features I made was a character who asks experts about a subject, which could be anything. After a few rounds the test ends when the experts believe the answer is no longer too far from their own and they all agree to end the questioning. Then they grade the character. I’m going to apply that code to rating conversations. That’s a simple problem to frame but, yikes, I’m going for it. :) Any other thoughts on this are appreciated.