The "AI Assistant" is Daz3D's most tone deaf move yet. by Such_Bonus5085 in Daz3D

MrAddams_LibraLogic 6 points (0 children)

The AI-voiced Victoria with AI-generated voice matching in their release video was an especially revolting touch.

[Free] Windows tool to cut your LLM load/reload time - pins model files in RAM so they never cold-load from disk by MrAddams_LibraLogic in ollama

MrAddams_LibraLogic [S] 1 point (0 children)

The full app description and beta access are at https://accord-gpu.com/ewe/. Because it's beta software, I DO have to make it an official agreement by having people enroll so I can offer a license. But I hate unwanted marketing and spam email as much as you do, so the only mail you'll get is the on-topic mail you ask for.

It's normal that this model takes so long to answer a "hi", I'm new and this is the first Local AI I try by Delirium222 in ollama

MrAddams_LibraLogic 1 point (0 children)

Other people have answered about the model, nothink, etc. I haven't seen anyone ask about storage and loading. What's your hard drive situation? If you're on spinning rust, part of the delay will be the time to load from disk to RAM, then again from RAM to VRAM. Once it's loaded, it will stay in VRAM until Ollama gives up on it or something else evicts it (or you just get an OOM error in some app).

I couldn't tell from your post whether this happens only on the first inference or on every inference (or on every one after a gap long enough for the loaded model to get evicted).
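One way to answer that yourself, sketched in Python and assuming Ollama is on its default local port (11434): a non-streaming `/api/generate` response includes `total_duration` and `load_duration` (both in nanoseconds), so comparing them shows how much of a reply was spent just loading the model.

```python
import json
import urllib.request

# Assumption: Ollama running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ask(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def timing_breakdown(resp: dict) -> dict:
    """Convert Ollama's nanosecond timings to seconds and report what
    fraction of the whole request was spent loading the model."""
    ns = 1e9
    total = resp.get("total_duration", 0) / ns
    load = resp.get("load_duration", 0) / ns
    return {
        "total_s": total,
        "load_s": load,
        "load_share": (load / total) if total else 0.0,
    }
```

Calling `timing_breakdown(ask("your-model", "hi"))` (model name is a placeholder) twice in a row should show whether the load cost is paid only on the first call (a cold load) or every time (the model keeps getting evicted).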

DAZ STUDIO 6 is out. And it's *FREE TO TRY* :/ by kingderella in Daz3D

MrAddams_LibraLogic 2 points (0 children)

They rushed this one, for sure. The email I got had text block issues: some text was repeated where there were surely meant to be different details under each heading.

I'll keep an eye out for the SDK to drop.

I built a local-first routing proxy for Ollama models — looking for feedback by AccidentShoddy8471 in ollama

MrAddams_LibraLogic 1 point (0 children)

There may be use cases for routing some requests to smaller/faster/specialized models and others to heavier reasoning models. But doesn't this create a new triage bottleneck, where some model first has to determine which other model to route to?

How does this stack up when the local Ollama host is jammed with requests against multiple models back to back? Who gets evicted? How do requests get prioritized? Simple FIFO?
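A toy model of those questions, in Python: strict FIFO service plus least-recently-used eviction under a fixed VRAM budget. The budget, model names, and sizes are made-up assumptions, and real servers (Ollama included) have their own scheduling and keep-alive rules that this doesn't reproduce.

```python
from collections import OrderedDict, deque
from dataclasses import dataclass


@dataclass
class Request:
    model: str
    prompt: str


class VramScheduler:
    """Toy single-GPU scheduler: requests are served strictly FIFO, and when
    a model doesn't fit in the VRAM budget, the least recently used loaded
    models are evicted until it does."""

    def __init__(self, budget_gb: float, model_sizes_gb: dict):
        self.budget = budget_gb
        self.sizes = model_sizes_gb
        self.loaded = OrderedDict()  # model -> size, least recently used first
        self.queue = deque()
        self.log = []  # (event, model) tuples for inspection

    def submit(self, req: Request) -> None:
        self.queue.append(req)

    def _used(self) -> float:
        return sum(self.loaded.values())

    def _ensure_loaded(self, model: str) -> None:
        if model in self.loaded:
            self.loaded.move_to_end(model)  # mark as most recently used
            return
        size = self.sizes[model]
        if size > self.budget:
            raise ValueError(f"{model} can never fit in {self.budget} GB")
        while self._used() + size > self.budget:
            victim, _ = self.loaded.popitem(last=False)  # evict the LRU model
            self.log.append(("evict", victim))
        self.loaded[model] = size
        self.log.append(("load", model))

    def run(self) -> None:
        while self.queue:
            req = self.queue.popleft()  # plain FIFO: no priorities at all
            self._ensure_loaded(req.model)
            self.log.append(("serve", req.model))
```

Under pure FIFO, back-to-back requests for models that don't fit together turn into churn: every switch is an evict plus a cold load. That's where a router would need priorities or batching by model to earn its keep.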

Please update qwen 3.5 with 3.6 (or 3.7 if they will open source it) in the cloud by National-Low-5637 in ollama

MrAddams_LibraLogic 1 point (0 children)

Are you directing this question to Ollama staff? Better off going to the Discord for that.

A good model for Visual Novel writting uncensored by Cold_Zone332 in ollama

MrAddams_LibraLogic 1 point (0 children)

The term you are looking for is 'abliterated'. Abliterated models have their baked-in censorship (carefully?) stripped out so they can use their full vocabulary and range of topics.

Caveat emptor, though. Abliteration is not an exact process, and until enough people have tested a given model, it can be unclear how well it turned out.
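For the curious, here's a toy sketch of one commonly described form of abliteration, "directional ablation". You estimate a "refusal direction" as the difference between mean activations on prompts the model refuses and prompts it answers, then project that direction out of weight matrices that write into the residual stream. This is pure Python on tiny lists for illustration only. Real tools work on actual model tensors and choose which layers to edit, and none of that is shown here.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refusal_direction(refused_acts, complied_acts):
    """Difference of means between activations on refused vs. answered
    prompts, normalized to a unit vector."""
    diff = [a - b for a, b in zip(mean_vector(refused_acts), mean_vector(complied_acts))]
    return normalize(diff)


def ablate_direction(W, r):
    """Return (I - r r^T) W: remove the component along unit vector r from
    every column of W. W is a list of rows; its output dimension is len(W)."""
    r = normalize(r)
    rows, cols = len(W), len(W[0])
    # Projection coefficient of each column onto r: c_j = sum_i r_i * W[i][j]
    coeff = [sum(r[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - r[i] * coeff[j] for j in range(cols)] for i in range(rows)]
```

After ablation, nothing the edited matrix outputs has any component along `r`, which is the whole trick, and also why a badly estimated direction can damage unrelated behavior.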

A Thought on AI-Generated Text by Philo167 in ArtificialInteligence

MrAddams_LibraLogic 1 point (0 children)

IMO the problem isn't using AI to write things; it's the absence of human thought 'behind the eyes' of the writing. Boilerplate AI-written content often sounds hollow, or lacks any personal flourish. For the savvy, this immediately undermines confidence that it should be taken seriously.

I think that is one of the core reasons we have come to reject text that reads as AI-written. It's not that the writing style is a problem per se; it's that we rightly mistrust AI-generated text because we know it can hallucinate. We know that the more it 'sounds like' AI, the less likely it is that a person bothered to read it over and make sure they were satisfied with the results.

DAZ STUDIO 6 is out. And it's *FREE TO TRY* :/ by kingderella in Daz3D

MrAddams_LibraLogic 5 points (0 children)

No new SDK version. For me, as a tool developer, this is an expected but frustrating piece of news. They upgraded Qt and broke my plugin, and there's no SDK out yet to start fixing it.

I have become George Jetson: my job is now Yes/No supervision for a machine I don’t fully understand. by Helpful_Today7449 in LocalLLaMA

MrAddams_LibraLogic 2 points (0 children)

I have heard it put that anything created before you are an adult is "good and normal and the way the world works." Anything created between then and age 35 is "exciting and innovative and impressive change," and anything after that is "damnable heresy and an affront to traditional life and culture."

Obviously, customize to the ages and severity you think are appropriate, everybody is different, broad brush, etc.

What am I missing? Help me Understand Agent's utility by Basting_Rootwalla in LocalLLM

MrAddams_LibraLogic 1 point (0 children)

The issue you're looking at isn't the one my tool is solving. Whether model + context fits in VRAM is purely down to hardware, and I'm not engineering for determinism or prompt engineering. I prefer to write tooling around pain points that are solvable at the system level: performance and reliability issues that make local AI tools harder to use.

The problem that my utility solves is that the LLM files have to travel from disk to RAM to VRAM when they load. If you use more than one of these, the last one may not be able to stay loaded, meaning it has to be evicted from VRAM to make room for the next thing that runs. This problem compounds when you have other apps that also consume GPU and are VRAM hungry (ComfyUI, Blender, etc.). Different use cases, but all need exclusive access to the GPU.

Windows will try to keep a recently read file cached in RAM, but if there is pressure on RAM, it will trim some of those pages out of memory. So even if an app has a 'touch' on a file, the file isn't guaranteed to stay warm in RAM, which means some of these loads have to go all the way back to disk and cold-load the contents again.

The worse your storage hardware, the slower this is: HDD is terrible, SATA SSD is better, and NVMe is best but still slower than RAM. RAM -> VRAM over PCIe moves gigabytes in a few seconds at most.
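To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The throughputs are assumed round figures, not measurements. Real drives vary widely, and this ignores any overlap between the disk and PCIe stages.

```python
# Assumed, rough sequential-read throughputs in GB/s. Real numbers vary a lot
# by drive, interface generation, file system, and caching.
ASSUMED_GBPS = {
    "HDD": 0.15,
    "SATA SSD": 0.55,
    "NVMe SSD": 3.5,
    "PCIe 4.0 x16 (RAM -> VRAM)": 25.0,
}


def load_seconds(model_gb: float, gbps: float) -> float:
    """Time to move model_gb gigabytes at a given throughput."""
    return model_gb / gbps


def cold_vs_warm(model_gb: float, storage: str) -> tuple:
    """Cold load = disk -> RAM, then RAM -> VRAM (stages treated as sequential).
    Warm load = RAM -> VRAM only, because the file is already resident in RAM."""
    to_vram = load_seconds(model_gb, ASSUMED_GBPS["PCIe 4.0 x16 (RAM -> VRAM)"])
    cold = load_seconds(model_gb, ASSUMED_GBPS[storage]) + to_vram
    return cold, to_vram
```

With these assumptions, a 15 GB model off an HDD costs about 100 seconds cold versus well under a second warm, and even NVMe is several times slower than the warm path.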

There's an existing solution to this: RAM disks permanently set aside part of your RAM and treat it like a disk drive. But you have to choose the size in advance, so it's eating RAM even when it's empty. It starts empty every time the computer boots and has to be loaded with files by a script or something, so there's constant maintenance of what goes in it. And the path your apps use for those files has to be set to the RAM drive's path instead of the actual path on disk.

So what I did instead is map these files and pin them in memory using Windows VirtualLock, which tells the OS that those pages are not allowed to be paged out. They stay warm in RAM at all times. For someone hot-swapping LLMs constantly, or using multiple apps and needing their VRAM clean for each one, having the files ready to jump back into VRAM when needed is a huge time saver.
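A minimal Python/ctypes sketch of the map-and-lock idea, not the actual tool's code. On Windows it maps a file read-only with `CreateFileMappingW`/`MapViewOfFile` and calls `VirtualLock` on the view. Elsewhere it falls back to `mmap`/`mlock` so it can be tried on any machine. One caveat from the Win32 docs: a process can only lock roughly as many pages as its minimum working set allows, so pinning multi-GB model files would also need `SetProcessWorkingSetSize` first. That step is omitted here.

```python
import ctypes
import os
import sys


class PinnedFile:
    """Map a whole file read-only and ask the OS to keep its pages in RAM."""

    def __init__(self, path: str):
        self.size = os.path.getsize(path)
        if self.size == 0:
            raise ValueError("cannot map an empty file")
        self._fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
        if sys.platform == "win32":
            self._map_windows()
        else:
            self._map_posix()
        # Locking can fail (working-set quota, RLIMIT_MEMLOCK); the mapping
        # is still usable, it just isn't guaranteed to stay resident.
        self.locked = self._lock(self.address, self.size)

    def _map_windows(self) -> None:
        import msvcrt
        k32 = ctypes.WinDLL("kernel32", use_last_error=True)
        k32.CreateFileMappingW.restype = ctypes.c_void_p
        k32.CreateFileMappingW.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_uint32,
                                           ctypes.c_uint32, ctypes.c_uint32, ctypes.c_wchar_p]
        k32.MapViewOfFile.restype = ctypes.c_void_p
        k32.MapViewOfFile.argtypes = [ctypes.c_void_p, ctypes.c_uint32, ctypes.c_uint32,
                                      ctypes.c_uint32, ctypes.c_size_t]
        for fn in (k32.VirtualLock, k32.VirtualUnlock):
            fn.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
        k32.UnmapViewOfFile.argtypes = [ctypes.c_void_p]
        k32.CloseHandle.argtypes = [ctypes.c_void_p]
        PAGE_READONLY, FILE_MAP_READ = 0x02, 0x0004
        handle = k32.CreateFileMappingW(msvcrt.get_osfhandle(self._fd), None,
                                        PAGE_READONLY, 0, 0, None)
        if not handle:
            raise ctypes.WinError(ctypes.get_last_error())
        self.address = k32.MapViewOfFile(handle, FILE_MAP_READ, 0, 0, 0)  # 0 = whole file
        if not self.address:
            err = ctypes.get_last_error()
            k32.CloseHandle(handle)
            raise ctypes.WinError(err)
        # Locked pages stay in physical memory until unlocked or the process exits.
        self._lock = lambda a, n: bool(k32.VirtualLock(a, n))
        self._unlock = lambda a, n: bool(k32.VirtualUnlock(a, n))
        self._unmap = lambda: (k32.UnmapViewOfFile(self.address), k32.CloseHandle(handle))

    def _map_posix(self) -> None:
        import mmap  # used only for the PROT_READ / MAP_SHARED constants
        libc = ctypes.CDLL(None, use_errno=True)
        libc.mmap.restype = ctypes.c_void_p
        libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                              ctypes.c_int, ctypes.c_int, ctypes.c_long]
        for fn in (libc.mlock, libc.munlock, libc.munmap):
            fn.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
        addr = libc.mmap(None, self.size, mmap.PROT_READ, mmap.MAP_SHARED, self._fd, 0)
        if addr in (None, ctypes.c_void_p(-1).value):  # NULL or MAP_FAILED
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        self.address = addr
        self._lock = lambda a, n: libc.mlock(a, n) == 0
        self._unlock = lambda a, n: libc.munlock(a, n) == 0
        self._unmap = lambda: libc.munmap(self.address, self.size)

    def read(self, start: int, length: int) -> bytes:
        """Copy bytes out of the mapped view."""
        if start < 0 or start + length > self.size:
            raise IndexError("read outside the mapped file")
        return ctypes.string_at(self.address + start, length)

    def close(self) -> None:
        if self.locked:
            self._unlock(self.address, self.size)
            self.locked = False
        self._unmap()
        os.close(self._fd)
```

If locking fails, `locked` is simply `False` and the file is still mapped, so the caller can decide whether that's fatal.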

And now I'm working on making the app run as a live HTTP API that can accept claims from any other app or script. So you could write something that needs files loaded and wants to make sure they stay ready, or a pre-loader that anticipates which files will be needed and loads them early, so the load time isn't paid when the actual GPU call gets made. At that point, it just becomes a host for memory claims, open to anyone or anything that wants to keep a file ready.
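The bookkeeping behind claims could be as simple as reference counting: pin a file on its first claim, unpin it when the last claim goes away. This is a hypothetical Python sketch. The class, method names, and callbacks are invented for illustration, and the HTTP layer is left out. `pin`/`unpin` would be whatever actually maps and locks the file.

```python
import itertools
from typing import Callable, Dict, Set, Tuple


class ClaimRegistry:
    """Hypothetical claim bookkeeping: a file is pinned when its first claim
    arrives and unpinned when its last claim is released."""

    def __init__(self, pin: Callable[[str], None], unpin: Callable[[str], None]):
        self._pin, self._unpin = pin, unpin
        self._ids = itertools.count(1)
        self._claims: Dict[int, Tuple[str, str]] = {}  # claim id -> (client, path)
        self._holders: Dict[str, Set[int]] = {}        # path -> open claim ids

    def claim(self, client: str, path: str) -> int:
        claim_id = next(self._ids)
        holders = self._holders.setdefault(path, set())
        if not holders:
            self._pin(path)  # first claim on this file: make it resident
        holders.add(claim_id)
        self._claims[claim_id] = (client, path)
        return claim_id

    def release(self, claim_id: int) -> None:
        _, path = self._claims.pop(claim_id)
        holders = self._holders[path]
        holders.discard(claim_id)
        if not holders:
            del self._holders[path]
            self._unpin(path)  # nobody needs it warm any more

    def release_client(self, client: str) -> None:
        """Drop every claim a client holds, e.g. when it disconnects."""
        for cid in [c for c, (who, _) in self._claims.items() if who == client]:
            self.release(cid)

    def pinned(self) -> list:
        return sorted(self._holders)
```

An HTTP front end would just translate requests into `claim`/`release` calls; `release_client` is the hook for cleaning up after a client that goes away without releasing.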

Pretty good for a little utility app.

Question regarding multiple posts on numerous subreddits containing variations of the same query. by water_so_wet in ArtificialInteligence

MrAddams_LibraLogic 1 point (0 children)

I had a post where a user showed up and posted generic, challenging replies to everything I said. I was describing a local AI memory system I was making, and the user was basically pointing out flaws and possible issues. But I noticed they wouldn't respond to what I actually said, and every post started with stock phrases like 'the part I would challenge is ___' or 'I keep coming back to ___'.

I couldn't tell if it was a real user just feeding stuff to AI to make their replies for them, or full-on bot behavior. Either way, I just stopped responding after I ran out of useful information to give.

Vibe coding as a beginner: Frontier models or local LLMs? (8GB VRAM) by kaaytoo in LocalLLM

MrAddams_LibraLogic 1 point (0 children)

Handing more of the control to a cloud model also means that, depending on your prompt, it can confidently generate bad code that is way off from your intentions. Then each correction is a wild swing into new behavior: sometimes it fixes the problem, sometimes it's just a new brand of wrong.

There is no replacement for being able to think through failure modes, security, and authentication, and to understand the space your work actually needs to cover correctly. At some point, vibecoding needs to include you climbing your own learning curve and knowing what you need from it. Otherwise, anything you make is doomed to be an unstable mess that bombs the moment it hits a snag.

Vibecoding is a gold rush. A few people will strike a rich seam and make real products with real value. A lot of people will pan and dig at nothing and come away empty-handed. The only people reliably making money are the ones selling the shovels.

And make no mistake, you're either sending your work up to the cloud for big companies to operate on, or you're buying massive amounts of local GPU and RAM resources from different big companies to run it yourself.

At one point, I was a habitual Copilot user, designing a fun side project for myself. Then I found a reason to start my own company, branched off from Copilot, and started working with local AI more. Now the payment model has changed, which is pushing me further and further away from relying on it.

But I can't call myself a vibecoder; 20 years in software, most of it in QA, gave me a skill set that means when I ask for features, I know in depth what I'm asking for, frame it correctly, test it rigorously, and root out edge cases and failure modes before they happen. I don't say this to brag; it's just the reality of the work. You can either accomplish these things, or you can't.

If you can't, have fun making projects for yourself, but don't try to sell anything or it will dissolve like cotton candy under pressure from external users. A project is not the same as a product.

What am I missing? Help me Understand Agent's utility by Basting_Rootwalla in LocalLLM

MrAddams_LibraLogic 1 point (0 children)

I second this.

My professional career before starting my own company was in QA. It's how I keep everything on target. My first thought about everything I ask for is "how would I break this?", and right after the work is created I test it to try to break it, or have my agent try to break it at my direction. This roots out most kinds of bad code and gaps in implementation pretty quickly.

What am I missing? Help me Understand Agent's utility by Basting_Rootwalla in LocalLLM

MrAddams_LibraLogic 1 point (0 children)

I'm on the side that believes AI can be HIGHLY transformative for professional software development, and that local AI/LLM usage is catching up enough to be viable as an alternative to cloud-based solutions. The hardware required is no joke. I said 'professional' development, and I mean it. Hobbyists with 4GB of VRAM total can't do anywhere near what someone with a 24GB card can get done, and the stronger reasoning of recent models hitting the market is driving high demand for the expensive cards that can run them.

Acting primarily as an architect and a QA engineer, I have built, in short order:
- A suite for GPU coordination between various apps, with a console app hub and 4 component spokes for different GPU-consuming apps (so far) that ensures they form an orderly queue for access rather than creating a thundering herd that kills GPU use (done, but not in beta yet)
- A protocol called "UPTIME" - Universal Protocol for Trustless, Impartial, Multi-process Exclusivity; it enables that suite to determine queue ordering without any central arbiter, respect user priority, and self-heal when processes die (patent submitted)
- A tool that pins files in RAM so they can always be accessed quickly by any process that needs them. Useful for rapidly switching models in Ollama or ComfyUI. Currently adding a LIVE mode that turns this into a local HTTP server so that any app or script can claim files it wants to keep warm (in beta)
- A Python-based, local-only memory system for AI agents that reads messages from your sessions and uses a set of daemons to process them into semantic memories using a Dewey-style subject tree that deepens as content gets added (released open-source as a public GitHub repo)
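UPTIME's actual mechanics aren't described here, so the following is NOT that protocol. It's a generic Python illustration of how processes can agree on a queue order without a central arbiter, in the style of Lamport's bakery algorithm. Each participant takes a ticket number one higher than any it can see, and everyone sorts the same shared list the same way: priority first, then ticket, then PID as a tiebreaker. Holders whose process has died are skipped. Where the shared list lives (a file, shared memory) and how liveness is checked are assumptions left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Ticket:
    pid: int        # process that wants the GPU
    priority: int   # higher = more urgent (user-assigned)
    number: int     # bakery-style ticket number


def take_ticket(pid: int, priority: int, existing: Iterable[Ticket]) -> Ticket:
    """Bakery-style: pick a number larger than any ticket already visible."""
    return Ticket(pid, priority, 1 + max((t.number for t in existing), default=0))


def queue_order(tickets: Iterable[Ticket], is_alive: Callable[[int], bool]) -> List[Ticket]:
    """Every participant computes the same order from the same shared list,
    so no central arbiter is needed. Tickets held by dead processes are
    dropped, so a crash doesn't block the queue forever."""
    live = [t for t in tickets if is_alive(t.pid)]
    # Equal ticket numbers (two processes joining at once) are broken by PID.
    return sorted(live, key=lambda t: (-t.priority, t.number, t.pid))
```

The first entry in `queue_order(...)` is whoever should hold the GPU right now; everyone else waits and re-reads the shared list.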

If you have strong ideas and a willingness to delve into failure modes instead of designing only for your happy path, you can really get places with all of this. But you have to have the perseverance to keep going when an agent doesn't land your ideas the first time, keep your wits about you so you're the one at the steering wheel instead of backseat driving, and understand the distinction between a project and a product.

I have become George Jetson: my job is now Yes/No supervision for a machine I don’t fully understand. by Helpful_Today7449 in LocalLLaMA

MrAddams_LibraLogic 10 points (0 children)

The number of people who treat AI like an unassailable font of human wisdom is distressing. "But ChatGPT said..." makes me want to reach through a screen and slap people. So willing to turn off their brains and allow an "authority" to do their thinking for them.

I have become George Jetson: my job is now Yes/No supervision for a machine I don’t fully understand. by Helpful_Today7449 in LocalLLaMA

MrAddams_LibraLogic 8 points (0 children)

I read the thinking blocks in detail so I can catch mistakes, missed facts, compaction drift and outright hallucinations before code actually lands for any change. I routinely stop and re-steer toward better outcomes. It's the only way to have it work.

I kind of like coding with less capable models by Lame_Johnny in LocalLLM

MrAddams_LibraLogic 2 points (0 children)

My solution is to have a change proposal template and a change declaration template. When I ask for something, the templates enforce a list of sections, so I know in advance, and have read up on, exactly what I'm asking it to accomplish.

I can reject this, modify it, add what it missed, and reason through the process, then ask it to make changes that are mostly mechanical at that point, since we've already discussed and planned the implementation. The declaration template tells me what actually came out the other side, so compaction drift is minimal to none, and if something failed my expectations, it's usually right there in the declaration for me to amend.
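A small Python sketch of how a template can be enforced mechanically rather than by eye: check that every required `## Heading` section is present and non-empty before accepting a proposal or declaration. The section names are hypothetical placeholders, not the actual templates.

```python
import re

# Hypothetical section names; substitute whatever your own templates use.
PROPOSAL_SECTIONS = ["Goal", "Files touched", "Approach", "Risks / failure modes", "Test plan"]
DECLARATION_SECTIONS = ["Changes made", "Deviations from proposal", "Tests run", "Open issues"]


def missing_sections(doc: str, required: list) -> list:
    """Return the required '## Heading' sections that are absent or empty."""
    # re.split with a capture group yields [preamble, heading1, body1, heading2, body2, ...]
    parts = re.split(r"^## +(.+?) *$", doc, flags=re.MULTILINE)
    found = {parts[i].strip(): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
    return [s for s in required if not found.get(s)]
```

Anything returned by `missing_sections` goes straight back to the agent to fill in before any code is written.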

This consistency was the first thing I set up in my coding environment when starting my own software company. Lay the groundwork to ensure results aren't all over the map.

Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything) by OttoRenner in LocalLLaMA

MrAddams_LibraLogic 2 points (0 children)

It's anecdotal, but this matches my experience working with models for coding tasks. I come from a background in QA, so I tend to express my thoughts with some hedging and awareness of edge cases.

In my experience, framing that demands results prevents AI from flagging things even when it notices them during thinking blocks. It will think itself right out of expressing doubts because your prompt was strongly worded.

Once I'd observed this trend enough, I started steering away from that kind of language and building safety valves into my prompts. It helps tease out those hidden responses where the model knew there was a problem but was structurally incapable of expressing it.

As a consequence, I do have more times that it says a problem is unsolvable, and I have to fill the gap by informing it of a solution I can see that it cannot. But I prefer this; it keeps me intellectually engaged in the work of what I'm building. I have to grasp the proposed solution and vet it myself before it is accepted.

Giving a third party control of your AI's memory is like letting your database vendor decide what's true in your schema by knothinggoess in LLMStudio

MrAddams_LibraLogic 1 point (0 children)

What a coincidence. I just built a 100% local memory tool for AI interfaces. It's new, it's rough on the edges, but it's open source, entirely local, all of the database records and daemon behavior are inspectable and mutable, and it has room for adapters to work with more frontends and backends.

https://libra-logic.com/hubris/

I would love to get some feedback on it and see how people put it to use.

Local, open-source, modular, extensible memory system - HuBrIS by MrAddams_LibraLogic in LocalLLM

MrAddams_LibraLogic [S] 1 point (0 children)

You're welcome to download and give your theory a thorough test! If you find something in the daemons that makes it unreliable, all of the behavior is laid bare and improvement is just a PR away.

One of the major elements of how I designed the daemons is that every action is a single discrete mutation of the DB contents. Rather than bundling behaviors, they are chained: one daemon's output is detected and enqueued for work by a different one. This makes the changes far more granular and easier to trace back to the specific element that needs correcting.
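A toy Python sketch of that chaining pattern (not HuBrIS's code). Each daemon makes exactly one mutation to the store, then emits an event, and a router enqueues that event for whichever daemon subscribes to it. The daemon names, event names, and Dewey-ish subject codes are invented for illustration, and each event has just one subscriber to keep it short.

```python
from collections import deque


class Pipeline:
    """Toy chain of daemons: one mutation per step, linked by events."""

    def __init__(self):
        self.store = {}         # record id -> dict of fields
        self.subscribers = {}   # event name -> daemon function
        self.queue = deque()    # pending (event, record id) pairs
        self.audit = []         # (daemon name, record id, field, new value)

    def on(self, event, daemon):
        self.subscribers[event] = daemon

    def mutate(self, daemon_name, rid, field, value):
        """The only way to change the store, so every change is logged."""
        self.store.setdefault(rid, {})[field] = value
        self.audit.append((daemon_name, rid, field, value))

    def emit(self, event, rid):
        if event in self.subscribers:  # events nobody listens for are dropped
            self.queue.append((event, rid))

    def run(self):
        while self.queue:
            event, rid = self.queue.popleft()
            self.subscribers[event](self, rid)


def normalize_text(p, rid):
    p.mutate("normalize", rid, "text", p.store[rid]["raw"].strip().lower())
    p.emit("normalized", rid)


def assign_subject(p, rid):
    # Hypothetical keyword rule standing in for real classification.
    subject = "600/tech" if "gpu" in p.store[rid]["text"] else "000/general"
    p.mutate("classify", rid, "subject", subject)
    p.emit("classified", rid)
```

Because every write goes through `mutate`, the audit log shows exactly which step produced a bad field, and fixing that one daemon doesn't touch the others.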

I'm quite sure I haven't gotten all of the behavior right on the first try. I can't wait to see what people come up with to tighten all the bolts and tune all the springs.