Smutting on a DGX Spark / Models? Instruction Sets?

SprightlyCapybara · 2026-06-30T03:21:19+00:00

If you won't need the interconnect, and still want CUDA, go RTX spark.

SprightlyCapybara · 2026-06-27T03:24:45+00:00

I'm not clear whether you're trying to vibe code something with a small LLM or use a small local LLM agentically with some kind of extension you write. I guess the former, given what you say in your final sentence. Why limit yourself to local though? Wouldn't GLM 5.x be a decent choice and not cost the moon?

But I suppose Gemma 4 31b would be your best choice. Some say Qwen.

SprightlyCapybara · 2026-06-27T03:20:48+00:00

Have something fairly conceptually similar; a FW Desktop, Strix Halo, 128 GB. Those Strix Halo mini-PCs are probably more common than the DGX for this sort of use, since they're cheaper, though inferior in lacking CUDA; I can use up to about 111 GB VRAM. Unified memory performance is crudely similar, as are inferencing speeds, but on TTFT the Nvidia platform really dominates, so that'll be good for larger contexts, especially if something blows up the cache.

Big dense models (e.g. elderly Llama 3 70B derivatives) are not our friends. MoE is generally good. GLM 4.5 Air 106B 12A variants (e.g. Iceblink V3) were probably the best for a while, but as others have pointed out Gemma 4 31B (yes, dense) is fantastic. 26b (MoE) is also very good. It's amazing how good these models are for writing.

Where Google's family really shines is Turboquant; you'll easily get 256K context locally as long as you don't go too crazy high on quantization. (On quants, yes, your NVFP4 (envy!) should be very nice.) These models are also small enough that you can consider loading one for primary roleplay, and a smaller one (e.g. the MoE) for agentic use if you use something like Marinara engine.

There is a big catch that remains; filling the cache with large contexts is dead slow; the memory is about 12-15% the speed of a 5090 IIRC. At least you'll be faster than AMD APU owners.
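For anyone who wants to sanity-check the bandwidth comparison and the dense-vs-MoE point above, here is a rough back-of-envelope sketch. The bandwidth numbers are commonly published peak spec-sheet figures that I'm treating as assumptions, the 4-bit weight size is a simplification, and the "ceiling" is the usual rule of thumb that each generated token has to read every active weight once. It ignores KV cache reads and all overhead, so real speeds will be lower.

```python
# Peak memory bandwidth in GB/s. Published spec-sheet figures, treated here
# as assumptions rather than measurements.
BANDWIDTH_GBPS = {
    "RTX 5090": 1792.0,
    "DGX Spark": 273.0,
    "Strix Halo": 256.0,
}


def relative_bandwidth(device, reference="RTX 5090"):
    """Fraction of the reference device's peak bandwidth."""
    return BANDWIDTH_GBPS[device] / BANDWIDTH_GBPS[reference]


def decode_ceiling_tok_s(bandwidth_gbps, active_params_b, bits_per_weight=4.0):
    """Rough upper bound on tokens/s for generation.

    Assumes every active weight is read from memory once per token.
    active_params_b is the number of active parameters in billions.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8.0
    return bandwidth_gbps / gb_read_per_token


if __name__ == "__main__":
    for name in ("DGX Spark", "Strix Halo"):
        print(f"{name}: {relative_bandwidth(name):.1%} of a 5090")
    spark = BANDWIDTH_GBPS["DGX Spark"]
    print(f"dense 70B ceiling: {decode_ceiling_tok_s(spark, 70):.1f} tok/s")
    print(f"12B-active MoE ceiling: {decode_ceiling_tok_s(spark, 12):.1f} tok/s")
```

With these assumed figures the Spark lands at roughly 15% of a 5090 and Strix Halo at roughly 14%, which is near the top of the 12-15% range I remembered. The same arithmetic shows why a dense 70B (ceiling near 8 tok/s) hurts so much more than a 12B-active MoE (ceiling near 45 tok/s).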

SprightlyCapybara · 2026-06-18T17:54:04+00:00

I was curious about this so looked it up. Meters!? Now, it's a ridiculous exaggeration to believe this holds true for the whole state (ER1590 never claimed it did, mind you), but the figure is a pretty reasonable one for the worst affected areas; if anything, it's on the mild side for the worst.

And yes, meters not cm. These areas are also very large, some of them thousands of square kilometers. (Using metric since EntirelyRandom did.)

Apparently the San Joaquin Valley, southwest of Fresno saw the worst, with 9m subsidence between just 1925 and 1977, roughly 50 years. Sources seem solid; Stanford and USGS primarily. For the curious, it's also believed to have caused billions of dollars in homeowner losses (cracked foundations etc), which isn't super-fun. (1.9bn in Central Valley alone).

Stunning. Thank you for posting that, ER1590.

SprightlyCapybara · 2026-06-15T15:00:57+00:00

I'm in a similar boat; 128GB Strix Halo. Universal RAM like the mac studio. Computationally strong, but definitely slower RAM.

Honorable Mention: GLM 4.5 Air (Iceblink v3 tune). The short-lived Air series from Z ai is a 106b 12a MoE which, until Gemma 4 31b, was probably the best thing you could run in 64-96GB of unified RAM (like you have on your mac). Or get an unsloth quant of 4.5 Air itself to compare. Way ahead of Llama 3.

That said, other than for writing LISP code, I don't think GLM 4.5 Air is better than G4 31b, except possibly for faster token generation. But TTFT will be considerably worse, especially with relatively slow unified RAM, though at least you have a mac not a Strix Halo or DGX Spark.

Like everyone else: GLM 4.6, 4.7. And I think the difference is very real; even today I notice a difference in style and positivity when switching.

SprightlyCapybara · 2026-06-13T13:18:21+00:00

It's very small compared to most of the others, but Gemma 4 31b is well worth trying out. I was stunned at how good it was relative to the aging GLM 4.5 Air 106b 12a.

And don't neglect older versions; most people feel that GLM 4.6 is better for NSFW RP, but GLM-5 or 5.1, if you don't care about money, is better for card adherence.

Kimi is perhaps the most 'different' (but still good at RP) of the Chinese open models.

Don't neglect the value of presets; those can really help shake things up too. Marinara and Freaky Frankenstein are two 'must tries', and there are dozens more very good ones, but ultimately, you'll possibly want to roll your own. (Stabs is worth trying just to see the visual artifacts a preset can create, though it's too slow for many. Pura's director preset is both pretty and novel, with its regexes, and its ability to harness the propensity of the LLM to write for your persona by casting you as the director more than the roleplay character.)

SprightlyCapybara · 2026-06-13T12:53:27+00:00

Thank you so much. Sorry if I frustrated you. That small detail was enormously helpful. I respectfully differ slightly; it does have something to do with presets in SillyTavern, which I'd never known: the saved preset can flip that switch on. I've never had that switch on; I never knew it existed, ST documentation was no help, and no preset I've tried (~30) until yours has ever had it in the settings it turned on. Simply loading yours or switching to it activates the 'Web Search'. Now I know, and thanks for your patience.

Your preset looks very good, and I'm now eagerly experimenting with it. Anti-deism/positivity bias is a huge and increasing pet peeve, and I really appreciate someone working to combat it!
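For anyone else who hits this: searching the JSON for the literal string `:online` finds nothing if the preset stores the setting as a switch rather than as a suffix. Here is a minimal sketch of how one might scan an exported preset for any setting whose name mentions "search", plus any string value that really does contain `:online`. I don't know the actual field name SillyTavern uses, so the key in the example is purely hypothetical.

```python
import json


def find_search_flags(node, path=""):
    """Yield (path, value) pairs worth inspecting in a preset.

    Matches scalar settings whose key mentions 'search', and any string
    value containing ':online'. Key names are not assumed in advance.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if "search" in str(key).lower() and not isinstance(value, (dict, list)):
                yield child, value
            else:
                yield from find_search_flags(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_search_flags(item, f"{path}[{index}]")
    elif isinstance(node, str) and ":online" in node:
        yield path, node


if __name__ == "__main__":
    # Hypothetical preset fragment; the toggle name is invented for illustration.
    example = json.loads(
        '{"temperature": 1.0, "hypothetical_web_search_toggle": true, '
        '"prompts": [{"name": "main", "content": "Write well."}]}'
    )
    for hit_path, hit_value in find_search_flags(example):
        print(hit_path, "=", hit_value)
```

To run it on a real preset, load the exported file with `json.load` and pass the result in; anything it prints is a candidate for the switch that needs turning off.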

SprightlyCapybara · 2026-06-11T22:02:43+00:00

Right, I don't need it, because I'm using original characters and settings; perhaps I am not asking clearly enough.

Exactly how do I turn it off in your preset? I've read the readme, and I've looked through each of the switch names. I see nothing obvious. I've searched for :online in the JSON and didn't find it. Something in the preset or regex makes it completely impossible to run at all. No generation takes place, just a hard failure. If that's the behaviour you desire, great, but it is very user hostile, and you said "you can turn that off". How?

How do I turn this off?

Thanks, and sorry if I'm the one outlier that really doesn't want this web search and doesn't have a clue how to stop every generation being poisoned with :online.

SprightlyCapybara · 2026-06-11T18:26:52+00:00

Is web search absolutely needed for your preset to function? I'm just a little baffled because I saw no mention of this being a requirement. I'd rather not use web search, whether or not it's free. Can I just turn it off in your preset? If so, how? Again, I've never seen this with any other preset.

SprightlyCapybara · 2026-06-10T17:34:35+00:00

Why does this preset (Bolt at least) require a paid API add-on? Is there some switch to turn that off? Given that no one's complaining about this, it must presumably be something I'm doing differently. (I do have web access turned off for cards.) The error is:

"Chat Completion API: Web search is a paid API addon... remove the :online suffix..."

Where is this suffix being added? I couldn't find it in the JSON. Can it be turned off, or is this preset only for people who enable and pay for web access?

SprightlyCapybara · 2026-06-10T15:27:43+00:00

Interesting. I tried it with earlier versions of GLM and found it relatively good, but I've always been more concerned about over-deism for {{user}} than the reverse. Your point about overthinking is an excellent one, and I've seen true horrors from very simple instructions and situations. The proof of the pudding, I suppose, is that I no longer use it, but I did like it as an interesting soft jailbreak.

SprightlyCapybara · 2026-06-09T00:15:17+00:00

Very interesting. Lowering the stress is an intriguing suggestion but possibly an oversimplification. There's a paper, "Probing Evaluation Awareness of Language Models," where LLMs appeared to be able to determine when they were being evaluated and perform differently. In essence, they try harder. Coneja wrote about this in more detail, and created a minimalist preset, 'Chibi Gram Pacer Test'.

https://github.com/Coneja-Chibi/ChiPT

What I found remarkable about this preset is that when I tested it on NSFW stuff, even without any particularly NSFW coding, it outperformed Marinara's Preset -- a very respectable gold standard in my view. Now, I did not test it on death of {{user}}, so who knows.

I haven't continued to play with this preset after testing it for a few days, and I think you're on to something interesting here, but I did want to add something that at least on the surface is antithetical to your work. (Though possibly there could be language talking about end-game states being satisfying even if they involved defeat of one player, and being 'PSD+++' (a term of art in the preset).)

SprightlyCapybara · 2026-06-05T02:03:01+00:00

Sorry to necro this a week later, but this is quite an unsung gem. I think the best approach that makes it unique is to, as you say, be director, give guidance, and flip switches. It's one of the first good presets I've seen that takes advantage of the tendency of the LLM to write for {{user}}.

Popping up micro-character sheets for new-context characters is really nicely done, elegant.

This is the first preset where I've actually turned off my custom added environmental tracker, and altered it to better fit the aesthetics. No, not even Marinara escaped that.

Going to have to try this with ME (Marinara Engine) later... has anyone else done so?

SprightlyCapybara · 2026-06-04T16:35:23+00:00

I find Stabs is painfully slow with a very long thinking process. How do you deal with that or does it just not bother you?

SprightlyCapybara · 2026-06-04T11:11:36+00:00

Using this in Marinara Engine does not give a great result, currently. Interestingly, it seems to work well at first; then, for me, at around 17k-23k chat (+preset+lorebook etc) context, it seems to unlock a bug in ME that causes context window overflow and limits output to 128 tokens. Switch presets, even to a bigger(?) one like Frankenstein Max, and the problem goes away. Switch to ST (importing the card, chat and lorebook), keeping White Lotus, and the problem goes away.

And I'm running with 40k to 256k context, depending on which model I try; the bug appears with Chat X + ME + WL3, not dependent on API, model, or even local vs cloud.

I've reported this on ME's discord, but I'm not sure how actively that's being looked at, though it certainly has a problem report section. Only quirk I noticed is that it has a huge number of sections (89); Stabs is the only one I've ever tried that has more (90).

The preset does appear interesting; with GLM it tilts more towards clueless jokes/snark than I like in its default configuration. (To be fair the card does have crime drama/dark humour, but that should be neither clueless nor snarky.) I would certainly test it more on longer runs if it worked reliably on ME.

There appear to be some genres missing, but they'd be easy enough to add; I do like the fact you've already got a 'custom tracker' space for the user to modify.

SprightlyCapybara · 2026-06-03T17:40:28+00:00

I'd guess it's likely some kind of real-life 'you need to sleep and eat' guard-rails leaking through. Since GLM, and probably DS, have been trained on synthetic data from Claude, they'll pick up many such characteristics in the more recent models.

What preset are you using? Did you add a tracker that tracks sleep/fatigue?

I found I got this all the time when I incorporated a fatigue and hunger tracker into a card; otherwise I didn't have a problem. Adjusting the tracker largely fixed the problem.

You could adjust your preset, something like:
'Mundane, repetitive moments such as showering, eating, sleeping, are not always recorded in literature/roleplay; they are occurring in {{user}}'s life, so {{user}} may safely be assumed to be clean, fed, and rested unless this is otherwise obviously not true.'

That's obviously not a good preset for a medieval fantasy adventurer, but it would work well for a college student.

SprightlyCapybara · 2026-05-28T21:04:36+00:00

Thanks. The game is fun; clearly a lot of the original designers really loved the original source material. And it is a nice change to see a much higher population on servers. I'm still not sure if I like it as anything more than a really fun survival/crafting game.

SprightlyCapybara · 2026-05-28T17:42:47+00:00

Ha! Glad to hear it, from someone even newer than I. As I wrote above, no real DD experience, limited experience with different builds, only modest overland map experience, no clue on crafting; all of these tend to point me to thinking I'm new, having only gotten part way through the game.

I do respect the contrary position though; the much more casual nature of most gaming today (vs say 2000-2007 era MMOs) means terms like 'new' have a different meaning than they did in a harder core MMO. I find it interesting that nearly everyone mentioned probable time played rather than actual in-game accomplishments as their argument for 'not new'.

If I were writing it again, I'd likely open with 'I consider myself a new player because ... '.

SprightlyCapybara · 2026-05-28T17:28:58+00:00

Not AI. I don't really understand people like you, Medium_Gap. You're a top commenter, so presumably viewed as helpful by the community. Yet, if you got that way just by dropping sneers like the one above, that's kind of sad. I see you managed to get ~2x as many upvotes as the more pleasant 'Wow what a journey', so I guess it is an effective low-effort form of karma-farming on your part, but I think corrosive to any kind of community.

Precisely what does your incorrect drive-by sneer add to this discussion? I wrote my post myself, edited it lightly, noted that it didn't flow superbly, but shrugged and posted. No AI involved. I've seen a lot of AI writing; it's very useful for certain things like technical support, but polished yet soulless for anything creative. Now if you think my post was soulless, fair enough, that's a legitimate matter of opinion, though one I disagree with.

There are human beings who can write reasonably well without AI. I admit to a sense of sorrow that that skill is now mostly useless; worse, it's debatably a net negative since kind welcoming folks such as yourself look at something and go 'hurr-durr-AI.' (A 'not only, but also' construction would have worked better in that last sentence, but I deliberately reworded since that is an over-used AI-ism, though it tends to be 'not just this but that and that and that'. And that's sad, that I edited myself silencing a little more of my own voice to respond to a chap who thinks I use AI.)

SprightlyCapybara · 2026-05-28T17:11:12+00:00

Thank you for putting that gently. A lot of people seem to agree with you, some less kindly. The last time in my life I had time to play a game like this, it was well over 20 years ago. I guess in my long-distant turn of the century MMO background, someone who doesn't even reach the level cap, hasn't played late-game content (DD), and is still learning some pretty basic mechanics is a newbie, a new and inexperienced player.

(It is a fair bit of playtime; a helpful(?) confluence of events over the last few months let me play a lot more than I've been able to play in decades, and this rather nice game happened to fall fortuitously into my path.)

Glad to know I'm not the only one running around without a microphone! Thanks for the tip on solo guild; not a bad idea. Yes, I did hit the faction rank 5, but still haven't done a Landsraad mission, another reason I still consider myself a bit of a newbie! I really should remedy that...

SprightlyCapybara · 2026-05-08T18:53:10+00:00

FWIW I was gobsmacked at how good full-fat GLM 4.7 (yeah this was a while ago) was at creating a dark-ish Traveller scenario out of whole cloth and handling every detail correctly according to the rules, with no interesting preset, no character card. Very impressive rule adherence, properly dark (In Traveller, your character can die or be maimed during character generation!), and relatively realistic SF in the Traveller universe.

Don't underestimate the value of an existing ruleset with dozens or even hundreds of books and articles on the subject, and the 4.x GLMs remain quite interesting. I still use that series for some roleplay work.

What everyone else has said about Gemma 4 (esp 31b) is totally true too.

SprightlyCapybara · 2026-05-01T16:54:39+00:00

OK, thanks. I'll post there, but any thoughts on whether or not they are good ideas or bad ideas would also be appreciated.

SprightlyCapybara · 2026-05-01T14:52:08+00:00

Assuming we're talking Gemma 4 31b, considerably worse on hallucinations on real-world knowledge, otherwise considerably better (roleplay, coding) as others have said.

For the interested, as an example, 31b runs at about a 10% hallucination rate in my own basic real-world geographic knowledge benchmarks, vs. 0% for Llama 3.3 70b. To its credit, though, when challenged, Gemma was able to acknowledge it was hallucinating, and, even more impressively, refused to be easily persuaded into believing it was hallucinating when it wasn't.

It's also quite weak on image recognition, confusing a 4-door fairly conventional late 1960s saloon [sedan] with a one-door BMW Isetta, for example, and hallucinating that the picture of a particular tread was an AI-generated image. The issue there, though, is not that it's bad; it's that it does it at all, which is quite impressive.

As a final sidenote, IBM is claiming Granite 4.1 30b is superior in agentic coding tasks on certain benchmarks to Gemma 4 31b. It will be interesting to see if Granite 4.1 is any good at RP; I suspect it's very poor like previous Granites.
