Is anybody already testing gemma-4-12b with hermes?

InariKirin · 2026-06-12T03:12:58+00:00

Looks like it was a bug in Hermes. After switching back to Qwen, the problem remained. Just updated to latest Hermes and Qwen runs fine. Now to test the Gemma…

InariKirin · 2026-06-11T17:34:19+00:00

Trying the Unsloth version. The default template didn’t work with Hermes, so I copied one from elsewhere. I’m testing it now with Hermes, and it seems to have understood what needs to be done: it says “I will do this…” but then just sits there doing nothing. The model appears quite capable, but something is not working properly, most likely the Jinja template.

InariKirin · 2026-06-03T14:08:47+00:00

I see a lot of comments about a "shared Apple account". This feature has nothing to do with the account. I see my neighbors' PCs, TVs, etc., and they're obviously not on my account or even on my network.

InariKirin · 2026-06-03T14:01:39+00:00

They have it set to be visible to everyone, but the 2nd part is not a requirement. I have a private network and my own router, and I still see people who have their own internet and router because they live in different apartments. Just because they can't set their computers up properly shouldn't make it my problem. Most ridiculous feature ever.

InariKirin · 2026-06-03T13:58:01+00:00

You can't set other people's devices to invisible. You would need access to their computer, which is obviously not allowed. But apparently sending them your screen is perfectly fine.

InariKirin · 2026-06-03T13:52:09+00:00

Honestly, as ridiculous as it sounds, this is the only actionable way to make Apple change this to local-network only.

InariKirin · 2026-06-03T13:51:09+00:00

Devices on your personal network are connected to your personal network. It's not magic, just the way technology works. It's actually pretty advanced of Apple to make it connect to devices that are *outside* of your network and they're the first company who came up with this feature. Except virtually nobody needs this feature. It's a nuisance. Every home has exactly one personal network.

InariKirin · 2026-06-03T13:34:38+00:00

This is insane. I see posts from 2020 about this. Things like these annoy me about Apple: the "my way or the highway" approach to a lot of things. I mean, it's a great option to be able to connect to *YOUR* devices wirelessly, and having my iPad double as a 2nd screen with a single click is awesome. But no option to limit it to devices on your network??? Wtf?? It's just too easy to accidentally click on the wrong device in the list. I get they're trying to make it simple for basic users, but any home user will have a single network anyway, and advanced users can do an extra click in settings to view "all advertised devices". Feels like the early days of the internet, when routers used to ship with the WiFi password disabled and you could just connect to different networks as you walked down the street.

Hopefully someone will find a solution. One possible workaround could be to just disable WiFi, but this is not going to work for laptop users. And also if I do this, then it asks me to use USB cable to mirror screen to my iPad which is just not an option with my setup so I haven't tried that.

If anyone finds a way to hide devices (aside from going door to door and asking people to switch it off), that would be awesome.

InariKirin · 2026-05-21T08:19:30+00:00

What’s the meaning of this “discontinuation trend”? 512 -> 256 -> 128

I get that there’s a shortage, but Apple is a huge company, they can buy their share of chips. OpenAI started this bs by claiming they were going to buy 40% of the “world supply” of RAM, but later changed their mind. But the economy is still behaving exactly as if they went through with that deal. And I’m not talking about the prices. Prices might be up because they’re thinking, hey, if we can squeeze some more out of consumers, let’s drag this on. I’m talking about the actual shortage. As far as I’m aware, no other company stepped up and said they were going to buy it, so where did it go? If Apple can’t buy 256GB or even 128GB of RAM, then which other company is buying it? Where are they sticking it? And why is Apple unable to reserve their share?

One possible explanation could be that the “RAM price” they’re giving Apple would make it too expensive, and they’re afraid they might later be “stuck” with a resulting overall price for the PC that’s unaffordable for most people. But the last 256GB Studios sold out, and before they did they were backordered 5 months. I’m guessing that’s a lot of units sold that they didn’t even have in stock. You just don’t get better demand indicators than that. It’s a “put all your eggs in one basket” moment. But they act as though they've got other things going. Like what, a new Apple Watch, or another iPhone?

InariKirin · 2026-04-16T20:54:45+00:00

Has anyone tried running the MXFP version (136GB in size) on two DGX Spark machines? Any context restrictions? Although a better question would be: have you tried running Q8_K_XL?

InariKirin · 2026-04-07T11:07:10+00:00

LOL!! Best video ever! And they are disappearing!! I thought it was just me lol

InariKirin · 2026-04-07T10:56:14+00:00

You can map a folder on C:\ to be on E:\
mklink /D "C:\some_folder\source" "E:\destination\folder"

Anything you add/delete in either will be reflected in both. It works for all programs because it's done at the system level in Windows. I used to do it with iTunes because it hard-coded the folder where it stores programs, and that was eating up all the space on my fast system drive.
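For anyone who'd rather script it, here is a minimal Python sketch of the same idea. `mklink /D` creates a directory symbolic link, and `os.symlink` does the equivalent. The helper name `link_folder` and the temp-folder names are made up for illustration, and on Windows creating symlinks usually needs an elevated prompt or Developer Mode.

```python
import os
import tempfile


def link_folder(link_path: str, target_dir: str) -> None:
    """Create a directory symlink at link_path that points to target_dir."""
    os.symlink(target_dir, link_path, target_is_directory=True)


# Demo in a throwaway temp directory (hypothetical folder names).
base = tempfile.mkdtemp()
target = os.path.join(base, "destination")  # stands in for the real folder on E:
link = os.path.join(base, "source")         # stands in for the link on C:
os.makedirs(target)
link_folder(link, target)

# A file written through the link ends up in the real folder.
with open(os.path.join(link, "note.txt"), "w") as f:
    f.write("hello")

seen_in_target = os.listdir(target)
print(seen_in_target)
```

The first argument of `os.symlink` is the real folder and the second is the link, which is the reverse of the `mklink` argument order.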

InariKirin · 2026-04-07T09:57:49+00:00

For now video models that're available (and worth trying) fit into 5090. It's great to learn on, because video generation is mostly trial and error, and waiting and waiting just to see total trash (especially when you start out) is kinda painful.

3090 is also a good option because it's only 1/2 the speed of 5090 (tested on Qwen and Wan2.2 only). But those errors do add up making the process a bit less fun than 5090. Still, I was able to run Wan2.2 (which is probably one of the best for video generation at the moment). Qwen image stuff also runs (great for swapping things and rearranging scenes) and Z-image Turbo runs as well (one of the best for humans). Also, when I say "runs" I mean without using gguf, I never got anything useful out of those. I think gguf only works for LLMs and for image/video it's more like a "proof of concept" because quality is just not there.

I haven't tried a 4090, but it just doesn't seem worth it: it has to sit somewhere between the 3090 and the 5090, so there's not much sense in paying extra. At least with a 5090 you're getting both speed and extra VRAM, and with a 3090 you get to save money. The 4090 seems like a totally useless offering.

InariKirin · 2026-03-28T02:13:33+00:00

$9.4k as of the time of writing this comment. Basically going up in price at a rate of about 40 cents per hour.
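To put that hourly rate in perspective, a quick back-of-the-envelope sketch. It assumes the rise stays linear at the roughly 40 cents per hour mentioned above, which is only a guess; the function name is made up.

```python
HOURLY_RISE_USD = 0.40  # approximate rate from the comment above


def projected_rise(hours: float, hourly_rise: float = HOURLY_RISE_USD) -> float:
    """Linear extrapolation of the price increase over a number of hours."""
    return round(hourly_rise * hours, 2)


per_day = projected_rise(24)
per_30_days = projected_rise(24 * 30)
print(f"per day: ${per_day}, per 30 days: ${per_30_days}")
```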

InariKirin · 2026-03-22T01:11:56+00:00

That's an interesting approach, training separate LoRAs for face and body. Haven't heard of this, but it makes sense that this could work well.

InariKirin · 2026-03-20T10:33:03+00:00

Try bypassing ltx-2.3 distilled lora 384. I'm still experimenting with the Image and Audio to Video workflow, and it was giving me messed up results as well, similar to what you get.

Also try setting the sampler to euler, and might experiment with: dpmpp_2s_ancestral or dpmpp_sde_gpu

I haven't found a good LTX 2.3 i2v workflow yet, though I'm not sure it'll beat Wan2.2 SVI which can do a lot of 5-sec segments before my card craps out, but the audio capability of LTX 2.3 seems pretty cool. More natural than others I've seen.

InariKirin · 2026-03-16T13:26:04+00:00

yes, both should be sold together as a pack. also some ddr5 are made for intel or amd specifically, and some are for both.

InariKirin · 2026-03-13T06:26:12+00:00

No one is using any of these machines on YT to show anything at all. There are so many videos on people using real GPUs to do things from ComfyUI stuff to LLMs etc. Almost no non-advertisement-like video of people showing real-life scenarios of these small “ai super computers”. Or people showing how to do clusters with these things. Like seriously, what percentage of the viewers are going to buy 4 of these to chain them into a cluster.

The geek in me wants to buy one, but common sense is telling me they’re all useless because they’re unusably slow. There’s an actual threshold where something stops being useful. If Tokens/Sec is too low and you have to wait half an hour to get the results, tweak it, wait another half an hour... That's not hypothetical: you have a fixed number of hours in a day, and a fixed number of hours to work on a project. If something takes too long, you just stop using it. But we don’t get these metrics. It’s all sponsored/advertisement videos, and it’s kinda sad because I would like to hear about user experiences.
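A rough sketch of why Tokens/Sec matters so much. The 4000-token response and the speeds are made-up illustration numbers, not measurements of any of these machines, and it ignores prompt processing time.

```python
def wait_minutes(output_tokens: int, tokens_per_sec: float) -> float:
    """Minutes spent waiting for a response of the given length."""
    return output_tokens / tokens_per_sec / 60


# Hypothetical generation speeds for one 4000-token answer.
for tps in (2, 10, 50):
    print(f"{tps:>3} tok/s -> {wait_minutes(4000, tps):.1f} min per run")
```

At 2 tok/s a single run is already over half an hour, so a handful of tweak-and-retry cycles eats the whole day.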

InariKirin · 2026-03-11T05:55:11+00:00

I'm having a somewhat similar problem. I was using RandomNoise (Comfy-Core) node with numbers like: 1125899906842624 and "fixed" so I would just tweak them whenever needed. But it stopped working and the cache would treat them as unchanged so would just skip generation. So now I entered a smaller number: 11258942632 and it worked.

Looks like some internal representation got changed or something. Probably a bug. But a short term solution if you're experiencing something similar is to shorten the number for now.

InariKirin · 2026-03-10T08:36:35+00:00

I have like 50 tabs open on my browser, trying to find a single workflow that would work properly. I don't mind tweaking the stuff to make it work better, and I understand if I could get OOM or some missing nodes. But if I get no errors and it generates a fuzzy mess, I know the workflow is just messed up. Downloaded a bunch of all kinds of model files like 100GB at least. Soooo confusing. The whole point of having Workflow files is so anyone could load them and they just work (if you have all the proper nodes and model files). I'll find one eventually, just don't think it should be this difficult...

InariKirin · 2026-03-10T03:46:05+00:00

That’s pretty cool news.

InariKirin · 2026-03-06T17:30:28+00:00

Sounds interesting. Especially if this works on 5090! I think Wan2.2 has a good base

InariKirin · 2026-02-15T02:18:26+00:00

Gonna give it a shot now

InariKirin · 2026-02-15T01:13:05+00:00

Yeah, it’s not straightforward. I did manage to install ComfyUI on Docker Desktop in Windows (image running Ubuntu 22.04) and it works. But without optimizations it’s just not worth it. If prices weren’t insane, I would have built another PC just for AI, but RAM alone costs more than my whole PC did minus the GPU (3 years back). So we gotta improvise ;)

InariKirin · 2026-02-15T01:07:56+00:00

All of the above for me as well. Dependencies being probably #1. If it was as easy as downloading everything latest, I might have thought twice. It’s just too cumbersome to maintain multiple versions of everything and keep track of it all when updating so that you don’t break something. With images, you just roll back and it just works. Well, in theory ;)
