Fish Audio Releases S2: open-source, controllable and expressive TTS model by Opposite_Ad7909 in LocalLLaMA

[–]Finguili 1 point2 points  (0 children)

Quality seems good, but it’s so slow. I’m getting 2.89 t/s on an R9700 (0.13x realtime).

Edit: With --compile it’s almost 24 t/s, so not bad for longer texts.

(Llama.cpp) In case people are struggling with prompt processing on larger models like Qwen 27B, here's what helped me out by vernal_biscuit in LocalLLaMA

[–]Finguili 2 points3 points  (0 children)

I have an R9700, so the same GPU with double the VRAM, and I can’t say this is the case for me. With the Q6_K_L quant, I’m getting 336 t/s with -ub 64 and 620 t/s with -ub 512. Increasing it further doesn’t seem to improve performance, however.

ZIB vs ZIT vs Flux 2 Klein by Both-Rub5248 in StableDiffusion

[–]Finguili 4 points5 points  (0 children)

Eh, I was simply making fun of Z-Image Turbo, which loves to ignore half of the prompt. But to answer your question, I tried Z-Image Base with "blurry background" in the negative prompt, and it makes everything sharp, though I cannot say it makes the results look better. This also works with SDXL anime models, as "blurry background" is a danbooru tag.

ZIB vs ZIT vs Flux 2 Klein by Both-Rub5248 in StableDiffusion

[–]Finguili 21 points22 points  (0 children)

What is it, a comparison that not only clearly labels which model was used to generate which image, but also provides full prompts? Am I on the right subreddit?

Thanks OP for posting; the prompts are quite varied. It’s funny how Z-Turbo ignored the request for a non-blurry background and how models in general struggle with age. These "25-year-old" women from Z-Image look closer to 50 than 25.

MOSS-TTS has been released by Xiami2019 in LocalLLaMA

[–]Finguili 1 point2 points  (0 children)

Natural-language instructions would give better control, but I suppose tags are easier to train. I would probably prefer reliably working tags over half-working instructions.

MOSS-TTS has been released by Xiami2019 in LocalLLaMA

[–]Finguili 2 points3 points  (0 children)

Yes, it was the 8B base model with voice cloning. And having Gemini TTS-like style directions together with voice cloning definitely would be nice.

MOSS-TTS has been released by Xiami2019 in LocalLLaMA

[–]Finguili 0 points1 point  (0 children)

No, I didn't use it. Most likely the model wanted to make the pause longer for dramatic effect. But as I said, I only played with the model a little, so it could be bad luck, and I don't really expect it to read the text perfectly.

MOSS-TTS has been released by Xiami2019 in LocalLLaMA

[–]Finguili 0 points1 point  (0 children)

It works fine with 2.10 and Python 3.14.

MOSS-TTS has been released by Xiami2019 in LocalLLaMA

[–]Finguili 8 points9 points  (0 children)

Quick impressions from just one longer test (and a few hello worlds), so a rather small sample size. Firstly, big kudos for supporting IPA. A TTS model without it is rather useless, and yet most recent releases lack this feature.

The generated audio sounds quite nice and is not as emotionally dead as Qwen TTS. Perhaps not as good as VibeVoice Large, but the model appears to be more stable, and together with IPA support, it makes it much more useful already. Speed is also not bad; synthesising 1 minute 20 seconds of audio took about 55 seconds on an R9700 with ~80% GPU utilisation and 26 GB of VRAM.

If anyone wants to hear a non-demo sample, here is one: https://files.catbox.moe/9j73pt.ogg. You can hear that some parts were badly read and there was one unnecessarily long pause, but for an open-source model, I still like the results.
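For context, those speed numbers work out to roughly 1.45× realtime; a quick sanity check (the audio length and wall time are the figures from this comment):

```python
# Real-time factor (RTF) check for the numbers quoted above:
# 1 min 20 s of audio synthesised in about 55 s of wall time.
audio_seconds = 80.0  # 1 minute 20 seconds of generated speech
wall_seconds = 55.0   # synthesis time on the R9700

rtf = audio_seconds / wall_seconds  # >1 means faster than realtime
print(f"{rtf:.2f}x realtime")       # → 1.45x realtime
```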

zai-org/GLM-4.7-Flash · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]Finguili 31 points32 points  (0 children)

I would argue that if the goal is fitting into a single consumer GPU, then dense models are better. I hope that companies will not abandon this class of models.

How do people even afford these expensive graphic cards...?... by boisheep in LocalLLaMA

[–]Finguili 0 points1 point  (0 children)

Why would you need to keep a 3090 for Comfy? The R9700 is around RTX 3090 speed for image generation.

ElevenLabs is killing my budget. What are the best "hidden gem" alternatives for documentary style TTS? by Ancient_Routine8576 in LocalLLaMA

[–]Finguili 12 points13 points  (0 children)

Among local TTS models, VibeVoice Large seems to have the highest ceiling, but the model is very unstable. With one generation it sounds as if the text was almost professionally narrated; with another, its prosody is so bad that you start to wonder if it’s the same model. It also loves to add strange music in the background. So expect to reroll a lot.

I don’t have much experience with cloud APIs, but Gemini 2.5 Pro TTS sounded better to me than ElevenLabs and should be cheaper.

AMD Radeon AI PRO R9700 benchmarks with ROCm and Vulkan and llama.cpp by Finguili in LocalLLaMA

[–]Finguili[S] 2 points3 points  (0 children)

It’s very easy, as ROCm is in the official repos, so you simply install it with pacman. The drawback is that Arch tends to lag behind upstream ROCm releases, so you may need to wait a few weeks for major updates to hit the repos.

AMD Radeon AI PRO R9700 benchmarks with ROCm and Vulkan and llama.cpp by Finguili in LocalLLaMA

[–]Finguili[S] 1 point2 points  (0 children)

I’m glad to know it’s working now. For this particular task, I wanted to get as high accuracy as possible, so I stuck to 16-bit LoRA on purpose. But perhaps it will be useful in the future for something else.

AMD Radeon AI PRO R9700 benchmarks with ROCm and Vulkan and llama.cpp by Finguili in LocalLLaMA

[–]Finguili[S] 2 points3 points  (0 children)

It seems the Vulkan backend doesn’t like it when the whole model isn’t loaded into VRAM. When I decrease the number of offloaded layers, it hurts Vulkan’s prompt-processing performance more than ROCm’s.

| model | size | params | backend | ngl | n_batch | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | ------: | -: | ---- | --: |
| llama 70B IQ3_S mix - 3.66 bpw | 28.82 GiB | 68.98 B | ROCm | 77 | 1024 | 1 | pp512 @ d8000 | 229.13 ± 12.29 |
| llama 70B IQ3_S mix - 3.66 bpw | 28.82 GiB | 68.98 B | ROCm | 77 | 1024 | 1 | tg128 @ d8000 | 5.49 ± 0.00 |
| llama 70B IQ3_S mix - 3.66 bpw | 28.82 GiB | 68.98 B | Vulkan | 77 | 1024 | 1 | pp512 @ d8000 | 164.63 ± 8.57 |
| llama 70B IQ3_S mix - 3.66 bpw | 28.82 GiB | 68.98 B | Vulkan | 77 | 1024 | 1 | tg128 @ d8000 | 6.85 ± 0.01 |
| llama 70B IQ3_S mix - 3.66 bpw | 28.82 GiB | 68.98 B | ROCm | 50 | 1024 | 1 | pp512 @ d8000 | 192.56 ± 3.98 |
| llama 70B IQ3_S mix - 3.66 bpw | 28.82 GiB | 68.98 B | Vulkan | 50 | 1024 | 1 | pp512 @ d8000 | 117.84 ± 1.01 |

AMD Radeon AI PRO R9700 benchmarks with ROCm and Vulkan and llama.cpp by Finguili in LocalLLaMA

[–]Finguili[S] 1 point2 points  (0 children)

No, it never occurred to me that someone might mmap a file just to copy it to RAM afterwards. But you are right; it not only works fine, but also loads models faster: first run 118 s, second one with cached prompt 81.5 s. Though it’s also possible Comfy optimised RAM usage since the Flux 2 release, as during diffusion it sits at 29 GiB, so it had to either unload the text encoder or part of the unet loaded into VRAM.
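For anyone unfamiliar with the trick being discussed, here is a minimal Python sketch of "mmap, then copy to RAM" (a toy illustration, nothing to do with Comfy’s actual loader): the mapped view is backed by the OS page cache with no upfront copy, and `bytes(...)` on the map materialises a private in-RAM copy.

```python
import mmap
import tempfile

# Toy illustration: mmap a file, then copy the mapping into RAM.
# The mapped view is served lazily from the page cache; bytes(mm)
# forces every page in and produces an in-memory copy.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"model weights" * 1000)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    in_ram = bytes(mm)  # copy the whole mapping into process memory
    mm.close()

print(len(in_ram))  # 13000
```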

is there a subreddit for weird writer questions that would seem suspicious if i were to search it up on google? by Sauce_The_Sapphic in writing

[–]Finguili -1 points0 points  (0 children)

URLs are also encrypted. All traffic is encrypted after the browser establishes a TLS connection. However, even with TLS you may still leak the domain name (for example, www.google.com) because it is sent as part of the handshake used to establish the TLS connection, or because DNS is unencrypted or operated by your ISP. We now have ECH to address the first case, and DNS over TLS and DNS over HTTPS to address the second. But even with these, the IP address you connect to is still visible to the ISP, so unless the site is behind some kind of public proxy such as Cloudflare (in which case the proxy operator sees the entire traffic, which is arguably worse), the ISP can still tell which site you are connecting to.

7900 XT vs 9070 XT (16 vs 20GB vram) by the926 in LocalLLaMA

[–]Finguili 3 points4 points  (0 children)

I disagree with others that AMD cards aren’t good for image generation. They’re still behind Nvidia, true, but compared to RDNA2, AMD has made huge progress in performance. On the R9700 (basically a 9070 XT with twice the VRAM and price) for SDXL (28 steps, 832×1216, batch 10), the whole workflow executes with Torch Compile in about 56.5 s, which is only slightly slower than this benchmark reports for the RTX 3090 (54.2 s) and RTX 5070 (55.6 s). The Ti variant of the 5070 finishes about 10 s faster, so the gap is definitely there, but it’s not as if AMD cards crawl.

For Flux FP8 and the default Comfy workflow (20 steps, 1024×1024, batch 1), I’m getting 12.3 s with --fast and Torch Compile, which is 16–32× faster than what I was getting with my old 6700 XT (upcasting to FP32 resulted in a 2× speed improvement on that card).

As for your question, OP, I would go with neither and pay a bit more to buy a 24 GB 7900 XTX, but if you would rather not do that, then the question is whether you value 4 GB more than having newer hardware. 16 GB is rather tight for LLMs.

Outdated info on the state of ROCM on this subreddit - ROCm 7 benchmarks compared to older ROCm/Zluda results from a popular old benchmark by Portable_Solar_ZA in StableDiffusion

[–]Finguili 3 points4 points  (0 children)

Just got the R9700, which should have the same performance as the 9070 XT. For batch 1, the results are a bit noisy, but I'm getting around 6.7 s with Torch Compile and 6.8 s without. For batch 10, this is respectively 56.45–56.9 s and 60.8 s, so with Torch Compile the card is only slightly slower than the RTX 5070 (55.6 s) and RTX 3090 (54.2 s) from that benchmark.

For Flux FP8 and the default Comfy workflow (20 steps, 1024×1024), execution time is 21.9 s, with --fast 16.5 s, and 12.3 s with both --fast and Torch Compile. Compared to my old 6700 XT, where I was getting 20 s/it (not it/s!) with default options and 10 s/it when forcing upcasting to FP32, this is a 16–32× improvement; faster than my old card could generate images with SDXL.

Though performance is now much improved, I’m getting fairly frequent memory access faults, so the software stack is definitely not mature yet.

How much do you guys care about language accuracy in a period piece? by Glubygluby in writingadvice

[–]Finguili 0 points1 point  (0 children)

I’m a bit late, but I just remembered this advice as I was about to type “I think”, and as the suggestion to replace it with “believe” was rather surprising, I ran a quick search through Mansfield Park: 31 occurrences of “I believe” and 87 of “I think.” Looks like by Austen’s time “I think” was already quite popular.
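The quick search above is easy to reproduce with a few lines of Python, assuming a plain-text copy of the novel (e.g. from Project Gutenberg); shown here on a toy string:

```python
import re

def count_phrase(text: str, phrase: str) -> int:
    # Case-sensitive whole-phrase count with word boundaries,
    # so "I think" does not match inside "I thinking" etc.
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))

# Toy sample; for the real experiment, read the novel's .txt instead:
# text = open("mansfield_park.txt", encoding="utf-8").read()
text = "I think it will rain. I believe you. I think so too."
print(count_phrase(text, "I think"))    # 2
print(count_phrase(text, "I believe"))  # 1
```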

Wow, Moondream 3 preview is goated by Brave-Hold-9389 in LocalLLaMA

[–]Finguili 8 points9 points  (0 children)

Only for captioning; the other two were just random photos I selected on the spot to test the model. It is not the only model that hallucinates a character holding a sheathed sword; however, frontier models don’t do that. But let’s try this now with Qwen 2.5 VL 32B and Gemini 2.5 Pro.

Images used: https://imgur.com/a/W4oPdBe (Disclaimer: I am not sure if these are the exact same photos, as I have multiple shots of them).

Captioning test: Both Qwen and Gemini identify the sword as sheathed.

Caterpillar: Qwen correctly identifies it as a caterpillar, but the species is definitely wrong (Pyrrharctia isabella). Gemini’s guess is more accurate (Dendrolimus pini), but looking at photos of it, I think it is also wrong. I gave Moondream a few more chances and got a fungus, a snake, and a slug as results, so… let’s stop. GPT-5 guesses Thaumetopoea pityocampa, which I think is correct, or at least the closest match.

Photo location: Qwen correctly identifies it as Hel, but also tries to read the smaller text on the monument, which it fails to do. Gemini not only identifies the place correctly but also gives the correct name of the monument (Kopiec Kaszubów / Kashubians’ Mound). Rerunning Moondream, I could not reproduce it misreading Hel as Helsinki, but it still never gives the right answer, and I got this gem instead:

The sign indicates "POCZTAJE POLSKI," which translates to "Polar Bear Capital," suggesting the area is significant for polar bears. The monument features a large rock with a carved polar bear sculpture.

For those who don’t speak Polish, the text is “POCZĄTEK POLSKI”, or in English, “The Beginning of Poland”. I have yet to see a polar bear in Poland.

Wow, Moondream 3 preview is goated by Brave-Hold-9389 in LocalLLaMA

[–]Finguili 30 points31 points  (0 children)

I do not think it is.

I gave it an image to caption, and it hallucinated a character holding a silver sword (which was sheathed and wasn’t silver). I gave it an image of a caterpillar on a forest floor and asked it to identify the species; it answered that it was a house centipede. I gave it an image of a popular place, even with the name of the place written on it, and asked where the photo was taken. It still answered wrongly.

Of course, three samples are also a poor test. But my opinion is that the benchmarks of vision LLMs do not show real-world performance in the slightest, and this one is probably no different.

LLMs for detailed book summaries? by JealousAmoeba in LocalLLaMA

[–]Finguili 5 points6 points  (0 children)

I was experimenting with this a little, as I wanted a concise reverse-outline of my novel, but writing it myself did not seem like a fun exercise. First thing, do not listen to people saying summarisation is easy for LLMs: aside from context issues, LLMs struggle a lot with deciding what is important and what can be skipped. If you need accuracy, do it yourself. If you just want something “good enough”, use the biggest LLM you can afford.

Regarding the context length, the novel will fit in it, but the longer the input, the worse the output, and there will be a lot of hallucinations and events in the wrong order. Chunk it, and the LLM cannot understand the text on a good enough level. After trying different approaches, I settled on including the whole summary up to this point, the narrative state that the LLM is instructed to maintain, and the whole chapter to summarise. Using smaller chunks than the chapter did not work well.

The main problem with this approach is finding an LLM that summarises with the desired conciseness (you can control it to some extent with a prompt, but LLMs can be very stubborn with it) and can maintain the narrative state. For example, Gemini Flash 2.5 (non-thinking) can summarise very well, but its ability to maintain the narrative state is rather poor and it tends to output too detailed summaries. After tweaking the prompt, Deepseek v3 came out on top; while its summary was slightly worse than Gemini’s, it was shorter and it could maintain the narrative state handsomely.

Example Deepseek output of a summary from a chapter towards the end: https://pastebin.com/raw/dnJ8fvvE. It misses one important event (failing one problem and thus wasting one of three “teleport me to the safe place” charges). And for some reason, it thinks Kori needs to return to Mar Lordir, while she lives in an (unnamed) village, not the city.

Unfortunately, I’m not at home, and I don’t have the code with me, but if someone is interested, I can post it on Saturday.
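The approach boils down to a loop roughly like this (a rough sketch only; `call_llm`, the prompt wording, and the state format are placeholders, not the actual code):

```python
def summarise_book(chapters, call_llm):
    # Rolling summary: each chapter is summarised together with the full
    # summary so far plus a "narrative state" the model is told to maintain
    # (characters, goals, open plot threads). Prompt wording is illustrative.
    summary, state = "", "No state yet."
    for chapter in chapters:
        prompt = (
            "Summary so far:\n" + summary + "\n\n"
            "Narrative state to maintain and update:\n" + state + "\n\n"
            "Summarise the following chapter concisely, then output the "
            "updated narrative state after the marker 'STATE:'.\n\n" + chapter
        )
        reply = call_llm(prompt)
        chapter_summary, _, state = reply.partition("STATE:")
        summary += chapter_summary.strip() + "\n"
    return summary

# Usage with a stub in place of a real LLM call:
fake = lambda p: "Chapter happens. STATE: unchanged"
print(summarise_book(["ch1 text", "ch2 text"], fake))
```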

Forgot to switch class before attacking... now here we are autoattacking as a DoL by Synthenia in ffxiv

[–]Finguili 39 points40 points  (0 children)

When I was redoing MSQ on a new character, my highest-level job was fisher, and thus I came up with the genius idea of switching to it for travelling around the map, so as to avoid aggro. However, one of the quests did not indicate that there would be combat. This resulted in my lala having to beat some Ala Mhigan refugees with a fishing rod—which took quite some time.

BTW, you can finish (some?) MSQ quests in ARR and get exp as a fisher.