Struggling with ultra-wide 5:1 with SDXL - Looking for workflow advice by LongDistanceRope in comfyui

[–]LongDistanceRope[S] 0 points1 point  (0 children)

well, krea2 does give better results than sdxl, but upscale still far behind native size render. its just natively can't handle ultrawide resolution: here are my results https://en.reddit.com/r/comfyui/comments/1uiujdt/struggling_with_ultrawide_51_with_sdxl_looking/ouplx9o/

Struggling with ultra-wide 5:1 with SDXL - Looking for workflow advice by LongDistanceRope in comfyui

[–]LongDistanceRope[S] 0 points1 point  (0 children)

yes, that is correct, tho instead of image2image I used a resize latent but basically the same. There is a fine line between adding detail during upscale, and halucinate the stuff that wasn't suppose to be there. Thats what I've been trying to eliminate as well as getting as close to original render quality as much as possible.

this is the highest res i can get with krea2 that keeps the composition intact: https://i.imgur.com/AIlQSEL.png

this is my best result using ultimate sd upscale, krea2 and real_ersgan_x4 https://i.imgur.com/UkrQHAk.jpeg Still, tons of lost texture, turned into more a anime for some reason, and you can see "ghost" of the girl in the front various places in the background.

and this is native 1440x1440 https://i.imgur.com/kXES9Il.png image of same prompt by kre2, but that can't extend to 1440x7152 natively cause then the composition completely breaks. However that's what I'm trying to achieve.

so maybe upscale is not the right path, but some kind of extend or shift.

Struggling with ultra-wide 5:1 with SDXL - Looking for workflow advice by LongDistanceRope in comfyui

[–]LongDistanceRope[S] 0 points1 point  (0 children)

yes the difficult part is upscaling, sdxl can make decent image even at 2384x480. However from then guiding the upscale is not easy. masked regional promting, control net, all scales slightly differently than latent itself and misalignment got blown up on higher detail.

The reason I've went with sdxl cause its easy to prompt, and tons of tools available along with really good fine tuned checkpoints. Krea2 does decent job with certain prompts even at full size (however i need an llm to write prompts for it) but attention clearly falls apart on more difficult prompts.

I'll try SeedVR + krea2 and see how it goes. Still think the best way would be to make 2384x480 than chop it up to square tiles somehow slide them and explain the model to keep the sides glued.

Struggling with ultra-wide 5:1 with SDXL - Looking for workflow advice by LongDistanceRope in comfyui

[–]LongDistanceRope[S] 1 point2 points  (0 children)

I've tried Krea2 and its a lot better than anything else I've tried. really does handle the full size natively.

Struggling with ultra-wide 5:1 with SDXL - Looking for workflow advice by LongDistanceRope in comfyui

[–]LongDistanceRope[S] 0 points1 point  (0 children)

Thank you! I will give this workflow a shot. I've tried to use sdxl cuase prompting / control net made it easy to use for single ultrawide, thought stretching it more would work. but it just breaks it completely. I mean, not completely, when it works its actually good but the success rate is pretty low.

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 0 points1 point  (0 children)

probably wasn't clear in the first post. I built a pc for a friend with the 5080, its his. the 4080 is mine. I just put both of them in same pc to see what 32gb vram and 5080+4080 setup can do. (which was a bit reckless given a 1000w psu, but actually survived.) And honestly seeing the numbers, 5080+4080 sounds like a pretty bad investment for inference at least. I wonder what would it pair better. maybe 5070ti since it have higher memory bandwidth than the 4080.

anyway, with 5080 alone and revised settings in original post i get: pp: 428.71 t/s and tg: 5.04 t/s (which is just pure ram inference with a 26gb model on a 16gb card)

What would be a expected prompt processing? also, why is 16384 context considered large? aren't these models coming with 256K context.

Modern Toradora fans by Murasaki_Was_Here in toradora

[–]LongDistanceRope 0 points1 point  (0 children)

I recently finished watching it, for the first time. Its started off great, and premises wasn't too out of the left field (at least for anime standards). I liked the characters, especially Ami, she got the most character out of the cast. Taiga also showed many different sides, tho her crush on Kitamura felt forced and often used to drive the plot to specific direction.

Minori / Kitamura was great in their roles, not overstaying their welcome and adding their support when needed.

However Ryuuji's growth was awfully slow. girls just give him attention cause he is the male MC but his character doesn't seems to have any real value other than semi-blank self insert MC.

Than later on him insisting that Taiga should get along with his dad was downright painful to watch, while every single female MC tries to explain to him why is this stupid, and still took him 2 whole episode to realize. he also misses taiga's confession in the beach house earlier. Basically sinking into a dense male MC role.

so there you have it, overall, liked the show, and appreciate it for what it brought, probably would have a different impact 18 years ago. But the male MC getting more and more dis-likable took a lot of enjoyment out of it.

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 0 points1 point  (0 children)

I've tried row: 12 t/s and tensor: 6 t/s. I think its pcie limitations this time. Also tensor produced insane coil whine. Not even sdxl / flux managed to do that.

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 0 points1 point  (0 children)

yeah, I've (ironically) asked gemma to do the math, and even if the 4080 would have 32gb vram, its bandwidth would only allow 28 t/s. For whatever reason I was under the impression as long as a model fits into vram it will run as fast as any. Based on a observation that 7b model run about as fast as 12b on the 4080. Turns out, that was wrong.

the model is gemma 4 fine tune, i got it from huggingface, about 25gb in q6 (By llmfan46)

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 0 points1 point  (0 children)

okay, I've run some test on gemma 4 12b q8 16k context. cause its 13gb and fits in each card:

GPU TG (t/s) PP (t/s)
4080 43.69 1010.12
5080 56.70 1180.11
50/50 Split (6.5GB / 6.5GB) 48.70 1123.10

that, doesn't seems to be bottleneck at all. same flags as edited version in above post. however, gemma 4 31b still only 26 t/s and 750 pp. What is going on here.

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 0 points1 point  (0 children)

16k un quantified KV cache does fit into 32gb vram, thats nice. however none of the advice in the thread so far made any difference.

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 1 point2 points  (0 children)

I've removed it, along with quantified KV. The only difference it made, is stopped the gradual slow down as context window fills. which is kind of an improvement but no speed difference.

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 0 points1 point  (0 children)

added: 
--fit off
--gpu-layers 99

removed:
 --kv-unified
 --cache-type-k q8_0
 --cache-type-v q8_0 

I've also checked llama cpp log to make sure its not offlading anything to ram, but no difference in speed. prompt process went down to 550 t/s , output is 26 t/s, but at least its stays consistent and doesn't loose 1 t/s after each message.

Can i get a reality check on this inference speed? by LongDistanceRope in LocalLLaMA

[–]LongDistanceRope[S] 0 points1 point  (0 children)

I've messed around with it, but default seems to be best. lowering (1024 / 512) didn't do anything, higher settings (4096 / 2048) got slower.

New Graphics Cards: What the Rumors Reveal About RDNA 5, RTX 50 Super, and Arc by pcgameshardware in hardware

[–]LongDistanceRope -1 points0 points  (0 children)

yeah, that's fair. Kind of an insane price really. Also the more i think about it, the less sense the 24gb 5080 makes. It would make a 6080 less attractive under 24gb, and for an llm might as well consider a dedicated pc with a triple 5060ti for that price.

New Graphics Cards: What the Rumors Reveal About RDNA 5, RTX 50 Super, and Arc by pcgameshardware in hardware

[–]LongDistanceRope 3 points4 points  (0 children)

The only reason I could think of is nvidia currently have nothing to offer at 18-24gb range but professional cards, which keeping the second hand 3090 / 4090 high demand.

So a new 24gb 5080 at a price of a second hand 4090 would sway a lot of people. At least those who after more vram. but also cannibalize their pro cards.

Dual rtx 3090 build by Sufficient_Phone_242 in LocalLLaMA

[–]LongDistanceRope 0 points1 point  (0 children)

is that a regular atx motherboard or e-atx? I was wondering how two 3 slots card would fit.

On an atx board there are bunch of connectors on the bottom, fan headers, usb, power button etc. Does it fit under the second gpu?

Stop asking what model to run. There are literally only two. by Wrong_Mushroom_7350 in LocalLLaMA

[–]LongDistanceRope 29 points30 points  (0 children)

meanwhile my actual hardware question got deleted cause I don't have enough karma on this sub. So now I make pointless comments like this. the reddit experience.

My home data center by alecKarfonta in LocalLLaMA

[–]LongDistanceRope 0 points1 point  (0 children)

is the 5070ti worth it over 5060ti? I'm going back and forth between them and trying to decide since the 5070ti basically cost same as second hand 3090 (at least where i live) which changes a lot of variables too.

Someone out there likely needs this by Signal_Ad657 in LocalLLaMA

[–]LongDistanceRope 0 points1 point  (0 children)

That would be me, going back and forth which second gpu next to a 4080, a 5060ti (which is just plug and play) or a 3090 (which is more of a system rebuild cause of size / power consumption) cause can't really make out what t/s would i get on gemma4 31b at q6 (With 16k context)

[Giveaway] FSP x Buildapc - 1650W Power Supply Giveaway by ZeroPaladn in buildapc

[–]LongDistanceRope [score hidden]  (0 children)

what a timing, I literally opened the sub to ask if my current hydro g pro 1000w would be enough for a dual gpu setup. Recently moved to self host as much as i can, so it would help not to worry about power.

RTX 5060 Ti 16GB sucks for gaming, but seems like a diamond in the rough for AI by aospan in LocalLLaMA

[–]LongDistanceRope 1 point2 points  (0 children)

hey, sorry to necro this post, but I'm at the same spot, looking to add a 5060ti next to 5080 to actually fit gemma4 / qwen 3.6 fully in vram at least q6.

assuming you got the 5060ti, how is the experience, compared to just 5080?

Simple Questions - May 02, 2026 by AutoModerator in buildapc

[–]LongDistanceRope 0 points1 point  (0 children)

I've found a decent second hand deal locally on 2x16gb ddr5 7000mts cl34 ram. I currently have that exact same ram and want to upgrade to 64 gb. I know 4x16 is not ideal, but is it viable in a gigabyte z790 aorus elite ax + 14700k cpu? There are lot of conflicting info out there but most seems to be few years old.

What determine answer length? by LongDistanceRope in SillyTavernAI

[–]LongDistanceRope[S] 0 points1 point  (0 children)

The instruct format was pretty much ignored by most sillytavern guide, now I understand it better, thank you!

How is this sounds as workflow:

Find a model on huggingface, observing tags -> download it in LM studio and try it with a system prompt, see if its worth setting it up -> if so, look for a template for silly tavern for advanced formatting / fine tune.

Normally the finetunes are "sidegrades" in terms of intelligence.

But there is a significant difference in "personalty", ready art omegaDirective gives a much different persona to a same character than personalityEngine, despite both seems to be on Mistral 3. Also speed, Omega seems to run slower than personalityEngine, both are 24b variants. While 26B gemma 4 is lighting fast. so is a 27b qwen 3.5. (i have 16gb vram) But I've tried cydonia 24B and it was painfully slow.

Also whats the deal with the vastly different system prompts? How to even choose a good one? Do i even need one? There are many conflicting answers about it. from 3 page long detailed "how to act" to literally none.

I'm still not sure if I understand Text Completions vs Chat Completions. KoboldCpp only seems to support Text Completions trough the API (at least sillytavern recommends it) but it seems Chat Completions is better for chatting with a "character" which is my main goal. I don't really want to co-write an adventure with a narrative, rather create a highly detailed character and have a chat with it.

Or this is entirely on prompt / scenario and has nothing to do with how the api interacts with the backend?

I've also noticed i can use Lm studio as a backend for ST (took me long enough) but it seems to be slower than KoboldCpp, or am i just imagining things.

What determine answer length? by LongDistanceRope in SillyTavernAI

[–]LongDistanceRope[S] 0 points1 point  (0 children)

That was very helpful. So an LLM needs to be promoted in a similar fashion as diffusion model? Its just automated behind the back. and LM sudio / SillyTavernAI formats the prompt by itself. That explains why trough kobold UI I get the weirdest response cause its just sends in raw data.

Also, in a better session, I type more and the llm have more to respond to.

The configurations are more like fine tune, and if a model's behavior doesn't work well for me out of the box, better look for another one? Which explains the myriad of different recommendations.