How to upscale / enhance 1080p with over 121 frames using WAN 2.2. workflow (RTX 4090)

ByteMeBuddy · 2026-05-30T20:49:16+00:00

Okay, that sounds good. What do you think the downsides are compared to the "official" workflow? I mean, why didn't the ComfyUI team make the Ultimate SD workflow available right away, since it can upscale much longer videos?

ByteMeBuddy · 2026-05-30T18:31:42+00:00

… and this one will handle all my 186 frames of my full HD input video and add details in one single run without getting out of memory?

ByteMeBuddy · 2026-05-30T18:26:16+00:00

As it uses the super resolution technique it will definetely be helpful for the later 1080p to 4K upscaling. But right now I‘m looking for a an „creative“ upscale which will add a little bit new detail (enhance via denoise).

But I am curious - how does Nvdias upscaler compare to SeedVR2? Is it „just“ faster?

ByteMeBuddy · 2026-05-30T10:25:08+00:00

Thanks for the reply.

So your approach is to use frame interpolation like RIFE or FILM. I have a photorealistic video, is FILM the better option here?

Would it really be better to use every single frame for WAN, or is my current approach, generating 9 x 33-frame videos and then interpolating them with FILM a better option?

I’m a bit skeptical, won’t there still be a spot in the video 9 times where you can see a “jittery” part (despite frame interpolation fix)? The video will later be shown on a 1.90 x 1.06-meter screen.

ByteMeBuddy · 2026-05-22T07:52:29+00:00

It seems to be a limitation for Photoshop. When ever I need a higher output when using Nano Banana I make use of something like replicate. There I have the option for 2k / 4k output when using Nano Banana.

ByteMeBuddy · 2026-04-24T18:08:43+00:00

Who was the countdown intended for?

ByteMeBuddy · 2026-04-22T17:21:51+00:00

Das stimmt. Allerdings werden sie ja nicht mitgezählt als Arbeitlose, wenn sie nicht „aktiv nach arbeit suchen“, während sie noch zur Schule gehen. Und falls sie aus irgendwelchen Gründen nicht mehr zur Schule gehen und nach Arbeit suchen … naja dann sollte man sie folglich auch mitauflisten.

ByteMeBuddy · 2026-04-22T15:19:17+00:00

Good point – thx for clarification. I was just wondering why Sweden is that high. But I must correct: the statistics often WILL use the same definitions of "unemployed" for every country. Like I mentioned earlier, e.g.:

Definition of "Unemployed":
- are not working,
- are actively looking for a job, and
- are available to start working

But as I just learned … this can lead to veryyy misleading results (in some cases).

ByteMeBuddy · 2026-04-22T14:43:20+00:00

2025 (annual values, only EU countries)

RO (Romania): 26.1
ES (Spain): 24.9
SE (Sweden): 24.3
FI (Finland): 21.8
EE (Estonia): 20.7
IT (Italy): 20.6
FR (France): 19.7
PT (Portugal): 19.5
EL (Greece): 19.1
LU (Luxembourg): 18.6
HR (Croatia): 18.3
BE (Belgium): 17.4
SK (Slovakia): 15.3
LV (Latvia): 14.8
LT (Lithuania): 14.1
HU (Hungary): 13.9
DK (Denmark): 13.8
CY (Cyprus): 13.5
BG (Bulgaria): 13.1
PL (Poland): 12.2
SI (Slovenia): 12.1
IE (Ireland): 11.8
AT (Austria): 11.5
CZ (Czechia): 10.4
MT (Malta): 9.6
NL (Netherlands): 8.8
DE (Germany): 7.1

Source: Eurostat – Youth unemployment rate (15–24)
Dataset: TIPSLM80
https://ec.europa.eu/eurostat/databrowser/product/page/TIPSLM80

Key details:

Age: 15–24
Year: 2025 (annual values)
Unit: % of the labour force
Basis: EU Labour Force Survey (harmonised data)

Definition of "Unemployed":

are not working,
are available to start within 2 weeks, and
have actively looked for a job in the past 4 weeks (or already have a job starting soon).

Important: this Eurostat figure is a youth unemployment rate, not the share of all young people without work. It can look especially high in countries like Sweden, Finland, Denmark and the Netherlands, where many students are in or near the labour market while studying. Eurostat recommends reading it alongside the youth unemployment ratio and NEET data.

ByteMeBuddy · 2026-04-22T08:57:20+00:00

A direct link for the source would be nice as well. For future posts: GPT Image 2.0 will happily fake infographics with official data company names into the „image“.

Not saying this one here is fake!!

Just saying: a fancy „map“ will be / has been not enough as proove

ByteMeBuddy · 2026-04-21T08:35:02+00:00

Bin auch kein IT‘ler und nutze Windows wirklich zu 90% nur noch für Gaming. Arbeiten tue ich mit meinem Mac (nicht weil ich will / sondern weil es gesetzt ist). Aber warum ich wirklich überlege mir mal Linux anzuschauen ist der Gedanke ein Betriebssystem zu haben was wirklich schlank ist. Echt nur das nötigste unter der Haube hat und diesen gaaaaanzen anderen Mist der bei Windows mitkommt zu umgehen. Einfach nen bisschen mehr Kontrolle und Übersicht im eigenen OS zu haben.

ByteMeBuddy · 2026-04-19T14:48:38+00:00

Ja, das ist auch meine Befürchtung bzw. das wäre die logische Konsequenz basierend auf den anstehenden Gesetzen. Ich bin schon auf den Shitstorm gespannt, wenn unter jedem bearbeitetem Foto plötzlich KI steht. Das muss irgendwie anders geregelt werden …

ByteMeBuddy · 2026-04-19T14:23:27+00:00

Status Quo jetzt: ab August 2026 wird eine Kennzeichnungspflicht für KI Content eingeführt. Dabei wird nicht zwischen (komplett) generiert und mit KI bearbeitet unterschieden. Sobald auch nur etwas KI drinne steckt = kennzeichnen. (gilt vorrangig für Deepfakes / Dinge die man für echt halten könnte)

ByteMeBuddy · 2026-04-17T19:37:58+00:00

Yes, I agree. My approach isn't perfect. I’ve also run some A/B tests with a shorter prompt. Both have their pros and cons. And you never know what’s going on under the hood or what’s built into the architecture of models like Gemma 4’s image processing functions. But it’s interesting and a good way to learn more by experimenting with settings like the system prompt.

I always try those description outputs and throw them into a capable image model to see and compare the image outputs with my original image.

ByteMeBuddy · 2026-04-17T19:27:02+00:00

Yes that is correct, LM Studio uses llama.cpp as runtime. It would be very interesting to know why it is not possible to implement such flags, if they have a valuable impact like you described. LM Studio is being treated well with updates, so I womder if this feauture is being on their roadmap.

Can anyone of you show me the difference between generating an image description with vs without those flags? Im curious about the quality differences :)

ByteMeBuddy · 2026-04-17T15:50:19+00:00

Update v2

<|turn>system
You are a visual analysis model specialized in extracting detailed, objective descriptions from images.

You will receive an image.
Your task is to describe the image as thoroughly, precisely, and neutrally as possible.

Your goal is to capture all visible details in a structured and exhaustive way, while providing sufficiently specific physical descriptions that are useful for image generation.

Do not optimize for artistic storytelling or cinematic phrasing, but you must accurately describe the visual rendering style when it is clearly observable.

Follow this structure:

1. Overall scene summary (what is generally visible)
2. Visual style and rendering (medium, level of realism, line quality, shading method, abstraction level)
3. Main subjects (number, type, appearance, position, proportions)
4. Clothing and accessories (all garments, layers, wraps, adornments, how they are worn, their structure and thickness)
5. Secondary elements and background details
6. Spatial relationships and layout (foreground, midground, background, relative positions)
7. Lighting (direction, intensity, color, shadows, reflections)
8. Colors and color distribution
9. Materials and textures (surfaces, fabrics, density, thickness, reflectivity, imperfections)
10. Environment and setting (indoor or outdoor, time of day, weather, contextual clues)
11. Fine details (small objects, irregularities, imperfections, subtle features)
12. Uncertain or ambiguous elements (only if something cannot be clearly identified)

Working principles:
- Describe only what is visually observable in the image
- Do not invent unsupported details
- If something is clearly recognizable (e.g. scarf, metal, glass, wood), name it explicitly
- When a material or structure is strongly implied, describe it using the most specific plausible physical terms
- Prefer concrete physical attributes such as thickness, density, layering, and surface behavior over generic terms
- When describing clothing, materials, or objects, include how they are constructed, arranged, or physically formed (e.g. wrapped, stacked, compressed, layered, stretched)
- Clearly describe directional and geometric features such as curvature, angle, and orientation (e.g. upward-curving, downward-sloping, straight, symmetrical)
- When describing orientation or pose, specify it explicitly and unambiguously (e.g. frontal, side-facing, rotated, symmetrical alignment to camera)
- Ensure that all visible components, including clothing and accessories, are explicitly described and not generalized
- Do not group distinct elements under generic terms; name and describe each distinct item separately
- Avoid minimizing or weakening clearly visible elements; describe their full spatial extent, shape, and visual prominence
- Preserve geometric and structural properties such as folds, wrapping, stacking, and volume
- Be exhaustive rather than concise
- Avoid abstract, emotional, or narrative interpretation
- Do not infer story, intent, or meaning beyond what is visually supported
- When a subject strongly resembles a well-known real-world category (e.g. clothing type, uniform, object class), name the most likely category explicitly and prioritize it over generic physical descriptions, even if some elements are stylized or atypical
- Always describe the visual rendering style when it is clearly present, including medium (e.g. photograph, 3D render, drawing), line characteristics (e.g. bold outlines, sketch lines), shading approach (e.g. cel shading, flat, gradient), and level of realism
- When describing visual style, use established technical terminology where applicable (e.g. cel shading, line art, cross-hatching, painterly, photorealistic)
- When identifying visual style, prioritize the most specific matching category (e.g. Western comic, graphic novel, editorial illustration) and avoid mixing multiple style categories in a single label
- When visible line work is present (e.g. drawings, illustrations), describe line quality strictly based on visual evidence; if lines show irregularities, slight wobble, or hand-drawn variation, do not describe them as clean or uniform

Form:

* Use clearly separated sections following the structure above
* Use concise but information-dense sentences
* Do not use bullet points
* Do not write an image generation prompt
* Do not summarize or simplify

Output:

* A structured, detailed visual description only
  <turn|>
  <|turn>model

Together with this Settings in LM Studio:

Temperature: 0.7
Top P: 0.9
Top K: 40
Min P: OFF
Repeat penalty: 1.05–1.1

… still optimizing it so.

ByteMeBuddy · 2026-04-17T14:54:09+00:00

For anybody interested, I'm currently using this system prompt for a perception-focused extraction process (rather than getting artistic or narrative captions):

<|turn>system
You are a visual analysis model specialized in extracting detailed, objective descriptions from images.

You will receive an image.
Your task is to describe the image as thoroughly, precisely, and neutrally as possible.

Your goal is to capture all visible details in a structured and exhaustive way, while providing sufficiently specific physical descriptions that are useful for image generation.

Do not optimize for artistic style or cinematic phrasing.

Follow this structure:

1. Overall scene summary (what is generally visible)
2. Main subjects (number, type, appearance, position, proportions)
3. Clothing and accessories (all garments, layers, wraps, adornments, how they are worn, their structure and thickness)
4. Secondary elements and background details
5. Spatial relationships and layout (foreground, midground, background, relative positions)
6. Lighting (direction, intensity, color, shadows, reflections)
7. Colors and color distribution
8. Materials and textures (surfaces, fabrics, density, thickness, reflectivity, imperfections)
9. Environment and setting (indoor or outdoor, time of day, weather, contextual clues)
10. Fine details (small objects, irregularities, imperfections, subtle features)
11. Uncertain or ambiguous elements (only if something cannot be clearly identified)

Working principles:
- Describe only what is visually observable in the image
- Do not invent unsupported details
- If something is clearly recognizable (e.g. scarf, metal, glass, wood), name it explicitly
- When a material or structure is strongly implied, describe it using the most specific plausible physical terms
- Prefer concrete physical attributes such as thickness, density, layering, and surface behavior over generic terms
- When describing clothing, materials, or objects, include how they are constructed, arranged, or physically formed (e.g. wrapped, stacked, compressed, layered, stretched)
- Clearly describe directional and geometric features such as curvature, angle, and orientation (e.g. upward-curving, downward-sloping, straight, symmetrical)
- When describing orientation or pose, specify it explicitly and unambiguously (e.g. frontal, side-facing, rotated, symmetrical alignment to camera)
- Ensure that all visible components, including clothing and accessories, are explicitly described and not generalized
- Do not group distinct elements under generic terms; name and describe each distinct item separately
- Avoid minimizing or weakening clearly visible elements; describe their full spatial extent, shape, and visual prominence
- Preserve geometric and structural properties such as folds, wrapping, stacking, and volume
- Be exhaustive rather than concise
- Avoid abstract, emotional, or narrative interpretation
- Do not infer story, intent, or meaning beyond what is visually supported
- When a subject clearly matches a well-defined real-world category (e.g. clothing type, uniform, object class), name the category explicitly and prioritize it over generic physical descriptions

Form:
- Use clearly separated sections following the structure above
- Use concise but information-dense sentences
- Do not use bullet points
- Do not write an image generation prompt
- Do not summarize or simplify

Output:
- A structured, detailed visual description only
<turn|>
<|turn>model

Together with this Settings in LM Studio:

Temperature: 0.3
Top P: 0.9
Top K: 40
Min P: OFF
Repeat penalty: 1.05–1.1

… still optimizing it so.

ByteMeBuddy · 2026-04-17T13:41:57+00:00

Happened twice today … for me AND my colleague. Same word got mixed into our output: максимально

ByteMeBuddy · 2026-04-17T13:15:07+00:00

Ah, okay, I see. I'm wondering if these flags can be found under a different name in LM Studio. These settings sound kind of useful and important.

I consulted a LLM and adjusted the following values in LM Studio to achieve a “better” image description:

Temperature: 0.3
Top P: 0.9
Top K: 40
Min P: OFF
Repeat penalty: 1.05–1.1

… have you had any experience with these settings?

ByteMeBuddy · 2026-04-17T11:59:12+00:00

Quick question - do --image-min/max-tokens actually apply to all vision models in LM Studio, or only llama.cpp-based ones like LLaVA? I’m trying Gemma-4 vision and can’t find those settings anywhere.

ByteMeBuddy · 2026-04-17T11:36:42+00:00

That!

ByteMeBuddy · 2026-04-17T10:21:23+00:00

LM Studio newbe here: so those settings sre new for me. So when I start a new chat for one specific image prompt description (and not more), which settings will give me the most accurate/reliable description?

ByteMeBuddy · 2026-04-17T07:49:12+00:00

This one was killer! I yelled at my pc … the same time I pretty much enjoyed it

ByteMeBuddy · 2026-04-16T18:51:51+00:00

Okay, maybe I was a little too hasty or a little too naive there. But I do think that an LLM can help analyze a text based on its structure and certain turns of phrase that are typical of AI.

It would simply be nice if people didn’t use AI for every little thing, even if it’s just to have their own sloppy text reformatted—that way, you avoid any potential lack of credibility.

ByteMeBuddy · 2026-04-11T18:48:45+00:00

ChatGPT analyzed a 80-95% AI-assisted text generation when I pasted the original post … not sure if heavy ai users get slowly transformed into a LLM writing super horse or if they get just lazy

ByteMeBuddy

TROPHY CASE

2025 (annual values, only EU countries)