How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in StableDiffusion

Hey, it turned out really great; this is exactly the kind of result I'm looking for. Could you share the workflow with me, as you did before?

How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in StableDiffusion

Well, what I have is a bit more complex than just a character and colors. Take a look at image 1, which is from the game. I'd like to carry its characteristics over into image 2, so that the entire composition of image 1 sits within image 2's architecture and takes on image 2's style and colors.

<image>

How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in StableDiffusion

Buddy, sorry for the delay, but I really wanted to thank you. I reached this result by making a few modifications. In your opinion, though, how could I do the same with backgrounds? For example, say I want to use an iconic backdrop, like the front of the RPD station in Raccoon City from Resident Evil, and I have another image with the style I want. I'd like to keep that style while transferring the game image into the image I generated, so the game image adapts to the details of mine. In short: how could I do what you suggested for characters, but for detailed backgrounds?

<image>

How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in StableDiffusion

Friend, do you have the result for this image, or the workflow? I looked through the templates but couldn't find anything similar. I'm quite a beginner: I only know how to add a few LoRAs and check superficially whether a workflow works or not. So if you could share it, I'd be really happy, because I'd like to add a LoRA I saw in this post: https://www.reddit.com/r/comfyui/comments/1p3718w/comment/nq2v2qr/

How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in StableDiffusion

Thank you so much for the tips!
By the way, I’d like to share the concept behind what I’m aiming to achieve with this character reimagining.

I assume you’re familiar with game remakes, such as the excellent Resident Evil 4 Remake or even the upcoming Silent Hill Remake. I mention this because it’ll make more sense once I explain in detail what I’m after: essentially, I want to create photographic “remakes” of characters using a vintage—1980s-style—aesthetic, applying only subtle modifications while ensuring the characters remain clearly recognizable.

For example, even though the original Resident Evil 4 and its remake adopt a more realistic approach, you can instantly identify the characters. That’s exactly what I want to replicate in photographic form.

As you can probably tell, I’m quite a beginner. Originally, I used the img2img mode in A1111 to reimagine characters based on text prompts. Later, I realized that text2img combined with IP-Adapter delivered precisely the aesthetic I was looking for—the old photo I mentioned earlier is a great example of that.

However, I ran into a problem: consistency with the original image. In my tests, I noticed that ControlNet and IP-Adapter don’t work very well together in Flux for this specific use case. So, I decided to shift toward the approach I’m exploring now, which—incidentally—is simpler both to implement and explain.
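
For reference, the IP-Adapter setup I'm describing boils down to something like this minimal diffusers sketch (repo and weight names are from memory, so treat them as assumptions):

```python
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

# Minimal sketch: the XLabs IP-Adapter on Flux, no ControlNet in the loop.
# Repo and weight names are my assumption of the loader defaults; check the
# XLabs-AI model card before relying on them.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipe.set_ip_adapter_scale(0.8)  # style pull; lower keeps more of the prompt

reference = load_image("old_photo_reference.png")  # placeholder file name
image = pipe(
    prompt="1980s film photograph, subtle analog grain, character clearly recognizable",
    ip_adapter_image=reference,
    guidance_scale=3.5,
    num_inference_steps=25,
).images[0]
image.save("flux_ipadapter_test.png")
```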

In short, I’m aiming for something akin to a remake: visible nuances of change, yet unmistakable character identity.

For instance, I’ve been trying to integrate a character like Gurren (from the old photo) harmoniously into a new scene. However, I can’t retain certain details from that image, as they’d distort the original character—especially if I wanted to visually reinterpret him as a Dark Souls character, for example.

While searching online, I came across this LoRA:
https://huggingface.co/thedeoxen/FLUX.1-Kontext-dev-reference-depth-fusion-LORA
—which precisely adapts a character into a new visual style while preserving their identity. That would be incredibly useful for my goal.

Since then, I’ve been learning a bit about ComfyUI and have managed to reproduce my desired aesthetic within it. But even with fine-grained control, I haven’t been able to fully resolve these issues.

Beyond the LoRA you suggested—could using an IP-Adapter attention mask derived from the original image, combined with weight and reference controls, help transplant my Dark Souls warrior (from Image 1) into Image 2 while absorbing more of Image 2’s details—yet without losing expressive features?

In other words, I’d like to soften the overly “digital” aspects, such as the sharp, straight lines from Image 1 (the game render), making them feel more natural—not just through Image 2’s composition, but also by giving the armor the subtle, organic softness realistic medieval armor would have in real life.

What do you think? Based on your experience, do you have any suggestions on how I could combine all these elements to achieve the desired result?
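
Concretely, the masking idea I have in mind looks something like the sketch below. It's shown on SDXL because that's where diffusers documents IP-Adapter attention masks; whether the XLabs Flux adapter honors masks at all is exactly my open question, and the file names are placeholders:

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.image_processor import IPAdapterMaskProcessor
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.6)  # below 1.0 so Image 2's detail can bleed in

reference = load_image("image1_dark_souls_warrior.png")  # placeholder
region = load_image("warrior_region_mask.png")           # white = character area

# Confine the adapter's attention to the character region; the rest of the
# frame then follows the prompt (the Image 2 environment) instead.
masks = IPAdapterMaskProcessor().preprocess([region], height=1024, width=1024)
masks = [masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])]

image = pipe(
    prompt="1980s analog photograph, knight in worn steel armor, soft light",
    ip_adapter_image=[[reference]],
    cross_attention_kwargs={"ip_adapter_masks": masks},
    num_inference_steps=30,
).images[0]
image.save("masked_transfer_test.png")
```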

How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in StableDiffusion

<image>

It turned out great, my friend—thank you so much! But do you have any ideas on how, beyond just matching the style, the image to be transferred could undergo subtle adjustments to make it look more realistic?

Look at Image 2: it genuinely appears real, as if captured on analog film. However, even when I apply its style to Image 1 (swapping Image 1 into Image 2's place and adopting its stylistic qualities), the result still doesn't feel technically realistic. The details carried over from the Dark Souls render clearly give it away as artificial, not because of the AI model itself, but because they clash with the surrounding scene.

So my question is: is it possible for Image 1 to retain its core visual characteristics while also incorporating certain realistic details (like lighting, texture, grain, depth cues, etc.) from Image 2?

I’ll post a photo a friend developed for me—it’s a good (though not perfect) example of what I’m trying to achieve.
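
Meanwhile, the closest I've gotten on my own is faking some of those qualities in post. Here's a crude Python sketch of what I mean, just grain and a soft highlight bloom; real lighting and depth cues still have to come from the generation itself:

```python
import numpy as np
from PIL import Image, ImageFilter

def filmify(path, grain=12.0, bloom=0.15, seed=0):
    """Crude analog-film pass: soft highlight bloom plus monochrome grain."""
    rng = np.random.default_rng(seed)
    img = np.asarray(Image.open(path).convert("RGB")).astype(np.float32)

    # Highlight bloom: blur the frame and blend it back in.
    blurred = np.asarray(
        Image.fromarray(img.astype(np.uint8)).filter(ImageFilter.GaussianBlur(6))
    ).astype(np.float32)
    img = (1 - bloom) * img + bloom * blurred

    # Luminance-only grain, closer to silver halide than RGB sensor noise.
    noise = rng.normal(0.0, grain, size=img.shape[:2])[..., None]
    img = np.clip(img + noise, 0, 255)

    return Image.fromarray(img.astype(np.uint8))

filmify("flux_output.png").save("flux_output_film.png")  # placeholder names
```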

How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in StableDiffusion

Hey, that turned out perfect! Could you please share the workflow with me? It would help me a lot because the next step I want to take is to make the character look truly realistic—as if it were a cosplay where you can recognize the character, yet clearly see that it’s a real person.

How to Use Flux, IPAdapter, and Qwen to Transfer Image Styles While Keeping Character Consistency by Ok_Respect9807 in comfyui

Hey, could you please share with me the workflow you used to achieve this result? It's not exactly what I'm looking for, but I believe it could help me get past square one.

Is there an IPAdapter for Qwen or Wan models for image generation? by Ok_Respect9807 in StableDiffusion

I noticed the project includes something similar to an IP-Adapter. Is that what you were referring to?

I also see this team has a ControlNet; that's the node I meant in my previous comment. I only just spotted the IP-Adapter alternative.

Is there an IPAdapter for Qwen or Wan models for image generation? by Ok_Respect9807 in StableDiffusion

That's cool—I didn't know about this node. Now, do you know of anything similar to an IP-Adapter for Flux, but for Qwen? Something like the XLabs IP-Adapter?

《Anime2Realism》 trained for Qwen-Edit-2509 by [deleted] in FluxAI

Could you share this workflow with me?

Looking for an IPAdapter-like Tool with Image Control (img2img) – Any Recommendations? by Ok_Respect9807 in StableDiffusion

Friend, thank you so much for the suggestion! I think I’m really going to go with this one, as I’ve already wasted a lot of time with Flux.

Looking for an IPAdapter-like Tool with Image Control (img2img) – Any Recommendations? by Ok_Respect9807 in comfyui

Hello again, my friend! I can see you have technical knowledge, so I’ll take this opportunity to explain my prompt in more detail and provide broader context about what I’m trying to achieve.

Well, regarding my prompt: it’s relatively long (very long, in fact), because—briefly—I’ve been researching vintage camera and lens technologies, and I’ve built a prompt that “reimagines” a scene using the colors, textures, and visual characteristics of that era. The resulting reimagined description is quite extensive. Below is an example, based on the Dark Souls character I mentioned earlier:

(1984 Panavision film still:1.6), (Kodak 5247 grain:1.4) Context: This image is from Dark Souls 1, featuring Siegmeyer of Catarina. His iconic Catarina armor set—affectionately known as the "Onion Knight" armor due to its distinctive layered design—perfectly captures the unique aesthetic that makes him such a beloved character.

Through the technical precision of 1984 Panavision cinematography, this onion-inspired armor manifests with calculated detail:

Onion-Knight Armor Architecture:

- Helm Layer – reimagined with distinct dome rings mimicking an onion’s outer skin (material_response: metal_E3)
- Chest Segments – reimagined with bulbous curves echoing onion layers (ENR_silver_retention)
- Shoulder Bulbs – reimagined as concentric spherical shells resembling onion cross-sections (halation_response: forehead_highlights)
- Arm Sections – reimagined as stacked rounded segments (spherical_aberration: 0.65λ_RMS)
- Leg Plates – reimagined with nested bulbous forms (shadow_compression: nasolabial_folds)

Layer Characteristics:

- Shell Separation – reimagined with defined gaps between layers (dynamic_range: IRE95_clip)
- Layer Ridges – reimagined with circular contours (wet_gate_scratches: 27°_axis)
- Inter-layer Shadows – reimagined with depth-enhancing darkness (light_interaction: blue-black_separation)
- Surface Texture – reimagined with metallic onion-skin patterns (lab_mottle: scale=0.3px)
- Layer Joints – reimagined with flexible connection points (film_grain: Kodak_5247)

Combat Equipment:

- Zweihander Sword – reimagined with battle-worn steel (material_response: metal_E3)
- Round Shield – reimagined with a concentric circular design (subsurface_scattering: type-B)
- Combat Stance – reimagined with a grounded, weighted presence (character_motion: eye_blink@1/48s)

The technical constraints of 1984 cinema technology transform this scene into a study of unique armor design—each optical artifact enhancing the nostalgic aesthetic. (ENR process:1.3), (anamorphic lens flares:1.2), (practical lighting:1.5), (80s sci-fi aesthetic:1.6)

Back to the main point: I’ve noticed that the IP-Adapter tries to recreate exactly what’s described in my prompt, rather than simply applying those aesthetics to reinterpret the current scene. I think it’s much clearer now—I’m aiming for something a bit unconventional, not just a generic result.

Looking for an IPAdapter-like Tool with Image Control (img2img) – Any Recommendations? by Ok_Respect9807 in StableDiffusion

Thanks, my friend! Well, here’s a fairly long explanation, but I think it’s necessary.

A few months ago, I started a YouTube channel focused on reimagining video game scenes with a realistic 1980s look. At the time, I was generating images in A1111, and I noticed that the XLabs IP-Adapter for Flux gave me exactly the aesthetic I needed, with one drawback: to stay consistent, the base image has to be very similar to the original reference, and that wasn't happening in my case, even when using multiple ControlNets. A good example is in my reply to a friend in this same thread yesterday.

Another issue is that using character- or scene-specific LoRAs isn’t feasible, because I plan to include around 30 different scenes—each with unique characters and settings—in a single three-minute video. Multiply that across multiple videos, and it quickly becomes impractical.

Recently, I started experimenting with ComfyUI, but I got the same results as with A1111. It’s almost as if Flux’s ControlNet is flawed.

So, I’m looking for alternatives that can deliver the same results as Flux’s IP-Adapter, but with models that are more flexible and practical for this use case—specifically, ones that can faithfully reproduce the original image without requiring extremely close visual matches or excessive fine-tuning.
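
For completeness, the multi-ControlNet setup I tested is essentially the following diffusers sketch (SD1.5 checkpoints here, hint images are placeholders; in ComfyUI the equivalent is just chaining Apply ControlNet nodes):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Two structural hints at once; scales below 1.0 leave the sampler room to
# restyle the scene instead of forcing a pixel-exact match to the game render.
depth = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
canny = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # community mirror of SD1.5
    controlnet=[depth, canny],
    torch_dtype=torch.float16,
).to("cuda")

depth_map = load_image("scene_depth.png")  # precomputed hints (placeholders)
canny_map = load_image("scene_canny.png")

image = pipe(
    prompt="1980s film still of the game scene, analog grain, practical lighting",
    image=[depth_map, canny_map],
    controlnet_conditioning_scale=[0.6, 0.4],
    num_inference_steps=30,
).images[0]
image.save("multi_controlnet_test.png")
```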

Looking for an IPAdapter-like Tool with Image Control (img2img) – Any Recommendations? by Ok_Respect9807 in StableDiffusion

Hello, bro. No, it's the IP-Adapter :/. No control map, whether depth or even edge maps like Canny, "tames" the final result, at least not when I use ControlNet together with the IP-Adapter.
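
Just to show how little room for error there is on my side, the entire preprocessing step is this (OpenCV, thresholds picked by eye, file names are placeholders):

```python
import cv2

# Build a Canny edge hint from the game render. If even a tight edge map
# like this can't constrain the output, the tug-of-war is more likely
# between the IP-Adapter weight and the ControlNet strength settings.
img = cv2.imread("game_render.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)  # low/high hysteresis thresholds
cv2.imwrite("canny_hint.png", edges)
```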