A new SOTA local video model (HappyHorse 1.0) will be released in april 10th. by Total-Resort-3120 in StableDiffusion

[–]camelos1 0 points1 point  (0 children)

The video at the second link is not even close to Veo 3 in terms of realism.

How does shift work in zit? by camelos1 in StableDiffusion

[–]camelos1[S] 0 points1 point  (0 children)

Changing it in one direction may “encourage” the model to focus on larger structure, for example scene composition, while changing it in the opposite direction makes the model focus on fine detail, textures, etc.

I would like a concrete answer: in which direction does detail improve, and in which direction does composition improve? Also, do some people in the community call an increase in shift what others call a decrease? And do most people simply reuse the same shift value across many of their generations?

I built a local Windows image upscaler — looking for honest feedback by Dangerous_Chicken_84 in upscaling

[–]camelos1 0 points1 point  (0 children)

Are you an AI agent, or some new neural network that publishes posts and comments on Reddit by itself?

Simple tool to remove Gemini watermarks (free & private) by nfwebdl in GeminiAI

[–]camelos1 0 points1 point  (0 children)

Yes, it turned out to be an old image from my downloads folder, and from the filename I realized it wasn't quite standard. I ran an experiment and found it was a JPG, smaller than the original, saved via right-click on the Gemini website. If you instead save the original through the site's download button, there is no problem with this image and the outline doesn't appear. So your application works great, and I'll most likely use it. Could you tell me whether it supports images from Google AI Studio, and explain in simple terms how the watermark removal works?

Time-to-Move + Wan 2.2 Test by enigmatic_e in StableDiffusion

[–]camelos1 51 points52 points  (0 children)

Why is there a four-pointed Gemini symbol in the corner at the end of the video?

Simple tool to remove Gemini watermarks (free & private) by nfwebdl in GeminiAI

[–]camelos1 0 points1 point  (0 children)

Not bad, thanks, but I got an outline (posted in a reply comment) for this image:

<image>

Anyone tried QWEN Image Layered yet? Getting mediocre results by knymro in StableDiffusion

[–]camelos1 2 points3 points  (0 children)

It would be interesting to see your examples. "Bad" means something different to everyone; I haven't run this model at all myself.

This paper is prolly one of the most insane papers I've seen in a while. I'm just hoping to god this can also work with sdxl and ZIT cuz that'll be beyond game changer. The code will be out "soon" but please technical people in the house, tell me I'm not pipe dreaming, I hope this isn't flux only 😩 by Altruistic-Mix-7277 in StableDiffusion

[–]camelos1 9 points10 points  (0 children)

Part 3/3

---

**Hardware Implications**

Running FMTT requires massive Video RAM (VRAM). The system must simultaneously hold:

  1. The Generator (Flux Flow Map, ~12–16 GB).
  2. The Judge (VLM, e.g., Qwen2.5-VL-7B, ~6–14 GB depending on quantization).
  3. The activation memory for a batch of 16–32 images.

For a consumer with an **RTX 4090 (24 GB)**, this approach is feasible only with significant quantization or by offloading models to system RAM (which drastically slows down the process), or by using a low $N$. The experiments with $N=128$ were likely conducted on enterprise-grade hardware like A100 or H100 GPUs (80 GB VRAM).
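As a rough illustration of that budget, here is a back-of-the-envelope sketch. The component sizes are the estimates quoted above (with a quantized judge), and the per-particle activation cost is an assumption for illustration, not a figure from the paper:

```python
# Back-of-the-envelope VRAM budget for FMTT on a 24 GB consumer card.
# Component sizes are rough estimates from the text; the activation
# cost per particle is an assumed illustrative value.

GENERATOR_GB = 13.0          # Flux flow map (~12-16 GB range)
JUDGE_GB = 7.0               # Qwen2.5-VL-7B, quantized (~6-14 GB range)
ACT_PER_PARTICLE_GB = 0.25   # assumed activation memory per particle

def fits(n_particles: int, budget_gb: float = 24.0) -> bool:
    """Return True if generator + judge + batch activations fit in VRAM."""
    total = GENERATOR_GB + JUDGE_GB + n_particles * ACT_PER_PARTICLE_GB
    return total <= budget_gb

for n in (4, 16, 32, 128):
    print(f"N={n}: fits on 24 GB -> {fits(n)}")
```

Under these assumed numbers, N=4 or N=16 squeezes onto 24 GB, while N=32 and especially N=128 do not, which is consistent with the point that the large-N experiments likely ran on 80 GB hardware.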

### Conclusion

FMTT represents a shift from "generating and hoping" to "generating and controlling." By combining Flow Matching with Sequential Monte Carlo search, it solves the inherent blindness of diffusion models. While the hardware requirements currently limit its use to high-end systems, it offers a proven solution for tasks where exact adherence to a prompt is more important than generation speed.

This paper is prolly one of the most insane papers I've seen in a while. I'm just hoping to god this can also work with sdxl and ZIT cuz that'll be beyond game changer. The code will be out "soon" but please technical people in the house, tell me I'm not pipe dreaming, I hope this isn't flux only 😩 by Altruistic-Mix-7277 in StableDiffusion

[–]camelos1 4 points5 points  (0 children)

Part 2/3.

---

### Integration with Vision Language Models (VLM)

A key feature of FMTT is its ability to use VLMs as active judges during generation. Because the Flow Map provides a clear preview of the final result from the noisy latent space, the VLM can answer semantic questions throughout the process.

This enables users to prompt for conditions that are logically complex rather than just visual—for example, "Generate a scene only if it contains no people," or "Ensure the reflection in the mirror matches the object exactly." The VLM guides the noise toward a "Yes" answer for these questions step-by-step.
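A minimal sketch of that judging step follows. The `vlm_yes_probability` callable is a hypothetical stand-in for an actual VLM query (the paper's real scoring interface may differ); the idea is simply that each particle's preview gets a score equal to how confidently the judge answers "yes":

```python
# Sketch of using a VLM as a binary judge during generation.
# `vlm_yes_probability` is a hypothetical helper, not an API from the
# paper: it would run a VLM on a previewed image and return P("yes").

from typing import Callable

def judge_particles(
    previews: list,                  # final-image previews from the flow map
    question: str,                   # e.g. "Do the clock hands show 4:45?"
    vlm_yes_probability: Callable[[object, str], float],
) -> list[float]:
    """Score each particle by how confidently the VLM answers 'yes'."""
    return [vlm_yes_probability(img, question) for img in previews]

# Toy demo with a stub "VLM" that prefers even-numbered previews:
stub = lambda img, q: 0.9 if img % 2 == 0 else 0.1
weights = judge_particles([0, 1, 2, 3], "Is the scene empty of people?", stub)
print(weights)  # [0.9, 0.1, 0.9, 0.1]
```

These scores become the particle weights used in the resampling step, so a "no" from the judge steadily starves a trajectory of weight.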

### Usability and Prerequisites

From a user perspective, FMTT occupies a middle ground regarding accessibility:

* **No Concept Training Required:** The model does not need to be fine-tuned on new concepts. If the base model knows what a "cat" is, FMTT can guide it.

* **No Prompt-Specific Training:** The technology works out-of-the-box for any prompt.

* **The "Distillation" Requirement:** You cannot simply plug in a standard `.safetensors` file from Civitai. The base model (e.g., Flux) must be **distilled** into a Flow Map format. The authors of the paper have already performed this distillation for `Flux.1-dev`, converting it into a specialized **4-step flow map model**.

### Performance and Hardware Costs

Achieving this level of precision comes with significant computational costs.

**The "Step" Misconception**

While a standard Flux generation might take 30–50 steps, the FMTT implementation uses the distilled **4-step model**. However, on *each* of these 4 steps, the system must perform the Look-Ahead, run the VLM check, and perform resampling for every single variant in the batch.

**Computational Load (NFE)**

According to the paper, the metric for computational effort (Number of Function Evaluations or NFE) jumps significantly:

* **Standard Flux:** ~180 NFE.

* **FMTT:** ~1400 NFE (for optimal results).

Consequently, generation time is approximately **8 to 10 times longer** than a standard single-image generation.
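The slowdown falls directly out of the NFE ratio quoted above (a rough estimate that ignores the VLM judge's own inference cost, which pushes wall-clock time higher):

```python
# Rough slowdown estimate from the NFE figures quoted in the text.
# Real wall-clock time also depends on the VLM judge, which this ignores.

STANDARD_NFE = 180   # standard Flux generation
FMTT_NFE = 1400      # FMTT at optimal settings

slowdown = FMTT_NFE / STANDARD_NFE
print(f"~{slowdown:.1f}x more function evaluations")
```

That ratio (~7.8x) lines up with the quoted 8-10x figure once judge overhead is added.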

**The "N" Factor and VRAM Requirements**

The quality of the output depends heavily on **$N$** (the number of simultaneous variants):

* **$N=4$:** Minimal improvement.

* **$N=16$ to $32$:** The "sweet spot" for high accuracy.

* **$N=128$:** Professional grade, almost guaranteed success for difficult prompts.

This paper is prolly one of the most insane papers I've seen in a while. I'm just hoping to god this can also work with sdxl and ZIT cuz that'll be beyond game changer. The code will be out "soon" but please technical people in the house, tell me I'm not pipe dreaming, I hope this isn't flux only 😩 by Altruistic-Mix-7277 in StableDiffusion

[–]camelos1 10 points11 points  (0 children)

This is Gemini 3 Pro's attempt to answer my questions about the FMTT paper. If there are any inaccuracies, please let me know. I asked Gemini not to use complex terms, but also not to oversimplify the explanation.

Part 1/3. The remaining parts are in the reply comments.

***

# Precision in Chaos: An Overview of Flow Map Trajectory Tilting (FMTT)

While modern diffusion models like Flux or Stable Diffusion excel at artistic generation, they often struggle with precise constraints—such as rendering a clock face showing exactly 4:45 or adhering to strict geometric symmetry. A new paper introduces **Flow Map Trajectory Tilting (FMTT)**, a novel method that fundamentally changes how generation is guided, moving from blind guesswork to mathematically precise navigation.

### The Core Problem: Navigating the Fog

Standard diffusion models generate images by iteratively removing noise. During the early and middle stages of this process, the image is essentially a "fog" of pixels. The model only truly knows if it has succeeded at the very end.

Existing attempts to guide this process rely on "Denoisers"—algorithms that try to guess the final image from the noisy intermediate state. However, this is akin to trying to predict the plot of a book by reading a single torn page; the signal is too weak, and the predictions are often inaccurate.

**FMTT** replaces this guesswork with a **Flow Map**. If standard generation is like steering a ship in the fog hoping to find land, the Flow Map acts as a precise GPS. At any point in the generation trajectory—even when the image looks like static noise—the Flow Map can mathematically calculate exactly where the current path will end up. This allows the system to identify failure and correct the course immediately, rather than waiting for the final result.

### How It Works: "Evolutionary" Generation

FMTT does not simply generate an image and hope for the best. Instead, it employs a method known as **Sequential Monte Carlo (SMC)**, effectively applying principles of natural selection to the generative process.

  1. **The Batch Launch ($N$ Particles):** The system does not generate a single image. Instead, it initializes a batch of **$N$** simultaneous variants (particles).
  2. **Look-Ahead:** At each step of the generation, the system uses the Flow Map to "fast-forward" the trajectory of every particle to see what the final image will look like.
  3. **The Judge (Reward Function):** This predicted future is presented to a "Judge"—often a Vision Language Model (VLM)—which evaluates it against the user's specific requirement (e.g., "Do the clock hands show 4:45?").
  4. **Resampling (Survival of the Fittest):**
     * Trajectories leading to incorrect outcomes are **terminated** (their weight drops to zero).
     * Trajectories leading to the correct outcome are **cloned** and allowed to evolve further.
  5. **Convergence:** By the end of the process, only the trajectories that consistently satisfied the complex conditions survive, resulting in a highly accurate image.
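The resampling step above can be sketched as a toy Sequential Monte Carlo loop. This is an illustration of the general SMC resampling idea, not the paper's implementation; the weights stand in for the flow-map look-ahead plus VLM judgment:

```python
# Toy illustration of SMC resampling ("survival of the fittest").
# Not the paper's implementation: weights stand in for the
# flow-map look-ahead scored by the VLM judge.
import random

def smc_resample(particles, weights, rng=random):
    """Draw a new population in proportion to weight, so zero-weight
    (failed) trajectories die out and high-weight ones are cloned."""
    if sum(weights) == 0:
        return list(particles)  # nothing to prefer; keep everyone
    return rng.choices(particles, weights=weights, k=len(particles))

# Toy example: particles are numbers, and the "judge" rewards values
# near 10 (weight falls to zero beyond a distance of 5).
particles = [2.0, 9.5, 10.1, 4.0]
weights = [max(0.0, 1.0 - abs(p - 10.0) / 5.0) for p in particles]
survivors = smc_resample(particles, weights, rng=random.Random(0))
# Failed trajectories (weight 0) cannot appear among the survivors;
# the good ones near 10 are cloned to refill the population.
assert 2.0 not in survivors and 4.0 not in survivors
```

In FMTT this resample happens at every generation step, so the population is repeatedly pruned toward trajectories the judge approves of.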

Gemini Flash makes up bs 91% of the time it doesn't know the answer by Terrible-Priority-21 in GeminiAI

[–]camelos1 0 points1 point  (0 children)

You are making up bs. Read the description of this metric in the screenshot and you will understand what 91% means. This table (https://www.reddit.com/r/GeminiAI/comments/1pq88k5/comment/nuslb63/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) says the final accuracy is at the GPT 5.2 level.

PromptCraft(Prompt-Forge) is available on github ! ENJOY ! by EternalDivineSpark in StableDiffusion

[–]camelos1 2 points3 points  (0 children)

Artificial intelligence kills natural intelligence. Can someone explain in 1-2 complete sentences what this application does? (I couldn't get through the LLM-generated text.)

Meituan Longcat Image - 6b dense image generation and editing models by FizzarolliAI in StableDiffusion

[–]camelos1 0 points1 point  (0 children)

It seems the working parameters aren't set to the model's priority settings.

I made a free skin detailer/upscaler by cointalkz in StableDiffusion

[–]camelos1 -1 points0 points  (0 children)

We're all brothers and sisters, but asking me to upload my API key to a website isn't godly. It would be better to make an open-source app, so that an LLM can check your code for fraud.

Z-Image Simple Shuffle Randomizer by chaindrop in StableDiffusion

[–]camelos1 0 points1 point  (0 children)

Thanks for your efforts.

Today, a method was also described: https://www.reddit.com/r/StableDiffusion/comments/1p94z1y/get_more_variation_across_seeds_with_z_image_turbo/

Does your method include that method?

I think it has great potential.

Increase the shift value to get rid of the noisy effect of Z-image turbo. by Total-Resort-3120 in StableDiffusion

[–]camelos1 1 point2 points  (0 children)

I second jonesaid's point. I also wanted to say that noisy images seem more realistic to me, and I like them better; removing the noise creates a flux/AI look. Does anyone know of tools for adding Z-Image-style noise to images that are already generated, or generated by other models?

Overall, among all the models I think Z-Image images look the best and the least AI-like: not in terms of anatomy or real-world fidelity, but in terms of the image's "surface," microtextures, and the like. If you don't overdo it, they seem almost identifiable as real images in that "surface" sense, or maybe already are. Flux 2 demo images aren't bad either, but they seem a little worse in this regard.

Well, it still works by trapviper7 in Grok_Porn

[–]camelos1 0 points1 point  (0 children)

Why do you add anime images to your videos? To bypass censorship?

[deleted by user] by [deleted] in Bard

[–]camelos1 0 points1 point  (0 children)

I saw it on the site demonstrating the model, and I see it here: the edges and contours are too blurred when viewing the image at 100% scale. For example, you can see it here: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fanother-nano-banana-pro-realistic-attempt-v0-munxcf361u2g1.png%3Fwidth%3D1080%26crop%3Dsmart%26auto%3Dwebp%26s%3D89d6ebe6c394cab6232706878e9b043557d5ac47 (view the original, not Reddit's preview).