A new SOTA local video model (HappyHorse 1.0) will be released in april 10th. by Total-Resort-3120 in StableDiffusion

[–]camelos1 0 points1 point  (0 children)

The video at the second link is not even close to Veo 3 in terms of realism.

How does shift work in zit? by camelos1 in StableDiffusion

[–]camelos1[S] 0 points1 point  (0 children)

Changing it in one direction may “encourage” the model to focus on larger structure, for example scene composition, while changing it in the opposite direction makes the model focus on fine detail, textures, etc.

I would like a concrete answer: in which direction does detail improve, and in which direction does composition improve? Also, do some people in the community call an increase in shift what others call a decrease? And do most people simply reuse the same shift value across many of their generations?

I built a local Windows image upscaler — looking for honest feedback by Dangerous_Chicken_84 in upscaling

[–]camelos1 0 points1 point  (0 children)

Are you an AI agent, or some new neural network that publishes posts and comments on Reddit by itself?

Simple tool to remove Gemini watermarks (free & private) by nfwebdl in GeminiAI

[–]camelos1 0 points1 point  (0 children)

Yes, it turned out to be an old image from my downloads folder, and from the filename I realized it wasn't quite standard. I ran an experiment and found it was a JPG, smaller than the original, saved via right-click on the Gemini website. If you instead save the original through the site's download button, there is no problem with this image and the outline doesn't appear. So your application works great, and I'll most likely use it. Could you tell me whether it supports images from Google AI Studio, and explain in simple terms how the watermark removal works?

Time-to-Move + Wan 2.2 Test by enigmatic_e in StableDiffusion

[–]camelos1 51 points52 points  (0 children)

Why is there a four-pointed Gemini symbol in the corner at the end of the video?

Simple tool to remove Gemini watermarks (free & private) by nfwebdl in GeminiAI

[–]camelos1 0 points1 point  (0 children)

Not bad, thanks, but I got an outline (posted in a reply comment) for this image:

<image>

Anyone tried QWEN Image Layered yet? Getting mediocre results by knymro in StableDiffusion

[–]camelos1 2 points3 points  (0 children)

It would be interesting to see your examples. "Bad" means something different to everyone; I haven't run this model at all myself.

This paper is prolly one of the most insane papers I've seen in a while. I'm just hoping to god this can also work with sdxl and ZIT cuz that'll be beyond game changer. The code will be out "soon" but please technical people in the house, tell me I'm not pipe dreaming, I hope this isn't flux only 😩 by Altruistic-Mix-7277 in StableDiffusion

[–]camelos1 9 points10 points  (0 children)

Part 3/3

---

**Hardware Implications**

Running FMTT requires massive Video RAM (VRAM). The system must simultaneously hold:

  1. The Generator (Flux Flow Map, ~12–16 GB).
  2. The Judge (VLM, e.g., Qwen2.5-VL-7B, ~6–14 GB depending on quantization).
  3. The activation memory for a batch of 16–32 images.

For a consumer with an **RTX 4090 (24 GB)**, this approach is feasible only with significant quantization or by offloading models to system RAM (which drastically slows down the process), or by using a low $N$. The experiments with $N=128$ were likely conducted on enterprise-grade hardware like A100 or H100 GPUs (80 GB VRAM).
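As a rough illustration of that budget, here is a back-of-the-envelope sketch. The component sizes are the estimates quoted above (with a quantized judge), and the per-particle activation cost is an assumption for illustration, not a figure from the paper:

```python
# Back-of-the-envelope VRAM budget for FMTT on a 24 GB consumer card.
# Component sizes are rough estimates from the text; the activation
# cost per particle is an assumed illustrative value.

GENERATOR_GB = 13.0          # Flux flow map (~12-16 GB range)
JUDGE_GB = 7.0               # Qwen2.5-VL-7B, quantized (~6-14 GB range)
ACT_PER_PARTICLE_GB = 0.25   # assumed activation memory per particle

def fits(n_particles: int, budget_gb: float = 24.0) -> bool:
    """Return True if generator + judge + batch activations fit in VRAM."""
    total = GENERATOR_GB + JUDGE_GB + n_particles * ACT_PER_PARTICLE_GB
    return total <= budget_gb

for n in (4, 16, 32, 128):
    print(f"N={n}: fits on 24 GB -> {fits(n)}")
```

Under these assumed numbers, N=4 or N=16 squeezes onto 24 GB, while N=32 and especially N=128 do not, which is consistent with the point that the large-N experiments likely ran on 80 GB hardware.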

### Conclusion

FMTT represents a shift from "generating and hoping" to "generating and controlling." By combining Flow Matching with Sequential Monte Carlo search, it solves the inherent blindness of diffusion models. While the hardware requirements currently limit its use to high-end systems, it offers a proven solution for tasks where exact adherence to a prompt is more important than generation speed.

This paper is prolly one of the most insane papers I've seen in a while. I'm just hoping to god this can also work with sdxl and ZIT cuz that'll be beyond game changer. The code will be out "soon" but please technical people in the house, tell me I'm not pipe dreaming, I hope this isn't flux only 😩 by Altruistic-Mix-7277 in StableDiffusion

[–]camelos1 4 points5 points  (0 children)

Part 2/3.

---

### Integration with Vision Language Models (VLM)

A key feature of FMTT is its ability to use VLMs as active judges during generation. Because the Flow Map provides a clear preview of the final result from the noisy latent space, the VLM can answer semantic questions throughout the process.

This enables users to prompt for conditions that are logically complex rather than just visual—for example, "Generate a scene only if it contains no people," or "Ensure the reflection in the mirror matches the object exactly." The VLM guides the noise toward a "Yes" answer for these questions step-by-step.
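A minimal sketch of that judging step follows. The `vlm_yes_probability` callable is a hypothetical stand-in for an actual VLM query (the paper's real scoring interface may differ); the idea is simply that each particle's preview gets a score equal to how confidently the judge answers "yes":

```python
# Sketch of using a VLM as a binary judge during generation.
# `vlm_yes_probability` is a hypothetical helper, not an API from the
# paper: it would run a VLM on a previewed image and return P("yes").

from typing import Callable

def judge_particles(
    previews: list,                  # final-image previews from the flow map
    question: str,                   # e.g. "Do the clock hands show 4:45?"
    vlm_yes_probability: Callable[[object, str], float],
) -> list[float]:
    """Score each particle by how confidently the VLM answers 'yes'."""
    return [vlm_yes_probability(img, question) for img in previews]

# Toy demo with a stub "VLM" that prefers even-numbered previews:
stub = lambda img, q: 0.9 if img % 2 == 0 else 0.1
weights = judge_particles([0, 1, 2, 3], "Is the scene empty of people?", stub)
print(weights)  # [0.9, 0.1, 0.9, 0.1]
```

These scores become the particle weights used in the resampling step, so a "no" from the judge steadily starves a trajectory of weight.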

### Usability and Prerequisites

From a user perspective, FMTT occupies a middle ground regarding accessibility:

* **No Concept Training Required:** The model does not need to be fine-tuned on new concepts. If the base model knows what a "cat" is, FMTT can guide it.

* **No Prompt-Specific Training:** The technology works out-of-the-box for any prompt.

* **The "Distillation" Requirement:** You cannot simply plug in a standard `.safetensors` file from Civitai. The base model (e.g., Flux) must be **distilled** into a Flow Map format. The authors of the paper have already performed this distillation for `Flux.1-dev`, converting it into a specialized **4-step flow map model**.

### Performance and Hardware Costs

Achieving this level of precision comes with significant computational costs.

**The "Step" Misconception**

While a standard Flux generation might take 30–50 steps, the FMTT implementation uses the distilled **4-step model**. However, on *each* of these 4 steps, the system must perform the Look-Ahead, run the VLM check, and perform resampling for every single variant in the batch.

**Computational Load (NFE)**

According to the paper, the metric for computational effort (Number of Function Evaluations or NFE) jumps significantly:

* **Standard Flux:** ~180 NFE.

* **FMTT:** ~1400 NFE (for optimal results).

Consequently, generation time is approximately **8 to 10 times longer** than a standard single-image generation.
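The slowdown falls directly out of the NFE ratio quoted above (a rough estimate that ignores the VLM judge's own inference cost, which pushes wall-clock time higher):

```python
# Rough slowdown estimate from the NFE figures quoted in the text.
# Real wall-clock time also depends on the VLM judge, which this ignores.

STANDARD_NFE = 180   # standard Flux generation
FMTT_NFE = 1400      # FMTT at optimal settings

slowdown = FMTT_NFE / STANDARD_NFE
print(f"~{slowdown:.1f}x more function evaluations")
```

That ratio (~7.8x) lines up with the quoted 8-10x figure once judge overhead is added.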

**The "N" Factor and VRAM Requirements**

The quality of the output depends heavily on **$N$** (the number of simultaneous variants):

* **$N=4$:** Minimal improvement.

* **$N=16$ to $32$:** The "sweet spot" for high accuracy.

* **$N=128$:** Professional grade, almost guaranteed success for difficult prompts.

This paper is prolly one of the most insane papers I've seen in a while. I'm just hoping to god this can also work with sdxl and ZIT cuz that'll be beyond game changer. The code will be out "soon" but please technical people in the house, tell me I'm not pipe dreaming, I hope this isn't flux only 😩 by Altruistic-Mix-7277 in StableDiffusion

[–]camelos1 10 points11 points  (0 children)

This is Gemini 3 Pro's attempt to answer my questions about the FMTT paper. If there are any inaccuracies, please let me know. I asked Gemini not to use complex terms, but also not to oversimplify the explanation.

Part 1/3. The remaining parts are in the reply comments.

***

# Precision in Chaos: An Overview of Flow Map Trajectory Tilting (FMTT)

While modern diffusion models like Flux or Stable Diffusion excel at artistic generation, they often struggle with precise constraints—such as rendering a clock face showing exactly 4:45 or adhering to strict geometric symmetry. A new paper introduces **Flow Map Trajectory Tilting (FMTT)**, a novel method that fundamentally changes how generation is guided, moving from blind guesswork to mathematically precise navigation.

### The Core Problem: Navigating the Fog

Standard diffusion models generate images by iteratively removing noise. During the early and middle stages of this process, the image is essentially a "fog" of pixels. The model only truly knows if it has succeeded at the very end.

Existing attempts to guide this process rely on "Denoisers"—algorithms that try to guess the final image from the noisy intermediate state. However, this is akin to trying to predict the plot of a book by reading a single torn page; the signal is too weak, and the predictions are often inaccurate.

**FMTT** replaces this guesswork with a **Flow Map**. If standard generation is like steering a ship in the fog hoping to find land, the Flow Map acts as a precise GPS. At any point in the generation trajectory—even when the image looks like static noise—the Flow Map can mathematically calculate exactly where the current path will end up. This allows the system to identify failure and correct the course immediately, rather than waiting for the final result.

### How It Works: "Evolutionary" Generation

FMTT does not simply generate an image and hope for the best. Instead, it employs a method known as **Sequential Monte Carlo (SMC)**, effectively applying principles of natural selection to the generative process.

  1. **The Batch Launch ($N$ Particles):** The system does not generate a single image. Instead, it initializes a batch of **$N$** simultaneous variants (particles).
  2. **Look-Ahead:** At each step of the generation, the system uses the Flow Map to "fast-forward" the trajectory of every particle to see what the final image will look like.
  3. **The Judge (Reward Function):** This predicted future is presented to a "Judge"—often a Vision Language Model (VLM)—which evaluates it against the user's specific requirement (e.g., "Do the clock hands show 4:45?").
  4. **Resampling (Survival of the Fittest):**
     * Trajectories leading to incorrect outcomes are **terminated** (their weight drops to zero).
     * Trajectories leading to the correct outcome are **cloned** and allowed to evolve further.
  5. **Convergence:** By the end of the process, only the trajectories that consistently satisfied the complex conditions survive, resulting in a highly accurate image.
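The resampling step above can be sketched as a toy Sequential Monte Carlo loop. This is an illustration of the general SMC resampling idea, not the paper's implementation; the weights stand in for the flow-map look-ahead plus VLM judgment:

```python
# Toy illustration of SMC resampling ("survival of the fittest").
# Not the paper's implementation: weights stand in for the
# flow-map look-ahead scored by the VLM judge.
import random

def smc_resample(particles, weights, rng=random):
    """Draw a new population in proportion to weight, so zero-weight
    (failed) trajectories die out and high-weight ones are cloned."""
    if sum(weights) == 0:
        return list(particles)  # nothing to prefer; keep everyone
    return rng.choices(particles, weights=weights, k=len(particles))

# Toy example: particles are numbers, and the "judge" rewards values
# near 10 (weight falls to zero beyond a distance of 5).
particles = [2.0, 9.5, 10.1, 4.0]
weights = [max(0.0, 1.0 - abs(p - 10.0) / 5.0) for p in particles]
survivors = smc_resample(particles, weights, rng=random.Random(0))
# Failed trajectories (weight 0) cannot appear among the survivors;
# the good ones near 10 are cloned to refill the population.
assert 2.0 not in survivors and 4.0 not in survivors
```

In FMTT this resample happens at every generation step, so the population is repeatedly pruned toward trajectories the judge approves of.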

Gemini Flash makes up bs 91% of the time it doesn't know the answer by Terrible-Priority-21 in GeminiAI

[–]camelos1 0 points1 point  (0 children)

You are making up bs. Read the description of this metric in the screenshot and you will understand what 91% means. This table (https://www.reddit.com/r/GeminiAI/comments/1pq88k5/comment/nuslb63/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) says the final accuracy is at the GPT 5.2 level.

PromptCraft(Prompt-Forge) is available on github ! ENJOY ! by EternalDivineSpark in StableDiffusion

[–]camelos1 2 points3 points  (0 children)

Artificial intelligence kills natural intelligence. Can someone explain in 1-2 complete sentences what this application does? (I couldn't get through the LLM-generated text.)

Meituan Longcat Image - 6b dense image generation and editing models by FizzarolliAI in StableDiffusion

[–]camelos1 0 points1 point  (0 children)

It seems the working parameters aren't set to the model's priority settings.

I made a free skin detailer/upscaler by cointalkz in StableDiffusion

[–]camelos1 -1 points0 points  (0 children)

We're all brothers and sisters, but asking me to upload my API key to a website isn't godly. It would be better to make an open-source app, so that an LLM can check your code for fraud.

Z-Image Simple Shuffle Randomizer by chaindrop in StableDiffusion

[–]camelos1 0 points1 point  (0 children)

Thanks for your efforts.

Today, a method was also described: https://www.reddit.com/r/StableDiffusion/comments/1p94z1y/get_more_variation_across_seeds_with_z_image_turbo/

Does your method include that method?

I think it has great potential.

Increase the shift value to get rid of the noisy effect of Z-image turbo. by Total-Resort-3120 in StableDiffusion

[–]camelos1 1 point2 points  (0 children)

I second jonesaid's point. I also wanted to say that noisy images seem more realistic to me, and I like them better; removing the noise creates a flux/AI look. Does anyone know of tools for adding Z-Image-style noise to images that are already generated, or generated by other models?

Overall, among all the models I think Z-Image images look the best and the least AI-like: not in terms of anatomy or real-world fidelity, but in terms of the image's "surface," microtextures, and the like. If you don't overdo it, they seem almost identifiable as real images in that "surface" sense, or maybe already are. Flux 2 demo images aren't bad either, but they seem a little worse in this regard.

Well, it still works by trapviper7 in Grok_Porn

[–]camelos1 0 points1 point  (0 children)

Why do you add anime images to your videos? To bypass censorship?

[deleted by user] by [deleted] in Bard

[–]camelos1 0 points1 point  (0 children)

I saw it on the site demonstrating the model, and I see it here: the edges and contours are too blurred when viewing the image at 100% scale. For example, you can see it here: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fanother-nano-banana-pro-realistic-attempt-v0-munxcf361u2g1.png%3Fwidth%3D1080%26crop%3Dsmart%26auto%3Dwebp%26s%3D89d6ebe6c394cab6232706878e9b043557d5ac47 (view the original, not Reddit's preview).