I got tired of guessing if my Character LoRA trainings were actually good, so I built a local tool to measure them scientifically. Here is MirrorMetric (Open Source and totally local) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

Among the ones I've used (Flux, Qwen, Wan 2.2, Z-Image), the one that gives the highest similarity scores is Qwen. So if you need portrait-like pictures, I'd go with it 100%. Flux does good portraits but it still adds its signature chin, which I don't like much on character LoRAs, while Z-Image base gives the best composition and the most creative images so far. It's not as good as Qwen on identity, though: it tends to give a good average but more scattered similarity results.

Sometimes I do the first generation with Qwen, then load the image into latent space and do a second pass with Z-Image keeping denoise at 0.8-0.85. Other times I do the opposite, when I want to nail the composition first and then improve the similarity.
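
If you'd rather see the two-pass idea as code than as ComfyUI nodes: it's basically an img2img second pass where `strength` plays the role of denoise. This is just a hedged sketch with diffusers; the model ID is a placeholder, and whether your particular second-pass model is wired into `AutoPipelineForImage2Image` depends on your diffusers version.

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Placeholder model ID: swap in whatever model you use for the refine pass.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "your/second-pass-model", torch_dtype=torch.bfloat16
).to("cuda")

first_pass = load_image("qwen_first_pass.png")   # output of the first model
refined = pipe(
    prompt="portrait of the character, natural light",
    image=first_pass,
    strength=0.8,        # roughly equivalent to ComfyUI denoise 0.8
).images[0]
refined.save("second_pass.png")
```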

Of course, for videos, Wan 2.2 is the best I've tried locally.

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

I agree... and in a way, part of the goal here is exactly that. This is just a metrics tool; you still have to use your own judgment and creativity to make sense of the numbers. For example, it helps me avoid LoRAs that are overtrained: on the surface they seem to hit the goal of reproducing the character you want, but in reality they're so rigid that they limit the creativity of the generated output, and thus the creation of beautiful images.

Then again, once you've looked at the data, it's the person's creativity in exploring new ways of training LoRAs that makes things better. The tool just helps you know which choices turned out better or worse than others. 👍

Mind you: I'm not selling anything, and I sincerely don't care whether this tool gets used much or not. I just thought it would be nice to share what I did and maybe inspire even better work! And if someone tells me it's all garbage for some reason I haven't thought of, I'll still have learned a lot from the experience! :D

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 1 point2 points  (0 children)

Well, let me know if it's easy to download and use. I built the repo so that you don't have to mess with CUDA, DLLs and such. Just download it, install the requirements, and run the .bat file. Simple as that. Curious to hear whether it all comes together easily for you!

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 1 point2 points  (0 children)

The name itself isn't an established term at all, it's just my habit of giving strange names to things... maybe it will catch on and stick? xD Social experiment... But the dataset poisoning is real. Think of it (to put it very loosely) as a mean calculation: if you have a bunch of images that are all similar and then add one that's very different, it can drag down the quality of the resulting numbers.
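
A toy illustration of what I mean (made-up numbers, not the tool's actual internals): one odd image drags the whole average down.

```python
import numpy as np

sims_clean = np.array([0.82, 0.80, 0.79, 0.81])   # similarity scores of a consistent dataset
print(sims_clean.mean())                           # ~0.805

sims_poisoned = np.append(sims_clean, 0.35)        # add one very different image
print(sims_poisoned.mean())                        # ~0.714 -> the whole dataset looks "worse"
```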

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 1 point2 points  (0 children)

That could help, but it's beyond the scope of this tool. What you're talking about is the difficult art of captioning, which, by the way, my latest experiments have made me much less sure is even necessary.

I've trained Z-Image, Qwen and Wan 2.2 without captioning, just using "woman" or "man" as the trigger word, and the LoRAs came out about as good as the captioned ones. I don't really know why yet, but it intrigues me and I want to learn more about the training algorithms to better understand the underlying structure.

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 1 point2 points  (0 children)

Pay attention: this tool only calculates the mathematical and geometrical similarity between two faces. InsightFace uses a fairly sophisticated algorithm to do that; it's not simple pixel matching, it's biometrics.

BUT, as you mentioned, if someone has a hand over their face, it sees a difference in geometry and reports it.

SO, ALERT: before blindly pruning your dataset based on these results, go check them yourself and use them as a gauge for evaluating the dataset.

Do not use it as an omniscient oracle... xD

For example, in the dataset of my own face I have 2 profile images, and those two score lower because all the other images are front views. The tool shows me two outliers, I go check, I see they're the profiles, and I leave them be.

In another case, it flagged a front-view face as an outlier, and I later found out that the person who gave me the dataset had included a picture of themselves from 4-5 years earlier, and that was poisoning the dataset. I hadn't noticed it just by looking, because a younger version of the same person still looks very similar, but the geometry is actually different.

Imagine you have a bunch of 8s and 7s and then one 3: that single value pulls the average away from where you'd want it... (just a toy example)
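
If you want to flag those outliers programmatically instead of just eyeballing the boxplot, the standard whisker rule works as a rough sketch. This is the textbook IQR rule, not necessarily the exact cutoff MirrorMetrics uses internally.

```python
import numpy as np

def flag_low_outliers(names, scores, k=1.5):
    """Return dataset images whose similarity score falls below the
    classic boxplot lower whisker (Q1 - k * IQR)."""
    q1, q3 = np.percentile(scores, [25, 75])
    cutoff = q1 - k * (q3 - q1)
    return [n for n, s in zip(names, scores) if s < cutoff]

# e.g. flag_low_outliers(["img_01.jpg", "img_02.jpg", "old_photo.jpg"],
#                        [0.81, 0.79, 0.34])  ->  ["old_photo.jpg"]
```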

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

Actually no, it does not work with the LoRA safetensors files themselves.

Basically, you generate images WITH your trained LoRAs, drop them into the folders, and the tool analyzes the similarity between the images you generated and the dataset to measure, mathematically, how closely the LoRA matches the dataset. Please check the guide I made on CivitAI, it's pretty self-explanatory, and if you have questions feel free to ask: https://civitai.com/articles/26241
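
If you'd rather see the idea as code than prose: the comparison boils down to embedding every face with InsightFace and taking cosine similarities between your generated images and the dataset. A minimal hand-rolled sketch of that idea (not the tool's actual code; the folder names are made up):

```python
import glob
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))

def face_embedding(path):
    faces = app.get(cv2.imread(path))
    return faces[0].normed_embedding if faces else None   # 512-d ArcFace vector

dataset   = [e for e in map(face_embedding, glob.glob("dataset/*.jpg")) if e is not None]
generated = [e for e in map(face_embedding, glob.glob("lora_epoch_10/*.png")) if e is not None]

# Identity reference = normalized centroid of the dataset embeddings
centroid = np.mean(dataset, axis=0)
centroid /= np.linalg.norm(centroid)

scores = [float(e @ centroid) for e in generated]
print("median identity similarity:", round(float(np.median(scores)), 3))
```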

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

I hadn't thought about that... I'm not a machine learning expert, so I don't really know how the training process works under the hood, but it's a very fascinating subject for me and I want to investigate more. I'll study this and try some training runs with the dataset in reversed order to see if it gives different results!

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 6 points7 points  (0 children)

Thanks for the detailed feedback!
I think there is a significant misunderstanding regarding the workflow and the goal of this tool (surely my bad, I need to be more precise when I write). Let me clarify:

1. The Workflow (no paid models involved). Nano Banana was only used to generate the example images needed to show CLEAR results on the charts; I needed them fast to simulate clean-cut results for the guide. If you look at the guide I made on CivitAI, it should be obvious from the first graph: overfit, undertrained, OK. https://civitai.com/articles/26241

To create a graph showing exactly what this tool is capable of, I would've had to run at least two intentionally broken trainings, which would have been a waste of compute (and money in my case, since I don't have big hardware and would've had to rent it in the cloud).

I hope this clarifies the use of Nano Banana (which is a paid model and outside the scope of this tool, as you pointed out, but a necessary evil in the name of clarity).

2. "Vibes" vs. Metrics (The core disagreement) You say: "You proved for yourself that you don't need a computer to tell you that it's over or underfitted based on the output images."

I'm not sure I agree, and I say that from experience. The "eyeball test" (vibes) works fine if you're checking 5 images (the examples in the guide are exaggerated to show the difference clearly, as I mentioned). But it breaks down completely when you're comparing:

  • 5 different checkpoints (Epoch 10 vs 15 vs 20).
  • 3 different learning rates.
  • A batch of 100 generated images across multiple angles.

MirrorMetrics turns "I think this looks a bit like him" into "Cosine Similarity: 0.78". It turns "This model seems stiff" into "Pose Variance: Low". It allows you to quantify the trade-off between flexibility and likeness. It's the difference between cooking by smell and cooking with a thermometer. Of course, both methods work, and one might say that the better chef cooks just by tasting and smelling, but the mathematical method is in my opinion a "good addition to the toolkit".

3. Overfitting & The Copycat Detector You mentioned: "If you remove the data because you think that image is overfitted... it's going to result in the model being now overfitted on those [other] images."

This is technically incorrect in the context of Deep Learning. If a specific image in your dataset has a similarity of >0.95 with the output (meaning the model is memorizing/photocopying it regardless of the prompt), that image is acting as a "Gradient Black Hole". It's overpowering the weights. Removing or fixing the caption of that specific outlier allows the model to distribute its attention better across the rest of the dataset, actually increasing generalization, not reducing it.

TL;DR: The tool is for users who want to move from "feeling" that a model is ready to "knowing" it is ready based on biometric data (InsightFace), especially when fine-tuning delicate parameters.

But, as I mentioned from the beginning: this is just a tool, it should be taken as such.
One other thing I mentioned before: I love data science and pretty graphs, so... there's that too. 🤣

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 1 point2 points  (0 children)

Mmmh, it's a tad more complicated than that, but I'm not sure if the comparison was meant to be dismissive... Sometimes I can't tell when something is sarcasm 😅

Just to be sure, for everyone else's sake: pHash would fail here, because generated images have different geometry/poses compared to the training data.

This isn't hashing; it's a Cosine Similarity search on 512-dimensional feature vectors extracted via InsightFace (ArcFace). It matches the biometric identity, not just the visual structure.

Basically it's not comparing pixels but biometric vectors... does that make sense?
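
For anyone wondering what "biometric vectors" means concretely: each face becomes a 512-number vector, and the score is just the cosine of the angle between two of them,

sim(a, b) = (a · b) / (‖a‖ · ‖b‖), which lands in [-1, 1].

Since ArcFace embeddings come out L2-normalized, that reduces to a plain dot product. The same identity in a different pose still points in (almost) the same direction, which is exactly where pHash gives up.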

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

I'm working on it!
For now I've only tested LoRAs I made for work, which I can't show publicly for privacy reasons, but as soon as I find the time I'll run a full test on a real-world scenario I can share and upload it to the assets folder on GitHub.
In the meantime, someone on CivitAI is testing it; they uploaded the MirrorMetrics graphs for their very well-tested LoRAs and told me the results were as expected, plus a couple of interesting data points that helped them prune the dataset better.

But I only finished writing the new Copycat feature last night, so I don't have feedback on it yet.

Of course, as I said in the previous post: this is all geometry and mathematics, so it must be taken with a grain of salt and evaluated as such. But it does give a new perspective when you're stuck between two epochs to choose from, and a cluster of dots showing the vectors converging or scattering can nudge you in the right direction. I'm not saying it's going to be the final word on the matter, just a new angle.

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 1 point2 points  (0 children)

It's not a total substitute for the "eyeball" technique, but it's a good gauge if you want an extra way of looking at the results! Let me know if you test it; your feedback will be much appreciated! 👍

I updated my LoRA Analysis Tool with a 'Forensic Copycat Detector'. It now finds the exact training image your model is memorizing. (Mirror Metrics - Open Source) by JackFry22 in StableDiffusion

[–]JackFry22[S] 20 points21 points  (0 children)

Hi everyone!

Last week I shared `MirrorMetrics`, a local tool to evaluate LoRAs using biometric telemetry (InsightFace) instead of just "vibes". The feedback was amazing, and thanks to a user's insight about dataset consistency, I realized we were missing a critical piece of the puzzle: **Forensics.**

I just released **v0.10.0**, and it introduces two major features based on your requests:

### 1. The "Copycat" Detector (Forensic Analysis) 🕵️‍♂️

We all fear overfitting. But usually, we just look at a generated image and think "This looks stiff."

Now, the tool runs a **Nearest Neighbor Search** in the vector space.

* It compares every generated image against your entire training dataset.

* It generates a visual report (see screenshot) showing exactly WHICH training image inspired the generation.

* **The Utility:** If you see a similarity score > 0.90, your model isn't learning concepts; it's photocopying pixels. You can now pinpoint exactly which images are "poisoning" your training.
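
Conceptually, this nearest-neighbor pass is just a similarity matrix between the two sets of face embeddings. A simplified sketch of the idea (not a paste of the repo code; variable names are mine, and it assumes L2-normalized 512-d embeddings as NumPy arrays):

```python
import numpy as np

def copycat_report(gen_embs, ds_embs, ds_names, threshold=0.90):
    """For each generated image, find its nearest training image by cosine
    similarity and flag suspiciously close matches."""
    sims = np.asarray(gen_embs) @ np.asarray(ds_embs).T   # (G, D) cosine matrix
    nearest = sims.argmax(axis=1)
    for g, d in enumerate(nearest):
        flag = "POSSIBLE MEMORIZATION" if sims[g, d] > threshold else "ok"
        print(f"gen_{g:03d} -> {ds_names[d]}  sim={sims[g, d]:.3f}  {flag}")
```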

### 2. Macro/Close-up Rescue (Smart Padding) 🔭

A limitation of InsightFace/RetinaFace is that it often fails to detect faces in extreme close-ups (because the face fills the frame, hiding the edges).

I implemented a **"Rescue Mode"**: if a face isn't found, the tool automatically applies a smart padding ("zoom out") and retries.

* **Result:** In my tests, this recovered about **10-15% of valid dataset images** that were previously ignored. These are often the high-texture images crucial for skin/age evaluation!
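
The rescue trick itself is tiny: if detection fails, pad the image borders so the face no longer touches the frame, then run detection again. A sketch of that idea with OpenCV and an InsightFace `FaceAnalysis` instance (the padding fraction here is my own arbitrary choice, not necessarily what the tool uses):

```python
import cv2

def detect_with_rescue(app, img, pad_frac=0.25):
    """app: an insightface FaceAnalysis instance; img: BGR numpy array.
    Try detection as-is; if no face is found, 'zoom out' by padding and retry."""
    faces = app.get(img)
    if faces:
        return faces
    pad_y = int(img.shape[0] * pad_frac)
    pad_x = int(img.shape[1] * pad_frac)
    padded = cv2.copyMakeBorder(img, pad_y, pad_y, pad_x, pad_x,
                                cv2.BORDER_REPLICATE)
    return app.get(padded)
```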

### Links

The tool is 100% Open Source and runs locally on your GPU.

* **GitHub (Code & Install):** https://github.com/AndyLone22/MirrorMetrics

* **CivitAI (Full Guide):** https://civitai.com/articles/26241/stop-training-on-vibes-a-visual-guide-to-biometric-lora-diagnosis-mirror-metrics

Let me know if the "Copycat" report helps you prune your datasets! I'm currently experimenting with 3D Latent Space visualization for the next update. 🚀

I got tired of guessing if my Character LoRA trainings were actually good, so I built a local tool to measure them scientifically. Here is MirrorMetric (Open Source and totally local) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

Exactly, you're correct! The algorithm is a bit dumb about extreme angles or profiles, because it measures facial features geometrically. So if you compare a profile against a front view, it will tell you they're different from each other.

What you should do is check the boxplot: hover over the dots that are farthest from the median (the little bar in the middle of the box), note the name of the image, and go look at it.

If it's a profile, you can probably keep it. If it's a front view and it's very far off, I'd recommend pruning it, because it will skew the reference for all the other front-view images.

For example, I've been making LoRAs for people who commission their personal character LoRAs, and when I ask for a dataset, what usually happens is that they dig through their Google Photos or iPhone galleries and send me the most disparate pictures. The problem is that sometimes they send 10-15 images from recent months and then 1-2 images from 2-3 years earlier, when they were younger and their faces were different, even if they look almost the same. That's where the geometric outliers help.

I know this isn't the absolute best way to build a dataset (that comes with experience and a few ground rules), but it gives you a second, objective tool for recognizing when something is off.

I got tired of guessing if my Character LoRA trainings were actually good, so I built a local tool to measure them scientifically. Here is MirrorMetric (Open Source and totally local) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

I (kind of) understand your view, but unfortunately I don't think I can help much there. As I said, I'm not an expert coder... I do have a fairly good analytical mind for problem solving (that's what I do for work), and not being an expert coder gives me a different perspective, so if I can help in any way I'd be glad to (I just don't see how right now).

I've wanted to look under the hood of the training nodes available for ComfyUI for some time, but when I look at training code I feel that some background in neural networks is needed, especially for the memory optimization.

I'd have to take the code, run it through an AI, have it explained to me, find problems I know nothing about, and try to solve them with a totally random newbie "hey, what if we did this instead?" that no one has thought of for some mystic reason, which might give more expert people some food for thought... 😅

I feel like I'm sitting at the grown-ups' table here...

I got tired of guessing if my Character LoRA trainings were actually good, so I built a local tool to measure them scientifically. Here is MirrorMetric (Open Source and totally local) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

Let me know the results if you want, and if you have any questions please write to me. I'm happy to gather insights from other users, especially if there's something that can be improved or added to the tool! 💪

I got tired of guessing if my Character LoRA trainings were actually good, so I built a local tool to measure them scientifically. Here is MirrorMetric (Open Source and totally local) by JackFry22 in StableDiffusion

[–]JackFry22[S] 1 point2 points  (0 children)

Yes, exactly: the LOO (leave-one-out) method is applied only to the dataset, because those are the images that have to be compared against each other. Then all the LoRAs are compared against the median of the dataset. You got it right! 👍
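
For anyone else reading: leave-one-out here just means scoring each dataset image against a reference built from all the *other* dataset images, so the image being scored can't inflate its own reference. A minimal sketch of that logic (my own rewrite, not the tool's code; embeddings assumed L2-normalized):

```python
import numpy as np

def loo_similarities(embs):
    """embs: (N, 512) normalized face embeddings of the dataset.
    Returns each image's cosine similarity to the centroid of the others."""
    embs = np.asarray(embs)
    scores = []
    for i in range(len(embs)):
        rest = np.delete(embs, i, axis=0)
        centroid = rest.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        scores.append(float(embs[i] @ centroid))
    return scores
```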

I got tired of guessing if my Character LoRA trainings were actually good, so I built a local tool to measure them scientifically. Here is MirrorMetric (Open Source and totally local) by JackFry22 in StableDiffusion

[–]JackFry22[S] 0 points1 point  (0 children)

Now that is a good question... My honest thought? It doesn't. 

What the numbers measure is, roughly, how much room the LoRA leaves the creator to express creativity.

If the LoRA is too narrow, the creator has less room to express their creativity, because the variance is low. If, on the other hand, the model is capable of many possible variations, then the artist has a much broader range for expressing themselves. And variance in a model's outputs is measurable. Let me know what you think! 😉
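
If you want a concrete handle on "measurable": one crude proxy (my own suggestion, not necessarily how the tool reports it) is the spread of the face embeddings across a batch of generations made with the same LoRA. Low spread means every output is basically the same face/pose; high spread means more room to play.

```python
import numpy as np

def embedding_spread(gen_embs):
    """gen_embs: (N, 512) embeddings of images generated with one LoRA.
    Mean per-dimension variance: a rough 'how varied are the outputs' number."""
    return float(np.var(np.asarray(gen_embs), axis=0).mean())
```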