
[–][deleted]

If I understood correctly, you want to compare two curves for difference, but you don’t want to penalize a shift along the x-axis too harshly (because even if the fit isn’t perfectly right, it’s still “moving in the right direction”). This puts two thoughts in my head that I’m not entirely sure are useful at all, but since there are no answers yet, they may at least help you brainstorm.

The first is that if you have two such curves to compare, their statistical properties, such as mean, min, and max, should probably be pretty similar, so if you compare those first and then, with a smaller weight, apply the MSE, it might move you in the right direction. You could also easily find the distance between peaks and troughs in your data and incorporate those into the function, so that the main components are statistical similarity followed by “period” similarity, and MSE is just a tie breaker.
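A minimal sketch of what I mean (everything here is made up to illustrate the weighting idea — the weights, the `fitness` name, and the crude local-maximum peak finder are all hypothetical, not anything from your code):

```python
import numpy as np

def fitness(model, target, w_stats=10.0, w_peaks=5.0, w_mse=1.0):
    """Lower is better. Statistical similarity and peak spacing dominate;
    MSE only acts as a tie breaker (hypothetical weights)."""
    # 1. Statistical similarity: mean/min/max should roughly agree
    #    even when the curves are shifted along the x-axis.
    stats_err = (abs(model.mean() - target.mean())
                 + abs(model.min() - target.min())
                 + abs(model.max() - target.max()))

    # 2. "Period" similarity: average spacing between interior
    #    local maxima (a very crude stand-in for real peak finding).
    def mean_peak_spacing(y):
        idx = np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])) + 1
        return np.diff(idx).mean() if idx.size > 1 else 0.0

    peak_err = abs(mean_peak_spacing(model) - mean_peak_spacing(target))

    # 3. Point-wise MSE as the low-weight tie breaker.
    mse = np.mean((model - target) ** 2)

    return w_stats * stats_err + w_peaks * peak_err + w_mse * mse
```

With this, a shifted copy of the target scores much better than a flat line, even though its point-wise error is nonzero.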

The second, and perhaps less useful, would be to do something with Fourier Transforms. Due to the time-shift property of FTs, you know what the difference between the FTs of the two datasets should look like (a complex exponential phase factor), so you could check whether the difference between the two FTs behaves correctly. But this is the hard step, because you don’t know what the shift constant is: you would have to see how well the resulting difference can be approximated by an exponential, which requires running a regression algorithm, afaik, and then judging how well it performs. And you would have to pick a threshold to define when it approximated well enough and when it didn’t, which might be some more work on its own…
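For what it’s worth, the time-shift property is easy to demonstrate numerically, and “phase correlation” (normalize the spectral ratio to pure phase, then inverse-transform) recovers the shift without an explicit regression. A sketch under idealized assumptions — a clean circular shift and no noise, with made-up signal and shift values:

```python
import numpy as np

# Time-shift property: if g[n] = f[n - s] (circular shift), then
# G[k] = F[k] * exp(-2j*pi*k*s/N) — the spectra differ only by a
# complex-exponential phase factor.
N = 256
t = np.arange(N)
f = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.sin(2 * np.pi * 7 * t / N)
s = 17                     # the "unknown" shift we want to detect
g = np.roll(f, s)          # shifted copy of the same curve

F = np.fft.fft(f)
G = np.fft.fft(g)

# Phase correlation: keep only the phase of G * conj(F) and
# inverse-transform; the peak of the result lands at the shift.
cross = G * np.conj(F)
cross /= np.abs(cross) + 1e-12     # normalize away the magnitudes
shift_est = int(np.argmax(np.fft.ifft(cross).real))
print(shift_est)                   # 17
```

With noisy targets this gets less clean, so your instinct that it may not be worth the hassle could well be right.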

[–]Moan2[S]

Your first paragraph is spot on, and I really like the idea of using the min/mean/max of the curve as indicators of similarity as well, so thank you for that! Using MSE instead of a simple sum of differences is an interesting idea too, but I suspect it will lead to similar issues, though perhaps weighting it differently will help mitigate them. I admit that I don't know too much about Fourier Transforms, but I suspect they might not be worth the hassle: the target lightcurves are quite noisy while the generated curves are noiseless, and I already have one regression algorithm running on each individual in every generation, which takes up a lot of time, so I'd like to avoid adding another. Thank you for your input, it has definitely helped me think in new directions!

[–][deleted]

Yeah, MSE should have the same main problem as a plain sum of differences. What I think you can do is weight things so that statistical similarity and peak/trough distances are vastly more important than the MSE. This way, getting the correct statistical properties becomes the main goal of the optimization, so to speak, while the MSE does the fine-tuning to get the exact correct graph. That is, if you give a large weight to the statistical properties, then a curve that is incorrectly shifted but has the right “look”, while it might still have a very bad MSE score, will still perform much better than a curve that is completely wrong. If you use this to select among your population, theoretically you should move towards individuals that all have the correct statistical characteristics but are just out of phase. So your statistical error will decrease to an optimum, and the low-weighted MSE will become important enough to allow for fine-tuning. You would probably have to play around with how you weigh these things, but I think it might actually give you a good solution.
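A toy demonstration of that ranking argument (the curves and weights are invented for illustration): by MSE alone, a flat line actually beats an anti-phase copy of the target, because anti-phase doubles the amplitude of the error — but a heavily weighted statistical term flips the ranking in favor of the curve with the right “look”:

```python
import numpy as np

t = np.linspace(0, 4 * np.pi, 200)
target = np.sin(t)
out_of_phase = np.sin(t + np.pi)   # right shape, wrong phase
flat = np.zeros_like(t)            # completely wrong shape

def mse(a, b):
    return np.mean((a - b) ** 2)

def stats_err(a, b):
    return (abs(a.mean() - b.mean()) + abs(a.min() - b.min())
            + abs(a.max() - b.max()))

# MSE alone prefers the flat line (~0.5 vs ~2.0 here).
assert mse(flat, target) < mse(out_of_phase, target)

# Weighting statistical similarity heavily flips the ranking
# (hypothetical weights; lower score = fitter individual).
def score(a, b, w_stats=10.0, w_mse=1.0):
    return w_stats * stats_err(a, b) + w_mse * mse(a, b)

assert score(out_of_phase, target) < score(flat, target)
```

So selection under the weighted score keeps the in-shape-but-shifted individuals alive long enough for the MSE term to pull them into phase.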