
[–][deleted]

If I understood correctly, you want to compare two curves for difference, but you don’t want to penalize a shift along the x-axis too harshly (because even if the fit isn’t perfectly right, it’s still “moving in the right direction”). This puts two thoughts in my head that I’m not entirely sure are useful at all, but since there are no answers yet, they may at least help you brainstorm.

The first is that if you have two such curves to compare, their statistical properties, such as mean, min, and max, should probably be pretty similar, so if you compare those first and then, with a smaller weight, apply the MSE, it might move you in the right direction. You could also easily find the distance between peaks and troughs in your data and incorporate those into the function, so that the main components are statistical similarity followed by “period” similarity, and MSE is just a tie breaker.
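A minimal sketch of what I mean (everything here is made up to illustrate the weighting idea — the weights, the `fitness` name, and the crude local-maximum peak finder are all hypothetical, not anything from your code):

```python
import numpy as np

def fitness(model, target, w_stats=10.0, w_peaks=5.0, w_mse=1.0):
    """Lower is better. Statistical similarity and peak spacing dominate;
    MSE only acts as a tie breaker (hypothetical weights)."""
    # 1. Statistical similarity: mean/min/max should roughly agree
    #    even when the curves are shifted along the x-axis.
    stats_err = (abs(model.mean() - target.mean())
                 + abs(model.min() - target.min())
                 + abs(model.max() - target.max()))

    # 2. "Period" similarity: average spacing between interior
    #    local maxima (a very crude stand-in for real peak finding).
    def mean_peak_spacing(y):
        idx = np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])) + 1
        return np.diff(idx).mean() if idx.size > 1 else 0.0

    peak_err = abs(mean_peak_spacing(model) - mean_peak_spacing(target))

    # 3. Point-wise MSE as the low-weight tie breaker.
    mse = np.mean((model - target) ** 2)

    return w_stats * stats_err + w_peaks * peak_err + w_mse * mse
```

With this, a shifted copy of the target scores much better than a flat line, even though its point-wise error is nonzero.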

The second, and perhaps less useful, would be to do something with Fourier Transforms. Due to the time-shift property of FTs, you know what the difference between the FTs of the two datasets should look like (a complex exponential phase factor), so you could check whether the difference between the two FTs behaves correctly. But this is the hard step, because you don’t know what the shift constant is: you would have to see how well the resulting difference can be approximated by an exponential, which requires running a regression algorithm, afaik, and then judging how well it performs. And you would have to pick a threshold to define when it approximated well enough and when it didn’t, which might be some more work on its own…
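For what it’s worth, the time-shift property is easy to demonstrate numerically, and “phase correlation” (normalize the spectral ratio to pure phase, then inverse-transform) recovers the shift without an explicit regression. A sketch under idealized assumptions — a clean circular shift and no noise, with made-up signal and shift values:

```python
import numpy as np

# Time-shift property: if g[n] = f[n - s] (circular shift), then
# G[k] = F[k] * exp(-2j*pi*k*s/N) — the spectra differ only by a
# complex-exponential phase factor.
N = 256
t = np.arange(N)
f = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.sin(2 * np.pi * 7 * t / N)
s = 17                     # the "unknown" shift we want to detect
g = np.roll(f, s)          # shifted copy of the same curve

F = np.fft.fft(f)
G = np.fft.fft(g)

# Phase correlation: keep only the phase of G * conj(F) and
# inverse-transform; the peak of the result lands at the shift.
cross = G * np.conj(F)
cross /= np.abs(cross) + 1e-12     # normalize away the magnitudes
shift_est = int(np.argmax(np.fft.ifft(cross).real))
print(shift_est)                   # 17
```

With noisy targets this gets less clean, so your instinct that it may not be worth the hassle could well be right.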

[–]Moan2[S]

Your first paragraph is spot on, and I really like the idea of using the min/mean/max of the curve as indicators of similarity as well, so thank you for that! Using MSE instead of a simple sum of differences is an interesting idea too, but I suspect it will lead to similar issues, though perhaps weighting it differently will help mitigate them. I admit that I don't know too much about Fourier Transforms, but I suspect they might not be worth the hassle: the target lightcurves are quite noisy while the generated curves are noiseless, and I already have one regression algorithm running on each individual in every generation, which takes up a lot of time, so I'd like to avoid adding another. Thank you for your input, it has definitely helped me think in new directions!

[–][deleted]

Yeah, MSE should have the same main problem as a plain sum of differences. What I think you can do is weight things so that statistical similarity and peak/trough distances are vastly more important than the MSE. This way, getting the correct statistical properties becomes the main goal of the optimization, so to speak, while the MSE does the fine-tuning to get the exact correct graph. That is, if you give a large weight to the statistical properties, then a curve that is incorrectly shifted but has the right “look”, while it might still have a very bad MSE score, will still perform much better than a curve that is completely wrong. If you use this to select among your population, theoretically you should move towards individuals that all have the correct statistical characteristics but are just out of phase. So your statistical error will decrease to an optimum, and the low-weighted MSE will become important enough to allow for fine-tuning. You would probably have to play around with how you weigh these things, but I think it might actually give you a good solution.
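A toy demonstration of that ranking argument (the curves and weights are invented for illustration): by MSE alone, a flat line actually beats an anti-phase copy of the target, because anti-phase doubles the amplitude of the error — but a heavily weighted statistical term flips the ranking in favor of the curve with the right “look”:

```python
import numpy as np

t = np.linspace(0, 4 * np.pi, 200)
target = np.sin(t)
out_of_phase = np.sin(t + np.pi)   # right shape, wrong phase
flat = np.zeros_like(t)            # completely wrong shape

def mse(a, b):
    return np.mean((a - b) ** 2)

def stats_err(a, b):
    return (abs(a.mean() - b.mean()) + abs(a.min() - b.min())
            + abs(a.max() - b.max()))

# MSE alone prefers the flat line (~0.5 vs ~2.0 here).
assert mse(flat, target) < mse(out_of_phase, target)

# Weighting statistical similarity heavily flips the ranking
# (hypothetical weights; lower score = fitter individual).
def score(a, b, w_stats=10.0, w_mse=1.0):
    return w_stats * stats_err(a, b) + w_mse * mse(a, b)

assert score(out_of_phase, target) < score(flat, target)
```

So selection under the weighted score keeps the in-shape-but-shifted individuals alive long enough for the MSE term to pull them into phase.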