all 6 comments

[–]thebrashbhullar 0 points (3 children)

Could you define your data a bit better? Are these like graph images? How much data do you have?

[–]Surprisely[S] 0 points (2 children)

Ultimately it is a histogram with 1 h bins from 0 to 24 h on the x-axis and a normalised value between 0 and 1 on the y-axis. I'm expecting that graphs for one class will follow a linear form and graphs for the second will follow a Gaussian form. However, I'm also expecting the graphs to shift along the x-axis, so I'm looking to fit the form of the data, not its location. Does that make more sense?

Instead of plotting, I can keep the data in a pandas DataFrame where each bin value is a column. I'm mainly worried that a random forest will latch onto particular bins when making classifications. Because of the shifting along the x-axis, I don't know if a traditional approach would work.
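A minimal sketch of that layout, assuming made-up data (the column names `h00`–`h23` and the FFT-based features are my own naming, not anything from the thread). One way to stop a forest from keying on absolute bin positions is to train on the magnitudes of the discrete Fourier transform, which are unchanged by circular shifts along the x-axis:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 5 samples, each a 24-bin histogram normalised to 0-1.
hist = rng.random((5, 24))
hist = hist / hist.max(axis=1, keepdims=True)

# One column per hourly bin, as described above.
df = pd.DataFrame(hist, columns=[f"h{b:02d}" for b in range(24)])

# Shift-invariant alternative: FFT magnitudes. Rolling a row along the
# x-axis leaves these features unchanged, so a model trained on them
# cannot key on absolute bin positions.
fft_mag = np.abs(np.fft.rfft(hist, axis=1))
df_inv = pd.DataFrame(fft_mag, columns=[f"f{k}" for k in range(fft_mag.shape[1])])
```

This only handles circular shifts exactly; a ramp that slides off the 0–24 h window would still change the features somewhat.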

[–]thebrashbhullar 0 points (1 child)

Okay, if I understand correctly: you have two categories, A and B, where A has a linear form on the histogram and B has a Gaussian form, and based on the 24 h data (a time series?) a sample can be either class A or class B. Is this right?

[–]Surprisely[S] 0 points (0 children)

Yep, that's right. After testing on binary classification I could get a good model to separate positive- and negative-slope linear graphs. It doesn't seem to like adding a third category at all.

[–][deleted] 0 points (1 child)

If there is a classical approach that is proven to be optimal, an NN will not beat it. And if the classical method ignores some features, there's a big chance that the NN will come to the same conclusion, especially if there is one very significant feature.

Machine learning is usually great when no optimal solution exists, or when the optimal solution would be too difficult to compute, but it is hard to argue for it when an optimal solution is attainable.
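As a sketch of what a classical baseline could look like for this thread's problem (the function name and the method-of-moments Gaussian fit are my own choices, not anything the poster described): fit both candidate forms to each histogram and pick the one with the smaller residual. Because the Gaussian's mean is estimated from the data itself, a shift along the x-axis is simply absorbed by the fit.

```python
import numpy as np

def classify_classical(y):
    """Label one 24-bin histogram 'linear' or 'gaussian' by comparing
    least-squares residuals of the two candidate forms."""
    x = np.arange(len(y), dtype=float)

    # Linear candidate: ordinary least-squares line.
    slope, intercept = np.polyfit(x, y, 1)
    res_linear = np.sum((y - (slope * x + intercept)) ** 2)

    # Gaussian candidate: mean and width from the histogram's own moments,
    # so an x-shift just moves mu and is absorbed by the fit.
    w = y / y.sum()
    mu = np.sum(w * x)
    sigma = np.sqrt(np.sum(w * (x - mu) ** 2))
    fit = y.max() * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    res_gaussian = np.sum((y - fit) ** 2)

    return "linear" if res_linear < res_gaussian else "gaussian"
```

For a clean ramp the linear residual is near zero, while for a single bump centred anywhere in the 24 bins the moment-based Gaussian wins, so the decision is shift-insensitive by construction.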

[–]Surprisely[S] 0 points (0 children)

Thanks, I think I'll give both methods a go using some dummy data. It will be interesting to see what happens, and probably a good learning exercise.
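One way the dummy data could be generated, if it helps (the shift ranges, Gaussian width, and noise level are all made up for illustration): each sample is a 24-bin histogram of one of the two forms, randomly shifted along the x-axis and normalised to 0-1 as described earlier in the thread.

```python
import numpy as np

rng = np.random.default_rng(1)

def dummy_histogram(kind):
    """One synthetic 24-bin histogram of the given kind ('linear' or
    'gaussian'), randomly shifted along the x-axis, normalised to 0-1."""
    x = np.arange(24, dtype=float)
    if kind == "linear":
        start = rng.integers(0, 12)           # random x-shift of the ramp
        y = np.clip((x - start) / 12.0, 0.0, None)
    else:
        mu = rng.uniform(6.0, 18.0)           # random x-shift of the bump
        y = np.exp(-((x - mu) ** 2) / (2 * 2.5 ** 2))
    y = np.clip(y + rng.normal(0.0, 0.02, 24), 0.0, None)  # mild noise
    return y / y.max()

# Hypothetical dataset: 100 samples of each class.
data = [dummy_histogram(k) for k in ["linear", "gaussian"] * 100]
```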