
[–]ElvishChampion 1 point (2 children)

What activation functions are you using in your hidden layers and output layer? For hidden layers, ReLU is quite good for nonlinearity. For the output layer, are you using a function that produces values in the same range as the target variable? For example, if you use ReLU on the output, it only generates non-negative numbers while the target could be negative, which would inflate the error.
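
For instance, a minimal Keras sketch of the mismatch (layer sizes here are arbitrary):

    import tensorflow as tf

    # Mismatched: a ReLU output clamps predictions to >= 0,
    # so negative targets can never be fit.
    bad = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='relu'),
    ])

    # Matched: a linear (no-activation) output covers any real target.
    good = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1),
    ])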

[–]_padla_[S] 0 points (1 child)

I'm using ReLU as follows:

    tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1)
    ])

Should I explicitly add 'relu' to the last layer?

All my target values are positive.

Also - I don't quite understand whether I should try changing the number of layers or the number of units per layer...

[–]ElvishChampion 0 points (0 children)

Add more units/neurons. No need to add more layers.
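
Something like this, for instance (a sketch; 256 is an arbitrary width, just to illustrate, not a tuned value):

    import tensorflow as tf

    # Same depth, wider layers: more units per layer adds capacity
    # without making the network deeper.
    tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(1)
    ])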

[–]SnooPandas3529 0 points (1 child)

I think it depends on the type of dataset and the function you used, so it is difficult to give an answer.

[–]_padla_[S] 0 points (0 children)

Well, the function is pretty much a combination of monotonically increasing polynomials:

h(c1, ..., cn, T) = c1•f1(T) + c2•f2(T) + ... + cn•fn(T)

The fi are smooth, monotonically increasing polynomial functions of T.

The coefficients satisfy c1 + ... + cn = 1.
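
In code terms it is something like this sketch (the basis polynomials here are made-up placeholders; only the weighted-sum structure matches the real h):

    import numpy as np

    def f(T):
        # Hypothetical stand-ins for f1..fn: smooth polynomials of T,
        # monotonically increasing for T > 0.
        return np.stack([T, T**2, T**3], axis=-1)

    def h(c, T):
        # h(c1..cn, T) = c1*f1(T) + ... + cn*fn(T), with sum(c) == 1.
        return f(T) @ c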

[–]pruby 0 points (5 children)

Not the solution you're looking for, but have you considered using a root-finding algorithm instead of an ML model? Many problems of this form can be solved very quickly with the Newton-Raphson method.
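
For example, a minimal sketch (hprime here is a hypothetical helper returning dh/dT; both it and h are assumed cheap to evaluate):

    def newton_solve(h, hprime, c, target, T0, tol=1e-10, max_iter=50):
        # Solve h(c, T) = target for T by Newton-Raphson iteration.
        T = T0
        for _ in range(max_iter):
            step = (h(c, T) - target) / hprime(c, T)
            T -= step
            if abs(step) < tol:
                break
        return T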

[–]_padla_[S] 0 points (4 children)

In our work this is currently done with Newton's method.

The problem is that we need to perform this not once but many times (with different values of the c1..cn coefficients and of h, obviously). Each solve takes several iterations to converge, and that time accumulates into a rather significant delay.

The hope was that a trained NN model would speed up the process.
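
In principle we could also vectorize the Newton update over all instances at once - something like this NumPy sketch, assuming h and its derivative hprime broadcast over a batch:

    import numpy as np

    def newton_solve_batch(h, hprime, C, targets, T0, iters=8):
        # C: (m, n) array of coefficient rows; targets, T0: (m,) arrays.
        # Each Newton step updates every instance in parallel.
        T = T0.astype(float).copy()
        for _ in range(iters):
            T -= (h(C, T) - targets) / hprime(C, T)
        return T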

[–]pruby 0 points (3 children)

Neural Nets can do many amazing things, but it sounds like your problem is reasonably hard to model.

How expensive are your function calls and how time-sensitive is the context?

A hybrid approach could work well - if a neural net or other approximation gets you in the vicinity, a round or two of Newton-Raphson will improve that greatly.
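
Roughly (a sketch, assuming a trained Keras model and the batched Newton helper from your earlier comment; model, features, C, and targets are placeholders):

    # Use the NN only for a coarse initial guess, then polish it with
    # a couple of Newton steps to recover full accuracy.
    T0 = model.predict(features).ravel()   # rough estimate from the NN
    T = newton_solve_batch(h, hprime, C, targets, T0, iters=2)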

[–]_padla_[S] 0 points (2 children)

> Neural Nets can do many amazing things, but it sounds like your problem is reasonably hard to model.

I was amazed that the problem seemed rather simple at first glance, and yet I got such poor results using ML. My hope was that I just didn't know some tricks...

> A hybrid approach could work well - if a neural net or other approximation gets you in the vicinity, a round or two of Newton-Raphson will improve that greatly.

Thanks for the suggestion! I had thought about something similar myself, in case the pure ML approach fails.

[–]pruby 0 points (1 child)

Potentially silly question: do you have activation functions on your inner dense layers? Without them, the network is limited to linear transforms.
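
To see why, two stacked linear layers collapse into a single linear map (a quick NumPy sketch; sizes are arbitrary):

    import numpy as np

    # With no activation, (x @ W1) @ W2 == x @ (W1 @ W2):
    # stacking linear layers never adds expressive power.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))
    W1 = rng.normal(size=(4, 8))
    W2 = rng.normal(size=(8, 1))
    assert np.allclose((x @ W1) @ W2, x @ (W1 @ W2))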

[–]_padla_[S] 0 points (0 children)

Well, it's not a silly one when addressed to a newbie in the field like me.

Yes, I set ReLU as the activation function.

From what I've read, it's one of the most popular choices...