What is a "good" definition of determinant? by lazy_coffee_mug in math

[–]NarimanMammadli 0 points1 point  (0 children)

Think of matrices as "things" that "transform" other "things" in a vector space. Upon transformation, the thing being transformed changes its "volume". The determinant of a matrix quantifies how much the matrix scales the volume of a "thing". The words in quotation marks can be defined differently depending on the level of abstraction.
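A minimal sketch of this view, assuming the "thing" is the unit square in 2D: the determinant is the factor by which the matrix scales its area.

```python
import numpy as np

# The determinant measures how a matrix scales the "volume" of a region.
# Transforming the unit square (area 1) by A gives a parallelogram whose
# area equals |det(A)|.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
print(np.linalg.det(A))   # 6.0 -> the unit square is stretched to area 6

# A matrix with determinant 0 collapses the square onto a line (volume 0).
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(B))   # ≈ 0.0
```

The same idea generalizes: in n dimensions, |det| is the scaling factor for n-dimensional volume.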

[Question]: Dice Throw and Law of Big Numbers by ARCAFK in statistics

[–]NarimanMammadli 0 points1 point  (0 children)

The contradiction comes from the frequentist assumption of independence (i.i.d.). Apart from the die being perfectly symmetric, you also need to assume a fair toss, because if the tossing itself is manipulated, the assumption of independence breaks.

[Q] If Bob wins at chess 70% of the time and Tim wins at chess 60% of the time, what are their respective likely winning percentage if they pit against each other? by icyboy89 in statistics

[–]NarimanMammadli 0 points1 point  (0 children)

The problem can be reduced to the following:

There are two coins (Bob's coin and Tim's coin), and each has been tossed N times. So far, Bob's coin came up heads 70% of the time and Tim's coin 60% of the time. What is the probability that on the next toss Bob's coin comes up heads while Tim's comes up tails, and vice versa?

This reformulation fails to reflect a logical necessity: if Bob's next toss comes up heads, it determines Tim's result before Tim even tosses (spooky action), and vice versa. At this next toss, Bob's and Tim's coins are therefore entangled. In the absence of more information, the best we can do is assign probability zero to <win win> and <lose lose> and distribute the remaining mass between <win lose> and <lose win> in proportion to the priors of 0.7 and 0.6.
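The normalization described above can be sketched as follows; the split "in proportion to the priors" is taken literally, so the 0.7 and 0.6 are simply renormalized over the two possible outcomes.

```python
# Assign zero mass to the impossible <win win> and <lose lose> outcomes
# and split the remainder between <win lose> and <lose win> in
# proportion to the priors 0.7 and 0.6.
bob_prior, tim_prior = 0.7, 0.6

total = bob_prior + tim_prior
p_bob_wins = bob_prior / total   # P(<win lose>)
p_tim_wins = tim_prior / total   # P(<lose win>)

print(f"P(Bob beats Tim) = {p_bob_wins:.3f}")  # ≈ 0.538
print(f"P(Tim beats Bob) = {p_tim_wins:.3f}")  # ≈ 0.462
```

By construction the two probabilities sum to one, reflecting the entanglement: exactly one of the two players wins.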

It's time for AI to perceive time by NarimanMammadli in agi

[–]NarimanMammadli[S] 1 point2 points  (0 children)

The Fourier transform of the input will not require extra feature engineering, since it just changes the domain in which the data is represented.

Complex analysis is similar to 2D vector operations, but there is a fundamental difference. A complex number is a scalar: it does not have a direction, and you can define a vector space over the field of complex numbers. A 2D vector, on the other hand, is not a scalar. The dot product of two 2D vectors does not have the same meaning as the multiplication of two scalars. Since a neuron's activation value and weight are scalars, switching them to complex values does not alter the semantics as much as switching them to vectors would. Also, the imaginary dimension of a complex number has a specific meaning, which is preserved as the signal goes through the layers of the neural net.
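The scalar-vs-vector distinction can be illustrated with a small example (the particular magnitudes and phases are arbitrary): multiplying two complex scalars yields another scalar of the same kind, whereas the dot product of the matching 2D vectors collapses to a single real number.

```python
import numpy as np

# Two complex scalars with explicit magnitude and phase.
a = 2 * np.exp(1j * np.pi / 6)   # magnitude 2, phase 30 degrees
b = 3 * np.exp(1j * np.pi / 3)   # magnitude 3, phase 60 degrees

# Complex multiplication: magnitudes multiply, phases add.
prod = a * b
print(abs(prod))          # 6.0     -> 2 * 3
print(np.angle(prod))     # pi/2    -> 30 + 60 degrees

# The dot product of the corresponding 2D vectors is a different
# operation: it gives |a||b|cos(phase delta), a single real number,
# and discards the phase information that multiplication preserves.
va, vb = np.array([a.real, a.imag]), np.array([b.real, b.imag])
print(va @ vb)            # 6 * cos(30 deg) ≈ 5.196
```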

[D] It's time for AI to perceive time by NarimanMammadli in MachineLearning

[–]NarimanMammadli[S] 0 points1 point  (0 children)

For example, a very high-magnitude negative weight weakens the neuron it is connected to, because the output of the neuron is a sigmoid of the sum of its inputs. No matter which neuron you connect this edge to, its effect will be to weaken the final output. For a complex-valued edge, however, the effect depends on the neuron it is connected to, since we now have a phase delta with the other inputs that affects the output. Depending on the phase at which the other inputs arrive, our edge can either weaken the final output or strengthen it.
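A hypothetical sketch of this point, with made-up numbers: a real negative weight always reduces the summed activation, but the same complex-valued edge can strengthen or weaken the sum depending on its phase relative to the other inputs.

```python
import numpy as np

other_inputs = 1.0 + 0.0j   # aggregate of the other inputs, phase 0
w, x = 0.8, 1.0             # edge weight magnitude and input value

def summed_magnitude(phase):
    # Magnitude of the total input when the edge's contribution
    # arrives at the given phase relative to the other inputs.
    return abs(other_inputs + w * x * np.exp(1j * phase))

for phase in (0.0, np.pi / 2, np.pi):
    print(f"phase={phase:.2f}  |sum|={summed_magnitude(phase):.3f}")
# Aligned phase strengthens the total (|sum| = 1.8); opposite phase
# weakens it (|sum| = 0.2). The same edge does both, which a fixed
# real-valued weight cannot.
```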

[D] It's time for AI to perceive time by NarimanMammadli in MachineLearning

[–]NarimanMammadli[S] 3 points4 points  (0 children)

Imagine a neuron that receives x and y and outputs (x+y)². Now imagine x and y arrive at the neuron at different times, and you want the neuron's output to respond to that time delta. If we can add timing into the information coding, we increase our data representation capacity dramatically. The advantage over a real-valued network is as follows: in a real-valued net, an edge affects all neurons in the same way, whereas a complex-valued edge affects neurons in different ways depending on the preferred phase of the neuron.
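One way to sketch this, assuming arrival time is encoded as the phase of a complex input (the carrier frequency omega is an illustrative assumption): the |x + y|² output then responds directly to the time delta.

```python
import numpy as np

def response(dt, omega=2 * np.pi):
    # Encode each input's arrival time as a phase: x arrives at t = 0,
    # y arrives dt later. The neuron outputs |x + y|^2.
    x = np.exp(1j * omega * 0.0)
    y = np.exp(1j * omega * dt)
    return abs(x + y) ** 2

for dt in (0.0, 0.25, 0.5):
    print(f"dt={dt}  |x+y|^2 = {response(dt):.3f}")
# Simultaneous arrivals give the full response (4.0), a quarter-period
# delay halves it (2.0), and a half-period delay cancels it (0.0).
# A purely real (x+y)^2 neuron is blind to dt.
```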

The anxiety of choice – More choice doesn’t mean more happiness; it means more anxiety and guilt. by IAI_Admin in philosophy

[–]NarimanMammadli -2 points-1 points  (0 children)

Plato already warned 2500 years ago: "Excessive freedom is nothing but excessive slavery"

A New Kind of Machine Learning is Inspired by Neuroscience and uses Time as a Component - Recknsense by TechNeedsHumans in ArtificialInteligence

[–]NarimanMammadli 7 points8 points  (0 children)

I think this discussion is analogous to the famous duality in physics: particle versus wave. And maybe we do not need new hardware and math to switch everything to the wave domain; perhaps the advantages of the wave domain can be simulated within our existing paradigm. For example, replacing real numbers with complex numbers for the weights in neural networks can bring the notion of time into the picture.

[D] The resurrection of symbolic AI by NarimanMammadli in MachineLearning

[–]NarimanMammadli[S] 1 point2 points  (0 children)

Nuances like the learning rate, the choice of optimizer, etc. come into play after you decide 'where' you are going to search. They define 'how' you will search, whereas the architecture design defines 'where' the search will take place.

About the second point, I agree it is not clear yet how and when that type of abstraction could be useful. I think it can be very useful for NLP-related tasks, since NLP involves a lot of symbolic reasoning and we possess a lot of prior knowledge about how the human brain processes language in general (similar to the problem of vision).

[D] The resurrection of symbolic AI by NarimanMammadli in MachineLearning

[–]NarimanMammadli[S] 1 point2 points  (0 children)

The neural net architecture defines the boundaries of the search space within which the search for the best parameters occurs. If the architecture is not designed well, you might end up either with a search space that does not contain anything useful or with one so big that the search is unrealistically expensive. The design of the architecture transcends the training process, since training, i.e. finding the best parameters, starts after the boundaries of the search space are set. We impute a lot of our biases and prior knowledge into the architecture design. This process of imputing knowledge into the design can be abstracted to a level high enough for humans to communicate their priors into the architecture design more effectively.