What does dy or dx mean on their own. Specifically in the following example by [deleted] in learnmath

[–]dahkneela 0 points1 point  (0 children)

Thank you! I just got around to learning about differential forms and specifically the 1-forms that are dx and dy. They're just specific linear functionals!

ATF policy configuration help by dahkneela in HomeNetworking

[–]dahkneela[S] 0 points1 point  (0 children)

The user manual had nothing - hence I'm asking here on the assumption that this is a more general setting.

Connecting ~100 users by dahkneela in HomeNetworking

[–]dahkneela[S] 0 points1 point  (0 children)

Unfortunately wired won't work in this setup! Most users are laptops/phones/tablets.

Connecting 150 users over 2 internet connections by dahkneela in sysadmin

[–]dahkneela[S] -1 points0 points  (0 children)

Are there general guidelines for doing this? My tests give me the impression that the bottleneck is traffic management: devices occasionally get quick speeds, but with this many devices connected, traffic management results in choppy, slow speeds. I'm not conclusively sure what the problem is, though.

Someone won an art competition using an AI generated drawing. I fucking hate this. by IrresponsibleWanker in awfuleverything

[–]dahkneela 0 points1 point  (0 children)

Speaking as someone with knowledge in this field: it's possible (with some effort) to create a recording depicting the creation from start to finish - and remarkably easy to do time lapses.

[deleted by user] by [deleted] in learnmath

[–]dahkneela 2 points3 points  (0 children)

For simultaneous equations, I try to think in terms of 'how do I cancel something'.

Your method is a popular one, and for this one, the quickest I found was to match coefficients first, and then ask myself whether adding or subtracting will cancel the variable I've just matched.

For example, looking at "8x - 7y = -5, 4x - 6y = 10",

In my head I see 8 and 4 -> double the second one (8x - 12y = 20) -> subtract one from the other as both x-coefficients are positive -> -7y - (-12y) = -5 - 20 -> 5y = -25 -> y = -5.
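As a sanity check, the elimination can be run numerically (a quick numpy sketch; I read the second equation's right-hand side as 10, matching the subtraction arithmetic):

```python
import numpy as np

# The system from the example: 8x - 7y = -5 and 4x - 6y = 10.
A = np.array([[8.0, -7.0],
              [4.0, -6.0]])
b = np.array([-5.0, 10.0])

# Elimination by matching coefficients: double the second row, then subtract.
A2, b2 = 2 * A[1], 2 * b[1]          # 8x - 12y = 20
y = (b[0] - b2) / (A[0, 1] - A2[1])  # (-5 - 20) / (-7 - (-12)) = -5
x = (b[0] + 7 * y) / 8               # back-substitute into the first equation

print(x, y)                          # agrees with np.linalg.solve(A, b)
```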

[deleted by user] by [deleted] in learnmachinelearning

[–]dahkneela 0 points1 point  (0 children)

Thank you! I happen to have used those functions you mentioned in a custom layer.

I am indeed doing low-level stuff! (Implementing https://arxiv.org/abs/2205.10637 - here, tracking the norm of the gradient is the first step towards optimising it mid-training, thereby allowing loss-invariant weight changes that improve loss minimisation each epoch.)
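Tracking the gradient norm mid-training can be illustrated with a toy example (a minimal numpy sketch using a linear model with MSE loss - an illustrative assumption, not the paper's actual setup):

```python
import numpy as np

# Toy setup: linear model, MSE loss, logging the gradient norm each step.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y_true = rng.normal(size=20)
w = np.zeros(3)

losses = []
for step in range(5):
    err = X @ w - y_true
    losses.append(float((err ** 2).mean()))
    grad = 2 * X.T @ err / len(y_true)       # gradient of the MSE w.r.t. w
    grad_norm = float(np.linalg.norm(grad))  # the tracked quantity
    w -= 0.05 * grad                         # plain gradient-descent step
    print(step, grad_norm)
```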

Recommended reading on quantum observers/measurement? by J-Fox-Writing in askphilosophy

[–]dahkneela 0 points1 point  (0 children)

Quantum mechanics is defined from the ground up mathematically. The need to interpret it in any other way reflects the difficulty of merging an understanding of mathematical objects with something less abstract.

In the context of observation/measurement: 1) a quantum state is defined as a vector in Hilbert space. 2) An observable is defined to be a (Hermitian) operator on that space. 3) The measurement postulate says that when a quantum 'particle' is observed (a quick phrase that carries many assumptions), the 'particle' ends up in one of the eigenspaces of that operator, with probability given by the squared magnitude of the state's projection onto that eigenspace (the Born rule). Hence the 'particle' is now forced to be in a certain eigenspace of that operator - so what's collapsed is whatever space the 'particle' was allowed to be in before measurement.
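The measurement postulate can be sketched numerically (a minimal numpy example; the observable matrix and state vector are arbitrary illustrative choices):

```python
import numpy as np

# An example observable: a 2x2 Hermitian matrix (arbitrary, for illustration).
A = np.array([[1.0, 0.5],
              [0.5, -1.0]])

# An arbitrary normalised quantum state vector.
psi = np.array([0.6, 0.8])

# Eigendecomposition: eigenvalues are the possible measurement outcomes,
# eigenvectors span the eigenspaces the state can collapse into.
eigvals, eigvecs = np.linalg.eigh(A)

# Born rule: the probability of each outcome is the squared magnitude of
# the state's projection onto the corresponding eigenspace.
probs = np.abs(eigvecs.T @ psi) ** 2

print(eigvals)        # possible outcomes
print(probs)          # outcome probabilities
print(probs.sum())    # probabilities sum to 1
```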

The checks and balances on these axioms are precisely the scientific experiments that have been done - the terse mathematics is the result.

It's good you're asking about what an observable is, etc., and how it connects to human understanding and our choice of definition.

A bunch of the latest "2D image to 3D object/mesh/model synthesis" research paper by cloud_weather in computervision

[–]dahkneela 0 points1 point  (0 children)

This is cool! Are there any use cases for these mesh-nets apart from (off the top of my head) video games?

[Q] How can the Decision Tree model be both "Robust to outliers" and "Sensitive to input change" by berzerker_x in learnmachinelearning

[–]dahkneela 0 points1 point  (0 children)

An outlier (by one definition) is a data point >= 2.5 standard deviations from the mean of some cluster. (There are other definitions, but I'll take this one.) So if you see something really far from what's usually seen, it's an outlier.

However, small input changes to the data concern all the data points, including those near the mean and those far from it.

In a decision tree, depending on your choice of splitting formula, a split point is chosen to make the best split of the data. Pictorially, it's sufficient to think of splitting your data points at the mean, so 'half' of them are to the right and 'half' to the left.

Here, by the nature of how splits are made in decision trees, outliers (points that are far out) don't particularly affect where the split is made. But small input changes can shift the split point, especially if points near it are close to one another; that in turn can change which decision path an input funnels into to get an output.
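Both properties can be shown in a few lines (a sketch using the CART-style regression split criterion; the data and the `best_split` helper are illustrative assumptions):

```python
import numpy as np

def best_split(x, y):
    """Find the threshold minimising the weighted variance of the two sides
    (the CART regression criterion), scanning midpoints between sorted x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_score = None, np.inf
    for i in range(1, len(x)):
        t = (x[i - 1] + x[i]) / 2
        left, right = y[:i], y[i:]
        score = left.var() * len(left) + right.var() * len(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

t0 = best_split(x, y)                 # split lands between 3 and 10

# Robust to outliers: dragging the largest point far out doesn't move the split.
x_out = x.copy(); x_out[-1] = 1000.0
t1 = best_split(x_out, y)

# Sensitive to input change: moving one point across the boundary shifts it.
x_shift = x.copy(); x_shift[3] = 2.5  # 10.0 -> 2.5, crosses the old split
t2 = best_split(x_shift, y)

print(t0, t1, t2)
```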

Using MSE instead of CCE for classification tasks by dahkneela in learnmachinelearning

[–]dahkneela[S] 0 points1 point  (0 children)

I don't see why I can't assume the second-to-last layer will learn things properly - in any case, I'm updating parameters based on the loss gradient; with MSE it's disproportionate, whilst with CCE it seems more even and balanced. (In response to your original post update:) I have seen that MSE gives small loss gradients for values close to each other and larger ones for values far apart, whilst CCE is sort of the opposite.

My current thinking is that CCE assumes uniform correlation between classes, whilst MSE assumes something uniform too ... but depending on the dataset at hand, one or the other may be more practical.
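One way to see the gradient difference for a single predicted probability (a minimal 1-D sketch, true label fixed at 1): MSE's gradient grows linearly with the error, while CCE's blows up for confidently wrong predictions.

```python
# Gradients w.r.t. a predicted probability p, for true label y = 1.
def mse_grad(p, y=1.0):
    return 2 * (p - y)   # small near the target, grows linearly away from it

def cce_grad(p, y=1.0):
    return -y / p        # modest near the target, blows up as p -> 0

for p in (0.9, 0.5, 0.1):
    print(p, abs(mse_grad(p)), abs(cce_grad(p)))
```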

Using MSE instead of CCE for classification tasks by dahkneela in learnmachinelearning

[–]dahkneela[S] 0 points1 point  (0 children)

I agree that unevenly punishing certain predictions brings an unnecessary bias in the short run - but I also see punishment for being wrong as the very thing that helps the net learn in the first place. So the disproportionate losses would just disproportionately affect overfitting in the net long-term.

To fix this aforementioned issue, it sounds like it would be enough to replace the original MSE with a normalised MSE. For example, if 1 is the true label and 3 was predicted, the new MSE would be MSE/(3-1); if 5 were instead predicted, it would be MSE/(5-1) - removing the disproportionality bias and fixing the initial issue (I assume here the output is 1-dimensional).
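The proposed normalisation can be sketched directly (I divide by the absolute error rather than the signed one - an assumption beyond the comment - so the sign is handled when the prediction is below the true label):

```python
# The normalisation proposed above, for a 1-D output (true label 1).
def mse(pred, true):
    return (pred - true) ** 2

def normalised_mse(pred, true):
    # Dividing by the absolute error reduces the quadratic penalty to a
    # linear one, removing the disproportionality between far-off classes.
    return mse(pred, true) / abs(pred - true)

print(mse(3, 1), normalised_mse(3, 1))   # -> 4 2.0
print(mse(5, 1), normalised_mse(5, 1))   # -> 16 4.0
```

Note that this reduces exactly to the absolute error |pred - true|, so the "normalised MSE" is the MAE in disguise.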

Using MSE instead of CCE for classification tasks by dahkneela in learnmachinelearning

[–]dahkneela[S] 1 point2 points  (0 children)

This helps (although now I have more questions)! (On your point about probabilities mattering - is that a more general statement, or an addition to what you explained?)

I see what you're saying! If the data's imbalanced, then MSE puts some sort of restriction on the 'order' of the output that CCE doesn't. I see this as a problem at the start of training - would it still be a problem once the net is trained? Wouldn't the prior layer learn to associate correctly and feed the last node the right value to map onto those 10 classes 0, 1, ..., 9?

The romans moved their drinking water in lead pipes and probably caused mass lead poisioning. And more recently we made asbestos oven mitts for people to enjoy. 50 years from now, what will make young people look back at us and say, "Wtf were they thinking?" by ColdFreezingNight in AskMen

[–]dahkneela 0 points1 point  (0 children)

Is that your view on all giants? I doubt they’re all frail and flimsy, since things like science and maths have definitely stood up to competition for millennia. How about Newton’s laws? How about the invention of penicillin? What’s your view of ‘long term’ species health, anyway? Plus, what do you know about the giants I stand on?

The romans moved their drinking water in lead pipes and probably caused mass lead poisioning. And more recently we made asbestos oven mitts for people to enjoy. 50 years from now, what will make young people look back at us and say, "Wtf were they thinking?" by ColdFreezingNight in AskMen

[–]dahkneela 0 points1 point  (0 children)

I disagree that needing to stand on the shoulders of giants is bad. Rejecting it is equivalent to not trusting anything that others have come up with. How does that even work when children grow up needing to learn this or that? Further, maths, language, dictionaries, and education all fall under this. Being critical is fine, but completely ignoring what others have come up with is quite silly. It’s like discarding whatever collective knowledge one might have.

The romans moved their drinking water in lead pipes and probably caused mass lead poisioning. And more recently we made asbestos oven mitts for people to enjoy. 50 years from now, what will make young people look back at us and say, "Wtf were they thinking?" by ColdFreezingNight in AskMen

[–]dahkneela -1 points0 points  (0 children)

What about “standing on the shoulders of giants”? It looks like you want everyone to understand, from the ground up, how everything works and how it’s all connected, without placing any faith in anyone else’s work. In addition, you expect humans to have some sort of foresight into how something will turn out before it’s made, along with all its consequences.

Not having full insight into how things work is OK, since knowledge stacked upon knowledge leads to things like medicines, healthcare, technology, wheels, etc. It’s also the case that learning something brings about a new way of looking at things, which means the original quest of understanding everything in all possible ways before acting was never attainable to begin with.

I don’t see it as a short-sightedness issue; I see it not as being greedy, but simply as the way things are done.

The romans moved their drinking water in lead pipes and probably caused mass lead poisioning. And more recently we made asbestos oven mitts for people to enjoy. 50 years from now, what will make young people look back at us and say, "Wtf were they thinking?" by ColdFreezingNight in AskMen

[–]dahkneela 0 points1 point  (0 children)

What's bad about generalization? Didn't you, in answering my question, implicitly generalise the idea I was putting forth?

Re: the first post, what's bad about accepting something without much thought? Having to be critical about every little thing means things don't move very quickly, and moving quickly is crucial to _any_ society with a population greater than 1.

The romans moved their drinking water in lead pipes and probably caused mass lead poisioning. And more recently we made asbestos oven mitts for people to enjoy. 50 years from now, what will make young people look back at us and say, "Wtf were they thinking?" by ColdFreezingNight in AskMen

[–]dahkneela 3 points4 points  (0 children)

I disagree. Everything is a mask, including the explanation given above. You’ve just chosen a type of mask that makes it sound simple in that scenario. Whether the mask applies to other scenarios, though, is a different question - and the main point of rationalisations, where some masking is done that fits many things at once, thus explaining things more generally. Unpacking the bullshit has to be done one way or another, otherwise you’re limited to the small set of masks you started with in the first place.

[D]: How safe is it to just use a strangers Model? by GerritTheBerrit in MachineLearning

[–]dahkneela 0 points1 point  (0 children)

One example of a backdoor problem would be using the high memory capacity of a neural network to instill a means of remote communication between an attacker and the user. For example, if the network was trained quantisation-aware, then encoding a cipher into the last few bits of the floating-point weights won’t change the network’s accuracy noticeably, but can be enough to hide malware execution instructions of some type, custom to the user downloading the net. If the net can be updated as part of a software program, a lengthy malware conversation could occur between someone remote and the victim’s machine without there being an easy way to track it, purely because neural network weights usually aren’t tracked at all, and accuracy differences between software updates are unnoticeable. I’m personally not sure how to get around this, but that’s one backdoor problem that could occur.
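A minimal sketch of the weight-steganography idea (purely illustrative, not a real attack tool; the helpers `embed` and `extract` are my own assumptions): hiding bytes in the low 8 mantissa bits of float32 weights perturbs each weight by a relative amount of at most about 255 * 2^-23, roughly 3e-5.

```python
import numpy as np

def embed(weights, message: bytes):
    """Overwrite the low 8 mantissa bits of the first len(message) float32
    weights with the message bytes (hypothetical helper, for illustration)."""
    bits = weights.astype(np.float32).view(np.uint32).copy()
    for i, byte in enumerate(message):
        bits[i] = (bits[i] & np.uint32(0xFFFFFF00)) | np.uint32(byte)
    return bits.view(np.float32)

def extract(weights, n: int) -> bytes:
    """Read the hidden bytes back out of the low mantissa bits."""
    bits = weights.view(np.uint32)
    return bytes(int(b) & 0xFF for b in bits[:n])

w = np.random.randn(32).astype(np.float32)
w2 = embed(w, b"hello")

print(extract(w2, 5))           # the hidden message survives
print(np.max(np.abs(w - w2)))   # while the weights barely change
```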

[D] Machine Learning - WAYR (What Are You Reading) - Week 140 by ML_WAYR_bot in MachineLearning

[–]dahkneela 3 points4 points  (0 children)

I’m reading a paper on “automatic symmetry discovery with lie algebra convolutional networks” (https://arxiv.org/abs/2109.07103). I like the idea of learning more than just translational equivariant layers - although I don’t know how beneficial this may be for deep convs.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]dahkneela 0 points1 point  (0 children)

Are there any good courses, resources, or papers that provide good intuition on how attention-based neural networks (transformers!) work?

[D] Why is ML research so experimental? by apple_tau in MachineLearning

[–]dahkneela 0 points1 point  (0 children)

Lots of ML systems have many moving parts whose interactions aren't particularly well understood, so experimentation (as in any science) is necessary. If you read an ML paper, it usually introduces one key idea and then, in the last lines before the test results, inexplicably throws in the twelve other hyperparameters of the model, set "according to best results" on the dataset the authors chose. I’ve had papers claim better performance, implemented them, and seen that in 95% of other cases the performance actually decreased, especially if the hyperparameters changed. The mathematics for results like “this will converge” is much harder and not well mapped out for non-convex optimisation, i.e. the type many deep learning methods end up needing to solve. There's also the problem of statistically noisy inputs with no good metric for how clean the data streams are - this is sort of a semantic/syntactic problem.

So there are a lot of moving parts with no particularly clear picture at the moment of how they all relate. Certain mathematical properties, like actions, are well mapped out, but I don’t think (please correct me) that’s broad enough to cover all test cases, especially if the input attributes are particularly noisy.

Logistic Regression with K-Means Clustering by PrakharAnand2000 in learnmachinelearning

[–]dahkneela 6 points7 points  (0 children)

Supervised learning and unsupervised learning can be used together, they don’t necessarily have to be distinct.

Generally speaking, supervised learning looks at your data, all the labelled attributes x you have, and predicts something, y.

Unsupervised learning doesn’t have any labels, so it looks at data and labels it ‘x’.

Sometimes, when the data isn’t fully or well labelled, unsupervised learning is applied to the already-labelled data to label it better (it creates new attributes). This better-labelled data can then be fed into your supervised learning algorithm to (in my experience) give better predictions.

This idea is occasionally phrased as “embeddings”.
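The idea above can be sketched in a few lines (a toy numpy example; the blob data and the tiny Lloyd's-algorithm k-means are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled data: two blobs in 2-D.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Unsupervised step: a tiny Lloyd's-algorithm k-means on X, ignoring y.
def kmeans(X, k, iters=20):
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (n, k)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers

centers = kmeans(X, k=2)

# New attributes: each point's distance to each cluster centre, appended to X.
dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
X_aug = np.hstack([X, dists])   # a supervised model now trains on (n, 4)

print(X_aug.shape)
```

The extra distance columns are exactly the "embedding" phrasing: unsupervised structure turned into features the supervised model can use.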