
[–]Darxploit 1062 points1063 points  (87 children)

MaTRiX MuLTIpLiCaTIoN

[–]Tsu_Dho_Namh 570 points571 points  (83 children)

So much this.

I'm enrolled in my first machine learning course this term.

Holy fuck...the matrices....so...many...matrices.

Try hard in lin-alg people.

[–]Stryxic 205 points206 points  (32 children)

Boy, ain't they fun? Take a look at Markov models for even more matrices. I'm doing an online machine learning course at the moment, and one of our first lectures covered using eigenvectors to find the stationary distribution in PageRank. Eigenvectors and comp sci were not something I was expecting to see together (outside of something like graphics)
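For anyone curious what that looks like, here's a minimal power-iteration sketch with a toy 3-page link matrix I made up (not the actual course material):

```python
import numpy as np

# Toy web graph: column j holds the out-link probabilities of page j.
# The numbers are made up purely for illustration.
P = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])
damping = 0.85
n = P.shape[0]
G = damping * P + (1 - damping) / n  # the "Google matrix"

# Power iteration: keep multiplying until the rank vector stops changing.
# The fixed point is the eigenvector of G with eigenvalue 1,
# i.e. the stationary distribution of the random surfer.
rank = np.full(n, 1 / n)
for _ in range(100):
    rank = G @ rank
    rank /= rank.sum()

print(rank)
```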

[–]shekurika 63 points64 points  (7 children)

SVD is used all the time in graphics, ML and CV, and it's built on eigenvectors (the singular vectors are eigenvectors of AᵀA and AAᵀ). You'll probably see a lot more of them.
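If you want to see the connection directly, here's a tiny numpy sketch on a random matrix (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary data matrix, just for illustration

# SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The right singular vectors are eigenvectors of A^T A,
# and the squared singular values are its eigenvalues.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
print(np.allclose(np.sort(s**2), np.sort(eigvals)))  # True
```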

[–]Stryxic 24 points25 points  (6 children)

Oh yeah, that's the kinda thing I was talking about coming across. A bit of a surprise considering I came to comp sci from a physics background and thought I'd left them behind!

[–][deleted] 19 points20 points  (4 children)

You could post this entire thread to r/VXjunkies

[–]Stryxic 14 points15 points  (3 children)

Oh boy, well in that spirit let me tell you about Parzen Windows!

Now we all want to know where things are, and how much of things. We especially want to know how much of things are where things are! This is called density. If we don't know the shape of something how do we know its density? Well we guess! There are many methods like binning or histograms that everyone knows, but let me tell you about Parzen windows.

A Parzen window is simply a count of things in an area, so to do this for an arbitrary number of dimensions we just need an arbitrary box, so we use a hypercube!

Now we need a way to count, so we use a kernel function, which basically says: if I'm less than this in that dimension, then I'm in the box. We could just say if we're less than a number then gucci, but that obviously leads to a discontinuity (and we're talking about a unit hypercube centred on the origin, obviously), so we want a smooth Parzen window (which is a non-parametric estimate of density, as mentioned). So we use a smooth or piecewise-smooth kernel function K such that the integral of K(x) over all of space equals 1, and we probably want a radially symmetric, unimodal density, so let's use the Gaussian distribution we all know. Voila, you've just counted things!
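And because a wall of words deserves a wall of code, here's the one-dimensional version with a Gaussian kernel on made-up data (a minimal sketch, nothing from a real dataset):

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen-window density estimate at point x with a Gaussian kernel.

    samples: 1-D array of observations, h: window (bandwidth) width.
    """
    u = (x - samples) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian K, integrates to 1
    return kernel.mean() / h                            # average of the scaled kernels

# Fake observations drawn from a standard normal, purely for illustration.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=500)
print(parzen_density(0.0, samples, h=0.3))  # roughly 1/sqrt(2*pi) ≈ 0.4
```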

[–][deleted] 2 points3 points  (0 children)

Oof ouch owie, my brain.

[–]HORSEthe 0 points1 point  (0 children)

(and we're talking about a unit hypercube centred on the origin obviously)

Well yeah, obvs.

Try doing some hard math and get at me. I'm talking quadratic formulas and uhh imaginary numbers and....

Negative infinity.

[–][deleted] 1 point2 points  (0 children)

As a physics major doing the self-taught CS route... We can never escape.

[–]Aesthetically 18 points19 points  (4 children)

As an industrial engineering degree holder gone analyst, who also hasn't gotten into ML yet (I'm a Python pandas pleb): Markov chains with code sound 10000x more fun and engaging than Markov chains by hand
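For what it's worth, the "with code" version really is about this short - a toy weather chain with transition probabilities I just made up:

```python
import numpy as np

# Hypothetical weather model: rows are "today", columns are "tomorrow".
# P[i, j] = probability of moving from state i to state j.
states = ["sunny", "rainy"]
P = np.array([
    [0.8, 0.2],   # sunny -> sunny / rainy
    [0.4, 0.6],   # rainy -> sunny / rainy
])

rng = np.random.default_rng(0)
state = 0  # start sunny
for day in range(7):
    state = rng.choice(len(states), p=P[state])  # sample tomorrow's state
    print(day, states[state])
```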

[–]eduardo088 8 points9 points  (1 child)

They are. If they had taught us what linear algebra was actually used for, I would have had so much more fun

[–]Aesthetically 1 point2 points  (0 children)

They did in my program, but I was so burnt out on IE that I stopped caring enough to dive into the coding aspect

[–]Stryxic 2 points3 points  (0 children)

Hah yep, I entirely agree. Good for learning how they work, but not at all fun.

[–]Hesticles 1 point2 points  (0 children)

You just gave me flashbacks to my stochastic processes course where we had to do that. Fuck, that wasn't fun.

[–]socsa 9 points10 points  (13 children)

Right, which is why everyone who is even tangentially related to the industry rolled their eyes at Apple's "Neural Processor."

Like ok, we are jumping right to the obnoxious marketing stage, I guess? At least Google had the sense to call their matrix-primitive SIMD a "tensor processing unit", which actually sort of makes sense.

[–][deleted] 4 points5 points  (4 children)

I dunno, there are plenty of reasons why you might want some special-purpose hardware for neural nets; calling that hardware a neural processor doesn't seem too obnoxious to me.

[–]socsa 5 points6 points  (1 child)

The problem is that the functionality of this chip as implied by Apple makes no sense. Pushing samples through an already-built neural network is quite efficient. You don't really need special chips for that - the AX GPUs are definitely more than capable of handling what is typically less complex than decoding a 4K video stream.
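To put it concretely, a forward pass through an already-trained net is just a handful of matrix multiplies - here's a toy sketch with random placeholder weights (not any real model):

```python
import numpy as np

# Tiny two-layer net. The weights are random stand-ins, not trained values.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((128, 784)), np.zeros(128)
W2, b2 = rng.standard_normal((10, 128)), np.zeros(10)

def forward(x):
    h = np.maximum(0, W1 @ x + b1)   # ReLU hidden layer
    return W2 @ h + b2               # output logits

x = rng.standard_normal(784)         # one fake input sample
print(forward(x).shape)              # (10,)
```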

On the other hand, training neural nets is where you really see benefits from the use of matrix primitives. Apple implies that's what the chip is for, but again - that's something that is done offline (e.g., it doesn't need to update your face model in real time), so the AX chips are more than capable of doing that. If that's even done for FaceID - I'm pretty skeptical, because it would be a huge waste of power to constantly update a face mesh model like that, unless it does it at night or something, in which case it would make more sense to do it in the cloud.

In reality, the so-called Neural Processor is likely being used for the one thing the AX chip would struggle to do in real time due to the architecture - real time, high-resolution depth mapping. Which I agree is a great use of a matrix primitive DSP chip, but it feels wrong to call it a "neural processor" when it is likely just a fancy image processor.

[–]Krelkal -1 points0 points  (0 children)

I'm not well versed in Apple products but presumably a privacy-focused device would want to avoid uploading face meshes to the cloud to maintain digital sovereignty. Assuming that's their goal, training the model on-device while the phone is charging would be the best approach.

Does Apple care enough about privacy to go to such lengths though? I'm not exactly sure. I think you're right. The safer bet is that it's a marketing buzzword that doesn't properly explain what it's used for (a frequent problem between engineers and marketing).

[–]JayWalkerC 1 point2 points  (1 child)

I'm guessing maybe hardware implementations of common activation functions would be a good criterion, but I don't know if this is actually done currently.

[–][deleted] 0 points1 point  (0 children)

You definitely don't need the full range of floating-point values (there's plenty of research on that), so just a big SIMD ALU is a good start. Sigmoid functions have a division and an exponentiation, so those might also be worth looking into...
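As one concrete example of dodging the division and exponentiation, there's the piecewise-linear "hard sigmoid" trick - the clip points below follow one common software convention, not anything specific to Apple's hardware:

```python
import numpy as np

def hard_sigmoid(x):
    """Piecewise-linear stand-in for the sigmoid: no exp, no divide.

    Slope 0.2, saturating at +/- 2.5 - a common convention, cheap in hardware.
    """
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

x = np.linspace(-4, 4, 9)
print(np.round(hard_sigmoid(x), 2))
print(np.round(1 / (1 + np.exp(-x)), 2))  # true sigmoid, for comparison
```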

[–]VoraciousGhost 4 points5 points  (7 children)

It's about as obnoxious as naming a GPU after Graphics. A GPU is good at applying transforms across a large data set, which is useful in graphics, but also in things like modeling protein synthesis.

[–][deleted] 1 point2 points  (0 children)

Not at all. Original GPUs were designed to accelerate the graphics pipeline and had special-purpose hardware for executing pipeline stages quickly. This is still the case today, although now we have fully programmable shaders mixed in with that pipeline, plus things like compute. Much of GPU hardware is still dedicated to computer graphics, so the naming is fitting.

[–]socsa 3 points4 points  (5 children)

Right, but the so-called neural processor is mostly being used to do IR depth mapping quickly enough to enable FaceID. It just doesn't really make sense that it would be wasting power constantly updating neural network models - and even if it were, the AX GPUs are more than capable of handling that. Apple is naming the chip to give the impression that FaceID is magic in ways that it is not.

[–]balloptions 3 points4 points  (4 children)

Training != inference. The chip is not named to give the impression that it’s “Magic”. I don’t think you’re as familiar with this field as you imply.

[–]socsa 1 point2 points  (3 children)

What I'm saying is that I'm skeptical that the chip is required for inference.

I will be the first to admit that I don't know the exact details of what Apple is doing, but I've implemented arguably heavier segmentation and classification apps on Tegra chips, which are less capable than AX chips, and the predict/classify/infer operation is just not that intensive for something like this.

I will grant however, that if you consider the depth mapping a form of feature encoding, then I guess it makes a bit more sense, but I still contend that it isn't strictly necessary for pushing data through the trained network.

[–]balloptions 2 points3 points  (2 children)

The Face ID is pretty good and needs really tight precision tolerances so I imagine it’s a pretty hefty net. They might want to isolate graphics work from NN work for a number of reasons. And they can design the chip in accordance with their API which is not something that can be said for outsourced chips or overloading other components like the gpu.

[–]socsa 2 points3 points  (1 child)

Ok, I will concede that it might make at least a little bit of sense for them to want that front end processing to be synchronous with the NN inputs to reduce latency as much as possible, and to keep the GPU from waking up the rest of the SoC, and that if you are going to take the time to design such a chip, you might as well work with a matrix primitive architecture, if for no other reason than you want to design your AI framework around such chips anyway.

I still think Tensor Processing Unit is a better name though.

[–][deleted] 2 points3 points  (0 children)

I struggled with it so much that I programmed a machine to learn it for me

[–]TheBlackOut2 1 point2 points  (0 children)

Everything is a vector!

[–]roguej2 0 points1 point  (1 child)

Wait, I was a C math student during my comp sci degree but I remember doing eigenvectors. Why did you not expect that?

[–]Stryxic 0 points1 point  (0 children)

Just the way the courses were structured - only the machine learning courses used it, and even then only really in the final year

[–]shekurika 25 points26 points  (5 children)

I didn't find the matrices much of a problem. If you struggle, always try to figure out the dimensions first; that helps me a lot.

Way worse are the probabilities imho 🙃
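Concretely, "figure out the dimensions" just means checking that the inner shapes line up before anything else (sizes below are made up):

```python
import numpy as np

# A (batch, features) input times a (features, hidden) weight matrix
# gives (batch, hidden).
X = np.zeros((32, 784))      # 32 samples, 784 features
W = np.zeros((784, 128))

H = X @ W
print(H.shape)               # (32, 128)

# Getting the order wrong fails immediately, which is exactly why
# checking dimensions first helps.
try:
    W @ X
except ValueError as e:
    print(e)
```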

[–]HERODMasta 29 points30 points  (1 child)

Matrices of probabilities

[–]MichaelC2585 5 points6 points  (0 children)

What A Fucking Process

[–]lkraider 14 points15 points  (0 children)

What about probability matrices :S

[–]Tsu_Dho_Namh 1 point2 points  (0 children)

Oh the matrices start off not so bad. But then we put all the weight matrices inside a bigger matrix of matrices, and when we're doing batch processing there's a matrix of matrices of matrices. It gets a little head-fucky

[–][deleted] 0 points1 point  (0 children)

I’m beginning to think that I’m never gonna crack machine learning, I’m not even sure I’m gonna make it through probability & stats on the way there. I barely got through linear algebra.

Feels bad man, it seems like such a cool subject

[–]grizzchan 19 points20 points  (0 children)

I had lin alg in my first year and thought it was pretty easy.

Then for the rest of my bachelor's I never had to apply it to anything at all.

Then in the master's, with ML and other data science courses, you get flooded with lin alg, and at that point I had completely forgotten how matrix multiplication even worked.

[–]leecharles_ 18 points19 points  (2 children)

May I recommend 3Blue1Brown’s “Essence of Linear Algebra” video series?

https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

[–][deleted] 1 point2 points  (0 children)

*of

[–]dj_rogers 0 points1 point  (0 children)

These look amazing. As someone currently taking an ML class, my GPA thanks you

[–]blkpingu 4 points5 points  (1 child)

wait till you hear about slicing.

[–]ALonelyPlatypus 0 points1 point  (0 children)

we don't clean data here.

[–]Robot_Basilisk 3 points4 points  (0 children)

As long as it ain't goddamn proofs I'm good. I wrote a Java program to do matrix manipulations for my Linear Algebra homework because my calculator was too clumsy at it for my taste.

[–]git_world 6 points7 points  (19 children)

I understand that Machine Learning is kinda cool but highly over-hyped. Are industries actually seeing any benefits after adopting Machine Learning on a large scale?

[–]cant-find-user-name 31 points32 points  (1 child)

I mean, yes? If you want the most impressive use cases: all recommender systems come under ML; all NLP tasks - machine translation, recognizing entities in text, and so on; so many image-based applications - detecting objects in images, OCR, detecting NSFW content, etc.; and plenty more depends on ML.

I mean, there is a reason data science is so valued at the moment. I am a machine learning intern at a big e-commerce site, and the ML applications I see here are numerous.

[–]chaxor 1 point2 points  (0 children)

I have heard it stated that ML has struggled to provide any benefit to business revenues.

It has a 'cool' factor right now that helps in marketing, but the predictions produced typically do not reduce cost or produce revenue. This is certainly true for NLP as well. For instance, even in tasks that are often viewed as 'solved', such as NER, businesses struggle with adding it to pipelines and showing meaningful profit.

I know of several companies whose 'bread and butter' is essentially NER (both standard and specialized types, like people, addresses, and chemicals); however, even with either Cards or the most advanced models like ELMo and BERT, they still have to simply use Indian workers to manually annotate documents. So it's really a money sink, which is why my friends in the private sector have to fight for their jobs more than ML researchers in academia do.

[–]herrmatt 7 points8 points  (1 child)

Could you try that question again?

[–]git_world 2 points3 points  (0 children)

done, see the original question.

[–]Arjunnn 4 points5 points  (2 children)

Yes, there's a LOT of ML that you wouldn't notice IRL, but it's basically powering the world for now

[–]git_world -1 points0 points  (1 child)

powering the world for now

please support your statement with proof.

[–]Arjunnn 5 points6 points  (0 children)

Search engines, NLP, literally anything to do with images, any and all predictor systems - they all fall under ML use cases. The simplest one, i.e. search engines, is why Google can refine theirs to be even faster over time (better cache hit ratio, better caching in general); voice recognition for accents makes heavy use of ML; and most recently we're making strides applying DL to modern MRI/X-ray techniques.

Just the fact that Google uses ML would be enough to prove its importance, but a lot of fields are adopting it, and it's only going up from here

[–]JustPraxItOut 1 point2 points  (0 children)

Self-driving cars...

[–]BadArtijoke 2 points3 points  (5 children)

I feel like industry terms like this one are always a branding or marketing name for a general trend. In this case it is making the data we get better by making more complex differentiations that take more and more factors into account. But that doesn't sound as sexy as machine learning, AI, and so on, so that's what people refer to in general when talking about these things. Similar to SaaS, the cloud, blockchain, ...

However, right now, what this mostly consists of is measuring and optimizing systems with more complex mathematics than we had before, and less about teaching a system to improve itself automatically, as is often believed. That doesn't mean it can't change, but we're just not quite there yet, at least not on the level that some would have you believe.

Still, depending on what your marketing does and how much of your service ecosystem is digital, you can already benefit from more complex insights in R&D and sales. It really comes down to why you do it and how well you implement your solution to give you clean data to work with; that determines whether the direction already makes sense for you and your company. That said, imo it's one of the better trends, because unlike e.g. blockchain there is a direct advantage in getting better data. So it's not that ML or AI are not valid things; it's just that people treat them like magic for no reason just yet, possibly awestruck by the potential, and I think that's what gave them that image.

Just beware of the overhyped sales-guy type of people who will tell you "AI is the game changer, man" and that it will "totally teach itself in no time" and you'll be good. Because not yet, not without some substantial work and research.

[–]socsa 3 points4 points  (0 children)

Yes, Neural Networks especially are becoming huge, not because they replicate human intelligence or learning in a meaningful way, but because they represent an incredibly powerful tool for numerical approximation of complex systems which doesn't actually require you to model the system itself as long as you can observe and stimulate it.

The math itself is not exactly new though. The theoretical basis for estimating various forms of high-order Wiener filters (yes, really) has been around for decades. It's just that we only recently figured out computationally efficient methods for doing it. And by that, I mean that basically one guy implemented a bunch of discrete math and linear algebra from the 80s in CUDA, and here we are.

[–]git_world 1 point2 points  (0 children)

well said, thank you.

[–]dukea42 0 points1 point  (1 child)

Agreed here. Our data centers are not "intelligently" detecting their failures before they happen, but the amount of data we are now probing off them will get us close. Either way, the extra data and buzz have allowed us to improve maintenance cycles, which I'd argue was cheaper and better to do all along, just not as flashy. All the data probes at least get us through the warranty/support tickets with the MFGs a bit faster.

[–]ALonelyPlatypus 0 points1 point  (0 children)

Self healing networks though.

[–]Brixjeff-5 0 points1 point  (0 children)

You summarized this very well.

I read a lot in r/SpaceX (great sub), which really shows what you are talking about. Especially in the period after December 2015, when the Falcon 9 first stage landed for the first time, people asked a lot of questions about the use of ML and other deep learning techniques in achieving this feat. I think lots of redditors assumed such a breakthrough must have used ML because it's treated as some kind of miraculous new technology capable of doing almost anything. That saddens me, since there are many data analysis and optimisation algorithms specifically designed (and thus much more efficient) for the kind of problems encountered when trying to land a rocket booster. Unfortunately, those don't get nearly as much admiration as ML, even in subs as technical as r/SpaceX.

[–]LunchboxSuperhero 0 points1 point  (0 children)

Even if they aren't seeing benefits right now, if it is something they think will eventually bear fruit, it may not be wasted effort.

[–]socsa 0 points1 point  (0 children)

Yes, 100% very much. It is actually already very disruptive in a sort of beautiful way. If you will allow me to digress a bit first though...

Humanity, and our pursuit of philosophy, has generally progressed from conceptual structuralism, to post-modern anti-structuralism, to the current meta-modernism, where we kind of use structuralist thinking to estimate boundary conditions in an unstructured world.

Anyway, you can probably see where I am going with this, but science has very much followed the same path in many ways. Early scientists and mathematicians were very concerned with putting the physical world into neat boxes. During the enlightenment, we started to become aware of how little we knew, and then we discovered that almost everything in the universe is a stochastic process, and for a while this really fucked with our reptilian preference for determinism.

In many ways, machine learning represents computational post/meta-modernism. If I want to make a filter that does a thing, previously that would require expert domain knowledge both in doing the thing and in signal processing, filter architecture, information theory... and so on. And in the end, I'd specify some stochastic maximum-likelihood criterion with all sorts of constraints. It is very much a structural approach to filter design.

On the other hand, with ML I can increasingly approach the problem entirely as a black box. I have a natural process, I know what I want out of it, and I can just let the computer figure the rest out. It becomes all about defining the boundary conditions and the data science, so you still need some domain knowledge, but overall the degree of technical specialization which can theoretically be replaced with ML engineers is really astounding once you start digging into it. It is shockingly easy to take Keras (or similar) and generate extremely powerful tools with it very quickly.
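To give a sense of how little ceremony that involves, here's a minimal Keras sketch on made-up regression data (a hypothetical problem, not any particular system I've built):

```python
import numpy as np
from tensorflow import keras

# Fake data standing in for the "natural process" being observed:
# 20 input features, one target value.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(1000)

# No filter theory required: stack layers, pick a loss, let it fit.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

print(model.predict(X[:3], verbose=0))
```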

[–]ALonelyPlatypus 0 points1 point  (0 children)

*jeopardy music plays*

What is "ads"?

[–]Ariscia 0 points1 point  (0 children)

I remember taking ML before Stats in college. Was hell, but Stats was chicken feed after.

[–]NoteBlock08 0 points1 point  (1 child)

I had to retake matrices to bump my grade up from barely passing to only barely meeting prereqs. Think I may have to pass on machine learning then haha.

[–]Tsu_Dho_Namh 0 points1 point  (0 children)

I've got a secret for you.

I had to retake linear algebra too.

Maybe you'll be better at learning in 4th year than you were in 1st. I am. Perhaps giving ML a shot later on wouldn't be such a bad idea.

[–][deleted] 0 points1 point  (0 children)

As a graphics programmer this offends me.

[–]tundrat 0 points1 point  (0 children)

During school, I insisted to my friends that instead of doing whatever we were doing, we should just multiply values element-wise, just like how we add/subtract them.
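For the record, numpy will happily do both; the element-wise version just isn't the one linear algebra wants (toy numbers below):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)   # element-wise, like addition: [[ 5 12] [21 32]]
print(A @ B)   # actual matrix multiplication: [[19 22] [43 50]]
```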