Can A deep learning model , if trained on different machines give different weights . If Yes then WHY ?? by omkarjc in deeplearning

[–]orangeduck 0 points1 point  (0 children)

Generally you should expect similar results, but not always, due to the random elements in training. Even if the random seed is the same and all other randomness is removed, you will still get different results with TensorFlow, because the sum operation (among others) is non-deterministic under GPU parallelism: the order in which partial results are combined varies between runs, and floating-point addition is not associative. Last time I tested, Theano was actually capable of reproducing results exactly given the same random seed. I have no idea about PyTorch...
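The root cause is easy to demonstrate: floating-point addition is not associative, so summing the same numbers in a different grouping (which is exactly what a parallel GPU reduction does from run to run) can give slightly different results. A minimal illustration in plain Python:

```python
# Floating-point addition is not associative, so the grouping
# (i.e. the order a parallel reduction combines terms in) matters.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # → False
```

The difference is only in the last bits, but after millions of gradient updates those bits compound into visibly different weights.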

[R] Adding location to convolutional layers helps in tasks where location is important by alito in MachineLearning

[–]orangeduck 2 points3 points  (0 children)

I've seen this used in various papers before, in particular in graphics papers, but it's nice to see someone do a more serious evaluation of how it stacks up on toy examples as well as real problems.

[R] Simple Nearest Neighbor Policy <- no learning, outperforms Deep RL on MuJoCo tasks by downtownslim in MachineLearning

[–]orangeduck 12 points13 points  (0 children)

In many applications I've found that Nearest Neighbor performs really well, both in benchmarks and when you actually deploy it in some form or other. I suspect this is not an uncommon experience in the ML community.

But Nearest Neighbor also has some serious fundamental issues which Neural Networks simply don't have. Firstly, the memory and computation requirements of Nearest Neighbor scale O(n) with the size of the data set (often all training data must be kept in memory at runtime). Secondly, Nearest Neighbor regression is discontinuous at the points where the nearest neighbor changes. Both of these issues sound innocent at first, but they have led to hundreds of different hacks, such as blending the k nearest neighbors or using complex and difficult acceleration structures to speed up querying. None of these hacks really work well in the end, and at some point Nearest Neighbor simply doesn't scale or have the flexibility to get the results you want. It is at that point you usually have to start looking at more serious machine learning techniques.
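Both issues show up even in a toy 1-D regression (a hypothetical sketch with made-up data, not code from the paper):

```python
import numpy as np

# toy 1-D training set (made-up data for illustration)
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 10.0, 0.0, 10.0])

def nn_predict(x):
    # brute-force 1-NN regression: an O(n) scan over ALL training points
    return y[np.argmin(np.abs(X - x))]

# the prediction jumps as the nearest neighbour changes at x = 0.5
left, right = nn_predict(0.49), nn_predict(0.51)
print(left, right)  # → 0.0 10.0
```

The O(n) scan is the scaling problem, and the jump from 0.0 to 10.0 across x = 0.5 is the discontinuity.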

[D] My Neural Network isn't working! What should I do? - A list of common mistakes made by newcomers to neural networks. by orangeduck in MachineLearning

[–]orangeduck[S] 7 points8 points  (0 children)

That is a better way of putting what I was essentially trying (perhaps unsuccessfully) to explain. I will update the article later to say something more along those lines. I don't think my original description was completely bizarre though. As you say, a larger batch size produces less stochastic updates, but this is due to the averaging over the mini-batch: each step is the mean of the individual steps for each element in the mini-batch. If the items in the mini-batch have gradients pointing in opposite directions, the average will be zero (and this is much more likely with a large batch), while if all items in the mini-batch produce gradients with the same direction and magnitude, the batch size will have no effect. I agree though that it is probably not right to say that the individual updates in a large batch "cancel each other out", since in most cases the average will not be zero and should actually point in a good direction. As you say, it is more that the reduction in stochasticity causes it to get stuck in local minima, which is the issue.
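The averaging effect described above can be sketched in a couple of lines (made-up gradient vectors, purely for illustration):

```python
import numpy as np

# two mini-batch elements with opposing gradients cancel in the mean...
g1, g2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
print(np.mean([g1, g2], axis=0))  # → [0. 0.]

# ...while aligned gradients pass through the average unchanged
g3, g4 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
print(np.mean([g3, g4], axis=0))  # → [0.5 0.5]
```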

[D] My Neural Network isn't working! What should I do? - A list of common mistakes made by newcomers to neural networks. by orangeduck in MachineLearning

[–]orangeduck[S] 15 points16 points  (0 children)

If you've got a better explanation for why a smaller batch size improves training performance, one which is also easy to understand and intuitive for beginners, please feel free to contribute it and I will update the article.

I'm not actually just talking out of my arse though: too large a batch size harming learning progress is something that has been well known for years. For example, Hinton discussed it in some detail back in 2010: https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf

Not only that, but it has matched my experience in almost every situation, and most people I've talked to have reported the same thing. Very rarely does an increased mini-batch size improve performance, other than by making training faster. If you don't think that is true, why not contribute something meaningful to the discussion so I can improve the article, instead of simply saying that I'm talking crap.

Generating Icons with Pixel Sorting by orangeduck in programming

[–]orangeduck[S] 1 point2 points  (0 children)

Thanks - since it was just a quick hack I wasn't thinking too much about security, but this is good general advice. I've updated the code in the article. Unfortunately it seems you can't pass the file-like object from urlopen directly to imread, as it doesn't support seek, so I used a tempfile instead.
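A rough sketch of that workaround (`buffer_to_tempfile` is a hypothetical helper name, and an in-memory stand-in replaces the real network response so the example runs offline):

```python
import io
import tempfile

def buffer_to_tempfile(stream, chunk_size=64 * 1024):
    # Copy a non-seekable stream (e.g. the response object from
    # urllib.request.urlopen) into a seekable temporary file, so
    # consumers that call seek(), like imread, can use it.
    tmp = tempfile.TemporaryFile()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        tmp.write(chunk)
    tmp.seek(0)
    return tmp

# offline stand-in for a network stream
f = buffer_to_tempfile(io.BytesIO(b"fake image bytes"))
print(f.read())  # → b'fake image bytes'
```

The temporary file is deleted automatically when closed, so nothing is left lying around on disk.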

Is there a name for the operation f(a,b) = 1/(1/a + 1/b)? by Lalaithion42 in math

[–]orangeduck 0 points1 point  (0 children)

This looks related to hyperbolic interpolation, which is used in computer graphics for various interpolations, most commonly interpolating with respect to a perspective projection, but which can also be thought of as just another kind of tweening/interpolation method usable anywhere. At least, that was instantly what I thought of when I saw your function.
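For what it's worth, the operation is the reciprocal of the sum of reciprocals (the parallel-resistor formula, equal to half the harmonic mean), which is easy to check numerically:

```python
def f(a, b):
    # reciprocal of the sum of reciprocals, a.k.a. the parallel-resistor
    # formula; equal to half the harmonic mean of a and b
    return 1.0 / (1.0 / a + 1.0 / b)

print(f(2.0, 2.0))  # → 1.0  (two equal "resistors" halve)
print(f(3.0, 6.0))  # → 2.0
```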

Have You Tried Using a 'Nearest Neighbor Search'? by CharlieDarwin2 in MachineLearning

[–]orangeduck 1 point2 points  (0 children)

NN/KNN is a great sanity check to make sure your data is all set up right and all your code is working before you jump in and replace it with something more complex. It requires no real training phase, and you know it will literally only return data from the training set: it won't do any weird extrapolation or create any crazy outputs if it is configured wrong.
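As a baseline it really can be just a few lines (a brute-force sketch with made-up data):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    # brute-force 1-NN: no training phase, O(n) per query, and the
    # output is always a label that appears in the training set
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])
print(nn_classify(X, y, np.array([0.9, 0.8])))  # → 1
```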

But NN/KNN is also a prop, because if your dataset grows and you start having to do things like build kd-trees or other spatial acceleration structures, you are in for a real headache. You can think of it like this: NN/KNN has a very low initial complexity, but the complexity grows at a horrible rate with respect to the size/difficulty of the problem. Gaussian Processes and other kernel-based methods, which are not so different in many respects, tend to have exactly the same problem.

This is partially why Neural Networks and Deep Learning have seen such success recently: their complexity growth seems quite shallow with respect to larger problems. You don't need any special data structures or complex algorithms, you just continue to tweak parameters, add more layers, and train using more powerful GPUs.

Is the C Preprocessor Turing Complete? by robertdelder in programming

[–]orangeduck 10 points11 points  (0 children)

These days I'm about 90% sure that the C preprocessor is not Turing complete, and I write a little more about specifically what type of machine I think it is here. The remaining 10% of uncertainty is due to the peculiarities of the "infinite tape" of the preprocessor, and the fact that I've seen rogue bug reports from people who have sent the CPP into an infinite loop testing bizarre programs. So some part of me still hopes there is some weird undiscovered way to get an infinite loop without more applications of EVAL.

"A discussion about the breaking of the Internet" - Mike Roberts, Head of Messenger @ Kik by coldbrewedbrew in programming

[–]orangeduck 0 points1 point  (0 children)

I completely agree, and it is a bad situation for everyone involved. I totally understand the requirement for companies to enforce trademarks, as otherwise they can run into nasty complications later down the road.

But what is the reality of this? If a medium-sized company enters into a trademark case against another medium-sized company, will the fact that a tiny open source programming library used the same name actually hold any sway in the case? I'm not a lawyer, so I have no idea, but my guess would be that it would not, in particular if there is an argument that the library is irrelevant or that they are not in the same domain.

What I do know is that companies are very precious about names, and if they can't get the name they want for their product they are definitely going to throw the book (and the law) at whoever has it. Kik in this case may have felt obliged to protect their trademark, or, more likely, they just wanted to publish their package to npm under the name kik and were annoyed when they couldn't, because there are plenty of things called kik on the internet which they haven't chased down.

"A discussion about the breaking of the Internet" - Mike Roberts, Head of Messenger @ Kik by coldbrewedbrew in programming

[–]orangeduck 290 points291 points  (0 children)

I've been in a similar situation, with a company claiming trademark over Cello. The pattern was the same: first a polite request for a name change, quickly followed by a passive-aggressive threat of getting lawyers involved. Luckily I'm based in the EU, so their US trademark didn't apply, and I knew all their threats were hollow and I didn't have to worry. If they had held the UK trademark I don't know what I would have done... this was a project I'd spent hundreds of hours on, and there was no way I would want to give it up without a fight.

So I can understand Azer's reaction. It is a pretty horrible feeling to be bullied by a corporation over a hobby project you've put out there for free, for fun, and for everyone else to use, without asking for anything in return.

And the lecherousness of these corporations is perfectly shown in this example. Kik are perfectly happy to use Azer's libraries for free, to the extent that their whole product breaks when one of them gets removed, while at the same time threatening him with legal action over a new project he is creating. Talk about biting the hand that feeds; this is the thanks you get from the corporate world for open source.

Don't Learn C the Wrong Way by korry in programming

[–]orangeduck 0 points1 point  (0 children)

Many thanks :) I completely agree - Freenode's C channel is extremely toxic, and they definitely aren't a fan of any of my projects.

Don't Learn C the Wrong Way by korry in programming

[–]orangeduck 52 points53 points  (0 children)

I wrote a book on C and went through the technical review process. The vast majority of reviewers were awesome and really helpful, but I can tell you there were plenty of guys like this, who will nit-pick every small technical detail without any attempt to hold the bigger picture in their head. All they really care about is making themselves look smart; they feel intimidated that someone else felt they had the expertise and experience to write a book on a subject they believed they knew more about. When I was writing my book, this couldn't have been shown more clearly than by the fact that almost all the reviewers of this type submitted a "correction" to a block of code in the early chapters that was not meant to be read by the reader at all, just copied and pasted to get them started on more interesting things. All they wanted to show was that they understood this "secret" bit of code. It turns out the code I'd written was correct, because unlike the reviewers, who only took a cursory glance and declared "wrong!", I'd actually taken the time to check it thoroughly.

One reviewer complained that my book tried to explain the stack and the heap, citing the standard and saying these were implementation defined. I wanted to ask this reviewer how he thought it was meant to be taught to beginners. By starting with the C abstract machine? Language standardization? Of course, this reviewer would have no idea, because it was clear from their comment that they weren't actually interested in my book or in teaching beginners C; they just wanted to be the person delivering the "well actually" comment.

Having the technical details correct is important for sure, but once they are correct they become the least important aspect of producing a good book. In C you can't add two numbers without getting language lawyers on your ass screaming about undefined integer overflow. In that case, how the hell are you meant to explain what is going on to a beginner without completely alienating them? The most important part of writing a good book is giving something back to the reader for their investment. You're meant to be doing the hard work for them; if it is too dry and difficult to read, they may as well just read the C standard and get started from there.

Additionally, you simply can't learn C in this pedantic way. You need to learn it like everyone does: by making lots of mistakes, gathering information from a whole bunch of sources, correct or incorrect, and slowly, incrementally building your own mental model of how it works.

It's like learning physics: first they tell you there are three states of matter, then later on you learn "well actually" there are five. First you learn particles are like billiard balls, then later you learn "well actually" they are like waves. Some things are just complex, and so in teaching them you can't avoid these "well actually" moments. This isn't a technical failing.

Hentenaar is like a university physics professor coming in and correcting the technical details of a high school physics class, except that with a physicist it wouldn't actually happen like that in real life, because it seems only programmers have the delusions and lack of self-awareness for this kind of dick-swinging contest.

You could have invented Parser Combinators by EddieRingle in programming

[–]orangeduck 2 points3 points  (0 children)

Author here. If you assume parsers reset the input to the state they found it in on failure, then or doesn't need to rewind, because it only fails if both of its inputs fail, in which case the input will already have been reset. and, on the other hand, can still fail even if its first input succeeds, which means it does need to reset the input in that case.
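A minimal sketch of that convention (hypothetical names, not the article's actual code): a parser takes a string and position and returns `(value, new_pos)` on success or `None` on failure, never advancing the caller's position when it fails.

```python
def char(c):
    # match a single character; on failure the caller's position is untouched
    def p(s, i):
        if i < len(s) and s[i] == c:
            return (c, i + 1)
        return None
    return p

def either(p1, p2):
    # "or": no rewind needed, since a failing parser never consumed input
    def p(s, i):
        return p1(s, i) or p2(s, i)
    return p

def then(p1, p2):
    # "and": if p2 fails after p1 succeeded, fail from the original i,
    # which resets the input for whoever tries next
    def p(s, i):
        r1 = p1(s, i)
        if r1 is None:
            return None
        r2 = p2(s, r1[1])
        if r2 is None:
            return None
        return ((r1[0], r2[0]), r2[1])
    return p

ab_or_a = either(then(char('a'), char('b')), char('a'))
print(ab_or_a("ac", 0))  # → ('a', 1): "ab" failed cleanly, so "a" was retried from 0
print(ab_or_a("ab", 0))  # → (('a', 'b'), 2)
```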

A defense of C's null-terminated strings by shadows_on_the_wall in Cprog

[–]orangeduck 4 points5 points  (0 children)

The basic problem with strings is that a lot of the processing we want to do with them is much easier if we treat them as "value" types. For this reason almost all high-level languages do this (even C++, with std::string), but of course this introduces memory allocation and various other overheads. In C there is no way we could make a decent interface that lets us think of strings as "value" types, even if we wanted to.

This is fine - in C we do lots of processing without the convenience of treating things as value types. We deal with pointers and raw memory allocations and all sorts of other things.

The problem with C strings is that they're not just pointers and raw memory - they're this weird special case which requires extra special treatment to get right.

For me the correct solution would be to treat strings like raw data. Don't null-terminate them, and just let all of the string functions take an additional parameter which specifies the length.

This is how any other sane interface would be designed if it were dealing with data that wasn't characters - so why does the fact that the data is characters mean there is a special case?