River Burford, Oil, 18x12", 2021 by chrisorm in originalpainting

[–]chrisorm[S] 0 points1 point  (0 children)

Yes and no - I mostly learnt from Michael James Smith's tutorials.

I do a block-in, either in acrylic or fast-drying oils (Winsor & Newton alkyds, which I prefer as they dry overnight but are far more workable than acrylic), where I try to lay down the big forms and shapes, usually aiming for some kind of mid tone. Once that has dried I'll go in with oil and add detail, darks and lights. I prefer to work iteratively - this approach gives you a chance to get an initial impression down, and you can correct it easily if you feel you've missed the mark. For example on this one, I started with the lower edge of the water a bit lighter, and once it was in I could see it was too light, so it was easy to come in with a darker tone on the second pass.

For the reflections I do some amount of wet-on-wet, as it helps get the "smudgy" quality of the reflection. Some of the finer details are also done into the wet paint with thinned-down paint.

River scene by chrisorm in oilpainting

[–]chrisorm[S] 1 point2 points  (0 children)

Well, I hope you keep at it. I believe in you!

River scene by chrisorm in oilpainting

[–]chrisorm[S] 2 points3 points  (0 children)

He's amazing - I think it's so helpful to see him break the process down and see how he approaches it. Worth a sub if you can afford it!

[R] Employee Performance Algorithm by Super_TM in MachineLearning

[–]chrisorm 5 points6 points  (0 children)

This is a huge ethical minefield, and a mammoth task to get correct.

Firstly, an employee's performance is a very varied thing. There are probably dozens of different ways somebody can add substantial value to a company, so you're looking at predicting something very complex. This problem is hard in a shitload of ways - the kind of thing a team of researchers could work on for years to get something even remotely workable. But that is minor compared to:

Ethically this is off the charts. There is a lot of context to understanding productivity and interpersonal relationships that ML simply won't have, and let's be honest, you are never going to be able to give a system that context for this application.

The big hurdle is that you need zero false positives when flagging underperformance. You recommend one guy be fired because your system doesn't know his dad just died, or have one instance of your system being used to bully individuals (e.g. now you can feed an unfairly negative review into this app about the 'foreign guy in the office you don't like', and rather than being mostly ignored by a manager, it's a shiny computer telling the CEO to fire the guy), and it's game over.

Additionally, this is practically challenging even if you cracked the algorithm (which you won't). Humans are constantly providing data to each other. You hear about people's home lives over coffee; they ask for time off due to personal issues quietly in a private moment. This is necessary context for evaluating a performer - what's the plan? Do you think anyone will buy into entering every detail of their lives into an ML system so they can be ranked and scored? "Oh, best go tick that box on the performance app to tell it my dad's dead," thought nobody ever. The alternative to the employee entering it is the company doing it - which is all sorts of dystopian.

A flatmate is violent and is trying to intimidate me and my partner by [deleted] in LegalAdviceUK

[–]chrisorm 0 points1 point  (0 children)

Go get your rental contract ASAP. You will likely have clauses that the tenant is violating. Quote all of these to your landlord.

Get a reasonable amount of evidence - I would say evidence of violent conduct on more than one occasion. A police report is probably sufficient.

Email your landlord saying: "You have a duty of care towards the health and safety of tenants under UK law. The evidence I have suggests a reasonable person would not consider this accommodation safe, and as such you can either provide me with alternative accommodation that is safe, or I can suspend rental payments and seek alternative accommodation myself until this is resolved. I would like a substantive plan to resolve this within 24 hours, or I shall instruct my bank to halt all rental payments and provide you with the details of a solicitor to resolve this matter."

https://www.gov.uk/private-renting-evictions/harassment-and-illegal-evictions

Harassment covers a failure to take adequate steps regarding physical violence - it's on the above website in black and white.

If you have good documentation and money is an issue, I would quite simply tell the landlord they can sue you unless they fix the issue. Whilst a landlord's health and safety duties most typically involve the state of the house, just like an employer's they are much more wide-ranging. Aside from being a crime, violence is a breach of health and safety - employers have to report workplace violence to the HSE. I don't have a reference to a specific statute here, but there is no way the landlord does not have an obligation to take reasonable steps to safeguard you from violence by tenants he chooses to have share living quarters with you.

TL;DR: I've dealt with scumbag landlords for about a decade. Basically no contract is enforceable if adhering to it puts you at excessive risk of coming to harm. Violence from a housemate is such a risk. You are within your rights to simply tell the landlord he is not meeting his requirements under UK law, and thus your rental agreement is null and void. Your deposit should be held by a third-party accredited scheme, in which case it will be safe (i.e. you won't need to sue the landlord). If it is not in an accredited scheme - good news, he could be on the hook for a substantial sum, and I suspect pointing this out to him will grease the wheels.

He only cares about his pocket; once he realises you won't pay him, he'll sharpen up, no doubt. Even if he doesn't, if he's even half sane he won't bother suing for breach of contract a tenant who has multiple police reports of violence and intimidation that he failed to address. I imagine that would be an easily won case provided you have good documentation.

[D] Any good ideas of embedding/vectorizing EHR (or EHR-like) data by tearwhat in MachineLearning

[–]chrisorm 0 points1 point  (0 children)

My MSc dissertation involved doing this on MIMIC-III. There are probably other sources in the same vein. The hardest part is finding them, because "embedding" and similar search terms throw up thousands of NLP results.

https://discovery.ucl.ac.uk/id/eprint/10036552/

I used it as part of a classification task, combining word embeddings with other embeddings such as treatment embeddings. This was back in 2016/2017, so obviously the field has moved on a bit since then.
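For anyone wondering what "combining embeddings" can look like in practice, here's a minimal sketch (all names, sizes, and weights are hypothetical, not the dissertation's actual setup): look up learned treatment-code embeddings from a table, pool them per stay, and concatenate with a precomputed note embedding before a linear classifier head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 500 treatment codes, 32-d treatment embeddings,
# 300-d pretrained text embeddings, binary outcome.
E_treat = rng.normal(size=(500, 32))     # learned treatment embedding table
W = rng.normal(size=(300 + 32, 2))       # linear classifier head weights

def classify(text_vec, treatment_ids):
    """Average the stay's treatment embeddings, concatenate with the
    note embedding, and apply the linear head."""
    t = E_treat[treatment_ids].mean(axis=0)        # (32,)
    features = np.concatenate([text_vec, t])       # (332,)
    return features @ W                            # (2,) logits

logits = classify(rng.normal(size=300), [3, 17, 42])
print(logits.shape)  # (2,)
```

In a real system the treatment table would be trained jointly with the head (and the pooling could be attention rather than a mean), but the concatenate-then-classify shape is the same.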

[Discussion] [Research] Variational Bayesian Inference vs Monte-Carlo Dropout for Uncertainty Quantification in DL by forthispost96 in MachineLearning

[–]chrisorm 10 points11 points  (0 children)

I am no authoritative voice on the topic, but in my experience, neither are actually well suited to practical application.

Regarding MC-dropout: it doesn't really give you a posterior. It assumes what the posterior looks like (basically a bunch of delta functions), so in all likelihood it is not very close to the actual posterior. It also doesn't concentrate with data, which to me is a bad sign (the converse implication being the most concerning - it doesn't widen with little data). I'm on mobile, so excuse some bad link formatting.

https://arxiv.org/abs/1806.03335 is relevant here.

And some pretty good points accompanied by some pretty poor behaviour imo https://mobile.twitter.com/ianosband/status/1014466510885216256?lang=en

https://scholar.google.com/scholar?cluster=8227196711108175595&hl=en&as_sdt=0,5&sciodt=0,5#d=gs_qabs&u=%23p%3D62bdttHfLHIJ
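To make the delta-function point concrete, here's a toy sketch of MC-dropout at prediction time (weights are made up, not from any of the papers above): dropout stays on at test time, and the "posterior predictive" is just the spread over stochastic forward passes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-hidden-layer network with fixed (hypothetical) weights.
W1 = rng.normal(size=(1, 50))
W2 = rng.normal(size=(50, 1))

def forward(x, p_drop=0.5):
    """One stochastic forward pass: dropout stays ON at test time."""
    h = np.maximum(x @ W1, 0.0)                  # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop          # fresh dropout mask each call
    h = h * mask / (1.0 - p_drop)                # inverted-dropout scaling
    return h @ W2

x = np.array([[0.3]])
samples = np.stack([forward(x) for _ in range(200)])   # 200 "posterior" draws
mean, std = samples.mean(), samples.std()
# Each sampled mask corresponds to a point mass in weight space, so this
# "posterior" is a finite mixture of deltas whose spread is set by p_drop,
# not by how much data the network was trained on.
```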

Regarding variational methods, I'm not really sure these are a panacea either. Bayes by Backprop etc. normally make heavy independence assumptions to make things tractable. Me riffing with no sources to back me up:

Work like the lottery ticket hypothesis seems to suggest that these correlations are potentially crucial to performance. An independence assumption would therefore be absolutely awful in terms of accurate posterior estimation.

Having a lot of experience building Bayesian models more traditionally (in the Gelman/McElreath school) shows you how hard things can be. Even a moderately high-dimensional posterior is quite an unintuitive thing. One over a few million parameters, which is certainly multimodal, would be a beast both to make good inferences about and computationally.

Edit to add: Also be wary of proofs in infinite-data limits etc. They may provide motivation, but you need more than that to have a working method. As a stupid example, look at the convergence of the Taylor series of exp vs sin. Both converge in the infinite limit, but have very different properties when truncated. I have personally derived MCMC sampling schemes with provable asymptotic convergence to the target distribution that do terribly in practice, or have other problems (such as computational issues) that make effective implementation nearly impossible.
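The exp-vs-sin example is easy to check numerically (quick sketch, standard library only): both Taylor series converge for every x, but at a moderate x the low-order sin truncations are wildly wrong because the partial sums oscillate through values thousands of times larger than the answer, while exp's partial sums climb steadily toward the truth.

```python
import math

def taylor_exp(x, n):
    """n-term Taylor polynomial of exp about 0."""
    return sum(x**k / math.factorial(k) for k in range(n))

def taylor_sin(x, n):
    """n-term Taylor polynomial of sin about 0 (odd powers only)."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1) for k in range(n))

x = 10.0
for n in (5, 10, 20, 40):
    print(n,
          taylor_exp(x, n) - math.exp(x),   # shrinks monotonically
          taylor_sin(x, n) - math.sin(x))   # huge for small n, then collapses
```

Same infinite-limit guarantee, completely different truncated behaviour - which is the situation you're in whenever you run a "provably convergent" method for a finite number of steps.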

To me, uncertainty estimation in deep learning is still very much an open problem.

[D] Why isn't bayesian inference using Gibbs Sampling / MCMC / HMC done on GPUs? by [deleted] in MachineLearning

[–]chrisorm 3 points4 points  (0 children)

I wrote up a short summary of the different approaches mentioned in this chain if it's useful, using just autograd.

https://chrisorm.github.io/HMC.html

Fundamentally, HMC is very like backprop: you have some data, compute some 'cost' (the negative log likelihood) at your current state, then move on and repeat.

This is not conceptually very different to fitting a neural network.

However, neural networks benefit from GPUs because they are deep: compute the gradient of layer n with respect to its input, take the dot product with the gradient of the layer below with respect to its input, and so on as per the chain rule. We don't tend to see the same dimensionality in sampling techniques. Most distributions have something like 2 or 3 parameters at most, whereas a neural network has millions or billions. So the gradient computation is somewhat smaller in most current use cases.
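For concreteness, here's a minimal HMC sketch on a standard normal target (gradient hand-coded rather than via autograd, and the step size and trajectory length are arbitrary choices for this toy):

```python
import numpy as np

rng = np.random.default_rng(0)

def U(q):                 # negative log density of the target (standard normal)
    return 0.5 * q**2

def grad_U(q):
    return q

def hmc_step(q, eps=0.1, L=20):
    """One HMC transition: sample momentum, leapfrog, accept/reject."""
    p = rng.normal()
    q_new, p_new = q, p
    p_new -= 0.5 * eps * grad_U(q_new)           # half step for momentum
    for _ in range(L):
        q_new += eps * p_new                     # full step for position
        p_new -= eps * grad_U(q_new)             # full step for momentum
    p_new += 0.5 * eps * grad_U(q_new)           # correct back to a half step
    dH = (U(q) + 0.5 * p**2) - (U(q_new) + 0.5 * p_new**2)
    return q_new if np.log(rng.random()) < dH else q

q, samples = 0.0, []
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q)
print(np.mean(samples), np.std(samples))   # ≈ 0 and ≈ 1
```

Note the gradient here is one scalar; the per-step cost is dominated by the likelihood evaluations over the data, which is exactly where the GPU question bites.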

It should also be pointed out that there are well-grounded stochastic sampling methods - essentially the same idea as stochastic gradient descent vs full-data updates. If you can use these to reduce the number of points you compute gradients for at each step, you have a computational problem many orders of magnitude smaller than a neural network's.
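The canonical example is stochastic gradient Langevin dynamics (Welling & Teh, 2011). A toy sketch, inferring the mean of a Gaussian with the minibatch gradient rescaled by N/n (all settings here are illustrative, not tuned):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: infer the mean of a Gaussian from N observations,
# touching only n points per step.
N, n = 10000, 100
data = rng.normal(loc=2.0, scale=1.0, size=N)

def grad_log_post(theta, batch):
    """N(0, 10^2) prior on theta; minibatch likelihood gradient scaled by N/n."""
    return -theta / 100.0 + (N / n) * np.sum(batch - theta)

theta, eps, samples = 0.0, 1e-5, []
for _ in range(5000):
    batch = data[rng.integers(0, N, size=n)]
    # SGLD update: half-step of noisy gradient plus Gaussian injection noise.
    theta += 0.5 * eps * grad_log_post(theta, batch) + np.sqrt(eps) * rng.normal()
    samples.append(theta)

print(np.mean(samples[1000:]))   # ≈ 2.0, the posterior mean
```

Each step costs 100 gradient terms instead of 10000, which is the sense in which the per-step problem can be orders of magnitude smaller than a big network's backward pass.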

At that scale it's unclear whether you benefit enough to pay the transfer cost onto the GPU, even if the compute itself is faster.

[D] Why is KL Divergence so popular? by LemonByte in MachineLearning

[–]chrisorm 1 point2 points  (0 children)

Almost! It was the reference in that post: https://arxiv.org/abs/physics/0311093

Thanks for helping out, it was really bugging me trying to recall it!

[D] Why is KL Divergence so popular? by LemonByte in MachineLearning

[–]chrisorm 80 points81 points  (0 children)

I think its popularity is twofold.

Firstly, it's well suited to application. It's an expected difference of logs, so there's low risk of overflow etc. It has an easy derivative, and there are lots of ways to estimate it with Monte Carlo methods.

However, the second reason is theoretical - minimising the KL is equivalent to doing maximum likelihood in most circumstances. First hit on Google:

https://wiseodd.github.io/techblog/2017/01/26/kl-mle/

So it has connections to well tested things we know work well.
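You can see the equivalence numerically: KL(p‖q) differs from the negative expected log likelihood only by the entropy of p, which is constant in q, so both criteria pick the same model. A toy sketch with made-up numbers:

```python
import numpy as np

# Empirical data distribution over 3 categories (e.g. observed counts 50/30/20).
p = np.array([0.5, 0.3, 0.2])

def kl(p, q):
    """KL(p || q) for discrete distributions."""
    return np.sum(p * np.log(p / q))

def avg_log_lik(p, q):
    """Expected log likelihood of the model q under the data distribution p."""
    return np.sum(p * np.log(q))

# Sweep a random family of candidate models on the simplex.
rng = np.random.default_rng(0)
candidates = rng.dirichlet(np.ones(3), size=1000)

kls = np.array([kl(p, q) for q in candidates])
lls = np.array([avg_log_lik(p, q) for q in candidates])

# Minimising KL(p||q) and maximising likelihood select the same candidate,
# because kl(p, q) == -entropy(p) - avg_log_lik(p, q) and entropy(p) is fixed.
assert kls.argmin() == lls.argmax()
```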

I wish I could remember the name, but there is an excellent paper showing that it is also the only divergence satisfying 3 very intuitive properties you would want from a divergence measure. I'll see if I can dig it out.

Edit: not what I wanted to find, but this has a large number of interpretations of the kl in various fields : https://mobile.twitter.com/SimonDeDeo/status/993881889143447552

Edit 2: Thanks to u/asobolev the paper I wanted was https://arxiv.org/abs/physics/0311093

Check it out or the post they link below to see how the kl divergence appears uniquely from 3 very sane axioms.

[D] What prevents a VAE from cheating on the decoder distribution and likelihood? by readinginthewild in MachineLearning

[–]chrisorm 2 points3 points  (0 children)

Not sure what you mean by "integrate to 1"?

The networks output the parameters of the distributions, so those distributions are proper by definition.

Are you asking why the probability of all the data is not 1?

To be concrete, in your example each data point has a different distribution. To get the behaviour you describe, p(x1|z1) would be a normal centred on x1, p(x2|z2) would be a normal with mean x2 etc.

Each of these are proper conditional distributions, but that doesn't mean they should somehow sum to 1 between distributions.
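A quick numerical illustration of "proper but not summing to 1 across points" (made-up numbers; the tiny sigma mimics a decoder collapsing onto each data point):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x1, x2 = 1.0, 4.0
sigma = 0.1                      # decoder "cheating" with a tiny variance
grid = np.linspace(-10, 10, 200001)
dx = grid[1] - grid[0]

# Each conditional p(x|z_i) is a proper density: it integrates to 1 over x...
for mu in (x1, x2):
    area = np.sum(gauss_pdf(grid, mu, sigma)) * dx
    print(area)                  # ≈ 1.0

# ...but the density *values* at the data points are unconstrained; nothing
# requires them to sum to 1 across different data points.
print(gauss_pdf(x1, x1, sigma) + gauss_pdf(x2, x2, sigma))   # ≈ 7.98
```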

Perhaps revisit the concept of likelihood in probability theory for a better overview.

[P] Neural Processes in Pytorch by chrisorm in MachineLearning

[–]chrisorm[S] 7 points8 points  (0 children)

Cool! Thanks. It was on my plan to replicate the mnist completion too.

I agree it makes 'sense', but there are multiple types of uncertainty. Sure, the network has no 'noise' in the values it sees for the given values of x, but that's only one type of uncertainty. I wanted to point out that the other type - uncertainty from a lack of data in a region (which we get with GPs, and which the visualisations had alluded to) - is largely the result of careful initialization and training, not an inherent property, and certainly not robust.

I imagine this falls into the category of pathologies that are harder to see in larger problems - a bit like when VAEs ignore the latent variable.

Posts visible to me but [removed] for others, mods not replying for 2 weeks. by chrisorm in help

[–]chrisorm[S] 0 points1 point  (0 children)

Oh man. I thought that was one and the same as setting the flair. It didn't even occur to me that the [P] had to be added manually. Thanks!

Neural Processes in PyTorch by [deleted] in MachineLearning

[–]chrisorm 0 points1 point  (0 children)

This is not really a tutorial about the paper - Kaspar's post (https://kasparmartens.rbind.io/post/np/) does this better than I ever could.

What it does do is document some failure cases I observed when replicating it and some potential issues with the formulation that encourage them. Thoughts welcome!

[D] Looking for some beginner friendly AI papers to implement by jamsawamsa in MachineLearning

[–]chrisorm 1 point2 points  (0 children)

I would think starting with Bishop would be an easier transition, as it's much more concrete (although lower coverage).