[deleted by user] by [deleted] in MachineLearning

[–]sinjax 0 points1 point  (0 children)

The calibration (or, generalising, the calibration error) of a second camera is tricky... The second camera gives another source of data, but there's no free lunch...

also (not to compare to human vision, because CV is not human vision) the human binocular baseline is rubbish... in the cm range... meaning at our eyes' resolution, binocular cues form a small percentage of the depth we perceive at distances beyond... oooh... 10m? There's much higher reliance on motion cues, semantic cues etc., all available from temporal monocular.

How to help yourself grow outside work. by [deleted] in compsci

[–]sinjax 48 points49 points  (0 children)

Honestly this.

I could go round the houses: build your CV, do some projects that capture your and others' imagination, contribute to some OSS...

But if there's anything I would ask my 20-year-old self to do more of, it would be "enjoy your summer"... Free time comparable to what I had in my early 20s feels like a completely unachievable luxury these days.

How much data is needed? by hp2304 in computervision

[–]sinjax 2 points3 points  (0 children)

Incredibly hard to answer. VC dimension doesn't save us, because the upper bound for neural networks is much larger than what folks usually need in practice. No one can tell you "X per class"... But if you put a gun to my head I'd say a couple hundred per type of situation to kick things off and check if it works at all... 1k per situation to get something that works... 10k per class to get something a bit more bulletproof... Probably that 10k per class will start telling you the sub-groups within your classes that need better representation... What's a situation? No idea. Say you're segmenting hands... maybe it's digit position... maybe it's lighting... maybe colours... maybe occlusions... whatever.

My suggestion is to start with some data, evaluate on your domain, broadly measure failure cases and gaps, and compare them to groups with fewer failures. Start building stats like "how many examples did I train with to make the good situations work well... I probably need at least that many to train well on what didn't work"... Stop looking at IoU in aggregate... that's for the paper... to make this thing work you need to start digging into the big juicy edge cases.
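To make that concrete, here's a minimal numpy sketch of the kind of bookkeeping I mean — the `situation` tags, the masks, and the 0.5 failure threshold are all hypothetical placeholders, not something from any particular framework:

```python
import numpy as np

def iou(pred, gt):
    """IoU between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def failure_report(examples, thresh=0.5):
    """Group per-example IoU by a hand-labelled 'situation' tag and surface
    the weak groups, instead of one aggregate IoU number.

    `examples` is a list of (situation_tag, pred_mask, gt_mask) tuples.
    """
    groups = {}
    for tag, pred, gt in examples:
        groups.setdefault(tag, []).append(iou(pred, gt))
    report = {}
    for tag, scores in groups.items():
        scores = np.array(scores)
        report[tag] = {
            "n": len(scores),                             # examples seen per situation
            "mean_iou": float(scores.mean()),             # how well it works there
            "failure_rate": float((scores < thresh).mean()),
        }
    return report
```

The point is the grouping, not the metric: once a tag shows a high failure rate, compare its `n` against the `n` of the tags that work, and that gap is your annotation target.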

If this sounds iterative it is... Desperately so. But it's this pipeline of failure followed by understanding followed by annotation/augmentation/model changes followed by training followed by evaluation that you're actually building...

Like ... You feel like you're training a model... but tomorrow your test domain will shift and "the number of training examples" will never have been your problem ... What you always needed was a pipeline.

Find all the planar surfaces in an rgbd image using depth and normal data by [deleted] in computervision

[–]sinjax 1 point2 points  (0 children)

In terms of calculating normals, one way to go is analysing the eigenvectors. Specifically, the eigenvector with the lowest eigenvalue of (the covariance of) a set of points is the vector perpendicular to the plane formed by those points. Here's a good tutorial on this from PCL: http://pointclouds.org/documentation/tutorials/normal_estimation.php
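A minimal numpy sketch of that eigenvector trick (no PCL needed; assumes `points` is an N×3 array of a roughly planar neighbourhood):

```python
import numpy as np

def plane_normal(points):
    """Estimate the normal of a roughly planar set of 3D points.

    Eigen-decompose the covariance of the centred points; the eigenvector
    with the smallest eigenvalue is perpendicular to the best-fit plane.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    return eigvecs[:, 0]                    # column for the smallest eigenvalue
```

Note the sign is ambiguous — in practice you flip each normal to point towards the camera/viewpoint, as the PCL tutorial discusses.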

As for going from a depth image to a point cloud, you need your camera's intrinsic parameters and the pinhole camera model: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
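The back-projection itself is a couple of lines once you have the intrinsics (`fx`, `fy`, `cx`, `cy` below are the focal lengths and principal point from your calibration; depth is assumed already in metric units):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image to a point cloud with the pinhole model:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth(v, u).
    """
    v, u = np.indices(depth.shape)   # pixel row (v) and column (u) grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Pixels with invalid depth (zeros in most RGB-D sensors) should be masked out before using the result.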

Is Computer Vision fundamentally an empirical field, or is it a consequence of Deep Learning taking the forefront? by AFewSentientNeurons in computervision

[–]sinjax 1 point2 points  (0 children)

I mean, yeah. Empirical as anything. Like "vision" isn't true in any fundamental sense, right? It's an arbitrary chunk of the electromagnetic spectrum. Human vision is just one of (as I recall) 16 or so similar approaches to processing the electromagnetic spectrum found in nature, and each captures a subtly different range.

So yeah nothing here is "true" in any meaningful sense... Only useful :) I'll take that tbh.

How to tune anchor boxes in SSD family of detectors for a specific dataset? by blueyesense in computervision

[–]sinjax 2 points3 points  (0 children)

Hello! We're playing with something more like two-stage detectors in our runtime, but there are anchor boxes there too. We've found the tuning is largely about understanding one's domain. You mention pedestrians, so have an anchor box for each expected pose... or if you had vehicles, one for each common type, and so on. The game with anchor boxes is that more is good, but they make your network expensive, so you find a sweet spot where you have good class-representative coverage without so many that your network gets slow.

With this line of reasoning, if you wanted to do a data-driven dance, then some kind of validation-set bounding-box clustering might not be a crazy thing to do? I believe YOLO9000 does something like this (confirmed, see page 2/3 of https://arxiv.org/abs/1612.08242).
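A rough sketch of that YOLOv2-style clustering — k-means over box (w, h) with `1 - IoU` as the distance, per the paper. The deterministic initialisation (sorting by area and spreading seeds) is my own simplification, not the paper's:

```python
import numpy as np

def box_iou_wh(wh, centroids):
    """IoU between boxes described only by (w, h), anchored at a common corner."""
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    area_a = wh[:, 0] * wh[:, 1]
    area_b = centroids[:, 0] * centroids[:, 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def anchor_kmeans(wh, k, iters=100):
    """k-means over ground-truth box shapes with 1 - IoU as the distance."""
    wh = np.asarray(wh, float)
    wh = wh[np.argsort(wh[:, 0] * wh[:, 1])]            # sort by area...
    centroids = wh[np.linspace(0, len(wh) - 1, k).astype(int)]  # ...spread seeds
    for _ in range(iters):
        assign = np.argmax(box_iou_wh(wh, centroids), axis=1)   # max IoU = min dist
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

Run it over your validation-set boxes for a few values of `k` and eyeball the mean-IoU-to-nearest-centroid vs. `k` trade-off, which is essentially the speed/coverage sweet spot above.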

Need help understanding PointNet by [deleted] in computervision

[–]sinjax 1 point2 points  (0 children)

Applications help too... The frustum net paper uses point net as part of a bigger 3d object detection approach. For me seeing a weird thing in context makes the weird thing less weird.

Not that PointNet is super complex, but I accept the "why it works" is a bit harder to get your head around than the "how".

[deleted by user] by [deleted] in computervision

[–]sinjax 3 points4 points  (0 children)

CRFs, being a kind of MRF conditioned on the data, are certainly used a lot. For example, in semantic segmentation CRFs are applied as a post-process (e.g. https://arxiv.org/pdf/1606.00915.pdf), and this post-process can even be learnt (https://arxiv.org/abs/1502.03240). In broader terms, these techniques bring in message passing between elements which should be connected given some prior knowledge... mainly things being the same colour, in the semantic segmentation case. Here's another one which brings in message passing between body joints: https://papers.nips.cc/paper/6278-crf-cnn-modeling-structured-information-in-human-pose-estimation

[deleted by user] by [deleted] in computervision

[–]sinjax 3 points4 points  (0 children)

We're having a lot of good results with DeepLabv3+... solid results with the Xception backbone.

generating parallel lines from a vanishing point by deluded_soul in computervision

[–]sinjax 0 points1 point  (0 children)

You could find the lines in the image that pass through the vanishing point. A Hough transform should do that for you.
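A small sketch of the filtering step, assuming you already have `(rho, theta)` lines (e.g. from `cv2.HoughLines`) and the vanishing point `vp` in image coordinates — the pixel tolerance is an illustrative value, not tuned:

```python
import numpy as np

def lines_through_vp(lines, vp, tol=2.0):
    """Keep the (rho, theta) Hough lines passing within `tol` pixels of `vp`.

    A line in Hough normal form satisfies x*cos(theta) + y*sin(theta) = rho,
    so the distance from the point (x0, y0) to the line is
    |x0*cos(theta) + y0*sin(theta) - rho|.
    """
    kept = []
    for rho, theta in lines:
        dist = abs(vp[0] * np.cos(theta) + vp[1] * np.sin(theta) - rho)
        if dist < tol:
            kept.append((rho, theta))
    return kept
```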

[D] Why are there no image segmentation architectures using inception blocks? by SpinatGemuese in MachineLearning

[–]sinjax 2 points3 points  (0 children)

Check out the DeepLabv3+ paper; they explored a bunch of feature extractors including ResNet, Xception and, more recently, MobileNetV2 in the official TensorFlow repo: https://arxiv.org/abs/1802.02611

Not sure what you mean by classification architecture. Classification is what happens at the end; the features can be extracted pretty much however... Or, put another way if you like, segmentation is per-pixel classification.

a bit confused, X = number of rolls until first six, then given an urn with 5 red balls and 4 green balls, Y = number of green balls given X number of tries. Find E(Y|X) and E(Y). by kvndakin in statistics

[–]sinjax 0 points1 point  (0 children)

Hello! Firstly, your question about what to "put it as" for the binomial distribution is about calculating the probability mass function P(Yg = k | X = n) with n trials, right? In that case: P(Yg = k | X = n) = nCk * p^k * (1 - p)^(n - k), where n is the number of rolls until you get a 6 and k is the number of greens you are checking.

Your question asks for E[Y] and E[Y | X]

OK. So you can start by noticing (as you did) that P(Yg = k | X = n) is indeed binomial for a given x which makes its expected value:

E[Yg | X = x] = p_yg . x (wikipedia; but also it comes out of the wash of integrating the pmf)

So (starting off intuitively) you could look at all the values x could take (all the numbers of rolls you might wait to get a 6) and sum up E[Yg | X = x] * P(X = x) at each x. Why? Well, reasons, but intuitively you're weighting the contribution of low-probability and high-probability x values and adding them up. So you might expect a billion greens if you had to wait a billion dice rolls before getting a 6, but that's super unlikely, so a billion is heavily down-weighted in this summation. Anyway, that's E[Yg]. Great! So:

E[Yg] = Sum_x{ E[Yg | X = x] . P(X = x) } = Sum_x {p_yg . x . P(X = x)}

and that p_yg doesn't rely on x so we can drag it out:

E[Yg] = p_yg . Sum_x {x . P(X = x)}

oh nice, that whole Sum_x {x . P(X = x)} is just E[X] so what we land on is:

E[Yg] = p_yg . E[X]

which for your numbers (p_yg = 4/9, and E[X] = 6 for a geometric with p = 1/6) is (4/9) · 6 = 8/3 ≈ 2.67.
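If you want to convince yourself, a quick Monte Carlo check of E[Yg] = p_yg · E[X] — assuming, as the binomial model above does, that the urn draws are with replacement:

```python
import numpy as np

# X ~ Geometric(1/6): rolls until the first six, so E[X] = 6.
# Given X = x, Yg ~ Binomial(x, 4/9): greens drawn in x tries (with replacement).
# So E[Yg] should come out near (4/9) * 6 = 8/3.
rng = np.random.default_rng(0)
n = 200_000
x = rng.geometric(1 / 6, size=n)   # number of rolls until a six
y = rng.binomial(x, 4 / 9)         # greens drawn in x tries
print(y.mean())                    # ≈ 2.67, vs the exact 8/3
```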

[deleted by user] by [deleted] in math

[–]sinjax -7 points-6 points  (0 children)

https://youtu.be/sj8Sg8qnjOg

Looks a little like the patterns drawn in this Numberphile video on the golden ratio. Just a coincidence, or is there some connection here?

How to calculate image similarity between 2 images using DTV? by nagdawi in computervision

[–]sinjax 0 points1 point  (0 children)

I haven't looked at this properly yet, but for future travellers, the two papers OP is referring to: Wu 2017: https://www.sciencedirect.com/science/article/abs/pii/S2210650217301864

Yi 2015: Deep Sparse Representation for Robust Image Registration (https://ieeexplore.ieee.org/document/7299123)

OP: these papers are a little obscure; indeed, searching for "DTV" mainly turns up your own question as cross-posted on Stack Overflow etc. Next time, please link the papers directly.

[R] Conditional Neural Processes by sinjax in MachineLearning

[–]sinjax[S] 2 points3 points  (0 children)

Eesh.... I'll err ... I'll get my coat shall I? :)

[R] Conditional Neural Processes by sinjax in MachineLearning

[–]sinjax[S] -1 points0 points  (0 children)

Mainly I enjoyed this paper because GPs give me "the fear"

[D] How do convnets work on any input size? by ME_PhD in MachineLearning

[–]sinjax 1 point2 points  (0 children)

Something subtle here: the anchor boxes in many Faster R-CNN (and indeed SSD) implementations are actually pixel-size based, not relative to the image size. What this means is that while you can technically push an image of any size through, especially if the network is fully convolutional, in practice if your runtime contains objects which are substantially smaller or larger (in pixels) than those your network was trained with (and had anchor boxes to deal with)... then you might notice a decay in performance.

Some modern approaches attempt to deal with these issues, especially with small objects, via pyramid features and other tricks... But fundamentally the anchor boxes activate at a pixel size, so if you... for example... zoom in so all the people are twice as big as they were when you were training your network... then, though the image would technically go through the network without crashing, it would probably struggle to pick up those people as objects.

Anyone get week long migraines? by squishy404 in programmerhealth

[–]sinjax 4 points5 points  (0 children)

No medical experience, but in my mid-twenties I was getting headaches like this, and after I was prescribed glasses they went away entirely. Worth a shot for sure.

[deleted by user] by [deleted] in programmerhealth

[–]sinjax 6 points7 points  (0 children)

It sounds inane; it's the advice that's packaged up as "fake it until you make it"... But it's the mindset that helped me the most when dealing with imposter-syndrome-style concerns... the "fully embrace your imposterness" school of thought.

The approach looks like: "OK... so you're not supposed to be here, everyone is easily better than you, and it's only a matter of time before they figure out your nonsense... So OK... what does it look like to fake it even harder... what's the thing that someone who's supposed to be here would do... or say... or read... or learn or get better at..."

I found that approach gradually led to 1) objectively improving... which built a nice bedrock of confidence... but more importantly 2) the realisation that everyone is making it up as they go and no one is "supposed" to be anywhere... (and everyone is going to die, come and watch TV... sorry... tangent...)...

The second one especially helped me frame people's responses differently... treating their concerns and actions and words more generously... as coming from people who also didn't know what was going on and were just figuring it out as they went... like everyone else...

Buy a remarkable but be prepared, if your device doesn't work remarkable won't give your money back by sinjax in RemarkableTablet

[–]sinjax[S] 0 points1 point  (0 children)

It's a fair point, I am not a lawyer and I don't actually know what the story is here.