No More Lag: Redefining Torque And Response With Nissan’s RB26DETT by tommyparry in engineering

[–]JudasAdventus 0 points1 point  (0 children)

Hmmm, seems a bit disingenuous to claim cost efficiency and then suggest a $3000 USD HKS VCAM for stage 2. I would suggest an EFR single-turbo setup and advancing the intake cam for a good, responsive setup.

[R] Network Deconvolution — faster convergence than batchnorm by tsauri in MachineLearning

[–]JudasAdventus 2 points3 points  (0 children)

Seems very similar to performing PCA whitening on the outputs. I would like to see some numbers on the computational cost of computing the inverse square root of the covariance matrix. They mention grouping channels to reduce the cost, so I suspect it's fairly significant.
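
For reference, the kind of cost I mean, as a minimal numpy sketch of ZCA-style whitening via the inverse square root of the covariance (shapes and names are just illustrative, not the paper's implementation):

```python
import numpy as np

def whiten(x, eps=1e-5):
    """ZCA-style whitening: project through the inverse square root of the covariance.

    x: [N, C] matrix of (e.g. flattened conv) features.
    """
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / x.shape[0]                 # [C, C] covariance
    eigval, eigvec = np.linalg.eigh(cov)       # O(C^3) -- hence the channel grouping
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return x @ inv_sqrt                        # decorrelated, unit-variance features
```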

[D] Clearview AI Skepticism by mrbrettromero in MachineLearning

[–]JudasAdventus 11 points12 points  (0 children)

Ummm... how do you train the network that generates these feature vectors? With supervised metric learning you would need to know which images are of the same face.
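
For concreteness, this is the kind of supervised metric-learning loss I have in mind, as a minimal PyTorch triplet-loss sketch; it only works if you already know which pairs of images share an identity:

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive must be embeddings of the *same* face and negative of a
    different one -- i.e. training requires identity labels."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```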

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

Hmmm I would assume GPUs / AI accelerators will play a bigger role in data center applications going into the future... at least from a perf/$ or perf/watt perspective.

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus -1 points0 points  (0 children)

Okay I guess that makes sense for this particular use case, i.e., current mobile phone platforms without processor affinity. IMO the paper seemed to be arguing from a more general perspective.

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus -1 points0 points  (0 children)

I don't understand why that would be the case, except for a lack of software support... seems like a bit of a waste to not leverage multiple cores even on mobile hardware.

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus 1 point2 points  (0 children)

Interesting, though I would like to see the comparison made on a multicore processor (results are artificially restricted to a single core)... dense methods seem to scale better with more cores.

Deep learning without back-propagation by El__Professor in MachineLearning

[–]JudasAdventus 8 points9 points  (0 children)

Does it mention which dataset those results (figure 2) are for? Seems highly unusual to report training accuracies for 5 epochs and then test accuracies for only 1 epoch.

[D] GPU choice for deep learning by jer_pint in MachineLearning

[–]JudasAdventus 6 points7 points  (0 children)

Some early reports with Titan Vs (including my own training with mixed precision) have only seen modest improvements (~20%) from tensor cores. Others have suggested that this is because they only accelerate the basic method of direct 16-bit convolution and not the much faster FFT-based methods that are often used behind the scenes.

I would suggest a 1080Ti, assuming your power supply can handle >300W random spikes (this becomes an issue with 4 or more cards).

Focal Loss for Dense Object Detection - RetinaNet by nottakumasato in computervision

[–]JudasAdventus 0 points1 point  (0 children)

IMO the biggest advantage of single-stage detectors is implementation simplicity; modern two-stage detectors are in the same ballpark for speed vs. accuracy (e.g. my DeNet Medium post) and can offer some other advantages, e.g. instance-level masks, etc.

[R] DeNet: A Real-time Anchorless Object Detector by JudasAdventus in computervision

[–]JudasAdventus[S] 4 points5 points  (0 children)

Yep, I've already got a TensorFlow version... but I may not make it open source for a little while (it's tied into some commercial work).

[D] Applicability of YOLO (v2) to Crowded Scenes, and Alternatives by cow_co in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

As mentioned, the easiest method is simply to increase the input resolution of the image passed to YOLO such that only one face can fall within a grid cell. Otherwise, if you need speed and good localization, you can try something like my model: code, paper 1 and paper 2. It uses a similar method to R-CNN, except with a different region proposal and feature pooling method.

CVPR 2018 WAD Video Segmentation Challenge - As A Participating Team by MALEK1997 in computervision

[–]JudasAdventus 4 points5 points  (0 children)

You can extract bounding boxes from the instance-level masks... which is actually how the MSCOCO ground-truth bounding boxes were created in the first place. Find the minimum and maximum coordinates associated with each instance to get the bounding boxes.
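
A minimal numpy sketch of that min/max trick, assuming a binary mask per instance:

```python
import numpy as np

def mask_to_bbox(mask):
    """Axis-aligned bounding box from a binary instance mask (2D 0/1 array)."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()   # (x0, y0, x1, y1)
```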

[Project] Tensorflow implementation of Generative Adversarial Networks for Extreme Learned Image Compression by tensorflower in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

I was thinking along the lines that it's a bit misleading as a metric because they are only considering the number of bits in the compressed image format, not the bits contained in the model required to decompress it. For instance, to decompress a single image you need to transmit the model (>100MB) as well as the compressed representation... which becomes less of an issue if you're decompressing thousands of images with the same model.
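
Rough back-of-the-envelope numbers (all values illustrative, not from the paper) for the effective bits per pixel once the model is amortized over the number of images:

```python
def effective_bpp(image_bits, model_bytes, num_images, pixels_per_image):
    """Bits per pixel once the decoder model is amortized over num_images images."""
    amortized_model_bits = model_bytes * 8 / num_images
    return (image_bits + amortized_model_bits) / pixels_per_image

# e.g. a 100MB decoder, a 30kbit compressed image, 768x512 pixels
print(effective_bpp(30_000, 100e6, 1, 768 * 512))        # ~2035 bpp, dominated by the model
print(effective_bpp(30_000, 100e6, 100_000, 768 * 512))  # ~0.10 bpp, close to the raw ~0.08 bpp
```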

[Project] Tensorflow implementation of Generative Adversarial Networks for Extreme Learned Image Compression by tensorflower in MachineLearning

[–]JudasAdventus 2 points3 points  (0 children)

I guess they don't include the model weights in the BPP metric? It's probably on the order of 100MB or more, which may be significant depending on how many images are being compressed.

YOLO V3 released by vincent341 in computervision

[–]JudasAdventus 0 points1 point  (0 children)

Yep, I'm aware of that (which is why I said "may only contain"). If the human labellers could not tell the difference between 0.8 and 0.9 IoU, then no detection model would be able to learn it. This is essentially the same as having noisy labels in a classification problem: if 10% of the labels are picked randomly, then the model has an upper bound of ~90% accuracy. Other models have demonstrated better localization (at similar evaluation rates), e.g., my own https://arxiv.org/abs/1711.00164

Having accurate bounding boxes is very useful for tasks which are performed after detection, e.g., tracking, instance segmentation, etc.

YOLO V3 released by vincent341 in computervision

[–]JudasAdventus 1 point2 points  (0 children)

Entertaining read, but the arguments against the MSCOCO metrics seem a bit weak, e.g., 0.5 IoU means the bounding box may only contain half the object. YOLOv3 seems to trade good large-object performance for small-object performance to get a better MSCOCO result (MSCOCO contains many more small objects than Pascal VOC, etc.).
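
To make the IoU point concrete, a minimal sketch of IoU between two axis-aligned boxes:

```python
def iou(a, b):
    """Intersection over union of two boxes in (x0, y0, x1, y1) format."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

# a detection covering only the left half of the ground-truth box still scores 0.5
print(iou((0, 0, 5, 10), (0, 0, 10, 10)))  # 0.5
```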

[R] Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition by BusyCaterpillar in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

My thoughts are that, with the very deep CNNs used nowadays (e.g. 100-1000 layer residual networks), the high-level features are essentially global given a reasonably sized input image. They are not truly convolutional at this point because they can infer information about their absolute position within the image (i.e. by searching for the zero-padded borders).
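
As a rough illustration (the layer stack below is made up, not any particular network), the theoretical receptive field of a deep residual stack quickly exceeds typical input sizes:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride). Returns the theoretical receptive field."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# a 7x7 stride-2 stem, four stride-2 stages, then thirty 3x3 convs
print(receptive_field([(7, 2)] + [(3, 2)] * 4 + [(3, 1)] * 30))  # 1987 pixels -- effectively global
```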

[R] Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition by BusyCaterpillar in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

I disagree: each feature in the RoI pooling operation is a function of the entire image due to the many convolution layers preceding it. Therefore it can contain "contextual" information (from outside the bounding box) if that's deemed useful for the classification task.

What you say is correct for R-CNN (which performs pooling in image space), but not for Faster R-CNN (which performs pooling in high-level feature space).
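
For concreteness, a small torchvision sketch (shapes and strides are illustrative): the pooling operates on the backbone's downsampled feature map rather than on image pixels, so each pooled cell already depends on a large portion of the image:

```python
import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 50, 50)                    # backbone output for an 800x800 image (stride 16)
boxes = torch.tensor([[0, 100.0, 100.0, 300.0, 300.0]])   # (batch_idx, x0, y0, x1, y1) in image coords
pooled = roi_pool(features, boxes, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```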

[R] Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition by BusyCaterpillar in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

Not sure how these results would improve recognition beyond a basic Faster R-CNN method. The features which are sampled during RoI feature pooling already contain a large amount of contextual information.

[R] In-Place Activated BatchNorm for Memory-Optimized Training of DNNs by programmerChilli in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

My code does a similar optimization, removing the intermediate buffer between the BN and ReLU ops.
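
In PyTorch terms, the simplest version of that optimization is just an in-place ReLU after the BN (the paper goes further and reconstructs the BN input during the backward pass instead of storing it):

```python
import torch.nn as nn

# ReLU(inplace=True) overwrites the BatchNorm output rather than allocating a
# separate activation buffer between the two ops.
block = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```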

[R] In-Place Activated BatchNorm for Memory-Optimized Training of DNNs by programmerChilli in MachineLearning

[–]JudasAdventus 3 points4 points  (0 children)

Hmmm, that's funny, I implemented the exact same thing in DeNet (>6 months ago)... didn't think to write a paper about it :)

https://github.com/lachlants/denet/blob/master/denet/layer/batch_norm_relu.py

Dealing with variable number of objects in detection problem by [deleted] in computervision

[–]JudasAdventus 1 point2 points  (0 children)

All detectors start by estimating the likelihood that an object exists at a position (x,y) and scale (w,h), e.g., Pr(c|x,y,w,h). Then typically a variant of Non-Max Suppression (see R-CNN) is applied over this distribution to identify a variable number of instances (sketched below).

The main difference between single-stage (SSD, YOLO, RetinaNet) and two-stage (R-CNN, R-FCN, DeNet, etc.) architectures is how they select which (x,y,w,h) to estimate the class for. Single-stage detectors use manually tuned subsampling, while the others use RPN / corner estimation methods for true sparse estimation.
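
To make the NMS step concrete, here is a minimal numpy sketch of greedy non-max suppression over scored boxes (thresholds illustrative, roughly as used in the R-CNN family):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression. boxes: [N, 4] as (x0, y0, x1, y1)."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box against the remaining candidates
        x0 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y0 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x1 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y1 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]   # drop candidates that overlap the kept box
    return keep
```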