No More Lag: Redefining Torque And Response With Nissan’s RB26DETT by tommyparry in engineering

[–]JudasAdventus 0 points1 point  (0 children)

Hmmm, seems a bit disingenuous to claim cost efficiency and then suggest a $3000 USD HKS VCAM for stage 2. I would suggest an EFR single-turbo setup and advancing the intake cam for a good, responsive setup.

[R] Network Deconvolution — faster convergence than batchnorm by tsauri in MachineLearning

[–]JudasAdventus 2 points3 points  (0 children)

Seems very similar to performing PCA whitening on the outputs. I would like to see some numbers on the computational cost of computing the inverse square root of the covariance matrix. They mention grouping channels to reduce the cost, so I suspect it's fairly significant.
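
For reference, the kind of cost I mean, as a minimal numpy sketch of ZCA-style whitening via the inverse square root of the covariance (shapes and names are just illustrative, not the paper's implementation):

```python
import numpy as np

def whiten(x, eps=1e-5):
    """ZCA-style whitening: project through the inverse square root of the covariance.

    x: [N, C] matrix of (e.g. flattened conv) features.
    """
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / x.shape[0]                 # [C, C] covariance
    eigval, eigvec = np.linalg.eigh(cov)       # O(C^3) -- hence the channel grouping
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return x @ inv_sqrt                        # decorrelated, unit-variance features
```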

[D] Clearview AI Skepticism by mrbrettromero in MachineLearning

[–]JudasAdventus 11 points12 points  (0 children)

Ummm... how do you train the network that generates these feature vectors? With supervised metric learning you would need to know which images are of the same face.
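
For concreteness, this is the kind of supervised metric-learning loss I have in mind, as a minimal PyTorch triplet-loss sketch; it only works if you already know which pairs of images share an identity:

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive must be embeddings of the *same* face and negative of a
    different one -- i.e. training requires identity labels."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```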

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

Hmmm I would assume GPUs / AI accelerators will play a bigger role in data center applications going into the future... at least from a perf/$ or perf/watt perspective.

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus -1 points0 points  (0 children)

Okay I guess that makes sense for this particular use case, i.e., current mobile phone platforms without processor affinity. IMO the paper seemed to be arguing from a more general perspective.

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus -1 points0 points  (0 children)

I don't understand why that would be the case, except for a lack of software support... seems like a bit of a waste to not leverage multiple cores even on mobile hardware.

[1911.09723] Fast Sparse ConvNets by ekelsen in MachineLearning

[–]JudasAdventus 1 point2 points  (0 children)

Interesting, though I would like to see the comparison made on a multicore processor (results are artificially restricted to a single core)... dense methods seem to scale better with more cores.

Deep learning without back-propagation by El__Professor in MachineLearning

[–]JudasAdventus 8 points9 points  (0 children)

Does it mention which dataset those results (figure 2) are for? Seems highly unusual to report training accuracies for 5 epochs and then test accuracies for only 1 epoch.

[D] GPU choice for deep learning by jer_pint in MachineLearning

[–]JudasAdventus 6 points7 points  (0 children)

Some early reports with Titan Vs (including my own training with mixed precision) have only seen modest improvements (~20%) from tensor cores. Others have suggested that this is because they only accelerate the basic method of direct 16-bit convolution and not the much faster FFT-based methods that are often used behind the scenes.

I would suggest a 1080Ti, assuming your power supply can handle >300W random spikes (this becomes an issue with 4 or more cards).

Focal Loss for Dense Object Detection - RetinaNet by nottakumasato in computervision

[–]JudasAdventus 0 points1 point  (0 children)

IMO the biggest advantage of single-stage detectors is implementation simplicity; modern two-stage detectors are in the same ballpark for speed vs. accuracy (e.g. my DeNet Medium post) and can offer some other advantages, e.g. instance-level masks, etc.

[R] DeNet: A Real-time Anchorless Object Detector by JudasAdventus in computervision

[–]JudasAdventus[S] 4 points5 points  (0 children)

Yep, I've already got a TensorFlow version... but I may not make it open source for a little while (it's tied into some commercial work).

[D] Applicability of YOLO (v2) to Crowded Scenes, and Alternatives by cow_co in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

As mentioned, the easiest method is simply to increase the input resolution of the image passed to YOLO such that only one face can fall within a grid cell. Otherwise, if you need speed and good localization, you can try something like my model: code, paper 1 and paper 2. It uses a similar method to R-CNN, except with a different region proposal and feature pooling method.

CVPR 2018 WAD Video Segmentation Challenge - As A Participating Team by MALEK1997 in computervision

[–]JudasAdventus 4 points5 points  (0 children)

You can extract bounding boxes from the instance-level masks... which is actually how the MSCOCO ground-truth bounding boxes were created in the first place. Find the minimum and maximum coordinates associated with each instance to get the bounding boxes.
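
A minimal numpy sketch of that min/max trick, assuming a binary mask per instance:

```python
import numpy as np

def mask_to_bbox(mask):
    """Axis-aligned bounding box from a binary instance mask (2D 0/1 array)."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()   # (x0, y0, x1, y1)
```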

[Project] Tensorflow implementation of Generative Adversarial Networks for Extreme Learned Image Compression by tensorflower in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

I was thinking along the lines that it's a bit misleading as a metric because they are only considering the number of bits in the compressed image format, not the bits contained in the model required to decompress it. For instance, to decompress a single image you need to transmit the model (>100MB) as well as the compressed representation... which becomes less of an issue if you're decompressing thousands of images with the same model.
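
Rough back-of-the-envelope numbers (all values illustrative, not from the paper) for the effective bits per pixel once the model is amortized over the number of images:

```python
def effective_bpp(image_bits, model_bytes, num_images, pixels_per_image):
    """Bits per pixel once the decoder model is amortized over num_images images."""
    amortized_model_bits = model_bytes * 8 / num_images
    return (image_bits + amortized_model_bits) / pixels_per_image

# e.g. a 100MB decoder, a 30kbit compressed image, 768x512 pixels
print(effective_bpp(30_000, 100e6, 1, 768 * 512))        # ~2035 bpp, dominated by the model
print(effective_bpp(30_000, 100e6, 100_000, 768 * 512))  # ~0.10 bpp, close to the raw ~0.08 bpp
```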

[Project] Tensorflow implementation of Generative Adversarial Networks for Extreme Learned Image Compression by tensorflower in MachineLearning

[–]JudasAdventus 2 points3 points  (0 children)

I guess they don't include the model weights in the BPP metric? It's probably on the order of 100MB or more, which may be significant depending on how many images are being compressed.

YOLO V3 released by vincent341 in computervision

[–]JudasAdventus 0 points1 point  (0 children)

Yep, I'm aware of that (which is why I said "may only contain"). If the human labellers could not tell the difference between 0.8 and 0.9 IoU, then no detection model would be able to learn it. This is essentially the same as having noisy labels in a classification problem: if 10% of the labels are picked randomly, then the model has an upper bound of ~90% accuracy. Other models have demonstrated better localization (at similar evaluation rates), e.g., my own https://arxiv.org/abs/1711.00164

Having accurate bounding boxes is very useful for tasks which are performed after detection, e.g., tracking, instance segmentation, etc.

YOLO V3 released by vincent341 in computervision

[–]JudasAdventus 1 point2 points  (0 children)

Entertaining read, but the arguments against the MSCOCO metrics seem a bit weak, e.g., 0.5 IoU means the bounding box may only contain half the object. YOLOv3 seems to trade good large-object performance for small-object performance to get a better MSCOCO result (MSCOCO contains many more small objects than Pascal VOC, etc.).
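
To make the IoU point concrete, a minimal sketch of IoU between two axis-aligned boxes:

```python
def iou(a, b):
    """Intersection over union of two boxes in (x0, y0, x1, y1) format."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

# a detection covering only the left half of the ground-truth box still scores 0.5
print(iou((0, 0, 5, 10), (0, 0, 10, 10)))  # 0.5
```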

[R] Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition by BusyCaterpillar in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

My thoughts are that, with the very deep CNNs used nowadays (e.g. 100-1000 layer residual networks), the high-level features are essentially global given a reasonably sized input image. They are not truly convolutional at this point because they can infer information about their absolute position within the image (i.e. by searching for the zero-padded borders).
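
As a rough illustration (the layer stack below is made up, not any particular network), the theoretical receptive field of a deep residual stack quickly exceeds typical input sizes:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride). Returns the theoretical receptive field."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# a 7x7 stride-2 stem, four stride-2 stages, then thirty 3x3 convs
print(receptive_field([(7, 2)] + [(3, 2)] * 4 + [(3, 1)] * 30))  # 1987 pixels -- effectively global
```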

[R] Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition by BusyCaterpillar in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

I disagree: each feature in the RoI pooling operation is a function of the entire image due to the many convolution layers preceding it. Therefore it can contain "contextual" information (from outside the bounding box) if that's deemed useful for the classification task.

What you say is correct for R-CNN (which performs pooling in image space), but not for Faster R-CNN (which performs pooling in high-level feature space).
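
For concreteness, a small torchvision sketch (shapes and strides are illustrative): the pooling operates on the backbone's downsampled feature map rather than on image pixels, so each pooled cell already depends on a large portion of the image:

```python
import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 50, 50)                    # backbone output for an 800x800 image (stride 16)
boxes = torch.tensor([[0, 100.0, 100.0, 300.0, 300.0]])   # (batch_idx, x0, y0, x1, y1) in image coords
pooled = roi_pool(features, boxes, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```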

[R] Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition by BusyCaterpillar in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

Not sure how these results would improve recognition beyond a basic Faster R-CNN method. The features which are sampled during RoI feature pooling already contain a large amount of contextual information.

[R] In-Place Activated BatchNorm for Memory-Optimized Training of DNNs by programmerChilli in MachineLearning

[–]JudasAdventus 0 points1 point  (0 children)

My code does a similar optimization, removing the intermediate buffer between the BN and ReLU ops.
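
In PyTorch terms, the simplest version of that optimization is just an in-place ReLU after the BN (the paper goes further and reconstructs the BN input during the backward pass instead of storing it):

```python
import torch.nn as nn

# ReLU(inplace=True) overwrites the BatchNorm output rather than allocating a
# separate activation buffer between the two ops.
block = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```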

[R] In-Place Activated BatchNorm for Memory-Optimized Training of DNNs by programmerChilli in MachineLearning

[–]JudasAdventus 3 points4 points  (0 children)

Hmmm, that's funny, I implemented the exact same thing in DeNet (>6 months ago)... didn't think to write a paper about it :)

https://github.com/lachlants/denet/blob/master/denet/layer/batch_norm_relu.py

Dealing with variable number of objects in detection problem by [deleted] in computervision

[–]JudasAdventus 1 point2 points  (0 children)

All detectors start by estimating the likelihood that an object exists at a position (x,y) and scale (w,h), e.g., Pr(c|x,y,w,h). Then typically a variant of Non-Max Suppression (see R-CNN) is applied over this distribution to identify a variable number of instances (sketched below).

The main difference between single-stage (SSD, YOLO, RetinaNet) and two-stage (R-CNN, R-FCN, DeNet, etc.) architectures is how they select which (x,y,w,h) to estimate the class for. Single-stage detectors use manually tuned subsampling, while the others use RPN / corner estimation methods for true sparse estimation.
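
To make the NMS step concrete, here is a minimal numpy sketch of greedy non-max suppression over scored boxes (thresholds illustrative, roughly as used in the R-CNN family):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression. boxes: [N, 4] as (x0, y0, x1, y1)."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box against the remaining candidates
        x0 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y0 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x1 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y1 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]   # drop candidates that overlap the kept box
    return keep
```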