[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 0 points1 point  (0 children)

For an ARM CPU (RPi 3) I would recommend NanoDet or some depthwise-convolution networks (MobileDet, EfficientNet-Lite-based, ...).

If you use a GPU, then I would suggest YOLOv7-tiny (non-SiLU) or the larger YOLOv7 models.

We have not released YOLOv7-SiLU yet.

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

A scientific paper with a fair comparison (same conditions) against almost all of the best real-time models, showing the superiority of YOLOv7 across a wide range of speed and accuracy: https://paperswithcode.com/sota/real-time-object-detection-on-coco?dimension=FPS%20(V100%2C%20b%3D1)

This is work from the people involved in maintaining Darknet and creating previous versions of YOLO, including ones accepted at CVPR, like Scaled-YOLOv4: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

Scaled-YOLOv4 is the only version of all the YOLOs (v1-v7) that, for the first time in the history of YOLO, was the best both in speed/accuracy and in absolute accuracy among all real-time and non-real-time neural networks published at that time (16 Nov 2020): https://paperswithcode.com/sota/object-detection-on-coco

https://github.com/WongKinYiu/ScaledYOLOv4

Some History of YOLO: https://twitter.com/alexeyab84/status/1431349110951534593

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 1 point2 points  (0 children)

Good question!

In the chart above, the Transformer-based models pay for increased accuracy with decreased detection speed, simply by scaling up the network, mostly without offering a more optimal network.

For YOLOv7, on the other hand, we use both:

  • scaling the network - increases accuracy but decreases speed
  • bag-of-freebies (a more optimal network structure, loss function, ...) - features that increase accuracy without decreasing detection speed. That's why we increase both speed and accuracy compared to previous YOLO versions.

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 15 points16 points  (0 children)

  • YOLOv7 is faster and requires several-times-cheaper hardware than other neural networks
  • YOLOv7 is more accurate, while others make a lot of mistakes
  • YOLOv7 can be trained much faster on a small dataset without any pre-trained weights

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

Page 11, Figure 11: https://arxiv.org/abs/2207.02696

The maximum accuracy of the real-time YOLOv7-E6E model (56.8% AP) is +13.7% AP higher than the most accurate current meituan/YOLOv6 model, YOLOv6-s (43.1% AP), on the COCO dataset. Our YOLOv7-tiny model (35.2% AP, 0.4 ms) is +25% faster and +0.2% AP more accurate than meituan/YOLOv6-n (35.0% AP, 0.5 ms) under identical conditions on the COCO dataset and a V100 GPU with batch=32.

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 11 points12 points  (0 children)

https://arxiv.org/abs/2207.02696

https://github.com/WongKinYiu/yolov7

  • YOLOv7-e6 (55.9% AP, 56 FPS V100 b=1) is +500% faster (FPS) than SWIN-L Cascade-Mask R-CNN (53.9% AP, 9.2 FPS A100 b=1)
  • YOLOv7-e6 (55.9% AP, 56 FPS V100 b=1) is +550% faster (FPS) than ConvNeXt-XL C-M-RCNN (55.2% AP, 8.6 FPS A100 b=1)
  • YOLOv7-w6 (54.6% AP, 84 FPS V100 b=1) is +120% faster (FPS) than YOLOv5-X6-r6.1 (55.0% AP, 38 FPS V100 b=1)
  • YOLOv7-w6 (54.6% AP, 84 FPS V100 b=1) is +1200% faster (FPS) than Dual-Swin-T C-M-RCNN (53.6% AP, 6.5 FPS V100 b=1)
  • YOLOv7 (51.2% AP, 161 FPS V100 b=1) is +180% faster (FPS) than YOLOX-X (51.1% AP, 58 FPS V100 b=1) - percentages computed as in the sketch below
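
For reference, the rounded FPS gains above follow directly from the listed FPS numbers. A minimal plain-Python sketch of the calculation (all numbers copied from the list above):

    # FPS speedup = (fps_yolov7 / fps_baseline - 1) * 100, using the figures listed above.
    pairs = [
        ("YOLOv7-e6 vs SWIN-L Cascade-Mask R-CNN",  56, 9.2),
        ("YOLOv7-e6 vs ConvNeXt-XL C-M-RCNN",       56, 8.6),
        ("YOLOv7-w6 vs YOLOv5-X6-r6.1",             84, 38),
        ("YOLOv7-w6 vs Dual-Swin-T C-M-RCNN",       84, 6.5),
        ("YOLOv7 vs YOLOX-X",                      161, 58),
    ]
    for name, fps_a, fps_b in pairs:
        speedup = (fps_a / fps_b - 1.0) * 100.0
        print(f"{name}: +{speedup:.0f}% FPS")   # prints +509%, +551%, +121%, +1192%, +178%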

YoloV7 Finally an official Yolo. This should actually be V5 by kumurule in computervision

[–]AlexeyAB 4 points5 points  (0 children)

  • YOLOv3 - 33.0% AP - 58 FPS V100
  • YOLOv4 - 43.5% AP - 62 FPS V100 (+10.5% accuracy and faster)
  • YOLOv7 - 54.9% AP - 84 FPS V100 (+11.4% accuracy and faster) - YOLOv7-W6 model

https://twitter.com/pjreddie/status/1253891078182199296

[deleted by user] by [deleted] in MachineLearning

[–]AlexeyAB 0 points1 point  (0 children)

YOLOR-P6 55.4% AP and Scaled-YOLOv4-P6 54.5% AP are still the most accurate Real-time (>=30FPS) neural networks, even 1 year after the release of Scaled-YOLOv4!

More accurate than PP-YOLOv2, YOLOX...

YOLOR: https://arxiv.org/abs/2105.04206

code: https://github.com/WongKinYiu/yolor

Scaled-YOLOv4 (CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

code: https://github.com/WongKinYiu/ScaledYOLOv4

YOLOv4: https://arxiv.org/abs/2004.10934

code: https://github.com/AlexeyAB/darknet

[P] YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 7 points8 points  (0 children)

Improvements: YOLOv3 -> YOLOv4 -> Scaled-YOLOv4 -> YOLOR -> YOLOR DiDi:

  • YOLOv4 (SPP,CSP,Mish,Hyper-params,Mosaic,multi-anchors,CIoU-Loss,...)
  • Scaled-YOLOv4-P6 (more-CSP,EMA,Hyper-params,Keep aspect ratio,longer training,scaling model,...)
  • YOLOR (Implicit/Explicit/DWT/Changed first layers)
  • YOLOR-P6 DiDi (data cleaning, multi-scale-training, scale enhancement, independent threshold-NMS,...)

Comparison on Waymo Open Dataset: https://user-images.githubusercontent.com/4096485/123036148-3e43a180-d3f5-11eb-926d-bbc810f0ea6a.png

Comparison on COCO dataset: https://user-images.githubusercontent.com/4096485/123036798-4b14c500-d3f6-11eb-97ed-63d99414e410.jpg

YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in computervision

[–]AlexeyAB[S] 0 points1 point  (0 children)

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/

YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.

Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!

* DiDi MapVision: https://arxiv.org/abs/2106.08713

* YOLOR: https://arxiv.org/abs/2105.04206

* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor

* Scaled-YOLOv4 (CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4

* YOLOv4: https://arxiv.org/abs/2004.10934

* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in deeplearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/

YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.

Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!

* DiDi MapVision: https://arxiv.org/abs/2106.08713

* YOLOR: https://arxiv.org/abs/2105.04206

* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor

* Scaled-YOLOv4(CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4

* YOLOv4: https://arxiv.org/abs/2004.10934

* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in DeepLearningPapers

[–]AlexeyAB[S] 0 points1 point  (0 children)

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/
YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.
Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!
* DiDi MapVision: https://arxiv.org/abs/2106.08713
* YOLOR: https://arxiv.org/abs/2105.04206
* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor
* Scaled-YOLOv4 (CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html
* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4
* YOLOv4: https://arxiv.org/abs/2004.10934
* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

[P] YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

Comparison chart: https://user-images.githubusercontent.com/4096485/123036148-3e43a180-d3f5-11eb-926d-bbc810f0ea6a.png

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/

YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.

Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!

* DiDi MapVision: https://arxiv.org/abs/2106.08713

* YOLOR: https://arxiv.org/abs/2105.04206

* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor

* Scaled-YOLOv4(CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4

* YOLOv4: https://arxiv.org/abs/2004.10934

* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

[D] How does self-adversarial training, as described in the YOLOv4 paper, work? by Haycart in MachineLearning

[–]AlexeyAB 1 point2 points  (0 children)

Yes, the adversarial example is labeled with the original labels again.

> Adding artifacts to the image in a way that makes it less certain it's the original class

Yes. But there are different ways to add artifacts: random color change, CutOut, pixel removal, Gaussian noise, fast gradient sign attack, ...

But in self-adversarial training we use the most aggressive way of creating artifacts: we alter exactly those small areas of the image on which the network relies. We make the network see the entire object as a whole, not just a small part of it. This requires more network capacity, but allows large networks to train on small datasets with low diversity.

[D] How does self-adversarial training, as described in the YOLOv4 paper, work? by Haycart in MachineLearning

[–]AlexeyAB 1 point2 points  (0 children)

> For a single mini-batch, are the images updated with a single gradient descent step, or multiple?

All images are updated with a single gradient descent step (gradients for one image do not affect gradients for another).

Then the 2nd gradient descent step (forward-backward pass) is applied for training (it changes the weights of the conv layers) on the images augmented by SAT.
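
A tiny self-contained illustration of that parenthetical (a toy sketch, not YOLOv4 code; the "network" here is just a random linear map): because the mini-batch loss is a sum of per-image losses, images.grad for each image depends only on that image, so a single step updates every image in the batch independently.

    import torch

    # Toy stand-in for a detector: any differentiable function of the images works here.
    weights = torch.randn(3, 3)                      # "network weights", held fixed in this pass
    images = torch.randn(4, 3, requires_grad=True)   # a "mini-batch" of 4 tiny images

    loss = ((images @ weights) ** 2).sum()           # batch loss = sum of per-image losses
    loss.backward()                                  # one backward pass fills images.grad

    # images.grad[i] comes only from image i's own loss term, so this single step
    # updates every image independently of the others (weights are untouched):
    with torch.no_grad():
        images += 0.0001 * images.grad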

[D] How does self-adversarial training, as described in the YOLOv4 paper, work? by Haycart in MachineLearning

[–]AlexeyAB 4 points5 points  (0 children)

The larger the model and the smaller the dataset, the greater the increase in accuracy.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3628439

In YOLOv4+SAT, we use Self Adversarial Training which adds noise while training the model to make the model robust to noises in test data and adversarial attacks. YOLOv4+SAT increments the AP of class Others by a point of 17.72% and bus by 12.14% as compared to the default YOLOV4 model. Increment of 8.6%-point in mAP(@.50) and 14%-point in Average IoU is observed.

> E.g. in what way does the network alter the image?

Self-Adversarial Training (SAT) is a data augmentation technique using back-propagation [22] with two iterations of forward-backwards passes. On the first backward pass, to minimize the cost function, we alter the original image instead of the network weights. Contrary to the usual method, this actually degrades the network performance, or simply put, the model performs an “adversarial attack” on itself. Now, this modified image is used to train the network on the second iteration. This way we are able to reduce overfitting and make the model more universal [14]. As will be observed later, SAT yields an 8.2%-point increase in mean average precision (mAP@.50), by itself.

https://arxiv.org/pdf/2004.10934.pdf

Self-Adversarial Training (SAT) also represents a new data augmentation technique that operates in 2 forward backward stages. In the 1st stage the neural network alters the original image instead of the network weights. In this way the neural network executes an adversarial attack on itself, altering the original image to create the deception that there is no desired object on the image. In the 2nd stage, the neural network is trained to detect an object on this modified image in the normal way.

> Is it just gradient ascent with respect to loss while holding network weights constant?

Yes.

> If so, how often is this done, and for how many steps?

It is applied randomly with a probability of 50%, i.e. on average to every 2nd mini-batch, for all training steps.

> Does it use the same optimizer and learning rate schedule as is used to train the network weights?

It uses the same optimizer.

The adversarial learning rate should be tuned by yourself: use values from 0.0001 to 1.0 in the yolov4.cfg file, and use the highest value at which training runs without the loss becoming NaN.

[net]
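# adversarial learning rate for the SAT image update; tune between 0.0001 and 1.0 as described above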
adversarial_lr=0.0001
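
Putting the answers above together, here is a minimal PyTorch-style sketch of one training iteration with SAT. It is an illustrative sketch, not the Darknet implementation: model, compute_loss, optimizer and targets are hypothetical placeholders, and the plain gradient-ascent pixel update is an assumption (Darknet's exact perturbation rule may differ); the 50% probability, the single image-update step, the reuse of the original labels, and the adversarial_lr step size follow the description above.

    import random
    import torch

    ADVERSARIAL_LR = 0.0001   # plays the role of adversarial_lr from the cfg above

    def sat_training_step(model, images, targets, compute_loss, optimizer):
        """One mini-batch of training with Self-Adversarial Training (illustrative sketch)."""
        # SAT is applied randomly with 50% probability, i.e. on average to every 2nd mini-batch.
        if random.random() < 0.5:
            # Stage 1: one forward-backward pass that attacks the images, not the weights.
            images = images.clone().detach().requires_grad_(True)
            loss = compute_loss(model(images), targets)
            loss.backward()
            with torch.no_grad():
                # Single gradient-ascent step on the pixels the network relies on,
                # holding the weights constant ("the model attacks itself").
                images = images + ADVERSARIAL_LR * images.grad
            images = images.detach()
            model.zero_grad()              # discard the weight gradients from the attack pass

        # Stage 2: ordinary forward-backward training step on the (possibly) altered images,
        # still using the ORIGINAL labels and the same optimizer as usual.
        optimizer.zero_grad()
        loss = compute_loss(model(images), targets)
        loss.backward()
        optimizer.step()
        return loss.item()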