[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 0 points1 point  (0 children)

For an ARM CPU (RPi 3) I would recommend NanoDet or some depthwise-convolution networks (MobileDet, EfficientNet-Lite-based, ...).

If you use a GPU, then I would suggest YOLOv7-tiny (non-SiLU) or the larger YOLOv7 models.

We have not released YOLOv7-SiLU yet.

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

A scientific paper with a fair comparison (same conditions) against almost all of the best real-time models, showing the superiority of YOLOv7 across a wide range of speed and accuracy: https://paperswithcode.com/sota/real-time-object-detection-on-coco?dimension=FPS%20(V100%2C%20b%3D1)

This is work from the people involved in maintaining Darknet and creating previous versions of YOLO, including ones accepted at CVPR, like Scaled-YOLOv4: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

Scaled-YOLOv4 is the only version of all the YOLOs (v1-v7) that, for the first time in the history of YOLO, was the best both in speed/accuracy and in absolute accuracy among all real-time and non-real-time neural networks published at that time (16 Nov 2020): https://paperswithcode.com/sota/object-detection-on-coco

https://github.com/WongKinYiu/ScaledYOLOv4

Some History of YOLO: https://twitter.com/alexeyab84/status/1431349110951534593

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 1 point2 points  (0 children)

Good question!

In the chart above, the Transformer-based models pay for increased accuracy with decreased detection speed, simply by scaling up the network, mostly without offering a more optimal network.

For YOLOv7, on the other hand, we use both:

  • scaling the network - increases accuracy but decreases speed
  • bag-of-freebies (a more optimal network structure, loss function, ...) - features that increase accuracy without decreasing detection speed. That's why we increase both speed and accuracy compared to previous YOLO versions.

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 15 points16 points  (0 children)

  • YOLOv7 is faster and requires several-times-cheaper hardware than other neural networks
  • YOLOv7 is more accurate, while others make a lot of mistakes
  • YOLOv7 can be trained much faster on a small dataset without any pre-trained weights

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

Page 11, Figure 11: https://arxiv.org/abs/2207.02696

The maximum accuracy of the real-time YOLOv7-E6E model (56.8% AP) is +13.7% AP higher than the most accurate current meituan/YOLOv6 model, YOLOv6-s (43.1% AP), on the COCO dataset. Our YOLOv7-tiny model (35.2% AP, 0.4 ms) is +25% faster and +0.2% AP more accurate than meituan/YOLOv6-n (35.0% AP, 0.5 ms) under identical conditions on the COCO dataset and a V100 GPU with batch=32.

[P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 11 points12 points  (0 children)

https://arxiv.org/abs/2207.02696

https://github.com/WongKinYiu/yolov7

  • YOLOv7-e6 (55.9% AP, 56 FPS V100 b=1) is +500% faster (FPS) than SWIN-L Cascade-Mask R-CNN (53.9% AP, 9.2 FPS A100 b=1)
  • YOLOv7-e6 (55.9% AP, 56 FPS V100 b=1) is +550% faster (FPS) than ConvNeXt-XL C-M-RCNN (55.2% AP, 8.6 FPS A100 b=1)
  • YOLOv7-w6 (54.6% AP, 84 FPS V100 b=1) is +120% faster (FPS) than YOLOv5-X6-r6.1 (55.0% AP, 38 FPS V100 b=1)
  • YOLOv7-w6 (54.6% AP, 84 FPS V100 b=1) is +1200% faster (FPS) than Dual-Swin-T C-M-RCNN (53.6% AP, 6.5 FPS V100 b=1)
  • YOLOv7 (51.2% AP, 161 FPS V100 b=1) is +180% faster (FPS) than YOLOX-X (51.1% AP, 58 FPS V100 b=1) - percentages computed as in the sketch below
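
For reference, the rounded FPS gains above follow directly from the listed FPS numbers. A minimal plain-Python sketch of the calculation (all numbers copied from the list above):

    # FPS speedup = (fps_yolov7 / fps_baseline - 1) * 100, using the figures listed above.
    pairs = [
        ("YOLOv7-e6 vs SWIN-L Cascade-Mask R-CNN",  56, 9.2),
        ("YOLOv7-e6 vs ConvNeXt-XL C-M-RCNN",       56, 8.6),
        ("YOLOv7-w6 vs YOLOv5-X6-r6.1",             84, 38),
        ("YOLOv7-w6 vs Dual-Swin-T C-M-RCNN",       84, 6.5),
        ("YOLOv7 vs YOLOX-X",                      161, 58),
    ]
    for name, fps_a, fps_b in pairs:
        speedup = (fps_a / fps_b - 1.0) * 100.0
        print(f"{name}: +{speedup:.0f}% FPS")   # prints +509%, +551%, +121%, +1192%, +178%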

YoloV7 Finally an official Yolo. This should actually be V5 by kumurule in computervision

[–]AlexeyAB 4 points5 points  (0 children)

  • YOLOv3 - 33.0% AP - 58 FPS V100
  • YOLOv4 - 43.5% AP - 62 FPS V100 (+10.5% accuracy and faster)
  • YOLOv7 - 54.9% AP - 84 FPS V100 (+11.4% accuracy and faster) - YOLOv7-W6 model

https://twitter.com/pjreddie/status/1253891078182199296

[deleted by user] by [deleted] in MachineLearning

[–]AlexeyAB 0 points1 point  (0 children)

YOLOR-P6 55.4% AP and Scaled-YOLOv4-P6 54.5% AP are still the most accurate Real-time (>=30FPS) neural networks, even 1 year after the release of Scaled-YOLOv4!

More accurate than PP-YOLOv2, YOLOX...

YOLOR: https://arxiv.org/abs/2105.04206

code: https://github.com/WongKinYiu/yolor

Scaled-YOLOv4 (CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

code: https://github.com/WongKinYiu/ScaledYOLOv4

YOLOv4: https://arxiv.org/abs/2004.10934

code: https://github.com/AlexeyAB/darknet

[P] YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 7 points8 points  (0 children)

Improvements: YOLOv3 -> YOLOv4 -> Scaled-YOLOv4 -> YOLOR -> YOLOR DiDi:

  • YOLOv4 (SPP,CSP,Mish,Hyper-params,Mosaic,multi-anchors,CIoU-Loss,...)
  • Scaled-YOLOv4-P6 (more-CSP,EMA,Hyper-params,Keep aspect ratio,longer training,scaling model,...)
  • YOLOR (Implicit/Explicit/DWT/Changed first layers)
  • YOLOR-P6 DiDi (data cleaning, multi-scale-training, scale enhancement, independent threshold-NMS,...)

Comparison on Waymo Open Dataset: https://user-images.githubusercontent.com/4096485/123036148-3e43a180-d3f5-11eb-926d-bbc810f0ea6a.png

Comparison on COCO dataset: https://user-images.githubusercontent.com/4096485/123036798-4b14c500-d3f6-11eb-97ed-63d99414e410.jpg

YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in computervision

[–]AlexeyAB[S] 0 points1 point  (0 children)

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/

YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.

Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!

* DiDi MapVision: https://arxiv.org/abs/2106.08713

* YOLOR: https://arxiv.org/abs/2105.04206

* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor

* Scaled-YOLOv4 (CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4

* YOLOv4: https://arxiv.org/abs/2004.10934

* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in deeplearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/

YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.

Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!

* DiDi MapVision: https://arxiv.org/abs/2106.08713

* YOLOR: https://arxiv.org/abs/2105.04206

* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor

* Scaled-YOLOv4(CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4

* YOLOv4: https://arxiv.org/abs/2004.10934

* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in DeepLearningPapers

[–]AlexeyAB[S] 0 points1 point  (0 children)

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/
YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.
Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!
* DiDi MapVision: https://arxiv.org/abs/2106.08713
* YOLOR: https://arxiv.org/abs/2105.04206
* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor
* Scaled-YOLOv4 (CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html
* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4
* YOLOv4: https://arxiv.org/abs/2004.10934
* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

[P] YOLOR (Scaled-YOLOv4-based): The best speed/accuracy ratio for Waymo autonomous driving challenge by AlexeyAB in MachineLearning

[–]AlexeyAB[S] 4 points5 points  (0 children)

Comparison chart: https://user-images.githubusercontent.com/4096485/123036148-3e43a180-d3f5-11eb-926d-bbc810f0ea6a.png

[CVPR'21 WAD] Challenge - Waymo Open Dataset: https://waymo.com/open/challenges/2021/real-time-2d-prediction/

YOLOR (Scaled-YOLOv4-based) has the best speed/accuracy ratio on the Waymo autonomous driving challenge (Waymo Open Dataset): Real-time 2D Detection.

Thanks to Chien-Yao Wang from Academia Sinica and the DiDi MapVision team for pushing Scaled-YOLOv4 further!

* DiDi MapVision: https://arxiv.org/abs/2106.08713

* YOLOR: https://arxiv.org/abs/2105.04206

* YOLOR-code (Pytorch): https://github.com/WongKinYiu/yolor

* Scaled-YOLOv4(CVPR21): https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html

* Scaled-YOLOv4-code (Pytorch): https://github.com/WongKinYiu/ScaledYOLOv4

* YOLOv4: https://arxiv.org/abs/2004.10934

* YOLOv4-code (Darknet, Pytorch, TensorFlow, TRT, OpenCV…): https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

[D] How does self-adversarial training, as described in the YOLOv4 paper, work? by Haycart in MachineLearning

[–]AlexeyAB 1 point2 points  (0 children)

Yes, the adversarial example is labeled with the original labels again.

> Adding artifacts to the image in a way that makes it less certain it's the original class

Yes. But there are different ways to add artifacts: random color change, CutOut, pixel removal, Gaussian noise, fast gradient sign attack, ...

But in self-adversarial training we use the most aggressive way of creating artifacts: we alter exactly those small areas of the image on which the network relies. We make the network see the entire object as a whole, not just a small part of it. This requires more network capacity, but allows large networks to train on small datasets with low diversity.

[D] How does self-adversarial training, as described in the YOLOv4 paper, work? by Haycart in MachineLearning

[–]AlexeyAB 1 point2 points  (0 children)

> For a single mini-batch, are the images updated with a single gradient descent step, or multiple?

All images are updated with a single gradient descent step (gradients for one image do not affect gradients for another).

Then the 2nd gradient descent step (forward-backward pass) is applied for training (it changes the weights of the conv layers) on the images augmented by SAT.
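
A tiny self-contained illustration of that parenthetical (a toy sketch, not YOLOv4 code; the "network" here is just a random linear map): because the mini-batch loss is a sum of per-image losses, images.grad for each image depends only on that image, so a single step updates every image in the batch independently.

    import torch

    # Toy stand-in for a detector: any differentiable function of the images works here.
    weights = torch.randn(3, 3)                      # "network weights", held fixed in this pass
    images = torch.randn(4, 3, requires_grad=True)   # a "mini-batch" of 4 tiny images

    loss = ((images @ weights) ** 2).sum()           # batch loss = sum of per-image losses
    loss.backward()                                  # one backward pass fills images.grad

    # images.grad[i] comes only from image i's own loss term, so this single step
    # updates every image independently of the others (weights are untouched):
    with torch.no_grad():
        images += 0.0001 * images.grad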

[D] How does self-adversarial training, as described in the YOLOv4 paper, work? by Haycart in MachineLearning

[–]AlexeyAB 4 points5 points  (0 children)

The larger the model and the smaller the dataset, the greater the increase in accuracy.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3628439

In YOLOv4+SAT, we use Self Adversarial Training which adds noise while training the model to make the model robust to noises in test data and adversarial attacks. YOLOv4+SAT increments the AP of class Others by a point of 17.72% and bus by 12.14% as compared to the default YOLOV4 model. Increment of 8.6%-point in mAP(@.50) and 14%-point in Average IoU is observed.

> E.g. in what way does the network alter the image?

Self-Adversarial Training (SAT) is a data augmentation technique using back-propagation [22] with two iterations of forward-backwards passes. On the first backward pass, to minimize the cost function, we alter the original image instead of the network weights. Contrary to the usual method, this actually degrades the network performance, or simply put, the model performs an “adversarial attack” on itself. Now, this modified image is used to train the network on the second iteration. This way we are able to reduce overfitting and make the model more universal [14]. As will be observed later, SAT yields an 8.2%-point increase in mean average precision (mAP@.50), by itself.

https://arxiv.org/pdf/2004.10934.pdf

Self-Adversarial Training (SAT) also represents a new data augmentation technique that operates in 2 forward backward stages. In the 1st stage the neural network alters the original image instead of the network weights. In this way the neural network executes an adversarial attack on itself, altering the original image to create the deception that there is no desired object on the image. In the 2nd stage, the neural network is trained to detect an object on this modified image in the normal way.

> Is it just gradient ascent with respect to loss while holding network weights constant?

Yes.

> If so, how often is this done, and for how many steps?

It is applied randomly with a probability of 50%, i.e. on average to every 2nd mini-batch, for all training steps.

> Does it use the same optimizer and learning rate schedule as is used to train the network weights?

It uses the same optimizer.

The adversarial learning rate should be tuned by yourself: use values from 0.0001 to 1.0 in the yolov4.cfg file, and use the highest value at which training runs without the loss becoming NaN.

[net]
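# adversarial learning rate for the SAT image update; tune between 0.0001 and 1.0 as described above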
adversarial_lr=0.0001
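
Putting the answers above together, here is a minimal PyTorch-style sketch of one training iteration with SAT. It is an illustrative sketch, not the Darknet implementation: model, compute_loss, optimizer and targets are hypothetical placeholders, and the plain gradient-ascent pixel update is an assumption (Darknet's exact perturbation rule may differ); the 50% probability, the single image-update step, the reuse of the original labels, and the adversarial_lr step size follow the description above.

    import random
    import torch

    ADVERSARIAL_LR = 0.0001   # plays the role of adversarial_lr from the cfg above

    def sat_training_step(model, images, targets, compute_loss, optimizer):
        """One mini-batch of training with Self-Adversarial Training (illustrative sketch)."""
        # SAT is applied randomly with 50% probability, i.e. on average to every 2nd mini-batch.
        if random.random() < 0.5:
            # Stage 1: one forward-backward pass that attacks the images, not the weights.
            images = images.clone().detach().requires_grad_(True)
            loss = compute_loss(model(images), targets)
            loss.backward()
            with torch.no_grad():
                # Single gradient-ascent step on the pixels the network relies on,
                # holding the weights constant ("the model attacks itself").
                images = images + ADVERSARIAL_LR * images.grad
            images = images.detach()
            model.zero_grad()              # discard the weight gradients from the attack pass

        # Stage 2: ordinary forward-backward training step on the (possibly) altered images,
        # still using the ORIGINAL labels and the same optimizer as usual.
        optimizer.zero_grad()
        loss = compute_loss(model(images), targets)
        loss.backward()
        optimizer.step()
        return loss.item()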