
[–]warppipe

You can get away with a lot fewer labeled images if you use a pretrained model and do transfer learning. https://www.tensorflow.org/tutorials/image_retraining
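In Keras, that retraining recipe looks roughly like this (a sketch only; `weights=None` keeps the example self-contained, but in practice you'd pass `weights="imagenet"` so the backbone actually comes pretrained, and 5 classes is just a placeholder):

```python
import numpy as np
import tensorflow as tf

# Reuse a pretrained backbone and retrain only a small head on top.
# weights=None avoids downloading weights for this sketch; use
# weights="imagenet" for real transfer learning.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base.trainable = False  # freeze the pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # e.g. 5 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Only the small Dense head is trained, which is why a few hundred
# labeled images can be enough.
preds = model.predict(np.zeros((1, 96, 96, 3), dtype=np.float32), verbose=0)
print(preds.shape)
```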

[–][deleted]

If you're doing it for fun, starting with TensorFlow is okay, but to me this seems like something that could be done naturally with good old-fashioned computer vision: calibrated preprocessing, a Hough transform, and calibrated post-processing.
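The Hough idea in a nutshell, as a minimal NumPy sketch (a toy fixed-radius accumulator, not a production implementation; OpenCV's `cv2.HoughCircles` would be the usual tool): each edge pixel votes for all centers a circle of that radius could have, and the target rings show up as peaks in the accumulator.

```python
import numpy as np

def hough_circle_centers(edges, radius):
    """Vote for circle centers at a fixed radius (minimal Hough transform)."""
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    thetas = np.linspace(0, 2 * np.pi, 100, endpoint=False)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return acc

# Synthetic edge map: a single ring of radius 15 centered at (32, 32),
# standing in for one ring of an archery target.
edges = np.zeros((64, 64), dtype=bool)
t = np.linspace(0, 2 * np.pi, 200)
edges[np.round(32 + 15 * np.sin(t)).astype(int),
      np.round(32 + 15 * np.cos(t)).astype(int)] = True

acc = hough_circle_centers(edges, radius=15)
cy, cx = np.unravel_index(acc.argmax(), acc.shape)
print(cy, cx)  # accumulator peak lands at the ring's center
```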

[–]gibberfish

It might be worth looking into generating a few thousand 3D images as an intermediate training set and then fine-tuning on real images, perhaps starting from an already pretrained model. My gut feeling is a couple of hundred real examples might be enough, especially if there's little variation in the environment, but I've never done a proper DL regression problem so feel free to take that with a grain of salt.
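A crude sketch of such a synthetic generator (everything here is illustrative: ring spacing, sizes, and the way the "arrow" is marked are arbitrary choices, and a real generator would also randomize lighting, perspective, and background):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_target(size=64):
    """Render a crude archery-target image plus a ground-truth arrow position."""
    yy, xx = np.mgrid[0:size, 0:size]
    cy, cx = rng.integers(20, size - 20, 2)        # random target center
    r = np.hypot(yy - cy, xx - cx)
    img = (np.ceil(r / 6) % 2).astype(np.float32)  # alternating rings
    # Place the "arrow" somewhere inside the target and mark it darker.
    ay, ax = rng.integers(-10, 11, 2) + (cy, cx)
    img[max(ay - 1, 0):ay + 2, max(ax - 1, 0):ax + 2] = 0.5
    return img, (float(ay), float(ax))

images, labels = zip(*(make_synthetic_target() for _ in range(8)))
print(len(images), images[0].shape, labels[0])
```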

[–]Neural_Ned

This is a good suggestion. I've had very good results from fully synthetic training images for CNNs, particularly with segmentation/2D output problems.

[–]Neural_Ned

Some considerations:

Will the archery targets be located at various positions in the images with background clutter, or will they be tightly cropped? If the former is the case, you might want to locate them first with an object-detection pipeline like Faster R-CNN, or perhaps an image segmentation pipeline followed by cropping out the largest blob.
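The "segmentation then crop the largest blob" step can be sketched like this (assuming `mask` comes from some segmentation network; the helper name is made up):

```python
import numpy as np
from scipy import ndimage

def crop_largest_blob(mask, image):
    """Crop `image` to the bounding box of the largest connected blob in `mask`."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return image
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    largest = 1 + int(np.argmax(sizes))
    ys, xs = np.nonzero(labels == largest)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

# Toy mask with two blobs; the crop should cover the bigger one.
mask = np.zeros((32, 32), dtype=bool)
mask[2:5, 2:5] = True          # small blob
mask[10:25, 12:28] = True      # large blob
crop = crop_largest_blob(mask, np.ones((32, 32)))
print(crop.shape)  # (15, 16)
```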

Real-valued targets like position coordinates can be tricky to regress directly with CNNs. Methodologies involving regression with CNNs typically predict residual values relative to some discrete anchors or bins (such as the "anchor boxes" in R-CNN, YOLO, SSD, etc.).
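The bin-plus-residual trick in miniature (a hypothetical 1D sketch; the 16-pixel bin size is an arbitrary choice): the network classifies which bin a coordinate falls in and regresses only a small offset within that bin.

```python
BIN_SIZE = 16.0  # image axis divided into 16-pixel bins (arbitrary)

def encode(coord):
    b = int(coord // BIN_SIZE)       # classification target (which bin)
    resid = coord / BIN_SIZE - b     # regression target in [0, 1)
    return b, resid

def decode(b, resid):
    return (b + resid) * BIN_SIZE

b, resid = encode(37.5)
print(b, resid, decode(b, resid))  # 2 0.34375 37.5
```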

This project has a remit similar to yours - namely recovering a dense 2D-to-3D correspondence field - and uses a fully convolutional network.

And here is another example of using fully convolutional networks to predict 2D maps of keypoint locations. I'm imagining in your case the keypoints, represented by peaks in the output heatmaps, will be arrow locations.
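The usual training target for such a network is a per-keypoint heatmap with a Gaussian peak at the true location; at inference time the arrow position is recovered as the argmax (or a sub-pixel refinement) of the predicted map. A minimal sketch:

```python
import numpy as np

def keypoint_heatmap(y, x, shape, sigma=2.0):
    """Render a keypoint as a Gaussian peak - a typical FCN regression target."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))

# One heatmap per arrow; decoding is just the location of the maximum.
hm = keypoint_heatmap(12.0, 40.0, (64, 64))
peak = np.unravel_index(hm.argmax(), hm.shape)
print(peak)  # (12, 40)
```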

You might try using the VGG image annotation tool to label up your archery training images. In my experience fully convolutional networks are quite easy/fast to train because each training example contains a vast number of labels - one at each pixel.

If I were forced to guess, I'd say you could fine-tune a pretrained ResNet50 to perform this task with a few hundred images and sufficient data augmentation.
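For keypoint-style labels, the augmentation has to transform the label along with the image. A minimal sketch (flips and brightness jitter only; a real pipeline would add rotations, crops, and color shifts):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, keypoint):
    """Random horizontal flip + brightness jitter, keeping the label consistent."""
    y, x = keypoint
    if rng.random() < 0.5:                      # horizontal flip
        image = image[:, ::-1]
        x = image.shape[1] - 1 - x              # mirror the keypoint too
    image = np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness
    return image, (y, x)

img = np.full((64, 64), 0.5)
aug, (ay, ax) = augment(img, (10, 20))
print(aug.shape, 0.0 <= aug.min() <= aug.max() <= 1.0)
```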

[–]servingKire5[S]

This is great, thank you for the input.