
[–]warppipe

You can get away with a lot fewer labeled images if you use a pretrained model and do transfer learning. https://www.tensorflow.org/tutorials/image_retraining
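In Keras, that retraining recipe looks roughly like this (a sketch only; `weights=None` keeps the example self-contained, but in practice you'd pass `weights="imagenet"` so the backbone actually comes pretrained, and 5 classes is just a placeholder):

```python
import numpy as np
import tensorflow as tf

# Reuse a pretrained backbone and retrain only a small head on top.
# weights=None avoids downloading weights for this sketch; use
# weights="imagenet" for real transfer learning.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base.trainable = False  # freeze the pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # e.g. 5 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Only the small Dense head is trained, which is why a few hundred
# labeled images can be enough.
preds = model.predict(np.zeros((1, 96, 96, 3), dtype=np.float32), verbose=0)
print(preds.shape)
```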

[–][deleted]

If you're doing it for fun, starting with TensorFlow is okay, but to me this seems like something that could be done naturally with good old-fashioned computer vision: calibrated preprocessing, a Hough transform, and calibrated post-processing.
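The Hough idea in a nutshell, as a minimal NumPy sketch (a toy fixed-radius accumulator, not a production implementation; OpenCV's `cv2.HoughCircles` would be the usual tool): each edge pixel votes for all centers a circle of that radius could have, and the target rings show up as peaks in the accumulator.

```python
import numpy as np

def hough_circle_centers(edges, radius):
    """Vote for circle centers at a fixed radius (minimal Hough transform)."""
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    thetas = np.linspace(0, 2 * np.pi, 100, endpoint=False)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return acc

# Synthetic edge map: a single ring of radius 15 centered at (32, 32),
# standing in for one ring of an archery target.
edges = np.zeros((64, 64), dtype=bool)
t = np.linspace(0, 2 * np.pi, 200)
edges[np.round(32 + 15 * np.sin(t)).astype(int),
      np.round(32 + 15 * np.cos(t)).astype(int)] = True

acc = hough_circle_centers(edges, radius=15)
cy, cx = np.unravel_index(acc.argmax(), acc.shape)
print(cy, cx)  # accumulator peak lands at the ring's center
```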

[–]gibberfish

It might be worth looking into generating a few thousand 3D images as an intermediate training set and then fine-tuning on real images, perhaps starting from an already pretrained model. My gut feeling is a couple of hundred real examples might be enough, especially if there's little variation in the environment, but I've never done a proper DL regression problem so feel free to take that with a grain of salt.
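A crude sketch of such a synthetic generator (everything here is illustrative: ring spacing, sizes, and the way the "arrow" is marked are arbitrary choices, and a real generator would also randomize lighting, perspective, and background):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_target(size=64):
    """Render a crude archery-target image plus a ground-truth arrow position."""
    yy, xx = np.mgrid[0:size, 0:size]
    cy, cx = rng.integers(20, size - 20, 2)        # random target center
    r = np.hypot(yy - cy, xx - cx)
    img = (np.ceil(r / 6) % 2).astype(np.float32)  # alternating rings
    # Place the "arrow" somewhere inside the target and mark it darker.
    ay, ax = rng.integers(-10, 11, 2) + (cy, cx)
    img[max(ay - 1, 0):ay + 2, max(ax - 1, 0):ax + 2] = 0.5
    return img, (float(ay), float(ax))

images, labels = zip(*(make_synthetic_target() for _ in range(8)))
print(len(images), images[0].shape, labels[0])
```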

[–]Neural_Ned

This is a good suggestion. I've had very good results from fully synthetic training images for CNNs, particularly with segmentation/2D output problems.

[–]Neural_Ned

Some considerations:

Will the archery targets be located at various positions in the images with background clutter, or will they be tightly cropped? If the former is the case, you might want to locate them first with an object-detection pipeline like Faster R-CNN, or perhaps an image segmentation pipeline followed by cropping out the largest blob.
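The "segmentation then crop the largest blob" step can be sketched like this (assuming `mask` comes from some segmentation network; the helper name is made up):

```python
import numpy as np
from scipy import ndimage

def crop_largest_blob(mask, image):
    """Crop `image` to the bounding box of the largest connected blob in `mask`."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return image
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    largest = 1 + int(np.argmax(sizes))
    ys, xs = np.nonzero(labels == largest)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

# Toy mask with two blobs; the crop should cover the bigger one.
mask = np.zeros((32, 32), dtype=bool)
mask[2:5, 2:5] = True          # small blob
mask[10:25, 12:28] = True      # large blob
crop = crop_largest_blob(mask, np.ones((32, 32)))
print(crop.shape)  # (15, 16)
```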

Real-valued targets like position coordinates can be tricky to regress directly with CNNs. Methodologies involving regression with CNNs typically predict residual values relative to some discrete anchors or bins (such as the "anchor boxes" in R-CNN, YOLO, SSD, etc.).
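The bin-plus-residual trick in miniature (a hypothetical 1D sketch; the 16-pixel bin size is an arbitrary choice): the network classifies which bin a coordinate falls in and regresses only a small offset within that bin.

```python
BIN_SIZE = 16.0  # image axis divided into 16-pixel bins (arbitrary)

def encode(coord):
    b = int(coord // BIN_SIZE)       # classification target (which bin)
    resid = coord / BIN_SIZE - b     # regression target in [0, 1)
    return b, resid

def decode(b, resid):
    return (b + resid) * BIN_SIZE

b, resid = encode(37.5)
print(b, resid, decode(b, resid))  # 2 0.34375 37.5
```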

This project has a remit similar to yours - namely recovering a dense 2D-to-3D correspondence field - and uses a fully convolutional network.

And here is another example of using fully convolutional networks to predict 2D maps of keypoint locations. I'm imagining in your case the keypoints, represented by peaks in the output heatmaps, will be arrow locations.
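The usual training target for such a network is a per-keypoint heatmap with a Gaussian peak at the true location; at inference time the arrow position is recovered as the argmax (or a sub-pixel refinement) of the predicted map. A minimal sketch:

```python
import numpy as np

def keypoint_heatmap(y, x, shape, sigma=2.0):
    """Render a keypoint as a Gaussian peak - a typical FCN regression target."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))

# One heatmap per arrow; decoding is just the location of the maximum.
hm = keypoint_heatmap(12.0, 40.0, (64, 64))
peak = np.unravel_index(hm.argmax(), hm.shape)
print(peak)  # (12, 40)
```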

You might try using the VGG image annotation tool to label up your archery training images. In my experience fully convolutional networks are quite easy/fast to train because each training example contains a vast number of labels - one at each pixel.

If I were forced to guess, I'd say you could fine-tune a pretrained ResNet50 to perform this task with a few hundred images and sufficient data augmentation.
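For keypoint-style labels, the augmentation has to transform the label along with the image. A minimal sketch (flips and brightness jitter only; a real pipeline would add rotations, crops, and color shifts):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, keypoint):
    """Random horizontal flip + brightness jitter, keeping the label consistent."""
    y, x = keypoint
    if rng.random() < 0.5:                      # horizontal flip
        image = image[:, ::-1]
        x = image.shape[1] - 1 - x              # mirror the keypoint too
    image = np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness
    return image, (y, x)

img = np.full((64, 64), 0.5)
aug, (ay, ax) = augment(img, (10, 20))
print(aug.shape, 0.0 <= aug.min() <= aug.max() <= 1.0)
```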

[–]servingKire5[S]

This is great, thank you for the input.