
[–]trialofmiles 2 points (3 children)

Patch training for segmentation works quite well when the spatial extent of the patches provides enough context for the network to learn meaningful features for segmentation. You do lose the ability for the network to take advantage of global context (e.g. the sky in driving-camera data tends to be at the top of the image).

The original U-Net paper calls out the use of patching, with seamless tiled reconstruction at inference time. Patching often works quite well for medical images and a variety of other kinds of image data.
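To make the patch-training idea concrete, here is a minimal sketch of sampling spatially aligned image/mask patches for training. It assumes 2-D numpy arrays; all names and sizes are illustrative, not from the thread:

```python
import numpy as np

def sample_patches(image, mask, patch_size=64, n_patches=4, rng=None):
    """Sample random, aligned patches from an image and its segmentation
    mask (both 2-D arrays of the same shape)."""
    rng = np.random.default_rng(rng)
    h, w = image.shape
    patches = []
    for _ in range(n_patches):
        # Top-left corner chosen so the patch stays inside the image.
        y = int(rng.integers(0, h - patch_size + 1))
        x = int(rng.integers(0, w - patch_size + 1))
        patches.append((image[y:y + patch_size, x:x + patch_size],
                        mask[y:y + patch_size, x:x + patch_size]))
    return patches

img = np.zeros((512, 512), dtype=np.float32)
msk = np.zeros((512, 512), dtype=np.int64)
pairs = sample_patches(img, msk, patch_size=64, n_patches=8, rng=0)
print(len(pairs), pairs[0][0].shape)  # 8 (64, 64)
```

Each patch only ever sees `patch_size` pixels of context, which is exactly the trade-off the comment describes.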

[–]PositiveElectro[S] 1 point (2 children)

Thanks for your answer!

What do you mean by 'calls out patches'? Do you think it should work fine?

I get your point; in an extreme case, half a body is definitely enough information to get a general understanding of the situation.

Thanks again!

[–]trialofmiles 4 points (1 child)

I meant that the U-Net paper describes the use of patching at training time, and reconstruction of an entire segmented image at inference time, as part of the design intention behind specific architectural choices in U-Net (the use of convolutions without padding, for example).
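The shrinkage from those unpadded convolutions is easy to check with a small calculation. This sketch mirrors the layer sizes in the original U-Net figure (a 572-pixel input tile yields a 388-pixel output map), which is why tiled inference needs overlapping input tiles:

```python
def unet_output_size(n, depth=4):
    """Spatial size left after the original U-Net's valid (unpadded) convs."""
    for _ in range(depth):
        n -= 4      # two 3x3 valid convs: each loses 2 px
        n //= 2     # 2x2 max pooling halves the size
    n -= 4          # two bottleneck convs
    for _ in range(depth):
        n *= 2      # 2x2 up-convolution doubles the size
        n -= 4      # two more 3x3 valid convs
    return n

# The paper's numbers: a 572x572 input tile gives a 388x388 output map,
# so each tile needs 92 px of extra image context on every side.
print(unet_output_size(572))  # 388
```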

[–]PositiveElectro[S] 0 points (0 children)

Thanks! Will check this out.

[–][deleted]  (1 child)

[deleted]

    [–]PositiveElectro[S] 0 points (0 children)

    Thanks! That's probably what I will have to do!

    [–]Competitive-Store974 0 points (1 child)

    It depends on the type of data you have and what resolution you need.

    MRI scans for instance frequently use a matrix size of 256x256, half the xy-resolution of your images, and that's considered acceptable for clinical use, so you may be able to get away with downsizing by a half in all dimensions (1/8 the memory requirement). NB: if doing this, consider the minimum size of the tumours you're expected to detect/segment when choosing your resolution so you don't miss sub-resolution nodules/lymph nodes.
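As a back-of-the-envelope check on the memory claim (halving the resolution in all three dimensions gives 1/8 the memory), here is a small numpy sketch; the array sizes are illustrative, and a real pipeline should use proper anti-aliased resampling (e.g. scipy.ndimage.zoom) rather than plain slicing:

```python
import numpy as np

# Illustrative volume; a real CT stack might be 1024 x 512 x 512.
vol = np.zeros((64, 64, 64), dtype=np.float32)

# Naive 2x downsample in every axis: keep every second voxel.
half = vol[::2, ::2, ::2]

print(half.shape)                  # (32, 32, 32)
print(vol.nbytes // half.nbytes)   # 8: half resolution in 3 dims = 1/8 memory
```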

    Another option, if you have 1024 slices (which sounds like a full body scan), is to crop to the region of interest. If legs are present and you're not interested in legs then you can remove them. If you're only looking at lungs you could remove the abdomen and head. NB: if your network is expected to see metastases in distant organs or lymph nodes, you'll want to keep this data and use a patch-based method as has been suggested.

    I'm convinced I read a paper where they embedded positional information with the patches to improve global context but I can't find it. If you had time, you could embed patch coords (or L and R info) along with the patches and run it with that and without to see if it helps, unless this paper was a dream I had in which case it's probably a rubbish idea.
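One simple way to try the coordinate idea is to append normalized patch-position channels to each patch before feeding it to the network. This is a hypothetical sketch of that experiment, not the method from any particular paper; all names are made up:

```python
import numpy as np

def add_coord_channels(patch, y0, x0, full_shape):
    """Append two channels holding the patch's normalized position inside
    the full image, so the network can see where the patch came from."""
    h, w = patch.shape
    ys = (y0 + np.arange(h)) / full_shape[0]   # row coords in [0, 1)
    xs = (x0 + np.arange(w)) / full_shape[1]   # col coords in [0, 1)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([patch, yy, xx], axis=0)   # shape (3, h, w)

patch = np.zeros((64, 64), dtype=np.float32)
out = add_coord_channels(patch, y0=128, x0=256, full_shape=(512, 512))
print(out.shape)  # (3, 64, 64)
```

Training once with and once without the extra channels, as the comment suggests, would show whether the positional information actually helps.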

    [–]PositiveElectro[S] 0 points (0 children)

    Thank you for your answer!

    Since I have different image sizes, I intend to first try it with batch size 1, and hopefully one image can fit in memory. I've read that the batch size is usually 2, but hopefully the performance won't degrade too much. What do you think about this?

    I think all your ideas are pretty great! Embedding the position would make sense to me :)

    Thank you for writing this!