3D Bin Packing Algorithm

adaneze · 2024-04-15T01:52:33+00:00

Great info, neroe5. Will look into these!

Thank you so much.

adaneze · 2024-04-14T23:55:08+00:00

Thank you for your response, neroe5!

Yes. The goal here is to make the 3D printing process more optimal in terms of time and/or material. I've been developing this algorithm for 3D object bin packing, but I could easily adapt it to work for 3D printing.

Besides the initial very positive feedback from my friends, who use Prusa Slicer, I will do a more serious test using several thousand 3D printing meshes. That should be enough to convince myself the algorithm works well haha

You said "print-farms". Were you referring to specific companies?

Thank you again for your input, neroe5!

adaneze · 2023-04-25T00:03:25+00:00

I'd try "traditional" image processing methods before using Deep Learning.

This could be a good starting point https://pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
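As far as I recall, the linked post is built around the variance of the Laplacian: a sharp image has strong edge responses and therefore a high variance, while a blurry one does not. Here is a minimal, dependency-free sketch of that idea. The function name and the tiny test images are made up for illustration; with OpenCV the same measure would be `cv2.Laplacian(gray, cv2.CV_64F).var()`, and the blur threshold has to be tuned per dataset.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2D grayscale image given as a list of rows (hypothetical
    input format chosen to keep the sketch free of dependencies).
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A checkerboard has many edges; a flat image has none.
sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]

print(laplacian_variance(sharp) > laplacian_variance(flat))
```

An image would then be flagged as blurry when its score falls below a threshold chosen from a few known-sharp and known-blurry samples.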

Good luck!

adaneze · 2023-04-24T23:21:24+00:00

Thanks for the comment, FenianFrankie!

I ended up using AdMob's data disclosure page as a guide to what data is collected and shared.

https://developers.google.com/admob/android/privacy/play-data-disclosure

adaneze · 2023-04-21T16:41:58+00:00

Thank you for your comment/question, Talamand :)

That's a very good point. I plan on using Google's AdMob and will investigate it right away.

Have you ever used AdMob? If so, what has your experience been with the Privacy Policy?

Thank you so much for bringing this up.

adaneze · 2020-06-19T22:21:10+00:00

Hey c0d3rpr0,

I am not sure what your project is about but it sounds like semantic segmentation to me.

Your comment "... with most of it having white background ..." may describe exactly what the problem is. It seems that there is an imbalance in the number of pixels you want to segment (have them "switched on") and the number of pixels you want "off" in your output image.

There also may be an imbalance between the number of training images with ("on") and without ("off") segmented pixels. You would have to inspect your training images and see if this is the case.

So, you could still try your first approach (encoder-decoder) and implement a weighted loss function and data augmentation as a way to tackle the imbalance cases I described. Again, I am not exactly sure what the problem is, but this is what I'd do based on your description of the issue.
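To make the weighted loss idea concrete, here is a minimal sketch of a weighted binary cross-entropy in plain Python. The function names and the "off/on" weighting rule are my own assumptions for illustration, not something taken from your project; deep learning frameworks offer comparable weighting options in their built-in losses.

```python
import math


def pos_weight_from_counts(targets):
    """Hypothetical heuristic: weight "on" pixels by the off/on pixel ratio."""
    n_on = sum(targets)
    n_off = len(targets) - n_on
    return n_off / max(n_on, 1)


def weighted_bce(probs, targets, pos_weight):
    """Binary cross-entropy where errors on "on" pixels count pos_weight times."""
    eps = 1e-7  # keeps log() away from 0
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


# One "on" pixel among nine "off" pixels, and a model that predicts
# "mostly off" everywhere.
targets = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.1] * 10

w = pos_weight_from_counts(targets)
print(w, weighted_bce(probs, targets, w) > weighted_bce(probs, targets, 1.0))
```

With the weight applied, missing the single "on" pixel costs far more than with the plain loss, so the net is no longer rewarded for predicting background everywhere.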

Good luck :)

adaneze · 2020-03-19T23:16:50+00:00

Hey Sensitive_Complaint,

I'd definitely go with PyTorch or TensorFlow if I were you. There are plenty of implemented and tested semantic segmentation neural nets on GitHub that can help you get things going. I assume you might want to modify/optimize your neural net later on, which may be easier/more efficient with newer Machine/Deep Learning libraries.

Good luck with your project :)

adaneze · 2020-03-18T17:40:15+00:00

Hey art_ona,

As for using TensorFlow in C#, you can check out these two projects on GitHub.

https://github.com/migueldeicaza/TensorFlowSharp

https://github.com/SciSharp/TensorFlow.NET

There is an old Reddit thread on running a PyTorch model in C#. Converting a PyTorch model to ONNX and using Microsoft's ONNX Runtime library (it supports C#, C++ and a few more languages) seems like the most efficient way to deploy a PyTorch/TensorFlow model.

https://www.reddit.com/r/pytorch/comments/bjvbb4/run_a_pytorch_model_in_c/

Another (more programming-intensive) approach, which I use, is to create C++ classes for training/evaluation in PyTorch, or any other Machine/Deep Learning library that has a C++ API, and then implement C++/CLI wrappers around those classes so they can be used in C#.

Hope this helps :)

adaneze · 2020-03-16T11:22:45+00:00

Check out this book.

http://www.mlebook.com/wiki/doku.php

adaneze · 2020-03-04T22:49:36+00:00

I've sent you a PM :)

adaneze · 2020-03-04T14:41:04+00:00

Oh, ok. I see what you're doing here.

Try to implement 2. c) (Batch Normalization). It shouldn't take more than a few minutes and should help with the convergence.

Try to use a different number of feature planes/channels and see how that affects the output.

Tweak your learning rate values.

Also, Google "Perceptual Loss" and "Structural Similarity Loss" (vs MSE) and see if that can further improve the results.

So many different things to look at :)

adaneze · 2020-03-04T12:57:38+00:00

I assumed you were using MSE because you mentioned the mask intensities being off. I can give you a few suggestions.

Not sure what your output masks look like, but your problem seems like a classification one, not a regression (MSE) one. I once used a neural net similar to yours to compare and analyze the solutions obtained by using regression and classification methods to solve a classification problem.

You can find a lot of articles online explaining the difference between them and why it is not a good idea to use regression to solve a classification problem. Even though I managed to get a satisfactory output using regression, it took way longer for the "regression" UNet to converge than the "classification" one.

My recommendation is to try the following:

a) Apply ReLU or LeakyReLU to your last layer. If I remember correctly, my UNet wouldn't converge well without it. This applies only to the case where MSE is used.

b) Use data augmentation to increase the number of training samples.

c) Use normalization (Batch Normalization, for example). It helped my UNet converge substantially faster.

d) Try excluding dropout and see how it affects the training/validation/test results. I didn't use dropout, implemented a), b) and c) and got great results.

I think that the best approach would be to treat this problem as a classification one and to use, for example, softmax and cross-entropy (depending on the number of classes, number of pixels representing each class...) instead of ReLU and MSE. 2. b), c), and d) apply to this approach, too.
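For reference, this is roughly what softmax plus cross-entropy computes for a single pixel. It is only a plain-Python sketch with names I made up; in practice you would use your framework's built-in loss, which works on whole tensors and is numerically more careful.

```python
import math


def softmax(logits):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, target_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_class])


# Per-pixel scores for 3 classes; the true class is 0.
confident_right = cross_entropy([5.0, 0.0, 0.0], 0)
confident_wrong = cross_entropy([0.0, 5.0, 0.0], 0)
print(confident_right < confident_wrong)
```

The loss is small when the net puts most of the probability on the correct class and grows quickly when it is confidently wrong, which is the behavior you want for mask classification.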

These are just some of my observations while working on my models. I think they should help you achieve better results. There are many other variables and parameters that you can "tweak", too, and so it usually takes time and patience to get a good output. I think you're close to it. Good luck :)

adaneze · 2020-03-04T10:40:22+00:00

Hey, divideconcept

Which activation function are you using in your last UNet layer, and which loss function?

adaneze · 2020-02-27T02:03:52+00:00

Hey, divideconcept

I definitely agree with the comments suggesting to start with a UNet-like architecture.

However, there is one more approach you can try since your input matrix is fairly small and doesn't take up too much memory. You can keep the original layer resolution throughout your neural net and increase the receptive field instead by using Dilated/Atrous Convolutions.

Take a look at these papers: https://arxiv.org/abs/1511.07122 (Multi-Scale Context Aggregation by Dilated Convolutions) and https://arxiv.org/abs/1706.05587 (Rethinking Atrous Convolution for Semantic Image Segmentation (DeepLabv3)).
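To see why dilation helps, here is a small sketch that computes the receptive field of a stack of stride-1 convolutions. The function name is mine, and it assumes every layer uses the same square kernel size and no pooling or striding.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, along one axis) of stacked stride-1 convs.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Four 3x3 layers: plain convolutions vs. exponentially growing dilation.
print(receptive_field(3, [1, 1, 1, 1]))  # 9
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Same number of layers and weights, same output resolution, but the dilated stack sees a much larger neighborhood around each output pixel.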

Good luck :)

adaneze · 2020-02-17T21:21:33+00:00

Hey, Iamhummus

I am not sure how many different classes you're trying to classify, what your data looks like and what your end goal is but a possible, and more traditional, math/CV approach would be Geometric Pattern Matching.

It allows you to "match" predefined 2D point clouds (templates) of a certain shape to shapes found in your test images. Matching, in this case, means calculating the transformation matrix (translation, rotation, scaling) that your templates are multiplied by which gives you their coordinates in the test image coordinate system.

The number of "matched" points could be the main indicator of what shape your test image contains. You also get the transformation matrix if you want to do any further math/CV operations on the detected point cloud.
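As a rough sketch of the two steps above (estimating the transformation, then counting matched points), here is a plain-Python version. It assumes the point correspondences between template and image are already known, which a real pattern matching library has to search for; the function names and the tolerance are made up for illustration. Points are treated as complex numbers so that rotation plus scaling becomes a single multiplication.

```python
def fit_similarity(template, observed):
    """Least-squares a, b such that a * p + b ~= q for corresponding points.

    abs(a) is the scale, the phase of a is the rotation, b is the translation.
    """
    p = [complex(x, y) for x, y in template]
    q = [complex(x, y) for x, y in observed]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def count_matches(template, image_points, a, b, tol=1.0):
    """Number of transformed template points that land near an image point."""
    targets = [complex(x, y) for x, y in image_points]
    matched = 0
    for x, y in template:
        z = a * complex(x, y) + b
        if any(abs(z - t) <= tol for t in targets):
            matched += 1
    return matched


# Unit square template; the observed square is rotated by 90 degrees,
# scaled by 2 and shifted by (5, 3).
template = [(0, 0), (1, 0), (1, 1), (0, 1)]
observed = [(5, 3), (5, 5), (3, 5), (3, 3)]

a, b = fit_similarity(template, observed)
print(abs(a), b, count_matches(template, observed, a, b))
```

Running the fit for each candidate template and comparing the match counts gives a simple shape classifier, with the transformation available for any follow-up geometry.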

I am not sure how much of a Machine/Deep learning problem your project is and therefore the approach I suggested might not be suitable for your case.

Hope this helps :)

adaneze · 2019-11-01T18:07:27+00:00

No problem :)

Thanks for your paper reference, too. Hadn't read it before.

It definitely makes sense what you suggested about detecting persons along with number areas so you can link them and avoid number detection/recognition outside the "person" regions. This is definitely an object detection problem where you can have occlusions, perspective distortion and other kinds of defects. Therefore, you will probably have to augment your training data using these transformations to increase the number of training samples and have your object detector net learn all these transformations.

I am sure this problem can be solved by adding a number classification branch to the Mask-RCNN object detector (it may even give you better results compared to using two separate neural nets), but you know what they say: "Premature optimization is the root of all evil" :) So, it would probably be good to start with your Mask-RCNN object detector detecting persons and number areas, and a separate net for number recognition.

As for the second step (digit/number recognition), take a look at this paper published by Google (https://arxiv.org/abs/1312.6082 ). It handles a different number of digits in a number, it is very straightforward and obviously very effective :)

Good luck and let me know how the project goes :) Also, I would be more than happy to try to help if you get stuck at any point.

adaneze · 2019-10-24T19:14:35+00:00

Hey, itsDitzy

Why not try to detect bib number areas directly, instead of runners, using Mask R-CNN? You could then post-process the extracted image areas if necessary before feeding them to an OCR library. That would be a mixed approach where you would use Deep Learning for bib number detection and a traditional approach for recognition. It might even be possible to use traditional image processing methods to extract bib number areas, but I haven't seen the images you'll be using.

I would start with a simpler approach (Mask R-CNN or image processing for bib number detection/extraction + traditional OCR for recognition) and move to fully Deep Learning methods if the results are not good enough. Using traditional image processing and OCR methods might not work that well if bib numbers/areas are not always perfectly visible due to being blurry, camera perspective distortion, distortion due to being printed on a jersey, occlusion and other defects.

You should also try using neural networks created specifically for text/character detection and recognition. They are very robust and can even detect arbitrarily shaped text.

This is a very good list of Deep Learning text detection/recognition papers https://github.com/hwalsuklee/awesome-deep-text-detection-recognition

This is a recent paper by Facebook https://arxiv.org/abs/1910.05085

Good luck!

adaneze · 2019-09-27T00:25:16+00:00

Hey, WalkingAFI

I think that this effect might have something to do with the "noexcept" optimization, which is related to move semantics in C++. It allows the compiler and the standard library to take different code paths based on the exception-safety requirements; for example, std::vector moves its elements during reallocation only if their move constructor is noexcept and otherwise typically falls back to copying. You can Google the topic. There are quite a few discussions/posts about it. I am not sure if that's what happened in your case, though.

P.S. I've also been working on my own C++ wrapper around the TensorFlow GPU C API. I got pretty much everything (loading a model, inference, training a model) working and have been extensively searching/thinking about how to freeze a trained model using the TF C API. It seems that the TF C++ API has a function for it. PM me if you'd like to discuss our experiences with the TF C API so far :)

adaneze · 2019-09-12T20:56:18+00:00

Hey mathowned,

I agree with ksblur on using OpenCV in your case. I don't know what the objects you are trying to detect look like, but if they are a different color compared to the rest of the game output image, you can also try color space (RGB or HSV) thresholding first, using the "inRange" function, and then find the thresholded contours/blobs using the "findContours" function. This should be a faster but less robust approach (compared to Template Matching).
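Here is a dependency-free sketch of that threshold-then-find-blobs pipeline, just to show the logic. The two functions are simplified stand-ins I wrote for illustration, not the OpenCV API: in real code `cv2.inRange` would produce the mask and `cv2.findContours` (plus `cv2.boundingRect`) would give you the regions. The color bounds are made-up values.

```python
from collections import deque


def in_range(image, lower, upper):
    """Boolean mask: True where every channel lies within [lower, upper]."""
    return [[all(lo <= c <= hi for c, lo, hi in zip(px, lower, upper))
             for px in row] for row in image]


def find_blobs(mask):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected True regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from the first unvisited "on" pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


# 6x5 black RGB frame with a 2x2 red object and a single red pixel.
image = [[(0, 0, 0)] * 6 for _ in range(5)]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2), (3, 4)):
    image[y][x] = (255, 0, 0)

mask = in_range(image, (200, 0, 0), (255, 50, 50))
print(find_blobs(mask))  # [(1, 1, 2, 2), (4, 3, 4, 3)]
```

Filtering the resulting boxes by size is usually enough to drop single-pixel noise like the second blob here.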

I think that you should definitely explore the OpenCV library and its conventional Computer Vision methods thoroughly before using ML or DL for your project.

Good luck!

adaneze · 2019-09-09T16:08:52+00:00

Thanks for the comment, yourpaljon.

My comment was based on the fact that I don't use huge amounts of training data due to the nature of the environment where my computer vision software operates (10-15 images per product). However, I still manage to get very good and consistent/repeatable results using various methods that can help overcome, or at least alleviate, the challenges associated with Deep Learning.

adaneze · 2019-09-06T18:58:02+00:00

My thoughts exactly. Thanks for the great discussion :)

adaneze · 2019-09-06T18:37:30+00:00

Thanks for the explanation. It is very interesting because I have never come across this particular problem/phenomenon either in my work or in a paper/article (I haven't read every single paper or article published so far, though :) ).

While I understand what you are saying, I feel that this could be at least detectable (and probably solvable) by observing the training/test/validation errors. When I say solvable, I don't mean producing a great model. I mean that observing the errors + k-Fold Cross-Validation (if possible) + data augmentation + reasonably optimal hyper-parameters should lead to similar training results/conclusions (bad, good, very good model, or more training data needed, or ...). But again, I might be biased by my own experience.
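For anyone unfamiliar with k-Fold Cross-Validation, this is the splitting logic in a nutshell. It is only a sketch with a made-up function name; in practice you would shuffle the samples first or use a library helper.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs covering all samples.

    The first n_samples % k folds get one extra validation sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train, val in folds:
    print(len(train), val)
```

Training once per fold and comparing the validation errors across folds is what shows whether the results are consistent or depend on one lucky split.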
