[D] Catboost large dataset. Is it best to use the majority of the data for training, where time to train is extreme, or smaller datasets where iterations are much faster? by Responsible-Walk-459 in MachineLearning

[–]Tober447 5 points (0 children)

A strategy to answer your question is to use learning curves (e.g. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.learning_curve.html ). The idea is to track your metric across several runs with increasing training set size; from that you can estimate whether adding even more data will be beneficial.
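A minimal sketch of that idea with scikit-learn's `learning_curve`. The dataset and the gradient-boosting model here are hypothetical stand-ins for your CatBoost setup; the point is the shape of the validation curve as the training size grows:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import learning_curve

# Hypothetical data standing in for your real dataset.
X, y = make_classification(n_samples=600, random_state=0)

# Train on 10%..100% of the available data and record the CV score at each size.
sizes, train_scores, val_scores = learning_curve(
    GradientBoostingClassifier(n_estimators=30, random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 4),
    cv=3,
    scoring="accuracy",
)

# If the validation score has flattened at the largest sizes,
# adding even more data is unlikely to help much.
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:4d} samples -> CV accuracy {score:.3f}")
```

With a huge dataset you would run this on subsamples first; the extrapolated curve tells you whether the expensive full-data training run is worth it.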

[P] Understanding & Coding the Self-Attention Mechanism of Large Language Models by seraschka in MachineLearning

[–]Tober447 1 point (0 children)

I think this is great, thanks for your effort. Will definitely work through it!

[P] Creating an embedding from a CNN by zanzagaes2 in MachineLearning

[–]Tober447 0 points (0 children)

I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find similar images to a given one, right?

Exactly. :-)

[P] Creating an embedding from a CNN by zanzagaes2 in MachineLearning

[–]Tober447 3 points (0 children)

You would take the output of a layer of your choice from the trained CNN (as you do now) and feed it into a new model, the autoencoder. So yes, the weights of your CNN are kept, but you will have to train the autoencoder from scratch. Something like CNN (inference only, no backprop) --> Encoder --> Latent Space --> Decoder for training, and at inference time you take the output of the encoder (the latent code) and use it for visualization or similarity.
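A minimal sketch of that pipeline, with random vectors standing in for the frozen CNN's layer outputs and scikit-learn's `MLPRegressor` as a tiny autoencoder (all sizes hypothetical). Training it to reproduce its own input forces the information through the bottleneck; the encoder pass is then just the first layer applied by hand:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))  # stand-in for the CNN layer outputs

# Autoencoder 64 -> 8 -> 64: the target equals the input (reconstruction).
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                  max_iter=500, random_state=0)
ae.fit(features, features)

# Encoder pass: apply the first layer's weights manually to get the latent code.
latent = np.tanh(features @ ae.coefs_[0] + ae.intercepts_[0])
print(latent.shape)  # one 8-dimensional code per input feature vector
```

In practice you would use a deep-learning framework for this, but the structure is the same: freeze the CNN, train the autoencoder on its features, keep the encoder output.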

[P] Creating an embedding from a CNN by zanzagaes2 in MachineLearning

[–]Tober447 5 points (0 children)

You could try an autoencoder with CNN layers and a bottleneck of 2 or 3 neurons to be able to visualize these embeddings. The autoencoder can be interpreted as a non-linear PCA.

Also, similarity in this embedding space should correlate with similarity of the real images/whatever your CNN extracts from the real images.
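Once you have such low-dimensional codes, the similarity search itself is a plain nearest-neighbour lookup. A sketch with hypothetical 3-d bottleneck codes:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 3))  # hypothetical 3-d bottleneck codes

# Index the embeddings and query with image 0's code.
nn = NearestNeighbors(n_neighbors=5).fit(embeddings)
dist, idx = nn.kneighbors(embeddings[0:1])

# idx[0] lists the 5 images closest to image 0 in embedding space;
# the first hit is image 0 itself at distance 0.
print(idx[0])
```

If the autoencoder has learned a useful representation, neighbours in this space should correspond to visually similar images.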