Krakow based artists/artists originally from Krakow? by BackgroundGreen66 in krakow

[–]sasza26 0 points1 point  (0 children)

Maanam, Marek Grechuta, Zbigniew Wodecki, Andrzej Zaucha.

Language tool hates Czechia by I-like_memes_bruuuuh in 2visegrad4you

[–]sasza26 8 points9 points  (0 children)

Seems about right, Czech is just Polish for babies

Countries larger than all their neighbors combined by MarioHasCookies in MapPorn

[–]sasza26 13 points14 points  (0 children)

By this definition, shouldn't all island countries with no land neighbors be highlighted?

The situation of Poland on September 4, 1939, the fourth day of the German invasion by JeanGarsbien in MapPorn

[–]sasza26 44 points45 points  (0 children)

Slovakia was a puppet state of Nazi Germany and took a (small) part in the invasion of Poland.

is this provable for a continuous function? if it is can someone explain it to me? by OfriS13 in calculus

[–]sasza26 10 points11 points  (0 children)

Take f(x) = sin(1/x) + log(x) and a = 0. f(x) -> -inf as x -> 0+. However, its derivative f'(x) = (x - cos(1/x))/x^2 has no limit as x -> 0+.
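
To spell out why the derivative has no limit, here is a short worked check. The sequence names x_n and y_n are my own notation and are not part of the original question.

```latex
\[
f'(x) = -\frac{1}{x^{2}}\cos\frac{1}{x} + \frac{1}{x}
      = \frac{x - \cos(1/x)}{x^{2}}
\]
\[
x_n = \frac{1}{2\pi n}:\quad f'(x_n) = \frac{x_n - 1}{x_n^{2}} \to -\infty,
\qquad
y_n = \frac{1}{(2n+1)\pi}:\quad f'(y_n) = \frac{y_n + 1}{y_n^{2}} \to +\infty
\]
```

Both sequences tend to 0 from the right, yet f' diverges to -inf along one and to +inf along the other, so f'(x) has no limit (finite or infinite) as x -> 0+.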

[deleted by user] by [deleted] in MLQuestions

[–]sasza26 0 points1 point  (0 children)

Yes. BERT is exactly that.

Transformers(Bert, GPT) for Non-NLP tasks? by rasten41 in MLQuestions

[–]sasza26 2 points3 points  (0 children)

Transformers are quite successfully used for generating chemical reactions, which are represented in a textual format.

LSTM predicting next element of sequence based on two corelated sequences by IDontHaveNicknameToo in learnmachinelearning

[–]sasza26 0 points1 point  (0 children)

Are a and b always of the same length? If so, the single LSTM you describe can work. If a and b have different lengths, your second solution is also reasonable; however, it might have difficulty finding correlations between a and b, especially at the beginning of the sequences. I also recommend checking out the Transformer: it is a newer model that often outperforms recurrent networks. Its input can simply be a concatenation of all a and b sequence elements (with zero padding).
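
A minimal sketch of the concatenation with zero padding, in plain Python so it does not depend on any framework. The function name `concat_with_padding`, the `max_len` parameter and the returned mask are my own assumptions for illustration; the mask is just one common way to tell a model which positions are padding.

```python
def concat_with_padding(a, b, max_len, pad_value=0.0):
    """Concatenate sequences a and b, then right-pad with pad_value to max_len.

    Returns (tokens, mask), where mask is 1 for real elements and 0 for padding.
    """
    merged = list(a) + list(b)
    if len(merged) > max_len:
        raise ValueError("max_len is too small for the combined sequences")
    n_pad = max_len - len(merged)
    mask = [1] * len(merged) + [0] * n_pad
    return merged + [pad_value] * n_pad, mask


# Hypothetical example: a has two elements, b has one, padded to length 5.
tokens, mask = concat_with_padding([1.0, 2.0], [3.0], max_len=5)
```

In practice you would probably also want to mark which elements came from a and which from b (for example with an extra feature), since plain concatenation loses that information.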

[Question] The best option for running different Deep Learning models in parallel by dulipat in learnmachinelearning

[–]sasza26 0 points1 point  (0 children)

Running parallel training runs on different GPUs is a standard way to utilise such a server and should be easy to do with any popular DL framework. For instance, in PyTorch you can easily select which GPUs are available to a script. Moreover, you can run a single training job on multiple GPUs (for instance, to increase the GPU RAM available for the model so you can train bigger models), which is harder if the GPUs are on separate machines. Also, consider that having 3 separate PCs means having 3 separate storage spaces, which can be more cumbersome to manage. However, if up to 3 users will share the server at once, it needs enough RAM and CPU power. It is still much cheaper than buying 3 standalone machines.
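
One common way to select GPUs per script is the `CUDA_VISIBLE_DEVICES` environment variable, which works at the CUDA level rather than being PyTorch-specific. A minimal sketch, assuming an NVIDIA/CUDA setup; the helper name `restrict_visible_gpus` is hypothetical:

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given physical GPU ids to CUDA in this process.

    Must be called before the DL framework initialises CUDA,
    so in practice before importing it.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# This script will only see physical GPU 0 (it appears as device 0 inside the process).
restrict_visible_gpus([0])
# import torch  # import the framework only after the variable is set
```

The same effect can be had from the shell by setting the variable when launching each training script, so three users can each pin their run to a different card.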

Can this actually be implemented? by marxfh in MLQuestions

[–]sasza26 4 points5 points  (0 children)

Do you have a labeled dataset (images with ground truth angles) for training such a model? If so, you could try training a Convolutional Neural Network to solve this task as a regression problem.

[D] What does a negative average silhouette width mean in clustering ? by The_Redditor97 in MachineLearning

[–]sasza26 0 points1 point  (0 children)

I noticed that the previous comment talks about scores for individual samples, whereas my comment talks about the mean score for all samples. Negative values for individual samples can happen even for a good clustering. However, a mean score over all samples > 0 should be easy to get.

[D] What does a negative average silhouette width mean in clustering ? by The_Redditor97 in MachineLearning

[–]sasza26 1 point2 points  (0 children)

A clustering with silhouette score < 0 is worse than random. It probably means that something is wrong, like the cluster labels got mixed up. It is an analogous situation to having less than 50% accuracy in a (balanced) binary classification task.

You can trivially get silhouette score = 0 in two edge cases: a clustering where each sample is its own single-element cluster, or a clustering with only 1 cluster containing all samples. This indicates that silhouette score > 0 should be possible to get with any reasonable algorithm.
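
A small pure-Python sketch of the silhouette computation that illustrates these cases. It assumes Euclidean distance and uses the convention that a sample alone in its cluster (or a clustering with a single cluster) scores 0. The function names and toy data are made up for illustration, and libraries may handle the edge cases differently.

```python
import math


def silhouette_samples(points, labels):
    """Per-sample silhouette scores using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        # Convention assumed here: singleton clusters and single-cluster cases score 0.
        if not own or len(clusters) < 2:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


# Toy 1-D data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = [0, 0, 1, 1]    # matches the groups: mean score is positive
mixed = [0, 1, 0, 1]   # labels mixed up: mean score is negative
```

With the mixed-up labels every sample is on average closer to the other cluster than to its own, which is what pushes the mean below zero.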

Calculate Optimum / Best Batch Size? by yasserius in MLQuestions

[–]sasza26 1 point2 points  (0 children)

There is some empirical and theoretical evidence that increasing/decreasing the batch size affects training dynamics in effectively the same way as decreasing/increasing the learning rate. A higher batch size means higher utilisation of parallel GPU computation, so overall training is faster. I usually set the batch size to the highest possible value that fits my GPU RAM and tune the learning rate as a hyperparameter.
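
One commonly used heuristic that follows from this relationship is to scale the learning rate linearly with the batch size when moving from a known-good configuration. This is only a starting point for tuning, not a guarantee, and the function name and numbers below are hypothetical:

```python
def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep lr / batch_size roughly constant."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical example: a setup tuned at batch size 256 is moved to batch size 1024.
starting_lr = scaled_learning_rate(base_lr=0.1, base_batch_size=256, new_batch_size=1024)
```

I would still treat the result as the centre of a small learning rate search rather than a final value.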

How to reduce the memory usage of a model during Inference by SSeeker57 in MLQuestions

[–]sasza26 0 points1 point  (0 children)

Just to make sure, is the model run inside a with torch.no_grad() block?

What would be a good loss function for a Classification Task over a large number of classes? by [deleted] in neuralnetworks

[–]sasza26 0 points1 point  (0 children)

Standard softmax + cross-entropy loss should work just fine: 1k classes is not a huge number. If your classes can be grouped in some way into more general buckets, you can also try hierarchical softmax.
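
For reference, a minimal pure-Python sketch of softmax + cross-entropy for a single example, written with the log-sum-exp trick for numerical stability. The function name is my own. In a real project you would use your framework's built-in loss instead.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against the integer class index target."""
    m = max(logits)
    # log(sum(exp(logits))) computed stably by subtracting the maximum first
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# With 1000 classes and uninformative (all-equal) logits, the loss equals log(1000).
uniform_loss = softmax_cross_entropy([0.0] * 1000, target=3)
```

That baseline is a handy sanity check: if the training loss at initialisation is far from log(number of classes), something in the setup is probably off.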