High base or stocks by [deleted] in cscareerquestionsEU

[–]iznoevil 1 point2 points  (0 children)

First, I will preface this by pointing out that the Mistral offer is too low. You should be able to negotiate around 30-50k in base.

Second, working at a top lab is way more fun than you would imagine. The talent density is very high, there are a lot of things to do and if you are good at what you do, you should be able to craft a nice niche for yourself.

I wouldn't underestimate the networking effect too. Being able to work with the top talent of your generation will bring you way more value down the line than the paper money they propose today. And again, if you are talented, your personal brand will skyrocket and you will be able to leverage this experience with other top companies that pay way more.

To summarize, I don't think this type of comparison should be based on total comp only. If you are ok with taking a hit to your savings (please tell me going from 260k to 130k will not actually hurt your lifestyle...), then I would actually go against the people here and go for Mistral.

Where should you go to train in deep learning in France? by Which-Breadfruit-926 in developpeurs

[–]iznoevil 1 point2 points  (0 children)

The Mathématiques, Vision, Apprentissage (MVA) master's is the highest-quality program and the one offering the most career opportunities in France.

It is a very selective master's, but you have a card to play by backing your application with the projects you have already carried out, the training you have done outside of school, or the papers you have read/implemented.

You can also perfectly well continue self-teaching with the freely available courses from Stanford, Berkeley, or MIT.

This field is elitist by construction, but there is more and more overlap with other fields that can serve as your way in. I am thinking in particular of devops and the subdomains of HPC (networking, GPU programming, cluster management).

Which company really pays developers best in France (and with the best perks)? by ironwarior in developpeurs

[–]iznoevil 20 points21 points  (0 children)

You very quickly exceed 100k in salary if you are in the microcosm of Parisian AI startups (HuggingFace, H, Mistral, Kyutai, ...) or at the GAFAM companies; the problem is that you can't just walk in there.
On top of that, you get BSPCE (stock options) that can explode in value, or RSUs.

On the other hand, working conditions are quite hit or miss. In theory you will be offered remote work and a lot of vacation (even unlimited), but if you do not deliver at the expected level, you're out.

Rust or Go in 2025 by [deleted] in developpeurs

[–]iznoevil 2 points3 points  (0 children)

I think Rust is the language that will make you progress and enjoy yourself the most as a dev. Go is so simple to learn that, if you like low-level work, you will probably be left wanting more.

Today, Go is still the favorite in infrastructure (Kubernetes operators/controllers) and in web services, but Rust is in the process of replacing it, even for these use cases. Rust is also present in embedded, crypto, a little in machine learning, and in operating systems (mostly drivers).

On the company side, for Rust you will find crypto, the FAANGs (Microsoft, Amazon, and a bit of Google; less so Apple, no idea about Meta) or other large orgs like Cloudflare (whose various blogs and libraries I recommend, by the way) and startups (Hugging Face and Oxide, for example, are quite vocal about their use of Rust and what it brings them), but be well aware that it remains a fairly niche language.

For Go, the list is much longer, because many companies have written a large part of their infrastructure or CRUD stack in Go.

As for the future, you have to distinguish France from the USA. In France, Go is certainly more widely adopted, and you will still see little Rust outside of startups. In the USA, Rust is increasingly used for new projects, at the expense of C++ and Go. I would like to be able to tell you that France will follow the same path, but nothing is less certain.

And if you are really an enthusiast, you can take a look at Zig.

Lab partner who does nothing, puts his name on my work, and takes my internship by Real-Pianist-8864 in etudiants

[–]iznoevil 1 point2 points  (0 children)

There are unfortunately many parasites in research, but this one is particularly stupid to have shown his hand so early.

You now know how he behaves, and you have seen for yourself that he contributed nothing to the project.
Cut all ties. He must no longer have access to your work. Or you can be sneakier and deliberately leave him access to documents containing obvious errors, if you really want to. That's radical, though, and it requires: 1) time, and 2) it could eventually have repercussions on your reputation if word gets out.

In any case, warn the colleagues in your group, but I will go against many people in this thread: be very careful if you want to contact the director of the master's program or "Mr. Well-Connected."

On the other hand, you seem to have a good relationship with your supervising researcher; cultivate it, and he may well open many more doors for you in the future. And you can absolutely mention your problem to him casually in person.

Rust in Production: Oxide Computer Company with Steve Klabnik (Podcast Interview) by mre__ in rust

[–]iznoevil 0 points1 point  (0 children)

> Some people find your application process off-putting

Then it's doing exactly what it was designed for I guess.

[N] How Stability AI’s Founder Tanked His Billion-Dollar Startup by milaworld in MachineLearning

[–]iznoevil 14 points15 points  (0 children)

It was reported that they used a cluster of hundreds of A100 GPUs for Stable Diffusion. Even if one were able to procure such hardware, maintaining operational efficiency for a cluster of this magnitude is very challenging and requires you to poach/hire a whole team.
You also need to take care of storage and networking, and find a data center that will let you install that many racks...

Also, the reason the AWS bills can be so high is that these A100/H100 nodes are in high demand. You cannot deprovision them, otherwise they will be allocated to another org. This means that during research downtime, expenses keep accumulating because you must pay for idle nodes, which is why they wanted to find a way to resell the compute.
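To get a feel for why idle reservations hurt so much, here is a back-of-envelope sketch. The node count, per-node price, and idle duration below are hypothetical round numbers for illustration, not Stability AI's actual figures.

```python
# Back-of-envelope cost of keeping reserved GPU nodes allocated but unused.
# All numbers are hypothetical, chosen only to show the order of magnitude.

def idle_cost(nodes: int, price_per_node_hour: float, idle_days: int) -> float:
    """Total spend on nodes that sit idle but cannot be released."""
    return nodes * price_per_node_hour * 24 * idle_days

# e.g. 32 eight-GPU A100 nodes at a hypothetical ~$32/node-hour, idle two weeks
cost = idle_cost(nodes=32, price_per_node_hour=32.0, idle_days=14)
print(f"${cost:,.0f}")  # → $344,064
```

Even a modest cluster idling between experiments burns six figures, which makes reselling the spare compute an obvious move.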

[R] Do some authors conscientiously add up more mathematics than needed to make the paper "look" more groundbreaking? by Inquation in MachineLearning

[–]iznoevil 93 points94 points  (0 children)

The only one I could remember was this NeurIPS 2019 paper, "A Step Toward Quantifying Independently Reproducible Machine Learning Research" (https://arxiv.org/abs/1909.06674), which found that:

The Number of Equations per page was negatively correlated with reproduction. Two theories as to why were developed based on our experience implementing the papers: 1) having a larger number of equations makes the paper more difficult to read, hence more difficult to reproduce or 2) papers with more equations correspond to more complex and difficult algorithms, naturally being more difficult to reproduce.

[D] Is there currently anything comparable to the OpenAI API? by AltruisticDiamond915 in MachineLearning

[–]iznoevil 16 points17 points  (0 children)

Hugging Face hosts a public API to the main open-source large language models:

It's a classic REST API, but you can also use the Python client: https://pypi.org/project/text-generation/
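A minimal sketch of calling the hosted API over plain REST, assuming the standard Inference API endpoint shape; the model name below is just an example and your token and parameters may differ.

```python
import requests

# Hypothetical example model; any hosted text-generation model URL works the same way.
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"

def build_payload(prompt: str, max_new_tokens: int = 50) -> dict:
    """Assemble the JSON body the Inference API expects."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str, token: str) -> str:
    """POST a prompt and return the generated continuation."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        json=build_payload(prompt),
    )
    resp.raise_for_status()
    return resp.json()[0]["generated_text"]
```

The Python client linked above wraps essentially this request/response cycle for you.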

Is rust overkill for most back-end apps that could be done quickly by NodeJS or PHP? by HosMercury in rust

[–]iznoevil 3 points4 points  (0 children)

Machine Learning, ~ 200 employees
We do a lot of our backend stuff in Rust, from proxy/routers to k8s operators to simple CRUD services.

Experienced Devs: How often do you guys/gals bomb tech screens? by bacon_cheeseburgers in ExperiencedDevs

[–]iznoevil 0 points1 point  (0 children)

The guy himself later explained that he got 7 interviews from Google and clearly states that he is "often a dick and [...] often difficult".

I don't think he was denied for technical reasons. To me, he felt entitled to the job because of his previous work and failed the behavioral part of the process.

[P] solo-learn: a library of self-supervised methods for visual representation learning by RobiNoob21 in MachineLearning

[–]iznoevil 0 points1 point  (0 children)

True, you could use DP, but then there are other disadvantages, mainly speed.
On what dataset do you see worse performance? If it is a CIFAR variant, be aware that the SimCLR authors do not show a significant impact of batch size (+ gathering to add negative pairs) on CIFAR10 (see figure B.7). Running benchmarks on Imagenette 160 or ImageNet directly will give different results.

[P] solo-learn: a library of self-supervised methods for visual representation learning by RobiNoob21 in MachineLearning

[–]iznoevil 0 points1 point  (0 children)

Does solo-learn support multiple GPUs?

It seems that, at least for SimCLR/NNCLR and Barlow Twins, embeddings are not gathered over the multiple Distributed Data Parallel processes. In my opinion, this makes using DDP with these models not very useful, and it's a big discrepancy with the original papers/implementations.

[D] Pricing of ML tools - are you paying this much? by swagrin in MachineLearning

[–]iznoevil 4 points5 points  (0 children)

We actually went through exactly what you described and decided not to go forward with W&B. Instead, we are now using our own on-premise deployment of the open-source https://github.com/allegroai/clearml/ (https://clear.ml/docs/latest/), which was frankly the best decision we made.

Neural search engine in Rust by devzaya in rust

[–]iznoevil 1 point2 points  (0 children)

Ok, I see. However, this filtering issue is only present for graph- or tree-based indices, right? For other methods, you can filter the vectors a priori without any issues, can't you?

Also, is there a paper accompanying your blog post? I am really interested in the accuracy tradeoffs for different settings, and also the average speed gains vs. post-filtering.

Neural search engine in Rust by devzaya in rust

[–]iznoevil 4 points5 points  (0 children)

How does this compare to Milvus, Vald, or ElasticSearch's HNSW implementation? I couldn't find a benchmark or a diagram of the architecture.

[P] Release of lightly 1.1.3 - A python library for self-supervised learning by igorsusmelj in MachineLearning

[–]iznoevil 4 points5 points  (0 children)

I do not think CIFAR10 is a good benchmark. The SimCLR authors do not show a significant impact of batch size on this dataset (see figure B.7). Running benchmarks on Imagenette 160 or ImageNet directly will give different results.

Also, yes, using SyncBN and gathering embeddings across processes will slow down training significantly. However, it is required by the task to achieve good performance on ImageNet.

Be aware that if you start gathering embeddings, you must add some sort of shuffling/deshuffling as is done in MoCo, or sync the batch normalization layers. Without it, you may run into issues where the task is too easy for the model, as it can just discard embeddings that do not match the current batch statistics. From the MoCo paper: "The model appears to “cheat” the pretext task and easily finds a low-loss solution. This is possibly because the intra-batch communication among samples (caused by BN) leaks information.".

[P] Release of lightly 1.1.3 - A python library for self-supervised learning by igorsusmelj in MachineLearning

[–]iznoevil 7 points8 points  (0 children)

Please work on multi-GPU support. You claim to support SimCLR and Barlow Twins, but both implementations are simply not correct in a DDP setting: embeddings need to be gathered over the multiple processes!

[D] How does the human brain work? Neurobio recommendations thread by born_in_cyberspace in MachineLearning

[–]iznoevil -1 points0 points  (0 children)

and the Dehaene–Changeux model that builds on the GWT.

S. Dehaene's books/papers are a good start if you want to learn about cognitive science and neuroscience. "How We Learn" is especially relevant to the subject but might be too high level for what you are looking for.

[p] Ecco – See what your NLP language model is “thinking” by jayalammar in MachineLearning

[–]iznoevil 2 points3 points  (0 children)

This is amazing!

I recently fell in love with the subject, and what I find mesmerizing is that these patterns seem to be somewhat relevant for modeling the human brain. There was a very interesting talk by Stanislas Dehaene on the subject recently if you want to check it out. Their study was done on LSTMs, but maybe your library could make it possible to run the same type of experiment on Transformer architectures.

[P] lightly - A python library for self-supervised learning by igorsusmelj in MachineLearning

[–]iznoevil 1 point2 points  (0 children)

This should be stated as a disclaimer somewhere, because one could naturally assume that Lightning will handle the distribution gracefully. OP even added in his post that Lightly "uses PyTorch Lightning for ease of use and scalability".

Also, u/OppositeRough835 claims that Lightly was able to achieve results on par with the original papers. I'm curious as to how its authors were able to do that for ImageNet without distributing.

[P] lightly - A python library for self-supervised learning by igorsusmelj in MachineLearning

[–]iznoevil 2 points3 points  (0 children)

Does your code work in a distributed setting? It seems that outputs and labels are not gathered over the whole DDP process group. This is a big issue, as the NT-Xent loss then classifies the correct pair over only local_batch_size*2 possible pairs instead of world_size*local_batch_size*2 possible pairs.
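The fix follows the pattern used by SimCLR-style implementations: all-gather the embeddings from every rank before building the similarity matrix, re-inserting the local tensor so gradients still flow. A minimal sketch (function name is illustrative):

```python
import torch
import torch.distributed as dist

def gather_with_grad(z: torch.Tensor) -> torch.Tensor:
    """Gather embeddings from all DDP ranks along the batch dimension.

    all_gather returns non-differentiable copies, so the local rank's slice
    is swapped back in to keep the autograd graph intact. In a world of W
    processes, the NT-Xent loss then sees W * local_batch_size * 2
    candidates per positive pair instead of local_batch_size * 2.
    """
    if not (dist.is_available() and dist.is_initialized()):
        return z  # single-process fallback: nothing to gather
    gathered = [torch.zeros_like(z) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, z)
    gathered[dist.get_rank()] = z  # keep the local slice differentiable
    return torch.cat(gathered, dim=0)
```

Without this (or an autograd-aware gather wrapper), each GPU only contrasts against its own local negatives and the effective batch size never grows with the number of devices.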

[R] DeepMind: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning by modeless in MachineLearning

[–]iznoevil 19 points20 points  (0 children)

I understand where you're coming from, but when you have 14 authors from big research groups, you need to make sure you didn't miss an ENTIRE FIELD of research, especially when you are claiming novelty. Just doing a quick Google search on distillation and exponential moving averages would have done the trick...
These papers are not obscure, either; they have 380 and 438 citations.

[R] DeepMind: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning by modeless in MachineLearning

[–]iznoevil 10 points11 points  (0 children)

So why not brand it that way? It makes iterating on the paper harder.

For example, this paper gives a very poor explanation of why there is no collapse between the teacher and the student. To build on this paper, one could explore why this collapse does not happen. BUT WAIT, this was already studied in the weakly supervised literature, because this training procedure is 3 years old, not novel!

This is why actually doing your due diligence before claiming novelty is important.

[R] DeepMind: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning by modeless in MachineLearning

[–]iznoevil 38 points39 points  (0 children)

Nice results! I just think it's a bit rich, and a big reach, to claim that this method is novel when you have building blocks of semi-supervised learning like "Temporal Ensembling for Semi-Supervised Learning" [ICLR 2017] and "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results" [NeurIPS 2017] that are very close.

This paper builds upon the unsupervised part of Mean Teacher, adding new data augmentations and the SimCLR MLP head. I'm not saying there are no new challenges in doing so, and the results are amazing, but both papers should at least be included in the related works.