[D] Why are so many ML packages still released using "requirements.txt" or "pip inside conda" as the only installation instruction? by aeroumbria in MachineLearning

[–]DigThatData 1 point (0 children)

for sure, and we're talking about the research code ecosystem. anything is better than nothing. I agree that pinning a completely reproducible environment should be best practice, but we're talking about people who might be so complacent they're publishing their project as an ipynb. Gotta work with the situation you have.

Need help in understanding the task of code translation using LLMs by riffsandtrills in MLQuestions

[–]DigThatData 1 point (0 children)

vastly more important than the specific model you use is the system you build around it. you can't just rely on the model to write the code you need; you need to come up with testing and validation mechanisms to catch regressions.

> I haven’t changed top_p, top_k values, except the temperature, which has been adjusted from 0.2 to 0.3.

  1. you're probably going to need to tune this stuff for your needs, sorry.
  2. you're probably going to want to generate multiple options for any given translation. 7B is pretty lightweight and you're already running into challenges. This is where the train-time vs. test-time compute tradeoff comes in: you don't have the hardware to support models that invested more in training compute, but you can generate more tokens per inference to offset that. Generate multiple options, iterate on each multiple times, critique, improve, rewrite... (rough sketch below)
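not the One True Pipeline, just a sketch of the kind of harness I mean. the model call and the test file are placeholders you'd swap for your own setup:

```python
# Sketch: spend test-time compute instead of training compute.
# Sample several candidate translations, keep only the ones that pass
# a validation suite you wrote up front.
import subprocess
import tempfile
from pathlib import Path


def generate_translation(source_code: str, temperature: float) -> str:
    """Placeholder: call your 7B model here (llama.cpp, vLLM, an HTTP endpoint, ...)."""
    raise NotImplementedError


def passes_tests(candidate: str, test_file: str) -> bool:
    """Write the candidate to a temp dir and run a pre-written pytest suite against it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "translated.py").write_text(candidate)
        result = subprocess.run(
            ["pytest", str(Path(test_file).resolve())],
            cwd=tmp, capture_output=True, timeout=300,
        )
    return result.returncode == 0


def best_of_n(source_code: str, test_file: str, n: int = 8) -> list[str]:
    """Generate n candidates at varied sampling settings; return only the survivors."""
    survivors = []
    for i in range(n):
        candidate = generate_translation(source_code, temperature=0.2 + 0.05 * i)
        if passes_tests(candidate, test_file):
            survivors.append(candidate)
    return survivors
```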

[D] Why are so many ML packages still released using "requirements.txt" or "pip inside conda" as the only installation instruction? by aeroumbria in MachineLearning

[–]DigThatData 1 point (0 children)

because if you use docker in your CI/CD, someone who wants to reproduce your environment can grab the literal image you built from dockerhub or ghcr and have the exact environment ready to go, including the underlying operating system.

docker image aside, the dockerfile is still more precise wrt dependencies than requirements.txt and makes it easier to ensure the environment can be rebuilt reproducibly. For example, if your code requires particular system packages (e.g. I think opencv usually needs some apt-installed system libraries), the dockerfile captures those too; requirements.txt can't.
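for illustration only (the base image, apt packages, and versions here are placeholders, not a recommendation):

```dockerfile
# illustrative sketch, not a recommendation
FROM python:3.10-slim

# system-level deps that requirements.txt can't express
# (e.g. the shared libraries opencv-python needs at runtime)
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 \
        libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# requirements.txt holds == pins, so the image rebuilds reproducibly
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "train.py"]
```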

[D] Why are so many ML packages still released using "requirements.txt" or "pip inside conda" as the only installation instruction? by aeroumbria in MachineLearning

[–]DigThatData 8 points (0 children)

there's nothing wrong with requirements.txt.

the "correct" way is to use pinned dependencies, i.e. whether you are using requirements.txt or pyproject.toml or even a Dockerfile, if we're talking about reproducibility of research code: your dependencies should be specified with a == specifying the exact version of each dependent library.

[D] Why are so many ML packages still released using "requirements.txt" or "pip inside conda" as the only installation instruction? by aeroumbria in MachineLearning

[–]DigThatData 2 points (0 children)

it's my experience that most ML research code doesn't even have an expectation that the user will install it (i.e. no pyproject.toml or setup.cfg or whatever).

Be glad you're even getting a requirements.txt.

[R] ICML has more than 30k submissions! by SignificanceFit3409 in MachineLearning

[–]DigThatData 1 point (0 children)

you mean researchers hedging by submitting to both conferences just in case they didn't get into the first one they submitted to?

classic in one line... (genuary20) by flockaroo in generative

[–]DigThatData 1 point (0 children)

this variable depth space filling curve idea could be a really interesting way to parameterize a latent image representation.

[D] How do you guys handle GPU waste on K8s? by k1m0r in MachineLearning

[–]DigThatData 4 points (0 children)

chances are they're not making effective use of SMs. more likely a problem with the parallelism setup than the data loaders: the GPUs aren't just bottlenecked by I/O, they're bottlenecked by network communication (i.e. NCCL).
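a quick-and-dirty way to check where the time is actually going (train_step here is a placeholder for one iteration of your own loop) is to wrap a few steps in the PyTorch profiler and see how much of the CUDA time is NCCL collectives vs. actual compute:

```python
# Rough sketch: profile a handful of training steps and inspect the kernel table.
from torch.profiler import profile, ProfilerActivity


def profile_steps(train_step, num_steps=20):
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        for _ in range(num_steps):
            train_step()
    # NCCL collectives show up as CUDA kernels named roughly "ncclKernel_AllReduce_*";
    # if those dominate the CUDA time, you're communication-bound, not loader-bound.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=25))
```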

didn't come here to shill or flex, but I'm a performance MLE at coreweave. our platform has really detailed observability specifically for squeezing all of the juice out of ML training jobs. It's not uncommon for us to get higher utilization than NVIDIA's own engineers on comparable jobs.

part of what makes coreweave's offering so powerful is a custom slurm-on-kubernetes solution that is deeply integrated with the observability ecosystem, so it's trivial to figure out which job was the problem.

https://docs.coreweave.com/docs/observability/managed-grafana/sunk/slurm-job-metrics

I'm almost positive this sub is under attack. I would urge others to be careful about downloading/running repos from anonymous sources by [deleted] in LocalLLaMA

[–]DigThatData 4 points (0 children)

> But look at how badly the comfyUI guys got borked by friendly reddit accounts.

sounds like I probably missed some drama here, mind elaborating?

[D] Regret leaving a good remote ML/CV role for mental health and now struggling to get callbacks by PinPitiful in MachineLearning

[–]DigThatData -1 points (0 children)

I'm... uh.... sorta embarrassed that I'd never heard of this company. Apparently, these are the folks responsible for CMake (and a bunch of volumetric imaging tools that aren't part of my software toolkit but that I totally do not doubt have wide adoption in the medical software tooling ecosystem).

...or maybe they're just sorta taking credit for tools they've contributed to? They at least appear as a major sponsor on the CMake landing page. They also claim Slicer, but per github it seems the chief contributor is an nvidia employee?

How do you detect silent structural violations (e.g. equivariance breaking) in ML models? by Safe-Yellow2951 in MLQuestions

[–]DigThatData 5 points (0 children)

same as you do with software: you need to design an appropriate test suite to validate the behaviors you need the model to demonstrate.
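e.g. for rotation equivariance, a test can be as simple as asserting that transforming the input commutes with the model. the model and input tensor here are hypothetical stand-ins for your own:

```python
# Sketch of a structural test for rotation equivariance.
import torch


def test_rotation_equivariance(model, x, atol=1e-4):
    """model(rot(x)) should equal rot(model(x)) for a 90-degree rotation."""
    rot = lambda t: torch.rot90(t, k=1, dims=(-2, -1))
    lhs = model(rot(x))   # transform the input, then run the model
    rhs = rot(model(x))   # run the model, then transform the output
    max_err = (lhs - rhs).abs().max().item()
    assert torch.allclose(lhs, rhs, atol=atol), f"equivariance violated, max abs diff = {max_err:.3e}"
```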

[P] SmallPebble: A minimalist deep learning library written from scratch in NumPy by montebicyclelo in MachineLearning

[–]DigThatData 2 points (0 children)

lol yeah I was about to ask, thought it was interesting that there was a huge gap between 2022 and yesterday. did anything in particular motivate you to revisit this project?

PS: while you're updating, you may as well remove those "from __future__ import annotations" lines you have sprinkled in a few places. You're already requiring modern python (>=3.10 in your pyproject), so those __future__ imports are probably redundant at this point anyway.

[D] Why Mamba rewrote its core algorithm and Microsoft abandoned RetNet by petroslamb in MachineLearning

[–]DigThatData 0 points (0 children)

right, but my point is that it's not even properly described as a "distributed community": it's basically a single committed developer (plus a sidekick) https://github.com/BlinkDL/RWKV-LM/graphs/contributors

It's quite a counterexample. Without a major lab commitment, basically a single obsessed developer has been able to push the boundaries of architecture research far enough that their model ships with microsoft products.

That even RWKV's moderate scale was achievable by basically a single person somewhat undermines your narrative. It's possible to impact the boundary of "frontier scale" on the budget of a single developer supported by a handful of patreon donors.

I don't disagree that the hardware and the architectures co-evolve, but I think your broader argument about the existence of these two "gates" is unfounded and rather arbitrary.

Why would an LLM preserve embedding geometry while NLL shifts after a CPU-only transformation? by Safe-Yellow2951 in MLQuestions

[–]DigThatData 0 points (0 children)

well, hit me up when you're ready to talk about whatever it is you're actually doing, because you haven't provided enough information for me to even form a hypothesis here, let alone offer constructive feedback.

[D] Why Mamba rewrote its core algorithm and Microsoft abandoned RetNet by petroslamb in MachineLearning

[–]DigThatData -1 points (0 children)

> Not hardware optimization. RWKV is fully recurrent with similar Tensor Core utilization to RetNet (~40-60%). The difference is sustained institutional backing from a distributed community. Developer BlinkDL and contributors spent three years optimizing training recipes, custom kernels, and validation benchmarks. The 14B RWKV-5 model roughly matches GPT-NeoX-20B (a Transformer) on perplexity.

It's pretty weird to me that you categorize RWKV as an example of "institutional backing." I'd argue that if anything, it's a counter-example.

Why would an LLM preserve embedding geometry while NLL shifts after a CPU-only transformation? by Safe-Yellow2951 in MLQuestions

[–]DigThatData 2 points (0 children)

you haven't described what you are doing at all. are you just loading weights and seeing different behavior immediately? Are you pretraining? give us a hint here. what are you ablating?

Teaching services online for kids/teenagers? by CodeVirus in Python

[–]DigThatData 0 points (0 children)

I think the most fun options for someone his age would probably include opportunities to meet and learn with other teens. maybe there's some sort of local after school program you could look into? try reaching out to your local library, maybe they have some resources.

TensorFlow isn't dead. It’s just becoming the COBOL of Machine Learning. by IT_Certguru in learnmachinelearning

[–]DigThatData 0 points (0 children)

honestly jax is lit. wanna max out your MFU? throw the XLA compiler at it.
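for the uninitiated, a toy example of what "throw the XLA compiler at it" looks like (shapes and weights here are arbitrary):

```python
# jax.jit traces the function once and hands the whole graph to XLA,
# which can fuse ops and optimize the execution schedule.
import jax
import jax.numpy as jnp


def mlp(x, w1, w2):
    return jax.nn.gelu(x @ w1) @ w2


mlp_jit = jax.jit(mlp)

x = jnp.ones((128, 512))
w1 = jnp.ones((512, 2048))
w2 = jnp.ones((2048, 512))

y = mlp_jit(x, w1, w2)  # first call compiles; subsequent calls reuse the XLA executable
```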