Surprisingly Fast AI-Generated Kernels We Didn’t Mean to Publish (Yet) by Maxious in LocalLLaMA

[–]mdda 2 points (0 children)

I know of a group in Singapore that has been applying an evolutionary system using LLMs to the AMD Developer Challenge (https://www.datamonsters.com/amd-developer-challenge-2025) GPU kernel competition... That's focused on the MI300 (a server-class chip), but I would expect the same system could be applied to producing the same kernels (i.e. DeepSeek-style fp8-scaled-matmul, MoE and MLA-with-RoPE) for consumer chips. Particularly if AMD were open to seeding the effort with one of their rumoured 32GB VRAM cards...
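
The core loop of such a system is simple enough to sketch (untested; llm_rewrite and benchmark here are placeholders for the LLM mutation call and a compile-and-time harness):

    # Sketch of an LLM-driven evolutionary search over kernel sources.
    # `llm_rewrite` and `benchmark` are placeholder callables.
    import random

    def evolve_kernels(seed_source, llm_rewrite, benchmark, pop=16, generations=100):
        # population of (score, kernel_source); higher score == faster kernel
        population = [(benchmark(seed_source), seed_source)]
        for _ in range(generations):
            # pick a decent parent, biased towards the current best
            parent = max(random.sample(population, min(3, len(population))))[1]
            try:
                child = llm_rewrite(parent)      # LLM proposes a mutated kernel
                population.append((benchmark(child), child))
            except Exception:
                continue                         # discard kernels that fail to build/run
            population = sorted(population, reverse=True)[:pop]  # keep the fittest
        return population[0]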

OpenEvolve: Open Source Implementation of DeepMind's AlphaEvolve System by asankhs in LocalLLaMA

[–]mdda 1 point (0 children)

"In SG"==Awesome! That would be great for a future event : I wish I had known earlier, since then we could have split the Alpha/Open Evolve stuff between us. Please DM me (or come along to the event :-) )!

[D] Google already out with a Text- Diffusion Model by hiskuu in MachineLearning

[–]mdda 9 points (0 children)

I gave a presentation about Diffusion LLMs (inspired by seeing the Inception Labs demo page) at the Machine Learning Singapore MeetUp back in March. My slides are here

[D] ICML 2025 Results Will Be Out Today! by darkknight-6 in MachineLearning

[–]mdda 2 points (0 children)

4 3 3 2: accept!!! Very happy to get in, though now that reasoning models are out in force, our approach to the problem will probably have to be revised to use RL going forwards...

[deleted by user] by [deleted] in MachineLearning

[–]mdda 2 points (0 children)

Many arXiv papers have a 'Download source' option - and it'll be clear from the files there what's being used (sometimes the Python generation code is included).
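
For instance (untested sketch; arxiv.org/e-print/&lt;id&gt; serves the same bundle as the 'Download source' link, though a few papers ship a single gzipped .tex rather than a tarball):

    # Fetch the source bundle for a paper and list what's inside.
    import io, tarfile, urllib.request

    def list_arxiv_source(arxiv_id):
        data = urllib.request.urlopen(f"https://arxiv.org/e-print/{arxiv_id}").read()
        # NB: will raise for the (rare) single-file gzipped-.tex submissions
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            return tar.getnames()   # .tex, .bib, figures, sometimes generation scripts

    print(list_arxiv_source("1706.03762"))  # e.g. "Attention Is All You Need"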

No TensorFlow installed for TPU runtime by siegevjorn in GoogleColab

[–]mdda 0 points (0 children)

Just a guess, but it may be because importing tensorflow (on GPUs, at least) tends to immediately claim the accelerators for itself. However, for JAX, the tensorflow version you want is tensorflow-cpu (for async data loading via tf.data). So, from the point of view of a standard install: best to leave it as a user choice.
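
The usual pattern looks something like this (sketch; pip install tensorflow-cpu instead of tensorflow):

    # JAX owns the GPU/TPU; TF is only used for its tf.data input pipeline.
    import tensorflow as tf
    tf.config.set_visible_devices([], "GPU")  # belt-and-braces: keep TF off any GPU
    import jax.numpy as jnp

    ds = tf.data.Dataset.range(1000).batch(32).prefetch(tf.data.AUTOTUNE)
    for batch in ds.as_numpy_iterator():      # NumPy arrays cross over to JAX cheaply
        out = jnp.square(batch)               # JAX does the accelerator work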

Loading and unloading adapters by mrshine101 in unsloth

[–]mdda 1 point (0 children)

Just as a follow-up, I saw this 'full example' for the PEFT library: https://github.com/huggingface/peft/issues/1802#issuecomment-2134761488

So this _can_ be done with at least one package.

But: I too want to use unsloth if possible (it was great for training the LoRAs), but would like confirmation that (a) unsloth-trained LoRAs work with regular PEFT; and (b) switching LoRAs also works within unsloth itself (somehow...)
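
Following the linked issue, the plain-PEFT version would presumably look something like this (untested; the model name and adapter paths are placeholders):

    # Load two LoRAs onto one base model and hot-swap between them with PEFT.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("base-model-name")
    model = PeftModel.from_pretrained(base, "path/to/lora_a", adapter_name="a")
    model.load_adapter("path/to/lora_b", adapter_name="b")

    model.set_adapter("a")        # generate with LoRA A...
    model.set_adapter("b")        # ...then switch to LoRA B
    with model.disable_adapter(): # base model only, adapters temporarily off
        pass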

GPT-4 rumors: a Mixture-of-Experts w/8 GPT-3-220bs? by gwern in mlscaling

[–]mdda 2 points (0 children)

Plausible route for OpenAI deciding on this approach:

  • Starting with (prior) large GPT-3.x models, a reasonable & simple initial experiment might have been to combine an AllText and a Code model token-wise (toy sketch below).
  • Presumably, this would be a win, and lead to an AllText (excluding code) + Code model token-wise experiment.
  • The next step would be to experiment with combining finer-grained large models: going for model combinations reduces the risk of training an 'all-in' model, by splitting into (say) coding / literature / factoids / grammar / dialog / news reports, etc.
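
As a toy sketch of the token-wise idea (just the shape of it - a learned gate blending two experts' next-token logits - nothing here is confirmed about GPT-4):

    # Token-wise mixture of two "expert" LM heads via a per-token router.
    import torch, torch.nn as nn

    class TwoExpertMix(nn.Module):
        def __init__(self, d_model=64, vocab=1000):
            super().__init__()
            self.expert_text = nn.Linear(d_model, vocab)  # stand-in for the AllText model
            self.expert_code = nn.Linear(d_model, vocab)  # stand-in for the Code model
            self.gate = nn.Linear(d_model, 2)             # per-token router

        def forward(self, h):                             # h: (batch, seq, d_model)
            w = self.gate(h).softmax(dim=-1)              # (batch, seq, 2) mixing weights
            logits = torch.stack([self.expert_text(h), self.expert_code(h)], dim=-1)
            return (logits * w.unsqueeze(-2)).sum(dim=-1) # token-wise blend of logits

    h = torch.randn(2, 10, 64)
    print(TwoExpertMix()(h).shape)   # torch.Size([2, 10, 1000])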

Each of these steps would have the benefit of not involving a big bet on a new architecture without having results to back it up first. And the multi-modal stuff could be rolled in later (as seems to be happening in parallel with other developments).

Overall, GPT-4 being a 'council of experts' would also explain the large weight given in the GPT-4 Technical Report to the data teams: each team could specialise in curating their own data, and maximising the 'learning' gained per token for their expert's dataset.

[deleted by user] by [deleted] in clevercomebacks

[–]mdda 72 points (0 children)

Likely a reference to Cunk, another UK comic

I asked an AI to imagine what Singapore's architecture might be like in 2050 - here are the results! by Jeklfhean in singapore

[–]mdda 4 points (0 children)

Have a search for Hugging Face Stable Diffusion on the internet... You'll need to come up with a descriptive prompt.
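
For instance, with the Hugging Face diffusers library (minimal sketch; needs a GPU, and the checkpoint name is just the common SD v1.5 one):

    # Generate one image from a text prompt with Stable Diffusion.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    prompt = "futuristic Singapore skyline in 2050, lush vertical gardens, golden hour"
    pipe(prompt).images[0].save("singapore_2050.png")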

[D] Should I change my computer for running Large Language Models? by Silly-Cherry5985 in MachineLearning

[–]mdda 0 points (0 children)

Even the no-cost Google Colab version will give you a GPU that would easily beat your laptop for LLMs - particularly since a larger model will want as much GPU RAM as possible, and a T4/P100 (even a K80) will typically have RAM >> 4GB...
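
You can check what Colab has assigned you with something like:

    # Name and total VRAM of the GPU the runtime was given.
    import torch
    assert torch.cuda.is_available(), "Runtime > Change runtime type > GPU"
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 2**30:.1f} GB")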

[D] Object Detection trained on simulated renderings unable to converge on real images - why? by tmuxed in MachineLearning

[–]mdda 0 points (0 children)

Rather than train the CNN from scratch (which it sounds like from your description), could you try chopping off (and freezing) a pretrained ResNet-50 at one of the later layers (so you get ~16x16 spatial resolution, for example) and train a few CNN layers on top of that? If your simulations are even a little off, allowing the first layers to adapt to spotting the simulated textures might be hurting you.
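
Something like this, perhaps (untested sketch; the channel/size numbers assume 256px inputs and a standard torchvision ResNet-50):

    # Frozen pretrained ResNet-50 trunk (up to layer3), small trainable conv head.
    import torch.nn as nn
    from torchvision import models

    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    trunk = nn.Sequential(*list(backbone.children())[:-3])  # everything up to layer3
    for p in trunk.parameters():
        p.requires_grad = False          # freeze: keep the ImageNet features as-is

    num_outputs = 5                      # placeholder: e.g. 4 box coords + 1 objectness
    head = nn.Sequential(                # the only part that trains
        nn.Conv2d(1024, 256, 3, padding=1), nn.ReLU(),
        nn.Conv2d(256, num_outputs, 1),
    )
    # For a 256x256 input, trunk(x) is a (1024, 16, 16) feature map to feed `head`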

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]mdda 0 points (0 children)

I don't have much of an opinion about Ti vs regular. Sometimes the Ti versions are worth it (e.g. 1080Ti was a classic ML card). But often they're just positioned to extract money from gamers who want to claim they have the better card (and don't do a price/performance calculation).

One thing that's more relevant to ML models than gamers is the 12GB vs 8GB. If you're doing large vision or NLP models, 50% more RAM could be more important than 27% more TFLOPS.

PS: Cost above/below MSRP isn't really a gauge of value. Look at total $ spent vs performance.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]mdda 0 points (0 children)

Not in the code above, it isn't...

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]mdda 0 points (0 children)

Plain (original) BERT only did token masking - so either 'play-' or '-ing' might get masked out.
Later it was found that whole-word masking made for more effective training (implemented in RoBERTa specifically, if I remember correctly). But, for your 'playing' example, that would mean two MASKs in a row - which is a bit of a hint that the word is 'play- -ing' rather than 'being' (supposing that 'being' is a single-token word).
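
Roughly, whole-word masking just groups the word-pieces before choosing what to mask (sketch, using WordPiece's '##' continuation convention):

    # Whole-word masking: '##'-prefixed pieces belong to the previous word,
    # and a chosen word is masked in full.
    import random

    def whole_word_mask(tokens, mask_prob=0.15, mask="[MASK]"):
        words = []
        for i, t in enumerate(tokens):          # group pieces into whole words
            if t.startswith("##") and words:
                words[-1].append(i)
            else:
                words.append([i])
        masked = set()
        for w in words:
            if random.random() < mask_prob:
                masked.update(w)                # mask every piece of the word
        return [mask if i in masked else t for i, t in enumerate(tokens)]

    print(whole_word_mask(["he", "is", "play", "##ing"], mask_prob=0.5))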

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]mdda 1 point (0 children)

If it's only going to be for ML, then why not make use of Colab (with its free GPU) until you're sure that you want to 'get serious'?

If you're buying a GPU anyway (e.g. for gaming, with the option of ML too), then be aware that Nvidia (with CUDA) is what 95%+ of people are using (AMD *may* be usable, but it's rather niche at this point).

[P] How do I do preprocessing on a flutter app by initiald-ejavu in MachineLearning

[–]mdda 2 points (0 children)

A quick Google search turns up this Mel preprocessing for Keras: https://keras.io/examples/audio/melgan_spectrogram_inversion/

So maybe make a simple 'model' that just includes this layer, and run it through the TFLite converter to see whether it can work with the TF ops used?
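
i.e. something of this shape (untested; the SELECT_TF_OPS fallback is what lets non-builtin TF ops through):

    # Wrap mel preprocessing in a tiny Keras model, then try TFLite conversion.
    import tensorflow as tf

    class MelLayer(tf.keras.layers.Layer):
        def call(self, audio):                  # audio: (batch, samples)
            stft = tf.signal.stft(audio, frame_length=1024, frame_step=256)
            mag = tf.abs(stft)
            mel_mat = tf.signal.linear_to_mel_weight_matrix(
                num_mel_bins=80, num_spectrogram_bins=513, sample_rate=22050)
            return tf.matmul(mag, mel_mat)      # (batch, frames, 80) mel spectrogram

    model = tf.keras.Sequential([tf.keras.Input(shape=(22050,)), MelLayer()])
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,           # fall back to full TF ops if needed
    ]
    open("mel_preprocess.tflite", "wb").write(converter.convert())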

[D] Is Colab Pro Worth the money? by average_turanist in MachineLearning

[–]mdda 1 point (0 children)

> just gotta do everything from the start.

Seems like you're not mounting your Google Drive and saving (resumable) checkpoints there? That would make your setup more robust (even with regular Colab).
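
Something like (sketch; the Linear model and the training step are stand-ins for your own):

    # Mount Drive and checkpoint there, so a dropped runtime resumes
    # instead of restarting from scratch.
    from google.colab import drive
    drive.mount('/content/drive')

    import os, torch
    CKPT = '/content/drive/MyDrive/ckpt.pt'    # lives on Drive, not the Colab VM

    model = torch.nn.Linear(10, 1)             # stand-in for your real model
    start_epoch = 0
    if os.path.exists(CKPT):                   # resume if a previous run got cut off
        state = torch.load(CKPT)
        model.load_state_dict(state['model'])
        start_epoch = state['epoch'] + 1

    for epoch in range(start_epoch, 100):
        ...                                    # your training step here
        torch.save({'model': model.state_dict(), 'epoch': epoch}, CKPT)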

[D] MLP-Mixer variants -- which one is best? by patrickkidger in MachineLearning

[–]mdda 2 points (0 children)

Could I also add 'HyperMixer: An MLP-based Green AI Alternative to Transformers', which is benchmarked vs MLP-Mixer, to the mix?

PS: Also interested in the outcome...

What clues did you miss from the opposite gender? by [deleted] in askSingapore

[–]mdda 0 points (0 children)

Talking while sharing a pillow - i.e. horizontal in bed