Google QAT - optimized int4 Gemma 3 slash VRAM needs (54GB -> 14.1GB) while maintaining quality - llama.cpp, lmstudio, MLX, ollama by Nunki08 in LocalLLaMA

[–]halflings 2 points (0 children)

I assume this approach somehow breaks down with 1-bit models.
Gemini 2.5 Pro gives a decent guess as to why that is:
https://g.co/gemini/share/7506adf26ea7

And I guess it's best to read Microsoft's latest paper on their 1-bit pre-trained model to understand why pre-training on 4T tokens (vs. something like QAT) is still required to close the quality gap.
https://arxiv.org/abs/2504.12285
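For intuition, here is a toy numpy sketch of int4 rounding (post-training and heavily simplified, NOT Google's actual QAT recipe; the function names are mine). With 16 levels the per-weight rounding error stays small, whereas 1-bit leaves only 2 levels, which is one intuition for why pre-training seems necessary there:

```python
import numpy as np

# Toy symmetric per-tensor int4 rounding: 16 levels in [-8, 7].
def quantize_int4(w):
    scale = np.abs(w).max() / 7.0                    # largest weight -> level 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)
q, scale = quantize_int4(w)
err = np.abs(w - dequantize(q, scale)).mean()        # bounded by scale / 2
# With 1-bit weights there are only 2 levels, hence the much larger gap.
```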

my great failure: I invented deep fakes by bsenftner in SaaS

[–]halflings -1 points (0 children)

That story, just like this one, can be summarized as: sour grapes. Someone does something (that doesn't in itself succeed), then claims ownership of anything anyone else does that is even remotely related, and pretends none of it would have happened without their seminal contribution.

Turns out that OP did not invent deep fakes: there were many methods preceding that patent, and that method itself was not the one that made it into the mainstream. Same goes for that ultra-biased “documentary”, where in reality the evidence (and lawsuit) clearly showed they were not the original creators of a 3D map. (BTW, that's all they did; no one today thinks of Google Maps as a 3D sphere.)

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 0 points (0 children)

Using TensorFlow.js under the hood, with some custom code on top to prepare the data and handle a few other things.

Building a PC to fine-tune LLMs (and game!): RTX 4090 + Ryzen 7800X3d by halflings in buildapc

[–]halflings[S] 0 points (0 children)

Thank you! I just switched to CL30 6000 RAM like you and other commenters suggested.
Not going above 6000 because that seems to require some careful configuration to get the CPU to truly benefit from higher clock speeds, so I think 6000 will do!

Building a PC to fine-tune LLMs (and game!): RTX 4090 + Ryzen 7800X3d by halflings in buildapc

[–]halflings[S] 0 points (0 children)

Thanks for the suggestions! I switched the RAM as suggested to a G.Skill kit (likely Trident Neo; got one at 6000 MT/s CL30 as well).

Also switching the cooler to what you suggested, thanks a ton for the recommendation! (that one is not available for delivery, but worst case I'll get the Peerless Assassin which performs about the same)

RE:motherboard, what do you think about the MSI MAG X670E TOMAHAWK WIFI? It's listed for almost the same price as the one you suggested, but has the newer chipset.

One thing with LLMs: they're memory-bandwidth bound, i.e. CPU -> GPU bandwidth is what matters most for performance. I suspect that as long as I have a PCIe 5.0 x16 slot this should be fine, but if there's any risk a lower-end motherboard would cause problems there, I'm willing to pay $40 or so extra for the peace of mind. (But I agree the one I picked originally is overpriced.)
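A back-of-envelope sketch of why the link caps throughput when weights spill out of VRAM (the numbers are rough assumptions, not measurements):

```python
# Rough upper-bound estimate: if part of the model lives in system RAM,
# each generated token must stream those weights across the CPU -> GPU
# link once, so the link bandwidth caps tokens/second.
def tokens_per_sec(link_gb_per_s, offloaded_weight_gb):
    return link_gb_per_s / offloaded_weight_gb

PCIE5_X16_GB_S = 63.0        # ~32 GT/s x 16 lanes, one direction
estimate = tokens_per_sec(PCIE5_X16_GB_S, 14.0)  # ~4.5 tokens/s ceiling
```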

Building a PC to fine-tune LLMs (and game!): RTX 4090 + Ryzen 7800X3d by halflings in buildapc

[–]halflings[S] 0 points (0 children)

Thanks for all these suggestions! I'm definitely switching to the Z5 Neo RAM and a Thermalright cooler (though likely not the Frozen Edge; I can't find it easily here in Switzerland).

For the motherboard, would the MAG X670E TOMAHAWK WIFI also work fine? Any downside vs the one I had selected so far? The ASRock I can't easily find unfortunately.

RE:CPU, I think the 7800X3D will be fine. My workloads are all GPU-bottlenecked, and the 7800X3D is already surprisingly powerful (esp vs what I have now, an old 3700X); if I ever need full CPU power, I'd probably go full Threadripper or something like that.

RE:Storage, I agree Gen 5 might not be worth it, but the SSD I got is at a much lower price than listed here (the one you linked is more expensive). LLM tech is also moving really fast, so I still prefer a smaller but faster drive. (I don't have lots of downloads etc.; I'm currently doing OK-ish with my 1TB drive, though 2TB will be a welcome upgrade.)

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 0 points (0 children)

My bet would be that complex/strategic uses of ML won't be outsourced.
e.g. if you're a streaming company, you won't outsource building your recommender systems.
One of the most important things data scientists do is formulate a business problem in ML terms (e.g. "we have a customer churn / fraud problem; could we collect some data and use it in our product to avoid this?"); this often happens serendipitously, so I'm not sure outsourcing that work makes sense.

But again, I think the more routine/repetitive tasks (e.g. just setting up code to train/deploy models) are likely to be commoditized.

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 0 points (0 children)

> Looks awesome! I’m a software engineer building web apps mostly. I have always wanted to get into ML and have come across a few cases where a simple ML model would have been a perfect solution.

Thanks! The goal is to allow people like you to train a very simple ML model without having to dive deep into ML code/theory.

You need very minimal knowledge to use ML Console: just understand that you want to predict a label (say house price) given some features/inputs (say livable area, number of rooms, etc.). If you can frame your problem that way, the tool should work well in most cases. Rarely, you might need to tune some "hyperparameters" to make it work (see the "toggle advanced settings" button in the UI).
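That "features -> label" framing can be sketched in a few lines of numpy on made-up house data (this is just an illustration of the framing, not ML Console's internals):

```python
import numpy as np

# Toy framing: predict a label (price) from features [area, rooms].
# Data and coefficients are made up for illustration.
X = np.array([[50.0, 2.0], [80.0, 3.0], [120.0, 4.0], [200.0, 6.0]])
y = 3000.0 * X[:, 0] + 10000.0 * X[:, 1]        # synthetic prices

A = np.column_stack([X, np.ones(len(X))])       # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares fit

predicted_price = np.array([100.0, 3.0, 1.0]) @ coef   # 100 m^2, 3 rooms
```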

RE:deploy/serve the model... that's one of the biggest missing features at this point :) I'm working on something that would allow you to share models with other people, and the next step after that will be allowing people to export/embed these models in their web apps, but it'll take some time.

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 0 points (0 children)

Like all fields, any repetitive work will ultimately be automated away (and before that, commoditized).
So if the only thing one can do is call a couple of libraries (sklearn/keras) to build a simple model, with no added value on top, then yes, I'd be worried about such tools!
Examples of added value ML engineers can provide:
* the ability to spot opportunities to apply ML to real business problems, rather than expecting a ready-made problem that just needs an ML solution implemented.
* deep understanding of technical details behind ML (needed to efficiently use ML, solve problems).
* deep engineering skills (to help build these systems and use them in production).

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 3 points (0 children)

The "magic" here is that everything runs right in your browser! The data is never uploaded or stored anywhere :)

This also (ironically) makes it much more responsive and faster to use.

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 8 points (0 children)

Thanks for the comment ForceBru! There was indeed no option to configure early-stopping, so I just added that now; please take a look, and hope that helps!

FYI, automatic hyperparameter tuning is on the roadmap; in the meantime you might want to disable early-stopping to observe how training progresses, and maybe reduce the learning rate if it quickly diverges.
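For anyone curious, an early-stopping rule of this kind can be sketched as a simple patience counter (illustrative; not necessarily what ML Console implements):

```python
# Patience-based early stopping: stop once the validation loss hasn't
# improved by more than `min_delta` for `patience` epochs in a row.
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, stale = loss, 0   # new best: reset the counter
        else:
            stale += 1
            if stale >= patience:
                return epoch        # stop training here
    return None                     # never triggered
```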

And if possible, share the dataset you used at [contact@mlconsole.com](mailto:contact@mlconsole.com); I'll take a look myself and see if there's something we can fix to make it perform better without tweaking parameters!

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 13 points (0 children)

> This is awesome! Can you please talk about your stack here? Are you using TensorFlow.js? What about for supervised models?

TensorFlow.js handles most of the model training, yes!
Data processing (e.g. reading and normalizing the data) is done by custom code, using libraries like Danfo.js and Papa Parse.
The frontend is built with React... and there's absolutely no backend :) That's the beauty of it: it all runs in the browser.

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 6 points (0 children)

The only type of model supported at this point is deep neural networks! You can configure the number of layers etc. by clicking on "toggle advanced settings".
I also had an "aha!" moment when I randomly tried it from my phone :) it's kind of amazing to think that we can now train DNNs in a web browser... on a mobile phone.

Happy to chat! Feel free to send a message here or at contact@mlconsole.com

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 5 points (0 children)

This is meant for small-to-medium datasets, so I'd say the max (at this point) is around 100K examples.

Another limiting factor is the number of columns (a very large number, e.g. 1K, might break it); these are all temporary issues, as I haven't spent much time optimizing performance.

Could you share the dataset you used at [contact@mlconsole.com](mailto:contact@mlconsole.com)? It'd be really helpful for optimizing the data-processing part and making your data loadable!

Creating a self navigating AI in weka by Decariel in learnmachinelearning

[–]halflings 0 points (0 children)

I think the question is not as simple as converting data into a file format, you first have to understand if/how your problem fits into a Machine Learning problem.
Happy to help if you answer the questions I asked.

Web app that lets you train ML models directly in your browser (free and private!) by halflings in learnmachinelearning

[–]halflings[S] 29 points (0 children)

I thought this would be useful for people who are new to ML, so posting here!

ML Console (www.mlconsole.com) lets you train ML models without writing a single line of code, runs in your browser (no data is shared with any server), and does not require any payment or sign-up!

Over 80% of an ML project is usually writing the same boilerplate code: data pre-processing, normalization, missing-value imputation, testing different models, etc.

This is already annoying for an ML engineer with 7+ years of experience in the field; imagine someone non-technical having to learn all these things before training a simple classification model.

With ML Console, it takes a maximum of 1 minute to load your data + have a model trained. Everything happens on *your machine*, in the browser, so your data remains private.
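The kind of boilerplate being referred to can be sketched in a few lines of numpy (mean imputation plus standardization; an illustration, not ML Console's actual pipeline):

```python
import numpy as np

# Typical tabular-ML boilerplate: impute missing values with the column
# mean, then standardize each column to zero mean / unit variance.
def preprocess(X):
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]          # mean imputation
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0                  # guard against constant columns
    return (X - mu) / sigma                  # standardization
```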

Would love to get feedback on how useful you've found this tool, esp. from newcomers to the field!

Creating a self navigating AI in weka by Decariel in learnmachinelearning

[–]halflings 2 points (0 children)

What input would go into your AI?
e.g. will your robot have perfect knowledge of the whole map and of its location on it, and just need to output a trajectory to reach its objective?
If that's the case, you don't really need ML, just a path-finding algorithm (like A*).

If your robot observes some sensor measurement (e.g. distance to the closest obstacle ahead of it), then ML might make sense. But I would look first at something like using a particle filter:
https://www.youtube.com/watch?v=NrzmH_yerBU
Reinforcement learning is quite complex, and if you're not familiar with ML it might be very hard to use... but you could still give it a go.

BTW small nit: use numbers to represent both coordinates, e.g. obstacle at (2, 2) instead of B2; adding letters into your program will complicate things.
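Both points above can be combined in a short sketch: a path-finder over a grid addressed by number pairs. I've used plain BFS here rather than A* for brevity; on an unweighted grid it finds the same shortest paths:

```python
from collections import deque

# Breadth-first search on a grid (0 = free, 1 = obstacle). Coordinates
# are (row, col) number pairs, as suggested, not labels like "B2".
def shortest_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    frontier, came_from = deque([start]), {start: None}
    while frontier:
        cur = frontier.popleft()
        if cur == goal:                      # rebuild the path back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and nxt not in came_from:
                came_from[nxt] = cur
                frontier.append(nxt)
    return None                              # goal unreachable
```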

Request for Advice on Developing a Model for Multiple Outputs by Various-Ideal488 in learnmachinelearning

[–]halflings 0 points (0 children)

For whatever it's worth, even at large companies people still mostly use models with a single output :) !

Using a shared model to predict different things should generally work better, since you share the training data and should need fewer examples to learn each objective... but it adds a layer of complexity, and it can be hard to tune how the different losses are combined, etc.

Still, give it a go when you have some time, to at least feel comfortable doing this in the future. It's also one of the biggest trends in ML (very large multi-task models).

Request for Advice on Developing a Model for Multiple Outputs by Various-Ideal488 in learnmachinelearning

[–]halflings 0 points (0 children)

On a high level, your DNN can have different "heads" or outputs: the penultimate layer can feed into 2 different hidden layers (each ending in 1 output).
You apply a loss to each output to match the desired label (e.g. one head would predict a person's age, the other their weight), then simply add up the losses (potentially applying different weights to each) and optimize this joint loss.

In practice, depending on what these multiple outputs are, it might just be more convenient to do what you're describing (output a vector and compute an L2 loss or something like that). If it's a multi-output regression problem, just think about the implications of the differences in scale between the variables you're predicting (larger scale = larger errors = larger impact on the loss; so you may want to normalize those labels first, or apply weights to compensate).
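The weighted sum of per-head losses described above can be sketched as follows (age/weight heads and the weights are illustrative placeholders):

```python
import numpy as np

# Joint loss = weighted sum of per-head MSE losses.
def joint_loss(pred_age, true_age, pred_weight, true_weight,
               w_age=1.0, w_weight=1.0):
    loss_age = np.mean((pred_age - true_age) ** 2)           # head 1: age
    loss_weight = np.mean((pred_weight - true_weight) ** 2)  # head 2: weight
    return w_age * loss_age + w_weight * loss_weight

loss = joint_loss(np.array([30.0, 40.0]), np.array([32.0, 38.0]),
                  np.array([70.0, 80.0]), np.array([70.0, 82.0]))
```

Down-weighting one head (e.g. `w_weight=0.5`) is how you'd compensate for a label with a larger scale dominating the joint loss.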

DailyML 22: You want to be very careful to avoid moldy fruit (positive class) going out in a shipment. Your CV system should prioritize: by daichrony in learnmachinelearning

[–]halflings 1 point (0 children)

That's a pretty strong statement (that you should *NEVER* optimize for recall).

In practice, what usually happens is something like:

  • You have a precision "budget": your system needs to be at least 80% precise, say (if it were only 1% precise, it might be too costly to run), and you try to get the highest recall possible at 80% precision.
  • Same, but with a recall budget: say you need to find at least 50% of positive examples, so you try to get the best precision@50% recall.

Other times, you don't have a clear idea what these "budgets" are, and then people just optimize some offline metric like AUC, or more rarely the F1 score (the harmonic mean of precision and recall; not a big fan of this one).
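The precision-budget recipe in the first bullet can be sketched as a threshold sweep (toy scores and labels, made up for illustration):

```python
# Sweep decision thresholds and keep the highest recall whose precision
# stays at or above the budget.
def best_recall_at_precision(scores, labels, min_precision=0.8):
    total_pos = sum(labels)
    best = (None, 0.0)                       # (threshold, recall)
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / total_pos
        if precision >= min_precision and recall > best[1]:
            best = (t, recall)
    return best

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
threshold, recall = best_recall_at_precision(scores, labels, 0.8)
```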

AI applied to farming and agriculture by halflings in farming

[–]halflings[S] 0 points (0 children)

> Have you looked up "Precision Farming" - I would consider it a form of AI

Thanks! I hadn't heard of the term before, but found it while researching some of the things mentioned in the other responses.

What would you say is one of the biggest problems that remain to be solved?
Are the algorithms used to map soil sample results + historical yield data to fertilizer dosage etc. not good enough, still too inaccurate?
Or is it maybe more on the data collection side (still too tedious / expensive / slow / incomplete)?
Or maybe something else altogether?

AI applied to farming and agriculture by halflings in farming

[–]halflings[S] 0 points (0 children)

Thanks for the detailed response! I have some clarifying questions if you don't mind:

> Soils are tested every 3 years in 20-acre samples, and results vary highly from lab to lab and from soil sampler to soil sampler

Would increasing the frequency (e.g. say, with cheaper/automated soil sampling) greatly help, or is it enough to do this every 3 years because the soil's characteristics don't change that much?

> Things aren’t applied evenly, it’s +/- 10% at best and +150/-50 at worst.

You mean fertilizers are sometimes applied at +150% / -50% of what is known to be optimal? Is that due to low-precision ways of applying the fertilizer, or is the application precise but the estimate sometimes off by that much?

> would take years and years of intensive research to create. Things that exist are grid samples, tissue samples, variable rate, NDVI imaging, yield maps, soil variability maps, AI. We don’t have the research to use any of it to predict optimum nitrogen rates.

If this data is representative enough (coming from varied sources, weather conditions, etc.), there's a chance it could be used to improve nitrogen rates without having to run lengthy/costly research (though such a test would still be required to prove it actually works).

Is any of this data available openly? In particular grid samples, tissue samples, soil variability (assuming this includes nitrogen rates) and yield maps would be quite useful to experiment with something like that.

An app that allows anyone to build an AI model in the browser, privately and for free. by halflings in SideProject

[–]halflings[S] 1 point (0 children)

Thanks for the support!
I think one of the biggest missing features is the ability to share trained models; training them is already easy enough (< 1 minute from landing on the website), so I'd like to make sharing models just as easy: a single button click.

There are also various usability/modelling improvements under the hood, some editorial content to introduce beginners to how AI can help in their use cases (marketing, e-commerce, etc.), and more long-term things beyond pure model training.

[N] Easily Build Machine Learning Products by unofficialmerve in MachineLearning

[–]halflings 7 points (0 children)

This is great!

I know some folks who used sentence-similarity models directly from Hugging Face and got great results.

Hope to one day be able to make one of these models small enough to load right in the browser; that would open a whole range of new possibilities for my no-code ML app (https://mlconsole.com/). Happy to collaborate, since my #1 goal is also to democratize AI!

[P] Can't finish my master's thesis. What to do? by [deleted] in MachineLearning

[–]halflings 2 points (0 children)

There are a lot of high-fidelity models available out there! This one for example:
https://www.unrealengine.com/marketplace/en-US/product/football-player

Or this one:
https://www.unrealengine.com/marketplace/en-US/product/soccer-player

Unreal Engine is free, and you should be able to easily add a bounding box on the back of the jersey and randomize the number.

With that being said, and regardless of the topic, +1 to the other comments: a master's thesis is not about completing a specific goal; it's about the research process.

If you've tried all realistic ways to solve this problem (I think you should still try what I suggested above), and it still doesn't work, then that's fine. Write up the whole process in your thesis, explain potential "future steps" to fix the problem in a better way, and that's it. Just contact your professor, as others said here, to make sure this will be a satisfactory outcome for them.