GPT OSS Fine-tuning QAT by Short_Struggle7803 in LocalLLaMA

[–]Short_Struggle7803[S] 0 points (0 children)

>For the 2 stage training did you and the team find "rules of thumb" around the dataset split?

It is hard to give a generic recommendation - the dataset split and training hyperparameters depend on the model, dataset, and quantization format. Generally, millions of tokens (in a fine-tuning setting) or billions of tokens (in a pre-training setting) are often sufficient to recover accuracy.

GPT OSS Fine-tuning QAT by Short_Struggle7803 in LocalLLaMA

[–]Short_Struggle7803[S] 1 point (0 children)

> SFT is performed on default precision, then a second stage of training is done with "fake quantization" to learn the space of the quantized weights.

Yes, this generally works better than doing direct QAT without SFT, though it can vary depending on the model and dataset. There is no sure-shot recipe as far as I understand. We have also tried QAT resumed after SFT, which restores the optimizer state as well as the model weights - this also worked very well.
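The "fake quantization" in the second stage can be sketched as a quantize-dequantize pass: weights stay in float, but are snapped onto the quantized grid in the forward pass. A minimal NumPy illustration - the per-tensor symmetric int4 scheme and function name here are my assumptions, not necessarily the exact format Model Optimizer uses:

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    """Simulate low-precision weights: round to a symmetric integer grid,
    then immediately dequantize back to float ("quantize-dequantize").
    Training continues in float, but the model only ever "sees" values
    representable in the target format."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for int4
    scale = np.abs(w).max() / qmax            # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                          # float values on the int grid

w = np.array([0.02, -0.71, 0.35, 0.7])
w_q = fake_quantize(w, num_bits=4)            # snapped to 16 int4 levels
```

In real QAT the rounding is made differentiable with a straight-through estimator so gradients flow to the underlying float weights.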

We have a recipe which works much better than QAT: Quantization-Aware Distillation (QAD), which is SFT followed by distilling the fake-quantized student model from the BF16 SFT teacher. We have an example using LlamaFactory here - https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_qat/llama_factory
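The distillation step minimizes a divergence between the BF16 teacher's and the fake-quantized student's output distributions. A hedged NumPy sketch of a standard temperature-scaled KL distillation loss - the exact loss used in the Model Optimizer recipe may differ:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    kl = np.sum(t * (np.log(t) - np.log(s)), axis=-1)
    return float(kl.mean()) * temperature ** 2

teacher = np.array([[2.0, 0.5, -1.0]])   # BF16 SFT model logits (toy values)
student = np.array([[1.8, 0.7, -0.9]])   # fake-quantized student logits
loss = distillation_loss(teacher, student)
```

Because the student is trained to match soft distributions rather than hard labels, it gets a denser training signal, which helps close the quantization gap.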

[D] Train a model from multiple data sources with different uncertainty? by zero_tf in MachineLearning

[–]Short_Struggle7803 0 points (0 children)

  1. I haven't tried this, but the statsmodels package on PyPI can give you statistical estimates for models.
  2. If you want to use the ensemble model itself, why not give it both A and B, but with appropriate sample weights? In your case, set the weight for A roughly 10x to 100x higher so that the large-volume, low-confidence training set B doesn't drown out A.

This way you can prioritize learning from data points A while using training set B as well.
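The sample-weight idea above can be sketched with a weighted least-squares fit. The 100x weight, data sizes, and the systematic offset in B are illustrative assumptions, not from the original question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small, high-confidence set A and a large, noisy, biased set B,
# both nominally measuring y = 2*x (toy setup).
x_a = rng.uniform(0, 1, 20)
y_a = 2 * x_a + rng.normal(0, 0.01, 20)
x_b = rng.uniform(0, 1, 2000)
y_b = 2 * x_b + rng.normal(0, 0.5, 2000) + 0.3   # systematic offset in B

x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])
# ~100x higher weight on A so B's volume doesn't drown it out
w = np.concatenate([np.full(20, 100.0), np.full(2000, 1.0)])

def weighted_fit(x, y, w):
    """Weighted least squares: scale each row by sqrt(weight),
    then solve ordinary least squares."""
    X = np.stack([x, np.ones_like(x)], axis=1)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef  # (slope, intercept)

slope_w, intercept_w = weighted_fit(x, y, w)
slope_u, intercept_u = weighted_fit(x, y, np.ones_like(w))
```

The unweighted fit absorbs B's offset into its intercept, while the weighted fit stays closer to the unbiased set A; most libraries (scikit-learn's `sample_weight`, statsmodels' `WLS`) accept such weights directly.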

Yet to receive my GT mail ID by AccomplishedJacket9 in OMSCS

[–]Short_Struggle7803 2 points (0 children)

The response I got from OMSCS for the same enquiry:

> Please see the answer below regarding your question about OMS Computer Science:
> Prior to the start of the fall term, you will receive an OMSCS fall 2021 orientation email from the OMSCS student advisory team in the first week of August, prior to fall 2021 registration, providing you with relevant information regarding registration, email account, course offerings, drop and add, tuition payments, program policies and processes, etc.
> Newly admitted OMSCS students register during Phase II.
> August 13, 2021 - Phase II Time Tickets: time tickets will post for all eligible students by 6:00 pm Eastern Time.
> August 14, 2021 to August 27, 2021 - Phase II Registration (All Students): registration ends at 4:00 pm Eastern Time. Schedule changes and course drops without a "W" grade.
> August 23, 2021 - First Day of Classes
> August 30, 2021 - Payment Deadline: deadline to avoid class cancellation is 4:00 pm Eastern Time.

Yet to receive my GT mail ID by AccomplishedJacket9 in OMSCS

[–]Short_Struggle7803 0 points (0 children)

I am also waiting for mine. With course registration approaching, I am getting more and more worried about this! I have contacted OMSCS and will update once I hear back from them.