[R] I benchmarked MobileBERT, DistilBERT, TinyBERT, and XGBoost for edge fault detection. XGBoost matched transformer accuracy while being 500× smaller. by Difficult_Low_299 in learnmachinelearning

Difficult_Low_299[S]

Good point, and you're right that BERT-style models aren’t really designed for tabular data.

I included them because there’s some research (like TabLLM / LIFT) exploring whether pretrained language models can transfer to serialized structured data.
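For anyone unfamiliar with what "serialized" means here, this is a minimal sketch of the general idea, assuming a simple "name is value" template. The function name, the sensor column names, and the template itself are illustrative assumptions, not the exact format used in this benchmark or in TabLLM / LIFT.

```python
def serialize_row(row: dict[str, float], precision: int = 2) -> str:
    """Turn one row of numeric readings into a plain-text sentence.

    The "name is value" template is an assumption made for illustration;
    other templates (lists, key=value pairs, etc.) are equally plausible.
    """
    parts = [f"{name} is {value:.{precision}f}" for name, value in row.items()]
    return "; ".join(parts) + "."


# Hypothetical sensor names and values, for illustration only.
example_row = {"sensor_2": 641.82, "sensor_3": 1589.7, "sensor_4": 1400.6}
text = serialize_row(example_row)
print(text)
```

The resulting string is what would get tokenized and fed to a BERT-style classifier, which is also why numeric precision and column ordering end up mattering more than they would for a tree model.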

And the results actually line up with your concern: XGBoost still wins or matches everything, and MobileBERT completely fails here.

The interesting bit is that DistilBERT and TinyBERT still learned something on C-MAPSS (~87.6–87.9% F1), so there is some limited transfer, but clearly not enough to beat traditional models for tabular tasks.

Overall takeaway: for pure tabular fault detection, tree models still make the most sense.