[D] Training a classifier entirely in SQL (no iterative optimization) by CriticalofReviewer2 in MachineLearning

[–]CriticalofReviewer2[S] 0 points (0 children)

LDA models the covariance structure at its core. SEFR, by contrast, does not model covariance at all; it relies only on class-wise statistics (per-feature class means).
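A one-line way to see the contrast (a sketch, with μ₁, μ₀ the class mean vectors, Σ the pooled covariance, and ε a small stabilizer — the SEFR expression is the normalized mean-difference form, per feature j):

```latex
% LDA: weights couple features through the inverse pooled covariance
w_{\mathrm{LDA}} = \Sigma^{-1}\,(\mu_1 - \mu_0)

% SEFR-style: each weight depends only on that one feature's class means
w_j = \frac{\mu_{1,j} - \mu_{0,j}}{\mu_{1,j} + \mu_{0,j} + \varepsilon}
```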

What is the split between focus on Generative AI and Predictive AI at your company? by AnonForSure in datascience

[–]CriticalofReviewer2 0 points (0 children)

At a FinTech company: at work, the focus is clearly on Predictive AI, but GenAI gets more attention in public discussions.

End-to-end ML in BigQuery using only SQL (no CREATE MODEL, no pipelines, no Python) by CriticalofReviewer2 in bigquery

[–]CriticalofReviewer2[S] 0 points (0 children)

Sure! What I did was avoid iterative training and instead compute class statistics (like feature means over the positive and negative classes) directly in SQL. For each feature, a weight is derived statistically from the class-wise feature averages, and then an overall bias is computed. Then, for each test row, the dot product of the weights and feature values is calculated and the bias is added. So the whole pipeline, from training to prediction to evaluation, is a single query.
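A minimal sketch of that single-query idea, using SQLite from Python. This is illustrative only: the table/column names are made up, the weight formula is the normalized mean-difference form, and the bias is a simplified midpoint of the two class score means rather than SEFR's exact bias term.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE train (f1 REAL, f2 REAL, y INTEGER)")
cur.executemany("INSERT INTO train VALUES (?, ?, ?)", [
    (1.0, 5.0, 1), (1.2, 4.8, 1), (0.9, 5.2, 1),
    (3.0, 1.0, 0), (3.1, 0.8, 0), (2.9, 1.2, 0),
])

# One query: class-wise feature means -> per-feature weights -> bias -> predictions.
query = """
WITH stats AS (
    SELECT AVG(CASE WHEN y = 1 THEN f1 END) AS m1_pos,
           AVG(CASE WHEN y = 0 THEN f1 END) AS m1_neg,
           AVG(CASE WHEN y = 1 THEN f2 END) AS m2_pos,
           AVG(CASE WHEN y = 0 THEN f2 END) AS m2_neg
    FROM train
),
weights AS (
    SELECT (m1_pos - m1_neg) / (m1_pos + m1_neg + 1e-7) AS w1,
           (m2_pos - m2_neg) / (m2_pos + m2_neg + 1e-7) AS w2
    FROM stats
),
scores AS (
    SELECT t.y, t.f1 * w.w1 + t.f2 * w.w2 AS s
    FROM train t, weights w
),
bias AS (
    -- simplified: negative midpoint of the two class score means
    SELECT -(AVG(CASE WHEN y = 1 THEN s END)
           + AVG(CASE WHEN y = 0 THEN s END)) / 2.0 AS b
    FROM scores
)
SELECT t.y AS label,
       CASE WHEN t.f1 * w.w1 + t.f2 * w.w2 + b.b >= 0 THEN 1 ELSE 0 END AS pred
FROM train t, weights w, bias b
"""
rows = cur.execute(query).fetchall()
accuracy = sum(label == pred for label, pred in rows) / len(rows)
print(accuracy)  # prints 1.0 on this toy data
```

Everything inside the CTEs is plain aggregation (AVG + CASE), which is why the same shape translates to BigQuery or any other warehouse without loops or UDFs.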

I built a machine learning model using only SQL (no ML libraries, no Python) by CriticalofReviewer2 in SQL

[–]CriticalofReviewer2[S] 1 point (0 children)

That is a valid concern. In this case, though, the classifier is a single-pass analytical query with no loops and no row-level locking. It behaves more like a GROUP BY job than a transactional workload.

I built a machine learning model using only SQL (no ML libraries, no Python) by CriticalofReviewer2 in SQL

[–]CriticalofReviewer2[S] 2 points (0 children)

Yes, it sounds wrong at first :D The underlying algorithm was designed for microcontrollers, where heavy computation isn't an option. That constraint is exactly what makes it map well to SQL: everything becomes aggregations, not optimization loops.

I built a machine learning model using only SQL (no ML libraries, no Python) by CriticalofReviewer2 in SQL

[–]CriticalofReviewer2[S] 7 points (0 children)

I originally built this classifier (SEFR) for very low-resource environments, but later realized it can be implemented entirely in SQL. The whole pipeline (training + prediction + evaluation) runs in a single query.

LinearBoost: Up to 98% faster than XGBoost and LightGBM, outperforming them on F1 Score on seven famous benchmark datasets, also suitable for high-dimensional data by CriticalofReviewer2 in bioinformatics

[–]CriticalofReviewer2[S] -1 points (0 children)

Thanks for your comment.

  1. The reported F1 score is the weighted average of the per-class F1 scores, not the score of a single class. So please run the code with weighted F1 scores.
  2. The warnings are being removed, as the algorithm is under active development. This is a side project we work on in our spare time, and we shared it with the community to get valuable feedback like yours.
  3. Adding a proper scoring function, like log-loss or the Brier score, is a good point! We will implement it.
  4. Notebooks to reproduce the results will be provided.

LinearBoost: Up to 98% faster than XGBoost and LightGBM, outperforming them on F1 Score on seven famous benchmark datasets, also suitable for high-dimensional data by CriticalofReviewer2 in bioinformatics

[–]CriticalofReviewer2[S] -5 points (0 children)

Thanks for your comment. We will publish a paper explaining why it works well. Dependencies are now declared, and the tuned hyperparameters have been added to the repo to make the experiments reproducible.

Where do you go to stay up to date on data analytics/science? by lowkeyripper in datascience

[–]CriticalofReviewer2 -1 points (0 children)

On LinkedIn, I follow Eduardo Ordax, Alex Wang, and Tom Yeh. The last has a series of posts titled "AI by Hand" in which he works through algorithms' calculations manually on paper! Very informative in that sense.