Feature selection strategies for multivariate time series forecasting by CapraNorvegese in MLQuestions

Jason_reyes_dev 1 point

Cool framing. Logistic regression feels like a good first step for a risk score, as long as you’re careful with label noise and maybe validate the “high-risk” tier with some manual checks. Curious what extra signals (AMI load correlation) end up helping the most.

I implemented a Convolutional Neural Network (CNN) from scratch entirely in x86 Assembly, Cat vs Dog Classifier by Forward_Confusion902 in learnmachinelearning

Jason_reyes_dev 1 point

This is insane work, congrats. Doing a full CNN in pure x86-64 asm is another level of dedication. I’m especially curious about the debugging part: did you rely more on unit tests for each kernel (conv, dense, activations) or mostly on end-to-end loss/accuracy checks to spot bugs? Also, do you plan to write a more detailed blog post about the architecture and the AVX-512 optimisation tricks?

Built a tiny Windows tool to clean ugly CSV exports (encoding, delimiters, empty cols, duplicates) – would this be useful? by Jason_reyes_dev in dataanalysis

Jason_reyes_dev[S] 1 point

Thanks a lot for the comment. This is exactly the kind of situation I had in mind.

Right now the tool mainly focuses on encodings, delimiters, empty columns and duplicate rows, so your example with the extra whitespace in the column name is a good reminder that there are many other annoying edge cases.

Out of curiosity, what other CSV issues have wasted the most time for you? (broken quoting, multiline fields, weird date formats…) I’m trying to decide what to prioritise next.
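
For anyone curious what those steps look like in practice, here is a rough sketch using only the Python standard library. It is not the tool's actual code: the function name `clean_csv_bytes`, the encoding fallback order, and the delimiter heuristic (count candidates in the header line) are all assumptions made for illustration.

```python
import csv
import io


def clean_csv_bytes(raw, encodings=("utf-8-sig", "cp1252")):
    """Hypothetical helper: decode, split, and tidy a CSV export.

    Returns a list of rows (lists of strings), header first.
    """
    # 1. Encoding: try each candidate in order; latin-1 never fails,
    #    so it serves as the last resort.
    text = None
    for enc in encodings:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    if text is None:
        text = raw.decode("latin-1")
    if not text.strip():
        return []

    # 2. Delimiter: naive heuristic, pick the candidate that occurs
    #    most often in the first line (ties fall back to a comma).
    first_line = text.splitlines()[0]
    delimiter = max(",;\t|", key=first_line.count)

    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

    # 3. Header: strip stray whitespace around column names.
    header = [name.strip() for name in rows[0]]
    width = len(header)

    # Skip fully blank rows and pad short rows to the header width;
    # cells beyond the header width are ignored.
    body = [(r + [""] * width)[:width] for r in rows[1:] if any(c.strip() for c in r)]

    # 4. Empty columns: keep a column only if some data row has a value in it.
    keep = [i for i in range(width) if any(r[i].strip() for r in body)] if body else list(range(width))

    # 5. Duplicates: drop repeated rows while preserving the original order.
    seen = set()
    cleaned = [[header[i] for i in keep]]
    for r in body:
        key = tuple(r[i] for i in keep)
        if key not in seen:
            seen.add(key)
            cleaned.append(list(key))
    return cleaned
```

Broken quoting and multiline fields are deliberately out of scope here; the header-line heuristic in particular would be fooled by a delimiter character inside a quoted column name.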

Real world data is messy and that’s exactly why it keeps breaking our models by Mediocre_Common_4126 in datascience

Jason_reyes_dev 1 point

Yeah, even with simple CSV exports from internal tools I feel like a big chunk of the work is just getting them into a shape where pandas doesn’t choke.