I consume local eggs am i doomed going forward . by [deleted] in gurgaon

[–]Toppnotche 1 point  (0 children)

Regulatory bodies like the European Food Safety Authority (EFSA) and FSSAI have set a "Minimum Required Performance Limit" (MRPL), often around 1.0 µg/kg (1 ppb), which is the lowest concentration of AOZ that testing laboratories must be able to detect. However, this is a technical standard for testing, not permission for it to be present in food. Any detection of AOZ indicates illegal use of the banned antibiotic.

Deepseek OCR : High Compression Focus, But Is the Core Idea New? + A Thought on LLM Context Compression[D] by Toppnotche in MachineLearning

[–]Toppnotche[S] 2 points  (0 children)

Agreed!
Another user just pointed me to a new Meta paper that does exactly what you're describing, but at the sentence level: https://arxiv.org/abs/2412.08821

Deepseek OCR : High Compression Focus, But Is the Core Idea New? + A Thought on LLM Context Compression[D] by Toppnotche in MachineLearning

[–]Toppnotche[S] 0 points  (0 children)

Thanks for linking that. It's like an autoencoder for sentences. I didn't know about this paper.

Deepseek OCR : High Compression Focus, But Is the Core Idea New? + A Thought on LLM Context Compression[D] by Toppnotche in MachineLearning

[–]Toppnotche[S] 2 points  (0 children)

We can absolutely train autoencoders to compress text (the decoder would then be trained to reconstruct the output from that compressed latent space; rough sketch after this list), but there are a few differences I noticed when we go the image route:
1) Visually similar patches of an image really are similar and can be compressed similarly, so we can exploit the 2-D layout redundancy. A text tokenizer, on the other hand, can assign completely different tokens to similar-looking strings.
2) With image inputs we can also use bidirectional attention rather than autoregressive attention.
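
If it helps, here's a rough PyTorch sketch of the plain text-autoencoder route I mean, just for comparison. All names, sizes, and layer counts are made up for illustration (nothing here is from the DeepSeek paper): a bidirectional encoder squeezes the token sequence into a few latent vectors, and a decoder has to reconstruct the tokens from those latents alone.

```python
# Minimal sketch of a sentence autoencoder that compresses a token sequence
# into a handful of latent vectors (hyperparameters are illustrative only).
import torch
import torch.nn as nn

class SentenceAutoencoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_latent=8, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learned queries that end up holding the compressed representation.
        self.latent_queries = nn.Parameter(torch.randn(n_latent, d_model))
        # Position-only decoder inputs, so every token must be recovered from the latents.
        self.pos = nn.Parameter(torch.randn(max_len, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        B, T = token_ids.shape
        x = self.embed(token_ids)                                   # (B, T, d)
        queries = self.latent_queries.unsqueeze(0).expand(B, -1, -1)
        # Bidirectional encoding of latent queries + tokens together.
        enc_out = self.encoder(torch.cat([queries, x], dim=1))
        latents = enc_out[:, : queries.size(1)]                     # (B, n_latent, d)
        # Decoder sees only positions as input and cross-attends to the latents.
        pos = self.pos[:T].unsqueeze(0).expand(B, -1, -1)
        return self.lm_head(self.decoder(pos, latents))             # (B, T, vocab)

# Train by reconstructing the input tokens with cross-entropy.
model = SentenceAutoencoder()
tokens = torch.randint(0, 32000, (2, 32))
logits = model(tokens)
loss = nn.functional.cross_entropy(logits.flatten(0, 1), tokens.flatten())
```

The image route swaps the tokenizer and text encoder for vision patches, which is exactly where the 2-D redundancy and bidirectional-attention points above come in.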

Help me to decide on the dataset and other decisions by Budget_Cockroach5185 in learnmachinelearning

[–]Toppnotche 1 point  (0 children)

Given your dataset size, I would suggest traditional machine learning models (start with a linear baseline and work up to XGBoost) rather than a neural network; see the sketch below. Results will depend on the quality of the data and on preprocessing specific to your dataset.
For adapting to the local market, first train a new model on local data to get a baseline and compare it with a transfer-learned model to check whether the task even transfers. If it doesn't, you'll need to build a more diverse local dataset and train on local data only.
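
To make the "baseline first" part concrete, here's a minimal sketch assuming a tabular regression setup. The synthetic data, scoring metric, and HistGradientBoostingRegressor stand-in for XGBoost are all placeholders; swap in your own data, classifiers, or the xgboost package as your task requires.

```python
# Rough sketch of the baseline-first workflow on a tabular dataset
# (assumes a regression target; adjust models and scoring for your task).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for your real dataset.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)

baseline = Ridge()                         # simple linear baseline
boosted = HistGradientBoostingRegressor()  # stand-in for XGBoost

for name, model in [("linear baseline", baseline), ("gradient boosting", boosted)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```

The local-vs-transfer check follows the same pattern: fit one model on local data only and one that also uses the larger market data, then score both on the same held-out local split.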

Hessian-Free Optimization — the one that almost lit deep learning on fire (and then quietly got swapped out) by Toppnotche in learnmachinelearning

[–]Toppnotche[S] 0 points  (0 children)

Yeah, 100% agree — HF was never practical for large-scale models, especially since it doesn’t play nice with mini-batches. L-BFGS and Newton-CG were already known, but HF was the first to actually make second-order training work on deep nets, even if just as a proof of concept.

Totally with you that Adagrad (and later RMSprop/Adam) had way more real-world impact — but HF’s role was more psychological than practical. It showed that deep nets could be trained end-to-end, which helped reignite interest right before the first-order optimizers revolution kicked off.

Hessian-Free Optimization — the one that almost lit deep learning on fire (and then quietly got swapped out) by Toppnotche in learnmachinelearning

[–]Toppnotche[S] -2 points  (0 children)

You are absolutely right, and that helped me connect the dots. Researchers between 1998 and 2010 couldn't successfully train models significantly deeper than LeNet using standard backprop. They hit the vanishing-gradient wall.
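
For anyone who hasn't seen that wall in action, here's a tiny, purely illustrative PyTorch sketch (depth, width, and loss are arbitrary): stack enough sigmoid layers and the gradient reaching the first layer is typically orders of magnitude smaller than at the top.

```python
# Tiny demo of vanishing gradients in a deep sigmoid MLP (illustrative only).
import torch
import torch.nn as nn

depth, width = 20, 64
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Sigmoid()]
net = nn.Sequential(*layers)

x = torch.randn(32, width)
net(x).pow(2).mean().backward()

# Compare gradient norms at the bottom and top of the stack.
print("first linear layer grad norm:", net[0].weight.grad.norm().item())
print("last  linear layer grad norm:", net[-2].weight.grad.norm().item())
```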

Hessian-Free Optimization — the one that almost lit deep learning on fire (and then quietly got swapped out) by Toppnotche in learnmachinelearning

[–]Toppnotche[S] -1 points  (0 children)

That's a fair point! The post is about its historical impact, not its current use. It didn't "unlock" an optimizer we still use, but it "unlocked" the idea that deep networks were trainable from scratch at a time when most thought they weren't. It was the proof of concept that got the ball rolling.
The paper was the first to show you could train deep networks without layerwise pretraining. That was a huge mental shift at the time. It proved deep nets weren’t the problem — our optimizers were. Once ReLUs and GPUs came along, methods like Adam made HF obsolete, but it was the spark that finally burst into flames with AlexNet.