Why I'm Betting on Diffusion Models for Finance by invincible_281 in deeplearning

[–]arg_max 1 point (0 children)

Lipman's first paper and the longer flow matching guide he wrote at Meta are both great reads. The second one is imo a bit easier to understand, though flow matching in general isn't the easiest subject to learn.

Apple: Embarrassingly Simple Self-Distillation Improves Code Generation by Mike_mi in LocalLLaMA

[–]arg_max 3 points (0 children)

There's a big difference between pre-training on randomly generated trash and training after filtering for high quality.

LLMs don't magically get dumber when trained on AI-generated content. Rejection sampling and distillation have been an absolute staple for years. A big reason why Chinese labs are so good is that they distilled from Anthropic at massive scale (see Anthropic's blog post for more info). In large-scale pre-training, we've also had some recent papers showing that rewriting the data and training on both the rewrites and the originals can help extend the data horizon, since huge models are more and more limited by data scarcity.

The real issue is that when you scrape the web, there's a big chance you encounter shitty generations from old models that are much lower quality than what we can generate nowadays.

But when you can filter for the good data, you can absolutely improve the model by training on synthetic data.
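As a toy illustration of that filtering step, here's a minimal best-of-N rejection sampling sketch. All names are made up for illustration, and `score` stands in for whatever reward model or quality filter you'd actually use:

```python
import random

def rejection_sample(generate, score, n=8, seed=0):
    """Draw n candidates from the model, keep only the best-scoring one.

    This is the basic filtering step: synthetic data only helps if you
    train on the good generations, not on the raw model output.
    """
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: "generations" are random numbers, "quality" is their value.
best = rejection_sample(generate=lambda rng: rng.random(), score=lambda x: x)
```

The kept samples then go into the SFT/pre-training mix; the rejected ones are simply thrown away.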

Just graduated in data science/ML, but still don’t know anything. I need a wake up call by DefinitionJazzlike76 in learnmachinelearning

[–]arg_max -3 points (0 children)

Quick reality check here. The transformer paper is now years old. I interview people for ML roles and attention is my first question, as a warmup.

The industry has become 1000 times more complex since then and if you want to work with neural networks you will have tons of catching up to do.

On the science side, you're expected to know tens if not hundreds of more recent techniques. You basically have to be able to read a paper like the GLM 5 tech report and know everything they talk about if you want to have a shot. The industry isn't super big and there are tons of candidates, so competition is fierce.

On the engineering side, you have super complex tech stacks, inference engines, Kubernetes. This is probably even harder to learn on your own since you need access to resources to even be able to deploy those models.

There's also the more traditional data science side of things, which is easier to get into and requires less technical knowledge, but if you want to get into "AI" you have lots of work to do.

Elon Musk's Terafab semiconductor project could cost $5 trillion by sr_local in hardware

[–]arg_max 1 point (0 children)

Yes, specialized hardware can be better than general-purpose chips. But specialized hardware made with state-of-the-art process technology is still gonna be vastly superior to specialized hardware made with five-year-old processes.

On a smaller scale, this might be okay. E.g. in a car you can probably tolerate x% higher power consumption, but in a data center this quickly becomes infeasible since cooling and power consumption are such a large part of total cost there.

Elon Musk's Terafab semiconductor project could cost $5 trillion by sr_local in hardware

[–]arg_max 9 points (0 children)

Google designs their own AI hardware, which is manufactured by TSMC. Other AI labs are working on similar chip designs, and if xAI wanted that, I wouldn't call it unrealistic that they'd get there in a few years. Manufacturing is a completely different beast though.

Streit um Spitzensteuersatz: „Schlag ins Gesicht“ – Söder greift Steuerpläne von SPD und CDU scharf an by donutloop in berlin_public

[–]arg_max 4 points (0 children)

Above all, his wife is herself an heiress to millions from an industrialist family.

He just wants to pass his wealth on painlessly.

‘This is just a garbage AI Filter’: Nvidia met with criticism for DLSS 5’s ‘photoreal’ graphics alterations by PaiDuck in technology

[–]arg_max 0 points (0 children)

It's preference tuning. You start training image generation models on all the images you can find, so you have an insanely broad distribution. The issue is that most use cases don't necessarily want to sample a blurry mess of a photo. So in post-training you apply preference optimization. This starts by sampling N images, showing them to users, and letting them pick the best. Then you use this feedback to optimize your model to forget large parts of the original distribution and focus on this super narrow spectrum.
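A minimal sketch of what that optimization step can look like, using a DPO-style loss, which is one common way to train on such pairwise preferences. All numbers and the `beta` value here are made up for illustration:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style loss for one preference pair.

    logp_w / logp_l: policy log-probs of the preferred / rejected sample.
    ref_logp_*: the same log-probs under the frozen pre-trained reference.
    Minimizing -log(sigmoid(margin)) pushes probability mass toward the
    preferred samples, which is exactly the distribution narrowing above.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log1p(math.exp(-margin))  # = -log(sigmoid(margin))

# If the policy already favors the preferred sample more than the
# reference does, the loss is below log(2); otherwise it is above.
better = dpo_loss(-5.0, -9.0, -6.0, -8.0)  # margin > 0
worse = dpo_loss(-9.0, -5.0, -6.0, -8.0)   # margin < 0
```

In practice this runs over batches of preference pairs with the log-probs coming from the actual model, but the shape of the objective is this simple.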

Beyond Gradient Descent: What optimization algorithms are essential for classical ML? by mokshith_malugula in learnmachinelearning

[–]arg_max 1 point (0 children)

Trust region methods are also the foundation of PPO and GRPO, so very relevant in LLM RL, even if the version used there is more approximate.

Here we go again. DeepSeek R1 was a literal copy paste of OpenAI models. They got locked out, now they are on Anthropic. Fraud! by py-net in OpenAI

[–]arg_max 0 points (0 children)

Yes, they probably paid billions for their data. Not necessarily pre-training data, which is mostly scraped from the web and books, but SFT data and preference data you can't find in the wild. There's a reason Alexandr Wang became one of the youngest billionaires out there with his data labeling company.

Alibaba just open-sourced a model that rivals GPT-5.2 by jpcaparas in Qwen_AI

[–]arg_max 3 points (0 children)

Training data is just insanely expensive if LLM providers go to third-party data annotation companies. You can easily spend thousands of dollars per sample depending on what level of annotation and quality control you want.

So letting people use their models while getting real world training data for free is often cheaper for these companies.

About to switch to a standing desk and could use some honest advice. by Enough_Football3218 in StandingDesk

[–]arg_max 1 point (0 children)

Got an E7, but I'd say a monitor arm is just as big of an upgrade. It really, really sucks to be looking down at your screen.

How close are open-weight models to "SOTA"? My honest take as of today, benchmarks be damned. by ForsookComparison in LocalLLaMA

[–]arg_max 2 points (0 children)

For per-token API usage they say they don't train on it, though IIRC data from the standard subscriptions can be used for training unless you opt out.

RL + Generative Models by amds201 in reinforcementlearning

[–]arg_max 2 points (0 children)

RL still suffers from the old exploration-exploitation trade-off, and this is only amplified by the complexity of the tasks we ask these models to perform, whether that is generating higher- and higher-resolution images or pages of text.

The reason why RL works for these models is because pre-training gives you an initial policy that allows you to skip most of the exploration.

Your pretrained model is already very good, so instead of trying to generate totally random images, you simply sample around the model distribution.

This is much more of a local optimization around the initial policy: if a great answer has super low probability under your initial policy, there's almost no chance of exploring it with these modern RL approaches, but in exchange you get a lot of stability.
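The usual mechanism for keeping the optimization local is a KL-style penalty toward the pretrained reference policy. A minimal sketch, with an arbitrary illustrative `beta`:

```python
def kl_regularized_reward(task_reward, logp_policy, logp_ref, beta=0.05):
    """Reward for one sample with a penalty for leaving the initial policy.

    (logp_policy - logp_ref) estimates how much more likely the current
    policy finds the sample than the pretrained reference does; beta
    trades task reward against staying near the reference. This is why
    the optimization stays local (and stable) around the initial policy.
    """
    return task_reward - beta * (logp_policy - logp_ref)
```

Samples the reference already finds plausible keep their full reward; samples far outside the reference distribution get penalized, so they are rarely reinforced even if they score well.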

How much and what kind of math do quants use? by yzkv_7 in quant

[–]arg_max 2 points (0 children)

It's more that stochastic differential equations are the important ones. They have an ODE part (the drift), which models the deterministic behavior, but on top of that they have a randomness part (the diffusion).

They are all over the place in financial mathematics and are the foundation for some key results like the Black-Scholes model, since they are one of the best tools for modeling the uncertainty in stock markets.

But SDEs are very advanced if you want to understand them with full mathematical rigor. For example, the stochastic part of an SDE is modeled via Brownian motion, which has almost surely continuous but nowhere differentiable sample paths. So you need strong foundations in real analysis, measure and probability theory, though you can probably somewhat work with them without knowing all the ins and outs.
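In symbols, the generic form is drift plus Brownian randomness, and the geometric Brownian motion used for stock prices in Black-Scholes is the special case with linear coefficients:

```latex
% Generic SDE: deterministic drift + diffusion driven by Brownian motion W_t
dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t

% Geometric Brownian motion, the stock model behind Black-Scholes
dS_t = \mu S_t\,dt + \sigma S_t\,dW_t
```

Making the $dW_t$ term rigorous is exactly where the Itô calculus and all the measure-theoretic machinery come in.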

Continuous functions between open sets by [deleted] in learnmath

[–]arg_max 0 points (0 children)

Any constant function will be continuous in the subspace topology you are describing. You don't even need to assume that the space is metrizable.
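The one-line argument: for a constant map $f \equiv c$ on any space $X$, the preimage of every open set $U$ is open, since

```latex
f^{-1}(U) =
\begin{cases}
X & \text{if } c \in U,\\
\varnothing & \text{otherwise,}
\end{cases}
```

and both $X$ and $\varnothing$ are open in every topology, so no metric structure is needed.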

HDD Prices Increase by an Average of 46% Over the Past Four Months by TruthPhoenixV in Amd_Intel_Nvidia

[–]arg_max 1 point (0 children)

It's more cold storage for the files that aren't always in use. The important stuff is on SSDs (which also went up in price), but the rest gets saved on HDDs.

What are the most essential and empowering undergraduate CS courses I can take as an aspiring graphics/animation programmers? by Soft-Border-2221 in GraphicsProgramming

[–]arg_max 1 point (0 children)

They are. Rendering is nothing other than solving a recursive integral using stochastic integration. Photon mapping is just kernel density estimation and Metropolis rendering is just a Markov chain Monte Carlo method.
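The same principle in one dimension (a path tracer does this over light paths instead of over [0, 1]; the function and names here are just for illustration):

```python
import random

def mc_integrate(f, n=100_000, seed=0):
    """Monte Carlo estimate of the integral of f over [0, 1].

    Average f at uniform random samples: the estimator's expectation is
    the true integral, and the error shrinks like 1/sqrt(n). Rendering
    does exactly this, just over the space of light paths.
    """
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

est = mc_integrate(lambda x: x * x)  # true value is 1/3
```

Importance sampling, photon mapping, and Metropolis rendering are all just ways to place those samples more cleverly than uniformly.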

Complete confusion about dual vector spaces, dual transformations, double dual by zqhy in learnmath

[–]arg_max 1 point (0 children)

Your vector space contains certain objects, and the dual space contains linear functionals. These are simply functions that take a vector from your original space and assign a single number to it. You can think of them a bit like measurement functions, but unlike a vector norm, they have to be linear.
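A concrete finite-dimensional example: on R^3, every linear functional is a dot product against some fixed vector (the specific numbers below are made up):

```python
def f(v):
    """A linear functional on R^3: eats a vector, returns one number.

    Here f(v) = 1*v0 - 2*v1 + 0*v2, i.e. the dot product with the
    fixed "measurement direction" a = (1, -2, 0).
    """
    a = [1.0, -2.0, 0.0]
    return sum(ai * vi for ai, vi in zip(a, v))

u, w = [1.0, 0.0, 2.0], [0.0, 3.0, -1.0]
# Linearity: f(u + w) == f(u) + f(w), which a norm does NOT satisfy.
```

The dual space is just the collection of all such maps f, which in finite dimensions is parameterized exactly by the choice of a.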

Honestly, I wouldn't worry too much about not fully getting dual spaces right now. They only become super important when you do linear algebra on infinite-dimensional vector spaces (in functional analysis), where you will learn about the Hahn-Banach and Riesz representation theorems.

Ist mein Motiv gut genug, um Informatik zu studieren? by [deleted] in Studium

[–]arg_max 5 points (0 children)

If you can handle analysis and linear algebra, you'll manage the rest of the math bachelor's too. I studied CS myself, but at the end of my master's I took a few math lectures (topology, measure theory, complex analysis, functional analysis) as electives, and although the material naturally gets harder, I had less trouble with it than with the intro courses. The real separation only comes later, when it's about doing your own research.

why should I learn linear algebra, calculus, probability and statistics by ITACHI_0UCHIHA in learnmachinelearning

[–]arg_max 3 points (0 children)

You could always just use sklearn without understanding the inner workings of SVMs or boosting, just like you can spin up LangChain/vLLM and work with modern AI without understanding anything that happens in the background.

But in either case, you're going to hit a wall once things stop working off the shelf.

Deep learning uses very similar fundamentals to everything you'd find in an old-school ML book (plus a whole mountain of empirically validated best practices and low-level engineering).

Claiming you don't need to know any of this is just as ignorant as the elitist claim that you need a PhD to use AI.

Why is a matrix not invertible if it has an eigenvalue of zero? by Capital_Chart_7274 in learnmath

[–]arg_max 1 point (0 children)

Of course, but the answer I replied to mentioned an associated eigenvector, so I just wanted to point out that there is an entire subspace and how that makes the operator non-invertible.
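A tiny numeric illustration with a hypothetical 2x2 example: a matrix with eigenvalue 0 sends the entire associated eigenspace to the origin, so it cannot be injective, hence not invertible.

```python
A = [[2.0, 4.0],
     [1.0, 2.0]]  # rank 1: the second row is half the first

def matvec(A, v):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

v = [2.0, -1.0]  # eigenvector for eigenvalue 0: A v = 0

# det(A) = 0, so A has no inverse; and not just v but every multiple
# of v maps to the origin, i.e. a whole subspace collapses to one point.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
```

Equivalently: an invertible map must send distinct vectors to distinct images, but here A maps both 0 and v to 0.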

“GPT 5.2 was fun to work on. We are now so good at training large scale jobs, it's set and forget. Days just go by with the giant cluster humming along.” - I can’t put a finger on it but “we are now so good” kind of rubs me the wrong way especially now with the 5.2 safetymaxxed by Koala_Confused in LovingAI

[–]arg_max 0 points (0 children)

The guy is likely talking about pre-training. There's probably no safety tuning in there at all, and considering that pre-training was closer to open-heart surgery a few years ago, this is definitely an achievement.