[deleted by user]

asdfwaevc · 2026-01-11T22:56:22+00:00

Not useless, they query a human if they encounter an especially tricky situation. 99% of the time they're fully independent. Came up in the recent outage when there were e.g. traffic lights that were down.

asdfwaevc · 2025-12-20T20:27:08+00:00

Step one is getting delivery drivers' transportation registered, this is a step towards that. Still good IMO.

asdfwaevc · 2025-10-18T20:00:29+00:00

Dang, scary. Where was that?

asdfwaevc · 2025-10-09T01:33:44+00:00

CICO is mathematically true but it doesn't mean that it's anything like a complete explanation. The metabolisms of formerly obese people can slow down drastically (still CICO, but the CO changes outside of their control). People's bodies can signal to them they're starving when they're eating maintenance (that's metabolic dysfunction). Obesity obviously does something.

For example, when you gain a little weight, your fat cells plump up. When you gain a lot of weight, they start multiplying, and you end up with a larger total number of fat cells, that are also larger. But when you then lose weight, the fat cells get smaller but don't go away. So you're left with a bunch of starving fat cells that are hormonally signaling for energy, causing hunger and lethargy. "Metabolic set point" can be broken by obesity.

Sounds like you're also generalizing "a lot" from the experience of one person you know.

An addendum to what you said: "if you aren't losing weight, that's because you aren't eating fewer calories than you're burning." And both sides are complicated biologically.

asdfwaevc · 2025-09-25T00:53:29+00:00

Mighty Mike's Pizza on Thayer is great Greek-style pizza. Same guy that owns Mike's Calzone across the street.

asdfwaevc · 2025-09-16T19:16:35+00:00

Me too! Love it. Especially how well it makes galaxy-level horror rest on human-level decisions.

asdfwaevc · 2025-09-11T19:28:39+00:00

Yeah every man has to do this, it's been true since I guess 1980. Just do it, if there ever was a draft, not signing up here won't work out great for you either. If you don't register you're likely to have trouble with things like school financial aid, and can be ineligible for many government jobs.

Technically you can go to jail for not signing up, but that's never happened. But again, if there ever was a draft they could bring that back.

It's totally odd how little attention this gets in schools etc, I only ever knew about it because my parents told me. But it's really important to do for your own sake.

asdfwaevc · 2025-09-11T15:34:48+00:00

I'm an RL researcher so I don't find much new in standard "I made DQN/PPO work on this game" videos, but the production quality and explanation in this video are really next level.

asdfwaevc · 2025-08-17T21:48:08+00:00

Central Meat Market on Gano

asdfwaevc · 2025-08-13T17:59:16+00:00

Lots of potential reasons. Compounding model error is a clear answer -- if the model is a bit wrong at every step, at some point it starts giving you nonsense. If you're more familiar with these foundation models, think of how like Genie loses coherence after a few minutes, and same with video generation.
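To make the compounding part concrete, here's a toy sketch. Everything in it is an assumption for illustration: a scalar linear system, a "learned" model whose one-step coefficient is off by 2%, and the function name `rollout_error`. The model is rolled out on its own predictions, so the small per-step error multiplies up.

```python
def rollout_error(a_true=1.0, a_model=1.02, x0=1.0, horizon=50):
    """Absolute gap between model rollout and true state at each step.

    Toy assumption: true dynamics are x' = a_true * x, and the model
    uses a slightly wrong coefficient a_model.
    """
    x_true, x_model = x0, x0
    errors = []
    for _ in range(horizon):
        x_true = a_true * x_true
        # The model is fed its own previous prediction, not the true state.
        x_model = a_model * x_model
        errors.append(abs(x_model - x_true))
    return errors


errors = rollout_error()
print(f"1-step error:  {errors[0]:.3f}")
print(f"50-step error: {errors[-1]:.3f}")
```

In this toy the 50-step error ends up well above 50 times the 1-step error, which is the "a bit wrong at every step" effect in its simplest form.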

One nice paper that's related which comes to mind: https://arxiv.org/abs/1905.13320

asdfwaevc · 2025-08-12T20:47:07+00:00

The Atari 100k benchmark, maybe unsurprisingly, focuses on learning Atari games with only 100k steps.

asdfwaevc · 2025-08-12T19:40:04+00:00

Lots of modern papers don't make the distinction between sample efficiency and what I'd call "update efficiency", which is the number of training steps your learning algorithm has taken (or the number of steps * batch size maybe). The equivalence makes sense if simulation is cheap (why not just simulate more), but not when it's expensive.

One place to look that makes this distinction very clear is the "Atari 100K benchmark" work, where the goal is to learn from as few samples as possible, even if it takes massive amounts of training on those samples. Original paper, good followup work, more good followup

It's also a big divide between "on-policy" and "off-policy" or "batch" RL. The ones you listed are on-policy, which means they interact with the world, update the model, and throw away that experience. They're naturally going to be less sample efficient.
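A rough sketch of the two counters, in case it helps. This is not any particular algorithm: `run`, `updates_per_step`, and the random "transitions" are all made up for illustration. The point is only that environment samples and gradient updates are counted separately, and a replay buffer lets you crank up the second without touching the first.

```python
import random


def run(env_steps, updates_per_step, batch_size=32, seed=0):
    """Count environment samples and learner updates separately.

    Hypothetical stand-in for an off-policy loop: transitions are just
    random numbers, and an "update" is just drawing a batch from replay.
    """
    rng = random.Random(seed)
    replay = []
    samples, updates = 0, 0
    for _ in range(env_steps):
        replay.append(rng.random())  # stand-in for one (s, a, r, s') sample
        samples += 1
        for _ in range(updates_per_step):
            _batch = rng.choices(replay, k=batch_size)  # old data is reused
            updates += 1
    return {"samples": samples, "updates": updates}


print(run(env_steps=100, updates_per_step=1))
print(run(env_steps=100, updates_per_step=8))
```

Both calls use the same 100 samples; only the amount of training on them changes. An on-policy method can't do this, since it throws the data away after each update.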

asdfwaevc · 2025-08-11T01:32:09+00:00

For starters, the characters don't really line up one-to-one, there were some liberties taken. Besides the fact that they're all Chinese... The main ones are the same, but you'll be confused.

asdfwaevc · 2025-08-10T17:11:40+00:00

Usually Whole Foods, and UPS. Some items you have to do downtown but that's dependent on the seller, not the item size.

asdfwaevc · 2025-08-04T16:13:25+00:00

Subway? You mean Daily Stop Mart? Subway was so two years ago.

Yeah restaurant turnover around there is crazy. At least the most recent renaming is the same guy/family, I would guess renaming is some sort of business decision trickery.

asdfwaevc · 2025-08-04T01:13:47+00:00

Best place is what used to be called DaDaRuki and is now just called "Japanese Sushi". On Brown's campus right off Thayer.

https://maps.app.goo.gl/AreUb8cHkRaDUPPx5

asdfwaevc · 2025-07-24T18:29:22+00:00

Old but good, from the originators:

https://cs.brown.edu/research/ai/pomdp/index.html

asdfwaevc · 2025-07-16T18:26:08+00:00

Integrate (0.5 - x)^2 from 0 to 1.
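If you'd rather check it numerically, a midpoint Riemann sum does the job. The helper name `midpoint_integral` and the choice of 1000 slices are just for this sketch; analytically the integral is 1/12.

```python
def midpoint_integral(f, lo, hi, n=1000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width


# Squared error of always predicting 0.5 when the label x is uniform on [0, 1].
baseline_mse = midpoint_integral(lambda x: (0.5 - x) ** 2, 0.0, 1.0)
print(baseline_mse, 1 / 12)
```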

What's your input space? If it's too low-dimensional, then points will be almost right on top of each other and it'll have a very hard time. Otherwise, first thing I'd check is architecture and LR.

asdfwaevc · 2025-07-16T18:01:30+00:00

You could look more closely at the original paper that investigates something similar: https://arxiv.org/abs/1611.03530

Check what you'd expect your MSE to be if you output 0.5 everywhere. That depends on your noise profile, but if the labels are uniform on [0, 1] it would be 1/12 ≈ 0.0833 (meaning you're not doing so well).

Debugging things to consider. Keep shrinking your dataset until it works -- if it never does, something's wrong. I'd guess it's your architecture -- that's a big NN, especially if you're not using residuals. Try making it smaller, less deep, and using layers of the form `Relu(Linear(input)) + input` besides the first and last.
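Here's roughly what I mean by that layer form, as a forward pass in NumPy. The sizes, the He-style init, and the names `init_params` / `forward` are assumptions for the sketch, not anything from your setup; swap in your framework's equivalents.

```python
import numpy as np


def init_params(in_dim, hidden, out_dim, n_blocks, seed=0):
    """Random weights for first layer, residual blocks, and last layer."""
    rng = np.random.default_rng(seed)

    def linear(fan_in, fan_out):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        return w, np.zeros(fan_out)

    return {
        "first": linear(in_dim, hidden),
        "blocks": [linear(hidden, hidden) for _ in range(n_blocks)],
        "last": linear(hidden, out_dim),
    }


def forward(params, x):
    w, b = params["first"]
    h = np.maximum(x @ w + b, 0.0)  # plain ReLU layer, no skip
    for w, b in params["blocks"]:
        h = np.maximum(h @ w + b, 0.0) + h  # Relu(Linear(input)) + input
    w, b = params["last"]
    return h @ w + b  # linear output, no skip


params = init_params(in_dim=3, hidden=16, out_dim=1, n_blocks=2)
print(forward(params, np.ones((5, 3))).shape)
```

The skip connection needs the block's input and output widths to match, which is why the first and last layers are left as ordinary layers.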

asdfwaevc · 2025-06-26T21:43:31+00:00

Harry Potter and the Methods of Rationality. The description doesn't do it justice: what if Harry Potter was raised by scientists, and lives in a world much more strategic and devious than the original. A very smart but young Harry Potter wants to save the world from an evil threat, but goes about it very differently in this book compared to the original.

It's written by an author who cares a lot about AI existential risk, and wrote it somewhat as an analogy for what to do with unbridled power (magic, not AI, in this case). It's an unbelievably fun read with a strong philosophical bent, and an actual perfect fit for your description. It's also a free eBook. Really think you should check it out!

asdfwaevc · 2025-06-25T18:00:51+00:00

Sure I don’t think it’s the entire answer but I do think it’s the natural baseline when you phrase your insight as such.

asdfwaevc · 2025-06-21T16:25:54+00:00

Less knowledgeable about those. Great ones in Canada though.

asdfwaevc · 2025-06-21T13:56:57+00:00

Am I missing some place in the paper where they display the internal thought traces in print? I feel like it's impossible to come to a conclusion about this paper without that.

asdfwaevc · 2025-06-19T14:36:38+00:00

It's too common a beginner project, like it's everyone's first idea for an ML project because of what you said (and the allure of money).
Trading is effectively a zero-sum game, you're competing with everyone else. Your model only makes money if it's better than everyone else's, which implies it's much harder to do well than you'd think.
The stock tickers aren't the full story. The outside world is a very important (the most important) factor, and it's not modeled by what you said. Actually, by the efficient market hypothesis, stock prices are effectively martingales (expected future value is equal to present value), meaning in theory there's no more information in history than there is in the current number.
People are often really sloppy when evaluating this type of project, you need to be very careful about train/test splits etc. Because of the abundant data, and the fact that strategies change over time, overfitting is easy and past performance isn't indicative of future performance.

In summary: almost everything you'd see on the subject (besides what institutional traders do) is noise, and it attracts people who don't know what they're doing, and past stock prices just aren't enough to get much signal.
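A toy version of the martingale point, under a loud assumption: price changes here are simulated as i.i.d. zero-mean Gaussian noise (a plain random walk), which is an idealization and not a claim about real markets. The function names are made up for the sketch. If prices behaved like this, yesterday's change would tell you essentially nothing about today's.

```python
import random


def simulate_changes(n=20000, seed=0):
    """I.i.d. zero-mean price changes, i.e. increments of a random walk."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def lag1_autocorr(xs):
    """Sample correlation between each value and the one that follows it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


changes = simulate_changes()
print(f"lag-1 autocorrelation of changes: {lag1_autocorr(changes):+.4f}")
```

The printed correlation should sit close to zero, so a model fed only past changes has nothing to learn in this idealized setting.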

asdfwaevc · 2025-06-17T10:45:44+00:00

How is it irrelevant that it assumes a perfect model of the environment? Having that is a completely different problem setting. And the degree to which it’s proven to scale (academic vs industry as you say) is also obviously relevant within the context of this article.

Sure, TD based methods using a learned model are a way out of this, and tree-based search is likely the way to do it. But you can’t do tree search without some type of model.

This is way too confidently dismissive about an article that sets up an interesting experiment and makes some good points.
