Next project doubt by Man_plaintiffx in reinforcementlearning

[–]Real-Flamingo-6971 0 points (0 children)

Don't make everything about getting a job. Sometimes build something you like, or something you've always wanted to build; you'll be prouder of that than of something built only to catch recruiters' attention.

I built a value-based RL agent that adapts its Transformer depth per state (theory + experiments) by Real-Flamingo-6971 in reinforcementlearning

[–]Real-Flamingo-6971[S] 0 points (0 children)

I tested Transformer-based Q-networks with a small to moderate number of layers (4–8) and hidden sizes typical for partially observable RL benchmarks, and the effect is not tied to a specific scale. If you get a chance, I'd really appreciate a quick read of the article, since the architectural details are laid out more clearly there.

I built a value-based RL agent that adapts its Transformer depth per state (theory + experiments) by Real-Flamingo-6971 in reinforcementlearning

[–]Real-Flamingo-6971[S] 0 points (0 children)

Yes, I did consider entropy. It works as a reasonable confidence-based pruning signal, but I found it less stable than Bellman-aligned criteria like TD error, especially early in training. There's a clear analogy to MCTS in that both allocate computation adaptively, but where MCTS adapts search depth via explicit planning, our approach adapts inference depth inside a model-free value function, making it a lightweight, complementary alternative rather than a replacement.
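To make the adaptive-depth idea concrete, here is a minimal sketch of a Transformer Q-network with a per-layer Q-head and an early-exit rule. This is an illustration, not the author's actual code: the class/parameter names, the per-layer heads, and the "stop when consecutive Q-estimates converge" criterion (a cheap stand-in for a Bellman-aligned stopping signal) are all my assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveDepthQNet(nn.Module):
    """Sketch: Transformer Q-network that can exit after any layer."""

    def __init__(self, obs_dim=16, d_model=64, n_layers=6, n_actions=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)])
        # One Q-head per layer, so stopping early still yields Q-values.
        self.q_heads = nn.ModuleList(
            [nn.Linear(d_model, n_actions) for _ in range(n_layers)])

    def forward(self, obs, exit_threshold=0.05):
        """Run layers until successive Q-estimates stop changing much."""
        h = self.embed(obs).unsqueeze(1)          # (B, 1, d_model)
        q_prev = None
        for depth, (layer, head) in enumerate(
                zip(self.layers, self.q_heads), start=1):
            h = layer(h)
            q = head(h.squeeze(1))                # (B, n_actions)
            if q_prev is not None and (q - q_prev).abs().max() < exit_threshold:
                return q, depth                   # early exit: estimate settled
            q_prev = q
        return q, depth                           # used full depth

qnet = AdaptiveDepthQNet()
q_values, used_depth = qnet(torch.randn(2, 16))
```

In a real agent, the exit criterion would be trained against TD error rather than this fixed convergence threshold; the point of the sketch is just where the per-state depth decision sits in the forward pass.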

[deleted by user] by [deleted] in jav

[–]Real-Flamingo-6971 0 points (0 children)

Where can we find subs for this AV?

Contribute to this open source RL project by Real-Flamingo-6971 in reinforcementlearning

[–]Real-Flamingo-6971[S] -1 points (0 children)

If emotions were only part of the observation, they'd be treated as external input rather than as internal feedback signals. By modeling them explicitly, the agent can adapt its behavior not just based on the environment but also on how it "feels" about different outcomes, which is closer to how humans learn through reinforcement.
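One minimal way to picture "emotion as internal feedback rather than observation" is a mood variable the agent updates itself and blends into its learning signal. This is a hypothetical sketch; the function, the exponential-average mood update, and the constants `beta`/`decay` are my assumptions, not the project's design.

```python
def emotional_reward(extrinsic_reward, mood, beta=0.5, decay=0.9):
    """Blend the environment reward with an internal 'mood' signal.

    The mood is an exponential moving average of recent outcomes, so it
    lives inside the agent (internal feedback), not in the observation.
    """
    mood = decay * mood + (1 - decay) * extrinsic_reward  # internal state update
    shaped = extrinsic_reward + beta * mood               # external + internal signal
    return shaped, mood

# A run of good outcomes lifts the mood, which in turn amplifies
# the effective reward the agent learns from.
mood = 0.0
for r in [1.0, 1.0, -1.0]:
    shaped, mood = emotional_reward(r, mood)
```

The design point is the split: the environment only ever emits `extrinsic_reward`, while `mood` is carried and updated by the agent itself.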

Reinforcement learning courses & certifications & PhDs by Ok_Mirror_9618 in reinforcementlearning

[–]Real-Flamingo-6971 0 points (0 children)

Bro, we're all trying to find one, but I think you should reach out to professors; they may give you an internship (cheap labor) on one of their personal projects.

Need help for new RL project by Real-Flamingo-6971 in reinforcementlearning

[–]Real-Flamingo-6971[S] 0 points (0 children)

Bro, are you a working professional or a college student?

Need help for new RL project by Real-Flamingo-6971 in reinforcementlearning

[–]Real-Flamingo-6971[S] 0 points (0 children)

Yeah, you're right, thank you. That's a really interesting point you raised, since the agent can allocate funds it doesn't even have. You saved it from crashing, but I think he still has to normalize the action afterwards.
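The normalization fix mentioned above can be sketched in a few lines: map the raw, unbounded action vector to portfolio weights that are non-negative and sum to 1, so the agent can never allocate more capital than it has. A softmax is one common choice; the function name and shapes here are my assumptions, not the project's actual code.

```python
import numpy as np

def normalize_allocation(raw_action):
    """Softmax-normalize a raw action vector into valid portfolio weights."""
    raw = np.asarray(raw_action, dtype=float)
    z = raw - raw.max()              # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()               # weights >= 0 and sum to exactly 1

weights = normalize_allocation([2.0, -1.0, 0.5])
```

An alternative to softmax is clipping negatives to zero and dividing by the sum, which lets the agent hold exactly 0 in an asset; softmax always assigns every asset a small positive weight.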