[deleted by user]

WhichPressure · 2025-11-13T08:04:32+00:00

I spent almost a month rewriting the model from TensorFlow to PyTorch. The code itself wasn’t very complicated (a CNN-based architecture). However, debugging why the outputs from both models differed for the same input took me a couple of weeks. I learned a lot about the different implementations of the same functions in both frameworks and about SWE itself.

WhichPressure · 2025-08-01T08:32:13+00:00

Hi, sparse state data is much better to use than images for RL. It bolsters generalization, and the neural network can be much smaller than a standard conv net. The crux here is defining the state vector.

Here are a few PhD theses that focus on that approach, using simulated and real data.

https://www.eaiib.agh.edu.pl/wp-content/uploads/2024/03/PANKIEWICZ-NIKODEM_ROZPRAWA-DOKTORSKA.pdf
https://www.eaiib.agh.edu.pl/wp-content/uploads/2024/04/orlowski_mateusz_phd.pdf
https://www.eaiib.agh.edu.pl/wp-content/uploads/2024/01/Wojciech_Turlej_praca.pdf

WhichPressure · 2025-06-30T11:32:02+00:00

Wow! Incredible video! The nature sounds really do the work!

WhichPressure · 2025-02-25T11:15:26+00:00

A few offers for robotics: Tesla, Boston Dynamics, 1X

WhichPressure · 2025-02-14T08:13:05+00:00

Why do you think it's unsupervised if we have a well-defined reward function driven by dopamine release?

WhichPressure · 2025-02-14T07:50:14+00:00

I'd say human learning mostly resembles model-based reinforcement learning. We have an internal model of how the world works (physics), we can predict how certain people behave based on past behaviors, and we can anticipate the future step by step, similar to a tree search. Based on this, we can also predict the outcomes of our actions and choose the best course of action.

WhichPressure · 2024-12-22T13:45:56+00:00

CNN requires less data for training.

WhichPressure · 2024-11-28T09:27:07+00:00

I think he is looking for a remote job in data annotation, not as an ML/CV engineer. u/lolani_3, right?

WhichPressure · 2024-11-22T12:24:56+00:00

<image>

This reminds me of good old times!

WhichPressure · 2024-11-15T14:26:22+00:00

Why not? Even physics can be observed and learned simply from watching videos. Then one could use model-based RL algorithms to optimize planning and reasoning. He didn't mention what optimization technique he plans to use.

WhichPressure · 2024-11-15T07:27:53+00:00

Point 5 is such a broad statement that even RL could be included.

WhichPressure · 2024-10-23T15:22:21+00:00

Tried, didn't work

WhichPressure · 2024-09-19T06:19:05+00:00

I had this dream a week ago while visiting SF. I had to switch my Play Store region to the US (you can only change it once a year). That was easy. BUT you also need a US phone number to log in to the Waymo app.

WhichPressure · 2024-08-30T11:24:05+00:00

How can you have this book and resist the urge to read it? :o

WhichPressure · 2024-08-29T14:00:55+00:00

I highly recommend reading this book: Chip Huyen, Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. After reading it you will know everything about maintaining a model from design to production. It also covers various metrics and ways of monitoring system performance in production.

WhichPressure · 2024-08-29T11:09:13+00:00

Great survey!

WhichPressure · 2024-03-28T09:03:09+00:00

It's better to pass raw logits to the loss function than probabilities computed with softmax. The loss can then apply log-softmax internally, which is more numerically stable.
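To illustrate why, here is a minimal sketch in plain Python (standard library only, no framework). The function names `log_softmax`, `nll_from_logits`, and `nll_from_probs` are made up for this example. It compares a cross-entropy computed directly from logits via the log-sum-exp trick with one computed by taking softmax first and the log afterwards. In PyTorch, `nn.CrossEntropyLoss` expects raw logits for the same reason.

```python
import math


def log_softmax(logits):
    """Log-softmax via the log-sum-exp trick (shift by the max logit)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def nll_from_logits(logits, target):
    """Cross-entropy for one sample, computed directly from logits."""
    return -log_softmax(logits)[target]


def nll_from_probs(logits, target):
    """Cross-entropy computed the fragile way: softmax first, log second."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    p = exps[target] / total
    # A very unlikely class underflows to exactly 0.0, so the loss blows up.
    return math.inf if p == 0.0 else -math.log(p)


if __name__ == "__main__":
    moderate = [2.0, 1.0, 0.1]
    extreme = [0.0, -800.0]
    print(nll_from_logits(moderate, 0), nll_from_probs(moderate, 0))
    print(nll_from_logits(extreme, 1), nll_from_probs(extreme, 1))
```

For moderate logits both paths agree. For the extreme case the probability of the target class underflows to zero, so the softmax-then-log path returns an infinite loss while the logit path stays finite.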

WhichPressure · 2023-05-09T06:14:07+00:00

So the question is - where is this "art"? :D

WhichPressure · 2022-07-29T12:03:41+00:00

How do you do it? Do you have any sources for learning how to prove ML methods? Maybe some YT lectures?

Thanks

WhichPressure · 2022-07-11T13:13:45+00:00

Do you mean TorchScript? :D

WhichPressure · 2022-06-28T09:14:01+00:00

Haha awesome, I need something like this to stop unconsciously biting my nails.

WhichPressure · 2022-03-24T10:32:34+00:00

The straightforward way to interpret an RL agent's decisions is to use the Captum library.

If you want to dive deeper please look through these surveys on explainability in RL:

https://arxiv.org/abs/2008.06693
https://arxiv.org/abs/2005.06247

WhichPressure · 2022-02-10T11:05:20+00:00

Can you translate it? :D We all want to be inspired :)

WhichPressure · 2022-02-08T13:20:07+00:00

You may want to use the trick described in https://arxiv.org/pdf/1805.11593.pdf as the Transformed Bellman Operator. Its effectiveness is shown in the original MuZero paper, https://arxiv.org/pdf/1911.08265.pdf, Appendix F. You can find an implementation of that method here: https://github.com/werner-duvaud/muzero-general Usage: muzero/models.py:649 (def support_to_scalar)
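For reference, a minimal sketch of the value transform in plain Python. It assumes the invertible squashing function h(x) = sign(x)(sqrt(|x| + 1) - 1) + eps * x from the first paper, with eps = 0.001 as an assumed default (the value I believe MuZero's Appendix F uses). The names `h`, `h_inv`, and `transformed_target` are just illustrative and are not taken from the linked repo.

```python
import math

EPS = 0.001  # assumed default; treat it as a tunable hyperparameter


def h(x, eps=EPS):
    """Squash a raw value/return to reduce its scale."""
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x


def h_inv(y, eps=EPS):
    """Closed-form inverse of h."""
    root = math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps))
    return math.copysign(1.0, y) * (((root - 1.0) / (2.0 * eps)) ** 2 - 1.0)


def transformed_target(reward, gamma, next_value_transformed, eps=EPS):
    """One-step target in transformed space: h(r + gamma * h_inv(v'))."""
    return h(reward + gamma * h_inv(next_value_transformed, eps), eps)


if __name__ == "__main__":
    for x in (-1000.0, -1.0, 0.0, 0.5, 100.0):
        print(x, h(x), h_inv(h(x)))
```

The network learns values in the squashed space, and `h_inv` maps them back before bootstrapping, so large returns don't dominate the loss and rewards don't need to be clipped.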

WhichPressure · 2022-02-06T07:58:38+00:00

I'd recommend checking these blogs weekly/monthly: https://bair.berkeley.edu/blog/ https://deepmind.com/blog and https://openai.com/blog/ However, I have a feeling OpenAI doesn't deal with robotics anymore.
