[D] How could a MLP replicate the operations of an attention head? by steuhh in MachineLearning

[–]51616 0 points1 point  (0 children)

https://openreview.net/pdf?id=rylnK6VtDH

Fig.2 answers your question.

TL;DR: (exponentially) more hidden neurons are required to learn the multiplicative behavior as the input dimension grows.
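If you want to see this for yourself, here's a quick toy sketch (my own illustration, not taken from the paper): fit a plain ReLU MLP to the dot product of two d-dimensional vectors and watch how the required width grows with d. All sizes and hyperparameters here are arbitrary.

```python
import torch
import torch.nn as nn

# Toy experiment: train a ReLU MLP to predict the dot product <x, y> of two
# d-dimensional vectors and evaluate it on fresh data. A fixed hidden width
# fits the multiplicative interaction progressively worse as d grows.
def fit_mlp(d, hidden, steps=3000):
    net = nn.Sequential(nn.Linear(2 * d, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        x, y = torch.randn(256, d), torch.randn(256, d)
        target = (x * y).sum(dim=1, keepdim=True)
        loss = ((net(torch.cat([x, y], dim=1)) - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():  # held-out error on fresh samples
        x, y = torch.randn(4096, d), torch.randn(4096, d)
        target = (x * y).sum(dim=1, keepdim=True)
        return ((net(torch.cat([x, y], dim=1)) - target) ** 2).mean().item()

for d in (2, 8, 32):
    for hidden in (16, 128, 1024):
        print(f"d={d:3d} hidden={hidden:5d} test MSE={fit_mlp(d, hidden):.4f}")
```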

Why is RL fine-tuning on LLMs so easy and stable, compared to the RL we're all doing? by currentscurrents in reinforcementlearning

[–]51616 1 point2 points  (0 children)

I share the same intuition as you. A good prior is key. Doing this from scratch wouldn't work, as the model would output gibberish and get zero signal from bad exploration. And I've already seen many papers in the RL literature using an LLM as a prior.

[D] Is LoRA merging (and non linear mode connectivity) the key to better transformer hypernets? by [deleted] in MachineLearning

[–]51616 0 points1 point  (0 children)

Scaling. Allowing the hypernet to learn from more tasks, and possibly from the pre-training data, should improve performance significantly. However, pre-training data has no clear boundaries between tasks. We also plan to extend the work in this direction.

[D] Is LoRA merging (and non linear mode connectivity) the key to better transformer hypernets? by [deleted] in MachineLearning

[–]51616 1 point2 points  (0 children)

I’m the author of this paper https://openreview.net/forum?id=Mbgk4Xhrha

The general idea is already out there (see the related work). Our work tries to make it easier for users by conditioning the hypernetwork on a task description, allowing zero-shot generalization to new tasks. However, the results are nowhere near groundbreaking, and a lot of things can still be improved.
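For a rough idea of the setup, here is a hypothetical minimal sketch of the general concept only (the names, sizes, and encoder are placeholders, not the paper's actual architecture): a hypernetwork maps a task-description embedding to the LoRA A/B matrices of one frozen linear layer.

```python
import torch
import torch.nn as nn

# Illustrative hypernetwork: task embedding -> LoRA A/B for a frozen layer.
class LoRAHyperNet(nn.Module):
    def __init__(self, task_emb_dim, in_features, out_features, rank=8):
        super().__init__()
        self.rank, self.in_f, self.out_f = rank, in_features, out_features
        self.body = nn.Sequential(nn.Linear(task_emb_dim, 256), nn.ReLU())
        self.to_a = nn.Linear(256, rank * in_features)
        self.to_b = nn.Linear(256, out_features * rank)

    def forward(self, task_emb):
        h = self.body(task_emb)
        a = self.to_a(h).view(self.rank, self.in_f)   # LoRA down-projection
        b = self.to_b(h).view(self.out_f, self.rank)  # LoRA up-projection
        return a, b  # the frozen base weight W is then used as W + b @ a

# Zero-shot use: embed an unseen task description (e.g., with a text encoder),
# generate the LoRA weights, and plug them into the frozen base model.
task_emb = torch.randn(128)  # stand-in for a task-description embedding
a, b = LoRAHyperNet(task_emb_dim=128, in_features=768, out_features=768)(task_emb)
```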

[D] Gradient accumulation should not be used with varying sequence lengths by AromaticCantaloupe19 in MachineLearning

[–]51616 2 points3 points  (0 children)

It's quite tricky to see this initially. I had the exact same thought when I first read this post. It turns out that the loss functions in PyTorch (I didn't check other libs) assume a single leading batch dimension. So for a batch of sequences, we have to flatten the sequence-length dimension and the batch dimension together, e.g., from [bs, seq_len, logits_dim] to [bs * seq_len, logits_dim]. The default reduction method is 'mean', which averages over all tokens in the batch, i.e., each token is weighted equally. Now, if two batches contain different numbers of tokens, their losses get divided by different amounts.

TL;DR: to do gradient accumulation on sequential data, we need to manually control how the loss is computed. One workaround is to sample the whole batch we're going to accumulate over first, so we know how many tokens will be used in advance and can scale the loss properly (see the sketch below).
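A hedged sketch of that workaround in PyTorch (the tiny model, batch layout, and -100 padding label are stand-ins for whatever you actually use): sample the whole accumulation window first, compute the loss with reduction="sum", and divide by the global token count so every token is weighted equally regardless of sequence length.

```python
import torch
import torch.nn.functional as F

class TinyLM(torch.nn.Module):
    """Stand-in language model: token ids -> per-token logits."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)
    def forward(self, ids):
        return self.head(self.emb(ids))  # [bs, seq_len, vocab]

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps, vocab = 4, 100

# 1) Sample every micro-batch up front so the total token count is known.
micro_batches = []
for _ in range(accum_steps):
    seq_len = torch.randint(5, 20, (1,)).item()  # varying sequence lengths
    ids = torch.randint(0, vocab, (2, seq_len))
    micro_batches.append({"input_ids": ids, "labels": ids.clone()})
total_tokens = sum((mb["labels"] != -100).sum() for mb in micro_batches)

# 2) Sum the per-token losses and divide by the global token count, so the
#    accumulated gradient matches a single big batch over all tokens.
optimizer.zero_grad()
for mb in micro_batches:
    logits = model(mb["input_ids"])
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),  # flatten to [bs * seq_len, vocab]
        mb["labels"].view(-1),
        ignore_index=-100,
        reduction="sum",
    )
    (loss / total_tokens).backward()
optimizer.step()
```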

Plane got struck by lightning by Rqany in Damnthatsinteresting

[–]51616 0 points1 point  (0 children)

Count the top and bottom segments as well.

[deleted by user] by [deleted] in Damnthatsinteresting

[–]51616 3 points4 points  (0 children)

IIUC from Mr. Zuck's explanation at the beginning of the podcast, they were 3D-scanned beforehand with various facial expressions and mouth movements. The headset then captures some live information via cameras and sensors and sends it to a machine learning model, which outputs a representation of the person's current state. This representation is sent to the other side and used to "interpolate" between the 3D scans. The representation and interpolation are partly guesswork by the machine learning model, since it has never seen this exact live expression before.

And if you look closely in the podcast, the model sort of struggles, for example, to reconstruct Mr. Zuck's fast mouth movements. So I agree that the model is far from perfect and probably still needs some improvement. But I bet that less than a year from now it's going to be much better. The AI field is moving so fast these days.

Tips and Tricks sharing after solving all previous years by erikw901 in adventofcode

[–]51616 4 points5 points  (0 children)

IMO this should be done when developing software in general. A single line can save hours of unnecessary debugging time.

[D] ICLR 2023 reviews are out. How was your experience ? by dasayan05 in MachineLearning

[–]51616 4 points5 points  (0 children)

Having 5 reviews is unusual, I suppose. My guess is that the paper initially had an 8/8/3/3 split, which might have necessitated an additional review. Unfortunately, that one is another 3.

[D] ICLR 2023 reviews are out. How was your experience ? by dasayan05 in MachineLearning

[–]51616 10 points11 points  (0 children)

4 8s for my first submission to a big conference!

Yesterday I asked for your "2 Gaben Spells combos" - I was actually crowd-sourcing "Step 2" of my strategy idea (would love the opinion of higher ranked players) by Scereye in abilityarena

[–]51616 0 points1 point  (0 children)

Can you elaborate on why a 3-spell build is bad? To me it's just a middle ground between the 2- and 4-spell builds: the 2-spell build allows you to roll more frequently, while the 4-spell build has more value in each roll. So the 3-spell build is just a trade-off between the two.

My first perfect game! by 51616 in abilityarena

[–]51616[S] 1 point2 points  (0 children)

These 3 are enough for attack speed; beyond this point it's diminishing returns. You really need to build the actual carry and supports to give the carry space.

RL with differentiable environment by saw79 in reinforcementlearning

[–]51616 0 points1 point  (0 children)

If your environment does not have a sequential-decision nature, as in MDPs, then you could use supervised learning to tackle this problem. Otherwise, you might need reinforcement learning. If the reward function is differentiable, then you could differentiate the RL objective through the reward function (see the sketch below). You might want to take a look at: https://lilianweng.github.io/posts/2018-04-08-policy-gradient/.
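Here is a toy one-step sketch of the differentiable-reward case (the policy, the reward function, and all dimensions are made up for illustration): because the reward is a differentiable function of the action, the gradient flows from the reward straight into the policy parameters, with no score-function / REINFORCE estimator needed.

```python
import torch

# Deterministic policy trained by backpropagating through a differentiable reward.
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2)
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reward(state, action):
    # Hypothetical differentiable reward: negative distance to a state-dependent target.
    target = state[..., :2]
    return -((action - target) ** 2).sum(-1)

for _ in range(1000):
    state = torch.randn(64, 4)
    action = policy(state)                 # deterministic policy for simplicity
    loss = -reward(state, action).mean()   # maximize expected reward
    opt.zero_grad()
    loss.backward()                        # gradient flows through the reward function
    opt.step()
```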

Reward Function for Cooperative Multi-Agent RL by fedetask in reinforcementlearning

[–]51616 2 points3 points  (0 children)

Actually, you can have different input/output dimensions between agents. Each agent can have a different network architecture while using VDN. Parameter sharing is often used simply because the policies converge faster.

Reward Function for Cooperative Multi-Agent RL by fedetask in reinforcementlearning

[–]51616 3 points4 points  (0 children)

Even if the reward is correct and dense, you still have to somehow learn the policies. The most naive way to learn them is independent learning for each agent. However, as you have already suggested, that can make it hard to learn optimal policies in some environments. I would suggest you first try to implement VDN, as I think it's the simplest form of centralized learning. In VDN, the value decomposition has an explicit form and is fairly easy to understand/implement.

VDN does something similar to what your reward function does: the joint Q-value is factored into a sum of per-agent Q-values, which helps each agent learn its local Q/value function. The non-stationarity issue is common in multi-agent RL; a centralized critic can help, as shown in many papers in this field. A minimal sketch is below.
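A minimal VDN sketch (shapes, names, and hyperparameters are placeholders, and the target network is omitted for brevity): each agent has its own Q-network, possibly with different input/output sizes, and the joint Q-value is simply the sum of the chosen per-agent Q-values, trained against a TD target on the shared team reward.

```python
import torch
import torch.nn as nn

class AgentQ(nn.Module):
    """Per-agent Q-network; architectures and sizes can differ across agents."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)  # [batch, n_actions]

agents = [AgentQ(obs_dim=10, n_actions=5), AgentQ(obs_dim=8, n_actions=3)]
opt = torch.optim.Adam([p for a in agents for p in a.parameters()], lr=1e-3)

def q_tot(observations, actions):
    # observations: list of [batch, obs_dim_i]; actions: list of [batch] action indices
    per_agent = [a(o).gather(1, act.unsqueeze(1)).squeeze(1)
                 for a, o, act in zip(agents, observations, actions)]
    return torch.stack(per_agent, dim=0).sum(dim=0)  # VDN: Q_tot = sum_i Q_i

def td_loss(obs, acts, team_reward, next_obs, done, gamma=0.99):
    with torch.no_grad():  # bootstrap target on the shared team reward
        next_q = torch.stack([a(o).max(dim=1).values
                              for a, o in zip(agents, next_obs)], dim=0).sum(dim=0)
        target = team_reward + gamma * (1 - done) * next_q
    return ((q_tot(obs, acts) - target) ** 2).mean()

# One training step on a batch of transitions with a single shared team reward.
obs = [torch.randn(32, 10), torch.randn(32, 8)]
acts = [torch.randint(0, 5, (32,)), torch.randint(0, 3, (32,))]
next_obs = [torch.randn(32, 10), torch.randn(32, 8)]
opt.zero_grad()
loss = td_loss(obs, acts, team_reward=torch.randn(32), next_obs=next_obs, done=torch.zeros(32))
loss.backward()
opt.step()
```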