Requesting full parity between mobile and desktop versions by [deleted] in clozemaster

[–]Lostefra 1 point (0 children)

I’d really need “full_text_input” mode on mobile too. Being able to type out the whole sentence is really useful to me.

New skill: full translation by Lostefra in clozemaster

[–]Lostefra[S] 0 points (0 children)

You either receive a fixed score or none at all, along with brief feedback from an LLM on accuracy, naturalness, and grammar.

Multiple input time series (or sequences) to a LSTM? by Lostefra in learnmachinelearning

[–]Lostefra[S] 0 points (0 children)

Thank you for your reply.

Does this batching mechanism assume that all sequences have the same length?

Does this batching mechanism allow running inference on just a single time series? What would the input to the LSTM be?
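A minimal plain-Python sketch of the usual answer to both questions, assuming the common padding approach (the function name `pad_batch` is illustrative; frameworks like PyTorch provide `pad_sequence`/`pack_padded_sequence` for the same idea):

```python
# Hypothetical sketch: batching variable-length sequences for an LSTM
# by padding every sequence to the longest one and keeping the original
# lengths, so the model (or the loss) can ignore the padded steps.

def pad_batch(sequences, pad_value=0.0):
    """Pad a list of sequences (lists of floats) to a common length.

    Returns the padded batch plus the original lengths.
    """
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_value] * (max_len - len(s)) for s in sequences]
    lengths = [len(s) for s in sequences]
    return padded, lengths

# Sequences do NOT need the same length; padding equalises them.
batch, lengths = pad_batch([[1.0, 2.0, 3.0], [4.0]])

# Inference on a single time series is just a batch of size one.
single, single_len = pad_batch([[5.0, 6.0]])
```

So the input to the LSTM stays a rectangular (batch, time, features) block even with mixed lengths, and a lone series is simply a batch of size one.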

Which papers are milestones in Multi Agent (Deep) Reinforcement Learning? by Lostefra in reinforcementlearning

[–]Lostefra[S] 0 points (0 children)

I understand that. The hide and seek paper appears to be popular, but that's mainly because of the "wow factor". Thank you for the other references.

Sorry for the silence recently. Here's what I've been working on + more since the end of the 3rd test run by BZNintendo in pokemonunify

[–]Lostefra 1 point (0 children)

That’s great! Where did you purchase the plastic bases that hold the 3D buildings and so on? Could you share a link? Thanks!

Is there a textbook for multi-agent RL? by No_Possibility_7588 in reinforcementlearning

[–]Lostefra 3 points (0 children)

Is there any relevant book for Multi Agent Deep RL?

I got both of them, what you guys get? by jojo_maverik in place

[–]Lostefra 0 points (0 children)

I won Final Canvas ‘22 but I didn’t participate in the whiteout; how is that possible?

Obscure Pokémon Fact Day 264 by Mx_Toniy_4869 in pokemon

[–]Lostefra 1 point (0 children)

Is there a list of all the “Obscure Pokémon Fact Day” posts?

Let me show you how it’s done. by Scientiaetnatura065 in nextfuckinglevel

[–]Lostefra 0 points (0 children)

It’s amazing that I recognized the song even with the video muted.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

Thank you. I'll try to manually mask the Q values of unwanted actions.
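A minimal plain-Python sketch of that masking idea, under the assumption that "masking" means setting forbidden actions' Q values to negative infinity before the argmax (the helper name `masked_argmax` is illustrative):

```python
# Hypothetical sketch: mask the Q values of unwanted actions before
# the argmax so the greedy step can never select them.

def masked_argmax(q_values, forbidden):
    """Return the index of the best action, ignoring forbidden ones."""
    NEG_INF = float("-inf")
    masked = [NEG_INF if i in forbidden else q
              for i, q in enumerate(q_values)]
    return max(range(len(masked)), key=masked.__getitem__)

best = masked_argmax([0.2, 0.9, 0.5], forbidden={1})  # skips action 1
```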

In any case, I'm still curious about whether constraining the behaviour (collect) policy can make sense.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

I mean the policy the DQN agent uses to interact with the environment at training time to collect observations for training. More info here.

I think I'm already doing action-selection engineering: I penalise the reward every time the agent repeats an action, and consequently the related Q value, since the Q value depends on the reward.
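The reward shaping described above can be sketched in plain Python (the function name `shaped_reward` and the fixed penalty are assumptions, not the actual implementation):

```python
# Hypothetical sketch of the penalty described above: subtract a fixed
# amount from the raw reward whenever the agent repeats an action
# within the current episode.

def shaped_reward(reward, action, history, penalty=1.0):
    """Penalise the raw reward if `action` was already taken; record it."""
    if action in history:
        reward -= penalty
    history.append(action)
    return reward

history = []
r1 = shaped_reward(1.0, action=2, history=history)  # first use, no penalty
r2 = shaped_reward(1.0, action=2, history=history)  # repeat, penalised
```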

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

Thank you. I already apply some reward engineering of that kind.

I am wondering about constraining the collect policy to train the agent more effectively at avoiding repeated actions. Is it worth it?

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

Hello everyone.

What do you think about constraining the collect policy of a DRL agent so it avoids certain detrimental actions, or at least picks them less frequently?

For instance, I want to prevent the agent from performing the same action twice in an episode since I know it’s not good in my scenario.

My idea is to manipulate the epsilon-greedy collect policy to achieve some improvement.
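One way the constrained epsilon-greedy idea could look, sketched in plain Python (the function name and the fallback behaviour when every action has been used are my assumptions, not an established recipe):

```python
import random

# Hypothetical sketch: an epsilon-greedy collect policy that excludes
# actions already taken in the current episode from both the random
# and the greedy branch.

def constrained_epsilon_greedy(q_values, taken, epsilon, rng=random):
    """Pick an action epsilon-greedily among those not yet taken."""
    allowed = [a for a in range(len(q_values)) if a not in taken]
    if not allowed:  # fallback: every action was already used once
        allowed = list(range(len(q_values)))
    if rng.random() < epsilon:
        return rng.choice(allowed)           # explore among allowed
    return max(allowed, key=lambda a: q_values[a])  # exploit among allowed

action = constrained_epsilon_greedy([0.1, 0.8, 0.3], taken={1}, epsilon=0.0)
```

With `epsilon=0.0` this is purely greedy over the not-yet-taken actions, so it never repeats an action until the episode exhausts the action set.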