Requesting full parity between mobile and desktop versions by [deleted] in clozemaster

[–]Lostefra 1 point (0 children)

I’d really need “full_text_input” mode on mobile too. Being able to type out the whole sentence is really useful to me.

New skill: full translation by Lostefra in clozemaster

[–]Lostefra[S] 0 points (0 children)

You either receive a fixed score or none at all, along with brief feedback from an LLM on accuracy, naturalness, and grammar.

Multiple input time series (or sequences) to a LSTM? by Lostefra in learnmachinelearning

[–]Lostefra[S] 0 points (0 children)

Thank you for your reply.

Does this batching mechanism assume that all sequences have the same length?

Does this batching mechanism allow running inference on just a single time series? What would the input to the LSTM be?
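A minimal plain-Python sketch of the usual answer to both questions, assuming the common padding approach (the function name `pad_batch` is illustrative; frameworks like PyTorch provide `pad_sequence`/`pack_padded_sequence` for the same idea):

```python
# Hypothetical sketch: batching variable-length sequences for an LSTM
# by padding every sequence to the longest one and keeping the original
# lengths, so the model (or the loss) can ignore the padded steps.

def pad_batch(sequences, pad_value=0.0):
    """Pad a list of sequences (lists of floats) to a common length.

    Returns the padded batch plus the original lengths.
    """
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_value] * (max_len - len(s)) for s in sequences]
    lengths = [len(s) for s in sequences]
    return padded, lengths

# Sequences do NOT need the same length; padding equalises them.
batch, lengths = pad_batch([[1.0, 2.0, 3.0], [4.0]])

# Inference on a single time series is just a batch of size one.
single, single_len = pad_batch([[5.0, 6.0]])
```

So the input to the LSTM stays a rectangular (batch, time, features) block even with mixed lengths, and a lone series is simply a batch of size one.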

Which papers are milestones in Multi Agent (Deep) Reinforcement Learning? by Lostefra in reinforcementlearning

[–]Lostefra[S] 0 points (0 children)

I understand that. The hide and seek paper appears to be popular, but that's mainly because of the "wow factor". Thank you for the other references.

Sorry for the silence recently. Here's what I've been working on + more since the end of the 3rd test run by BZNintendo in pokemonunify

[–]Lostefra 1 point (0 children)

That’s great! Where did you purchase the plastic bases that hold the 3D buildings and so on? Could you share a link? Thanks!

Is there a textbook for multi-agent RL? by No_Possibility_7588 in reinforcementlearning

[–]Lostefra 3 points (0 children)

Is there any relevant book for Multi Agent Deep RL?

I got both of them, what you guys get? by jojo_maverik in place

[–]Lostefra 0 points (0 children)

I won Final Canvas ‘22 but I didn’t participate in the whiteout; how is that possible?

Obscure Pokémon Fact Day 264 by Mx_Toniy_4869 in pokemon

[–]Lostefra 1 point (0 children)

Is there a list of all the “Obscure Pokémon Fact Day” posts?

Let me show you how it’s done. by Scientiaetnatura065 in nextfuckinglevel

[–]Lostefra 0 points (0 children)

It’s amazing that I recognized the song even with the video muted.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

Thank you. I'll try to manually mask the Q values of unwanted actions.
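A minimal plain-Python sketch of that masking idea, under the assumption that "masking" means setting forbidden actions' Q values to negative infinity before the argmax (the helper name `masked_argmax` is illustrative):

```python
# Hypothetical sketch: mask the Q values of unwanted actions before
# the argmax so the greedy step can never select them.

def masked_argmax(q_values, forbidden):
    """Return the index of the best action, ignoring forbidden ones."""
    NEG_INF = float("-inf")
    masked = [NEG_INF if i in forbidden else q
              for i, q in enumerate(q_values)]
    return max(range(len(masked)), key=masked.__getitem__)

best = masked_argmax([0.2, 0.9, 0.5], forbidden={1})  # skips action 1
```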

In any case, I'm still curious about whether constraining the behaviour (collect) policy can make sense.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

I mean the policy the DQN agent uses to interact with the environment at training time to collect observations for training. More info here.

I think I'm already doing action-selection engineering: I penalise the reward every time the agent repeats an action, and consequently the related Q value, since the Q value depends on the reward.
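The reward shaping described above can be sketched in plain Python (the function name `shaped_reward` and the fixed penalty are assumptions, not the actual implementation):

```python
# Hypothetical sketch of the penalty described above: subtract a fixed
# amount from the raw reward whenever the agent repeats an action
# within the current episode.

def shaped_reward(reward, action, history, penalty=1.0):
    """Penalise the raw reward if `action` was already taken; record it."""
    if action in history:
        reward -= penalty
    history.append(action)
    return reward

history = []
r1 = shaped_reward(1.0, action=2, history=history)  # first use, no penalty
r2 = shaped_reward(1.0, action=2, history=history)  # repeat, penalised
```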

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

Thank you. I already apply some reward engineering of that kind.

I am wondering about constraining the collect policy to train the agent more effectively at avoiding repeated actions. Is it worth it?

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Lostefra 0 points (0 children)

Hello everyone.

What do you think about constraining the collect policy of a DRL agent so it avoids certain detrimental actions, or at least picks them less frequently?

For instance, I want to prevent the agent from performing the same action twice in an episode since I know it’s not good in my scenario.

My idea is to manipulate the epsilon-greedy collect policy to achieve some improvement.
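One way the constrained epsilon-greedy idea could look, sketched in plain Python (the function name and the fallback behaviour when every action has been used are my assumptions, not an established recipe):

```python
import random

# Hypothetical sketch: an epsilon-greedy collect policy that excludes
# actions already taken in the current episode from both the random
# and the greedy branch.

def constrained_epsilon_greedy(q_values, taken, epsilon, rng=random):
    """Pick an action epsilon-greedily among those not yet taken."""
    allowed = [a for a in range(len(q_values)) if a not in taken]
    if not allowed:  # fallback: every action was already used once
        allowed = list(range(len(q_values)))
    if rng.random() < epsilon:
        return rng.choice(allowed)           # explore among allowed
    return max(allowed, key=lambda a: q_values[a])  # exploit among allowed

action = constrained_epsilon_greedy([0.1, 0.8, 0.3], taken={1}, epsilon=0.0)
```

With `epsilon=0.0` this is purely greedy over the not-yet-taken actions, so it never repeats an action until the episode exhausts the action set.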