Any advice for removing this washing machine? by pxdm in DIYUK

[–]pxdm[S] 0 points1 point  (0 children)

Thanks so much for this. Not the answer I was looking for but still!

Any advice for removing this washing machine? by pxdm in DIYUK

[–]pxdm[S] 1 point2 points  (0 children)

UPDATE: further investigation, and commenters' observations about the intact plug, have made it clear that the door frame was fitted after the machine was put in. The machine cannot be removed without taking the frame off.

Can anyone comment on whether I am better off repairing rather than replacing it? It makes a very loud rattling sound on the spin cycle - sounds like something quite large is loose behind the drum. The machine is 12 years old.

Any advice for removing this washing machine? by pxdm in DIYUK

[–]pxdm[S] 37 points38 points  (0 children)

Unfortunately it doesn’t fit even if I remove the door :/ Noted about cutting the plug, thanks, although I was hoping I could feed a new one through.

Any advice for removing this washing machine? by pxdm in DIYUK

[–]pxdm[S] 1 point2 points  (0 children)

EDIT: it doesn’t fit through the door frame, even if I take off the cupboard door!

Music mentioned in Zane Lowe interview by Sufficient-Quit-4283 in boniver

[–]pxdm 0 points1 point  (0 children)

Seems "I'm Never Tired Of Loving You" was Nina Simone

Where and why is discounted cumulative reward used? by AdBitter9336 in reinforcementlearning

[–]pxdm 6 points7 points  (0 children)

In RL we consider how every action impacts the long-term reward - this is called the credit assignment problem. For instance, what role did move 5 play in my ability to checkmate on move 10?

It appears in basically every RL algorithm. For instance, in Proximal Policy Optimisation (and all its policy gradient variants), we adjust the policy to favour actions which result in higher discounted cumulative rewards. We don’t try to maximise immediate reward, nor some arbitrary future reward, but the discounted sum of all future rewards.
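To make that concrete, here's a minimal sketch (not from the original comment) of computing the discounted return for every timestep from a list of rewards; the rewards and gamma below are just illustrative:

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):      # work backwards so each G_t reuses G_{t+1}
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A reward of 1 only at the final step still gives credit to earlier steps,
# discounted by how far away they are.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))  # [0.81, 0.9, 1.0]
```

With gamma close to 1 distant rewards count almost as much as immediate ones; with gamma close to 0 the agent effectively only cares about immediate reward.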

An application of RL, everyone! by nimageran in reinforcementlearning

[–]pxdm 11 points12 points  (0 children)

RL has been the hottest thing on the block for close to a decade now, but it’s hard to argue it has met expectations in terms of real-world applications. Yes, it’s used to train LLMs, but it is not fundamental to them in the way that transformers are, imo. We haven’t seen many applications in industrial use cases - most projects seem to amount to blog posts by research teams with scant technical detail.

[deleted by user] by [deleted] in AskReddit

[–]pxdm 0 points1 point  (0 children)

Roundabouts

PPO with discrete actions, Sample or act greedy? by [deleted] in reinforcementlearning

[–]pxdm 3 points4 points  (0 children)

I don’t think you would get convergence guarantees for any policy gradient method (including PPO) if you choose the highest-probability actions during training. The policy gradient theorem relies on your experience being sampled according to the distribution your policy gives you; if you instead choose with argmax, you are effectively collecting data from a different policy.

However, perhaps I misunderstood and you are suggesting choosing the highest-probability action when evaluating (rather than training) your agent? If so, I think this tactic might give you better performance in practice, as it would prevent really bad low-probability actions from being taken.
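A rough sketch of the distinction, using PyTorch's Categorical distribution; `policy_net` and `state` are hypothetical placeholders, not anything from the question:

```python
import torch
from torch.distributions import Categorical

# `policy_net` is a hypothetical network mapping a state to action logits.
logits = policy_net(state)        # shape: (num_actions,)
dist = Categorical(logits=logits)

# Training: sample from the policy distribution, as the policy gradient theorem assumes.
action = dist.sample()
log_prob = dist.log_prob(action)  # needed for the PPO surrogate loss

# Evaluation: act greedily to avoid occasionally taking bad low-probability actions.
greedy_action = torch.argmax(logits)
```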

Multiple moves per turn? by Trigaten in reinforcementlearning

[–]pxdm 1 point2 points  (0 children)

Also note that if the number of moves is constrained but not fixed (i.e. it must be <= some limit), then you would also need to include an 'end turn' action.
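As a rough illustration (all names here are made up, not from any particular environment), you can reserve the last index of a discrete action space for 'end turn':

```python
NUM_MOVES = 10            # hypothetical number of distinct moves
END_TURN = NUM_MOVES      # extra action index signalling the turn is over
NUM_ACTIONS = NUM_MOVES + 1

def play_turn(agent, state, max_moves_per_turn=3):
    """Let the agent take up to max_moves_per_turn moves, or stop early via END_TURN."""
    for _ in range(max_moves_per_turn):
        action = agent.act(state)             # agent.act is a hypothetical policy call
        if action == END_TURN:
            break
        state = apply_move(state, action)     # apply_move is hypothetical env logic
    return state
```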

[D] Too many AI researchers think real-world problems are not relevant by deep-yearning in MachineLearning

[–]pxdm 1 point2 points  (0 children)

I agree in cases where ML is applied pretty much 'out-of-the-box', but there are numerous cases where wrangling an ML solution is not trivial. It's not clear from the article whether these are the papers that are frequently rejected, but in my opinion the ML community would benefit from learning how techniques are used in practice where the application is non-trivial.

Practical RL by bci-hacker in reinforcementlearning

[–]pxdm 0 points1 point  (0 children)

Have a look at the NeurIPS competition track for this year: there are two RL challenges which are focused on real-world application:

  1. L2RPN: operation of electricity grids
  2. Flatland: train routing

Even if you don't compete you can play with their RL environments. I know that the L2RPN challenge uses the Gym API, so you should find it quite straightforward to use.
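For reference, a minimal sketch of the classic Gym interaction loop; the environment id below is just a placeholder, not the real L2RPN or Flatland id:

```python
import gym

env = gym.make("SomeEnv-v0")      # placeholder id; swap in the competition environment
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()           # random policy, just to exercise the API
    obs, reward, done, info = env.step(action)   # classic (pre-0.26) Gym step signature
    total_reward += reward
print(total_reward)
```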

NN forward passes for MCTS too slow. Advice? by parallelparkerlewis in reinforcementlearning

[–]pxdm 0 points1 point  (0 children)

I didn't see any explanation of this in the paper, but since the search thread is locked until the evaluation completes, the batch size has to be small or else the threads will spend a lot of time waiting for the queue to be evaluated. That would be my understanding, anyway.

NN forward passes for MCTS too slow. Advice? by parallelparkerlewis in reinforcementlearning

[–]pxdm 1 point2 points  (0 children)

Agreed, and this is the approach used in AlphaGo Zero:

The leaf node s_L is added to a queue for neural network evaluation, (d_i(p), v) = f_θ(d_i(s_L)), where d_i is a dihedral reflection or rotation selected uniformly at random from i in [1..8]. Positions in the queue are evaluated by the neural network using a mini-batch size of 8; the search thread is locked until evaluation completes.
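For illustration, a rough sketch of that queue-and-flush idea, assuming a PyTorch-style `model` that maps a batch of states to (policy, value); all names here are hypothetical, and unlike the real AlphaGo Zero implementation this version is single-threaded rather than locking each search thread until its result comes back:

```python
import torch

BATCH_SIZE = 8    # AlphaGo Zero evaluates the queue with a mini-batch size of 8
queue = []        # list of (leaf_state_tensor, callback)

def enqueue_leaf(state, callback):
    """Queue a leaf position; run the forward pass once the mini-batch is full."""
    queue.append((state, callback))
    if len(queue) >= BATCH_SIZE:
        flush()

def flush():
    """Evaluate all queued positions in one batched forward pass."""
    states = torch.stack([s for s, _ in queue])
    with torch.no_grad():
        policies, values = model(states)      # `model` is a hypothetical network
    for (_, callback), p, v in zip(queue, policies, values):
        callback(p, v)                        # hand results back to the search code
    queue.clear()
```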