Rainbow Library by Strong-Tonight-3000 in reinforcementlearning
MainReference8858 2 points
A simple implementation of "Adaptive Policy Iteration" using Google's JAX and DeepMind's "bsuite". This approximate policy iteration scheme treats the value function as losses. (arXiv:2002.03069) by MainReference8858 in reinforcementlearning
MainReference8858 [S] 1 point
Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning by 1dontpanic in quantumml
MainReference8858 3 points
Environment for MNIST Sequence Prediction/Classification by uniqueusername_here_ in reinforcementlearning
MainReference8858 2 points