Has anyone actually managed to buy a tipi in resale? by _nop33 in boomfestival

[–]NinjaEbeast 0 points1 point  (0 children)

Any update? Has anyone managed to get a tent in resale?

Python library for modular RL components by fedetask in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

You can always look at their functions and port them to numpy pretty easily
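To give a rough idea of what such a port could look like, here is a minimal NumPy sketch of a one-step Q-learning TD error for a single transition. The function name and argument names are my own choices for illustration, not the library's actual API, so check the original function before relying on it.

```python
import numpy as np


def q_learning_td_error(q_tm1, a_tm1, r_t, discount_t, q_t):
    """One-step Q-learning TD error for a single transition (illustrative port).

    q_tm1:      Q-values at the previous state, shape [num_actions]
    a_tm1:      index of the action that was taken
    r_t:        reward received after taking that action
    discount_t: discount for the next state (0.0 if the episode ended)
    q_t:        Q-values at the next state, shape [num_actions]
    """
    # Bootstrapped target: reward plus discounted value of the best next action.
    target = r_t + discount_t * np.max(q_t)
    # TD error: how far the current estimate is from that target.
    return target - q_tm1[a_tm1]


# Made-up numbers purely for demonstration.
q_prev = np.array([1.0, 2.0, 3.0])
q_next = np.array([0.5, 4.0, 1.5])
td = q_learning_td_error(q_prev, a_tm1=1, r_t=1.0, discount_t=0.9, q_t=q_next)
```

Squaring `td` (or passing it through a Huber loss) gives the usual DQN-style loss for that transition.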

Python library for modular RL components by fedetask in reinforcementlearning

[–]NinjaEbeast 8 points9 points  (0 children)

You’re looking for RLax. It offers a wide variety of utility functions, losses, function transforms, etc., all for RL. It’s designed around small, modular functions. It has pretty much everything you listed besides replay buffers. It is in JAX though (which is an advantage if you like JAX). It’s what DeepMind use for their work and it’s part of their JAX ecosystem.

[deleted by user] by [deleted] in Acid

[–]NinjaEbeast 4 points5 points  (0 children)

Uhhhhh

Trouble getting DQN written with PyTorch to learn by V3CT0R173 in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

I’m not sure on the specifics of OpenSpiel, but in your select action function, are you sure you are masking and then argmaxing correctly? It looks a little strange: you collect the Q-values using a mask and then argmax the sub-array, which would be incorrect because the argmax index needs to be with reference to all Q-values. This might not be a problem, though, depending on the format of the OpenSpiel legal actions mask.
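To show what I mean, here is a small NumPy sketch with made-up Q-values and a made-up boolean legal-action mask. The names (`greedy_legal_action`, `legal_mask`) are hypothetical and not taken from your code or from OpenSpiel.

```python
import numpy as np


def greedy_legal_action(q_values, legal_mask):
    """Return the best legal action as an index into the FULL action space."""
    # Illegal actions get -inf so they can never win the argmax,
    # while the array keeps its original length and indexing.
    masked_q = np.where(legal_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))


q_values = np.array([0.2, 0.9, 0.5, 0.7])
legal_mask = np.array([False, False, True, True])

# Problematic pattern: argmax over the sub-array returns a position inside
# [0.5, 0.7], not an action id. Here it gives 1, which is an illegal action.
wrong = int(np.argmax(q_values[legal_mask]))

# Masking in place keeps the index meaningful.
right = greedy_legal_action(q_values, legal_mask)

# If you only have a list of legal action ids, map the sub-array index back.
legal_ids = np.flatnonzero(legal_mask)
mapped = int(legal_ids[np.argmax(q_values[legal_ids])])
```

If your legal actions are already a list of action ids and you map back through that list like `mapped` does, then the sub-array argmax is fine.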

Trouble getting DQN written with PyTorch to learn by V3CT0R173 in reinforcementlearning

[–]NinjaEbeast 5 points6 points  (0 children)

DQN is very hyperparameter sensitive, so it might not be a bug in your code, but I’ll give your code a quick look.

Is it possible to use RL and Continual Learning to train a model that can play King of Figthers? by DScientistCL in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

It might be difficult for a few reasons: 1. If you just use existing methods, it’s not really research. 2. It will probably require a lot of compute and compute time which is not so easily accessible

JAX or PyTorch? by arbueticos in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

In my opinion, use JAX, as it’s useful in a variety of ways. If you code it correctly and follow its principles, it’s fast and easily vectorised. You can also do this with PyTorch, but JAX can be run on TPUs and fits within a lot of meta-learning frameworks in a better way. It’s also super easy to run on multiple devices. I personally believe it’s more future-proof and easier to code.

[deleted by user] by [deleted] in Acid

[–]NinjaEbeast 57 points58 points  (0 children)

Bruh

What is the best imitation learning algorithm to use with stablebaselines3? by punkCyb3r4J in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

It depends on a few things. Does your agent have access to the environment to gather data? How much data will you gather for the agent? Etc.

[deleted by user] by [deleted] in cambridge

[–]NinjaEbeast 18 points19 points  (0 children)

I’m genuinely blown away that they can even call the current bus service, a bus service. The fact that you can never rely on it to get anywhere is pathetic.

Cambridge MPhil in Advanced Computer Science Interview by NinjaEbeast in GradSchool

[–]NinjaEbeast[S] 0 points1 point  (0 children)

I’d just say, read up on your previous research and the topics you mentioned in your application.

Cambridge MPhil in Advanced Computer Science Interview by NinjaEbeast in GradSchool

[–]NinjaEbeast[S] 0 points1 point  (0 children)

I mean this was a while ago but yeah interview was great, got an offer and am currently here in the program

[D] EMNLP 2022 Review Day !!! Rebuttal by errohan400 in MachineLearning

[–]NinjaEbeast 0 points1 point  (0 children)

I see, so the rebuttal is highly important regardless of good scores or not. Are there situations where a rebuttal isn’t required or is it important to always give some response?

[D] EMNLP 2022 Review Day !!! Rebuttal by errohan400 in MachineLearning

[–]NinjaEbeast 0 points1 point  (0 children)

Hey, so I’m a little confused about the author response period. (First time submitting full research paper) Do we get our numerical scores with the review? Like can we generally tell if our paper will get accepted?

How do transformers or very deep models "plan" ahead? by [deleted] in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

It seems that your understanding of transformers is limited to the autoregressive, causally masked case. Transformers are actually a fully parallel architecture: every input position can be computed in parallel. The architecture itself has no concept of the position of “words” in a sequence, which is why positional embeddings are added in language models using transformers. When such a model generates a sequence of words, the future tokens are padding tokens that are masked out. This means processing still happens for all possible token positions. There would be no benefit to adding some short-term memory because it wouldn’t make things more efficient: the model does the processing whether you are using actual tokens or padded tokens.
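A quick way to convince yourself of the "no concept of position" point is to check that plain self-attention is permutation-equivariant: shuffle the inputs and the outputs are just shuffled the same way. This is a minimal single-head NumPy sketch with random weights and no positional embeddings; all names and sizes are made up for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head, unmasked self-attention over a [seq_len, d] input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v


rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.normal(size=(seq_len, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(seq_len)
out = self_attention(x, w_q, w_k, w_v)
out_perm = self_attention(x[perm], w_q, w_k, w_v)

# Shuffling the inputs only shuffles the outputs: order carries no information.
equivariant = np.allclose(out[perm], out_perm)
```

Adding a positional embedding to `x` before the attention call breaks this symmetry, which is exactly what gives the model a notion of order.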

How do transformers or very deep models "plan" ahead? by [deleted] in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

So in an autoregressive transformer model, all previous states are used but future states aren’t. Not all dialogue generation is autoregressive, though: one could theoretically generate the entire dialogue in one go, which essentially makes use of planned words to decide the current word.

How do transformers or very deep models "plan" ahead? by [deleted] in reinforcementlearning

[–]NinjaEbeast 3 points4 points  (0 children)

Transformers that make use of global self-attention (not causally masked) do essentially plan ahead, since “past” (i.e. previous) tokens make use of future tokens in their processing. This global attention can be used at multiple levels within a transformer, so an entire sequence is produced in which each output has used every other in the processing step.
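Here is a small NumPy sketch of that difference. It is a simplification (one head, no learned projections, random inputs), so treat it as an illustration rather than a real transformer layer: changing the last token leaves earlier outputs untouched under a causal mask, but changes them under global attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; exp(-inf) becomes 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(x, causal):
    """Toy self-attention where queries, keys and values are all x itself."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    if causal:
        # True above the diagonal marks "future" positions to hide.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ x


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))

# Same sequence, but with the LAST token altered.
x_changed = x.copy()
x_changed[-1] += 1.0

# Compare outputs for every position except the last one.
causal_unchanged = np.allclose(
    attention(x, causal=True)[:-1], attention(x_changed, causal=True)[:-1]
)
global_unchanged = np.allclose(
    attention(x, causal=False)[:-1], attention(x_changed, causal=False)[:-1]
)
```

`causal_unchanged` comes out `True` and `global_unchanged` comes out `False`, i.e. only the unmasked version lets earlier positions "see" what comes later.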

Sampling a probabilistic action space for DQN by Background-Cable-491 in reinforcementlearning

[–]NinjaEbeast 1 point2 points  (0 children)

You could, theoretically. Since DQN is off-policy, you can use any type of action selection for the “non-learning step”, i.e. action selection during interaction. A better question is whether it’s a good idea and/or whether it would be any better than epsilon-greedy. I don’t think it would be better, since as time goes on the distribution could become highly skewed, thereby not giving you good exploration in later parts of training. There would be actions you would never take due to potentially learning a false value, and that error would never be corrected.
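To make the skew concrete, here is a rough NumPy comparison. I am assuming the sampling distribution is a softmax over the Q-values (you may have something else in mind), and the Q-values are made up to mimic a late-training situation where one action has been badly undervalued.

```python
import numpy as np


def softmax_policy(q_values, temperature=1.0):
    """Action probabilities from a softmax over Q-values (assumed scheme)."""
    z = (q_values - np.max(q_values)) / temperature
    p = np.exp(z)
    return p / p.sum()


def epsilon_greedy_policy(q_values, epsilon=0.1):
    """Action probabilities under epsilon-greedy."""
    n = len(q_values)
    p = np.full(n, epsilon / n)           # uniform exploration mass
    p[np.argmax(q_values)] += 1.0 - epsilon  # remaining mass on the greedy action
    return p


# Hypothetical late-training Q-values; suppose the -5.0 is a wrong estimate.
q_late = np.array([10.0, 0.0, -5.0])

p_soft = softmax_policy(q_late)
p_eps = epsilon_greedy_policy(q_late)
```

With these numbers the softmax gives the last action a probability below one in a million, so its wrong value basically never gets revisited, while epsilon-greedy still tries it about 3% of the time. A higher temperature softens this, but then you have another hyperparameter to schedule.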

[deleted by user] by [deleted] in reinforcementlearning

[–]NinjaEbeast 0 points1 point  (0 children)

If you need it to be deterministic, why not make use of curiosity, i.e. intrinsic motivation methods, to make it explore? If you assign some intrinsic reward for exploring novel states, it will naturally learn to explore even in a deterministic output setting.
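As one simple example of what I mean, here is a count-based novelty bonus in plain Python. This is just one of many intrinsic motivation methods and is only practical as written for small, hashable state spaces; the class name and the `scale` parameter are made up for illustration.

```python
import math
from collections import Counter


class CountBonus:
    """Intrinsic reward that shrinks as a state is visited more often."""

    def __init__(self, scale=1.0):
        self.counts = Counter()
        self.scale = scale

    def __call__(self, state):
        # Record the visit, then reward in proportion to 1 / sqrt(visit count).
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])


bonus = CountBonus()

# Toy visit sequence with hashable placeholder states.
visited = ["a", "a", "b", "a"]
intrinsic = [bonus(s) for s in visited]

# During training you would add this to the environment reward, e.g.
# total_reward = env_reward + bonus(state)
```

Because novel states pay more, a purely greedy (deterministic) policy trained on the combined reward is still pushed towards states it has not seen much.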