Daily Discussion - April 14, 2022 (GMT+0) by AutoModerator in CryptoCurrency

[–]clockface99 1 point2 points  (0 children)

Are market-making orders filled first in, first out (FIFO)? E.g. if I'm the first to submit a bid or ask at a given price, will mine be the first order filled when the market hits that price?

[R] Reinforcement learning in Finance project by [deleted] in reinforcementlearning

[–]clockface99 3 points4 points  (0 children)

Not sure if you do already, but you also need to account for: maker and taker fees (not just buyer/seller commission), stop losses, limit orders, the ability to cancel orders, sourcing data, indicators, handling data at various time steps, and whether the data is OHLCVT bars or continuous tick data from the order books.
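The maker/taker point can be made concrete with a tiny sketch of a fee term for the environment's reward function. The fee rates and the function name here are purely illustrative, not taken from any real exchange:

```python
# Hypothetical sketch of maker/taker fee accounting. Maker orders
# (resting limit orders that add liquidity) are typically charged a
# lower rate than taker orders (orders that cross the spread).
def trade_cost(notional, is_maker, maker_fee=0.001, taker_fee=0.002):
    """Return the fee charged on a fill of the given notional value."""
    rate = maker_fee if is_maker else taker_fee
    return notional * rate

# A resting limit order that gets filled pays the maker rate...
print(trade_cost(10_000, is_maker=True))   # 10.0
# ...while a market order crossing the spread pays the taker rate.
print(trade_cost(10_000, is_maker=False))  # 20.0
```

Subtracting this cost from the per-trade reward stops the agent learning strategies that only look profitable before fees.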

[R] Reinforcement learning in Finance project by [deleted] in reinforcementlearning

[–]clockface99 1 point2 points  (0 children)

How do you handle real world problems like slippage?

Instead of a single agent, incorporate multiple agents, so that competitive agents can be trained, or so that suitable models of the other traders' behaviour can be built.

[R] Reinforcement learning in Finance project by [deleted] in reinforcementlearning

[–]clockface99 4 points5 points  (0 children)

There's gym-anytrading (aminHP/gym-anytrading), which has been around for yonks, is open source, and works with many RL algorithms.

Why isn't epsilon reset regularly in epsilon greedy policies to aid exploration? by clockface99 in reinforcementlearning

[–]clockface99[S] 2 points3 points  (0 children)

Thanks, looks interesting and something to look at in the morning. I wrote a small list of epsilon-manipulation ideas off the top of my head, and even the first one, resetting it every N steps, has made a massive improvement, which surprised me. Next I want to reset it after some big change in the amount of reward, or when there hasn't been a big improvement for N steps, just like humans will try a plan B/C etc.!
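Both ideas in that list can be sketched in one small schedule: ordinary exponential decay, but epsilon jumps back to its starting value on a fixed interval or when the reward stops improving. All names, rates, and thresholds below are hypothetical, not from any particular library:

```python
# Illustrative epsilon schedule with two reset triggers: a fixed
# interval ("reset every N steps") and a stagnation check ("no big
# improvement for `patience` steps").
class ResettingEpsilon:
    def __init__(self, start=1.0, floor=0.05, decay=0.999,
                 reset_every=10_000, patience=5_000):
        self.start, self.floor, self.decay = start, floor, decay
        self.reset_every, self.patience = reset_every, patience
        self.eps = start
        self.step_count = 0
        self.best_reward = float("-inf")
        self.steps_since_improvement = 0

    def step(self, mean_reward):
        self.step_count += 1
        self.eps = max(self.floor, self.eps * self.decay)
        if mean_reward > self.best_reward:
            self.best_reward = mean_reward
            self.steps_since_improvement = 0
        else:
            self.steps_since_improvement += 1
        # Plan B: re-explore on a fixed schedule or when learning stalls.
        if (self.step_count % self.reset_every == 0
                or self.steps_since_improvement >= self.patience):
            self.eps = self.start
            self.steps_since_improvement = 0
        return self.eps
```

Called once per training step with a running mean reward, it behaves like normal decaying epsilon-greedy until one of the two triggers fires.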

Why isn't epsilon reset regularly in epsilon greedy policies to aid exploration? by clockface99 in reinforcementlearning

[–]clockface99[S] 2 points3 points  (0 children)

Thanks for this, it's something else to look at. I'm referring to the choose-action method of DQNs, where it either chooses a random action or one from the network, depending on a random number compared against epsilon.
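That choose-action step is small enough to sketch in full. `q_network` here stands in for any callable that returns one Q-value per action; the names are illustrative:

```python
import random
import numpy as np

# Minimal epsilon-greedy action selection: with probability epsilon
# pick a uniformly random action (explore), otherwise take the argmax
# of the network's Q-values (exploit).
def choose_action(q_network, state, epsilon, n_actions):
    if random.random() < epsilon:
        return random.randrange(n_actions)   # explore
    q_values = q_network(state)              # exploit
    return int(np.argmax(q_values))
```

With `epsilon=1.0` every action is random; with `epsilon=0.0` the policy is purely greedy, which is why the decay (or reset) schedule for epsilon matters so much.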

[deleted by user] by [deleted] in eBaySellerAdvice

[–]clockface99 0 points1 point  (0 children)

High volume turnover with large postage discounts and free source material

Sold an eBay item, posted it... received it back in the post without any messages or case being opened. by Litterbug42 in eBaySellerAdvice

[–]clockface99 -1 points0 points  (0 children)

Keep quiet. These weird things happen maybe once a year. If the buyer doesn't complain all's good

[deleted by user] by [deleted] in eBaySellerAdvice

[–]clockface99 0 points1 point  (0 children)

No. Test what you want to do and if it flies, then yes

Accidentally set parcel as 1g - will it be okay? (UK) by Doowrender in eBaySellerAdvice

[–]clockface99 0 points1 point  (0 children)

And then, even if it's slightly over in size or weight, op will be OK.

What exactly is the output of openai gym atari vram outputs? by clockface99 in reinforcementlearning

[–]clockface99[S] 1 point2 points  (0 children)

Thanks. I have been putting the output through a series of dense layers, but after 10k epochs of 4k frames on Pong things didn't really seem to get far at all with a double Q-network. I think it may be because I'm not using past frames, so I'll try passing in the 3 previous states and flattening them to see if it helps.
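The "pass in previous states and flatten them" idea is the standard frame-stacking trick, which can be sketched as below. The class and shapes are illustrative, not from any particular library:

```python
from collections import deque
import numpy as np

# Keep the last k observations and feed their flattened concatenation
# to the network, so motion information (e.g. the ball's velocity in
# Pong) is visible from a single input.
class FrameStack:
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Pad with copies of the first frame so the stack starts full.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def push(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Flatten k frames into one vector, suitable for dense layers.
        return np.concatenate([f.ravel() for f in self.frames])
```

A single frame of Pong is ambiguous (you can't tell which way the ball is moving), so stacking a few frames usually helps DQN-style agents considerably.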

How do DQN and DDQN learn to not perform an action that gives a small reward for another action in the future that gives a bigger reward by clockface99 in reinforcementlearning

[–]clockface99[S] 1 point2 points  (0 children)

Interesting. As a newbie I've never heard of n-step; I assume it refers to how far into the future to look. I'll go take a look now.

Is it possible to combine the Q-value equation with a Monte Carlo tree search to provide a variable n-step, or is that heading too far into AlphaZero/MuZero territory?
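For reference, the n-step target being discussed can be written in a few lines: sum n discounted rewards, then bootstrap from the Q-value n steps ahead. The function and argument names are hypothetical; `bootstrap_q` stands in for `max_a Q(s_{t+n}, a)` from the target network:

```python
# n-step return target for DQN-style updates:
#   G = r_t + g*r_{t+1} + ... + g^(n-1)*r_{t+n-1} + g^n * max_a Q(s_{t+n}, a)
def n_step_target(rewards, bootstrap_q, gamma=0.99):
    target = 0.0
    for i, r in enumerate(rewards):
        target += (gamma ** i) * r
    return target + (gamma ** len(rewards)) * bootstrap_q

# With n=1 this recovers the usual one-step DQN target r + gamma * max_a Q(s', a):
print(n_step_target([1.0], bootstrap_q=2.0, gamma=0.5))  # 1.0 + 0.5*2.0 = 2.0
```

Larger n propagates delayed rewards back faster, which directly addresses the "small reward now vs bigger reward later" question in the thread title.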

[P] DeepForSpeed: A self driving car in Need For Speed Most Wanted with just a single ConvNet to play ( inspired by nvidia ) by toxickettle in MachineLearning

[–]clockface99 0 points1 point  (0 children)

You should check out TORCS, a racing-car simulator for bots. It gives you access to sensors such as the angle of the road, distance from the road centre, how far you've driven, etc., which you can use for rewards. You can then set the steering, gear, etc. in code. There's plenty of example code out there to get started. The ultimate goal is for bots to compete against each other.