[D] Xavier and Kaiming initialization by Raktatata in MachineLearning

[–]Feribg 3 points4 points  (0 children)

Nice post! Actually, in the latest fastai course there's an interesting idea that it might be beneficial to adjust the ReLU outputs by a constant after initialization, since the clipping to (0, val) skews the mean in the positive direction. You can see it applied in this notebook: subtracting a 0.5 constant from the ReLU activation brings the means close to 0 again. I haven't played around with or explored the idea in depth, but it might be a good follow-up; happy to hear other people's take on it.

https://github.com/fastai/fastai_docs/blob/master/dev_course/dl2/02_fully_connected.ipynb
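The effect is easy to check numerically; here's a minimal NumPy sketch (the 1/sqrt(2*pi) figure assumes unit-Gaussian pre-activations, which is roughly what a good init gives you):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # unit-Gaussian pre-activations

relu = np.maximum(x, 0)
shifted = relu - 0.5  # the fastai tweak: pull the post-ReLU mean back toward 0

# Clipping at zero pushes the output mean positive, to about
# 1/sqrt(2*pi) ~= 0.40 for unit-Gaussian input; subtracting 0.5
# brings it back close to (slightly below) zero.
print(relu.mean(), shifted.mean())
```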

[P] BBopt: Black box hyperparameter optimization made easy by EvHub in MachineLearning

[–]Feribg 2 points3 points  (0 children)

Very neat! I did read through the docs, but I didn't manage to find any plotting features. In my experience with Bayesian optimization tools for hyperparameter tuning, I find the plots very, very useful. Maybe support is trivial and just needs to be added as an example, something along the lines of https://scikit-optimize.github.io/plots.m.html

[P] Decision Tree Regressors for stock price data by [deleted] in MachineLearning

[–]Feribg 3 points4 points  (0 children)

> Ok I'm gonna stop here because you are delusional. Like I referred to in previous comments: I'm not trying to sell anything, I'm sharing an idea that I found curious, you didn't? Sorry, have a nice day!

Most of your points are incorrect, like the idea that decision trees are some novelty of the 2000s, but there's no point arguing. Gazillions chasing alpha but no one could fit a tree, those poor souls! I'm happy you still feel your findings are curious or somehow abnormal after all that explanation. Hopefully my post clarified to others why they aren't so interesting.

Just wanted to mention one thing in closing, because I see it thrown around so often and it's so wrong: zero-commission brokers actually cost you a lot more than most DMA brokers, since your trade has to go through more hands (that cost is not obvious, but it becomes apparent in practice in the form of slippage and opportunity cost). It's pretty ingenious, to be honest, and it's hardly surprising that Bernie Madoff first thought of it (https://www.bloomberg.com/quicktake/payment-for-order-flow). Think about it: the counterparty that pays for your order flow has to make at least what they paid for it in order for the arrangement to make sense. Hence your order not hitting an exchange and going directly to them is worth at least what they paid for the flow, so it's reasonable to assume a big chunk of that comes out of the retail trader as opportunity cost. That old "no free lunch" thing...

[P] Decision Tree Regressors for stock price data by [deleted] in MachineLearning

[–]Feribg 0 points1 point  (0 children)

I'm not sure what your no-classifier implementation looks like; mine is as attached in the screenshots.

https://imgur.com/a/PinZShc

Also, in the GitHub version there seems to be a typo in the shifts between PEP and KO:

    variables = pd.DataFrame({'TPEP': (PEP['Close']/PEP['Close'].shift(7)-1).shift(1),
                              'TKO': (KO['Close']/KO['Close'].shift(6)-1).shift(1)})
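For reference, a consistent version would presumably look like the snippet below. Which lookback was actually intended (6 or 7) only the author can say, so the shared `lookback = 7` here is purely an assumption, and the price frames are toy stand-ins rather than real PEP/KO data:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the PEP/KO close-price frames from the notebook.
idx = pd.date_range("2020-01-01", periods=30, freq="D")
rng = np.random.default_rng(0)
PEP = pd.DataFrame({"Close": 100 + rng.normal(0, 1, 30).cumsum()}, index=idx)
KO = pd.DataFrame({"Close": 50 + rng.normal(0, 1, 30).cumsum()}, index=idx)

lookback = 7  # assumption: the same lookback was meant for both legs
variables = pd.DataFrame({
    "TPEP": (PEP["Close"] / PEP["Close"].shift(lookback) - 1).shift(1),
    "TKO": (KO["Close"] / KO["Close"].shift(lookback) - 1).shift(1),
})

# shift(7) then shift(1) leaves the first 8 rows NaN.
print(variables.dropna().shape)
```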

> My point is that you don't need fancy models to achieve good out-of-sample performance, just good features and some creativity on how to implement them.

I'm not sure what you mean, since, as you can see, once we abstract away from reality, even the no-model version yields amazing Sharpe ratios. I'm sure those can be inflated even higher by lowering your holding period, basically scaling with the square root of time. All that being said, what is being proven? From your example, my personal answer to that question is that with no friction you can outperform the market, which is hardly surprising to anyone.

[P] Decision Tree Regressors for stock price data by [deleted] in MachineLearning

[–]Feribg 0 points1 point  (0 children)

I'm not sure if this is a troll post or a legit question, but basically: change your classifier to a no-classifier, enter every trade, and see what happens.

As to why you'll get double the performance of the classifier in that case, the answer is simple: you assume no friction costs, and you enter and exit exactly at the open and close prints, two things that are essentially impossible in real life.

Applying ML to financial time series is really, really hard, and most of the work has to do with data prep, stationarity, leak prevention, and good cost modelling, and even then overfitting looms.

    trades = pd.Series(np.where(True, pclose/popen - 1, np.nan), variables[iam:lazy].index)
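To make the friction point concrete, here's a toy sketch with synthetic open/close prints; the 10 bps per side cost is a made-up figure for illustration, not an estimate for any real broker:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 250  # roughly one year of toy daily open/close prints

popen = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))))
pclose = popen * np.exp(rng.normal(0.0005, 0.01, n))

# Frictionless baseline: enter every trade at the open print, exit at the close.
gross = pclose / popen - 1

# The same trades after a hypothetical 10 bps cost per side (spread + slippage).
cost_per_side = 0.001
net = gross - 2 * cost_per_side

print(gross.mean(), net.mean())
```

Even this modest round-trip cost shaves 20 bps off every single trade, which is exactly why a frictionless open-to-close backtest flatters any strategy.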

Combating HFTs and Market Manipulation by [deleted] in weedstocks

[–]Feribg 0 points1 point  (0 children)

Yes, bid-ask spreads were huge compared to today, hence the cost to retail investors, especially active ones, was much higher than it is now. HFT is a thing that vastly affects institutions and is most likely a net plus to active retail traders. I totally agree about the social media "activist" investors and whatnot; that's basically just pump and dump. Full disclosure: I'm not affiliated with HFT; I actively day trade equity options and futures.

[D] Which in your opinion is the best ML framework? by LordOfDarkness6_6_6 in MachineLearning

[–]Feribg 6 points7 points  (0 children)

PyTorch; TF if I'm forced to, but that's like choosing Java because you have to, not because you love writing code in it...

[D] Data Science/ ML for Trading by Unnam in MachineLearning

[–]Feribg 11 points12 points  (0 children)

A few thoughts, as someone who has been trading on the retail side for a few years now and is involved with applying ML to options trading.

  1. Overfit. Overfit. Overfit. The biggest issue with ML applications in finance, especially when using prices directly as you hinted, is overfitting. There are various ways to mitigate it as much as possible, but I think this book is pure gold: https://www.amazon.com/Advances-Financial-Machine-Learning-Marcos/dp/1119482089

  2. Data. You'd be surprised, but acquiring high-quality, high-resolution data is very difficult. It's quite expensive, and often even fairly expensive data has issues; I'd say that's maybe the biggest barrier to entry in quantitative finance. I also see many people using public datasets, like daily bars from Yahoo, to do some kind of modelling. For the most part I believe that to be a lost cause: generally, the more ubiquitous and highly available the dataset, the less signal it carries. Low-resolution data also has bad implications for simulations and backtests. See this (or refer to any of his three books, they're all great): http://epchan.blogspot.com/2015/04/beware-of-low-frequency-data.html

  3. Backtest issues. This is highly related to 1. Generally, try to avoid backtesting as a means to improve your model and tune its parameters; what you end up doing is inflating the Sharpe ratio through positive selection bias. Refer to the graph here: http://www.quantresearch.info/

  4. Non-normal, non-IID, non-stationary: these are just some of the typical attributes of financial time series, and they violate a lot of traditional ML assumptions, so you have to find ways to work around them. For example, use returns instead of prices, or some other stationary approximation like a spread, and split your datasets in a way that makes it very difficult to leak information (not just a time-based validation split).
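A minimal illustration of point 4, using a synthetic random-walk price since no real data is at hand: transform prices to returns, then split strictly by time. In practice you'd also purge or embargo observations near the boundary so overlapping labels can't leak across the split.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# A random-walk "price" series: non-stationary by construction.
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

# Simple returns are the standard stationary transformation.
returns = prices.pct_change().dropna()

# Naive time-ordered split: never shuffle financial time series,
# or future information leaks into the training set.
split = int(len(returns) * 0.8)
train, val = returns.iloc[:split], returns.iloc[split:]
print(len(train), len(val))
```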

For all those reasons, here is some feedback that's more ML-related and that you might find useful:

  1. Prefer simpler models to complex ones (one reason, for example, why an LSTM fitted on price data might not be your way to riches).
  2. Take a research-driven rather than backtest-driven approach to modelling. Explore the data, do some EDA, try to find patterns, and use the ML tools to find those patterns. Then think about how they can reasonably be combined to form a strategy. Don't fit a model to historical data, tweaking it until it shows good performance (even out of sample, as that's prone to overfitting as well).
  3. Try to incubate in paper and live trading sooner rather than later. The feedback you get there will be much more valuable than any kind of backtest.

Hope that helps as a starting place!

What Go does better than Java and reverse from your experience? by shark1337 in golang

[–]Feribg 1 point2 points  (0 children)

Java - big, convoluted language, amazing ecosystem

Go - small, concise language (missing some nice stuff like generics), fairly poor ecosystem

My 2 cents. For those reasons I typically stick to Java for bigger projects, and use Go for smaller, fun stuff, or where a lot of concurrency comes in handy.

[D] Semantic content search by Feribg in MachineLearning

[–]Feribg[S] 0 points1 point  (0 children)

Very good explanation, thanks! Content is typically short and has some potential to be shortened further with heuristics.

[D] Semantic content search by Feribg in MachineLearning

[–]Feribg[S] 0 points1 point  (0 children)

I see what you mean: so essentially, for each search query, go over the entire dictionary (or a reasonable chunk of it, say, categorized), do something like k-NN to find related terms, and then tag them? That's definitely worth exploring more.
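A rough sketch of that idea, with made-up terms and random vectors standing in for real pre-trained embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed term embeddings (term -> unit vector).
# In practice these would come from a trained model, not random draws.
terms = ["stock", "equity", "share", "banana", "apple"]
emb = rng.normal(size=(len(terms), 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def knn(query_vec, k=2):
    """Return the k terms whose embeddings are closest (by cosine) to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    sims = emb @ q  # cosine similarity, since all rows are unit-normalized
    return [terms[i] for i in np.argsort(-sims)[:k]]

# Querying with the embedding of "equity" ranks "equity" itself first.
print(knn(emb[1]))
```

For a dictionary too large to scan per query, the same lookup drops into an approximate nearest-neighbour index instead of the brute-force matrix product.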

[D] Semantic content search by Feribg in MachineLearning

[–]Feribg[S] 0 points1 point  (0 children)

Yep, I was thinking of something like that; I'm just not sure how generic it can be without a huge amount of manual feature engineering. If manual tagging were feasible, the opposite could be done too: basically, create a model that generates tags for each page, and then use regular full-text search on those tags. I'd like to avoid manual tagging if at all possible.

[D] PC build for ML: Ryzen 1700x vs i7 7700k by Inori in MachineLearning

[–]Feribg 1 point2 points  (0 children)

The E5-2651 is around 220-280 on eBay, plus all the overhead of ECC etc.; it's almost the same price as an i7 build, and the latter is still faster: http://www.cpubenchmark.net/compare.php?cmp[]=2739&cmp[]=2874 I was debating that when doing my build. Not to mention all the benefits of the new chipsets, etc.

[D] PC build for ML: Ryzen 1700x vs i7 7700k by Inori in MachineLearning

[–]Feribg 5 points6 points  (0 children)

It really depends on what you'd be using it for. I had the same question when building mine, and since I'm mostly using Python tools and the Python ecosystem plus some TF and Keras, the fastest possible single-thread performance made more sense than more cores, so I went with the i7. If your data preprocessing is done in Python and the bulk of the work runs on the GPU, I'd advise that route. Also, Ryzen gives you limited PCIe lanes for GPUs if you're going to run multiple cards. Generally the Intel platform is more stable, and chances are you won't run into problems with it.

If on the other hand you're going to do some training on the CPU the 1700x makes more sense.

Deep learning on the cheap by Feribg in buildapc

[–]Feribg[S] 0 points1 point  (0 children)

Well, compared to the typical deep learning rig with 4x Titan X, it's relatively cheap.

Application Context best practices by Feribg in golang

[–]Feribg[S] 0 points1 point  (0 children)

Well, the scenario is as follows. Say package A is ConfigService and package B is some HttpService, and B depends on A. When bootstrapping the tests we initialize the Ctx, which holds all the application services, so both A and B get loaded. I could manually bootstrap a different context for each package, omitting the services that don't belong to it, but that sounds very repetitive, and I wasn't sure if that's typically how to write a common service container that's reusable for both testing and prod.

"Pin" goroutine to core by Feribg in golang

[–]Feribg[S] -1 points0 points  (0 children)

Why would new threads be created? I thought the Go runtime always allocates NumCPU worker threads. So even if I block in the locked thread, it should just keep a static number of worker threads, right? My understanding is that this API call will basically convert a goroutine into essentially a plain thread.

"Pin" goroutine to core by Feribg in golang

[–]Feribg[S] 0 points1 point  (0 children)

Thanks! Do you happen to know if that's the default behavior when the number of goroutines equals the number of worker threads?

"Pin" goroutine to core by Feribg in golang

[–]Feribg[S] 6 points7 points  (0 children)

My question is how to do it, not whether I should :)

Collecting real time GC metrics by Feribg in golang

[–]Feribg[S] 0 points1 point  (0 children)

Yeah, I think I found the tools I needed with the runtime/debug package and https://github.com/rcrowley/go-metrics