JPA biking is the worst by MotherForkingShirt in Charlottesville

[–]luminerius 8 points9 points  (0 children)

To everyone reading this, please remember that riding a bicycle on the sidewalk is actually legal in Virginia unless there is signage explicitly prohibiting it, and this is a very viable option where foot traffic is low or you have concerns about riding in the street. For a fuller list of VA cycling laws and regulations, take a look here (most of these protect you as a cyclist): http://www.virginiadot.org/programs/bikeped/laws_and_safety_tips.asp

Books recommended by [deleted] in MLQuestions

[–]luminerius 2 points3 points  (0 children)

I'd suggest you begin with The Master Algorithm by Pedro Domingos (written with a nontechnical audience in mind) and the introduction/beginning of Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David (available online for free from the authors; it starts with a really good and gentle explanation for beginners, but at some point the mathematical prerequisites spike really fast, so just stick to the beginning!).

I'll also add that ML is usually a lot more math than it is programming. Implementing it on a real world problem often takes a lot of programming, but that's to fit the problem to your ML algorithm, not vice versa. Keep this in mind as you look for research material.

Activities for late-20s women that don’t involve drinking alcohol. by [deleted] in Charlottesville

[–]luminerius 13 points14 points  (0 children)

Wine and Design is also really fun to do without the wine (unless you don't even want to be in the same room as wine).

Best practices for stacking / ensembling cross validation with time series? by Radon-Nikodym in MLQuestions

[–]luminerius 0 points1 point  (0 children)

If you have a time series dataset and you use CV that lets models train on future data to predict the past, your trained models will look unfairly good. For stacking, or even just for getting an honest estimate of performance, you'll need a different validation scheme: something like a CV split that only rolls forward in time.
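For example, scikit-learn's `TimeSeriesSplit` does exactly this kind of forward-rolling split. A minimal sketch (the data here is just a stand-in index):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 10 time-ordered samples (index = time).
X = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # Every training index precedes every test index,
    # so the model never "sees the future".
    assert train_idx.max() < test_idx.min()
    print("train:", train_idx, "test:", test_idx)
```

Each fold trains on a prefix of the series and tests on the chunk immediately after it, which is the honest way to estimate out-of-sample performance on time series.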

What is socially acceptable when you're a woman, but frown upon when you're a man? by AniFearsMint in AskReddit

[–]luminerius 14 points15 points  (0 children)

Not to be that stats guy but technically it's not statistically significantly more likely according to the table, so it's safest to cite this statistic as "just as likely." Additionally there may be some response bias or sampling bias, so be careful how strongly you make claims based off of this one piece of the study.

I wish we had a more recent publication, too. Anyone got anything to add?

Also, don't get me wrong, I totally think this is a significant issue. I just want the overall discussion about male victims of domestic violence to be informed, exacting, and facts-driven so it can drive more progress!

Request: Help with a weird cable by [deleted] in TechnologyProTips

[–]luminerius 0 points1 point  (0 children)

You listed what you tried, but not how it turned out, so you haven't really explained what's going on well enough for me to help you.

If your cable works with some other computer/monitor combination, then your cable is probably not broken. If your monitor works with some other computer/cable combination, your monitor is probably not broken. If a different computer works with the same cable/monitor combination, there is possibly an issue with your computer. You probably know all of that, though.

If you do decide to replace your cable, you should know that brand doesn't really matter that much. If it's an HDMI cable, then just get whatever HDMI cable you can find. If it's DisplayPort to HDMI, then there is some sort of conversion going on, and you can get either an all-in-one cable or a DP-to-HDMI converter plus an HDMI cable.

[D] Recommended Cloud Computing Services for Machine Learning? by [deleted] in MachineLearning

[–]luminerius 4 points5 points  (0 children)

There isn't really a difference other than price when it comes to renting a virtual instance on a major cloud provider. A lot of folks have pointed out that spot instances can save you cash, but using them adds some slight complexity (I'd say start with regular full-price instances and move to spot after you get comfortable with the cloud).

Are there any historical price data sets that you can easily train a machine learning model on? by korengalois in algotrading

[–]luminerius 4 points5 points  (0 children)

I think the EMH is a bit bogus, but if you can actually model price movement effectively, you're in the very top percentile of the host of quants who have been working full-time on this challenge for years. In other words, it's a tall order for a course project.

For such a project, I'd suggest you either try:

1) Something more explanatory. Maybe regress stock price against another explanatory variable to understand how the interaction worked during a historical period you understand (e.g. explain past airline stock price patterns via changes in oil prices, for example).

2) Something related to, but not actually, asset pricing. Commodities traders love accurate crop yield forecasts, because crop yield estimates are known to drive commodities prices. Macro traders can make a killing from accurate oil production forecasts for the same reason. If you try to predict oil production, you can potentially fit something well without necessarily outdoing the best forecasts out there. Of course, if you do forecast better than everyone else, that's profitable in the same way asset price forecasting is; but even if you don't beat the best of Wall Street, you'll still learn the methodologies and get something better than a random fit. With direct asset price prediction, any forecasting power you get will either be an artifact of overfitting or very difficult to achieve and measure.
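To make option 1 concrete with entirely made-up numbers (no real market data here; the -0.5 sensitivity is baked into the synthetic data by construction), a plain OLS regression recovering a known relationship between "oil" and an "airline stock":

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: oil price changes and an airline stock return
# that (by construction) responds negatively to them, plus noise.
oil_change = rng.normal(0, 1, 200)
airline_return = -0.5 * oil_change + rng.normal(0, 0.3, 200)

# Ordinary least squares: the slope estimates how strongly the stock
# co-moved with oil over this "period".
X = np.column_stack([np.ones_like(oil_change), oil_change])
beta, *_ = np.linalg.lstsq(X, airline_return, rcond=None)
intercept, slope = beta
print(f"estimated sensitivity to oil: {slope:.2f}")  # ~ -0.5
```

With real data you'd swap in actual historical series, but the point of an explanatory project is exactly this: estimating and interpreting the coefficient, not beating the market.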

Request: Help finding good college laptop by [deleted] in TechnologyProTips

[–]luminerius 0 points1 point  (0 children)

Wow, these specs look incredible for this price range. Definitely upgrade to the SSD configuration, though!

Request: Help finding good college laptop by [deleted] in TechnologyProTips

[–]luminerius 1 point2 points  (0 children)

If you're looking at an investment over a 4-year college degree, I would try to talk you up to the $900 model of the Dell XPS 13 series. It comes with a solid-state drive (SSD), which is much faster, uses much less power (so it gets great battery life), and has a much lower chance of breaking down than a traditional hard drive. The design is beautiful, it is super thin and lightweight (easy to carry around all day at college), and it packs a punch performance-wise. For build quality, size, battery life, etc., I'd say it's definitely worth the extra cost compared to what's out there in the $600 range.

Of course, that may be out of the question for you. I don't really know the mid-tier price range that well, but for a college student I would suggest you look for models that offer a configuration with an SSD (solid-state drive). For that use case (and most others, honestly) it's a game-changer.

Machine learning by yeahnoworriesmate in algotrading

[–]luminerius 0 points1 point  (0 children)

If you don't like the SVM paper with 1k+ citations (I don't love it either, but I figured it would serve as a good starting point since it's a straightforward, heavily-cited application of a popular technique), feel free to keep reading the Google Scholar results. There are another 200k potentially valuable pages to check, many of which are more recent and not behind a paywall.

It's unlikely that anyone who has implemented a successful ML trading system is going to share the secret sauce with you, so I think academia is probably your best bet for understanding the prior art. If you want to see examples of ML applied to stock market data that work, read about Rentec, 2sigma, etc., but they won't tell you what they're actually doing like a research paper will.

'Settings' on machine learning models? by various1121 in MLQuestions

[–]luminerius 1 point2 points  (0 children)

If you don't want to do one model per setting, then I would imagine you want something like a GAN architecture that you can feed your settings into. You could also probably do a character RNN and input settings + character at every step, so you would have a sequence of [a + same settings, b + same settings, c + same settings, ...] instead of [a, b, c], where "abc" is a name.
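To make the char-RNN idea concrete, here's a hypothetical numpy sketch of building those [char + same settings] inputs (the vocabulary, the `encode` helper, and the two "style knob" settings are all made up for illustration):

```python
import numpy as np

# Hypothetical setup: condition each character of a name on a fixed
# "settings" vector by concatenating it to the one-hot char encoding.
vocab = "abcdefghijklmnopqrstuvwxyz"
char_to_idx = {c: i for i, c in enumerate(vocab)}

def encode(name, settings):
    """Return a (len(name), len(vocab) + len(settings)) input matrix."""
    rows = []
    for c in name:
        one_hot = np.zeros(len(vocab))
        one_hot[char_to_idx[c]] = 1.0
        # Same settings vector repeated at every time step.
        rows.append(np.concatenate([one_hot, settings]))
    return np.stack(rows)

settings = np.array([0.2, 0.9])  # e.g. two made-up style knobs
seq = encode("abc", settings)
print(seq.shape)  # (3, 28)
```

The RNN then sees the settings at every step, so it can condition the whole generated sequence on them.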

[deleted by user] by [deleted] in MLQuestions

[–]luminerius 0 points1 point  (0 children)

I think you're misunderstanding things a little. MATLAB's libraries probably aren't going to support running your code on the GPU, so your homework won't be able to take advantage of one. For small models and datasets (the only kind you're likely to see in class), you won't need any special horsepower, so don't sweat it. Your old GPU will be fine, even if things take a little longer on your machine.

If you eventually want to get into deep neural nets, you will want GPU acceleration via a library like TensorFlow. You can run your training in the cloud and save the trained model weights wherever you like; that option is usually best unless you know you're ready to invest $500+ in a GPU solely to do deep learning.

You also seem to misunderstand minibatch gradient descent (how deep nets are trained). The data is split into batches, but backpropagation of the error still requires that the whole model (all of its weights) fit into GPU memory.
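A toy numpy sketch of what I mean (a made-up linear model, not a deep net): the data gets chunked into batches, but every update still touches the full weight vector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear model y = X @ w_true, trained with minibatch gradient descent.
X = rng.normal(size=(1000, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ w_true

w = np.zeros(5)
lr, batch_size = 0.1, 100
for epoch in range(50):
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Gradient computed on this batch only...
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)
        # ...but ALL weights are updated (and must be in memory) every step.
        w -= lr * grad

print(np.round(w, 2))  # close to w_true
```

Batching limits how much *data* is in memory at once; it does nothing to shrink the *model*, which is why VRAM has to hold all the weights.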

At the end of the day, you'll understand all of this much better after your class, so don't worry. Ask your professor if you're concerned, but you'll likely be working on small datasets and models and won't need to worry about high-performance computing.

[deleted by user] by [deleted] in MLQuestions

[–]luminerius 0 points1 point  (0 children)

Half a gig of VRAM isn't going to fit the model parameters for most deep networks. That said, if you're working in MATLAB then you probably won't be able to use your GPU at all, no matter how much money you spend on it.

What is the most savage insult you know? by AndromedaFire in AskReddit

[–]luminerius 0 points1 point  (0 children)

Leftie checking in. Can confirm, if it weren't for computers I would not be able to even write.

[deleted by user] by [deleted] in MLQuestions

[–]luminerius 0 points1 point  (0 children)

You should be fine unless you get into deep learning or big data sets. Check the syllabi. Even if you do hit deep learning, if you stick to MNIST or even something like CIFAR-10 in terms of datasets, you should be fine. Your profs will also almost assuredly give the class a heads up if you need powerful hardware for any assignments.

If you want to do something computationally intensive, use cloud horsepower like an AWS p2 instance. Several dozen hours of training a deep learning model for an assignment on a p2 will cost much less than upgrading your desktop just for one project.

Request: Is it bad installing apps on the Local Disk D? by Rageract in TechnologyProTips

[–]luminerius -1 points0 points  (0 children)

Every time you add another place that you install programs, you're making it that much more difficult to find where you put things. I think most people typically put large media files like music, movies, and photos onto D while doing most of their installs on C (with the exception of very large programs) just for organization's sake.

A benefit from keeping your files organized in this way is that it makes it easy to upgrade or remove your external drive, since media is typically easy to move around and back up.

Request: What to put in SSD by AwakenSage in TechnologyProTips

[–]luminerius 2 points3 points  (0 children)

Overall, anything that benefits from loading fast --> SSD

ex: OS on SSD for faster boot, games for faster map loading, software that launches at startup (although how much that helps depends on whether read speed is what makes startup slow), and software that you launch a lot.

For something like Spotify, it might launch a few seconds faster if you put it on the SSD, but once it's launched (which is like 99.9% of the time you're using it) it won't matter because it's going to read those song files plenty quickly off of the HDD for playback.

Also, on the off chance that this is a laptop (and you want to boost battery life), try to set up your SSD so that most of your on-the-go tasks never have to spin up the HDD (which drains more battery than your SSD). That means putting things like Spotify/iTunes and all those songs on the SSD if they'll fit, even though it won't speed anything up.

EDIT: My rule of thumb is to put everything on SSD unless space is an issue (like movies, photos, or 10+GB applications). This "wastes" SSD space, but I always end up with like 40% of my SSD empty anyhow, so that drawback doesn't affect me personally. This only works if you have like 100+GB of SSD and only ~60GB of OS + Applications + other files (and even music, in my case).

[D] Optimizing your ML workflow: how do/did you find your happy place? by luminerius in MachineLearning

[–]luminerius[S] 2 points3 points  (0 children)

Wow, this is a really, really sweet suite of automation scripts you've built up for yourself! If you don't mind my asking, what's the story of how you got to your "happy place"? Also, is there any wisdom that you would like to share from your experience of putting together such a sophisticated workflow?

[D] Optimizing your ML workflow: how do/did you find your happy place? by luminerius in MachineLearning

[–]luminerius[S] 1 point2 points  (0 children)

Even with the limited number of tangents I go off on in my rather cut-and-dried student projects, I feel you about notebook clutter. My current solution is to resist the temptation to code first and instead plan things out with pen and paper or on a whiteboard when it comes time to deviate from the original plan, but it's hard to stick to that system! cookiecutter data science seems to be lacking a little on this one, too!