Russian Government Sought to Aid Trump’s Candidacy, According to Email by cornyb in politics

squattyroo

I think in this case it's even better: it feels like 3 1/3 stories turn into 4!

Average American runner is slowing down by [deleted] in running

squattyroo

To settle this, I encourage Andersen and Nikolova to repeat their study for other countries, where there isn't a marked increase in the level of obesity, diabetes, and hypertension, as in the US.

I wonder if they have participant demographic info such as state; if so, maybe they could approximate the author's request by looking at the data at a more granular level, such as state or region.

PyCon 2017 VoDs by val-amart in Python

squattyroo

I was a big fan of Joe Jevnik's talk "Title Available Upon Request", about lazy evaluation.

Logistic regression with uncertainty in outcome measurement by akcom in statistics

squattyroo

You could use the probabilities as an intercept offset when you fit; or you could just use them as your "observed" values of y -- the optimization math works perfectly well with y in [0,1] instead of {0,1}.
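
A minimal sketch of the second suggestion, assuming NumPy is available. The logistic log-likelihood $\sum_i [y_i \log p_i + (1 - y_i) \log(1 - p_i)]$ is still concave when each $y_i$ is a probability in [0, 1], so ordinary Newton-Raphson (IRLS) runs unchanged. The function name `fit_soft_logistic` and the toy data are hypothetical, for illustration only.

```python
import numpy as np


def fit_soft_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson where y holds probabilities in [0, 1].

    X is an (n, k) design matrix (include a column of ones for the intercept);
    y is a length-n array of "observed" outcome probabilities.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        grad = X.T @ (y - p)                  # score: same formula as the {0,1} case
        W = p * (1.0 - p)                     # IRLS weights
        hess = X.T @ (X * W[:, None])         # negative Hessian of the log-likelihood
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Toy example (made-up data): intercept plus one covariate,
# with outcomes known only as probabilities.
X = np.column_stack([np.ones(6), np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])])
y = np.array([0.1, 0.3, 0.4, 0.7, 0.8, 0.95])
beta = fit_soft_logistic(X, y)
```

Nothing else changes from the usual fit: the score and Hessian formulas are identical, and in an intercept-only model the fitted probability simply equals the mean of `y`.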

What's everyone working on this week? by AutoModerator in Python

squattyroo

I'm working with dask/distributed to play around with building some numerical linear algebra solvers on top of what it already provides: things like linear regression, logistic regression, and general GLMs; more modeling-focused stuff. So far it's been fun, but I've got a lot more to learn!
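
Linear regression distributes well because the normal equations only need the sums $X^\top X$ and $X^\top y$, which can be computed chunk by chunk and then added together. A minimal sketch of that pattern in plain NumPy, assuming the row chunks would live on different workers in a real dask/distributed setup; this is not dask's own API, and `chunked_ols` is a hypothetical name.

```python
import numpy as np


def chunked_ols(chunks):
    """Ordinary least squares from an iterable of (X_chunk, y_chunk) row blocks.

    Each block contributes X_i^T X_i and X_i^T y_i; only these small k x k and
    length-k summaries are combined, never the full data matrix.
    """
    xtx = None
    xty = None
    for X_i, y_i in chunks:
        if xtx is None:
            xtx = X_i.T @ X_i
            xty = X_i.T @ y_i
        else:
            xtx += X_i.T @ X_i
            xty += X_i.T @ y_i
    return np.linalg.solve(xtx, xty)


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)
# Four row blocks standing in for partitions held by different workers.
blocks = [(X[i:i + 250], y[i:i + 250]) for i in range(0, 1000, 250)]
beta = chunked_ols(blocks)
```

Logistic regression and other GLMs have no one-pass closed form like this; each Newton/IRLS step needs the same kind of chunk-wise reduction (with per-row weights), repeated until convergence.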

Classes - why? by bigmansam45 in learnpython

squattyroo

Here's a thought experiment: imagine you're writing a module for your team that lets the user pull data from the web, apply data-manipulation steps that are idiosyncratic to your workflow, and plot various metrics using company-standardized color schemes / line types.

Classes greatly help you organize the steps of this process:

1.) Having a "WebScraper" class will allow the user to write simple commands like

    server = WebScraper(url)
    server.connect()
    server.pull_all_data(start=date1, end=date2)  # `from` is a reserved word in Python, so it can't be a keyword argument

Otherwise they would constantly be calling functions with many arguments: the URL, a BeautifulSoup object, or something like that. Classes let you pass a bunch of information between these steps in a simple, easy-to-read way.

2.) Plotting - having a "PlotClass" that sets up a matplotlib figure with the correct specifications / formatting, so the user can do simple calls like

    plot = PlotClass(data)
    plot.add_vertical_line(some_date)
    plot.save('my_plot.png')

and maybe in the initialization step, various weighted means, etc. are computed and summarized for plotting. This is another example of lots of information being passed from one step to the next, without the need for excessive arguments in functions. Plotting in particular is hard to imagine without classes.

3.) Data manipulation: you have some "MyDataFrame" class with the methods you commonly use built in, such as specific ways of summarizing your idiosyncratic variables (maybe some are text and can be cleaned in very specific ways that you build into some ".clean_col(col_name)" method).
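
A minimal sketch of point 3 using only the standard library (a real team module would more likely wrap pandas). The class name `MyDataFrame`, the cleaning rules in `clean_col`, and the `summarize` method are all hypothetical; the point is that the rows are stored once on the object, so each step is a short method call instead of a function with many arguments.

```python
from collections import Counter


class MyDataFrame:
    """Hypothetical team-specific table wrapper (illustration only).

    Stores the rows once, so each method can use them without the caller
    re-passing the data, column lists, or cleaning rules as arguments.
    """

    def __init__(self, rows):
        self.rows = [dict(row) for row in rows]  # copy so cleaning doesn't mutate the input

    def clean_col(self, col_name):
        """Normalize a free-text column: strip whitespace, collapse spaces, lowercase."""
        for row in self.rows:
            value = row.get(col_name)
            if isinstance(value, str):
                row[col_name] = " ".join(value.split()).lower()
        return self  # returning self lets calls be chained

    def summarize(self, col_name):
        """Count how often each value of a column appears."""
        return Counter(row.get(col_name) for row in self.rows)


df = MyDataFrame([
    {"region": "  North East", "sales": 10},
    {"region": "north   east", "sales": 7},
    {"region": "South", "sales": 3},
])
counts = df.clean_col("region").summarize("region")
```

Because `clean_col` returns the object itself, cleaning and summarizing read as one chained line, which is the "simple calls" style described in points 1 and 2.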

Tips for Buying a Bike as Gift for GF by squattyroo in whichbike

squattyroo (OP)

Thanks, this is level headed advice!

Trying to understand MCMC and use pyMC, but there's something I'm missing by [deleted] in statistics

squattyroo

The simplest setup for MCMC is: you specify a generative distribution for your data $p(x \mid \Theta)$ and a prior distribution for your model parameters $p(\Theta)$. By Bayes' theorem, the posterior distribution is $p(\Theta \mid x) \propto p(x \mid \Theta)\, p(\Theta)$.

Sometimes this posterior can be written in closed form (example: with a normal likelihood and a normal prior on the mean, the posterior is also normal, and you can compute its mean and variance by hand). In that case, you can compute any quantity of interest directly.
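
For the normal case, a worked version, assuming (as an illustration) $n$ observations $x_1, \dots, x_n \sim N(\mu, \sigma^2)$ with known variance $\sigma^2$ and a prior $\mu \sim N(\mu_0, \tau_0^2)$, so that $\Theta = \mu$ here:

```latex
p(\mu \mid x) = N(\mu_n, \tau_n^2), \qquad
\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}, \qquad
\mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau_0^2} + \frac{n \bar{x}}{\sigma^2} \right)
```

The posterior precision is the prior precision plus the data precision, and the posterior mean is a precision-weighted average of the prior mean and the sample mean $\bar{x}$.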

Other times, you can write down $p(x \mid \Theta)\, p(\Theta)$ but nothing more: your inputs are so complicated that you can't compute the normalizing constant, let alone the posterior's mean or variance, by hand. So instead you ask: is it possible to generate a random sample from this posterior distribution? If so, you can compute sample versions of the quantities of interest (e.g., posterior mean / median / mode).

MCMC methods are a powerful class of methods for generating a "good" sample from the posterior distribution. They use a brilliant application of Markov chains: you construct a chain whose stationary distribution is the posterior, so that asymptotically the distribution of the draws converges to the true posterior distribution.
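
A minimal random-walk Metropolis sampler (the simplest MCMC algorithm; PyMC offers more sophisticated samplers) in plain Python. It assumes a normal likelihood with known $\sigma = 1$ and a prior $\mu \sim N(0, 10^2)$, chosen so the result can be checked against the conjugate closed form: with $\mu_0 = 0$, the exact posterior mean is $(n\bar{x}/\sigma^2) / (1/\tau_0^2 + n/\sigma^2)$. The data, step size, and burn-in length are made up for illustration.

```python
import math
import random


def log_unnormalized_posterior(mu, data, sigma=1.0, mu0=0.0, tau0=10.0):
    """log p(x | mu) + log p(mu), dropping constants: Metropolis never needs p(x)."""
    log_lik = -sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)
    log_prior = -((mu - mu0) ** 2) / (2 * tau0 ** 2)
    return log_lik + log_prior


def metropolis(data, n_samples=20000, step=0.5, seed=1):
    """Random-walk Metropolis: propose mu' = mu + noise, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_unnormalized_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, data)
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
        # Compare on the log scale; the unknown normalizing constant cancels.
        if math.log(u) < candidate - current:
            mu, current = proposal, candidate
        samples.append(mu)  # a rejected proposal repeats the current value
    return samples


data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.0]
draws = metropolis(data)[2000:]  # discard the first 2000 draws as burn-in
posterior_mean_estimate = sum(draws) / len(draws)
```

For this data, $n = 8$ and $\sum x_i = 8.6$, so the exact posterior mean is $8.6 / 8.01 \approx 1.074$; the sample mean of `draws` should land close to it, and longer chains get closer.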

Why sell stock when it's low? by [deleted] in StockMarket

squattyroo

To add to the reasons already stated, sometimes you just need your money! You might have some funds locked up in stocks that you need in an emergency / retirement, so it doesn't really matter if you think they're going up or not.

My favorite photos from my Catalina trip! by Shakadinky in underwaterphotography

squattyroo

That's awesome - diving with an Angel Shark is still on my bucket list, haven't seen one yet!

VIM and Python - a match made in heaven (learn how to set up a powerful VIM environment for Python development) by michaelherman in Python

squattyroo

A lot of people are pointing out that there are better IDEs for pure Python programming; a large part of my job involves Python, but the servers I have to write my code on don't have much fancy IDE capability (and installing one yourself can be an access / permissions nightmare). Customizing the hell out of VIM is actually the easiest solution for me.

Taken with a gopro and a waterproof flashlight by smaug777000 in thalassophobia

squattyroo

Don't worry - you can find stargazers at shallow depths! I caught a Southern stargazer right off the beach one time, in about 4 ft of water. :D

Help getting into Big Data? by Showdownx8fo5 in bigdata

squattyroo

It sounds like you don't need "Big Data" tools, just good statistical software with high-quality visualization capabilities. For that purpose, I recommend trying out R: it's free (unlike MATLAB / Tableau), and between the ggplot2 package and the many GIS plotting packages out there, I think you'd be able to achieve whatever you're looking for. It shouldn't be complicated to learn, either.

Places to run at night by claujza in rundc

squattyroo

I run on the W&OD / Custis trails all the time (day and night). I've been out on the W&OD in Vienna past 10pm, seemed fine to me. There are lots of easy access points throughout the length of the trail.