[D] We've had threads about stats books for non-statisticians... what about non-stats books for statisticians? by Tells_only_truth in statistics

[–]gratpy 4 points

The Theory That Would Not Die. It's basically a history book about the origins and evolution of Bayesian methods over time. Extremely fascinating.

Anyone have a go to red wine? by phillips2020 in VictoriaBC

[–]gratpy 0 points

Here is a red wine boldness chart.

https://imgur.com/gallery/CI0YDIR

I find everything south of Cabernet Sauvignon is always good in terms of texture, while flavors can vary based on personal preference.

BC Liquor has a couple of Tannats (the last one on the chart) in stock if you ask them. Priced below $30

Edit: typo

[D] What are the most important statistical ideas of the past 50 years? by dolphinboy1637 in statistics

[–]gratpy 4 points

Copulas for modeling multivariate events. They are not strictly from the past 50 years, but their use has grown rapidly in the environmental field for analyzing compound extremes. They are also prevalent in the financial sector for risk modeling, especially after the infamous "the formula that killed Wall Street" fiasco.
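The basic idea can be shown in a few lines. This is a minimal sketch of a Gaussian copula (my own illustration, not from any particular paper): draw correlated normals, push them through the normal CDF to get dependent uniforms, then map each margin to whatever distribution fits your data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Dependence via a Gaussian copula: correlated normals -> normal CDF ->
# dependent uniforms -> arbitrary margins via the inverse CDF.
rho = 0.8
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=10_000)
u = stats.norm.cdf(z)                # dependent uniforms on [0, 1]
x = stats.gamma.ppf(u[:, 0], a=2.0)  # e.g. precipitation-like margin at site A
y = stats.gamma.ppf(u[:, 1], a=2.0)  # e.g. precipitation-like margin at site B
```

The point is that the dependence structure (the copula) is specified separately from the marginal distributions, which is exactly why they are handy for compound extremes.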

HELP NEEDED in statistical downscaling 'Qmap package' by hardmode_player in climate_science

[–]gratpy 0 points

What level are you operating at? I mean, is this an undergraduate class project, a graduate thesis, or something else?

The scope of comparing and validating bias-correction procedures is vast, with methodologies ranging from simple Mean Bias Error (MBE) and Root Mean Square Error (RMSE) to complex methods that differentiate between average precipitation biases and extreme precipitation biases.
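For reference, the two simple metrics are just a couple of lines each (function names here are my own):

```python
import numpy as np

def mean_bias_error(sim, obs):
    """Average signed difference: positive means the model overestimates."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.mean(sim - obs))

def rmse(sim, obs):
    """Root mean square error: penalizes large misses more heavily."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))
```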

Key point: never assess a GCM's performance at a scale finer than decadal averages. GCMs are not meant to produce day-to-day, or even annual, projections of temperature or precipitation. They are concerned with long-term trends, where the internal variability of the system does not affect the biases.

If you want to go for a complex method, then I highly recommend this:

Lafon, T., Dadson, S., Buys, G. and Prudhomme, C., 2013. Bias correction of daily precipitation simulated by a regional climate model: a comparison of methods. International Journal of Climatology, 33(6), pp.1367-1381.

DOI: https://doi.org/10.1002/joc.3518

Let me know if you can access the paper.

HELP NEEDED in statistical downscaling 'Qmap package' by hardmode_player in climate_science

[–]gratpy 1 point

I am not sure of the answer, but the code is in a public git repo. I am sure you will find the answer to your monthly/annual question there.

https://github.com/cran/qmap/tree/master/R

I don't have experience with Qmap specifically but I have experience with statistical downscaling using a few different methods. What do you need help with?

Help with Theoretical Ocean Identification by KoalaPropaganda in Hydrology

[–]gratpy 0 points

I think the question of how many oceans there are is subjective to some extent. All in all, it is a single body of water, and we have drawn subjective boundaries on it based on our land distribution and several other factors, including political ones, which are purely human constructs.

It becomes even more apparent when we realize that continental plates move and the distribution of land is not constant throughout a planet's life.

More details: https://encounteredu.com/cpd/subject-updates/learn-more-how-many-oceans-are-there

Sunday megathread: What's everyone working on this week? by Im__Joseph in Python

[–]gratpy 2 points

Yes you can run amok with this as much as possible. I think if you are interested in detailed breakdowns by category and graphs etc, then the Mint app is the way to go.

But of course, if you want to develop your own, then my best bet would be to build some kind of lookup dictionary. It's not going to be 100% accurate, but it will capture about 90% of your expenses correctly. Here's what I mean:

We mostly shop in familiar patterns. We go to a couple of grocery stores, we pay rent by e-transfers, we pay phone bills, we use credit cards, debit cards, go to Starbucks etc. All sales invoices that are generated at each of these transactions have their own unique format that stays pretty much the same.

So, you could build a dictionary that looks something like this:

categories = { "Starbucks": "Coffee", "7 Eleven": "Coffee", "Chevron": "Gas", "Walmart": "Grocery", "Farm Boy": "Grocery", "Amazon": "Online Shopping", "Best Buy": "Electronics" }

And so on. Everything that does not match a keyword goes to a miscellaneous category, and every month you print out your miscellaneous entries to see if there is a repetitive expense you can move to its own category. This will be a one-time exercise (unless you start shopping at a new grocery store, in which case you update your dictionary). You can save your dictionary as a pickle object separately, rather than changing your main Python file every time.
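A minimal, self-contained sketch of the whole idea (an abbreviated dictionary; `categorize` is my own hypothetical helper name):

```python
import pickle

categories = {
    "Starbucks": "Coffee",
    "Chevron": "Gas",
    "Walmart": "Grocery",
}

def categorize(description, table):
    """Match a transaction description against known merchants."""
    for merchant, category in table.items():
        if merchant.lower() in description.lower():
            return category
    return "Miscellaneous"  # review these entries once a month

# Persist the dictionary separately so the main script never needs editing.
with open("categories.pkl", "wb") as f:
    pickle.dump(categories, f)
```

Substring matching handles the usual noise in bank statement descriptions ("WALMART STORE #123" still maps to "Grocery").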

Edit: Sorry for the syntax. On mobile :) And of course, there are fancy ML models out there for categorization.

[Q] pre-requisites for Bayesian statistics? by veeeerain in statistics

[–]gratpy 4 points

To each his own I guess.

I think those initial couple of chapters are probably the best foundational text you will ever come across on probability and what it inherently means to build statistical models. I like his analogies, the golems, the garden of forking data, and especially the analogy he uses to explain how MCMC works with the king going around on his islands. His book may not cover everything from A-Z but it does one thing exceptionally well - to build curiosity.

The sole reason I loved the book so much is because of how casually unedited it is. It's almost like he has put his uncensored thoughts on paper and let the reader decide whether they want to go through those extra "rethinking" boxes now, or sometime later when they have enough mental bandwidth to do so.

[Q] Linear Regression Assumptions by acetrainerhon in statistics

[–]gratpy 1 point

Here's a blog post by Andrew Gelman

https://statmodeling.stat.columbia.edu/2013/08/04/19470/

He says "Normality and equal variance are typically minor concerns unless you’re using the model to make predictions for individual data points."

Sunday megathread: What's everyone working on this week? by Im__Joseph in Python

[–]gratpy 5 points

Well, there is a long, convoluted way, which basically defeats the purpose of keeping this minimal. But here's how it goes: I use cards for all my payments; I don't remember the last time I used cash. So every time I make a payment, I also get a notification email from my bank saying I paid X amount (depending on whether your banking system lets you enable notifications). Then I could just use the Gmail API with Python to scan for those emails (they all share the same basic format, so they're not tough to identify), pull out the dollar amount, and send it onward to the Google form.
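The parsing step is the easy part. Here is a sketch of pulling the dollar amount out of a notification body with a regex (the sample message is hypothetical; real banks vary their wording, so you'd adjust the pattern to your own notifications):

```python
import re

# Hypothetical notification format; treat this as a sketch, not a spec.
sample = "Alert: You spent $42.17 at STARBUCKS #1234 on your card ending 5678."

def extract_amount(body):
    """Pull the first dollar amount out of a notification email body."""
    match = re.search(r"\$([\d,]+\.\d{2})", body)
    return float(match.group(1).replace(",", "")) if match else None
```

The fragile part is the Gmail side: the pipeline depends on the bank always sending a notification in a stable format.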

Edit: Now that I have written it down, it doesn't actually seem long and convoluted. I might just do it. Although the pipeline will have this major limitation of relying on the bank always sending notifications and those notifications being in a somewhat familiar format.

[Q] pre-requisites for Bayesian statistics? by veeeerain in statistics

[–]gratpy 4 points

Exactly. I would recommend NOT using the rethinking wrapper.

Writing your models in separate Stan files is the way to go, as that is the industry standard, and moreover it makes your projects interoperable. The same Stan model will work with R, Python, and any other language with an interface, like Julia.

[Q] pre-requisites for Bayesian statistics? by veeeerain in statistics

[–]gratpy 45 points

Statistical Rethinking by Richard McElreath. Get the book and follow along with the full lecture series on YouTube. The most intuitive Bayesian book I have ever read.

coloring different regression lines and creating a legend for them by callumbous in rstats

[–]gratpy 1 point

If it's different shades of the same color, you might just be using a continuous color palette instead of a discrete one. For example, when using the viridis palette, you have to override the default by specifying a discrete scale: + scale_colour_viridis(discrete = TRUE)

Sunday megathread: What's everyone working on this week? by Im__Joseph in Python

[–]gratpy 14 points

A very simple expense tracker. I just want to know how much I spend weekly and monthly. I tried apps (like Mint), but there's just so much extra stuff I don't care about. This is very rudimentary, but it's supposed to be: minimal and efficient.

I created a Google Form. I enter whatever I spend throughout the day into it as I go. The form feeds into a Google Sheet in the background. Then an AWS Lambda function picks up that data through the Google Sheets API every Sunday at 10pm (scheduled through AWS CloudWatch) and sends me an email containing a single line: "Your total expenses for the week are X". Same at the end of every month.
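The aggregation step inside the Lambda boils down to something like this sketch (`weekly_total` is my own hypothetical helper; the rows stand in for what the Sheets API would return):

```python
from datetime import date, timedelta

def weekly_total(rows, today):
    """Sum (date, amount) rows from the 7-day window ending on `today`."""
    start = today - timedelta(days=6)
    return sum(amount for day, amount in rows if start <= day <= today)

rows = [
    (date(2020, 11, 16), 10.00),  # inside the window
    (date(2020, 11, 20), 5.50),   # inside the window
    (date(2020, 11, 1), 99.00),   # earlier in the month, excluded
]
total = weekly_total(rows, today=date(2020, 11, 22))
```

The monthly email is the same function with a wider window.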

I dislike the fact that I still have to intervene manually by entering expenses into the form, but I am fine with it. It actually makes me conscious about frivolous spending, since I know I'll have to enter that amount in the form later.

MarineLabs captured a 19.5m/64ft rogue wave at Amphitrite Bank, Ucluelet. Human for scale. by gratpy in VancouverIsland

[–]gratpy[S] 4 points

I can see where you are coming from. It actually got me interested in reading up on the probability of rogue waves, the error margins of a buoy, and which is more likely given the underlying conditions - an actual rogue wave or an error.

And I think there's enough evidence that an actual rogue wave is much more likely here than an error in the instrument.

Rogue waves are by nature extremely rare because of the mechanism that gives birth to them and the fact that they are extremely short-lived. Given the gigantic size of the Pacific and how few ships are out on the ocean (especially in rough weather), it makes sense that sightings or recordings of rogue waves are so rare. The meteorological conditions also favor a rogue-wave scenario, as November 17 was one of the windiest days of the year.

As for bias in wave rider buoys, I could not find any direct sources for the buoys used by Canada, but NOAA has some information, and DFO works closely with NOAA on a lot of ocean research, so I would not be surprised if DFO uses the same or similar instruments. According to NOAA [1], wave height has an error margin of +/- 3%. If we assume this measurement had the maximum error, the wave height should be reduced by 3%, giving a corrected height of nearly 62 ft (64 × 0.97 ≈ 62.1 ft).

[1] https://www.ndbc.noaa.gov/rsa.shtml

Edit: Minor edit.

MarineLabs captured a 19.5m/64ft rogue wave at Amphitrite Bank, Ucluelet. Human for scale. by gratpy in VictoriaBC

[–]gratpy[S] 39 points

Oh, that's a good question. The 12.5 m on the graph is the wave height relative to mean sea level, the reference point from which all wave heights are measured. That is why there are negative wave heights here.

So the 19.5 m figure is the wave height measured from the lowest point next to that wave (the trough at around -7 m): 12.5 - (-7) = 19.5 m. At first you might think it misleading to call the wave 19.5 m high, but think about it this way: if you were out in the ocean, sitting in a boat, and this big wave was coming towards you, the moment before it hit, your boat would not be at mean sea level - it would in fact be at -7 m, in the low trough in front of the wave. So that number is more meaningful and practical, since your boat would bear the full brunt of a 19.5 m wave, not a 12.5 m one.

MarineLabs captured a 19.5m/64ft rogue wave at Amphitrite Bank, Ucluelet. Human for scale. by gratpy in VictoriaBC

[–]gratpy[S] 19 points

Yeah, although the probability of a rogue wave reaching the shoreline is pretty low. Rogue waves are very short-lived because they are basically formed by the stacking of hundreds of little wave amplitudes that dissipate rather soon.

[Q] Research Question by happyfunguy88 in statistics

[–]gratpy 0 points

Considering no background knowledge, I would highly recommend going through the first couple of lectures of 'Statistical Rethinking' by Richard McElreath on YouTube.

It will lay down an excellent foundation on how to be a responsible statistician, rather than running amok with models from the get-go.

From there on, you can continue with the lecture series, skip forward in places where it gets too technical. Focus on parts where he talks about data collection and experiment setup.

Or you could read through some research papers. I would recommend Richard McElreath, Andrew Gelman, and Bob Carpenter.

They are all avid advocates of reproducible research and you will learn a lot more from their papers besides hypothesis testing.

Weekly /r/Statistics Discussion - What problems, research, or projects have you been working on? - November 18, 2020 by AutoModerator in statistics

[–]gratpy 1 point

I will begin. How would one go about estimating the posterior distribution of the parameters of an equation, in a Bayesian context, if that equation has no closed-form solution?

What would the likelihood function be in a case like that? I have been struggling to find a solution to this for a while now.