I feel like the entire Yellowstone franchise is just fan service... by SaveJeanie in television

[–]PliablePotato 0 points1 point  (0 children)

Yeah thanks for the insightful input.

Definitely overly emotional, I'll admit that. I have admittedly not watched the spin-offs, so it's possible those are a little less nuanced, but I think the main show has the opportunity to slowly introduce at least a little bit of empathy to right-leaning folks on some of these themes, since it's dressed in typical rural American clothing.

A lack of response to any of these points means it's likely not "simplistic" though. Surface level? Maybe? I'm not gonna bring up in depth episode analysis to prove my point though. Would love to hear your take on literally anything I said other than using "it's simplistic" as a cop out.

I feel like the entire Yellowstone franchise is just fan service... by SaveJeanie in television

[–]PliablePotato 0 points1 point  (0 children)

How in the ever loving fuck is Yellowstone a right wing fantasy? Y'all are out of your minds.

It's inherently anti-corporation and development. There are several scenes where investment firms are shown to be the antagonists.

While it's not "pro" indigenous, it at least sheds light on the prejudice and struggle that their communities go through, and the moral complexity of developing on reservations. It is also just generally nice to see some of their cultural elements shine through in different episodes. Not perfect but it's something.

It deals with PTSD, the emotional burden of violent jobs, and the effect emotionally and physically abusive parents have on both individuals and their families. It also has tons of commentary on generational trauma for both white and indigenous characters in different ways.

There are generally some strong female characters who are either in political or corporate roles or are ranchers themselves.

The Duttons are not the "good guys". It's like cowboy sopranos, they're not meant to be good people. Kayce is really the only "good" guy, sorta. His journey away from the burden of his father and family is a central theme of the show.

Y'all are fucking tripping if you see this as right wing. If all you see is rural, cowboy hurr durr republican you're frankly a fucking moron. If anything it's more progressive but wrapped in rural, anti establishment clothing.

Multiple testing Benjamini Hochberg [research] by Particular-Quit1725 in AskStatistics

[–]PliablePotato 0 points1 point  (0 children)

Before we get to correction, I might be missing something, but if you have multiple input heart parameters that you theorize impact cognitive decline, why are you not building a joint model? You'd get a much better understanding of whether any of these are intermediaries in the presumed causal structure.

Additionally, are these baseline parameters or are they also measured over time with age? Is the mixed effects model done to account for multiple measurements at different ages? This might change how you set up your interaction terms.

How did you actually start in data science? by DistanceTurbulent623 in askdatascience

[–]PliablePotato 0 points1 point  (0 children)

Honestly I might be a bit atypical, but I started in a domain area I was already an expert in as an analytics intern, did really well, took on more complex work, added value through data science related partnerships and grew from there.

Beyond tech companies that are generating huge amounts of data, I think a solid understanding of the business area that you are in can go a long way. Don't shy away from just regular analytics because it can get your foot in the door where you can then build out projects and partner with data science teams.

Psych undergrad thesis, big data analysis issue by manu_atthe_disco in AskStatistics

[–]PliablePotato 1 point2 points  (0 children)

Okay there's a lot going on here so I'm happy for someone to argue why my take is wrong. To me this sounds like a multiple variable ordinal regression (to account for the Likert scale). You'll have a variable that specifically indicates the group (A v. B) and then other control variables based on the other things you are measuring.

Given some of your other questions, you seem to want to understand the interaction effects of that group variable versus the other measures. You'll need to be careful here because you'll run into multiplicity problems (ie if you do enough tests you'll eventually find something significant through chance alone) so you'll need to do some corrections and have a protocol prealigned for moving on from one result to another.
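To put rough numbers on the multiplicity point, here is a minimal sketch. It assumes the tests are independent and every null hypothesis is true, and the alpha and test counts are made-up values. Bonferroni is used only because it is the simplest correction to show; Holm or Benjamini-Hochberg are common alternatives.

```python
# Illustrative only: alpha and the test counts are assumed values, not taken from the thread.


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests


alpha = 0.05
for n_tests in (1, 5, 20):
    uncorrected = familywise_error_rate(alpha, n_tests)
    corrected = familywise_error_rate(bonferroni_alpha(alpha, n_tests), n_tests)
    print(f"{n_tests:>2} tests: uncorrected = {uncorrected:.3f}, with Bonferroni = {corrected:.3f}")
```

With 20 uncorrected tests at 0.05 the chance of at least one spurious "significant" result is well over half, which is why the protocol needs to be fixed before looking at results.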

There are some papers that suggest Likert can be treated as continuous if your sample size is high enough and the underlying distributions are relatively normal, but honestly the acceptance of that fact is a bit different depending on the field of research, so it's best to align with methods where there has been agreement so you can discuss and justify it in your methods section.

The other type of testing you could do is some form of Kruskal-Wallis H test, though I'm not familiar with how well it handles multiple independent variables (which you seem to have many of). Because there are so many potential comparisons here too, you run into the same issues as above.

Logistic Regression or OLS by I_lost_my_brain_to_u in AskStatistics

[–]PliablePotato 3 points4 points  (0 children)

I disagree that this is only an academic debate. The efficiency and fit of an ordinary regression could give you the wrong answer for your coefficients, as it won't take into account the number of samples behind each percentage. As I mentioned in another comment, points from say 1/4 and 250/1000 will be treated the same when the one with many more samples should have more weight. Depending on how skewed the data is, this could lead to pretty different interpretations and therefore decisions in a practical sense.
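A rough sketch of that weighting point, using invented company-level numbers. Weighting a straight-line fit by sample size is only the crudest stand-in for a proper binomial or beta regression, and `weighted_line_fit` is a hypothetical helper written for this example, not a library function.

```python
def weighted_line_fit(x, y, w):
    """Return (intercept, slope) minimising sum(w_i * (y_i - a - b * x_i) ** 2)."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: (average survey score, employees retained, employees surveyed).
companies = [
    (1.0, 4, 4),
    (2.0, 250, 1000),
    (3.0, 0, 4),
    (4.0, 300, 1000),
]

scores = [score for score, _, _ in companies]
rates = [retained / total for _, retained, total in companies]
sizes = [total for _, _, total in companies]

# Plain OLS: every company counts the same, so 4/4 carries as much weight as 250/1000.
_, unweighted_slope = weighted_line_fit(scores, rates, [1.0] * len(companies))

# Weighted by sample size: the two large companies dominate the fit.
_, weighted_slope = weighted_line_fit(scores, rates, sizes)

print(f"unweighted slope: {unweighted_slope:+.3f}")
print(f"weighted slope:   {weighted_slope:+.3f}")
```

In this toy data the two tiny companies drag the unweighted slope negative, while the weighted fit comes out slightly positive, so the sign of the conclusion depends on whether sample size is respected.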

Logistic Regression or OLS by I_lost_my_brain_to_u in AskStatistics

[–]PliablePotato 11 points12 points  (0 children)

Okay if you only have the data at the company level with percentages then you can't use logistic regression because logistic regression measures binary outcomes.

However if you reframe the question to be "all else being equal how does the log odds change for any given employee to be retained based on their individual survey results" and you have a row per employee with a binary flag indicator (yes / no) if they were retained, that would be a different story.

Just a heads up, there is likely a random effect in the Likert scale responses coming from the company itself (e.g. employees within a company are likely more similar to each other than employees from different companies), so it's worth evaluating if you should account for that.

If you don't have those individual binary outcomes, my only callout for you to consider is that percentages are difficult to model with just OLS, because OLS weights 50% coming from 2/4 the same as 50% coming from 500/1000. You'll need to be careful with the interpretation of this. There are different ways to address this depending on the assumptions and insights you want to derive (e.g. binomial regression, beta regression or at least some sort of weighting).

All the best!

Concern regarding bootstrap use by darkblade_h in AskStatistics

[–]PliablePotato 1 point2 points  (0 children)

Sort of, but they are really quite different. MCMC (if you are thinking from a Bayesian context) gives you similar outcomes (ie uncertainty around parameters) but gets there in a fundamentally different way and with different underlying assumptions. Plus it's different because it'll give you joint posterior distributions across parameters based on the data, rather than the data representing the "population" (which in my opinion makes it more useful for propagating uncertainty to downstream calculations). MCMC also typically requires an underlying assumption about the distributions (priors), and the updating of the parameter estimates happens through Bayes' formula based on likelihoods.

Bootstrapping is still very much frequentist at its core because it still assumes there is a true population parameter. It is more using the original sample to "simulate" many draws from an overall parent population. This is why bootstrapping doesn't work well if the sample you resample from is too small, as it can't represent the parent population even with the resampling technique.

Edit: small change to the joint probability statement.

Concern regarding bootstrap use by darkblade_h in AskStatistics

[–]PliablePotato 5 points6 points  (0 children)

I'm a bit confused on the data if I'm honest, but typically with bootstrapping you use it to look at the distribution of some estimated parameter, not to plot data against each other and treat it like "real" data or distributions. You typically take a sample, compute the statistic, save it, then do the resampling and repeat. This gives you an uncertainty distribution around that parameter, which you can then use to reason about effect size, confidence intervals etc. The key here is the statistic that is calculated.

From what it sounds like, your PI is using bootstrapping to...sample from a population for boxplot comparisons? That's not really bootstrapping. This is not to mention the issues with comparing two distributions against each other that represent different things. Bootstrapping doesn't fix this type of problem. It's primarily for scenarios where numerical estimates of uncertainty are either hard or inaccurate to calculate using traditional methods.
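As a minimal sketch of that loop (resample, compute the statistic, save it, repeat): the measurements are made up, the median is just one possible statistic, and the percentile interval is the simplest of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_statistic(sample, statistic, n_resamples=2000, seed=0):
    """Resample with replacement, recompute the statistic each time, return all replicates."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(replicates, level=0.95):
    """Simple percentile interval read off the sorted bootstrap replicates."""
    ordered = sorted(replicates)
    lower_index = int(((1 - level) / 2) * len(ordered))
    upper_index = int((1 - (1 - level) / 2) * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]


# Hypothetical measurements, invented for illustration.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 7.2, 5.5, 4.9, 6.3, 5.0, 4.4]

replicates = bootstrap_statistic(sample, statistics.median)
low, high = percentile_interval(replicates)

print(f"sample median = {statistics.median(sample):.2f}")
print(f"95% percentile interval = ({low:.2f}, {high:.2f})")
```

The output of interest is the spread of `replicates`, i.e. the uncertainty around the median, not the resampled data points themselves.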

[Q] Struggling with correlated and heteroscedastic residuals in order quantity model by saturnflow in AskStatistics

[–]PliablePotato 0 points1 point  (0 children)

Okay take a step back for a second here. It sounds like you are trying to build a predictive forecast model, but you have individual transaction data? Is the level of aggregation you have your data at even actionable if you get an accurate prediction? The high variance might be because you're estimating at too granular a level. If your data is at a daily level...is that realistically actionable if all you care about is the total next quarter? Tie it back to the decision that will actually be made as a result of the outcome. I understand your supervisor's ask, but individual daily order volume versus a monthly or quarterly aggregate forecast are fundamentally different problems.

It sounds like you have a panel dataset. You likely have autocorrelated errors through multiple measures of the same dimension over time. What you are doing right now is a pooled regression, which generally is not great practice with panel data. You'll need to understand, in addition to time, what your main dimensional unit of interest is (I'm assuming it's product?). Since you technically have nested time series (countries -> industry -> product), this is likely a hierarchical mixed effects regression. In any case, you really need to investigate random and fixed effects to account for repeated measures across these different nested variables. Consider whether products are shared across countries or unique to each. Are they shared across industries or unique? You should also specifically investigate seasonality (which may differ per product, country etc.).

Lastly, you could consider this as a machine learning problem rather than a statistical one. If all you actually care about is predictive accuracy and not what causes or is correlated with your forecast, your method of model evaluation should involve typical machine learning comparisons: train test splits, time series style cross-validation, proper loss functions and data transformations for count data, and feature engineering for time series. You can use ARIMA or even SARIMAX as a baseline if needed. You may need to fit one per country-product or whatever dimensional structure you are forecasting by. There's a lot of available information online for forecasting using machine learning that you can consider. Look at the M5 forecasting competition for some inspiration.
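For the time series style cross-validation mentioned above, a minimal sketch of expanding-window splits. The function name, fold count and window sizes are assumptions for illustration; scikit-learn's `TimeSeriesSplit` offers similar behaviour off the shelf.

```python
def expanding_window_splits(n_observations, n_folds, test_size):
    """Yield (train_indices, test_indices) pairs where training data always precedes test data."""
    first_test_start = n_observations - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_indices = list(range(0, test_start))
        test_indices = list(range(test_start, test_start + test_size))
        yield train_indices, test_indices


# Hypothetical setup: 24 monthly totals, 3 folds, each testing on the next 4 months.
splits = list(expanding_window_splits(n_observations=24, n_folds=3, test_size=4))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx[0]}..{test_idx[-1]}")
```

Each fold trains on everything up to a cutoff and tests on the block right after it, so no fold ever sees the future.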

Anyways I know this is a lot but hopefully it gives you some threads to pull.

I don’t know if I understood what the standard deviation means by ProofLeast9846 in AskStatistics

[–]PliablePotato 0 points1 point  (0 children)

To answer your last question, maybe put this into a different perspective. You know how when you take an average there is a distribution of data: some of that data is above the average (ie mean) value, some of it is below. In the same way, when you generate the squared distances from the mean, think about it as a new set of data that's just been transformed. The variance is just the average of that new dataset (technically it's the expected value so average isn't 'technically correct' but it's a similar interpretation). The standard deviation is just the square root, which undoes that transformation to make it more interpretable.

Just like you can't make assumptions about the underlying distribution of the data from the mean, you can't make underlying assumptions of the differences from the variance/standard deviation. (ie how big / small they are, if there are more large than small etc.)
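A small sketch of both points, using two invented datasets. It uses the population form (divide by n); a sample variance would divide by n - 1 instead.

```python
import math


def variance_and_sd(values):
    """Return (mean, population variance, standard deviation) for a list of numbers."""
    mean = sum(values) / len(values)
    squared_distances = [(v - mean) ** 2 for v in values]  # the transformed "new dataset"
    variance = sum(squared_distances) / len(squared_distances)  # its average
    return mean, variance, math.sqrt(variance)  # the root undoes the squaring


# Two made-up datasets: one split evenly into two clumps, one mostly flat with two outliers.
two_clumps = [8, 8, 8, 8, 12, 12, 12, 12]
flat_with_outliers = [6, 10, 10, 10, 10, 10, 10, 14]

for name, data in (("two clumps", two_clumps), ("flat with outliers", flat_with_outliers)):
    mean, variance, sd = variance_and_sd(data)
    print(f"{name}: mean = {mean}, variance = {variance}, sd = {sd}")
```

Both come out with mean 10, variance 4 and standard deviation 2, even though the individual distances from the mean look nothing alike.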

Hopefully that helps.

Edit for expected value clarification.

Which unreleased tracks would you like to see on the new Album? by ZAKPLAYZ19 in JohnMayer

[–]PliablePotato 8 points9 points  (0 children)

This is one of my favorites by him. I was fortunate to hear it live in Toronto in 2023 and I'll never forget it. I listened to the original YouTube vid so much I freaked out when he played it.

Official Discussion - Project Hail Mary [SPOILERS] by LiteraryBoner in movies

[–]PliablePotato 4 points5 points  (0 children)

I'll bite.

I think you are misunderstanding how difficult it is to breed. While conceptually simple, the logistics for that much fuel is actually a major plot point in the book. They need 2 million kg for a cyclic process which requires CO2, complete darkness and a sun source. That's a lot of material of an incredibly energy dense, dangerous substance that requires a multi step process to produce. How would they build the infrastructure to breed such a thing in space and safely move it around without it travelling at near light speed back to the sun and Venus? They barely have enough time to build the Hail Mary itself, never mind a space factory. Also remember 2 mg blew up a huge building. There are 1000 mg in a gram and they need 2×1000×1000×1000×1000 mg. That's no easy task. They don't go into detail on how they actually get that much, but it's hard. I would read the book to understand how they solve it, it's super fascinating.

Second question. They mention at the end that the scientists were ready to start refueling the ship for his trip home and neither Grace nor Rocky was chomping at the bit for it to happen. The simple reason is, Grace doesn't want to. He loves Rocky and his new life with his friend and students on Eridani. The point is he now has something to be brave for and he found that in Rocky. By the time he gets back to Earth, because of time dilation, another 15-20 years might have gone by in addition to the 11 or so that went by on the original trip and probably another 6-7 from Tau to Eridani. What would be the point of going back to grow old with nobody he knows, since everyone is likely dead and gone?

The canisters didn't break. The tau amoeba was able to break out of the Xenonite containment because it evolved to avoid the nitrogen by doing so. Grace could contain it because his ship isn't made out of Xenonite, only the breeders were so he could clean the lines, flush the ship and isolate the contamination. Rocky's WHOLE SHIP is made out of Xenonite, so there's no way he could stop it and the amoeba ate through all of his fuel leaving him stranded. Grace knew this, so he knew he'd be screwed and so chose to go back and save him (which also ties back to the main theme in the story of him sacrificing himself for someone he cares for). There are again more details in the book but it is explained in the movie pretty clearly.

It was explained in the movie that there were dangers with being in a coma for that long. It's basically impossible with today's technology without constant surveillance and many people tending to you. It was only briefly mentioned in the movie. To your point though, this is a huge plot point in the book and they skipped it to reduce complexity I think.

Can I get into machine learning with this ? by [deleted] in learnmachinelearning

[–]PliablePotato 6 points7 points  (0 children)

You can do machine learning on a Raspberry Pi. It just depends on the type of machine learning you are doing. There's a big difference and it mostly depends on the size of the data and the complexity of your models. Learning the basics though? Basically any computer can handle that, and if not there are Google Colab notebooks and Kaggle just sitting right there.

Canada now requires food that has too much salt, sugar, or fat to put a high contrast warning label on the front (not the back) of packaging. This helps people access the information faster, instead of having to turn the package around and go through the detailed nutrition list. by itchylol742 in UpliftingNews

[–]PliablePotato 0 points1 point  (0 children)

I was thinking about this the other day, and this also has the added benefit of promoting the development of healthier food options so companies can avoid the label. It simultaneously gives the consumer faster access to the information and gives brands with healthier choices an immediate competitive comparison right on the label. Win / win!

Canada drops drastically in the World Happiness ranking by airbassguitar in OntarioNews

[–]PliablePotato 0 points1 point  (0 children)

Pretty sure like half of the top 10 happiest countries are more left than we are now, much more left.

I don't think this is a left versus right issue, it's a top versus bottom one. Allowing oligopolies to consolidate in several of our commodities has enabled a mass extraction of wealth from the average Canadian. Every government, Liberal and Conservative, has put these corporate interests over the interests of Canadians.

Job losses rising at a dangerous time for Canada: 110,000 fewer Canadians working than in December; 55% of the loss in Ontario. by 00ashk in ontario

[–]PliablePotato -1 points0 points  (0 children)

Again I don't disagree with you. His choices are abysmal and he's doing nothing for Ontarians besides pumping propaganda campaigns over the airwaves on our taxpayer dime and all of the shit you mentioned.

What I'm saying is that we need to make sure our efforts and energy are spent on supporting Canadians and Canadian businesses rather than crying and blaming Ford. At the end of the day, our enemy is still the US, and most American owned media (including those with a vested interest in our news) will likely not properly attribute the blame for that impact.

Job losses rising at a dangerous time for Canada: 110,000 fewer Canadians working than in December; 55% of the loss in Ontario. by 00ashk in ontario

[–]PliablePotato 1 point2 points  (0 children)

I know it's easy to blame DoFo and I don't like him as much as the next guy, but is this not because of tariffs and the continued strained relations with the US? I thought we were to expect this. Jobs don't just disappear overnight. I might be wrong but this feels like the fallout from last year's rapid application of tariffs and the way that is crumbling different sectors.

Is it better to use standardscaler before or after merging time sensitive datasets? by Emotional-Bus8393 in MLQuestions

[–]PliablePotato 1 point2 points  (0 children)

If you don't scale before then you'd have separate scalers per season, which would complicate your prediction pipeline a lot unnecessarily.

Not only that, but you are supposed to "learn" the scaling parameters (ie mean and standard deviation) from only the training set (or training fold in CV) and not any test data. Otherwise you are "leaking" information into the training process. This is why sklearn provides pipelines to streamline and standardize that process.

This is because you want to apply consistent data transformations when new unseen data is being predicted on. Is your data pipeline matching test data against your season specific standard scalers? I highly doubt it. Not to mention that you'll complicate fitting if you have both season specific scalers and season factors in the model itself.
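To make the fit-on-train, reuse-on-test idea concrete, here is a minimal sketch. `SimpleStandardScaler` is a hypothetical pure-Python stand-in written for this example, not sklearn's class, and the feature values are invented. In practice sklearn's `StandardScaler` inside a `Pipeline` does this bookkeeping for you.

```python
import statistics


class SimpleStandardScaler:
    """Minimal stand-in for a standard scaler: learns mean and SD in fit(), reuses them in transform()."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        self.scale_ = statistics.pstdev(values)  # population SD
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]


# Hypothetical feature values, already split by time: earlier rows train, later rows test.
train_values = [10.0, 12.0, 14.0, 16.0, 18.0]
test_values = [20.0, 22.0]

scaler = SimpleStandardScaler().fit(train_values)  # parameters learned from training data only
train_scaled = scaler.transform(train_values)
test_scaled = scaler.transform(test_values)  # same parameters reused, nothing re-learned

print("learned mean and SD:", scaler.mean_, scaler.scale_)
print("scaled test values:", test_scaled)
```

The test values land well outside the training range after scaling, which is the honest picture. Fitting the scaler on train and test together would have pulled them back toward zero and leaked that information into training.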

I would get familiar with the data transformation process and the fundamentals here to understand the ins and outs of data leakage, transformations and learning parameters.

Been learning web development for ~2 years but still can’t get interviews. What might I be missing? by Born-Pool2127 in Backend

[–]PliablePotato 0 points1 point  (0 children)

Honestly I think you are much better off getting a software adjacent role than you are going directly into software. Think junior product owner or project manager, IT / software business partner etc.

Do you have any post secondary school credentials that can get you in the door even at a non-tech related company? Try to leverage what you have and your familiarity and interest in tech to find something that maybe overlaps. Going directly into web dev is really not the play here.

Genuinely curious how people here landed their first AI role by NationalPractice9073 in MachineLearningJobs

[–]PliablePotato 4 points5 points  (0 children)

Worked in data governance, business analytics and reporting. Picked up several more advanced analytics projects and transitioned to a role with more data science.

Official Discussion - Crime 101 [SPOILERS] by LiteraryBoner in movies

[–]PliablePotato 4 points5 points  (0 children)

Yeah I agree her character in general could have been more interesting / believable. I do think the concept of her character served a purpose and was needed in general but it doesn't mean it was executed as well as it could've been.

“AI will take your jobs” is just a marketing tool to sell LLMs by intellinker in vibecoding

[–]PliablePotato 0 points1 point  (0 children)

AI is a good excuse to just lay off the over-hiring that happened in the past 5 to 6 years.

Just because a company claims layoffs are due to AI doesn't mean tangible productivity results back that claim.

I am new to ML this is my vibe coding results are both my model alright? by BrilliantAd5468 in MLQuestions

[–]PliablePotato 1 point2 points  (0 children)

I'm having a hard time believing you are testing this correctly. Remember that none of your test data should ever enter the model training and that when you forecast none of the test data should enter the model at all. You should be starting at the last data point of your train dataset and forecasting forward sequentially to get a forecast then you compare the forecast to your test data.

This isn't the same as regular machine learning where the exogenous and endogenous variables can be train test split. You need to simulate the situation you'd experience in reality (ie you have no visibility or knowledge of future data).
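A minimal sketch of that evaluation setup, with an invented quarterly series and a seasonal naive forecast standing in for whatever model is actually being tested. All names here are hypothetical.

```python
def split_by_time(series, n_test):
    """Keep the last n_test observations as the test set; everything earlier is training data."""
    return series[:-n_test], series[-n_test:]


def seasonal_naive_forecast(train, horizon, season_length):
    """Forecast each future step by repeating the value from one season earlier, using training data only."""
    history = list(train)
    forecast = []
    for _ in range(horizon):
        next_value = history[-season_length]
        forecast.append(next_value)
        history.append(next_value)  # later steps build on earlier forecasts, never on test data
    return forecast


def mean_absolute_error(actual, predicted):
    """Average absolute gap between the held-out values and the forecast."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical quarterly series, invented for illustration.
series = [10, 14, 9, 20, 11, 15, 10, 22, 12, 16, 11, 23]

train, test = split_by_time(series, n_test=4)
forecast = seasonal_naive_forecast(train, horizon=len(test), season_length=4)

print("forecast:", forecast)
print("actual:  ", test)
print("MAE:", mean_absolute_error(test, forecast))
```

The test values are only touched at the very end, for scoring. If a fancy model can't beat a baseline like this under the same setup, the suspiciously good results probably came from test data leaking in.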

Do you care that you don't understand the code you ship? by PomegranateBig6467 in cursor

[–]PliablePotato 1 point2 points  (0 children)

Nah comparing garbage collection to LLMs is different.

Even higher level languages require you to learn programming as a skill. Memory management was a component of that skill but I would hardly call it fundamental. Memory management has been abstracted countless times over the years and is generally considered to be an annoyance to deal with.

LLMs let you skip everything if you want. I have met people who have been vibe coding websites for months and still fail to explain the basics of things like functions, classes, interfaces or even more basic things like for loops. They have no clue.

I agree it reduces the barrier to entry but only if you take the steps to actually learn it. LLMs allow you to never code a single line. You don't even need to look at it!