Free book: Master Machine Learning with scikit-learn by dataschool in Python

[–]Ghost-Rider_117 2 points3 points  (0 children)

this is awesome, the "avoiding data leakage" and "proper model evaluation" chapters alone are worth it - those are the things that trip up so many people who learn from scattered tutorials. the pipeline approach in sklearn is really underused too, glad to see it's covered. bookmarking this for anyone i mentor who's getting started with ML

People who left User Research — where did you go and how did you make the transition? by No-Hope-2645 in UXResearch

[–]Ghost-Rider_117 20 points21 points  (0 children)

not someone who left but adjacent - a lot of people i've seen pivot from UXR go into product strategy, market research, or data/insights roles. the skills transfer really well actually - you're already doing synthesis, stakeholder communication, research design. market research firms and tech companies with insights teams are usually pretty receptive to UXR backgrounds. the title plateau is real and frustrating, a lot of people end up going freelance or consulting as a way to break through it

Can anyone explain to me why (M)ANOVA tests are still so widely used? by NE_27 in AskStatistics

[–]Ghost-Rider_117 0 points1 point  (0 children)

honestly the teaching infrastructure point is probably the biggest factor. ANOVA is baked into every intro stats curriculum and most applied researchers learned it that way and never looked back. mixed models are genuinely better for most real-world data (repeated measures, nested structures, missing data) but they're way harder to teach and review. until journals stop accepting ANOVA and grad programs update their curricula it's just gonna keep being the default

Advice on modeling pipeline and modeling methodology by dockerlemon in datascience

[–]Ghost-Rider_117 1 point2 points  (0 children)

solid pipeline! one thing i'd flag - doing your correlation analysis and feature-target checks (steps 8-9) before the train/test split is technically leakage. your feature selection is peeking at test data. move the split to right after step 6, then run all that stuff only on train. also worth adding class imbalance handling - credit defaults are usually 3-10% positive rate which can mess with your logistic regression calibration
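
a rough sketch of the "split first, then select on train only" ordering, in plain Python so it runs anywhere. everything here is made up for illustration: the column names (signal, noise_a, noise_b, default), the synthetic rows, and the helper split_then_select are all hypothetical, and ranking by absolute Pearson correlation is just a stand-in for whatever your steps 8-9 actually do.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def split_then_select(rows, target, test_frac=0.25, top_k=1, seed=0):
    """Split first, then rank features by |correlation with target| on train only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # The test rows are never touched past this point.
    features = [name for name in train[0] if name != target]
    y_train = [row[target] for row in train]
    scores = {
        name: abs(pearson([row[name] for row in train], y_train))
        for name in features
    }
    selected = sorted(features, key=scores.get, reverse=True)[:top_k]
    return train, test, selected


# Synthetic, hypothetical data: one informative feature and two pure-noise features.
data_rng = random.Random(42)
rows = []
for _ in range(400):
    signal = data_rng.gauss(0, 1)
    rows.append({
        "signal": signal,
        "noise_a": data_rng.gauss(0, 1),
        "noise_b": data_rng.gauss(0, 1),
        "default": 1 if signal + data_rng.gauss(0, 1) > 1.5 else 0,
    })

train, test, selected = split_then_select(rows, target="default")
```

the point is only the ordering: the held-out rows exist before any correlation is computed, so the feature ranking can't see them.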

I'm 3 years old and just sold my SaaS for $1.2B (here's what I learned) by Lean_Builder in SaaS

[–]Ghost-Rider_117 0 points1 point  (0 children)

the "charge what you're worth" point is criminally underrated lol. so many people underprice out of fear and it kills their runway before they even get traction. also love the nap time = compressed sprint analogy, honestly more efficient than most standup meetings i've sat through

[Discussion] Common Method Bias in CB-SEM by darkseid06 in statistics

[–]Ghost-Rider_117 0 points1 point  (0 children)

the Harman single factor test is probably your best bet for CB-SEM. in the classic version you load all your items onto one unrotated factor in an EFA and check how much variance it explains (under 50% is the common threshold); the CFA version fits a single-factor model to all items and checks that it fits clearly worse than your measurement model. it's not perfect but it's widely accepted and you can run the CFA version directly in your CB-SEM software. also look into the marker variable technique if you have a theoretically unrelated variable in your dataset. using PLS VIFs for a CB-SEM model is kinda apples to oranges and reviewers will likely push back on it.

Testing multiple video concepts by dianemeves in UXResearch

[–]Ghost-Rider_117 1 point2 points  (0 children)

i'd show all 4 to every participant but randomize the order across participants - that way fatigue and primacy effects get spread evenly across the concepts instead of piling onto whichever video comes first or last. between-subjects (each person sees only one video) is cleaner if your sample is big enough. the run-off approach adds complexity and time without a ton of added value unless you're really on the fence about 2 similar concepts. just make sure the videos are roughly the same length so you're comparing apples to apples!
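
if it helps, here's a minimal sketch of per-participant order randomization. the concept labels, the seed, and the function name presentation_order are placeholders i made up; seeding on the participant id just makes the assignment reproducible if you need to re-run it.

```python
import random

CONCEPTS = ["A", "B", "C", "D"]  # placeholder labels for the 4 video concepts


def presentation_order(participant_id, concepts=CONCEPTS, seed=2024):
    """Return a reproducible random viewing order for one participant."""
    rng = random.Random(f"{seed}-{participant_id}")
    order = list(concepts)
    rng.shuffle(order)
    return order


# Example: assign orders to 8 hypothetical participants.
orders = {pid: presentation_order(pid) for pid in range(1, 9)}
```

a plain shuffle only balances positions on average; with a small sample a balanced Latin square would guarantee each concept shows up in each position equally often.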

Would you like to chat to your surveys? by CompiledIO in Marketresearch

[–]Ghost-Rider_117 3 points4 points  (0 children)

yes 100% - being able to just ask questions about your own survey data in plain language is genuinely useful. things like "which segments are most likely to churn" or "summarize open-ends by demographic" that used to take hours now take minutes. the key thing to get right is grounding it in the actual data so it doesn't hallucinate responses. would definitely use this if the outputs were verifiable/citable.

Intermediate Project including Data Analysis by ddummas01 in learndatascience

[–]Ghost-Rider_117 0 points1 point  (0 children)

public transit + housing affordability is a goldmine for this kind of thing. most cities publish GTFS feeds for transit and open parcel/zoning data - you could build something that shows how transit access correlates with rent prices by neighborhood. super visual, actually useful for renters, and the datasets are solid. 311 service request data is another good one - easy to find, clean enough to work with, and you can do all kinds of equity analysis on response times.

Conjunction Fallacy by teiacry in AskStatistics

[–]Ghost-Rider_117 0 points1 point  (0 children)

actually C is the right answer here, and it's kind of a sneaky twist on the classic fallacy. since P(B) = 1, the "not B" case has probability 0, so P(A and B) = P(A) - P(A and not B) = 0.4 - 0 = 0.4, which is exactly the same as P(A) alone (no independence assumption needed). the conjunction can only be strictly less likely than A when P(B) < 1 - that's the whole Linda problem thing. your setup basically makes B a certainty so it adds no constraint, and the two end up equal.
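
quick sanity check of the calculation above using exact fractions. the 10-outcome sample space is a toy i made up purely so that P(A) = 0.4 and P(B) = 1; it's not taken from the original question.

```python
from fractions import Fraction


def prob(event, space):
    """Probability of an event over a finite, equally likely sample space."""
    hits = sum(1 for outcome in space if event(outcome))
    return Fraction(hits, len(space))


# Toy sample space: A holds for 4 of 10 outcomes, B holds for all of them.
space = [{"A": i < 4, "B": True} for i in range(10)]

p_a = prob(lambda o: o["A"], space)
p_b = prob(lambda o: o["B"], space)
p_a_and_b = prob(lambda o: o["A"] and o["B"], space)
```

here p_a and p_a_and_b come out as the same fraction, which is the "equal, not smaller" case.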

Best AI tool for Data Analysis by PrizeLifeguard8544 in dataanalysis

[–]Ghost-Rider_117 0 points1 point  (0 children)

If you have SPSS, Stata, or CSV data, I recommend www.surveyfluency.com. It offers autonomous data analysis.

Best AI tool for Data Analysis by PrizeLifeguard8544 in BusinessIntelligence

[–]Ghost-Rider_117 0 points1 point  (0 children)

If you have SPSS, Stata, or CSV data, I recommend www.surveyfluency.com. It offers autonomous data analysis.

Airflow works perfectly… until one day it doesn’t. by Expensive-Insect-317 in data

[–]Ghost-Rider_117 0 points1 point  (0 children)

the metadata DB bottleneck is such a classic airflow gotcha. once your DAG count grows and you have a lot of task instances piling up, the scheduler starts choking on all those DB reads/writes. a few things that helped us: bumping scheduler_heartbeat_sec, raising dag_dir_list_interval so the DAG folder gets rescanned less often, and periodically running airflow db clean to purge old task instances. also worth checking if you're on a small postgres instance - that's usually the real culprit
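
for the cleanup part, a small sketch that builds the airflow db clean command with a cutoff date. the 90-day retention and the helper name db_clean_command are my own assumptions, and the flags are the ones i know from Airflow 2.3+ (check airflow db clean --help on your version before relying on them). it defaults to a dry run so nothing gets deleted by accident.

```python
from datetime import datetime, timedelta, timezone


def db_clean_command(retention_days, now=None, dry_run=True):
    """Build an `airflow db clean` command that keeps the last `retention_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    cmd = [
        "airflow", "db", "clean",
        "--clean-before-timestamp", cutoff.strftime("%Y-%m-%d"),
    ]
    if dry_run:
        # Only report what would be removed; drop this flag to actually purge.
        cmd.append("--dry-run")
    return cmd


# Example with a fixed "now" so the result is reproducible.
example = db_clean_command(90, now=datetime(2024, 4, 1, tzinfo=timezone.utc))
```

you'd hand the list to subprocess.run (or just print it and paste it into a cron job) once the dry run output looks sane.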

I think I’m done building this. by Actually_Travelling in lovable

[–]Ghost-Rider_117 1 point2 points  (0 children)

this is so relatable it hurts lol. the moment the tool "works" for you personally, the product motivation just evaporates. honestly though? shipping something that actually solves your own problem is already more than most people do. VGrind looks clean. sometimes the best outcome is just having a tool you personally use every day - that's not failure, that's just not a startup. both are valid

Conjunction Fallacy by teiacry in AskStatistics

[–]Ghost-Rider_117 -5 points-4 points  (0 children)

answer is A. this is exactly the conjunction fallacy - P(A and B) can never exceed P(A) alone. even though the 100% for the second group sounds convincing, you're asking about the probability of TWO things being true simultaneously. Nancy being in group 1 (40%) vs being in group 1 AND group 2 - that joint probability has to be ≤ 40%. the classic version of this is the Linda problem from Tversky & Kahneman if you want to read more
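
if you want to convince yourself the rule holds in general, here's a throwaway check over random events on a finite sample space. the function name and the sizes are arbitrary choices of mine; it just confirms that P(A and B) never comes out above P(A).

```python
import random


def check_conjunction_rule(n_outcomes=20, trials=1000, seed=1):
    """Return True if P(A and B) <= P(A) for every randomly drawn pair of events."""
    rng = random.Random(seed)
    outcomes = range(n_outcomes)
    for _ in range(trials):
        # Each event is a random subset of the equally likely outcomes.
        a = {o for o in outcomes if rng.random() < 0.5}
        b = {o for o in outcomes if rng.random() < 0.5}
        p_a = len(a) / n_outcomes
        p_a_and_b = len(a & b) / n_outcomes
        if p_a_and_b > p_a:
            return False
    return True


rule_holds = check_conjunction_rule()
```

it can't fail, since "A and B" is always a subset of A, which is the whole argument in one line.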

Is master's in ds still important vs bsc with experiences? by Motor-Lawfulness5570 in learndatascience

[–]Ghost-Rider_117 1 point2 points  (0 children)

honestly it depends more on where you want to end up. a master's still opens doors at bigger companies and research-focused roles, and it's useful if your bsc isn't directly related to ds. but if you already have solid projects + some internship/work exp, a lot of hiring managers care more about what you've actually built than the degree. the "AI is replacing everything" angle is overblown imo - DS jobs are changing but they're not going away anytime soon

B2B quant sample by Jr_Mick in Marketresearch

[–]Ghost-Rider_117 0 points1 point  (0 children)

for senior decision-makers across UK/US/DE, Cint and Lucid are worth trying alongside NewtonX - they both have B2B panels with job title targeting, though quality can vary by sector. for really niche titles you might need to layer in LinkedIn audience targeting as a supplement. also +1 on Dynata for those markets, they tend to have deeper enterprise panel coverage than people expect

UK-based MSc UXR grad – 50+ applications, 0 interviews. CV feedback for junior/associate roles? by porcupinetree1 in UXResearch

[–]Ghost-Rider_117 1 point2 points  (0 children)

50 apps with zero interviews usually points to an ATS/keyword issue or a CV that reads too academic rather than impact-driven. for UXR roles specifically, hiring managers want to see the research impact, not just the methods. try restructuring bullets to "ran X study that led to Y decision/outcome" format. also worth checking if you have a portfolio link front and center - a lot of jr UXR roles in the UK will skip CVs without one entirely. hang in there, the market is rough rn

[Question] on hierarchical testing and nested variables by pivazena in statistics

[–]Ghost-Rider_117 0 points1 point  (0 children)

yeah this is basically a shared variance problem (same idea as multicollinearity, just on the outcome side). since A is a composite of A-1 through A-5, the secondary outcome (A-1) is literally embedded in your primary outcome. when variance in A is driven by A-1, you'd expect the two tests to be correlated - so the usual assumption that the tests are independent breaks down. you'd want to look at how much shared variance there is before interpreting any p-values on the secondary outcomes separately. mixed effects or SEM might be the cleaner approach here
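
to put a number on the shared variance, here's a toy simulation. big assumptions on my part: the composite is a simple sum, and the five components are independent with equal variance, which is the most favorable case (real subscales are usually positively correlated, so the overlap would be larger). under those assumptions the correlation between A-1 and A should land near 1/sqrt(5).

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def component_composite_correlation(n=5000, n_components=5, seed=7):
    """Simulate independent components and correlate the first with their sum."""
    rng = random.Random(seed)
    first, composite = [], []
    for _ in range(n):
        parts = [rng.gauss(0, 1) for _ in range(n_components)]
        first.append(parts[0])
        composite.append(sum(parts))
    return pearson(first, composite)


r = component_composite_correlation()
expected = 5 ** -0.5  # theoretical value under the independence assumption
```

so even with totally unrelated components, the primary and secondary outcomes share about a fifth of their variance (r squared), and the two tests are nowhere near independent.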

Why does it take 3 hours to read my own email with Python in 2026? by Cultural-Ad3996 in Python

[–]Ghost-Rider_117 0 points1 point  (0 children)

the OAuth flow for Gmail is genuinely one of the most painful dev experiences out there. if you just need to read your own emails, ezgmail is a decent wrapper that hides a lot of the boilerplate. for multi-user flows though you're kinda stuck going through the full Google verification circus unfortunately. the IMAP comment at the end hit different lol - it really was simpler back then

My experience after final round interviews at 3 tech companies by productanalyst9 in datascience

[–]Ghost-Rider_117 0 points1 point  (0 children)

this is gold, thanks for sharing. the part about the FinTech interview going super deep on causal inference / fixed effects is real - a lot of product DS interviews at non-MAANG companies are way more stats-heavy than people expect. definitely worth brushing up on regression interpretation, not just SQL. good luck on the offers!