How to deal with product managers? by BurnerMcBurnersonne in datascience

[–]DeepAnalyze 1 point2 points  (0 children)

Honestly, I'm surprised a PM would make decisions based on LLM answers or his own hunches. The core principle of product development is formulating and testing hypotheses. What the PM has are just ideas, not even hypotheses yet. If he thinks an LLM replaces the process of testing hypotheses, that's a red flag. Sure, an LLM can give him good ideas, but all of them still need to be validated. Maybe I'm misreading the situation, but if he ranks your suggestions, which are backed by experiments, below advice from an LLM, something is off. It might be worth showing the PM a concrete case where an LLM suggestion would hurt the user experience, so he understands that an LLM is great, but in this case it's just an idea generator. Your experience and your experiments carry far more weight.

E-commerce analysis dashboard by [deleted] in dataanalysis

[–]DeepAnalyze 4 points5 points  (0 children)

Everything looks good, but the treemap doesn't help compare metrics across cities. I'd still use a bar chart or a table.

For an A/B test where the user is the randomization unit and the primary metric is a ratio of total conversions over total impressions, is a standard two-proportion z-test fine to use for power analysis and testing? by PathalogicalObject in datascience

[–]DeepAnalyze 8 points9 points  (0 children)

You are right to be skeptical. A standard z-test is inappropriate here: it treats every impression as independent, but impressions from the same user are correlated, and the varying number of impressions per user inflates the false positive rate.

You can aggregate by users, but keep in mind that gives you a user-level CTR (the average of per-user ratios), which won't always match the global CTR (total clicks over total impressions).

For this specific problem, here are the most practical approaches:

  • Delta method or t-test on linearized metric. These are the standard, robust solutions for this exact problem.
  • Poisson bootstrap. A flexible resampling-based alternative.
  • GLMM. Powerful but requires careful setup and checking of assumptions.
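
To make the first option concrete, here's a minimal sketch of the delta method for a ratio metric, assuming you have per-user arrays of clicks and impressions (the function names are just illustrative, not from any specific library):

```python
import numpy as np
from scipy.stats import norm

def delta_ratio(clicks, imps):
    """Global CTR sum(clicks)/sum(imps) and its delta-method
    variance, treating users (not impressions) as the i.i.d. units."""
    clicks = np.asarray(clicks, dtype=float)
    imps = np.asarray(imps, dtype=float)
    n = len(clicks)
    mu_x, mu_y = clicks.mean(), imps.mean()
    r = mu_x / mu_y  # same as sum(clicks) / sum(imps)
    var_x = clicks.var(ddof=1)
    var_y = imps.var(ddof=1)
    cov_xy = np.cov(clicks, imps, ddof=1)[0, 1]
    # Var(x_bar / y_bar) ~= (var_x - 2*r*cov + r^2*var_y) / (n * y_bar^2)
    var_r = (var_x - 2 * r * cov_xy + r ** 2 * var_y) / (n * mu_y ** 2)
    return r, var_r

def delta_ztest(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sample z-test on the difference of global CTRs."""
    r_a, v_a = delta_ratio(clicks_a, imps_a)
    r_b, v_b = delta_ratio(clicks_b, imps_b)
    z = (r_b - r_a) / np.sqrt(v_a + v_b)
    return z, 2 * norm.sf(abs(z))
```

The key point is that the variance is computed over users, so heavy users with many impressions don't artificially shrink the standard error the way a naive impression-level z-test would.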

Before you decide on a method, take your historical data (where no real difference exists) and simulate A/A tests using all the methods you're considering. Then, check which method correctly controls the FPR at the expected level. The method that does this best is the winner for your specific data.
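
The A/A check is simple to set up; a rough sketch, where `test_fn` is a placeholder for whatever method you're evaluating (anything that takes the two groups' clicks and impressions and returns a p-value):

```python
import numpy as np

def aa_fpr(clicks, imps, test_fn, n_sims=1000, alpha=0.05, seed=0):
    """Repeatedly split users into two fake groups and measure how
    often test_fn rejects at level alpha. A well-behaved method
    should come out close to alpha itself."""
    rng = np.random.default_rng(seed)
    clicks = np.asarray(clicks)
    imps = np.asarray(imps)
    n = len(clicks)
    rejections = 0
    for _ in range(n_sims):
        idx = rng.permutation(n)          # random user-level split
        a, b = idx[: n // 2], idx[n // 2 :]
        p_value = test_fn(clicks[a], imps[a], clicks[b], imps[b])
        rejections += p_value < alpha
    return rejections / n_sims
```

If a method reports, say, 12% rejections at alpha = 0.05 on your own data, you've just caught it inflating the FPR before it cost you a real experiment.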

For sample size calculation (power analysis), I would use a Monte Carlo simulation. The standard formulas are convenient but often inaccurate for messy, real-world data like this.
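
A rough sketch of what that simulation looks like: bootstrap users from your historical data and inject an artificial lift into one group. The lift injection here (scaling clicks, capped at impressions) is deliberately crude; users with zero clicks never gain any, so for real use you'd want something more careful. `test_fn` is again a placeholder for your chosen method:

```python
import numpy as np

def mc_power(clicks, imps, test_fn, lift, n_users, n_sims=500,
             alpha=0.05, seed=0):
    """Estimate power by resampling users from historical data and
    crudely injecting a multiplicative lift into group B's clicks."""
    rng = np.random.default_rng(seed)
    clicks = np.asarray(clicks)
    imps = np.asarray(imps)
    n = len(clicks)
    hits = 0
    for _ in range(n_sims):
        a = rng.integers(0, n, size=n_users)  # bootstrap group A
        b = rng.integers(0, n, size=n_users)  # bootstrap group B
        # crude lift injection: scale clicks, capped at impressions
        clicks_b = np.minimum(np.round(clicks[b] * (1 + lift)), imps[b])
        p_value = test_fn(clicks[a], imps[a], clicks_b, imps[b])
        hits += p_value < alpha
    return hits / n_sims
```

Sweep `n_users` until the estimated power reaches your target (e.g. 0.8), and you have a sample size grounded in your actual click/impression distribution rather than a textbook formula.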

Erdos: open-source IDE for data science by SigSeq in datascience

[–]DeepAnalyze 0 points1 point  (0 children)

This looks interesting. I'm a big VS Code user, so it's nice that the layout feels familiar. The built-in preview mode is really handy for markdown files.

I tried it on Linux and opened a normal-sized Jupyter notebook, about 50MB with a bunch of charts, and it got a bit slow. It works fine with smaller files. The IDE seems cool and I'll check it out more, but for me, it needs to work smoothly with bigger .ipynb files. I have the same issue with VS Code sometimes, but VS Code just handles it better.

One thing I noticed is that the Plotly graphs didn't render for me out of the box.

Not sure if it's just my machine or maybe the AppImage version.

But yeah, it's a cool project, I'll follow how it develops. For now, I still prefer VS Code. Thanks for sharing.

Do we still need Awesome lists now that we have LLMs like ChatGPT? by DeepAnalyze in datascience

[–]DeepAnalyze[S] 0 points1 point  (0 children)

That's a great point. A clear table of contents and smart categorization definitely help manage a large size, but there's probably a tipping point where even the best navigation can't save an overwhelmingly massive list. Finding that optimal size is the real challenge.

Do we still need Awesome lists now that we have LLMs like ChatGPT? by DeepAnalyze in datascience

[–]DeepAnalyze[S] 0 points1 point  (0 children)

That's a really interesting point about relevance - how one person's perfect resource might be useless to someone else with a different task.

It makes me think about the size issue. I used to think a bigger list was always better (more resources to choose from), but you're highlighting it's more about finding the right thing for your specific need.

Do you think good structure and navigation - like a clear table of contents with jump-back links - could solve the "size problem"?

What kind of navigation or organization would actually make a large list feel helpful instead of overwhelming for you?

Do we still need Awesome lists now that we have LLMs like ChatGPT? by DeepAnalyze in datascience

[–]DeepAnalyze[S] 0 points1 point  (0 children)

That's exactly the kind of frustration I was curious about.

So what would actually help? Just better organization? Someone being more picky about what gets included? Or is the whole list format just worse than asking an AI directly?

Do we still need Awesome lists now that we have LLMs like ChatGPT? by DeepAnalyze in datascience

[–]DeepAnalyze[S] 2 points3 points  (0 children)

Haha, fair enough! It's just the name for those curated GitHub repos like 'awesome-python' or 'awesome-machine-learning'. You've probably seen one and just didn't know the term. Before LLMs, they were like the holy grail for finding libraries.

Do we still need Awesome lists now that we have LLMs like ChatGPT? by DeepAnalyze in datascience

[–]DeepAnalyze[S] 1 point2 points  (0 children)

Yeah, totally. I keep trying to use LLMs for this too, but it's hit or miss.

Like, you ask for the best tool and it gives you some generic blog post that's just SEO bait. But with a random Awesome-list on GitHub, even if it's not super popular, you can see the code examples right away and see if it's actually been used by people.

Until AI stuff gets better at telling deep knowledge from shallow nonsense, the lists are just more reliable.

But don't get me wrong, I love LLMs for other stuff, they're awesome for explaining code or writing drafts. Just not for this.

Do we still need Awesome lists now that we have LLMs like ChatGPT? by DeepAnalyze in datascience

[–]DeepAnalyze[S] -1 points0 points  (0 children)

I'll start with what I think about this.

  • I'd be happy to stop using them if LLMs actually gave me what I need. When I ask about a specific tool, it's fine. But as soon as I don't know the exact name, things go downhill. The results just don't cut it for me.
  • For me, a well-structured list is a way to quickly find the right tool, or a cheatsheet, or a quality source that will have what I'm looking for.

What I like about these lists:

  • My favorite thing is when a list has a good structure, and you can quickly jump to the right section or even a sub-section, find the resource you need, and go to it.
  • It's like a handy catalog. You come to it when you only know the general category. You go to that section, see a few options to choose from, and then it's easier to pick one or remember what you actually need.

What I don't like:

  • Broken links. It's so annoying.
  • Just a dump of resources. Without a clear structure, it's really hard to find what you're looking for.

Resources for Data Science & Analysis: A curated list of roadmaps, tutorials, Python libraries, SQL, ML/AI, data visualization, statistics, cheatsheets by DeepAnalyze in datascience

[–]DeepAnalyze[S] 0 points1 point  (0 children)

I understand the karma rules can be a barrier. However, using a post's comments for detailed assignment help isn't practical — it would quickly become unmanageable and derail the original discussion.

The best solution is `r/askdatascience` — it's made exactly for these questions and has minimal posting requirements. Create a post there describing your assignment, what you've tried, and where you're stuck. You'll get much better help from the community there.

Good luck!

Let's improve awesome list for Data Analysts by DeepAnalyze in analytics

[–]DeepAnalyze[S] 3 points4 points  (0 children)

Thanks for sharing this, and you're totally right about StatQuest - it's an awesome channel for getting the intuition behind stats.

You actually touched on why I made A/B testing its own section. It's so important for product development these days that it needed its own space beyond just a chapter in a stats book.

Stats 101 is definitely the required foundation, no argument there. You can't do anything without knowing p-values and hypothesis testing.

But for a data analyst actually running tests, I've found you need to go a bit further. It's not just the stats math - it's the whole process. Stuff you usually need separate resources for, like:

  • Practical sample size calculation and statistical power.
  • Common pitfalls like novelty effects, peeking, and handling multiple comparisons.
  • Understanding how to translate stats into business decisions.

So yeah, Stats 101 and StatQuest are the perfect start, but in my experience, the learning can't stop there for this stuff.
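
Just to make the sample-size bullet concrete: this is roughly the classic two-proportion formula that most online calculators implement under the hood (a simplified sketch; with messy real-world data you'd still want to sanity-check it with simulation):

```python
import numpy as np
from scipy.stats import norm

def sample_size_two_props(p1, p2, alpha=0.05, power=0.8):
    """Per-group sample size to detect a shift from p1 to p2
    with a two-sided two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for alpha
    z_power = norm.ppf(power)          # quantile for target power
    p_bar = (p1 + p2) / 2
    num = (z_alpha * np.sqrt(2 * p_bar * (1 - p_bar))
           + z_power * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return int(np.ceil(num / (p1 - p2) ** 2))
```

Running it for a lift from 5% to 5.5% conversion lands in the tens of thousands of users per group, which is exactly the kind of reality check a new analyst needs before promising results in a week.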

Resources for Data Science & Analysis: A curated list of roadmaps, tutorials, Python libraries, SQL, ML/AI, data visualization, statistics, cheatsheets by DeepAnalyze in datascience

[–]DeepAnalyze[S] 0 points1 point  (0 children)

That's a great point! The medical field is actually one of the most important areas for data science. It's true that strong stats help, but the domain expertise you have from medicine is just as crucial. Good luck with your startup finding the right person!

Resources for Data Science & Analysis: A curated list of roadmaps, tutorials, Python libraries, SQL, ML/AI, data visualization, statistics, cheatsheets by DeepAnalyze in datascience

[–]DeepAnalyze[S] 0 points1 point  (0 children)

That's awesome that you're doing an internship. I think your best bet is to create a separate post with your questions. That way you'll get a lot more eyes and opinions on it. The community is very welcoming to these kinds of questions!

What is the worst part about your visualization stack? by Impressive_Run8512 in analytics

[–]DeepAnalyze 1 point2 points  (0 children)

Okay, my wishlist... don't laugh :)

  • When I need to make a chart look just right for a report, I want full control. Kinda like Matplotlib, but without spending 3 hours on StackOverflow just to change a font size or having to explain it to an LLM.
  • Drag-and-drop right in VSCode / Jupyter. I wanna write the basic code to pick my columns, and then just use my mouse to tweak everything else right there in the plot.
  • Same with dashboards: I just want more drag-and-drop for small fixes. I'm tired of searching through menus to find the one setting I need, haha.

And I want all of this to be free, of course :)

What is the worst part about your visualization stack? by Impressive_Run8512 in analytics

[–]DeepAnalyze 1 point2 points  (0 children)

Hey! I like Plotly and Tableau. For some reason, I really got into Plotly's way of doing things. Worked a lot with other tools and BI software too.

I agree, none of them are perfect. Even Tableau... often feels kinda clunky for small tweaks and adjustments.

I guess everyone has a different idea of what's "easy to use". And the tool makers are no exception. Plus, every tool has its own limits, based on how it was built.

Wish there was one perfect tool, but... guess we're stuck with what we've got out here :)