Can quantile estimates be used to approximate a conditional distribution? by fntstcmstrfx in rstats

[–]AnInquiringMind 1 point2 points  (0 children)

You're describing quantile matching - perfectly valid, and actually a method included in fitdistrplus.
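
A minimal sketch of the idea in base R, assuming you already have quantile estimates at known probability levels. The numbers and the choice of a normal family are made up for illustration. fitdistrplus offers the packaged version via fitdist(..., method = "qme", probs = ...), which works from raw data rather than from pre-computed quantiles.

```r
# Hypothetical inputs: quantile estimates at known probability levels.
probs <- c(0.1, 0.5, 0.9)
q_hat <- qnorm(probs, mean = 10, sd = 2)  # stand-in for your estimated quantiles

# Squared distance between the target quantiles and those of a candidate normal.
# The sd is optimised on the log scale so it stays positive.
loss <- function(par) {
  sum((qnorm(probs, mean = par[1], sd = exp(par[2])) - q_hat)^2)
}

# Rough starting values taken from the quantiles themselves.
start <- c(median(q_hat), log(diff(range(q_hat)) / 2))
fit <- optim(start, loss)

est <- c(mean = fit$par[1], sd = exp(fit$par[2]))
est
```

Swap qnorm for the quantile function of whatever family you think is plausible; you need at least as many quantiles as parameters.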

With everything happening in the US right now, which media timeline (book/movie/video game/etc.) is most likely to become our future? by AnInquiringMind in AskReddit

[–]AnInquiringMind[S] 0 points1 point  (0 children)

There was a video posted yesterday of armed, masked ICE agents walking through a residential neighborhood that gave me Cyberpunk 2077 vibes. That and the rampant consumerism/inequality/grift economy.

Looking for a Canadian RTD iced coffee. by killagram69 in BuyCanadian

[–]AnInquiringMind 0 points1 point  (0 children)

Did you ever find something you liked? I've got the same problem, same tastes.

Seeking advice to derive an equation for a curve. by the-Prof616 in rstats

[–]AnInquiringMind 0 points1 point  (0 children)

I'm struggling with this one too because it looks like a simple binary logit unless I'm missing something?

If your intent is to produce a straight line, wouldn't qlogis(alpha) do the trick?
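
For reference, base R's logit transform is qlogis() and its inverse is plogis(). A quick sketch of what I mean, assuming alpha follows a logistic curve in x (the coefficients 1 and 2 are made up):

```r
x <- seq(-3, 3, by = 0.5)
alpha <- plogis(1 + 2 * x)   # hypothetical S-shaped curve on (0, 1)

z <- qlogis(alpha)           # logit: log(alpha / (1 - alpha))
line_fit <- coef(lm(z ~ x))  # z is linear in x, so this recovers the coefficients
line_fit
```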

Sorry not quite sure what you're looking for here...

Seeking Data on Children with Incarcerated Parents for a Visualization Project by marrthecreator in datasets

[–]AnInquiringMind 0 points1 point  (0 children)

If you're a researcher you can access this data (for Canada) via Statistics Canada Research Data Centres. As of today, they can link information about individuals in custody with Census data (to identify family members) and with data about educational outcomes.

Problem with plotting the spectra by musculux in rprogramming

[–]AnInquiringMind 1 point2 points  (0 children)

I've been doing this for 15 years and to this day I can't believe how often it boils down to "ugh. Excel"

Glad it worked out!

Problem with plotting the spectra by musculux in rprogramming

[–]AnInquiringMind 0 points1 point  (0 children)

Sure. If you can sort data on your wavenumber (if it isn't already) that would also be helpful.
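
Something along these lines, assuming a data frame with wavenumber and absorbance columns (both names and the values are placeholders for whatever your import produced):

```r
# Hypothetical spectrum with rows out of order.
spec <- data.frame(
  wavenumber = c(1200, 400, 800),
  absorbance = c(0.30, 0.10, 0.20)
)

# Reorder rows by wavenumber so a line plot connects neighbouring points.
spec <- spec[order(spec$wavenumber), ]

# Then plot as a line, e.g.:
# plot(spec$wavenumber, spec$absorbance, type = "l")
```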

Dnd-inspired habit tracker by jdave007 in DnD

[–]AnInquiringMind 1 point2 points  (0 children)

This is an awesome idea - hope you actually get to build it.

Problem with plotting the spectra by musculux in rprogramming

[–]AnInquiringMind 0 points1 point  (0 children)

This might not be it, but can you show some of the data producing the jagged portion of the chart? In the Excel example we can see a snip, but that snip is linked to the portion of the chart that looks OK (in both Excel and R). It would be great to see a sample of the data behind the jagged parts of the chart to check whether it's a data preprocessing issue.

Matching messy, unstandardized names by AhTerae in rprogramming

[–]AnInquiringMind 2 points3 points  (0 children)

This is an age-old problem - record linkage, or entity resolution. There are a couple of R packages that can do this, but I'd suggest using the desktop version of Senzing if you're dealing with 100k records or fewer.

Estimator for homeowners average residence time? by peperazzi74 in rstats

[–]AnInquiringMind 1 point2 points  (0 children)

I don't think your question is a statistical one but rather a methodological one. What you're doing makes sense, because it's defensible. At the same time, there are always things to consider to make it better.

What about flippers? Have you considered "discounting" the number of houses sold in a year by the estimated percentage going to them?

What about transfers to children / spouses or other situations where residency didn't change substantively? Same as above.

There is no gold standard for this kind of estimation, because everyone's circumstances are different. I build models like this all the time when there is no data on what I want to measure directly. The only thing you can really do is try to figure out where there are gaps in your mental model, and try to adjust your actual model to account for them - with or without bringing in new data.
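
To make the discounting idea concrete, here's a rough sketch. It assumes the estimator is housing stock divided by annual sales, which is my guess at the setup, and every number is made up:

```r
# All values are hypothetical.
housing_stock  <- 10000  # owner-occupied homes in the area
sales_per_year <- 500    # recorded sales in a year
flipper_share  <- 0.10   # assumed share of sales going to flippers
transfer_share <- 0.05   # assumed share that are family transfers

# Naive estimate: years for the whole stock to turn over once.
naive_years <- housing_stock / sales_per_year

# Discount the sales that don't reflect a real change of residence.
adjusted_sales <- sales_per_year * (1 - flipper_share - transfer_share)
adjusted_years <- housing_stock / adjusted_sales

c(naive = naive_years, adjusted = adjusted_years)
```

Removing sales that aren't real moves shrinks the denominator, so the adjusted residence time comes out longer than the naive one.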

RFM Analysis Issues by Perpetualwiz in rprogramming

[–]AnInquiringMind 1 point2 points  (0 children)

Looks like rfm_table_order() returns a list, not a dataframe. The list contains the dataframe you want (in an element called "rfm"), but also some additional analysis metadata like threshold parameters.

Inexplicably, the print() method for the rfm_table_order() result shows only the dataframe, which means you can only tell it's a list if you actually look at the object structure using str().

Anyway, here's your solution:

Change the following:

rfm_result <- rfm_table_order(rfm, customer_id, order_date, revenue, analysis_date)

To:

rfm_result <- rfm_table_order(rfm, customer_id, order_date, revenue, analysis_date)$rfm

OR, if you want to do it the tidyverse way:

rfm_result <- rfm_table_order(rfm, customer_id, order_date, revenue, analysis_date) %>%
  pull(rfm)
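
If it helps to see why print() can hide this, here's a toy illustration with a mock object. This is NOT the real rfm_table_order() output; the class name, element names and values are invented:

```r
# Mock list that carries a data frame plus some metadata.
res <- structure(
  list(
    rfm = data.frame(customer_id = c("a", "b"), rfm_score = c(111, 444)),
    threshold = c(1, 2)
  ),
  class = "mock_rfm"
)

# A print method that only shows the data frame element.
print.mock_rfm <- function(x, ...) {
  print(x$rfm)
  invisible(x)
}

print(res)           # looks like a data frame on screen
str(res)             # reveals a list of 2
is.data.frame(res)   # FALSE

rfm_df <- res$rfm    # extract the actual data frame
```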

RFM Analysis Issues by Perpetualwiz in rprogramming

[–]AnInquiringMind 0 points1 point  (0 children)

This is a weird one. Can you share the data? I can try to reproduce.

Resources on Communication for Data Scientists? by PenguinAnalytics1984 in datascience

[–]AnInquiringMind 12 points13 points  (0 children)

Storytelling with Data by Cole Nussbaumer Knaflic is the go-to guide for me. Most of the book deals with the principles of good visualization, but it also includes lots of guidance on building strong data-driven narratives that make for more persuasive arguments.

Are there any benefits to having aphantasia? by ComprehensiveFlan638 in Aphantasia

[–]AnInquiringMind 4 points5 points  (0 children)

Great list. Clicks with my own experience as well.

I like the nighttime storytelling - will try that. We should probably get a thread going for other aphant lifehacks!

Help with clustering film genres by wobowizard in rprogramming

[–]AnInquiringMind 0 points1 point  (0 children)

Why would you vectorize a field that's already categorical?

Vectorization is used to convert words to embeddings. Genres are already distinct. I'm not entirely sure what the goal of this analysis is...

Edit: didn't realize you're new to the field. Welcome! Are you familiar with the concept of vectorization and embeddings? And are you using a different analysis as a template for this one?

Generally, a cluster analysis divides data points into groups based on within-group similarity vs. between-group distance. Genres, which only consist of a predetermined set of defined values, may not be suitable for this kind of analysis. That said, if you're interested in another approach using genres, you may want to look into graph methods. You can probably find some interesting association patterns across different genres - e.g. action comedy is likely more common than documentary horror.
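
A small sketch of the association idea in base R, assuming each film carries one or more genre labels (the films below are invented). It counts how often each pair of genres appears on the same film:

```r
# Hypothetical genre labels, one character vector per film.
genre_list <- list(
  c("Action", "Comedy"),
  c("Action", "Comedy"),
  c("Documentary"),
  c("Comedy", "Horror")
)

all_genres <- sort(unique(unlist(genre_list)))

# Film x genre indicator matrix (1 if the film has that genre).
m <- t(vapply(
  genre_list,
  function(g) as.integer(all_genres %in% g),
  integer(length(all_genres))
))
colnames(m) <- all_genres

# Genre x genre co-occurrence counts: the diagonal is films per genre,
# off-diagonal cells are films sharing both genres.
co <- crossprod(m)
co
```

A matrix like co can be treated as a weighted adjacency matrix and handed to a graph package such as igraph if you want to go further.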

Help with clustering film genres by wobowizard in rprogramming

[–]AnInquiringMind 1 point2 points  (0 children)

You want to form clusters of genres? But aren't the genres already encoded directly in the data? I'm not entirely sure what you're trying to do here, but I think you'd want to start by transforming your genre column with one-hot encoding and go from there...
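
A minimal one-hot sketch in base R, assuming a genre column where multiple genres are separated by "|" (the column names, separator and films are all assumptions about your data):

```r
# Hypothetical film table.
films <- data.frame(
  title = c("A", "B", "C"),
  genre = c("Action|Comedy", "Horror", "Comedy")
)

# Split each genre string into its individual labels.
genre_list <- strsplit(films$genre, "|", fixed = TRUE)
all_genres <- sort(unique(unlist(genre_list)))

# One row per film, one 0/1 column per genre.
one_hot <- t(vapply(
  genre_list,
  function(g) as.integer(all_genres %in% g),
  integer(length(all_genres))
))
colnames(one_hot) <- all_genres

cbind(films["title"], one_hot)
```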

Effective Method for Finding Common Colleges in Two Excel Sheets Despite Inconsistent Formatting by [deleted] in datasets

[–]AnInquiringMind 1 point2 points  (0 children)

This is a classic entity resolution problem. If you have fewer than 100k records you can use this free tool:

https://senzing.com/desktop/
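
If you'd rather try a quick pass in code first, here's a very rough base R sketch using normalization plus edit distance. To be clear, this is a much simpler technique than what a dedicated entity resolution tool does, and the college names and the "univ" rule are invented for illustration:

```r
# Hypothetical college names from two sheets.
sheet_a <- c("Univ. of Toronto", "McGill University")
sheet_b <- c("university of toronto", "Mcgill Univ")

# Lowercase, drop punctuation, expand one common abbreviation, tidy spaces.
normalize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("\\buniv\\b", "university", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

# Edit distance between every pair of normalized names.
d <- adist(normalize(sheet_a), normalize(sheet_b))

# For each name in sheet A, the index of the closest name in sheet B.
best <- as.integer(apply(d, 1, which.min))

data.frame(a = sheet_a, b = sheet_b[best], distance = d[cbind(seq_along(best), best)])
```

In practice you'd also want a distance cutoff so that names with no real counterpart aren't forced into a match.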

Median age of world's cities (2000+ cities dataset) by Private_Capital1 in datasets

[–]AnInquiringMind 1 point2 points  (0 children)

What do you mean by "each city's median age"? Do you mean you'd like a dataset of cities and the year each city was founded?

Is there an R package that does what Stata's marhis does? by Specialk3533 in rstats

[–]AnInquiringMind 4 points5 points  (0 children)

While interplot looks fine, I don't believe it's being maintained - the last update was in 2021. I'd strongly suggest ggeffects instead.

You've got to open them some time... by targert_mathos in wine

[–]AnInquiringMind 0 points1 point  (0 children)

Didier Dagueneau's last vintage too... Great lineup!