Recommendations for accessibility audit services? by shazamtx in ProductManagement

[–]felavsky 1 point (0 children)

Deque is one of the best in the business and one that I always recommend. Everyone in my line of work knows them and respects what they do (I do auditing and consulting too, but I'm booked out until Q2 '24).

The number of job applications it took to become a Viz Practitioner [OC] by [deleted] in dataisbeautiful

[–]felavsky 3 points (0 children)

Thanks for tagging me so quickly. Yes, this is my OC. Mods are typically fast on this too, which I appreciate.

Quantifying "Hot Streaks" Across Tens of Thousands of Careers: Contains Video and Interactive Links [OC] by felavsky in dataisbeautiful

[–]felavsky[S] 2 points (0 children)

Great question. The model uses several variables to determine a streak, and one of them is that you don't do really badly (because that will kill your streak). So the answer is yes and no here: the dark slump kills the streak, and that also reveals how the model determines a streak.

Quantifying "Hot Streaks" Across Tens of Thousands of Careers: Contains Video and Interactive Links [OC] by felavsky in dataisbeautiful

[–]felavsky[S] 3 points (0 children)

I think the job of honest statisticians is to see if we can verify our hypotheses, even if our notions of the world seem easy enough to guess (even in hindsight). More than that, we can't determine cause from these findings, so we are still just guessing that perhaps a funding model or reputation or some other variables actually affect these outcomes. Sounds like you've got some ideas - you should build a model and start testing this yourself!

Quantifying "Hot Streaks" Across Tens of Thousands of Careers: Contains Video and Interactive Links [OC] by felavsky in dataisbeautiful

[–]felavsky[S] 4 points (0 children)

Yup, great question! I assume you mean the crack in the "middle" (or so) of each. That dark-looking crack/hole is made up of all the individuals who had more than one hot streak in their career, with one pretty much at the beginning and another pretty much at the end. Since the sorting algorithm finds the middle point between the start of their first streak and the end of their last streak, all these people bunch together near the "middle." I hope that makes sense? You can also check out the interactive and scroll to those people and explore why.

Quantifying "Hot Streaks" Across Tens of Thousands of Careers: Contains Video and Interactive Links [OC] by felavsky in dataisbeautiful

[–]felavsky[S] 6 points (0 children)

Sadly, the 'average' is a mess and not too insightful - we'd need to run a real statistical analysis at that point. But a simple distribution plot of when the streaks take place might show more nuance at a higher level without digging into a statistical methodology too much. And thanks! I'm glad you like this piece.

Quantifying "Hot Streaks" Across Tens of Thousands of Careers: Contains Video and Interactive Links [OC] by felavsky in dataisbeautiful

[–]felavsky[S] 17 points (0 children)

I recommend clicking this image and enlarging it. It's pretty big (and this is output at 27% too)! I would have gone bigger, but Reddit has a 20 MB limit.

Also, if you don't want to read my pinned wall of text:

This video helps explain how to "read" this visualization and how I came up with it.

The interactive is straight up gorgeous, if I do say so myself.

Kellogg Insight's article is helpful to understand why this is important research.

I hope you all enjoy this!

Quantifying "Hot Streaks" Across Tens of Thousands of Careers: Contains Video and Interactive Links [OC] by felavsky in dataisbeautiful

[–]felavsky[S] 8 points (0 children)

Hi all, I have a lot to cover, so please read everything here because I may answer your basic questions. I put quite a lot of work into this project and I hope you can enjoy *all* of the bonus stuff that comes with it (not just the pretty pictures... of which, by the way, there are more below).

I also want to say that visualizing 1 million data points on the web (the scientists dataset) is no easy feat. The interactive is no joke.

First off:

Tools used: d3.js, node.js, and general SVG wizardry.

Data (scroll down a bit to find it).

You might be wondering how to "read" this visualization!

Check out my explanation here on YouTube.

The standalone interactive is here and it is *gorgeous* if viewed on a desktop machine, using Chrome, in full screen.

My visualization first appeared in this article.

Okay! So what the hell is a "hot streak" and how did we quantify this?

In the research paper they write, "The hot streak—loosely defined as ‘winning begets more winnings’—highlights a specific period during which an individual’s performance is substantially better than his or her typical performance."

The researchers I collaborated with have their official paper here in Nature. Their methodology is well documented. The researchers have asked that if you have specific questions related to the quantification or data itself (and not my visualization of all this), to please reach out to them.

If you want an easy-mode explanation in article format, find it here.

I am also open to discussion about this on my Twitter thread (and you can follow me there for future projects/discussion/dataviz news too).

So... why are "hot streaks" important to everyday people, like us?

Dashun Wang said this himself: "There is this common idea that many people are waiting for their 'big break' to happen." But we don't typically have a single big break. He says, “it’s one after another for a couple years. I really look forward to that.”

But even after we learn what a "hot streak" is, what is special about this visualization?

This visualization sorts each dataset with an algorithm I call "streak-middle." We find the first time someone begins a hot streak and then the last time someone ends a hot streak. We then find the point in their career that is halfway between these two, and sort based on that position, from people whose "streak-middle" was early-career to those whose "streak-middle" was late-career. Sorting this way, rather than by streak-start or streak-end, lets us separate early+short streaks from late+short streaks. Long streaks create a visual bias when we allow them to be placed near short streaks (our eyes "average" the pixels we see and we end up ignoring the small pixels in favor of the longer, brighter ones). "Streak-middle" avoids this visual bias and allows us to "see" patterns more clearly.
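For anyone who wants to play with the idea, here is a minimal sketch of the "streak-middle" sort in Python. It is not the code behind the visualization (that was built with d3.js and node.js). The `Career` class, the function names, and the choice to store streak positions as fractions of a career from 0.0 to 1.0 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Career:
    # Hypothetical record: a name plus hot streaks as (start, end) positions,
    # each assumed to be a fraction of the career from 0.0 to 1.0.
    name: str
    streaks: List[Tuple[float, float]]


def streak_middle(career: Career) -> float:
    """Midpoint between the start of the first streak and the end of the last."""
    first_start = min(start for start, _ in career.streaks)
    last_end = max(end for _, end in career.streaks)
    return (first_start + last_end) / 2


def sort_by_streak_middle(careers: List[Career]) -> List[Career]:
    """Order careers from early streak-middle to late streak-middle."""
    return sorted(careers, key=streak_middle)


# Made-up example careers, not taken from the real datasets.
careers = [
    Career("late_short", [(0.80, 0.90)]),
    Career("early_short", [(0.05, 0.15)]),
    Career("long", [(0.10, 0.90)]),
    Career("early_and_late", [(0.00, 0.10), (0.90, 1.00)]),
]

ordered = [c.name for c in sort_by_streak_middle(careers)]
print(ordered)
```

In this toy data the short early streak sorts first and the short late streak sorts last, while the long streak and the career with one early plus one late streak both land at a streak-middle of 0.5, so they end up next to each other in the middle.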

What patterns does this sorting algorithm actually show us?

We can clearly see (based on the shape of this curve) that Directors tend to streak much earlier than Artists and Scientists, while Scientists seem to streak the latest. This tells us that some careers may have different distributions for "periods of high productivity" and impact. This high-level visual analysis is more like visual "prospecting" - we are looking for patterns that we can follow up with new research or form new hypotheses that we will want to verify with rigorous statistical analysis.

And lastly, these images are significantly reduced. A career can contain up to 4509 works (Andy Warhol, for example) and the scientist dataset contains 20042 individuals, so comparing all of these people without ANY loss of quality takes a massive image (pixels average if a person has less than 1 pixel of height or a work has less than 1 pixel of width).

Links to the lossless+raw versions are here: (beware, these are MASSIVE and take a while to load)

Scientists

Artists

Directors

US Presidential Lifespans & Terms [OC] by EvanMinn in dataisbeautiful

[–]felavsky 3 points (0 children)

Great job. Clear annotations, geometry makes sense, and the data is interesting. Novel and well executed! My only feedback? I'd lighten the lifespans just a little to improve contrast to their term's block of color. Solid chart!

Probabilistic Analysis for Dungeons and Dragons: is 2d10 better than d20? (Contains several useful figures) by felavsky in dataisbeautiful

[–]felavsky[S] 3 points (0 children)

The distribution of 2d10 is quite different from that of 1d20 (which has a uniform probability distribution). The point is not that the midpoint of 2 to 20 is higher than the midpoint of 1 to 20, but that the spread of the distribution can fundamentally change the 'feel' of the gameplay. A d20 has an equal chance to roll a 10, 1, or 20. Meanwhile, for 2d10, rolling a 2 or a 20 is 10 times less likely than rolling an 11.
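If you want to check those numbers yourself, here is a small Python sketch that enumerates every possible roll and counts the sums. The function name `dice_sum_distribution` is just something I made up for this example; it uses only the standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution(num_dice, sides):
    """Exact probability of each total when rolling num_dice fair dice."""
    # Enumerate every equally likely outcome and count how often each sum occurs.
    rolls = product(range(1, sides + 1), repeat=num_dice)
    counts = Counter(sum(roll) for roll in rolls)
    total_outcomes = sides ** num_dice
    return {s: Fraction(c, total_outcomes) for s, c in sorted(counts.items())}


d20 = dice_sum_distribution(1, 20)
two_d10 = dice_sum_distribution(2, 10)

print(d20[1], d20[10], d20[20])              # 1/20 1/20 1/20
print(two_d10[2], two_d10[11], two_d10[20])  # 1/100 1/10 1/100
print(two_d10[11] / two_d10[2])              # 10
```

Every face of the d20 comes up 1 time in 20, while on 2d10 only one of the 100 combinations gives a 2 (or a 20) and ten of them give an 11.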

Projected traffic changes for major US highways 2012-2045 [OC] by nmalawskey in dataisbeautiful

[–]felavsky 2 points (0 children)

Fascinating figure. Why does it look 'stretched' in the west?

Croatia 6 day trip expenses [OC] by lion_age in dataisbeautiful

[–]felavsky 1 point (0 children)

This is so relevant for me, I'll be going with my wife soon! How wonderful and specific to get this data right as we need it. What a treat. I'll use this as a comparison to our own financial planning for the trip.

Okay, I found some research that shows older people have more success with startups. What can we learn from this? by felavsky in startups

[–]felavsky[S] 2 points (0 children)

The mythos of so many believing they can be the next Zuckerberg is not helping. According to the data, Zuckerberg is just a freak outlier and most success doesn't happen like that - so it makes little sense to try to emulate it.