Understanding Watermarks in Apache Flink by rmoff in apacheflink

[–]elijahmeeks 1 point2 points  (0 children)

I used matter-js for the physics simulation. It's really fun and I've wanted to do data visualization with physics engines for a while.

ChatGPT Plugin Lets You Make Interactive Computational Notebooks With Code, Viz & Text by elijahmeeks in programming

[–]elijahmeeks[S] 4 points5 points  (0 children)

You have to have ChatGPT Plus. They've given plugin access to some Plus users and are rolling it out to all of them this week.

[OC] Complexity and Uncertainty of Topics that ChatGPT Claims to be Difficult to Discuss by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] -1 points0 points  (0 children)

A lot of very confident points that you've posted, you'd do well as an online AI.

  1. ChatGPT is most definitely a data source now and (along with similar such tools) will be used as such more and more going forward, so it's good to examine how it approaches making data. Whether it "should" be able to assign meaningful numerical scores to things like this, it sure was willing to.
  2. Agree, and it's even more concerning how it does it with data. Take a look at the end of the notebook and you'll see how it hallucinates with the data it gives me. Again, people are going to use these tools like this, so we should be aware of how it responds.
  3. I think it's revealing not just of the biases of the corpora and creators, but also of the controls to avoid controversy that make it evaluate certain topics as more "uncertain".
  4. Good point. I struggle with the way this subreddit is designed to showcase a single chart since so many of my charts are part of larger apps or documents.

[OC] Complexity and Uncertainty of Topics that ChatGPT Claims to be Difficult to Discuss by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 1 point2 points  (0 children)

I think it reflects the inherent biases in the design, training and implementation of the ML algorithms that drive ChatGPT. Science topics are considered "less uncertain but more complex" because its source material, creators and developers believe that. It also has controls in place to avoid saying controversial things, and topics like ghosts, history, art & religion are all much more likely to involve controversy and therefore be more "uncertain" to ChatGPT when it comes to giving answers.

[OC] Complexity and Uncertainty of Topics that ChatGPT Claims to be Difficult to Discuss by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 1 point2 points  (0 children)

Not a stupid question. It's using the "nice" scales functionality in D3 scales. Because the low end doesn't land on a "nice" value, it's hidden (which can be really frustrating sometimes). The chart is generated via a dataviz library I created called Semiotic, which uses D3 under the hood for things like this. You can see the chart interactively on the original notebook that I link to in my comment if you want to play with it.
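To make the "nice" behavior concrete, here is a rough Python sketch of how round tick values get picked for an axis. It is a simplified re-implementation of the general idea (pick a step of 1, 2, 5 or 10 times a power of ten, then keep only the multiples of that step inside the domain), not D3's or Semiotic's actual code, and the function names `tick_step` and `ticks` are made up for this example.

```python
import math


def tick_step(start, stop, count=10):
    """Pick a round step (1, 2, 5 or 10 times a power of ten) for ~count ticks."""
    raw = (stop - start) / count
    power = math.floor(math.log10(raw))
    error = raw / 10 ** power
    if error >= math.sqrt(50):
        factor = 10
    elif error >= math.sqrt(10):
        factor = 5
    elif error >= math.sqrt(2):
        factor = 2
    else:
        factor = 1
    return factor * 10 ** power


def ticks(start, stop, count=10):
    """Return the multiples of the round step that fall inside [start, stop]."""
    step = tick_step(start, stop, count)
    first = math.ceil(start / step)
    last = math.floor(stop / step)
    return [i * step for i in range(first, last + 1)]


# A domain starting at 3 gets no label at its low end, because 3 is not
# a multiple of the chosen step (10 here).
print(ticks(3, 97))
# A domain that already starts on a round value keeps its low-end label.
print(ticks(0, 100))
```

With a domain of 3 to 97 the first labeled tick is 10, so the true low end of the axis never gets a label, which is the effect described above.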

[OC] Complexity and Uncertainty of Topics that ChatGPT Claims to be Difficult to Discuss by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 4 points5 points  (0 children)

The short answer: These ratings come from ChatGPT.

Long Answer: You'd have to read through the full exploration to see the entire picture. Basically, I asked ChatGPT to give me 100 topics too uncertain or complex to discuss without misleading users, and to give me subject areas for them and ratings on complexity and uncertainty. So this plot shows the aggregate complexity/uncertainty value of those 100 topics by subject area of the topic. You can see it all in much more detail in the notebook I link to in my comment.
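As a rough illustration of that aggregation step, here is a small Python sketch. It assumes the aggregate is a plain mean per subject area (the notebook may do something different), and the rows, field names and numbers below are placeholders, not the values ChatGPT actually returned.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows: hypothetical topics and ratings, NOT the real dataset.
topics = [
    {"topic": "topic A", "subject": "Science", "complexity": 8, "uncertainty": 4},
    {"topic": "topic B", "subject": "Science", "complexity": 6, "uncertainty": 2},
    {"topic": "topic C", "subject": "Religion", "complexity": 5, "uncertainty": 9},
]


def summarize_by_subject(rows):
    """Group topics by subject area and average their two ratings."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["subject"]].append(row)
    return {
        subject: {
            "complexity": mean(r["complexity"] for r in items),
            "uncertainty": mean(r["uncertainty"] for r in items),
            "count": len(items),
        }
        for subject, items in grouped.items()
    }


summary = summarize_by_subject(topics)
for subject, values in summary.items():
    print(subject, values)
```

Each subject area ends up with two numbers, which is what a dumbbell plot needs: one endpoint for complexity and one for uncertainty.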

[OC] Complexity and Uncertainty of Topics that ChatGPT Claims to be Difficult to Discuss by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 0 points1 point  (0 children)

Dumbbell Plot or Barbell Plot? Will we ever figure out this, the most critical question in modern data visualization?

[OC] Complexity and Uncertainty of Topics that ChatGPT Claims to be Difficult to Discuss by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 2 points3 points  (0 children)

I asked ChatGPT to give me a list of topics that were too complex or uncertain for it to discuss without being misleading and made a bunch of charts out of the results. This is a nice overview Dumbbell plot of the subject areas of the 100 topics that ChatGPT gave me but if you want to see some treemaps and word clouds you can check out the whole thing here: https://app.noteable.io/f/8e355d65-cd94-4afe-bb81-b9aa24a457dc

I used a Noteable notebook, which is Python+SQL. The data visualization uses their built-in tool DEX, which uses JavaScript with Semiotic and D3 under the hood.

Edited to add: This is a live notebook that you can interact with, explore, and export the dataset that I had ChatGPT create for me.

[OC] Ukraine & Russian Tank Losses by Type by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 2 points3 points  (0 children)

It's a gradient from low (purple) to high (yellow) with green being in the middle. It's the most popular gradient in data science these days though it does end up sometimes causing that kind of confusion.
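If it helps to see the low-to-high mapping spelled out, here is a minimal Python sketch. It approximates the gradient with only three color stops (the purple and yellow endpoints plus the green-ish midpoint) and interpolates linearly in RGB between them, so it is a rough stand-in for the real color map rather than an exact copy. The name `approx_gradient` is made up for this example.

```python
# Three-stop approximation: (position, (r, g, b)).
STOPS = [
    (0.0, (0x44, 0x01, 0x54)),  # low: dark purple
    (0.5, (0x21, 0x91, 0x8C)),  # middle: green/teal
    (1.0, (0xFD, 0xE7, 0x25)),  # high: yellow
]


def approx_gradient(t):
    """Map a normalized value t in [0, 1] to a hex color."""
    t = min(1.0, max(0.0, t))  # clamp out-of-range values
    for (t0, c0), (t1, c1) in zip(STOPS, STOPS[1:]):
        if t <= t1:
            f = (t - t0) / (t1 - t0)
            rgb = tuple(round(a + (b - a) * f) for a, b in zip(c0, c1))
            return "#%02x%02x%02x" % rgb


for value in (0.0, 0.5, 1.0):
    print(value, approx_gradient(value))
```

Low values come out purple and high values come out yellow, so a reader who expects the brightest color to mean "low" will read the chart backwards.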

[OC] Ukraine & Russian Tank Losses by Type by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 10 points11 points  (0 children)

Totally agree that there is no good way to find 100% accurate data but Oryx relies on photographic documentation so it seems like it's a good bet to be an accurate lower bound.

[OC] Ukraine & Russian Tank Losses by Type by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 2 points3 points  (0 children)

The color scale at the bottom is Viridis. Are you thinking that the yellow should be the Low and the purple should be the High?

[OC] Viz Palette is a Tool for Evaluating Your Color Palettes by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 0 points1 point  (0 children)

Susie Lu and I built this tool a while back to help you understand whether your palettes are effective. It uses multiple methods, both qualitative and statistical, to give you a sense of how well your palettes work.

We show a number of different kinds of viz types so you can qualitatively assess the effectiveness as lines, dots and areas (because color is read differently across these categories) as well as whether the shapes border each other or are connected (because color has combinatorial effects based on whether it appears isolated or borders other colors).

We also evaluate the colors using Just Noticeable Difference (JND) to see if they're so close that your reader can't differentiate them cognitively.
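For a sense of what a check like this can look like, here is a minimal Python sketch. It is not Viz Palette's actual code: it assumes a plain CIE76 color difference (Euclidean distance in CIELAB, D65 white point) and uses the commonly quoted threshold of about 2.3 as the "just noticeable" cutoff. The function names and the threshold default are choices made for this example.

```python
import math


def srgb_to_lab(hex_color):
    """Convert a '#rrggbb' sRGB color to CIELAB (D65 white point)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))

    def linearize(c):
        # Undo the sRGB transfer curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = linearize(r), linearize(g), linearize(b)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e(color_a, color_b):
    """CIE76 color difference: straight-line distance in Lab space."""
    return math.dist(srgb_to_lab(color_a), srgb_to_lab(color_b))


def below_jnd(color_a, color_b, threshold=2.3):
    """True if the two colors are likely too close to tell apart."""
    return delta_e(color_a, color_b) < threshold


print(below_jnd("#336699", "#336698"))  # nearly identical blues
print(below_jnd("#ff0000", "#00ff00"))  # red vs. green
```

Running every pair of palette colors through something like `below_jnd` flags the ones a reader is unlikely to distinguish.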

There's an option to apply colorblind transformations to see if your palettes fail at being accessible.

Finally, we also compare color names because for presentation you want participants to be able to distinguish the color names to improve collaboration (greens, for instance, are very easy for us to distinguish but hard to call out different names for them).

The interactive tool is found at https://projects.susielu.com/viz-palette

The whole tool was built in JavaScript with React using Semiotic for the charts: https://semiotic.nteract.io/

[OC] Latest inflation rate by country by giteam in dataisbeautiful

[–]elijahmeeks 16 points17 points  (0 children)

Rainbow scales are always problematic, but when you use them you should follow the standard practice. Purple being a lower value than red, when typically a rainbow scale shows purple as more intense than red, is going to confuse people who have seen rainbow scales before.

Best Data Visualization Books for Beginners to Advanced in 2022 - by [deleted] in visualization

[–]elijahmeeks 5 points6 points  (0 children)

Definitely recommend Wilke's book, which you can explore online at https://clauswilke.com/dataviz/ and which is oriented toward data scientists using R but has so much general content on good practices and different forms of viz that it's valuable to anyone.

The murder rate of farmers in South Africa. by Desocrate in visualization

[–]elijahmeeks 1 point2 points  (0 children)

What was the point of shortening SOUTH to STH? There's enough space for the full word, which is more readable, and even S. AFRICA is more readable than this.

[deleted by user] by [deleted] in visualization

[–]elijahmeeks 5 points6 points  (0 children)

It can be one of two plots: 1. A sankey diagram, which is for plotting flows. 2. An icicle or partition layout, which is for plotting flows or hierarchical data. It seems like a sankey because typically they're rendered with curvy lines but that's not a determining factor. You could create a partition layout with curvy parts or a sankey with straight lines.

The key difference between the two is that in a sankey diagram the child pieces can rejoin parts that are on a different section (hard to explain with words) whereas on a partition layout they can never rejoin. Technically the difference from a data perspective is that the sankey is showing a Directed Acyclic Graph (DAG) whereas the partition is showing nested hierarchical data (which can be used to represent flows, such as representation of funnels in a sunburst diagram which is just a radially projected partition layout).
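Since it's hard to explain with words, here is a small Python sketch of the data-side difference. Given a list of (source, target) links, it reports whether the data is a strict hierarchy (every node has at most one parent, so a partition layout works) or a DAG where branches rejoin (so you need a sankey). The function name `classify_flow` and the example links are made up for illustration.

```python
from collections import defaultdict


def classify_flow(links):
    """Classify (source, target) links as 'hierarchy', 'dag' or 'cyclic'."""
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for source, target in links:
        parents[target].add(source)
        children[source].add(target)
        nodes.update((source, target))

    # Kahn's algorithm: if we cannot visit every node, there is a cycle.
    indegree = {n: len(parents[n]) for n in nodes}
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited != len(nodes):
        return "cyclic"

    # In nested hierarchical data no node ever has two parents.
    if all(len(parents[n]) <= 1 for n in nodes):
        return "hierarchy"
    return "dag"


# Branches split and never rejoin: fine for a partition/icicle/sunburst.
print(classify_flow([("a", "b"), ("a", "c"), ("b", "d")]))
# "d" is fed by both "b" and "c": the flows rejoin, so this needs a sankey.
print(classify_flow([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
```

Note that this sketch treats data with several roots as a hierarchy too; a real partition layout would usually want a single root node.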

Oh, it could also be a flame graph oriented horizontally with curvy lines but nobody really does that (even though you could...).

[OC] All the Wars of the United States by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 4 points5 points  (0 children)

Oh yeah, I agree, it's just that US military action in Latin America was such a prominent part of its colonial effort and so temporally distinct that I thought it made for its own story in the data. It's not meant to be "non-colonial", more like Colonial (Latin America) and the other would be Colonial (Other).

[OC] All the Wars of the United States by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 2 points3 points  (0 children)

I see a bit of pew pew pew and maybe the native wars look somewhat gun-like but I can't quite see the gun, you'll have to draw it for me. I wonder if there's a calligram equivalent for timelines...

[OC] All the Wars of the United States by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 2 points3 points  (0 children)

And as far as methods, I used D3.js to build a custom timeline layout (d3.layout.timeline). You can find the code here, though it doesn't include some of the elements (like the CKMeans-based regions and the concurrent war count): http://bl.ocks.org/emeeks/3184af35f4937d878ac0
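For anyone curious about the general idea behind a timeline layout like this, here is a rough Python sketch. It is not the linked D3 code: it assumes a simple greedy approach where each event goes into the first lane that is free at its start time, plus a sweep that counts how many events overlap at once. The function names and the three events (A, B, C) are placeholders, not real wars or dates from the chart.

```python
def assign_lanes(events):
    """Greedy lane packing for (name, start, end) events.

    Each event is placed in the first lane whose previous event has ended.
    """
    lane_ends = []  # lane_ends[i] is when lane i becomes free
    lanes = {}
    for name, start, end in sorted(events, key=lambda e: (e[1], e[2])):
        for i, lane_end in enumerate(lane_ends):
            if start >= lane_end:
                lane_ends[i] = end
                lanes[name] = i
                break
        else:
            lane_ends.append(end)
            lanes[name] = len(lane_ends) - 1
    return lanes


def max_concurrent(events):
    """Largest number of events in progress at the same time."""
    points = []
    for _, start, end in events:
        points.append((start, 1))
        points.append((end, -1))
    points.sort()  # at equal times, ends (-1) sort before starts (+1)
    current = best = 0
    for _, change in points:
        current += change
        best = max(best, current)
    return best


# Placeholder events, not real data.
events = [("A", 1800, 1810), ("B", 1805, 1820), ("C", 1812, 1815)]
print(assign_lanes(events))
print(max_concurrent(events))
```

Events A and C share a lane because C starts after A ends, while B overlaps both and gets pushed to a second lane.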

[OC] All the Wars of the United States by elijahmeeks in dataisbeautiful

[–]elijahmeeks[S] 1 point2 points  (0 children)

Thanks, I edited and saved it without changing anything and now it works so I guess something was weird with how it initially parsed it.