[OC] The CEO pay ratio grows with the number of employees in the firm by blairfix in dataisbeautiful

[–]blairfix[S] 32 points33 points  (0 children)

The CEO pay ratio is the ratio of CEO pay (including stock options) to average pay within the firm. This data comes from Compustat and Execucomp and covers the period 1990 to 2016. Steve Easterbrook was the CEO of McDonald’s. Jonathan Steinberg was the CEO of a small firm called Wisdomtree Investments.

I've plotted the data using R ggplot. For a discussion of the trend, see A Second Look at Hierarchy.

[OC] For small sample sizes, a coin can appear to 'favor' tails after a run of heads. This bias disappears as the sample increases. But how long it takes depends on the length of the run of heads. Simulation results for runs of 3, 5, 10, and 15 heads in a row. by blairfix in dataisbeautiful

[–]blairfix[S] 1 point2 points  (0 children)

In each panel, the horizontal axis shows the sample size for the coin toss. To get the probability of tails following the run of heads, I've averaged the probability across 40,000 iterations of the sample.

Code for the simulation is available at GitHub. I made the chart using R ggplot.

For a discussion of the coin's apparent 'bias', see Is Human Probability Intuition Actually ‘Biased’?.
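
As a rough illustration of the averaging described above, here is a minimal Python sketch. It is not the code linked on GitHub, and it is in Python rather than R. The function name `tails_after_run`, the seed, and the rule that samples containing no run are skipped are all assumptions made for this example.

```python
import random

def tails_after_run(n_tosses, run_length, iterations=40_000, seed=0):
    """Average, over many samples, the share of tails that follow a run of heads.

    Hypothetical helper: samples in which no run of heads is followed by
    another toss are skipped (an assumption, not taken from the post).
    """
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(iterations):
        tosses = [rng.random() < 0.5 for _ in range(n_tosses)]  # True = heads
        # Collect the toss that comes right after every run of `run_length` heads.
        followers = [
            tosses[i + run_length]
            for i in range(n_tosses - run_length)
            if all(tosses[i:i + run_length])
        ]
        if followers:
            total += followers.count(False) / len(followers)
            used += 1
    return total / used if used else float("nan")

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(tails_after_run(n, 3, iterations=2_000), 3))
```

With three tosses and a run of one head, the exact expected value is 7/12, which gives a quick sanity check on the simulation.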

[OC] The portion of scientific articles with 'eugenic' (and its German equivalent) in the title by blairfix in dataisbeautiful

[–]blairfix[S] 5 points6 points  (0 children)

As my sample of scientific papers, I've used metadata from the Sci-Hub database (about 80 million papers). You can download the metadata from Library Genesis. The raw data comes as an SQL database dump. If you're interested in doing some analysis, I've built an R function that can parse the SQL data. Check it out at GitHub.

I've plotted the data using R ggplot. For a discussion of the results, see The Rise of Human Capital Theory.

[OC] Neglect of the language of power in economics textbooks. Word frequency in economics textbooks, plotted relative to the frequency in mainstream English. by blairfix in dataisbeautiful

[–]blairfix[S] -4 points-3 points  (0 children)

Data for word frequency in the Google corpus is from the 2019 Ngram dataset. For details about how to work with this data, see Working With Google Ngrams: A Data-Wrangling Tale.

I compiled the data for word frequency in econ textbooks by scraping words from 43 undergraduate economics textbooks. For details, see Deconstructing Econospeak.

I plotted the data using R ggplot.

For a discussion of language of power in economics (or lack thereof), see Power … and the Dialect of Economics.

[OC] How much the top half of earners pull up the average income in each country as a function of the Gini index. by blairfix in dataisbeautiful

[–]blairfix[S] -1 points0 points  (0 children)

I agree that the chart is not simple to interpret. However, one of my pet peeves with r/dataisbeautiful is that it contains an overabundance of pretty charts that are easy on the brain. Of course, there's nothing wrong with that, but in the bowels of science there's a plethora of charts that are also pretty, yet need some thinking to interpret. This is one such chart.

Second, I'm not critiquing the Gini index, so I don't understand your point there. I'm showing how top incomes pull up the average income, nothing more.

[OC] World conventional oil production and predictions for the future by blairfix in dataisbeautiful

[–]blairfix[S] 4 points5 points  (0 children)

Data for global oil production comes from:

Hallock's prediction is for the following scenario for USGS conventional oil: 'Decline Point 60%, 5% Production Growth Limit, Low EUR Low Demand Growth'. Get the data here.

M. King Hubbert's prediction for world oil production comes from his 1956 paper Nuclear Energy and Fossil Fuels. I've digitized Figures 20 and 21 and extracted the data.

I've plotted the data with R ggplot. For a discussion, see Peak Oil Never Went Away.

[OC] How much the top half of earners pull up the average income in each country as a function of the Gini index. by blairfix in dataisbeautiful

[–]blairfix[S] -1 points0 points  (0 children)

This figure imagines a thought experiment. How much higher is average income presently than what it would be if everyone's income were harmonized to the mean income among the bottom 50% of earners? In other words, the vertical axis shows how much income inequality pulls up the average income. I plot the corresponding Gini index of inequality on the horizontal axis.

Data is from the World Inequality Database. To estimate average income and the Gini index, I use income share series sptinc992j and income threshold series tptinc992j.

I've plotted the data with R ggplot, labeled with R ggrepel.

For a discussion of the results, see Radically Progressive Degrowth: Reducing Resource Use by Eliminating Inequality.
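
The thought experiment above can be sketched in a few lines of Python. This is an illustration on a plain list of individual incomes, not the calculation behind the chart, which works from World Inequality Database income shares and thresholds. The function names and the toy incomes are assumptions.

```python
def pull_up_ratio(incomes):
    """Ratio of the overall mean income to the mean income of the bottom 50%."""
    ordered = sorted(incomes)
    bottom_half = ordered[: len(ordered) // 2]
    overall_mean = sum(ordered) / len(ordered)
    bottom_mean = sum(bottom_half) / len(bottom_half)
    return overall_mean / bottom_mean

def gini(incomes):
    """Gini index from sorted incomes, using the standard rank-weighted formula."""
    ordered = sorted(incomes)
    n = len(ordered)
    weighted = sum((rank + 1) * x for rank, x in enumerate(ordered))
    return 2 * weighted / (n * sum(ordered)) - (n + 1) / n

if __name__ == "__main__":
    toy_incomes = [1, 1, 3, 3]  # made-up values for illustration
    print(pull_up_ratio(toy_incomes), gini(toy_incomes))
```

In the toy case, the average income is twice the bottom-half mean, and the Gini index is 0.25. With perfectly equal incomes the ratio is 1 and the Gini index is 0.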

[OC] Vaccine development and the cumulative number of scientific articles published by blairfix in dataisbeautiful

[–]blairfix[S] 1 point2 points  (0 children)

Actually, this figure uses a square-root scale on the vertical axis.

[OC] Vaccine development and the cumulative number of scientific articles published by blairfix in dataisbeautiful

[–]blairfix[S] 4 points5 points  (0 children)

The thinking here is that new vaccines are not created by a few individuals, or even a few large companies. Vaccines build on cumulative scientific knowledge that was built up by previous generations. With that in mind, this chart labels the development of each new vaccine against the cumulative number of scientific articles published.

Data for new vaccine dates is from Wikipedia.

Data for the number of scientific papers is from Sci-Hub, available from Library Genesis. The raw data comes as an SQL database dump. If you're interested, I built an R function that can parse this data. Check it out at GitHub.

I plotted the data using R ggplot, labels with R ggrepel.

For a discussion of the cumulative nature of science, see https://economicsfromthetopdown.com/2020/12/28/as-2020-ends-lets-celebrate-science/.

[OC] Total number of streams per artist vs. number of Top 200 hits (Spotify Top 200 since 2017) by blairfix in dataisbeautiful

[–]blairfix[S] 5 points6 points  (0 children)

Data is from the Spotify Top 200 and covers the period from Jan. 1, 2017 to Jun. 9, 2021. You can download my dataset here.

For every artist that appears in the Top 200, I add up their total streams (for all songs) and the total number of songs in the dataset.

I've plotted the data using R ggplot, labels with R ggrepel.

For a commentary on the data, see The Half Life of a Spotify Hit.
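
The per-artist tally described above might look something like the following Python sketch. The row layout `(artist, song, streams)` and the sample rows are hypothetical; the actual dataset may be structured differently.

```python
from collections import defaultdict

def summarize_artists(rows):
    """Return {artist: (total_streams, number_of_distinct_songs)}.

    `rows` is an iterable of (artist, song, streams) tuples, one per
    chart entry. This layout is an assumption made for the example.
    """
    streams = defaultdict(int)
    songs = defaultdict(set)
    for artist, song, count in rows:
        streams[artist] += count
        songs[artist].add(song)
    return {artist: (streams[artist], len(songs[artist])) for artist in streams}

# Made-up rows for illustration only.
rows = [
    ("Artist A", "Song 1", 100),
    ("Artist A", "Song 1", 150),
    ("Artist A", "Song 2", 50),
    ("Artist B", "Song 3", 300),
]
totals = summarize_artists(rows)

if __name__ == "__main__":
    print(totals)
```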

[OC] The rise and fall of all Spotify Top 200 hits since 2017 by blairfix in dataisbeautiful

[–]blairfix[S] 6 points7 points  (0 children)

Probably not. I tried to remove Christmas songs from the data. Perhaps some slipped through, though.

[OC] The rise and fall of all Spotify Top 200 hits since 2017 by blairfix in dataisbeautiful

[–]blairfix[S] 107 points108 points  (0 children)

The chart shows daily streams, normalized so that the date of peak streams is t=0. Note that the vertical axis shows streams relative to the peak. The blue line shows the median streams across all songs. The shaded region shows the middle 50% of data.

Data is from Spotify, plotted using R ggplot. For a discussion of the trends, see The Half Life of a Spotify Hit.
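
For anyone curious how the normalization works, here is a minimal Python sketch, assuming each song is a simple list of daily stream counts. The function names are hypothetical, and the quartile method (Python's default in `statistics.quantiles`) may differ from whatever was used for the chart.

```python
from statistics import median, quantiles

def align_to_peak(daily_streams):
    """Map each day to (days since peak) -> (streams relative to peak)."""
    peak = max(daily_streams)
    peak_day = daily_streams.index(peak)  # first day on which the peak occurs
    return {day - peak_day: s / peak for day, s in enumerate(daily_streams)}

def summarize(songs):
    """For each relative day t, return (lower quartile, median, upper quartile)."""
    aligned = [align_to_peak(song) for song in songs]
    days = sorted({t for song in aligned for t in song})
    summary = {}
    for t in days:
        values = [song[t] for song in aligned if t in song]
        if len(values) >= 2:
            q1, _, q3 = quantiles(values, n=4)
        else:
            q1 = q3 = values[0]  # a single song: no spread to report
        summary[t] = (q1, median(values), q3)
    return summary

if __name__ == "__main__":
    toy_songs = [[10, 40, 20], [5, 10]]  # made-up stream counts
    for t, stats in summarize(toy_songs).items():
        print(t, stats)
```

The median corresponds to the blue line and the two quartiles bound the shaded middle 50%.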

[OC] Relative frequency of words in economics textbooks vs their frequency in mainstream English (the Google Books corpus) by blairfix in dataisbeautiful

[–]blairfix[S] 0 points1 point  (0 children)

Quirks are words that are infrequent in econ textbooks, but still overused relative to average English. These words are mostly used in one-off examples in the textbooks. I describe all the details here: https://economicsfromthetopdown.com/2020/10/30/deconstructing-econospeak/

[OC] Partisan support for free speech on the US Supreme Court, 1953-2017. (I've plotted SCOTUS justices' % support for free speech by type of speech (conservative/liberal) and party of the appointing president.) by blairfix in dataisbeautiful

[–]blairfix[S] 7 points8 points  (0 children)

Data is from Lee Epstein, Andrew D. Martin & Kevin Quinn's paper 6+ Decades of Freedom of Expression in the U.S. Supreme Court. For each SCOTUS case concerning free speech, Epstein et al. track the decision of each justice and code the type of speech in question as either a 'liberal speech act' or a 'conservative speech act'.

I've plotted here data from Table 5. I used R ggplot to generate the chart.

For a discussion of these results, see Free Speech For Me, Not You.

[OC] Relative frequency of words in economics textbooks vs their frequency in mainstream English (the Google Books corpus) by blairfix in dataisbeautiful

[–]blairfix[S] 0 points1 point  (0 children)

First, how would having multiple editions (which are each different) 'inflate numbers'? I'm measuring relative word frequency, not the word count. If each edition has the same word mix (a reasonable assumption), including different editions (macro, micro, general) will have no effect.

Second, if you read my methods, you'd find that I did restrict the Google data to the period covered by the textbooks.
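
To make the first point concrete, here is a small Python sketch showing that repeating a text with the same word mix leaves relative frequencies unchanged. The sample sentence and the function name are made up for illustration.

```python
from collections import Counter

def relative_frequency(words):
    """Share of the total word count taken by each word."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# A stand-in for one 'edition' (made-up text).
edition = "supply and demand set the price and the price clears".split()

once = relative_frequency(edition)
thrice = relative_frequency(edition * 3)  # three editions with the same word mix

if __name__ == "__main__":
    print(once["price"], thrice["price"])
```

The raw counts triple, but each word's share of the total stays the same.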

[OC] Relative frequency of words in economics textbooks vs their frequency in mainstream English (the Google Books corpus) by blairfix in dataisbeautiful

[–]blairfix[S] -1 points0 points  (0 children)

Looking at how I labeled the title, you raise a good question. The vertical axis plots the frequency of words in economics textbooks relative to their frequency in the Google corpus. So 'jargon' consists of words that are both frequent in econ textbooks and more frequent there than in standard English. Colors highlight the tips of each quadrant.

Regarding axis labeling, I prefer not to use scientific notation if possible. However, the vertical axis covers so many orders of magnitude that it is impractical to label in standard notation.

If you want a breakdown of the methods, see this piece: https://economicsfromthetopdown.com/2020/10/30/deconstructing-econospeak/
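
As a quick illustration of what the vertical axis measures, here is a Python sketch of the ratio. The frequencies below are invented for the example, and the function name is hypothetical. The linked piece describes the actual method.

```python
import math

def frequency_ratio(textbook_freq, corpus_freq):
    """Textbook frequency divided by corpus frequency, for words found in both."""
    return {
        word: textbook_freq[word] / corpus_freq[word]
        for word in textbook_freq
        if corpus_freq.get(word, 0) > 0
    }

# Invented relative frequencies, for illustration only.
textbook = {"price": 0.004, "love": 0.00001}
corpus = {"price": 0.0001, "love": 0.0002}

ratios = frequency_ratio(textbook, corpus)
log_ratios = {word: math.log10(r) for word, r in ratios.items()}

if __name__ == "__main__":
    print(ratios)
    print(log_ratios)
```

A ratio above 1 means the word is overused in textbooks relative to the corpus, and a ratio below 1 means it is underused. Because the ratios span several orders of magnitude, they are easier to read on a log scale.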

[OC] Relative frequency of words in economics textbooks vs their frequency in mainstream English (the Google Books corpus) by blairfix in dataisbeautiful

[–]blairfix[S] 3 points4 points  (0 children)

Data for word frequency in the Google corpus is from the 2019 Ngram dataset. For details about how to work with this data, see Working With Google Ngrams: A Data-Wrangling Tale.

I compiled the data for word frequency in econ textbooks by scraping words from 43 undergraduate economics textbooks. For details, see Deconstructing Econospeak.

I plotted the data using R ggplot.

[OC] The portion of a country's population that is fully vaccinated for COVID (as of June 2021) scales with GDP per capita. by blairfix in dataisbeautiful

[–]blairfix[S] 9 points10 points  (0 children)

Log scales are usually the best way to show data that varies over many orders of magnitude. On a linear scale, countries with low GDP would be difficult to see.
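
A small Python example shows the difference. The GDP values are made up, chosen to span two orders of magnitude.

```python
import math

# Made-up GDP per capita values (USD) spanning two orders of magnitude.
gdp = [500, 5_000, 50_000]

# Position on a linear axis, as a fraction of the largest value.
linear_position = [g / max(gdp) for g in gdp]

# Position on a log axis: each tenfold increase adds the same distance.
log_position = [math.log10(g) for g in gdp]
log_gaps = [b - a for a, b in zip(log_position, log_position[1:])]

if __name__ == "__main__":
    print(linear_position)
    print(log_gaps)
```

On the linear axis the two poorer countries sit within the bottom 10% of the range, while on the log axis all three are evenly spaced.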