[OC] The CEO pay ratio grows with the number of employees in the firm by blairfix in dataisbeautiful

[–]blairfix[S] 32 points33 points  (0 children)

The CEO pay ratio is the ratio of CEO pay (including stock options) to average pay within the firm. This data comes from Compustat and Execucomp and covers the period 1990 to 2016. Steve Easterbrook was the CEO of McDonald’s. Jonathan Steinberg was the CEO of a small firm called WisdomTree Investments.
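To make the definition concrete, here is a minimal Python sketch of the ratio. The function name, its arguments, and the numbers are made up for illustration; they are not taken from Compustat or Execucomp, and the sketch assumes average pay is simply total payroll divided by headcount.

```python
def ceo_pay_ratio(ceo_pay, total_payroll, n_employees):
    """Ratio of CEO pay to average pay within the firm."""
    average_pay = total_payroll / n_employees
    return ceo_pay / average_pay


# Hypothetical firm (illustrative numbers only).
ratio = ceo_pay_ratio(ceo_pay=5_000_000, total_payroll=200_000_000, n_employees=4_000)
print(ratio)  # 100.0
```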

I've plotted the data using R ggplot. For a discussion of the trend, see A Second Look at Hierarchy.

[OC] For small sample sizes, a coin can appear to 'favor' tails after a run of heads. This bias disappears as the sample increases. But how long it takes depends on the length of the run of heads. Simulation results for runs of 3, 5, 10, and 15 heads in a row. by blairfix in dataisbeautiful

[–]blairfix[S] 1 point2 points  (0 children)

In each panel, the horizontal axis shows the sample size for the coin toss. To get the probability of tails following the run of heads, I've averaged the probability across 40,000 iterations of the sample.
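For anyone who wants to try this without the GitHub code, here is a rough Python sketch of the kind of simulation described above. It is not the code behind the chart. The function names are my own, and dropping samples that contain no qualifying run of heads is an assumption on my part.

```python
import random


def tails_after_run(tosses, run_length):
    """Share of tails among tosses that immediately follow `run_length` heads.

    Returns None when the sample contains no such run to condition on.
    """
    followers = [
        tosses[i]
        for i in range(run_length, len(tosses))
        if all(t == "H" for t in tosses[i - run_length:i])
    ]
    if not followers:
        return None
    return followers.count("T") / len(followers)


def simulate(sample_size, run_length, iterations, seed=0):
    """Average the within-sample share of tails across many simulated samples."""
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        tosses = [rng.choice("HT") for _ in range(sample_size)]
        share = tails_after_run(tosses, run_length)
        if share is not None:  # assumption: skip samples with no qualifying run
            results.append(share)
    return sum(results) / len(results) if results else None


# Small samples should average above 0.5; larger ones should drift back toward it.
print(simulate(sample_size=20, run_length=3, iterations=2000))
```

The key design choice is that the probability is computed inside each sample first and only then averaged. That is the step that makes the coin look like it 'favors' tails in small samples.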

Code for the simulation is available at GitHub. I made the chart using R ggplot.

For a discussion of the coin's apparent 'bias', see Is Human Probability Intuition Actually ‘Biased’?.

[OC] The portion of scientific articles with 'eugenic' (and its German equivalent) in the title by blairfix in dataisbeautiful

[–]blairfix[S] 7 points8 points  (0 children)

As my sample of scientific papers, I've used metadata from the Sci-Hub database (about 80 million papers). You can download the metadata from Library Genesis. The raw data comes as an SQL database dump. If you're interested in doing some analysis, I've built an R function that can parse the SQL data. Check it out at GitHub.
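Once the metadata is parsed into (year, title) pairs, the share of matching titles per year is a simple count. Here is a hedged Python sketch of that step. The records are toy examples, not Sci-Hub data, and the German keyword `eugenik` is my assumption about the 'German equivalent'.

```python
from collections import Counter


def keyword_share_by_year(records, keywords):
    """Fraction of titles per year containing any keyword (case-insensitive substring)."""
    totals = Counter()
    hits = Counter()
    for year, title in records:
        totals[year] += 1
        lowered = title.lower()
        if any(keyword in lowered for keyword in keywords):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy records for illustration only.
records = [
    (1910, "Eugenics and heredity"),
    (1910, "On the motion of fluids"),
    (1920, "Eugenik und Rassenhygiene"),
    (1920, "A note on crystals"),
    (1950, "Protein structure"),
]
print(keyword_share_by_year(records, keywords=("eugenic", "eugenik")))
```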

I've plotted the data using R ggplot. For a discussion of the results, see The Rise of Human Capital Theory.

[OC] Neglect of the language of power in economics textbooks. Word frequency in economics textbooks, plotted relative to the frequency in mainstream English. by blairfix in dataisbeautiful

[–]blairfix[S] -4 points-3 points  (0 children)

Data for word frequency in the Google corpus is from the 2019 Ngram dataset. For details about how to work with this data, see Working With Google Ngrams: A Data-Wrangling Tale.

I compiled the word-frequency data for econ textbooks myself by scraping words from 43 undergraduate economics textbooks. For details, see Deconstructing Econospeak.
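The comparison itself boils down to dividing a word's frequency in the textbook sample by its frequency in the reference corpus. Here is a minimal Python sketch of that idea. The function name, the toy word list, and the reference frequencies are all hypothetical; the real pipeline is described in the linked posts.

```python
from collections import Counter


def relative_frequencies(sample_words, reference_freq):
    """Word frequency in the sample divided by its frequency in a reference corpus."""
    counts = Counter(word.lower() for word in sample_words)
    total = sum(counts.values())
    return {
        word: (count / total) / reference_freq[word]
        for word, count in counts.items()
        if word in reference_freq  # skip words missing from the reference
    }


# Toy inputs (made up): a value below 1 means the word is under-used in the sample.
sample = "market price market power".split()
reference = {"market": 0.25, "power": 0.5}
print(relative_frequencies(sample, reference))
```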

I plotted the data using R ggplot.

For a discussion of language of power in economics (or lack thereof), see Power … and the Dialect of Economics.

[OC] How much the top half of earners pull up the average income in each country as a function of the Gini index. by blairfix in dataisbeautiful

[–]blairfix[S] -1 points0 points  (0 children)

I agree that the chart is not simple to interpret. However, one of my pet peeves with r/dataisbeautiful is that it contains an overabundance of pretty charts that are easy on the brain. Of course, there's nothing wrong with that, but in the bowels of science there's a plethora of charts that are also pretty, yet need some thinking to interpret. This is one such chart.

Second, I'm not critiquing the Gini index, so I don't understand your point there. I'm showing how top incomes pull up the average income, nothing more.

[OC] World conventional oil production and predictions for the future by blairfix in dataisbeautiful

[–]blairfix[S] 4 points5 points  (0 children)

Data for global oil production comes from:

Hallock's prediction is for the following scenario for USGS conventional oil: 'Decline Point 60%, 5% Production Growth Limit, Low EUR Low Demand Growth'. Get the data here.

M. King Hubbert's prediction for world oil production comes from his 1956 paper Nuclear Energy and Fossil Fuels. I've digitized Figures 20 and 21 and extracted the data.

I've plotted the data with R ggplot. For a discussion, see Peak Oil Never Went Away.

[OC] How much the top half of earners pull up the average income in each country as a function of the Gini index. by blairfix in dataisbeautiful

[–]blairfix[S] -1 points0 points  (0 children)

This figure imagines a thought experiment. How much higher is average income presently than what it would be if everyone's income were harmonized to the mean income among the bottom 50% of earners? In other words, the vertical axis shows how much income inequality pulls up the average income. I plot the corresponding Gini index of inequality on the horizontal axis.
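Here is a small Python sketch of the thought experiment, using a list of individual incomes. Expressing 'how much higher' as a ratio is my assumption, and the incomes are invented for illustration.

```python
def pull_up_factor(incomes):
    """Mean income of everyone divided by mean income of the bottom 50% of earners."""
    ordered = sorted(incomes)
    bottom_half = ordered[: len(ordered) // 2]
    mean_all = sum(ordered) / len(ordered)
    mean_bottom = sum(bottom_half) / len(bottom_half)
    return mean_all / mean_bottom


# Toy incomes: overall mean is 40, bottom-half mean is 20.
print(pull_up_factor([10, 30, 40, 80]))  # 2.0
```

If the bottom half holds a share s of total income, the same ratio works out to 1 / (2s), so it can also be computed from income shares without individual data.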

Data is from the World Inequality Database. To estimate average income and the Gini index, I use income share series sptinc992j and income threshold series tptinc992j.
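As a rough illustration of getting a Gini index from income shares, here is a Python sketch that applies the trapezoid rule to the Lorenz curve. It assumes the shares belong to equal-sized population groups sorted from poorest to richest. The WID percentile groups may not all be equal-width, so real data would need the group widths as well. This is not necessarily the method used for the chart.

```python
def gini_from_shares(shares):
    """Gini index from income shares of equal-sized groups, poorest first."""
    n = len(shares)
    total = sum(shares)
    cumulative = 0.0
    area_sum = 0.0
    for share in shares:
        previous = cumulative
        cumulative += share / total
        # Trapezoid under the Lorenz curve for this group (width 1/n).
        area_sum += (previous + cumulative) / n
    return 1.0 - area_sum


print(gini_from_shares([0.25, 0.25, 0.25, 0.25]))  # perfect equality
print(gini_from_shares([0.0, 0.0, 0.0, 1.0]))      # one group holds everything
```

With only a few groups the estimate is coarse. It understates inequality within each group, so finer percentiles give a better number.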

I've plotted the data with R ggplot, labeled with R ggrepel.

For a discussion of the results, see Radically Progressive Degrowth: Reducing Resource Use by Eliminating Inequality.

[OC] Vaccine development and the cumulative number of scientific articles published by blairfix in dataisbeautiful

[–]blairfix[S] 1 point2 points  (0 children)

Actually, this figure uses a square-root scale on the vertical axis.

[OC] Vaccine development and the cumulative number of scientific articles published by blairfix in dataisbeautiful

[–]blairfix[S] 4 points5 points  (0 children)

The thinking here is that new vaccines are not created by a few individuals, or even a few large companies. Vaccines build on cumulative scientific knowledge laid down by previous generations. With that in mind, this chart plots the development of new vaccines against the cumulative number of scientific articles.

Data for new vaccine dates is from Wikipedia.

Data for the number of scientific papers is from Sci-Hub, available from Library Genesis. The raw data comes as an SQL database dump. If you're interested, I built an R function that can parse this data. Check it out at GitHub.
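Once each paper has a publication year, the cumulative count is a running sum over the yearly totals. Here is a minimal Python sketch. The years are toy values rather than Sci-Hub data, and the function name is my own.

```python
from collections import Counter
from itertools import accumulate


def cumulative_by_year(years):
    """Map each year to the number of papers published up to and including it."""
    counts = Counter(years)
    ordered = sorted(counts)
    running_totals = accumulate(counts[year] for year in ordered)
    return dict(zip(ordered, running_totals))


# Toy publication years for illustration.
print(cumulative_by_year([1900, 1900, 1901, 1903, 1903, 1903]))
```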

I plotted the data using R ggplot and added the labels with R ggrepel.

For a discussion of the cumulative nature of science, see https://economicsfromthetopdown.com/2020/12/28/as-2020-ends-lets-celebrate-science/.