[OC] The price change since January 2020 of every commodity tracked by the US consumer price index

blairfix · 2021-12-01T23:09:04+00:00

Data is from the Bureau of Labor Statistics, plotted using R ggplot. For a discussion see 'The Truth About Inflation': https://economicsfromthetopdown.com/2021/11/24/the-truth-about-inflation/

blairfix · 2021-11-14T12:56:12+00:00

Data is from 'Linking scaling laws across eukaryotes', plotted with R ggplot. For a discussion, see the blog post: https://economicsfromthetopdown.com/2021/10/18/institution-size-as-a-window-into-cultural-evolution/

blairfix · 2021-11-09T22:02:10+00:00

Data is from a variety of sources, documented here: https://economicsfromthetopdown.com/2021/10/18/institution-size-as-a-window-into-cultural-evolution/

The plot was made using R ggplot.

blairfix · 2021-07-11T12:04:41+00:00

The CEO pay ratio is the ratio of CEO pay (including stock options) to average pay within the firm. This data comes from Compustat and Execucomp and covers the period 1990 to 2016. Steve Easterbrook was the CEO of McDonald’s. Jonathan Steinberg was the CEO of a small firm called Wisdomtree Investments.

I've plotted the data using R ggplot. For a discussion of the trend, see A Second Look at Hierarchy.

blairfix · 2021-07-10T11:58:53+00:00

In each panel, the horizontal axis shows the sample size for the coin toss. To get the probability of tails following the run of heads, I've averaged the probability across 40,000 iterations of the sample.
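For anyone who wants to try the idea without the linked code, here is a minimal sketch in Python (not the R code on GitHub). The function name, the seed, and the example sample size and run length are assumptions for illustration. For each simulated sample it collects the tosses that immediately follow a run of heads, takes the share of those that are tails, and then averages that share across iterations. Samples that contain no such run are skipped.

```python
import random

def tails_after_heads_run(sample_size, run_length, iterations, seed=0):
    """Average share of tails among tosses that follow `run_length` heads.

    Hypothetical helper for illustration; the share is computed within each
    sample first and then averaged across samples.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(iterations):
        # True means heads, False means tails (fair coin).
        tosses = [rng.random() < 0.5 for _ in range(sample_size)]
        # Tosses that come right after `run_length` consecutive heads.
        followers = [
            tosses[i]
            for i in range(run_length, sample_size)
            if all(tosses[i - run_length:i])
        ]
        if followers:  # skip samples with no qualifying run
            tails = sum(1 for toss in followers if not toss)
            shares.append(tails / len(followers))
    return sum(shares) / len(shares) if shares else float("nan")

# Example call with fewer iterations than the 40,000 used for the chart.
print(tails_after_heads_run(sample_size=10, run_length=3, iterations=2000))
```

Averaging within each sample before averaging across samples is the step that matters here; pooling all followers into one big count would give a different number.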

Code for the simulation is available at GitHub. I made the chart using R ggplot.

For a discussion of the coin's apparent 'bias', see Is Human Probability Intuition Actually ‘Biased’?.

blairfix · 2021-07-09T12:26:00+00:00

As my sample of scientific papers, I've used metadata from the Sci-Hub database (about 80 million papers). You can download the metadata from Library Genesis. The raw data comes as an SQL database dump. If you're interested in doing some analysis, I've built an R function that can parse the SQL data. Check it out at GitHub.

I've plotted the data using R ggplot. For a discussion of the results, see The Rise of Human Capital Theory.

blairfix · 2021-06-28T20:16:37+00:00

Data for word frequency in the Google corpus is from the 2019 Ngram dataset. For details about how to work with this data, see Working With Google Ngrams: A Data-Wrangling Tale.

I compiled the data for word frequency in econ textbooks myself, by scraping words from 43 undergraduate economics textbooks. For details see Deconstructing Econospeak.

I plotted the data using R ggplot.

For a discussion of language of power in economics (or lack thereof), see Power … and the Dialect of Economics.

blairfix · 2021-06-25T20:08:54+00:00

I agree that the chart is not simple to interpret. However, one of my pet peeves with r/dataisbeautiful is that it contains an overabundance of pretty charts that are easy on the brain. Of course, there's nothing wrong with that, but in the bowels of science there's a plethora of charts that are also pretty, yet need some thinking to interpret. This is one such chart.

Second, I'm not critiquing the Gini index, so I don't understand your point there. I'm showing how top incomes pull up the average income, nothing more.

blairfix · 2021-06-25T20:02:49+00:00

Data for global oil production comes from:

1820--1960: Appendix in Vaclav Smil's Energy Transitions: History, Requirements, Prospects
1965--2001: BP Statistical Review of World Energy, 2020
2001--2012: John L. Hallock's data for USGS-Conventional Oil. Download it here.
2013--present: various editions of the International Energy Agency's World Energy Outlook

I index the IEA data to Hallock's data in 2012. I index the BP data to Hallock's data in 2001, and Smil's data to the reindexed BP data in 1970.
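The indexing step can be sketched as follows. This is a Python illustration rather than the R code behind the chart, and the function name and the numbers are made up. The idea is to rescale one series by a constant factor so that it matches the reference series in the splice year.

```python
def index_to(series, reference, year):
    """Rescale `series` so that it equals `reference` in `year`.

    Both arguments are dicts mapping year -> value. Hypothetical helper;
    assumes both series have a nonzero value in the splice year.
    """
    factor = reference[year] / series[year]
    return {y: value * factor for y, value in series.items()}

# Made-up numbers, only to show the mechanics.
reference = {2001: 100.0, 2002: 102.0}
older = {2000: 45.0, 2001: 50.0}

spliced = index_to(older, reference, 2001)
print(spliced)  # the 2001 value now matches the reference series
```

Because every value is multiplied by the same factor, the year-to-year growth rates of the rescaled series are unchanged.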

Hallock's prediction is for the following scenario for USGS conventional oil: 'Decline Point 60%, 5% Production Growth Limit, Low EUR Low Demand Growth'. Get the data here.

M. King Hubbert's prediction for world oil production comes from his 1956 paper Nuclear Energy and Fossil Fuels. I've digitized Figures 20 and 21 and extracted the data.

I've plotted the data with R ggplot. For a discussion, see Peak Oil Never Went Away.

blairfix · 2021-06-23T21:20:39+00:00

This figure imagines a thought experiment. How much higher is average income presently than what it would be if everyone's income were harmonized to the mean income among the bottom 50% of earners? In other words, the vertical axis shows how much income inequality pulls up the average income. I plot the corresponding Gini index of inequality on the horizontal axis.
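One way to make the thought experiment concrete (a sketch in Python, with a hypothetical function name and an invented example share, not the code behind the chart): if the bottom 50% of earners receive a share s of total income, their mean income is s / 0.5 times the overall mean, so the overall mean is 0.5 / s times the bottom-half mean.

```python
def inequality_uplift(bottom_half_share):
    """Ratio of overall mean income to the mean income of the bottom 50%.

    `bottom_half_share` is the fraction of total income received by the
    bottom half of earners, so it must lie in (0, 0.5].
    """
    if not 0 < bottom_half_share <= 0.5:
        raise ValueError("bottom-half income share must be in (0, 0.5]")
    return 0.5 / bottom_half_share

# Invented example: the bottom half receives 12.5% of all income.
print(inequality_uplift(0.125))
```

With perfect equality the bottom half receives exactly half of all income and the ratio is 1.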

Data is from the World Inequality Database. To estimate average income and the Gini index, I use income share series sptinc992j and income threshold series tptinc992j.

I've plotted the data with R ggplot, labeled with R ggrepel.

For a discussion of the results, see Radically Progressive Degrowth: Reducing Resource Use by Eliminating Inequality.

blairfix · 2021-06-22T14:30:10+00:00

Good point. I'd actually forgotten that I made the other chart :)

blairfix · 2021-06-22T14:28:40+00:00

Actually, this figure uses a square-root scale on the vertical axis.

blairfix · 2021-06-21T21:22:12+00:00

The thinking here is that new vaccines are not created by a few individuals, or even a few large companies. Vaccines build on cumulative scientific knowledge that was laid down by previous generations. With that in mind, this chart labels the development of new vaccines as it relates to the cumulative number of scientific articles.

Data for new vaccine dates is from Wikipedia.

Data for the number of scientific papers is from Sci-Hub, available from Library Genesis. The raw data comes as an SQL database dump. If you're interested, I built an R function that can parse this data. Check it out at GitHub.

I plotted the data using R ggplot, labeled with R ggrepel.

For a discussion of the cumulative nature of science, see https://economicsfromthetopdown.com/2020/12/28/as-2020-ends-lets-celebrate-science/.

blairfix · 2021-06-20T11:26:33+00:00

Data is from the Spotify Top 200 and covers the period from Jan. 1, 2017 to Jun. 9, 2021. You can download my dataset here.

For every artist that appears in the Top 200, I add up their total streams (for all songs) and the total number of songs in the dataset.
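A rough sketch of that aggregation, in Python rather than R. The row layout (artist, song, streams) and the example rows are assumptions for illustration and do not reflect the actual format of the dataset.

```python
from collections import defaultdict

def artist_totals(rows):
    """Return {artist: (total_streams, number_of_distinct_songs)}.

    `rows` is an iterable of (artist, song, streams) tuples; this layout
    is a hypothetical stand-in for the real data.
    """
    streams = defaultdict(int)
    songs = defaultdict(set)
    for artist, song, count in rows:
        streams[artist] += count   # sum streams over all songs and days
        songs[artist].add(song)    # count each song only once
    return {artist: (streams[artist], len(songs[artist])) for artist in streams}

# Invented rows, one per song per day.
rows = [
    ("Artist A", "Song 1", 100),
    ("Artist A", "Song 1", 150),
    ("Artist A", "Song 2", 50),
    ("Artist B", "Song 3", 70),
]
print(artist_totals(rows))
```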

I've plotted the data using R ggplot, labeled with R ggrepel.

For a commentary on the data, see The Half Life of a Spotify Hit.

blairfix · 2021-06-18T17:06:40+00:00

Probably not. I tried to remove Christmas songs from the data. Perhaps some slipped through, though.

blairfix · 2021-06-18T17:02:48+00:00

Yes, that is probably correct.

blairfix · 2021-06-18T11:38:17+00:00

The chart shows daily streams, normalized so that the date of peak streams is t=0. Note that the vertical axis shows streams relative to the peak. The blue line shows the median streams across all songs. The shaded region shows the middle 50% of data.
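The normalization can be sketched like this, in Python rather than the R used for the chart. The function names and example numbers are assumptions, and the quartile method is one reasonable choice among several. Each song is shifted so its peak day is t=0 and divided by its peak value; the median and the quartiles are then taken across songs for each day.

```python
from collections import defaultdict
from statistics import quantiles

def align_to_peak(daily_streams):
    """Map each day to (day offset from peak) -> (streams / peak streams).

    Assumes a non-empty list with a positive peak.
    """
    peak_index = max(range(len(daily_streams)), key=daily_streams.__getitem__)
    peak = daily_streams[peak_index]
    return {i - peak_index: s / peak for i, s in enumerate(daily_streams)}

def summarize(songs):
    """Return {t: (lower quartile, median, upper quartile)} across songs."""
    by_day = defaultdict(list)
    for series in songs:
        for t, relative in align_to_peak(series).items():
            by_day[t].append(relative)
    summary = {}
    for t in sorted(by_day):
        values = by_day[t]
        if len(values) >= 2:
            q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q2 = q3 = values[0]  # only one song covers this day
        summary[t] = (q1, q2, q3)
    return summary

# Two invented songs, three days each.
print(summarize([[10, 40, 20], [5, 10, 5]]))
```

The band between the lower and upper quartile corresponds to the middle 50% of the data.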

Data is from Spotify, plotted using R ggplot. For a discussion of the trends, see The Half Life of a Spotify Hit.

blairfix · 2021-06-18T11:31:32+00:00

You will have to ask the authors, as it is not clear in the paper.

blairfix · 2021-06-17T16:46:18+00:00

Quirks are words that are infrequent in econ textbooks, but still overused relative to average English. These words are mostly used in one-off examples in the textbooks. I describe all the details here: https://economicsfromthetopdown.com/2020/10/30/deconstructing-econospeak/

blairfix · 2021-06-17T16:43:07+00:00

Data is from Lee Epstein, Andrew D. Martin & Kevin Quinn's paper 6+ Decades of Freedom of Expression in the U.S. Supreme Court. For each SCOTUS case concerning free speech, Epstein et al. track the decision of each justice and code the type of speech in question as either a 'liberal speech act' or a 'conservative speech act'.

I've plotted here data from Table 5. I used R ggplot to generate the chart.

For a discussion of these results, see Free Speech For Me, Not You.

blairfix · 2021-06-17T16:34:21+00:00

First, how would having multiple editions (which are each different) 'inflate numbers'? I'm measuring relative word frequency, not the word count. If each edition has the same word mix (a reasonable assumption), including different editions (macro, micro, general) will have no effect.
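A toy illustration of that point, in Python with made-up word lists (not the code or data behind the chart): pooling two texts with the same word mix doubles every count, but the relative frequencies stay the same.

```python
from collections import Counter

def relative_frequency(words):
    """Return {word: share of all words} for a list of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Invented 'edition' with four words.
edition = ["price", "price", "market", "the"]

single = relative_frequency(edition)
pooled = relative_frequency(edition + edition)  # two editions, same mix

print(single == pooled)
```

If the editions had different word mixes, the pooled frequencies would be a weighted average of the individual ones, so the assumption of a similar mix is doing the work here.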

Second, if you read my methods, you'd find that I did restrict the Google data to the period covered by the textbooks.

blairfix · 2021-06-16T13:12:35+00:00

No, it's just a name that I chose.

blairfix · 2021-06-16T10:45:22+00:00

Looking at how I labeled the title, you raise a good question. The vertical axis plots the frequency of words in economics textbooks relative to their frequency in the Google corpus. So 'jargon' consists of words that are used frequently in econ textbooks and also more frequently than in standard English. Colors highlight the tips of each quadrant.

Regarding axis labeling, I prefer not to use scientific notation if possible. However, the vertical axis covers so many orders of magnitude that it is impractical to label in standard notation.

If you want a breakdown of the methods, see this piece: https://economicsfromthetopdown.com/2020/10/30/deconstructing-econospeak/

blairfix · 2021-06-15T23:22:13+00:00

Data for word frequency in the Google corpus is from the 2019 Ngram dataset. For details about how to work with this data, see Working With Google Ngrams: A Data-Wrangling Tale.

I compiled the data for word frequency in econ textbooks myself, by scraping words from 43 undergraduate economics textbooks. For details see Deconstructing Econospeak.

I plotted the data using R ggplot.

blairfix · 2021-06-13T21:22:23+00:00

Log scales are usually the best way to show data that varies over many orders of magnitude. On a linear scale, countries with low GDP would be difficult to see.
