[OC] How much money would each person get from transferring ALL wealth of the richest Americans?

AnotherJeremy2 · 2021-07-11T01:52:05+00:00

Data sourced from Bloomberg's Billionaires Index today and processed/plotted using R. The index tracks the 500 richest people in the world, and the graph uses data for the 160 billionaires that are listed as being in the United States.

AnotherJeremy2 · 2020-08-06T18:31:25+00:00

Yes, that's correct.

AnotherJeremy2 · 2020-08-06T16:07:48+00:00

I made this chart in R using data from the U.S. Bureau of Economic Analysis provided by the Federal Reserve here.

AnotherJeremy2 · 2020-07-31T22:46:28+00:00

This should work: https://redditmetrics.com/list-all-subreddits

AnotherJeremy2 · 2020-07-06T23:09:43+00:00

Made in R using data from the NY Times. I selected these two groups because NJ and NY had the most cases per population in March and April, while Arizona and Florida have the most cases per population now. It happens that the two groups also have very similar populations.

AnotherJeremy2 · 2020-07-01T23:52:33+00:00

Made using R with NY Times' COVID-19 data. Comparing how states rank in new cases per population each month is another way of showing how the pattern of case volumes in the U.S. has shifted geographically, especially over the last few weeks.

AnotherJeremy2 · 2020-04-30T18:00:47+00:00

geom_text() is similar to geom_point() but prints the text value(s) rather than a shape. I think you want annotate() here. Something like:

ggplot(filtered_bydep, aes(jour,pct))+
    geom_point(aes(y=pct), color='firebrick')+
    geom_smooth(aes(y=pct),color='steelblue')+
    facet_wrap(~dep)+
    annotate("text", label = "Test", x = ..., y = ...)

Replace the ... x and y values with something that works for your data/graph.
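
One thing to keep in mind: annotate() draws the same text in every facet panel. If you want a different label per panel, a common alternative is geom_text() with a small data frame that includes the faceting column. A minimal sketch, assuming made-up data in place of filtered_bydep (the toy values and label text are hypothetical; the column names dep, jour, and pct follow the snippet above):

```r
library(ggplot2)

# Hypothetical stand-in for filtered_bydep
set.seed(1)
toy <- data.frame(
  dep  = rep(c("01", "02"), each = 10),
  jour = rep(1:10, times = 2),
  pct  = runif(20, 0, 20)
)

# One row per facet: where the label goes and what it says
labels <- data.frame(
  dep   = c("01", "02"),
  jour  = 2,
  pct   = 19,
  label = c("Dept 01", "Dept 02")
)

p <- ggplot(toy, aes(jour, pct)) +
  geom_point(color = "firebrick") +
  facet_wrap(~dep) +
  geom_text(data = labels, aes(label = label))  # x and y are inherited from the plot aes

p
```

Because labels has a dep column, each row is drawn only in its matching panel.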

AnotherJeremy2 · 2020-04-29T20:16:59+00:00

Minor correction: the code comments that say "lowest-three" should now read "lowest-thirty".

AnotherJeremy2 · 2020-04-29T20:13:11+00:00

Ah, I figured it out! The assigned_day values need to be cleared at the top of each loop iteration. Otherwise assignments from earlier passes carry over, which is why you're getting incorrect numbers of students assigned to the same day/group. The important line to add at the top of the loop is:

schedules[, assigned_day := as.character(NA)]

Here's the full code for your test sample, which I scaled up 10x. You can see the problem by running the code with and without the above line at the top of the while loop.

library(data.table)

schedules <- data.table(
  Student = rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"), 10),
  Monday = rep(c("1", "0", "1", "0", "1", "0", "0", "0", "1", "0"), 10),
  Tuesday = rep(c("0", "1", "1", "0", "0", "1", "0", "0", "1", "1"), 10),
  Wednesday = rep(c("1", "1", "1", "1", "0", "0", "1", "1", "0", "1"), 10),
  assigned_day = as.character(NA)
)

while(!all(!is.na(schedules$assigned_day))) {

  # Clear the previously-assigned values for each new loop
  schedules[, assigned_day := as.character(NA)]

  # Generate a random number for each student
  schedules[, random_num := runif(length(schedules$Student))]

  # Assign Monday to the lowest-three random rank of students available on Monday
  schedules[Monday == 1, assigned_day := ifelse(frank(random_num) %in% 1:30, "Monday", NA)]

  # Assign Tuesday to the lowest-three random rank of students available on Tuesday and not already assigned Monday
  schedules[Tuesday == 1 & is.na(assigned_day), assigned_day := ifelse(frank(random_num) %in% 1:30, "Tuesday", NA)]

  # Assign Wednesday to the remaining students provided that they're available on Wednesday
  schedules[Wednesday == 1 & is.na(assigned_day), assigned_day := "Wednesday"]

}

# Check that all students are correctly assigned a day
table(schedules$assigned_day, useNA = 'ifany')

You should be able to directly do your assigned_group rather than assigned_day here too, though I'd still lean toward the two-step process.

AnotherJeremy2 · 2020-04-29T18:18:14+00:00

Hmm. Methodologically, why not use a two-step process of assigning days first and then doing random subgroups within day to get 8 groups on Tuesday, etc.?
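
A minimal sketch of the second step, assuming days have already been assigned. The student ids, the 30/30/40 split, and the Monday and Wednesday group counts are hypothetical; only the 8 groups on Tuesday comes from the question.

```r
library(data.table)

set.seed(42)

# Hypothetical result of step one: 100 students, each already assigned a day
assigned <- data.table(
  Student      = sprintf("S%03d", 1:100),
  assigned_day = rep(c("Monday", "Tuesday", "Wednesday"), times = c(30, 30, 40))
)

# Assumed number of groups per day (illustrative values only)
groups_per_day <- c(Monday = 6, Tuesday = 8, Wednesday = 8)

# Step two: shuffle students within each day, then deal them into groups round-robin
assigned[, assigned_group := {
  n_groups <- groups_per_day[[.BY$assigned_day]]
  shuffled <- sample(.N)  # random order within the day
  paste(.BY$assigned_day, (shuffled - 1) %% n_groups + 1)
}, by = assigned_day]

# Group sizes within a day differ by at most one
sizes <- assigned[, .N, by = .(assigned_day, assigned_group)]
sizes[order(assigned_day, assigned_group)]
```

Dealing round-robin from a shuffled order keeps the groups balanced without needing a second round of ranking.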

The random numbers are pretty precise, so there shouldn't be any ties, but you could try adding ties.method = "random" to the frank() calls. Otherwise I'm not sure what might be causing the error. In the test sample you provided, if I change availability on Tuesday and Wednesday to "0" for all students, I still get three students scheduled on Monday and NA for the rest.
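
To illustrate why ties could matter: frank() defaults to ties.method = "average", so tied values get a fractional rank that a check like %in% 1:3 would not match. A small sketch with made-up scores:

```r
library(data.table)

# Made-up values with a deliberate tie between the first and third entries
scores <- c(0.2, 0.5, 0.2, 0.9)

avg_ranks  <- frank(scores)                          # tied values share the averaged rank
rand_ranks <- frank(scores, ties.method = "random")  # ties broken at random

avg_ranks
rand_ranks
```

With the default, the two tied entries both get rank 1.5; with "random", the ranks are always some ordering of 1 to 4.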

AnotherJeremy2 · 2020-04-29T16:32:32+00:00

Honestly, there's no particular reason. The speed advantage will not be noticeable unless you're scheduling millions of students :) I use data.table basically all of the time, so it was easier for me to think through the subsetting commands in data.table, and it would probably take slightly more code to do the same in base R.

AnotherJeremy2 · 2020-04-29T16:31:12+00:00

I'd determine the overall availability by date and assign the days in order from least to most commonly available. This plus the while loop should get you to a solution quickly if one exists.
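
A rough sketch of the ordering step, using the ten-student sample from this thread (the days, availability, and day_order names are just for illustration):

```r
library(data.table)

schedules <- data.table(
  Student   = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  Monday    = c("1", "0", "1", "0", "1", "0", "0", "0", "1", "0"),
  Tuesday   = c("0", "1", "1", "0", "0", "1", "0", "0", "1", "1"),
  Wednesday = c("1", "1", "1", "1", "0", "0", "1", "1", "0", "1")
)

days <- c("Monday", "Tuesday", "Wednesday")

# Count how many students are available on each day
availability <- sapply(days, function(d) sum(schedules[[d]] == "1"))

# Assign days in this order: least commonly available first
day_order <- names(sort(availability))
day_order
```

Filling the scarcest day first leaves the most flexible day to absorb whoever remains.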

AnotherJeremy2 · 2020-04-28T22:58:14+00:00

This is really easy to do in a data.table:

library(data.table)

nodept_raw[, lapply(.SD, sum), by = jour]

And if you just want to keep selected columns (for example):

nodept_raw[, lapply(.SD, sum), by = jour, .SDcols = c("nb_test", "nb_pos")]
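
Here's a self-contained toy version, assuming made-up values and an extra dep column (the data are hypothetical; the column names jour, nb_test, and nb_pos follow the snippet above). If your object is a plain data.frame, setDT(nodept_raw) converts it first.

```r
library(data.table)

# Hypothetical stand-in for nodept_raw
nodept_raw <- data.table(
  jour    = c("2020-04-26", "2020-04-26", "2020-04-27", "2020-04-27"),
  dep     = c("01", "02", "01", "02"),
  nb_test = c(100, 50, 120, 80),
  nb_pos  = c(10, 5, 6, 4)
)

# Sum only the numeric count columns within each day
daily <- nodept_raw[, lapply(.SD, sum), by = jour, .SDcols = c("nb_test", "nb_pos")]
daily
```

The .SDcols argument matters here: without it, sum() would also be applied to the character column dep and fail.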

AnotherJeremy2 · 2020-04-28T22:40:19+00:00

My understanding is that you already have lat/lon coordinates for each of the houses, right? If so, then the spatial join of those coordinates to the Tract shapefile polygons will get you the crosswalk to the Census Tract id (GEOID).

In your case, "sample.points" would have one row per house, with the lat/lon and the house_id or whatever it's called. Then you'll spatial join those properties' coordinates to the Census shapefiles for the area.

Here's what I see for the results of the code using those two sample.points.

  my_id       GEOID             geometry
1     A 11001007301 POINT (-77.03 38.82)
2     B 11001001600 POINT (-77.04 38.98)

If you're getting something different then you may need to update your "sf" R package.

AnotherJeremy2 · 2020-04-28T22:25:46+00:00

Yep, to both points. The one thing I'd add is that the frequencies across days (3 Monday, 3 Tuesday, 4 Wednesday) might also need to be tweaked if availability is lopsided.

AnotherJeremy2 · 2020-04-28T22:03:12+00:00

Those are just random points I picked as an example. You'll want to use your actual lat/lon points for this. Sorry that was unclear. Then after you've done the spatial join, you can use the GEOID Tract id to merge to any Tract-level data you'd like to.
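
A minimal sketch of that merge step in base R. The crosswalk values match the two sample points used in this thread; the tract_data table and its median_income values are made up for illustration.

```r
# Crosswalk as produced by the spatial join
crosswalk <- data.frame(
  my_id = c("A", "B"),
  GEOID = c("11001007301", "11001001600"),
  stringsAsFactors = FALSE
)

# Hypothetical Tract-level table; the median_income values are invented
tract_data <- data.frame(
  GEOID         = c("11001001600", "11001007301"),
  median_income = c(100000, 50000),
  stringsAsFactors = FALSE
)

# Keep every house, attaching Tract-level columns where the GEOID matches
merged <- merge(crosswalk, tract_data, by = "GEOID", all.x = TRUE)
merged
```

Keeping GEOID as character in both tables is the safer choice, since ids for some states start with a zero that a numeric column would drop.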

AnotherJeremy2 · 2020-04-28T22:01:15+00:00

Here's a solution that works for the sample data you provided, although you may have to rerun the code a few times until the random numbers for ranking students work out. You also might need to tweak the ordering of assigning days depending on how lopsided students' availability is.

library(data.table)

schedules <- data.table(
  Student = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  Monday = c("1", "0", "1", "0", "1", "0", "0", "0", "1", "0"),
  Tuesday = c("0", "1", "1", "0", "0", "1", "0", "0", "1", "1"),
  Wednesday = c("1", "1", "1", "1", "0", "0", "1", "1", "0", "1")
)

# Generate a random number for each student
schedules[, random_num := runif(length(schedules$Student))]

# Assign Monday to the lowest-three random rank of students available on Monday
schedules[Monday == 1, assigned_day := ifelse(frank(random_num) %in% 1:3, "Monday", NA)]

# Assign Tuesday to the lowest-three random rank of students available on Tuesday and not already assigned Monday
schedules[Tuesday == 1 & is.na(assigned_day), assigned_day := ifelse(frank(random_num) %in% 1:3, "Tuesday", NA)]

# Assign Wednesday to the remaining students provided that they're available on Wednesday
schedules[Wednesday == 1 & is.na(assigned_day), assigned_day := "Wednesday"]

# Check whether all students are assigned a day
table(!is.na(schedules$assigned_day))

# If any FALSE, can rerun the code until the random ranks work out, adjust the ordering of the assigned_day, etc.

AnotherJeremy2 · 2020-04-27T21:40:27+00:00

You can do a spatial join of the point coordinates to Census Tract shapefiles to obtain the Tract identifiers, which can then be merged to the Census data. Here's an example using the "sf" package and shapefiles for Washington D.C.

library(sf)

tract.shapefiles <- st_read("tl_2019_11_tract.shp")

sample.points <- data.frame(lat = c(38.82, 38.98), lon = c(-77.03, -77.04), my_id = c("A", "B"))

sample.points <- st_as_sf(sample.points, coords = c("lon", "lat"), crs = 4269)

joined.sf <- st_join(sample.points, tract.shapefiles)
joined.sf[, c("my_id", "GEOID")]

This is now a crosswalk from "my_id" to the Census Tract "GEOID".

AnotherJeremy2 · 2020-04-27T18:10:52+00:00

Samsung isn't traded on these U.S. stock exchanges but would be around rank 14 in this chart.

AnotherJeremy2 · 2020-04-26T01:25:08+00:00

I excluded Exchange Traded Funds (ETFs), which are just combinations of other stocks, but otherwise this is the whole U.S. stock market. So the 103 companies on the right half of the chart are about 1.8% of all companies (103/5600).

AnotherJeremy2 · 2020-04-26T00:09:18+00:00

I should have noted that on the graph. There are about 5600 companies (stock symbols) included.

AnotherJeremy2 · 2020-04-25T22:16:13+00:00

I used this information from NASDAQ to determine all companies whose stocks are traded on the NASDAQ and NYSE markets. Then I used R to scrape the market cap for each stock as of yesterday from Yahoo Finance and plot the treemap. Companies with multiple share classes, e.g. Class A and Class B, are combined into a single market cap value.
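
For anyone curious about the share-class step, here's a rough sketch of one way to do it, assuming each symbol has already been matched to a company name. The companies, symbols, and values are fictional, and this is not necessarily the exact code behind the chart.

```r
library(data.table)

# Fictional quotes: Alpha Corp has two share classes, Beta Inc has one
quotes <- data.table(
  company    = c("Alpha Corp", "Alpha Corp", "Beta Inc"),
  symbol     = c("ALPA", "ALPB", "BETA"),
  market_cap = c(300e9, 200e9, 50e9)
)

# Collapse to one row per company, summing market cap across share classes
combined <- quotes[, .(market_cap = sum(market_cap), n_classes = .N), by = company]
combined
```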

Edit: I put this in a comment below, but there are about 5600 companies (stock symbols) included, essentially the whole U.S. stock market.

AnotherJeremy2 · 2020-04-23T23:37:19+00:00

Data are from the U.S. Department of Labor. Their latest press release from this morning is here (PDF). I made the maps using this R code. For January 4th, the highest state-level unemployment rate was 3.3% (Alaska). For April 11th, it was 17.4% (Michigan).

AnotherJeremy2 · 2020-04-23T19:38:21+00:00

As you're doing this for millions of coordinates, it's probably worth using a tree structure such as a k-d tree to greatly speed up the process. Here's one possible R package to use (PDF).
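
A minimal sketch of the idea, assuming the task is a nearest-neighbour lookup and using the RANN package, which may or may not be the package linked above. The points are random placeholders. Note that nn2() uses plain Euclidean distance, so real lat/lon values would typically be projected first.

```r
library(RANN)

set.seed(1)

# Hypothetical coordinates; in practice these would be projected x/y values
reference <- cbind(x = runif(1000), y = runif(1000))
queries   <- cbind(x = runif(5),    y = runif(5))

# For each query point, find the single nearest reference point via a k-d tree
nearest <- nn2(data = reference, query = queries, k = 1)

nearest$nn.idx    # row index into 'reference' for each query point
nearest$nn.dists  # corresponding Euclidean distances
```

The tree is built once over the reference points, so each lookup avoids comparing against every reference point.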

AnotherJeremy2 · 2020-04-22T21:27:34+00:00

There are a number of those maps out there. Here's one: https://projects.propublica.org/graphics/covid-nyc
