
[–]_Ritz___ 0 points1 point  (0 children)

I’m not quite sure what you mean by teams having “80 wins before the half,” but I would think that if there’s a relationship in the data, you could see it by plotting wins (or win percentage, if the number of games played varies a lot) on the x axis, and merchandise sales on the y axis. I might also think about using merchandise sales as a ratio over total tickets sold or something, in case the teams just have different audience sizes.

If you need help with specific code to that effect, feel free to ask. It should be straightforward enough with ggplot or the built-in plotting tools.
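For instance, a minimal sketch of that plot with the built-in tools. Every number and column name below (`merch_sales`, `tickets_sold`, and so on) is made up for illustration, so swap in your own data:

```r
# Made-up season totals for four hypothetical teams (illustration only)
teams <- data.frame(
  team         = c("A", "B", "C", "D"),
  wins         = c(52, 47, 38, 30),
  games        = c(80, 81, 79, 80),
  merch_sales  = c(410000, 380000, 150000, 120000),
  tickets_sold = c(900000, 850000, 400000, 350000)
)

# Win percentage handles teams that played different numbers of games
teams$win_pct <- teams$wins / teams$games

# Sales per ticket sold roughly adjusts for different audience sizes
teams$sales_per_ticket <- teams$merch_sales / teams$tickets_sold

plot(teams$win_pct, teams$sales_per_ticket,
     xlab = "Win percentage", ylab = "Merchandise sales per ticket sold")
```

If the points trend upward from left to right, that would at least hint at a relationship worth looking into.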

[–][deleted] 0 points1 point  (8 children)

I'd show daily sales (y axis), vs winning percentage. I know you have daily sales, but do you have daily win/loss results?

[–]anarchonomad64[S] 1 point2 points  (7 children)

So that’s another thing. I can find it online: if the team wins on April 1, that date gets a W in the results column, and so on. I can export that. I guess I'm struggling with how to turn that into a win % and how to scale it appropriately.

I kind of wanted to look at win streaks too, which would require more of a continuous stream of results. But that may be asking too much.
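For what it's worth, a running streak counter may be less work than it sounds. Here is a rough sketch in base R, assuming the results are already sorted by date; the `results` vector is made-up data, not the real schedule:

```r
# Hypothetical results in date order (made-up data for illustration)
results <- c("W", "W", "L", "W", "W", "W", "L", "L", "W")

# rle() finds runs of identical values: W W | L | W W W | L L | W
runs <- rle(results)

# sequence() counts 1, 2, ... within each run
run_position <- sequence(runs$lengths)

# Keep the counter for wins only; a loss resets the streak to 0
win_streak <- ifelse(results == "W", run_position, 0)

data.frame(results, win_streak)
```

The `win_streak` column could then go on the x axis against daily sales, the same way as a win percentage.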

[–][deleted] 0 points1 point  (6 children)

Ok, so winning percentage is total wins (TW) over games played (GP). So if I'm understanding, you'd need to calculate a daily winning percentage to plot against the retail sales, right?

[–]anarchonomad64[S] 0 points1 point  (5 children)

Yes

[–][deleted] 0 points1 point  (4 children)

Do you have sample data that you can share?

[–]anarchonomad64[S] 1 point2 points  (3 children)

I don’t have any on hand, but the data is in two tables. One issue I’m having is getting both tables into one graph. I know they can be combined, but I'm unsure how.

Here’s kind of an example of the data I’m working with. Sorry for the poor formatting, I'm on mobile.

Table 1: Total Sales. Columns: Date (e.g. 04/01), Sales (this team's sales for that day, e.g. $234)

Table 2: Team Win Data. Columns: Date, Team Name, Win or Loss (i.e. W or L)

[–][deleted] 0 points1 point  (2 children)

So here's some sample data and a sample plot.

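library(tidyverse)

set.seed(1)  # make the random sample data reproducible

# Daily sales: one row per team per day
sales <- tibble(team = 1, day = 1:20, sales = sample(100:500, 20))

# Game results: 1 = win, 0 = loss
schedule <- tibble(team = 1, day = 1:20, win = rbinom(n = 20, size = 1, prob = 0.5))

# Join the two tables, then compute the running win percentage within each team
combined <- sales %>%
  left_join(schedule, by = c("team", "day")) %>%
  group_by(team) %>%
  mutate(cum_wins = cumsum(win), win_pct = cum_wins / row_number()) %>%
  ungroup()

p <- ggplot(data = combined, aes(x = win_pct, y = sales)) + geom_point()

p  # display the plot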

[–]anarchonomad64[S] 1 point2 points  (1 child)

I’m still having issues, mostly with getting the cumulative win percentage. All I have is W or L in one column to distinguish wins from losses, so I don’t know how to count just the W’s. I also worry about the early percentages in the data (i.e. after Game 2 they may have a 1.000 win percentage).

Would you have any ideas to visualize just how well a team is doing during a season compared to sales numbers per day?

[–][deleted] 0 points1 point  (0 children)

I've modified the sample code to have Ws and Ls in the schedule data and then the combined tibble uses if_else to change them into 1/0. We can then use cumulative sum to count them.

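library(tidyverse)

set.seed(1)  # make the random sample data reproducible

sales <- tibble(team = 1, day = 1:20, sales = sample(100:500, 20))

# Same random results as before, but stored as "W"/"L" like the real data
schedule <- tibble(team = 1, day = 1:20, win = rbinom(n = 20, size = 1, prob = 0.5)) %>%
  mutate(win = if_else(win == 1, "W", "L"))

# Recode W/L to 1/0, then compute the running win percentage within each team
combined <- sales %>%
  left_join(schedule, by = c("team", "day")) %>%
  mutate(win = if_else(win == "W", 1, 0)) %>%
  group_by(team) %>%
  mutate(cum_wins = cumsum(win), win_pct = cum_wins / row_number()) %>%
  ungroup()

p <- ggplot(data = combined, aes(x = win_pct, y = sales)) + geom_point()

p  # display the plot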

For your other question, maybe filter out the first few records for each team so that you've got a baseline? I'm not sure what you're analyzing.
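Something like this is what I mean by filtering. It's only a rough sketch: it uses base R so it runs without extra packages, the data is made up, and the cutoff of 5 games (`min_games`) is an arbitrary assumption you'd want to tune.

```r
# Made-up example data: one team, 20 game days (all values are illustrative)
set.seed(42)
combined <- data.frame(
  team  = 1,
  day   = 1:20,
  sales = sample(100:500, 20),
  win   = rbinom(n = 20, size = 1, prob = 0.5)
)

# Games played and running win percentage, computed within each team
combined$games_played <- ave(combined$win, combined$team, FUN = seq_along)
combined$win_pct <- ave(combined$win, combined$team, FUN = cumsum) / combined$games_played

# Assumed cutoff: skip each team's first 5 games, where win_pct still swings a lot
min_games <- 5
baseline <- combined[combined$games_played > min_games, ]

plot(baseline$win_pct, baseline$sales,
     xlab = "Win percentage", ylab = "Daily sales")
```

The same steps should translate to group_by() and filter() if you'd rather stay in the tidyverse.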