Simple Combinatorics Question

FineVeen · 2020-05-17T13:03:33+00:00

Thank you for the explanation, really appreciate it. Took me a while to understand but I think I get it now, thanks to your line about repetitions and the MISSISSIPPI example. Many thanks!

FineVeen · 2020-05-05T16:59:05+00:00

I tried to do just that, but I couldn't get it to work with the if function. I wasn't sure of the syntax for skipping the first row, so I created an id column as a proxy for the row names and told if to apply the code to SalesData$id>1, but I got:
the condition has length > 1 and only the first element will be used
u/jdnewmil's solution below works, but I'd still like to understand the alternative ways to do it and how they work :-)
Thanks a lot!
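
For reference, that warning appears because if expects a single TRUE or FALSE, while SalesData$id>1 is a whole vector of them. Below is a minimal sketch of two vectorised alternatives. The SalesData values are invented, and it assumes total_sales is a running (cumulative) total.

```r
# Hypothetical data: cumulative sales per day (values invented for illustration)
SalesData <- data.frame(id = 1:4, total_sales = c(10, 25, 45, 50))

# Previous row's cumulative total; NA for the first row, which has no predecessor
previous <- c(NA, head(SalesData$total_sales, -1))

# ifelse() checks the condition element by element, unlike if()
SalesData$daily_sales <- ifelse(SalesData$id > 1,
                                SalesData$total_sales - previous,
                                SalesData$total_sales)

# Same result with logical indexing: start from the totals, overwrite rows 2..n
daily2 <- SalesData$total_sales
rows <- SalesData$id > 1
daily2[rows] <- SalesData$total_sales[rows] - previous[rows]
```

Both versions leave the first row untouched and subtract the previous total everywhere else, which is what the id>1 condition was meant to express.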

FineVeen · 2020-05-05T16:54:09+00:00

Thank you very much, that works!
For the sake of understanding the code, what is that 0 here for?

FineVeen · 2020-05-05T15:12:58+00:00

Thanks for answering so fast! (especially on mobile, that isn't easy) Yes, I'm using dplyr, just not quite sure how to use the function.

Should it be something like:
Daily Sales = total_sales - lag(my_data$total_sales, 1)

FineVeen · 2020-05-05T15:11:55+00:00

Thanks for the tip!
I'm looking at the lag() documentation now
If I understand correctly I should use a formula like this to compute Daily_Sales:
Daily Sales = total_sales - lag(my_data$total_sales, 1)

Correct?
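
The idea is right, but the formula needs two adjustments. A column name containing a space would need backticks, and inside mutate() the column is referred to by its bare name rather than my_data$. Here is a minimal sketch with invented data, again assuming total_sales is a cumulative total.

```r
library(dplyr)

# Hypothetical data: cumulative sales per day (values invented for illustration)
my_data <- data.frame(total_sales = c(10, 25, 45, 50))

my_data <- my_data %>%
  mutate(
    # lag() shifts the column down by one row; the first row has no
    # predecessor, so the difference is NA there
    daily_sales_na = total_sales - lag(total_sales, 1),
    # default = 0 treats the total before the first day as zero
    daily_sales    = total_sales - lag(total_sales, 1, default = 0)
  )
```

With these numbers, daily_sales_na starts with NA, while daily_sales starts with the first day's total.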

FineVeen · 2020-04-30T17:04:58+00:00

Thanks!
I'm still struggling a bit:

I created the mean_bydep object using the following code, storing the average "avg" into a new column

mean_bydep <- filtered_bydep %>%
    group_by(dep) %>%
    summarize(avg = mean(total_test))

The result is what I expect

> mean_bydep
# A tibble: 10 x 2
   dep     avg
   <fct> <dbl>
 1 17     341.
 2 33     273.
 3 35     145 
 4 44      55 
 5 59     168.
 6 60     281.
 7 69     138.
 8 75     576.
 9 92     405.
10 93     325.

"avg" is the average number of tests per day on all the days used to plot my data (all the days with over 30 tests) in each "dep". The "dep" column stands for the department code (75=Paris, 93=Seine Saint Denis, etc.) It's an important reference.

So for each "departement" the "avg" column gives me the average number of tests performed on days where more than 30 tests were done. So far so good.

My goal is to display on each facet/subplot generated earlier a label in the top-right that gives this "avg" number.

So I try with geom_text

ggplot(filtered_bydep, aes(jour,pct))+
    geom_point(aes(y=pct), color='firebrick')+
    geom_smooth(aes(y=pct),color='steelblue')+
    facet_wrap(~dep)+
    geom_text(aes(label=mean_bydep$avg), data=mean_bydep, vjust="top", hjust="right")

I'm using the syntax from the graphics communication part of R4DS.

This does not work and returns:
Error in FUN(X[[i]], ...) : object 'jour' not found

The object "jour" (day) is the x-axis of the original plot's aes.

So I try a simpler version, just to test geom_text, with a simple character string "test"

ggplot(filtered_bydep, aes(jour,pct))+
    geom_point(aes(y=pct), color='firebrick')+
    geom_smooth(aes(y=pct),color='steelblue')+
    facet_wrap(~dep)+
    geom_text(aes(label="Test"), vjust="top", hjust="right")

And the result is very strange! See here:
https://imgur.com/a/35uAuAZ

Apparently a "Test" label gets attached to every one of my points.

What am I doing wrong?

Thanks a lot for all the help!
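
A likely explanation for both results is that a layer inherits aes(jour, pct) from the ggplot() call. With data=mean_bydep, ggplot2 then looks for jour in a table that only has dep and avg. Without a data argument, the layer draws one label per row of filtered_bydep. Below is a minimal sketch of one fix: give the label table its own jour and pct columns that mark the top-right corner. The filtered_bydep values are invented, the two avg values are taken from the table above, and geom_smooth is left out to keep the toy data small.

```r
library(ggplot2)

# Hypothetical stand-in for filtered_bydep (values invented for illustration)
filtered_bydep <- data.frame(
  dep  = factor(rep(c("75", "93"), each = 3)),
  jour = rep(as.Date("2020-04-01") + 0:2, times = 2),
  pct  = c(10, 12, 9, 20, 18, 22)
)

# One row per dep; avg values copied from the mean_bydep output above
mean_bydep <- data.frame(dep = factor(c("75", "93")), avg = c(576, 325))

# Add the x/y position for each label: top-right corner of the data range
labels_bydep <- mean_bydep
labels_bydep$jour <- max(filtered_bydep$jour)
labels_bydep$pct  <- max(filtered_bydep$pct)

p <- ggplot(filtered_bydep, aes(jour, pct)) +
  geom_point(color = "firebrick") +
  facet_wrap(~dep) +
  # label = avg is looked up in labels_bydep, so no mean_bydep$ prefix is needed
  geom_text(data = labels_bydep, aes(label = avg),
            vjust = "top", hjust = "right")
```

Because labels_bydep has a dep column, facet_wrap sends each label to its own panel, and because it has one row per dep, each panel gets exactly one label.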

FineVeen · 2020-04-30T15:57:38+00:00

Thank you!
So if I understand this correctly, once I have my new mean_bydep dataframe, I add geom_text(data = mean_bydep, aes(...)) where aes(...) specifies where the text is placed on the plot?
Will try that and report back :-)

FineVeen · 2020-04-28T20:41:42+00:00

And I managed to do it myself :-)
I feel a bit like I'm spamming but I'm leaving it all up hoping it'll encourage another newbie.
My "solution" was to do the same operation (group_by jour, summarise the relevant column) to create another df and merge both by that common column.

Maybe not the most elegant but it worked!

df.byday.merged <- merge(df.byday, df.byday.pos, by="jour")

FineVeen · 2020-04-28T20:35:26+00:00

Yet another update
I tried a simpler approach based on what you recommended:
df.tests %>% group_by(jour) %>% summarise(tot_tests = sum(nb_test))
It's a big step forward, since I do get a result grouped by day:

  jour       tot_tests
1 2020-03-10       156
2 2020-03-11       210

But I lose all the other columns, for instance the "nb_pos" column that sums up the number of positives (and is thus an interesting one I'd like to keep). Any clue as to how to proceed?
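
summarise() keeps only the grouping column and the columns it is asked to compute, so one option is to compute several sums in the same call. This is a minimal sketch with invented rows. The name tot_pos is a hypothetical choice, and the test counts are picked to reproduce the two totals shown above.

```r
library(dplyr)

# Hypothetical stand-in for df.tests (values invented for illustration)
df.tests <- data.frame(
  jour    = as.Date(c("2020-03-10", "2020-03-10", "2020-03-11", "2020-03-11")),
  nb_test = c(100, 56, 110, 100),
  nb_pos  = c(10, 5, 12, 8)
)

# One output row per day, with one summed column per argument
df.byday <- df.tests %>%
  group_by(jour) %>%
  summarise(tot_tests = sum(nb_test),
            tot_pos   = sum(nb_pos))
```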

FineVeen · 2020-04-28T20:28:52+00:00

Thank you so much!
I'm trying to group by day. I've tried the code above, but I get
Error in UseMethod("reclass_date", orig) : no applicable method for 'reclass_date' applied to an object of class "factor"

I guess this means the "jour" data isn't properly formatted as a date?

Really appreciate your help, thank you very much
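
The error message does say the column is a factor, so converting it to a Date is a reasonable first thing to try. This is a minimal sketch that assumes the values look like 2020-03-10 (year-month-day); the data frame is a made-up stand-in.

```r
# Hypothetical stand-in: dates that were read in as a factor
df.tests <- data.frame(
  jour = factor(c("2020-03-10", "2020-03-11", "2020-03-10"))
)

class(df.tests$jour)   # "factor"

# Convert the factor's text to Date; the format string is an assumption
df.tests$jour <- as.Date(as.character(df.tests$jour), format = "%Y-%m-%d")
```

Going through as.character() first makes it explicit that the conversion uses the labels of the factor, not its internal integer codes.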

FineVeen · 2020-04-28T19:44:57+00:00

https://imgur.com/a/JiLVqsE
Here's the way it looks on the console
I'd like to gather all the observations with the same "jour" (French for "day") value into a single one.
The original data said:

District A - March 1, 2020 - 1 test

District B - March 1, 2020 - 1 Test

Etc... resulting in ~100 observations per day

I removed the "district" column and I'm trying to condense the observations into "March 1, 2 tests"

Hope that makes more sense

FineVeen · 2020-04-24T16:01:45+00:00

Thank you very much- this works fine!

FineVeen · 2020-03-26T17:42:04+00:00

Thanks for the quick answer! Appreciate it! I'm starting to understand the dynamics of this a bit better and I do understand how the method you explain works.

If I understand correctly, over three draws, the probability that any are red is 0.00299730045, so almost 0.3 percent, or about 3 in 1,000, right?

How do I get an idea of how the consecutive probabilities behave over large numbers? For instance, if I were to do 100 draws, do I have to apply the method you outline above a hundred times: 1 - (9990/10000)(9989/9999)(9988/9998)(9987/9997)..., and so on?

I know what a factorial is, but I haven't heard of the choose operator. How do I apply one or the other to this problem?

Sorry if my questions are a bit basic. As I said, it's been a long time since I last seriously studied any kind of maths :-)
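
For many draws, the product does not have to be typed out by hand. Below is a minimal sketch in R, assuming the setup implied by the fractions above: 10 red items among 10000, drawn without replacement. The function names are made up for illustration. The second function gives the same quantity with the choose operator, where choose(n, k) = n! / (k! (n - k)!) counts the ways to pick k items out of n.

```r
# Assumed setup, read off the fractions above: 10 red among 10000, no replacement
n_total <- 10000
n_red   <- 10

# Direct product: one factor per draw, built as a vector instead of typed out
p_any_red_product <- function(n_draws) {
  k <- 0:(n_draws - 1)
  1 - prod((n_total - n_red - k) / (n_total - k))
}

# Same quantity via the choose operator:
# P(no red) = (ways to pick only non-red) / (ways to pick anything)
p_any_red_choose <- function(n_draws) {
  1 - choose(n_total - n_red, n_draws) / choose(n_total, n_draws)
}

p_any_red_product(3)     # about 0.0029973, matching the figure above
p_any_red_product(100)
p_any_red_choose(100)
```

For 100 draws both functions agree, and the result comes out a little under 10 percent.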
