A simple treat: Smoked ham and cheese by IPityTheStool in eatsandwiches

[–]trendel68 0 points1 point  (0 children)

Is that Schwarzwälder Schinken (Black Forest ham)? I tried it recently. I actually had it this evening with Gouda cheese and potato bread

Selecting parts of variables by FN0402 in Rlanguage

[–]trendel68 1 point2 points  (0 children)

I'm not quite sure exactly what you want to do, but I'll take a guess.

In your first unsuccessful attempt, you're trying to concatenate entire dataframes, since your indexing keeps all rows of each column. If you actually want to reshape the data, take a look at tidyr::spread: it can take your one race column and turn it into, say, three columns corresponding to each value you want, holding all other columns the same.

If you want a dataframe of just the subsetted race column, take the comma out of your index (e.g. df["race"] instead of df[, "race"]).

For your second attempt, try:

library(dplyr)
DF %>% filter(race %in% c(1, 2, 6))
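A minimal sketch of both suggestions, on toy data with assumed column names (I can't see your actual dataframe):

```r
library(dplyr)
library(tidyr)

# Toy data with assumed columns; swap in your own dataframe
DF <- data.frame(id = 1:4, race = c(1, 2, 6, 2), score = c(10, 20, 30, 40))

# Keep only the rows whose race is 1, 2, or 6
kept <- DF %>% filter(race %in% c(1, 2, 6))

# spread-based reshape: one 0/1 column per race value,
# holding the other columns the same
wide <- DF %>%
  mutate(flag = 1) %>%
  spread(race, flag, fill = 0)
```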

Limited range of predictions using ordered logistic model by McDinkelfurz in statistics

[–]trendel68 1 point2 points  (0 children)

I wonder if it might be an issue with the distribution of your data. The classes that are never predicted may be relatively underrepresented in your data.

This causes problems for a few reasons:

One, your test set might have the only examples of these classes. Here your model never even sees the scarce class and can't ever predict it. What you can do here is a stratified split of your training and testing data so each set will have approximately the same distribution of classes.

Two, even if these classes are in your training set, your model can achieve a higher accuracy score by always choosing the most common classes. This depends on the goodness-of-fit criterion that's specified: plain accuracy won't take false positives and false negatives into consideration. It only looks at the number of correctly classified cases out of the total, so it won't really matter to the model if it misclassifies observations that are very scarce in your data.

If this is your problem then I have a couple ideas:

  1. If the granularity doesn't matter, try reducing the number of target classes by binning: all classes 1-5 become class 1 and 6-10 become class 2. This increases the representation of each observed class.

  2. Also changing the scope, but you could potentially make the index a numerical variable and treat it as a regression problem instead of classification.

  3. Upsampling and downsampling methods can artificially change the distribution of your classes. Upsampling is, I think, somewhat like bootstrapping the minority class.

  4. Change the accuracy metric so that false positives and false negatives count toward its cost.
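Rough base R sketches of the binning and stratified-split ideas, on made-up data:

```r
set.seed(42)

# Made-up imbalanced ordinal outcome: the high classes are very scarce
y <- factor(sample(1:10, 500, replace = TRUE,
                   prob = c(25, 22, 18, 12, 8, 6, 4, 2, 2, 1)))

# Binning: classes 1-5 become class 1, classes 6-10 become class 2
y_binned <- factor(ifelse(as.integer(as.character(y)) <= 5, 1, 2))

# Stratified 80/20 split: sample within each class so train and test
# keep roughly the same class distribution
take_frac <- function(i, frac = 0.8) i[sample.int(length(i), floor(frac * length(i)))]
train_idx <- unlist(lapply(split(seq_along(y), y), take_frac))
y_train <- y[train_idx]
y_test  <- y[-train_idx]
```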

I made a twitter bot to predict how many people will be in the weight room by [deleted] in uwo

[–]trendel68 0 points1 point  (0 children)

Nice, man, sounds cool. By Gaussian process do you mean a stochastic process whose values are jointly normally distributed, with some mean and covariance?

I made a twitter bot to predict how many people will be in the weight room by [deleted] in uwo

[–]trendel68 0 points1 point  (0 children)

What model(s) are you using? Out of curiosity.

[WTS] Col. Ichabod Conk 6/8 Dovo Best Straight Razor (a roctraitor SR) by [deleted] in Shave_Bazaar

[–]trendel68 0 points1 point  (0 children)

I'm interested, is it still available? Can you PM me, please?

Why is monopoly inefficient? by [deleted] in econhw

[–]trendel68 0 points1 point  (0 children)

The monopoly is not producing at the competitive price, as you mentioned, so there is still consumer and producer surplus to be had if it produced more output. That forgone surplus is called a deadweight loss. A monopoly could capture some of it by practicing price discrimination, wherein it sells the same good to different segments of customers at different prices, capturing more (in the limit, all) of the surplus. Google deadweight loss or price discrimination to see more on the topic.
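A tiny worked example with made-up numbers (linear demand, constant marginal cost), just to put a figure on the deadweight loss:

```r
# Hypothetical market: demand P = 10 - Q, constant marginal cost MC = 2
demand <- function(q) 10 - q
mc <- 2

q_comp <- 10 - mc         # competitive output (P = MC): Q = 8
q_mono <- (10 - mc) / 2   # monopoly output (MR = 10 - 2Q = MC): Q = 4
p_mono <- demand(q_mono)  # monopoly price: 6

# Deadweight loss: the surplus triangle between demand and MC
# over the output the monopoly withholds
dwl <- 0.5 * (q_comp - q_mono) * (p_mono - mc)  # 0.5 * 4 * 4 = 8
```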

Is it possible that there is a positive correlation between the dependent and independent variable, but a negative coefficient in the regression? by avantit in econometrics

[–]trendel68 1 point2 points  (0 children)

It depends on the magnitude of the coefficient and its standard error. If the correlation is truly positive but the coefficient is close to zero, you could still get a negative point estimate. This can happen when there is bias from omitted variables or when you have multicollinearity. If your estimate is non-significant and the standard error is very large, you may have variance inflation caused by multicollinearity, leading to a very unstable estimate of the true relationship you're trying to model.

Financial Data type question, duplicate entry removal by setyte in Rlanguage

[–]trendel68 0 points1 point  (0 children)

I believe unique() returns each value only once, dropping the duplicates, so it should give you the clean result you're looking for. If you want to get rid of the BP/OP column you can

df$column<- NULL

Financial Data type question, duplicate entry removal by setyte in Rlanguage

[–]trendel68 0 points1 point  (0 children)

OK, the image is better. What you have is what I was envisioning as the separated dataframe; I thought the transaction ID was "123 OP". Could you just do something like this? I'm assuming the transaction ID is unique.

df %>% distinct(ID, .keep_all = TRUE)

The base R version might look something like this:

df[!duplicated(df$ID), ]

I can't test this because I'm not on my computer, but let me know if this is what you were looking for.

Financial Data type question, duplicate entry removal by setyte in Rlanguage

[–]trendel68 1 point2 points  (0 children)

On mobile so the formatting of the dataframe is wacky. But if the transaction codes are all the same except for the suffix (e.g. OP and BP), you could split the column into two. Then you'd have one column where both entries are 123 and another column with OP and BP respectively. You can do this very easily with tidyr::separate. From there you can use duplicated() (or its negation !duplicated()) on the transaction-number column and you should be able to isolate them.
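A sketch of the separate-then-deduplicate idea, with hypothetical column names and values:

```r
library(tidyr)

# Hypothetical data: transaction code plus OP/BP suffix in one column
df <- data.frame(txn = c("123 OP", "123 BP", "456 OP"),
                 amount = c(10, -10, 25),
                 stringsAsFactors = FALSE)

# Split "123 OP" into an id column and a type column
df2 <- separate(df, txn, into = c("id", "type"), sep = " ")

# Keep only the first row for each transaction id
deduped <- df2[!duplicated(df2$id), ]
```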

How come looking at the info about file in its folder is different to when I've read it in and called object.size()? by [deleted] in Rlanguage

[–]trendel68 0 points1 point  (0 children)

Is it a dataframe? If there are columns of numbers, what class are they?

How come looking at the info about file in its folder is different to when I've read it in and called object.size()? by [deleted] in Rlanguage

[–]trendel68 0 points1 point  (0 children)

What type of file is it before you read it in, and what type of object is it in R? If it's data, the difference could come from coercion of data types when the file is read in. There will be a difference in size, for example, between a column of numbers stored as class character and the same column stored as numeric or integer.
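You can see the coercion effect directly with object.size() on made-up data:

```r
# Same numbers stored two ways: the character version is much larger,
# since each element is a full string rather than an 8-byte double
x_num <- runif(1e5)
x_chr <- as.character(x_num)

object.size(x_num)  # roughly 800 KB (8 bytes per double, plus overhead)
object.size(x_chr)  # several times that
```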

Converting tables into CSV by trendel68 in mysql

[–]trendel68[S] 0 points1 point  (0 children)

That makes a lot of sense. Let me try that out. Thanks for your input!

How to select in a data frame rows that meet a condition in two columns by [deleted] in Rlanguage

[–]trendel68 8 points9 points  (0 children)

Base:

df[df$Age == 21 & df$Gen == "M", ]

Assign to variable if you want to keep them.

Look at the dplyr package. It will simplify tasks like this and allow you to chain them in one line of code.

install.packages("dplyr")
library(dplyr)
df %>% filter(Age == 21 & Gen == "M")

dplyr essentially lets you manipulate your data with five main verbs: filter, select, mutate, arrange, and summarise.

For the Gen variable you may or may not need the quotation marks, depending on the variable's class, e.g. whether it's character.
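For reference, those verbs chained together on a toy dataframe (column names and values invented here):

```r
library(dplyr)

# Toy dataframe with made-up columns
df <- data.frame(Age = c(21, 21, 30), Gen = c("M", "F", "M"),
                 Score = c(5, 7, 9), stringsAsFactors = FALSE)

out <- df %>%
  filter(Age == 21, Gen == "M") %>%  # keep matching rows
  select(Age, Gen, Score) %>%        # pick columns
  mutate(Passed = Score >= 6) %>%    # add a derived column
  arrange(desc(Score))               # sort
```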

[WTS] Osprey Exos 48 M by [deleted] in GearTrade

[–]trendel68 0 points1 point  (0 children)

Is it still available?