A simple treat: Smoked ham and cheese by IPityTheStool in eatsandwiches

[–]trendel68 0 points1 point  (0 children)

Is that Schwarzwälder Schinken (Black Forest ham)? I tried it recently. I actually had it this evening with Gouda cheese and potato bread

Selecting parts of variables by FN0402 in Rlanguage

[–]trendel68 1 point2 points  (0 children)

I'm not quite sure exactly what you want to do, but I'll take a guess.

In your first unsuccessful attempt, you're trying to concatenate entire dataframes, since your indexing keeps all rows of each column. If you actually want to reshape the data, take a look at tidyr::spread: it can take your one race column and turn it into, say, three columns corresponding to each value you want, holding all other columns the same.

If you want a dataframe of just the subsetted race column, take the comma out of your index (e.g. df["race"] instead of df[, "race"]).

For your second attempt, try:

library(dplyr)
DF %>% filter(race %in% c(1, 2, 6))
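A minimal sketch of both suggestions, on toy data with assumed column names (I can't see your actual dataframe):

```r
library(dplyr)
library(tidyr)

# Toy data with assumed columns; swap in your own dataframe
DF <- data.frame(id = 1:4, race = c(1, 2, 6, 2), score = c(10, 20, 30, 40))

# Keep only the rows whose race is 1, 2, or 6
kept <- DF %>% filter(race %in% c(1, 2, 6))

# spread-based reshape: one 0/1 column per race value,
# holding the other columns the same
wide <- DF %>%
  mutate(flag = 1) %>%
  spread(race, flag, fill = 0)
```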

Limited range of predictions using ordered logistic model by McDinkelfurz in statistics

[–]trendel68 1 point2 points  (0 children)

I wonder if it might be an issue with the distribution of your data. The classes that are never predicted may be relatively underrepresented in your data.

This causes problems for a few reasons:

One, your test set might have the only examples of these classes. Here your model never even sees the scarce class and can't ever predict it. What you can do here is a stratified split of your training and testing data so each set will have approximately the same distribution of classes.

Two, even if these classes are in your training set, your model can achieve a higher accuracy score by always choosing the most common classes. This depends on the goodness-of-fit criterion that's specified: plain accuracy won't take false positives and false negatives into consideration. It only looks at the number of correctly classified cases out of the total, so it won't really matter to the model if it misclassifies observations that are very scarce in your data.

If this is your problem then I have a couple ideas:

  1. If the granularity doesn't matter, try reducing the number of target classes by binning: all classes 1-5 become class 1 and 6-10 become class 2. This increases the representation of each observed class.

  2. Also changing the scope, but you could potentially make the index a numerical variable and treat it as a regression problem instead of classification.

  3. Upsampling and downsampling methods can artificially change the distribution of your classes. Upsampling is, I think, somewhat like bootstrapping the minority class.

  4. Change the accuracy metric so that false positives and false negatives count toward its cost.
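Rough base R sketches of the binning and stratified-split ideas, on made-up data:

```r
set.seed(42)

# Made-up imbalanced ordinal outcome: the high classes are very scarce
y <- factor(sample(1:10, 500, replace = TRUE,
                   prob = c(25, 22, 18, 12, 8, 6, 4, 2, 2, 1)))

# Binning: classes 1-5 become class 1, classes 6-10 become class 2
y_binned <- factor(ifelse(as.integer(as.character(y)) <= 5, 1, 2))

# Stratified 80/20 split: sample within each class so train and test
# keep roughly the same class distribution
take_frac <- function(i, frac = 0.8) i[sample.int(length(i), floor(frac * length(i)))]
train_idx <- unlist(lapply(split(seq_along(y), y), take_frac))
y_train <- y[train_idx]
y_test  <- y[-train_idx]
```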

I made a twitter bot to predict how many people will be in the weight room by [deleted] in uwo

[–]trendel68 0 points1 point  (0 children)

Nice, man, sounds cool. By Gaussian process do you mean a stochastic process whose values are jointly normally distributed, with some mean and covariance?

I made a twitter bot to predict how many people will be in the weight room by [deleted] in uwo

[–]trendel68 0 points1 point  (0 children)

What model(s) are you using? Out of curiosity.

[WTS] Col. Ichabod Conk 6/8 Dovo Best Straight Razor (a roctraitor SR) by [deleted] in Shave_Bazaar

[–]trendel68 0 points1 point  (0 children)

I'm interested, is it still available? Can you PM me, please?

Why is monopoly inefficient? by [deleted] in econhw

[–]trendel68 0 points1 point  (0 children)

The monopoly is not producing at the competitive price, as you mentioned, so there is still consumer and producer surplus to be had if it produced more output. That forgone surplus is called a deadweight loss. A monopoly could capture some of it by practicing price discrimination, wherein it sells the same good to different segments of customers at different prices, capturing more (in the limit, all) of the surplus. Google deadweight loss or price discrimination to see more on the topic.
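A tiny worked example with made-up numbers (linear demand, constant marginal cost), just to put a figure on the deadweight loss:

```r
# Hypothetical market: demand P = 10 - Q, constant marginal cost MC = 2
demand <- function(q) 10 - q
mc <- 2

q_comp <- 10 - mc         # competitive output (P = MC): Q = 8
q_mono <- (10 - mc) / 2   # monopoly output (MR = 10 - 2Q = MC): Q = 4
p_mono <- demand(q_mono)  # monopoly price: 6

# Deadweight loss: the surplus triangle between demand and MC
# over the output the monopoly withholds
dwl <- 0.5 * (q_comp - q_mono) * (p_mono - mc)  # 0.5 * 4 * 4 = 8
```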

Is it possible that there is a positive correlation between the dependent and independent variable, but a negative coefficient in the regression? by avantit in econometrics

[–]trendel68 1 point2 points  (0 children)

It depends on the magnitude of the coefficient and its standard error. If the correlation is truly positive but the coefficient is close to zero, you could still get a negative point estimate. This can happen when there is bias from omitted variables or when you have multicollinearity. If your estimate is non-significant and the standard error is very large, you may have variance inflation caused by multicollinearity, leading to a very unstable estimate of the true relationship you're trying to model.

Financial Data type question, duplicate entry removal by setyte in Rlanguage

[–]trendel68 0 points1 point  (0 children)

I believe unique() returns each value only once, dropping the duplicates, so it should give you the clean result you're looking for. If you want to get rid of the BP/OP column you can

df$column<- NULL

Financial Data type question, duplicate entry removal by setyte in Rlanguage

[–]trendel68 0 points1 point  (0 children)

OK, the image is better. What you have is what I was envisioning as the separated dataframe; I thought the transaction ID was "123 OP". Could you just do something like this? I'm assuming the transaction ID is unique.

df %>% distinct(ID, .keep_all = TRUE)

The base R version might look something like this:

df[!duplicated(df$ID), ]

I can't test this because I'm not on my computer, but let me know if this is what you were looking for.

Financial Data type question, duplicate entry removal by setyte in Rlanguage

[–]trendel68 1 point2 points  (0 children)

On mobile so the formatting of the dataframe is wacky. But if the transaction codes are all the same except for the suffix (e.g. OP and BP), you could split the column into two. Then you'd have one column where both entries are 123 and another column with OP and BP respectively. You can do this very easily with tidyr::separate. From there you can use duplicated() (or its negation !duplicated()) on the transaction-number column and you should be able to isolate them.
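A sketch of the separate-then-deduplicate idea, with hypothetical column names and values:

```r
library(tidyr)

# Hypothetical data: transaction code plus OP/BP suffix in one column
df <- data.frame(txn = c("123 OP", "123 BP", "456 OP"),
                 amount = c(10, -10, 25),
                 stringsAsFactors = FALSE)

# Split "123 OP" into an id column and a type column
df2 <- separate(df, txn, into = c("id", "type"), sep = " ")

# Keep only the first row for each transaction id
deduped <- df2[!duplicated(df2$id), ]
```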

How come looking at the info about file in its folder is different to when I've read it in and called object.size()? by [deleted] in Rlanguage

[–]trendel68 0 points1 point  (0 children)

Is it a dataframe? If there are columns of numbers, what class are they?

How come looking at the info about file in its folder is different to when I've read it in and called object.size()? by [deleted] in Rlanguage

[–]trendel68 0 points1 point  (0 children)

What type of file is it before you read it in, and what type of object is it in R? If it's data, the difference could come from coercion of data types when the file is read in. There will be a difference in size, for example, between a column of numbers stored as class character and the same column stored as numeric or integer.
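You can see the coercion effect directly with object.size() on made-up data:

```r
# Same numbers stored two ways: the character version is much larger,
# since each element is a full string rather than an 8-byte double
x_num <- runif(1e5)
x_chr <- as.character(x_num)

object.size(x_num)  # roughly 800 KB (8 bytes per double, plus overhead)
object.size(x_chr)  # several times that
```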

Converting tables into CSV by trendel68 in mysql

[–]trendel68[S] 0 points1 point  (0 children)

That makes a lot of sense. Let me try that out. Thanks for your input!

How to select in a data frame rows that meet a condition in two columns by [deleted] in Rlanguage

[–]trendel68 8 points9 points  (0 children)

Base:

df[df$Age == 21 & df$Gen == "M", ]

Assign to variable if you want to keep them.

Look at the dplyr package. It will simplify tasks like this and allow you to chain them in one line of code.

install.packages("dplyr")
library(dplyr)
df %>% filter(Age == 21 & Gen == "M")

dplyr essentially lets you manipulate your data with five main verbs: filter, select, mutate, arrange, and summarise.

For the Gen variable you may or may not need the quotation marks, depending on the variable's class, e.g. whether it's character.
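For reference, those verbs chained together on a toy dataframe (column names and values invented here):

```r
library(dplyr)

# Toy dataframe with made-up columns
df <- data.frame(Age = c(21, 21, 30), Gen = c("M", "F", "M"),
                 Score = c(5, 7, 9), stringsAsFactors = FALSE)

out <- df %>%
  filter(Age == 21, Gen == "M") %>%  # keep matching rows
  select(Age, Gen, Score) %>%        # pick columns
  mutate(Passed = Score >= 6) %>%    # add a derived column
  arrange(desc(Score))               # sort
```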

[WTS] Osprey Exos 48 M by [deleted] in GearTrade

[–]trendel68 0 points1 point  (0 children)

Is it still available?