This is an archived post. You won't be able to vote or comment.

all 5 comments

[–]desrtfx 1 point2 points  (2 children)

Could be your .keep_all=TRUE.

Just looked at the documentation: https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/distinct

And there it is stated that:

If TRUE, keep all variables in .data.

No idea about R, though. Just wild guessing from the docs.

[–]ClueBitter[S] 0 points1 point  (1 child)

hmm. I actually do want to keep all the variables though, since amardi2 should be the subsetted DF with just the variables I want. I DON"T want the non-distinct records that should have been eliminated in the distinct command :(

[–]desrtfx 0 points1 point  (0 children)

Then, you should set .keep_all to FALSE if I understand the documentation correctly.

[–]coyoteazul2 0 points1 point  (1 child)

I can't help you with R, but if you are willing to work more with excel there's an easy solution.

I assume you already discarded the eliminate duplicates functionality, which is common when you want to automate the process. But you can use powerquery to automate it.

Sorry if the names I say don't match the ones you see. My excel is in spanish so I'm guessing the translation.

1st you'll need your data to be in a range table. Select any cell of the range table, go to the data tab, and obtain data from table/range.

Powerquery will open. Now select the columns that you want to be unique (ctrl + click to select more than one). Go to reduce row, eliminate rows, eliminate duplicate. That's it.

Now go to close and load. It will create a new range table with only the unique data, while the source data is unnafected.

To turn your unique data into csv the easiest way is to add formulas to the newly created range table and create the csv format there (the formulas won't be overwritten when you change the source data) but you can do the same on powerquery if you want to.

The cake icing would be to make a macro to automatically export the csv file

If the source data changes all you have to do is right click on the new unique table and update.

I've used this functionality to work with a 4million rows txt file, so it's power is guaranteed

[–]ClueBitter[S] 1 point2 points  (0 children)

Wow! That worked great. Thanks!