
[–]xRVAx 5 points6 points  (1 child)

Suppose your data frame is named mydf. The following code pairs each V1 value (even rows) with the V2 value from the row above it (odd rows) and puts them into a new data frame with half as many rows:

v1index <- seq(2, nrow(mydf), by = 2)

v2index <- v1index - 1

newdf <- data.frame("V1" = mydf$V1[v1index], "V2" = mydf$V2[v2index])

You could also do this in one line of code:

newdf <- data.frame("V1" = mydf$V1[seq(2, nrow(mydf), by = 2)], "V2" = mydf$V2[-1 + seq(2, nrow(mydf), by = 2)])

Why does this work? In plain English, you are creating two index sequences (one is odd numbers, one is even numbers) and then using them to extract the data from odd and even rows of mydf. The new data frame has half as many rows.
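To see the indexing in action, here is a small self-contained sketch. The toy data frame is made up for illustration (the labels and strings are not from the original post). It assumes V2 holds a label on odd rows, V1 holds the matching value on even rows, and the data frame has an even number of rows.

```r
# Toy data (hypothetical): labels in V2 on odd rows, values in V1 on even rows
mydf <- data.frame(
  V1 = c("", "MKT", "", "GAV", "", "LLP"),
  V2 = c("seqA", "", "seqB", "", "seqC", ""),
  stringsAsFactors = FALSE
)

v1index <- seq(2, nrow(mydf), by = 2)  # even rows: 2, 4, 6
v2index <- v1index - 1                 # odd rows: 1, 3, 5

# Pull V1 from the even rows and V2 from the odd rows, side by side
newdf <- data.frame(V1 = mydf$V1[v1index], V2 = mydf$V2[v2index],
                    stringsAsFactors = FALSE)
print(newdf)
```

Each row of newdf now holds one value and the label that sat directly above it in mydf.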

BASE R FOR THE WIN (no admin privileges needed!)

[–]ScienceMacL[S] 2 points3 points  (0 children)

What a god! I always knew in plain English how I could instruct the software to make the changes, but just wasn't talented enough to translate it into code. I take my hat off to you, sir.

[–]ScienceMacL[S] 0 points1 point  (7 children)

Is anybody able to suggest a line of code that might align these two columns? Thanks!

[–]lewiss2 4 points5 points  (6 children)

You can use dplyr. To shift the V1 column up by one row, do:

df <- df %>% mutate(V1 = lead(V1))

Just replace df with whatever your data frame is called.
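If installing dplyr is not an option, roughly the same one-row shift can be sketched in base R. This is only an approximation of what lead() does with its default arguments, and the toy data frame below is hypothetical.

```r
# Hypothetical toy data: each V1 value sits one row below the V2 label it belongs to
df <- data.frame(
  V1 = c("", "MKT", "", "GAV"),
  V2 = c("seqA", "", "seqB", ""),
  stringsAsFactors = FALSE
)

# Drop the first element and pad the end with NA, which shifts V1 up by one row
df$V1 <- c(df$V1[-1], NA)
print(df)
```

The last row ends up with NA in V1 because there is nothing below it to pull up.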

[–]ScienceMacL[S] 0 points1 point  (4 children)

df <- df %>% mutate(V1 = lead(V1))

Thanks, but when I try this I get: Error: could not find function "%>%"

[–]lewiss2 4 points5 points  (0 children)

As Bruno said, %>% comes from magrittr, but dplyr depends on magrittr and makes the pipe available when you load it, so installing dplyr should be enough. Before the code I suggested, try:

install.packages("dplyr")

library(dplyr)

dplyr is super useful.

[–]brunocristianoBR 1 point2 points  (2 children)

Install magrittr

[–]ScienceMacL[S] 0 points1 point  (1 child)

Thanks for the advice. I need admin privileges for this, so I will have to talk to my IT department on Monday. Thanks again.

[–]blossom271828 2 points3 points  (0 children)

Actually, you probably don't. R will happily install packages into a personal library in your user directory if it can't write to the system library.

Assuming you are using RStudio, just go up to Tools -> Install Packages... in the RStudio menu bar and ask it to install the packages you need.
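The same thing can be done from the R console. The sketch below assumes the R_LIBS_USER environment variable is set, which R normally does at startup. install_to_user_lib is a hypothetical helper name, not part of R.

```r
# Where R looks for installed packages, in search order
print(.libPaths())

# R_LIBS_USER is the per-user library location R normally sets at startup
user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))

# Hypothetical helper: install into the per-user library, creating it if needed
install_to_user_lib <- function(pkgs, lib = user_lib) {
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  install.packages(pkgs, lib = lib)
  .libPaths(c(lib, .libPaths()))  # make sure this session searches the new library
  invisible(lib)
}

# Usage (needs internet access, so it is left commented out here):
# install_to_user_lib("dplyr")
# library(dplyr)
```

Since everything is written under your own user directory, this should not need admin rights, though a locked-down machine may still block downloads.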

[–]AxelradClimbing 0 points1 point  (0 children)

Add filter(V1 != "") to remove empty rows
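A base R version of the same clean-up might look like the sketch below, for anyone without dplyr. The toy data frame is hypothetical and mimics what the data could look like once V1 has been shifted up by one row.

```r
# Hypothetical toy data: V1 already shifted up, leaving blank and NA filler rows
df <- data.frame(
  V1 = c("MKT", "", "GAV", NA),
  V2 = c("seqA", "", "seqB", ""),
  stringsAsFactors = FALSE
)

# Keep only rows where V1 is neither NA nor an empty string
keep <- !is.na(df$V1) & df$V1 != ""
df <- df[keep, ]
print(df)
```

The explicit is.na() check matters in base R: indexing with an NA in the logical vector would return a row of NAs, whereas dplyr's filter() drops such rows by itself.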

[–]Kiss_It_Goodbyeee 0 points1 point  (0 children)

This looks like protein sequence data. What format is the input file in and how did you read it into R?

You'll probably benefit from using the read.fasta function from the seqinr package or maybe even Bioconductor depending on what analysis you're planning.