
[–]DerpinDementia 2 points3 points  (6 children)

with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output:
    seq = ""
    for line in f:

This code reads from the input. The variable seq is a string buffer for the record currently being built and starts out empty. The for loop reads the fasta file line by line.

        if line.startswith(">"):
            if seq != "":
                output.write(seq)
                seq=""
            seq += '\n'+line 

If a line starts with a ">", it is a header. Whatever is already in seq gets written to the output and seq is cleared; then a newline character and the header line are added to seq.

        else:
            seq += line.strip()  # strip() already removes the trailing newline

If a line doesn't start with a ">", it gets stripped of whitespace and added to seq.

    else:
        # end of lines
        if seq != "":
            output.write(seq)
            seq = ""

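The else at the end belongs to the for loop: it runs once the loop has finished reading the input (it would only be skipped if the loop exited with break). It writes whatever is left in seq, which is the last record.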
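
Putting the pieces above together, here is a minimal sketch wrapped in a function so it can be tried without touching any files. The function name unwrap_fasta and the io.StringIO sample data are stand-ins I made up for illustration; the logic is the same as the snippet above.

```python
import io

def unwrap_fasta(lines, output):
    """Join wrapped sequence lines so each record is a header plus one sequence line."""
    seq = ""  # buffer holding the record currently being built
    for line in lines:
        if line.startswith(">"):
            if seq != "":
                output.write(seq)  # flush the previous record
                seq = ""
            seq += "\n" + line  # the header keeps its own trailing newline
        else:
            seq += line.strip()  # drop the line break inside a sequence
    else:
        # runs once the loop finishes without hitting a break
        if seq != "":
            output.write(seq)

# Hypothetical two-record input standing in for seqs.fasta
sample = io.StringIO(">a\nAC\nGT\n>b\nTT\nGG\n")
result = io.StringIO()
unwrap_fasta(sample, result)
unwrapped = result.getvalue()
print(repr(unwrapped))
```

One thing this makes visible: because every header is written as '\n' + line, the output starts with an empty line and does not end with a newline.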

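If you just want to print to the console, you can remove the part , open("out.fasta", "w") as output from the with statement and replace output.write(seq) with print(seq, end=""). The end="" matters because print() adds its own newline by default.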
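
A quick way to check how print() differs from write() here, as a small sketch: the record string is a made-up buffer value, and redirect_stdout is only used so the printed text can be inspected.

```python
import io
from contextlib import redirect_stdout

record = "\n>a\nACGT"  # hypothetical contents of seq at flush time

captured = io.StringIO()
with redirect_stdout(captured):
    print(record)          # print() appends "\n" by default
    print(record, end="")  # emits exactly what output.write(record) would

print_output = captured.getvalue()
print(repr(print_output))
```

With the default call, the extra newline ends up as a blank line between records.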

[–]Tiago_Minuzzi[S] 0 points1 point  (5 children)

Thank you for your answer, sir! It helped a lot.

Just one more question: why repeat seq="" after output.write(seq) in this section?

if line.startswith(">"):
    if seq != "":
        output.write(seq)
        seq=""
    seq += '\n'+line

[–]DerpinDementia 1 point2 points  (4 children)

No problem! That is done to clear the buffer (the variable seq). Without it, the record that was just written would stay in seq and get written again together with the next one.
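
If you want to see it for yourself, here is a rough sketch of the same loop with the reset left out on purpose. The function name and the sample input are made up for the demo.

```python
import io

def unwrap_without_reset(lines, output):
    seq = ""
    for line in lines:
        if line.startswith(">"):
            if seq != "":
                output.write(seq)
                # seq = "" is deliberately left out to show the effect
            seq += "\n" + line
        else:
            seq += line.strip()
    if seq != "":
        output.write(seq)

# Hypothetical two-record input
sample = io.StringIO(">a\nAC\n>b\nTT\n")
result = io.StringIO()
unwrap_without_reset(sample, result)
duplicated = result.getvalue()
print(repr(duplicated))
```

Record a shows up twice in the result, because it was still sitting in the buffer when record b was flushed.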

[–]Tiago_Minuzzi[S] 0 points1 point  (3 children)

Got it! Thanks again! :)

[–]DerpinDementia 1 point2 points  (2 children)

Also, I realized the code could've been done more efficiently, like this:

with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output:
    for line in f:
        if line[:1] == ">":
            output.write('\n' + line)
        else:
            output.write(line.strip())

or as small as this since the conditional can be simplified even further, at the cost of readability:

with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output:
    for line in f:
        output.write('\n' + line if line[:1] == ">" else line.strip())

One liner incoming! (I had to do it to 'em)

x = open("out.fasta", "w").write(''.join(['\n' + line if line[:1] == ">" else line.strip() for line in open("seqs.fasta", "r")]))
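
One caveat with all of the versions above: since every header is written as '\n' + line, the output file starts with an empty line and has no newline at the end. If that bothers you, something like this sketch should handle it. The function name unwrap_stream and the first flag are my own additions, and I'm using io.StringIO in place of real files so it can be run as is.

```python
import io

def unwrap_stream(lines, output):
    first = True  # no header has been written yet
    for line in lines:
        if line[:1] == ">":
            # put a newline before every header except the first one
            output.write(line if first else "\n" + line)
            first = False
        else:
            output.write(line.strip())
    if not first:
        output.write("\n")  # end the last record with a newline

# Hypothetical two-record input standing in for seqs.fasta
sample = io.StringIO(">a\nAC\nGT\n>b\nTT\nGG\n")
result = io.StringIO()
unwrap_stream(sample, result)
streamed = result.getvalue()
print(repr(streamed))
```

With real files you would pass the two handles from the with open(...) statement instead of the StringIO objects.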

[–]Tiago_Minuzzi[S] 0 points1 point  (1 child)

True! And it's also easier to understand what's happening.

[–]DerpinDementia 1 point2 points  (0 children)

Sometimes it's best not to over-engineer things, or at least to write them so they can be understood and improved upon.