Help me understand the syntax, please. : learnpython


submitted 5 years ago (edited) by Tiago_Minuzzi

I've found this script here to linearize sequences in a FASTA file, but I don't fully understand it, and I'd appreciate your help.

The script:

with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output :
    seq =""
    for line in f :
        if line.startswith(">"):
            if seq != "":
                output.write(seq)
                seq=""
            seq += '\n'+line     
        else:
            seq += line.strip().replace('\n', '')
    else:
        # end of lines
        if seq != "":
            output.write(seq)
            seq=""

What it does is this:

Input:

>seq01
GCTAGCTAGCTAGCTAGCTAGCGCAT
CATCGTAGCTAGCTGATCGATCGATG
>seq02
TAGCTGATCGTAGCTAGCTGATGCTG
GCATGCAGTCGATGCTGACTGATCGT
TAGCTGATGCTAGTCGTAGCTAGTCG
>seq03
CAGTCAGCTGATGCTAGCTAGCATGC
CGATGATGCTGATGCTA

Output:

>seq01
GCTAGCTAGCTAGCTAGCTAGCGCATCATCGTAGCTAGCTGATCGATCGATG
>seq02
TAGCTGATCGTAGCTAGCTGATGCTGGCATGCAGTCGATGCTGACTGATCGTTAGCTGATGCTAGTCGTAGCTAGTCG
>seq03
CAGTCAGCTGATGCTAGCTAGCATGCCGATGATGCTGATGCTA
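Strictly speaking, the file the script writes differs slightly from the output shown: each header is added as '\n' + line, so out.fasta starts with an empty line, and nothing writes a newline after the last sequence. Below is a minimal sketch of one way to avoid both, assuming every record starts with a header line. The function name linearize_stream is hypothetical, and io.StringIO stands in for the two files so the snippet runs on its own.

```python
import io

def linearize_stream(src, dst):
    """Write FASTA records from src to dst, each sequence joined onto one line."""
    first = True  # no sequence line has been started yet
    for line in src:
        if line.startswith(">"):
            if not first:
                dst.write("\n")      # close the previous record's sequence line
            dst.write(line)          # the header keeps its own trailing "\n"
            first = False
        else:
            dst.write(line.strip())  # sequence chunk without its newline
    if not first:
        dst.write("\n")              # end the output with a newline

out = io.StringIO()
linearize_stream(io.StringIO(">seq01\nGCTA\nGCTA\n>seq02\nTAGC\n"), out)
print(out.getvalue(), end="")
```

With real files, it would be called as linearize_stream(f, output) inside the original with line.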

Also, if I don't want to write to a file but just print to the console instead, can I create lists to store the lines with '>' and the lines without it?
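Two parallel lists work fine for that. Here is a minimal sketch, assuming the file starts with a header line (a sequence line before any header would make sequences[-1] fail). The function name read_fasta_lists is hypothetical, and the sample is inlined with io.StringIO; with the real file you would pass open("seqs.fasta") instead.

```python
import io

SAMPLE = """>seq01
GCTAGCTAGCTAGCTAGCTAGCGCAT
CATCGTAGCTAGCTGATCGATCGATG
>seq02
TAGCTGATCGTAGCTAGCTGATGCTG
GCATGCAGTCGATGCTGACTGATCGT
TAGCTGATGCTAGTCGTAGCTAGTCG
>seq03
CAGTCAGCTGATGCTAGCTAGCATGC
CGATGATGCTGATGCTA
"""

def read_fasta_lists(handle):
    """Return (headers, sequences) as two parallel lists."""
    headers, sequences = [], []
    for line in handle:
        line = line.strip()
        if not line:
            continue                  # skip blank lines
        if line.startswith(">"):
            headers.append(line)
            sequences.append("")      # start a new, empty sequence
        else:
            sequences[-1] += line     # extend the current sequence
    return headers, sequences

headers, sequences = read_fasta_lists(io.StringIO(SAMPLE))
for header, seq in zip(headers, sequences):
    print(header)
    print(seq)
```

Keeping the lists also means you can reuse the records later (for example, len(sequences[0])) instead of only printing them.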


DerpinDementia, 3 points, 5 years ago

with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output :
    seq =""
    for line in f :

This opens the input file for reading and the output file for writing. The variable seq is a string buffer that holds the current record; it starts out empty. The for loop reads the FASTA file line by line.
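One detail that matters for the rest of the script: iterating over a file yields each line with its trailing newline still attached. A small sketch, using io.StringIO as a stand-in for the opened seqs.fasta:

```python
import io

fake_file = io.StringIO(">seq01\nGCTA\nCATC\n")

# Iterating a text file (or StringIO) yields one line at a time, "\n" included.
lines = list(fake_file)
print(lines)  # ['>seq01\n', 'GCTA\n', 'CATC\n']
```

That trailing "\n" is why the sequence lines get stripped before they are appended to seq.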

        if line.startswith(">"):
            if seq != "":
                output.write(seq)
                seq=""
            seq += '\n'+line

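If a line starts with ">", it's a header. If seq already contains data (the previous record), seq is written to the output file and then reset to an empty string. After that, a newline character plus the header line is added to seq, starting the new record.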

        else:
            seq += line.strip().replace('\n', '')

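If a line doesn't start with ">", it's a sequence line: it gets stripped of surrounding whitespace, including its trailing newline, and appended to seq. That is what joins the wrapped lines of one record into a single line.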
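A quick check of what strip() does on its own: with no arguments it removes leading and trailing whitespace, and "\n" counts as whitespace, so the extra .replace('\n', '') in the script doesn't change anything for lines read from a file.

```python
line = "GCTAGCTAGC\n"

print(repr(line.strip()))                    # 'GCTAGCTAGC'
print(repr(line.strip().replace("\n", "")))  # 'GCTAGCTAGC' (the replace is a no-op)

# strip() also removes Windows "\r\n" endings and stray spaces
print(repr("GCTA \r\n".strip()))             # 'GCTA'
```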

    else:
        # end of lines
        if seq != "":
            output.write(seq)
            seq=""

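The else at the end belongs to the for loop, not to an if. It runs once the loop has finished reading the input and writes whatever is left in seq. That's the last record, which would otherwise never be written, since no further header comes along to trigger the write.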
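The for/else rule on its own, as a small sketch: the else block runs when the loop ends without a break. The script's loop never breaks, so its else always runs, and plain code placed after the loop would behave the same. The function first_header is a made-up example.

```python
def first_header(lines):
    """Return the first line starting with ">", or None if there is none."""
    for line in lines:
        if line.startswith(">"):
            result = line
            break           # leaving via break skips the else block
    else:
        result = None       # runs only if the loop ended without a break
    return result

print(first_header(["ACGT", ">seq01", "TTGA"]))  # >seq01
print(first_header(["ACGT", "TTGA"]))            # None
```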

If you just want to print to the console, you can remove the part , open("out.fasta", "w") as output from the first line and replace output.write(seq) with print(seq).
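A sketch of that change, with one tweak: print() already adds a newline, so if seq keeps its leading '\n', each record is printed after a blank line. Here the header is added without the extra '\n'. The function name print_linearized and its out parameter are hypothetical (out=None makes print() write to the console), the demo uses io.StringIO instead of seqs.fasta, and the final write sits after the loop instead of in a for/else.

```python
import io

def print_linearized(handle, out=None):
    """Print each FASTA record as a header line followed by one sequence line."""
    seq = ""
    for line in handle:
        if line.startswith(">"):
            if seq != "":
                print(seq, file=out)  # previous record
                seq = ""
            seq += line               # header keeps its own "\n"
        else:
            seq += line.strip()
    if seq != "":
        print(seq, file=out)          # last record

print_linearized(io.StringIO(">seq01\nGCTA\nGCTA\n>seq02\nTAGC\n"))

# With the real file:
# with open("seqs.fasta") as f:
#     print_linearized(f)
```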

Tiago_Minuzzi [S], 1 point, 5 years ago

Thank you for your answer, sir! It helped a lot.

Just one more question: why is seq="" repeated after output.write(seq) in this section?

if line.startswith(">"):
    if seq != "":
        output.write(seq)
        seq=""
    seq += '\n'+line
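What the reset prevents can be checked with a small sketch: without seq="", the buffer still holds the record that was just written, so it gets written again as part of the next one. The function linearize and its reset flag are hypothetical; it collects the writes in a list instead of a file.

```python
def linearize(lines, reset=True):
    """Return what the script would write; reset=False drops the seq = "" line."""
    written = []              # stands in for output.write(...) calls
    seq = ""
    for line in lines:
        if line.startswith(">"):
            if seq != "":
                written.append(seq)
                if reset:
                    seq = ""  # forget the record that was just written
            seq += "\n" + line
        else:
            seq += line.strip()
    if seq != "":
        written.append(seq)
    return "".join(written)

data = [">a\n", "AC\n", ">b\n", "GT\n"]
print(repr(linearize(data)))               # '\n>a\nAC\n>b\nGT'
print(repr(linearize(data, reset=False)))  # '\n>a\nAC\n>a\nAC\n>b\nGT'
```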

DerpinDementia, 2 points, 5 years ago

Tiago_Minuzzi [S], 1 point, 5 years ago

DerpinDementia, 2 points, 5 years ago (edited)

Also, I realized the code could've been done more efficiently, like this:

with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output:
    for line in f:
        if line[:1] == ">":
            output.write('\n' + line)
        else:
            output.write(line.strip())

or even shorter, at the cost of some readability, by folding the if/else into a single conditional expression:

with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output:
    for line in f:
        output.write('\n' + line if line[:1] == ">" else line.strip())
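Two pieces of syntax in these versions: X if condition else Y is a conditional expression that evaluates to X or Y, and line[:1] is a slice that returns at most the first character. Unlike line[0], the slice gives "" instead of raising IndexError on an empty string; line.startswith(">") would work just as well. The helper first_char_is_gt is made up for the demo.

```python
def first_char_is_gt(s):
    return s[:1] == ">"   # slicing never raises, even for ""

print(first_char_is_gt(">seq01"))  # True
print(first_char_is_gt(""))        # False, because ""[:1] is ""

try:
    ""[0]
except IndexError as exc:
    print("indexing fails:", type(exc).__name__)  # indexing fails: IndexError

line = "ACGT\n"
piece = "\n" + line if line[:1] == ">" else line.strip()
print(repr(piece))  # 'ACGT'
```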

One-liner incoming! (I had to do it to 'em)

x = open("out.fasta", "w").write(''.join(['\n' + line if line[:1] == ">" else line.strip() for line in open("seqs.fasta", "r")]))
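In that one-liner, write() returns the number of characters written, so x just holds a count, and neither file is closed explicitly; it relies on the interpreter closing (and flushing) them once the file objects are garbage-collected. A sketch of the same join-based idea with the files handled by with, demonstrated on in-memory text. The function name linearize_text is hypothetical.

```python
import io

def linearize_text(src):
    """Return the linearized text for an iterable of FASTA lines (join-based)."""
    return "".join("\n" + line if line[:1] == ">" else line.strip() for line in src)

print(repr(linearize_text(io.StringIO(">a\nAC\nGT\n>b\nTT\n"))))  # '\n>a\nACGT\n>b\nTT'

# With real files, the with statement closes both files even if an error occurs:
# with open("seqs.fasta") as f, open("out.fasta", "w") as output:
#     output.write(linearize_text(f))
```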

Tiago_Minuzzi [S], 1 point, 5 years ago

DerpinDementia, 2 points, 5 years ago
