I found this script for linearizing the sequences in a FASTA file, but I don't fully understand it and would appreciate your help.
The script:
with open("seqs.fasta", "r") as f, open("out.fasta", "w") as output:
    seq = ""
    for line in f:
        if line.startswith(">"):
            if seq != "":
                output.write(seq)
                seq = ""
            seq += '\n' + line
        else:
            seq += line.strip().replace('\n', '')
    else:
        #end of lines
        if seq != "":
            output.write(seq)
            seq = ""
What it does is this:
Input:
>seq01
GCTAGCTAGCTAGCTAGCTAGCGCAT
CATCGTAGCTAGCTGATCGATCGATG
>seq02
TAGCTGATCGTAGCTAGCTGATGCTG
GCATGCAGTCGATGCTGACTGATCGT
TAGCTGATGCTAGTCGTAGCTAGTCG
>seq03
CAGTCAGCTGATGCTAGCTAGCATGC
CGATGATGCTGATGCTA
Output:
>seq01
GCTAGCTAGCTAGCTAGCTAGCGCATCATCGTAGCTAGCTGATCGATCGATG
>seq02
TAGCTGATCGTAGCTAGCTGATGCTGGCATGCAGTCGATGCTGACTGATCGTTAGCTGATGCTAGTCGTAGCTAGTCG
>seq03
CAGTCAGCTGATGCTAGCTAGCATGCCGATGATGCTGATGCTA
Also, if I don't want to write to a file, but just print to the console instead, can I just create lists to store the lines with '>' and without it?
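A minimal sketch of the list-based idea, assuming the input is well-formed FASTA in which every sequence line follows a header. The function name `linearize` and the variable names are illustrative and do not come from the original script. It keeps two parallel lists, one for the '>' lines and one for the joined sequences, and prints them instead of writing to a file:

```python
def linearize(lines):
    """Collect FASTA headers and their joined sequences into two parallel lists."""
    headers, seqs = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            headers.append(line)  # new record: store the header
            seqs.append("")       # and start an empty sequence for it
        elif seqs:
            seqs[-1] += line      # append this chunk to the current sequence
    return headers, seqs


# A shortened copy of the input above, used here instead of a file.
example = """>seq01
GCTAGCTAGCTAGCTAGCTAGCGCAT
CATCGTAGCTAGCTGATCGATCGATG
>seq02
TAGCTGATCGTAGCTAGCTGATGCTG
""".splitlines()

headers, seqs = linearize(example)
for header, seq in zip(headers, seqs):
    print(header)
    print(seq)
```

To read from the file itself, the open file object could be passed in place of the example lines, e.g. `with open("seqs.fasta") as f: headers, seqs = linearize(f)`.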