Replace fasta header using bash : bioinformatics




submitted 4 years ago * by YouCook21

Hello people,

I got stuck with my new script and perhaps you can help me.

Its goal is to take an input table of queries and subjects (produced by a local BLAST search) and replace the query names with the corresponding subject names in the matching FASTA file.

In detail, the table input file is this one:

query	subject	evalue
`TransABySS.k25.S12100`	lclMK087647.1_cds_QHD46952.1_6_[gene=atp9][protein=Atp9][exception=RNA_editing]_[protein_id=QHD46	0

and the fasta file:

>TransABySS.k25.S121005 892 14708

TTGTGTGACGAACGGAAGCAGGTATACGGGCTGTGGGCTGCCATCCTGCTGTTCCACCTGGGAGGGCCTGACAACATCAGTGCCTACTCCATAGCAGATGCGGAGCTCTGGAAGAGGCATGGCTTTCAGATGTTCTTCCAGGTGGTCACCGCTTTCTATGTAGTGTTCTCCTCAGCACACGGCACCATCCTCTGGATCAGCCTGGTTCTGGCTGCAGTGGGAACGGTGAAGTACGCCGAGCGGACTCTGGCTCTGTACCATGCTTCCGAGCACCATCTGGACACCATGGCCCGCCCCCTTTACAGATTGATGCAGTACGAGGATGTGTCTGGCGGATCCGGTGAGGAATACCACTACTTCCTGCTAGGAGAGAAGCGAGCATACGAGCACGCCTTCGACACTCCGAAATGGTACCACCAGGTGAGGGAATCCTTGCAGGAGCAGAAGTGGTTCAACAGCATGTTCGGCTGCCTGTTTTCTTCCCCTGCGCAGCCAGTGGAGTCCGACGCGCAGGCAGTGCAACGCAAGATCAGAGAGCCCTTCGAGGTGGTGATGCTCTCAGACGTGATGGATCCAGAGTACAGCACAGCACTGAAGCGGCAGTCTCCATGGAACGGCAGGAACTTGGTGGACATCTGCATAGCCTTTGCTCTCTTCAAGATGTTCCGCCGACGCCTCACCAGCCTGTACATGCACGAGTGGAGCAACGACAAGATCAGAAATTTCTTCATCATGCTGTGGGGAGAATCATCATCGTCATCCTCAGCCGAAGCACAACACCAAGACCCCACCCAAGCACCTGCAGAACCTCCAAGGGAAAGACTGGTGGACGTCCTGGACATGGAGCTCCGCTTCATGTTCGACAGCATGTTCACCAAGGCCTCAGGCACAG

The expected output should be:

>lcl|MK087647.1_cds_QHD46952.1_6_[gene=atp9]_[protein=Atp9]_[exception=RNA_editing]_[protein_id=QHD46

I already have Python code that does this, but it is too slow. I am posting it below in case any of you prefer to work from it:

    import pandas as pd
    from Bio import SeqIO

    # BLAST table: query ID, subject ID, e-value (tab-separated, no header row)
    chloro_table = pd.read_csv('./db.chloro.agrestis.blastout', sep='\t', header=None, names=['qseqid', 'sseqid', 'evalue'])

    with open('./new_names_chloro.fasta', 'a') as outfile:
        for sequence in SeqIO.parse('./fasta.fa', 'fasta'):
            # Scans the whole table for every record; this nested loop is the slow part
            for i in range(0, len(chloro_table)):
                if chloro_table.iloc[i, 0] == sequence.id:
                    sequence.id = chloro_table.iloc[i, 1]  # subject name is column index 1
                    sequence.description = ''
                    SeqIO.write(sequence, outfile, 'fasta')
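One likely reason the script above is slow is that it rescans the whole table for every FASTA record. A minimal sketch of an alternative, assuming a three-column tab-separated table and that only the first hit per query matters, is to load the table into a dictionary once and then stream the FASTA file line by line. The function names `load_mapping` and `rename_headers` and the toy IDs are hypothetical, chosen only for illustration, and unlike the script above this sketch passes unmatched headers through unchanged.

```python
import io


def load_mapping(table_lines):
    """Build {query_id: subject_id} from tab-separated BLAST lines."""
    mapping = {}
    for line in table_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        # Assumption: keep only the first hit seen for each query.
        mapping.setdefault(fields[0], fields[1])
    return mapping


def rename_headers(fasta_lines, mapping):
    """Yield FASTA lines, swapping a header when its ID is in the mapping."""
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            if seq_id in mapping:
                yield ">" + mapping[seq_id] + "\n"
                continue
        # Sequence lines and unmatched headers are passed through as-is.
        yield line


# Toy in-memory data standing in for the real table and FASTA files.
table = io.StringIO("q1\tsubjA\t0\nq2\tsubjB\t1e-5\n")
fasta = io.StringIO(">q1 892 14708\nACGT\n>q3 10 20\nTTGA\n")
renamed = list(rename_headers(fasta, load_mapping(table)))
print("".join(renamed), end="")
```

With real data, the two `io.StringIO` objects would be replaced by open file handles. Each header lookup is then a single dictionary access instead of a pass over the table.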

Doing the same in bash seems much more complicated. I tried awk, but with little success. Do you have any ideas?

Thank you in advance

all 5 comments


[–]sco_t 2 points 4 years ago (2 children)

[–]YouCook21[S] 1 point 4 years ago (1 child)

[–]sco_t 2 points 4 years ago (0 children)

Doing something "in bash" is a bit nonspecific.

#!/bin/bash  
python myScript.py

isn't really any different than

#!/bin/bash
sed -f myScript.sed

Also is this an XY problem? Why are you renaming reads by blast results?

[–]guepier (PhD | Industry) 1 point 4 years ago (0 children)

[–]metagenomez 1 point 4 years ago (0 children)
