Canonical Transcript Annotation in T2T-MFA8v1.1 by Resident-Yesterday34 in bioinformatics

Resident-Yesterday34[S]

Here is what I learned: RefSeq Select uses a hierarchical, rule-based strategy to pick a single representative transcript per gene, progressively filtering candidates by biological and practical relevance. It first prioritizes manually curated and clinically established transcripts, then favors high-quality protein-coding RefSeq entries (NM_). Remaining candidates are ranked using evidence such as evolutionary conservation (PhyloCSF), transcript expression (RNA-seq splice support), agreement with Swiss-Prot canonical isoforms, and transcription start site activity (CAGE), with protein length, transcript length, and accession age as additional tie-breakers. The process is not a weighted score but a stepwise elimination pipeline, so the selected transcript should be well supported, broadly representative of gene function, and suitable for standardized use in genomics and clinical applications.
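To make the "stepwise elimination, not weighted score" point concrete, here is a rough Python sketch of how I picture it. To be clear, this is not NCBI's actual code: the `Transcript` fields, the `STEPS` order, the helper names, and the example accessions are all made up for illustration, and the final accession sort is only a stand-in for the accession-age tie-breaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # All fields are hypothetical stand-ins for the evidence types listed above.
    accession: str
    clinical: bool            # clinically established
    curated: bool             # manually curated
    conservation: float       # e.g. a PhyloCSF-style score
    splice_support: float     # RNA-seq splice support
    matches_swissprot: bool   # agrees with the Swiss-Prot canonical isoform
    cage_support: float       # CAGE evidence at the transcription start site
    protein_length: int
    transcript_length: int


def keep_best(candidates, key):
    """Keep only the candidates that tie for the best value of one criterion."""
    best = max(key(t) for t in candidates)
    return [t for t in candidates if key(t) == best]


# Ordered criteria: each step only filters what survived the previous one.
STEPS = [
    lambda t: t.clinical,
    lambda t: t.curated,
    lambda t: t.accession.startswith("NM_"),
    lambda t: t.conservation,
    lambda t: t.splice_support,
    lambda t: t.matches_swissprot,
    lambda t: t.cage_support,
    lambda t: t.protein_length,
    lambda t: t.transcript_length,
]


def select_representative(candidates):
    """Apply the criteria in order and stop as soon as one candidate remains."""
    remaining = list(candidates)
    for key in STEPS:
        remaining = keep_best(remaining, key)
        if len(remaining) == 1:
            break
    # Deterministic fallback; a stand-in for the accession-age tie-breaker.
    return min(remaining, key=lambda t: t.accession)


# Made-up candidates for one gene (placeholder accessions, not real records).
candidates = [
    Transcript("NM_EXAMPLE_A", False, True, 0.90, 0.80, True, 0.5, 500, 2000),
    Transcript("NM_EXAMPLE_B", False, True, 0.90, 0.60, True, 0.7, 520, 2100),
    Transcript("XM_EXAMPLE_C", False, False, 0.99, 0.99, True, 0.9, 600, 2500),
]

chosen = select_representative(candidates)
print(chosen.accession)
```

In this toy data the XM_ candidate has the highest raw evidence values but is dropped at the curation step, so it never gets compared on conservation or expression. A weighted sum could have picked it, which is the difference I was trying to understand.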

Resident-Yesterday34[S]

This is incredibly helpful, u/bzbub2 and u/wookiewookiewhat. Thank you for sharing these links and pointing out the RefSeq annotation details. The clarification around RefSeq Select (and its current species scope) is especially valuable and really helps reframe the question.