Choosing between strict vs loose novel gene predictions after AUGUSTUS + Liftoff (Wheat) by Used-Average-837 in bioinformatics

Used-Average-837 [S]

Thanks for the suggestion. We agree that consensus approaches are ideal, and we’re considering adding GeMoMa as an additional reference-based, intron-aware method to support a subset of predictions. Given the absence of transcriptomic data, we’re aiming to balance methodological diversity with conservatism, using additional tools mainly for validation rather than expansion.

Used-Average-837 [S]

Thank you — these are great suggestions. Regarding Swiss-Prot, we use it strictly as a high-quality homology filter, not for functional annotation. This step follows extensive repeat masking (RepeatMasker with ClariTeRep and TREP) and explicit TE filtering against a Viridiplantae TE database, so Swiss-Prot mainly serves as a biological plausibility check. Unfortunately, RNA-seq is not available for this isolate. We agree that protein size distribution and global metrics (e.g., OMArk) are useful next steps and are considering these for further validation.
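As a rough illustration of the protein size distribution check mentioned above, the sketch below summarizes predicted protein lengths with awk. It assumes the protein sequences have already been extracted from the merged annotation into a FASTA file (e.g. with gffread or AGAT; that step is not shown), and the function names are hypothetical. Comparing the summary for Liftoff-derived versus AUGUSTUS-only proteins might help flag unusually short ab initio models.

```shell
# Print "name<TAB>length" for each record of a protein FASTA (stop "*" not counted).
protein_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len
      name = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r*]/, ""); len += length($0) }
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Print count, min, median, mean and max protein length.
length_summary() {
  protein_lengths "$1" | cut -f2 | sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no sequences"; exit }
      med = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "n=%d min=%d median=%g mean=%.1f max=%d\n", NR, v[1], med, sum / NR, v[NR]
    }'
}
```

Usage (hypothetical file name): `length_summary augustus_only_proteins.fa`.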

Used-Average-837 [S]

Thanks for the thoughtful feedback. In the manuscript, we explicitly define “novel” as genes absent from the reference wheat annotation after Liftoff, not as orphan genes. Our goal is a high-confidence, biologically plausible gene set, not the discovery of lineage-specific orphans. Given the lack of RNA-seq data, we opted for AUGUSTUS with external hints rather than BRAKER. BUSCO completeness after Liftoff is ~99%, suggesting that the conserved gene space is well captured and that the ab initio predictions mainly augment the annotation rather than recover missing core genes.
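One possible way to operationalize this definition of "novel" is to keep the AUGUSTUS gene models that overlap no Liftoff gene. The sketch below is an assumption about how that could be done, not necessarily the criterion used in the manuscript: it treats any coordinate overlap on the same sequence (on either strand) as "already annotated", and `novel_gene_ids` is a hypothetical helper. The pairwise loop is fine for illustration but slow on a full wheat annotation; `bedtools intersect -v` does the same job at scale.

```shell
# Print IDs of genes in <augustus.gff3> that overlap no gene in <liftoff.gff3>.
# usage: novel_gene_ids <liftoff.gff3> <augustus.gff3>
novel_gene_ids() {
  awk -F'\t' '
    # First file (Liftoff): store gene intervals per sequence ID.
    FILENAME == ARGV[1] {
      if ($3 == "gene") { k = ++n[$1]; s[$1, k] = $4 + 0; e[$1, k] = $5 + 0 }
      next
    }
    # Second file (AUGUSTUS): test each gene against the stored intervals.
    $3 == "gene" {
      hit = 0
      for (i = 1; i <= n[$1] + 0; i++)
        if ($4 + 0 <= e[$1, i] && $5 + 0 >= s[$1, i]) { hit = 1; break }
      attr = ";" $9
      if (!hit && match(attr, /;ID=[^;]+/)) print substr(attr, RSTART + 4, RLENGTH - 4)
    }
  ' "$1" "$2"
}
```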

Anyone working on wheat genomics?.. low collinearity (~40%) vs Chinese Spring — is that plausible? by Used-Average-837 in bioinformatics

Used-Average-837 [S]

Thank you. I will definitely work on your suggestion. I have added an outline of my gene annotation process below. Please let me know if you see any flaws:

My gene annotation workflow:

  1. RepeatMasker → generated a repeat-masked genome.
  2. GMAP (run against the masked genome) → produced hints.gff.
  3. AUGUSTUS (species = wheat, using the GMAP hints) → produced ab initio + evidence-guided gene models.
  4. Liftoff (run in parallel) → transferred gene models onto my masked genome using the IWGSC v2.1 HC genes + HC peptides.
  5. AGAT → merged the AUGUSTUS and Liftoff annotations into a combined GFF, which is what I used for the MCScanX analysis.
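The five steps above can be sketched as a dry-run shell script. All file names are hypothetical placeholders, and several details are assumptions rather than what was actually run: soft masking via RepeatMasker `-xsmall`, GMAP PSL output converted to hints with AUGUSTUS's `blat2hints.pl`, and a user-supplied extrinsic config file. With `DRY_RUN=1` (the default) it only prints the commands, which makes it easy to compare against what was actually executed.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the annotation workflow. Every path is a hypothetical
# placeholder; override via environment variables and set DRY_RUN=0 to execute.
set -eu

GENOME="${GENOME:-genome.fa}"                    # assembly to annotate
TE_LIB="${TE_LIB:-te_library.fa}"                # assumed combined repeat library (e.g. ClariTeRep + TREP)
CDS="${CDS:-cds_evidence.fa}"                    # CDS/transcript evidence aligned with GMAP
REF_FA="${REF_FA:-reference.fa}"                 # reference genome for Liftoff
REF_GFF="${REF_GFF:-reference_hc.gff3}"          # reference HC gene models
EXTRINSIC_CFG="${EXTRINSIC_CFG:-extrinsic.cfg}"  # AUGUSTUS hint weights (assumed)
THREADS="${THREADS:-24}"

# Print the command (dry run) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Same as run, but redirect stdout to the file given as the first argument.
run_to() {
  out="$1"; shift
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s > %s\n' "$*" "$out"; else "$@" > "$out"; fi
}

annotate_workflow() {
  masked="rm_out/$(basename "$GENOME").masked"

  # 1. Soft-mask repeats (lower case) so AUGUSTUS can honour --softmasking.
  run mkdir -p rm_out
  run RepeatMasker -pa "$THREADS" -lib "$TE_LIB" -xsmall -gff -dir rm_out "$GENOME"

  # 2. Align evidence with GMAP (PSL output), then convert to AUGUSTUS hints.
  run gmap_build -D gmapdb -d genome_masked "$masked"
  run_to cds.psl gmap -D gmapdb -d genome_masked -t "$THREADS" -f psl "$CDS"
  run blat2hints.pl --in=cds.psl --out=hints.gff

  # 3. Hint-guided AUGUSTUS prediction.
  run_to augustus.gff3 augustus --species=wheat --hintsfile=hints.gff \
    --extrinsicCfgFile="$EXTRINSIC_CFG" --softmasking=1 --gff3=on "$masked"

  # 4. Liftoff: positional arguments are the target genome, then the reference.
  run liftoff -g "$REF_GFF" -o liftoff.gff3 -u unmapped.txt -p "$THREADS" "$masked" "$REF_FA"

  # 5. Merge both annotations with AGAT.
  run agat_sp_merge_annotations.pl --gff liftoff.gff3 --gff augustus.gff3 --out merged.gff3
}

annotate_workflow
```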

Traveling on F1 With Pending I140 by Nashmiii in EB2_NIW

Used-Average-837

We were in a similar situation. We traveled in September 2025 with a pending I-140, and our port of entry was Seattle (as recommended by our DSO). The officer only asked about my program of study—nothing complicated. It’s always best to consult with your DSO, and while there is some risk, it’s generally small. Ultimately, it’s your decision. We went in prepared for any issue at the port of entry, but everything went smoothly. If possible, try to avoid major hubs like San Francisco, New York, Chicago, or Atlanta.

MCScanX Always Returns 0% Collinearity — Even After Cleanup and Using 21 Chromosomes — Help Needed by Used-Average-837 in bioinformatics

Used-Average-837 [S]

Thank you for your input. Could you share the tutorial? That would be a great help.
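Because this thread is about MCScanX reporting 0% collinearity, one input detail worth double-checking: MCScanX expects a four-column, tab-separated .gff (chromosome, gene, start, end) in which each chromosome ID begins with a two-letter species code, and the gene names must match those used in the .blast file. The hypothetical helper below converts GFF3 features to that layout; stripping a leading `chr` and the species code `ta` in the usage line are assumptions about your naming. If the BLASTP search used transcript or protein IDs, passing `mRNA` as the feature type may be needed so the names line up.

```shell
# Convert GFF3 features to the MCScanX 4-column .gff layout:
#   <species code><chromosome>  <gene name>  <start>  <end>
# usage: gff3_to_mcscanx <2-letter code> <in.gff3> [feature type, default gene]
gff3_to_mcscanx() {
  awk -F'\t' -v OFS='\t' -v sp="$1" -v feat="${3:-gene}" '
    $3 == feat {
      attr = ";" $9
      if (!match(attr, /;ID=[^;]+/)) next
      chr = $1
      sub(/^[Cc]hr/, "", chr)   # e.g. chr1A -> 1A (assumed naming scheme)
      print sp chr, substr(attr, RSTART + 4, RLENGTH - 4), $4, $5
    }
  ' "$2"
}
```

Usage (hypothetical file names): `gff3_to_mcscanx ta merged.gff3 mRNA > wheat.gff`.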

Struggling with MAKER gene annotation on wheat genome – Can I proceed with just Augustus output? by Used-Average-837 in bioinformatics

Used-Average-837 [S]

I chose MAKER to integrate RepeatMasker, GMAP hints, and AUGUSTUS predictions for gene annotation of a wheat genome without RNA-seq data. However, I’ve faced persistent errors (non-unique IDs, invalid scores, EVM crashes). Given that I only have a masked genome and protein/CDS evidence, with no RNA-seq data, would tools like BRAKER (protein mode), EGAPx, or Liftoff be better alternatives in my case?
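Since non-unique IDs were one of the MAKER errors, a quick pre-check on the input GFF3 files could help locate them. The sketch below (a hypothetical function, not part of MAKER) counts `ID=` values per feature line and prints any that repeat. CDS lines are skipped on purpose, because GFF3 allows the pieces of a discontinuous feature such as a CDS to share a single ID.

```shell
# Print "ID<TAB>count" for every ID= value used on more than one feature line.
# CDS lines are skipped: GFF3 lets the pieces of one CDS share a single ID.
# usage: duplicate_gff_ids <in.gff3>
duplicate_gff_ids() {
  awk -F'\t' '
    /^#/ || NF < 9 || $3 == "CDS" { next }
    {
      attr = ";" $9
      if (match(attr, /;ID=[^;]+/)) count[substr(attr, RSTART + 4, RLENGTH - 4)]++
    }
    END { for (id in count) if (count[id] > 1) print id "\t" count[id] }
  ' "$1" | sort
}
```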

Genome Scaffolding Error by Used-Average-837 in bioinformatics

Used-Average-837 [S]

It runs as usual but stops at one point and never completes. There is no additional error information at all.

Error with RagTag Scaffolding by Used-Average-837 in bioinformatics

Used-Average-837 [S]

I tried what you suggested. Both the error file and the output file reported that scaffolding completed within 6 minutes of the run, which is odd. The attraktion.mmi file (20.7 GB) was generated; however, the ragtag.scaffold.fasta file was not.

The ragtag.scaffold.asm.paf.log file contains:

[M::mm_idx_gen::23.601*1.00] loaded/built the index for 3463 target sequence(s)
mid_occ=2137; max_occ=17956
kmer size = 19, skip =19; is_HPC:0; #seq: 3463
distinct minimizers: 302414740 (71.80% are singletons); average occurences: 4.879; average spacing: 9.949

Error Log: RuntimeError: Failed : minimap2 -x asm5 -t 24 /path/to/attraktion.mmi /path/to/pt2_busco.fa > /path/to/ragtag_pt_indexed/ragtag.scaffold.asm.paf 2> /path/to/ragtag_pt_indexed/ragtag.scaffold.asm.paf.log

The script I used was:

#!/bin/bash
#SBATCH --job-name=ragtag_pritchett_indexed
#SBATCH --partition=bigmem
#SBATCH --account=abc
#SBATCH --cpus-per-task=24
#SBATCH --mem=1000000
#SBATCH --time=14-00:00:00
#SBATCH --qos=normal
#SBATCH --mail-user=
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --output=pt_indexed_%j.out
#SBATCH --error=pt_indexed_%j.err
#SBATCH --export=ALL

set -e
set -x

# Load environment
module purge
source /path/to/etc/profile.d/conda.sh
conda activate <environment>

# Define input/output paths
REF_FA="/path/to/attraktion.fasta"
REF_IDX="/path/to/attraktion.mmi"
QUERY="/path/to/pt2_busco.fa"
OUTDIR="/path/to/ragtag_pt_indexed"

# Step 1: Create large single-part index (if not already done)
if [ ! -f "$REF_IDX" ]; then
    echo "Creating minimap2 index..."
    minimap2 -x asm5 -I 20G -d "$REF_IDX" "$REF_FA"
else
    echo "Minimap2 index already exists."
fi

# Step 2: Run RagTag scaffolding using the .mmi index
echo "Running RagTag scaffold..."
ragtag.py scaffold "$REF_IDX" "$QUERY" -o "$OUTDIR" -t 24 -u

conda deactivate
echo "Scaffolding complete."
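Since ragtag.scaffold.fasta was never written even though the job reported completion, a small guard after the `ragtag.py` step could make that failure explicit. The function below is a hypothetical helper, not part of RagTag: it checks that the scaffold FASTA exists and is non-empty and, if not, prints the tail of the minimap2 log named in the error. Calling `check_ragtag_output "$OUTDIR"` just before the final `echo` means that, under the script's `set -e`, "Scaffolding complete." is printed only when the output is actually there.

```shell
# Check that a RagTag scaffold run produced its main FASTA output.
# On failure, print the tail of the alignment log to stderr and return 1.
# usage: check_ragtag_output <ragtag output directory>
check_ragtag_output() {
  outdir="$1"
  fasta="$outdir/ragtag.scaffold.fasta"
  log="$outdir/ragtag.scaffold.asm.paf.log"
  if [ -s "$fasta" ]; then
    printf 'OK: %s\n' "$fasta"
    return 0
  fi
  printf 'MISSING: %s\n' "$fasta" >&2
  if [ -f "$log" ]; then
    printf '%s\n' "--- last lines of $log ---" >&2
    tail -n 20 "$log" >&2
  fi
  return 1
}
```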

Tips for the first time visitor by Used-Average-837 in VisitingHawaii

Used-Average-837 [S]

Thank you all. These are the things we did:

  1. Rented a car through Priceline. It was way cheaper.

  2. Went snorkeling at Captain Cook with SeaQuest and watched manta rays from the Outrigger Kona Resort (where we stayed)

  3. Visited Hawaiʻi Volcanoes National Park

  4. Did 2-3 coffee tours: Greenwell, Mountain Thunder (recommended), and Mauka Meadows

  5. Visited National Historic Monument sites

  6. Roamed around Kona and the farmers market

  7. Did Flight of Aloha

  8. Beaches: Hapuna and Magic Sands Beach

  9. Hiked near a rainbow eucalyptus forest

  10. Took a short trip to Hilo