Introduction

This page offers a detailed introduction to HybSuite. Feel free to explore!


🧬 Pipeline overview

HybSuite performs end-to-end hybrid capture (Hyb-Seq) phylogenomic analysis from raw reads (Hyb-Seq preferred; compatible with RNA-seq, WGS, and genome skimming data) to phylogenetic trees.

The full pipeline is composed of 4 stages:

[Figure: HybSuite workflow]

  • Stage 1: NGS dataset construction

    • (1) Optionally download public raw reads from NCBI (via SRA Toolkit);
    • (2) Optionally integrate user-provided raw reads;
    • (3) Trim raw reads (via Trimmomatic).

  • Stage 2: Data assembly and paralog retrieval

    • (1) Assemble target loci and retrieve putative paralogs (via HybPiper);
    • (2) Integrate pre-assembled sequences (if provided);
    • (3) Filter putative paralogs;
    • (4) Plot recovery and paralog heatmaps of the original and filtered sequences.

  • Stage 3: Paralog handling

    • Optionally execute seven paralog-handling methods (HRS, RLWP, LS, MO, MI, RT, 1to1; see our Tutorial) and generate filtered alignments for downstream analysis:
      • HRS:
        (1) Retrieve sequences via the HybPiper command hybpiper retrieve_sequences;
        (2) Integrate pre-assembled sequences (if provided);
        (3) Filter sequences by length to remove potentially mis-assembled sequences;
        (4) Multiple sequence alignment (via MAFFT) and trimming (via trimAl or HMMCleaner);
        (5) Filter trimmed alignments to generate final alignments.
      • RLWP:
        (1) Retrieve sequences via the HybPiper command hybpiper retrieve_sequences;
        (2) Integrate pre-assembled sequences (if provided);
        (3) Filter sequences by length to remove potentially mis-assembled sequences;
        (4) Remove loci with putative paralogs masked in more than samples;
        (5) Multiple sequence alignment (via MAFFT) and trimming (via trimAl or HMMCleaner);
        (6) Filter trimmed alignments to generate final alignments.
      • PhyloPyPruner pipeline (LS, MI, MO, RT, 1to1):
        (1) Multiple sequence alignment (via MAFFT) and trimming (via trimAl or HMMCleaner) for all putative paralogs;
        (2) Gene tree inference for all putative paralogs;
        (3) Obtain orthogroup alignments using tree-based orthology inference algorithms (via PhyloPyPruner);
        (4) Realign (via MAFFT) and trim (via trimAl or HMMCleaner) the orthogroup alignments;
        (5) Filter trimmed orthogroup alignments to generate final alignments.
      • ParaGone pipeline (MI, MO, RT, 1to1):
        (1) Use the directory containing all putative paralogs generated in Stage 2 as input;
        (2) Obtain orthogroup alignments using tree-based orthology inference algorithms via ParaGone;
        (3) Filter trimmed orthogroup alignments to generate final alignments.

  • Stage 4: Species tree inference
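
The length-filtering step listed under the HRS and RLWP methods in Stage 3 can be illustrated with a minimal Python sketch. This page does not state the criterion or thresholds HybSuite uses, so the sketch assumes that sequences much shorter than the mean length for a locus are removed. The function names and the min_ratio parameter are hypothetical and are not part of HybSuite.

```python
from statistics import mean


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_by_length(records, min_ratio=0.5):
    """Keep sequences at least min_ratio times the mean length for the locus.

    min_ratio is an assumed, illustrative threshold, not a HybSuite default.
    """
    if not records:
        return {}
    cutoff = min_ratio * mean(len(seq) for seq in records.values())
    return {n: seq for n, seq in records.items() if len(seq) >= cutoff}


# Toy single-locus example with one very short sequence.
example = """>sampleA
ATGCATGCATGC
>sampleB
ATGCATGCAT
>sampleC
ATG
"""
kept = filter_by_length(parse_fasta(example), min_ratio=0.5)
print(sorted(kept))
```

In this toy locus, sampleC falls below half of the mean length and is dropped, while the other two sequences are kept.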


✨ Features

🔄 Transparent: Full workflow visibility with real-time progress logging at each step
📝 Reproducible: Automatically archives exact software commands & parameters for every run
🧩 Modular: Execute individual stages or complete pipeline in one command
Flexible: 7 paralog handling methods & 5+ species tree inference options
🚀 Scalable: Built-in parallelization for large-scale phylogenomic datasets


🏆 Advantages

1. End-to-end pipeline from reads to trees

  • Processes data from raw reads to phylogenetic trees with single-command workflows
  • Supports both full pipeline execution and modular stage-specific operations
  • Minimizes manual intervention while maintaining flexibility

2. Unique functionality of integrating pre-assembled sequences

  • Allows pre-assembled locus sequences to be integrated into the working dataset.

3. Customizable sequence filtering strategies

  • Dual filtering strategies for both loci and samples
  • Configurable thresholds for read depth, missing data, and sequence quality
  • Enables dataset optimization for different study goals
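
As an illustration of filtering both loci and samples by missing data, the following Python sketch first drops loci recovered in too few samples and then drops samples that retain too few of the remaining loci. It shows the general idea only: the function name, the input layout, and both thresholds are assumptions and do not correspond to HybSuite options.

```python
def filter_loci_and_samples(recovered, min_locus_cov=0.5, min_sample_cov=0.5):
    """Filter loci, then samples, by coverage.

    recovered maps each sample name to the set of loci recovered for it.
    Both thresholds are hypothetical fractions between 0 and 1.
    """
    samples = list(recovered)
    all_loci = set().union(*recovered.values())
    # Keep loci that are present in a sufficient fraction of samples.
    kept_loci = {
        locus
        for locus in all_loci
        if sum(locus in recovered[s] for s in samples) / len(samples) >= min_locus_cov
    }
    if not kept_loci:
        return set(), []
    # Keep samples that retain a sufficient fraction of the kept loci.
    kept_samples = [
        s
        for s in samples
        if len(recovered[s] & kept_loci) / len(kept_loci) >= min_sample_cov
    ]
    return kept_loci, kept_samples


recovered = {
    "sampleA": {"locus1", "locus2", "locus3"},
    "sampleB": {"locus1", "locus2"},
    "sampleC": {"locus1"},
    "sampleD": {"locus4"},
}
loci, samples = filter_loci_and_samples(recovered, 0.5, 0.5)
print(sorted(loci), samples)
```

Filtering loci before samples means a sample is judged only on loci that survive; applying the two filters in the opposite order can give a different dataset.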

4. Advanced paralog-handling methods

  • Implements 7 distinct methods for paralog detection and processing
  • Includes both similarity-based and topology-based approaches
  • Improves orthology assessment accuracy

5. Multi-method phylogenetic tree inference

6. Integrated visualization tools

  • plot_paralog_heatmap.py
  • plot_recovery_heatmap.py
  • modified_phypartspiecharts.py

7. High-Performance Computing

  • Parallel processing across samples and loci (option -process), which can significantly improve computational efficiency.
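
The general idea of processing samples in parallel can be sketched in Python with a worker pool. This is not how HybSuite implements its -process option, which this page does not describe. The process_sample function is a hypothetical placeholder for per-sample work, and n_workers is an assumed parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample):
    """Placeholder for per-sample work, such as launching an external tool."""
    return sample, len(sample)


def run_parallel(samples, n_workers=4):
    """Run process_sample over all samples using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input samples.
        return dict(pool.map(process_sample, samples))


results = run_parallel(["sampleA", "sampleBB", "sampleCCC"], n_workers=2)
print(results)
```

Threads are adequate when each task mostly waits on an external program; CPU-bound Python work would usually call for a process pool instead.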

Last modified March 5, 2026: Update plotly.html (84cc3e0)