Introduction

This page offers a detailed introduction to HybSuite. Feel free to explore!

🧬 Pipeline overview

HybSuite performs end-to-end hybrid capture (Hyb-Seq) phylogenomic analysis, from raw reads to phylogenetic trees. Hyb-Seq data are preferred, but RNA-seq, whole-genome sequencing (WGS), and genome skimming data are also supported.

The full pipeline is composed of 4 stages:

Figure: HybSuite workflow

Stage 1: NGS dataset construction
- (1) Optionally download public raw reads from NCBI (via SRA Toolkit);
- (2) Optionally integrate user-provided raw reads;
- (3) Trim raw reads (via Trimmomatic).
Stage 2: Data assembly and paralog retrieval
- (1) Assemble target loci and retrieve putative paralogs (via HybPiper);
- (2) Integrate pre-assembled sequences (if provided);
- (3) Filter putative paralogs;
- (4) Plot recovery and paralog heatmaps for the original and filtered sequences.
Stage 3: Paralog handling
- Optionally run any of seven paralog-handling methods (HRS, RLWP, LS, MO, MI, RT, 1to1; see our Tutorial) and generate filtered alignments for downstream analysis:
  - HRS:
    (1) Retrieve sequences with the HybPiper command hybpiper retrieve_sequences;
    (2) Integrate pre-assembled sequences (if provided);
    (3) Filter sequences by length to remove potentially misassembled sequences;
    (4) Align sequences (via MAFFT) and trim the alignments (via trimAl or HMMCleaner);
    (5) Filter the trimmed alignments to generate the final alignments.
  - RLWP:
    (1) Retrieve sequences with the HybPiper command hybpiper retrieve_sequences;
    (2) Integrate pre-assembled sequences (if provided);
    (3) Filter sequences by length to remove potentially misassembled sequences;
    (4) Remove loci in which putative paralogs were masked in more than a specified number of samples;
    (5) Align sequences (via MAFFT) and trim the alignments (via trimAl or HMMCleaner);
    (6) Filter the trimmed alignments to generate the final alignments.
  - PhyloPyPruner pipeline (LS, MI, MO, RT, 1to1):
    (1) Align (via MAFFT) and trim (via trimAl or HMMCleaner) all putative paralog sequences;
    (2) Infer gene trees for all putative paralogs;
    (3) Obtain orthogroup alignments using tree-based orthology inference algorithms (via PhyloPyPruner);
    (4) Realign (via MAFFT) and trim (via trimAl or HMMCleaner) the orthogroup alignments;
    (5) Filter the trimmed orthogroup alignments to generate the final alignments.
  - ParaGone pipeline (MI, MO, RT, 1to1):
    (1) Use the directory containing all putative paralogs generated in Stage 2 as input;
    (2) Obtain orthogroup alignments using tree-based orthology inference algorithms via ParaGone;
    (3) Filter trimmed orthogroup alignments to generate final alignments.
Stage 4: Species tree inference
- Multiple species tree inference methods available:
  - Concatenation-based approach: IQ-TREE, RAxML, or RAxML-NG;
  - Coalescent-based approach: ASTRAL-IV or wASTRAL;
  - Multi-copy-gene-aware coalescent-based approach: ASTRAL-Pro3.
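Step (4) of RLWP is essentially a count-and-threshold rule. The Python sketch below illustrates that rule only; it is not HybSuite's implementation. It assumes the paralog warnings from Stage 2 have already been gathered into a mapping from each locus to the samples flagged for it, and the function name `remove_loci_with_paralogs` and the parameter `max_samples` are hypothetical.

```python
from typing import Dict, List, Set


def remove_loci_with_paralogs(
    paralog_samples: Dict[str, Set[str]],
    all_loci: List[str],
    max_samples: int,
) -> List[str]:
    """Return the loci kept under an RLWP-style rule (illustrative sketch).

    paralog_samples: hypothetical mapping of locus -> samples in which
        putative paralogs were flagged for that locus.
    max_samples: hypothetical threshold; a locus is dropped when paralogs
        were flagged in MORE than this many samples.
    """
    kept = []
    for locus in all_loci:
        # Loci with no paralog warnings count as flagged in zero samples
        n_flagged = len(paralog_samples.get(locus, set()))
        if n_flagged <= max_samples:
            kept.append(locus)
    return kept


if __name__ == "__main__":
    flagged = {
        "locus_A": {"sp1", "sp2", "sp3"},
        "locus_B": {"sp1"},
    }
    loci = ["locus_A", "locus_B", "locus_C"]
    print(remove_loci_with_paralogs(flagged, loci, max_samples=1))
    # prints ['locus_B', 'locus_C']
```

The rule is applied per locus, so the order of `all_loci` is preserved in the output.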

✨ Features

🔄 Transparent: Full workflow visibility with real-time progress logging at each step
📝 Reproducible: Automatically archives exact software commands & parameters for every run
🧩 Modular: Run individual stages separately, or the complete pipeline with one command
⚡ Flexible: 7 paralog-handling methods & 5+ species tree inference options
🚀 Scalable: Built-in parallelization for large-scale phylogenomic datasets
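The "Reproducible" point boils down to recording each external command before it runs. Below is a minimal Python sketch of that idea, assuming nothing about HybSuite's actual log layout: the helper `run_and_archive` and the timestamped one-line format are hypothetical.

```python
import shlex
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def run_and_archive(cmd: List[str], log_path: Path) -> int:
    """Log the exact command line, then run it (illustrative sketch).

    The "timestamp<TAB>command" log line is a hypothetical format,
    not HybSuite's. Returns the command's exit status.
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a", encoding="utf-8") as log:
        # shlex.join quotes each argument so the logged line can be re-run in a shell
        log.write(f"{stamp}\t{shlex.join(cmd)}\n")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "commands.log"
        status = run_and_archive([sys.executable, "-c", "print('hello')"], log_file)
        print(status)
        print(log_file.read_text(encoding="utf-8"))
```

Logging before execution means the command is archived even if the tool later crashes.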

🏆 Advantages

1. End-to-end pipeline from reads to trees

Processes data from raw reads to phylogenetic trees with single-command workflows
Supports both full pipeline execution and modular stage-specific operations
Minimizes manual intervention while maintaining flexibility

2. Unique functionality of integrating pre-assembled sequences

Allows pre-assembled locus sequences to be integrated into the working dataset alongside the sequences assembled from raw reads.
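Conceptually, integration means adding extra sequences to the per-locus collections produced by assembly. The sketch below assumes a hypothetical in-memory layout (locus → {sample: sequence}) and a hypothetical conflict policy in which an already-assembled sequence wins and the clash is reported; how HybSuite itself resolves such conflicts is not described on this page.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: locus name -> {sample name: sequence}
Dataset = Dict[str, Dict[str, str]]


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}.

    The name is the header text after '>' up to the first whitespace;
    headers are assumed to be non-empty.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line  # join wrapped sequence lines
    return records


def integrate_preassembled(working: Dataset, extra: Dataset) -> Tuple[Dataset, List[str]]:
    """Merge pre-assembled sequences into the working dataset (illustrative sketch).

    Assumed conflict policy: a sample already present for a locus keeps
    its assembled sequence, and the clash is reported as "locus:sample".
    """
    merged = {locus: dict(seqs) for locus, seqs in working.items()}
    conflicts = []
    for locus, seqs in extra.items():
        target = merged.setdefault(locus, {})
        for sample, seq in seqs.items():
            if sample in target:
                conflicts.append(f"{locus}:{sample}")
            else:
                target[sample] = seq
    return merged, conflicts


if __name__ == "__main__":
    assembled = {"locus_A": parse_fasta(">sp1\nATGC\n>sp2\nATGA\n")}
    pre = {"locus_A": parse_fasta(">sp3 from_genome\nATGG\n>sp1\nTTTT\n")}
    merged, clashes = integrate_preassembled(assembled, pre)
    print(sorted(merged["locus_A"]), clashes)
    # prints ['sp1', 'sp2', 'sp3'] ['locus_A:sp1']
```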

3. Customizable sequence filtering strategies

Dual filtering strategies for both loci and samples
Configurable thresholds for read depth, missing data, and sequence quality
Enables dataset optimization for different study goals
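As an illustration of filtering both loci and samples, here is a sketch of a two-pass occupancy filter: first drop samples that recover too few loci, then drop loci present in too few of the remaining samples. The function and threshold names are hypothetical, the pass order is an assumption, and HybSuite's own thresholds may be defined differently.

```python
from typing import Dict, Set, Tuple


def dual_filter(
    recovered: Dict[str, Set[str]],
    min_locus_fraction: float,
    min_sample_fraction: float,
) -> Tuple[Set[str], Set[str]]:
    """Two-pass occupancy filter over samples and loci (illustrative sketch).

    recovered: hypothetical mapping of sample -> loci recovered for it.
    min_locus_fraction: keep a sample only if it has at least this
        fraction of all loci.
    min_sample_fraction: then keep a locus only if at least this fraction
        of the retained samples have it.
    Returns (kept_samples, kept_loci).
    """
    all_loci: Set[str] = set().union(*recovered.values()) if recovered else set()
    # Pass 1: sample-level filter
    kept_samples = {
        s for s, loci in recovered.items()
        if all_loci and len(loci) / len(all_loci) >= min_locus_fraction
    }
    # Pass 2: locus-level filter, computed over retained samples only
    kept_loci: Set[str] = set()
    for locus in all_loci:
        n = sum(locus in recovered[s] for s in kept_samples)
        if kept_samples and n / len(kept_samples) >= min_sample_fraction:
            kept_loci.add(locus)
    return kept_samples, kept_loci


if __name__ == "__main__":
    data = {"sp1": {"L1", "L2", "L3"}, "sp2": {"L1", "L2"}, "sp3": {"L1"}}
    samples, loci = dual_filter(data, min_locus_fraction=0.5, min_sample_fraction=1.0)
    print(sorted(samples), sorted(loci))
    # prints ['sp1', 'sp2'] ['L1', 'L2']
```

Running the sample pass first means a few very incomplete samples cannot drag otherwise well-recovered loci below the locus threshold.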

4. Advanced paralog-handling methods

Implements 7 distinct methods for paralog detection and processing
Includes both similarity-based and topology-based approaches
Improves orthology assessment accuracy
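The simplest of these criteria, 1to1, keeps only homolog groups in which every sample is represented by exactly one sequence; MI, MO, RT, and LS instead prune gene trees and do not fit in a short example. The sketch below shows the 1to1 check on sequence names alone, assuming a hypothetical `<sample>.<copy>` naming scheme (so sample names must not themselves contain dots). It is not PhyloPyPruner's or ParaGone's code.

```python
from collections import Counter
from typing import Dict, List


def sample_of(seq_name: str) -> str:
    """Strip a trailing copy suffix, e.g. 'sp1.0' -> 'sp1'.

    Assumes the hypothetical '<sample>.<copy_id>' scheme; names without a
    dot are returned unchanged.
    """
    return seq_name.rsplit(".", 1)[0]


def is_one_to_one(seq_names: List[str]) -> bool:
    """True when no sample contributes more than one sequence."""
    counts = Counter(sample_of(n) for n in seq_names)
    return all(c == 1 for c in counts.values())


def keep_one_to_one(groups: Dict[str, List[str]]) -> List[str]:
    """Return the names of groups that are strictly single-copy per sample."""
    return [g for g, names in groups.items() if is_one_to_one(names)]


if __name__ == "__main__":
    groups = {
        "locus_A": ["sp1.main", "sp2.main", "sp3.main"],
        "locus_B": ["sp1.0", "sp1.1", "sp2.main"],
    }
    print(keep_one_to_one(groups))
    # prints ['locus_A']
```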

5. Multi-method phylogenetic tree inference

Integrated software for concatenation-based methods: IQ-TREE, RAxML, and RAxML-NG
Integrated software for coalescent-based methods: ASTRAL-IV and wASTRAL
Integrated software for multi-copy-gene-aware coalescent-based inference: ASTRAL-Pro3
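For the concatenation-based route, per-locus alignments are first joined into a supermatrix, with a partition range recorded for each locus. The sketch below is a generic illustration of that step, not HybSuite's code; it assumes alignments are held in memory as locus → {sample: aligned sequence}, pads missing samples with gaps, and prints RAxML-style partition lines (`DNA, name = start-end`).

```python
from typing import Dict, List, Tuple

# Hypothetical input: locus name -> {sample name: aligned sequence}
Alignments = Dict[str, Dict[str, str]]


def concatenate(alignments: Alignments) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Build a supermatrix plus 1-based, inclusive partition ranges (illustrative sketch).

    Samples missing from a locus are padded with '-' across that locus's length.
    """
    samples = sorted({s for aln in alignments.values() for s in aln})
    matrix: Dict[str, List[str]] = {s: [] for s in samples}
    partitions = []
    start = 1
    for locus in sorted(alignments):
        aln = alignments[locus]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: empty alignment or unequal sequence lengths")
        length = lengths.pop()
        for s in samples:
            matrix[s].append(aln.get(s, "-" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {s: "".join(parts) for s, parts in matrix.items()}, partitions


if __name__ == "__main__":
    alns = {"locus_A": {"sp1": "ATG-", "sp2": "ATGC"}, "locus_B": {"sp1": "GG"}}
    supermatrix, parts = concatenate(alns)
    for sample, seq in supermatrix.items():
        print(sample, seq)
    for name, first, last in parts:
        print(f"DNA, {name} = {first}-{last}")
    # prints:
    # sp1 ATG-GG
    # sp2 ATGC--
    # DNA, locus_A = 1-4
    # DNA, locus_B = 5-6
```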

6. Integrated visualization tools

plot_paralog_heatmap.py
plot_recovery_heatmap.py
modified_phypartspiecharts.py
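A recovery heatmap is, at heart, a sample × locus matrix of how much of each target was recovered. The sketch below builds such a matrix; it does not reproduce the inputs or options of plot_recovery_heatmap.py, and both the input layout and the cap at 1.0 (so over-long contigs do not exceed full recovery) are assumptions.

```python
from typing import Dict, List


def recovery_matrix(
    recovered_len: Dict[str, Dict[str, int]],
    target_len: Dict[str, int],
) -> List[List[float]]:
    """Rows = samples (sorted), columns = loci (sorted).

    Each value is recovered length / target length, capped at 1.0.
    Target lengths are assumed to be positive; unrecovered loci score 0.0.
    """
    loci = sorted(target_len)
    rows = []
    for sample in sorted(recovered_len):
        row = []
        for locus in loci:
            got = recovered_len[sample].get(locus, 0)
            row.append(min(got / target_len[locus], 1.0))
        rows.append(row)
    return rows


if __name__ == "__main__":
    targets = {"L1": 100, "L2": 200}
    recovered = {"sp1": {"L1": 100, "L2": 50}, "sp2": {"L1": 120}}
    for row in recovery_matrix(recovered, targets):
        print(row)
    # prints:
    # [1.0, 0.25]
    # [1.0, 0.0]
```

The resulting nested list can be handed to any heatmap plotting routine.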

7. High-performance computing

HybSuite can process samples and loci in parallel (option -process), which can substantially reduce run time.
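Parallelism across samples can be sketched with a bounded worker pool. In this Python sketch, `max_workers` plays a role similar to a concurrency limit; whether it matches the exact semantics of -process is an assumption, and `fake_assembly` is a hypothetical stand-in for a real per-sample step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def run_per_sample(
    samples: List[str],
    task: Callable[[str], str],
    max_workers: int,
) -> Dict[str, str]:
    """Run `task` for every sample, at most `max_workers` at a time (illustrative sketch).

    Results are collected as tasks finish; any exception raised by a task
    is re-raised here by fut.result().
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, s): s for s in samples}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


def fake_assembly(sample: str) -> str:
    """Hypothetical stand-in for a per-sample step such as trimming or assembly."""
    return f"{sample}: done"


if __name__ == "__main__":
    out = run_per_sample(["sp1", "sp2", "sp3"], fake_assembly, max_workers=2)
    print(sorted(out.values()))
    # prints ['sp1: done', 'sp2: done', 'sp3: done']
```

Threads suit tasks that mostly wait on external programs (for example via subprocess); CPU-bound work written in Python would call for ProcessPoolExecutor instead.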

Last modified March 5, 2026: Update plotly.html (84cc3e0)
