Introduction
This page offers a detailed introduction to HybSuite. Feel free to explore!
🧬 Pipeline overview
HybSuite performs end-to-end phylogenomic analysis of hybrid capture (Hyb-Seq) data, from raw reads (Hyb-Seq preferred; also compatible with RNA-seq, WGS, and genome skimming data) to phylogenetic trees.
The full pipeline is composed of 4 stages:

Stage 1: NGS dataset construction
- (1) Optionally download public raw reads from NCBI (via SRA Toolkit);
- (2) Optionally integrate user-provided raw reads (if provided);
- (3) Trim raw reads (via Trimmomatic).
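Trimmomatic is the trimming tool in this stage. As a rough illustration, the sketch below builds a paired-end Trimmomatic command as a Python argument list. The paired-end layout, jar path, adapter file, output file names and trimming steps (commonly used example settings from the Trimmomatic documentation) are all assumptions, not HybSuite's actual defaults.

```python
from pathlib import Path


def trimmomatic_pe_command(sample, r1, r2, out_dir, jar="trimmomatic.jar",
                           adapters="TruSeq3-PE.fa", threads=4):
    """Build (but do not run) a paired-end Trimmomatic command as an argument list."""
    out_dir = Path(out_dir)
    # Trimmomatic PE expects four outputs, in this order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    # The "_1P/_1U/_2P/_2U" file names are hypothetical.
    outputs = [str(out_dir / f"{sample}_{tag}.fastq.gz") for tag in ("1P", "1U", "2P", "2U")]
    return [
        "java", "-jar", jar, "PE",
        "-threads", str(threads), "-phred33",
        str(r1), str(r2), *outputs,
        # Trimming steps are applied in the order listed (example settings only).
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]
```

The list can be passed directly to subprocess.run(cmd, check=True); building it as a list rather than a shell string avoids quoting problems with unusual file names.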
Stage 2: Data assembly and paralog retrieval
- (1) Assemble target loci and retrieve putative paralogs (via HybPiper);
- (2) Integrate pre-assembled sequences (if provided);
- (3) Filter putative paralogs;
- (4) Plot recovery and paralog heatmaps for both the original and filtered sequences.
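A recovery heatmap is drawn from a sample-by-locus matrix of values. One common way to fill that matrix is the fraction of each target's length that was recovered. The sketch below assumes recovered lengths and target lengths are already available as dictionaries; it is not the code of plot_recovery_heatmap.py.

```python
def recovery_matrix(recovered, target_lengths):
    """Return {sample: {locus: fraction of the target length recovered}}.

    recovered: {sample: {locus: recovered_length_bp}}; loci absent for a sample count as 0.
    target_lengths: {locus: reference_length_bp}.
    """
    matrix = {}
    for sample, loci in recovered.items():
        matrix[sample] = {
            # Cap at 1.0 so a contig longer than the target still plots as "fully recovered".
            locus: min(loci.get(locus, 0) / length, 1.0)
            for locus, length in target_lengths.items()
        }
    return matrix
```

Each row of the result can then be handed to any plotting library as one row of the heatmap.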
Stage 3: Paralog handling
- Optionally execute seven paralog-handling methods (HRS, RLWP, LS, MO, MI, RT, 1to1; see our Tutorial) and generate filtered alignments for downstream analysis:
  - HRS:
    (1) Retrieve sequences via the command hybpiper retrieve_sequences in HybPiper;
    (2) Integrate pre-assembled sequences (if provided);
    (3) Filter sequences by length to remove potentially mis-assembled sequences;
    (4) Align multiple sequences (via MAFFT) and trim the alignments (via trimAl or HMMCleaner);
    (5) Filter trimmed alignments to generate final alignments.
  - RLWP:
    (1) Retrieve sequences via the command hybpiper retrieve_sequences in HybPiper;
    (2) Integrate pre-assembled sequences (if provided);
    (3) Filter sequences by length to remove potentially mis-assembled sequences;
    (4) Remove loci with putative paralogs masked in more than the specified number of samples;
    (5) Align multiple sequences (via MAFFT) and trim the alignments (via trimAl or HMMCleaner);
    (6) Filter trimmed alignments to generate final alignments.
  - PhyloPypruner pipeline (LS, MI, MO, RT, 1to1):
    (1) Align (via MAFFT) and trim (via trimAl or HMMCleaner) all putative paralogs;
    (2) Infer gene trees from all putative paralog alignments;
    (3) Obtain orthogroup alignments using tree-based orthology inference algorithms (via PhyloPypruner);
    (4) Realign (via MAFFT) and trim (via trimAl or HMMCleaner) the orthogroup alignments;
    (5) Filter trimmed orthogroup alignments to generate final alignments.
  - ParaGone pipeline (MI, MO, RT, 1to1):
    (1) Use the directory containing all putative paralogs generated in Stage 2 as input;
    (2) Obtain orthogroup alignments using tree-based orthology inference algorithms (via ParaGone);
    (3) Filter trimmed orthogroup alignments to generate final alignments.
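Step (3) of the HRS and RLWP routes drops sequences whose length suggests mis-assembly. The exact criteria HybSuite applies are not described on this page, so the sketch below assumes two hypothetical thresholds: an absolute minimum length and a minimum ratio to the mean sequence length of the locus.

```python
from statistics import mean


def filter_by_length(seqs, min_length=100, min_ratio=0.5):
    """Keep sequences of one locus that pass both (hypothetical) length thresholds.

    seqs: {sample: sequence} for a single locus.
    min_length: absolute minimum length in bp.
    min_ratio: minimum length as a fraction of the locus mean length.
    """
    if not seqs:
        return {}
    avg = mean(len(s) for s in seqs.values())
    return {
        sample: s
        for sample, s in seqs.items()
        # Short fragments relative to the rest of the locus are treated as suspect.
        if len(s) >= min_length and len(s) >= min_ratio * avg
    }
```

Using the locus mean makes the relative threshold adapt to long and short loci alike; a median would be less sensitive to a few very long contigs.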
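RLWP step (4) is essentially a count-and-threshold filter over loci. A minimal sketch, assuming the samples in which each locus was flagged as paralogous have already been collected into a set; max_samples is a hypothetical stand-in for the user-specified threshold.

```python
def remove_loci_with_paralogs(paralog_samples, max_samples):
    """Return the loci flagged as paralogous in at most `max_samples` samples.

    paralog_samples: {locus: set of samples in which that locus has putative paralogs}.
    Loci exceeding the threshold are dropped; the rest are returned sorted.
    """
    return sorted(
        locus for locus, samples in paralog_samples.items() if len(samples) <= max_samples
    )
```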
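Among the tree-based methods, 1to1 is the simplest to state: in its usual definition, a group of homologous sequences is kept only if no sample contributes more than one sequence to it. The sketch below checks that criterion on sequence names, assuming a hypothetical "sample@sequence" naming scheme. It does not reproduce the tree pruning that PhyloPypruner or ParaGone perform for LS, MO, MI and RT.

```python
from collections import Counter


def is_one_to_one(seq_ids, sep="@"):
    """True if no sample appears more than once among the sequence IDs."""
    # The sample name is assumed to be the part before the first separator.
    counts = Counter(seq_id.split(sep, 1)[0] for seq_id in seq_ids)
    return all(n == 1 for n in counts.values())


def one_to_one_groups(groups, sep="@"):
    """Keep only groups ({name: [sequence IDs]}) that satisfy the 1to1 criterion."""
    return {name: ids for name, ids in groups.items() if is_one_to_one(ids, sep)}
```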
Stage 4: Species tree inference
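Concatenation-based tree inference (IQ-TREE, RAxML, RAxML-NG) starts from a supermatrix built from the final per-locus alignments. A minimal sketch of that step, assuming each alignment is already loaded as a {taxon: aligned sequence} dictionary; taxa missing from a locus are padded with gaps, and the partition lines follow the RAxML-style "DNA, name = start-end" format. This is an illustration, not HybSuite's own implementation.

```python
def concatenate(alignments, gap="-"):
    """Concatenate per-locus alignments into a supermatrix.

    alignments: {locus: {taxon: aligned_sequence}}; sequences within a locus must share one length.
    Returns (supermatrix, partition_lines).
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    partitions = []
    start = 1  # partition coordinates are 1-based and inclusive
    for locus in sorted(alignments):
        aln = alignments[locus]
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to a single length")
        length = lengths.pop()
        for t in taxa:
            # Taxa without a sequence for this locus get an all-gap block.
            parts[t].append(aln.get(t, gap * length))
        end = start + length - 1
        partitions.append(f"DNA, {locus} = {start}-{end}")
        start = end + 1
    return {t: "".join(p) for t, p in parts.items()}, partitions
```

Coalescent-based methods (ASTRAL-III, wASTRAL) take a set of gene trees instead, so they skip this step.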
✨ Features
🔄 Transparent: Full workflow visibility with real-time progress logging at each step
📝 Reproducible: Automatically archives exact software commands & parameters for every run
🧩 Modular: Execute individual stages or the complete pipeline with a single command
⚡ Flexible: 7 paralog handling methods & 5+ species tree inference options
🚀 Scalable: Built-in parallelization for large-scale phylogenomic datasets
🏆 Advantages
1. End-to-end pipeline from reads to trees
- Processes data from raw reads to phylogenetic trees with single-command workflows
- Supports both full pipeline execution and modular stage-specific operations
- Minimizes manual intervention while maintaining flexibility
2. Unique ability to integrate pre-assembled sequences
- Allows pre-assembled locus sequences to be integrated into the working dataset.
3. Customizable sequence filtering strategies
- Dual filtering strategies for both loci and samples
- Configurable thresholds for read depth, missing data, and sequence quality
- Enables dataset optimization for different study goals
4. Advanced paralog-handling methods
- Implements 7 distinct methods for paralog detection and processing
- Includes both similarity-based and topology-based approaches
- Improves orthology assessment accuracy
5. Multi-method phylogenetic tree inference
- Integrated software for concatenation-based methods: IQ-TREE, RAxML, and RAxML-NG
- Integrated software for coalescent-based methods: ASTRAL-III and wASTRAL
6. Integrated visualization tools
- plot_paralog_heatmap.py
- plot_recovery_heatmap.py
- modified_phypartspiecharts.py
7. High-Performance Computing
- Parallel processing across samples and loci (option -process), which can significantly improve computational efficiency.
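The dual sample/locus filtering listed under advantage 3 can be illustrated with a two-pass occupancy filter over a sample-by-locus presence table. Both thresholds and the order of the passes are assumptions for illustration, not HybSuite's documented behavior.

```python
def filter_samples_and_loci(presence, all_loci, min_sample_coverage=0.5, min_locus_occupancy=0.5):
    """Two-pass filter on a sample x locus presence table.

    presence: {sample: set of loci recovered for that sample}.
    Pass 1 keeps samples that recovered at least `min_sample_coverage` of all loci.
    Pass 2 keeps loci present in at least `min_locus_occupancy` of the retained samples.
    """
    loci_set = set(all_loci)
    samples = [
        s for s, loci in presence.items()
        if len(loci & loci_set) >= min_sample_coverage * len(loci_set)
    ]
    kept_loci = [
        locus for locus in all_loci
        if samples
        and sum(locus in presence[s] for s in samples) >= min_locus_occupancy * len(samples)
    ]
    return samples, kept_loci
```

Filtering samples first means that a few very poorly recovered samples cannot drag otherwise well-represented loci below the occupancy threshold.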
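Parallelism across samples can be pictured as a worker pool. The sketch below is only an illustration: processes plays a role similar to the -process option, but how HybSuite actually schedules its jobs is not shown here. Threads are used on the assumption that each job mostly waits on an external program (for example HybPiper started via subprocess), so Python's global interpreter lock is not the bottleneck.

```python
from concurrent.futures import ThreadPoolExecutor


def run_per_sample(samples, job, processes=4):
    """Run job(sample) for every sample with up to `processes` concurrent workers.

    Results are returned in the same order as `samples`.
    """
    with ThreadPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(job, samples))
```

For CPU-bound work done in Python itself, swapping in ProcessPoolExecutor (with a top-level job function) would be the usual alternative.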