Full parameters

This page provides the full options and parameters for each subcommand, along with additional explanations and links where necessary. The available subcommands can be viewed using the command:

hybsuite -h/--help

or:

bash <the path to HybSuite.sh> -h/--help

Parameters for running `hybsuite stage1`

Stage 1 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage1 ...

Mandatory arguments: -input_list -input_data (required when including user-provided data) -output_dir

Essential arguments: -sra_maxsize -NGS_dir -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types. (Default: None)
  -input_data <DIR>     The directory containing all input data (required when the inputs include your own data / pre-assembled data). (Default: None).

Arguments for outputs:
  -output_dir <DIR>     The output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -NGS_dir <DIR>        The output directory containing raw and cleaned reads files (Default: <output_dir>/NGS_dataset).
                        Note: Samples with pre-existing cleaned reads will skip the read trimming steps.

General arguments:
  === Threads control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_fasterq_dump <INT>               
                        fasterq-dump threads. (Default: 1)
  -nt_pigz <INT>        pigz compression threads. (Default: 1)
  -nt_trimmomatic <INT> Trimmomatic threads. (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of public data download and raw read trimming tasks to run concurrently. (Default: 1)
                        "all" means running all samples concurrently. (use this option with caution)
   
  === Public raw reads downloading control ===
  -rm_sra <TRUE/FALSE>  Whether to remove SRA files after conversion. (Default: TRUE)
  -download_format <fastq|fastq_gz>
                        Downloaded data format. (Default: fastq_gz)

  === Logfile control ===
  -log_mode <simple|cmd|full>
                        The output mode of hybsuite logfile. (Default: cmd)

Arguments for integrated tools:
  === SRAToolkit ===
  -sra_maxsize <NUM>    The maximum size of sra files to download. (Default: 20GB)

  === Trimmomatic ===
  -trimmomatic_leading_quality <3-40> 
                        Leading base quality cutoff. (Default: 3)
  -trimmomatic_trailing_quality <3-40> 
                        Trailing base quality cutoff. (Default: 3)
  -trimmomatic_min_length <36-100>     
                        Minimum read length. (Default: 36)
  -trimmomatic_sliding_window_s <4-10> 
                        Sliding window size. (Default: 4)
  -trimmomatic_sliding_window_q <15-30>
                        Window average quality. (Default: 15)

Command example:
  # Run HybSuite stage1 with 1 thread and 1 parallel process (the defaults)
  $ hybsuite stage1 -input_list ./input_list.txt -input_data ./Input_data -NGS_dir ./NGS_dir -output_dir ./
  
  # Run HybSuite stage1 with 5 threads and 5 parallel processing
  $ hybsuite stage1 -input_list ./input_list.txt -input_data ./Input_data -NGS_dir ./NGS_dir -output_dir ./ -nt 5 -process 5
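
When choosing values for `-nt` and `-process`, it can help to split a fixed core budget between them. The sketch below only prints a stage 1 command and does not run it. It assumes that each concurrently processed sample may use up to `-nt` threads, so that `-nt` times `-process` should stay within the number of available cores; the help text does not state this, and the helper names `threads_per_process` and `stage1_cmd` are hypothetical.

```shell
#!/bin/sh
# Sketch only: split a core budget between -process and -nt, then print
# (not run) the resulting stage 1 command.
# ASSUMPTION: each concurrent sample may use up to -nt threads, so
# nt * process should not exceed the core budget. Not confirmed by the manual.

threads_per_process() {
    # $1 = total cores available, $2 = samples processed concurrently
    nt=$(( $1 / $2 ))
    # Never go below one thread per process.
    if [ "$nt" -lt 1 ]; then nt=1; fi
    printf '%s\n' "$nt"
}

stage1_cmd() {
    # $1 = total cores, $2 = concurrent samples; paths mirror the example above.
    nt=$(threads_per_process "$1" "$2")
    printf '%s\n' "hybsuite stage1 -input_list ./input_list.txt -input_data ./Input_data -NGS_dir ./NGS_dir -output_dir ./ -nt $nt -process $2"
}

# 20 cores shared by 5 concurrent samples gives 4 threads each.
stage1_cmd 20 5
```

Printing the command first makes it easy to review the thread and process settings before launching a long download and trimming run.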

Parameters for running `hybsuite stage2`

Stage 2 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage2 ...

Mandatory arguments: -input_list -NGS_dir -t -output_dir

Essential arguments: -eas_dir -seqs_min_length -seqs_min_sample_coverage -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types used in stage 1. (Default: None)
  -input_data <DIR>     The directory containing all input data (in this stage, only required when the inputs include pre-assembled data). (Default: None).
  -NGS_dir <DIR>        The directory containing NGS raw and cleaned reads files (generated in stage 1). (Default: ./NGS_dir)
  -t <FILE>             Target file for data assembly. (follows the format required in HybPiper)

Arguments for outputs:
  -output_dir <DIR>     The output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -eas_dir <DIR>        The output directory containing HybPiper assembly sequences. (Default: <output_dir>/01-Assembled_data)
                        Note: Pre-existing data in this directory will skip redundant assembly steps.

General arguments:
  === Putative paralogs filtering control ===
  -seqs_min_length <INT>
                        Minimum sequence length for filtered paralogs. (Default: 0)
                        Putative paralogs shorter than this value will be filtered out.
  -seqs_mean_length_ratio <0-1>
                        Minimum sequence length ratio relative to the mean length per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this proportion of the mean length will be filtered out.
  -seqs_max_length_ratio <0-1>
                        Minimum sequence length ratio relative to the longest sequence per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this proportion of the maximum length will be filtered out.
  -seqs_min_sample_coverage <0-1>
                        Minimum sample coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, and HRS and RLWP sequences in stage 3, loci with sample coverage below this value will be filtered out.
  -seqs_min_locus_coverage <0-1>
                        Minimum locus coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, taxa (samples) with locus coverage below this value will be filtered out.
  
  === Heatmap control ===
  -heatmap_color {black,blue,red,green,purple,orange,yellow,brown,pink}
                        Color scheme for heatmap gradient. (Default: black)
  
  === Threads control ===
  -nt <INT|AUTO>        Global thread setting (Default: 1)
  -nt_hybpiper <INT>    HybPiper threads (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of data assembly tasks ('hybpiper assemble') to run concurrently. (Default: 1)
                        "all" means running all samples concurrently. (use this option with caution)
  
  === Logfile control ===
  -log_mode <simple|cmd|full>
                        The output mode of hybsuite logfile. (Default: cmd)

Arguments for integrated tools:
   === HybPiper ===
  -hybpiper_mapping_tool <blast|diamond>     
                        The tool used for mapping reads to targets in HybPiper (only for protein targets) (Default: blast)
  -hybpiper_check_chimeric_contigs <FALSE|TRUE>
                        Check whether a stitched contig is a potential chimera of contigs from multiple paralogs when running "hybpiper assemble". (Default: TRUE)
  -hybpiper_cov_cutoff <INT>
                        Specify the value of "-cov_cutoff" when running "hybpiper assemble" in Stage 2. (Default: 8)
                        Increasing this value may improve locus recovery efficiency but can introduce errors.

Command example:
  # Run HybSuite stage2 with paralog sequence filtering
  $ hybsuite stage2 -input_list ./input_list.txt -NGS_dir ./NGS_dir -t ./Angiosperms353.fasta -output_dir ./ -nt 5 -process 5 -seqs_min_length 100 -seqs_min_sample_coverage 0.1

  # Run HybSuite stage2 without paralog sequence filtering
  $ hybsuite stage2 -input_list ./input_list.txt -NGS_dir ./NGS_dir -t ./Angiosperms353.fasta -output_dir ./ -nt 5 -process 5
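
As a rough illustration of how a threshold such as `-seqs_min_sample_coverage 0.1` behaves, the sketch below decides whether a single locus would be kept. It assumes that the sample coverage of a locus is the number of samples with a sequence for that locus divided by the total number of samples. That definition and the helper name `locus_decision` are assumptions for illustration and are not taken from HybSuite's code.

```shell
#!/bin/sh
# Sketch only: decide whether a locus passes a sample coverage threshold.
# ASSUMPTION: sample coverage = samples with a sequence / total samples.
# Loci with coverage below the threshold are filtered, as the manual describes.

locus_decision() {
    # $1 = samples with a sequence for this locus
    # $2 = total number of samples
    # $3 = threshold between 0 and 1 (the -seqs_min_sample_coverage value)
    awk -v p="$1" -v t="$2" -v min="$3" 'BEGIN {
        if (p / t >= min) print "keep"; else print "filter"
    }'
}

locus_decision 10 40 0.1   # coverage 0.25, at or above 0.1
locus_decision 3 40 0.1    # coverage 0.075, below 0.1
```

With 40 samples and a threshold of 0.1, a locus would need sequences from at least 4 samples to be retained under this assumption.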

Parameters for running `hybsuite stage3`

Stage 3 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage3 ...

Mandatory arguments: -input_list -eas_dir -paralogs_dir -t -output_dir

Essential arguments: -PH -prefix -run_phyparts -aln_min_sample -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types used in stage 1&2. (Default: None)
  -input_data <DIR>     The directory containing all input data (in this stage, only required when the inputs include pre-assembled data). (Default: None).
  -eas_dir <DIR>        The directory containing HybPiper assembly sequences (generated in stage 2). (Default: <output_dir>/01-Assembled_data)
  -paralogs_dir <DIR>   The directory containing all paralog sequences generated in stage 2 or by users themselves. (Default: None)
                        It's advisable to set this parameter as '<output_dir>/02-All_paralogs/03-Filtered_paralogs'.
  -t <FILE>             Target file for data assembly. (follows the format required in HybPiper)

Arguments for outputs:
  -output_dir <DIR>     Output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -prefix <STRING>      Prefix for output files. (Default: HybSuite)

General arguments:
  === Paralog handling control ===
  -PH <1-7|a|b|all>     Paralog handling methods to execute: (one or more of them can be chosen)
                        1: HRS, 2: RLWP, 3: LS, 4: MI, 5: MO, 6: RT, 7: 1to1
                        a: PhyloPyPruner, b: ParaGone (Default: 1a)
  
  === Sequences and alignments filtering control ===
  -seqs_min_length <INT>
                        Minimum sequence bp length for filtering HRS and RLWP sequences. (Default: 0)
                        HRS and RLWP sequences shorter than this value will be removed.
  -aln_min_length <INT> 
                        Minimum sequence bp length for filtering HRS and RLWP final alignments. (Default: 4)
  -aln_min_sample <INT>
                        Minimum sample number for final alignments. (Default: 0)
                        Final alignments (aligned and trimmed) with sample number below this threshold will be removed.

  === Gene tree builder control ===
  -gene_tree <1/2>      Choose the software to construct paralog gene trees. (1: IQ-TREE; 2: FastTree) (Default: 1)
  -gene_tree_bb <INT>   Number of bootstrap replicates for paralog gene tree inference. (Default: 1000)

  === Alignments trimming tool control ===
  -trim_tool <1/2>      Choose the software to trim/clean alignments. (1: trimAl; 2: HMMCleaner) (Default: 1) 

  === Nucleotide ambiguity character replacement ===
  -replace_n <TRUE|FALSE>
                        Replace ambiguous characters ('n', 'N', '?') with gaps ('-') in alignment files. (Default: FALSE)
                        Note: Recommended for phylogenetic software compatibility (e.g., IQ-TREE, trimAl).

  === Threads control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_paragone <INT>    ParaGone threads. (Default: 1)
  -nt_phylopypruner <INT>              
                        PhyloPyPruner threads. (Default: 1)
  -nt_mafft <INT>       MAFFT threads. (Default: 1)
  -nt_amas <INT>        AMAS.py threads. (Default: 1)
  -nt_modeltest_ng <INT>               
                        ModelTest-NG threads. (Default: 1)
  -nt_iqtree <INT>      IQ-TREE threads. (Default: 1)
  -nt_fasttree <INT>    FastTree threads. (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of multiple sequence alignment, alignment trimming, and gene tree inference tasks to run concurrently. (Default: 1)
                        "all" means running all samples concurrently. (use this option with caution)

  === Heatmap control ===
  -heatmap_color {black,blue,red,green,purple,orange,yellow,brown,pink}
                        Color scheme for heatmap gradient. (Default: black)

Arguments for integrated tools:
  === PhyloPyPruner ===
  -pp_min_taxa <INT>    Minimum taxa per cluster. (Default: 4)
  -pp_min_support <0-1> Minimum support value. (Default: 0=auto)
  -pp_trim_lb <INT>     Trim long branches. (Default: 5)

  === ParaGone ===  
  -paragone_pool <INT>  Parallel alignment tasks. (Default: 1, same as the option '-process')
  -treeshrink_q_value <0-1>        
                        TreeShrink quantile threshold (Default: 0.05)
  -paragone_cutoff_value <FLOAT>       
                        Branch length cutoff (Default: 0.3)
  -paragone_minimum_taxa <INT>         
                        Minimum taxa per alignment (Default: 4)
  -paragone_min_tips <INT>             
                        Minimum tips per tree (Default: 4)
  
  === HybPiper ===
  -hybpiper_skip_chimeric_genes <FALSE|TRUE>
                        Whether to skip recovering sequences for putative chimeric genes when running "hybpiper retrieve_sequences" (HRS method) in Stage 3. (Default: FALSE)
  -hybpiper_retrieved_seqs_type <dna|intron|supercontig>
                        The type of sequence to extract when running "hybpiper retrieve_sequences" in Stage 3. (Default: dna, which means extracting coding sequences)

  === MAFFT ===  
  -mafft_algorithm <str>               
                        MAFFT algorithm [auto|linsi] (Default: auto)
  -mafft_adjustdirection <TRUE/FALSE>  
                        Whether to adjust sequence directions (Default: TRUE)
  -mafft_maxiterate <INT>              
                        Maximum number of iterations for MAFFT (Default: auto)
                        Specifies the maximum number of iterations MAFFT will perform during multiple sequence alignment. Higher iteration counts may improve alignment accuracy but will increase computation time.
  -mafft_pair <str>                    
                        Pairing strategy for MAFFT (Default: auto)
                        Specifies the pairing strategy used by MAFFT during multiple sequence alignment. Options include auto, localpair, globalpair, etc. Choosing the appropriate strategy can affect the alignment results and efficiency.
  
  === trimAl ===
  -trimal_mode <str>                   
                        trimAl mode [automated1|strict|strictplus|gappyout|nogaps|noallgaps] (Default: automated1)
  -trimal_gapthreshold <0-1>           
                        Gap threshold (Default: 0.12)
  -trimal_simthreshold <0-1>           
                        Similarity threshold (Default: auto)
  -trimal_cons <0-100>                 
                        Consensus threshold (Default: auto)
  -trimal_block <INT>                  
                        Minimum block size (Default: auto)
  -trimal_w <INT>                      
                        Window size (Default: auto)
  -trimal_gw <INT>                     
                        Gap window size (Default: auto)
  -trimal_sw <INT>                     
                        Similarity window size (Default: auto)
  -trimal_resoverlap <0-1>             
                        Minimum overlap of a position with other positions in the column. (Default: auto)
  -trimal_seqoverlap <0-100>           
                        Minimum percentage of sequences without gaps in a column. (Default: auto)
  
  === HMMCleaner ===
  -hmmcleaner_cost <NUM1_NUM2_NUM3_NUM4>
                        Cost parameters that define the low similarity segments detected by HmmCleaner. (Default: -0.15_-0.08_0.15_0.45)
                        Users can change each value, but the values must be in increasing order. (NUM1 < NUM2 < 0 < NUM3 < NUM4)

Command example:
  # Run HybSuite stage3 without alignments filtering
  $ hybsuite stage3 -input_list ./input_list.txt -eas_dir ./01-Assembled_data -paralogs_dir ./02-All_paralogs/03-Filtered_paralogs -t ./Angiosperms353.fasta -PH 1234567a -output_dir ./ -nt 5 -process 5
  
  # Run HybSuite stage3 with alignments filtering
  $ hybsuite stage3 -input_list ./input_list.txt -eas_dir ./01-Assembled_data -paralogs_dir ./02-All_paralogs/03-Filtered_paralogs -t ./Angiosperms353.fasta -PH 124567b -output_dir ./ -nt 5 -process 5 -aln_min_length 100 -aln_min_sample 0.1
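
The ordering rule given above for `-hmmcleaner_cost` (NUM1 < NUM2 < 0 < NUM3 < NUM4, joined by underscores) can be checked before launching a run. The following is a minimal sketch of such a check; the helper name `check_hmmcleaner_cost` is hypothetical, and HybSuite may perform its own validation differently.

```shell
#!/bin/sh
# Sketch only: validate a -hmmcleaner_cost string of the form NUM1_NUM2_NUM3_NUM4
# against the documented rule NUM1 < NUM2 < 0 < NUM3 < NUM4.
# The function name is hypothetical and not part of HybSuite.

check_hmmcleaner_cost() {
    # $1 = cost string, for example -0.15_-0.08_0.15_0.45
    awk -v c="$1" 'BEGIN {
        n = split(c, v, "_")
        # Adding 0 forces numeric comparison of each field.
        ok = (n == 4) && (v[1] + 0 < v[2] + 0) && (v[2] + 0 < 0) && (0 < v[3] + 0) && (v[3] + 0 < v[4] + 0)
        print (ok ? "valid" : "invalid")
    }'
}

check_hmmcleaner_cost "-0.15_-0.08_0.15_0.45"   # the documented default
check_hmmcleaner_cost "-0.08_-0.15_0.15_0.45"   # first two values out of order
```

The first call prints `valid` and the second prints `invalid`, since the second string breaks the NUM1 < NUM2 requirement.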

Parameters for running `hybsuite stage4`

Stage 4 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage4 ...

Mandatory arguments: -input_list -aln_dir -output_dir

Essential arguments: -PH -sp_tree -prefix -run_phyparts -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types used in stage 1&2. (Default: None)
  -aln_dir <DIR>        The directory containing the orthogroup alignments generated in stage 3. (Default: <output_dir>/06-Final_alignments)
                        It's advisable to set this parameter as '<output_dir>/06-Final_alignments'.
  -PH <1-7|a|b|all>     Choose alignments generated via paralog handling methods as input:
                        1: HRS, 2: RLWP, 3: LS, 4: MI, 5: MO, 6: RT, 7: 1to1 (one or more of them can be chosen)
                        a: PhyloPyPruner, b: ParaGone (Default: 1a)

Arguments for outputs:
  -output_dir <DIR>     Output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -prefix <STRING>      Prefix for output files. (Default: HybSuite)

General arguments:
  === Species tree builder control ===
  -sp_tree <1-6|all>    Species tree inference method:
                        1: IQ-TREE, 2: RAxML, 3: RAxML-NG, 4: ASTRAL-IV, 5: wASTRAL, 6: ASTRAL-Pro
  
  === Steps control ===
  -run_coalescent_step <INT> 
                        Control which coalescent analysis steps to run:
                        1: Construct single gene trees, 2: Combine and collapse gene trees, 3: Infer species tree, 4: Reroot gene trees, 5: PhyParts concordance analysis
                        (Default: 1234)
  -run_concatenated_step <INT> 
                        Control which concatenated analysis steps to run:
                        1: Construct concatenated alignment, 2: Infer species tree
                        (Default: 12)
  
  === Gene tree builder control ===
  -gene_tree <1/2>      Choose the software to construct paralog gene trees. (1: IQ-TREE; 2: FastTree) (Default: 1)
  -gene_tree_bb <INT>   Number of bootstrap replicates for paralog gene tree inference. (Default: 1000)
  
  === Gene trees collapse threshold ===
  -collapse_threshold <VALUE>
                        Specify the minimum support value threshold for internal nodes in gene trees. (Default: 0)
                        Nodes with support values ≤ this threshold will be collapsed into polytomies.

  === Nucleotide ambiguity character replacement ===
  -replace_n <TRUE|FALSE>
                        Replace ambiguous characters ('n', 'N', '?') with gaps ('-') in alignment files. (Default: FALSE)
                        Note: Recommended for phylogenetic software compatibility (e.g., IQ-TREE, trimAl).

  === Threads control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_amas <INT>        AMAS.py threads (Default: 1)
  -nt_modeltest_ng <INT>               
                        ModelTest-NG threads (Default: 1)
  -nt_iqtree <INT>      IQ-TREE threads (Default: 1)
  -nt_fasttree <INT>    FastTree threads (Default: 1)
  -nt_raxml_ng <INT>    RAxML-NG threads (Default: 1)
  -nt_raxml <INT>       RAxML threads (Default: 1)
  -nt_astral4 <INT>     ASTRAL-IV threads (Default: 1)
  -nt_wastral <INT>     wASTRAL threads (Default: 1)
  -nt_astral_pro <INT>  ASTRAL-Pro3 threads (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of gene tree inference tasks in the coalescent analysis to run concurrently. (Default: 1)
                        "all" means running all samples concurrently. (use this option with caution)

Arguments for integrated tools:
  === IQ-TREE (concatenated analysis) ===
  -iqtree_bb <INT>      IQ-TREE bootstrap replicates (Default: 1000)
  -iqtree_alrt <INT>    SH-aLRT replicates (Default: 1000)
  -iqtree_run_option <str>      
                        IQ-TREE run mode [standard|undo] (Default: undo)
  -iqtree_partition <TRUE/FALSE>       
                        Whether to use partition models in IQ-TREE (Default: TRUE)
  -iqtree_constraint_tree <Treefile>           
                        The path to the constraint tree for running IQ-TREE (Default: none)

  === ModelTest-NG ===
  -run_modeltest_ng <TRUE/FALSE>       
                        Whether to run ModelTest-NG (Default: TRUE)

  === RAxML ===
  -raxml_m <str>        RAxML model [GTRGAMMA|PROTGAMMA] (Default: GTRGAMMA)
  -raxml_bb <INT>       RAxML bootstrap replicates (Default: 1000)
  -raxml_constraint_tree <Treefile>              
                        The path to the constraint tree for running RAxML (Default: no constraint tree)

  === RAxML-NG ===
  -rng_bs_trees <INT>   RAxML-NG bootstrap replicates (Default: 1000)
  -rng_force <TRUE/FALSE>              
                        Ignore thread warnings (Default: FALSE)
  -rng_constraint_tree <Treefile>                
                        The path to the constraint tree for running RAxML-NG (Default: no constraint tree)

  === ASTRAL-IV ===
  -astral4_root         Outermost (most distant) outgroup taxon name for ASTRAL-IV branch length calculation. (Default: none)
                        (Strongly recommended for accurate branch length estimation. Specify only the single outermost outgroup.)  
  -astral_r <INT>       ASTRAL-IV rounds of search. (Default: 4)
  -astral_s <INT>       ASTRAL-IV rounds of subsampling. (Default: 4)

  === wASTRAL ===
  -wastral_mode <1-4>   wASTRAL mode [1|2|3|4] (Default: 1)
                        1: hybrid weighting, 2: support only, 3: length only, 4: unweighted
  -wastral_r <INT>      wASTRAL rounds of search. (Default: 4)
  -wastral_s <INT>      wASTRAL rounds of subsampling. (Default: 4)

  === ASTRAL-Pro ===
  -astral_pro_r <INT>   ASTRAL-Pro rounds of search. (Default: 4)
  -astral_pro_s <INT>   ASTRAL-Pro rounds of subsampling. (Default: 4)

  === MAFFT (only for paralogs inclusion method -> ASTRAL-Pro) ===  
  -mafft_algorithm <str>               
                        MAFFT algorithm [auto|linsi] (Default: auto)
  -mafft_adjustdirection <TRUE/FALSE>  
                        Whether to adjust sequence directions (Default: TRUE)
  -mafft_maxiterate <INT>              
                        Maximum number of iterations for MAFFT (Default: auto)
                        Specifies the maximum number of iterations MAFFT will perform during multiple sequence alignment. Higher iteration counts may improve alignment accuracy but will increase computation time.
  -mafft_pair <str>                    
                        Pairing strategy for MAFFT (Default: auto)
                        Specifies the pairing strategy used by MAFFT during multiple sequence alignment. Options include auto, localpair, globalpair, etc. Choosing the appropriate strategy can affect the alignment results and efficiency.
  
  === trimAl (only for paralogs inclusion method -> ASTRAL-Pro) ===
  -trimal_mode <str>                   
                        trimAl mode [automated1|strict|strictplus|gappyout|nogaps|noallgaps] (Default: automated1)
  -trimal_gapthreshold <0-1>           
                        Gap threshold (Default: 0.12)
  -trimal_simthreshold <0-1>           
                        Similarity threshold (Default: auto)
  -trimal_cons <0-100>                 
                        Consensus threshold (Default: auto)
  -trimal_block <INT>                  
                        Minimum block size (Default: auto)
  -trimal_w <INT>                      
                        Window size (Default: auto)
  -trimal_gw <INT>                     
                        Gap window size (Default: auto)
  -trimal_sw <INT>                     
                        Similarity window size (Default: auto)
  -trimal_resoverlap <0-1>             
                        Minimum overlap of a position with other positions in the column. (Default: auto)
  -trimal_seqoverlap <0-100>           
                        Minimum percentage of sequences without gaps in a column. (Default: auto)
  
  === HMMCleaner (only for paralogs inclusion method -> ASTRAL-Pro) ===
  -hmmcleaner_cost <NUM1_NUM2_NUM3_NUM4>
                        Cost parameters that define the low similarity segments detected by HmmCleaner. (Default: -0.15_-0.08_0.15_0.45)
                        Users can change each value, but the values must be in increasing order. (NUM1 < NUM2 < 0 < NUM3 < NUM4)

  === PhyPartsPieCharts & modified_phypartspiecharts ===
  -run_phyparts <TRUE|FALSE>
                        Enable/disable PhyParts concordance analysis and modified pie chart visualization. (Default: TRUE)
                        Note: Requires successful completion of previous coalescent analysis.
  -phypartspiecharts_tree_type <cladogram/circle>
                        The tree layout to display when running modified_phypartspiecharts.py (Default: cladogram)
  -phypartspiecharts_num_mode <num>
                        Control what numbers to show on branches (specify 0-2 digits) (Default: 12)
                        0: Hide all numbers
                        1: Number of genes supporting species tree (blue)
                        2: Number of genes conflicting with species tree (red+green)
                        3: Number of genes with no signal (gray)
                        4: Proportion of supporting genes (blue/total)
                        5: Proportion of conflicting genes ((red+green)/total)
                        6: Proportion of no signal genes (gray/total)
                        7: Ratio of supporting to all signal genes (blue/(blue+red+green))
                        8: Ratio of conflicting to all signal genes ((red+green)/(blue+red+green))
                        9: Original node support values from the input tree

Command example:
  # Run HybSuite stage4 with IQ-TREE
  $ hybsuite stage4 -input_list ./input_list.txt -aln_dir ./06-Final_alignments -t ./Angiosperms353 -PH 1234567a -output_dir ./ -nt 5 -process 5 -sp_tree 1
  
  # Run HybSuite stage4 with ASTRAL-IV
  $ hybsuite stage4 -input_list ./input_list.txt -aln_dir ./06-Final_alignments -t ./Angiosperms353 -PH 1234567a -output_dir ./ -nt 5 -process 5 -sp_tree 4

  # Run HybSuite stage4 with ASTRAL-IV and PhyParts
  $ hybsuite stage4 -input_list ./input_list.txt -aln_dir ./06-Final_alignments -t ./Angiosperms353 -PH 1234567a -output_dir ./ -nt 5 -process 5 -sp_tree 4 -run_phyparts TRUE
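
The number codes 4 to 8 listed under `-phypartspiecharts_num_mode` are simple ratios of gene counts. The sketch below works through the arithmetic for one node. It assumes that "total" means the sum of supporting (blue), conflicting (red+green), and no-signal (gray) genes; that reading and the helper name `phyparts_numbers` are assumptions for illustration, and the counts are made-up inputs rather than real results.

```shell
#!/bin/sh
# Sketch only: compute the proportions behind num_mode codes 4 to 8 for one node.
# ASSUMPTION: total = blue + (red+green) + gray. Not confirmed by the manual.
# No guard is included for zero counts, which would cause a division by zero.

phyparts_numbers() {
    # $1 = genes supporting the species tree (blue)
    # $2 = genes conflicting with the species tree (red+green)
    # $3 = genes with no signal (gray)
    awk -v b="$1" -v c="$2" -v g="$3" 'BEGIN {
        total = b + c + g      # all genes at this node
        signal = b + c         # genes that either support or conflict
        printf "4=%.2f 5=%.2f 6=%.2f 7=%.2f 8=%.2f\n", b / total, c / total, g / total, b / signal, c / signal
    }'
}

# Hypothetical node: 60 supporting, 30 conflicting, 10 uninformative genes.
phyparts_numbers 60 30 10
```

For these illustrative counts the output is `4=0.60 5=0.30 6=0.10 7=0.67 8=0.33`, which shows how codes 7 and 8 differ from codes 4 and 5 by excluding the no-signal genes from the denominator.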

Parameters for running `hybsuite full_pipeline`

HybSuite full pipeline Manual
--------------------------------------------------------------------------------
Usage: hybsuite full_pipeline ...

Mandatory arguments: -input_list -input_data (required when including user-provided data) -t -output_dir

Essential arguments: -PH -sp_tree -seqs_min_length -aln_min_sample -prefix -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types. (Default: None)
  -input_data <DIR>     The directory containing all input data (required when the inputs include your own data / pre-assembled data). (Default: None).
  -t <FILE>             Target file for data assembly. (follows the format required in HybPiper)

Arguments for outputs:
  -output_dir <DIR>     The output directory for all pipeline results. (Default: None)
  -NGS_dir <DIR>        The output directory containing raw and cleaned reads files (see GitHub documentation).
                        Notes: Pre-existing cleaned reads will skip reads trimming steps.
  -eas_dir <DIR>        The output directory containing HybPiper assembly sequences. (Default: <output_dir>/01-Assembled_data)
                        Note: Pre-existing data in this directory will skip redundant assembly steps.
  -prefix <STRING>      Prefix for output files. (Default: HybSuite)

General arguments:
  === Stages running control ===
  -skip_stage <1|2|3|12|123>
                        Specify pipeline stages to skip during execution. (Default: None, running all stages)
                        Note: Particularly useful for re-running specific HybSuite pipeline stages.
                        (e.g., '-skip_stage 1' for skipping stage 1)
  -run_to_stage <1|2|3> Specify the last pipeline stage to run. (Default: None, running all stages)
                        (e.g., '-run_to_stage 3' to stop after stage 3, before stage 4)

  === Public raw reads downloading control (Stage 1) ===
  -rm_sra <TRUE/FALSE>  Whether to remove SRA files after conversion. (Default: TRUE)
  -download_format <fastq|fastq_gz>
                        Downloaded data format. (Default: fastq_gz)

  === Putative paralogs filtering control (Stage 2) ===
  -seqs_min_length <INT>
                        Minimum sequence length for filtered paralogs. (Default: 0)
                        Putative paralogs shorter than this value will be filtered out.
  -seqs_mean_length_ratio <0-1>
                        Minimum sequence length ratio relative to the mean length per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this proportion of the mean length will be filtered out.
  -seqs_max_length_ratio <0-1>
                        Minimum sequence length ratio relative to the longest sequence per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this proportion of the maximum length will be filtered out.
  -seqs_min_sample_coverage <0-1>
                        Minimum sample coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, and HRS and RLWP sequences in stage 3, loci with sample coverage below this value will be filtered out.
  -seqs_min_locus_coverage <0-1>
                        Minimum locus coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, taxa (samples) with locus coverage below this value will be filtered out.

  === Heatmap control (Stage 2&3) ===
  -heatmap_color {black,blue,red,green,purple,orange,yellow,brown,pink}
                        Color scheme for heatmap gradient. (Default: black)

  === Paralog handling control (Stage 3) ===
  -PH <1-7|a|b|all>     Paralog handling methods to execute: (one or more of them can be chosen)
                        1: HRS, 2: RLWP, 3: LS, 4: MI, 5: MO, 6: RT, 7: 1to1
                        a: PhyloPyPruner, b: ParaGone (Default: 1a)
  
  === Sequences and alignments filtering control (Stage 3) ===
  -seqs_min_length <INT>
                        Minimum sequence bp length for filtering HRS and RLWP sequences. (Default: 0)
                        HRS and RLWP sequences shorter than this value will be removed.
  -aln_min_length <INT> 
                        Minimum sequence bp length for filtering HRS and RLWP final alignments. (Default: 4)
  -aln_min_sample <INT>
                        Minimum sample number for final alignments. (Default: 5)
                        Final alignments (aligned and trimmed) with sample number below this threshold will be removed.

  === Alignments trimming tool control (Stage 3) ===
  -trim_tool <1/2>      Choose the software to trim/clean alignments. (1: trimAl; 2: HMMCleaner) (Default: 1)
  
  === Gene trees builder control (Stage 3&4) ===
  -gene_tree <1/2>      Choose the software to construct paralog gene trees. (1: IQ-TREE; 2: FastTree) (Default: 1)
  -gene_tree_bb <INT>   Number of bootstrap replicates for paralog gene tree inference. (Default: 1000)

  === Species tree builder control (Stage 4) ===
  -sp_tree <1-6|all>    Species tree inference method: (Default: 1)
                        1: IQ-TREE, 2: RAxML, 3: RAxML-NG, 4: ASTRAL-IV, 5: wASTRAL, 6: ASTRAL-Pro
  
  === Steps control in stage 4 ===
  -run_coalescent_step <INT>
                        Control which coalescent analysis steps to run:
                        1: Construct single gene trees, 2: Combine and collapse gene trees, 3: Infer species tree, 4: Reroot gene trees, 5: PhyParts concordance analysis
                        (Default: 1234)
  -run_concatenated_step <INT> 
                        Control which concatenated analysis steps to run:
                        1: Construct concatenated alignment, 2: Infer species tree
                        (Default: 12)
  
  === Nucleotide ambiguity character replacement (Stage 3&4) ===
  -replace_n <TRUE|FALSE>
                        Replace ambiguous characters ('n', 'N', '?') with gaps ('-') in alignment files. (Default: FALSE)
                        Note: Recommended for phylogenetic software compatibility (e.g., IQ-TREE, trimAl).

  === Gene trees collapse threshold ===
  -collapse_threshold <VALUE>
                        Specify the minimum support value threshold for internal nodes in gene trees. (Default: 0)
                        Nodes with support values ≤ this threshold will be collapsed into polytomies.
  
  === Threads Control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_fasterq_dump <INT>               
                        fasterq-dump threads. (Default: 1)
  -nt_pigz <INT>        pigz compression threads. (Default: 1)
  -nt_trimmomatic <INT> Trimmomatic threads. (Default: 1)
  -nt_hybpiper <INT>    HybPiper threads (Default: 1)
  -nt_paragone <INT>    ParaGone threads. (Default: 1)
  -nt_phylopypruner <INT>              
                        PhyloPyPruner threads. (Default: 1)
  -nt_mafft <INT>       MAFFT threads. (Default: 1)
  -nt_amas <INT>        AMAS.py threads. (Default: 1)
  -nt_modeltest_ng <INT>               
                        ModelTest-NG threads. (Default: 1)
  -nt_iqtree <INT>      IQ-TREE threads. (Default: 1)
  -nt_fasttree <INT>    FastTree threads. (Default: 1)
  -nt_raxml_ng <INT>    RAxML-NG threads (Default: 1)
  -nt_raxml <INT>       RAxML threads (Default: 1)
  -nt_astral4 <INT>     ASTRAL-IV threads (Default: 1)
  -nt_wastral <INT>     wASTRAL threads (Default: 1)
  -nt_astral_pro <INT>  ASTRAL-Pro3 threads (Default: 1)

  === Parallel Control ===
  -process <INT|all>    Number of subprocesses to run concurrently. (Default: 1)
                        "all" runs all subprocesses concurrently. (Use this option with caution.)
                        The related steps are: 
                        Stage 1: public data downloading and raw reads trimming;
                        Stage 2: data assembly ('hybpiper assemble');
                        Stage 3: multiple sequences aligning, alignments trimming, and gene trees inference;
                        Stage 4: gene trees inference in coalescent analysis.

  === Logfile Control ===
  -log_mode <simple|cmd|full>
                        The output mode of hybsuite logfile. (Default: cmd)

Arguments for integrated tools:
  === SRAToolkit (Stage 1) ===
  -sra_maxsize <NUM>    The maximum size of sra files to download. (Default: 20GB)

  === Trimmomatic (Stage 1) ===
  -trimmomatic_leading_quality <3-40>
                        Leading base quality cutoff. (Default: 3)
  -trimmomatic_trailing_quality <3-40> 
                        Trailing base quality cutoff. (Default: 3)
  -trimmomatic_min_length <36-100>
                        Minimum read length. (Default: 36)
  -trimmomatic_sliding_window_s <4-10> 
                        Sliding window size. (Default: 4)
  -trimmomatic_sliding_window_q <15-30>
                        Window average quality. (Default: 15)

  === HybPiper (Stage 2 & 3) ===
  -hybpiper_mapping_tool <blast|diamond>     
                        The tool used for mapping reads to targets in HybPiper (only for protein targets) (Default: blast)
  -hybpiper_check_chimeric_contigs <FALSE|TRUE>
                        Check whether a stitched contig is a potential chimera of contigs from multiple paralogs when running "hybpiper assemble". (Default: FALSE)
  -hybpiper_cov_cutoff <INT>
                        Specify the value of "-cov_cutoff" when running "hybpiper assemble" in Stage 2. (Default: 8)
                        Increasing this value may increase the loci recovery efficiency but may also introduce errors.
  -hybpiper_skip_chimeric_genes <FALSE|TRUE>
                        Whether to skip recovering sequences for putative chimeric genes when running "hybpiper retrieve_sequences" (HRS method) in Stage 3. (Default: FALSE)
  -hybpiper_retrieved_seqs_type <dna|intron|supercontig>
                        The type of sequence to extract when running "hybpiper retrieve_sequences" in Stage 3.
  
  === PhyloPyPruner (Stage 3) ===
  -pp_min_taxa <INT>    Minimum taxa per cluster. (Default: 4)
  -pp_min_support <0-1> Minimum support value. (Default: 0=auto)
  -pp_trim_lb <INT>     Trim long branches. (Default: 5)

  === ParaGone (Stage 3) ===  
  -paragone_pool <INT>  Parallel alignment tasks. (Default: 1, same as the option '-process')
  -paragone_cutoff_value <FLOAT>       
                        Branch length cutoff (Default: 0.3)
  -paragone_minimum_taxa <INT>         
                        Minimum taxa per alignment (Default: 4)
  -paragone_min_tips <INT>             
                        Minimum tips per tree (Default: 4)
  
  === TreeShrink (Stage 3) ===
  -treeshrink_q_value <0-1>        
                        TreeShrink quantile threshold (Default: 0.05)

  === MAFFT (Stage 3) ===  
  -mafft_algorithm <str>               
                        MAFFT algorithm [auto|linsi] (Default: auto)
  -mafft_adjustdirection <TRUE/FALSE>  
                        Whether to adjust sequence directions (Default: TRUE)
  -mafft_maxiterate <INT>              
                        Maximum number of iterations for MAFFT (Default: auto)
                        Specifies the maximum number of iterations MAFFT will perform during multiple sequence alignment. Higher iteration counts may improve alignment accuracy but will increase computation time.
  -mafft_pair <str>                    
                        Pairing strategy for MAFFT (Default: auto)
                        Specifies the pairing strategy used by MAFFT during multiple sequence alignment. Options include auto, localpair, globalpair, etc. Choosing the appropriate strategy can affect the alignment results and efficiency.
  
  === trimAl (Stage 3) ===
  -trimal_mode <str>                   
                        trimAl mode [automated1|strict|strictplus|gappyout|nogaps|noallgaps] (Default: automated1)
  -trimal_gapthreshold <0-1>           
                        Gap threshold (Default: 0.12)
  -trimal_simthreshold <0-1>           
                        Similarity threshold (Default: auto)
  -trimal_cons <0-100>                 
                        Consensus threshold (Default: auto)
  -trimal_block <INT>                  
                        Minimum block size (Default: auto)
  -trimal_w <INT>                      
                        Window size (Default: auto)
  -trimal_gw <INT>                     
                        Gap window size (Default: auto)
  -trimal_sw <INT>                     
                        Similarity window size (Default: auto)
  -trimal_resoverlap <0-1>             
                        Minimum overlap of a position with other positions in the column. (Default: auto)
  -trimal_seqoverlap <0-100>           
                        Minimum percentage of sequences without gaps in a column. (Default: auto)
  
  === HMMCleaner (Stage 3) ===
  -hmmcleaner_cost <NUM1_NUM2_NUM3_NUM4>
                        Cost parameters that define the low-similarity segments detected by HmmCleaner. (Default: -0.15_-0.08_0.15_0.45)
                        Users can change each value but they have to be in increasing order. (NUM1 < NUM2 < 0 < NUM3 < NUM4)
  
  === IQ-TREE (Stage 4) ===
  -iqtree_bb <INT>      IQ-TREE bootstrap replicates (Default: 1000)
  -iqtree_alrt <INT>    SH-aLRT replicates (Default: 1000)
  -iqtree_run_option <str>      
                        IQ-TREE run mode [standard|undo] (Default: undo)
  -iqtree_partition <TRUE/FALSE>       
                        Whether to use partition models in IQ-TREE (Default: TRUE)
  -iqtree_constraint_tree <Treefile>           
                        The path to the constraint tree for running IQ-TREE (Default: none)

  === ModelTest-NG (Stage 4) ===
  -run_modeltest_ng <TRUE/FALSE>       
                        Whether to run ModelTest-NG (Default: TRUE)

  === RAxML (Stage 4) ===
  -raxml_m <str>        RAxML model [GTRGAMMA|PROTGAMMA] (Default: GTRGAMMA)
  -raxml_bb <INT>       RAxML bootstrap replicates (Default: 1000)
  -raxml_constraint_tree <Treefile>              
                        The path to the constraint tree for running RAxML (Default: no constraint tree)

  === RAxML-NG (Stage 4) ===
  -rng_bs_trees <INT>   RAxML-NG bootstrap replicates (Default: 1000)
  -rng_force <TRUE/FALSE>              
                        Ignore thread warnings (Default: FALSE)
  -rng_constraint_tree <Treefile>                
                        The path to the constraint tree for running RAxML-NG (Default: no constraint tree)

  === ASTRAL-IV (Stage 4) ===
  -astral4_root         Outermost (most distant) outgroup taxon name for ASTRAL-IV branch length calculation. (Default: none)
                        (Strongly recommended for accurate branch length estimation. Specify only the single outermost outgroup.)
  -astral_r <INT>       ASTRAL-IV rounds of search. (Default: 4)
  -astral_s <INT>       ASTRAL-IV rounds of subsampling. (Default: 4)

  === wASTRAL (Stage 4) ===
  -wastral_mode <1-4>   wASTRAL mode [1|2|3|4] (Default: 1)
                        1: hybrid weighting, 2: support only, 3: length only, 4: unweighted
  -wastral_r <INT>      wASTRAL rounds of search. (Default: 4)
  -wastral_s <INT>      wASTRAL rounds of subsampling. (Default: 4)

  === ASTRAL-Pro (Stage 4) ===
  -astral_pro_r <INT>   ASTRAL-Pro rounds of search. (Default: 4)
  -astral_pro_s <INT>   ASTRAL-Pro rounds of subsampling. (Default: 4)

  === PhyPartsPieCharts & modified_phypartspiecharts (Stage 4) ===
  -run_phyparts <TRUE|FALSE>
                        Enable/disable PhyParts concordance analysis and modified pie chart visualization. (Default: TRUE)
                        Note: Requires successful completion of previous coalescent analysis.
  -phypartspiecharts_tree_type <cladogram/circle>
                        The tree layout to display when running modified_phypartspiecharts.py (Default: cladogram)
  -phypartspiecharts_num_mode <num>
                        Control what numbers to show on branches (specify 0-2 digits) (Default: 12)
                        0: Hide all numbers
                        1: Number of genes supporting species tree (blue)
                        2: Number of genes conflicting with species tree (red+green)
                        3: Number of genes with no signal (gray)
                        4: Proportion of supporting genes (blue/total)
                        5: Proportion of conflicting genes ((red+green)/total)
                        6: Proportion of no signal genes (gray/total)
                        7: Ratio of supporting to all signal genes (blue/(blue+red+green))
                        8: Ratio of conflicting to all signal genes ((red+green)/(blue+red+green))
                        9: Original node support values from the input tree

Command examples:
  === Run the full pipeline with all paralog-handling methods and all species tree inference approaches ===
  hybsuite full_pipeline \
  -input_list ./Input_list.txt \
  -input_data ./Input_data \
  -t Angiosperms353.fasta \
  -PH 1234567 \
  -sp_tree 12345 \
  -output_dir ./ \
  -nt 5 -process 5
  
  === Run the full pipeline with only tree-based orthology inference methods (MO/MI/RT/1to1) in ParaGone and ASTRAL-IV ===
  hybsuite full_pipeline \
  -input_list ./Input_list.txt \
  -input_data ./Input_data \
  -t Angiosperms353.fasta \
  -PH 4567b \
  -sp_tree 4 \
  -output_dir ./ \
  -nt 5 -process 5
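
The ordering rule stated for -hmmcleaner_cost (NUM1 < NUM2 < 0 < NUM3 < NUM4) is easy to get wrong when editing the four values by hand, so it may be worth checking the string before starting a long run. The sketch below is a minimal pre-flight check, not part of HybSuite: the function name check_hmmcleaner_cost is hypothetical, and the script assumes that -hmmcleaner_cost only takes effect together with -trim_tool 2 (HMMCleaner). It only prints the assembled command and does not execute hybsuite.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; it is NOT shipped with HybSuite.

# Return 0 if the argument has the form NUM1_NUM2_NUM3_NUM4
# with NUM1 < NUM2 < 0 < NUM3 < NUM4, otherwise return 1.
check_hmmcleaner_cost() {
  awk -v cost="$1" 'BEGIN {
    n = split(cost, c, "_")
    if (n != 4) exit 1
    # Every field must be a plain decimal number.
    for (i = 1; i <= 4; i++)
      if (c[i] !~ /^-?[0-9]*\.?[0-9]+$/) exit 1
    # "+ 0" forces numeric (not string) comparison.
    ok = (c[1] + 0 < c[2] + 0) && (c[2] + 0 < 0) && (0 < c[3] + 0) && (c[3] + 0 < c[4] + 0)
    exit ok ? 0 : 1
  }'
}

cost="-0.15_-0.08_0.15_0.45"   # the documented default

# Build the command as an array so it can be printed (dry run) or executed.
# Input paths and the target file are copied from the examples above.
cmd=(hybsuite full_pipeline
     -input_list ./Input_list.txt
     -input_data ./Input_data
     -t Angiosperms353.fasta
     -trim_tool 2
     -hmmcleaner_cost "$cost"
     -output_dir ./
     -nt 5 -process 5)

if check_hmmcleaner_cost "$cost"; then
  # Dry run: print only. Replace this line with "${cmd[@]}" to execute.
  printf '%s\n' "${cmd[*]}"
else
  echo "invalid -hmmcleaner_cost: $cost" >&2
fi
```

Keeping the command in an array preserves each argument as a separate word, so the same variable can be logged first and executed afterwards without re-quoting.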

Last modified March 5, 2026: Update plotly.html (84cc3e0)
