Full parameters

This page provides the full options and parameters for each subcommand, along with additional explanations and links where necessary. The available subcommands can be viewed using the command:

hybsuite -h/--help

or:

bash <the path to HybSuite.sh> -h/--help

Parameters for running `hybsuite stage1`

Stage 1 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage1 ...

Mandatory arguments: -input_list -input_data (required when including user-provided data) -output_dir

Essential arguments: -sra_maxsize -NGS_dir -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types. (Default: None)
  -input_data <DIR>     The directory containing all input data (required when the inputs include your own data / pre-assembled data). (Default: None).

Arguments for outputs:
  -output_dir <DIR>     The output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -NGS_dir <DIR>        The output directory containing raw and cleaned reads files (Default: <output_dir>/NGS_dataset).
                        Note: If cleaned reads already exist in this directory, the read-trimming steps are skipped.

General arguments:
  === Threads control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_fasterq_dump <INT>               
                        fasterq-dump threads. (Default: 1)
  -nt_pigz <INT>        pigz compression threads. (Default: 1)
  -nt_trimmomatic <INT> Trimmomatic threads. (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of public data download and raw read trimming tasks to run concurrently. (Default: 1)
                        "all" means running all samples concurrently. (Use this setting with caution.)
   
  === Public raw reads downloading control ===
  -rm_sra <TRUE/FALSE>  Whether to remove SRA files after conversion. (Default: TRUE)
  -download_format <fastq|fastq_gz>
                        Downloaded data format. (Default: fastq_gz)

  === Logfile Control ===
  -log_mode <simple|cmd|full>
                        The output mode of hybsuite logfile. (Default: cmd)

Arguments for integrated tools:
  === SRAToolkit ===
  -sra_maxsize <NUM>    The maximum size of sra files to download. (Default: 20GB)

  === Trimmomatic ===
  -trimmomatic_leading_quality <3-40> 
                        Leading base quality cutoff. (Default: 3)
  -trimmomatic_trailing_quality <3-40> 
                        Trailing base quality cutoff. (Default: 3)
  -trimmomatic_min_length <36-100>     
                        Minimum read length. (Default: 36)
  -trimmomatic_sliding_window_s <4-10> 
                        Sliding window size. (Default: 4)
  -trimmomatic_sliding_window_q <15-30>
                        Window average quality. (Default: 15)

Command example:
  # Run HybSuite stage1 with 1 thread and 1 parallel process
  $ hybsuite stage1 -input_list ./input_list.txt -input_data ./Input_data -NGS_dir ./NGS_dir -output_dir ./
  
  # Run HybSuite stage1 with 5 threads and 5 parallel processes
  $ hybsuite stage1 -input_list ./input_list.txt -input_data ./Input_data -NGS_dir ./NGS_dir -output_dir ./ -nt 5 -process 5
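Because `-input_data` is only required when the input list includes user-provided data, a wrapper script may want to add it conditionally. The following is a minimal sketch that only prints the command (a dry run). The function name `build_stage1_cmd` and the paths are hypothetical placeholders; only flags listed in the Stage 1 manual above are used, and paths containing spaces are not handled.

```sh
#!/bin/sh
# Sketch: assemble a `hybsuite stage1` command line and print it (dry run).
# build_stage1_cmd is a hypothetical helper, not part of HybSuite.

build_stage1_cmd() {
    # $1: input list   $2: output directory   $3: threads (-nt)
    # $4: concurrent samples (-process)
    # $5: directory with user-provided data (pass "" if there is none)
    cmd="hybsuite stage1 -input_list $1 -output_dir $2 -nt $3 -process $4"
    # -input_data is only needed when the input list includes user-provided data
    if [ -n "$5" ]; then
        cmd="$cmd -input_data $5"
    fi
    printf '%s\n' "$cmd"
}

# Public data plus user-provided data
build_stage1_cmd ./input_list.txt ./ 5 5 ./Input_data
# Public data only
build_stage1_cmd ./input_list.txt ./ 1 1 ""
```

Printing the command first makes it easy to review the arguments before running it for real.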

Parameters for running `hybsuite stage2`

Stage 2 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage2 ...

Mandatory arguments: -input_list -NGS_dir -t -output_dir

Essential arguments: -eas_dir -seqs_min_length -seqs_min_sample_coverage -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types used in stage 1. (Default: None)
  -input_data <DIR>     The directory containing all input data (in this stage, only required when the inputs include pre-assembled data). (Default: None).
  -NGS_dir <DIR>        The directory containing NGS raw and cleaned reads files (generated in stage 1). (Default: ./NGS_dir)
  -t <FILE>             Target file for data assembly. (follows the format required in HybPiper)

Arguments for outputs:
  -output_dir <DIR>     The output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -eas_dir <DIR>        The output directory containing HybPiper assembly sequences. (Default: <output_dir>/01-Assembled_data)
                        Note: Pre-existing data in this directory will skip redundant assembly steps.

General arguments:
  === Putative paralogs filtering control ===
  -seqs_min_length <INT>         
                        Minimum sequence length for filtered paralogs. (Default: 0)
                        Putative paralogs shorter than this value will be filtered.             
  -seqs_mean_length_ratio <0-1>    
                        Minimum sequence length ratio relative to the mean value per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this proportion of the mean length will be filtered.
  -seqs_max_length_ratio <0-1>              
                        Minimum length ratio relative to the longest value per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this percentage of the maximum length will be filtered.
  -seqs_min_sample_coverage <0-1>           
                        Minimum sample coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, HRS and RLWP sequences in stage 3, loci lower than this sample coverage will be filtered.
  -seqs_min_locus_coverage <0-1>            
                        Minimum locus coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, taxa (samples) with lower than this locus coverage will be filtered.
  
  === Heatmap control ===
  -heatmap_color {black,blue,red,green,purple,orange,yellow,brown,pink}
                        Color scheme for heatmap gradient. (Default: black)
  
  === Threads control ===
  -nt <INT|AUTO>        Global thread setting (Default: 1)
  -nt_hybpiper <INT>    HybPiper threads (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of data assembly tasks ('hybpiper assemble') to run concurrently (Default: 1)
                        "all" means running all samples concurrently (Use this setting with caution.)
  
  === Logfile control ===
  -log_mode <simple|cmd|full>
                        The output mode of hybsuite logfile. (Default: cmd)

Arguments for integrated tools:
  === HybPiper ===
  -hybpiper_mapping_tool <blast|diamond>
                        The tool used for mapping reads to targets in HybPiper (only for protein targets) (Default: blast)
  -hybpiper_check_chimeric_contigs <FALSE|TRUE>
                        Check whether a stitched contig is a potential chimera of contigs from multiple paralogs when running "hybpiper assemble". (Default: TRUE)
  -hybpiper_cov_cutoff <INT>
                        Specify the value of "-cov_cutoff" when running "hybpiper assemble" in Stage 2. (Default: 8)
                        Increasing this value may increase the loci recovery efficiency but can potentially introduce errors.

Command example:
  # Run HybSuite stage2 with paralog sequence filtering
  $ hybsuite stage2 -input_list ./input_list.txt -NGS_dir ./NGS_dir -t ./Angiosperms353.fasta -output_dir ./ -nt 5 -process 5 -seqs_min_length 100 -seqs_min_sample_coverage 0.1

  # Run HybSuite stage2 without paralog sequence filtering
  $ hybsuite stage2 -input_list ./input_list.txt -NGS_dir ./NGS_dir -t ./Angiosperms353.fasta -output_dir ./ -nt 5 -process 5
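To make the length-ratio filters more concrete, the sketch below works through the arithmetic behind `-seqs_mean_length_ratio`. It assumes that a sequence is dropped when its length is below ratio × the mean length of its locus. The input format (one `locus sample length` record per line) and the function name `filter_by_mean_ratio` are made up for illustration, and HybSuite's internal implementation may differ.

```sh
#!/bin/sh
# filter_by_mean_ratio RATIO
# Reads "locus sample length" records on stdin and prints only the records
# whose length is at least RATIO times the mean length of their locus.
# (Hypothetical helper; the record format is an assumption for this example.)
filter_by_mean_ratio() {
    awk -v ratio="$1" '
        {
            locus[NR] = $1; line[NR] = $0; len[NR] = $3
            sum[$1] += $3; n[$1]++
        }
        END {
            for (i = 1; i <= NR; i++) {
                mean = sum[locus[i]] / n[locus[i]]
                if (len[i] >= ratio * mean) print line[i]
            }
        }'
}

# gene1 has a mean length of 600, so with ratio 0.6 the cutoff is 360 bp
printf '%s\n' \
    "gene1 sampleA 900" \
    "gene1 sampleB 300" \
    "gene1 sampleC 600" \
    "gene2 sampleA 500" |
    filter_by_mean_ratio 0.6
```

With the default ratio of 0, every record passes, which matches the "no filtering" behaviour of the second command example.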

Parameters for running `hybsuite stage3`

Stage 3 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage3 ...

Mandatory arguments: -input_list -eas_dir -paralogs_dir -t -output_dir

Essential arguments: -PH -prefix -run_phyparts -aln_min_sample -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types used in stage 1&2. (Default: None)
  -input_data <DIR>     The directory containing all input data (in this stage, only required when the inputs include pre-assembled data). (Default: None).
  -eas_dir <DIR>        The directory containing HybPiper assembly sequences (generated in stage 2). (Default: <output_dir>/01-Assembled_data)
  -paralogs_dir <DIR>   The directory containing all paralog sequences generated in stage 2 or by users themselves. (Default: None)
                        It's advisable to set this parameter as '<output_dir>/02-All_paralogs/03-Filtered_paralogs'.
  -t <FILE>             Target file for data assembly. (follows the format required in HybPiper)

Arguments for outputs:
  -output_dir <DIR>     Output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -prefix <STRING>      Prefix for output files. (Default: HybSuite)

General arguments:
  === Paralog handling control ===
  -PH <1-7|a|b|all>     Paralog handling methods to execute: (one or more of them can be chosen)
                        1: HRS, 2: RLWP, 3: LS, 4: MI, 5: MO, 6: RT, 7: 1to1
                        a: PhyloPyPruner, b: ParaGone (Default: 1a)
  
  === Sequences and alignments filtering control ===
  -seqs_min_length <INT>
                        Minimum sequence bp length for filtering HRS and RLWP sequences. (Default: 0)
                        HRS and RLWP sequences shorter than this value will be removed.
  -aln_min_length <INT> 
                        Minimum sequence bp length for filtering HRS and RLWP final alignments. (Default: 4)
  -aln_min_sample <INT>
                        Minimum sample number for final alignments. (Default: 0)
                        Final alignments (aligned and trimmed) with sample number below this threshold will be removed.

  === Gene tree builder control ===
  -gene_tree <1/2>      Choose the software to construct paralogs gene trees. (1: IQ-TREE; 2: FastTree) (Default: 1) 
  -gene_tree_bb <INT>   Choose the bootstrap value for paralogs gene trees inference. (Default: 1000)

  === Alignments trimming tool control ===
  -trim_tool <1/2>      Choose the software to trim/clean alignments. (1: trimAl; 2: HMMCleaner) (Default: 1) 

  === Nucleotide ambiguity character replacement ===
  -replace_n <TRUE|FALSE>
                        Replace ambiguous characters ('n', 'N', '?') with gaps ('-') in alignment files. (Default: FALSE)
                        Note: Recommended for phylogenetic software compatibility (e.g., IQ-TREE, trimAl).

  === Threads control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_paragone <INT>    ParaGone threads. (Default: 1)
  -nt_phylopypruner <INT>              
                        PhyloPyPruner threads. (Default: 1)
  -nt_mafft <INT>       MAFFT threads. (Default: 1)
  -nt_amas <INT>        AMAS.py threads. (Default: 1)
  -nt_modeltest_ng <INT>               
                        ModelTest-NG threads. (Default: 1)
  -nt_iqtree <INT>      IQ-TREE threads. (Default: 1)
  -nt_fasttree <INT>    FastTree threads. (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of multiple sequence alignment, alignment trimming, and gene tree inference tasks to run concurrently. (Default: 1)
                        "all" means running all samples concurrently. (Use this setting with caution.)

  === Heatmap control ===
  -heatmap_color {black,blue,red,green,purple,orange,yellow,brown,pink}
                        Color scheme for heatmap gradient. (Default: black)

Arguments for integrated tools:
  === PhyloPyPruner ===
  -pp_min_taxa <INT>    Minimum taxa per cluster. (Default: 4)
  -pp_min_support <0-1> Minimum support value. (Default: 0=auto)
  -pp_trim_lb <INT>     Trim long branches. (Default: 5)

  === ParaGone ===  
  -paragone_pool <INT>  Parallel alignment tasks. (Default: 1, same as the option '-process')
  -treeshrink_q_value <0-1>        
                        TreeShrink quantile threshold (Default: 0.05)
  -paragone_cutoff_value <FLOAT>       
                        Branch length cutoff (Default: 0.3)
  -paragone_minimum_taxa <INT>         
                        Minimum taxa per alignment (Default: 4)
  -paragone_min_tips <INT>             
                        Minimum tips per tree (Default: 4)
  
  === HybPiper ===
  -hybpiper_skip_chimeric_genes <FALSE|TRUE>
                        Whether to skip recovering sequences for putative chimeric genes when running "hybpiper retrieve_sequences" (HRS method) in Stage 3. (Default: FALSE)
  -hybpiper_retrieved_seqs_type <dna|intron|supercontig>
                        The type of sequence to extract when running "hybpiper retrieve_sequences" in Stage 3. (Default: dna, which means extracting coding sequences)

  === MAFFT ===  
  -mafft_algorithm <str>               
                        MAFFT algorithm [auto|linsi] (Default: auto)
  -mafft_adjustdirection <TRUE/FALSE>  
                        Whether to adjust sequence directions (Default: TRUE)
  -mafft_maxiterate <INT>              
                        Maximum number of iterations for MAFFT (Default: auto)
                        Specifies the maximum number of iterations MAFFT will perform during multiple sequence alignment. Higher iteration counts may improve alignment accuracy but will increase computation time.
  -mafft_pair <str>                    
                        Pairing strategy for MAFFT (Default: auto)
                        Specifies the pairing strategy used by MAFFT during multiple sequence alignment. Options include auto, localpair, globalpair, etc. Choosing the appropriate strategy can affect the alignment results and efficiency.
  
  === trimAl ===
  -trimal_mode <str>                   
                        trimAl mode [automated1|strict|strictplus|gappyout|nogaps|noallgaps] (Default: automated1)
  -trimal_gapthreshold <0-1>           
                        Gap threshold (Default: 0.12)
  -trimal_simthreshold <0-1>           
                        Similarity threshold (Default: auto)
  -trimal_cons <0-100>                 
                        Consensus threshold (Default: auto)
  -trimal_block <INT>                  
                        Minimum block size (Default: auto)
  -trimal_w <INT>                      
                        Window size (Default: auto)
  -trimal_gw <INT>                     
                        Gap window size (Default: auto)
  -trimal_sw <INT>                     
                        Similarity window size (Default: auto)
  -trimal_resoverlap <0-1>             
                        Minimum overlap of a position with other positions in the column. (Default: auto)
  -trimal_seqoverlap <0-100>           
                        Minimum percentage of sequences without gaps in a column. (Default: auto)
  
  === HMMCleaner ===
  -hmmcleaner_cost <NUM1_NUM2_NUM3_NUM4>
                        Cost parameters that define the low similarity segments detected by HmmCleaner. (Default: -0.15_-0.08_0.15_0.45)
                        Users can change each value but they have to be in increasing order. (NUM1 < NUM2 < 0 < NUM3 < NUM4)

Command example:
  # Run HybSuite stage3 without alignments filtering
  $ hybsuite stage3 -input_list ./input_list.txt -eas_dir ./01-Assembled_data -paralogs_dir ./02-All_paralogs/03-Filtered_paralogs -t ./Angiosperms353 -PH 1234567a -output_dir ./ -nt 5 -process 5
  
  # Run HybSuite stage3 with alignments filtering
  $ hybsuite stage3 -input_list ./input_list.txt -eas_dir ./01-Assembled_data -paralogs_dir ./02-All_paralogs/03-Filtered_paralogs -t ./Angiosperms353 -PH 124567b -output_dir ./ -nt 5 -process 5 -aln_min_length 100 -aln_min_sample 0.1
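Values such as `-PH 124567b` pack several choices into one string. As a reading aid, the sketch below expands a `-PH` value into the method names listed in the Stage 3 manual. The helper `ph_methods` is hypothetical and is not part of HybSuite. It also assumes that `all` expands to every listed code, which the manual does not spell out.

```sh
#!/bin/sh
# ph_methods CODES
# Print the names selected by a -PH value, one character at a time.
# (Hypothetical helper for illustration only.)
ph_methods() {
    codes=$1
    # Assumption: "all" selects every code listed in the manual.
    [ "$codes" = "all" ] && codes="1234567ab"
    out=""
    while [ -n "$codes" ]; do
        rest=${codes#?}        # everything after the first character
        c=${codes%"$rest"}     # the first character itself
        codes=$rest
        case $c in
            1) name=HRS ;;
            2) name=RLWP ;;
            3) name=LS ;;
            4) name=MI ;;
            5) name=MO ;;
            6) name=RT ;;
            7) name=1to1 ;;
            a) name=PhyloPyPruner ;;
            b) name=ParaGone ;;
            *) echo "unknown -PH code: $c" >&2; return 1 ;;
        esac
        out="${out:+$out }$name"
    done
    printf '%s\n' "$out"
}

ph_methods 1a        # the documented default
ph_methods 124567b   # the value used in the second example above
```

A check like this can catch a mistyped code before a long stage 3 run starts.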

Parameters for running `hybsuite stage4`

Stage 4 Manual
--------------------------------------------------------------------------------
Usage: hybsuite stage4 ...

Mandatory arguments: -input_list -aln_dir -output_dir

Essential arguments: -PH -sp_tree -prefix -run_phyparts -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types used in stage 1&2. (Default: None)
  -aln_dir <DIR>        The directory containing different orthogroups alignments generated in stage 3. (Default: <output_dir>/06-Final_alignments)
                        It's advisable to set this parameter as '<output_dir>/06-Final_alignments'.
  -PH <1-7|a|b|all>     Choose alignments generated via paralog handling methods as input:
                        1: HRS, 2: RLWP, 3: LS, 4: MI, 5: MO, 6: RT, 7: 1to1 (one or more of them can be chosen)
                        a: PhyloPyPruner, b: ParaGone (Default: 1a)

Arguments for outputs:
  -output_dir <DIR>     Output directory for all pipeline results (better to be consistent across all stages). (Default: None)
  -prefix <STRING>      Prefix for output files. (Default: HybSuite)

General arguments:
  === Species tree builder control ===
  -sp_tree <1-5|all>    Species tree inference method:
                        1: IQ-TREE, 2: RAxML, 3: RAxML-NG, 4: ASTRAL-IV, 5: wASTRAL
  
  === Steps control ===
  -run_coalescent_step <INT> 
                        Control which coalescent analysis steps to run:
                        1: Construct single gene trees, 2: Combine and collapse gene trees, 3: Infer species tree, 4: Reroot gene trees, 5: PhyParts concordance analysis
                        (Default: 1234)
  -run_concatenated_step <INT> 
                        Control which concatenated analysis steps to run:
                        1: Construct concatenated alignment, 2: Infer species tree
                        (Default: 12)
  
  === Gene tree builder control ===
  -gene_tree <1/2>      Choose the software to construct paralogs gene trees. (1: IQ-TREE; 2: FastTree) (Default: 1) 
  -gene_tree_bb <INT>   Choose the bootstrap value for paralogs gene trees inference. (Default: 1000)
  
  === Gene trees collapse threshold ===
  -collapse_threshold <VALUE>
                        Specify the minimum support value threshold for internal nodes in gene trees. (Default: 0)
                        Nodes with support values ≤ this threshold will be collapsed into polytomies.

  === Nucleotide ambiguity character replacement ===
  -replace_n <TRUE|FALSE>
                        Replace ambiguous characters ('n', 'N', '?') with gaps ('-') in alignment files. (Default: FALSE)
                        Note: Recommended for phylogenetic software compatibility (e.g., IQ-TREE, trimAl).

  === Threads control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_amas <INT>        AMAS.py threads (Default: 1)
  -nt_modeltest_ng <INT>               
                        ModelTest-NG threads (Default: 1)
  -nt_iqtree <INT>      IQ-TREE threads (Default: 1)
  -nt_fasttree <INT>    FastTree threads (Default: 1)
  -nt_raxml_ng <INT>    RAxML-NG threads (Default: 1)
  -nt_raxml <INT>       RAxML threads (Default: 1)
  -nt_astral4 <INT>     ASTRAL-IV threads (Default: 1)
  -nt_wastral <INT>     wASTRAL threads (Default: 1)
  -nt_astral_pro <INT>  ASTRAL-Pro3 threads (Default: 1)

  === Parallel control ===
  -process <INT|all>    Number of gene tree inference tasks in coalescent analysis to run concurrently. (Default: 1)
                        "all" means running all samples concurrently. (Use this setting with caution.)

Arguments for integrated tools:
  === IQ-TREE (concatenated analysis) ===
  -iqtree_bb <INT>      IQ-TREE bootstrap replicates (Default: 1000)
  -iqtree_alrt <INT>    SH-aLRT replicates (Default: 1000)
  -iqtree_run_option <str>      
                        IQ-TREE run mode [standard|undo] (Default: undo)
  -iqtree_partition <TRUE/FALSE>       
                        Whether to use partition models in IQ-TREE (Default: TRUE)
  -iqtree_constraint_tree <Treefile>           
                        The path to the constraint tree for running IQ-TREE (Default: none)

  === ModelTest-NG ===
  -run_modeltest_ng <TRUE/FALSE>       
                        Whether to run ModelTest-NG (Default: TRUE)

  === RAxML ===
  -raxml_m <str>        RAxML model [GTRGAMMA|PROTGAMMA] (Default: GTRGAMMA)
  -raxml_bb <INT>       RAxML bootstrap replicates (Default: 1000)
  -raxml_constraint_tree <Treefile>              
                        The path to the constraint tree for running RAxML (Default: no constraint tree)

  === RAxML-NG ===
  -rng_bs_trees <INT>   RAxML-NG bootstrap replicates (Default: 1000)
  -rng_force <TRUE/FALSE>              
                        Ignore thread warnings (Default: FALSE)
  -rng_constraint_tree <Treefile>                
                        The path to the constraint tree for running RAxML-NG (Default: no constraint tree)

  === ASTRAL-IV ===
  -astral4_root         Outermost (most distant) outgroup taxon name for ASTRAL-IV branch length calculation. (Default: none)
                        (Strongly recommended for accurate branch length estimation. Specify only the single outermost outgroup.)  
  -astral_r <INT>       ASTRAL-IV rounds of search. (Default: 4)
  -astral_s <INT>       ASTRAL-IV rounds of subsampling. (Default: 4)

  === wASTRAL ===
  -wastral_mode <1-4>   wASTRAL mode [1|2|3|4] (Default: 1)
                        1: hybrid weighting, 2: support only, 3: length only, 4: unweighted
  -wastral_r <INT>      wASTRAL rounds of search. (Default: 4)
  -wastral_s <INT>      wASTRAL rounds of subsampling. (Default: 4)

  === ASTRAL-Pro ===
  -astral_pro_r <INT>   ASTRAL-Pro rounds of search. (Default: 4)
  -astral_pro_s <INT>   ASTRAL-Pro rounds of subsampling. (Default: 4)

  === MAFFT (only for paralogs inclusion method -> ASTRAL-Pro) ===  
  -mafft_algorithm <str>               
                        MAFFT algorithm [auto|linsi] (Default: auto)
  -mafft_adjustdirection <TRUE/FALSE>  
                        Whether to adjust sequence directions (Default: TRUE)
  -mafft_maxiterate <INT>              
                        Maximum number of iterations for MAFFT (Default: auto)
                        Specifies the maximum number of iterations MAFFT will perform during multiple sequence alignment. Higher iteration counts may improve alignment accuracy but will increase computation time.
  -mafft_pair <str>                    
                        Pairing strategy for MAFFT (Default: auto)
                        Specifies the pairing strategy used by MAFFT during multiple sequence alignment. Options include auto, localpair, globalpair, etc. Choosing the appropriate strategy can affect the alignment results and efficiency.
  
  === trimAl (only for paralogs inclusion method -> ASTRAL-Pro) ===
  -trimal_mode <str>                   
                        trimAl mode [automated1|strict|strictplus|gappyout|nogaps|noallgaps] (Default: automated1)
  -trimal_gapthreshold <0-1>           
                        Gap threshold (Default: 0.12)
  -trimal_simthreshold <0-1>           
                        Similarity threshold (Default: auto)
  -trimal_cons <0-100>                 
                        Consensus threshold (Default: auto)
  -trimal_block <INT>                  
                        Minimum block size (Default: auto)
  -trimal_w <INT>                      
                        Window size (Default: auto)
  -trimal_gw <INT>                     
                        Gap window size (Default: auto)
  -trimal_sw <INT>                     
                        Similarity window size (Default: auto)
  -trimal_resoverlap <0-1>             
                        Minimum overlap of a position with other positions in the column. (Default: auto)
  -trimal_seqoverlap <0-100>           
                        Minimum percentage of sequences without gaps in a column. (Default: auto)
  
  === HMMCleaner (only for paralogs inclusion method -> ASTRAL-Pro) ===
  -hmmcleaner_cost <NUM1_NUM2_NUM3_NUM4>
                        Cost parameters that define the low similarity segments detected by HmmCleaner. (Default: -0.15_-0.08_0.15_0.45)
                        Users can change each value but they have to be in increasing order. (NUM1 < NUM2 < 0 < NUM3 < NUM4)

  === PhyPartsPieCharts & modified_phypartspiecharts ===
  -run_phyparts <TRUE|FALSE>
                        Enable/disable PhyParts concordance analysis and modified pie chart visualization. (Default: TRUE)
                        Note: Requires successful completion of previous coalescent analysis.
  -phypartspiecharts_tree_type <cladogram/circle>
                        The tree display type used when running modified_phypartspiecharts.py (Default: cladogram)
  -phypartspiecharts_num_mode <num>
                        Control what numbers to show on branches (specify 0-2 digits) (Default: 12)
                        0: Hide all numbers
                        1: Number of genes supporting species tree (blue)
                        2: Number of genes conflicting with species tree (red+green)
                        3: Number of genes with no signal (gray)
                        4: Proportion of supporting genes (blue/total)
                        5: Proportion of conflicting genes ((red+green)/total)
                        6: Proportion of no signal genes (gray/total)
                        7: Ratio of supporting to all signal genes (blue/(blue+red+green))
                        8: Ratio of conflicting to all signal genes ((red+green)/(blue+red+green))
                        9: Original node support values from the input tree

Command example:
  # Run HybSuite stage4 with IQ-TREE
  $ hybsuite stage4 -input_list ./input_list.txt -aln_dir ./06-Final_alignments -t ./Angiosperms353 -PH 1234567a -output_dir ./ -nt 5 -process 5 -sp_tree 1
  
  # Run HybSuite stage4 with ASTRAL-IV
  $ hybsuite stage4 -input_list ./input_list.txt -aln_dir ./06-Final_alignments -t ./Angiosperms353 -PH 1234567a -output_dir ./ -nt 5 -process 5 -sp_tree 4

  # Run HybSuite stage4 with ASTRAL-IV and PhyParts
  $ hybsuite stage4 -input_list ./input_list.txt -aln_dir ./06-Final_alignments -t ./Angiosperms353 -PH 1234567a -output_dir ./ -nt 5 -process 5 -sp_tree 4 -run_phyparts TRUE
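The `-hmmcleaner_cost` option expects four underscore-separated numbers that satisfy NUM1 < NUM2 < 0 < NUM3 < NUM4. The sketch below shows one way to check a candidate value against that rule before launching a run. The function `check_hmmcleaner_cost` is a hypothetical helper, not part of HybSuite, and the accepted number format (optional minus sign, digits, optional decimal part) is an assumption.

```sh
#!/bin/sh
# check_hmmcleaner_cost VALUE
# Succeeds (exit status 0) only if VALUE has the form NUM1_NUM2_NUM3_NUM4
# with NUM1 < NUM2 < 0 < NUM3 < NUM4. (Hypothetical helper.)
check_hmmcleaner_cost() {
    printf '%s\n' "$1" | awk -F_ '
        {
            if (NF != 4) exit 1
            # Every field must look like a plain decimal number
            for (i = 1; i <= 4; i++)
                if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) exit 1
            if ($1 < $2 && $2 < 0 && 0 < $3 && $3 < $4) exit 0
            exit 1
        }'
}

for cost in "-0.15_-0.08_0.15_0.45" "-0.08_-0.15_0.15_0.45"; do
    if check_hmmcleaner_cost "$cost"; then
        echo "$cost: ok"
    else
        echo "$cost: rejected"
    fi
done
```

The first value is the documented default and passes; the second swaps NUM1 and NUM2 and is rejected.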

Parameters for running `hybsuite full_pipeline`

HybSuite full pipeline Manual
--------------------------------------------------------------------------------
Usage: hybsuite full_pipeline ...

Mandatory arguments: -input_list -input_data (required when including user-provided data) -t -output_dir

Essential arguments: -PH -sp_tree -seqs_min_length -aln_min_sample -prefix -nt -process

Arguments for inputs:
  -input_list <FILE>    The file listing input sample names and corresponding data types. (Default: None)
  -input_data <DIR>     The directory containing all input data (required when the inputs include your own data / pre-assembled data). (Default: None).
  -t <FILE>             Target file for data assembly. (follows the format required in HybPiper)

Arguments for outputs:
  -output_dir <DIR>     The output directory for all pipeline results. (Default: None)
  -NGS_dir <DIR>        The output directory containing raw and cleaned reads files (see GitHub documentation).
                        Note: If cleaned reads already exist in this directory, the read-trimming steps are skipped.
  -eas_dir <DIR>        The output directory containing HybPiper assembly sequences. (Default: <output_dir>/01-Assembled_data)
                        Note: Pre-existing data in this directory will skip redundant assembly steps.
  -prefix <STRING>      Prefix for output files. (Default: HybSuite)

General arguments:
  === Stages running control ===
  -skip_stage <1|2|3|12|123>
                        Specify pipeline stages to skip during execution. (Default: None, running all stages)
                        Note: Particularly useful for re-running specific HybSuite pipeline stages.
                        (e.g., '-skip_stage 1' for skipping stage 1)
  -run_to_stage <1|2|3> Specify pipeline stages to run up to (Default: None, running all stages)
                        (e.g., '-run_to_stage 3' for stopping before stage 4)

  === Public raw reads downloading control (Stage 1) ===
  -rm_sra <TRUE/FALSE>  Whether to remove SRA files after conversion. (Default: TRUE)
  -download_format <fastq|fastq_gz>
                        Downloaded data format. (Default: fastq_gz)

  === Putative paralogs filtering control (Stage 2) ===
  -seqs_min_length <INT>         
                        Minimum sequence length for filtered paralogs. (Default: 0)
                        Putative paralogs shorter than this value will be filtered.             
  -seqs_mean_length_ratio <0-1>    
                        Minimum sequence length ratio relative to the mean value per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this proportion of the mean length will be filtered.
  -seqs_max_length_ratio <0-1>              
                        Minimum length ratio relative to the longest value per locus for putative paralogs. (Default: 0)
                        Putative paralogs shorter than this percentage of the maximum length will be filtered.
  -seqs_min_sample_coverage <0-1>           
                        Minimum sample coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, HRS and RLWP sequences in stage 3, loci lower than this sample coverage will be filtered.
  -seqs_min_locus_coverage <0-1>            
                        Minimum locus coverage for putative paralogs. (Default: 0)
                        For all putative paralogs in stage 2, taxa (samples) with lower than this locus coverage will be filtered.

  === Heatmap control (Stage 2&3) ===
  -heatmap_color {black,blue,red,green,purple,orange,yellow,brown,pink}
                        Color scheme for heatmap gradient. (Default: black)

  === Paralog handling control (Stage 3) ===
  -PH <1-7|a|b|all>     Paralog handling methods to execute: (one or more of them can be chosen)
                        1: HRS, 2: RLWP, 3: LS, 4: MI, 5: MO, 6: RT, 7: 1to1
                        a: PhyloPyPruner, b: ParaGone (Default: 1a)
  
  === Sequences and alignments filtering control (Stage 3) ===
  -seqs_min_length <INT>
                        Minimum sequence bp length for filtering HRS and RLWP sequences. (Default: 0)
                        HRS and RLWP sequences shorter than this value will be removed.
  -aln_min_length <INT> 
                        Minimum sequence bp length for filtering HRS and RLWP final alignments. (Default: 4)
  -aln_min_sample <INT>
                        Minimum sample number for final alignments. (Default: 5)
                        Final alignments (aligned and trimmed) with sample number below this threshold will be removed.

  === Alignments trimming tool control (Stage 3) ===
  -trim_tool <1/2>      Choose the software to trim/clean alignments. (1: trimAl; 2: HMMCleaner) (Default: 1)
  
  === Gene trees builder control (Stage 3&4) ===
  -gene_tree <1/2>      Choose the software to construct paralogs gene trees. (1: IQ-TREE; 2: FastTree) (Default: 1) 
  -gene_tree_bb <INT>   Choose the bootstrap value for paralogs gene trees inference. (Default: 1000)

  === Species tree builder control (Stage 4) ===
  -sp_tree <1-6|all>    Species tree inference method: (Default: 1)
                        1: IQ-TREE, 2: RAxML, 3: RAxML-NG, 4: ASTRAL-IV, 5: wASTRAL, 6: ASTRAL-Pro
  
  === Steps control in stage 4 ===
  -run_coalescent_step  <INT> 
                        Control which coalescent analysis steps to run:
                        1: Construct single gene trees, 2: Combine and collapse gene trees, 3: Infer species tree, 4: Reroot gene trees, 5: PhyParts concordance analysis
                        (Default: 1234)
  -run_concatenated_step <INT> 
                        Control which concatenated analysis steps to run:
                        1: Construct concatenated alignment, 2: Infer species tree
                        (Default: 12)
  
  === Nucleotide ambiguity character replacement (Stage 3&4) ===
  -replace_n <TRUE|FALSE>
                        Replace ambiguous characters ('n', 'N', '?') with gaps ('-') in alignment files. (Default: FALSE)
                        Note: Recommended for phylogenetic software compatibility (e.g., IQ-TREE, trimAl).
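
To illustrate the replacement described for -replace_n, here is a minimal shell sketch. It is not HybSuite's own implementation: the helper name replace_ambiguous is hypothetical, the input is assumed to be a FASTA alignment, and leaving header lines untouched is an assumption made so that sample names containing 'N' are not altered.

```shell
# Hypothetical helper (not part of HybSuite): print a FASTA alignment with
# 'n', 'N' and '?' replaced by '-' on sequence lines only.
replace_ambiguous() {
  # '/^>/!' restricts the substitution to lines that are not FASTA headers.
  sed '/^>/!s/[nN?]/-/g' "$1"
}
```

For example, `replace_ambiguous gene001.fasta > gene001.gapped.fasta` would write a gapped copy and leave the original file unchanged (both file names are placeholders).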

  === Gene trees collapse threshold ===
  -collapse_threshold <VALUE>
                        Specify the minimum support value threshold for internal nodes in gene trees. (Default: 0)
                        Nodes with support values ≤ this threshold will be collapsed into polytomies.
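
The boundary case of -collapse_threshold is easy to misread: a node whose support equals the threshold is collapsed too. The sketch below only restates that comparison; should_collapse is a hypothetical helper name, and the way HybSuite actually edits the trees is not shown here.

```shell
# Hypothetical helper (not part of HybSuite): succeed (exit 0) when a node
# with the given support would be collapsed under the given threshold.
should_collapse() {
  # Adding 0 forces numeric comparison; "<=" makes the threshold inclusive.
  awk -v s="$1" -v t="$2" 'BEGIN { exit !((s + 0) <= (t + 0)) }'
}
```

Under this reading, `should_collapse 10 10` succeeds, while `should_collapse 10.5 10` fails, so only the first node would become part of a polytomy.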
  
  === Threads Control ===
  -nt <INT|AUTO>        Global thread setting. (Default: 1)
  -nt_fasterq_dump <INT>               
                        fasterq-dump threads. (Default: 1)
  -nt_pigz <INT>        pigz compression threads. (Default: 1)
  -nt_trimmomatic <INT> Trimmomatic threads. (Default: 1)
  -nt_hybpiper <INT>    HybPiper threads (Default: 1)
  -nt_paragone <INT>    ParaGone threads. (Default: 1)
  -nt_phylopypruner <INT>              
                        PhyloPyPruner threads. (Default: 1)
  -nt_mafft <INT>       MAFFT threads. (Default: 1)
  -nt_amas <INT>        AMAS.py threads. (Default: 1)
  -nt_modeltest_ng <INT>               
                        ModelTest-NG threads. (Default: 1)
  -nt_iqtree <INT>      IQ-TREE threads. (Default: 1)
  -nt_fasttree <INT>    FastTree threads. (Default: 1)
  -nt_raxml_ng <INT>    RAxML-NG threads (Default: 1)
  -nt_raxml <INT>       RAxML threads (Default: 1)
  -nt_astral4 <INT>     ASTRAL-IV threads (Default: 1)
  -nt_wastral <INT>     wASTRAL threads (Default: 1)
  -nt_astral_pro <INT>  ASTRAL-Pro3 threads (Default: 1)

  === Parallel Control ===
  -process <INT|all>    Number of subprocesses to run concurrently. (Default: 1)
                        "all" means running all subprocesses concurrently. (Use this option with caution.)
                        The related steps are: 
                        Stage 1: public data downloading and raw reads trimming;
                        Stage 2: data assembly ('hybpiper assemble');
                        Stage 3: multiple sequence alignment, alignment trimming, and gene tree inference;
                        Stage 4: gene tree inference in coalescent analysis.

  === Logfile Control ===
  -log_mode <simple|cmd|full>
                        The output mode of hybsuite logfile. (Default: cmd)

Arguments for integrated tools:
  === SRAToolkit (Stage 1) ===
  -sra_maxsize <NUM>    The maximum size of sra files to download. (Default: 20GB)

  === Trimmomatic (Stage 1) ===
  -trimmomatic_leading_quality <3-40>
                        Leading base quality cutoff. (Default: 3)
  -trimmomatic_trailing_quality <3-40> 
                        Trailing base quality cutoff. (Default: 3)
  -trimmomatic_min_length <36-100>
                        Minimum read length. (Default: 36)
  -trimmomatic_sliding_window_s <4-10> 
                        Sliding window size. (Default: 4)
  -trimmomatic_sliding_window_q <15-30>
                        Window average quality. (Default: 15)

  === HybPiper (Stage 2 & 3) ===
  -hybpiper_mapping_tool <blast|diamond>     
                        The tool used for mapping reads to targets in HybPiper (only for protein targets) (Default: blast)
  -hybpiper_check_chimeric_contigs <FALSE|TRUE>
                        Check whether a stitched contig is a potential chimera of contigs from multiple paralogs when running "hybpiper assemble". (Default: FALSE)
  -hybpiper_cov_cutoff <INT>
                        Specify the value of "-cov_cutoff" when running "hybpiper assemble" in Stage 2. (Default: 8)
                        Increasing this value may increase locus recovery efficiency but may also introduce errors.
  -hybpiper_skip_chimeric_genes <FALSE|TRUE>
                        Whether to skip recovering sequences for putative chimeric genes when running "hybpiper retrieve_sequences" (HRS method) in Stage 3. (Default: FALSE)
  -hybpiper_retrieved_seqs_type <dna|intron|supercontig>
                        The type of sequence to extract when running "hybpiper retrieve_sequences" in Stage 3.
  
  === PhyloPyPruner (Stage 3) ===
  -pp_min_taxa <INT>    Minimum taxa per cluster. (Default: 4)
  -pp_min_support <0-1> Minimum support value. (Default: 0=auto)
  -pp_trim_lb <INT>     Trim long branches. (Default: 5)

  === ParaGone (Stage 3) ===  
  -paragone_pool <INT>  Parallel alignment tasks. (Default: 1, same as the option '-process')
  -treeshrink_q_value <0-1>        
                        TreeShrink quantile threshold (Default: 0.05)
  -paragone_cutoff_value <FLOAT>       
                        Branch length cutoff (Default: 0.3)
  -paragone_minimum_taxa <INT>         
                        Minimum taxa per alignment (Default: 4)
  -paragone_min_tips <INT>             
                        Minimum tips per tree (Default: 4)
  
  === TreeShrink (Stage 3) ===
  -treeshrink_q_value <0-1>        
                        TreeShrink quantile threshold (Default: 0.05)

  === MAFFT (Stage 3) ===  
  -mafft_algorithm <str>               
                        MAFFT algorithm [auto|linsi] (Default: auto)
  -mafft_adjustdirection <TRUE/FALSE>  
                        Whether to adjust sequence directions (Default: TRUE)
  -mafft_maxiterate <INT>              
                        Maximum number of iterations for MAFFT (Default: auto)
                        Specifies the maximum number of iterations MAFFT will perform during multiple sequence alignment. Higher iteration counts may improve alignment accuracy but will increase computation time.
  -mafft_pair <str>                    
                        Pairing strategy for MAFFT (Default: auto)
                        Specifies the pairing strategy used by MAFFT during multiple sequence alignment. Options include auto, localpair, globalpair, etc. Choosing the appropriate strategy can affect the alignment results and efficiency.
  
  === trimAl (Stage 3) ===
  -trimal_mode <str>                   
                        trimAl mode [automated1|strict|strictplus|gappyout|nogaps|noallgaps] (Default: automated1)
  -trimal_gapthreshold <0-1>           
                        Gap threshold (Default: 0.12)
  -trimal_simthreshold <0-1>           
                        Similarity threshold (Default: auto)
  -trimal_cons <0-100>                 
                        Consensus threshold (Default: auto)
  -trimal_block <INT>                  
                        Minimum block size (Default: auto)
  -trimal_w <INT>                      
                        Window size (Default: auto)
  -trimal_gw <INT>                     
                        Gap window size (Default: auto)
  -trimal_sw <INT>                     
                        Similarity window size (Default: auto)
  -trimal_resoverlap <0-1>             
                        Minimum overlap of a position with other positions in the column. (Default: auto)
  -trimal_seqoverlap <0-100>           
                        Minimum percentage of sequences without gaps in a column. (Default: auto)
  
  === HMMCleaner (Stage 3) ===
  -hmmcleaner_cost <NUM1_NUM2_NUM3_NUM4>
                        Cost parameters that define the low-similarity segments detected by HmmCleaner. (Default: -0.15_-0.08_0.15_0.45)
                        Users can change each value but they have to be in increasing order. (NUM1 < NUM2 < 0 < NUM3 < NUM4)
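
The ordering rule for -hmmcleaner_cost can be checked before a long run is started. The following is a minimal sketch under stated assumptions: check_hmmcleaner_cost is a hypothetical helper, not a HybSuite function, and it only tests the field count and the ordering NUM1 < NUM2 < 0 < NUM3 < NUM4. It does not reject fields with trailing non-numeric characters.

```shell
# Hypothetical helper (not part of HybSuite): succeed only if the argument
# has four '_'-separated numbers with NUM1 < NUM2 < 0 < NUM3 < NUM4.
check_hmmcleaner_cost() {
  awk -v s="$1" 'BEGIN {
    n = split(s, c, "_")
    if (n != 4) exit 1
    # Adding 0 forces numeric (not string) comparison.
    a = c[1] + 0; b = c[2] + 0; x = c[3] + 0; y = c[4] + 0
    if (a < b && b < 0 && 0 < x && x < y) exit 0
    exit 1
  }'
}
```

A call such as `check_hmmcleaner_cost "-0.15_-0.08_0.15_0.45" || echo "invalid cost string"` accepts the default value and would flag a string whose values are out of order.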
  
  === IQ-TREE (Stage 4) ===
  -iqtree_bb <INT>      IQ-TREE bootstrap replicates (Default: 1000)
  -iqtree_alrt <INT>    SH-aLRT replicates (Default: 1000)
  -iqtree_run_option <str>      
                        IQ-TREE run mode [standard|undo] (Default: undo)
  -iqtree_partition <TRUE/FALSE>       
                        Whether to use partition models in IQ-TREE (Default: TRUE)
  -iqtree_constraint_tree <Treefile>           
                        The path to the constraint tree for running IQ-TREE (Default: none)

  === ModelTest-NG (Stage 4) ===
  -run_modeltest_ng <TRUE/FALSE>       
                        Whether to run ModelTest-NG (Default: TRUE)

  === RAxML (Stage 4) ===
  -raxml_m <str>        RAxML model [GTRGAMMA|PROTGAMMA] (Default: GTRGAMMA)
  -raxml_bb <INT>       RAxML bootstrap replicates (Default: 1000)
  -raxml_constraint_tree <Treefile>              
                        The path to the constraint tree for running RAxML (Default: no constraint tree)

  === RAxML-NG (Stage 4) ===
  -rng_bs_trees <INT>   RAxML-NG bootstrap replicates (Default: 1000)
  -rng_force <TRUE/FALSE>              
                        Ignore thread warnings (Default: FALSE)
  -rng_constraint_tree <Treefile>                
                        The path to the constraint tree for running RAxML-NG (Default: no constraint tree)

  === ASTRAL-IV (Stage 4) ===
  -astral4_root         Outermost (most distant) outgroup taxon name for ASTRAL-IV branch length calculation. (Default: none)
                        (Strongly recommended for accurate branch length estimation. Specify only the single outermost outgroup.)
  -astral_r <INT>       ASTRAL-IV rounds of search. (Default: 4)
  -astral_s <INT>       ASTRAL-IV rounds of subsampling. (Default: 4)

  === wASTRAL (Stage 4) ===
  -wastral_mode <1-4>   wASTRAL mode [1|2|3|4] (Default: 1)
                        1: hybrid weighting, 2: support only, 3: length only, 4: unweighted
  -wastral_r <INT>      wASTRAL rounds of search. (Default: 4)
  -wastral_s <INT>      wASTRAL rounds of subsampling. (Default: 4)

  === ASTRAL-Pro (Stage 4) ===
  -astral_pro_r <INT>   ASTRAL-Pro rounds of search. (Default: 4)
  -astral_pro_s <INT>   ASTRAL-Pro rounds of subsampling. (Default: 4)

  === PhyPartsPieCharts & modified_phypartspiecharts (Stage 4) ===
  -run_phyparts <TRUE|FALSE>
                        Enable/disable PhyParts concordance analysis and modified pie chart visualization. (Default: TRUE)
                        Note: Requires successful completion of previous coalescent analysis.
  -phypartspiecharts_tree_type <cladogram/circle>
                        The tree display type used when running modified_phypartspiecharts.py (Default: cladogram)
  -phypartspiecharts_num_mode <num>
                        Control which numbers to show on branches (specify one or two of the digits below) (Default: 12)
                        0: Hide all numbers
                        1: Number of genes supporting species tree (blue)
                        2: Number of genes conflicting with species tree (red+green)
                        3: Number of genes with no signal (gray)
                        4: Proportion of supporting genes (blue/total)
                        5: Proportion of conflicting genes ((red+green)/total)
                        6: Proportion of no signal genes (gray/total)
                        7: Ratio of supporting to all signal genes (blue/(blue+red+green))
                        8: Ratio of conflicting to all signal genes ((red+green)/(blue+red+green))
                        9: Original node support values from the input tree
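
Because -phypartspiecharts_num_mode packs its choices into a digit string, it can help to spell out what a given value selects. The sketch below decodes a value such as the default 12 one digit at a time, using the meanings listed above. describe_num_mode is a hypothetical helper written for illustration, not part of HybSuite or modified_phypartspiecharts.py.

```shell
# Hypothetical helper (not part of HybSuite): print the meaning of each
# digit in a -phypartspiecharts_num_mode value, one line per digit.
describe_num_mode() {
  rest=$1
  while [ -n "$rest" ]; do
    digit=${rest%"${rest#?}"}   # first character of the remaining string
    rest=${rest#?}              # drop that character
    case $digit in
      0) echo "0: hide all numbers" ;;
      1) echo "1: number of genes supporting species tree (blue)" ;;
      2) echo "2: number of genes conflicting with species tree (red+green)" ;;
      3) echo "3: number of genes with no signal (gray)" ;;
      4) echo "4: proportion of supporting genes (blue/total)" ;;
      5) echo "5: proportion of conflicting genes ((red+green)/total)" ;;
      6) echo "6: proportion of no signal genes (gray/total)" ;;
      7) echo "7: ratio of supporting to all signal genes" ;;
      8) echo "8: ratio of conflicting to all signal genes" ;;
      9) echo "9: original node support values from the input tree" ;;
      *) echo "unknown code: $digit" >&2; return 1 ;;
    esac
  done
}
```

Running `describe_num_mode 12` lists the two counts shown by default, and a value containing a non-digit character makes the function return a non-zero status.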

Command examples:
  === Run the full pipeline with all paralog-handling methods and all species tree inference approaches ===
  hybsuite full_pipeline \
  -input_list ./Input_list.txt \
  -input_data ./Input_data \
  -t Angiosperms353.fasta \
  -PH 1234567 \
  -sp_tree 12345 \
  -output_dir ./ \
  -nt 5 -process 5
  
  === Run the full pipeline with only tree-based orthology inference methods (MO/MI/RT/1to1) in ParaGone and ASTRAL-IV ===
  hybsuite full_pipeline \
  -input_list ./Input_list.txt \
  -input_data ./Input_data \
  -t Angiosperms353.fasta \
  -PH 4567b \
  -sp_tree 4 \
  -output_dir ./ \
  -nt 5 -process 5
