Output files
Output File Naming Conventions
| Placeholder | Represents |
|---|---|
| <PH> | Any of the 7 orthology inference methods: HRS, RLWP, LS, MI, MO, RT, 1to1 |
| <taxon> | Taxon name (e.g., <taxon1>, <taxon2>, etc.) from your sample list file |
| <prefix> | User-specified output prefix (via -prefix option) |
| <locus_name> | Target sequence locus (e.g., <locus_name1>, <locus_name2>, etc.) |
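To make the placeholder notation concrete, here is a minimal sketch that expands a few of the filename patterns documented on this page. The helper `expand` and the sample values (`Elaeagnus_sp`, `4757`, `run1`) are hypothetical and chosen only for illustration; they are not part of HybSuite.

```python
def expand(pattern: str, **values: str) -> str:
    """Replace each <placeholder> in a documented filename pattern."""
    for name, value in values.items():
        pattern = pattern.replace(f"<{name}>", value)
    return pattern


# Hypothetical values: real ones come from your sample list, target file, and -prefix.
print(expand("<taxon>_1_clean.paired.fq.gz", taxon="Elaeagnus_sp"))
print(expand("<locus_name>_paralogs_all.fasta", locus_name="4757"))
print(expand("<prefix>_<PH>.fasta", prefix="run1", PH="MO"))
```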
Each output file is described below.
Stage1 output
<NGS_dataset> (specified by -d)
A directory containing next-generation raw sequencing data downloaded from public databases, existing raw reads provided by the user, and clean data produced by Trimmomatic-0.39.
<NGS_dataset>/
├── 01-Downloaded_raw_data/
├── 02-Downloaded_clean_data/
└── 03-My_clean_data/
<NGS_dataset> -> 01-Downloaded_raw_data
A directory containing next-generation raw sequencing data downloaded from public databases.
<NGS_dataset>/
└── 01-Downloaded_raw_data/
    ├── 01-Raw-reads_sra/
    └── 02-Raw-reads_fastq_gz/
<NGS_dataset> -> 01-Downloaded_raw_data -> 01-Raw-reads_sra
A directory containing raw sequencing data downloaded from NCBI in .sra format.
<NGS_dataset>/
└── 01-Downloaded_raw_data/
    └── 01-Raw-reads_sra/
        ├── <taxon>.sra
        ...
<taxon>.sra: File with raw sequencing data in SRA format.
By default, all *.sra files in this directory will be removed after converting them into fastq format to save space, unless you specify the option -rm_sra as FALSE to keep them.
<NGS_dataset> -> 01-Downloaded_raw_data -> 02-Raw-reads_fastq_gz
A directory containing raw sequencing data in .fastq or .fastq.gz format.
<NGS_dataset>/
└── 01-Downloaded_raw_data/
    └── 02-Raw-reads_fastq_gz/
        ├── <taxon>.fastq.gz or <taxon>.fastq
        ...
<taxon>.fastq.gz or <taxon>.fastq:
If the option -download_format is set to fastq, pigz is not used to compress the original .fastq files, so this folder contains <taxon>.fastq.
If the option -download_format is set to fastq.gz, pigz compresses the original .fastq files into .fastq.gz files, so this folder contains <taxon>.fastq.gz.
Default: -download_format is fastq.gz.
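The rule above can be sketched as a small lookup. The function name `expected_raw_read_file` is hypothetical; it only mirrors the two documented values of -download_format and the file names listed for this folder.

```python
def expected_raw_read_file(taxon: str, download_format: str = "fastq.gz") -> str:
    """Return the file name expected in 02-Raw-reads_fastq_gz for one taxon."""
    if download_format not in ("fastq", "fastq.gz"):
        raise ValueError("download_format must be 'fastq' or 'fastq.gz'")
    # 'fastq.gz' (the documented default) means the file was compressed with pigz.
    return f"{taxon}.{download_format}"


print(expected_raw_read_file("taxon1"))
print(expected_raw_read_file("taxon1", "fastq"))
```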
<NGS_dataset> -> 02-Downloaded_clean_data
A directory containing sequencing data cleaned from downloaded public raw reads.
<NGS_dataset>/
└── 02-Downloaded_clean_data/
    ├── <taxon>_1_clean.paired.fq.gz
    ├── <taxon>_2_clean.paired.fq.gz
    ├── <taxon>_1_clean.unpaired.fq.gz
    ├── <taxon>_2_clean.unpaired.fq.gz
    ├── <taxon>_clean.single.fq.gz
    ...
<taxon>_1_clean.paired.fq.gz & <taxon>_2_clean.paired.fq.gz
Files with compressed, cleaned, paired sequencing data (paired-end type) in fq.gz format (these files will be used for downstream analysis).
<taxon>_1_clean.unpaired.fq.gz & <taxon>_2_clean.unpaired.fq.gz
Files with compressed, cleaned, unpaired sequencing data (paired-end type) in fq.gz format.
<taxon>_clean.single.fq.gz
File with compressed, cleaned sequencing data (single-end type) in fq.gz format (this file will be used for downstream analysis).
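As a quick way to see which files in a clean-data folder feed downstream analysis, the sketch below groups the paired and single-end files by taxon and skips the unpaired ones. The function `downstream_inputs` is a hypothetical helper, not a HybSuite tool; it relies only on the file name suffixes listed above.

```python
from pathlib import Path

# Suffixes of the files described above as being used downstream.
DOWNSTREAM_SUFFIXES = (
    "_1_clean.paired.fq.gz",
    "_2_clean.paired.fq.gz",
    "_clean.single.fq.gz",
)


def downstream_inputs(clean_dir: str) -> dict[str, list[str]]:
    """Map each taxon to its clean read files that are used downstream."""
    inputs: dict[str, list[str]] = {}
    for path in sorted(Path(clean_dir).glob("*.fq.gz")):
        for suffix in DOWNSTREAM_SUFFIXES:
            if path.name.endswith(suffix):
                taxon = path.name[: -len(suffix)]
                inputs.setdefault(taxon, []).append(path.name)
    return inputs
```

Calling `downstream_inputs("<NGS_dataset>/02-Downloaded_clean_data")` with your own dataset path returns one entry per taxon, with two files for paired-end samples and one for single-end samples.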
<NGS_dataset> -> 03-My_clean_data
A directory containing user-provided cleaned sequencing data or sequencing data cleaned from user-provided raw data.
<NGS_dataset>/
└── 03-My_clean_data/
    ├── <taxon>_1_clean.paired.fq.gz
    ├── <taxon>_2_clean.paired.fq.gz
    ├── <taxon>_1_clean.unpaired.fq.gz
    ├── <taxon>_2_clean.unpaired.fq.gz
    ├── <taxon>_clean.single.fq.gz
    ...
<taxon>_1_clean.paired.fq.gz & <taxon>_2_clean.paired.fq.gz
Files with user-provided compressed, cleaned, paired sequencing data (paired-end type) in fq.gz format (these files will be used for downstream analysis).
<taxon>_1_clean.unpaired.fq.gz & <taxon>_2_clean.unpaired.fq.gz
Files with user-provided compressed, cleaned, unpaired sequencing data (paired-end type) in fq.gz format.
<taxon>_clean.single.fq.gz
File with user-provided compressed, cleaned sequencing data (single-end type) in fq.gz format (this file will be used for downstream analysis).
Stage2 output
01-Assembled_data
A directory containing assembled sequence data produced by hybpiper assemble command in HybPiper.
01-Assembled_data/
├── Assembled_data_namelist.txt
├── Old_assembled_data_namelist_<current_time>.log
├── <taxon>/
...
Assembled_data_namelist.txt
A file containing sample names used as input to run the hybpiper assemble command.
Old_assembled_data_namelist_<current_time>.log
A file containing previous sample names used as input to run the hybpiper assemble command.
<taxon>
More details can be found here.
02-All_paralogs
A directory containing all original putative paralogs retrieved by the hybpiper paralog_retriever command in HybPiper, filtered paralogs, along with their paralog heatmaps and related statistical results.
02-All_paralogs/
├── 01-Original_paralogs
├── 02-Original_paralog_reports_and_heatmap
├── 03-Filtered_paralogs
└── 04-Filtered_paralog_reports_and_heatmap
02-All_paralogs -> 01-Original_paralogs
A directory containing all original putative paralogs retrieved by the hybpiper paralog_retriever command in HybPiper.
02-All_paralogs/
└── 01-Original_paralogs/
    └── <locus_name>_paralogs_all.fasta
<locus_name>_paralogs_all.fasta: FASTA files for each sample/locus, containing all putative paralogs recovered by the HybPiper hybpiper paralog_retriever command.
02-All_paralogs -> 02-Original_paralog_reports_and_heatmap
A directory containing all original reports and heatmaps.
02-All_paralogs/
└── 02-Original_paralog_reports_and_heatmap/
    ├── Original_paralog_heatmap.png
    ├── Original_paralog_report.tsv
    ├── Original_recovered_seqs_length.tsv
    └── Original_recovery_heatmap.html
Original_paralog_heatmap.png
A heatmap image file in PNG format, depicting the number of original putative paralog sequences for each locus/sample.
Original_paralog_report.tsv
A TSV file recording the number of original putative paralog sequences for each locus/sample.
Original_recovered_seqs_length.tsv
A TSV file recording the length of original recovered sequences for each locus/sample.
Original_recovery_heatmap.html
An interactive HTML file for visualizing target locus recovery across all original paralogs (including both single-copy and multi-copy genes).
Here is a recovery heatmap example file you can play with: it shows a recovery result of the Angiosperms353 (Johnson et al., 2019) loci from 10 Elaeagnaceae species in our example dataset.
- The blue bars along the x- and y-axes indicate how many samples each locus is recovered in and how many loci are recovered in each sample, respectively.
- The color intensity of each cell indicates the proportion of gene length recovered for a given sample (y-axis) at a specific target locus (x-axis). When multiple sequences are recovered for a locus within a sample (putative paralogs), only the longest sequence is retained for visualization in the heatmap.
Now, let's explore this interactive HTML file:
- Choose the button “Sort by” as “Descending” to sort samples and loci on the heatmap from high to low recovery.
- Click on the “Plus” (+) and “Minus” (-) icons in the upper right corner to zoom in and out of the heatmap.
- Click on the “AutoScale” icon in the upper right corner to auto-scale the heatmap.
- Click the “Camera” (📷) icon in the upper right corner to download the current heatmap view as a PNG file.
- If some samples recover very few or no loci, we recommend replacing their data sources or increasing the value of -seqs_min_loci_coverage to exclude these low-quality samples from downstream analyses.
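The cell values and bar counts described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind the HTML report: the function names, the input layout (sample, then locus, then a list of recovered lengths), and the reference lengths in `target_lengths` are all hypothetical.

```python
def heatmap_cells(recovered, target_lengths):
    """Return {sample: {locus: proportion}}, keeping only the longest sequence per cell."""
    cells = {}
    for sample, loci in recovered.items():
        cells[sample] = {
            # Putative paralogs: only the longest recovered sequence is used.
            locus: max(loci.get(locus) or [0]) / target_len
            for locus, target_len in target_lengths.items()
        }
    return cells


def loci_recovered_per_sample(cells):
    """Count loci with any recovered sequence for each sample (the per-sample bar)."""
    return {s: sum(1 for p in row.values() if p > 0) for s, row in cells.items()}


# Hypothetical lengths in bp; "4757" has two putative paralogs in sample s1.
recovered = {"s1": {"4757": [900, 600], "5018": [300]}, "s2": {"4757": []}}
target_lengths = {"4757": 1000, "5018": 600}
cells = heatmap_cells(recovered, target_lengths)
print(cells)
print(loci_recovered_per_sample(cells))
```

A sample whose count stays at or near zero, like `s2` here, is the kind of low-recovery sample the last bullet refers to.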
02-All_paralogs -> 03-Filtered_paralogs
A directory containing all filtered putative paralogs retrieved by the hybpiper paralog_retriever command in HybPiper.
02-All_paralogs/
└── 03-Filtered_paralogs/
    └── <locus_name>_paralogs_all.fasta
02-All_paralogs -> 04-Filtered_paralog_reports_and_heatmap
A directory containing all filtered reports and heatmaps.
02-All_paralogs/
└── 04-Filtered_paralog_reports_and_heatmap/
    ├── Filtered_paralog_heatmap.png
    ├── Filtered_paralog_report.tsv
    ├── Filtered_recovered_seqs_length.tsv
    └── Filtered_recovery_heatmap.html
Filtered_paralog_heatmap.png
A heatmap image file in PNG format, depicting the number of filtered putative paralog sequences for each locus/sample.
Filtered_paralog_report.tsv
A TSV file recording the number of filtered putative paralog sequences for each locus/sample.
Filtered_recovered_seqs_length.tsv
A TSV file recording the length of filtered recovered sequences for each locus/sample.
Filtered_recovery_heatmap.html
An interactive HTML file for visualizing target locus recovery across all filtered paralogs (including both single-copy and multi-copy genes).
The layout is identical to that of Original_recovery_heatmap.html, but it reflects the occupancy of filtered sequences rather than the original ones.
Stage3 output
03-Paralog_handling
03-Paralog_handling/
├── HRS/ (optional)
├── RLWP/ (optional)
├── ParaGone/ (optional)
└── PhyloPyPruner/ (optional)
Different arguments for the option -PH will lead to different subdirectories in this output folder:
- HRS/: created when -PH includes number 1 (the user applies the HRS orthology inference method).
- RLWP/: created when -PH includes number 2 (the user applies the RLWP orthology inference method).
- PhyloPyPruner/: created when -PH includes one or several of the numbers 4, 5, 6, and 7 (using MI/MO/RT/1to1 orthology inference methods) together with “a” (default), or includes number “3” (directly running PhyloPyPruner to carry out the LS method). More details about the interpretation of these orthology inference methods can be found here.
- ParaGone/: created when -PH includes one or several of the numbers 4, 5, 6, and 7 (the user applies MI/MO/RT/1to1 orthology inference methods) together with “b” (running ParaGone rather than PhyloPyPruner). More details about the interpretation of these orthology inference methods can be found here.
For example:
- Using -PH 12 will create HRS and RLWP directories.
- Using -PH 1234b will create HRS, RLWP, and ParaGone directories.
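The mapping from a -PH argument to subdirectories can be sketched as below. The function `paralog_handling_dirs` is hypothetical and is not how HybSuite parses the option. It applies the per-directory rules literally, so an argument containing 3 always lists PhyloPyPruner; where this differs from the examples above, treat the examples as authoritative.

```python
def paralog_handling_dirs(ph: str) -> list[str]:
    """List the 03-Paralog_handling subdirectories implied by a -PH argument."""
    digits = {c for c in ph if c.isdigit()}
    use_paragone = "b" in ph  # "a" (PhyloPyPruner) is the documented default
    tree_based = digits & {"4", "5", "6", "7"}  # MI, MO, RT, 1to1

    dirs = []
    if "1" in digits:
        dirs.append("HRS")
    if "2" in digits:
        dirs.append("RLWP")
    if tree_based and use_paragone:
        dirs.append("ParaGone")
    if "3" in digits or (tree_based and not use_paragone):
        dirs.append("PhyloPyPruner")
    return dirs


print(paralog_handling_dirs("12"))
print(paralog_handling_dirs("124b"))
print(paralog_handling_dirs("1467"))
```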
03-Paralog_handling -> HRS
A directory containing original and filtered HRS sequences, including the recovery heatmap and filtering reports.
03-Paralog_handling/
└── HRS/
    ├── 01-Original_HRS_sequences
        └── <locus_name>.FNA
    ├── 02-Original_HRS_sequences_reports_and_heatmap
        ├── Original_HRS_heatmap.png
        └── Original_HRS_seq_lengths.tsv
    ├── 03-Filtered_HRS_sequences
        └── <locus_name>.FNA
    └── 04-Filtered_HRS_sequences_reports_and_heatmap
        ├── Filtered_HRS_heatmap.png
        ├── Filtered_HRS_seq_lengths.tsv
        ├── Removed_HRS_seqs_with_low_length_info.tsv
        ├── Removed_samples_with_low_locus_coverage_info.tsv
        └── Removed_loci_with_low_sample_coverage_info.tsv
01-Original_HRS_sequences
<locus_name>.FNA
Files with retrieved sequences in FASTA format, produced by hybpiper retrieve_sequences (referred to as HRS sequences in the following).
Notes:
- In the HybSuite pipeline, supercontigs are automatically retrieved, including introns and exons, for downstream analysis. HybSuite doesn’t support retrieving only introns or exons.
- Since the downstream analysis requires DNA sequences, only DNA sequences can be retrieved; protein sequences are not supported for the next stage.
02-Original_HRS_sequences_reports_and_heatmap
Original_HRS_heatmap.png
A heatmap image file in PNG format, produced by plot_recovery_heatmap.py in HybSuite. It depicts the length of the original HRS sequences for each gene and sample, relative to the mean length of the sequences in the target file (default setting; users can customize this by running plot_recovery_heatmap.py).
Original_HRS_seq_lengths.tsv
A TSV file recording all original HRS sequences’ bp length, length ratio relative to the maximum, and mean length of each locus’ sequences in the target file.
03-Filtered_HRS_sequences
<locus_name>.FNA
Files with filtered HRS sequences in FASTA format, produced by hybpiper retrieve_sequences.
04-Filtered_HRS_sequences_reports_and_heatmap
Filtered_HRS_heatmap.png
A heatmap image file in PNG format, produced by plot_recovery_heatmap.py in HybSuite. It depicts the length of the filtered HRS sequences for each gene and sample, relative to the mean length of the sequences in the target file (default setting; users can customize this by running plot_recovery_heatmap.py).
Filtered_HRS_seq_lengths.tsv
A TSV file recording all filtered HRS sequences’ bp length, length ratio relative to the maximum, and mean length of each locus’ sequences in the target file.
Removed_HRS_seqs_with_low_length_info.tsv
A TSV file recording the HRS sequences with low bp length/length ratio that have been filtered out from the dataset.
Removed_samples_with_low_locus_coverage_info.tsv
A TSV file recording the samples with low locus coverage that have been filtered out from the dataset.
Removed_loci_with_low_sample_coverage_info.tsv
A TSV file recording the loci with low sample coverage that have been filtered out from the dataset.
03-Paralog_handling -> RLWP
A directory containing original and filtered RLWP sequences, including the recovery heatmap and filtering reports.
03-Paralog_handling/
└── RLWP/
    ├── 01-Original_RLWP_sequences
        └── <locus_name>.FNA
    ├── 02-Original_RLWP_sequences_reports_and_heatmap
        ├── Original_RLWP_heatmap.png
        └── Original_RLWP_seq_lengths.tsv
    ├── 03-Filtered_RLWP_sequences
        └── <locus_name>.FNA
    └── 04-Filtered_RLWP_sequences_reports_and_heatmap
        ├── Filtered_RLWP_heatmap.png
        ├── Filtered_RLWP_seq_lengths.tsv
        ├── Removed_RLWP_seqs_with_low_length_info.tsv
        ├── Removed_samples_with_low_locus_coverage_info.tsv
        └── Removed_loci_with_low_sample_coverage_info.tsv
01-Original_RLWP_sequences
<locus_name>.FNA
Files with retrieved sequences in FASTA format, produced by hybpiper retrieve_sequences (referred to as RLWP sequences in the following).
Notes:
- In the HybSuite pipeline, supercontigs are automatically retrieved, including introns and exons, for downstream analysis. HybSuite doesn’t support retrieving only introns or exons.
- Since the downstream analysis requires DNA sequences, only DNA sequences can be retrieved; protein sequences are not supported for the next stage.
02-Original_RLWP_sequences_reports_and_heatmap
Original_RLWP_heatmap.png
A heatmap image file in PNG format, produced by plot_recovery_heatmap.py in HybSuite. It depicts the length of the original RLWP sequences for each gene and sample, relative to the mean length of the sequences in the target file (default setting; users can customize this by running plot_recovery_heatmap.py).
Original_RLWP_seq_lengths.tsv
A TSV file recording all original RLWP sequences’ bp length, length ratio relative to the maximum, and mean length of each locus’ sequences in the target file.
03-Filtered_RLWP_sequences
<locus_name>.FNA
Files with filtered RLWP sequences in FASTA format, produced by hybpiper retrieve_sequences.
04-Filtered_RLWP_sequences_reports_and_heatmap
Filtered_RLWP_heatmap.png
A heatmap image file in PNG format, produced by plot_recovery_heatmap.py in HybSuite. It depicts the length of the filtered RLWP sequences for each gene and sample, relative to the mean length of the sequences in the target file (default setting; users can customize this by running plot_recovery_heatmap.py).
Filtered_RLWP_seq_lengths.tsv
A TSV file recording all filtered RLWP sequences’ bp length, length ratio relative to the maximum, and mean length of each locus’ sequences in the target file.
Removed_RLWP_seqs_with_low_length_info.tsv
A TSV file recording the RLWP sequences with low bp length/length ratio that have been filtered out from the dataset.
Removed_samples_with_low_locus_coverage_info.tsv
A TSV file recording the samples with low locus coverage that have been filtered out from the dataset.
Removed_loci_with_low_sample_coverage_info.tsv
A TSV file recording the loci with low sample coverage that have been filtered out from the dataset.
03-Paralog_handling -> ParaGone
03-Paralog_handling/
└── ParaGone/
    ├── 00_logs_and_reports
    ...
    ├── 28_RT_final_alignments_trimmed
    └── HybSuite_1to1_final_alignments
- From 00_logs_and_reports to 28_RT_final_alignments_trimmed: More details about these output folders can be found on this wiki page of ParaGone.
- If the user specifies the -paragone_keep_files option in HybSuite as TRUE, the intermediate folders from 01_input_paralog_fasta to 22_RT_stripped_names will be kept. If the user specifies the -paragone_keep_files option as FALSE, these intermediate folders will be removed.
- HybSuite_1to1_final_alignments: A directory containing orthology group alignments produced via the 1to1 algorithm, which were retrieved from results produced by ParaGone.
03-Paralog_handling -> PhyloPyPruner
03-Paralog_handling/
└── PhyloPyPruner/
    ├── Input
    ├── Output_LS
    ├── Output_MI
    ├── Output_MO
    ├── Output_RT
    └── Output_1to1
Input
A directory containing trimmed alignments of each locus and their gene trees (input files for running PhyloPyPruner).
<locus_name>_paralogs_all.aln.trimmed.fasta
The trimmed alignment of locus <locus_name> from 02-All_paralogs/03-Filtered_paralogs/<locus_name>_paralogs_all.fasta, generated by MAFFT and TrimAl.
<locus_name>_paralogs_all.aln.trimmed.fasta.tre
The gene tree of locus <locus_name>, constructed by FastTree.
Output_<PH>
A directory containing PhyloPyPruner output files for the <PH> algorithm (<PH> is one of LS, MI, MO, RT, 1to1; more details can be found here).
04-Alignments
A directory containing alignments produced by different paralog-handling methods specified by the user. These alignments are then trimmed and filtered in stage 3.
04-Alignments/
└── <PH>/
    └── <ortholog_group_name>.*.aln.fasta
<PH>/<ortholog_group_name>.*.aln.fasta
The alignments inferred via the <PH> paralog-handling method and aligned by MAFFT.
NOTE:
<ortholog_group_name> is the name of the ortholog group inferred by the <PH> algorithm. For example, 4757_1 and 4757_2 are inferred ortholog group names from locus 4757.
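To trace ortholog groups back to their source locus, a sketch like the following can help. It assumes the naming convention shown in the example (`<locus>_<number>`) holds for every group name, which this page does not state explicitly; `group_by_locus` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_locus(ortholog_group_names):
    """Group names such as '4757_1' and '4757_2' under their locus '4757'."""
    groups = defaultdict(list)
    for name in ortholog_group_names:
        locus, sep, index = name.rpartition("_")
        # Names without a numeric suffix are treated as the locus itself.
        if not sep or not index.isdigit():
            locus = name
        groups[locus].append(name)
    return dict(groups)


print(group_by_locus(["4757_1", "4757_2", "5018"]))
```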
05-Trimmed_alignments
A directory containing trimmed alignments inferred via different <PH> paralog-handling methods.
05-Trimmed_alignments/
└── <PH>/
    └── <ortholog_group_name>.*.aln.trimmed.fasta
<PH>/<ortholog_group_name>.*.aln.trimmed.fasta
The alignments inferred via the <PH> paralog-handling method, aligned using MAFFT, and trimmed via TrimAl or cleaned via HMMCleaner.
NOTE:
<ortholog_group_name> is the name of the ortholog group inferred by the <PH> algorithm. For example, 4757_1 and 4757_2 are inferred ortholog group names from locus 4757.
06-Final_alignments
A directory containing final <PH> orthogroup alignments ready for downstream species tree inference.
06-Final_alignments/
└── <PH>/
    └── <ortholog_group_name>.*.aln.trimmed.fasta
<PH>/<ortholog_group_name>.*.aln.trimmed.fasta
Final <PH> orthogroup alignments for downstream species tree inference (stage 4).
Stage4 output
07-Concatenated_analysis
A directory containing concatenated analysis results.
07-Concatenated_analysis/
└── <PH>
    ├── 01-Supermatrix
        ├── partition.txt
        └── <prefix>_<PH>.fasta
    └── 02-Species_tree
        ├── IQTREE
            └── IQ-TREE*
        ├── RAxML
            └── RAxML*
        ├── RAxML-NG
            └── RAxML-NG*
        ├── <prefix>_<PH>_ModelTest_NG.txt.tree
        ├── <prefix>_<PH>_ModelTest_NG.txt.log
        ├── <prefix>_<PH>_ModelTest_NG.txt.out
        └── <prefix>_<PH>_ModelTest_NG.txt.ckp
07-Concatenated_analysis -> <PH> -> 01-Supermatrix
A directory containing the supermatrix concatenated from <PH> orthogroup alignments and the partition file.
<prefix>_<PH>.fasta
The concatenated supermatrix file for orthology groups inferred by the <PH> method.
partition.txt
The partition file for concatenation.
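For readers unfamiliar with how a supermatrix and its partitions relate, here is a minimal sketch of concatenation. It is not HybSuite's implementation, and it does not reproduce the on-disk format of partition.txt, which this page does not specify; the function `concatenate` and the toy alignments are hypothetical. Samples missing from an alignment are padded with gaps, and each partition is reported as 1-based, inclusive column coordinates.

```python
def concatenate(alignments):
    """alignments: {name: {sample: aligned_sequence}} -> (supermatrix, partitions)."""
    samples = sorted({s for aln in alignments.values() for s in aln})
    pieces = {s: [] for s in samples}
    partitions = []
    start = 1
    for name, aln in alignments.items():
        length = len(next(iter(aln.values())))  # all rows in one alignment share a length
        for s in samples:
            pieces[s].append(aln.get(s, "-" * length))  # pad missing samples with gaps
        partitions.append((name, start, start + length - 1))
        start += length
    supermatrix = {s: "".join(parts) for s, parts in pieces.items()}
    return supermatrix, partitions


# Toy alignments keyed by ortholog group name.
toy = {"4757_1": {"A": "ACGT", "B": "AC-T"}, "5018": {"A": "GG"}}
matrix, parts = concatenate(toy)
print(matrix)
print(parts)
```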
07-Concatenated_analysis -> <PH> -> 02-Species_tree
IQ-TREE/
A directory containing IQ-TREE results and final rooted trees (created only when IQ-TREE is applied by setting -sp_tree 1).
IQ-TREE_<prefix>_<PH>.*
IQ-TREE intermediate output files.
IQ-TREE_<prefix>_<PH>.treefile
The tree file with branch lengths and bootstrap values, generated by IQ-TREE.
IQ-TREE_<prefix>_<PH>.rr.tre
The rerooted tree file with branch lengths and bootstrap values from IQ-TREE results.
RAxML/
A directory containing RAxML results and final rooted trees (created only when RAxML is applied by setting -sp_tree 2).
RAxML_*.<prefix>_<PH>.*
RAxML intermediate output files.
RAxML_<prefix>_<PH>.rr.tre
The rerooted tree file with branch lengths and bootstrap values from RAxML results.
RAxML-NG/
A directory containing RAxML-NG results and final rooted trees (created only when RAxML-NG is applied by setting -sp_tree 3).
RAxML-NG_<prefix>_<PH>.raxml.*
RAxML-NG intermediate output files.
RAxML-NG_<prefix>_<PH>.rr.tre
The rerooted tree file with branch lengths and bootstrap values from RAxML-NG results.
07-Concatenated_analysis -> <PH> -> <prefix>_<PH>_ModelTest_NG.txt.*
The output files generated by ModelTest-NG.
08-Coalescent_analysis
A directory containing coalescent analysis results.
08-Coalescent_analysis/
├── <PH>
    ├── 01-Gene_trees
    ├── 02-Combined_gene_trees
    ├── 03-Species_tree
    ├── 04-Rerooted_gene_trees
    └── 05-PhyParts_PieCharts
└── ASTRAL-Pro
    ├── 01-Gene_trees
    ├── 02-Combined_gene_trees
    ├── 03-Species_tree
    └── 04-Rerooted_gene_trees
08-Coalescent_analysis -> <PH>
A directory containing coalescent-based phylogenetic tree results for a specific dataset generated by the <PH> paralog-handling method.
08-Coalescent_analysis -> <PH> -> 01-Gene_trees
A directory containing gene trees inferred from final <PH> alignments.
<ortholog_group_name>.tre: The gene tree for locus/orthogroup <ortholog_group_name>.
NOTE:
<ortholog_group_name> is the name of the ortholog group inferred by the <PH> algorithm. For example, ortholog group names 4757_1 and 4757_2 are inferred from locus 4757.
08-Coalescent_analysis -> <PH> -> 02-Combined_gene_trees
A directory containing combined gene trees generated from <PH> alignments.
Combined_gene_trees.tre: File containing all gene trees combined into a single file.
Combined_gene_trees.tre.collapsed: File containing all gene trees with low-support branches collapsed.
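Combining gene trees amounts to writing one Newick string per line into a single file. The sketch below illustrates that idea only; it is not HybSuite's code, `combine_gene_trees` is a hypothetical helper, and it does not perform the branch collapsing behind Combined_gene_trees.tre.collapsed.

```python
from pathlib import Path


def combine_gene_trees(gene_tree_dir: str, combined_path: str) -> int:
    """Write every non-empty *.tre file in gene_tree_dir to combined_path, one tree per line."""
    count = 0
    with open(combined_path, "w") as out:
        for tree_file in sorted(Path(gene_tree_dir).glob("*.tre")):
            newick = tree_file.read_text().strip()
            if newick:  # skip empty files
                out.write(newick + "\n")
                count += 1
    return count
```

The return value is the number of trees written, which is a convenient sanity check against the number of final alignments.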
08-Coalescent_analysis -> <PH> -> 03-Species_tree
A directory containing species trees inferred from <PH> alignments.
ASTRAL-IV/
A directory containing the final species tree for the <PH> dataset, generated by ASTRAL-IV.
ASTRAL-IV_<prefix>_<PH>.log
The log file generated by ASTRAL-IV.
ASTRAL-IV_<prefix>_<PH>.tre
The species tree inferred by ASTRAL-IV from the combined gene trees.
ASTRAL-IV_<prefix>_<PH>.bootstrap.tre
The species tree generated by ASTRAL-IV and bootstrapped using ASTRAL-III, following the ASTER protocol.
ASTRAL-IV_<prefix>_<PH>.bootstrap.rr.tre
The rerooted species tree generated by ASTRAL-IV and bootstrapped using ASTRAL-III.
ASTRAL-III_LPP.log
The ASTRAL-III log file which documents the bootstrapping process performed by ASTRAL-III.
wASTRAL/
A directory containing the final species tree for the <PH> dataset, generated by wASTRAL.
wASTRAL_<prefix>_<PH>.tre
The species tree inferred by wASTRAL from the combined gene trees.
wASTRAL_<prefix>_<PH>.log
The log file generated by wASTRAL.
wASTRAL_<prefix>_<PH>.rr.tre
The rerooted species tree generated by wASTRAL.
08-Coalescent_analysis -> <PH> -> 04-Rerooted_gene_trees
A directory containing rerooted gene trees from the <PH> dataset.
<ortholog_group_name>.rr.tre
File containing the rerooted gene tree for <ortholog_group_name> alignments in the <PH> dataset, generated using Phyx or the MAD method.
08-Coalescent_analysis -> <PH> -> 05-PhyParts_PieCharts
A directory containing phylogenetic concordance analysis results using rerooted gene trees and species trees.
ASTRAL-IV/
A directory containing ASTRAL-IV species tree conflict assessment using rerooted gene trees from directory 04-Rerooted_gene_trees (created only when users choose to run ASTRAL-IV by setting -sp_tree 4).
ASTRAL_PhyParts.*
Files containing the PhyParts output (more details can be found here).
ASTRAL_PhyPartsPieCharts_<prefix>_<PH>.svg
Visualization of concordance and conflict between gene trees and the species tree, generated by our newly developed modified_phypartspiecharts.py script.
wASTRAL/
A directory containing wASTRAL species tree conflict assessment using rerooted gene trees from directory 04-Rerooted_gene_trees (created only when users choose to run wASTRAL by setting -sp_tree 5).
wASTRAL_<prefix>_<PH>.tre
The species tree inferred by wASTRAL from rerooted <PH> gene trees.
wASTRAL_<prefix>_<PH>_sorted_rr.tre
The final rerooted species tree, rerooted by Phyx and sorted by Newick Utilities.
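The -sp_tree values mentioned in the Stage4 sections (1 to 5) map onto result directories as sketched below. The lookup table and the function `species_tree_dir` are hypothetical; they only restate the directory names and option values given in the descriptions above and make no claim about how HybSuite accepts combined values.

```python
# -sp_tree value -> (top-level directory, species tree subdirectory, tool directory)
SP_TREE_OUTPUT = {
    1: ("07-Concatenated_analysis", "02-Species_tree", "IQ-TREE"),
    2: ("07-Concatenated_analysis", "02-Species_tree", "RAxML"),
    3: ("07-Concatenated_analysis", "02-Species_tree", "RAxML-NG"),
    4: ("08-Coalescent_analysis", "03-Species_tree", "ASTRAL-IV"),
    5: ("08-Coalescent_analysis", "03-Species_tree", "wASTRAL"),
}


def species_tree_dir(sp_tree: int, ph: str) -> str:
    """Return the relative directory holding species tree results for one method."""
    top, stage_dir, tool = SP_TREE_OUTPUT[sp_tree]
    return f"{top}/{ph}/{stage_dir}/{tool}"


print(species_tree_dir(1, "MO"))
print(species_tree_dir(5, "HRS"))
```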
Comprehensive output
hybsuite_logs
A directory containing the comprehensive log file generated by HybSuite.
hybsuite_logs/
└── hybsuite_<current_time>.log
hybsuite_<current_time>.log
The log file produced when running the HybSuite pipeline (running the extension tools will not produce this logfile).
hybsuite_checklists
A directory containing checklist files, including species checklists and locus checklists.
hybsuite_checklists/
├── All_Spname_list.txt
├── My_Spname.txt
├── Outgroup.txt
├── Pre-assembled_Spname.txt
├── Public_Spname.txt
├── Public_Spname_SRR.txt
├── Recovered_locus_num_for_samples.tsv
├── Recovered_sample_num_for_loci.tsv
└── Ref_gene_name_list.txt
All_Spname_list.txt
A file containing all sample names from your research.
My_Spname.txt
A file containing all sample names for user-provided raw data in your research.
Outgroup.txt
A file containing all outgroup taxa specified by the user.
Pre-assembled_Spname.txt
A file containing the names of all pre-assembled samples specified by the user.
Public_Spname.txt
A file containing all sample names whose Next Generation Sequencing (NGS) raw data was downloaded from NCBI.
Public_Spname_SRR.txt
A file containing all Sequence Read Archive (SRA) IDs used to download NGS raw data from NCBI. These SRA IDs correspond to the sample names listed in the Public_Spname.txt file.
Recovered_locus_num_for_samples.tsv
A file containing the number of loci recovered by HybPiper for each sample.
Recovered_sample_num_for_loci.tsv
A file containing the number of samples in which HybPiper recovered each locus.
Ref_gene_name_list.txt
A file containing the names of all genes in the target sequences (specified by the -t option).
hybsuite_reports
A directory containing comprehensive statistical summaries of the results generated by the pipeline.
hybsuite_reports/
├── Alignments_stats
    ├── <PH>-01_Alignments_stats_AMAS.tsv
    ├── <PH>-02_Trimmed_alignments_stats_AMAS.tsv
    ├── <PH>-03_Removed_alignments_without_parsimony_informative_sites.txt
    ├── <PH>-04_Removed_alignments_with_length_less_than_4.txt
    ├── <PH>-05_Removed_alignments_with_sample_number_less_than_5.txt
    ├── <PH>-06_Final_alignments_list.txt
    └── <PH>-07_Final_alignments_stats_AMAS.tsv
└── Supermatrix_stats
    └── <PH>-Supermatrix_stats_AMAS.tsv
hybsuite_reports -> Alignments_stats
<PH>-01_Alignments_stats_AMAS.tsv
Summary table of orthogroup alignments inferred via the <PH> paralog-handling method (generated by AMAS.py).
<PH>-02_Trimmed_alignments_stats_AMAS.tsv
Summary table of trimmed orthogroup alignments inferred via the <PH> paralog-handling method (generated by AMAS.py).
<PH>-03_Removed_alignments_without_parsimony_informative_sites.txt
List of alignments without parsimony-informative sites. These alignments are excluded from downstream species tree inference.
<PH>-04_Removed_alignments_with_length_less_than_4.txt
List of alignments shorter than 4 bp. These alignments are excluded from downstream species tree inference.
<PH>-05_Removed_alignments_with_sample_number_less_than_5.txt
List of alignments with fewer than 5 samples. These alignments are excluded from downstream species tree inference.
<PH>-06_Final_alignments_list.txt
List of final <PH> alignments selected for downstream species tree inference.
<PH>-07_Final_alignments_stats_AMAS.tsv
Summary table of final <PH> alignments for downstream species tree inference (generated by AMAS.py).
Filtering process: Alignments without parsimony-informative sites, with a short bp length, or with a low sample number are removed.
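The three filters can be sketched as follows. This is an illustration, not the code HybSuite or AMAS.py runs: the function names are hypothetical, the thresholds (4 bp, 5 samples) are taken from the file names above, and the order of the checks simply follows the file numbering (03, 04, 05). A site counts as parsimony-informative when at least two different nucleotides each occur in at least two sequences.

```python
from collections import Counter


def parsimony_informative_sites(seqs):
    """Count columns where at least two nucleotides each occur in two or more sequences."""
    count = 0
    for column in zip(*seqs):
        counts = Counter(c.upper() for c in column if c.upper() in "ACGT")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            count += 1
    return count


def removal_reason(alignment, min_length=4, min_samples=5):
    """Return why an alignment {sample: sequence} would be removed, or None if kept."""
    seqs = list(alignment.values())
    length = len(seqs[0]) if seqs else 0
    if parsimony_informative_sites(seqs) == 0:
        return "no parsimony-informative sites"
    if length < min_length:
        return f"length less than {min_length}"
    if len(seqs) < min_samples:
        return f"sample number less than {min_samples}"
    return None


# Toy alignment: column 2 has A x3 and T x2, so it is parsimony-informative.
kept = {"a": "AACG", "b": "AACG", "c": "ATCG", "d": "ATCG", "e": "AACG"}
print(removal_reason(kept))
```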
hybsuite_reports -> Supermatrix_stats
<PH>-Supermatrix_stats_AMAS.tsv
Summary table of the final <PH> supermatrix for downstream concatenation-based species tree inference (generated by AMAS.py).