Example dataset

This page provides detailed instructions on how to run the example dataset included with HybSuite.


1. Download the example dataset

If you have downloaded the HybSuite source package, a directory named example_dataset is already included. In this case, no additional download is required.

Otherwise, clone the repository onto your server and enter the example directory:

git clone https://github.com/Yuxuanliu-HZAU/HybSuite
cd HybSuite/example_dataset

2. Configure inputs

The example_dataset directory contains two folders, Angiosperms353 and Arabidopsis100. Each holds all the inputs needed to run the HybSuite pipeline on the corresponding example dataset from our analysis.

Example dataset 1: Angiosperms353

Angiosperms353/
├── Input_list.txt
├── Target_file_Angiosperms353.fasta
└── Input_sequences/
    ├── Elaeagnus_pungens.fasta
    └── Hippophae_rhamnoides.fasta

Input_list.txt

This file lists taxon names and their corresponding sequence sources (given in the second column; columns are separated by tabs):

Elaeagnus_angustifolia	SRR12569928
Elaeagnus_bambusetorum	SRR27547630
Elaeagnus_henryi	SRR15533155
Elaeagnus_macrophylla	SRR23618743
Elaeagnus_mollis	SRR30566771
Hippophae_neurocarpa	SRR17549374
Hippophae_salicifolia	ERR7621632
Hippophae_tibetana	SRR17549370
Shepherdia_argentea	ERR7621633
Barbeya_oleoides	SRR16214280	Outgroup
Elaeagnus_oldhamii	A
Elaeagnus_pungens	B
Hippophae_rhamnoides	B
  • Identifiers prefixed with SRR or ERR: public raw NGS data for the sample named in the first column, to be downloaded by the HybSuite pipeline.
  • Identifier A: user-provided raw NGS data for the sample named in the first column, supplied as input to the HybSuite pipeline.
  • Identifier B: user-provided pre-assembled sequences for the sample named in the first column, supplied as input to the HybSuite pipeline.
  • Identifier Outgroup (third column): marks the taxon as the outgroup.
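
The identifier rules above can be checked before a run with a small shell helper. This is only a sketch: classify_input_list is a hypothetical function name, not part of HybSuite, and it assumes the list is tab-separated exactly as shown, with the optional Outgroup tag in the third column.

```shell
# Hypothetical helper (not part of HybSuite): report how each taxon in a
# tab-separated Input_list.txt will be sourced, based on its second column.
classify_input_list() {
    awk -F'\t' '
        NF == 0 { next }                              # skip blank lines
        {
            if ($2 ~ /^(SRR|ERR)/) kind = "public"         # public accession
            else if ($2 == "A")    kind = "raw_reads"      # user-provided reads
            else if ($2 == "B")    kind = "pre_assembled"  # user-provided FASTA
            else                   kind = "unknown"
            tag = ($3 == "Outgroup") ? "\toutgroup" : ""
            print $1 "\t" kind tag
        }
    ' "$1"
}
```

Calling classify_input_list ./Angiosperms353/Input_list.txt prints one line per taxon; any row reported as unknown is worth inspecting for stray spaces or a misspelled identifier.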

Input_sequences

This directory should contain either user-provided raw reads, pre-assembled sequences, or both, according to the information provided in Input_list.txt.

  • type1: user-provided raw reads
    In our analysis, only Elaeagnus oldhamii relies on user-provided raw reads, which must be downloaded here before running the HybSuite pipeline. After downloading the raw data, convert them to FASTQ.GZ format and move the files into this directory. The two paired-end files should be named:
Elaeagnus_oldhamii_1.fastq.gz
Elaeagnus_oldhamii_2.fastq.gz
  • type2: pre-assembled sequences
    Two taxa with pre-assembled sequences are provided: Elaeagnus_pungens and Hippophae_rhamnoides (the taxon names marked with identifier B in Input_list.txt). Their FASTA files are named Elaeagnus_pungens.fasta and Hippophae_rhamnoides.fasta respectively, following the pattern <taxon>.fasta.
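
The naming conventions above lend themselves to a quick pre-flight check. The sketch below is not part of HybSuite: check_input_sequences is a hypothetical helper that assumes a tab-separated list, taxon names without spaces, and the file name patterns <taxon>_1.fastq.gz, <taxon>_2.fastq.gz and <taxon>.fasta described in this section.

```shell
# Hypothetical pre-flight check (not part of HybSuite).
# Usage: check_input_sequences <Input_list.txt> <Input_sequences directory>
# Prints one "missing: <path>" line per absent file; returns 1 if any is absent.
check_input_sequences() {
    list=$1
    dir=$2
    missing=0
    # Read taxon and identifier from the tab-separated list; the trailing
    # test keeps a final line that lacks a newline.
    while IFS=$(printf '\t') read -r taxon source _rest || [ -n "$taxon" ]; do
        case $source in
            A) files="${taxon}_1.fastq.gz ${taxon}_2.fastq.gz" ;;  # raw reads
            B) files="${taxon}.fasta" ;;                          # pre-assembled
            *) continue ;;                                        # SRR/ERR etc.
        esac
        for f in $files; do
            if [ ! -f "$dir/$f" ]; then
                echo "missing: $dir/$f"
                missing=1
            fi
        done
    done < "$list"
    return "$missing"
}
```

For the first example dataset this could be invoked as check_input_sequences ./Angiosperms353/Input_list.txt ./Angiosperms353/Input_sequences. Rows with SRR or ERR accessions are skipped because those data are not expected to be present locally.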

Target_file_Angiosperms353.fasta

This file is the target sequence file for Angiosperms353.
The gene name of each sequence must appear immediately after the final hyphen (-) in its header line:

>Elaeagnus-pungens-4471
AATGTCATCCAGGATAAATATCGGTTGGAAGCTGCAAATACTGACTGGATGAACAAGTAC
AAAGGCTCTAGTAAGCTTCTATTGCATCCAAGGAACACTGAGGAGGTTTCACAGATACTC
...
>Hippophae-rhamnoides-4527
GAAGAGAGGGTTGTAGTATTAGTGATTGGTGGAGGAGGAAGAGAACATGCTCTTTGCTAT
GCAATGAATCGATCACCATCCTGCGATGCAGTCTTTTGTGCTCCTGGCAATGCTGGGATT
...
>Hippophae-salicifolia-4691
CAGAGACTGCCTCCATTGTCAACTGATCCCAACAGATGCGAGCGTGCATTTGTTGGAAAC
ACGATAGGTCAAGCAAATGGTGTGTACGACAAGCCAATCGATCTCCGATTCTGTGATTAC
...
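
To see which gene names a target file actually declares under this convention, the header lines can be reduced to the text after their final hyphen. This is a minimal sketch using standard sed and sort; list_target_genes is a hypothetical helper name, not a HybSuite command.

```shell
# Hypothetical helper (not part of HybSuite): list the unique gene names in a
# target FASTA file, taking the text after the final hyphen of each header.
list_target_genes() {
    # The greedy ".*-" consumes everything up to the last hyphen on a header
    # line. Sequence lines and headers without a hyphen are not printed.
    sed -n 's/^>.*-//p' "$1" | sort -u
}
```

Running list_target_genes ./Angiosperms353/Target_file_Angiosperms353.fasta prints one gene name per line. A header that contains no hyphen is silently skipped, so comparing the output against the number of expected loci is a reasonable sanity check.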

Example dataset 2: Arabidopsis100

Arabidopsis100/
├── Input_list.txt
└── Target_file_Arabidopsis100.fasta

Input_list.txt

This file lists taxon names and their corresponding sequence sources (given in the second column; columns are separated by tabs):

Elaeagnus angustifolia	SRR26705271
Elaeagnus bambusetorum	SRR26757993
Elaeagnus henryi	SRR26705270
Elaeagnus macrophylla	SRR26753865
Elaeagnus mollis	SRR26758012
Elaeagnus oldhamii	SRR26705501
Elaeagnus pungens	SRR26705285
Hippophae neurocarpa	SRR26705287
Hippophae rhamnoides	SRR26756417
Hippophae salicifolia	SRR26705274
Hippophae tibetana	SRR26704952
Shepherdia argentea	SRR26756705
Barbeya_oleoides	SRR26756183	Outgroup

Target_file_Arabidopsis_thaliana100.fasta

This file is the target sequence file for Arabidopsis100.
The gene name of each sequence must appear immediately after the final hyphen (-) in its header line:

>Locus-1
MAFRRVLTTVILFCYLLISSQSIEFKNSQKPHKIQGPIKTIVVVVMENRSFDHILGWLKSTRPEIDGLTGKESNPLNVSDPNSKKIFVSDDAVFVDMDPGHSFQAIREQIFGSNDTSGDPKMNGFAQQSESMEPGMAKNVMSGFKPEVLPVYTELANEFGVFDRWFASVPTSTQPNRFYVHSATSHGCSSNVKKDLVKGFPQKTIFDSLDENGLSFGIYYQNIPATFFFKSLRRLKHLVKFHSYALKFKLDAKLGKLPNYSVVEQRYFDIDLFPANDDHPSHDVAAGQRFVKEVYETLRSSPQWKEMALLITYDEHGGFYDHVPTPVKGVPNPDGIIGPDPFYFGFDRLGVRVPTFLISPWIEKGTVIHEPEGPTPHSQFEHSSIPATVKKLFNLKSHFLTKRDAWAGTFEKYFRIRDSPRQDCPEKLPEVKLSLRPWGAKEDSKLSEFQVELIQLASQLVGDHLLNSYPDIGKNMTVSEGNKYAEDAVQKFLEAGMAALEAGADENTIVTMRPSLTTRTSPSEGTNKYIGSY*
>Locus-2
MSDQQLETEINFWGETSEEDYFNLKGIIGSKSFFTSPRGLNLFTRSWLPSSSSPPRGLIFMVHGYGNDVSWTFQSTPIFLAQMGFACFALDIEGHGRSDGVRAYVPSVDLVVDDIISFFNSIKQNPKFQGLPRFLFGESMGGAICLLIQFADPLGFDGAVLVAPMCKISDKVRPKWPVDQFLIMISRFLPTWAIVPTEDLLEKSIKVEEKKPIAKRNPMRYNEKPRLGTVMELLRVTDYLGKKLKDVSIPFIIVHGSADAVTDPEVSRELYEHAKSKDKTLKIYDGMMHSMLFGEPDDNIEIVRKDIVSWLNDRCGGDKTKTQV*
>Locus-3
MSSRENPSGICKSIPKLISSFVDTFVDYSVSGIFLPQDPSSQNEILQTRFEKPERLVAIGDLHGDLEKSREAFKIAGLIDSSDRWTGGSTMVVQVGDVLDRGGEELKILYFLEKLKREAERAGGKILTMNGNHEIMNIEGDFRYVTKKGLEEFQIWADWYCLGNKMKTLCSGLDKPKDPYEGIPMSFPRMRADCFEGIRARIAALRPDGPIAKRFLTKNQTVAVVGDSVFVHGGLLAEHIEYGLERINEEVRGWINGFKGGRYAPAYCRGGNSVVWLRKFSEEMAHKCDCAALEHALSTIPGVKRMIMGHTIQDAGINGVCNDKAIRIDVGMSKGCADGLPEVLEIRRDSGVRIVTSNPLYKENLYSHVAPDSKTGLGLLVPVPKQVEVKA*

3. Run the pipeline

First, change your working directory to the downloaded example_dataset directory:

cd <the path to the directory of "example_dataset">

Next, create output directories (or specify an existing directory when running HybSuite):

mkdir -p ./Angiosperms353/Output ./Arabidopsis100/Output

With the working directory set, run the following commands for the two example datasets:

Angiosperms353

hybsuite full_pipeline \
-input_list ./Angiosperms353/Input_list.txt \
-input_data ./Angiosperms353/Input_sequences \
-output_dir ./Angiosperms353/Output \
-nt 5 \
-process 5 \
-t ./Angiosperms353/Target_file_Angiosperms353.fasta \
-seqs_min_length 100 \
-seqs_min_sample_coverage 0.1 \
-PH 1234567 \
-sp_tree 14

Arabidopsis100

hybsuite full_pipeline \
-input_list ./Arabidopsis100/Input_list.txt \
-output_dir ./Arabidopsis100/Output \
-nt 5 \
-process 5 \
-t ./Arabidopsis100/Target_file_Arabidopsis_thaliana100.fasta \
-seqs_min_length 100 \
-seqs_min_sample_coverage 0.1 \
-PH 1234567 \
-sp_tree 14

Last modified March 5, 2026: Update plotly.html (84cc3e0)