Sequence Assembly
This step assembles short reads into longer contiguous sequences (contigs) for taxonomic and functional profiling. Assembling reads before classification has been found to improve the performance of taxonomic assignment (Tran and Phan 2020).
We will use MEGAHIT (Li et al. 2015) for assembly. The tool is memory efficient, which makes it a good choice for larger datasets. The following command assembles a single paired-end sample.
megahit -1 sample_R1.fastq -2 sample_R2.fastq -o megahit_output_dir
Here, sample_R1.fastq and sample_R2.fastq are the forward and reverse reads. The results are stored in megahit_output_dir, which is specified using the -o flag.
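When there are several samples, the same command can be wrapped in a small loop. The sketch below is only an illustration: the function name assemble_sample, the DRY_RUN variable, the <sample>_R1.fastq / <sample>_R2.fastq naming convention, and the <sample>_megahit output directory names are assumptions made for this example, not part of MEGAHIT. By default it only prints the commands it would run.

```shell
# Sketch: run MEGAHIT once per paired-end sample in the current directory.
# Assumed (hypothetical) naming convention: <sample>_R1.fastq and <sample>_R2.fastq.

# Print the MEGAHIT command for one sample; execute it only when DRY_RUN=0.
assemble_sample() {
    sample="$1"
    echo "megahit -1 ${sample}_R1.fastq -2 ${sample}_R2.fastq -o ${sample}_megahit"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        megahit -1 "${sample}_R1.fastq" -2 "${sample}_R2.fastq" -o "${sample}_megahit"
    fi
}

for r1 in *_R1.fastq; do
    # If no file matches, the pattern is left as-is; skip it.
    [ -e "$r1" ] || continue
    # Strip the _R1.fastq suffix to recover the sample name.
    assemble_sample "${r1%_R1.fastq}"
done
```

Setting DRY_RUN=0 before running the script executes the commands instead of only printing them. MEGAHIT writes the assembled contigs to final.contigs.fa inside each output directory, and it stops with an error if the output directory already exists, so each sample needs its own fresh directory name.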
References
Li, Dinghua, Chi-Man Liu, Ruibang Luo, Kunihiko Sadakane, and Tak-Wah Lam. 2015. “MEGAHIT: An Ultra-Fast Single-Node Solution for Large and Complex Metagenomics Assembly via Succinct de Bruijn Graph.” Bioinformatics 31 (10): 1674–76.
Tran, Quang, and Vinhthuy Phan. 2020. “Assembling Reads Improves Taxonomic Classification of Species.” Genes 11 (8): 946.