Host DNA Removal
In this step, we remove human (host) reads from the raw sequencing data so that the downstream analysis of the gut microbiome is not confounded by host DNA. For this task, we use bowtie2 (Langmead and Salzberg 2012) and samtools (Li 2009).
Removing human reads is a multi-step process, described below.
Download human genome index files for bowtie2
The first step is to download prebuilt bowtie2 indices for the human genome. Indices for several human genome builds are available; for this tutorial, we use the hg19 human genome (Thomas et al. 2019).
The commands below download the zipped archive and extract the hg19 bowtie2 index files.
wget https://genome-idx.s3.amazonaws.com/bt/hg19.zip
unzip hg19.zip
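Before aligning, it can help to confirm that the archive unpacked completely. The sketch below assumes, consistent with the hg19/hg19 prefix passed to bowtie2 -x in the alignment step, that the index consists of the six standard files hg19/hg19.1.bt2 through hg19/hg19.rev.2.bt2. The helper name check_bt2_index is hypothetical.

```shell
#!/usr/bin/env bash
# check_bt2_index PREFIX  (hypothetical helper)
# Succeeds only if all six standard bowtie2 index files exist and are non-empty.
# Assumes a small (.bt2) index; indexes built as "large" use .bt2l instead.
check_bt2_index() {
  local prefix=$1 ext missing=0
  for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [[ ! -s "${prefix}.${ext}" ]]; then
      echo "missing or empty: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, using the prefix later passed to bowtie2 -x:
# check_bt2_index hg19/hg19 && echo "hg19 index looks complete"
```

The loop reports every missing file rather than stopping at the first one, so a partially extracted archive is easy to diagnose.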
Remove human reads
This step consists of three tasks:
- Aligning the raw reads to the human genome,
- Removing the mapped reads (keeping only the unmapped ones), and
- Converting the resulting file back to FASTQ format.
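As a sketch, the three tasks listed above can also be chained in a single pipeline that avoids writing the intermediate SAM and BAM files to disk. It uses the same bowtie2 and samtools options as the individual commands; the function name remove_host_reads and the log file name are hypothetical, and the samtools view -u flag (uncompressed BAM, cheaper to pass between pipeline stages) replaces -b.

```shell
#!/usr/bin/env bash
# remove_host_reads INDEX_PREFIX READS_1 READS_2 OUT_PREFIX  (hypothetical helper)
# The function body runs in a subshell "( ... )" so that `set -o pipefail`
# does not leak into the calling shell.
remove_host_reads() (
  if [[ $# -ne 4 ]]; then
    echo "usage: remove_host_reads INDEX_PREFIX READS_1 READS_2 OUT_PREFIX" >&2
    exit 2
  fi
  index=$1 r1=$2 r2=$3 out=$4
  set -o pipefail
  # bowtie2 writes SAM to stdout when -S is omitted; its summary goes to stderr
  bowtie2 -x "$index" -1 "$r1" -2 "$r2" --very-sensitive 2> "${out}.bowtie2.log" \
    | samtools view -u -f 12 -F 256 - \
    | samtools fastq -1 "${out}_1.fastq" -2 "${out}_2.fastq" \
        -0 "${out}_singletons.fastq" -
)

# Example (not run here), producing unmapped_1.fastq and unmapped_2.fastq:
# remove_host_reads hg19/hg19 reads_1.fastq.gz reads_2.fastq.gz unmapped
```

With pipefail set, a failure in bowtie2 or either samtools stage makes the whole function return a non-zero status. On multi-core machines, bowtie2's -p option can be added to the alignment command to use several threads.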
The first task is performed by the following command:
# Align paired-end reads to the hg19 index; the SAM output contains both mapped and unmapped reads
bowtie2 -x hg19/hg19 -1 reads_1.fastq.gz -2 reads_2.fastq.gz -S mapped.sam --very-sensitive
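bowtie2 prints an alignment summary on stderr that ends with a line such as "12.34% overall alignment rate"; for this step, that rate gives a rough indication of how much of the sample was human. The sketch below assumes the summary has been redirected to a file (the name bowtie2.log is illustrative) and extracts the percentage; the helper name overall_alignment_rate is hypothetical.

```shell
#!/usr/bin/env bash
# overall_alignment_rate LOG  (hypothetical helper)
# Print the percentage from bowtie2's final summary line, e.g.
# "12.34% overall alignment rate" -> 12.34
overall_alignment_rate() {
  awk '/overall alignment rate/ { sub(/%$/, "", $1); print $1 }' "$1"
}

# Example (not run here): capture the summary bowtie2 prints on stderr
# bowtie2 -x hg19/hg19 -1 reads_1.fastq.gz -2 reads_2.fastq.gz \
#   -S mapped.sam --very-sensitive 2> bowtie2.log
# overall_alignment_rate bowtie2.log
```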
The second task, which removes the mapped reads, is performed by the following commands:
# Convert the SAM file into a BAM file
samtools view -b mapped.sam > mapped.bam
# Keep only pairs in which both the read and its mate are unmapped (-f 12),
# excluding secondary alignments (-F 256)
samtools view -b -f 12 -F 256 mapped.bam > unmapped.bam
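To see what the flag filter keeps, the sketch below re-implements the test that samtools view -f 12 -F 256 applies to a single SAM FLAG value using bash bit arithmetic. The helper name passes_filter is hypothetical. Flags 77 and 141 (the two mates of a pair where neither mate aligned) pass, while 73 and 133 (only one mate aligned) and 99 (an aligned, properly paired read) are removed.

```shell
#!/usr/bin/env bash
# passes_filter FLAG  (hypothetical helper)
# Mirrors the per-record test of `samtools view -f 12 -F 256`:
#   -f 12  : both bit 4 (read unmapped) and bit 8 (mate unmapped) must be set
#   -F 256 : bit 256 (secondary alignment) must not be set
passes_filter() {
  local flag=$1
  (( (flag & 12) == 12 && (flag & 256) == 0 ))
}

# Classify a few typical flags from paired-end bowtie2 output
for f in 77 141 73 133 99; do
  if passes_filter "$f"; then
    echo "$f kept"
  else
    echo "$f removed"
  fi
done
```

Because both bits must be set, a pair in which only one mate maps to the human genome is discarded entirely, which is the conservative choice when the goal is to remove host DNA.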
The third task is performed by the following command:
# Convert the BAM file back to FASTQ: first mates go to -1, second mates to -2,
# and reads flagged as neither (or both) mates go to -0
samtools fastq -1 unmapped_1.fastq -2 unmapped_2.fastq -0 unmapped_singletons.fastq unmapped.bam
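A quick sanity check on the output is that both mate files contain the same number of reads. The sketch below assumes 4-line FASTQ records (header, sequence, "+", qualities), which is how samtools fastq writes them; the helper names fastq_count and check_pairs are hypothetical.

```shell
#!/usr/bin/env bash
# fastq_count FILE  (hypothetical helper)
# Number of records, assuming every FASTQ record spans exactly 4 lines
fastq_count() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# check_pairs R1 R2  (hypothetical helper)
# Report both counts and succeed only if they match
check_pairs() {
  local n1 n2
  n1=$(fastq_count "$1")
  n2=$(fastq_count "$2")
  echo "$1: $n1 reads; $2: $n2 reads"
  [[ $n1 -eq $n2 ]]
}

# Example (not run here):
# check_pairs unmapped_1.fastq unmapped_2.fastq || echo "mate files differ" >&2
```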