Host DNA Removal

In this step, we remove human (host) reads from the raw sequences so that host DNA does not contaminate the downstream analysis of the gut microbiome. We use two tools for this task: bowtie2 (Langmead and Salzberg 2012) and samtools (Li 2009).

Removing human reads is a multi-step process. The steps are described below.

Download human genome index files for bowtie

The first step is to download a bowtie2 index for the human genome. Several prebuilt human genome indices are available. For this tutorial, we will use the hg19 human genome (Thomas et al. 2019).

The commands below download the zipped index and extract the human genome index files.


wget https://genome-idx.s3.amazonaws.com/bt/hg19.zip
unzip hg19.zip
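Before aligning, it can be worth confirming that the index was extracted completely. The sketch below is a hypothetical helper, not part of bowtie2. It assumes the archive holds a standard (non-large) Bowtie 2 index, which consists of six files ending in .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2, and that they sit under the prefix hg19/hg19 used later in this tutorial.

```shell
# Hypothetical helper: verify that a Bowtie 2 index prefix is complete.
# Assumes a standard (non-large) index made of six .bt2 files.
check_bt2_index() {
    prefix=$1
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        # -s is true only when the file exists and is not empty
        if [ ! -s "${prefix}.${suffix}" ]; then
            echo "missing or empty index file: ${prefix}.${suffix}" >&2
            return 1
        fi
    done
    echo "index looks complete: ${prefix}"
}

# Usage (assumes unzip created hg19/hg19.*.bt2):
# check_bt2_index hg19/hg19
```

If the check fails, the download was probably interrupted or the archive was extracted to a different location than assumed here.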

Remove human genome

This step consists of three tasks:

  1. Aligning raw sequences to the human genome.
  2. Removing mapped sequences.
  3. Converting the resulting files back to fastq format.

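The first task is performed with the following command. Here -x gives the index prefix, -1 and -2 give the paired-end read files, -S names the SAM output file, and --very-sensitive selects the preset that favours sensitivity over speed.

bowtie2 -x hg19/hg19 -1 reads_1.fastq.gz -2 reads_2.fastq.gz -S mapped.sam --very-sensitive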
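When several samples need the same treatment, the mate-2 file and the output name can be derived from the mate-1 file name. The following is a minimal dry-run sketch that only prints the command. The function name print_align_cmd, the output name <sample>.mapped.sam and the naming scheme <sample>_1.fastq.gz / <sample>_2.fastq.gz are assumptions made for illustration.

```shell
# Hypothetical helper: print the alignment command for one sample without running it.
# Assumes mates are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
print_align_cmd() {
    r1=$1
    sample=${r1%_1.fastq.gz}      # strip the mate-1 suffix to get the sample prefix
    r2=${sample}_2.fastq.gz       # rebuild the mate-2 file name from the prefix
    echo "bowtie2 -x hg19/hg19 -1 $r1 -2 $r2 -S ${sample}.mapped.sam --very-sensitive"
}

print_align_cmd reads_1.fastq.gz
```

Printing the command first makes it easy to check the derived file names before any long-running alignment is started.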

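The second task, removing mapped sequences, is performed with the following commands. The filter -f 12 keeps only pairs in which both the read and its mate are unmapped, and -F 256 excludes secondary alignments.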


# Convert sam file into bam file
samtools view -Sb mapped.sam > mapped.bam

# Extract unmapped sequences
samtools view -b -f 12 -F 256 mapped.bam > unmapped.bam
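To make the flag arithmetic concrete, the sketch below applies the same test to individual SAM FLAG values in plain shell. The function passes_filter is a hypothetical illustration, not a samtools feature. The bit values (4 = read unmapped, 8 = mate unmapped, 256 = secondary alignment) come from the SAM specification.

```shell
# Hypothetical helper mirroring `samtools view -f 12 -F 256` for one FLAG value.
#   4   = read unmapped
#   8   = mate unmapped
#   256 = secondary (not primary) alignment
passes_filter() {
    flag=$1
    # -f 12: both bits of 12 (4 + 8) must be set
    [ $(( flag & 12 )) -eq 12 ] || return 1
    # -F 256: bit 256 must not be set
    [ $(( flag & 256 )) -eq 0 ] || return 1
    return 0
}

# 77 and 141: both mates unmapped; 99: properly mapped pair; 73: only the mate is unmapped
for flag in 77 141 99 73; do
    if passes_filter "$flag"; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
```

Only 77 and 141 are kept, so a pair survives the filter only when neither mate aligned to the human genome.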

The third task is performed with the following command:

# Convert bam file back to fastq format
# (options are placed before the input file so that they are parsed on every platform)
samtools fastq -1 unmapped_1.fastq -2 unmapped_2.fastq -0 unmapped_singletons.fastq unmapped.bam
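A quick sanity check after the conversion is to confirm that both mate files contain the same number of reads. The helpers below are a minimal sketch. The names count_fastq_reads and check_pairs are hypothetical, and the code assumes uncompressed FASTQ files with exactly four lines per record.

```shell
# Hypothetical helper: count records in an uncompressed FASTQ file,
# assuming the standard four-lines-per-record layout.
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical helper: succeed only when both mate files hold the same number of reads.
check_pairs() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "both files contain $n1 reads"
}

# Usage with the file names produced above:
# check_pairs unmapped_1.fastq unmapped_2.fastq
```

A mismatch would suggest that the pairs were split during filtering or conversion, which most paired-end tools downstream will not accept.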

References

Langmead, Ben, and Steven L Salzberg. 2012. “Fast Gapped-Read Alignment with Bowtie 2.” Nature Methods 9 (4): 357–59.
Li, Heng, et al. 2009. “The Sequence Alignment/Map Format and SAMtools.” Bioinformatics 25 (16): 2078–79.
Thomas, Andrew Maltez, Paolo Manghi, Francesco Asnicar, Edoardo Pasolli, Federica Armanini, Moreno Zolfo, Francesco Beghini, et al. 2019. “Metagenomic Analysis of Colorectal Cancer Datasets Identifies Cross-Cohort Microbial Diagnostic Signatures and a Link with Choline Degradation.” Nature Medicine 25 (4): 667–78.