Loading data in QIIME

Next, we import our dataset into QIIME2 for further processing. QIIME2 supports several input formats, each with its own import procedure, so it helps to know which format your data is in before importing.

More information is available in the QIIME2 documentation on importing data.

In the previous step, we filtered the reads with the Trimmomatic tool. That step produced new sequence files whose names end in suffixes such as _L001_1P.fastq.gz (forward) and _L001_2P.fastq.gz (reverse). We will now create a manifest file containing the file paths of the forward and reverse read files for each sample.

Create manifest file

To automate generation of the manifest file, we will use the following shell script.

create_manifest.sh

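#!/usr/bin/env bash
# Author: Pankaj Chejara
# Script to create a manifest file for QIIME2

samples=()

# Collect the unique sample prefixes from the raw read files.
for filename in ./raw_files/*.fastq.gz; do
    base=$(basename "$filename" .fastq.gz)
    # Drop the last 7 characters (assumed to be the read suffix, e.g. _R1_001).
    nf=$(echo "$base" | sed -e 's/.......$//')
    # Add the prefix only if it is not already listed (exact, whole-line match).
    if ! printf '%s\n' "${samples[@]}" | grep -qxF -- "$nf"; then
        samples+=("$nf")
    fi
done

# Header row; the V2 manifest format must be tab-separated.
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > manifest.csv

for sample in "${samples[@]}"; do
    forward="$PWD/trimm_outputs/${sample}_1P.fastq.gz"
    reverse="$PWD/trimm_outputs/${sample}_2P.fastq.gz"
    # The sample id is everything before the first underscore.
    sampleid=$(echo "$sample" | sed -e 's/_.*//')
    printf '%s\t%s\t%s\n' "$sampleid" "$forward" "$reverse" >> manifest.csv
done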

This script generates a manifest file with one tab-separated row per sample (despite the .csv extension), which looks like this:

sample-id	forward-absolute-filepath	reverse-absolute-filepath
ADenoma12-2799	/Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/ADenoma12-2799_S12_L001_1P.fastq.gz	/Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/ADenoma12-2799_S12_L001_2P.fastq.gz
Ademona1-2065	/Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/Ademona1-2065_S1_L001_1P.fastq.gz	/Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/Ademona1-2065_S1_L001_2P.fastq.gz
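
The name handling in the script can be traced on a single file. The sketch below assumes a raw file called ADenoma12-2799_S12_L001_R1_001.fastq.gz; this name is a hypothetical example (the raw file names are not shown here), chosen so that the 7-character suffix removed by sed is _R1_001.

```shell
# Hypothetical raw file name, assuming an Illumina-style _R1_001 read suffix.
filename="./raw_files/ADenoma12-2799_S12_L001_R1_001.fastq.gz"

# Strip the directory and the .fastq.gz extension.
base=$(basename "$filename" .fastq.gz)

# Remove the last 7 characters to get the prefix shared by both reads.
prefix=$(echo "$base" | sed -e 's/.......$//')

# Keep everything before the first underscore as the sample id.
sampleid=$(echo "$prefix" | sed -e 's/_.*//')

echo "prefix:    $prefix"
echo "sample id: $sampleid"
```

If your raw files use a different naming scheme, the number of dots in the first sed expression needs to change to match the length of your read suffix.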

Tip

Do not forget to change the directories in the script to match your environment. The script scans the raw_files directory to collect all sample ids and then generates the manifest file with the paths of the files produced by Trimmomatic (in trimm_outputs).
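
Because the manifest paths are built from directory names, a wrong directory only shows up later as an import error. As a minimal sketch, the helper below checks that every file listed in a tab-separated manifest exists. The function name check_manifest is hypothetical and not part of QIIME2; it assumes the three-column layout (sample id, forward path, reverse path) with a header row.

```shell
# check_manifest is a hypothetical helper, not a QIIME2 command.
# It returns non-zero if any read file listed in the manifest is missing.
check_manifest() {
    manifest="$1"
    missing=0
    tab=$(printf '\t')
    {
        # Discard the header row.
        read -r _header
        # Each remaining row: sample id, forward path, reverse path.
        while IFS="$tab" read -r sampleid forward reverse; do
            for path in "$forward" "$reverse"; do
                if [ ! -f "$path" ]; then
                    echo "missing file for $sampleid: $path" >&2
                    missing=$((missing + 1))
                fi
            done
        done
    } < "$manifest"
    [ "$missing" -eq 0 ]
}

# Run the check only when a manifest is present in the current directory.
if [ -f manifest.csv ]; then
    check_manifest manifest.csv && echo "all files listed in manifest.csv exist"
fi
```

Splitting on tabs only means paths that contain spaces, such as a "Public dataset" directory, are read as a single field.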

Import sequence reads into QIIME

To import our dataset, we will use the generated manifest file and run the following command.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.csv \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path paired-end-demux.qza

If the command succeeds, it creates a new QIIME2 artifact, paired-end-demux.qza.
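
To confirm that the import worked, the artifact can be inspected before moving on. The following is a sketch rather than a required step: it uses the qiime tools validate, qiime tools peek, and qiime demux summarize commands, and the output name paired-end-demux.qzv is an assumption chosen for illustration.

```shell
artifact="paired-end-demux.qza"

# Run the checks only if QIIME2 is installed and the artifact exists.
if command -v qiime >/dev/null 2>&1 && [ -f "$artifact" ]; then
    # Check that the artifact is well-formed.
    qiime tools validate "$artifact"
    # Print the artifact's UUID, semantic type, and data format.
    qiime tools peek "$artifact"
    # Summarize read counts and quality; the .qzv name is an assumption.
    qiime demux summarize \
      --i-data "$artifact" \
      --o-visualization paired-end-demux.qzv
else
    echo "skipping checks: qiime or $artifact not found"
fi
```

The type reported by peek should match the --type value passed to the import command.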