Loading data in QIIME
Now we will import our dataset into QIIME2 for further processing. QIIME2 supports several data formats for import, so it helps to know these formats and the procedure for importing each one.
More information is available in the QIIME2 documentation on importing data.
In the previous step, we filtered the reads with the Trimmomatic tool. That step generated new sequence files whose names end in suffixes like L001_1P.fastq.gz. We will now create a manifest file containing the file paths of the forward and reverse sequence read files.
Create manifest file
To automate the generation of the manifest file, we will use the following shell script:
create_manifest.sh
#!/usr/bin/env bash
# author: Pankaj chejara
# Script to create a paired-end manifest file for qiime2

# Collect the unique sample prefixes from the raw file names
samples=()
for filename in ./raw_files/*.fastq.gz; do
    base=$(basename "$filename" .fastq.gz)
    # Drop the last 7 characters (assumed to be a read suffix such as _R1_001)
    nf=$(echo "$base" | sed -e 's/.......$//')
    # Add the prefix only if it is not already in the list (exact match)
    if ! [[ " ${samples[*]} " == *" $nf "* ]]; then
        samples+=("$nf")
    fi
done

# The V2 manifest format expects tab-separated columns
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > manifest.csv
for sample in "${samples[@]}"; do
    forward="$PWD/trimm_outputs/${sample}_1P.fastq.gz"
    backward="$PWD/trimm_outputs/${sample}_2P.fastq.gz"
    # The sample id is everything before the first underscore
    sampleid=$(echo "$sample" | sed -e 's/_.*//')
    printf '%s\t%s\t%s\n' "$sampleid" "$forward" "$backward" >> manifest.csv
done
This script will generate a manifest file which looks like this:
| sample-id | forward-absolute-filepath | reverse-absolute-filepath |
|---|---|---|
| ADenoma12-2799 | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/ADenoma12-2799_S12_L001_1P.fastq.gz | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/ADenoma12-2799_S12_L001_2P.fastq.gz |
| Ademona1-2065 | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/Ademona1-2065_S1_L001_1P.fastq.gz | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/Ademona1-2065_S1_L001_2P.fastq.gz |
Do not forget to change the directories in the script to match your environment. The script scans the raw_files directory to collect all sample ids and then generates the manifest file with the paths of the files obtained after Trimmomatic.
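Because the manifest paths are built from the names in one directory but point to files in another, it can be worth checking that every listed file actually exists before importing. The following is a minimal sketch, not part of the original workflow: check_manifest is a hypothetical helper name, and it assumes a tab-separated manifest with one header row and three columns (sample id, forward path, reverse path).

```shell
# Hypothetical helper: count manifest entries whose files are missing.
# Assumes a tab-separated manifest with a single header row.
check_manifest() {
  manifest="${1:-manifest.csv}"
  missing=0
  tab=$(printf '\t')
  {
    # Skip the header row
    read -r header
    # Split each remaining row on tabs only, so paths with spaces survive
    while IFS="$tab" read -r sampleid forward reverse; do
      for path in "$forward" "$reverse"; do
        if [ ! -f "$path" ]; then
          echo "missing file for $sampleid: $path" >&2
          missing=$((missing + 1))
        fi
      done
    done
  } < "$manifest"
  echo "missing files: $missing"
  # Succeed only when every listed file exists
  [ "$missing" -eq 0 ]
}
```

Running check_manifest manifest.csv prints each missing path to standard error and returns a non-zero status if any file is absent, which usually points to a wrong directory name in the script.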
Import sequence reads in QIIME
To import our dataset, we will use the generated manifest file and run the following command.
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.csv \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path paired-end-demux.qza
On successful execution of the above command, a new QIIME2 artifact, paired-end-demux.qza, will be generated.
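A quick way to confirm that the import worked is to inspect the resulting artifact with qiime tools peek, which prints its UUID, semantic type and data format. The sketch below wraps that call in a hypothetical helper, check_artifact (not part of the original workflow), which first verifies that the file exists and that the qiime command is on the PATH.

```shell
# Hypothetical helper: sanity-check an imported QIIME2 artifact.
check_artifact() {
  artifact="${1:-paired-end-demux.qza}"
  if [ ! -f "$artifact" ]; then
    echo "missing artifact: $artifact" >&2
    return 1
  fi
  if ! command -v qiime >/dev/null 2>&1; then
    echo "qiime not found on PATH; activate your QIIME2 environment first" >&2
    return 1
  fi
  # Print the artifact's UUID, semantic type and data format
  qiime tools peek "$artifact"
}
```

Calling check_artifact paired-end-demux.qza after the import should report the type SampleData[PairedEndSequencesWithQuality] if the import succeeded.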