Loading data in QIIME
Next, we import our dataset into QIIME2 for further processing. QIIME2 supports several input formats, each with its own import procedure, so it helps to know which format matches your data.
More information is available in the QIIME2 documentation on importing data.
In the previous step, we filtered the reads with the Trimmomatic tool. That step produced new sequence files whose names end in suffixes such as _L001_1P.fastq.gz (forward reads) and _L001_2P.fastq.gz (reverse reads). We will now create a manifest file containing the file paths of the forward and reverse read files.
Create manifest file
To automate the generation of the manifest file, we will use the following shell script.
create_manifest.sh
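#!/usr/bin/env bash
# author: Pankaj chejara
# Script to create manifest file for qiime2

# Collect the unique sample prefixes from the raw file names
samples=()
for filename in ./raw_files/*.fastq.gz; do
    base=$(basename "$filename" .fastq.gz)
    # Drop the last 7 characters (the read suffix, e.g. _R1_001)
    nf=$(echo "$base" | sed -e 's/.......$//')
    # Add the prefix only if it is not already listed (exact match)
    if ! printf '%s\n' "${samples[@]}" | grep -qxF -- "$nf"; then
        samples+=("$nf")
    fi
done

# PairedEndFastqManifestPhred33V2 expects tab-separated columns
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > manifest.csv
for sample in "${samples[@]}"; do
    forward="$PWD/trimm_outputs/${sample}_1P.fastq.gz"
    backward="$PWD/trimm_outputs/${sample}_2P.fastq.gz"
    # The sample id is everything before the first underscore
    sampleid=$(echo "$sample" | sed -e 's/_.*//')
    printf '%s\t%s\t%s\n' "$sampleid" "$forward" "$backward" >> manifest.csv
done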
This script will generate a manifest file which looks like this
sample-id | forward-absolute-filepath | reverse-absolute-filepath |
---|---|---|
ADenoma12-2799 | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/ADenoma12-2799_S12_L001_1P.fastq.gz | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/ADenoma12-2799_S12_L001_2P.fastq.gz |
Ademona1-2065 | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/Ademona1-2065_S1_L001_1P.fastq.gz | /Users/pankaj/Documents/Metrosert/Public dataset/zakular2014/trimm_outputs/Ademona1-2065_S1_L001_2P.fastq.gz |
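Before importing, it can help to sanity-check the manifest. The sketch below assumes the three-column, tab-separated layout (sample-id, forward-absolute-filepath, reverse-absolute-filepath). The function name check_manifest is hypothetical and not part of QIIME2; it only reports rows with the wrong number of fields or with read files missing on disk.

```shell
# Hypothetical helper (not part of QIIME2): report manifest rows that do not
# have exactly three tab-separated fields or that point to missing files.
# Usage: check_manifest manifest.csv
check_manifest() {
    manifest="$1"
    problems=0
    line=0
    tab=$(printf '\t')
    while IFS="$tab" read -r id fwd rev extra; do
        line=$((line + 1))
        # Skip the header row
        if [ "$line" -eq 1 ]; then
            continue
        fi
        if [ -z "$id" ] || [ -z "$fwd" ] || [ -z "$rev" ] || [ -n "$extra" ]; then
            echo "line $line: expected 3 tab-separated fields"
            problems=$((problems + 1))
        elif [ ! -f "$fwd" ] || [ ! -f "$rev" ]; then
            echo "line $line: missing read file for sample $id"
            problems=$((problems + 1))
        fi
    done < "$manifest"
    echo "$problems problem(s) found"
    # Return success only when no problems were found
    [ "$problems" -eq 0 ]
}
```

Because the fields are split on tabs only, paths that contain spaces (such as "Public dataset" above) are handled as a single field.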
Do not forget to change the directories in the script to match your environment. The script scans the raw_files directory to collect all sample ids and then generates a manifest file with the paths of the files produced by Trimmomatic.
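To make the name handling concrete, here is a minimal walk-through of how the script derives the sample prefix and sample id from a single file name. The raw file name is a hypothetical example that assumes Illumina-style naming with a 7-character read suffix (_R1_001); adjust the sed pattern if your files are named differently.

```shell
# Hypothetical raw file name (assumes Illumina-style naming)
filename="./raw_files/ADenoma12-2799_S12_L001_R1_001.fastq.gz"

# Strip the directory and the .fastq.gz extension
base=$(basename "$filename" .fastq.gz)

# Drop the last 7 characters (_R1_001) to get the prefix shared by both reads
prefix=$(echo "$base" | sed -e 's/.......$//')

# Keep everything before the first underscore as the sample id
sampleid=$(echo "$prefix" | sed -e 's/_.*//')

echo "$prefix"                               # ADenoma12-2799_S12_L001
echo "$sampleid"                             # ADenoma12-2799
echo "trimm_outputs/${prefix}_1P.fastq.gz"   # forward read path, relative to $PWD
```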
Import sequence reads into QIIME
To import our dataset, we will use the generated manifest file and run the following command.
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.csv \
--input-format PairedEndFastqManifestPhred33V2 \
--output-path paired-end-demux.qza
On successful execution of the above command, a new QIIME artifact, paired-end-demux.qza, will be generated.
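As an optional follow-up, the imported artifact can be inspected before moving on. The sketch below assumes QIIME2 is installed and that paired-end-demux.qza is in the current directory; the output name paired-end-demux.qzv and the variable names are illustrative choices. The guard simply skips the checks when either assumption does not hold.

```shell
artifact="paired-end-demux.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$artifact" ]; then
    # Show the artifact's UUID, semantic type, and data format
    qiime tools peek "$artifact"
    # Check that the artifact's contents are valid for its type
    qiime tools validate "$artifact"
    # Summarize read counts and quality per sample into a visualization
    qiime demux summarize \
        --i-data "$artifact" \
        --o-visualization paired-end-demux.qzv
    checked="yes"
else
    echo "qiime or $artifact not available; skipping checks"
    checked="no"
fi
```

The resulting .qzv file can be opened with qiime tools view or on the QIIME2 View website.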