Species filtering

We will filter out species whose relative abundance does not exceed 0.001 in any sample. In other words, a species is kept only if its maximum relative abundance across all samples is greater than 0.001. This criterion is taken from the study that provided the dataset (Zeller et al. 2014).

After applying this abundance filter, the dataset is reduced to 491 species. We will now proceed with model development using these filtered species.

import numpy as np
import pandas as pd  # needed for pd.to_numeric below
import matplotlib.pyplot as plt

# dataset containing only the relative abundances of bacterial microorganisms
# (.copy() avoids modifying a view of zeller_db)
microbiome = zeller_db[bacteria_colnames].copy()

# converting all columns to numeric; non-numeric entries become NaN
microbiome = microbiome.apply(pd.to_numeric, errors='coerce')

# fetching names of columns whose maximum abundance across samples exceeds 0.001
columns_to_fetch = microbiome.columns[microbiome.max(axis=0) > 0.001]

# filtering dataset
microbiome_filtered = microbiome[columns_to_fetch]

plt.figure()
plt.bar([1, 2], [len(microbiome.columns), len(columns_to_fetch)], alpha=0.8)
plt.xticks([1, 2], ['before', 'after'])
plt.ylabel('Number of microbial species')
plt.title('Before and after species filtering')
plt.show()
Figure 1: Before and after species filtering
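
The filtering rule can be sanity-checked on a small toy table. The following is a minimal sketch, not part of the original analysis: the helper `filter_by_max_abundance`, the species names, and all values are hypothetical and chosen only for illustration. It assumes rows are samples and columns are species, as in the code above.

```python
import pandas as pd


def filter_by_max_abundance(abundance: pd.DataFrame, threshold: float = 0.001) -> pd.DataFrame:
    """Keep species (columns) whose maximum across samples (rows) exceeds `threshold`."""
    # Coerce to numeric so that stray non-numeric entries become NaN
    numeric = abundance.apply(pd.to_numeric, errors="coerce")
    # max() skips NaN by default, so a missing value does not drop a species on its own
    keep = numeric.columns[numeric.max(axis=0) > threshold]
    return numeric[keep]


# Hypothetical toy data: three samples, three species, values stored as strings
toy = pd.DataFrame(
    {
        "species_a": ["0.2", "0.0005", "0.1"],     # exceeds 0.001 in two samples
        "species_b": ["0.0002", "0.0009", "0.0"],  # never exceeds 0.001
        "species_c": ["0.0", "n/a", "0.004"],      # one unparsable entry, one value above 0.001
    },
    index=["sample_1", "sample_2", "sample_3"],
)

toy_filtered = filter_by_max_abundance(toy)
print(list(toy_filtered.columns))
```

In this toy case only `species_b` is removed, since it is the only species that stays at or below the threshold in every sample. The comparison is strict, so a species whose maximum equals exactly 0.001 would also be removed, which matches the "does not exceed" wording of the criterion.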

References

Zeller, Georg, Julien Tap, Anita Y Voigt, Shinichi Sunagawa, Jens Roat Kultima, Paul I Costea, Aurélien Amiot, et al. 2014. “Potential of Fecal Microbiota for Early-Stage Detection of Colorectal Cancer.” Molecular Systems Biology 10 (11): 766.