import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
# Log-transformed data from the previous step
df = microbiome_log

# Step 1: Standardize the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

# Step 2: Apply PCA (reduce to 10 components for visualization)
pca = PCA(n_components=10)
pca_result = pca.fit_transform(scaled_data)

# Step 3: Create a DataFrame for PCA results
pca_df = pd.DataFrame(pca_result, columns=['PCA1', 'PCA2', 'PCA3', 'PCA4', 'PCA5',
                                           'PCA6', 'PCA7', 'PCA8', 'PCA9', 'PCA10'])
pca_df['study_condition'] = zeller_db['study_condition'].values
Principal Component Analysis
Principal Component Analysis (PCA) is a dimensionality reduction technique that reduces the number of features (dimensions) while retaining as much of the variance in the data as possible. By capturing the dominant patterns in fewer components, PCA makes the data easier to visualize and analyze.
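To make the idea of "retaining variance in fewer components" concrete, here is a minimal sketch of PCA computed through a singular value decomposition in NumPy. It is an illustration on synthetic data, not the implementation used in this chapter; the function name `pca_svd` and the toy matrix `X` are assumptions made for the example.

```python
import numpy as np

def pca_svd(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the components
    explained_var = S**2 / (X.shape[0] - 1)          # variance along every component
    ratio = explained_var / explained_var.sum()      # fraction of total variance
    return scores, Vt[:n_components], ratio[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in data: 50 samples, 5 features, two of them strongly correlated
X = rng.normal(size=(50, 5))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=50)

scores, components, ratio = pca_svd(X, n_components=2)
print(scores.shape)  # one row per sample, one column per retained component
print(ratio)         # fraction of variance captured by each retained component
```

The singular values come back sorted from largest to smallest, so the first component always captures the largest share of the variance, the second the next largest, and so on.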
We will now apply PCA to the filtered species data and plot the resulting components to explore potential associations with study condition. The accompanying code performs that step: it standardizes the log-transformed data, fits a 10-component PCA, and stores the component scores alongside each sample's study condition.
![](chapter4_files/figure-html/fig-pca-output-1.png)
![](chapter4_files/figure-html/fig-pca-output-2.png)
![](chapter4_files/figure-html/fig-pca-output-3.png)
![](chapter4_files/figure-html/fig-pca-output-4.png)
![](chapter4_files/figure-html/fig-pca-output-5.png)
![](chapter4_files/figure-html/fig-pca-output-6.png)
![](chapter4_files/figure-html/fig-pca-output-7.png)
![](chapter4_files/figure-html/fig-pca-output-8.png)
![](chapter4_files/figure-html/fig-pca-output-9.png)
![](chapter4_files/figure-html/fig-pca-output-10.png)
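The plotting code is not shown in this section. As a hedged sketch, one way to compare each component across study conditions is a grid of box plots, one panel per component. The data frame `demo_df` below is a synthetic stand-in for `pca_df`, and the condition labels `control` and `CRC` are assumed for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
cols = [f"PCA{i}" for i in range(1, 11)]

# Hypothetical stand-in for pca_df: random scores plus a condition label
demo_df = pd.DataFrame(rng.normal(size=(60, 10)), columns=cols)
demo_df["study_condition"] = np.repeat(["control", "CRC"], 30)

conditions = sorted(demo_df["study_condition"].unique())
fig, axes = plt.subplots(2, 5, figsize=(15, 6), sharex=True)
for ax, col in zip(axes.ravel(), cols):
    # One box per study condition for this component
    groups = [
        demo_df.loc[demo_df["study_condition"] == c, col].to_numpy()
        for c in conditions
    ]
    ax.boxplot(groups)
    ax.set_xticks(range(1, len(conditions) + 1))
    ax.set_xticklabels(conditions)
    ax.set_title(col)
fig.tight_layout()
```

A component whose boxes are clearly separated between conditions is a candidate for carrying condition-related signal; heavily overlapping boxes suggest that component mostly reflects other sources of variation.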
The figure below shows the cumulative variance captured by the 10 principal components, which together explain 26% of the total variance.
![](chapter4_files/figure-html/fig-var-output-1.png)
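A cumulative variance curve of this kind can be derived from the `explained_variance_ratio_` attribute of a fitted scikit-learn `PCA` object. The sketch below uses a synthetic matrix `X` as a stand-in for the species table, so the numbers it prints are not the chapter's results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))  # synthetic stand-in: 100 samples, 40 features

X_scaled = StandardScaler().fit_transform(X)
pca_demo = PCA(n_components=10).fit(X_scaled)

# Running total of the variance fraction explained by components 1..k
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
print(cumulative[-1])  # fraction of total variance captured by all 10 components
```

Plotting `cumulative` against the component index gives the curve. The last value is the quantity reported as the total variance captured.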
We will use these PCA components as input features in the modeling phase to build a CRC detection system.
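As a rough sketch of how PCA components can feed a classifier, the example below chains scaling, PCA, and a model in a single scikit-learn pipeline. The classifier (`LogisticRegression`) and the synthetic data from `make_classification` are placeholders chosen for illustration; the model used in the modeling phase may differ.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the species table and binary condition labels
X, y = make_classification(
    n_samples=120, n_features=50, n_informative=8, random_state=0
)

# Placeholder model: scaling and PCA are refit on the training part of each fold
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
print(scores.mean())
```

Placing the scaler and PCA inside the pipeline means they are fit only on the training portion of each fold, which avoids leaking information from held-out samples into the components.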