Using the generated data, find matrix $V$ and the vectors that describe the new coordinate system in reduced dimension $3$ using either SVD or eigenvectors & eigenvalues. Graph the data and calculate how much the training set's variance has diminished. Have you preserved $90 \%$ of the variance?





In [1]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # Import 3D plotting module

# Enable interactive mode for Google Colab
%matplotlib notebook

# Install Plotly
!pip install plotly

# Import Plotly
import plotly.express as px

# Set a random seed for reproducibility
np.random.seed(42)

# Define the number of vectors you want to generate
num_vectors = 100

# Generate random values for the first dimension
dim1 = np.random.randn(num_vectors)

# Create the second dimension with dependency on the first dimension and random noise
dependency_factor_dim2 = 0.7
noise_dim2 = np.random.randn(num_vectors)
dim2 = dependency_factor_dim2 * dim1 + 2 + noise_dim2

# Create the third dimension with dependency on the second dimension and random noise
dependency_factor_dim3 = -1.1
noise_dim3 = np.random.randn(num_vectors)
dim3 = dependency_factor_dim3 * dim2 - 4 + noise_dim3

# Generate random values for the fourth dimension
dim4 = np.random.randn(num_vectors)

# Stack the four dimensions as columns to form the 100 x 4 data matrix B (one row per sample)
B = np.column_stack((dim1, dim2, dim3, dim4))

# Create a 3D scatter plot of the first three dimensions using Plotly (allows you to rotate the figure with your mouse); color encodes the fourth dimension
fig = px.scatter_3d(x=dim1, y=dim2, z=dim3, color=dim4)
fig.update_layout(title='Interactive 3D Scatter Plot of Dimensions 1, 2, 3', scene=dict(xaxis_title='x_1', yaxis_title='x_2', zaxis_title='x_3'))

# Show the plot
fig.show()



In [2]:
#SVD Solution

#Calculate the sample means for each column
column_means = np.mean(B, axis=0)

#Mean-centered matrix B_normalized (each column is shifted to zero mean, not rescaled)
B_normalized = B - column_means  # broadcasting subtracts each column's mean from every row

#Perform SVD on our normalized feature matrix
U, S, VT = np.linalg.svd(B_normalized)

# U: Left singular vectors (100 x 100)
# S: Singular values in a 1-D vector (already sorted in descending order)
# VT: (V transposed) Right singular vectors (4 x 4)

V_top_three = VT.T[:,0:3]

#Project B_normalized features onto the vectors associated with the three largest singular values
T = B_normalized @ V_top_three

#Graph
fig = px.scatter_3d(x=T[:,0], y=T[:,1], z=T[:,2])
fig.update_layout(title='3D Scatter Plot of Dimensions 1, 2, 3 of T', scene=dict(xaxis_title='t_1', yaxis_title='t_2', zaxis_title='t_3'))

fig.show()

#Matrix V
print('Matrix V:')
print(V_top_three)

#SVD Solution

#Calculate the variance of each column before PCA
column_variances_of_B_normalized = np.var(B_normalized, axis=0)

#Sum of the variances before PCA
total_variance_original = np.sum(column_variances_of_B_normalized)
print("\nSum of the variances of each column before PCA:", total_variance_original)

#Variances after PCA
column_variances_of_T = np.var(T, axis=0)

# Sum the variances after PCA
total_variance_best_3_new_features = np.sum(column_variances_of_T)
print("\nSum of the variances of each column after PCA:", total_variance_best_3_new_features)

#Ratio
print("\nProportion of variance preserved after PCA:", total_variance_best_3_new_features/total_variance_original)

Matrix V:
[[-0.21543116  0.69557149 -0.48414346]
 [-0.5217725   0.21156703 -0.25244289]
 [ 0.82179994  0.3710635  -0.2108495 ]
 [ 0.0773805  -0.57769505 -0.81081452]]

Sum of the variances of each column before PCA: 5.062407538235173

Sum of the variances of each column after PCA: 4.753839190050946

Proportion of variance preserved after PCA: 0.9390471142724715
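
The same proportion can also be read directly off the singular values: for a mean-centered matrix, the variance along the i-th principal direction is s_i^2 / n, so the preserved share is the sum of the three largest s_i^2 divided by the sum of all s_i^2. The following is a minimal sketch under that assumption. The helper `make_data` is hypothetical and simply repeats the data-generation steps of the first cell so that the block runs on its own.

```python
import numpy as np

def make_data(num_vectors=100, seed=42):
    """Hypothetical helper: rebuild B following the steps of the first cell."""
    np.random.seed(seed)
    dim1 = np.random.randn(num_vectors)
    dim2 = 0.7 * dim1 + 2 + np.random.randn(num_vectors)
    dim3 = -1.1 * dim2 - 4 + np.random.randn(num_vectors)
    dim4 = np.random.randn(num_vectors)
    return np.column_stack((dim1, dim2, dim3, dim4))

B = make_data()
B_centered = B - B.mean(axis=0)  # mean-center every column

# Only the singular values are needed for the variance ratio
S = np.linalg.svd(B_centered, compute_uv=False)

# Share of the total variance carried by each principal direction
explained_ratio = S**2 / np.sum(S**2)
preserved_top3 = explained_ratio[:3].sum()

# Cross-check against the column-variance approach used in the cell above
_, _, VT = np.linalg.svd(B_centered, full_matrices=False)
T = B_centered @ VT.T[:, :3]
ratio_from_variances = np.var(T, axis=0).sum() / np.var(B_centered, axis=0).sum()

print("Explained variance ratio per component:", explained_ratio)
print("Preserved by the top three (from singular values):", preserved_top3)
print("Preserved by the top three (from column variances):", ratio_from_variances)
```

Both printed proportions should agree up to floating-point error, because the columns of T are uncorrelated and their variances are exactly s_i^2 / n.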


In [3]:
#Eigenvectors & Eigenvalues Solution

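#Sigma: scatter matrix of the centered data (n times the covariance matrix, so it has the same eigenvectors)
Sigma = B_normalized.T @ B_normalized

#Eigenvalues and eigenvectors of Sigma (eigenvectors are returned as columns)
eigenvalues, eigenvectors = np.linalg.eig(Sigma)

#Sort the eigenvalues in descending order and reorder the eigenvector columns to match
sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors[:, sorted_indices]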

print("Eigenvalues:")
print(eigenvalues)
print("\nEigenvectors:")
print(eigenvectors)

#Matrix V
V=eigenvectors[:, 0:3]
print('\nMatrix V: ')
print(V)

#Z: project the centered data onto the top three eigenvectors (equivalent to B_normalized @ V)
Z = V.T @ B_normalized.T
Z = Z.T

#Graph
fig = px.scatter_3d(x=Z[:,0], y=Z[:,1], z=Z[:,2])
fig.update_layout(title='Eigen 3D Scatter Plot of Dimensions 1, 2, 3 of Z', scene=dict(xaxis_title='z_1', yaxis_title='z_2', zaxis_title='z_3'))
fig.show()

#Eigen Solution

#Variances after PCA
column_variances_of_Z = np.var(Z, axis=0)

#Sum of the variances after PCA
total_variance_best_3_new_features = np.sum(column_variances_of_Z)
print("Sum of the variances of each column after PCA:", total_variance_best_3_new_features)

#Ratio
print("Proportion of variance preserved after PCA:", total_variance_best_3_new_features/total_variance_original)

Eigenvalues:
[316.3987448   89.96592894  69.01924526  30.85683482]

Eigenvectors:
[[ 0.21543116  0.69557149  0.48414346  0.48515444]
 [ 0.5217725   0.21156703  0.25244289 -0.7869342 ]
 [-0.82179994  0.3710635   0.2108495  -0.3774907 ]
 [-0.0773805  -0.57769505  0.81081452  0.05348365]]

Matrix V: 
[[ 0.21543116  0.69557149  0.48414346]
 [ 0.5217725   0.21156703  0.25244289]
 [-0.82179994  0.3710635   0.2108495 ]
 [-0.0773805  -0.57769505  0.81081452]]


Sum of the variances of each column after PCA: 4.753839190050945
Proportion of variance preserved after PCA: 0.9390471142724713
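
Some columns of the eigenvector-based V have the opposite sign from the SVD-based V. This is expected: a principal direction is only defined up to sign, and flipping a column does not change the variance of the projection. The sketch below illustrates the equivalence on stand-in data (an assumption, not the notebook's B). It uses `np.linalg.eigh`, a swapped-in routine for symmetric matrices, and the helper `fix_signs` is hypothetical.

```python
import numpy as np

# Stand-in data (assumption): 100 samples, 4 features with clearly different scales
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)  # mean-center every column

# Route 1: right singular vectors of the centered data
_, S, VT = np.linalg.svd(Xc, full_matrices=False)
V_svd = VT.T

# Route 2: eigenvectors of the scatter matrix; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(eigvals)[::-1]
eigvals, V_eig = eigvals[order], eigvecs[:, order]

def fix_signs(V):
    """Hypothetical helper: flip each column so its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(V), axis=0)
    signs = np.sign(V[idx, np.arange(V.shape[1])])
    return V * signs

print("Eigenvalues equal squared singular values:", np.allclose(eigvals, S**2))
print("Directions agree up to sign:", np.allclose(fix_signs(V_svd), fix_signs(V_eig)))
```

Applying a sign convention such as `fix_signs` before comparing makes the two matrices match column by column, provided the eigenvalues are distinct.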


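Yes. With both methods the three new features retain about 93.9% of the total variance (4.754 of 5.062), so the variance has diminished by only about 6.1% and the 90% target is met.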
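
To check that three is also the smallest dimension that meets the target, the cumulative variance share can be computed from the eigenvalues printed above. This is a minimal sketch: the eigenvalues are copied from that output, and the names `target` and `k` are introduced here for illustration.

```python
import numpy as np

# Eigenvalues of Sigma as printed in the output above (already sorted in descending order)
eigenvalues = np.array([316.3987448, 89.96592894, 69.01924526, 30.85683482])

# Cumulative share of the total variance captured by the first k components
cumulative_ratio = np.cumsum(eigenvalues) / np.sum(eigenvalues)

target = 0.90  # variance threshold from the exercise statement

# Index of the first cumulative share that reaches the target, converted to a count
k = int(np.searchsorted(cumulative_ratio, target) + 1)

print("Cumulative variance ratio:", cumulative_ratio)
print("Smallest number of components reaching the target:", k)
```

With these eigenvalues, two components capture about 80% of the variance and three capture about 93.9%, so a reduced dimension of 3 is the smallest choice that preserves at least 90%.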