
Lab3

PCA

Code

Data

The dataset consists of 8 feature columns. The data is available here.

Task

Develop the algorithm for the Principal Component Analysis (PCA) method and implement it programmatically.
Conduct an analysis of experimental data using the Principal Component Analysis method.
1. Load the data according to your variant. Display the data on the monitor as a table.
2. Normalize (standardize) the original experimental data. Build a correlation matrix.
3. Ensure that the correlation matrix differs significantly from the identity matrix.
4. Calculate the projections of objects onto the principal components.
Analyze the results of the Principal Component Analysis method.
1. Check the equality of the sums of sample variances of the original features and the sample variances of projections onto the principal components.
2. Determine the relative proportion of variance attributable to the principal components. Build a covariance matrix for projections onto the principal components.
3. Based on the first M = 2 principal components, construct a scatter plot. Provide a meaningful interpretation of the first two principal components.

Procedure

Data was obtained from a .txt file.
Exploratory Data Analysis (EDA) was conducted on the obtained data. Descriptive statistics were displayed, distribution histograms and boxplot graphs were constructed.
The data were normalized using StandardScaler. Distribution histograms and boxplots were also created for the normalized data.
For the normalized data, Pearson and Kendall correlation matrices were constructed and displayed, along with a covariance matrix, which closely matched the Pearson correlation matrix because the features had been standardized.
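The standardization and correlation steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: the data is a synthetic stand-in (an assumption, since data3.txt is not reproduced here), and the standardization is written out in NumPy with population standard deviation (`ddof=0`), which is what StandardScaler uses by default.

```python
import numpy as np

# Synthetic stand-in for the lab data (assumption): 200 objects, 8 features
# driven by 2 latent factors plus noise, so the features are correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.5 * rng.normal(size=(200, 8))

# Standardize each column to zero mean and unit variance (ddof=0).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Pearson correlation matrix of the features.
R = np.corrcoef(X_std, rowvar=False)

# Covariance matrix of the standardized data, using the same ddof=0 convention.
C = np.cov(X_std, rowvar=False, ddof=0)

# With matching conventions the two matrices coincide.
matrices_match = np.allclose(R, C)
print(matrices_match)
```

With the default `ddof=1` in `np.cov`, the covariance matrix would differ from the correlation matrix by a factor of N/(N-1), which is why the two only "closely" match in that case.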
The test statistic d was calculated from the covariance matrix. From the theory: if the correlation matrix of the original data does not differ significantly from the identity matrix (i.e., $d \leq \chi^2$, where $\chi^2$ is the critical value at a given confidence level and degrees of freedom), then applying the Principal Component Analysis method is not advisable.
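The README does not give the formula for d. A common textbook form, assumed here, is $d = N \sum_{i<j} r_{ij}^2$ with $n(n-1)/2$ degrees of freedom, where N is the number of objects and n the number of features. The sketch below uses that assumed form; `identity_test_statistic` is a hypothetical helper name, and the critical value 41.337 is the tabulated 95% quantile of the chi-square distribution with 28 degrees of freedom (it could also be computed with `scipy.stats.chi2.ppf(0.95, 28)`).

```python
import numpy as np

def identity_test_statistic(R, n_objects):
    """Assumed form of d: N times the sum of squared correlations above the diagonal."""
    upper = np.triu_indices_from(R, k=1)
    return n_objects * np.sum(R[upper] ** 2)

# Synthetic stand-in for the lab data (assumption): 200 objects, 8 features.
rng = np.random.default_rng(0)
N, n = 200, 8
latent = rng.normal(size=(N, 2))
X = latent @ rng.normal(size=(2, n)) + 0.5 * rng.normal(size=(N, n))
R = np.corrcoef(X, rowvar=False)

d = identity_test_statistic(R, N)
dof = n * (n - 1) // 2      # 28 for 8 features
chi2_crit = 41.337          # tabulated 95% quantile for 28 degrees of freedom

# PCA is considered worthwhile only if d exceeds the critical value.
pca_advisable = d > chi2_crit
print(f"d = {d:.1f}, dof = {dof}, PCA advisable: {pca_advisable}")
```

For an exact identity matrix the statistic is zero, so the test can never favor PCA when the features are perfectly uncorrelated.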
Eigenvalues and eigenvectors were obtained using np.linalg.eig. The eigenvectors were used to project the original data onto the principal components, resulting in the Z matrix.
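A minimal sketch of the decomposition and projection step, on synthetic stand-in data (an assumption). Note one deliberate swap: it uses `np.linalg.eigh` rather than `np.linalg.eig`, because the correlation matrix is symmetric and `eigh` guarantees real eigenvalues, returned in ascending order; they are reordered so that the first component carries the most variance.

```python
import numpy as np

# Synthetic stand-in for the lab data (assumption): 200 objects, 8 features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(200, 8))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Correlation matrix of the standardized features.
R = np.corrcoef(X_std, rowvar=False)

# Eigen-decomposition of the symmetric matrix R.
eigvals, eigvecs = np.linalg.eigh(R)

# Sort components by decreasing eigenvalue (eigh returns ascending order).
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Project the objects onto the principal components: one column of Z per component.
Z = X_std @ eigvecs
print(Z.shape)
```

The columns of `eigvecs` are the principal directions, so `Z[:, :2]` holds the coordinates used for a scatter plot of the first two components.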
The sum of the variances of the projected data and the sum of the variances of the original features were calculated; the two closely matched, indicating that the PCA method was implemented correctly.
The covariance matrix for Z was displayed.
The relative proportion of variance attributable to each principal component and the cumulative proportion of variance attributable to the first i components were calculated.
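The three checks described above (equal variance sums, the covariance matrix of Z, and the variance proportions) can be sketched together. The data is again a synthetic stand-in (an assumption), and all variances use `ddof=0` to stay consistent with the standardization.

```python
import numpy as np

# Synthetic stand-in for the lab data (assumption): 200 objects, 8 features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(200, 8))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Principal components from the correlation matrix, largest eigenvalue first.
R = np.corrcoef(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Z = X_std @ eigvecs

# Check 1: total variance is preserved by the projection.
var_X = X_std.var(axis=0)   # each equals 1 after standardization
var_Z = Z.var(axis=0)       # each equals the corresponding eigenvalue
sums_match = np.isclose(var_X.sum(), var_Z.sum())

# Check 2: projections are uncorrelated, so their covariance matrix is diagonal.
cov_Z = np.cov(Z, rowvar=False, ddof=0)

# Check 3: share of variance per component and cumulative share of the first i.
ratio = eigvals / eigvals.sum()
cumulative = np.cumsum(ratio)

print(sums_match)
print(np.round(ratio, 3))
print(np.round(cumulative, 3))
```

Because the features are standardized, the total variance equals the number of features, so each ratio is simply the eigenvalue divided by 8.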
A scatter plot for the first two principal components was constructed.
The results were compared with those of the PCA implementation in sklearn, and the same patterns were observed.
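One way such a comparison might look, sketched on synthetic stand-in data (an assumption, not the notebook's code). The sign of each principal component is arbitrary, so the projections are compared by absolute value; the variance ratios are compared directly because they do not depend on the `ddof` convention.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the lab data (assumption): 200 objects, 8 features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(200, 8))
X_std = StandardScaler().fit_transform(X)

# Manual PCA via the correlation matrix, largest eigenvalue first.
R = np.corrcoef(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Z_manual = X_std @ eigvecs

# Reference PCA from scikit-learn on the same standardized data.
pca = PCA(n_components=8)
Z_sklearn = pca.fit_transform(X_std)

# Variance shares should agree; projections should agree up to sign.
same_ratio = np.allclose(pca.explained_variance_ratio_, eigvals / eigvals.sum())
same_projection = np.allclose(np.abs(Z_sklearn), np.abs(Z_manual))
print(same_ratio, same_projection)
```

Note that `pca.explained_variance_` itself uses an N-1 denominator, so it differs from the eigenvalues of the correlation matrix by a factor of N/(N-1) even when the ratios match.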
Another dimensionality reduction method, t-SNE, was applied, resulting in improved outcomes.
Yet another dimensionality reduction method, UMAP, demonstrated the best results.
