lucasselvik/Handwritten-Digit-Classification-Algorithm

Handwritten-Digit-Classification-Using-K-Means-Clustering

Recreated and improved the K-means algorithm from scratch to classify MNIST digits. Implemented K-means++ initialization, centroid updates, Euclidean-distance assignment, and an outlier-detection system. Achieved 78% accuracy on the test set and identified high-variance misclassified digits.

Project Report

Read the full report here: Clustering and Classification of Handwritten Digits Using the K-Means Algorithm (PDF)

Overview

This project implements and optimizes the K-means clustering algorithm to classify handwritten digits from the Modified National Institute of Standards and Technology (MNIST) database. The goal was to classify 784-dimensional image vectors (28 × 28 pixel images flattened into vectors) by forming clusters and computing a representative centroid for each. We changed the centroid initialization to use the K-means++ method to improve performance, and we established a distance-based statistical threshold for outlier detection. The resulting algorithm achieved 78% classification accuracy on the test set and flagged 14 outliers.
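The repository itself is MATLAB and this README shows no code, so the following is only a minimal pure-Python sketch, not the authors' implementation, of K-means++ seeding followed by the standard assign/update loop with Euclidean distance. The function names (`sq_dist`, `kmeans_pp_init`, `kmeans`), the stop-when-assignments-stop-changing test, and keeping a centroid in place when its cluster empties are assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: pick the first centroid uniformly at random, then
    pick each further centroid with probability proportional to D(x)^2,
    the squared distance from x to its nearest already-chosen centroid."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point already sits on a centroid; fall back to uniform.
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd iterations after K-means++ seeding (hypothetical helper).
    Returns (centroids, labels, cost), where cost is the within-cluster
    sum of squared Euclidean distances."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, cost


# Usage on two obvious 2-D groups (MNIST rows would have 784 entries).
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels, cost = kmeans(pts, k=2)
print(labels, round(cost, 4))
```

Here `cost` is the usual K-means objective (within-cluster sum of squared distances); comparing it across different `k` and `max_iter` values is one plausible way to run the kind of parameter sweep the project describes.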

Features

  • Core K-means Implementation: Developed a full K-means algorithm to classify high-dimensional MNIST image vectors
  • Centroid Initialization Optimization: Employed the K-means++ method to select initial centroids and improve overall clustering performance
  • Statistical Outlier Detection: Implemented a distance-based system to identify data anomalies using a statistical threshold
  • Parameter Tuning: Ran tests to choose the number of clusters and iterations that minimize the cost function
  • Performance and Analysis: Achieved a classification accuracy of 78% and analyzed sources of error, particularly for digits lacking closed borders
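The README does not spell out the outlier threshold or how clusters are turned into digit predictions. The following Python sketch is illustrative only and makes two assumptions. First, a point is flagged when its distance to its own centroid exceeds that cluster's mean distance plus `z` population standard deviations. Second, each cluster is named after the majority true digit among its members. The helper names `flag_outliers`, `cluster_to_digit`, and `accuracy`, and the default `z=3.0`, are hypothetical.

```python
import math
import statistics
from collections import Counter


def flag_outliers(points, centroids, labels, z=3.0):
    """Indices of points whose Euclidean distance to their own centroid
    exceeds that cluster's mean distance plus z population standard
    deviations. The rule and the default z are illustrative assumptions."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for j in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        d = [dists[i] for i in idx]
        if len(d) < 2:
            continue  # a one-point cluster has no spread to compare against
        cutoff = statistics.mean(d) + z * statistics.pstdev(d)
        flagged.extend(i for i in idx if dists[i] > cutoff)
    return sorted(flagged)


def cluster_to_digit(labels, true_digits):
    """Name each cluster after the most common true digit among its
    members (majority vote, an assumed labeling scheme)."""
    votes = {}
    for lab, y in zip(labels, true_digits):
        votes.setdefault(lab, Counter())[y] += 1
    return {lab: c.most_common(1)[0][0] for lab, c in votes.items()}


def accuracy(labels, true_digits, mapping):
    """Fraction of points whose cluster's digit matches the true digit."""
    hits = sum(mapping[lab] == y for lab, y in zip(labels, true_digits))
    return hits / len(true_digits)


# Usage: ten points in one cluster, the last one far from the centroid.
pts = [[1, 0]] * 9 + [[10, 0]]
print(flag_outliers(pts, [[0, 0]], [0] * 10, z=2.0))

labels, truth = [0, 0, 1, 1, 1], [3, 3, 7, 7, 1]
mapping = cluster_to_digit(labels, truth)
print(mapping, accuracy(labels, truth, mapping))
```

One caveat of a mean-plus-`z`-standard-deviations rule: with the population standard deviation, no point in a cluster of n points can sit more than sqrt(n - 1) deviations from the mean. A cutoff of `z = 3` can therefore only fire in clusters of 11 or more points. This is not a concern for MNIST-sized clusters, but it matters in small tests.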

Authors

  • Nick Regas
  • Lucas Selvik

Languages

  • MATLAB 100.0%