Welcome to the main project repository for UuAP Case Study 1. This project focuses on classification and machine learning analysis, using purely data-driven k-mer frequency extraction over numerical string segments.
Course: Introduction to Data Analysis
Institution: IPI Academy Tuzla
Semester: Spring 2026
Important: The datasets used in this project were downloaded from the NCBI (National Center for Biotechnology Information) platform in FASTA format and serve exclusively for data analysis exercises. All copyrights and ownership of the genomic sequences belong entirely to NCBI.
This repository does not represent a validated biological research project, nor does it aim to establish genuine biological conclusions. The objective is simply to experiment with mathematical classifiers, sequence feature extraction, pattern matching, and analytical visualization entirely within a computational context.
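As a rough illustration of the k-mer frequency extraction mentioned above, here is a minimal sketch. It is not taken from lab_pipeline.py: the function name `kmer_frequencies`, the default `k=3`, and the `ACGT` alphabet are assumptions chosen for the example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed k-mer order.

    Hypothetical helper for illustration; not the repository's actual code.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all possible k-mers so every sequence gets the same feature layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


freqs = kmer_frequencies("ACGTACGT", k=2)
```

Each sequence is mapped to a vector of length 4^k (16 entries for k=2), so sequences of different lengths can be fed to the same classifier. Normalizing by the total window count keeps long and short sequences comparable.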
lab_frog_dna/: Main analytics directory.
- Contains Python pipelines (lab_pipeline.py) used for algorithm evaluation and metric visualizations.
- View the lab_frog_dna/README.md for a comprehensive visual gallery of our machine learning model accuracies and generated PCA charts.