KNN classification

Implementations
Part 1
Published: April 11, 2023

Introduction

You and your (optional) group will implement KNN classification. You should feel free to copy whatever code is relevant from your KNN regression implementation.

I encourage you to work together, though each person should still submit their own individual document to Canvas!

Discuss

With your group, discuss which parts of the KNN regression code you will need to modify, and how, in order to implement classification. Once you feel ready, clone the GitHub project called knn_classification and work in the file knn_classification_implementation.Rmd.

Implement

Your final implementation must be a function called .knn_class() (note the leading period) that takes the minimal set of inputs it needs and returns a vector of predicted labels for the test data. You may assume that the response variable is of type factor. Your code must be as reproducible as possible, and it should handle ties when predicting a label by randomly choosing one of the tied labels!
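To make the tie-handling requirement concrete, here is a minimal sketch of a majority vote with random tie-breaking. The helper name vote_with_ties() and its single argument are hypothetical, not part of the assignment, and this is only one of several reasonable ways to break ties.

```r
# Hypothetical helper (not part of the assignment): majority vote over the
# labels of the K nearest neighbors, breaking ties uniformly at random.
vote_with_ties <- function(neighbor_labels) {
  counts <- table(neighbor_labels)               # count each label
  tied <- names(counts)[counts == max(counts)]   # label(s) with the top count
  # Only draw from the random number generator when a tie actually occurs.
  if (length(tied) == 1) tied else sample(tied, 1)
}

set.seed(2)
vote_with_ties(factor(c("Adelie", "Adelie", "Gentoo", "Chinstrap")))  # clear winner
vote_with_ties(factor(c("Adelie", "Adelie", "Gentoo", "Gentoo")))     # two-way tie
```

Whether the function calls sample() only on ties, or on every test point, changes how the random number stream is consumed, so two correct implementations can disagree on tied test points even with the same seed.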

Deliverable

Once you’ve finished your implementation, check it by seeing whether you get the same predicted labels as I do below, using the following seed, choice of K, and train/test sets from the palmerpenguins dataset. You may need to install the package in your console first with install.packages("palmerpenguins")!

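library(palmerpenguins)
library(dplyr)  # provides %>%, filter(), and select()

# Keep the 2009 penguins, the response, and two predictors; drop incomplete rows
penguins2 <- penguins %>%
  filter(year == 2009) %>%
  select(species, bill_depth_mm, flipper_length_mm) %>%
  na.omit()

set.seed(2)  # seed for reproducibility
K <- 4
n <- nrow(penguins2)
train_ids <- sample(1:n, n*0.8)  # 80% of the rows go to the training set
train_dat <- penguins2[train_ids,]
test_dat <- penguins2[-train_ids,]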
pred_y
 [1] "Chinstrap" "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [7] "Gentoo"    "Adelie"    "Adelie"    "Adelie"    "Gentoo"    "Gentoo"   
[13] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[19] "Gentoo"    "Chinstrap" "Adelie"    "Adelie"    "Adelie"    "Adelie"   

Note

Our predictions may differ slightly if we implemented the random sampling used to break ties differently. However, you should at least have the same predictions as I do for the following indices of the test data:

 [1]  2  3  4  5  6  8  9 10 11 12 13 14 15 16 17 18 19 22 24
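One way to run this check in code is sketched below. The vector expected is transcribed from the predictions printed above and stable_ids from the indices in the note; the function name matches_expected() is a hypothetical helper, and your own prediction vector is assumed to be in the same order as the rows of test_dat.

```r
# Predictions printed above, transcribed in order (24 test points).
expected <- c("Chinstrap", rep("Adelie", 5), "Gentoo", rep("Adelie", 3),
              rep("Gentoo", 9), "Chinstrap", rep("Adelie", 4))

# Indices from the note where predictions should agree.
stable_ids <- c(2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 22, 24)

# Hypothetical helper: TRUE if `pred` agrees with `expected` at every stable index.
# as.character() lets this work whether `pred` is a factor or a character vector.
matches_expected <- function(pred) {
  all(as.character(pred)[stable_ids] == expected[stable_ids])
}

matches_expected(expected)  # sanity check on the transcription itself
```

With your own predictions stored in pred_y, the call would be matches_expected(pred_y).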

Next, re-create the following table of predicted versus true labels (test_y here denotes the true species labels of the test set), and use code to compute and report the test misclassification error:

table(preds = pred_y, true = test_y)
           true
preds       Adelie Chinstrap Gentoo
  Adelie         8         4      0
  Chinstrap      1         1      0
  Gentoo         1         0      9
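As a sketch of how the misclassification error relates to a table like this one, the code below rebuilds the counts shown above as a matrix (the object names conf_mat and misclass_rate are illustrative) and takes the share of off-diagonal entries. Given the label vectors themselves, mean(pred_y != test_y) should give the same quantity, assuming both vectors are in the same order.

```r
# Counts copied from the table above: rows = predicted, columns = true.
species <- c("Adelie", "Chinstrap", "Gentoo")
conf_mat <- matrix(
  c(8, 4, 0,
    1, 1, 0,
    1, 0, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(preds = species, true = species)
)

# Correct predictions sit on the diagonal; everything else is a misclassification.
misclass_rate <- 1 - sum(diag(conf_mat)) / sum(conf_mat)
misclass_rate
```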

Submission

Once you’ve finished, push your changes to GitHub. Upload a PDF of your implementation and your confirmatory “code check” to Canvas.