4/6/23
We will have an implementation assignment this week (most likely beginning in class tomorrow)
Still need to hear about project partners!
In the case of binary response, it is common to create a confusion matrix, from which we can obtain the misclassification rate and other rates of interest
FP = "false positive", FN = "false negative", TP = "true positive", TN = "true negative"
Can calculate the overall error/misclassification rate: the proportion of observations that we misclassified
pred | true |
---|---|
0 | 0 |
1 | 1 |
0 | 1 |
1 | 0 |
1 | 1 |
1 | 1 |
0 | 0 |
0 | 0 |
1 | 1 |
0 | 1 |
10 observations, with true class and predicted class
Make a confusion matrix!
 | True = 0 | True = 1 |
---|---|---|
Pred = 0 | 3 | 2 |
Pred = 1 | 1 | 4 |
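As a quick check, here is a minimal Python sketch (using scikit-learn; the package choice is mine, any tabulation would do) that rebuilds this confusion matrix from the ten observations and computes the misclassification rate:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# The ten (pred, true) pairs from the table above
pred = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 0])
true = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 1])

# scikit-learn's convention puts true classes on the rows and
# predicted classes on the columns (the transpose of the table above)
cm = confusion_matrix(true, pred)
print(cm)                          # [[3 1]
                                   #  [2 4]]

# Misclassification rate: proportion of observations misclassified
print((pred != true).mean())       # 0.3
```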
False positive rate (FPR): fraction of negative observations incorrectly classified as positive
Number of failures/negatives in data = \(\text{FP} + \text{TN}\)
\(\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}\)
False negative rate (FNR): fraction of positive observations incorrectly classified as negative
Number of successes/positives in data = \(\text{FN} + \text{TP}\)
\(\text{FNR} = \frac{\text{FN}}{\text{FN} + \text{TP}}\)
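These rates come straight from the four counts; a small sketch (the function name is mine, for illustration):

```python
def error_rates(tn, fp, fn, tp):
    """Return (FPR, FNR) from the four confusion-matrix counts."""
    fpr = fp / (fp + tn)   # fraction of true negatives called positive
    fnr = fn / (fn + tp)   # fraction of true positives called negative
    return fpr, fnr

# Counts from the 10-observation example above: TN=3, FP=1, FN=2, TP=4
print(error_rates(tn=3, fp=1, fn=2, tp=4))  # (0.25, 0.333...)
```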
I fit a logistic regression to predict `present` using predictors `Topo` and `WatrCont`, and obtained the following confusion matrix on the training data:
 | True = 0 | True = 1 |
---|---|---|
Pred = 0 | 16 | 5 |
Pred = 1 | 5 | 44 |
What is the misclassification rate?
What is the FNR? What is the FPR?
Misclassification rate: (5 + 5)/70 = 10/70 ≈ 0.143
FNR: 5/49 ≈ 0.102
FPR: 5/21 ≈ 0.238
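For illustration, a hedged sketch of fitting such a model in Python with scikit-learn. The data frame here is a synthetic stand-in; the real course data and variable coding may differ (e.g., `Topo` is treated as numeric below):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Hypothetical stand-in for the course data: a binary 'present'
# response with numeric predictors 'Topo' and 'WatrCont'
rng = np.random.default_rng(1)
mites = pd.DataFrame({
    "Topo": rng.normal(size=70),
    "WatrCont": rng.normal(size=70),
})
mites["present"] = (mites["WatrCont"] + rng.normal(size=70) > 0).astype(int)

X = mites[["Topo", "WatrCont"]]
y = mites["present"]

model = LogisticRegression().fit(X, y)
pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)  # threshold 0.5
print(confusion_matrix(y, pred))  # rows = true class, cols = predicted
```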
Is a false positive or a false negative worse? Depends on the context!
The previous confusion matrix was produced by classifying an observation as present if \(\hat{p}(x) = \widehat{\Pr}(\text{present} = 1 \mid \text{Topo}, \text{WatrCont}) \geq 0.5\)
Here, 0.5 is the threshold for assigning an observation to the “present” class
Can change threshold to any value in \([0,1]\), which will affect resulting error rates
Overall error rate minimized at threshold near 0.50
How should we choose a threshold? Is there a way to obtain a "threshold-free" measure of model performance?
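One way to see the threshold's effect is to sweep it over a grid and recompute the error rates at each value; a minimal sketch continuing the hypothetical `model`, `X`, `y` from the earlier sketch:

```python
import numpy as np

p_hat = model.predict_proba(X)[:, 1]        # fitted probabilities
for t in np.arange(0.1, 1.0, 0.1):
    pred = (p_hat >= t).astype(int)
    fp = ((pred == 1) & (y == 0)).sum()     # false positives at this threshold
    fn = ((pred == 0) & (y == 1)).sum()     # false negatives at this threshold
    err = (pred != y).mean()                # overall error rate
    print(f"threshold={t:.1f}  FPR={fp / (y == 0).sum():.3f}  "
          f"FNR={fn / (y == 1).sum():.3f}  overall={err:.3f}")
```

Raising the threshold trades false positives for false negatives, which is why no single threshold is "best" in every context.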
The ROC curve is a measure of performance for binary classification at various thresholds
The ROC curve plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis, with one point for each possible threshold
The area under the ROC curve (AUC) summarizes performance across all thresholds: an excellent model has AUC near 1 (i.e., near-perfect separability), a terrible model has AUC near 0 (it almost always predicts the wrong class), and a model that guesses at random has AUC near 0.5
ROC and AUC for the training data:
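A sketch of how the ROC curve and AUC could be computed and plotted with scikit-learn (again continuing the hypothetical `model`, `X`, `y`):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

p_hat = model.predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(y, p_hat)   # one point per threshold
auc = roc_auc_score(y, p_hat)

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")     # random-guessing reference line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```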
While logistic regression is nice (and still frequently used), the basic model only accommodates binary responses. What if we have a response with more than two classes?
We will now see how we can use KNN for classification
Finding the neighbor sets proceeds exactly the same as in KNN regression! The difference lies in how we predict \(\hat{y}\)
This plot is similar to the one from the KNN regression slides, where now points (plotted in standardized predictor space) are colored by `present` status instead of `abundance`.
Two test points, which we’d like to classify using KNN with \(K = 3\)
Discuss: which class labels would you predict for test points 1 and 2, and why?
KNN classification is easily extended to more than two classes!
We still follow the majority vote approach: simply predict the class that has the highest representation in the neighbor set
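A from-scratch sketch of majority-vote KNN classification (the names and toy data are mine, purely for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Majority-vote KNN: predict the most common class among the k
    training points closest to x_new (Euclidean distance).
    X_train, y_train are NumPy arrays; works for any number of classes."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nbrs = np.argsort(dists)[:k]            # indices of the k closest points
    return Counter(y_train[nbrs]).most_common(1)[0][0]

# Tiny demo with made-up standardized points and three classes
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [-1.0, -1.2]])
y = np.array(["A", "A", "B", "B", "C"])
print(knn_classify(X, y, np.array([0.2, 0.1]), k=3))  # "A"
```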
palmerPenguins data: size measurements for adult foraging Adélie, Chinstrap, and Gentoo penguins
We will classify penguin species by size measurements!
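A sketch of what this could look like in Python, assuming the `palmerpenguins` package and its `load_penguins()` helper; the particular feature columns are my choice:

```python
from palmerpenguins import load_penguins
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

penguins = load_penguins().dropna()
X = penguins[["bill_length_mm", "bill_depth_mm",
              "flipper_length_mm", "body_mass_g"]]
y = penguins["species"]

# Standardize first: KNN distances are meaningless across unscaled units
X_std = StandardScaler().fit_transform(X)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_std, y)
print(knn.score(X_std, y))   # training accuracy
```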
An issue we may encounter in KNN classification is a tie in the majority vote
Discuss: how would you handle ties in the neighbor set?
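One possible answer (not the only one): drop the farthest neighbor and re-vote, shrinking the neighborhood until a strict majority appears. A sketch in the style of `knn_classify` above:

```python
import numpy as np
from collections import Counter

def knn_classify_tiebreak(X_train, y_train, x_new, k=3):
    """Majority-vote KNN that breaks ties by dropping the farthest
    neighbor and re-voting, i.e. shrinking k until a strict winner exists."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    order = np.argsort(dists)
    while k >= 1:
        votes = Counter(y_train[order[:k]]).most_common()
        if len(votes) == 1 or votes[0][1] > votes[1][1]:
            return votes[0][0]               # strict majority found
        k -= 1                               # shrink the neighborhood
```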