Classification Trees

4/18/23

Housekeeping

  • Lab 05 due this Thursday!
  • Project proposals due this Sunday to Canvas!

Classification Trees

Classification trees

  • Interpretation of the tree is the same as for a regression tree!

  • We will end up with paths to regions \(R_{1}, R_{2}, \ldots, R_{T}\), where \(T\) is the number of terminal nodes (leaves)

  • What do you think we will predict for \(\hat{y}\) now in a classification task?

    • Recall: in regression trees, the prediction \(\hat{y}\) for an observation that falls into a given region \(R_{m}\) is the average of the training responses in that \(R_{m}\)
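
For classification trees, the usual prediction is the most commonly occurring class among the training observations in the region, i.e. a majority vote. A minimal Python sketch of that rule (the helper name is mine, not from the lecture):

    from collections import Counter

    def predict_region(labels):
        """Predict the most common class among a region's training labels."""
        return Counter(labels).most_common(1)[0][0]

    print(predict_region(["A", "A", "B"]))  # majority vote: prints A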

Building the tree

  • Just like in regression trees, we will use recursive binary splitting to grow the tree

  • Top-down, greedy approach that makes the “best” split at that moment in time

  • In regression trees, we used the residual sum of squares (RSS) to quantify “best”, but we cannot use that here!

  • What might we use instead to quantify “best” to decide each split?

  • Consider the fraction of training observations in the region that do not belong to the most common class: \[E = 1 - \max_{k}(\hat{p}_{mk}),\] where \(\hat{p}_{mk}\) is the proportion of training observations in the region \(R_{m}\) that are from class \(k\)

  • Does smaller or larger \(E\) correspond to a “good” split?

  • Unfortunately, this classification error rate is not sufficiently sensitive for tree-growing, so in practice two other measures are preferable
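
As a quick numeric check of the error-rate criterion, here is a small Python sketch (hypothetical helper, not from the lecture) computing \(E = 1 - \max_{k}(\hat{p}_{mk})\) for one region:

    from collections import Counter

    def error_rate(labels):
        """Classification error rate E = 1 - max_k p_mk, where p_mk is the
        proportion of the region's observations from class k."""
        counts = Counter(labels)
        return 1 - max(counts.values()) / len(labels)

    # A region with 2 A's, 2 B's, and 5 C's: E = 1 - 5/9 = 4/9 ≈ 0.44
    print(error_rate(list("AABBCCCCC")))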

Gini Index

The Gini index is a measure of the total variance across the \(K\) classes

\[G_{m} = \sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})\]

  • \(G_{m}\) is small if all the \(\hat{p}_{mk}\)’s are close to zero or one

  • For this reason, the Gini index is referred to as a measure of node purity

  • A small \(G_{m}\) indicates that the node contains observations predominantly from a single class

  • Example: 3 classes (A, B, C) and 9 observations in each of three regions

        Region1 Region2 Region3
      1       A       A       A
      2       A       A       A
      3       A       B       A
      4       A       B       B
      5       A       C       B
      6       A       C       B
      7       A       C       C
      8       A       C       C
      9       A       C       C
  • In these three regions, what are the Gini indices \(G_{1}, G_{2}, G_{3}\)?

    • \(G_{1} = 1(1-1) + 0(1-0) + 0(1-0) = 0\)
    • \(G_{2} = \frac{2}{9}(1-\frac{2}{9}) +\frac{2}{9}(1-\frac{2}{9}) + \frac{5}{9}(1-\frac{5}{9}) = \frac{48}{81} \approx 0.59\)
    • \(G_{3} = \frac{1}{3}(1-\frac{1}{3}) + \frac{1}{3}(1-\frac{1}{3})+\frac{1}{3}(1-\frac{1}{3}) = \frac{2}{3} \approx 0.67\)
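
These values are easy to verify numerically; a minimal Python sketch (the function name is mine) computing the Gini index of each region above:

    from collections import Counter

    def gini(labels):
        """Gini index G_m = sum_k p_mk * (1 - p_mk)."""
        n = len(labels)
        return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

    regions = {"R1": list("AAAAAAAAA"),
               "R2": list("AABBCCCCC"),
               "R3": list("AAABBBCCC")}
    for name, labels in regions.items():
        print(name, round(gini(labels), 2))  # R1 0.0, R2 0.59, R3 0.67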

Entropy

An alternative to the Gini index is the cross-entropy:

\[D_{m} = -\sum_{k=1}^{K} \hat{p}_{mk} \log\hat{p}_{mk}\]

  • Numerically very similar to the Gini index, so cross-entropy is also a measure of node purity
  • In these three regions, what are the cross-entropies \(D_{1}, D_{2}, D_{3}\)?

        Region1 Region2 Region3
      1       A       A       A
      2       A       A       A
      3       A       B       A
      4       A       B       B
      5       A       C       B
      6       A       C       B
      7       A       C       C
      8       A       C       C
      9       A       C       C
    • \(D_{1} = -(1\log 1) = 0\), using the convention that terms with \(\hat{p}_{mk} = 0\) contribute \(0\)
    • \(D_{2} = -(\frac{2}{9}\log\frac{2}{9} +\frac{2}{9}\log\frac{2}{9} + \frac{5}{9}\log\frac{5}{9}) \approx 1\)
    • \(D_{3} = -(\frac{1}{3}\log\frac{1}{3} + \frac{1}{3}\log\frac{1}{3}+\frac{1}{3}\log\frac{1}{3}) = \log 3 \approx 1.1\)
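
The same numeric check works for cross-entropy, here using the natural log (the values above assume it, and terms with \(\hat{p}_{mk} = 0\) contribute 0); a small sketch along the lines of the Gini code above:

    import math
    from collections import Counter

    def entropy(labels):
        """Cross-entropy D_m = -sum_k p_mk * log(p_mk); classes absent
        from the region contribute 0 (convention 0 * log 0 = 0)."""
        n = len(labels)
        return sum((c / n) * -math.log(c / n) for c in Counter(labels).values())

    regions = {"R1": list("AAAAAAAAA"),
               "R2": list("AABBCCCCC"),
               "R3": list("AAABBBCCC")}
    for name, labels in regions.items():
        print(name, f"{entropy(labels):.3f}")  # R1 0.000, R2 0.995, R3 1.099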

Exercise

Work through building a mini classification tree!
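
If you want to check your hand-built tree afterwards, here is a minimal sketch assuming scikit-learn is available (the toy data and feature name are mine, not the lecture's); it fits a small classification tree with the Gini criterion and prints the splits:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy training data: one predictor, two classes (not from the lecture)
    X = [[1], [2], [3], [6], [7], [8]]
    y = ["A", "A", "A", "B", "B", "B"]

    tree = DecisionTreeClassifier(criterion="gini", max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["x"]))  # split lands at x <= 4.5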