4/18/23
Interpretation of the tree is the same as for a regression tree!
Will end up with paths to regions \(R_{1}, R_{2}, \ldots, R_{T}\) where \(T\) is the number of terminal nodes or leaves
What do you think we will predict for \(\hat{y}\) now in a classification task?
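For \(\hat{y}\), a classification tree predicts the most commonly occurring class among the training observations in the region. A minimal Python sketch (the region_labels values are invented for illustration):

```python
from collections import Counter

# Invented class labels of the training observations that fall in one region R_m
region_labels = ["A", "A", "B", "A", "C", "A"]

# Predict the most commonly occurring (majority) class in the region
y_hat = Counter(region_labels).most_common(1)[0][0]
print(y_hat)  # A
```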
Just like in regression trees, we will use recursive binary splitting to grow the tree
Top-down, greedy approach that makes the “best” split at that moment in time
In regression trees, we used residual sum of squares to quantify “best”, but we cannot use that here!
What might we use instead to quantify “best” to decide each split?
Consider the classification error rate, the fraction of training observations in the region that do not belong to the most common class: \[E = 1 - \max_{k}(\hat{p}_{mk}),\] where \(\hat{p}_{mk}\) is the proportion of training observations in region \(R_{m}\) that are from class \(k\)
Does smaller or larger \(E\) correspond to a “good” split?
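A sketch of how \(E\) could be computed for one region (the example labels are invented for illustration):

```python
def classification_error(labels):
    """Classification error rate E = 1 - max_k(p_hat_mk) for one region."""
    n = len(labels)
    return 1 - max(labels.count(k) / n for k in set(labels))

# Invented region: the most common class C covers 5 of 9 observations
print(classification_error(list("AABBCCCCC")))  # 1 - 5/9 ≈ 0.444
```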
Unfortunately, classification error is not sufficiently sensitive for tree-growing
The Gini index is a measure of the total variance across the \(K\) classes
\[G_{m} = \sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})\]
\(G_{m}\) is small if all the \(\hat{p}_{mk}\)’s are close to zero or one
For this reason, the Gini index is referred to as a measure of node purity
A small \(G_{m}\) indicates that the node contains predominantly observations from a single class
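A minimal Python sketch of the Gini index for a single region, mirroring the formula above (the example labels are invented):

```python
def gini_index(labels):
    """Gini index G_m = sum_k p_hat_mk * (1 - p_hat_mk) for one region."""
    n = len(labels)
    return sum((labels.count(k) / n) * (1 - labels.count(k) / n)
               for k in set(labels))

print(gini_index(list("AAAAAAAAA")))  # 0.0: a pure node
print(gini_index(list("AAABBBCCC")))  # ≈ 0.667: evenly mixed over three classes
```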
Obs  Region1  Region2  Region3
1    A        A        A
2    A        A        A
3    A        B        A
4    A        B        B
5    A        C        B
6    A        C        B
7    A        C        C
8    A        C        C
9    A        C        C
In these three regions, what are the Gini indices \(G_{1}, G_{2}, G_{3}\)?
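As a worked check (not part of the original exercise), reusing the gini_index sketch from above on the three regions:

```python
regions = {
    "Region1": list("AAAAAAAAA"),
    "Region2": list("AABBCCCCC"),
    "Region3": list("AAABBBCCC"),
}
for name, labels in regions.items():
    print(name, round(gini_index(labels), 3))
# Region1 0.0    (pure node)
# Region2 0.593  (= 48/81)
# Region3 0.667  (= 2/3, evenly split over three classes)
```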
An alternative to the Gini index is the cross-entropy:
\[D_{m} = -\sum_{k=1}^{K} \hat{p}_{mk} \log\hat{p}_{mk}\]
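A matching Python sketch of the cross-entropy for one region, using the natural log:

```python
import math

def cross_entropy(labels):
    """Cross-entropy D_m = -sum_k p_hat_mk * log(p_hat_mk) for one region."""
    n = len(labels)
    # Only classes present in the region are summed; absent classes
    # contribute 0 under the convention 0 * log(0) = 0
    return 0.0 - sum((labels.count(k) / n) * math.log(labels.count(k) / n)
                     for k in set(labels))
```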
Obs  Region1  Region2  Region3
1    A        A        A
2    A        A        A
3    A        B        A
4    A        B        B
5    A        C        B
6    A        C        B
7    A        C        C
8    A        C        C
9    A        C        C
In these three regions, what are the cross-entropies \(D_{1}, D_{2}, D_{3}\)?
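As a worked check (again, not part of the original exercise), applying the cross_entropy sketch from above to the same three regions:

```python
regions = {
    "Region1": list("AAAAAAAAA"),
    "Region2": list("AABBCCCCC"),
    "Region3": list("AAABBBCCC"),
}
for name, labels in regions.items():
    print(name, round(cross_entropy(labels), 3))
# Region1 0.0    (pure node)
# Region2 0.995
# Region3 1.099  (= ln 3, evenly split over three classes)
```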
Work through building a mini classification tree!
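One way to follow along on a computer is scikit-learn's DecisionTreeClassifier, which grows a tree by top-down recursive binary splitting using the Gini index (or entropy) as the criterion; the toy dataset below is invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: two numeric features per observation and a class label
X = [[1, 1], [1, 2], [2, 1], [4, 4], [4, 5], [5, 4], [7, 1], [7, 2], [8, 1]]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

# Grow a small tree with greedy recursive binary splitting on the Gini index
tree = DecisionTreeClassifier(criterion="gini", max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))
print(tree.predict([[4.5, 4.5]]))  # ['B']
```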