Live code:
Set-up
Recall that in our mite_dat, we have the following three categorical predictors:
- Shrub, which takes values “None”, “Few”, and “Many”
- Topo, which takes values “Hummock” and “Blanket”
- Substrate, which takes values “Sphagn1”, “Sphagn2”, “Sphagn3”, “Sphagn4”, “Litter”, “Barepeat”, “Interface”
We would like to be able to convert these categorical predictors into quantitative ones in order to compute distances.
Integer encoding
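As a minimal sketch of integer encoding, the example below uses a small made-up tibble, toy_dat, rather than the real mite_dat; the rows and the new column name Shrub_int are assumptions for illustration only. It relies on the fact that as.integer() applied to a factor returns the position of each level, so the level order given in factor() determines the codes.

```r
library(dplyr)

# Hypothetical stand-in for mite_dat: only the Shrub column, with made-up rows
toy_dat <- tibble(
  Shrub = factor(c("Few", "None", "Many", "Few"),
                 levels = c("None", "Few", "Many"), ordered = TRUE)
)

# as.integer() on a factor gives the index of each level:
# None -> 1, Few -> 2, Many -> 3
toy_int <- toy_dat %>%
  mutate(Shrub_int = as.integer(Shrub))

toy_int
```

Note that this encoding treats the gap between “None” and “Few” as equal to the gap between “Few” and “Many”, which is a modelling assumption rather than something the data guarantee.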
One-hot encoding (few levels)
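When a variable has only a couple of levels, one if_else() per level is manageable. The sketch below illustrates this for Topo on a small made-up tibble, toy_dat; the rows and the new column names Topo_Hummock and Topo_Blanket are assumptions for illustration, not taken from the real data.

```r
library(dplyr)

# Hypothetical stand-in for mite_dat: only the Topo column, with made-up rows
toy_dat <- tibble(
  Topo = factor(c("Hummock", "Blanket", "Hummock"))
)

# One indicator (dummy) variable per level: 1 if the row has that level, else 0
toy_onehot <- toy_dat %>%
  mutate(Topo_Hummock = if_else(Topo == "Hummock", 1, 0),
         Topo_Blanket = if_else(Topo == "Blanket", 1, 0))

toy_onehot
```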
One-hot encoding (many levels)
The Substrate variable has 7 levels! We could write 7 different if_else() statements, but that seems rather inefficient…
Instead, we will make clever use of the pivot_wider() function. In the code below:
- Line 2: create a new place-holder variable, value, that gives us the mechanism to create dummy variables
- Line 3: use pivot_wider() to create new variables, one for each level of Substrate. Each new variable gets its value from value (i.e. a 1) if the original Substrate variable belonged to that level.
You should notice that we get a lot of NA values! We just need to replace those NAs with 0s. In the code below:
- Line 4: use the values_fill argument to specify that NAs should be 0s
- Line 5: modify the names of our new variables to more clearly indicate that they correspond to the same original variable 
mite_dat <- mite_dat %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = Substrate, values_from = value, 
              values_fill = 0,
              names_prefix = "Sub_")
mite_dat %>%
  slice(1:6)
# A tibble: 6 × 12
  SubsDens WatrCont Shrub Topo   abund…¹ Sub_S…² Sub_L…³ Sub_I…⁴ Sub_S…⁵ Sub_S…⁶
     <dbl>    <dbl> <ord> <fct>    <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     39.2     350. Few   Hummo…       0       1       0       0       0       0
2     55.0     435. Few   Hummo…       0       0       1       0       0       0
3     46.1     372. Few   Hummo…       0       0       0       1       0       0
4     48.2     360. Few   Hummo…       0       1       0       0       0       0
5     23.6     204. Few   Hummo…       0       1       0       0       0       0
6     57.3     312. Few   Hummo…       0       1       0       0       0       0
# … with 2 more variables: Sub_Sphagn2 <dbl>, Sub_Barepeat <dbl>, and
#   abbreviated variable names ¹abundance, ²Sub_Sphagn1, ³Sub_Litter,
#   ⁴Sub_Interface, ⁵Sub_Sphagn3, ⁶Sub_Sphagn4
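To see the effect of values_fill in isolation, here is a small sketch of the same pivot_wider() approach on a made-up tibble, toy_dat (the rows are hypothetical and only two Substrate levels are used). The first pipeline omits values_fill and so leaves NAs; the second fills them with 0 and adds the Sub_ prefix.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for mite_dat with made-up rows
toy_dat <- tibble(
  WatrCont  = c(350, 435, 372),
  Substrate = factor(c("Sphagn1", "Litter", "Sphagn1"))
)

# Without values_fill: a row gets NA for every level it does not belong to
toy_na <- toy_dat %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = Substrate, values_from = value)

# With values_fill = 0 the NAs become 0s, and names_prefix labels the new columns
toy_wide <- toy_dat %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = Substrate, values_from = value,
              values_fill = 0,
              names_prefix = "Sub_")

toy_na
toy_wide
```

One caveat: pivot_wider() uses the remaining columns to identify rows, so this works cleanly here because every row of toy_dat differs in WatrCont. If a dataset could contain rows that are identical in all other columns, adding a row identifier first (for example with row_number()) is a safer choice.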