
.title[
# Visualizing logistic regression results for non-technical audiences
]
.author[
### Abby Kaplan and Keiko Cawley
]
.date[
### September 20, 2022
]

---

# GitHub
### https://github.com/keikcaw/visualizing-logistic-regression

---

# Logistic regression review

---
# Logistic regression: Binary outcomes

- Use logistic regression to model a binary outcome

- Examples from higher education:

  - Did the student pass the class?

  - Did the student enroll for another term?

  - Did the student graduate?

---

# The design of logistic regression

- We want to model the probability that the outcome happened

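- But probabilities are bounded between 0 and 1, so a linear combination of predictors could fall outside the valid range

- Instead, we model the logit (log odds) of the probability, which is unbounded: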

$$
\mbox{logit}(p) = \log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n
$$
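
- A minimal sketch of the logit and its inverse in base R. `qlogis()` and `plogis()` are the logistic quantile and distribution functions from the stats package, which coincide with the logit and inverse logit; the probabilities are arbitrary illustrations.

```r
# Probabilities chosen only for illustration
p = c(0.10, 0.50, 0.90)

# logit(p) = log(p / (1 - p)); qlogis() computes the same quantity
log.odds = log(p / (1 - p))
all.equal(log.odds, qlogis(p))

# The inverse logit maps any real number back into (0, 1)
plogis(log.odds)
```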

---

# What's the problem?

---

# Just tell me "the" effect

- Stakeholders often want to know whether something affects outcomes, and by how much


- But we don't model probabilities directly

$$
\boxed{\log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right)} = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n
$$

- We can solve for _p_:

$$
`\begin{aligned}
p & = \mbox{logit}^{-1}(\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n) \\ & \\
& = \frac{e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}{1 + e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}
\end{aligned}`
$$

---

# "The" effect is nonlinear in _p_

$$
`\begin{aligned}
p & = \frac{e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}{1 + e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}
\end{aligned}`
$$
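
- A small sketch of what "nonlinear in _p_" means in practice: the same change in log odds produces different changes in probability depending on the starting point. The change of 1 and the two starting values are hypothetical.

```r
# Same change in log odds (hypothetical value), two different starting points
delta = 1
start.log.odds = c(0, 3)

p.before = plogis(start.log.odds)
p.after = plogis(start.log.odds + delta)

# Change in probability, in percentage points, at each starting point
data.frame(p.before, p.after, change.pp = 100 * (p.after - p.before))
```

- The change is larger for the starting point nearer 50%, where the inverse logit curve is steepest.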

---


# Sample dataset and model

---

# Dataset

- Our simulated dataset describes students who took Balloon Animal-Making 201 at University Imaginary

<table class="table table-striped table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Variable </th>
   <th style="text-align:left;"> Possible Responses </th>
   <th style="text-align:left;"> Variable Type </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Mac user </td>
   <td style="text-align:left;"> TRUE/FALSE </td>
   <td style="text-align:left;"> binary </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Wear glasses </td>
   <td style="text-align:left;"> TRUE/FALSE </td>
   <td style="text-align:left;"> binary </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Pet type </td>
   <td style="text-align:left;"> dog, cat, fish, none </td>
   <td style="text-align:left;"> categorical </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Favorite color </td>
   <td style="text-align:left;"> blue, red, green, orange </td>
   <td style="text-align:left;"> categorical </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Prior undergraduate GPA </td>
   <td style="text-align:left;"> 0.0-4.0 </td>
   <td style="text-align:left;"> continuous </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Height </td>
   <td style="text-align:left;"> 54-77 inches </td>
   <td style="text-align:left;"> continuous </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Went to tutoring </td>
   <td style="text-align:left;"> TRUE/FALSE </td>
   <td style="text-align:left;"> binary </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Passed </td>
   <td style="text-align:left;"> TRUE/FALSE </td>
   <td style="text-align:left;"> binary </td>
  </tr>
</tbody>
</table>

---

# Dataset

```r
library(tidyverse)
# Read the simulated course data; keep strings as character vectors for now
df = read.csv("data/course_outcomes.csv", header = TRUE, stringsAsFactors = FALSE)
```

<table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> id </th>
   <th style="text-align:left;"> mac </th>
   <th style="text-align:left;"> glasses </th>
   <th style="text-align:left;"> pet.type </th>
   <th style="text-align:left;"> favorite.color </th>
   <th style="text-align:right;"> prior.gpa </th>
   <th style="text-align:right;"> height </th>
   <th style="text-align:left;"> tutoring </th>
   <th style="text-align:left;"> passed </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> dog </td>
   <td style="text-align:left;"> red </td>
   <td style="text-align:right;"> 3.86 </td>
   <td style="text-align:right;"> 63 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> cat </td>
   <td style="text-align:left;"> green </td>
   <td style="text-align:right;"> 2.37 </td>
   <td style="text-align:right;"> 66 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> FALSE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> none </td>
   <td style="text-align:left;"> orange </td>
   <td style="text-align:right;"> 3.98 </td>
   <td style="text-align:right;"> 66 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> dog </td>
   <td style="text-align:left;"> red </td>
   <td style="text-align:right;"> 3.78 </td>
   <td style="text-align:right;"> 68 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> cat </td>
   <td style="text-align:left;"> blue </td>
   <td style="text-align:right;"> 3.73 </td>
   <td style="text-align:right;"> 67 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> dog </td>
   <td style="text-align:left;"> green </td>
   <td style="text-align:right;"> 3.99 </td>
   <td style="text-align:right;"> 61 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> none </td>
   <td style="text-align:left;"> red </td>
   <td style="text-align:right;"> 3.75 </td>
   <td style="text-align:right;"> 62 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> dog </td>
   <td style="text-align:left;"> red </td>
   <td style="text-align:right;"> 3.73 </td>
   <td style="text-align:right;"> 61 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> none </td>
   <td style="text-align:left;"> red </td>
   <td style="text-align:right;"> 4.00 </td>
   <td style="text-align:right;"> 64 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> dog </td>
   <td style="text-align:left;"> orange </td>
   <td style="text-align:right;"> 3.64 </td>
   <td style="text-align:right;"> 62 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> dog </td>
   <td style="text-align:left;"> red </td>
   <td style="text-align:right;"> 3.98 </td>
   <td style="text-align:right;"> 63 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:left;"> FALSE </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> cat </td>
   <td style="text-align:left;"> green </td>
   <td style="text-align:right;"> 3.72 </td>
   <td style="text-align:right;"> 64 </td>
   <td style="text-align:left;"> TRUE </td>
   <td style="text-align:left;"> TRUE </td>
  </tr>
</tbody>
</table>

---

# Model

- Dependent variable: did the student pass?

- Continuous variables were centered and standardized

```r
df = df %>%
  mutate(cs.prior.gpa = (prior.gpa - mean(prior.gpa)) / sd(prior.gpa),
         cs.height = (height - mean(height)) / sd(height))
```

- Reference levels for categorical variables:

  - Pet type: none

  - Favorite color: blue

```r
df = df %>%
  mutate(pet.type = fct_relevel(pet.type, "none", "dog", "cat", "fish"),
         favorite.color = fct_relevel(favorite.color, "blue", "red", "green",
                                      "orange"))
```

---

# Model

```r
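# glm() is part of base R (stats), so no extra package is needed for the fit.
# invlogit() is called on later slides but is not defined in the code shown;
# as an assumption, define it here as the standard inverse logit (stats::plogis).
invlogit = plogis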
pass.m = glm(passed ~ mac + glasses + pet.type + favorite.color + cs.prior.gpa +
               cs.height + tutoring,
             data = df, family = binomial(link = "logit"))
summary(pass.m)$coefficients
```

```
##                         Estimate Std. Error    z value      Pr(>|z|)
## (Intercept)           1.53678432 0.09665694 15.8993682  6.400698e-57
## macTRUE              -0.04070026 0.08251424 -0.4932514  6.218350e-01
## glassesTRUE           0.19330654 0.07787099  2.4823948  1.305026e-02
## pet.typedog          -0.25143138 0.08483778 -2.9636722  3.039919e-03
## pet.typecat           0.09616174 0.11927784  0.8061995  4.201278e-01
## pet.typefish         -1.19359401 0.16656361 -7.1659949  7.722363e-13
## favorite.colorred    -0.03945396 0.09265674 -0.4258078  6.702479e-01
## favorite.colorgreen  -0.38137532 0.10062190 -3.7901819  1.505370e-04
## favorite.colororange -0.24204783 0.13900517 -1.7412865  8.163337e-02
## cs.prior.gpa          1.03092175 0.03887172 26.5211237 5.531945e-155
## cs.height            -0.25908893 0.03833829 -6.7579681  1.399404e-11
## tutoringTRUE          0.22698497 0.07583279  2.9932300  2.760416e-03
```
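
- The last two columns follow from the first two. A sketch in base R, assuming the usual Wald test that `summary()` reports for a binomial `glm`; the numbers are copied from the `glassesTRUE` row.

```r
# Estimate and standard error for glassesTRUE, copied from the output above
est = 0.19330654
se = 0.07787099

z = est / se              # Wald z statistic
p = 2 * pnorm(-abs(z))    # two-sided p-value from the standard normal

c(z = z, p = p)
```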

---

# Model

```
##                         Estimate Std. Error    z value      Pr(>|z|)
## (Intercept)           1.53678432 0.09665694 15.8993682  6.400698e-57
## macTRUE              -0.04070026 0.08251424 -0.4932514  6.218350e-01
*## glassesTRUE           0.19330654 0.07787099  2.4823948  1.305026e-02
*## pet.typedog          -0.25143138 0.08483778 -2.9636722  3.039919e-03
## pet.typecat           0.09616174 0.11927784  0.8061995  4.201278e-01
*## pet.typefish         -1.19359401 0.16656361 -7.1659949  7.722363e-13
## favorite.colorred    -0.03945396 0.09265674 -0.4258078  6.702479e-01
*## favorite.colorgreen  -0.38137532 0.10062190 -3.7901819  1.505370e-04
## favorite.colororange -0.24204783 0.13900517 -1.7412865  8.163337e-02
*## cs.prior.gpa          1.03092175 0.03887172 26.5211237 5.531945e-155
*## cs.height            -0.25908893 0.03833829 -6.7579681  1.399404e-11
*## tutoringTRUE          0.22698497 0.07583279  2.9932300  2.760416e-03
```

---

# Causality disclaimer

- Some visualizations strongly imply a causal interpretation

- It's your responsibility to evaluate whether a causal interpretation is appropriate

- If the data doesn't support a causal interpretation, **don't use a visualization that implies one**

---

# Model coefficients

```r
coefs.df = summary(pass.m)$coefficients %>%
  data.frame() %>%
  rownames_to_column("parameter") %>%
  mutate(pretty.parameter =
           case_when(parameter == "(Intercept)" ~ "Intercept",
                     grepl("TRUE$", parameter) ~
                       str_to_title(gsub("TRUE", "", parameter)),
                     grepl("pet\\.type", parameter) ~
                       paste("Pet:", str_to_title(gsub("pet\\.type", "", parameter))),
                     grepl("favorite\\.color", parameter) ~
                       paste("Favorite color:",
                             str_to_title(gsub("favorite\\.color", "", parameter))),
                     parameter == "cs.prior.gpa" ~
                       paste("Prior GPA\n(", round(sd(df$prior.gpa), 1),
                             "-pt increase)", sep = ""),
                     parameter == "cs.height" ~
                       paste("Height\n(", round(sd(df$height), 1),
                             "-in increase)", sep = ""))) %>%
  dplyr::select(parameter, pretty.parameter, est = Estimate, se = Std..Error,
                z = z.value, p = Pr...z..)
```

---

# Color palette

```r
good.color = "#0571B0"
neutral.color = "gray"
bad.color = "#CA0020"
```

---

# Visualization family 1:

# Presenting model coefficients

---

# Coefficients in a table

<table class="table table-striped table-hover" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;text-align: center;"> Parameter </th>
   <th style="text-align:right;text-align: center;"> Estimate </th>
   <th style="text-align:right;text-align: center;"> Standard error </th>
   <th style="text-align:right;text-align: center;"> z </th>
   <th style="text-align:right;text-align: center;"> p </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Intercept </td>
   <td style="text-align:right;"> 1.5367843 </td>
   <td style="text-align:right;"> 0.0966569 </td>
   <td style="text-align:right;"> 15.8993682 </td>
   <td style="text-align:right;"> 0.0000000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Mac </td>
   <td style="text-align:right;"> -0.0407003 </td>
   <td style="text-align:right;"> 0.0825142 </td>
   <td style="text-align:right;"> -0.4932514 </td>
   <td style="text-align:right;"> 0.6218350 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Glasses </td>
   <td style="text-align:right;"> 0.1933065 </td>
   <td style="text-align:right;"> 0.0778710 </td>
   <td style="text-align:right;"> 2.4823948 </td>
   <td style="text-align:right;"> 0.0130503 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Pet: Dog </td>
   <td style="text-align:right;"> -0.2514314 </td>
   <td style="text-align:right;"> 0.0848378 </td>
   <td style="text-align:right;"> -2.9636722 </td>
   <td style="text-align:right;"> 0.0030399 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Pet: Cat </td>
   <td style="text-align:right;"> 0.0961617 </td>
   <td style="text-align:right;"> 0.1192778 </td>
   <td style="text-align:right;"> 0.8061995 </td>
   <td style="text-align:right;"> 0.4201278 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Pet: Fish </td>
   <td style="text-align:right;"> -1.1935940 </td>
   <td style="text-align:right;"> 0.1665636 </td>
   <td style="text-align:right;"> -7.1659949 </td>
   <td style="text-align:right;"> 0.0000000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Favorite color: Red </td>
   <td style="text-align:right;"> -0.0394540 </td>
   <td style="text-align:right;"> 0.0926567 </td>
   <td style="text-align:right;"> -0.4258078 </td>
   <td style="text-align:right;"> 0.6702479 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Favorite color: Green </td>
   <td style="text-align:right;"> -0.3813753 </td>
   <td style="text-align:right;"> 0.1006219 </td>
   <td style="text-align:right;"> -3.7901819 </td>
   <td style="text-align:right;"> 0.0001505 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Favorite color: Orange </td>
   <td style="text-align:right;"> -0.2420478 </td>
   <td style="text-align:right;"> 0.1390052 </td>
   <td style="text-align:right;"> -1.7412865 </td>
   <td style="text-align:right;"> 0.0816334 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Prior GPA (0.6-pt increase) </td>
   <td style="text-align:right;"> 1.0309217 </td>
   <td style="text-align:right;"> 0.0388717 </td>
   <td style="text-align:right;"> 26.5211237 </td>
   <td style="text-align:right;"> 0.0000000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Height (3-in increase) </td>
   <td style="text-align:right;"> -0.2590889 </td>
   <td style="text-align:right;"> 0.0383383 </td>
   <td style="text-align:right;"> -6.7579681 </td>
   <td style="text-align:right;"> 0.0000000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tutoring </td>
   <td style="text-align:right;"> 0.2269850 </td>
   <td style="text-align:right;"> 0.0758328 </td>
   <td style="text-align:right;"> 2.9932300 </td>
   <td style="text-align:right;"> 0.0027604 </td>
  </tr>
</tbody>
</table>

---

# Coefficients in a table

![blinking_meme](blinking_meme.jpg)

---

# Change in log odds

.pull-left[

```r
log.odds.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(
    pretty.parameter = fct_reorder(pretty.parameter, est),
    lower.95 = est + (qnorm(0.025) * se),
    lower.50 = est + (qnorm(0.25) * se),
    upper.50 = est + (qnorm(0.75) * se),
    upper.95 = est + (qnorm(0.975) * se),
    signif = case_when(p > 0.05 ~ "Not significant",
                       est > 0 ~ "Positive",
                       est < 0 ~ "Negative"),
    signif = fct_relevel(signif, "Positive",
                         "Not significant",
                         "Negative")
  ) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95,
                     ymax = upper.95),
                 size = 1) +
  geom_linerange(aes(ymin = lower.50,
                     ymax = upper.50),
                 size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(yintercept = 0) +
  scale_color_manual(
    "Relationship to\nlog odds of passing",
    values = c(good.color, neutral.color, bad.color)
  ) +
  labs(x = "", y = "Change in log odds",
       title = "Estimated relationships between\nstudent characteristics\nand log odds of passing") +
  coord_flip(clip = "off")
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/change_in_log_odds_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in log odds: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- The plot has a transparent relationship to the fitted model
]

---

# Change in log odds: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[

- The plot has a transparent relationship to the fitted model

- Numbers all in one place: a single scale instead of a table of numbers
]

---

# Change in log odds: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in log odds: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- What is a 0.4 change in the log odds?

- Is the change between 0.4 and 0.8 log odds "big" or "small"?

- You probably don't want to give your audience a tutorial on the inverse logit function
]

---

# Secret log odds

.pull-left[

```r
secret.log.odds.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(
    pretty.parameter = fct_reorder(pretty.parameter, est),
    lower.95 = est + (qnorm(0.025) * se),
    lower.50 = est + (qnorm(0.25) * se),
    upper.50 = est + (qnorm(0.75) * se),
    upper.95 = est + (qnorm(0.975) * se),
    signif = case_when(p > 0.05 ~ "Not significant",
                       est > 0 ~ "Positive",
                       est < 0 ~ "Negative"),
    signif = fct_relevel(signif, "Positive",
                         "Not significant",
                         "Negative")) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) +
  geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(yintercept = 0) +
  scale_y_continuous(
*   breaks = c(-1, 0, 1),
*   labels = c("← Lower",
*              "Same",
*              "Higher →")
  ) +
  scale_color_manual(
    "Relationship to\nlog odds of passing",
    values = c(good.color, neutral.color, bad.color)
  ) +
  labs(x = "", y = "Chance of passing",
       title = "Estimated relationships between\nstudent characteristics\nand chance of passing") +
  coord_flip()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/change_in_log_odds_adjusted_axis_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Secret log odds: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Secret log odds: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/change_in_log_adds_adjusted_axis_highlighted_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- No numbers for your audience to misinterpret
]

---

# Secret log odds: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Your audience might ask "where are the numbers?" anyway
]

---

# Change in odds ratio

- Your audience may be more familiar with the "odds" part of log odds

---

`$$\log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n$$`

---

`$$\log\left(\boxed{\begin{array}{c}\frac{p}{1 - p}\end{array}}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n$$`

- Can't we just exponentiate to get the odds?

`$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}$$`

- Now the effect of a coefficient is multiplicative, not additive

`$$\frac{p}{1 - p} = e^{\beta_ix_i} \cdot e^{\beta_0 + \beta_1x_1 + \ldots + \beta_{i-1}x_{i-1} + \beta_{i+1}x_{i+1} + \ldots + \beta_nx_n}$$`
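
- A sketch of the multiplicative effect, using the `tutoringTRUE` coefficient from the model summary. The two baseline log odds are hypothetical; the point is that the odds are multiplied by the same factor at both.

```r
# Tutoring coefficient from the model summary
beta.tutoring = 0.22698497
odds.multiplier = exp(beta.tutoring)   # about 1.25

# Two hypothetical baselines on the log odds scale
baseline.log.odds = c(-1, 2)
odds.without = exp(baseline.log.odds)
odds.with = exp(baseline.log.odds + beta.tutoring)

# The ratio of odds is the same at both baselines
odds.with / odds.without
```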

---

# Change in odds ratio

.pull-left[

```r
odds.ratio.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(
    pretty.parameter = fct_reorder(pretty.parameter, est),
    lower.95 = est + (qnorm(0.025) * se),
    lower.50 = est + (qnorm(0.25) * se),
    upper.50 = est + (qnorm(0.75) * se),
    upper.95 = est + (qnorm(0.975) * se),
    signif = case_when(p > 0.05 ~ "Not significant",
                       est > 0 ~ "Positive",
                       est < 0 ~ "Negative"),
    signif = fct_relevel(signif, "Positive",
                         "Not significant",
                         "Negative")
  ) %>%
* mutate(across(matches("est|lower|upper"),
*               ~ exp(.))) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95,
                     ymax = upper.95),
                 size = 1) +
  geom_linerange(aes(ymin = lower.50,
                     ymax = upper.50),
                 size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(yintercept = 1) +
  scale_y_continuous(
*   labels = scales::percent_format()
  ) +
  scale_color_manual("Relationship to\nlog odds of passing",
                     values = c(good.color, neutral.color, bad.color)) +
  labs(x = "", y = "% change in odds ratio",
       title = "Estimated relationships between\nstudent characteristics\nand odds ratio of passing") +
  coord_flip()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/odds_ratio_adjusted_axis_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in odds ratio: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Still pretty easy: a simple transformation of your model coefficients
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[

- Odds are usually stated with integers ("3-to-1" or "2-to-5"), not as "3" or "0.4"
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Odds are usually stated with integers ("3-to-1" or "2-to-5"), not as "3" or "0.4"

- The unfamiliar format may undo the benefit of using a familiar concept

- Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/odds_ratio_adjusted_axis_highlighted_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[

- Odds are usually stated with integers ("3-to-1" or "2-to-5"), not as "3" or "0.4"

- The unfamiliar format may undo the benefit of using a familiar concept

- Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds

- Percent change in odds (300% = triple the odds) might be misinterpreted as a probability
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[

- Odds are usually stated with integers ("3-to-1" or "2-to-5"), not as "3" or "0.4"

- The unfamiliar format may undo the benefit of using a familiar concept

- Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds

- Percent change in odds (300% = triple the odds) might be misinterpreted as a probability

- Now we're pretty far removed from familiar scales
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />
]

- The scale is expanded for positive effects and compressed for negative effects
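
- A numeric sketch of that asymmetry, using equal and opposite changes in log odds. The values of -1 and 1 are illustrative only.

```r
# Equal and opposite changes in log odds (illustrative values)
log.odds.change = c(-1, 1)
odds.ratio = exp(log.odds.change)

# Distance from the "no change" line at 1 on the odds ratio scale
abs(odds.ratio - 1)

# The two effects are reciprocals, not mirror images around 1
odds.ratio[1] * odds.ratio[2]
```

- Negative effects are squeezed between 0 and 1, while positive effects can extend indefinitely above 1.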

---

# Visualization family 2:

# Presenting probabilities

---

# Probabilities relative to a baseline

- Problem with probabilities: the change in percentage points depends on the baseline probability

- We can choose an appropriate baseline probability, then compute the marginal effect of a predictor given that baseline

- Options for baseline:

  - Model intercept

  - Observed outcome % in dataset (similar to intercept if continuous predictors are centered and other coefficients aren't too large)

  - Observed outcome % for a certain group (e.g., students with no tutoring)

  - Some % that's meaningful in context (e.g., 85% pass rate in typical years)
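
- A sketch of the last option: start from a baseline percentage chosen for context, move to the log odds scale, add a coefficient, and transform back. The 85% baseline is the example from the list; the coefficient is `tutoringTRUE` from the model summary.

```r
# Baseline pass rate chosen for context (the 85% example from the list)
baseline.p = 0.85
# Tutoring coefficient from the model summary
beta.tutoring = 0.22698497

# Move to the log odds scale, add the coefficient, and transform back
p.with.tutoring = plogis(qlogis(baseline.p) + beta.tutoring)

# Difference in percentage points relative to the baseline
100 * (p.with.tutoring - baseline.p)
```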

---

# Probabilities relative to a baseline

- Baseline probability: inverse logit of the intercept

`$$p_0 = \mbox{logit}^{-1}(\beta_0)$$`

- Probability with discrete predictor `$i$`: inverse logit of intercept + predictor coefficient

`$$p_i = \mbox{logit}^{-1}(\beta_0 + \beta_i)$$`

- For a continuous predictor `$j$`, pick a change in predictor value that makes sense

  - One standard deviation

  - A context-specific benchmark (e.g., 1 point for GPA, 100 points on the SAT)

`$$p_j = \mbox{logit}^{-1}(\beta_0 + \beta_j\Delta x_j)$$`

- To show uncertainty, compute the confidence interval on the log odds scale, then apply the inverse logit to its endpoints
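
- A sketch of these steps with numbers copied from the model summary. `sd.gpa` is an assumption read from the rounded "0.6-pt increase" label, and the 0.5-point GPA change is an arbitrary benchmark.

```r
# Values copied from the model summary
intercept = 1.53678432
est = 0.22698497      # tutoringTRUE
se = 0.07583279

p0 = plogis(intercept)    # baseline probability

# 95% interval on the log odds scale, then transformed to probabilities
log.odds = intercept + est + qnorm(c(0.025, 0.5, 0.975)) * se
p.tutoring = plogis(log.odds)
names(p.tutoring) = c("lower.95", "estimate", "upper.95")
p.tutoring

# Continuous predictor: prior GPA was standardized, so a raw change of
# delta.gpa points equals delta.gpa / sd.gpa standard deviations.
beta.gpa = 1.03092175
sd.gpa = 0.6        # assumed: rounded value from the plot label
delta.gpa = 0.5     # hypothetical benchmark
p.gpa = plogis(intercept + beta.gpa * (delta.gpa / sd.gpa))
```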

---

# Probabilities relative to a baseline

.pull-left[

```r
intercept = coefs.df$est[coefs.df$parameter == "(Intercept)"]
prob.baseline.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(pretty.parameter = fct_reorder(pretty.parameter, est),
         lower.95 = est + (qnorm(0.025) * se),
         lower.50 = est + (qnorm(0.25) * se),
         upper.50 = est + (qnorm(0.75) * se),
         upper.95 = est + (qnorm(0.975) * se),
         signif = case_when(p > 0.05 ~ "Not significant",
                            est > 0 ~ "Positive",
                            est < 0 ~ "Negative"),
         signif = fct_relevel(signif, "Positive", "Not significant", "Negative")) %>%
  mutate(across(
*   matches("est|lower|upper"),
*   ~ invlogit(. + intercept)
  )) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) +
  geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(
*   yintercept = invlogit(intercept)
  ) +
  scale_y_continuous(
*   limits = c(0, 1),
*   labels = scales::percent_format()
  ) +
  scale_color_manual("Relationship to\nprobability of passing",
                     values = c(good.color, neutral.color, bad.color)) +
  labs(x = "", y = "Probability of passing",
       title = "Estimated relationships between\nstudent characteristics\nand probability of passing") +
  coord_flip() +
  theme_bw()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/probability_baseline_plot-1.png" width="100%" style="display: block; margin: auto;" />

- (Uncertainty in intercept is not represented here)
]
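
- One possible way to include the intercept's uncertainty is to ask `predict()` for a standard error on the log odds scale. The sketch below runs on simulated stand-in data, not the course dataset; `sim.df`, `sim.m`, and the simulation parameters are hypothetical.

```r
set.seed(1)
# Simulated stand-in data; variable names mirror the slides, values do not
n = 500
sim.df = data.frame(tutoring = rbinom(n, 1, 0.5) == 1,
                    cs.prior.gpa = rnorm(n))
sim.df$passed = rbinom(n, 1, plogis(1.5 + 0.2 * sim.df$tutoring +
                                      1 * sim.df$cs.prior.gpa))
sim.m = glm(passed ~ tutoring + cs.prior.gpa, data = sim.df,
            family = binomial(link = "logit"))

# Predict on the log odds scale for a tutored student with average prior GPA
new.df = data.frame(tutoring = TRUE, cs.prior.gpa = 0)
pred = predict(sim.m, newdata = new.df, type = "link", se.fit = TRUE)

# se.fit combines the uncertainty of the intercept and the tutoring coefficient
interval = plogis(unname(pred$fit) +
                    qnorm(c(0.025, 0.5, 0.975)) * unname(pred$se.fit))
interval
```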

---

# Probabilities relative to a baseline: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Avoids the "percent change" formulation (common but misleading)
]

---

# Probabilities relative to a baseline: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/probability_baseline_highlighted_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Using the intercept as a baseline chooses reference categories for categorical variables
]

---

# Probabilities relative to a baseline: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[

- Using the intercept as a baseline chooses reference categories for categorical variables

- Students who don't use Macs, don't wear glasses, etc.
]

---

# Probabilities relative to a baseline: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Using the intercept as a baseline chooses reference categories for categorical variables

- Students who don't use Macs, don't wear glasses, etc.

- Not an appropriate choice for all datasets

- Doesn't show full range of possible effects at different baselines
]

---

# Probabilities relative to a baseline: Arrows

.pull-left[

```r
prob.baseline.arrows.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(pretty.parameter = fct_reorder(pretty.parameter, est),
         signif = case_when(p > 0.05 ~ "Not significant",
                            est > 0 ~ "Positive",
                            est < 0 ~ "Negative"),
         signif = fct_relevel(signif, "Positive",
                              "Not significant",
                              "Negative"),
         est = invlogit(est + intercept)) %>%
* ggplot(aes(x = invlogit(intercept),
*            xend = est,
*            y = pretty.parameter,
*            yend = pretty.parameter,
*            color = signif)) +
* geom_segment(
*   size = 1,
*   arrow = arrow(length = unit(0.1, "in"),
*                 type = "closed")
* ) +
  geom_vline(xintercept = invlogit(intercept)) +
  scale_x_continuous(
    limits = c(0, 1),
    labels = scales::percent_format()
  ) +
  scale_color_manual("Relationship to\nprobability of passing",
                     values = c(good.color, neutral.color, bad.color)) +
  labs(x = "Probability of passing", y = "",
       title = "Estimated relationships between\nstudent characteristics\nand probability of passing") +
  theme_bw()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/probability_baseline_arrows_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Probabilities relative to a baseline: Arrows

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Doesn't show uncertainty around estimates

- Strong causal implications
]

---

# Multiple baselines by group

- Instead of one baseline probability, why not several?

- Example: show effect of `$i$` for each level of categorical variable `$j$`

`$$p_{j1} = \mbox{logit}^{-1}(\beta_0 + \beta_{j1})$$`

`$$p_{j1 + i} = \mbox{logit}^{-1}(\beta_0 + \beta_{j1} + \beta_i)$$`
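
- A sketch of the two formulas for every pet type at once, with tutoring as predictor `$i$`. The coefficients are copied from the model summary; "none" is the reference level, so its coefficient is 0.

```r
intercept = 1.53678432
# Pet coefficients; "none" is the reference level
beta.pet = c(none = 0, dog = -0.25143138, cat = 0.09616174,
             fish = -1.19359401)
beta.tutoring = 0.22698497

# Baseline for each pet type, and the same group with tutoring added
p.baseline = plogis(intercept + beta.pet)
p.with.tutoring = plogis(intercept + beta.pet + beta.tutoring)

# Percentages, with the change in percentage points for each group
round(100 * cbind(p.baseline, p.with.tutoring,
                  change.pp = p.with.tutoring - p.baseline), 1)
```

- The same coefficient yields the largest change for fish owners, whose baseline is closest to 50%.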

---

# Multiple baselines by group

.pull-left[

```r
prob.group.p = expand.grid(pet = c("None", "Dog", "Cat", "Fish"),
                           other.parameter = coefs.df %>%
                             filter(!grepl("pet\\.type|Intercept", parameter)) %>%
                             pull(parameter)) %>%
  mutate(pet.parameter = paste("pet.type", str_to_lower(pet), sep = "")) %>%
  left_join(coefs.df, by = c("pet.parameter" = "parameter")) %>%
  mutate(pretty.parameter = coalesce(pretty.parameter, "Pet: None"),
         mu = intercept + coalesce(est, 0),
         baseline.mu = mu) %>%
  dplyr::select(pet, other.parameter, mu, baseline.mu) %>%
  left_join(coefs.df, by = c("other.parameter" = "parameter")) %>%
  mutate(pretty.parameter = fct_reorder(pretty.parameter, est),
         mu = mu + est,
         lower.95 = mu + (qnorm(0.025) * se),
         lower.50 = mu + (qnorm(0.25) * se),
         upper.50 = mu + (qnorm(0.75) * se),
         upper.95 = mu + (qnorm(0.975) * se),
         signif = case_when(p > 0.05 ~ "Not significant",
                            est > 0 ~ "Positive",
                            est < 0 ~ "Negative"),
         signif = fct_relevel(signif, "Positive", "Not significant", "Negative")) %>%
  mutate(across(matches("mu|lower|upper"), ~ invlogit(.))) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) +
  geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
  geom_point(aes(y = mu), size = 3) +
  geom_hline(aes(yintercept = baseline.mu)) +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent_format()) +
  scale_color_manual("Relationship to\nprobability of passing",
                     values = c(good.color, neutral.color, bad.color)) +
  facet_wrap(~ pet) +
  theme(panel.spacing.x = unit(0.65, "lines")) +
  labs(x = "", y = "Probability of passing", subtitle = "By type of pet",
       title = "Estimated relationships between\nstudent characteristics\nand probability of passing") +
  coord_flip()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/probability_group_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---