class: center, middle, inverse, title-slide

.title[
# Visualizing logistic regression results for non-technical audiences
]
.author[
### Abby Kaplan and Keiko Cawley
]
.date[
### September 20, 2022
]

---
class: inverse, center, middle

# GitHub
### https://github.com/keikcaw/visualizing-logistic-regression

<style type="text/css">
.remark-slide-content h1 {
  margin-bottom: 0em;
}
.remark-code {
  font-size: 60% !important;
}
</style>

---
class: inverse, center, middle

# Logistic regression review

---

# Logistic regression: Binary outcomes

- Use logistic regression to model a binary outcome

--

- Examples from higher education:

--

  - Did the student pass the class?

--

  - Did the student enroll for another term?

--

  - Did the student graduate?

---

# The design of logistic regression

- We want to model the probability that the outcome happened

--

- But probabilities are bounded between 0 and 1

--

- Instead, we model the logit of the probability:

$$ \mbox{logit}(p) = \log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n $$

---
class: inverse, center, middle

# What's the problem?
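
---

# Sketch: same log odds change, different probability change

The logit link from the review can be tried out in base R. This is a minimal sketch rather than code from the talk: `logit()` and `inv.logit()` are hypothetical helper names wrapping base R's `qlogis()` and `plogis()`, and the coefficient `beta = 1` and the three starting probabilities are arbitrary illustrative values.

```r
# Hypothetical helper names (not from the talk); qlogis() and plogis() are base R
logit = function(p) qlogis(p)      # log(p / (1 - p))
inv.logit = function(x) plogis(x)  # exp(x) / (1 + exp(x))

# Three illustrative starting probabilities
p = c(0.10, 0.50, 0.90)

# The two functions undo each other
stopifnot(isTRUE(all.equal(inv.logit(logit(p)), p)))

# Add the same illustrative coefficient on the log odds scale
beta = 1
p.new = inv.logit(logit(p) + beta)

# The resulting change in probability differs by starting point
print(data.frame(start = p, end = round(p.new, 3), change = round(p.new - p, 3)))
```

The shift on the log odds scale is identical in all three rows, but the change in probability is largest for the row that starts at 0.50 and smaller for the rows that start near 0 or 1.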
---
layout: true

# Just tell me "the" effect

- Stakeholders often want to know whether something affects outcomes, and by how much

---

--

- But we don't model probabilities directly

$$ \log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n $$

---

- But we don't model probabilities directly

$$ \boxed{\log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right)} = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n $$

--

- We can solve for _p_:

$$ `\begin{aligned} p & = \mbox{logit}^{-1}(\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n) \\ & \\ & = \frac{e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}{1 + e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}} \end{aligned}` $$

---
layout: true

# "The" effect is nonlinear in _p_

$$ `\begin{aligned} p & = \frac{e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}{1 + e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}} \end{aligned}` $$

---

--

<img src="RUG_presentation_files/figure-html/unnamed-chunk-2-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="RUG_presentation_files/figure-html/unnamed-chunk-3-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="RUG_presentation_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" />

---
layout: false
class: inverse, center, middle

# Sample dataset and model

---

# Dataset

- Our simulated dataset describes students who took Balloon Animal-Making 201 at University Imaginary

--

<table class="table table-striped table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Variable </th> <th style="text-align:left;"> Possible Responses </th> <th style="text-align:left;"> Variable Type </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Mac user </td> <td style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> <tr> <td style="text-align:left;"> Wear glasses </td> <td
style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> <tr> <td style="text-align:left;"> Pet type </td> <td style="text-align:left;"> dog, cat, fish, none </td> <td style="text-align:left;"> categorical </td> </tr> <tr> <td style="text-align:left;"> Favorite color </td> <td style="text-align:left;"> blue, red, green, orange </td> <td style="text-align:left;"> categorical </td> </tr> <tr> <td style="text-align:left;"> Prior undergraduate GPA </td> <td style="text-align:left;"> 0.0-4.0 </td> <td style="text-align:left;"> continuous </td> </tr> <tr> <td style="text-align:left;"> Height </td> <td style="text-align:left;"> 54-77 inches </td> <td style="text-align:left;"> continuous </td> </tr> <tr> <td style="text-align:left;"> Went to tutoring </td> <td style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> <tr> <td style="text-align:left;"> Passed </td> <td style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> </tbody> </table> --- # Dataset ```r library(tidyverse) df = read.csv("data/course_outcomes.csv", header = T, stringsAsFactors = F) ``` <table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> mac </th> <th style="text-align:left;"> glasses </th> <th style="text-align:left;"> pet.type </th> <th style="text-align:left;"> favorite.color </th> <th style="text-align:right;"> prior.gpa </th> <th style="text-align:right;"> height </th> <th style="text-align:left;"> tutoring </th> <th style="text-align:left;"> passed </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.86 </td> <td style="text-align:right;"> 63 </td> <td 
style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> cat </td> <td style="text-align:left;"> green </td> <td style="text-align:right;"> 2.37 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> none </td> <td style="text-align:left;"> orange </td> <td style="text-align:right;"> 3.98 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.78 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> cat </td> <td style="text-align:left;"> blue </td> <td style="text-align:right;"> 3.73 </td> <td style="text-align:right;"> 67 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> green </td> <td style="text-align:right;"> 3.99 </td> <td style="text-align:right;"> 61 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td 
style="text-align:right;"> 7 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> none </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.75 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.73 </td> <td style="text-align:right;"> 61 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> none </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 64 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> orange </td> <td style="text-align:right;"> 3.64 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.98 </td> <td style="text-align:right;"> 63 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> FALSE </td> <td 
style="text-align:left;"> TRUE </td> <td style="text-align:left;"> cat </td> <td style="text-align:left;"> green </td> <td style="text-align:right;"> 3.72 </td> <td style="text-align:right;"> 64 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> </tbody> </table>

---

# Model

- Dependent variable: did the student pass?

--

- Continuous variables were centered and standardized

```r
df = df %>%
  mutate(cs.prior.gpa = (prior.gpa - mean(prior.gpa)) / sd(prior.gpa),
         cs.height = (height - mean(height)) / sd(height))
```

--

- Reference levels for categorical variables:
  - Pet type: none
  - Favorite color: blue

```r
df = df %>%
  mutate(pet.type = fct_relevel(pet.type, "none", "dog", "cat", "fish"),
         favorite.color = fct_relevel(favorite.color, "blue", "red", "green", "orange"))
```

---

# Model

```r
library(lme4)
pass.m = glm(passed ~ mac + glasses + pet.type + favorite.color + cs.prior.gpa + cs.height + tutoring,
             data = df, family = binomial(link = "logit"))
summary(pass.m)$coefficients
```

```
##                         Estimate Std. Error    z value      Pr(>|z|)
## (Intercept)           1.53678432 0.09665694 15.8993682  6.400698e-57
## macTRUE              -0.04070026 0.08251424 -0.4932514  6.218350e-01
## glassesTRUE           0.19330654 0.07787099  2.4823948  1.305026e-02
## pet.typedog          -0.25143138 0.08483778 -2.9636722  3.039919e-03
## pet.typecat           0.09616174 0.11927784  0.8061995  4.201278e-01
## pet.typefish         -1.19359401 0.16656361 -7.1659949  7.722363e-13
## favorite.colorred    -0.03945396 0.09265674 -0.4258078  6.702479e-01
## favorite.colorgreen  -0.38137532 0.10062190 -3.7901819  1.505370e-04
## favorite.colororange -0.24204783 0.13900517 -1.7412865  8.163337e-02
## cs.prior.gpa          1.03092175 0.03887172 26.5211237 5.531945e-155
## cs.height            -0.25908893 0.03833829 -6.7579681  1.399404e-11
## tutoringTRUE          0.22698497 0.07583279  2.9932300  2.760416e-03
```

---

# Model

```r
library(lme4)
pass.m = glm(passed ~ mac + glasses + pet.type + favorite.color + cs.prior.gpa + cs.height + tutoring,
             data = df, family = binomial(link = "logit"))
summary(pass.m)$coefficients
```

```
##                         Estimate Std. Error    z value      Pr(>|z|)
## (Intercept)           1.53678432 0.09665694 15.8993682  6.400698e-57
## macTRUE              -0.04070026 0.08251424 -0.4932514  6.218350e-01
*## glassesTRUE           0.19330654 0.07787099  2.4823948  1.305026e-02
*## pet.typedog          -0.25143138 0.08483778 -2.9636722  3.039919e-03
## pet.typecat           0.09616174 0.11927784  0.8061995  4.201278e-01
*## pet.typefish         -1.19359401 0.16656361 -7.1659949  7.722363e-13
## favorite.colorred    -0.03945396 0.09265674 -0.4258078  6.702479e-01
*## favorite.colorgreen  -0.38137532 0.10062190 -3.7901819  1.505370e-04
## favorite.colororange -0.24204783 0.13900517 -1.7412865  8.163337e-02
*## cs.prior.gpa          1.03092175 0.03887172 26.5211237 5.531945e-155
*## cs.height            -0.25908893 0.03833829 -6.7579681  1.399404e-11
*## tutoringTRUE          0.22698497 0.07583279  2.9932300  2.760416e-03
```

---

# Causality disclaimer

- Some visualizations strongly imply a causal interpretation

--

- It's your responsibility to evaluate whether a causal interpretation is appropriate

--

- If the data doesn't support a causal interpretation, **don't use a visualization that implies one**

---

# Model coefficients

```r
coefs.df = summary(pass.m)$coefficients %>%
  data.frame() %>%
  rownames_to_column("parameter") %>%
  mutate(pretty.parameter = case_when(parameter == "(Intercept)" ~ "Intercept",
                                      grepl("TRUE$", parameter) ~ str_to_title(gsub("TRUE", "", parameter)),
                                      grepl("pet\\.type", parameter) ~ paste("Pet:", str_to_title(gsub("pet\\.type", "", parameter))),
                                      grepl("favorite\\.color", parameter) ~ paste("Favorite color:", str_to_title(gsub("favorite\\.color", "", parameter))),
                                      parameter == "cs.prior.gpa" ~ paste("Prior GPA\n(", round(sd(df$prior.gpa), 1), "-pt increase)", sep = ""),
                                      parameter == "cs.height" ~ paste("Height\n(", round(sd(df$height), 1), "-in increase)", sep = ""))) %>%
  dplyr::select(parameter, pretty.parameter, est = Estimate, se = Std..Error, z = z.value, p = Pr...z..)
```

---

# Color palette

```r
good.color = "#0571B0"
neutral.color = "gray"
bad.color = "#CA0020"
```

<img src="RUG_presentation_files/figure-html/unnamed-chunk-5-1.png" width="70%" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Visualization family 1:
# Presenting model coefficients

---

# Coefficients in a table

<table class="table table-striped table-hover" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;text-align: center;"> Parameter </th> <th style="text-align:right;text-align: center;"> Estimate </th> <th style="text-align:right;text-align: center;"> Standard error </th> <th style="text-align:right;text-align: center;"> z </th> <th style="text-align:right;text-align: center;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Intercept </td> <td style="text-align:right;"> 1.5367843 </td> <td style="text-align:right;"> 0.0966569 </td> <td style="text-align:right;"> 15.8993682 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Mac </td> <td style="text-align:right;"> -0.0407003 </td> <td style="text-align:right;"> 0.0825142 </td> <td style="text-align:right;"> -0.4932514 </td> <td style="text-align:right;"> 0.6218350 </td> </tr> <tr> <td style="text-align:left;"> Glasses </td> <td style="text-align:right;"> 0.1933065 </td> <td style="text-align:right;"> 0.0778710 </td> <td style="text-align:right;"> 2.4823948 </td> <td style="text-align:right;"> 0.0130503 </td> </tr> <tr> <td style="text-align:left;"> Pet: Dog </td> <td style="text-align:right;"> -0.2514314 </td> <td style="text-align:right;"> 0.0848378 </td> <td style="text-align:right;"> -2.9636722 </td> <td style="text-align:right;"> 0.0030399 </td> </tr> <tr> <td style="text-align:left;"> Pet: Cat </td> <td style="text-align:right;"> 0.0961617 </td> <td style="text-align:right;"> 0.1192778 </td> <td style="text-align:right;"> 0.8061995
</td> <td style="text-align:right;"> 0.4201278 </td> </tr> <tr> <td style="text-align:left;"> Pet: Fish </td> <td style="text-align:right;"> -1.1935940 </td> <td style="text-align:right;"> 0.1665636 </td> <td style="text-align:right;"> -7.1659949 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Favorite color: Red </td> <td style="text-align:right;"> -0.0394540 </td> <td style="text-align:right;"> 0.0926567 </td> <td style="text-align:right;"> -0.4258078 </td> <td style="text-align:right;"> 0.6702479 </td> </tr> <tr> <td style="text-align:left;"> Favorite color: Green </td> <td style="text-align:right;"> -0.3813753 </td> <td style="text-align:right;"> 0.1006219 </td> <td style="text-align:right;"> -3.7901819 </td> <td style="text-align:right;"> 0.0001505 </td> </tr> <tr> <td style="text-align:left;"> Favorite color: Orange </td> <td style="text-align:right;"> -0.2420478 </td> <td style="text-align:right;"> 0.1390052 </td> <td style="text-align:right;"> -1.7412865 </td> <td style="text-align:right;"> 0.0816334 </td> </tr> <tr> <td style="text-align:left;"> Prior GPA (0.6-pt increase) </td> <td style="text-align:right;"> 1.0309217 </td> <td style="text-align:right;"> 0.0388717 </td> <td style="text-align:right;"> 26.5211237 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Height (3-in increase) </td> <td style="text-align:right;"> -0.2590889 </td> <td style="text-align:right;"> 0.0383383 </td> <td style="text-align:right;"> -6.7579681 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Tutoring </td> <td style="text-align:right;"> 0.2269850 </td> <td style="text-align:right;"> 0.0758328 </td> <td style="text-align:right;"> 2.9932300 </td> <td style="text-align:right;"> 0.0027604 </td> </tr> </tbody> </table>

---

# Coefficients in a table

---

# Change in log odds

.pull-left[
```r
log.odds.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(
    pretty.parameter = fct_reorder(pretty.parameter, est),
    lower.95 = est + (qnorm(0.025) * se),
    lower.50 = est + (qnorm(0.25) * se),
    upper.50 = est + (qnorm(0.75) * se),
    upper.95 = est + (qnorm(0.975) * se),
    signif = case_when(p > 0.05 ~ "Not significant",
                       est > 0 ~ "Positive",
                       est < 0 ~ "Negative"),
    signif = fct_relevel(signif, "Positive", "Not significant", "Negative")
  ) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) +
  geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(yintercept = 0) +
  scale_color_manual(
    "Relationship to\nlog odds of passing",
    values = c(good.color, neutral.color, bad.color)
  ) +
  labs(x = "", y = "Change in log odds",
       title = "Estimated relationships between\nstudent characteristics\nand log odds of passing") +
  coord_flip(clip = "off")
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/change_in_log_odds_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in log odds: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- It's clear which relationships are positive and which are negative

{{content}}

]

--

- The plot has a transparent relationship to the fitted model

---

# Change in log odds: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- It's clear which relationships are positive and which are negative
- The plot has a transparent relationship to the fitted model
- Numbers all in one place: a single scale instead of a table of numbers
]

---

# Change in log odds: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in log odds: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- The magnitude of the effect is on the log odds scale

{{content}}

]

--

- What is a 0.4 change in the log odds?

{{content}}

--

- Is the change between 0.4 and 0.8 log odds "big" or "small"?

{{content}}

--

- You probably don't want to give your audience a tutorial on the inverse logit function

---

# Secret log odds

.pull-left[
```r
secret.log.odds.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(
    pretty.parameter = fct_reorder(pretty.parameter, est),
    lower.95 = est + (qnorm(0.025) * se),
    lower.50 = est + (qnorm(0.25) * se),
    upper.50 = est + (qnorm(0.75) * se),
    upper.95 = est + (qnorm(0.975) * se),
    signif = case_when(p > 0.05 ~ "Not significant",
                       est > 0 ~ "Positive",
                       est < 0 ~ "Negative"),
    signif = fct_relevel(signif, "Positive", "Not significant", "Negative")) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) +
  geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(yintercept = 0) +
  scale_y_continuous(
*   breaks = c(-1, 0, 1),
*   labels = c("← Lower",
*              "Same",
*              "Higher →")
  ) +
  scale_color_manual(
    "Relationship to\nlog odds of passing",
    values = c(good.color, neutral.color, bad.color)
  ) +
  labs(x = "", y = "Chance of passing",
       title = "Estimated relationships between\nstudent characteristics\nand chance of passing") +
  coord_flip()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/change_in_log_odds_adjusted_axis_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Secret log odds: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Secret log odds: Pros

.pull-left[
<img
src="RUG_presentation_files/figure-html/change_in_log_adds_adjusted_axis_highlighted_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- Easy: just relabel the x-axis

{{content}}

]

--

- No numbers for your audience to misinterpret

---

# Secret log odds: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- Can't convey absolute magnitude of an effect

{{content}}

]

--

- Your audience might ask "where are the numbers?" anyway

---
layout: true

# Change in odds ratio

- Your audience may be more familiar with the "odds" part of log odds

---

---

`$$\log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n$$`

---

`$$\log\left(\boxed{\begin{array}{c}\frac{p}{1 - p}\end{array}}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n$$`

--

- Can't we just exponentiate to get the odds?

--

`$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}$$`

--

- Now the effect of a coefficient is multiplicative, not additive

--

`$$\frac{p}{1 - p} = e^{\beta_ix_i} \cdot e^{\beta_0 + \beta_1x_1 + \ldots + \beta_{i-1}x_{i-1} + \beta_{i+1}x_{i+1} + \ldots + \beta_nx_n}$$`

---
layout: false

# Change in odds ratio

.pull-left[
```r
odds.ratio.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(
    pretty.parameter = fct_reorder(pretty.parameter, est),
    lower.95 = est + (qnorm(0.025) * se),
    lower.50 = est + (qnorm(0.25) * se),
    upper.50 = est + (qnorm(0.75) * se),
    upper.95 = est + (qnorm(0.975) * se),
    signif = case_when(p > 0.05 ~ "Not significant",
                       est > 0 ~ "Positive",
                       est < 0 ~ "Negative"),
    signif = fct_relevel(signif, "Positive", "Not significant", "Negative")
  ) %>%
* mutate(across(matches("est|lower|upper"),
*               ~ exp(.))) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) +
  geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(yintercept = 1) +
  scale_y_continuous(
*   labels = scales::percent_format()
  ) +
  scale_color_manual("Relationship to\nlog odds of passing",
                     values = c(good.color, neutral.color, bad.color)) +
  labs(x = "", y = "% change in odds ratio",
       title = "Estimated relationships between\nstudent characteristics\nand odds ratio of passing") +
  coord_flip()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/odds_ratio_adjusted_axis_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in odds ratio: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- Changes in odds might be easier to describe than changes in log odds

{{content}}

]

--

- Still pretty easy: a simple transformation of your model coefficients

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- Not the way we usually describe odds
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Not the way we usually describe odds
- Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4"
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Not the way we usually describe odds
- Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4"
- The unfamiliar format may undo the benefit of using a familiar concept

{{content}}

]

--

- Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds

---

# Change in odds ratio: Cons

.pull-left[
<img
src="RUG_presentation_files/figure-html/odds_ratio_adjusted_axis_highlighted_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Not the way we usually describe odds
- Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4"
- The unfamiliar format may undo the benefit of using a familiar concept
- Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds
- Percent change in odds (300% = triple the odds) might be misinterpreted as a probability
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Not the way we usually describe odds
- Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4"
- The unfamiliar format may undo the benefit of using a familiar concept
- Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds
- Percent change in odds (300% = triple the odds) might be misinterpreted as a probability
- Now we're pretty far removed from familiar scales
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Change in odds ratio: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />
]

- The scale is expanded for positive effects and compressed for negative effects

---
class: inverse, center, middle

# Visualization family 2:
# Presenting probabilities

---

# Probabilities relative to a baseline

- Problem with probabilities: change in percentage
points depends on baseline starting value

--

- We can choose an appropriate baseline probability, then compute the marginal effect of a predictor given that baseline

--

- Options for baseline:

--

  - Model intercept

--

  - Observed outcome % in dataset (similar to intercept if continuous predictors are centered and other coefficients aren't too large)

--

  - Observed outcome % for a certain group (e.g., students with no tutoring)

--

  - Some % that's meaningful in context (e.g., 85% pass rate in typical years)

---

# Probabilities relative to a baseline

- Baseline probability: inverse logit of the intercept

`$$p_0 = \mbox{logit}^{-1}(\beta_0)$$`

--

- Probability with discrete predictor `\(i\)`: inverse logit of intercept + predictor coefficient

`$$p_i = \mbox{logit}^{-1}(\beta_0 + \beta_i)$$`

--

- For a continuous predictor `\(j\)`, pick a change in predictor value that makes sense

--

  - One standard deviation

--

  - A context-specific benchmark (e.g., 1 point for GPA, 100 points on the SAT)

--

`$$p_j = \mbox{logit}^{-1}(\beta_0 + \beta_j\Delta x_j)$$`

--

- To show uncertainty, compute the confidence interval on the log odds scale, then apply the inverse logit to its endpoints

---

# Probabilities relative to a baseline

.pull-left[
```r
# Assumption: invlogit() is not defined in the code shown on these slides;
# base R's plogis() computes the same inverse logit
invlogit = plogis
intercept = coefs.df$est[coefs.df$parameter == "(Intercept)"]
prob.baseline.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(pretty.parameter = fct_reorder(pretty.parameter, est),
         lower.95 = est + (qnorm(0.025) * se),
         lower.50 = est + (qnorm(0.25) * se),
         upper.50 = est + (qnorm(0.75) * se),
         upper.95 = est + (qnorm(0.975) * se),
         signif = case_when(p > 0.05 ~ "Not significant",
                            est > 0 ~ "Positive",
                            est < 0 ~ "Negative"),
         signif = fct_relevel(signif, "Positive", "Not significant", "Negative")) %>%
  mutate(across(
*   matches("est|lower|upper"),
*   ~ invlogit(. + intercept)
  )) %>%
  ggplot(aes(x = pretty.parameter, color = signif)) +
  geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) +
  geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
  geom_point(aes(y = est), size = 3) +
  geom_hline(
*   yintercept = invlogit(intercept)
  ) +
  scale_y_continuous(
*   limits = c(0, 1),
*   labels = scales::percent_format()
  ) +
  scale_color_manual("Relationship to\nprobability of passing",
                     values = c(good.color, neutral.color, bad.color)) +
  labs(x = "", y = "Probability of passing",
       title = "Estimated relationships between\nstudent characteristics\nand probability of passing") +
  coord_flip() +
  theme_bw()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/probability_baseline_plot-1.png" width="100%" style="display: block; margin: auto;" />

- (Uncertainty in intercept is not represented here)
]

---

# Probabilities relative to a baseline: Pros

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- Familiar scale: probabilities, expressed as percentages

{{content}}

]

--

- Avoids the "percent change" formulation (common but misleading)

---

# Probabilities relative to a baseline: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/probability_baseline_highlighted_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- Have to choose a baseline; there may be no "good" choice

{{content}}

]

--

- Using the intercept as a baseline chooses reference categories for categorical variables

---

# Probabilities relative to a baseline: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Have to choose a baseline; there may be no "good" choice
- Using the intercept as a baseline chooses reference categories for categorical variables
- Students who don't use Macs, don't wear glasses, etc.
]

---

# Probabilities relative to a baseline: Cons

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Have to choose a baseline; there may be no "good" choice
- Using the intercept as a baseline chooses reference categories for categorical variables
- Students who don't use Macs, don't wear glasses, etc.
- Not an appropriate choice for all datasets

{{content}}

]

--

- Doesn't show full range of possible effects at different baselines

---

# Probabilities relative to a baseline: Arrows

.pull-left[
```r
prob.baseline.arrows.p = coefs.df %>%
  filter(parameter != "(Intercept)") %>%
  mutate(pretty.parameter = fct_reorder(pretty.parameter, est),
         signif = case_when(p > 0.05 ~ "Not significant",
                            est > 0 ~ "Positive",
                            est < 0 ~ "Negative"),
         signif = fct_relevel(signif, "Positive", "Not significant", "Negative"),
         est = invlogit(est + intercept)) %>%
* ggplot(aes(x = invlogit(intercept),
*            xend = est,
*            y = pretty.parameter,
*            yend = pretty.parameter,
*            color = signif)) +
* geom_segment(
*   size = 1,
*   arrow = arrow(length = unit(0.1, "in"),
*                 type = "closed")
* ) +
  geom_vline(xintercept = invlogit(intercept)) +
  scale_x_continuous(
    limits = c(0, 1),
    labels = scales::percent_format()
  ) +
  scale_color_manual("Relationship to\nprobability of passing",
                     values = c(good.color, neutral.color, bad.color)) +
  labs(x = "Probability of passing", y = "",
       title = "Estimated relationships between\nstudent characteristics\nand probability of passing") +
  theme_bw()
```
]

.pull-right[
<img src="RUG_presentation_files/figure-html/probability_baseline_arrows_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Probabilities relative to a baseline: Arrows

.pull-left[
<img src="RUG_presentation_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
- Emphasizes direction of effect

{{content}}

]

--

- Doesn't show uncertainty
around estimates {{content}} -- - Strong causal implications --- # Multiple baselines by group - Instead of one baseline probability, why not several? -- - Example: show effect of `\(i\)` for each level of categorical variable `\(j\)` -- `$$p_{j1} = \mbox{logit}^{-1}(\beta_0 + \beta_{j1})$$` -- `$$p_{j1 + i} = \mbox{logit}^{-1}(\beta_0 + \beta_{j1} + \beta_i)$$` --- # Multiple baselines by group .pull-left[ ```r prob.group.p = expand.grid(pet = c("None", "Dog", "Cat", "Fish"), other.parameter = coefs.df %>% filter(!grepl("pet\\.type|Intercept", parameter)) %>% pull(parameter)) %>% mutate(pet.parameter = paste("pet.type", str_to_lower(pet), sep = "")) %>% left_join(coefs.df, by = c("pet.parameter" = "parameter")) %>% mutate(pretty.parameter = coalesce(pretty.parameter, "Pet: None"), mu = intercept + coalesce(est, 0), baseline.mu = mu) %>% dplyr::select(pet, other.parameter, mu, baseline.mu) %>% left_join(coefs.df, by = c("other.parameter" = "parameter")) %>% mutate(pretty.parameter = fct_reorder(pretty.parameter, est), mu = mu + est, lower.95 = mu + (qnorm(0.025) * se), lower.50 = mu + (qnorm(0.25) * se), upper.50 = mu + (qnorm(0.75) * se), upper.95 = mu + (qnorm(0.975) * se), signif = case_when(p > 0.05 ~ "Not significant", est > 0 ~ "Positive", est < 0 ~ "Negative"), signif = fct_relevel(signif, "Positive", "Not significant", "Negative")) %>% mutate(across(matches("mu|lower|upper"), ~ invlogit(.))) %>% ggplot(aes(x = pretty.parameter, color = signif)) + geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) + geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) + geom_point(aes(y = mu), size = 3) + geom_hline(aes(yintercept = baseline.mu)) + scale_y_continuous(limits = c(0, 1), labels = scales::percent_format()) + scale_color_manual("Relationship to\nprobability of passing", values = c(good.color, neutral.color, bad.color)) + facet_wrap(~ pet) + theme(panel.spacing.x = unit(0.65, "lines")) + labs(x = "", y = "Probability of passing", subtitle = 
"By type of pet", title = "Estimated relationships between\nstudent characteristics\nand probability of passing") + coord_flip() ``` ] .pull-right[ <img src="RUG_presentation_files/figure-html/probability_group_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Emphasizes that the baseline we show is a _choice_ ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/probability_group_highlighted_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups - Effect of GPA is larger for fish owners than for dog owners ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups - Effect of GPA is larger for fish owners than for dog owners - Here, this is purely because of fish owners' lower baseline ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups - Effect of GPA is larger for fish owners than for dog owners - Here, this is purely because of fish owners' lower baseline - But we could also show the 
effects of an interaction term in the model {{content}} ] -- - We can use arrows here as well --- # Multiple baselines by group: Cons .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-30-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Still have to choose baselines by group {{content}} ] -- - May suggest essentializing interpretations of groups {{content}} -- - Cluttered --- # Banana graphs - We can overcome the baseline-choosing problem by iterating across every baseline -- - For example: -- - Start with every possible probability of passing Balloon Animal-Making 201, from 0% to 100% (at sufficiently small intervals) -- - For each probability, add the effect of having a pet fish -- `$$p_f = \mbox{logit}^{-1}(\mbox{logit}(p_0) + \beta_f)$$` --- # Banana graphs .pull-left[ ```r est = coefs.df$est[coefs.df$parameter == "pet.typefish"] se = coefs.df$se[coefs.df$parameter == "pet.typefish"] effect.color = case_when(coefs.df$p[coefs.df$parameter == "pet.typefish"] > 0.05 ~ neutral.color, est > 0 ~ good.color, TRUE ~ bad.color) banana.p = data.frame(x = seq(0.01, 0.99, 0.01), upper.95 = 0.975, upper.50 = 0.75, median = 0.5, lower.50 = 0.25, lower.95 = 0.025) %>% * mutate(across(matches("median|upper|lower"), * function(q) { * current.x = get("x") * invlogit(logit(current.x) + * est + * (qnorm(q) * se)) * })) %>% ggplot(aes(x = x, group = 1)) + geom_segment(x = 0, xend = 1, y = 0, yend = 1) + geom_ribbon(aes(ymin = lower.95, ymax = upper.95), fill = effect.color, alpha = 0.2) + geom_ribbon(aes(ymin = lower.50, ymax = upper.50), fill = effect.color, alpha = 0.4) + geom_line(aes(y = median), color = effect.color) + scale_x_continuous(labels = scales::percent_format()) + scale_y_continuous(labels = scales::percent_format()) + labs(x = "Baseline probability of passing", y = "Probability of passing with effect", title = gsub("\n", " ", coefs.df$pretty.parameter[coefs.df$parameter == "pet.typefish"]), subtitle = "Estimated 
relationship to probability of passing") ``` ] .pull-right[ <img src="RUG_presentation_files/figure-html/banana_graph_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Banana graphs .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-31-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - **x-axis:** baseline probability {{content}} ] -- - **y-axis:** probability with effect of having a pet fish {{content}} -- - Solid line provides a reference (no effect) {{content}} -- - Positive effects above the line; negative effects below the line; no effect on the line --- # Banana graphs <img src="RUG_presentation_files/figure-html/banana_graph_highlighted_plot-1.png" width="50%" style="display: block; margin: auto;" /> --- # Banana graphs .pull-left[ ```r banana.multiple.p = * expand.grid(x = seq(0.01, 0.99, 0.01), * pet = c("fish", "dog", "cat")) %>% mutate(pet = paste("pet.type", pet, sep = ""), upper.95 = 0.975, upper.50 = 0.75, median = 0.5, lower.50 = 0.25, lower.95 = 0.025) %>% * inner_join(coefs.df, * by = c("pet" = "parameter")) %>% mutate(across(matches("median|upper|lower"), function(q) { current.x = get("x") invlogit(logit(current.x) + est + (qnorm(q) * se)) })) %>% mutate(effect.color = case_when(p > 0.05 ~ neutral.color, est > 0 ~ good.color, TRUE ~ bad.color)) %>% ggplot(aes(x = x, color = effect.color, fill = effect.color, group = 1)) + geom_segment(x = 0, xend = 1, y = 0, yend = 1, color = "black") + geom_ribbon(aes(ymin = lower.95, ymax = upper.95), color = NA, alpha = 0.2) + geom_ribbon(aes(ymin = lower.50, ymax = upper.50), color = NA, alpha = 0.4) + geom_line(aes(y = median)) + scale_x_continuous(labels = scales::percent_format()) + scale_y_continuous(labels = scales::percent_format()) + scale_color_identity() + scale_fill_identity() + labs(x = "Baseline probability of passing", y = "Probability of passing with effect", title = "Estimated relationship to\nprobability of passing") + * 
facet_wrap(~ pretty.parameter, ncol = 1) ``` ] .pull-right[ <img src="RUG_presentation_files/figure-html/banana_graph_multiple_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Banana graphs: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-32-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Do not have to pick and choose a baseline {{content}} ] -- - Show the whole range of predicted probabilities --- # Banana graphs: Cons .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-33-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Can take up quite a bit of space {{content}} ] -- - May be initially difficult to understand {{content}} -- - Predictor variables are in separate graphs: hard to compare --- class: inverse, center, middle # Visualization family 3: # Counterfactual counts --- # Extra successes - Sometimes stakeholders are interested in **the number of times something happens (or doesn't happen)** -- - Example: stakeholders want to assess the impact of tutoring on pass rates in Balloon Animal-Making 201 -- - They're interested not just in _whether_ tutoring helps students, but _how much_ it helps them -- - In our dataset, 2,571 students received tutoring; of those, 2,023 passed the class -- - Suppose those students had _not_ received tutoring; in that case, how many would have passed? -- - In other words, how many "extra" passes did we get because of tutoring? --- # Extra successes - To get a point estimate: -- - Take all students who received tutoring -- - Set `tutoring` to `FALSE` instead of `TRUE` -- - Use the model to make (counterfactual) predictions for the revised dataset -- - Count predicted counterfactual passes; compare to the actual number of passes -- - We can get confidence intervals by simulating many sets of outcomes and aggregating over them. 
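---

# Extra successes: point estimate sketch

The point-estimate steps above can be sketched as follows. This is a minimal illustration on made-up data, not the code behind the plots in this deck: `sim.df` and `sim.m` are hypothetical stand-ins for `df` and `pass.m`, the simulated coefficients are arbitrary, and summing predicted probabilities is one assumed way of "counting" counterfactual passes.

```r
# Hypothetical simulated data; the column names mirror the slides, but the
# sample size and coefficients are made up for illustration.
set.seed(1)
n = 2000
sim.df = data.frame(tutoring = runif(n) < 0.5,
                    gpa = rnorm(n))
sim.df$passed = runif(n) < plogis(0.5 + (0.8 * sim.df$tutoring) + (0.6 * sim.df$gpa))

# Fit a logistic regression (a stand-in for pass.m)
sim.m = glm(passed ~ tutoring + gpa, data = sim.df, family = binomial)

# Take all students who received tutoring and set tutoring to FALSE
tutored.df = sim.df[sim.df$tutoring, ]
counterfactual.df = tutored.df
counterfactual.df$tutoring = FALSE

# Expected number of counterfactual passes: sum of predicted probabilities
pred.passed = sum(predict(sim.m, newdata = counterfactual.df, type = "response"))

# Compare to the actual number of passes among tutored students
actual.passed = sum(tutored.df$passed)
extra.passed = actual.passed - pred.passed
```

- `extra.passed` is a single number with no uncertainty attached; the simulation approach described above is what adds the interval around it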
--- # Extra successes .pull-left[ ```r extra.p = with( list( * temp.df = map_dfr( * 1:5000, * function(d) { * data.frame( * draw = d, * mu = model.matrix( * pass.m, * data = df %>% * filter(tutoring) %>% * mutate(tutoring = F) * ) %*% * rnorm(nrow(coefs.df), * mean = coefs.df$est, * sd = coefs.df$se) * ) * } * ) ), { temp.df %>% mutate(pred = runif(n()) < invlogit(mu)) %>% group_by(draw) %>% summarise(pred.passed = sum(pred)) %>% ungroup() %>% * mutate(extra.passed = * sum(df$passed & df$tutoring) * - pred.passed) %>% ggplot(aes(x = extra.passed)) + geom_histogram(fill = "gray") + geom_vline(xintercept = 0) + labs(x = "Number of extra students\nwho passed because of tutoring", y = "Number of simulations", title = "Estimated number of extra students\nwho passed because of tutoring") } ) ``` ] .pull-right[ <img src="RUG_presentation_files/figure-html/extra_passes_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Extra successes: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-34-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Counts have a straightforward interpretation {{content}} ] -- - Natural baseline: account for other characteristics of your population (e.g., number of fish owners) --- # Extra successes: Cons .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-35-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - "Number of simulations" may be hard to explain {{content}} ] -- - Assumes that the counterfactual makes sense {{content}} -- - Strong causal interpretation --- # Extra successes by group - Your stakeholders may be interested in different effects by group -- - We can summarize counterfactuals for separate groups --- # Extra successes by group .pull-left[ ```r extra.group.p = with( list( temp.df = map_dfr(1:5000, function(d) { data.frame(draw = d, pet.type = df$pet.type[df$tutoring], mu = model.matrix(pass.m, data = df %>% 
filter(tutoring) %>% mutate(tutoring = F)) %*% rnorm(nrow(coefs.df), mean = coefs.df$est, sd = coefs.df$se)) }) ), { temp.df %>% mutate(pred = runif(n()) < invlogit(mu)) %>% * group_by(pet.type, draw) %>% summarise(pred.passed = sum(pred), .groups = "keep") %>% ungroup() %>% * left_join(df %>% * filter(tutoring) %>% * group_by(pet.type) %>% * summarise(actual.passed = * sum(passed)) %>% * ungroup(), * by = "pet.type") %>% mutate(pet.type = str_to_title(pet.type), extra.passed = actual.passed - pred.passed) %>% group_by(pet.type) %>% summarise(lower.95 = quantile(extra.passed, 0.025), lower.50 = quantile(extra.passed, 0.25), median = median(extra.passed), upper.50 = quantile(extra.passed, 0.75), upper.95 = quantile(extra.passed, 0.975)) %>% ungroup() %>% mutate(pet.type = fct_reorder(pet.type, median)) %>% ggplot(aes(x = pet.type)) + geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1) + geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) + geom_point(aes(y = median), size = 3) + geom_hline(yintercept = 0) + labs(subtitle = "By type of pet", x = "Pet type", y = "Number of extra students\nwho passed because of tutoring", title = "Estimated number of extra students\nwho passed because of tutoring") + coord_flip() } ) ``` ] .pull-right[ <img src="RUG_presentation_files/figure-html/extra_passes_group_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Extra successes by group: Pros .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-36-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Avoids a scale with number of simulations; focus is on range of predictions {{content}} ] -- - Shows differences by group {{content}} -- - Interaction terms in the model would be incorporated automatically --- # Extra successes by group: Cons .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-37-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - 
Doesn't show how absolute numbers depend on group size {{content}} ] -- - Tutoring actually has a _larger_ percentage point effect for fish owners (because of the lower baseline), but the group is small {{content}} -- - (Your audience may care about counts, percentages, or both) --- # Potential successes compared to group size - Attempt to show _both_ the effect size for each group _and_ the overall size of that group -- - Here, we switch the direction of the counterfactual -- - Start with untutored students -- - How many would have passed with tutoring? -- - We think this emphasizes the benefits of tutoring more clearly in this graph -- - Either direction is possible; do what makes sense in your context! --- # Potential successes compared to group size .pull-left[ ```r potential.group.p = with( list( temp.df = map_dfr(1:5000, function(d) { data.frame(draw = d, pet.type = df$pet.type[!df$tutoring], mu = model.matrix(pass.m, data = df %>% filter(!tutoring) %>% mutate(tutoring = T)) %*% rnorm(nrow(coefs.df), mean = coefs.df$est, sd = coefs.df$se)) }) ), { temp.df %>% mutate(pred = runif(n()) < invlogit(mu)) %>% group_by(pet.type, draw) %>% * summarise(n.passed = sum(pred), * .groups = "keep") %>% ungroup() %>% group_by(pet.type) %>% summarise(lower.95 = quantile(n.passed, 0.025), lower.50 = quantile(n.passed, 0.25), upper.50 = quantile(n.passed, 0.75), upper.95 = quantile(n.passed, 0.975), n.passed = median(n.passed)) %>% ungroup() %>% * mutate(pass.type = "Predicted") %>% * bind_rows( * df %>% * filter(!tutoring) %>% * group_by(pet.type) %>% * summarise(n.passed = sum(passed)) %>% * ungroup() %>% * mutate(pass.type = "Actual") * ) %>% mutate(pet.type = str_to_title(pet.type), pet.type = fct_reorder(pet.type, n.passed, max)) %>% ggplot(aes(x = pet.type, color = pass.type, shape = pass.type)) + geom_linerange(aes(ymin = lower.95, ymax = upper.95), size = 1, show.legend = F) + geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2, show.legend = F) + 
geom_point(aes(y = n.passed), size = 3) + scale_color_manual(values = c("red", "black")) + scale_shape_manual(values = c(18, 16)) + labs(x = "Pet type", color = "", shape = "", subtitle = "By type of pet", y = "Number of untutored students\npredicted to pass with tutoring", title = "Estimated number of untutored students\nwho would have passed with tutoring") + expand_limits(y = 0) + coord_flip() } ) ``` ] .pull-right[ <img src="RUG_presentation_files/figure-html/potential_passes_group_plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Potential successes compared to group size .pull-left[ <img src="RUG_presentation_files/figure-html/unnamed-chunk-38-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Acknowledges different group sizes: puts absolute numbers in context {{content}} ] -- - But small groups are squished at the bottom of the scale (hard to see) --- # Conclusion - There is no right or wrong way, only better and worse ways for a particular project, so get creative! -- - Knowing your stakeholders as well as the context and purpose of your research should be your guides to determine which visualization is most appropriate -- - Use colors, the layout, and annotations to your advantage -- - Share your ideas with others --- class: inverse, center, middle # Thank you!