class: center, middle, inverse, title-slide .title[ # Visualizing logistic regression results for non-technical audiences ] .author[ ### Abby Kaplan and Keiko Cawley ] .date[ ### October 17, 2023 ] --- class: inverse, center, middle # GitHub ### https://github.com/keikcaw/visualizing-logistic-regression <style type="text/css"> .remark-slide-content h1 { margin-bottom: 0em; } .remark-code { font-size: 60% !important; } </style> --- class: inverse, center, middle # Logistic regression review --- # Logistic regression: Binary outcomes - Use logistic regression to model a binary outcome -- - Examples from higher education: -- - Did the student pass the class? -- - Did the student enroll for another term? -- - Did the student graduate? --- # The design of logistic regression - We want to model the probability that the outcome happened -- - But probabilities are bounded between 0 and 1, while a linear combination of predictors can take any value -- - Instead, we model the logit (log odds) of the probability, which is unbounded: $$ \mbox{logit}(p) = \log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n $$ --- class: inverse, center, middle # What's the problem?
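---

# Same coefficient, different effects

This is a minimal R sketch of the problem. It uses base R's `qlogis()` (logit) and `plogis()` (inverse logit). The sketch adds a single coefficient of 0.5 on the log-odds scale to three baseline probabilities and reports the resulting change in percentage points. The 0.5 is hypothetical and does not come from our model.

```r
# Hypothetical coefficient on the log-odds scale (not from the fitted model)
beta = 0.5
baseline = c(0.10, 0.50, 0.90)

# Move each baseline to log odds, add the coefficient, and convert back
new_p = plogis(qlogis(baseline) + beta)

# Change in percentage points depends on the baseline:
# about 5.5, 12.2, and 3.7 points
change_pp = 100 * (new_p - baseline)
round(change_pp, 1)
```

The same change in log odds moves a 50% baseline by about 12 percentage points but a 90% baseline by less than 4. That is why there is no single "effect" to report on the probability scale.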
--- layout: true # Just tell me "the" effect - Stakeholders often want to know whether something affects outcomes, and by how much --- -- - But we don't model probabilities directly $$ \log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n $$ --- - But we don't model probabilities directly $$ \boxed{\log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right)} = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n $$ -- - We can solve for _p_: $$ `\begin{aligned} p & = \mbox{logit}^{-1}(\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n) \\ & \\ & = \frac{e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}{1 + e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}} \end{aligned}` $$ --- layout: true # "The" effect is nonlinear in _p_ $$ `\begin{aligned} p & = \frac{e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}}{1 + e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}} \end{aligned}` $$ --- -- <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-2-1.png" width="70%" style="display: block; margin: auto;" /> --- <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-3-1.png" width="70%" style="display: block; margin: auto;" /> --- <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" /> --- layout: false class: inverse, center, middle # Sample dataset and model --- # Dataset - Our simulated dataset describes students who took Balloon Animal-Making 201 at University Imaginary -- <table class="table table-striped table-hover" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Variable </th> <th style="text-align:left;"> Possible Responses </th> <th style="text-align:left;"> Variable Type </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Mac user </td> <td style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> <tr> <td style="text-align:left;"> Wear glasses </td> 
<td style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> <tr> <td style="text-align:left;"> Pet type </td> <td style="text-align:left;"> dog, cat, fish, none </td> <td style="text-align:left;"> categorical </td> </tr> <tr> <td style="text-align:left;"> Favorite color </td> <td style="text-align:left;"> blue, red, green, orange </td> <td style="text-align:left;"> categorical </td> </tr> <tr> <td style="text-align:left;"> Prior undergraduate GPA </td> <td style="text-align:left;"> 0.0-4.0 </td> <td style="text-align:left;"> continuous </td> </tr> <tr> <td style="text-align:left;"> Height </td> <td style="text-align:left;"> 54-77 inches </td> <td style="text-align:left;"> continuous </td> </tr> <tr> <td style="text-align:left;"> Went to tutoring </td> <td style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> <tr> <td style="text-align:left;"> Passed </td> <td style="text-align:left;"> TRUE/FALSE </td> <td style="text-align:left;"> binary </td> </tr> </tbody> </table> --- # Dataset

```r
library(tidyverse)
# Read the simulated course outcomes; keep text columns as character vectors
df = read.csv("data/course_outcomes.csv", header = TRUE, stringsAsFactors = FALSE)
```

<table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> id </th> <th style="text-align:left;"> mac </th> <th style="text-align:left;"> glasses </th> <th style="text-align:left;"> pet.type </th> <th style="text-align:left;"> favorite.color </th> <th style="text-align:right;"> prior.gpa </th> <th style="text-align:right;"> height </th> <th style="text-align:left;"> tutoring </th> <th style="text-align:left;"> passed </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.86 </td> <td style="text-align:right;"> 63 </td> <td
style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> cat </td> <td style="text-align:left;"> green </td> <td style="text-align:right;"> 2.37 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> none </td> <td style="text-align:left;"> orange </td> <td style="text-align:right;"> 3.98 </td> <td style="text-align:right;"> 66 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.78 </td> <td style="text-align:right;"> 68 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> cat </td> <td style="text-align:left;"> blue </td> <td style="text-align:right;"> 3.73 </td> <td style="text-align:right;"> 67 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> green </td> <td style="text-align:right;"> 3.99 </td> <td style="text-align:right;"> 61 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td 
style="text-align:right;"> 7 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> none </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.75 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.73 </td> <td style="text-align:right;"> 61 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> none </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 64 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> orange </td> <td style="text-align:right;"> 3.64 </td> <td style="text-align:right;"> 62 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> dog </td> <td style="text-align:left;"> red </td> <td style="text-align:right;"> 3.98 </td> <td style="text-align:right;"> 63 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> FALSE </td> <td 
style="text-align:left;"> TRUE </td> <td style="text-align:left;"> cat </td> <td style="text-align:left;"> green </td> <td style="text-align:right;"> 3.72 </td> <td style="text-align:right;"> 64 </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> TRUE </td> </tr> </tbody> </table> --- # Model - Dependent variable: did the student pass? -- - Continuous variables were centered and standardized (a one-unit change is one standard deviation: about 0.6 GPA points or 3 inches of height) -- - Reference levels for categorical variables: - Pet type: none - Favorite color: blue --- # Model

```
##                         Estimate Std. Error    z value      Pr(>|z|)
## (Intercept)           1.53678432 0.09665694 15.8993682  6.400698e-57
## macTRUE              -0.04070026 0.08251424 -0.4932514  6.218350e-01
## glassesTRUE           0.19330654 0.07787099  2.4823948  1.305026e-02
## pet.typedog          -0.25143138 0.08483778 -2.9636722  3.039919e-03
## pet.typecat           0.09616174 0.11927784  0.8061995  4.201278e-01
## pet.typefish         -1.19359401 0.16656361 -7.1659949  7.722363e-13
## favorite.colorred    -0.03945396 0.09265674 -0.4258078  6.702479e-01
## favorite.colorgreen  -0.38137532 0.10062190 -3.7901819  1.505370e-04
## favorite.colororange -0.24204783 0.13900517 -1.7412865  8.163337e-02
## cs.prior.gpa          1.03092175 0.03887172 26.5211237 5.531945e-155
## cs.height            -0.25908893 0.03833829 -6.7579681  1.399404e-11
## tutoringTRUE          0.22698497 0.07583279  2.9932300  2.760416e-03
```

--- # Model

```
##                         Estimate Std. Error    z value      Pr(>|z|)
## (Intercept)           1.53678432 0.09665694 15.8993682  6.400698e-57
## macTRUE              -0.04070026 0.08251424 -0.4932514  6.218350e-01
*## glassesTRUE           0.19330654 0.07787099  2.4823948  1.305026e-02
*## pet.typedog          -0.25143138 0.08483778 -2.9636722  3.039919e-03
## pet.typecat           0.09616174 0.11927784  0.8061995  4.201278e-01
*## pet.typefish         -1.19359401 0.16656361 -7.1659949  7.722363e-13
## favorite.colorred    -0.03945396 0.09265674 -0.4258078  6.702479e-01
*## favorite.colorgreen  -0.38137532 0.10062190 -3.7901819  1.505370e-04
## favorite.colororange -0.24204783 0.13900517 -1.7412865  8.163337e-02
*## cs.prior.gpa          1.03092175 0.03887172 26.5211237 5.531945e-155
*## cs.height            -0.25908893 0.03833829 -6.7579681  1.399404e-11
*## tutoringTRUE          0.22698497 0.07583279  2.9932300  2.760416e-03
```

--- # Causality disclaimer - Some visualizations strongly imply a causal interpretation -- - It's your responsibility to evaluate whether a causal interpretation is appropriate -- - If the data doesn't support a causal interpretation, **don't use a visualization that implies one** --- class: inverse, center, middle # Visualization family 1: # Presenting model coefficients --- # Coefficients in a table <table class="table table-striped table-hover" style="font-size: 14px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;text-align: center;"> Parameter </th> <th style="text-align:right;text-align: center;"> Estimate </th> <th style="text-align:right;text-align: center;"> Standard error </th> <th style="text-align:right;text-align: center;"> z </th> <th style="text-align:right;text-align: center;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Intercept </td> <td style="text-align:right;"> 1.5367843 </td> <td style="text-align:right;"> 0.0966569 </td> <td style="text-align:right;"> 15.8993682 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Mac </td> <td
style="text-align:right;"> -0.0407003 </td> <td style="text-align:right;"> 0.0825142 </td> <td style="text-align:right;"> -0.4932514 </td> <td style="text-align:right;"> 0.6218350 </td> </tr> <tr> <td style="text-align:left;"> Glasses </td> <td style="text-align:right;"> 0.1933065 </td> <td style="text-align:right;"> 0.0778710 </td> <td style="text-align:right;"> 2.4823948 </td> <td style="text-align:right;"> 0.0130503 </td> </tr> <tr> <td style="text-align:left;"> Pet: Dog </td> <td style="text-align:right;"> -0.2514314 </td> <td style="text-align:right;"> 0.0848378 </td> <td style="text-align:right;"> -2.9636722 </td> <td style="text-align:right;"> 0.0030399 </td> </tr> <tr> <td style="text-align:left;"> Pet: Cat </td> <td style="text-align:right;"> 0.0961617 </td> <td style="text-align:right;"> 0.1192778 </td> <td style="text-align:right;"> 0.8061995 </td> <td style="text-align:right;"> 0.4201278 </td> </tr> <tr> <td style="text-align:left;"> Pet: Fish </td> <td style="text-align:right;"> -1.1935940 </td> <td style="text-align:right;"> 0.1665636 </td> <td style="text-align:right;"> -7.1659949 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Favorite color: Red </td> <td style="text-align:right;"> -0.0394540 </td> <td style="text-align:right;"> 0.0926567 </td> <td style="text-align:right;"> -0.4258078 </td> <td style="text-align:right;"> 0.6702479 </td> </tr> <tr> <td style="text-align:left;"> Favorite color: Green </td> <td style="text-align:right;"> -0.3813753 </td> <td style="text-align:right;"> 0.1006219 </td> <td style="text-align:right;"> -3.7901819 </td> <td style="text-align:right;"> 0.0001505 </td> </tr> <tr> <td style="text-align:left;"> Favorite color: Orange </td> <td style="text-align:right;"> -0.2420478 </td> <td style="text-align:right;"> 0.1390052 </td> <td style="text-align:right;"> -1.7412865 </td> <td style="text-align:right;"> 0.0816334 </td> </tr> <tr> <td style="text-align:left;"> Prior GPA 
(0.6-pt increase) </td> <td style="text-align:right;"> 1.0309217 </td> <td style="text-align:right;"> 0.0388717 </td> <td style="text-align:right;"> 26.5211237 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Height (3-in increase) </td> <td style="text-align:right;"> -0.2590889 </td> <td style="text-align:right;"> 0.0383383 </td> <td style="text-align:right;"> -6.7579681 </td> <td style="text-align:right;"> 0.0000000 </td> </tr> <tr> <td style="text-align:left;"> Tutoring </td> <td style="text-align:right;"> 0.2269850 </td> <td style="text-align:right;"> 0.0758328 </td> <td style="text-align:right;"> 2.9932300 </td> <td style="text-align:right;"> 0.0027604 </td> </tr> </tbody> </table> --- # Coefficients in a table  --- # Change in log odds <img src="RMAIR_presentation_files/figure-html/change_in_log_odds_plot-1.png" width="100%" style="display: block; margin: auto;" /> --- # Change in log odds: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - It's clear which relationships are positive and which are negative {{content}} ] -- - The plot has a transparent relationship to the fitted model --- # Change in log odds: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - It's clear which relationships are positive and which are negative - The plot has a transparent relationship to the fitted model - Numbers all in one place: a single scale instead of a table of numbers ] --- # Change in log odds: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Change in log odds: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" /> ] 
.pull-right[ - The magnitude of effect is in the log odds scale {{content}} ] -- - What is a 0.4 change in the log odds? {{content}} -- - Is the change between 0.4 and 0.8 log odds "big" or "small"? {{content}} -- - You probably don't want to give your audience a tutorial on the inverse logit function --- # Secret log odds <img src="RMAIR_presentation_files/figure-html/change_in_log_odds_adjusted_axis_plot-1.png" width="100%" style="display: block; margin: auto;" /> --- # Secret log odds: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Secret log odds: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Easy: just relabel the x-axis {{content}} ] -- - No numbers for your audience to misinterpret --- # Secret log odds: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Can't convey absolute magnitude of an effect {{content}} ] -- - Your audience might ask "where are the numbers?" anyway --- layout: true # Change in odds ratio - Your audience may be more familiar with the "odds" part of log odds --- --- `$$\log\left(\begin{array}{c}\frac{p}{1 - p}\end{array}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n$$` --- `$$\log\left(\boxed{\begin{array}{c}\frac{p}{1 - p}\end{array}}\right) = \beta_0 + \beta_1x_1 + \ldots + \beta_nx_n$$` -- - Can't we just exponentiate to get the odds? 
-- `$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1x_1 + \ldots + \beta_nx_n}$$` -- - Now the effect of a coefficient is multiplicative, not additive -- `$$\frac{p}{1 - p} = e^{\beta_ix_i} \cdot e^{\beta_0 + \beta_1x_1 + \ldots + \beta_{i-1}x_{i-1} + \beta_{i+1}x_{i+1} + \ldots + \beta_nx_n}$$` --- layout: false # Change in odds ratio <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" /> --- # Change in odds ratio: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Changes in odds might be easier to describe than changes in log odds {{content}} ] -- - Still pretty easy: a simple transformation of your model coefficients --- # Change in odds ratio: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Not the way we usually describe odds ] --- # Change in odds ratio: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Not the way we usually describe odds - Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4" ] --- # Change in odds ratio: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Not the way we usually describe odds - Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4" - The unfamiliar format may undo the benefit of using a familiar concept {{content}} ] -- - Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds --- # Change in odds ratio: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Not the way we 
usually describe odds - Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4" - The unfamiliar format may undo the benefit of using a familiar concept - Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds - Percent change in odds (+200% = triple the odds) might be misinterpreted as a probability ] --- # Change in odds ratio: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Not the way we usually describe odds - Usually use integers: "3-to-1" or "2-to-5", not "3" or "0.4" - The unfamiliar format may undo the benefit of using a familiar concept - Exponentiated coefficients don't represent odds directly; they represent _changes_ in odds - Percent change in odds (+200% = triple the odds) might be misinterpreted as a probability - Now we're pretty far removed from familiar scales ] --- # Change in odds ratio: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Change in odds ratio: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" /> ] - The scale is expanded for positive effects and compressed for negative effects (halving the odds sits at 0.5, but doubling them sits at 2) --- class: inverse, center, middle # Visualization family 2: # Presenting probabilities --- # Probabilities relative to a baseline - Problem with probabilities: change in percentage points depends on baseline starting value -- - We can choose an appropriate baseline probability, then compute the marginal effect of a predictor given that baseline -- -
Options for baseline: -- - Model intercept -- - Observed outcome % in dataset (similar to intercept if continuous predictors are centered and other coefficients aren't too large) -- - Observed outcome % for a certain group (e.g., students with no tutoring) -- - Some % that's meaningful in context (e.g., 85% pass rate in typical years) --- # Probabilities relative to a baseline - Baseline probability: inverse logit of the intercept `$$p_0 = \mbox{logit}^{-1}(\beta_0)$$` -- - Probability with discrete predictor `\(i\)`: inverse logit of intercept + predictor coefficient `$$p_i = \mbox{logit}^{-1}(\beta_0 + \beta_i)$$` -- - For a continuous predictor `\(j\)`, pick a change in predictor value that makes sense -- - One standard deviation -- - A context-specific benchmark (e.g., 1 point for GPA, 100 points on the SAT) -- `$$p_j = \mbox{logit}^{-1}(\beta_0 + \beta_j\Delta x_j)$$` -- - To show uncertainty, compute the confidence interval on the log-odds scale, then apply the inverse logit to its endpoints --- # Probabilities relative to a baseline <img src="RMAIR_presentation_files/figure-html/probability_baseline_plot-1.png" width="100%" style="display: block; margin: auto;" /> - (Uncertainty in intercept is not represented here) --- # Probabilities relative to a baseline: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Familiar scale: probabilities, expressed as percentages {{content}} ] -- - Avoids the "percent change" formulation (common but misleading) --- # Probabilities relative to a baseline: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Have to choose a baseline; there may be no "good" choice {{content}} ] -- - Using the intercept as a baseline chooses reference categories for categorical variables --- # Probabilities relative to a baseline: Cons .pull-left[ <img
src="RMAIR_presentation_files/figure-html/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Have to choose a baseline; there may be no "good" choice - Using the intercept as a baseline chooses reference categories for categorical variables - Students who don't use Macs, don't wear glasses, etc. ] --- # Probabilities relative to a baseline: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Have to choose a baseline; there may be no "good" choice - Using the intercept as a baseline chooses reference categories for categorical variables - Students who don't use Macs, don't wear glasses, etc. - Not an appropriate choice for all datasets {{content}} ] -- - Doesn't show full range of possible effects at different baselines --- # Probabilities relative to a baseline: Arrows <img src="RMAIR_presentation_files/figure-html/probability_baseline_arrows_plot-1.png" width="100%" style="display: block; margin: auto;" /> --- # Probabilities relative to a baseline: Arrows .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Emphasizes direction of effect {{content}} ] -- - Doesn't show uncertainty around estimates {{content}} -- - Strong causal implications --- # Multiple baselines by group - Instead of one baseline probability, why not several? 
-- - Example: show effect of `\(i\)` for each level of categorical variable `\(j\)` -- `$$p_{j1} = \mbox{logit}^{-1}(\beta_0 + \beta_{j1})$$` -- `$$p_{j1 + i} = \mbox{logit}^{-1}(\beta_0 + \beta_{j1} + \beta_i)$$` --- # Multiple baselines by group <img src="RMAIR_presentation_files/figure-html/probability_group_plot-1.png" width="100%" style="display: block; margin: auto;" /> --- # Multiple baselines by group: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Emphasizes that the baseline we show is a _choice_ ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-30-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-31-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups - Effect of GPA is larger for fish owners than for dog owners ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-32-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups - Effect of GPA is larger for fish owners than for dog owners - Here, this is purely because of fish owners' lower baseline ] --- # Multiple baselines by group: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-33-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ - Emphasizes that the baseline we show is a _choice_ - Honors differences among groups - Effect of GPA is larger for fish owners than for dog owners 
- Here, this is purely because of fish owners' lower baseline - But we could also show the effects of an interaction term in the model {{content}} ] -- - We can use arrows here as well --- # Multiple baselines by group: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-34-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Still have to choose baselines by group {{content}} ] -- - May suggest essentializing interpretations of groups {{content}} -- - Cluttered --- # Banana graphs - We can overcome the baseline-choosing problem by iterating across every baseline -- - For example: -- - Start with every possible probability of passing Balloon Animal-Making 201, from 0% to 100% (at sufficiently small intervals) -- - For each probability, add the effect of having a pet fish -- `$$p_f = \mbox{logit}^{-1}(\mbox{logit}(p_0) + \beta_f)$$` --- # Banana graphs <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-35-1.png" width="100%" style="display: block; margin: auto;" /> --- # Banana graphs .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-36-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - **x-axis:** baseline probability {{content}} ] -- - **y-axis:** probability with effect of having a pet fish {{content}} -- - Solid line provides a reference (no effect) {{content}} -- - Positive effects above the line; negative effects below the line; no effect on the line --- # Banana graphs <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-37-1.png" width="100%" style="display: block; margin: auto;" /> --- # Banana graphs .center[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-38-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Banana graphs: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-39-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Do not have to pick 
and choose a baseline {{content}} ] -- - Show the whole range of predicted probabilities --- # Banana graphs: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-40-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Can take up quite a bit of space {{content}} ] -- - May be initially difficult to understand {{content}} -- - Predictor variables are in separate graphs: hard to compare --- class: inverse, center, middle # Visualization family 3: # Counterfactual counts --- # Extra successes - Sometimes stakeholders are interested in **the number of times something happens (or doesn't happen)** -- - Example: stakeholders want to assess the impact of tutoring on pass rates in Balloon Animal-Making 201 -- - They're interested not just in _whether_ tutoring helps students, but _how much_ it helps them -- - In our dataset, 2,571 students received tutoring; of those, 2,023 passed the class -- - Suppose those students had _not_ received tutoring; in that case, how many would have passed? -- - In other words, how many "extra" passes did we get because of tutoring? --- # Extra successes - To get a point estimate: -- - Take all students who received tutoring -- - Set `tutoring` to `FALSE` instead of `TRUE` -- - Use the model to make (counterfactual) predictions for the revised dataset -- - Count predicted counterfactual passes; compare to the actual number of passes -- - We can get confidence intervals by simulating many sets of outcomes and aggregating over them. 
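---

# Extra successes

This is a minimal R sketch of the steps above, and it rests on two assumptions. First, it fits a small model to synthetic data with made-up coefficients, not to our course dataset. Second, it handles uncertainty by drawing coefficient vectors from a multivariate normal approximation before simulating outcomes. That is one common approach, but it is not necessarily the one behind our plots.

```r
set.seed(2023)

# Synthetic stand-in data (hypothetical coefficients, not the course dataset)
n = 4000
sim = data.frame(cs.prior.gpa = rnorm(n), tutoring = runif(n) < 0.5)
sim$passed = rbinom(n, 1, plogis(1.5 + 1.0 * sim$cs.prior.gpa +
                                   0.8 * sim$tutoring)) == 1
fit = glm(passed ~ cs.prior.gpa + tutoring, data = sim, family = binomial)

# Take all students who received tutoring
tutored = sim[sim$tutoring, ]

# Set tutoring to FALSE: design matrix columns follow coef(fit), i.e.
# (Intercept), cs.prior.gpa, tutoringTRUE
X_cf = cbind(1, tutored$cs.prior.gpa, 0)

# Each simulation: draw coefficients from their approximate sampling
# distribution, simulate counterfactual passes, compare to actual passes
n_sims = 1000
R = chol(vcov(fit))
extra_passes = replicate(n_sims, {
  beta = coef(fit) + drop(rnorm(length(coef(fit))) %*% R)
  p_cf = plogis(drop(X_cf %*% beta))
  sum(tutored$passed) - sum(rbinom(length(p_cf), 1, p_cf))
})

# Point estimate and 95% interval for "extra" passes due to tutoring
mean(extra_passes)
quantile(extra_passes, c(0.025, 0.975))
```

Building the counterfactual design matrix by hand keeps the "set `tutoring` to `FALSE`" step explicit. The trade-off is that its columns must match the order of `coef(fit)`.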
--- # Extra successes <img src="RMAIR_presentation_files/figure-html/extra_passes_plot-1.png" width="100%" style="display: block; margin: auto;" /> --- # Extra successes: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-41-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Counts have a straightforward interpretation {{content}} ] -- - Natural baseline: account for other characteristics of your population (e.g., number of fish owners) --- # Extra successes: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-42-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - "Number of simulations" may be hard to explain {{content}} ] -- - Assumes that the counterfactual makes sense {{content}} -- - Strong causal interpretation --- # Extra successes by group - Your stakeholders may be interested in different effects by group -- - We can summarize counterfactuals for separate groups --- # Extra successes by group <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-43-1.png" width="100%" style="display: block; margin: auto;" /> --- # Extra successes by group: Pros .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-44-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Avoids a scale with number of simulations; focus is on range of predictions {{content}} ] -- - Shows differences by group {{content}} -- - Interaction terms in the model would be incorporated automatically --- # Extra successes by group: Cons .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-45-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Doesn't show how absolute numbers depend on group size {{content}} ] -- - Tutoring actually has a _larger_ percentage point effect for fish owners (because of the lower baseline), but the group is small {{content}} -- - (Your audience may care about 
counts, percentages, or both) --- # Potential successes compared to group size - Try to show _both_ the effect size for each group _and_ the overall size of that group -- - Here, we switch the direction of the counterfactual -- - Start with untutored students -- - How many would have passed with tutoring? -- - We think this direction shows the benefits of tutoring more clearly in this graph -- - Either direction is possible; do what makes sense in your context! --- # Potential successes compared to group size <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-46-1.png" width="100%" style="display: block; margin: auto;" /> --- # Potential successes compared to group size .pull-left[ <img src="RMAIR_presentation_files/figure-html/unnamed-chunk-47-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ - Acknowledges different group sizes: puts absolute numbers in context {{content}} ] -- - But small groups are squished at the bottom of the scale (hard to see) --- # Conclusion - There is no single right way, only better and worse choices for a particular project, so get creative! -- - Let your stakeholders, along with the context and purpose of your research, guide which visualization is most appropriate -- - Use color, layout, and annotations to your advantage -- - Share your ideas with others --- class: inverse, center, middle # Thank you!