id submission_type answer
tutorial-id none 131-stops
name question Surya Fraser
email question surya.fraser@gmail.com
introduction-1 question Wisdom, Justice, Courage, and Temperance
introduction-2 question > show_file(".gitignore") stops_files >
introduction-3 question > show_file("stops.qmd", chunk = "Last") #|message: false library(tidyverse) library(primer.data) >
introduction-4 question > library(tidyverse) >
introduction-5 question Description This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question A causal effect is the difference between potenial outcomes.
introduction-7 question The fundamental problem of causal inference is that each unit can only receive one treatment.
introduction-8 question arrested
introduction-9 question The variable could be record. It could have values 1 and 0 representing whether or not the person had a criminal record.
introduction-10 question 2: the person is wearing a mask or not.
introduction-11 question If mask is 1 (the person is wearing a mask) maybe arrested is 1 (an arrest was made) and if it was 0, maybe arrested was 0 and the causal effect would be 1.
introduction-12 question race
introduction-13 question White and black people maybe.
introduction-14 question What is the difference in the average arrested values for black ad white people?
wisdom-1 question The question, the Preceptor Table, and an examination of the data.
wisdom-2 question A Preceptor Table contains the minimum number of units and columns nessisary to answer the question.
wisdom-3 question The rows in the Preceptor Table are units, it contains columns for covariates, and it also contains column(s) for outcome(s).
wisdom-4 question The drivers.
wisdom-5 question arrested
wisdom-6 question Maybe age.
wisdom-7 question There are none.
wisdom-8 question When the drivers were pulled over maybe.
wisdom-9 question The drivers are rows, there is a column for race, and another column for arrested.
wisdom-10 question "What is the difference in arrest rate between Black and White drivers adjusting for other variables?"
wisdom-11 question Arrests in traffic stops are related to a variety of factors. To understand the impact of race om traffic stop arrests I examine data from the Stanford Open Policing Project.
justice-1 question The Population Table, validity, stability, representativeness, and unconfoundedness.
justice-2 question Validity refers to the consistency between columns in the data and Preceptor Table.
justice-4 question A Population Table combines the data and the Preceptor Table.
justice-5 question Units: drivers Time: time of arrest
justice-6 question Stability concerns the relationship between the Preceptor Table and the rows in the data, population, and Preceptor Table.
justice-7 question Stability might not hold because of changes overtime.
justice-8 question Representativeness concerns the relationships between the rows in the Population Table.
justice-11 question Unconfoundedness concerns whether the decision for a unit to receive treatment was random.
justice-12 question > library(tidymodels) ── Attaching packages ──────────────── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.9 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.1 ✔ parsnip 1.3.2 ✔ yardstick 1.3.2 ✔ recipes 1.3.1 ── Conflicts ───────────────────────── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Use suppressPackageStartupMessages() to eliminate package startup messages >
justice-13 question > library(broom) >
justice-14 question $$ \log\left( \frac{\mathbb{P}(Y = 1)}{\mathbb{P}(Y = 0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k $$
justice-15 question However, one potential weakness in this model is the lack of stability in the data due to changes overtime in the law.
courage-1 question Courage is how the data generation mechanism is created.
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm")|>fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm")|>fit(arrested ~ sex, data = x)|>tidy(conf.int = TRUE)
courage-5 exercise linear_reg(engine = "lm")|>fit(arrested ~ race, data = x)|>tidy(conf.int = TRUE)
courage-6 exercise linear_reg(engine = "lm")|>fit(arrested ~ race, data = x)|>tidy(conf.int = TRUE)
courage-7 exercise linear_reg(engine = "lm")|>fit(arrested ~ sex + race, data = x)|>tidy(conf.int = TRUE)
courage-8 exercise linear_reg(engine = "lm")|>fit(arrested ~ sex + race*zone, data = x)|>tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question > x <- stops |> + filter(race %in% c("black", "white")) |> + mutate(race = str_to_title(race), + sex = str_to_title(sex)) + + fit_stops <- linear_reg() |> + set_engine("lm") |> + fit(arrested ~ sex + race*zone, data = x) >
courage-11 question > library(easystats) # Attaching packages: easystats 0.7.5 ✔ bayestestR 0.16.1 ✔ correlation 0.8.8 ✔ datawizard 1.2.0 ✔ effectsize 1.0.1 ✔ insight 1.3.1 ✔ modelbased 0.12.0 ✔ performance 0.15.0 ✔ parameters 0.27.0 ✔ report 0.6.1 ✔ see 0.11.0 >
courage-12 question > check_predictions(extract_fit_engine(fit_stops))
courage-13 question \log\left( \frac{\mathbb{P}(\text{stop} = 1)}{\mathbb{P}(\text{stop} = 0)} \right) = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \cdots
courage-14 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) >
courage-15 question > tutorial.helpers::show_file(".gitignore") stops_files *_cache >
courage-16 exercise tidy(fit_stops, conf.int=TRUE)
courage-17 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") # Load required libraries library(broom) library(gt) # Get the model summary table_data <- tidy(fit_stops, conf.int = TRUE) # Clean and rename columns table_data |> dplyr::mutate(term = gsub("sexMale", "Sex: Male", term), term = gsub("raceWhite", "Race: White", term), term = gsub("zone", "Zone ", term), term = gsub("\\(Intercept\\)", "Intercept", term)) |> dplyr::select(term, estimate, std.error, conf.low, conf.high, p.value) |> gt() |> fmt_number(columns = c(estimate, std.error, conf.low, conf.high, p.value), decimals = 4) |> cols_label( term = "Term", estimate = "Estimate", std.error = "Std. Error", conf.low = "95% CI (Low)", conf.high = "95% CI (High)", p.value = "p-value" ) |> tab_header( title = "Logistic Regression Coefficients", subtitle = "Model: Probability of Stop" ) |> tab_options( table.font.size = "small", data_row.padding = px(4), table.align = "center" ) >
courage-18 question I model traffic stop arrests with a logistic model as a function of race, sex, and zone.
temperance-1 question Temperance concerns answering the initial question.
temperance-2 question This estimate imples that male drivers have a higher log odds of being arrested.
temperance-3 question This estimate means white drivers have a lower log odds of being arrested.
temperance-4 question This reflects the probability for being arrested when all ariables are at their reference levels.
temperance-5 question > library(marginaleffects)
temperance-6 question What is the difference in arrest rate between Black and White drivers adjusting for other variables?
temperance-7 question > predictions(fit_stops) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric >
temperance-8 question > plot_predictions(fit_stops,by="sex")
temperance-9 question > plot_predictions(fit_stops,condition ="sex") >
temperance-10 question > plot_predictions(fit_stops,condition =c("sex", "race")) >
temperance-11 question # Load required packages library(marginaleffects) library(ggplot2) # Generate predicted probabilities by sex and race preds <- plot_predictions(fit_stops, condition = c("sex", "race"), draw = FALSE) # Plot with ggplot2 ggplot(preds, aes(x = sex, y = estimate, fill = race)) + geom_col(position = position_dodge(width = 0.7), width = 0.6) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), position = position_dodge(width = 0.7), width = 0.2) + labs( title = "Predicted Probability of Being Stopped", subtitle = "Grouped by Sex and Race", x = "Sex", y = "Predicted Probability", fill = "Race" ) + theme_minimal(base_size = 13) + theme( legend.position = "top", plot.title = element_text(face = "bold"), axis.title.y = element_text(margin = margin(r = 10)) )
temperance-12 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") # Load required packages library(marginaleffects) library(ggplot2) # Generate predicted probabilities by sex and race preds <- plot_predictions(fit_stops, condition = c("sex", "race"), draw = FALSE) # Plot with ggplot2 ggplot(preds, aes(x = sex, y = estimate, fill = race)) + geom_col(position = position_dodge(width = 0.7), width = 0.6) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), position = position_dodge(width = 0.7), width = 0.2) + labs( title = "Predicted Probability of Being Stopped", subtitle = "Grouped by Sex and Race", x = "Sex", y = "Predicted Probability", fill = "Race" ) + theme_minimal(base_size = 13) + theme( legend.position = "top", plot.title = element_text(face = "bold"), axis.title.y = element_text(margin = margin(r = 10)) ) >
temperance-13 question According to the model's output white drivers have a lower log odds of being arrested.
temperance-15 question > tutorial.helpers::show_file("stops.qmd") --- title: "Stops" format: html author: Surya Fraser execute: echo: false --- ```{r} #|message: false #|warning: false suppressPackageStartupMessages({ library(tidyverse) library(tidymodels) }) library(primer.data) library(broom) library(marginaleffects) ``` ```{r} x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) ``` ```{r} #| cache: true fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) ``` ```{r} # Load required libraries library(broom) library(gt) # Get the model summary table_data <- tidy(fit_stops, conf.int = TRUE) # Clean and rename columns table_data |> dplyr::mutate(term = gsub("sexMale", "Sex: Male", term), term = gsub("raceWhite", "Race: White", term), term = gsub("zone", "Zone ", term), term = gsub("\\(Intercept\\)", "Intercept", term)) |> dplyr::select(term, estimate, std.error, conf.low, conf.high, p.value) |> gt() |> fmt_number(columns = c(estimate, std.error, conf.low, conf.high, p.value), decimals = 4) |> cols_label( term = "Term", estimate = "Estimate", std.error = "Std. Error", conf.low = "95% CI (Low)", conf.high = "95% CI (High)", p.value = "p-value" ) |> tab_header( title = "Logistic Regression Coefficients", subtitle = "Model: Probability of Stop" ) |> tab_options( table.font.size = "small", data_row.padding = px(4), table.align = "center" ) ``` ```{r} # Load required packages library(marginaleffects) library(ggplot2) # Generate predicted probabilities by sex and race preds <- plot_predictions(fit_stops, condition = c("sex", "race"), draw = FALSE) # Plot with ggplot2 ggplot(preds, aes(x = sex, y = estimate, fill = race)) + geom_col(position = position_dodge(width = 0.7), width = 0.6) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), position = position_dodge(width = 0.7), width = 0.2) + labs( title = "Predicted Probability of Being Stopped", subtitle = "Grouped by Sex and Race", x = "Sex", y = "Predicted Probability", fill = "Race" ) + theme_minimal(base_size = 13) + theme( legend.position = "top", plot.title = element_text(face = "bold"), axis.title.y = element_text(margin = margin(r = 10)) ) ``` $$ \log\left( \frac{\mathbb{P}(Y = 1)}{\mathbb{P}(Y = 0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k $$ $$ \log\left( \frac{\mathbb{P}(\text{stop} = 1)}{\mathbb{P}(\text{stop} = 0)} \right) = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \cdots $$ Arrests in traffic stops are related to a variety of factors. To understand the impact of race om traffic stop arrests I examine data from the Stanford Open Policing Project. However, one potential weakness in this model is the lack of stability in the data due to changes overtime in the law. I model traffic stop arrests with a logistic model as a function of race, sex, and zone. According to the model's output white drivers have a lower log odds of being arrested. >
temperance-16 question https://sfraser8910.github.io/stops/
temperance-17 question https://github.com/sfraser8910/stops
minutes question 120