id submission_type answer
tutorial-id none 131-stops
name question Sajida Rehman
email question sajidarehman259@gmail.com
ID question 6674
introduction-1 question Wisdom, Justice, Courage, and Temperance.
introduction-2 question > show_file(".gitignore") stops_files >
introduction-3 question > show_file("stops.qmd", chunk = "Last") #| message: false library(tidyverse) library(primer.data) >
introduction-4 question > library(tidyverse) ── Attaching core tidyverse packages ──────────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.2 ✔ tibble 3.3.0 ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ✔ purrr 1.1.0 ── Conflicts ────────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package to force all conflicts to become errors >
introduction-5 question Description This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question A causal effect is the difference between two potential outcomes.
introduction-7 question The fundamental problem of causal inference is that we can only observe one potential outcome.
introduction-8 question The outcome variable is arrested.
introduction-9 question An example of a binary, manipulable variable is officer_warning_given, which indicates whether an officer gave a verbal warning before deciding to arrest; this can be influenced through policy or training to potentially reduce unnecessary arrests.
introduction-10 question Each traffic stop has two potential outcomes for arrested, one if the driver is wearing a mask and one if they are not, because we are considering how the treatment variable mask could causally affect the likelihood of arrest.
introduction-11 question For a single driver, the treatment variable mask can take on two values: 1 if the driver is wearing a mask, and 0 if they are not. Suppose that if the driver wears a mask (mask = 1), they are not arrested (outcome = 0), but if they do not wear a mask (mask = 0), they are arrested (outcome = 1). The causal effect of wearing a mask for this driver is the difference in outcomes: 0−1=−1, meaning that wearing a mask reduced the chance of arrest for this individual.
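The arithmetic in this example can be written in potential-outcomes notation. This is only a sketch: the symbols $Y(1)$, $Y(0)$, and $\tau$ are notation assumed here, not taken from the answer above. $Y(1)$ is the arrest outcome when the driver wears a mask and $Y(0)$ is the outcome when they do not.

$$ \tau = Y(1) - Y(0) = 0 - 1 = -1 $$

Only one of $Y(1)$ and $Y(0)$ is ever observed for a given driver, so $\tau$ cannot be computed directly from the data.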
introduction-12 question One variable in the stops dataset that likely has an important connection to arrested is race.
introduction-13 question Black drivers and White drivers may have different average arrest rates during traffic stops, reflecting potential racial disparities in policing outcomes.
introduction-14 question How does a driver's race influence the probability of being arrested during a traffic stop?
wisdom-1 question Wisdom requires a question, the creation of a Preceptor Table and an examination of our data.
wisdom-2 question A Preceptor Table is the smallest possible table of data with rows and columns such that, if there is no missing data, we can easily calculate the quantities of interest.
wisdom-3 question The rows of the Preceptor Table are the units. The outcome is at least one of the columns. If the problem is causal, there will be at least two (potential) outcome columns. The other columns are covariates. If the problem is causal, at least one of the covariates will be a treatment.
wisdom-4 question The units for this problem are individual traffic stops.
wisdom-5 question The outcome variable for this problem is arrested, which indicates whether or not an arrest occurred during the traffic stop.
wisdom-6 question A useful covariate for this problem would be the reason for the stop (e.g., speeding, broken taillight, expired registration).
wisdom-7 question In this observational problem, there is no actual treatment applied.
wisdom-8 question The Preceptor Table refers to the moment after the traffic stop has occurred.
wisdom-9 question The Preceptor Table for this problem is a structured summary of data, where each row represents a single traffic stop involving one driver. For each stop, the table records the outcome, and several covariates that may help explain that outcome. These covariates include the driver’s race, sex, and possibly age and type of car, as well as the zone where the stop occurred.
wisdom-10 question Are Black drivers more likely to be arrested than White drivers, after accounting for age, sex, and zone?
wisdom-11 question Arrests during traffic stops represent a critical area for examining how individual characteristics may influence law enforcement outcomes. This analysis uses data from the Stanford Open Policing Project, comprising approximately 400,000 traffic stops conducted in New Orleans between 2011 and 2018, to investigate whether Black drivers are more likely to be arrested than White drivers, controlling for age, sex, and zone.
justice-1 question Justice concerns the Population Table and the four key assumptions which underlie it: validity, stability, representativeness, and unconfoundedness.
justice-2 question Validity is the consistency, or lack thereof, in the columns of the data set and the corresponding columns in the Preceptor Table.
justice-3 question One reason the assumption of validity might not hold is that the race column is based on the officer’s perception rather than self-identification, which could lead to misclassification and affect the accuracy of our analysis.
justice-4 question The Population Table includes a row for each unit/time combination in the underlying population from which both the Preceptor Table and the data are drawn.
justice-5 question Each row in the Population Table represents a unique unit/time combination, where the unit is an individual traffic stop involving a single driver, and the time is the specific date and time at which that stop occurred.
justice-6 question Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn.
justice-7 question One reason the assumption of stability might not hold is that officer behavior or department policies may change over time.
justice-8 question Representativeness, or the lack thereof, concerns two relationships among the rows in the Population Table. The first is between the data and the other rows. The second is between the other rows and the Preceptor Table.
justice-9 question One reason the assumption of representativeness might not hold is that the data only includes stops where complete information was recorded, and stops resulting in arrests were more likely to have missing values, so the observed data may not accurately reflect the overall population of all traffic stops.
justice-10 question One reason the assumption of representativeness might not be true is that the Preceptor Table excludes cases with missing data, whereas the full Population includes all traffic stops.
justice-11 question Unconfoundedness means that the treatment assignment is independent of the potential outcomes, when we condition on pre-treatment covariates.
justice-12 question > library(tidymodels) ── Attaching packages ────────────────────────────────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.8 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.1 ✔ parsnip 1.3.2 ✔ yardstick 1.3.2 ✔ recipes 1.3.1 ── Conflicts ───────────────────────────────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Search for functions across packages at https://www.tidymodels.org/find/ >
justice-13 question > library(broom) >
justice-14 question $$ Y \sim \text{Bernoulli}(\rho) $$ $$ \rho = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}} $$ $$ P(Y = 1) = \rho $$
justice-15 question One potential limitation of our model is that it is based only on complete cases, which may bias results if the excluded stops with missing data differ systematically from the stops we keep.
courage-1 question Courage starts with math, explores models, and then creates the data generating mechanism.
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE)
courage-5 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-6 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race * zone, data = x) |> tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question > fit_stops parsnip model object Call: stats::lm(formula = arrested ~ sex + race * zone, data = data) Coefficients: (Intercept) sexMale raceWhite zoneB 0.1773298 0.0614460 -0.0445247 0.0146036 zoneC zoneD zoneE zoneF 0.0061012 0.0780600 0.0019025 -0.0027057 zoneG zoneH zoneI zoneJ 0.0308717 0.0757019 0.0330416 0.0237773 zoneK zoneL zoneM zoneN 0.0586687 -0.0038877 0.0393026 0.0139437 zoneO zoneP zoneQ zoneR 0.0232251 0.0140617 0.0126170 0.0119566 zoneS zoneT zoneU zoneV 0.0594727 0.0113267 0.0071986 0.0770051 zoneW zoneX zoneY raceWhite:zoneB 0.1143814 0.0057280 0.0386437 -0.0077384 raceWhite:zoneC raceWhite:zoneD raceWhite:zoneE raceWhite:zoneF 0.0065557 0.0294040 0.0068179 -0.0137965 raceWhite:zoneG raceWhite:zoneH raceWhite:zoneI raceWhite:zoneJ 0.0088500 0.0085970 -0.0339373 -0.0244272 raceWhite:zoneK raceWhite:zoneL raceWhite:zoneM raceWhite:zoneN -0.0381747 -0.0075094 -0.0423222 -0.0566405 raceWhite:zoneO raceWhite:zoneP raceWhite:zoneQ raceWhite:zoneR -0.0149832 0.0092133 -0.0544990 -0.0379411 raceWhite:zoneS raceWhite:zoneT raceWhite:zoneU raceWhite:zoneV -0.0250048 -0.0272932 0.0383220 -0.0387945 raceWhite:zoneW raceWhite:zoneX raceWhite:zoneY -0.1233162 0.0843196 -0.0002596 >
courage-11 question > library(easystats) # Attaching packages: easystats 0.7.4 (red = needs update) ✖ bayestestR 0.16.0 ✖ correlation 0.8.7 ✖ datawizard 1.1.0 ✔ effectsize 1.0.1 ✖ insight 1.3.0 ✖ modelbased 0.11.2 ✖ performance 0.14.0 ✖ parameters 0.26.0 ✔ report 0.6.1 ✔ see 0.11.0 Restart the R-Session and update packages with `easystats::easystats_update()`. >
courage-12 question > check_predictions(extract_fit_engine(fit_stops)) >
courage-13 question $$ \widehat{\text{arrested}} =\ 0.1773 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} \\ + 0.0146 \cdot \text{zone}_{\text{B}} + 0.0061 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} \\ + 0.0019 \cdot \text{zone}_{\text{E}} - 0.0027 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} \\ + \text{(interaction terms for race and zone)} $$
courage-14 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) fit_stops >
courage-15 question > tutorial.helpers::show_file(".gitignore") stops_files *_cache >
courage-16 exercise tidy(fit_stops, conf.int = TRUE)
courage-17 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| label: tbl-fit-summary #| cache: true tidy(fit_stops, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate(across(where(is.numeric), ~round(.x, 3))) |> gt() |> tab_header( title = "Estimated Coefficients and 95% Confidence Intervals", subtitle = "Linear model for predicting arrest during traffic stops" ) |> cols_label( term = "Variable", estimate = "Estimate", conf.low = "Lower 95% CI", conf.high = "Upper 95% CI" ) >
courage-18 question We model the probability of being arrested during a traffic stop, a binary outcome, as a linear function of driver sex, race, and the zone in which the stop occurred, including interaction effects between race and zone.
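Written out, the formula `arrested ~ sex + race * zone` passed to `linear_reg(engine = "lm")` expands to a linear probability model of roughly the following shape. The symbols $\gamma_z$, $\delta_z$, and $\varepsilon_i$ are labels assumed here for illustration. The reference levels (female, Black, and the zone with no coefficient of its own) are absorbed into $\beta_0$.

$$ \text{arrested}_i = \beta_0 + \beta_1 \, \text{sex}_{\text{Male},i} + \beta_2 \, \text{race}_{\text{White},i} + \sum_{z = B}^{Y} \gamma_z \, \text{zone}_{z,i} + \sum_{z = B}^{Y} \delta_z \left( \text{race}_{\text{White},i} \cdot \text{zone}_{z,i} \right) + \varepsilon_i $$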
temperance-1 question Temperance uses the data generating mechanism to answer the questions with which we began. Humility reminds us that this answer is always a lie. We can also use the DGM to calculate many similar quantities of interest, displaying the results graphically.
temperance-2 question The estimated coefficient of 0.06 for sexMale suggests that, holding race and zone constant, male drivers are associated with a 6 percentage point higher probability of being arrested during a traffic stop compared to female drivers.
temperance-3 question The estimated coefficient of -0.04 for raceWhite indicates that, holding sex constant, White drivers in the baseline zone are associated with a 4 percentage point lower probability of being arrested during a traffic stop compared to Black drivers; in other zones the gap also includes the race-by-zone interaction term.
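Because the model includes `race * zone`, the White-versus-Black gap differs by zone. As a sketch using the coefficients printed for `fit_stops` (rounded to four decimals), the gap in a non-baseline zone is the `raceWhite` coefficient plus the matching `raceWhite:zone` interaction. The label $\text{gap}_z$ is introduced here only for illustration.

$$ \text{gap}_{W} = -0.0445 + (-0.1233) = -0.1678 $$ $$ \text{gap}_{Y} = -0.0445 + (-0.0003) = -0.0448 $$

The estimated gap is therefore about 17 percentage points in zone W but only about 4.5 points in zone Y.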
temperance-4 question The estimated intercept of 0.18 represents the predicted probability of arrest for the reference group: female Black drivers in the baseline zone.
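The coefficients combine additively to give fitted values. A minimal worked example for the baseline zone, using the rounded estimates printed for `fit_stops`:

$$ \widehat{\text{arrested}}_{\text{Black, Male}} = 0.1773 + 0.0614 = 0.2387 $$ $$ \widehat{\text{arrested}}_{\text{White, Male}} = 0.1773 + 0.0614 - 0.0445 = 0.1942 $$

That is roughly a 24% predicted arrest probability for a Black male driver and 19% for a White male driver in the baseline zone.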
temperance-5 question > library(marginaleffects) >
temperance-6 question How do a driver's race, sex, and location (zone) influence the probability of being arrested during a traffic stop in New Orleans?
temperance-7 question > predictions(fit_stops) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric >
temperance-8 question > plot_predictions(fit_stops, by = "sex") >
temperance-9 question > plot_predictions(fit_stops, condition = "sex") >
temperance-10 question plot_predictions(fit_stops, condition = c("sex", "race"))
temperance-11 question # Load necessary libraries library(ggplot2) library(dplyr) library(scales) library(tidytext) # for reorder_within() and scale_x_reordered() # Create a polished plot plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, y = estimate, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5), linewidth = 0.8, alpha = 0.8) + geom_point(size = 2.5, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + # ← Corrected function name scale_y_continuous(labels = percent_format(accuracy = 1)) + scale_color_manual(values = c("Black" = "#1b9e77", "White" = "#d95f02")) + labs( title = "Predicted Arrest Probability by Race, Zone, and Sex", subtitle = "Black drivers face higher arrest rates across zones, especially among males", x = "Zone", y = "Predicted Probability of Arrest", caption = "Source: New Orleans Traffic Stops Dataset" ) + theme_minimal(base_size = 13) + theme( plot.title = element_text(face = "bold", size = 16), plot.subtitle = element_text(size = 13, margin = margin(b = 10)), plot.caption = element_text(size = 10, hjust = 0), axis.text.x = element_text(size = 10, angle = 45, hjust = 1), legend.position = "top", strip.text = element_text(face = "bold", size = 12) )
temperance-12 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") # Load necessary libraries library(ggplot2) library(dplyr) library(scales) library(tidytext) # for reorder_within() and scale_x_reordered() # Create a polished plot plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, y = estimate, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5), linewidth = 0.8, alpha = 0.8) + geom_point(size = 2.5, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + # ← Corrected function name scale_y_continuous(labels = percent_format(accuracy = 1)) + scale_color_manual(values = c("Black" = "#1b9e77", "White" = "#d95f02")) + labs( title = "Predicted Arrest Probability by Race, Zone, and Sex", subtitle = "Black drivers face higher arrest rates across zones, especially among males", x = "Zone", y = "Predicted Probability of Arrest", caption = "Source: New Orleans Traffic Stops Dataset" ) + theme_minimal(base_size = 13) + theme( plot.title = element_text(face = "bold", size = 16), plot.subtitle = element_text(size = 13, margin = margin(b = 10)), plot.caption = element_text(size = 10, hjust = 0), axis.text.x = element_text(size = 10, angle = 45, hjust = 1), legend.position = "top", strip.text = element_text(face = "bold", size = 12) ) >
temperance-13 question We estimate that Black male drivers in Zone D face a 25% chance of arrest, with a 95% confidence interval of 23% to 27%.
temperance-14 question Our estimates may be biased due to unmeasured factors and data imbalance, such as overrepresentation of certain zones or officer bias. A better approach could involve weighting or mixed-effects models, which might lower the estimated arrest probability for Black drivers to around 22% with a 95% confidence interval of [20%, 24%].
temperance-15 question > tutorial.helpers::show_file("stops.qmd") --- title: "Stops" author: "Sajida Rehman" execute: echo: false format: html --- Arrests during traffic stops represent a critical area for examining how individual characteristics may influence law enforcement outcomes. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop. However, our data may not fully represent the broader population, as the data and the Preceptor Table may cover different time periods and the data could reflect biases from certain officers who unfairly target specific groups. We modeled arrested as a linear function of sex and the interaction of race and zone. The fitted model indicates that male drivers are about 6 percentage points more likely to be arrested than female drivers. We estimate that Black drivers in New Orleans face about a 25% chance of being arrested during a traffic stop, compared to roughly 20% for White drivers, although both figures are uncertain. 
$$ Y \sim \text{Bernoulli}(\rho) $$ $$ \rho = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}} $$ $$ P(Y = 1) = \rho $$ $$ \widehat{\text{arrested}} =\ 0.1770 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} \\ + 0.0146 \cdot \text{zone}_{\text{B}} + 0.0061 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} \\ + 0.0019 \cdot \text{zone}_{\text{E}} - 0.0027 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} \\ + \text{(interaction terms for race and zone)} $$ ```{r} #| message: false library(tidyverse) library(primer.data) library(tidymodels) library(broom) library(marginaleffects) ``` ```{r} #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) fit_stops ``` ```{r} fit_stops_logistic <- logistic_reg() |> set_engine("glm") |> fit(as.factor(arrested) ~ sex + race, data = x) tidy(fit_stops_logistic, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate(across(where(is.numeric), ~round(., 3))) |> knitr::kable( caption = "Logistic Regression Estimates for Arrest Probability (Source: Traffic stops dataset filtered for Black and White drivers)" ) ``` ```{r} # Load necessary libraries library(ggplot2) library(dplyr) library(scales) library(tidytext) # for reorder_within() and scale_x_reordered() # Create a polished plot plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, y = estimate, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5), linewidth = 0.8, alpha 
= 0.8) + geom_point(size = 2.5, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + # ← Corrected function name scale_y_continuous(labels = percent_format(accuracy = 1)) + scale_color_manual(values = c("Black" = "#1b9e77", "White" = "#d95f02")) + labs( title = "Predicted Arrest Probability by Race, Zone, and Sex", subtitle = "Black drivers face higher arrest rates across zones, especially among males", x = "Zone", y = "Predicted Probability of Arrest", caption = "Source: New Orleans Traffic Stops Dataset" ) + theme_minimal(base_size = 13) + theme( plot.title = element_text(face = "bold", size = 16), plot.subtitle = element_text(size = 13, margin = margin(b = 10)), plot.caption = element_text(size = 10, hjust = 0), axis.text.x = element_text(size = 10, angle = 45, hjust = 1), legend.position = "top", strip.text = element_text(face = "bold", size = 12) ) ``` >
temperance-16 question https://sajida25.github.io/stops/
temperance-17 question https://github.com/Sajida25/stops
minutes question 180