| tutorial-id |
none |
131-stops |
| name |
question |
Sajida Rehman |
| email |
question |
sajidarehman259@gmail.com |
| ID |
question |
6674 |
| introduction-1 |
question |
Wisdom, Justice, Courage, and Temperance. |
| introduction-2 |
question |
> show_file(".gitignore")
stops_files
> |
| introduction-3 |
question |
> show_file("stops.qmd", chunk = "Last")
#| message: false
library(tidyverse)
library(primer.data)
> |
| introduction-4 |
question |
> library(tidyverse)
── Attaching core tidyverse packages ──────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
> |
| introduction-5 |
question |
Description
This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department. |
| introduction-6 |
question |
A causal effect is the difference between two potential outcomes. |
| introduction-7 |
question |
The fundamental problem of causal inference is that we can only observe one potential outcome. |
| introduction-8 |
question |
The outcome variable is arrested. |
| introduction-9 |
question |
An example of a binary, manipulable variable is officer_warning_given, which indicates whether an officer gave a verbal warning before deciding to arrest; this can be influenced through policy or training to potentially reduce unnecessary arrests. |
| introduction-10 |
question |
Each driver has two potential outcomes for arrested, one if the driver is wearing a mask and one if they are not, because we are considering how the treatment variable mask could causally affect the likelihood of arrest. |
| introduction-11 |
question |
For a single driver, the treatment variable mask can take on two values: 1 if the driver is wearing a mask, and 0 if they are not. Suppose that if the driver wears a mask (mask = 1), they are not arrested (outcome = 0), but if they do not wear a mask (mask = 0), they are arrested (outcome = 1). The causal effect of wearing a mask for this driver is the difference in outcomes:
0−1=−1, meaning that wearing a mask reduced the chance of arrest for this individual. |
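The arithmetic above can be written in potential-outcome notation. This is only a sketch: the symbols $Y(1)$ and $Y(0)$ are notation assumed here for the arrest outcome with and without a mask, and do not come from the tutorial itself.

$$
\text{causal effect} = Y(\text{mask} = 1) - Y(\text{mask} = 0) = 0 - 1 = -1
$$

Only one of the two terms is ever observed for a given driver, which is why this difference cannot be computed directly from data.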
| introduction-12 |
question |
One variable in the stops dataset that likely has an important connection to arrested is race. |
| introduction-13 |
question |
Black drivers and White drivers may have different average arrest rates during traffic stops, reflecting potential racial disparities in policing outcomes. |
| introduction-14 |
question |
How does a driver's race influence the probability of being arrested during a traffic stop? |
| wisdom-1 |
question |
Wisdom requires a question, the creation of a Preceptor Table and an examination of our data. |
| wisdom-2 |
question |
A Preceptor Table is the smallest possible table of data with rows and columns such that, if there is no missing data, we can easily calculate the quantities of interest. |
| wisdom-3 |
question |
The rows of the Preceptor Table are the units. The outcome is at least one of the columns. If the problem is causal, there will be at least two (potential) outcome columns. The other columns are covariates. If the problem is causal, at least one of the covariates will be a treatment. |
| wisdom-4 |
question |
The units for this problem are individual traffic stops. |
| wisdom-5 |
question |
The outcome variable for this problem is arrested, which indicates whether or not an arrest occurred during the traffic stop. |
| wisdom-6 |
question |
A useful covariate for this problem would be the reason for the stop (e.g., speeding, broken taillight, expired registration). |
| wisdom-7 |
question |
In this observational problem, there is no actual treatment applied. |
| wisdom-8 |
question |
The Preceptor Table refers to the moment after the traffic stop has occurred. |
| wisdom-9 |
question |
The Preceptor Table for this problem is a structured summary of data, where each row represents a single traffic stop involving one driver. For each stop, the table records the outcome, and several covariates that may help explain that outcome. These covariates include the driver’s race, sex, and possibly age and type of car, as well as the zone where the stop occurred. |
| wisdom-10 |
question |
Are Black drivers more likely to be arrested than White drivers, after accounting for age, sex, and zone? |
| wisdom-11 |
question |
Arrests during traffic stops represent a critical area for examining how individual characteristics may influence law enforcement outcomes. This analysis uses data from the Stanford Open Policing Project, comprising approximately 400,000 traffic stops conducted in New Orleans between 2011 and 2018, to investigate whether Black drivers are more likely to be arrested than White drivers, controlling for age, sex, and zone. |
| justice-1 |
question |
Justice concerns the Population Table and the four key assumptions which underlie it: validity, stability, representativeness, and unconfoundedness. |
| justice-2 |
question |
Validity is the consistency, or lack thereof, in the columns of the data set and the corresponding columns in the Preceptor Table. |
| justice-3 |
question |
One reason the assumption of validity might not hold is that the race column is based on the officer’s perception rather than self-identification, which could lead to misclassification and affect the accuracy of our analysis. |
| justice-4 |
question |
The Population Table includes a row for each unit/time combination in the underlying population from which both the Preceptor Table and the data are drawn. |
| justice-5 |
question |
Each row in the Population Table represents a unique unit/time combination, where the unit is an individual traffic stop involving a single driver, and the time is the specific date and time at which that stop occurred. |
| justice-6 |
question |
Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn. |
| justice-7 |
question |
One reason the assumption of stability might not hold is that officer behavior or department policies may change over time. |
| justice-8 |
question |
Representativeness, or the lack thereof, concerns two relationships among the rows in the Population Table. The first is between the data and the other rows. The second is between the other rows and the Preceptor Table. |
| justice-9 |
question |
One reason the assumption of representativeness might not hold is that the data only includes stops where complete information was recorded, and stops resulting in arrests were more likely to have missing values, so the observed data may not accurately reflect the overall population of all traffic stops. |
| justice-10 |
question |
One reason the assumption of representativeness might not be true is that the Preceptor Table excludes cases with missing data, whereas the full Population includes all traffic stops. |
| justice-11 |
question |
Unconfoundedness means that the treatment assignment is independent of the potential outcomes, when we condition on pre-treatment covariates. |
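One common way to state this definition symbolically is shown below. The symbols are assumptions introduced for illustration: $T$ for the treatment, $X$ for the pre-treatment covariates, and $Y(1)$, $Y(0)$ for the two potential outcomes.

$$
\bigl(Y(1),\, Y(0)\bigr) \;\perp\; T \;\big|\; X
$$

Read aloud: among units with the same covariate values, knowing which treatment a unit received tells us nothing more about its potential outcomes.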
| justice-12 |
question |
> library(tidymodels)
── Attaching packages ────────────────────────────────────────── tidymodels 1.3.0 ──
✔ broom 1.0.8 ✔ rsample 1.3.0
✔ dials 1.4.0 ✔ tune 1.3.0
✔ infer 1.0.8 ✔ workflows 1.2.0
✔ modeldata 1.4.0 ✔ workflowsets 1.1.1
✔ parsnip 1.3.2 ✔ yardstick 1.3.2
✔ recipes 1.3.1
── Conflicts ───────────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Search for functions across packages at https://www.tidymodels.org/find/
> |
| justice-13 |
question |
> library(broom)
> |
| justice-14 |
question |
$$
Y \sim \text{Bernoulli}(\rho)
$$
$$
\rho = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}}
$$
$$
P(Y = 1) = \rho
$$ |
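The second equation above can be rearranged into the log-odds form that logistic regression is usually reported in. This is a short derivation sketch, with $\eta$ introduced here as shorthand for the linear predictor.

$$
\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
$$
$$
\rho = \frac{1}{1 + e^{-\eta}}
\quad\Longrightarrow\quad
1 - \rho = \frac{e^{-\eta}}{1 + e^{-\eta}}
\quad\Longrightarrow\quad
\frac{\rho}{1 - \rho} = e^{\eta}
$$
$$
\log\!\left(\frac{\rho}{1 - \rho}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
$$

In this form each $\beta_j$ is a change in the log-odds of arrest, not a change in the probability itself.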
| justice-15 |
question |
One potential limitation of our model is that it is based only on complete cases, which may bias results if the excluded stops with missing data differ systematically from the stops that were kept. |
| courage-1 |
question |
Courage starts with math, explores models, and then creates the data generating mechanism. |
| courage-2 |
exercise |
linear_reg(engine = "lm") |
| courage-3 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex, data = x) |
| courage-4 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex, data = x) |>
tidy(conf.int = TRUE) |
| courage-5 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ race, data = x) |>
tidy(conf.int = TRUE) |
| courage-6 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ race, data = x) |>
tidy(conf.int = TRUE) |
| courage-7 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex + race, data = x) |>
tidy(conf.int = TRUE) |
| courage-8 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex + race * zone, data = x) |>
tidy(conf.int = TRUE) |
| courage-9 |
exercise |
fit_stops |
| courage-10 |
question |
> fit_stops
parsnip model object
Call:
stats::lm(formula = arrested ~ sex + race * zone, data = data)
Coefficients:
(Intercept) sexMale raceWhite zoneB
0.1773298 0.0614460 -0.0445247 0.0146036
zoneC zoneD zoneE zoneF
0.0061012 0.0780600 0.0019025 -0.0027057
zoneG zoneH zoneI zoneJ
0.0308717 0.0757019 0.0330416 0.0237773
zoneK zoneL zoneM zoneN
0.0586687 -0.0038877 0.0393026 0.0139437
zoneO zoneP zoneQ zoneR
0.0232251 0.0140617 0.0126170 0.0119566
zoneS zoneT zoneU zoneV
0.0594727 0.0113267 0.0071986 0.0770051
zoneW zoneX zoneY raceWhite:zoneB
0.1143814 0.0057280 0.0386437 -0.0077384
raceWhite:zoneC raceWhite:zoneD raceWhite:zoneE raceWhite:zoneF
0.0065557 0.0294040 0.0068179 -0.0137965
raceWhite:zoneG raceWhite:zoneH raceWhite:zoneI raceWhite:zoneJ
0.0088500 0.0085970 -0.0339373 -0.0244272
raceWhite:zoneK raceWhite:zoneL raceWhite:zoneM raceWhite:zoneN
-0.0381747 -0.0075094 -0.0423222 -0.0566405
raceWhite:zoneO raceWhite:zoneP raceWhite:zoneQ raceWhite:zoneR
-0.0149832 0.0092133 -0.0544990 -0.0379411
raceWhite:zoneS raceWhite:zoneT raceWhite:zoneU raceWhite:zoneV
-0.0250048 -0.0272932 0.0383220 -0.0387945
raceWhite:zoneW raceWhite:zoneX raceWhite:zoneY
-0.1233162 0.0843196 -0.0002596
> |
| courage-11 |
question |
> library(easystats)
# Attaching packages: easystats 0.7.4 (red = needs update)
✖ bayestestR 0.16.0 ✖ correlation 0.8.7
✖ datawizard 1.1.0 ✔ effectsize 1.0.1
✖ insight 1.3.0 ✖ modelbased 0.11.2
✖ performance 0.14.0 ✖ parameters 0.26.0
✔ report 0.6.1 ✔ see 0.11.0
Restart the R-Session and update packages with `easystats::easystats_update()`.
> |
| courage-12 |
question |
> check_predictions(extract_fit_engine(fit_stops))
> |
| courage-13 |
question |
$$
\begin{aligned}
\widehat{\text{arrested}} ={}& 0.1773
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}} \\
&+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.0061 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}} \\
&+ 0.0019 \cdot \text{zone}_{\text{E}}
- 0.0027 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}} \\
&+ \text{(remaining zone terms and race-by-zone interaction terms)}
\end{aligned}
$$ |
| courage-14 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
fit_stops
> |
| courage-15 |
question |
> tutorial.helpers::show_file(".gitignore")
stops_files
*_cache
> |
| courage-16 |
exercise |
tidy(fit_stops, conf.int = TRUE) |
| courage-17 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| label: tbl-fit-summary
#| cache: true
tidy(fit_stops, conf.int = TRUE) |>
select(term, estimate, conf.low, conf.high) |>
mutate(across(where(is.numeric), ~round(.x, 3))) |>
gt() |>
tab_header(
title = "Estimated Coefficients and 95% Confidence Intervals",
subtitle = "Linear model for predicting arrest during traffic stops"
) |>
cols_label(
term = "Variable",
estimate = "Estimate",
conf.low = "Lower 95% CI",
conf.high = "Upper 95% CI"
)
> |
| courage-18 |
question |
We model the probability of being arrested during a traffic stop, a binary outcome, as a linear function of driver sex, race, and the zone in which the stop occurred, including interaction effects between race and zone. |
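The call `linear_reg(engine = "lm")` with the formula `arrested ~ sex + race * zone` corresponds to a model of roughly the following shape. This is a sketch: the symbols $\gamma_z$, $\delta_z$, and $\varepsilon_i$ and the indicator names are notation assumed here, and the zone index runs over the 24 non-baseline zones B through Y that appear in the printed coefficients.

$$
\text{arrested}_i = \beta_0
+ \beta_1\,\text{male}_i
+ \beta_2\,\text{white}_i
+ \sum_{z \in \{B,\dots,Y\}} \gamma_z\,\text{zone}_{z,i}
+ \sum_{z \in \{B,\dots,Y\}} \delta_z\,\text{white}_i \cdot \text{zone}_{z,i}
+ \varepsilon_i
$$

That gives $1 + 1 + 1 + 24 + 24 = 51$ coefficients, which matches the number of terms printed for `fit_stops`.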
| temperance-1 |
question |
Temperance uses the data generating mechanism to answer the questions with which we began. Humility reminds us that this answer is always a lie. We can also use the DGM to calculate many similar quantities of interest, displaying the results graphically. |
| temperance-2 |
question |
The estimated coefficient of 0.06 for sexMale suggests that, holding race and zone constant, male drivers are associated with a 6 percentage point higher probability of being arrested during a traffic stop compared to female drivers. |
| temperance-3 |
question |
The estimated coefficient of -0.04 for raceWhite indicates that, holding sex constant, White drivers in the baseline zone are associated with a 4 percentage point lower probability of being arrested during a traffic stop than Black drivers; in other zones the gap also depends on the race-by-zone interaction terms. |
| temperance-4 |
question |
The estimated intercept of 0.18 represents the predicted probability of arrest for the reference group: female, Black drivers in the baseline zone (zone A). |
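As a worked check on these interpretations, the printed coefficients of `fit_stops` can be added by hand. The sketch below assumes the reference levels implied by the coefficient names (Female, Black, zone A) and uses the unrounded estimates from the model printout.

$$
\begin{aligned}
\text{Black female, zone A:} \quad & 0.1773 \\
\text{Black male, zone A:} \quad & 0.1773 + 0.0614 = 0.2387 \\
\text{White female, zone A:} \quad & 0.1773 - 0.0445 = 0.1328
\end{aligned}
$$

Because the model includes `race * zone`, the White minus Black gap changes by zone: it is the raceWhite coefficient plus that zone's interaction term.

$$
\begin{aligned}
\text{zone W:} \quad & -0.0445 + (-0.1233) = -0.1678 \\
\text{zone X:} \quad & -0.0445 + 0.0843 = +0.0398
\end{aligned}
$$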
| temperance-5 |
question |
> library(marginaleffects)
> |
| temperance-6 |
question |
How do a driver's race, sex, and location (zone) influence the probability of being arrested during a traffic stop in New Orleans? |
| temperance-7 |
question |
> predictions(fit_stops)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric
> |
| temperance-8 |
question |
> plot_predictions(fit_stops, by = "sex")
> |
| temperance-9 |
question |
> plot_predictions(fit_stops, condition = "sex")
> |
| temperance-10 |
question |
plot_predictions(fit_stops, condition = c("sex", "race")) |
| temperance-11 |
question |
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(scales)
library(tidytext) # for reorder_within() and scale_x_reordered()
# Create a polished plot
plot_predictions(fit_stops$fit,
newdata = "balanced",
condition = c("zone", "race", "sex"),
draw = FALSE) |>
as_tibble() |>
group_by(zone, sex) |>
mutate(sort_order = estimate[race == "Black"]) |>
ungroup() |>
mutate(zone = reorder_within(zone, sort_order, sex)) |>
ggplot(aes(x = zone,
y = estimate,
color = race)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
width = 0.2,
position = position_dodge(width = 0.5),
linewidth = 0.8,
alpha = 0.8) +
geom_point(size = 2.5,
position = position_dodge(width = 0.5)) +
facet_wrap(~ sex, scales = "free_x") +
scale_x_reordered() +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_color_manual(values = c("Black" = "#1b9e77", "White" = "#d95f02")) +
labs(
title = "Predicted Arrest Probability by Race, Zone, and Sex",
subtitle = "Black drivers face higher arrest rates across zones, especially among males",
x = "Zone",
y = "Predicted Probability of Arrest",
caption = "Source: New Orleans Traffic Stops Dataset"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 13, margin = margin(b = 10)),
plot.caption = element_text(size = 10, hjust = 0),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
legend.position = "top",
strip.text = element_text(face = "bold", size = 12)
) |
| temperance-12 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(scales)
library(tidytext) # for reorder_within() and scale_x_reordered()
# Create a polished plot
plot_predictions(fit_stops$fit,
newdata = "balanced",
condition = c("zone", "race", "sex"),
draw = FALSE) |>
as_tibble() |>
group_by(zone, sex) |>
mutate(sort_order = estimate[race == "Black"]) |>
ungroup() |>
mutate(zone = reorder_within(zone, sort_order, sex)) |>
ggplot(aes(x = zone,
y = estimate,
color = race)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
width = 0.2,
position = position_dodge(width = 0.5),
linewidth = 0.8,
alpha = 0.8) +
geom_point(size = 2.5,
position = position_dodge(width = 0.5)) +
facet_wrap(~ sex, scales = "free_x") +
scale_x_reordered() +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_color_manual(values = c("Black" = "#1b9e77", "White" = "#d95f02")) +
labs(
title = "Predicted Arrest Probability by Race, Zone, and Sex",
subtitle = "Black drivers face higher arrest rates across zones, especially among males",
x = "Zone",
y = "Predicted Probability of Arrest",
caption = "Source: New Orleans Traffic Stops Dataset"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 13, margin = margin(b = 10)),
plot.caption = element_text(size = 10, hjust = 0),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
legend.position = "top",
strip.text = element_text(face = "bold", size = 12)
)
> |
| temperance-13 |
question |
We estimate that Black male drivers in Zone D face a 25% chance of arrest, with a 95% confidence interval of 23% to 27%. |
| temperance-14 |
question |
Our estimates may be biased due to unmeasured factors and data imbalance, such as overrepresentation of certain zones or officer bias. A better approach could involve weighting or mixed-effects models, which might lower the estimated arrest probability for Black drivers to around 22% with a 95% confidence interval of [20%, 24%]. |
| temperance-15 |
question |
> tutorial.helpers::show_file("stops.qmd")
---
title: "Stops"
author: "Sajida Rehman"
execute:
echo: false
format: html
---
Arrests during traffic stops represent a critical area for examining how individual characteristics may influence law enforcement outcomes. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop. However, our data may not fully represent the broader population, as it may cover a different time period than the one we care about and could reflect biases from officers who unfairly target specific groups. We modeled arrested as a linear function of sex and the interaction of race and zone. Our analysis suggests that males are more likely to be arrested than females, by about 6 percentage points. We also estimate that Black drivers in New Orleans face about a 25% chance of being arrested during a traffic stop, compared to roughly 20% for White drivers, and both estimates carry uncertainty.
$$
Y \sim \text{Bernoulli}(\rho)
$$
$$
\rho = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}}
$$
$$
P(Y = 1) = \rho
$$
$$
\begin{aligned}
\widehat{\text{arrested}} ={}& 0.1773
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}} \\
&+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.0061 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}} \\
&+ 0.0019 \cdot \text{zone}_{\text{E}}
- 0.0027 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}} \\
&+ \text{(remaining zone terms and race-by-zone interaction terms)}
\end{aligned}
$$
```{r}
#| message: false
library(tidyverse)
library(primer.data)
library(tidymodels)
library(broom)
library(marginaleffects)
```
```{r}
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
fit_stops
```
```{r}
fit_stops_logistic <- logistic_reg() |>
set_engine("glm") |>
fit(as.factor(arrested) ~ sex + race, data = x)
tidy(fit_stops_logistic, conf.int = TRUE) |>
select(term, estimate, conf.low, conf.high) |>
mutate(across(where(is.numeric), ~round(., 3))) |>
knitr::kable(
caption = "Logistic Regression Estimates for Arrest Probability (Source: Traffic stops dataset filtered for Black and White drivers)"
)
```
```{r}
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(scales)
library(tidytext) # for reorder_within() and scale_x_reordered()
# Create a polished plot
plot_predictions(fit_stops$fit,
newdata = "balanced",
condition = c("zone", "race", "sex"),
draw = FALSE) |>
as_tibble() |>
group_by(zone, sex) |>
mutate(sort_order = estimate[race == "Black"]) |>
ungroup() |>
mutate(zone = reorder_within(zone, sort_order, sex)) |>
ggplot(aes(x = zone,
y = estimate,
color = race)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
width = 0.2,
position = position_dodge(width = 0.5),
linewidth = 0.8,
alpha = 0.8) +
geom_point(size = 2.5,
position = position_dodge(width = 0.5)) +
facet_wrap(~ sex, scales = "free_x") +
scale_x_reordered() +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_color_manual(values = c("Black" = "#1b9e77", "White" = "#d95f02")) +
labs(
title = "Predicted Arrest Probability by Race, Zone, and Sex",
subtitle = "Black drivers face higher arrest rates across zones, especially among males",
x = "Zone",
y = "Predicted Probability of Arrest",
caption = "Source: New Orleans Traffic Stops Dataset"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 13, margin = margin(b = 10)),
plot.caption = element_text(size = 10, hjust = 0),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
legend.position = "top",
strip.text = element_text(face = "bold", size = 12)
)
```
> |
| temperance-16 |
question |
https://sajida25.github.io/stops/ |
| temperance-17 |
question |
https://github.com/Sajida25/stops |
| minutes |
question |
180 |