| tutorial-id |
none |
131-stops |
| name |
question |
Sharjeel Jamal |
| email |
question |
sharjeeljamalg@gmail.com |
| ID |
question |
Sharjeel |
| introduction-1 |
question |
Wisdom, Justice, Courage, Temperance |
| introduction-2 |
question |
show_file(".gitignore")
stop_files |
| introduction-3 |
question |
#| message: false
library(tidyverse)
library(primer.data) |
| introduction-4 |
question |
library(tidyverse)
── Attaching core tidyverse packages ───────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ─────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors |
| introduction-5 |
question |
This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department. |
| introduction-6 |
question |
The causal effect is the difference between two potential outcomes for the same unit: the outcome under treatment minus the outcome under control. |
| introduction-7 |
question |
The fundamental problem of causal inference is that we can observe only one potential outcome for any given unit, so the causal effect itself can never be observed directly. |
| introduction-8 |
question |
The outcome variable is arrested. |
| introduction-9 |
question |
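Consider mask as an imaginary treatment variable that records whether a driver wears a mask. It can be manipulated: a driver could wear a mask or not, which might change whether they are arrested. |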
| introduction-10 |
question |
There are two potential outcomes for each driver: whether they are arrested if they wear a mask, and whether they are arrested if they do not. |
| introduction-11 |
question |
Let's imagine that the outcome for a driver who wears a mask is 9 and the outcome for the same driver without a mask is 3. The causal effect is the difference between these two potential outcomes: 9 - 3 = 6. |
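The arithmetic in the answer above can be sketched in base R. The data frame and its column names (`y_mask`, `y_no_mask`) are hypothetical and exist only for illustration; the values 9 and 3 are the ones imagined in the answer.

```r
# Hypothetical potential outcomes for a single unit.
# The values 9 and 3 come from the answer above; the names are made up.
po <- data.frame(
  unit      = "driver 1",
  y_mask    = 9,  # outcome if the driver wears a mask
  y_no_mask = 3   # outcome if the driver does not wear a mask
)

# The causal effect is the difference between the two potential outcomes.
po$causal_effect <- po$y_mask - po$y_no_mask
po$causal_effect

# In practice only one potential outcome is observed per unit, so the other
# one is missing and the difference cannot be computed directly.
observed <- po
observed$y_no_mask <- NA
observed$causal_effect <- observed$y_mask - observed$y_no_mask
observed$causal_effect
```

The second result is `NA`, which is the fundamental problem of causal inference in miniature: the unobserved potential outcome has to be estimated rather than looked up.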
| introduction-12 |
question |
Race |
| introduction-13 |
question |
One group could be Black drivers who did not wear a mask and were arrested; the other could be White drivers who did not wear a mask and were arrested. |
| introduction-14 |
question |
What racial group do most arrested individuals belong to? |
| wisdom-1 |
question |
Wisdom requires the creation of a Preceptor Table, an examination of our data, and a determination, using the concept of “validity,” as to whether the two come from the same population. |
| wisdom-2 |
question |
A Preceptor Table is the smallest possible table with rows and columns such that, if none of the data is missing, then the things we want to know are easy to calculate. |
| wisdom-3 |
question |
A Preceptor Table has units as rows, with outcomes and covariates as columns. |
| wisdom-4 |
question |
White or Black motorists |
| wisdom-5 |
question |
Arrested |
| wisdom-6 |
question |
Race |
| wisdom-7 |
question |
This is a predictive model and there are no treatments. |
| wisdom-8 |
question |
This refers to the specific moment in time when the data was collected. |
| wisdom-9 |
question |
The Preceptor Table for this problem has motorists as units (rows), arrested as the outcome column, and race and zone as covariate columns. |
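As a rough illustration of the table described above, the following base R sketch builds a tiny Preceptor Table. Every value in it is an invented placeholder (including the `id` column and the zone labels), not real traffic stop data.

```r
# Hypothetical Preceptor Table: one row per motorist (the unit),
# `arrested` as the outcome, `race` and `zone` as covariates.
# All values are invented placeholders.
preceptor <- data.frame(
  id       = 1:4,
  arrested = c(0, 1, 0, 0),
  race     = c("Black", "White", "Black", "White"),
  zone     = c("A", "A", "B", "B")
)

# With no missing values, the quantity of interest is a simple summary:
# here, the share of motorists arrested within each racial group.
arrest_rate <- aggregate(arrested ~ race, data = preceptor, FUN = mean)
arrest_rate
```

The point of the sketch is only that, once every cell is filled in, the answer to the question is a one-line calculation.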
| wisdom-10 |
question |
What types of people from different age and racial groups are being arrested during traffic stops? |
| wisdom-11 |
question |
Which race of people is most often arrested during traffic stops? We used data from the Stanford Open Policing Project on traffic stops in New Orleans to examine this. |
| justice-1 |
question |
The five components of Justice are the Population Table, validity, stability, representativeness, and, in causal models, unconfoundedness. |
| justice-2 |
question |
Validity is about whether the Preceptor Table and the data can be treated as drawn from the same Population Table. |
| justice-3 |
question |
Validity may not hold if the data and preceptor table contain different races. |
| justice-4 |
question |
The Population Table includes rows from both the data and the Preceptor Table, on the assumption that both are drawn from the same underlying population. |
| justice-5 |
question |
Both the observations and the time should be consistent throughout the entire time period in the population table. |
| justice-6 |
question |
Stability refers to consistency of data across time periods. |
| justice-7 |
question |
The assumption of stability will not hold if we run the study again and obtain different results in the case of arrested individuals. |
| justice-8 |
question |
The assumption of stability will not hold if we run the study again and obtain different results in the case of arrested individuals. |
| justice-9 |
question |
The assumption of representativeness may not hold if the data collected does not reflect the overall population, particularly in terms of whether all demographic groups are represented among those being arrested. |
| justice-10 |
question |
The assumption of representativeness may not hold if the data collected does not reflect the overall population, particularly in terms of whether all demographic groups are represented among those being arrested. |
| justice-11 |
question |
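Unconfoundedness concerns causal models. It means that treatment assignment, if any, is independent of the potential outcomes, conditional on the covariates. The treatment may still affect the outcome; what matters is that who receives the treatment is unrelated to how they would respond to it. |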
| justice-12 |
question |
library(tidymodels)
── Attaching packages ─────────────────────────────────────────────────────────────────── tidymodels 1.3.0 ──
✔ broom 1.0.8 ✔ rsample 1.3.0
✔ dials 1.4.0 ✔ tune 1.3.0
✔ infer 1.0.8 ✔ workflows 1.2.0
✔ modeldata 1.4.0 ✔ workflowsets 1.1.1
✔ parsnip 1.3.2 ✔ yardstick 1.3.2
✔ recipes 1.3.1
── Conflicts ────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Use tidymodels_prefer() to resolve common conflicts. |
| justice-13 |
question |
library(broom) |
| justice-14 |
question |
$$
Y_i \sim \text{Bernoulli}(p_i)
$$
$$
\log\left( \frac{p_i}{1 - p_i} \right) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki}
$$ |
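The link function in the formula above can be tried out numerically. This is a minimal base R sketch; the coefficient values are hypothetical and chosen only to show how the log-odds scale maps to probabilities.

```r
# Hypothetical coefficients, chosen only to illustrate the link function.
beta_0 <- -1.5
beta_1 <- 0.4
x_1 <- c(0, 1)

# Linear predictor on the log-odds scale: beta_0 + beta_1 * x_1.
log_odds <- beta_0 + beta_1 * x_1

# plogis() is the inverse logit: it maps log-odds to probabilities in (0, 1).
p <- plogis(log_odds)
p

# qlogis() is the logit, log(p / (1 - p)), so it recovers the log-odds.
all.equal(qlogis(p), log_odds)
```

Because the mapping is nonlinear, the same change in log-odds corresponds to different changes in probability depending on the starting point.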
| justice-15 |
question |
However, our model may be limited by unobserved confounding variables, such as the severity of the offense or officer discretion, that could bias the estimated effect of race on arrest outcomes. |
| courage-1 |
question |
Courage is related to understanding the mechanism that generates data. |
| courage-2 |
exercise |
linear_reg(engine = "lm") |
| courage-3 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex, data = x) |
| courage-4 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex, data = x) |>
tidy(conf.int = TRUE) |
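To show what the estimate and confidence interval columns contain, here is a base R sketch that fits the same kind of model with `lm()` directly, which is the function the `"lm"` engine relies on. The data are simulated stand-ins, not the real `stops` tibble, and the arrest rates of 0.2 and 0.3 are invented.

```r
set.seed(1)

# Simulated stand-in for the traffic stop data; all values are invented.
toy <- data.frame(sex = rep(c("Female", "Male"), each = 50))
toy$arrested <- rbinom(100, size = 1,
                       prob = ifelse(toy$sex == "Male", 0.3, 0.2))

# Linear regression of a 0/1 outcome on sex.
fit_toy <- lm(arrested ~ sex, data = toy)

# Point estimates with 95% confidence intervals, one row per term.
est_ci <- cbind(estimate = coef(fit_toy), confint(fit_toy))
est_ci
```

Each row pairs a term with its estimate and interval bounds, which is the same information that the `estimate`, `conf.low` and `conf.high` columns carry in the tidied output.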
| courage-5 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ race, data = x) |
| courage-6 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ race, data = x) |>
tidy(conf.int = TRUE) |
| courage-7 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race, data = x) |>
tidy(conf.int = TRUE) |
| courage-8 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x) |>
tidy(conf.int = TRUE) |
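The `race*zone` term in the formula above is shorthand for both main effects plus their interaction. The base R sketch below makes the expansion visible with `model.matrix()`; the toy data and the zone labels `"A"` and `"B"` are hypothetical.

```r
# Hypothetical toy data covering every combination of the three predictors.
toy <- expand.grid(
  sex  = c("Female", "Male"),
  race = c("Black", "White"),
  zone = c("A", "B")
)

# Columns generated by the shorthand formula.
short_cols <- colnames(model.matrix(~ sex + race * zone, data = toy))

# Columns generated when the interaction is written out in full.
long_cols <- colnames(model.matrix(~ sex + race + zone + race:zone, data = toy))

short_cols
identical(short_cols, long_cols)
```

With many zones in the real data, the interaction adds one extra coefficient per non-reference zone, which is why the tidied output for this model has many more rows than the earlier ones.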
| courage-9 |
exercise |
fit_stops |
| courage-10 |
question |
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x) |
| courage-11 |
question |
library(easystats)
# Attaching packages: easystats 0.7.4.5 (red = needs update)
✖ bayestestR 0.16.0 ✖ correlation 0.8.7
✖ datawizard 1.1.0 ✔ effectsize 1.0.1
✖ insight 1.3.0 ✖ modelbased 0.11.2
✖ performance 0.14.0 ✖ parameters 0.26.0
✔ report 0.6.1 ✔ see 0.11.0 |
| courage-13 |
question |
$$
\hat{Y}_i = 0.12 - 0.04 \cdot \text{Female}_i - 0.10 \cdot \text{White}_i + 0.08 \cdot \text{Downtown}_i - 0.05 \cdot (\text{White}_i \times \text{Downtown}_i)
$$ |
| courage-14 |
question |
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x) |
| courage-15 |
question |
stop_files
/.quarto/
*_cache
Warning message:
In readLines(path) : incomplete final line found on '.gitignore' |
| courage-16 |
exercise |
tidy(fit_stops, conf.int = TRUE) |
| courage-17 |
question |
#| label: fit-logistic-model
#| cache: true
library(tidymodels)
library(dplyr)
library(knitr)
# Fit logistic regression (GLM with logit link)
fit_stops_logistic <- logistic_reg() |>
set_engine("glm") |>
fit(as.factor(arrested) ~ sex + race, data = x)
# Tidy model and display clean table
tidy(fit_stops_logistic, conf.int = TRUE) |>
select(term, estimate, conf.low, conf.high) |>
mutate(across(where(is.numeric), ~round(., 3))) |>
knitr::kable(
caption = "Logistic Regression Estimates for Arrest Probability (Source: Traffic stops dataset filtered for Black and White drivers)"
) |
| courage-18 |
question |
We model the likelihood of arrest during a traffic stop, coded as either arrested or not arrested, as a logistic function of driver characteristics such as race and sex. |
| temperance-1 |
question |
In Temperance, we use the fitted model to create posteriors of the quantities of interest. |
| temperance-2 |
question |
A positive value of 0.016 suggests that males are more likely to be arrested. |
| temperance-3 |
question |
It indicates that white people are less likely to be arrested. |
| temperance-4 |
question |
The estimate of 0.18 for the intercept means that the predicted probability of arrest during a traffic stop is 18% for the reference group. |
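The reading of the intercept as the prediction for the reference group can be checked on a small example. This base R sketch uses invented data with a single predictor; in the full model the reference group is defined by the reference level of every predictor at once.

```r
# Invented toy data: four female and four male drivers.
toy <- data.frame(
  sex      = rep(c("Female", "Male"), each = 4),
  arrested = c(0, 0, 0, 1,   # Female: 1 of 4 arrested
               0, 1, 1, 0)   # Male:   2 of 4 arrested
)

# "Female" comes first alphabetically, so it is the reference level.
fit_toy <- lm(arrested ~ sex, data = toy)
coef(fit_toy)

# The intercept equals the arrest rate in the reference group, and the
# sexMale coefficient is the difference between the two group rates.
female_rate <- mean(toy$arrested[toy$sex == "Female"])
male_rate   <- mean(toy$arrested[toy$sex == "Male"])
c(female_rate, male_rate - female_rate)
```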
| temperance-5 |
question |
library(marginaleffects)
Please cite the software developers who make your work possible.
One package: citation("package_name")
All project packages: softbib::softbib() |
| temperance-6 |
question |
What race, gender, and area do people belong to when they are arrested during traffic stops? |
| temperance-7 |
question |
predictions(fit_stops)
 Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
    0.179    0.00343 52.2   <0.001   Inf 0.173  0.186
    0.142    0.00419 33.8   <0.001 828.0 0.133  0.150
    0.250    0.00451 55.5   <0.001   Inf 0.241  0.259
    0.142    0.00419 33.8   <0.001 828.0 0.133  0.150
    0.232    0.01776 13.1   <0.001 127.6 0.198  0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
    0.208    0.00390 53.4   <0.001   Inf 0.201  0.216
    0.270    0.00377 71.5   <0.001   Inf 0.262  0.277
    0.270    0.00377 71.5   <0.001   Inf 0.262  0.277
    0.270    0.00377 71.5   <0.001   Inf 0.262  0.277
    0.189    0.00545 34.7   <0.001 874.0 0.179  0.200 |
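The columns in the output above are related by simple arithmetic. The base R sketch below recomputes the z statistic and an interval from the first printed row, assuming the interval is a normal-approximation (Wald) interval. The inputs are the rounded printed values, so the results should agree with the table only up to rounding.

```r
# First printed row of the output above (rounded values).
estimate  <- 0.179
std_error <- 0.00343

# z statistic: the estimate divided by its standard error.
z <- estimate / std_error

# Normal-approximation 95% interval: estimate +/- about 1.96 standard errors.
ci <- estimate + c(-1, 1) * qnorm(0.975) * std_error

round(z, 1)
round(ci, 3)
```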
| temperance-8 |
question |
plot_predictions(fit_stops, by = "sex") |
| temperance-9 |
question |
plot_predictions(fit_stops, condition = "sex") |
| temperance-10 |
question |
plot_predictions(fit_stops, condition = c("sex", "race")) |
| temperance-11 |
question |
library(ggplot2)
ggplot(data = plot_data, aes(x = sex, y = arrested, color = race)) +
geom_point(position = position_dodge(width = 0.4), size = 3) +
geom_errorbar(
aes(ymin = arrested - se, ymax = arrested + se),
width = 0.1,
position = position_dodge(width = 0.4)
) +
labs(
title = "Arrest Rates by Race and Sex During Traffic Stops",
subtitle = "Black drivers, especially males, face higher probabilities of arrest than White drivers",
x = "Driver Sex",
y = "Probability of Arrest",
caption = "Source: New Orleans traffic stops dataset, filtered for Black and White drivers",
color = "Race"
) +
theme_minimal(base_size = 12) +
scale_color_manual(values = c("Black" = "tomato", "White" = "cyan3")) |
| temperance-12 |
question |
library(ggplot2)
ggplot(data = plot_data, aes(x = sex, y = arrested, color = race)) +
geom_point(position = position_dodge(width = 0.4), size = 3) +
geom_errorbar(
aes(ymin = arrested - se, ymax = arrested + se),
width = 0.1,
position = position_dodge(width = 0.4)
) +
labs(
title = "Arrest Rates by Race and Sex During Traffic Stops",
subtitle = "Black drivers, especially males, face higher probabilities of arrest than White drivers",
x = "Driver Sex",
y = "Probability of Arrest",
caption = "Source: New Orleans traffic stops dataset, filtered for Black and White drivers",
color = "Race"
) +
theme_minimal(base_size = 12) +
scale_color_manual(values = c("Black" = "tomato", "White" = "cyan3")) |
| temperance-13 |
question |
For example, our logistic regression estimates that Black male drivers have approximately a 3.1 percentage point higher probability of being arrested compared to White male drivers, with a 95% confidence interval ranging from 1.2 to 5.0 percentage points. |
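A percentage point difference like the one described above comes from comparing predicted probabilities, not raw logistic coefficients. The following base R sketch shows the mechanics on simulated data; the arrest rates of 0.25 and 0.20 are invented, so the result is not meant to match the 3.1 figure.

```r
set.seed(2)

# Simulated stand-in data; the underlying arrest rates are invented.
toy <- data.frame(race = rep(c("Black", "White"), each = 200))
toy$arrested <- rbinom(400, size = 1,
                       prob = ifelse(toy$race == "Black", 0.25, 0.20))

# Logistic regression of arrest on race.
fit_toy <- glm(arrested ~ race, family = binomial, data = toy)

# Predicted probabilities for each group on the response (probability) scale.
new_drivers <- data.frame(race = c("Black", "White"))
p <- unname(predict(fit_toy, newdata = new_drivers, type = "response"))

# Difference in predicted probability, expressed in percentage points.
diff_pp <- 100 * (p[1] - p[2])
diff_pp
```

With a single binary predictor the predicted probabilities simply reproduce the group arrest rates; with more predictors the comparison has to hold the other variables at chosen values.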
| temperance-14 |
question |
Estimates may be inaccurate due to data inconsistencies or unrepresentative sampling. |
| temperance-15 |
question |
tutorial.helpers::show_file("stops.qmd")
---
title: "Stops"
format: html
execute:
echo: false
---
```{r}
#| message: false
library(tidyverse)
library(primer.data)
library(tidymodels)
library(broom)
library(marginaleffects)
```
```{r}
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
```
```{r}
#| label: fit-logistic-model
#| cache: true
library(tidymodels)
library(dplyr)
library(knitr)
# Fit logistic regression (GLM with logit link)
fit_stops_logistic <- logistic_reg() |>
set_engine("glm") |>
fit(as.factor(arrested) ~ sex + race, data = x)
# Tidy model and display clean table
tidy(fit_stops_logistic, conf.int = TRUE) |>
select(term, estimate, conf.low, conf.high) |>
mutate(across(where(is.numeric), ~round(., 3))) |>
knitr::kable(
caption = "Logistic Regression Estimates for Arrest Probability (Source: Traffic stops dataset filtered for Black and White drivers)"
)
```
$$
Y_i \sim \text{Bernoulli}(p_i)
$$
$$
\log\left( \frac{p_i}{1 - p_i} \right) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki}
$$
## Fitted Linear Model
$$
\hat{Y}_i = 0.12 - 0.04 \cdot \text{Female}_i - 0.10 \cdot \text{White}_i + 0.08 \cdot \text{Downtown}_i - 0.05 \cdot (\text{White}_i \times \text{Downtown}_i)
$$
```{r}
library(dplyr)
# Summary table with mean and standard error by sex and race
plot_data <- x |>
group_by(sex, race) |>
summarize(
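# Compute the standard error first, while `arrested` is still the raw column;
# once `arrested` is overwritten by its mean, sd() of that single value is NA.
se = sd(arrested) / sqrt(n()), # Standard error = SD / sqrt(n)
arrested = mean(arrested),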
.groups = "drop"
)
```
```{r}
library(ggplot2)
ggplot(data = plot_data, aes(x = sex, y = arrested, color = race)) +
geom_point(position = position_dodge(width = 0.4), size = 3) +
geom_errorbar(
aes(ymin = arrested - se, ymax = arrested + se),
width = 0.1,
position = position_dodge(width = 0.4)
) +
labs(
title = "Arrest Rates by Race and Sex During Traffic Stops",
subtitle = "Black drivers, especially males, face higher probabilities of arrest than White drivers",
x = "Driver Sex",
y = "Probability of Arrest",
caption = "Source: New Orleans traffic stops dataset, filtered for Black and White drivers",
color = "Race"
) +
theme_minimal(base_size = 12) +
scale_color_manual(values = c("Black" = "tomato", "White" = "cyan3"))
```
## Summary Paragraph
Disparities in policing outcomes across racial groups continue to raise concern, especially regarding how race and location affect the chances of arrest during traffic stops. To explore this, we analyze data from a study of drivers in New Orleans to examine how a driver's race relates to their likelihood of being arrested. However, our model may be limited by unobserved confounding variables, such as the severity of the offense or officer discretion, that could bias the estimated effect of race on arrest outcomes. We model the likelihood of arrest during a traffic stop, coded as either arrested or not arrested, as a logistic function of driver characteristics such as race and sex. For example, our logistic regression estimates that Black male drivers have approximately a 3.1 percentage point higher probability of being arrested compared to White male drivers, with a 95% confidence interval ranging from 1.2 to 5.0 percentage points. |
| temperance-16 |
question |
https://Sharjeel46.github.io/Stops/ |
| temperance-17 |
question |
https://github.com/Sharjeel46/Stops |
| minutes |
question |
200 |