id submission_type answer
tutorial-id none 131-stops
name question Sharjeel Jamal
email question sharjeeljamalg@gmail.com
ID question Sharjeel
introduction-1 question Wisdom, Justice, Courage, Temperance
introduction-2 question show_file(".gitignore") stop_files
introduction-3 question #| message: false library(tidyverse) library(primer.data)
introduction-4 question library(tidyverse) ── Attaching core tidyverse packages ───────────────────────────────────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.2 ✔ tibble 3.3.0 ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ✔ purrr 1.1.0 ── Conflicts ─────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package to force all conflicts to become errors
introduction-5 question This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question A causal effect is the difference between two potential outcomes for the same unit: the outcome under treatment minus the outcome under control.
introduction-7 question The fundamental problem of causal inference is that we can observe only one potential outcome for each unit, never both.
introduction-8 question arrested is the outcome variable.
introduction-9 question Mask-wearing, as an imaginary treatment variable. It can be manipulated, since a driver either wears a mask during the stop or does not.
introduction-10 question There are two potential outcomes: whether a person is arrested when wearing a mask, and whether the same person is arrested when not wearing one.
introduction-11 question Let's imagine the outcome for a person wearing a mask is 9 and the outcome for the same person not wearing a mask is 3. The causal effect is the difference between these two potential outcomes, which is 6.
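The arithmetic in this answer can be written in potential-outcome notation. The symbols $Y_i(1)$ and $Y_i(0)$ are assumed labels, not taken from the original answer, for the outcome of unit $i$ with and without a mask; the values 9 and 3 are the imagined outcomes from the answer itself.

$$ \tau_i = Y_i(1) - Y_i(0) = 9 - 3 = 6 $$

Only one of the two terms on the right can ever be observed for a given unit, which is why the effect has to be estimated rather than computed directly.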
introduction-12 question Race
introduction-13 question One group could be Black drivers who did not wear a mask and were arrested; the second could be White drivers who did not wear a mask and were arrested.
introduction-14 question What racial group do most arrested individuals belong to?
wisdom-1 question Wisdom requires the creation of a Preceptor Table, an examination of our data, and a determination, using the concept of “validity,” as to whether the two come from the same population.
wisdom-2 question A Preceptor Table is the smallest possible table with rows and columns such that, if none of the data is missing, then the things we want to know are easy to calculate.
wisdom-3 question A Preceptor Table has units as rows, with outcomes and covariates as columns.
wisdom-4 question White or Black motorists
wisdom-5 question Arrested
wisdom-6 question Race
wisdom-7 question This is a predictive model and there are no treatments.
wisdom-8 question This refers to the specific moment in time when the data was collected.
wisdom-9 question The Preceptor Table for this problem has motorists as units (rows), arrest status as the outcome column, and race and zone as covariate columns.
wisdom-10 question What types of people from different age and racial groups are being arrested during traffic stops?
wisdom-11 question Which race of people is most often arrested during traffic stops? We used New Orleans traffic stop data from the Stanford Open Policing Project to examine this.
justice-1 question Population tables, validity, stability, representativeness, and unconfoundedness (in causal models) are the five components of Justice.
justice-2 question Validity concerns whether the Preceptor Table and the data can be treated as drawn from the same population table, which requires their columns to mean the same thing.
justice-3 question Validity may not hold if the data and the Preceptor Table record race using different categories or definitions.
justice-4 question The Population Table combines rows from the data with rows from the Preceptor Table, both assumed to come from the same underlying population.
justice-5 question Each row in the Population Table is identified by a unit and a time, and both should be recorded consistently across the entire period the table covers.
justice-6 question Stability means that the relationships among the columns in the Population Table stay consistent across time periods.
justice-7 question The assumption of stability will not hold if we run the study again and obtain different results in the case of arrested individuals.
justice-8 question The assumption of stability will not hold if we run the study again and obtain different results in the case of arrested individuals.
justice-9 question The assumption of representativeness may not hold if the data collected does not reflect the overall population, particularly in terms of whether all demographic groups are represented among those being arrested.
justice-10 question The assumption of representativeness may not hold if the data collected does not reflect the overall population, particularly in terms of whether all demographic groups are represented among those being arrested.
justice-11 question Unconfoundedness concerns causal models. It means that treatment assignment, if any, is independent of the potential outcomes, conditional on the covariates. It does not mean that the treatment has no effect on the outcome.
justice-12 question library(tidymodels) ── Attaching packages ─────────────────────────────────────────────────────────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.8 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.1 ✔ parsnip 1.3.2 ✔ yardstick 1.3.2 ✔ recipes 1.3.1 ── Conflicts ────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Use tidymodels_prefer() to resolve common conflicts.
justice-13 question library(broom)
justice-14 question $$ Y_i \sim \text{Bernoulli}(p_i) $$ $$ \log\left( \frac{p_i}{1 - p_i} \right) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} $$
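To make the logit link in this formula concrete, here is a minimal base R sketch. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from the stops data.

```r
# Hypothetical coefficients (illustration only, not fitted values)
beta_0 <- -1.5   # intercept on the log-odds scale
beta_1 <- 0.4    # coefficient for a single binary covariate X_1

# Linear predictor (log-odds) for X_1 = 0 and X_1 = 1
eta <- beta_0 + beta_1 * c(0, 1)

# Inverse logit maps log-odds to probabilities: p = 1 / (1 + exp(-eta))
p <- 1 / (1 + exp(-eta))

# plogis() is base R's built-in inverse logit and gives the same result
stopifnot(isTRUE(all.equal(p, plogis(eta))))

# qlogis() is the logit itself, so it recovers the log-odds
stopifnot(isTRUE(all.equal(qlogis(p), eta)))

print(p)
```

Because the link is nonlinear, a coefficient on the log-odds scale does not translate into a fixed change in probability; the change depends on the starting value of the linear predictor.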
justice-15 question However, our model may be limited by unobserved confounding variables, such as the severity of the offense or officer discretion, that could bias the estimated effect of race on arrest outcomes.
courage-1 question Courage is related to understanding the mechanism that generates data.
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE)
courage-5 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ race, data = x)
courage-6 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE)
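The `race*zone` term in this formula is shorthand for both main effects plus their interaction. A short base R sketch that inspects the expansion without needing any data:

```r
# The formula from the exercise; terms() can expand it without data
f <- arrested ~ sex + race * zone

# term.labels lists the right-hand-side terms after expansion
expanded <- attr(terms(f), "term.labels")
print(expanded)

# The same model written out term by term
g <- arrested ~ sex + race + zone + race:zone
stopifnot(identical(expanded, attr(terms(g), "term.labels")))
```

Writing `race:zone` alone would include only the interaction, which is rarely what is wanted; the `*` form keeps the main effects in the model as well.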
courage-9 exercise fit_stops
courage-10 question x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x)
courage-11 question library(easystats) # Attaching packages: easystats 0.7.4.5 (red = needs update) ✖ bayestestR 0.16.0 ✖ correlation 0.8.7 ✖ datawizard 1.1.0 ✔ effectsize 1.0.1 ✖ insight 1.3.0 ✖ modelbased 0.11.2 ✖ performance 0.14.0 ✖ parameters 0.26.0 ✔ report 0.6.1 ✔ see 0.11.0
courage-13 question $$ \hat{Y}_i = 0.12 - 0.04 \cdot \text{Female}_i - 0.10 \cdot \text{White}_i + 0.08 \cdot \text{Downtown}_i - 0.05 \cdot (\text{White}_i \times \text{Downtown}_i) $$
courage-14 question x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x)
courage-15 question stop_files /.quarto/ *_cache Warning message: In readLines(path) : incomplete final line found on '.gitignore'
courage-16 exercise tidy(fit_stops, conf.int = TRUE)
courage-17 question #| label: fit-logistic-model #| cache: true library(tidymodels) library(dplyr) library(knitr) # Fit logistic regression (GLM with logit link) fit_stops_logistic <- logistic_reg() |> set_engine("glm") |> fit(as.factor(arrested) ~ sex + race, data = x) # Tidy model and display clean table tidy(fit_stops_logistic, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate(across(where(is.numeric), ~round(., 3))) |> knitr::kable( caption = "Logistic Regression Estimates for Arrest Probability (Source: Traffic stops dataset filtered for Black and White drivers)" )
courage-18 question We model the likelihood of arrest during a traffic stop, coded as either arrested or not arrested, as a logistic function of driver characteristics such as race and sex.
temperance-1 question In Temperance, we create posteriors of the quantities of interest.
temperance-2 question A positive coefficient of 0.016 suggests that males are about 1.6 percentage points more likely to be arrested than females, holding the other variables constant.
temperance-3 question It indicates that white people are less likely to be arrested.
temperance-4 question The estimate of 0.18 for the intercept means that the predicted probability of arrest during a traffic stop is 18% for the reference group.
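Taking the two estimates quoted in these answers at face value (an intercept of 0.18 and a male coefficient of 0.016), and assuming both come from the same linear fit with female as the reference sex, the predicted arrest probability for a male driver in the reference race and zone works out as follows. The symbol $\hat{p}_{\text{male}}$ is a label introduced here for illustration.

$$ \hat{p}_{\text{male}} = 0.18 + 0.016 = 0.196 $$

This simple addition only works on the probability scale of a linear model; for the logistic model, coefficients would have to be added on the log-odds scale and then transformed.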
temperance-5 question library(marginaleffects) Please cite the software developers who make your work possible. One package: citation("package_name") All project packages: softbib::softbib()
temperance-6 question What race, gender, and area do people belong to when they are arrested during traffic stops?
temperance-7 question predictions(fit_stops) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
temperance-8 question plot_predictions(fit_stops, by = "sex")
temperance-9 question plot_predictions(fit_stops, condition = "sex")
temperance-10 question plot_predictions(fit_stops, condition = c("sex", "race"))
temperance-11 question library(ggplot2) ggplot(data = plot_data, aes(x = sex, y = arrested, color = race)) + geom_point(position = position_dodge(width = 0.4), size = 3) + geom_errorbar( aes(ymin = arrested - se, ymax = arrested + se), width = 0.1, position = position_dodge(width = 0.4) ) + labs( title = "Arrest Rates by Race and Sex During Traffic Stops", subtitle = "Black drivers, especially males, face higher probabilities of arrest than White drivers", x = "Driver Sex", y = "Probability of Arrest", caption = "Source: New Orleans traffic stops dataset, filtered for Black and White drivers", color = "Race" ) + theme_minimal(base_size = 12) + scale_color_manual(values = c("Black" = "tomato", "White" = "cyan3"))
temperance-12 question library(ggplot2) ggplot(data = plot_data, aes(x = sex, y = arrested, color = race)) + geom_point(position = position_dodge(width = 0.4), size = 3) + geom_errorbar( aes(ymin = arrested - se, ymax = arrested + se), width = 0.1, position = position_dodge(width = 0.4) ) + labs( title = "Arrest Rates by Race and Sex During Traffic Stops", subtitle = "Black drivers, especially males, face higher probabilities of arrest than White drivers", x = "Driver Sex", y = "Probability of Arrest", caption = "Source: New Orleans traffic stops dataset, filtered for Black and White drivers", color = "Race" ) + theme_minimal(base_size = 12) + scale_color_manual(values = c("Black" = "tomato", "White" = "cyan3"))
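The error bars in this plot depend on a per-group standard error column named `se`. A minimal base R sketch of that calculation for one group, using a hypothetical vector of 0/1 arrest indicators rather than the real stops data:

```r
# Hypothetical 0/1 arrest indicators for one sex-by-race group
arrested <- c(1, 0, 0, 1, 0, 0, 0, 1)

# Compute the standard error from the raw 0/1 values: SD / sqrt(n)
n <- length(arrested)
se <- sd(arrested) / sqrt(n)
rate <- mean(arrested)

# Pitfall: sd() of a single number is NA, so taking sd() after the
# values have been collapsed to their mean gives no usable error bar
stopifnot(is.na(sd(rate)))

# Bounds in the same form as the geom_errorbar() call
print(c(lower = rate - se, estimate = rate, upper = rate + se))
```

In a grouped `summarize()` the same ordering matters: the standard error needs the raw values, so it should be computed before the column is replaced by its mean.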
temperance-13 question For example, our logistic regression estimates that Black male drivers have approximately a 3.1 percentage point higher probability of being arrested compared to White male drivers, with a 95% confidence interval ranging from 1.2 to 5.0 percentage points.
temperance-14 question Estimates may be inaccurate due to data inconsistencies or unrepresentative sampling.
temperance-15 question tutorial.helpers::show_file("stops.qmd") --- title: "Stops" format: html execute: echo: false --- ```{r} #| message: false library(tidyverse) library(primer.data) library(tidymodels) library(broom) library(marginaleffects) ``` ```{r} #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) ``` ```{r} #| label: fit-logistic-model #| cache: true library(tidymodels) library(dplyr) library(knitr) # Fit logistic regression (GLM with logit link) fit_stops_logistic <- logistic_reg() |> set_engine("glm") |> fit(as.factor(arrested) ~ sex + race, data = x) # Tidy model and display clean table tidy(fit_stops_logistic, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate(across(where(is.numeric), ~round(., 3))) |> knitr::kable( caption = "Logistic Regression Estimates for Arrest Probability (Source: Traffic stops dataset filtered for Black and White drivers)" ) ``` $$ Y_i \sim \text{Bernoulli}(p_i) $$ $$ \log\left( \frac{p_i}{1 - p_i} \right) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} $$ ## Fitted Linear Model $$ \hat{Y}_i = 0.12 - 0.04 \cdot \text{Female}_i - 0.10 \cdot \text{White}_i + 0.08 \cdot \text{Downtown}_i - 0.05 \cdot (\text{White}_i \times \text{Downtown}_i) $$ ```{r} library(dplyr) # Summary table with mean and standard error by sex and race plot_data <- x |> group_by(sex, race) |> summarize( se = sd(arrested) / sqrt(n()), # Standard error = SD / sqrt(n), computed before arrested is overwritten arrested = mean(arrested), .groups = "drop" ) ``` ```{r} library(ggplot2) ggplot(data = plot_data, aes(x = sex, y = arrested, color = race)) + geom_point(position = position_dodge(width = 0.4), size = 3) + geom_errorbar( aes(ymin = arrested - se, ymax = arrested + se), width = 0.1, position = position_dodge(width = 0.4) ) + labs( title = "Arrest Rates by Race and Sex 
During Traffic Stops", subtitle = "Black drivers, especially males, face higher probabilities of arrest than White drivers", x = "Driver Sex", y = "Probability of Arrest", caption = "Source: New Orleans traffic stops dataset, filtered for Black and White drivers", color = "Race" ) + theme_minimal(base_size = 12) + scale_color_manual(values = c("Black" = "tomato", "White" = "cyan3")) ``` ## Summary Paragraph Disparities in policing outcomes across racial groups continue to raise concern, especially regarding how race and location affect the chances of arrest during traffic stops. To explore this, we analyze data from a study of drivers in New Orleans to examine how a driver's race relates to their likelihood of being arrested. However, our model may be limited by unobserved confounding variables, such as the severity of the offense or officer discretion, that could bias the estimated effect of race on arrest outcomes. We model the likelihood of arrest during a traffic stop, coded as either arrested or not arrested, as a logistic function of driver characteristics such as race and sex. For example, our logistic regression estimates that Black male drivers have approximately a 3.1 percentage point higher probability of being arrested compared to White male drivers, with a 95% confidence interval ranging from 1.2 to 5.0 percentage points.
temperance-16 question https://Sharjeel46.github.io/Stops/
temperance-17 question https://github.com/Sharjeel46/Stops
minutes question 200