id submission_type answer
tutorial-id none stops
name question Abdul Hannan
email question abdul.hannan20008@gmail.com
introduction-1 question The four Cardinal Virtues, in order, that guide our data science work are: 1. Wisdom 2. Courage 3. Temperance 4. Justice
the-question-1 exercise library(tidyverse)
the-question-2 exercise library(primer.data)
the-question-3 question stops contains data from over 400,000 traffic stops in New Orleans from July 1, 2011 to July 18, 2018. The dataset includes information about the date, time, and location of each stop, as well as demographic details about the driver and the outcomes of the stop.
the-question-4 question The outcome variable is arrested, which is a binary variable showing whether a person was arrested (TRUE) or not (FALSE) during a traffic stop.
the-question-5 question We can create a new treatment variable called mask, which indicates whether a driver was wearing a mask during the stop (TRUE or FALSE). We might manipulate this by asking some drivers to wear a mask and others not to, then observe how it affects the chance of arrest.
the-question-6 question There are two potential outcomes for each person: (1) what would happen if they wore a mask, and (2) what would happen if they did not wear a mask.
the-question-7 question Let’s say for one person: if wearing a mask, not arrested (0); if not wearing a mask, arrested (1). The causal effect = 0 - 1 = -1. A causal effect of -1 means that, for this person, wearing a mask prevented the arrest.
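The arithmetic above can be sketched in a few lines of R. The three drivers and their potential outcomes are hypothetical, invented only to illustrate the subtraction; in real data only one of the two columns would ever be observed for each driver.

```r
# Hypothetical potential outcomes (1 = arrested, 0 = not arrested).
# These values are made up for illustration; they are not from `stops`.
po <- data.frame(
  driver    = c("A", "B", "C"),
  y_mask    = c(0, 0, 1),  # outcome if the driver wears a mask
  y_no_mask = c(1, 0, 1)   # outcome if the driver does not wear a mask
)

# Causal effect for each driver: outcome under treatment minus outcome under control.
po$effect <- po$y_mask - po$y_no_mask

# Average causal effect across the three hypothetical drivers.
avg_effect <- mean(po$effect)

print(po)
print(avg_effect)
```

Driver A matches the single-person example above: 0 - 1 = -1.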
the-question-8 question One variable that might help predict arrests is age. Different age groups might face different arrest rates.
the-question-9 question We can compare Black drivers and White drivers. These groups might have different average arrest rates. It’s important not to say one race “causes” more arrests, only that we observe differences between groups.
the-question-10 question What is the difference in arrest probability between Black and White drivers during traffic stops?
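As a rough sketch of what answering this question involves, the difference in arrest probability is a difference between two group means. The data frame `toy` below is hypothetical (20 made-up stops), not the `stops` data.

```r
# Hypothetical toy data: 10 stops of Black drivers, 10 stops of White drivers.
toy <- data.frame(
  race     = c(rep("Black", 10), rep("White", 10)),
  arrested = c(rep(TRUE, 3), rep(FALSE, 7), rep(TRUE, 2), rep(FALSE, 8))
)

# The mean of a logical vector is the share of TRUE values, i.e. the arrest rate.
rates <- tapply(toy$arrested, toy$race, mean)

# Observed difference in arrest rates (Black minus White) in the toy data.
diff_rate <- rates[["Black"]] - rates[["White"]]

print(rates)
print(diff_rate)
```

This is a descriptive comparison only; it says nothing about why the two rates differ.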
wisdom-1 question Wisdom means asking good questions, thinking clearly about what we’re doing, and making sure we understand the problem before jumping into the data. It’s about understanding the goal, the context, and being thoughtful about what we analyze and why.
wisdom-2 question A Preceptor Table is a simple table that includes the outcome we care about and a few important covariates we’ll use to answer our main question. It shows one row per unit (like one person or one stop).
wisdom-3 question A Preceptor Table includes: Units, the individual cases (like traffic stops); Outcomes, what happened (e.g., arrested or not); and Covariates, characteristics that help explain the outcome (e.g., race, sex, time).
wisdom-4 question > show_file("stops.qmd") --- title: "Stops" format: html --- >
wisdom-5 question Each unit is a single traffic stop.
wisdom-6 question The outcome is whether or not someone was arrested — this is the arrested variable.
wisdom-7 question Some useful covariates could include race, sex, age, zone or neighborhood, time of day, and reason for stop. These are things that might affect whether someone gets arrested.
wisdom-8 question There is no treatment variable in this problem because it's a predictive model, not a causal one. But race and other covariates are key predictors.
wisdom-9 question It refers to the moment just after the traffic stop, when we know whether or not an arrest occurred.
wisdom-10 question A causal effect is the difference between what happens under two different scenarios: one where a treatment is applied and one where it isn’t. It's the difference between two potential outcomes for the same unit. Imagine a person wearing a mask vs. not wearing a mask — the change in their chance of being arrested is the causal effect.
wisdom-11 question We can never observe both potential outcomes for the same unit at the same time — we only see one. That makes it impossible to directly measure causal effects.
wisdom-12 question Since we can't manipulate race or other variables here, we can’t make a true causal claim. We can only observe differences, not causes.
wisdom-13 question The Preceptor Table includes an ID, the outcome (arrested), and the covariates race, sex, and zone. Each row is one stop.
wisdom-14 question > show_file("stops.qmd", start = -5) --- title: "Stops" format: html --- > show_file("stops.qmd", start = -5) ```{r} library(tidyverse) library(primer.data) ``` Warning message: In readLines(path) : incomplete final line found on 'stops.qmd' >
wisdom-15 question Validity means that the columns in our Preceptor Table match the columns in the actual data — they measure the same things in the same way.
wisdom-16 question Validity might not hold if, for example, the race column in the dataset is missing values or coded differently than expected — so it doesn’t match the Preceptor Table’s idea of race.
wisdom-17 question Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers.
wisdom-18 question tutorial.helpers::show_file("stops.qmd", chunk = "last") --- title: "Stops" format: html # In YAML header: execute: echo: false message: false warning: false --- ```{r} library(tidyverse) library(primer.data) ``` ```{r} #| label: eda x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) ``` # Summary: Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers. >
justice-1 question The four components of Justice in data science are: Stability (relationships in data stay the same over time); Representativeness (data should reflect the bigger population); Unconfoundedness (no hidden variables affect our results); and Awareness of potential bias (knowing where things might be unfair or unequal).
justice-2 question A Population Table is a big imaginary table that includes all the people or units we care about — not just the ones in our dataset or Preceptor Table. It represents the full group we want to learn about.
justice-3 question Stability means that the relationships between variables (like how race affects arrests) stay the same over time. So if the relationship was true when we collected the data, it's still true later when we use it.
justice-4 question Laws, police training, or public attitudes may have changed over time, which could change how race affects arrest rates — even if our data is from before those changes.
justice-5 question Representativeness means our data looks like the full population we care about. If it doesn’t, our results might not apply to the whole group.
justice-6 question The traffic stop data might only include certain areas or times, which means it doesn’t cover the whole city or population fairly — some groups might be underrepresented.
justice-7 question The Preceptor Table might focus on a very specific group (like older drivers), but the population includes all drivers. So the population might not match the smaller group we care most about.
justice-8 question Unconfoundedness means there are no hidden variables affecting both treatment and outcome. The best way to make this happen is by randomly assigning who gets the treatment.
justice-9 question Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers. We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased.
courage-1 question Courage in data analysis means being willing to start exploring and modeling even when you don’t know exactly what the results will show. You trust the process, learn from what you find, and keep going.
courage-2 exercise library(tidymodels)
courage-3 exercise library(broom)
courage-5 question > tutorial.helpers::show_file("stops.qmd", pattern = "library") library(tidyverse) library(primer.data) library(broom) library(tidymodels) >
courage-6 exercise linear_reg(engine = "lm")
courage-7 exercise linear_reg(engine = "lm") %>% fit(arrested ~ sex, data = x)
courage-8 exercise linear_reg(engine = "lm") %>% fit(arrested ~ sex, data = x) %>% tidy(conf.int = TRUE)
courage-9 exercise linear_reg(engine = "lm") %>% fit(arrested ~ race, data = x)
courage-10 exercise linear_reg(engine = "lm") %>% fit(arrested ~ race, data = x) %>% tidy(conf.int = TRUE)
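A minimal base-R sketch of what the linear model above estimates: with a 0/1 outcome and a single two-level predictor, the fitted slope is simply the difference in arrest rates between the two groups. It uses `lm()` directly (the engine named in `linear_reg(engine = "lm")`) and a hypothetical data frame `toy`, not the tutorial's `x`.

```r
# Hypothetical toy data; arrested is coded 1/0 so lm() can treat it as numeric.
toy <- data.frame(
  race     = factor(c(rep("Black", 10), rep("White", 10))),
  arrested = c(rep(1, 3), rep(0, 7), rep(1, 2), rep(0, 8))
)

# Linear probability model: intercept = rate for the baseline level ("Black"),
# slope = change in rate when moving to "White".
fit <- lm(arrested ~ race, data = toy)
slope <- coef(fit)[["raceWhite"]]

# The same quantity computed directly as a difference in group means.
mean_diff <- mean(toy$arrested[toy$race == "White"]) -
  mean(toy$arrested[toy$race == "Black"])

print(coef(fit))
print(mean_diff)
```

The intercept plays the role of the baseline group's arrest rate, which is why the choice of reference level matters when reading the `tidy()` output.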
courage-11 exercise linear_reg(engine = "lm") %>% fit(arrested ~ sex + race, data = x)
courage-12 exercise linear_reg(engine = "lm") %>% fit(arrested ~ sex + race * zone, data = x)
courage-13 exercise fit_stops
courage-15 exercise library(easystats)
courage-17 exercise check_predictions(extract_fit_engine(fit_stops))
courage-18 question $$ \widehat{\text{arrested}} = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + \ldots + \text{(interaction terms)} $$
courage-19 question > tutorial.helpers::show_file("stops.qmd", pattern = "library") library(tidyverse) library(primer.data) library(broom) library(tidymodels) Warning message: In readLines(path) : incomplete final line found on 'stops.qmd' > tutorial.helpers::show_file("stops.qmd", pattern = "library") library(tidyverse) library(primer.data) library(broom) library(tidymodels) > tutorial.helpers::show_file("stops.qmd", start = -8) fit(arrested ~ sex + race * zone, data = x) ``` # Summary: Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers. We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased. >
courage-20 question > tutorial.helpers::show_file(".gitignore") *_files/ *_cache >
courage-21 exercise tidy(fit_stops, conf.int = TRUE)
courage-22 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") # tidy data tidy(fit_stops, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high) %>% gt() %>% tab_header(title = "Model Estimates") %>% tab_source_note(source_note = "Source: Open Policing Project") >
courage-23 question We model the likelihood of being arrested — a binary outcome — as a logistic function of a person’s sex, race, and the zone where the stop happened, including interactions between race and zone.
temperance-1 question Temperance means being careful and honest when using your model. Even if the model gives good answers, we shouldn’t act like it's the perfect truth. Models help us make better decisions, but they are based on assumptions, which may not always be correct.
temperance-2 question All else equal, being male raises the log-odds of arrest by about 0.06 compared to being female.
temperance-3 question In the baseline zone, White drivers are predicted to be slightly less likely to be arrested than Black drivers (the baseline group): the coefficient lowers the log-odds by about 0.04.
temperance-4 question The baseline group (Black females in Zone A) has a log-odds of 0.18 for being arrested. This is the starting point, and other variable values add or subtract from this.
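Log-odds are hard to read directly, so it can help to convert them to probabilities. The sketch below uses base R's `plogis()` (the inverse logit) and takes the rounded figures quoted above, 0.18 and 0.06, purely as illustrative inputs; it assumes they really are on the log-odds scale, which depends on which fitted model they came from.

```r
# Rounded values quoted in the answers above, used here only as example inputs.
log_odds_baseline <- 0.18  # baseline group
log_odds_male     <- 0.06  # additive shift for male drivers

# plogis(z) computes 1 / (1 + exp(-z)), mapping log-odds to a probability.
p_baseline <- plogis(log_odds_baseline)
p_male     <- plogis(log_odds_baseline + log_odds_male)

# Coefficients add on the log-odds scale, not on the probability scale,
# so the change in probability has to be computed after converting.
p_change <- p_male - p_baseline

print(c(baseline = p_baseline, male = p_male, change = p_change))
```

If the converted probabilities look implausible next to the observed arrest rate, that is a hint to check which scale the coefficients are actually reported on.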
temperance-5 exercise library(marginaleffects)
temperance-6 question General topic: Racial disparities in arrests during traffic stops in New Orleans. Specific question: Are Black drivers more likely to be arrested than White drivers, after accounting for location (zone) and sex?
temperance-7 exercise plot_predictions(fit_stops, condition = c("sex", "race"))
temperance-8 exercise plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5)) + geom_point(aes(y = estimate), size = 1, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + theme(axis.text.x = element_text(size = 8)) + scale_y_continuous(labels = scales::percent_format())
temperance-9 question plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5)) + geom_point(aes(y = estimate), size = 1, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + theme(axis.text.x = element_text(size = 8)) + scale_y_continuous(labels = scales::percent_format()) + labs( title = "Predicted Arrest Rates by Race, Sex, and Zone", subtitle = "Black drivers—especially males—face higher predicted arrest rates in most zones", caption = "Source: Open Policing Project — New Orleans Traffic Stop Data", y = "Predicted Arrest Probability", x = "Zone" )
temperance-10 question > tutorial.helpers::show_file("stops.qmd", start = -8) x = "Zone" ) ``` # Summary: Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers. We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased.We model the likelihood of being arrested — a binary outcome — as a logistic function of a person’s sex, race, and the zone where the stop happened, including interactions between race and zone. >
temperance-11 question The predicted arrest rate for Black males is 32%, compared to 24% for White females, with a 95% confidence interval of roughly ±2%.
temperance-12 question Our model may be biased if we didn’t include all important variables (like officer identity or time of day). Maybe the real difference is smaller or larger. A better estimate might be 28% vs. 22%, if unmeasured factors were accounted for.
temperance-13 question > tutorial.helpers::show_file("stops.qmd") --- title: "Stops" format: html # In YAML header: execute: echo: false message: false warning: false freeze: true --- ```{r} library(tidyverse) library(primer.data) library(broom) library(tidymodels) library(gt) library(marginaleffects) library(tidytext) ``` $$ P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_nX_n)}} with Y \sim \text{Bernoulli}(\rho) $$ It shows how we estimate the chance of something happening (like being arrested), based on different variables like race, sex, etc. ```{r} #| label: eda x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) x <- x %>% mutate(arrested = as.factor(arrested)) x <- x %>% slice_sample(n = 15000) ``` <br> <br> <br> $$ \widehat{\text{arrested}} = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + \ldots + \text{(interaction terms)} $$ ```{r} #| cache: true fit_stops <- logistic_reg(engine = "glm", mode = "classification") %>% fit(arrested ~ sex + race * zone, data = x) ``` ```{r} # tidy data tidy(fit_stops, conf.int = TRUE) %>% select(term, estimate, conf.low, conf.high) %>% gt() %>% tab_header(title = "Logistic Regression Estimates") %>% fmt_number(columns = 2:4, decimals = 3) %>% tab_spanner( label = "95% Confidence Interval", columns = c(conf.low, conf.high) ) ``` <br> <br> ```{r} #| cache: true plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5)) + geom_point(aes(y = estimate), size = 1, position = position_dodge(width = 0.5)) + 
facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + theme(axis.text.x = element_text(size = 8)) + scale_y_continuous(labels = scales::percent_format()) + labs( title = "Predicted Arrest Rates by Race, Sex, and Zone", subtitle = "Black drivers—especially males—face higher predicted arrest rates in most zones", caption = "Source: Open Policing Project — New Orleans Traffic Stop Data", y = "Predicted Arrest Probability", x = "Zone" ) ``` # Summary: Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers. We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased.We model the likelihood of being arrested — a binary outcome — as a logistic function of a person’s sex, race, and the zone where the stop happened, including interactions between race and zone.The predicted arrest rate for Black males is 32%, compared to 24% for White females, with a 95% confidence interval of roughly ±2%. >
temperance-14 question https://abdul-hannan96.github.io/stops/
temperance-15 question https://github.com/Abdul-Hannan96/stops.git
minutes question 90