id submission_type answer
tutorial-id none 131-stops
name question Naveed Ahmad
email question naveedgill4u@yahoo.com
introduction-1 question Wisdom, Justice, Courage, and Temperance
introduction-2 question show_file(".gitignore") stops_files >
introduction-3 question show_file("stops.qmd", chunk = "Last") #| message: false library(tidyverse) library(primer.data) >
introduction-4 question library(tidyverse) + library(primer.data) ── Attaching core tidyverse packages ───────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.2 ✔ tibble 3.3.0 ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ✔ purrr 1.0.4 ── Conflicts ─────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package to force all conflicts to become errors >
introduction-5 question New Orleans Traffic Stops Data Description This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question Difference between two potential outcomes
introduction-7 question Two potential outcomes cannot be studied about an individual at the same time
introduction-8 question Zone
introduction-9 question Guidance, if the driver was guided it will carry 9 scores for future avoidance of stop and if not then it will carry 3 scores to avoid future stop
introduction-10 question Two, either may be arrested or not
introduction-11 question if wearing mask 1 score if not wearing mask 0 score casual effect 1-0 = 1
introduction-12 question sex
introduction-13 question female group may different average value of arrest then men
introduction-14 question what is the average difference of being arrested on the basis of race?
wisdom-1 question Wisdom guides for specific research question, preceptor table and assumption of validity
wisdom-2 question A table with units for each observation, outcome variable and covariates, if all values are filled, answer our research question.
wisdom-3 question A table with units for each observation, outcome variable and covariates
wisdom-4 question each stop
wisdom-5 question arrested
wisdom-6 question race, age, sex, place, time, date
wisdom-7 question Its predictive model, no treatment
wisdom-8 question present
wisdom-9 question Units for each individual stopped, arrested outcome variable, race, age, sex, zone, covariates
wisdom-10 question What is the average difference of arrest between white and black drivers adjusting sex and zone?
wisdom-11 question When stopped, the arrest of drivers may be predicted to very on some factors like race, sex, zone. In this study we will use data from the open policing project to predict the arrest by accounting for some covariates.
justice-1 question It involves creation of population table, and assumptions of validity, stability, representativeness and unconfoundedness.
justice-2 question The relationship between columns of preceptor table and columns of data set is similar
justice-3 question Columns of data may not have the similar columns in preceptor table, as individual stopped may have different attitude in both data and preceptor table
justice-4 question A table having rows with unit and time combination for recording all stopped with other variables
justice-5 question Unit is the row recording each stopped individual on specific time period
justice-6 question Stability means the relationship between columns of the population table is consistent with three categories of rows: rows of preceptor table, data and other rows of population table.
justice-7 question The police officer as well drivers and legal provisions may have changed over the time, behavior of drivers and police officers.
justice-8 question The preceptor table may have different drivers then in data and population table, or zone restrictions may have changed
justice-9 question The population may have drivers who were not educated as in the data set
justice-10 question The preceptor table may have individual drivers who were more trained then in population table.
justice-11 question Treatment assignment should be independent or unbiased
justice-12 question library(tidymodels) ── Attaching packages ─────────────────────────────────────── tidymodels 1.3.0 ──
justice-13 question library(broom)
justice-14 question Since $Y$ is a binary variable (with exactly two possible values), the probability family is **Bernoulli**: $$ Y \sim \text{Bernoulli}(\rho) $$ where $\rho$ is the probability that one of the two possible values — conventionally referred to as 1 (or TRUE) — occurs. By definition, $1 - \rho$ is the probability of the other value. For a binary outcome variable, we use a **log-odds model**: $$ \log\left(\frac{\rho}{1 - \rho}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k $$
justice-15 question However, our assumptions that data is valid, stable and representative of all drivers if not proved fully, the predictions may very. Because, the divers or offcicers may have change in behavior, education, traing, and also change in zone may differ hence, our predictions about arrest may very from actual.
courage-1 question it decide about Data Generating Mechanizm
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE)
courage-5 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x)
courage-6 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question fit_stops parsnip model object Call: stats::lm(formula = arrested ~ sex + race * zone, data = data) Coefficients: (Intercept) sexMale raceWhite zoneB 0.1773298 0.0614460 -0.0445247 0.0146036 zoneC zoneD zoneE zoneF 0.0061012 0.0780600 0.0019025 -0.0027057 zoneG zoneH zoneI zoneJ 0.0308717 0.0757019 0.0330416 0.0237773 zoneK zoneL zoneM zoneN 0.0586687 -0.0038877 0.0393026 0.0139437 zoneO zoneP zoneQ zoneR 0.0232251 0.0140617 0.0126170 0.0119566 zoneS zoneT zoneU zoneV 0.0594727 0.0113267 0.0071986 0.0770051 zoneW zoneX zoneY raceWhite:zoneB 0.1143814 0.0057280 0.0386437 -0.0077384 raceWhite:zoneC raceWhite:zoneD raceWhite:zoneE raceWhite:zoneF 0.0065557 0.0294040 0.0068179 -0.0137965 raceWhite:zoneG raceWhite:zoneH raceWhite:zoneI raceWhite:zoneJ 0.0088500 0.0085970 -0.0339373 -0.0244272 raceWhite:zoneK raceWhite:zoneL raceWhite:zoneM raceWhite:zoneN -0.0381747 -0.0075094 -0.0423222 -0.0566405 raceWhite:zoneO raceWhite:zoneP raceWhite:zoneQ raceWhite:zoneR -0.0149832 0.0092133 -0.0544990 -0.0379411 raceWhite:zoneS raceWhite:zoneT raceWhite:zoneU raceWhite:zoneV -0.0250048 -0.0272932 0.0383220 -0.0387945 raceWhite:zoneW raceWhite:zoneX raceWhite:zoneY -0.1233162 0.0843196 -0.0002596 >
courage-11 question library(easystats) # Attaching packages: easystats 0.7.4 ✔ bayestestR 0.16.0 ✔ correlation 0.8.7 ✔ datawizard 1.1.0 ✔ effectsize 1.0.1 ✔ insight 1.3.0 ✔ modelbased 0.11.2 ✔ performance 0.14.0 ✔ parameters 0.26.0 ✔ report 0.6.1 ✔ see 0.11.0 >
courage-12 question check_predictions(extract_fit_engine(fit_stops)) >
courage-13 question $$ \widehat{\text{arrested}} = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \text{(interaction terms for race and zone)} $$
courage-14 question tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) >
courage-15 question tutorial.helpers::show_file(".gitignore") stops_files *_cache >
courage-16 exercise tidy(fit_stops, conf.int = TRUE)
courage-17 question tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| message: false #| warning: false stops_tbl <- tidy(fit_stops, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate(across(c(estimate, conf.low, conf.high), round, 2)) |> gt() |> tab_header( title = "Coefficients with 95% Confidence Intervals" ) |> cols_label( term = "Term", estimate = "Estimate", conf.low = "95% CI Lower", conf.high = "95% CI Upper" ) |> fmt_number( columns = c(estimate, conf.low, conf.high), decimals = 2 ) |> tab_options( table.font.size = "small", table.border.top.color = "gray80", table.border.bottom.color = "gray80" ) stops_tbl >
courage-18 question We used logistic regression model to predict arrests as function of race, sex, and zone.
temperance-1 question Modesty in claims because our results are based upon assumptions
temperance-2 question Male has .06 more chances of arrest than female
temperance-3 question Race white has -.04 less chances of arrest
temperance-4 question There are 0.18 chances of a driver being arrested at stop
temperance-5 question library(marginaleffects) Please cite the software developers who make your work possible. One package: citation("package_name") All project packages: softbib::softbib() >
temperance-6 question what is the average difference of black and white drivers of being arrested on basis of sex, race and zone at each stop?
temperance-7 question predictions(fit_stops) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric
temperance-8 question predictions(fit_stops, by = "sex") sex Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Female 0.192 0.001234 156 <0.001 Inf 0.190 0.194 Male 0.254 0.000823 309 <0.001 Inf 0.253 0.256 Type: numeric
temperance-9 question predictions(fit_stops, condition = "sex") Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric >
temperance-10 question predictions(fit_stops, condition = c("sex", "race")) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric >
temperance-11 question stops_pre_tbl |> slice_sample(n = 1000) |> mutate(zone = factor(zone)) |> # Ensures correct ordering of zones ggplot(aes(y = zone, x = estimate, color = race, shape = sex)) + geom_point(size = 2) + geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) + # horizontal error bars labs( title = "Prediction of arrested by Zone, Race, and Sex", subtitle = "Black and sex Male has the highest pobability of arrested", x = "Predicted Probability", y = "Zone" ) + theme_minimal(base_size = 13)
temperance-12 question tutorial.helpers::show_file("stops.qmd", chunk = "Last") stops_pre_tbl |> slice_sample(n = 1000) |> mutate(zone = factor(zone)) |> # Ensures correct ordering of zones ggplot(aes(y = zone, x = estimate, color = race, shape = sex)) + geom_point(size = 2) + geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) + # horizontal error bars labs( title = "Prediction of arrested by Zone, Race, and Sex", subtitle = "Black and sex Male has the highest pobability of arrested", x = "Predicted Probability", y = "Zone" ) + theme_minimal(base_size = 13)
temperance-13 question The predictions of being arrested show that black, male drivers have the highest probability in all zones except zone x. The highest probability of being arrested is more than 35% for black drivers in zone w with variation in perdition 32% to 38%.
temperance-14 question The assumption that data drivers in our prediction represent the drivers in the population rest all things being constant but if our assumptions do not hold good then actual results may very as the attitude of police officer, training of drivers, wearing of safety halmets, etc may reduce chances of being arrested.
temperance-15 question tutorial.helpers::show_file("stops.qmd") --- title: "Stops" author: Naveed Ahmad format: html execute: echo: false --- ```{r} #| message: false #| warning: false library(tidyverse) library(primer.data) library(tidymodels) library(broom) library(gt) library(marginaleffects) library(ggplot2) ``` ## Summary <div style="text-align: justify;">When stopped, the arrest of drivers may be predicted to very on some factors like race, sex, zone. In this study we will use data from the open policing project to predict the arrest of drivers by adjusting for some covariates. However, our assumptions that data is valid, stable and representative of all drivers if not proved fully, the predictions may very. Because, the divers or offcicers may have change in behavior, education, traing, and also change in zone may differ hence, our predictions about arrest may very from actual. We used logistic regression model to predict arrested as function of race, sex, and zone. The predictions of being arrested show that black, male drivers have the highest probability in all zones except zone x. The highest probability of being arrested is more than 35% for black drivers in zone w with variation in perdition 32% to 38%.</div> ## Statistical Model Since $Y$ is a binary variable (with exactly two possible values), the probability family is **Bernoulli**: $$ Y \sim \text{Bernoulli}(\rho) $$ where $\rho$ is the probability that one of the two possible values — conventionally referred to as 1 (or TRUE) — occurs. By definition, $1 - \rho$ is the probability of the other value. For a binary outcome variable, we use a **log-odds model**: $$ \log\left(\frac{\rho}{1 - \rho}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k $$ ```{r} #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) ``` ## Table of Coefficients of Parameters ```{r} #| message: false #| warning: false stops_tbl <- tidy(fit_stops, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate(across(c(estimate, conf.low, conf.high), round, 2)) |> gt() |> tab_header( title = "Coefficients with 95% Confidence Intervals" ) |> cols_label( term = "Term", estimate = "Estimate", conf.low = "95% CI Lower", conf.high = "95% CI Upper" ) |> fmt_number( columns = c(estimate, conf.low, conf.high), decimals = 2 ) |> tab_options( table.font.size = "small", table.border.top.color = "gray80", table.border.bottom.color = "gray80" ) stops_tbl ``` ## Fited regression model $$ \widehat{\text{arrested}} = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \text{(interaction terms for race and zone)} $$ this is our Data Generating Mechanism ```{r} #| message: false #| warning: false stops_pre_tbl <- predictions(fit_stops, condition = c("sex", "race")) ``` ## Plot showing prediction of arrested ```{r} stops_pre_tbl |> slice_sample(n = 1000) |> mutate(zone = factor(zone)) |> # Ensures correct ordering of zones ggplot(aes(y = zone, x = estimate, color = race, shape = sex)) + geom_point(size = 2) + geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) + # horizontal error bars labs( title = "Prediction of arrested by Zone, Race, and Sex", subtitle = "Black and sex Male has the highest pobability of arrested", x = "Predicted Probability", y = "Zone" ) + theme_minimal(base_size = 13) ``` >
temperance-16 question https://naveedgill4u.github.io/stops/
temperance-17 question https://github.com/naveedgill4u/stops
minutes question 210