Unsound statistical analysis misrepresents racial profiling in California police stop data

Findings from the California Racial & Identity Profiling Advisory (RIPA) Board’s Annual Report, released earlier this month, have sparked controversy after the results revealed that nonwhites are disproportionately represented in police stops. The report also claimed that, of those stopped, nonwhites were searched more frequently, arrested more frequently, and more frequently engaged in physical confrontations with police officers. This led many people to conclude that the police are, in fact, racist. However, the practice of policing is far more complicated than what can be captured in datasets. While these data appear straightforward, studying racial bias is not.

There are myriad contextual factors at play that affect officer decisionmaking and police-citizen interactions, such that it is nearly impossible to attribute racial disparities solely to any one cause. Unfortunately, contextual factors are often not easily measured, or they might be ignored on the basis that these details are “less important.” But ignoring these key details leaves us with an incomplete understanding of the dynamics influencing these police encounters. So when it comes to the RIPA Board’s report, the findings seem straightforward, but a closer look shows some holes in the methodology that likely undermine the validity of the findings. To this end, the Peace Officers Research Association of California (PORAC) conducted a critical analysis of the report that highlighted numerous problems with the RIPA data and the methodology used in the report. In this post, I will summarize the key issues raised by the PORAC.

California’s Racial and Identity Profiling Act (RIPA) Board

In 2015, the California state legislature passed the Racial and Identity Profiling Act (RIPA) (AB 953), which (1) amended the definition of unlawful profiling to include both racial and identity profiling, (2) required the Attorney General to establish the RIPA Board, and (3) required each state and local agency that employs peace officers to annually report to the Attorney General data on all pedestrian and vehicle stop incidents. California’s requirements for collecting traffic and pedestrian stop data are arguably the most expansive in the United States, although other states have collection requirements as well. For example, like California, Oregon and Illinois also mandate data collection for both traffic and pedestrian stops.

RIPA tasked the Board with eliminating racial and identity profiling in law enforcement by investigating and analyzing law enforcement policy and data. The data routinely collected by peace officers include both person-level and stop-level information. Person-level data elements include the officer’s perception of the demographic characteristics of each individual stopped. Stop-level elements include things like the time/location of a stop, reason for the stop (e.g., traffic violation, reasonable suspicion), actions taken by the officer (e.g., search, handcuffing), reasons for those actions (e.g., probable cause, consent), whether contraband or evidence was discovered, and the ultimate result of the stop (e.g., arrest, citation).

The RIPA Board’s Annual Report

Every year, the RIPA Board releases a public report that reviews the most recently available RIPA data. The annual report released in January 2023 contained information about 2,937,662 stops conducted by 18 California police agencies between January 1 and December 31, 2020. The first part of the RIPA report’s analysis showed the perceived demographics of individuals stopped, including breakdowns of the stop characteristics (e.g., actions taken, outcomes) across different demographic groups. The second part of the analysis used data from the residential population to identify “benchmarks” (i.e., reference points) for racial/ethnic proportions within the community.

Then, the authors compared the racial/ethnic proportions of people stopped by the police with those of the residential population, looking for discrepancies that would indicate whether certain racial/ethnic groups were overrepresented or underrepresented in police stops. This approach is referred to as population-based “benchmarking,” which is the simplest way to examine racial bias in policing. In their analysis, they noted that nonwhites were overrepresented in police stops when compared with the demographics of the residential population. Of those stopped, nonwhites were more likely to be searched, more likely to be arrested, and more likely to become engaged in violence with the officer.
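
To make the benchmarking arithmetic concrete, here is a minimal Python sketch of the comparison described above: each group’s share of stops divided by its share of the residential population. The function name, group labels, and counts are hypothetical placeholders chosen for illustration; they are not taken from the RIPA data or from the report’s own code.

```python
def disparity_ratios(stop_counts, population_counts):
    """Return (share of stops) / (share of population) for each group.

    A ratio above 1 means the group is overrepresented in stops relative
    to the residential benchmark; below 1 means underrepresented.
    """
    total_stops = sum(stop_counts.values())
    total_population = sum(population_counts.values())
    ratios = {}
    for group, stops in stop_counts.items():
        stop_share = stops / total_stops
        population_share = population_counts[group] / total_population
        ratios[group] = stop_share / population_share
    return ratios


# Hypothetical counts for illustration only (NOT RIPA figures).
stops = {"group_a": 300, "group_b": 700}
population = {"group_a": 2000, "group_b": 8000}

ratios = disparity_ratios(stops, population)
for group, ratio in ratios.items():
    print(f"{group}: disparity ratio = {ratio:.3f}")
```

With these made-up numbers, group_a accounts for 30% of stops but 20% of residents, so its ratio is above 1. The ratio by itself says nothing about why the shares differ.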

The Peace Officers Research Association of California (PORAC)’s Critical Review

Police-citizen interactions are very nuanced, and many factors can affect whether a certain demographic is over- or under-represented in police stops (such as the location, time, and day of the stop). However, the RIPA report largely ignores context and treats race as the sole explanatory factor dictating police stop outcomes. Thus, it remains unclear whether there was an actual causal relationship between race and police stop outcomes or whether this relationship was caused by an unmeasured or unidentified “third factor.”

In order to determine that one event caused the other, three criteria must be established: 1) correlation between the two factors (they either increase together, decrease together, or move in opposite directions); 2) proper temporal ordering of variables (the causal factor comes before the supposed effect); and 3) elimination of alternative explanations for the relationship, or “non-spuriousness” (meaning that there are no mysterious ‘third factors’ impacting the relationship). Per the PORAC, none of these criteria are met in the RIPA analysis due to various methodological issues, which are summarized below.

The report lacks a sufficient comparison group, which is necessary for identifying correlations and eliminating alternative explanations.

The report did not compare the sample of individuals stopped with the wider population of individuals available to be stopped. Instead, the authors used population-based “benchmarking” to assess whether certain races or ethnicities were overrepresented in police stops relative to their proportions of the residential population. Unfortunately, benchmarking studies are only valid if people are randomly selected for the sample. In the context of police stops, this would mean that all individuals within the residential population have an equal and nonzero chance of being stopped. However, we know this is not true. Police stops do not occur on a statistically random basis; in fact, almost nothing in criminal justice does. For example, neighborhoods with high numbers of calls for service are inadvertently subjected to increased surveillance and more police stops. Other factors like driving performance, suspicious activity, or time of day also reduce the randomness with which people are stopped by the police.
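
The point about non-random exposure can be illustrated with a small, fully hypothetical calculation. In the sketch below, the per-resident stop rate depends only on the neighborhood (standing in for patrol intensity) and is identical for both groups within each neighborhood. All names and numbers are assumptions invented for the example, and it is not a model of any real agency. A population benchmark still reports a disparity, simply because the two groups are distributed differently across neighborhoods.

```python
# Hypothetical residents per neighborhood and group (illustration only).
residents = {
    "high_patrol": {"group_a": 6000, "group_b": 4000},
    "low_patrol": {"group_a": 2000, "group_b": 8000},
}

# Assumed per-resident stop rate: varies by neighborhood, NOT by group.
stop_rate = {"high_patrol": 0.10, "low_patrol": 0.02}


def benchmark_ratios(residents, stop_rate):
    """Citywide (share of stops) / (share of residents) for each group."""
    stops, population = {}, {}
    for neighborhood, groups in residents.items():
        for group, count in groups.items():
            expected_stops = count * stop_rate[neighborhood]
            stops[group] = stops.get(group, 0.0) + expected_stops
            population[group] = population.get(group, 0) + count
    total_stops = sum(stops.values())
    total_population = sum(population.values())
    return {
        group: (stops[group] / total_stops)
        / (population[group] / total_population)
        for group in stops
    }


exposure_ratios = benchmark_ratios(residents, stop_rate)
for group, ratio in exposure_ratios.items():
    print(f"{group}: benchmark ratio = {ratio:.3f}")
```

This does not show that exposure explains any real disparity. It only shows that a residential benchmark cannot, on its own, separate differences in exposure from differences in treatment.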

The report excludes data from nearly 60% of all stops, which means that these results likely would not extend to the wider population of police stops.

The sample included a total of 18 agencies that provided data on pedestrian and traffic stops occurring during 2020. The agencies included California Highway Patrol (CHP), eight police departments (Los Angeles, San Diego, San Francisco, Sacramento, Fresno, San Jose, Long Beach, and Oakland) and six county sheriff’s departments (Los Angeles, San Bernardino, Sacramento, San Diego, Riverside, and Orange County). However, for reasons that are unclear, the CHP data was not included in all analyses. This is problematic because CHP accounts for 57.7% of all stops in the state. Therefore, the RIPA analysis is attempting to generalize its results to the broader population based on only a subset (42.3%) of possible stops. With more than half of the data excluded, it’s hard to know whether the results extend to police stops generally, or to CHP stops specifically.

Interestingly, there is some evidence that adding CHP traffic stops to the analysis would have affected the results. For example, one study found that CHP traffic stops rarely result in a search, and that racial disparities in those cases are notably smaller. In fact, some researchers have suggested that CHP stops result in lower search rates of nonwhites than whites.
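
As a rough illustration of why excluding the largest agency matters, the sketch below computes a pooled search rate as a stop-weighted average of two components. The 57.7% weight is the CHP share quoted above. The two search rates and the function name are hypothetical values I chose for the example, not figures from the report or from the study mentioned.

```python
CHP_SHARE = 0.577  # CHP share of all stops, as stated in the text above


def pooled_rate(rate_chp, rate_other, chp_share=CHP_SHARE):
    """Stop-weighted average of CHP and non-CHP rates."""
    return chp_share * rate_chp + (1 - chp_share) * rate_other


# Hypothetical search rates (illustration only, NOT RIPA figures).
rate_other = 0.20  # assumed search rate across the non-CHP agencies
rate_chp = 0.02    # assumed search rate for CHP stops

with_chp = pooled_rate(rate_chp, rate_other)
print(f"Search rate excluding CHP: {rate_other:.3f}")
print(f"Search rate including CHP: {with_chp:.3f}")
```

Under these assumed rates, the pooled figure is less than half of the non-CHP figure, which is why an analysis built on the 42.3% subset may not describe stops statewide.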

The RIPA data does not record whether an officer knew a driver’s race prior to conducting a traffic stop, which is necessary for determining temporal order.

Racial profiling in police stops supposedly occurs when an officer initiates a stop based on someone’s race or ethnicity. Thus, to establish that race is the actual cause (or even a motivation) for a stop, the authors would need to demonstrate that the officer knew (and was motivated by) the driver’s race prior to the stop. But in the RIPA data, there is no way to know whether the officers knew the race of drivers prior to the stop, because the officer’s perception of the driver’s race is inputted after the stop is conducted. This flaw in the RIPA data makes it difficult to establish temporal order. In order to sustain an allegation of racial profiling, what the officer knew about the individual stopped prior to initiating the stop is an essential condition for determining discriminatory intent. There is no measure of what the officer knew about the driver prior to initiating the stop, much less whether the officer acted with an intention to discriminate. As a result, it is not possible from the RIPA data to allege that individuals were stopped on the basis of their identity-related category.

In an attempt to circumvent this problem, researchers include a “veil of darkness” analysis. This theory assumes that officers are less likely to initiate a stop based on racial animus at night, as they are less able to see the race of a driver prior to the stop. According to this theory, police stops occurring at night would be unlikely to arise from racial discrimination. Thus, if the proportion of black drivers stopped after dark is smaller than the proportion of black drivers stopped during the daytime, it is said to suggest the presence of racial profiling.
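
In its simplest form, the comparison just described amounts to a difference between two proportions. The sketch below uses a standard pooled two-proportion z-test written with only the Python standard library. The counts and function name are hypothetical, and this is a simplified stand-in rather than the procedure used in the RIPA report; published veil-of-darkness analyses typically go further, for example by restricting to clock times that are light in some seasons and dark in others.

```python
import math


def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (illustration only, NOT RIPA figures):
# stops of one group out of all stops, in daylight and after dark.
day_group, day_total = 300, 1000
dark_group, dark_total = 250, 1000

z, p_value = two_proportion_z(day_group, day_total, dark_group, dark_total)
print(f"Daylight share: {day_group / day_total:.3f}")
print(f"Dark share:     {dark_group / dark_total:.3f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

A positive z here means the group’s share of stops is higher in daylight than in darkness, which is the pattern the theory reads as evidence of profiling.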

The veil of darkness strategy has been increasingly used in racial profiling research, though it is not nearly as effective as is commonly proposed. There are many limitations of this theory, and the assumptions on which it is based are questionable. First, the method relies on the assumption that an officer can always see a driver’s race during the day yet can never see their race during the night. This assumption is simply not true. More often than not, an officer cannot confirm the race of a person driving a vehicle before stopping them, particularly if the officer is following the vehicle in question. Second, there are other factors that may alter the ability of officers to detect the race of drivers in darkness, such as street lighting. Similarly, adverse weather conditions may make it harder to see, even in daylight. The veil of darkness theory also does not extend to pedestrian stops.

The RIPA report uses fundamentally flawed statistical analysis, which limits its ability to eliminate alternative explanations for the alleged racial disparities.

The RIPA study also inadequately accounts for other important stop characteristics, such as the reason for the stop, the duration of the stop, or the location. The study includes only one independent variable, the driver’s race or ethnicity, meaning that race or ethnicity is the only available explanation for why an officer initiated a stop. This further undermines the validity of the study, as the authors were not able to eliminate plausible alternative explanations for alleged racial disparities. In other words, the study failed to consider important contextual details or motivations underlying the police-citizen interactions. This increases the likelihood that an unidentified or unmeasured “third factor” could have affected the results.

This is an important oversight, because context around a stop does explain part of the observed racial gap. Understanding this context is important for developing a more informed means of evaluating an officer’s behavior. Differences in context expose officers to varying levels of risk, which means that stop outcomes will also likely vary. For example, past reports using the RIPA data have found that the correlation between race and stops leading to arrest is partly due to a higher share of nonwhites being stopped for reasonable suspicion or for having an outstanding warrant, which are typically more serious and more likely to result in a booking.

But measuring the contexts of stops is incredibly challenging, given the myriad potential factors that can and do influence police officers’ decisions. The RIPA data attempts to capture some of this information, but unfortunately these attributes are not exhaustive. Problematically, there are no variables for measuring the severity of the offense, which is strongly predictive of a stop and known to influence the outcome. Like the RIPA report, other research on racial profiling has also highlighted the challenges of determining causality.

RIPA failed to conduct a thorough literature review.

The report neglected to provide a literature review of empirical studies and instead relied on print and TV news sources. In doing so, the report fails to discuss how its research compares to other similar studies. There is a wide body of research that has found that racial disparities across numerous stop outcomes narrow when the analysis adjusts for contextual factors, such as the reason for the stop (e.g., reasonable suspicion vs. traffic violation). Furthermore, past research shows how demographic characteristics differ across jurisdictions. For example, if nonwhite residents tend to live in cities where law enforcement conducts searches more often across all racial and ethnic groups, then racial disparities in search rates may partly reflect location.

Further, research has also found that search rates are much higher in stops for reasonable suspicion, for individuals with a warrant, or for someone on parole/probation, in which cases the racial disparities are markedly reduced.
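
The kind of adjustment described above can be sketched with a simple stratified comparison: compute search rates by group overall, then again within each stop reason. The counts below are hypothetical and were chosen so that the gap narrows once stop reason is held fixed. They are not drawn from the RIPA data or from any of the studies referred to, and the helper name is my own.

```python
# Hypothetical (stops, searches) by stop reason and group (NOT RIPA figures).
counts = {
    "traffic_violation": {"group_a": (800, 40), "group_b": (950, 38)},
    "reasonable_suspicion": {"group_a": (200, 80), "group_b": (50, 19)},
}


def search_rate(cells):
    """Search rate across a list of (stops, searches) cells."""
    total_stops = sum(stops for stops, _ in cells)
    total_searches = sum(searches for _, searches in cells)
    return total_searches / total_stops


# Unadjusted gap: pool all stop reasons together for each group.
aggregate_gap = (
    search_rate([by_group["group_a"] for by_group in counts.values()])
    - search_rate([by_group["group_b"] for by_group in counts.values()])
)

# Adjusted gaps: compare the groups within each stop reason separately.
stratum_gaps = {
    reason: search_rate([by_group["group_a"]])
    - search_rate([by_group["group_b"]])
    for reason, by_group in counts.items()
}

print(f"Aggregate gap: {aggregate_gap:.3f}")
for reason, gap in stratum_gaps.items():
    print(f"Gap within {reason}: {gap:.3f}")
```

In this made-up example the aggregate gap is several times larger than the gap within either stop reason, because group_a is stopped more often for the reason that carries a higher search rate for everyone. Whether the same pattern holds in real data is an empirical question that only an adjusted analysis can answer.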

There were several issues regarding the measurements used in the RIPA report.

According to the PORAC, the measure of race/ethnicity is collected twice in the RIPA data, but it is not clear which variable was actually used in the RIPA report’s analysis. When evaluating the reason for a stop, 86% of stops were for a traffic violation, while 11.5% were on the basis of reasonable suspicion. However, for the former, there is no mention of how the severity of alleged traffic violations (moving or non-moving) affects the stop outcome. For the latter, the description of what constitutes “reasonable suspicion” is inconsistently reported. Regarding actions taken (by the police officer) during the stop, the RIPA report lists 23 possible actions. Yet the analysis indicates that in 80.9% of the stops the officers took no action, which seems to indicate that these variables are poorly conceptualized and do not capture the full breadth of potential officer actions. Furthermore, there are several factors that might justify an officer’s actions that are neither measured nor defined, and there is no provision for reporting whether actions were initiated solely at the discretion of the police officer or as a result of another individual’s behavior (e.g., whether the officer had probable cause to conduct a search vs. whether the individual consented to a search).

Conclusion

The conclusions from the RIPA Board’s annual report misrepresent the state of racial profiling in California due to serious flaws in its methodology. The report fails to achieve the statistical rigor that is necessary to draw meaningful conclusions. Many factors contribute to whether an officer stops someone and to the officer’s subsequent actions, such that alleged racial disparities may reflect circumstances unrelated to an individual officer’s bias. Differences in contexts, location, and agencies likely contribute to racial disparities in stop outcomes. Consequently, statistical analyses must adjust for these differences across racial/ethnic groups to generate valid comparisons. Unfortunately, the RIPA report does not attempt to do so, and the data lacks a variety of important contextual factors.

As previously mentioned, an understanding of the context in which stops occur is critically important. Overlooking this nuance within the data collection protocol eliminates the possibility of more meaningful inquiry into officer decision making. For example, independent of race and ethnicity, if an officer observes a person committing a crime, or if a person has a warrant or a weapon, that person likely will be detained and searched, and possibly booked into jail after a stop. These situations would also likely be more adversarial (including the potential for use of force) than a traffic stop. If an individual is acting erratically, possibly due to behavioral health issues, an officer may shift decisions and actions. The prevalence of such situations across race and ethnicity may contribute to differences in outcomes. Additionally, younger drivers may be less experienced and therefore more likely to violate traffic laws, and hence are plausibly more likely to be stopped than their older and more experienced counterparts.

Once again, this post summarized the key issues raised in the PORAC’s critical analysis of the RIPA report. For more information, see PORAC’s original piece.