One Weird Trick to Promote Persistent Information Asymmetry

When I first began researching the legal services market, I quickly became aware of the serious information asymmetry problem across the marketplace – especially for criminal defense. As I’ve probably mentioned before, defendants have no real way to estimate the quality of an attorney when considering hiring one for a criminal case (except obviously by consulting Blackstone Trial Analytics, LLC – the trusted name in attorney referrals and quantitative LSM analysis). This is a classic adverse selection problem. Defendants know (or can roughly find out) the average outcome of a criminal charge, but they don’t know which attorneys deliver better-than-average outcomes and which deliver worse-than-average outcomes.

Ultimately, attorneys seem to set roughly comparable prices for their services and share the market. Of course many defendants would like to pay more for high quality attorneys and probably all defendants would like to avoid low quality attorneys (at least at the prevailing prices). I wondered why (high quality) attorneys didn’t try to solve this problem. They could, for example, publicize their records. But this wouldn’t work if many other attorneys simply didn’t publicize their own records. Probably the public would have a hard time interpreting the record in the context of suppliers of criminal defense services generally. This is especially true if people systematically overrate their probability of success at trial.

More plausibly, I thought, attorneys could make their fees contingent on case outcomes. For example, they could charge some variable amount (by quality) for plea bargains and more for trials – much more where the defendant wins. Obviously any specific deal is possible, including zero or even negative fees. These arrangements, I thought, would probably produce few poor incentives, remove some bad existing incentives and communicate important facts about quality to defendants.

Generally, when there appears to be an obvious and easy solution to fix an apparent market failure, a non-market failure lurks just behind it. This is one of those cases. Here’s the relevant ABA rule, Model Rule 1.5(d):

(d) A lawyer shall not enter into an arrangement for, charge, or collect:

(1) any fee in a domestic relations matter, the payment or amount of which is contingent upon the securing of a divorce or upon the amount of alimony or support, or property settlement in lieu thereof; or

(2) a contingent fee for representing a defendant in a criminal case.


Juries: 12 Increasingly Angry Men

One of the most enjoyable things about economic analysis is that it often yields surprising facts about cherished institutions. For example, the jury selection process in the United States probably burdens defendants disproportionately in criminal trials, even though most people believe jury selection serves to mitigate the problem of biased juries.

Why? Imagine the distribution of jurors by how sympathetic they are to the defendant. I would guess the distribution isn’t normal with a mean of “indifferent”; instead, the average juror probably starts from a relatively unsympathetic position. Assume for a particular defendant, 60% of the population has some bias against the defendant and 40% has some bias in favor. If the jury is randomly selected, we would expect 6/10 jurors to be relatively anti-defendant. If the prosecutor and defense attorney can identify the jurors’ biases and are allowed one challenge each, the final jury should be 6.2/10 anti-defendant. Assuming each struck juror is replaced by a random draw from the same pool, striking a pro-defendant juror adds 0.6 anti-defendant jurors in expectation, while striking an anti-defendant juror removes only 0.4. Every additional challenge increases the frequency of anti-defendant jurors in the panel.
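The 6.2/10 figure can be checked with a few lines of arithmetic. This is only a sketch under assumptions the post doesn't spell out: each side strikes one juror of the opposing bias per round, both seats are refilled by random draws from the same 60/40 population, and the function name `expected_anti_share` is made up for illustration.

```python
def expected_anti_share(p_anti, jury_size, rounds):
    """Expected share of anti-defendant jurors after `rounds` of paired strikes.

    Assumptions (hypothetical, not stated in the post): each round the
    prosecutor strikes one pro-defendant juror, the defense strikes one
    anti-defendant juror, and both seats are refilled at random from the
    same population. Works in expectation, and only while there are
    still pro-defendant jurors left to strike.
    """
    anti = p_anti * jury_size  # expected anti-defendant jurors before strikes
    for _ in range(rounds):
        anti -= 1            # the defense removes one anti-defendant juror
        anti += 2 * p_anti   # two random replacements, each anti with prob p_anti
    return anti / jury_size


for rounds in range(4):
    share = expected_anti_share(0.6, 10, rounds)
    print(f"{rounds} challenge(s) each: {share:.2f} anti-defendant")
```

Under these assumptions every extra pair of challenges moves the expected panel another 0.2 jurors (out of 10) against the defendant, which is the pattern described above.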

Apparently Scots law embraces (wisely, I think) random jury selection. The quotation below is from Peter Duff’s “The Scottish Criminal Jury: A Very Peculiar Institution”:

There is no equivalent to the voir dire procedure in Scotland, a fact which might surprise some American readers. The strong opposition of the Scottish criminal justice system to any procedure of this type is well illustrated by the observations of the Appeal Court in McCadden v. H. M. Advocate:

There may never be a process which eliminates the possibility of personal prejudices existing among jurors, the nearest practical one (and it is not foolproof) being possibly the “vetting” of jurors, a system against which the law of Scotland has steadfastly closed the doors. Evidence of how it is used and abused in countries in which it is operated only tends to confirm the wisdom of that decision.

The court went on to observe that it should not be “lightly assumed” that jurors will pursue their prejudices in defiance of their oath and the directions of the judge. On a more practical note, the court pointed out that the broad base from which jurors are drawn means that any prejudices and biases tend to cancel each other out, and further, that the majority verdict, whereby a bare eight-to-seven vote either way suffices, ensures that it is unlikely that one prejudiced juror can affect the outcome of the case.

Why Are Public Defenders So Good?

In my last post, I provided a graph suggesting public defenders have above average win-rates. Most people find this surprising. Actually, this fits neatly into a model of the LSM where defense attorneys are profit maximizers and public defenders are sentence minimizers. Profit maximization does not imply sentence minimization. Instead, defense attorneys focus on “Win-Stay” and “Lose-Stay” outcomes. To see what I mean, consider Bayes’ Rule:

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|~A)P(~A)]

All of this means that the probability of A conditional on B equals the probability of B conditional on A multiplied by the probability you assign to A, over the probability of B conditional on A multiplied by the probability you assign to A plus the probability of B conditional on not-A multiplied by the probability you assign to not-A. If the above isn’t clear, check out Bryan Caplan’s excellent lecture notes or this post at Econlog.

Here’s an example relevant to the defense attorney profit maximization problem:

P(A|B) = P(Attorney is Optimal|Bad Case Outcome)
P(B|A) = P(Bad Case Outcome|Attorney is Optimal)
P(B|~A) = P(Bad Case Outcome|Attorney is not Optimal)
P(A) = Probability Attorney is Optimal
P(~A) = Probability Attorney is not Optimal

The profit maximizing attorney wants to persuade clients with bad outcomes that their attorney was still the correct choice. This way, the attorney still has access to that client’s network (and of course for future cases with the same client). In order to do this, attorneys should focus on increasing their clients’ subjectively held belief that they are high quality and increasing the clients’ belief that bad outcomes with high quality attorneys are common. For simplicity, let’s assume that the attorney’s clients will stay with them or recommend them to others if the attorney wins their case.
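To make the two levers concrete, here is a small sketch of the calculation. The probabilities are invented purely for illustration (none come from court data), and the function name `posterior_optimal` is hypothetical.

```python
def posterior_optimal(prior, p_bad_given_optimal, p_bad_given_not_optimal):
    """P(A|B): probability the attorney is optimal given a bad case outcome."""
    numerator = p_bad_given_optimal * prior
    denominator = numerator + p_bad_given_not_optimal * (1 - prior)
    return numerator / denominator


# Illustrative numbers only; nothing here is estimated from real cases.
baseline = posterior_optimal(
    prior=0.5, p_bad_given_optimal=0.3, p_bad_given_not_optimal=0.6
)
# Lever 1: the client starts out more convinced the attorney is high quality.
higher_prior = posterior_optimal(
    prior=0.7, p_bad_given_optimal=0.3, p_bad_given_not_optimal=0.6
)
# Lever 2: the client believes bad outcomes are common even with good attorneys.
excused_loss = posterior_optimal(
    prior=0.5, p_bad_given_optimal=0.5, p_bad_given_not_optimal=0.6
)

print(f"baseline:     {baseline:.3f}")
print(f"higher prior: {higher_prior:.3f}")
print(f"excused loss: {excused_loss:.3f}")
```

With these made-up inputs, either lever leaves the client more confident in the attorney after a loss than in the baseline case, which is the Lose-Stay logic.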

These incentives create a potential principal-agent problem in the attorney-defendant relationship. If an hour of signaling “I have a great win-rate” does more to increase the probability of Lose-Stay outcomes than an hour of work increasing the probability of winning, the attorney will invest too little (from the defendant’s perspective) in actually winning.

Public defenders, as sentence minimizers, don’t have this problem. Basically – and this can be seen in the data – the average public defender is a better agent than the average private defense attorney. Of course public defenders have obvious weaknesses – essentially zero budget for non-procedural trial inputs, for example. But with respect to procedural inputs, they should behave as if they have been given infinitely large budgets.


Should Public Defender Caseloads Matter for Indigent Defense Outcomes?

Unsurprisingly, I believe the answer is yes; surprisingly, I expect heavy caseloads should improve defense outcomes. My reasoning:

  1. Public defenders (PDs) behave as if they are not budget constrained (with respect to trial inputs they supply themselves).
  2. Heavy caseloads should increase the probability of seeing any given case continued on a given day.
  3. The average indigent defendant is more likely than the average wealthy defendant to have a prior criminal record, and the marginal disutility of additional criminal charges is probably strongly diminishing. In other words, prosecutors quickly lose (or often lack) the ability to tempt indigent defendants with plea bargains that offer features like amended charges with better labor market signaling. Indigent defendants have more taste for trials, which are costly to prosecutors.

(1) strongly suggests that PD clients are relatively more expensive to prosecute, especially at general district court levels where things like (costly) expert testimony are less common. (3) also suggests a level effect; when indigent defendants and relatively wealthy defendants have identical case details, the indigent defendants will probably do somewhat better in terms of trial outcomes as a group. (2) implies that as PD caseloads increase, the probability of continuances across the PD portfolio increases. Sentence-maximizing prosecutors will know this, and will be induced to offer more favorable pleas or drop charges as needed.

If I Love the Legal System So Much, Why Don’t I Marry It?

As a consultant, my job is simple: increase mean, decrease variance. In order to do this, I need to have some expertise in how the criminal justice system (CJS) works. I think the best way to understand the CJS is to break the system up into critical component parts. Each component part involves interactions between agents, and each agent has goals. Using court data, we can refine and parameterize our agent models until we have a predictive service to offer our clients. Many of the conclusions of this type of analysis are surprising – for example, the finding that judges are better adjudicators than most people believe. I frequently point out that the CJS today doesn’t seem wildly different from the CJS most reformers pine for.

This should not be construed to mean I think the CJS is optimal. Rather, I think judges and prosecutors behave basically like the types of agents you would want in an optimal CJS. The real problem isn’t with the adjudicators, it’s with the arresting authorities and the laws. The arresting authorities have an easy enough fix in principle; we could just tie their pay to their successful prosecution rate (or any derivative of this plan). Would this fix magically give us an optimal CJS? No, but it would likely help fight the pervasive over-arresting of blacks by police. The more fundamental problem is the law generally. The optimal CJS is that which maximizes aggregate utility; by backward induction the optimal set of laws is that which maximizes aggregate utility.

I submit that the current set of laws is nowhere close to the utility maximizing set. While I think many laws are undesirable and should be repealed, the worst feature of our legal regime seems to be the given penalties for breaking most laws. We all know long prison sentences are overrated; the wise Alex Tabarrok prefers this alternative (do read his entire post):

I favor more police on the street to make punishment more quick, clear, and consistent. I would be much happier with more police on the street, however, if that policy was combined with an end to the “war on drugs”, shorter sentences, and an end to brutal post-prison policies that exclude millions of citizens from voting, housing, and jobs.

I suspect such a policy regime would move us unambiguously closer to the optimal regime (i.e. more deterrence and more utility), but it neglects the problem of police behavior. Presumably, the social costs would still be borne disproportionately by populations of color.

Are Police Better People? Probably Not.

After reviewing the FCGDC data, it seems more likely that disparate black/white CJS outcomes are driven by police rather than judges or prosecutors. It seems the fundamental problem is that blacks are arrested at rates much higher than you would expect based on Fairfax County demographic information. There are a couple of ways to think about this. Perhaps (1) police officers tend to arrest blacks at a lower level of confidence than whites with respect to suspected guilt. Perhaps (2) police arrest whites and blacks at comparable confidence levels but they target relatively black neighborhoods. Alternatively, (3) black populations may be associated with more crimes per capita than white populations. Given the relationship between income and criminality, it seems likely that the third option may have some truth – but it seems intuitively unlikely that this can account for the vast disparity between white and black arrest rates.

On the other hand, imagine that police officers maximize arrests at a certain level of confidence. Imagine that they have no bias against any particular group. In a highly stylized setting where a representative police officer can choose between patrolling two neighborhoods, identical in every way, he will be indifferent. Now suppose one neighborhood is somewhat poorer and the average inhabitant of that neighborhood is somewhat more likely to be involved in criminal activity. If this police officer is the ideal social agent (i.e. only cares about enforcing society’s laws), he will focus disproportionately on the poorer neighborhood. Specifically, he will patrol the poorer neighborhood until the marginal benefit of search (in arrests) equals the marginal cost (in time); at that point he will be indifferent between patrolling either neighborhood. In this scenario, the majority of arrests will come from the poorer neighborhood. If we assume the relative population size of the two neighborhoods is large compared to the size of the police force, we should expect a vast majority of arrests to come from the poorer neighborhood. If we add enough assumptions, we can create a scenario where the police behavior in Fairfax County today is “socially optimal” under the dubious meta-assumption that the Code of Virginia is optimally designed.
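A toy version of this patrol story can be written down directly. Everything specific here is an assumption for illustration: the square-root arrest function, the 1.5x crime density, the 40 patrol hours, and the function names.

```python
import math


def allocate_patrol_hours(total_hours, crime_density):
    """Split hours so marginal arrests per hour are equal across neighborhoods.

    Hypothetical functional form: arrests_i = density_i * sqrt(hours_i).
    Equating the marginal products density_i / (2 * sqrt(hours_i)) gives
    hours_i proportional to density_i ** 2.
    """
    weights = [d ** 2 for d in crime_density]
    return [total_hours * w / sum(weights) for w in weights]


def expected_arrests(hours, crime_density):
    """Arrests per neighborhood under the same assumed square-root form."""
    return [d * math.sqrt(h) for d, h in zip(crime_density, hours)]


# Assumed inputs: the poorer neighborhood (index 1) has 1.5x the crime density.
density = [1.0, 1.5]
hours = allocate_patrol_hours(40, density)
arrests = expected_arrests(hours, density)
poorer_share = arrests[1] / sum(arrests)

print(f"hours per neighborhood: {hours[0]:.1f}, {hours[1]:.1f}")
print(f"share of arrests from the poorer neighborhood: {poorer_share:.0%}")
```

Under this particular functional form, a 50% difference in crime density turns into roughly a 70/30 split in arrests, with no bias anywhere in the officer's objective. Other functional forms would give different splits.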

Can we test to see if reality approximates the story above? Yes. We already know that blacks and whites face comparable trial outcomes when controlling for income, so it’s unlikely that the average black defendant is more guilty than the average white defendant (i.e. the fact bundle against black defendants seems as strong as the bundle against white defendants on average). So what? The important takeaway from this fact is that one group doesn’t seem to commit crimes more conspicuously or hide crimes more adeptly; framed differently, it isn’t easier to search for either black or white criminals, although the density of criminals in one area may be relatively higher than that of another area. This makes reality look somewhat similar to the theoretical world outlined above. We also know that blacks are relatively more likely than whites to have charges against them dropped. This supports police behavior hypothesis (1): police arrest blacks at lower confidence levels than whites. Reviewing the FCGDC police data itself, we find additional support for hypotheses (1) and (2).

With respect to the data, average defendant win-rates associated with individual police officers are distributed fairly normally (we only looked at officers with at least 50 trials); but a histogram alone doesn’t tell us too much in this case. Maybe police arrest everyone at 50% confidence levels, and variation in defendant win-rates is more about whether the individual police officer is in the short run or the long run. This is analogous to flipping a fair coin some number of times; over 50 trials, you may see a relatively large number of heads turn up. Over 1,000,000 trials, the Heads/Tails ratio should settle down at 1:1. Actually, we find that defendant win-rate by police officer doesn’t change much over time. A much more plausible story is that individual police officers arrest defendants at individual confidence levels. We also find substantial differences in black/white defendant win-rates across a number of police officers.
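The coin-flip analogy can be quantified. If every officer really arrested at the same 50% confidence level, observed win-rates would differ only by sampling noise, and that noise shrinks with the number of trials. The sketch below uses the standard binomial standard error with a normal approximation; the function name is hypothetical and the trial counts are just the ones mentioned above plus an intermediate value.

```python
import math


def win_rate_standard_error(true_rate, trials):
    """Standard error of an observed win-rate over `trials` independent cases."""
    return math.sqrt(true_rate * (1 - true_rate) / trials)


for trials in (50, 1_000, 1_000_000):
    se = win_rate_standard_error(0.5, trials)
    # Roughly 95% of observed win-rates should fall within 1.96 standard errors.
    print(f"{trials:>9} trials: 0.5 +/- {1.96 * se:.4f}")
```

At 50 trials the noise band is wide, so a single snapshot can't separate luck from a genuinely different arrest threshold. That is why the stability of each officer's win-rate over time is the more informative fact.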

To sum up, even assuming the unexpectedly high level of black arrests is purely a function of crime density in poorer areas:

  • Individual police officers make arrests at dramatically differing levels of confidence.
  • A number of police seem to arrest blacks at lower levels of confidence than whites.

This is about the most charitable view of the arresting authorities one could reasonably give and it doesn’t exactly paint the police in a very flattering light.

Are Judges Better People? Probably.

In my first post, I wrote about data from the Fairfax County General District Court (FCGDC) and how it seems to support the view that courts themselves should not be the primary target of the typical criminal justice reformer. This should not be misconstrued as evidence that courts are models of well-designed institutions that we should copy whenever possible. The FCGDC data provides support for the view that prosecutors are extremely effective at giving adjudicating authorities what they want, and by a stroke of fortune the adjudicating authorities want things like accuracy and procedural discipline instead of things like apartheid. In other words, the FCGDC doesn’t seem to be a particularly robust institution, but nevertheless it happens to work surprisingly well. In this post, I’ll give a few reasons why I think that’s true.

If the story I’ve outlined is correct, judges are basically benevolent dictators. This is exceedingly rare in politics; why does it make sense in the courtroom?

From “‘Ideology’ or ‘Situation Sense’? An Experimental Investigation of Motivated Reasoning and Professional Judgment” by Dan Kahan, David Hoffman, Danieli Evans, Neal Devins, Eugene Lucci, and Katherine Cheng:

The study involved a sample of sitting judges (n = 253), who, like members of a general public sample (n = 800), were culturally polarized on climate change, marijuana legalization and other contested issues. When the study subjects were assigned to analyze statutory interpretation problems, however, only the responses of the general-public subjects and not those of the judges varied in patterns that reflected the subjects’ cultural values. The responses of a sample of lawyers (n = 217) were also uninfluenced by their cultural values; the responses of a sample of law students (n = 284), in contrast, displayed a level of cultural bias only modestly less pronounced than that observed in the general-public sample.

The key takeaway from this study is that judges are less susceptible to cognitive biases than members of the public. The authors attribute this to legal training, and their results do seem to support this story; law students do worse than judges and members of the public do worse than law students. I wonder how much of the story can be explained by IQ alone (judges have higher IQs than law students, law students have higher IQs than the general population); it would be interesting to see the study redone testing additional high-IQ groups without formal legal training (sociologists, engineers, economists, etc.). At any rate, the results are interesting.

Obviously the study results are obtained by judges responding to an anonymous, low-stakes survey. But the results from the FCGDC dataset support the view that judges abstain from motivated reasoning even in relatively high-stakes trials. Given that few formal institutions constrain judge behavior (in practice they are basically free to adjudicate however they want), what can account for a judge’s style? Probably the same things that prevent people from producing low-quality work in general – it’s embarrassing, for one. Economists often note that the bureaucracy, for example, works better than you would expect assuming all of its agents are highly self-interested. The same seems to apply to judges.

What Economics Can Teach Us About Criminal Justice Reform

I. What Does Reform Mean?

The Wikipedia page “Criminal justice reform in the United States” describes goals typical of reform advocates:

Criminal justice reform in the United States is a type of reform aimed at fixing perceived errors in the criminal justice system. Goals of such reform include decreasing the United States’ prison population and reducing prison sentences and eliminating mandatory minimum sentences for low-level drug offenders.

In other words, reformers want to change trial output. The natural questions to ask: How should we change inputs to reach the desired output? Is it better to just mandate a certain mix of output? How do courts even work? Before looking at trial output and making claims about all the ways courts can fail, we should first consider what the output of a well-functioning court would actually look like.

To paraphrase Gordon Tullock (from Trials on Trial, I think), in order to protect ourselves against theft, we can invest in heavy doors and strong locks, or in courts and police. Tullock’s point is that there are many ways to deter behavior and the legal system is only one method. We all already know that the optimal level of theft is not zero, so our investment in a legal system should be related to the costs it takes to maintain it and the benefits it confers.

Most criminal justice reformers hold specific views about the optimal reform plan – what should be illegal, what the correct sentence is for each crime, whether first time criminals should be treated differently, etc. I won’t really discuss any of that in this post, although my personal belief is that the optimal reform plan is the utility-maximizing reform plan. Today I’ll focus on the less controversial stuff. Most everyone agrees that the perfect court would have two basic features:

  • Perfect accuracy
  • Zero cost

Obviously this isn’t realistic, so a good second-best candidate is “a high level of accuracy given an acceptable level of cost.” Essentially, the marginal benefit of an additional trial should equal the marginal cost – and we should allocate resources across all crime prevention options until we reach the lowest attainable level of crime given the resources we’re willing to commit.

The most important feature seems to be that we shouldn’t be able to predict trial outcomes from traits that are uncorrelated with crimes. On the other hand, if certain traits are correlated with criminal behavior we should be able to predict trial outcomes given that information. I submit that the above is what virtually every criminal justice reformer wants.

II. Are Courts Optimal Adjudicators?

Most people will say obviously not. Reformers frequently cite federal and state incarceration statistics to argue that the legal system is plagued by institutional racism, but this is a non-answer. The legal system may well be plagued by institutional racism, but are the courts? Our legal system has two primary functions: arresting and adjudicating. How well do our adjudicating authorities adjudicate?

Using data from misdemeanor cases from the Fairfax County General District Court (FCGDC), we can begin to seriously consider that question. Before we do that, we’ll need to agree on best-case behavior for agents involved in court proceedings. Our assumptions about optimal courts suggest that adjudicators should only care about accurately determining whether a defendant is innocent or guilty. Prosecutors should be strict sentence maximizers subject to a budget constraint (to the extent that they don’t care about sentences, they are free to discriminate according to irrelevant tastes for certain defendants). Prosecutors are free to use plea bargaining as a sentence maximization tool. Defense attorneys should maximize win-rates subject to a defendant-imposed budget constraint. Defendants maximize utility and are risk-averse.

What are some testable implications of these assumptions?

  1. Prosecutors allocate resources across all cases to maximize win-rates; trial time is scarce given the caseload so most cases will be resolved through plea bargaining.
  2. Prosecutors control the cases that go to trial through the use of plea bargaining such that their win-rate at trial is very high.
  3. Traits not related to guilt or innocence will not be able to predict guilt or innocence at trial.
  4. Income will be related to plea and nolle prosequi outcomes.

Implication (1) is so obvious and widely known it hardly needs to be proven. According to the New York Times, about 94% of state cases are resolved through plea bargaining.[1] According to the FCGDC data, (2) is also correct. Under 10% of cases were resolved with outcomes of “Dismissed” or “Not Guilty” – this number is roughly identical for defendants represented by public defenders or private attorneys. Initially, many people are surprised by the equality between the two attorney types; but remember, prosecutors only let cases with high win probabilities go to trial. If there is a quality difference between public defenders and private attorneys it would be seen in “Plea” and “Nolle Prosequi” outcomes. I’ll write more about the differences between public defenders and private attorneys in another post.

Implication (3), somewhat more surprisingly, is also correct. Looking only at defendants represented by public defenders (where all defendants are indigent, so income differences across racial groups are held roughly constant), racial groups generally have identical outcomes (I looked at dismissals, not guilty verdicts, nolle prosequis, guilty in absentia verdicts, and plea bargains). The exception is Hispanic defendants, who were less likely to have nolle prosequi and not guilty outcomes but more likely to be found guilty in absentia (Hispanics are the only group associated with guilty in absentia outcomes).

Turning to defendants with private representation, you see the appearance of racial bias throughout the data, with the notable (and predictable) exception of not guilty and dismissal outcomes. Either the courts only exhibit racial bias toward defendants who hire private attorneys, or something else is driving the outcomes. Once I add a crude income control for defendants, based on the median income of the town or city they live in, income has a statistically significant effect on every outcome where racial bias is apparent and no effect on not guilty or dismissal outcomes.

III. What’s Causing the Relatively High Rates of Black Incarceration?

Certainly part of the problem is the relatively low incomes associated with much of the black population in the United States. Less income means fewer inputs for trial, less ability to burn prosecutorial resources, and ultimately fewer plea bargains. But this can’t explain the entire problem. The more likely cause? The arresting authorities. In Fairfax County (and presumably many other places) blacks make up a much larger percentage of criminal defendants than you would expect. Among public defender-represented defendants, blacks make up around 39% of the total. Among defendants with private attorneys, about 23%. Blacks make up about 10% of the population of Fairfax County.
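To put the shares quoted above on a common scale, it helps to divide each defendant share by the population share. This is a minimal sketch using only the three rounded percentages from the paragraph above; the function name is made up for illustration.

```python
def representation_ratio(share_of_defendants, share_of_population):
    """How many times over a group appears among defendants relative to its population share."""
    return share_of_defendants / share_of_population


population_share = 0.10  # blacks as a share of Fairfax County's population (from the text)
defendant_shares = {
    "public defender": 0.39,   # share among PD-represented defendants (from the text)
    "private attorney": 0.23,  # share among privately represented defendants (from the text)
}

for representation, share in defendant_shares.items():
    ratio = representation_ratio(share, population_share)
    print(f"{representation}: {ratio:.1f}x the population share")
```

Because the inputs are rounded, the ratios are only approximate, but both are well above 1.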

In a forthcoming post, I’ll consider the black arrest rate in greater detail, using more data from the FCGDC dataset. For now, the key thing to take away from this post is that courts themselves (at least in Fairfax County) are unlikely to be a major driver of the disparate incarceration outcomes between blacks and whites. Police behavior is a much more likely suspect.