































Ethan Cohen-Cole,^1 Steven Durlauf, Jeffrey Fagan, Daniel Nagin
Abstract

While issues of deterrence lie at the heart of criminal justice policy, there are important contexts where studies of deterrence effects have failed to provide anything close to a scholarly consensus. A principal example is laws on capital punishment. Proponents argue that such laws prevent murders because potential criminals fear such strong punishment. Opponents argue that deterrence arguments do not apply in these circumstances and/or that the statistical analyses suffer from grave flaws. Each side can cite many statistical studies in support of its claims. This paper presents a methodology by which one can integrate the various studies into a single coherent analysis. We use a methodology generally called "model averaging," by which one takes weighted averages of a wide set of possible models of deterrence. Our conclusion is that there is little empirical evidence in favor of the deterrence hypothesis.
^1 Cohen-Cole: Federal Reserve Bank of Boston, 600 Atlantic Avenue, Boston, MA 02210. (617) 973.3294. ethan.cohen-cole@bos.frb.org; Durlauf: Department of Economics, University of Wisconsin, 1180 Observatory Drive, Madison, WI 53706-1393; Fagan: Columbia University School of Law, 435 West 116th Street, Room 634, Box D-18, New York, NY 10027; Nagin: H. John Heinz III School of Public Policy & Management, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890. The Department of Justice's National Institute of Justice has provided financial assistance. We are grateful for research assistance provided by Jon Larson. The views expressed in this paper are solely those of the authors and do not reflect official positions of the Federal Reserve Bank of Boston or the Federal Reserve System, and do not necessarily reflect the official position or policies of the U.S. Department of Justice.
Assumptions about the appropriate data, control variables, model specification, and the like are made by the researcher and can have major effects on the conclusions of a particular data analysis. The existing research on this topic comes to sufficiently differing conclusions, predicated upon one or more underlying assumptions, to call into question the ability of any single model to explain the impact of execution laws. Such dependence on the specifics of research design, from data cleaning to aggregation to model choice, forms the basis for the use of averaging techniques. That is, since relatively minor variations in model or variable choice can lead to dramatic changes in conclusions, one suspects that incorporating the information content of all of these models would yield conclusions in which policymakers could be more confident. This paper describes a method which accounts for model uncertainty and places the results in a form that is easily interpretable by policymakers.

More generally, the structure of model averaging may be understood as follows. Suppose one wishes to estimate the effect of an execution variable in some deterrence regression. Conventional statistical methods may condition on a single model specification; in the model averaging approach, one attempts to eliminate conditioning on a specific model. To do this, one specifies a space of possible models $M$. The true model is unknown, so from the perspective of the researcher, each model will have some probability of being true. These model probabilities will depend both on the prior beliefs of the researcher as well as on the relative goodness of fit of the different models given the available data $D$; hence each model $m$ will have a posterior probability $\mu(m \mid D)$. These posterior probabilities allow us to average the model-specific estimates:

$$\hat{\delta} = \sum_{m \in M} \hat{\delta}_m \, \mu(m \mid D).$$

Using this methodology, we estimate the deterrent effect of capital punishment. Our finding is that there is little evidence of a deterrent effect.
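To make the averaging step concrete, here is a minimal Python sketch of the weighted-average calculation; the model-specific estimates and posterior weights are hypothetical placeholder numbers, not values from this paper.

```python
import numpy as np

# Hypothetical model-specific estimates of the deterrence parameter delta
# and their posterior model probabilities mu(m | D); placeholders only.
delta_hat_m = np.array([-0.12, 0.03, 0.08, -0.01])   # one estimate per model m
mu_m_given_D = np.array([0.10, 0.45, 0.30, 0.15])    # posterior weights, sum to 1

# Model-averaged estimate: delta_hat = sum_m delta_hat_m * mu(m | D)
delta_hat = np.sum(delta_hat_m * mu_m_given_D)
print(f"model-averaged estimate: {delta_hat:.4f}")
```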
Efforts to change the policy landscape are ongoing, and policymakers continue to struggle with interpreting the results of conflicting studies. Thirty-eight states currently have a death-penalty law. The fundamental problem that underlies the disparate findings on the deterrent effect of death sentencing is that individual studies reflect specific assumptions on the part of the researcher about the appropriate data, control variables, model specification, and so on. These assumptions can reflect an expression of a possible deterrence explanation (e.g., using incarceration rates as a control), and they can have major effects on the conclusions of a particular data analysis. However, one is hard pressed to make a compelling argument that inclusion of a given variable over another is the crucial decision in the formation of a deterrence study. This is based on the fact
assumptions used in setting up the econometric study. In the legal field, Baldus and Cole (1975) and Bowers and Pierce (1975) both argued in the Yale Law Journal that the results were unfounded. Though Ehrlich's work has been challenged on many grounds, his principal findings of a deterrent effect within a rational choice model have been supported by many.

Our perspective on the literature is that it is characterized principally by studies that ask whether a regressor that proxies for the probability of execution possesses a statistically significant coefficient. Essentially these are all recharacterizations of Ehrlich's original rational choice model interpreted as some type of econometric specification. If the standard of a coefficient's significance is passed, then one often sees some discussion of the magnitude of the coefficient (i.e., the number of murders deterred or caused) and a discussion of alternate models. There are many lines of dispute about the construction of these models, and there have been a number of studies on sensitivity adjustments to this basic setup (Isaac Ehrlich and Zhiqiang Liu, 1999; Edward E. Leamer, 1983; Michael McAleer and Michael R. Veall, 1989; Walter S. McManus, 1985). Though in principle the lines of debate concern the end object, whether deterrence is effective, in practice there is significant dispute about the construction of an appropriate "model". That is, while Ehrlich's concept, and the rational choice framework in general, is instructive in theory, in practice one can justify conceptually a very wide range of appropriate models. As a result, the field has produced a wide variety of methods employed to support or refute Ehrlich's original conclusion.

For the sake of explication, we identify five factors in the capital punishment literature that have been in dispute. These are not claimed to be exhaustive, but are employed in order to illustrate both the range of the debate and to provide some indication of the connection between a researcher's choice of assumptions and his/her concomitant conclusions. The exact nature of these disputed factors is not of particular relevance at this stage. Table 1 serves simply to illustrate that, whatever the form of statistical machinery brought to bear, there have been opposing conclusions in each case.
Table 1: Examples of Specification Variation in Capital Punishment Literature

The disputed factors are Controls, Functional Form, Data Stationarity, Time Period, and Data Choice; for each factor, the literature contains studies finding some deterrence and studies finding no deterrence. For example, Mocan and Gittings (2001) use pardon data as controls, and Cover and Thistle (1988) assume non-stationarity of the data.
As we can see, the choice of functional form, control variables, and so on, and the particular assumptions made, are non-trivial for the resulting findings. In Ehrlich's original studies he uses a log-log form for his time-series regression and finds evidence of deterrence. Passell and Taylor (1977), using a close copy of Ehrlich's data, change the functional form to a linear one and find no deterrence. There are logical arguments to be made for both forms, but one hopes that important policy decisions do not rest on a relatively esoteric decision about whether to employ a linear or a logarithmic-transformed series of data. Other issues arise in the use of time-series data.
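Returning to the functional form point above, the following sketch estimates both a linear and a log-log murder-rate regression on the same data; the synthetic data and variable names are illustrative assumptions, not Ehrlich's or Passell and Taylor's actual series.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
exec_prob = rng.uniform(0.01, 0.30, n)      # hypothetical execution probability
controls = rng.normal(size=(n, 2))          # hypothetical control variables
murder_rate = rng.uniform(2.0, 12.0, n)     # hypothetical murder rate per 100,000

def ols(y, X):
    """Return OLS coefficients for y on X (intercept prepended)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Linear specification: murder rate regressed on levels of the regressors.
beta_linear = ols(murder_rate, np.column_stack([exec_prob, controls]))

# Log-log specification: log murder rate on log execution probability (controls in levels).
beta_loglog = ols(np.log(murder_rate), np.column_stack([np.log(exec_prob), controls]))

print("execution coefficient, linear specification:", beta_linear[1])
print("execution elasticity, log-log specification:", beta_loglog[1])
```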
^2 Katz et al. find that prison conditions are a deterrent, but the death penalty is not.
If a given study finds a deterrent effect under one group of model specifications but no effect under another set, what should a policymaker conclude?
Literature on prior efforts to handle model uncertainty

Some authors have attempted to address this issue in the past. We discuss a few such efforts here. First, the topic was addressed with a technique called Extreme Bounds Analysis (Edward E. Leamer, 1983). Applying Extreme Bounds Analysis to a coefficient estimate involves estimating a set of alternate model specifications and seeing how the coefficient estimate changes. If the sign of the coefficient is not constant, i.e. it "flips" across specifications, one concludes that the evidence is "fragile". This strategy suffers from two problems. First, the conclusion of fragility can be influenced by the choice of coefficient that one considers; that is, if one were to choose another coefficient, the results for the original choice may change. Second, the procedure fails to fully integrate information across the complete set of models that are analyzed, as it does not account for goodness-of-fit differences across models. Moreover, extreme bounds analysis concludes that evidence is fragile even when, out of 1000 regressions, 999 produce a positive coefficient estimate and 1 produces a negative estimate. Brock, Durlauf and West (2003) in fact show that extreme bounds analysis implies a special and extreme form of risk aversion if one uses it to guide policy decisions.

Second, McManus (1985) used a precursor of sorts to the method advocated in our paper. McManus applied a Bayesian-style analysis to examine the importance of a researcher's prior views for the results of a deterrence study of capital punishment. He specified five distinct views of the world and posited a method by which such views would be implemented in a deterrence study. The intuition behind his study is straightforward and appropriate; he found that even by specifying a very small number of models, the results can be quite varied.^3 This method had two problems. First, the number of models in his study is quite limited, and thus subject to a similar type of critique to the one we have leveled at the remainder of the deterrence literature. Though, in his defense, McManus is quite disciplined about including varied models. Second, in the past 20 years, the statistical machinery to integrate models in a Bayesian context, now often called model averaging, has been greatly developed.
^3 McManus called the models in his paper "beliefs". Since we use the term beliefs to discuss priors on variable inclusion, discussed below, we match terminology by using "models" here.
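As a rough illustration of the extreme bounds logic described above, the sketch below re-estimates a focus coefficient over every subset of a set of candidate controls and checks whether its sign ever flips; the data and variable names are hypothetical placeholders.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 6
controls = rng.normal(size=(n, k))                        # hypothetical candidate controls
execution_rate = rng.normal(size=n)                       # hypothetical focus regressor
murder_rate = 0.5 * controls[:, 0] + rng.normal(size=n)   # hypothetical outcome

def focus_coefficient(y, focus, Z):
    """OLS coefficient on the focus regressor, given controls Z (possibly empty)."""
    if Z.size:
        X = np.column_stack([np.ones(len(y)), focus, Z])
    else:
        X = np.column_stack([np.ones(len(y)), focus])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Estimate the focus coefficient under every subset of the candidate controls.
estimates = []
for r in range(k + 1):
    for subset in combinations(range(k), r):
        estimates.append(focus_coefficient(murder_rate, execution_rate,
                                           controls[:, list(subset)]))

estimates = np.array(estimates)
fragile = (estimates.min() < 0) and (estimates.max() > 0)  # sign flips across specifications
print(f"{len(estimates)} specifications estimated; sign flips: {fragile}")
```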
A growing literature has emerged on this method, known as model averaging. The basic concept is not far removed from McManus's. It seeks to avoid researcher bias in the determination of variable choice, specification choice, etc. by including a large set of possibilities in a single analysis. The earliest discussion of model averaging is Leamer (1978), but the approach seems to have been dormant until the mid-1990s; Draper (1995), Raftery (1995), and Raftery, Madigan and Hoeting (1997) apparently initiated recent interest. Useful introductions are available in Wasserman (1996) and Hoeting, Clyde, Madigan and Raftery (1999). Model averaging has been advocated and employed in Brock and Durlauf (2001a), Brock, Durlauf and West (2003), Fernandez, Ley and Steel (2001), Sala-i-Martin, Doppelhofer and Miller (2004), and Masanjala and Papageorgiou (2004b). We continue in the next section to explain the nature and implementation of model averaging.
As the prior section illustrates, the existing research on this topic comes to sufficiently differing conclusions, predicated upon one or more underlying assumptions, to call into question the ability of any single model to explain the impact of deterrence laws. Such dependence on the specifics of research design, from data cleaning to aggregation to model choice, forms the basis for the use of averaging techniques. That is, since relatively minor variations in model or variable choice can lead to dramatic changes in conclusions, one suspects that incorporating the information content of all of these models would yield conclusions in which policymakers could be more confident. This section describes a method which accounts for model uncertainty and places the results in a form that is easily interpretable by policymakers.

This project intends to account explicitly for model uncertainty in the analysis of deterrence laws. We will follow the model averaging literature as our mechanism for dealing with model uncertainty. We will adapt the general framework that has been developed in the statistics literature; however, we will use standard frequentist estimators.^4 This frequentist approach to model averaging is described in Sala-i-Martin, Doppelhofer and Miller (2004) and Brock, Durlauf and West (2003).

The basic idea of model averaging is straightforward. Consider an object of interest – in this case the difference in crime rates under alternate laws – and take a weighted average of its model-specific estimates, where the weight on each model is its posterior probability
^4 Within the statistics literature, model averaging is usually done in Bayesian contexts. A full discussion of the difference between Bayesian and frequentist approaches is beyond the scope of this paper.
conditional on the available data, D. These posterior probabilities allow us to average the model-specific estimates:

$$\hat{\delta} = \sum_{m \in M} \hat{\delta}_m \, \mu(m \mid D).$$

This estimator incorporates the information in each model-specific estimate $\hat{\delta}_m$ and weights this information according to the likelihood that the model is the correct one. As suggested above, in the case that a single model is true, it will receive a weight of 1. Brock et al. (2003) argue that the strategy of constructing posterior probabilities that are not model-dependent is the appropriate one when the objective of the statistical exercise is to evaluate alternative policy questions, such as whether to implement capital punishment in a state. Notice that this approach does not identify the "best" model; instead, it studies the effect of the policy, i.e. the parameter $\delta$. Thus, while the exercise could in theory find a single model with a weight of one, in practice a finding that a given model, $m^*$, within some space $M$, has the highest conditional probability of describing the data is not a recommendation to select that model.
Notice that averaging across models means that a key role is played by the posterior model probabilities. Using Bayes rule, the posterior probability may be rewritten as
$$\mu(m \mid D) = \frac{\mu(D \mid m)\,\mu(m)}{\mu(D)} \propto \mu(D \mid m)\,\mu(m). \quad (1)$$
The calculation of posterior model probabilities thus depends on two terms. The first, $\mu(D \mid m)$, is the probability of the data given a model. Raftery (1996) has developed a proof to illustrate that the probability of a model given a dataset can be calculated using the ratio of a given model's likelihood to the sum of the likelihoods of all models in the space $M$.^7 This derivation allows us to use the likelihood of a given model for $\mu(D \mid m)$. The second term, $\mu(m)$, is the prior probability assigned to model $m$. Hence, computing posterior model probabilities requires specifying prior beliefs on the probabilities of the elements of the model space $M$ (see below).
^7 A full discussion of the derivation of
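A minimal sketch of the posterior-probability calculation in equation (1) follows, using each model's maximized likelihood for μ(D | m) as discussed above; the log-likelihood values and the equal priors are hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-model maximized log-likelihoods and prior probabilities mu(m).
log_lik = np.array([-512.3, -510.8, -515.0])   # placeholder values, one per model
prior = np.array([1 / 3, 1 / 3, 1 / 3])        # equal priors over the model space M

# mu(m | D) proportional to mu(D | m) * mu(m); work in logs for numerical stability.
log_post = log_lik + np.log(prior)
log_post -= log_post.max()                     # guard against underflow
posterior = np.exp(log_post)
posterior /= posterior.sum()                   # normalize over all models in M

print("posterior model probabilities:", posterior)
```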
Applying the concepts above, one can compute the uncertainty, i.e. the variance, associated with an estimated policy effect when one avoids conditioning on knowing the true model. This variance is written as:

$$\operatorname{Var}\!\big(\hat{\delta} \mid D\big) = \sum_{m \in M} \mu(m \mid D)\operatorname{Var}\!\big(\hat{\delta}_m \mid D, m\big) + \sum_{m \in M} \mu(m \mid D)\big(\hat{\delta}_m - \hat{\delta}\big)^2.$$

This formula illustrates how model uncertainty affects the overall uncertainty one should attach to the estimate. The first term, $\sum_{m \in M} \mu(m \mid D)\operatorname{Var}(\hat{\delta}_m \mid D, m)$, is a weighted average of the variances for each model and is effectively the same construction as the estimate itself. The second term, however, reflects the variance across models in $M$; this reflects the fact that the models are themselves different. This second term, $\sum_{m \in M} \mu(m \mid D)(\hat{\delta}_m - \hat{\delta})^2$, in some sense captures how model uncertainty increases the variance associated with a parameter estimate relative to conventional calculations. To see why this second term is interesting, note that it grows with the disagreement across models about the parameter; it measures the additional uncertainty (as measured by the variance) that exists with respect to $\delta$.

The importance of this last point is that model averaging not only permits policymakers to account for differences in predictions of the direction of the effect of capital punishment, but it also allows policymakers to have a greater understanding of the errors in these predictions of the effects of changes in these policies. Once such additional variance has been accounted for, a finding of a significant effect (positive or negative) of deterrence laws allows a policymaker to have that much more confidence in her decisions.
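The variance decomposition can be computed directly from the model-specific estimates, their within-model variances, and the posterior weights; in the sketch below all numbers are hypothetical placeholders.

```python
import numpy as np

# Hypothetical model-specific estimates, their within-model variances,
# and posterior model probabilities; placeholders only.
delta_hat_m = np.array([-0.12, 0.03, 0.08, -0.01])
var_m = np.array([0.004, 0.006, 0.005, 0.007])
mu = np.array([0.10, 0.45, 0.30, 0.15])

delta_hat = np.sum(mu * delta_hat_m)                   # model-averaged estimate

within = np.sum(mu * var_m)                            # weighted average of within-model variances
between = np.sum(mu * (delta_hat_m - delta_hat) ** 2)  # spread of estimates across models
total_var = within + between

print(f"within-model {within:.4f}, across-model {between:.4f}, total {total_var:.4f}")
```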
As we discussed in the section above, it appears that policy conclusions from this literature depend in part on the construction of datasets used. This is important in this context as much of the debate over the effect of capital punishment has included differences over the choice of data construction (see Table 2, Panels D and E for example).
been used. The data is publicly available from the Department of Justice's Bureau of Justice Statistics, the FBI's Uniform Crime Reports, and the Bureau of the Census. Donohue and Wolfers (2005) replicate many of the results in Dezhbakhsh et al., and we use data provided by Dezhbakhsh, Rubin and Shepherd in this section to replicate and discuss the results from both papers.

Both papers estimate some version of a deterrence regression drawn from Ehrlich (1977). That is, the murder rate is a function of three principal deterrence variables: the probability of arrest, the probability of receiving a death sentence conditional on being arrested, and the probability of being executed conditional on receiving a death sentence. Of course, all of the specifications have various demographic and economic control variables added. They include controls for the aggravated assault rate, the robbery rate, the population proportions of 10-19 year olds and 20-29 year olds, the percentage of blacks, the percentage of non-black minorities, the percentage of males, the percentage of NRA members, real per capita income, real per capita income maintenance payments, real per capita unemployment insurance payments, and the population density. Thus, the principal regression is:

$$\frac{Murders_{c,s,t}}{pop_{c,s,t}} = \beta_0 + \beta_1 \frac{HomicideArrests_{c,s,t}}{Murders_{c,s,t}} + \beta_2 \frac{DeathSentences_{s,t}}{Arrests_{s,t}} + \beta_3 \frac{Executions_{s,t}}{DeathSentences_{s,t}} + \beta_4 Assaults_{c,s,t} + \beta_5 Robberies_{c,s,t} + \beta_6 Demographics_{c,s,t} + \beta_7 economy_{c,s,t} + \beta_8 NRAmembers_{s,t} + county_c + time_t + \varepsilon_{c,s,t}$$
They use first-stage regressions to estimate their variables of interest as follows:

$$\frac{HomicideArrests_{c,s,t}}{Murders_{c,s,t}} = \psi_0 + \psi_1 \frac{PolicePayroll_{c,s,t}}{pop_{c,s,t}} + \sum_t \psi'_t\, time_t + \varepsilon_{c,s,t} \quad (5)$$

$$\frac{DeathSentences_{s,t}}{Arrests_{s,t}} = \psi_0 + \psi_1 \frac{Murders_{c,s,t}}{pop_{c,s,t}} + \psi_2\, JudicialExpense_{s,t} + \psi_3\, PartisanInfluence_{s,t} + \psi_4\, Admissions_{s,t} + \sum_t \psi''_t\, time_t + \varepsilon_{c,s,t} \quad (6)$$

$$\frac{Executions_{s,t}}{DeathSentences_{s,t}} = \psi_0 + \psi_1 \frac{Murders_{c,s,t}}{pop_{c,s,t}} + \psi_2\, JudicialExpense_{s,t} + \psi_3\, PartisanInfluence_{s,t} + \sum_t \psi'''_t\, time_t + \varepsilon_{c,s,t} \quad (7)$$
In these cases, we have followed the notation from Donohue and Wolfers. The variable $pop_{c,s,t}$ indicates the population in county c, state s, and year t, divided by 100,000. PartisanInfluence is the Republican presidential candidate's vote share in the most recent election, and Admissions is the prison admission rate. Note that some of the key variables are estimated at the state level (the subscript c is omitted in these cases).
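To fix ideas, here is a minimal sketch of the two-stage structure, carrying first-stage fitted values into the second stage; the arrays are synthetic, only two of the three deterrence ratios are included for brevity, and this is not the estimation code used by Dezhbakhsh, Rubin and Shepherd or by us.

```python
import numpy as np

def ols_fit(y, X):
    """OLS with an intercept; returns coefficients and fitted values."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, X1 @ beta

rng = np.random.default_rng(2)
n = 500
# Hypothetical county-year data: instruments and controls are placeholders.
police_payroll = rng.normal(size=n)
judicial_expense = rng.normal(size=n)
partisan_influence = rng.normal(size=n)
controls = rng.normal(size=(n, 3))
arrest_prob = 0.3 * police_payroll + rng.normal(size=n)
exec_prob = 0.2 * judicial_expense + rng.normal(size=n)
murder_rate = rng.normal(size=n)

# First stage: regress each endogenous deterrence ratio on its instruments and controls.
_, arrest_prob_hat = ols_fit(arrest_prob, np.column_stack([police_payroll, controls]))
_, exec_prob_hat = ols_fit(exec_prob,
                           np.column_stack([judicial_expense, partisan_influence, controls]))

# Second stage: murder rate regressed on the fitted deterrence ratios plus controls.
beta, _ = ols_fit(murder_rate, np.column_stack([arrest_prob_hat, exec_prob_hat, controls]))
print("second-stage coefficients:", beta)
```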
^8 Additional information is available in the original text.

Tables 3 and 4 in Dezhbakhsh, Rubin and Shepherd (pp. 362-363) present the results from Equation 3 given six different versions of the variable $Executions/DeathSentences$. Their measures of execution probabilities in the six columns are as follows:

Columns 1 and 4: $$\frac{Executions_{s,t}}{DeathSentences_{s,t-6}} \quad (8)$$

Columns 2 and 5: $$\frac{Executions_{s,t+6}}{DeathSentences_{s,t}}$$

Columns 3 and 6: a multi-year measure, the sum of executions over a window of years around $t$ divided by the sum of death sentences over the corresponding window six years earlier.
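As an illustration of how a measure like the one in columns 1 and 4 can be constructed, the sketch below computes executions in year t divided by death sentences in year t−6 on a small hypothetical state-year panel; the column names and numbers are invented for the example.

```python
import pandas as pd

# Hypothetical state-year panel with executions and death sentences.
df = pd.DataFrame({
    "state": ["TX"] * 10,
    "year": range(1977, 1987),
    "executions": [0, 0, 0, 1, 0, 2, 1, 3, 2, 4],
    "death_sentences": [5, 8, 7, 9, 10, 12, 11, 9, 13, 10],
})

df = df.sort_values(["state", "year"])
# Columns 1 and 4 style measure: executions in year t over death sentences in year t-6.
df["death_sentences_lag6"] = df.groupby("state")["death_sentences"].shift(6)
df["exec_prob"] = df["executions"] / df["death_sentences_lag6"]
print(df[["year", "exec_prob"]])
```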
Columns 1-3 omit observations in which there are no death sentences. Columns 4-6 instead substitute the probability from the most recent year in which there was a death sentence. Donohue and Wolfers' Table 7 (p. 824) repeats the results. While both papers provide a wide variety of other information, we focus on this table as it provides a useful case to illustrate our points.

Table 3 below reproduces the Dezhbakhsh, Rubin and Shepherd results. No computation has been done here; the information has simply been transferred for comparability. The basic point made is that all of the coefficients are negative in sign and most are significant at the 5% level. When translated into a calculated number of lives saved per execution, the estimates range from 19 to 36 persons.

Table 4 below replicates the Donohue and Wolfers innovations. Donohue and Wolfers have graciously provided their data and code to us, and we use these to produce Table 4 in its entirety. They have made minor modifications to the Dezhbakhsh, Rubin and Shepherd data.
^8 It is worthwhile at this stage to point out that DRS use a combination of county and state effects to predict county-level murder rates. We will not discuss the merits (or difficulties) of this type of estimation strategy other than to comment that it will not impact the model averaging exercise that we are conducting.
The control variables that we consider are the aggravated assault rate, the robbery rate, the population proportions of 10-19 year olds and 20-29 year olds, the percentage of blacks, the percentage of non-black minorities, the percentage of males, the percentage of NRA members, real per capita income, real per capita income maintenance payments, real per capita unemployment insurance payments, and the population density. For each of columns 1-6, we report the model- and data-averaged coefficients on the probability of arrest, the probability of a death sentence, and the probability of execution. All 18 of the key model-averaged coefficients in the 6 columns are insignificant. The signs are positive in most cases, which would contradict the Dezhbakhsh, Rubin and Shepherd conclusions, but the standard errors are very large. We also report a calculation of the implied trade-off in terms of net lives saved from each execution. A positive number in this row suggests that an execution produces the social benefit of saving lives by deterring future murders. Each of the six estimated trade-offs is negative and large, but with very large standard errors.

Our next step is to incorporate some of the critiques of Donohue and Wolfers into our model and data averaging techniques.^9 First, we illustrate the results of averaging over only a number of different data possibilities. Donohue and Wolfers suggest three simple data modifications that they show to have a large impact on the Dezhbakhsh, Rubin and Shepherd results (see Table 4). We test this hypothesis by specifying four sets of data: the original Dezhbakhsh, Rubin and Shepherd data, as well as three additional datasets as suggested by Donohue and Wolfers. The three modifications are 1) using a single voting variable, 2) dropping Texas, and 3) dropping California. In the first case, the voting variables are the percentages of statewide votes going to the Republican presidential candidate in each of six elections. This produces six distinct variables. In the cases where we produce results for the first change in data specification, we use only the vote percentage from the most recent election. We assign a prior probability to the Dezhbakhsh, Rubin and Shepherd data of ½ and a probability of
^9 A couple of technical notes are relevant at this stage. First, the development of model averaging techniques to date has been limited to certain types of regression. To use instrumental variables regression in a model-averaged context, there is some outstanding debate about whether one should "select" an optimal first-stage model or use model-averaged coefficients in the second stage. To be consistent with the remainder of the paper, we use model-averaged coefficients in Table 8 to produce our fully model-averaged results. As well, to be able to interpret variance estimates in a model-averaged context, one must use standard OLS techniques.
1/6 to each of the three Donohue and Wolfers datasets, such that the combined probability of the Donohue and Wolfers datasets is equal to ½ as well. We use the baseline specification from Dezhbakhsh, Rubin and Shepherd for both the first- and the second-stage regressions. Table 6 reports the same 18 coefficients as in Table 5, and repeats the net-lives analysis. Again, the results show very large standard errors in all cases.

The subsequent Table 7 combines the methods used in Table 5 and Table 6 such that the averaging takes place over both the second stage as well as over the data. We continue the prior assignment used for the two previous tables. Each of the 4096 models receives equal weight and the data possibilities are weighted at ½, 1/6, 1/6, 1/6. As in Table 5 and Table 6, the results again show very large standard errors.

Finally, Table 8 shows model-averaged coefficients where we allow for a large model space in the first-stage regressions as well. In this case, we specify that for each of the 14 first-stage regressions, we allow for 2^12 unique possible models. In the first-stage regressions, we iterate over either seven or twelve variables in most cases. The variables are police expenditure, judicial expenditure, the assault rate, the robbery rate, state NRA membership, prison admissions, and either one or six voting variables as discussed above. As Dezhbakhsh, Rubin and Shepherd and Donohue and Wolfers have done, we use the predicted values of the variables from the first-stage regressions in our second stage. In our case, instead of using the OLS coefficients in each case, we use model-averaged coefficients for each of the 14 specifications. These new predicted variables are then used in the second-stage regressions. As in Tables 4 and 6, the second-stage regressions are also estimated using the averaging methods described. Again, none of the 18 coefficients is significant. The summary table below provides a quick reference for the exercises conducted.
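A small sketch of how the model-space and dataset weights described above can be combined follows; the candidate variable names and dataset labels are illustrative stand-ins rather than the exact variables used in our estimation.

```python
from itertools import chain, combinations
import numpy as np

# Candidate control variables for the second-stage regression (names illustrative).
candidates = ["assault", "robbery", "age1019", "age2029", "pct_black",
              "pct_minority", "pct_male", "nra", "income", "inc_maint",
              "unemp_ins", "density"]

# Every subset of the 12 candidates defines one model: 2**12 = 4096 models,
# each given equal prior weight.
models = list(chain.from_iterable(combinations(candidates, r)
                                  for r in range(len(candidates) + 1)))
model_prior = np.full(len(models), 1.0 / len(models))

# Prior weights over the four datasets: DRS original 1/2, each D&W variant 1/6.
data_prior = {"drs": 0.5, "single_vote": 1 / 6, "drop_texas": 1 / 6, "drop_california": 1 / 6}

# Joint prior over (dataset, model) pairs used in the averaging exercise.
joint_prior = {(d, i): w * model_prior[i]
               for d, w in data_prior.items() for i in range(len(models))}
assert abs(sum(joint_prior.values()) - 1.0) < 1e-9
print(len(models), "models per dataset;", len(joint_prior), "joint combinations")
```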