Bayesian Moment-Based Inference in a Regression Model with Misclassification Error

We present a Bayesian analysis of a regression model with a binary covariate that may have classification (measurement) error. Prior research demonstrates that the regression coefficient is only partially identified. We take a Bayesian approach which adds assumptions in the form of priors on the unknown misclassification probabilities. The approach is intermediate between the frequentist bounds of previous literature and strong assumptions which achieve point identification, and thus preferable in many settings. We present two simple algorithms to sample from the posterior distribution when the likelihood function is not fully parametric but only satisfies a set of moment restrictions. We focus on how varying amounts of information contained in a prior distribution on the misclassification probabilities change the posterior of the parameters of interest. While the priors add information to the model, they do not necessarily tighten the identified set. However, the information is sufficient to tighten Bayesian inferences. We also consider the case where the mismeasured binary regressor is endogenous. We illustrate the use of our Bayesian approach in a simulated data set and an empirical application investigating the association between narcotic pain reliever use and earnings.


Introduction
In this paper we consider a regression model with a binary explanatory variable that is subject to measurement error: there is some nonzero probability that an observation is classified into the wrong category. Such a model is relevant, for example, for estimating treatment effects when compliance with treatment is not observed. Other examples include measuring the union wage differential (Bollinger 1996, 2001; Card 1996; Freeman 1984), measuring the impact of IT training and certification on earnings (Vakhitova 2006), measuring the impact of disability status on earnings and employment (Kreider and Pepper 2007), or measuring the impact of food stamps (SNAP) on food security or health. Misclassification also occurs in survey data, which are known to suffer from response error (Biemer et al. 1991; Bound et al. 2001). Failing to account for this problem may result in serious bias. If one is willing to impose a strong identifying assumption (for example, assuming that the misclassification rate is known or consistently estimable as in Aigner (1973)), it is possible to consistently estimate the parameters of interest. Some recent models have achieved identification when certain location parameters of the mismeasured variable are identified (Hu and Schennach 2008). Hu (2008) provides an identification approach using instruments and (weak) restrictions on the misclassification rates, whereas Card (1996) uses validation data to obtain estimates of misclassification rates. In the absence of instruments, validation data or tight restrictions on misclassification rates, the parameter of interest is typically no longer point identified.
In certain models, however, it may still be feasible to find identified and informative bounds for the parameter. A derivation of such bounds can be found in Klepper (1988a, 1988b) and Bollinger (1996).
Bounding results are obtained for models without additional assumptions on misclassification.
However, imposing parametric distributional assumptions is typically undesirable; while these may lead to identification, the results are likely to be fragile. Differences in distributions may result in large changes in the estimates. Instead, as is the case for the model considered in this paper, informative bounds can be derived by only assuming that the second moments of the observed variables are finite. Unfortunately, the resulting bounds are often quite far apart. For example, our empirical application bounds the earnings gap associated with narcotic pain reliever use between $7,618 and $3.1 million. Adding the assumption that drug use is not over-reported, the upper bound shrinks to $243,900. This example and earlier work by Bollinger (1996) demonstrate that additional information about the extent of measurement error has a substantial effect on the bounds of the identified region. While this is one form of useful information, in many cases the assumptions necessary for this approach may be difficult to justify. However, researchers may have information about the misclassification rates that can be formalized as a Bayesian prior distribution. In this paper, we examine the link between information in the form of a prior and the implications for posterior distributions and inference.
There is only a small literature on using Bayesian inference in set-identified models. One of the earliest contributions in this area is Erickson (1989). More recently, Poirier (1998) and Moon and Schorfheide (2012) consider a broad variety of models with non-identified parameters. In these models the data are informative about a set of reduced form parameters (we adopt the terminology of Moon and Schorfheide (2012)). It is well known (e.g. Walker 1969; Heyde and Johnstone 1979) that in parametric models the posterior of these parameters is asymptotically normal. In the limit, the posterior distribution and the asymptotic distribution of the maximum likelihood estimator coincide. Rarely, however, are the reduced form parameters themselves of primary interest; we will refer to the parameters of interest as "primary" parameters. In our context, these are the regression parameters that determine the outcome model. In partially identified models, there is a mapping from primary parameters to reduced form parameters which is not one-to-one. From a Bayesian perspective, the posterior distribution of the primary parameters is only partially updated. In fact, Poirier (1998) shows that the (marginal) posterior of the primary parameters is an average of their conditional prior given the reduced form parameters, weighted by the posterior of the latter.
The analysis we propose here is different from previous work such as Moon and Schorfheide (2012) in that we consider likelihoods that are defined by a simple set of moment restrictions.
Thus, we do not impose distributional assumptions. As shown in Van Hasselt and Bollinger (2012), even assumptions such as homoskedasticity can lead to a change in the identified set in the model studied here. Most non-Bayesian treatments of this model share the focus on moments. Rather than utilizing an approximate likelihood as in Liao and Jiang (2010), we incorporate two versions of a semiparametric likelihood that are particularly convenient in the context of moment functions.
Depending on the prior information that the researcher entertains, inference can be based on the Bayesian bootstrap (Rubin 1981; Chamberlain and Imbens 2003), or on the Bayesian exponentially tilted empirical likelihood (BETEL) approach of Schennach (2005). We focus on adding information to the model through the prior distributions on the misclassification rates. This is a natural approach to adding information and complements the approach taken in Bollinger (1996), where (deterministic) information on the measurement error rates was found to tighten the identified region. In the context of our model, we assess how priors change the posterior distribution and hence change inference about the parameter of interest. We extend these results to allow for endogeneity. Here, in particular, we highlight that the prior provides identifying information that results in finite highest posterior density (HPD) intervals.
Partial or set identification requires a careful approach to inference. In practice, many authors (e.g. Bollinger 1996) simply use standard confidence intervals for the estimated upper and lower bounds. There is a substantive difference, however, between inference about the identified set and inference about the parameter itself. For the case of a single parameter, Horowitz and Manski (2000) demonstrate how to construct a confidence region with a given (asymptotic) coverage probability for the entire identified set. Chernozhukov, Hong, and Tamer (2007) extend these results to vector-valued parameters and a broader class of econometric models. Such confidence regions are conservative for the parameter itself. Imbens and Manski (2004) show how to construct the confidence set for a (scalar) parameter. Their method entails properly adjusting the critical value so that the coverage probability converges to the desired level uniformly over the parameter space.
From a Bayesian perspective, it is natural to focus on inference about the regression parameters rather than the identified set. As such, we will compare our results to frequentist inference based on the work of Imbens and Manski (2004). As our results show, it is possible for Bayesian credible intervals to be strictly contained in frequentist confidence sets (Moon and Schorfheide 2012).
The approach here is intermediate between the frequentist bounds, which incorporate no prior information on the parameters beyond the main model and the data, and cases where additional information such as bounds on misclassification rates or further distributional assumptions tighten the identified set. We examine priors that do not change the identified set yet lead to stronger inferential conclusions through a more concentrated posterior distribution and narrower highest posterior density intervals. We also demonstrate that approaches with strong information can be nested in the Bayesian prior, resulting in both a tightening of the identified set and a concentration of the posterior of the parameter of interest. The focus here is how information, stated in the form of a prior on the misclassification rates, sharpens the inference or identified set, as compared to the approach of Bollinger (1996).
Recently, several papers have addressed both misclassification errors and endogeneity of a binary explanatory variable (Frazis and Loewenstein 2003; Shiu 2016). In the model here, without further information about the extent of the endogeneity, set identification fails and the model parameters are completely unidentified (Manski 1995). We show how to incorporate endogeneity in the Bayesian model and calculate Bayesian credible intervals under different priors about the endogeneity. These priors can again be seen as incorporating different amounts of identifying information into the model. Here the identifying information results in finite HPD intervals which allow for inference.
The remainder of this paper is organized as follows. Section 2 introduces the model, whereas section 3 presents algorithms to sample from the posteriors, based on the Bayesian bootstrap and BETEL. We illustrate the use of these algorithms with simulated data in section 4. In section 5 we expand the model to include endogeneity. In section 6, we apply the algorithms to estimate the wage gap between individuals who use and those who do not use prescription pain relievers.

The Model
We consider a simple regression model with a single binary regressor taking values of zero or one.
The regression coefficient in this case is the difference between two conditional means. It is possible to incorporate additional covariates into the analysis that follows, but such an extension complicates the notation and is not necessary to convey the main points. Our discussion here closely follows Bollinger (1996) and Van Hasselt and Bollinger (2012), whose notation we adopt. The outcome for sampling unit i is given by

Y_i = α + β Z_i + U_i, (1)

where Z_i ∈ {0, 1} has a Bernoulli distribution with Pr{Z_i = 1} = π and π ∈ (0, 1). The linearity of equation (1) is not restrictive because Z_i is binary. The regression coefficients satisfy α = E(Y_i | Z_i = 0) and β = E(Y_i | Z_i = 1) − E(Y_i | Z_i = 0), and the model is saturated. We refer, throughout, to the parameters in this model as the regression parameters.
The binary covariate Z_i is not always observed. Instead, the data contain a variable X_i, where

Pr{X_i = 1 | Z_i = 0, U_i} = p,  Pr{X_i = 0 | Z_i = 1, U_i} = q,  p + q < 1. (2)

Here p is the conditional probability of observing a false positive, while q is the conditional probability of observing a false negative. The difference X_i − Z_i is measurement error, which is nonclassical because its conditional distribution depends on the value of Z_i. Equation (2) does represent the assumption, however, that the error is conditionally independent of the residual U_i: the misclassification error is conditionally (on Z_i) independent of the outcome. 1 Finally, the restriction p + q < 1 ensures that the covariance between X_i and Z_i is positive. Hence, the misclassification is not so extreme as to make X_i and Z_i independent (p + q = 1) or to reverse the categorical definitions (p + q > 1).

Following Moon and Schorfheide (2012), we distinguish three sets of parameters in this model: (i) the regression parameters θ = (α, β, π, σ²_U), (ii) the error probabilities p and q, and (iii) the first two central moments (mean, variance, covariance) ψ = (μ_X, μ_Y, σ²_Y, σ_XY) of X_i and Y_i. 2 We assume that the goal is to make inference about θ, and in particular β. The error probabilities are not of primary interest and can be considered nuisance parameters. The vector of moments ψ is identified by the observed data and can be estimated by conventional methods. Following Moon and Schorfheide (2012), we refer to these as the reduced form parameters. Equations (1) and (2) imply the following functional relations between the three sets of parameters:

μ_X = π(1 − q) + (1 − π)p,
μ_Y = α + βπ,
σ_XY = βπ(1 − π)(1 − p − q),
σ²_Y = β²π(1 − π) + σ²_U. (3)

1 This type of error is called non-differential (e.g. Carroll, Ruppert, and Stefanski 1995). In certain applications the assumption of non-differential error may be untenable. For example, Bound, Brown, and Mathiowetz (2001) and Kreider and Pepper (2007) argue that the measurement error in self-reported health variables may be related to labor market outcomes.
2 Since Var(X_i) = μ_X(1 − μ_X), we do not parameterize the variance of X_i separately.
From an identification perspective, the moments ψ can be treated as known constants because they are nonparametrically identified and estimable through sample moments. The system (3) then has 4 equations in 6 remaining unknowns. Without further restrictions a unique solution for θ does not exist, and hence θ is not identified. Different sets of additional restrictions can lead to identification. Chen, Hu, and Lewbel (2008) show that θ is identified if E(U_i² | Z_i) and E(U_i³ | Z_i) are independent of Z_i. In this paper we base inference only on the model in equations (1) and (2) and the restrictions that these equations imply. Consequently, our goal is to make inference about parameters that are not point identified. Bollinger (1996) shows that despite the lack of point identification, the regression and nuisance parameters are partially identified, in the sense that (i) these parameters can be bounded from above and below, and (ii) these bounds are nonparametrically identified (hence estimable).
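To make the identification failure concrete, the following sketch (ours, not the paper's) uses the moment relations as we reconstruct them from equations (1) and (2) to exhibit two different values of (θ, p, q) that generate exactly the same reduced form moments ψ; the numerical values are illustrative only.

```python
# Forward mapping from (alpha, beta, pi, sigma2_U, p, q) to the reduced form
# moments psi = (mu_X, mu_Y, var_Y, cov_XY), as implied by the model.
def reduced_form(alpha, beta, pi, sigma2_U, p, q):
    mu_X = pi * (1 - q) + (1 - pi) * p            # Pr{X = 1}
    mu_Y = alpha + beta * pi                      # E[Y]
    var_Y = beta ** 2 * pi * (1 - pi) + sigma2_U  # Var(Y)
    cov_XY = beta * pi * (1 - pi) * (1 - p - q)   # Cov(X, Y)
    return (mu_X, mu_Y, var_Y, cov_XY)

# Inverse mapping: given psi and a candidate (p, q), recover theta.
def solve_theta(mu_X, mu_Y, var_Y, cov_XY, p, q):
    pi = (mu_X - p) / (1 - p - q)
    beta = cov_XY / (pi * (1 - pi) * (1 - p - q))
    alpha = mu_Y - beta * pi
    sigma2_U = var_Y - beta ** 2 * pi * (1 - pi)
    return (alpha, beta, pi, sigma2_U)

# Two different error-rate assumptions that are observationally equivalent:
psi = (0.4, 1.3, 1.0, 0.06)
theta_a = solve_theta(*psi, p=0.05, q=0.02)
theta_b = solve_theta(*psi, p=0.10, q=0.00)
# Both parameterizations reproduce psi up to rounding, yet imply
# different values of beta.
print(reduced_form(*theta_a, 0.05, 0.02))
print(reduced_form(*theta_b, 0.10, 0.00))
```

Because the mapping is many-to-one, the data alone cannot distinguish the two configurations, which is exactly the sense in which θ is only set identified.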
For example, assuming that β ≥ 0, it can be shown that β lies between a lower and an upper bound that are functions of the reduced form moments ψ. The interval between the lower and upper bounds is the identified set for β. Bollinger (1996) also presents bounds on the nuisance parameters p and q, as well as the other model parameters. Although all these bounds can be easily estimated, in practice they can be quite far apart, providing unsatisfying conclusions. A number of assumptions can be brought into the model which shrink the identified set. For example, in an application to pollution exposure and health, Klepper (1988a) uses the restriction p = q, which tightens the upper bound. Bollinger (1996) discusses additional restrictions on (p, q) that further shrink the identified set and applies this to the union wage differential. Van Hasselt and Bollinger (2012) show that homoskedasticity tightens the identified set and that homoskedasticity coupled with the assumption p = q identifies the model parameters. In each of these cases, the bounds are tightened because the upper bound represents extreme cases of highly asymmetric misclassification (in one direction only) coupled with a variance of U_i equal to zero. In this paper, we focus on the wider bounds of Bollinger (1996), which are based on fewer assumptions. We do, however, consider priors that impose additional restrictions on (p, q) as well as priors that do not. It is relatively straightforward to modify our approach here to work in conjunction with other assumptions such as error symmetry or homoskedasticity.

The Likelihood, Identification, and the Posterior
We assume that an i.i.d. sample D_n = {X_i, Y_i}_{i=1}^n is observed from the model in (1) and (2). There are different ways to parameterize the likelihood in terms of θ, ψ, and (p, q). In this section we use the likelihood f(D_n | θ, p, q), parameterized by the regression parameters and the error probabilities.
However, the following arguments apply to other parameterizations as well. Given a prior distribution f(θ, p, q), the posterior distribution can be written as f(θ, p, q | D_n) ∝ f(D_n | θ, p, q) f(θ, p, q).
One way to proceed is to assume that the regression error U_i in equation (1) has a known distribution, for example U_i | Z_i ∼ N(0, σ²_U). Although such a parametric assumption is often made for convenience, it has a strong impact on the identification (or lack thereof) of the model parameters.
In the context of a partially identified, semiparametric model, parametric restrictions can either significantly reduce the size of the identified set or lead to point identification. Related to this, statistical inference, whether Bayesian or not, can be quite sensitive to parametric assumptions.
In the approach we take here, we do not assume that the likelihood is a known parametric distribution. Instead, we examine to what extent the econometrician can learn about θ while maintaining a weak set of assumptions about the statistical model generating D_n. We focus on the reduced form parameters ψ, which are nonparametrically identified and can be consistently estimated from the data. Knowledge of ψ, however, is not sufficient to identify θ or (p, q). This situation can be characterized by likelihoods that satisfy

f(D_n | θ, ψ, p, q) = f(D_n | ψ), (4)

where θ, ψ, and (p, q) are subject to the system of equations in (3). The key insights are that (i) the mapping from θ to ψ is not one-to-one, and (ii) the likelihood function is determined by ψ alone. Instead of conditioning on the full vector (θ, ψ, p, q) on the left-hand side of (4), we can condition on ψ and any two elements of (θ, p, q), because the system in (3) then determines the remaining parameters. Thus, instead of (4), we can also write f(D_n | ψ, p, q) = f(D_n | ψ), or f(D_n | ψ, α, β) = f(D_n | ψ), etcetera.
Suppose the econometrician has prior beliefs about the misclassification probabilities p and q and the moments ψ, expressed by a distribution f(ψ, p, q). From Bayes' rule and equation (4), it follows that

f(ψ, p, q | D_n) = f(ψ | D_n) f(p, q | ψ). (5)

This shows that the posterior distribution factors into the product of the marginal posterior of the identified parameters and the conditional prior of the non-identified parameters p and q. 3 This is a crucial feature of posterior distributions in models with non-identified parameters and has been discussed by many authors (e.g., Kadane 1974; Poirier 1998; Moon and Schorfheide 2012).
The sample is informative about ψ, because in large samples f(ψ | D_n) becomes less dispersed and concentrates around some value. On the other hand, updating beliefs about (p, q) occurs only through updating the value of ψ in the conditional prior f(p, q | ψ). Moreover, if S is the support of f(ψ | D_n), then it follows from (5) that

f(p, q | D_n) = ∫_S f(p, q | ψ) f(ψ | D_n) dψ. (6)

Hence the marginal posterior of (p, q) is a weighted average of the conditional prior, where the weight function is the posterior of ψ. In large samples, the posterior of ψ will concentrate around some value, say ψ*. Equation (6) shows that the posterior of (p, q) will then concentrate around f(p, q | ψ*).

The Bayesian Bootstrap
We consider two versions of a semiparametric likelihood that only satisfies the moment restrictions implied by the model. A first option is to use the Bayesian bootstrap, introduced by Rubin (1981) and adapted by Chamberlain and Imbens (2003). The main idea is as follows: suppose (X_i, Y_i) has a discrete joint distribution with a finite support. Let {z_j, j = 1, ..., J} be the collection of support points. Since most data are measured with finite precision (i.e., discretely) and because J can be large, the assumption of a finite number of support points is not very restrictive (Chamberlain and Imbens 2003, p. 12). Let ω = (ω_1, ..., ω_J) denote the corresponding multinomial probabilities. The moment restrictions can then be written as

Σ_{j=1}^J ω_j g(z_j, ψ) = 0. (7)

Through this set of equations, a prior (posterior) distribution for ω induces a prior (posterior) for ψ. A conjugate prior for ω is the Dirichlet distribution, f(ω) ∝ Π_{j=1}^J ω_j^{c_j − 1}, with c_j > 0, j = 1, ..., J. Chamberlain and Imbens (2003) show that the improper prior that is obtained when c_j → 0 for all j has some desirable properties. With this choice of c, and using the multinomial likelihood, it follows that the Dirichlet posterior of ω is given by ω | D_n ∼ D(n_1, ..., n_J), where n_j is the number of observations equal to z_j. This posterior, together with the set of restrictions in (7), implies that the posterior of ψ is a multivariate B-spline (Dahmen and Micchelli 1981). It is easy to generate random draws from this posterior, as we will discuss shortly.

However, we first turn to the posterior of the remaining parameters. We focus on p and q, assuming (as we did in the previous section) that the econometrician has prior beliefs about the misclassification probabilities. Given a random draw from the posterior of (p, q, ψ), a value of θ = (α, β, π, σ²_U) can be calculated from the system in (3). This value then constitutes a draw from the posterior of θ. The likelihood as a function of (ψ, p, q) can be calculated by integrating the multinomial probabilities out over their conditional prior, whose support is the set of ω satisfying (7). Bayes' rule then yields an expression, given in (8), for the posterior of (ψ, p, q). Consider the conditional prior f(p, q | ψ, ω) and suppose we change the distribution of (X_i, Y_i) by changing ω.
This affects the moments of (X_i, Y_i) and informs us about p and q (and θ), because it changes the bounds of the identified set. However, it adds no information about the location of p and q within these bounds. In other words, the information that ω carries about the misclassification rates operates only through the reduced form parameters ψ, so that f(p, q | ψ, ω) = f(p, q | ψ).
Substituting this into the previous display and (8), we then find that the Bayesian bootstrap likelihood function satisfies (4), and that the posterior of (ψ, p, q) satisfies (5). Random draws from the posterior f(ψ, p, q | D_n) can now easily be generated, as described in the following algorithm.
Algorithm 1 If f(p, q | ψ) is the conditional prior distribution of p and q given ψ, then a random draw from the Bayesian bootstrap posterior distribution of (ψ, p, q) can be obtained as follows:

1. Randomly generate a set of independent variables {u_i}_{i=1}^n from the unit exponential distribution.

2. Calculate the normalized weights w_i = u_i / Σ_{j=1}^n u_j.

3. Calculate ψ* as the solution to the system of equations Σ_{i=1}^n w_i g(X_i, Y_i, ψ) = 0.
4. Generate a random draw (p*, q*) from the conditional distribution f(p, q | ψ*). The value (ψ*, p*, q*) is a draw from the posterior.
Note that substituting (ψ*, p*, q*) into the system (3) and calculating the solution θ* = (α*, β*, π*, σ²*_U) yields a random draw from the (degenerate) posterior of all model parameters. Finally, we note again that step 4 in the algorithm is formulated in terms of the conditional prior of p and q given ψ. If the econometrician wants to use prior beliefs about, for example, α and β, a conditional prior distribution f(α, β | ψ) would be used in step 4. Posterior draws (ψ*, α*, β*) and the mapping (3) then immediately yield the posterior draws (p*, q*, π*, σ²*_U).
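The steps above can be sketched in a few lines of code. The sketch below is ours, not the paper's implementation: it uses the moment relations as we reconstruct them from system (3), and for step 4 it draws (p*, q*) from a uniform conditional prior whose upper bounds are simplified stand-ins for the bounds p*(ψ) and q*(p, ψ) discussed later (the 0.999 factor only keeps the draw strictly interior).

```python
import random

def bootstrap_draw(x, y):
    """One draw from a Bayesian-bootstrap-style posterior (illustrative)."""
    n = len(x)
    # Step 1: unit-exponential variables; normalized they are Dirichlet(1,...,1).
    u = [random.expovariate(1.0) for _ in range(n)]
    s = sum(u)
    # Step 2: normalized weights.
    w = [ui / s for ui in u]
    # Step 3: weighted sample moments psi* = (mu_X, mu_Y, var_Y, cov_XY).
    mu_X = sum(wi * xi for wi, xi in zip(w, x))
    mu_Y = sum(wi * yi for wi, yi in zip(w, y))
    var_Y = sum(wi * (yi - mu_Y) ** 2 for wi, yi in zip(w, y))
    cov_XY = sum(wi * (xi - mu_X) * (yi - mu_Y) for wi, xi, yi in zip(w, x, y))
    # Step 4 (illustrative prior): uniform draws of (p*, q*) on a region that
    # keeps pi in (0, 1); a simplification of the paper's p*(psi), q*(p, psi).
    rho2 = cov_XY ** 2 / (mu_X * (1 - mu_X) * var_Y)
    p = random.uniform(0.0, mu_X * (1 - rho2) * 0.999)
    q = random.uniform(0.0, (1 - mu_X) * 0.999)
    # Map (psi*, p*, q*) to theta* through the reconstructed system (3).
    pi = (mu_X - p) / (1 - p - q)
    beta = cov_XY / (pi * (1 - pi) * (1 - p - q))
    alpha = mu_Y - beta * pi
    return alpha, beta, pi, p, q
```

Repeating `bootstrap_draw` many times yields an approximate posterior sample for β, from which an HPD interval can be computed.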

Bayesian Exponentially Tilted Empirical Likelihood
A second semiparametric likelihood that only satisfies a set of moment restrictions is the Bayesian exponentially tilted empirical likelihood (BETEL) of Schennach (2005). It is based on the idea of maximum entropy estimation (e.g. Kitamura and Stutzer 1997; Imbens, Spady, and Johnson 1998). In particular, the entropy of a multinomial likelihood supported on the sample is maximized, subject to the moment restrictions. Let g_i(ψ) be shorthand for g(X_i, Y_i, ψ). For a given value of ψ, the multinomial probabilities w_i(ψ) maximize the entropy −Σ_i w_i log w_i subject to Σ_i w_i g_i(ψ) = 0 and Σ_i w_i = 1. The solution is given by

w_i(ψ) = exp{λ(ψ)′g_i(ψ)} / Σ_{j=1}^n exp{λ(ψ)′g_j(ψ)},

where λ(ψ) is a vector of Lagrange multipliers. In practice, the multinomial probabilities are easy to calculate, because λ(ψ) minimizes a strictly convex function. The multinomial likelihood can be used to calculate the posterior of ψ:

f(ψ | D_n) ∝ f(ψ) Π_{i=1}^n w_i(ψ). (9)

The likelihood function solves a maximum entropy problem that only depends on the value of ψ. As such, the BETEL likelihood function also satisfies (4) and (5).
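For intuition, the tilting step can be sketched for the simplest possible case of a single scalar moment restriction, g_i(ψ) = X_i − ψ (a mean restriction). The Newton solver below is our illustration, not Schennach's implementation; it exploits the strict convexity noted above.

```python
import math

def betel_weights(x, psi, tol=1e-12, max_iter=200):
    """Exponentially tilted weights for the scalar restriction g_i = x_i - psi."""
    g = [xi - psi for xi in x]
    lam = 0.0
    for _ in range(max_iter):
        e = [math.exp(lam * gi) for gi in g]
        f = sum(ei * gi for ei, gi in zip(e, g))        # first-order condition
        fp = sum(ei * gi * gi for ei, gi in zip(e, g))  # derivative, > 0
        step = f / fp
        lam -= step                                     # Newton update
        if abs(step) < tol:
            break
    e = [math.exp(lam * gi) for gi in g]
    s = sum(e)
    return [ei / s for ei in e]  # tilted multinomial probabilities w_i(psi)
```

For example, `betel_weights([0.0, 1.0, 2.0, 5.0], 1.5)` returns positive weights summing to one whose weighted mean equals 1.5; the product of these weights is the BETEL likelihood evaluated at ψ = 1.5.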
As with the Bayesian bootstrap, the decomposition in equation (5) suggests a simple way to generate a sample from the BETEL posterior. First, generate a random draw ψ* from the posterior in (9); second, generate a random draw (p*, q*) from the conditional prior f(p, q | ψ*).
While the second step is straightforward, the first step is slightly more involved compared to the Bayesian bootstrap. We use the Metropolis-Hastings algorithm (Gilks et al. 1996) to generate an approximate sample from f(ψ | D_n), similar to the approach of Lancaster and Jun (2010). In the second step, a draw is generated from a conditional prior. This leads to the following algorithm. 5

Algorithm 2 Let f(ψ) be the prior of ψ and let f(p, q | ψ) be the conditional prior. Given the parameter values (ψ_t, p_t, q_t) at iteration t, generate (ψ_{t+1}, p_{t+1}, q_{t+1}) as follows:

1. Generate a random draw ψ̃ from a distribution g(ψ | ψ_t) that depends on the current value ψ_t.
2. Calculate the multinomial BETEL likelihood at the values ψ_t and ψ̃, and the ratio

r_t = [f(ψ̃ | D_n) g(ψ_t | ψ̃)] / [f(ψ_t | D_n) g(ψ̃ | ψ_t)].

3. Set ψ_{t+1} = ψ̃ with probability min{1, r_t}; otherwise set ψ_{t+1} = ψ_t.

4. Generate a random draw (p_{t+1}, q_{t+1}) from the distribution f(p, q | ψ_{t+1}).
Algorithm 2 generates a Markov chain of values for (ψ, p, q). These values can be used to calculate a set of values for θ, which represents an approximate sample from the posterior f(θ | D_n).
The distribution g is the "proposal distribution" that generates candidates for new states in the Markov chain for ψ. At each iteration, the chain either moves to the new state with probability min{1, r_t} or remains in its current state ψ_t with probability 1 − min{1, r_t}. In practice, the proposal distribution is often chosen such that around 25%–30% of the generated draws from g are accepted as new states in the Markov chain (Gelman et al. 1995). Intuitively, if r_t ≈ 0 the Markov chain remains mostly stuck in certain states, whereas if r_t ≈ 1 the chain mostly consists of values drawn from g. In both cases the simulated values will likely be a poor approximation to the posterior distribution.
The posterior of ψ will be close to normal in large samples. A natural and convenient choice is therefore a normal proposal centered at the current state, g(ψ̃ | ψ_t) = N(ψ_t, cV̂_n) (Gelman et al. 1995, p. 334), where c is a scale factor and V̂_n is an estimator of the asymptotic variance of the method-of-moments estimator of ψ (Lancaster and Jun 2010). In this case, g(ψ̃ | ψ_t) = g(ψ_t | ψ̃) and the ratio r_t in algorithm 2 simplifies to the ratio of posteriors f(ψ̃ | D_n) / f(ψ_t | D_n).
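A minimal random-walk Metropolis-Hastings sketch (ours; the target below is a stand-in log-posterior, not the BETEL posterior itself) illustrates steps 1–3 of algorithm 2 with a symmetric normal proposal:

```python
import math
import random

def mh_chain(log_post, start, scale, n_draws, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal."""
    rng = random.Random(seed)
    cur, cur_lp = start, log_post(start)
    draws, accepted = [], 0
    for _ in range(n_draws):
        prop = cur + rng.gauss(0.0, scale)   # step 1: symmetric proposal g
        prop_lp = log_post(prop)
        # Step 2-3: with symmetric g, r_t reduces to the posterior ratio;
        # accept with probability min{1, r_t}.
        if math.log(rng.random()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
            accepted += 1
        draws.append(cur)
    return draws, accepted / n_draws
```

Running it on, say, a standard normal log-density (`lambda z: -0.5 * z * z`) with a proposal scale around 2.4 gives an acceptance rate in the commonly targeted range and draws whose moments match the target; the scale plays the role of c above.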
The major difference between BETEL and the Bayesian bootstrap is that BETEL allows a researcher to start with prior beliefs about (θ, p, q). The mapping in (3) and a change of variables can be used to calculate the prior f(ψ, p, q). An application of algorithm 2 then yields an approximate sample from the posterior of (ψ, p, q) and, through the system (3), from the posterior of the regression parameters θ. On the other hand, the Bayesian bootstrap cannot be used with arbitrary prior beliefs about (θ, p, q). In particular, there is no way to explicitly incorporate prior beliefs about ψ. 6 The Bayesian bootstrap is applicable if the econometrician specifies the conditional prior of any two parameters in (θ, p, q), given ψ. For example, algorithm 1 shows how a given conditional prior f(p, q | ψ) can be used to generate a sample from the posterior f(ψ, p, q | D_n).
In some cases BETEL may be the preferred approach because of its flexibility in terms of specifying the prior distribution. In other cases, the econometrician may view the Bayesian bootstrap as the easier approach because it requires fewer prior inputs (i.e., the conditional prior f(p, q | ψ) instead of the full joint prior f(p, q, ψ)). However, if the same conditional prior f(p, q | ψ) is used for BETEL and for the Bayesian bootstrap, we expect the posteriors to be similar in large samples.
BETEL requires a prior f(ψ), but in large samples its impact is negligible and the posterior of ψ concentrates around some value ψ*. In the Bayesian bootstrap, the prior f(ψ) is not well-defined, but the posterior of ψ also concentrates around ψ*. From equation (6), in both approaches f(p, q | D_n) converges to f(p, q | ψ*).

A Selection of Priors
In this section we present several priors f(p, q | ψ) that could be used in practice, reflecting different beliefs about misclassification rates. In sections 4 and 6 these priors will be used in a simulation example and an empirical application. Conditional on ψ, the probabilities p and q are bounded, and these bounds must be reflected in the support of the (conditional) prior distribution. Specifically, the restriction σ²_U ≥ 0 and the mapping in (3) imply an inequality, (10), that jointly restricts (p, q) and involves ρ²_XY, the squared correlation between X_i and Y_i. Since (p, q) has to satisfy (10), the error rates cannot be independent in the prior. Bollinger (1996) shows that the maximum possible value of p occurs at q = 0. In that case, 0 ≤ p ≤ p*(ψ), where p*(ψ) = μ_X(1 − ρ²_XY). From (10), it can also be shown that for a given value of p, the bounds on q are 0 ≤ q ≤ q*(p, ψ). An approach which has intuitive appeal is to base the priors of p and q on the uniform distribution.
The imposition of any prior distribution imposes information about the parameters. The bounds in Bollinger (1996) represent the fully agnostic case of no prior information (one can think of this as the case representing the union of all possible priors). The uniform prior imposes information in the form that all values are equally likely. An implication of this is that the probability of no measurement error is zero. Note that we construct the priors conditional on the reduced form parameters ψ. We construct the joint prior as the product of a uniform prior for p given ψ, and a uniform prior for q given p and ψ. 7 This results in the following prior, which we label "uniform":

f_1(p | ψ) = [1 / p*(ψ)] 1{0 ≤ p ≤ p*(ψ)},  f_1(q | p, ψ) = [1 / q*(p, ψ)] 1{0 ≤ q ≤ q*(p, ψ)}.

In many cases, researchers have information that leads them to believe that misclassification rates are more likely to be concentrated among lower values of p and q. While inference could be based on imposing known upper bounds on the misclassification rates (as in Bollinger 1996), this clearly rules out the (remote) possibility that these rates exceed the chosen thresholds. A probabilistic approach to incorporating this information is to use a "power" type distribution for the prior. As with the uniform, the probability of p = q = 0 (no measurement error) is zero.
However, the probability of measurement error for sets of (p, q) near the upper bound is very low as well; we label this second, "power," prior f_2. In cases where researchers believe the misclassification rates are likely to be below a certain value but otherwise do not want to claim that the very lowest values are most likely, they may opt for a mixture of uniforms. Indeed, this prior allows researchers to place a high likelihood on the misclassification rates being below some threshold, but it does not rule out higher rates, unlike the approach of Bollinger (1996). In our third prior, we therefore suppose that p ≤ p̄ with probability γ_1 (provided p̄ < p*(ψ)) and, conditional on p, that q ≤ q̄ with probability γ_2 (again, provided q̄ < q*(p, ψ)). Thus, p and q may exceed these thresholds (though they are still subject to p*(ψ) and q*(p, ψ)), but this only happens with probabilities (1 − γ_1) and (1 − γ_2), respectively. This leads to the following prior with a uniform mixture structure:

f_3(p | ψ) = (γ_1 / p̄) 1{0 ≤ p ≤ p̄} + [(1 − γ_1) / (p*(ψ) − p̄)] 1{p̄ < p ≤ p*(ψ)}, (13)

with f_3(q | p, ψ) defined analogously in terms of γ_2, q̄, and q*(p, ψ). Thus, if the upper bounds p*(ψ) and q*(p, ψ) exceed p̄ and q̄ respectively, the priors are mixtures of uniform distributions. Otherwise, the priors reduce to f_1(p | ψ) and f_1(q | p, ψ).
An even stronger case combines the certainty bounds of Bollinger (1996) and the uniform distribution of prior 1. If the econometrician believes that p ≤ p̄ and q ≤ q̄ with certainty, then this belief can be expressed by the "bounded uniform" prior

f_4(p | ψ) = [1 / min{p̄, p*(ψ)}] 1{0 ≤ p ≤ min{p̄, p*(ψ)}},
f_4(q | p, ψ) = [1 / min{q̄, q*(p, ψ)}] 1{0 ≤ q ≤ min{q̄, q*(p, ψ)}}.

Finally, in some cases it may be reasonable to assume that misclassification is one-sided, in the sense that false negatives do not occur and q = 0. For example, in the empirical application considered in section 6, the binary variable is an indicator for abstinence from prescription pain relievers (a value of 1 indicates no use of prescription pain relievers). There is reason to believe that few respondents claim drug use in this survey when they are in fact not using. One can also extend the general idea to a prior where p = 0. This may apply in food stamp programs, where there is little incentive to report participation when one does not participate (Bollinger and David 1997). If, at the same time, p is believed to be less than p̄, we can use the prior

f_5(p | ψ) = [1 / min{p̄, p*(ψ)}] 1{0 ≤ p ≤ min{p̄, p*(ψ)}}.

For the purpose of sampling from the posterior, algorithms 1 and 2 simplify slightly because no random draws of q need to be generated.
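Sampling from these conditional priors is straightforward. In the sketch below (ours), `p_star` and `q_star` are hypothetical stand-ins for the bounds p*(ψ) and q*(p, ψ), which in practice would be computed from the reduced form moments:

```python
import random

def draw_f1(p_star, q_star, rng=random):
    """Uniform prior f1: p uniform on [0, p*(psi)], q uniform on [0, q*(p, psi)]."""
    p = rng.uniform(0.0, p_star)
    q = rng.uniform(0.0, q_star(p))
    return p, q

def draw_f4(p_star, q_star, p_bar, q_bar, rng=random):
    """Bounded uniform prior f4: the supports are truncated at p_bar, q_bar."""
    p = rng.uniform(0.0, min(p_bar, p_star))
    q = rng.uniform(0.0, min(q_bar, q_star(p)))
    return p, q
```

For example, with a hypothetical `p_star = 0.3` and `q_star = lambda p: 0.4 * (1 - p / 0.3)`, every draw from `draw_f1` respects the joint restriction, and every draw from `draw_f4` additionally stays below the certainty bounds (p̄, q̄).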
We note that when moving from f1 to f2 to f3, we increase the amount of information contained in the prior without changing the identified set. While Bollinger (1996) tightens the identified set itself by imposing hard bounds on the misclassification rates, these priors instead sharpen posterior inference within an unchanged set.

A Simulation Example
In this section we provide an example with simulated data. The example aims to illustrate the relationship between the prior and the posterior rather than to present a full Monte Carlo study.
In our Bayesian analysis, we calculate the 95% highest posterior density (HPD) interval. This interval contains 95% of the posterior probability and the highest values of the posterior density.
Thus, it is the tightest 95% band one can form with the posterior. Reporting the 95% HPD interval is common practice in the empirical Bayesian literature. For comparison purposes, we also calculate frequentist 95% confidence intervals for the parameters (Imbens and Manski 2004).
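Given posterior draws, the 95% HPD interval for a scalar parameter can be computed as the shortest window containing 95% of the sorted draws; a minimal sketch (valid for a unimodal posterior):

```python
import numpy as np

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws.

    Scans all windows of the sorted draws that hold `mass` of the
    sample and returns the narrowest one. Appropriate for unimodal
    posteriors; for multimodal ones an HPD *region* may be disjoint.
    """
    x = np.sort(np.asarray(draws))
    n = len(x)
    k = int(np.ceil(mass * n))           # draws per candidate window
    widths = x[k - 1:] - x[: n - k + 1]  # width of each window
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

# For a standard normal "posterior" the 95% HPD interval should be
# close to the equal-tailed interval (-1.96, 1.96).
rng = np.random.default_rng(1)
lo, hi = hpd_interval(rng.standard_normal(50_000))
```

For symmetric unimodal posteriors the HPD and equal-tailed intervals coincide; the HPD interval is strictly shorter when the posterior is skewed, as it is for the parameters here.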
In the simulation, we use the following values for the model parameters: α = β = 1, σ²_U = 0.63, and π = 0.3. This implies an R-squared in the regression equation of 0.25. The misclassification probabilities are p = 0.15 and q = 0.09. These are relatively high compared to many empirical settings (for one review, see Bound et al. 2001). We generate a sample of 1,000 observations, where the outcome Y_i is calculated according to equation (1) and the misclassified variable X_i is generated, conditional on Z_i, according to equation (2). The calculation of the HPD intervals is based on 10,000 simulated draws from the Bayesian bootstrap and BETEL posteriors. Table 1 reports estimated bounds based on Bollinger (1996), with confidence intervals calculated using the method suggested by Imbens and Manski (2004). One should use caution in comparing these intervals. The Imbens-Manski confidence intervals are only affected by (sampling) uncertainty in the identified parameters (θ), whereas the HPD intervals are affected by uncertainty about θ and the conditional prior of (p, q). The uncertainty about θ in both cases is relatively small, given the sample size and low variances. In the columns labeled 'case 1,' the estimated bounds and confidence intervals were calculated under the assumption that p + q < 1 (see equation (2)). In the columns labeled 'case 2,' these were calculated using the additional information that p ≤ 0.2 and q ≤ 0.1. Throughout this discussion, we will focus on the parameter β. In case 1, the estimated upper bound for β is 3.494. The 95% confidence interval ranges from 0.542 to 3.932 and is quite wide. As noted in Bollinger (1996), the upper bound is highly sensitive to the addition of other information. If known upper bounds on p and q are imposed, as in case 2, the identified set shrinks. For β in particular, the estimated upper bound drops to 1.119 and the upper limit of the 95% confidence interval drops to 1.295.
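The simulation design can be sketched as follows. Equations (1) and (2) are not restated in this section, so the sketch assumes the standard conventions Y_i = α + βZ_i + U_i, with p = P(X=1 | Z=0) (false positive) and q = P(X=0 | Z=1) (false negative), consistent with the condition p + q < 1; treat it as an illustrative reconstruction rather than the paper's exact code.

```python
import numpy as np

def simulate(n=1_000, alpha=1.0, beta=1.0, sigma2_u=0.63,
             pi=0.3, p=0.15, q=0.09, seed=0):
    """Simulate the section-4 design under the assumed conventions
    p = P(X=1 | Z=0) and q = P(X=0 | Z=1)."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, pi, n)                      # true binary regressor
    u = rng.normal(0.0, np.sqrt(sigma2_u), n)
    y = alpha + beta * z + u                        # outcome equation
    flip = np.where(z == 1, rng.binomial(1, q, n),  # misclassify with
                    rng.binomial(1, p, n))          # rates q and p
    x = np.where(flip == 1, 1 - z, z)               # observed regressor
    return y, x, z

y, x, z = simulate()
# With these parameters the implied R-squared is
# beta^2 * pi * (1 - pi) / (beta^2 * pi * (1 - pi) + sigma2_u)
# = 0.21 / 0.84 = 0.25, matching the text.
```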
Using the Bayesian bootstrap and BETEL, we calculated 95% highest posterior density (HPD) intervals under the four priors (11)-(14). For priors f3 and f4, we set λ1 = λ2 = 0.9, p̄ = 0.2 and q̄ = 0.1. The results are given in table 2. We present the Bayesian bootstrap posterior distributions for β graphically in Figures 1 and 2, together with the upper and lower limits of the frequentist 95% confidence interval. As one might expect, the specific approach to obtaining the posteriors (Bayesian bootstrap or BETEL) does not appear to matter in a meaningful way.
As discussed above, the two approaches are complementary in how they incorporate information about the reduced-form parameters θ. We begin by comparing the estimated bounds and frequentist confidence regions to the HPD intervals resulting from the "uniform" prior in (11). The Bayesian bootstrap and BETEL 95% HPD intervals for β are much narrower than the 95% confidence interval based on Imbens and Manski (2004). The results highlight that the upper bound is particularly sensitive to additional information. In this case, the information of nearly any prior will result in tighter inference. The second prior, the "power" prior in (12), is based on the power distribution and shifts the information to place a higher likelihood on low misclassification rates. As with the first prior, the posterior for the four parameters still covers the entire frequentist bounds but is now more concentrated near the lower bounds (see figure 1). The second prior has reduced the Bayesian bootstrap and BETEL lower bounds of the 95% HPD interval for β, which are now nearly identical to the lower bound of the 95% frequentist confidence interval. The upper bounds of the 95% HPD intervals are also lower compared to those of the first prior, falling from 2.673 to 2.200 for the Bayesian bootstrap and from 2.662 to 2.252 for BETEL. Thus, moving from the uniform prior to the power prior shrinks the 95% HPD region for β and shifts it slightly toward the origin. This is quite intuitive, because the feasible region for β is a mapping of various (p, q) combinations, with lower values of β consistent with lower values of (p, q). As we place higher prior probability on lower values of (p, q), we would expect correspondingly higher probability on lower values of β.
We also note that the 95% HPD interval is not necessarily contained within the frequentist 95% confidence interval. While the addition of prior information often shrinks the HPD interval by lowering the upper bound, the effect on the lower bounds is modest. However, we do find that the length of the HPD interval is always substantially less than the length of the frequentist confidence interval.
The third prior, the uniform mixture distribution in (13), places even higher probabilities on lower values of (p, q). Compared to the power distribution, it reduces the overall probability of p > 0.2 and in particular concentrates probability on q < 0.1. As can be seen in both figure 1 and table 2, this further concentrates the posterior and shrinks the 95% HPD intervals for β, with most of the change coming from the much tighter upper bound (1.657 for the Bayesian bootstrap and 1.735 for BETEL).
The fourth prior, the bounded uniform distribution in (14), results in a change in the identified set, tightening the asymptotic and estimated bounds (compare case 1 and case 2 in table 1) and shrinking the frequentist confidence interval. However, the frequentist confidence interval of (0.542, 1.295) is still wider than the 95% HPD intervals of (0.585, 1.126) from the Bayesian bootstrap and (0.576, 1.107) from BETEL. While the magnitudes of the differences are small, they are best viewed in percentage terms, since in practice these regions are often on larger scales. Thus, the HPD intervals from the Bayesian bootstrap and BETEL are about 28% smaller than the frequentist confidence interval. In figure 2, observe that the resulting posterior is more centered in the identified set than under the first three priors. Clearly, the major gain of bounding the misclassification probability lies in tightening the identified set. The Bayesian HPD bounds associated with priors 1, 2, and 3 are based on a weaker assumption: high values of (p, q) are allowed but discounted as less likely.
In summary, the results here reinforce those of Bollinger (1996) in that they demonstrate how the addition of information changes the inferences that can be made about the unidentified parameters. Even small changes in information can have large impacts on potential conclusions. Adding information which does not change the identified set has important impacts on the conclusions one might draw using posteriors. Priors that concentrate probability provide stronger conclusions. We suggest that the use of a prior and a Bayesian approach is a reasonable way to include information about the measurement error process in cases where the information is not strong enough for identification yet cannot easily be incorporated into the frequentist bounds.

Addressing Potential Endogeneity
In many evaluation settings, such as the drug usage application below, concern arises that in addition to measurement error, the true status Z_i may be endogenous. Several papers have addressed both endogeneity and measurement error (DiTraglia and García-Jimeno 2015b, 2015a; Hu et al. 2015; Shiu 2016; Kreider et al. 2012; Frazis and Loewenstein 2003). Instrumental variables and non-linearity can be used to obtain identification (Hu et al. 2015; Shiu 2016; Frazis and Loewenstein 2003). In contrast, Kreider et al. (2012) derive partial identification results when the dependent variable is binary. In this case, the slope coefficient (β) in the model is not identified without further assumptions. In most applications, additional assumptions are brought to bear to obtain identification or at least set identification. In this section we use additional assumptions in the form of a Bayesian prior to allow for inference. In contrast to the results in our previous sections, the posterior of β covers the real line. However, stronger priors lead to more concentrated posteriors, and the HPD interval shrinks as additional information is incorporated. While this certainly does not "solve" the identification problem, it formalizes the informational content of the two issues, which provides an approach useful to researchers.
As a point of departure, we relax the assumption that E(U_i | Z_i) = 0 and replace it with E(U_i | Z_i) = γ(Z_i − π), where π = P(Z_i = 1). This adds a single new parameter (γ) and maintains the assumption that E(U_i) = 0. The parameterization of E(U_i | Z_i) is general, given the binary nature of Z_i. This approach differs from Erickson (1989), who derives posteriors when there is correlation between the measurement error of a continuous regressor and the residual error U_i. We can rewrite the model as Y_i = (α − γπ) + (β + γ)Z_i + Ũ_i, where now E(Ũ_i | Z_i) = 0 and E(Ũ_i²) < ∞. This returns us to the original model from section 2, and all previous bounding results now apply to β + γ. However, without further information about γ, bounds for β cannot be obtained. DiTraglia and García-Jimeno (2015a) discuss this in more detail, and show that assumptions about parameters and instruments can sharpen these bounds. In many applications, prior bounds on γ would still be wide or would be controversial.
The posteriors in section 3 and the simulations in section 4 provide a posterior for β̃ ≡ β + γ. Thus, for any given value of γ, a distribution for β = β̃ − γ can be obtained. By adding a prior for γ, the posterior of β follows directly from the joint posterior of (γ, θ). As before, let θ be the vector of identified, reduced-form parameters. An argument similar to that of section 3.1 shows that the posterior of β is obtained by mixing the posterior of β̃ with the conditional prior of γ. A draw from the posterior of β can therefore be obtained as follows. First, generate a draw from the Bayesian bootstrap or BETEL posterior of (β̃, θ), as in sections 3.2 and 3.3. Second, draw a value of γ from its conditional prior f(γ | β̃, θ) and calculate β = β̃ − γ. Because γ is completely unidentified, it may be reasonable to assume that γ is independent of β̃ and θ in the prior. We use this in the examples below and draw γ directly from its marginal prior. As such, the posterior of β will be highly sensitive to the choice of prior for γ.
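The two-step draw just described can be sketched directly. The posterior draws of β + γ below are stand-in values (a narrow normal sample), not output of the actual Bayesian bootstrap or BETEL samplers, and the N(0, 1) prior on γ is one of the example priors considered in the simulations.

```python
import numpy as np

def posterior_beta(beta_plus_gamma_draws, draw_gamma, rng):
    """One draw of beta per posterior draw of (beta + gamma).

    beta_plus_gamma_draws: draws of the identified coefficient
        beta + gamma (from the Bayesian bootstrap or BETEL posterior).
    draw_gamma: function returning one prior draw of the endogeneity
        parameter gamma, assumed prior-independent of the identified
        parameters, as in the text.
    """
    gammas = np.array([draw_gamma(rng) for _ in beta_plus_gamma_draws])
    return beta_plus_gamma_draws - gammas

# Illustration: stand-in posterior of beta + gamma centered at 2,
# combined with a N(0, 1) prior on gamma.
rng = np.random.default_rng(2)
bpg = rng.normal(2.0, 0.1, 10_000)
beta_draws = posterior_beta(bpg, lambda r: r.standard_normal(), rng)
# The resulting posterior of beta stays centered near 2 but is far
# more dispersed, reflecting the prior uncertainty about gamma.
```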
We extend the simulations in section 4 to explore a variety of priors on γ, using a simulated data set with γ = 1. The remaining aspects of the data generating process are the same as those in section 4. We focus on the uniform prior for the measurement error portion, and highlight the implications of four different priors on γ: (i) a point mass at γ = 0, which assumes that Z_i is exogenous; (ii) N(0, 1); (iii) χ²(1), a chi-square distribution with 1 degree of freedom; and (iv) a 50-50 mixture of a point mass at zero and a χ²(1) distribution. Each of these represents different assumptions about the potential for endogeneity. The normal prior allows for both positive and negative endogeneity, but is concentrated on low values of γ. The chi-square prior assumes that there is positive endogeneity, while the mixture allows for a 50% probability that there is no endogeneity and a 50% probability that the endogeneity is positive. The first case establishes an HPD interval similar to those in the first row of table 2. The main difference is that the HPD region is shifted higher, due to the positive endogeneity present in the data generating process (the interval bounds β + γ). In the second row, the prior changes from a simple "no endogeneity" hypothesis to a standard normal distribution on γ. This distribution assumes there is endogeneity, but is symmetric in allowing for both positive and negative values of γ. While it allows for some probability throughout the real line, the prior places higher probability on low values of γ. The 95% HPD intervals for β are now much wider, ranging from −0.148 to 6.033 for the Bayesian bootstrap and from −0.087 to 5.823 for BETEL. We can no longer conclude that β is positive. Furthermore, the upper bound has increased, reflecting the possibility that the endogeneity could work in either direction.
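Prior (iv), the 50-50 mixture of a point mass at zero and a χ²(1), can be sampled with a simple two-stage draw; a minimal sketch (the mixture weight of 1/2 is from the text):

```python
import numpy as np

def draw_gamma_mixture(rng, weight_zero=0.5):
    """Draw gamma from the 50-50 mixture of a point mass at zero
    and a chi-square(1) distribution (prior (iv) in the text)."""
    if rng.uniform() < weight_zero:
        return 0.0          # "no endogeneity" component
    return rng.chisquare(1)  # positive-endogeneity component

rng = np.random.default_rng(3)
g = np.array([draw_gamma_mixture(rng) for _ in range(20_000)])
# About half of the draws are exactly zero; the rest are positive.
```

The resulting posterior of β is correspondingly a mixture of the exogenous-case posterior and the χ²(1)-prior posterior.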
When γ has a χ²(1) prior, expressing the belief that there is a positive covariance between the regressor and the error, the posterior puts more mass on negative values of β. The lower bound of the HPD interval is now −2.291 for the Bayesian bootstrap and −2.048 for BETEL. Finally, when γ has a mixture prior that assumes there is no endogeneity with probability 1/2, the HPD intervals are narrower than for the χ²(1) prior but still include the origin. The upper bounds of both HPD intervals are more than six times the true parameter value, reflecting the possibility of high misclassification rates and a "small amount" of endogeneity (γ). Figure 3 presents the corresponding posteriors for β. Comparing the posterior from figure 1 for the uniform prior on (p, q) without endogeneity to the posterior in figure 3 with the prior γ = 0, we find a nearly identical figure, but the entire posterior is shifted to the right because it is actually a posterior for (β + γ) = 2. The second case, where the prior on γ is standard normal, results in a more dispersed posterior. When γ has a χ²(1) prior, we find the posterior shifted to the left. This reflects the fact that the endogeneity is positive, thus reducing any estimate of β. However, the measurement error still plays a crucial role and ensures that large values of β remain plausible. Finally, adding mass to the point γ = 0 results in a posterior that is a mixture of the first posterior (where γ = 0 was assumed) and the third posterior (based on χ²(1)). The finite 95% HPD intervals highlight that in these cases the imposition of a prior on the endogeneity parameter provides identifying information. As such, it is critical that researchers in this context think carefully about specifying reasonable priors.

Abstinence and Earnings
In this section we use the 2009 National Survey on Drug Use and Health to examine the relationship between drug use and earnings. In particular we focus upon past-year use of narcotic pain relievers, an increasing problem in the U.S. The application uses a simple model where annual family earnings are regressed upon a self-reported measure of abstinence: X = 1 if the individual reports no use of narcotic pain relievers in the past year, and X = 0 otherwise. Drug abuse is well known to be measured with error (Pepper 2001) and many speculate that the issue is quite serious. Unlike measures such as union status or food stamp program participation (Bollinger 1996), it is difficult to obtain objective measures of the probabilities of misreporting. However, priors may be formed based upon casual observation and experience with patient populations, survey administration, or other data. We begin by focusing only on measurement error issues (assuming zero endogeneity). Table 3 presents two sets of bounds based upon the approach of Bollinger (1996). In the left panel, labeled "no restriction," the simple bounds results from section 2 are applied with no additional restrictions imposed. As is typically the case, the bounds are wide to the point of incredulity, with the upper bound implying that abstinence is associated with an income gain of over $3 million. In the right-hand panel, labeled "no under-reporting," the bounds and confidence intervals are calculated under the restriction q = 0: there are no individuals who abstained from narcotic pain relievers during the past year but reported use nonetheless. In the current context, this restriction seems reasonable.
While the upper bound on β tightens substantially (the restriction has no implication for the lower bound), it still implies that, on average, abstinence is associated with earnings as much as $243,900 higher than those of drug users.
We consider three priors that represent varying beliefs (conditional on θ) about the misclassification rates. The first prior is the uniform prior in (11). The second prior is the power prior in (12), which still allows for substantial misclassification, in particular a large degree of over-reporting of abstinence, but under which high misclassification rates are relatively unlikely compared to the uniform prior.
The third prior combines a power prior for p (given θ) with a point mass at q = 0, representing the belief that there is no under-reporting of abstinence.
The 95% HPD intervals for (p, q, π, β) calculated from the Bayesian bootstrap and BETEL under each of the three priors are given in table 5. The Bayesian bootstrap posterior distributions corresponding to the three priors are shown in Figure 4. Use of the uniform prior results in an HPD interval that is markedly narrower than the frequentist confidence interval. For example, from the Bayesian bootstrap we infer that the wage gap is likely between $5,618 and $124,600, while the error rates p and q are still allowed to take on their full range of potential values.
Moving to the power prior in (12), we see that the HPD intervals for p and π become marginally narrower, whereas the HPD interval for q remains virtually unchanged. There is still substantial uncertainty about the rate of false positives and the true abstinence rate. Under the Bayesian bootstrap, there is a 95% probability that p is between 0 and 0.835, whereas π is likely between 0.655 and 1.000. However, there is a strong effect on inference about β. The posterior of β becomes more concentrated around the lower bound of the identified set, and the 95% HPD intervals are much narrower. For example, using the Bayesian bootstrap, the upper limit of the 95% HPD interval for β drops from $124,600 under the uniform prior to $70,500 under the power prior.
The additional assumptions embodied by the power prior relative to the uniform prior, namely that lower misclassification rates are more likely than higher ones, are in many ways much weaker than imposing a sharp upper bound on the rates.

Figure 4: Bayesian bootstrap posteriors of β. The vertical line labeled 95% LCL represents the lower limit of the 95% confidence interval (upper limit not shown).
When moving from prior 2 to prior 3 by imposing the restriction q = 0, similar conclusions can be drawn. The 95% HPD intervals of p and π are barely affected. However, the assumption of no under-reporting is very helpful for Bayesian inference about β. The posterior distribution becomes even more concentrated around the lower bound of the identified set. With the Bayesian bootstrap, the upper limit of the 95% HPD interval for β drops from $70,500 to $13,100 when it is assumed that q = 0. We conclude that with 95% probability, the wage gap is between $5,323 and $13,100. This stands in sharp contrast with the bounds of $6,153 and $288,100 of the frequentist 95% confidence interval (see table 4).
Next we extend the analysis to allow for endogeneity. It is quite possible that the decision to abstain from drug use is correlated with other unobserved factors which would positively impact earnings. This is an interesting case, as the endogeneity would tend to bias the coefficient estimate upward, while the measurement error tends to bias it downward. As noted in section 5 above, adding potential endogeneity results in a complete loss of identification for β. However, the imposition of a prior on the amount of endogeneity will result in an informative 95% HPD region. In tables 6 and 7 we allow for endogeneity and measurement error. In table 6, we use a normal prior with mean 0 and variance 25 for the endogeneity parameter γ. This implies that the part of the wage gap that can be attributed to unobserved factors lies between −$10,000 and +$10,000 with 95% probability. It should be noted that the support of the prior is the real line, so higher and lower amounts are possible but simply deemed improbable. The normal prior on γ allows the econometrician to assume endogeneity is likely, while allowing for two possible cases. In the first case, one could argue that positive endogeneity may occur if "high quality" individuals both earn more and are less likely to indulge in pain medication abuse. In the second case, one could argue that individuals who know they have high earnings (conditional on X's) may "buy more" substance abuse if substance abuse is a normal good. An alternative prior for γ is presented in table 7 and is based on a χ²(12) distribution. This prior has approximately the same variance as before but now assumes the endogeneity parameter is strictly positive. This implies that the researcher assumes the first case above: that high quality individuals are both high earners and not likely to abuse pain medication. This prior rules out the second case, where high earners also consider pain medication abuse to be a normal good.
As can be seen from table 6, allowing for endogeneity has a large impact on the bounds of the HPD intervals. Comparing, for example, the Bayesian bootstrap intervals from tables 5 and 6 under the power prior with q = 0 (prior 3), the 95% HPD interval changes from [$5,323, $13,100] to [−$3,230, $19,590]. This change can be seen in figure 5 and reflects the possibility that drug use actually increases earnings but that high-earning individuals are significantly more likely to abstain from drugs. In table 7, the chi-square prior reflects an assumption that there is positive endogeneity but allows for a variety of strengths. Again, posterior probability mass is shifted to the left, and more dramatically so than in the case of the normal prior. For the power prior with q = 0, the new 95% Bayesian bootstrap HPD interval ranges from −$15,710 to $6,824 (see figure 5). The addition of endogeneity into the model significantly alters the posterior distributions, as one would expect. Researchers should understand that the use of these priors is not "agnostic" but rather conveys information that has, essentially, identifying power. This is highlighted in this case, where inference can be drawn about a parameter that is completely unidentified in a classical sense.

Conclusion
In this paper we have analyzed a simple regression model with a potentially misclassified binary regressor, which may or may not be endogenous, from a Bayesian perspective. In the absence of instruments, parametric assumptions, or restrictions on third and higher-order moments, the regression model parameters are only partially identified when there is no endogeneity. With endogeneity, the parameter is not even set identified. This paper proposes a Bayesian approach for semi-parametric inference about the regression model parameters. Specifically, we use a likelihood function that is defined only by a set of moment restrictions, and we formulate posteriors based on the Bayesian bootstrap of Rubin (1981) and the Bayesian Exponentially Tilted Empirical Likelihood of Schennach (2005). The advantage of this approach is that it does not rely on parametric assumptions about the distribution of the regression error.
We first consider the partially identified case when the binary regressor is exogenous. The Bayesian approach in this paper is intermediate between the bounds of Bollinger (1996), an approach designed to be as agnostic about the measurement error process as possible, and the tighter bounds achieved by assumptions on the underlying model in the form of either bounds on misclassification probabilities or distributional assumptions. The prior is used to incorporate varying amounts of information. We show that while in many cases the priors do not change the identified set, they do change inference by concentrating posterior probability and narrowing HPD intervals. This allows researchers a broader array of assumptions while still preserving the more agnostic approach of frequentist bounds. In particular, it allows researchers to provide prior information about the misclassification rates. These rates are generally accepted to be lower than the frequentist bounds would allow. However, researchers may be uncomfortable making the strong assumption of a sharp upper (or lower) bound on the misclassification rates. This paper provides an intermediate approach. The sensitivity of the upper bound to even small amounts of prior information is highlighted here, in that many priors result in 95% HPD intervals that are much narrower than the frequentist 95% confidence intervals.
We then consider the case where the binary regressor is assumed to be misclassified and endogenous. In this case, no bounds on the slope coefficient exist, and the prior brings strong information which produces a bounded 95% HPD interval. This highlights the fact that the prior adds information to the model, which creates an intermediate case between complete identification failure and the type of assumptions typically invoked to achieve point identification.
The results in the paper are illustrated through a simulation which compares inference between the frequentist approach and the Bayesian approach proposed here. In the first case, with no endogeneity, the simulation shows that a uniform prior on the misclassification rates results in tighter inference through a posterior which concentrates probability on lower values of the slope coefficient. This emphasizes the known fact that the upper bound for the slope coefficient is highly sensitive to additional information and demonstrates that the set of misclassification rates consistent with the highest values in the identified set has small measure. The simulation also demonstrates how both sharp bounds on the misclassification rates and the Bayesian prior can be incorporated to produce a tighter identified set as well as tighter inference on the parameter of interest. When endogeneity is allowed for, the regions become wider. Indeed, the frequentist approach has no bounds for the parameter of interest; the imposition of the Bayesian prior brings information which allows for inference. This highlights the informational content of the prior.
The empirical example illustrates the use of prior information in an important application. The association between drug use and earnings is well known, but questions arise about the robustness of these results, due to obvious concerns about the accuracy of self-reported drug use and the endogeneity of drug use. In our example, the frequentist bounds on the coefficient measuring the association between drug use and earnings are extremely wide even when drug use is assumed to be exogenous, ranging from $6,153 to over $3 million. Most researchers would contend that both the upper bound on the slope coefficient and the associated bounds on the misclassification rates are difficult to accept.
In providing estimates, researchers would like to impose some restrictions which reflect knowledge of likely misreporting scenarios. However, choosing a strict upper bound on misclassification rates is controversial. Our approach allows the researcher to incorporate additional information using a Bayesian prior. We show that reasonable assumptions about misclassification rates lead to much tighter bounds. For example, placing a power distribution on the rate of over-reporting abstinence and restricting the rate of under-reporting to zero yields a 95% HPD interval for the slope coefficient ranging from $5,323 to $13,100. However, these bounds widen again, and shift, as we allow for endogeneity.
Our work here provides insight into how Bayesian priors can be used to sharpen inference in set-identified models and allows for inference in non-identified cases. Importantly, the prior embodies information or assumptions beyond what is necessary to arrive at the frequentist bounds.
Reasonable information in the form of a carefully constructed prior can thus provide a degree of identi…cation.