







Manuscript submitted to The Econometrics Journal , pp. 1–13.
Mogens Fosgerau† and Dennis Kristensen‡
†Dept. of Economics, Univ. of Copenhagen, Øster Farimagsgade 5, 1353 København K, Denmark. E-mail: mogens.fosgerau@econ.ku.dk
‡Department of Economics, University College London, Gower Street, London, WC1E 6BT, UK. E-mail: d.kristensen@ucl.ac.uk
Summary We establish nonparametric identification in a class of so-called index models using a novel approach that relies on general topological results. Our proof strategy requires substantially weaker conditions on the functions and distributions characterizing the model compared to existing strategies; in particular, it does not require any large support conditions on the regressors of our model. We apply the general identification result to additive random utility and competing risk models.
We develop a novel nonparametric identification result for the following class of models,

$$\Pi(w, x, z) = \Lambda(a(w, x), z), \tag{1.1}$$

where

$$a(w, x) = g(w) + h(x) \tag{1.2}$$

is a vector of additively separable index functions, while $\Lambda : \mathbb{R}^J \times \mathbb{R}^{d_Z} \to \mathbb{R}^J$, $g : \mathbb{R}^{d_W} \to \mathbb{R}^J$ and $h : \mathbb{R}^{d_X} \to \mathbb{R}^J$ are all vector-valued functions of dimension $J \geq 1$. The arguments $w \in \mathbb{R}^{d_W}$ and $x \in \mathbb{R}^{d_X}$ represent the values of two sets of regressors, $W$ and $X$, while $z \in \mathbb{R}^{d_Z}$ corresponds to values of a set of control variables, $Z$. We take as a high-level assumption that we know (have observed from data) the function $\Pi(w, x, z)$, for $(w, x, z)$ in the support of $(W, X, Z)$, from which we then wish to identify the unknown functions $\Lambda(a, z)$ and $h(x)$, while we treat the function $g(w)$ as being known. We refer to this class of models as index models since $W$ and $X$ are restricted to enter the model through $g(W)$ and $h(X)$, respectively.

We make three major contributions relative to the existing literature: First, we do not impose any large support conditions on any of the regressors in our model. Most existing results on identification within this class of models require availability of a set of "special" continuously distributed regressors; identification is then achieved by sending each of these special regressors off to the boundary of their support. Estimators based on such "thin set identification" arguments were analyzed by Khan and Tamer (2010) who showed that they tend to be irregularly behaved with slow convergence
(^1) This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 740369).
rates. In contrast, we achieve identification as long as the random index $a(W, X)$ exhibits sufficient, but potentially bounded, variation. We expect this to translate into better behaved estimators.

Second, we impose weak conditions on the functions of interest and the distributions of the random variables $(W, X, Z)$. We do not require continuity or differentiability of the functions entering the model in order to show identification, while most existing results at a minimum require these to be differentiable. Similarly, we only require $g(W)$ to have continuous support, while $(X, Z)$ can both be discrete, continuous or a mix of the two as long as their supports satisfy certain conditions. Thus, our results cover models with thresholds and kinks in $\Lambda$, $g$ and $h$, which existing results cannot handle. In the case of discrete choice models such features may occur if the decision maker optimizes subject to constraints; see, e.g., Cantillo and de Dios Ortúzar (2006). These models have traditionally been formulated in a parametric fashion; our theory demonstrates how they can be identified without parametric constraints. There is a growing literature on nonparametric estimation with unknown thresholds and kinks which we conjecture can be employed in our setting in order to translate our identification result into actual estimators; see, e.g., Chiou et al. (2018).

Third, we show how the presence of the controls $Z$ can help to achieve identification in a nontrivial way: We first show local identification at each value of the control $Z$. Suitable variation in $Z$ then allows us to piece the locally identified components together across different values of $Z$ to achieve global identification. In comparison, most other papers that allow for control variables show identification at a fixed arbitrary value of $Z$, in which case variation in $Z$ is unnecessary for identification.
Our proof strategy relies on arguments from general topology that, to our knowledge, are completely new to the literature on nonparametric identification. These should be of general interest since they can be used for identification in other settings. The two key elements of our approach are the notions of relative identification and connected sets. Below, we state our formal definition of the former:
Definition 1.1. A function $h$ is said to be relatively identified on a given set $X$ if identification of $h(x^*)$ at some point $x^* \in X$ implies that $h(x)$ is also identified at all other $x \in X$.
Next, recall the topological notion of connectedness: A connected set cannot be contained in the union of two non-empty disjoint open sets while having non-empty intersection with both. In particular, it is not possible to split a connected open set into two non-empty disjoint open subsets. Our identification strategy then proceeds in three steps, where we initially suppress the presence of $Z$ for simplicity: First, we decompose the support of $X$ into suitable subsets and achieve relative identification on each of these. This is done via two features of our model: For a given $x$, we are able to identify the relative variation in $\Lambda(a)$, with $a = h(x) + g(w)$, through the observed variation in $\Pi(w, x)$ w.r.t. $w$ through the known function $g(w)$. By injectivity of $\Lambda$, we are then able to identify the relative value of $a$, which in turn yields the relative value of $h(x) = a - g(w)$ on suitably chosen subsets of the support of $X$. Second, we achieve global identification on the union of these subsets by using the second main ingredient of our proof strategy, connectedness: We will require the support of $a(W, X)$ to be connected, which is used to extend relative local identification to global identification. Finally, reintroducing $Z$, we again rely on the supports of $X \mid Z = z$
The model (1.1) comprises a range of models that are met in economics. We here present two classes of models that fall within our framework. We will return to these two classes of models in Section 5 where we apply our general identification result to each of them.
2.1. Discrete choice models
We here first demonstrate that the class of additive random utility models (ARUM) can be mapped into (1.1). Using existing results in the literature, this in turn implies that our results also apply to a broad class of rational inattention discrete choice models (Fosgerau et al., 2019) and an even wider class of perturbed utility models.
2.1.1. Additive random utility
Consider an agent choosing between $J + 1$ alternatives, each carrying an associated indirect utility of the form

$$U_j = a_j(W, X) + \varepsilon_j, \quad j = 0, 1, \ldots, J,$$

where $(W, X)$ is a set of observed covariates while $\varepsilon = (\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_J)$ is unobserved. This model was initially proposed by McFadden (1973) and has since become one of the workhorses in applied microeconomics; see e.g. Ben-Akiva and Lerman (1985) and Maddala (1986). As is standard in the literature, we impose the following normalization on the "outside" option $j = 0$: $a_0(w, x) = 0$. Some of the regressors $(W, X)$ may potentially be dependent on $\varepsilon$. To handle this situation, we assume the availability of a set of control variables $Z$ so that $(W, X)$ are independent of $\varepsilon$ conditional on $Z$. In addition to $(W, X, Z)$, the researcher also observes the utility maximizing choice, $D = \arg\max_{j \in \{0, 1, \ldots, J\}} U_j$. Thus, the conditional choice probabilities (CCP's),
Πj (w, x, z) := P (D = j| (W, X, Z) = (w, x, z)) , j = 0, 1 , ..., J, (2.3)
are identified in the population. We collect these in the vector-valued function $\Pi(w, x, z) = \{\Pi_j(w, x, z) : j = 1, \ldots, J\} \in \mathbb{R}^J$, where we leave out the CCP of the outside option. It now follows from standard results in the literature that $\Pi(w, x, z)$ can be written in the form (1.1) with $\Lambda$ being the gradient of the so-called surplus function; see Section 5 for further details.

Our identification result requires the researcher to group the observed covariates into two sets: The first set, denoted $W$, contains the "special" regressors that enter the index $a$ through a known function $g(W)$ as specified by the researcher, cf. eq. (1.2). The second set, denoted $X$, then enters $a$ through $h(X)$, which is left unspecified. The choices of $W$ and $g(W)$ are application specific and should be guided by two considerations: First, $g(W)$ needs to exhibit sufficient continuous variation on $\mathbb{R}^J$ since this is a key requirement for our identification result to go through. Second, since $g_j(W)$ affects the utility of the $j$th alternative positively by definition, it should be specified accordingly.

As an example of this joint modelling and identification strategy, let us consider the problem of estimating willingness-to-pay for different goods, a common problem in various applied fields of economics (e.g., Fosgerau, 2006; Bontemps and Nauges, 2016). In this setting, choosing $g$ to be $g_j(W_j) = -\ln W_j$, where $W_j$ is the price of alternative $j$, $j = 1, \ldots, J$, transforms a positive price vector into a vector that can in principle attain values in all of $\mathbb{R}^J$. With this choice, $h_j(X) + \varepsilon_j$ captures the log willingness to pay
Identification of a class of index models 5
for good $j$, where $X$ contains characteristics of the agent and other characteristics of the different alternatives. Prices generally exhibit continuous variation and so satisfy the first of the two aforementioned requirements. This example assumes the availability of alternative specific regressors, $W_1, \ldots, W_J$. However, our identification result may still be applied if this is not true. In this case, the researcher needs to construct alternative-specific regressors $g_1(W), \ldots, g_J(W)$ from a set of underlying covariates $W$.

Our assumption of $g(W)$ being known has antecedents in the literature on identification in discrete choice models. For example, in the context of binary choice ($J = 1$), Lewbel et al. (2000) also assume the presence of a "special" regressor, in our notation $W$, that enters the utility of alternative 1 in a known fashion. However, that paper furthermore restricts $h(x)$ to be linear, $h(x) = \beta x$, and, importantly, identification of $\beta$ is achieved through variation of $g(W)$ on the boundary of its support. Our identification result does not rely on any such argument.

Our framework also includes so-called rational inattention discrete choice models. Fosgerau et al. (2019) show that any ARUM satisfying the conditions above is observationally equivalent to a rational inattention discrete choice model in which the prior is held constant. This generalizes the finding of Matějka and McKay (2015), who show that the multinomial logit model has a foundation as a rational inattention model. Thus, our identification result extends without effort to a broad class of rational inattention models.
2.1.2. Perturbed utility
The class of perturbed utility models (Fosgerau and McFadden, 2012; Fudenberg et al., 2015; Allen and Rehbeck, 2019) is another generalization of the class of ARUM. As shown by Hofbauer and Sandholm (2002), the CCP's of an ARUM can be represented as the solution to a maximization problem where an agent chooses the vector of CCP's to maximize a function that consists of a linear term and a concave term. Here we present an extended version that includes controls affecting the concave term, i.e.

$$\Lambda(a, z) = \arg\max_{q \in \Delta} \left\{ a^\top q + \Omega(q \mid z) \right\}, \tag{2.4}$$

where $a \in \mathbb{R}^{J+1}$ is a vector of utility indices, $\Delta = \{q \in \mathbb{R}_+^{J+1} : \sum_{j=0}^{J} q_j = 1\}$ is the unit simplex and $\Omega(\cdot \mid z)$ is a concave function for each $z \in Z$. The perturbed utility model includes ARUM as a special case, while allowing an individual to have strict preference for randomization rather than to choose a vertex of the probability simplex. As noted by Allen and Rehbeck (2019), observing only realizations of lotteries across choice options is sufficient for identification, which requires only the vector of CCP's, $\Pi(w, x, z)$. We show in Section 5 that the implied CCP's satisfy (1.1).
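A small numerical sketch may help make (2.4) concrete. It assumes, purely for illustration, the Shannon-entropy perturbation $\Omega(q) = -\sum_j q_j \ln q_j$ with no dependence on $z$; this choice is not made in the text. Under it, the maximizer of (2.4) is the multinomial logit (softmax) vector, and the code checks that no randomly drawn point of the simplex attains a higher objective value. All function names are hypothetical.

```python
import math
import random


def softmax(a):
    """Candidate maximizer of a'q + Omega(q) when Omega is Shannon entropy."""
    m = max(a)
    weights = [math.exp(v - m) for v in a]
    total = sum(weights)
    return [v / total for v in weights]


def perturbed_objective(a, q):
    """a'q + Omega(q) with Omega(q) = -sum_j q_j ln q_j (assumed perturbation)."""
    linear = sum(ai * qi for ai, qi in zip(a, q))
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0.0)
    return linear + entropy


def random_simplex_point(n, rng):
    """Draw a point of the unit simplex by normalizing exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
a = [0.0, 0.5, -1.0]          # utility indices; the outside option is normalized to 0
q_star = softmax(a)
best = perturbed_objective(a, q_star)

# No randomly drawn point of the simplex should beat the softmax probabilities.
never_beaten = all(
    perturbed_objective(a, random_simplex_point(len(a), rng)) <= best + 1e-12
    for _ in range(2000)
)
```

In this special case the maximized value equals $\ln \sum_j e^{a_j}$, which is the logit surplus function up to a constant.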
2.2. Accelerated failure time models for competing risks
Consider a competing risk model as in Heckman and Honoré (1989) with $J$ competing causes of failure. A latent failure time $T_j > 0$ is associated with each cause $j \in \{1, \ldots, J\}$. The econometrician observes the duration until the first failure, $Y = \min_{j \in \{1, \ldots, J\}} T_j$, and the associated cause of failure, $D = \arg\min_{j \in \{1, \ldots, J\}} T_j$, together with a set of observed covariates $(X, W, Z)$. Assume that the $j$th failure time satisfies
ln Tj = aj (W, X) − εj , (2.5)
for some function aj (w, x) , j = 1, ..., J. The model may then be termed a multivariate generalized accelerated failure time model (Kalbfleisch and Prentice, 1980; Fosgerau et al.,
on $g(W)$, which is in contrast to most existing results in the literature, as discussed in the Introduction. If, for example, $G(x, z) = \mathbb{R}^J$ for all $x$, then our result demonstrates that $h(x)$ is identified on all of $\mathrm{supp}(X)$; but this is not necessary: identification on all of $\mathrm{supp}(X)$ can be achieved without such a full support condition. Next, let

$$M(x, z) = G(x, z) \times \{x\}, \qquad A(x, z) = a(M(x, z)) = G(x, z) + \{h(x)\},$$

denote the support of $(W, X) \mid (X, Z) = (x, z)$ and $a(W, X) \mid (X, Z) = (x, z)$, respectively, and

$$M(z) = \cup_{x \in X(z)} M(x, z), \qquad A(z) = \cup_{x \in X(z)} A(x, z) = a(M(z)), \tag{3.8}$$

the supports of the same random variables but now only conditioning on $Z = z$. Finally, for some set $Z_0 \subseteq \mathrm{supp}(Z)$ chosen according to certain assumptions stated below, let

$$A_0 = \cup_{z \in Z_0} A(z), \qquad X_0 = \cup_{z \in Z_0} X(z) \tag{3.9}$$
be the supports of $a(W, X)$ and $X$ conditional on $Z \in Z_0$, respectively. We will then show identification of $h(x)$ and $\Lambda(a, z)$ for $x \in X_0$, $a \in A_0$ and $z \in Z_0$. Specifically, $Z_0$ will be constructed according to certain properties of the underlying covariates and the functions of interest. Observe the dependence of $M_0$ and $A_0$ on the set $Z_0$. To achieve "maximal" identification, we would ideally like to choose $Z_0 = \mathrm{supp}(Z)$. However, we potentially have to restrict $Z_0$. First, we require $a \mapsto \Lambda(a, z)$ to satisfy the following condition for all $z \in Z_0$:

Assumption 3.1. For any $z \in Z_0$, $a \mapsto \Lambda(a, z)$ is injective on $A(z)$ as defined in (3.8).

By asking for $\Lambda(a, z)$ to be injective, we can identify the relative variation in $a(w, x)$ through the observed variation in $\Pi(w, x, z)$. In a given application, Assumption 3.1 may not hold for all $z \in \mathrm{supp}(Z)$, in which case we need to remove such values from $Z_0$. In the worst case scenario, this leaves us with $Z_0$ being empty and our identification result becomes void. At the other extreme, $Z_0 = \mathrm{supp}(Z)$ and we may achieve identification on the whole support. Due to the structure of $a(w, x)$, it follows from the definition of $G(x, z)$ that $A(x, z)$ and thereby also $A(z)$ and $A_0$ are open sets. We add to this by also requiring $A(z)$ to be connected for all $z \in Z_0$. An open set $A$ is connected if $A = O_1 \cup O_2$ implies that $O_1 \cap O_2 \neq \emptyset$ whenever $O_1$ and $O_2$ are nonempty open sets. Thus an open connected set cannot be separated into two non-empty disjoint open sets. We then impose:

Assumption 3.2. $A(z)$ is connected for all $z \in Z_0$.
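To see what Assumption 3.2 asks for, consider the scalar case $J = 1$, where $A(z)$ is a union of open intervals $A(x, z) = G(x, z) + \{h(x)\}$, one for each $x$. The sketch below is an illustration under that one-dimensional simplification only; the function names and the numbers are hypothetical. It checks whether a finite union of non-empty open intervals is a single interval, which is what connectedness means on the real line.

```python
def union_is_connected(intervals):
    """True if the union of non-empty open intervals (lo, hi) is one interval."""
    ordered = sorted(intervals)
    if not ordered:
        raise ValueError("need at least one interval")
    reach = ordered[0][1]          # right end of the union built so far
    for lo, hi in ordered[1:]:
        # A gap, or endpoints that merely touch, leaves the point `reach`
        # outside the union, so the union splits into two open pieces.
        if lo >= reach:
            return False
        reach = max(reach, hi)
    return True


def index_support(g_support, h_values):
    """Shift the open support of g(W) by each h(x): A(x, z) = G(x, z) + {h(x)}."""
    lo, hi = g_support
    return [(lo + h, hi + h) for h in h_values]


# g(W) varies only over (0, 1); X is discrete with three support points.
overlapping = index_support((0.0, 1.0), [0.0, 0.6, 1.2])
touching = index_support((0.0, 1.0), [0.0, 1.0])

connected_case = union_is_connected(overlapping)
disconnected_case = union_is_connected(touching)
```

The first configuration shows that a discrete $X$ combined with bounded variation in $g(W)$ can still deliver a connected index support, provided the jumps in $h$ are smaller than the range of $g(W)$. The second shows that intervals sharing only an endpoint are not enough.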
Assumption 3.2 allows us to go from local identification at a given point $x \in X(z)$ to relative identification on all of $X(z)$, $z \in Z_0$, via the image of $a(w, x)$. The assumption imposes restrictions on the support of the random variable $a(W, X)$ instead of $(W, X)$ themselves. This is done in order to impose minimal restrictions on the distribution of $X$ and the smoothness of $h$. Recall that $W$ is assumed to contain a continuous component. Thus, Assumption 3.2 includes, for example, the case of $X$ being unbounded and discrete, or $X$ being continuous while $h(X)$ is discontinuous everywhere. Assumption 3.2 is not verifiable from data, but the same holds for the smoothness conditions that are regularly imposed in existing identification results. If we are willing to entertain certain smoothness conditions, such as the inverse of $\Lambda(a, z)$ being continuous with respect to $a$, then the assumption is implied by connectedness of $\Pi(M(z) \mid z) = \Lambda(A(z), z)$, this latter property being verifiable. Similarly, if we restrict $X$ and $h$ to both be continuous, it will be implied by connectedness of $M(z)$. Once we have achieved relative identification on each $X(z)$, $z \in Z_0$, global identification is then reached through the following assumption:
Assumption 3.3. If $Z_1 \cup Z_2 = Z_0$ with $Z_1, Z_2 \neq \emptyset$, then $\left(\cup_{z \in Z_1} M(z)\right) \cap \left(\cup_{z \in Z_2} M(z)\right) \neq \emptyset$.
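When $Z_0$ is finite and every $M(z)$ is non-empty, Assumption 3.3 can be read as a graph condition: draw an edge between $z$ and $z'$ whenever $M(z) \cap M(z') \neq \emptyset$, and require the resulting graph to be connected. The following sketch checks this under the additional simplification, made here only for illustration, that each $M(z)$ is represented by a finite set of $(w, x)$ points. The function name and the data are hypothetical.

```python
def overlap_graph_is_connected(supports):
    """Check the finite-Z_0 reading of Assumption 3.3.

    `supports` maps each z in Z_0 to a non-empty set of (w, x) points
    standing in for M(z). Two values of z are linked when their sets intersect.
    """
    nodes = list(supports)
    if not nodes:
        return False
    seen = {nodes[0]}
    frontier = [nodes[0]]
    while frontier:                      # depth-first search over overlapping supports
        z = frontier.pop()
        for other in nodes:
            if other not in seen and supports[z] & supports[other]:
                seen.add(other)
                frontier.append(other)
    return len(seen) == len(nodes)


chained = {
    "z1": {(0, 0), (1, 0)},
    "z2": {(1, 0), (2, 0)},      # overlaps z1 at (1, 0)
    "z3": {(2, 0), (3, 1)},      # overlaps z2 at (2, 0)
}
isolated = dict(chained, z3={(5, 5)})    # z3 no longer overlaps anything

chain_ok = overlap_graph_is_connected(chained)
isolated_ok = overlap_graph_is_connected(isolated)
```

Note that `z1` and `z3` need not overlap directly in the first example; a chain of pairwise overlaps is enough.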
Assumption 3.3 is used to paste together the relatively identified sets $X(z)$ across $z$. Again, this assumption does not require $X$ and/or $h$ to be continuous, only that the sets $\mathrm{supp}(W, X \mid Z = z)$, $z \in Z_0$, overlap. Finally, the following normalization on the function $h$ gives us identification on $X(z_0)$:
Assumption 3.4. There exists known z 0 ∈ Z 0 and (w 0 , x 0 ) ∈ M (z 0 ) so that h (x 0 ) = 0.
Such a normalization is needed to identify the level of $h$ since, for any given pair $(\Lambda, h)$, we have

$$\Lambda(g(w) + h(x), z) = \tilde{\Lambda}(g(w) + \tilde{h}(x), z),$$

where $\tilde{\Lambda}(a, z) = \Lambda(a + c, z)$ and $\tilde{h}(x) = h(x) - c$ for some given value of $c \in \mathbb{R}^J$.
As explained earlier, we shall make use of the notion of relative identification in our proof of identification. As a first step, we show relative identification on any two overlapping images of a; this is achieved through injectivity of Λ (a, z) which allows us to map the overlapping images into overlapping images of Π.
Lemma 4.1. Suppose that Assumption 3.1 holds, and that $h(x^*)$ is identified at $x^* \in X(z)$ for some $z \in Z_0$. Then the set $X^*(z) := \{x \in X(z) \mid A(x^*, z) \cap A(x, z) \neq \emptyset\}$ is identified and $h(x)$ is identified on $X^*(z)$.

Proof. By definition, $A(x^*, z) \cap A(x, z) \neq \emptyset$ if and only if there exist $w^*$ and $w$ so that $g(w^*) \in G(x^*, z)$, $g(w) \in G(x, z)$ and $a(w^*, x^*) = a(w, x)$. Since $\Lambda(\cdot, z)$ is injective by Assumption 3.1, the last equality is equivalent to $\Lambda(a(w^*, x^*), z) = \Lambda(a(w, x), z)$, which we recognize as

$$\Pi(w^*, x^*, z) = \Pi(w, x, z), \tag{4.10}$$

where $\Pi$ is known to us. Thus, $X^*(z)$ is identified as the set of solutions $x$ to (4.10) as we vary $(w^*, w)$. Next, for any given $x \in X^*(z)$, let $w^*$ and $w$ be the corresponding values for which (4.10) holds. Since these are known and $h(x^*)$ is identified, the value $a(w^*, x^*) = g(w^*) + h(x^*)$ is also known to us. This in turn implies that $h(x) = a(w, x) - g(w) = a(w^*, x^*) - g(w)$ is identified.
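The matching argument in the proof can be traced in a toy example with $J = 1$ and no controls. The sketch below is an illustration only: it assumes a logistic $\Lambda$, takes $g$ to be the identity, and uses bisection to find $w^*$, which relies on $\Pi(\cdot, x^*)$ being increasing in $w$. None of these choices come from the text, which needs only injectivity of $\Lambda$. The identification step sees $\Pi$, $g$ and $h(x^*)$, but not the remaining values of $h$.

```python
import math


def g(w):
    """Known index function; the identity is an illustrative choice."""
    return w


def make_observed_ccp(h_true):
    """Build the 'observed' Pi(w, x) = Lambda(g(w) + h(x)) with a logistic Lambda."""
    def pi(w, x):
        return 1.0 / (1.0 + math.exp(-(g(w) + h_true[x])))
    return pi


def match_w_star(pi, x_star, target, w_lo, w_hi, tol=1e-12):
    """Find w* in [w_lo, w_hi] with Pi(w*, x*) = target by bisection.

    Uses monotonicity of Pi(., x*) in w, which holds in this toy model only.
    Returns None when the target cannot be reached at x*.
    """
    if not (pi(w_lo, x_star) <= target <= pi(w_hi, x_star)):
        return None
    while w_hi - w_lo > tol:
        mid = 0.5 * (w_lo + w_hi)
        if pi(mid, x_star) < target:
            w_lo = mid
        else:
            w_hi = mid
    return 0.5 * (w_lo + w_hi)


def identify_h(pi, x, x_star, h_star, w, w_lo, w_hi):
    """Recover h(x) = g(w*) + h(x*) - g(w) from a matched pair as in (4.10)."""
    w_star = match_w_star(pi, x_star, pi(w, x), w_lo, w_hi)
    if w_star is None:
        return None                      # no match found starting from this w
    return g(w_star) + h_star - g(w)


h_true = {"x_star": 0.0, "x1": 0.4, "x2": 5.0}   # hidden from the identification step
pi = make_observed_ccp(h_true)

h_x1 = identify_h(pi, "x1", "x_star", 0.0, w=0.2, w_lo=0.0, w_hi=1.0)
h_x2 = identify_h(pi, "x2", "x_star", 0.0, w=0.2, w_lo=0.0, w_hi=1.0)
```

With $W$ supported on $[0, 1]$, the index ranges at `x_star` and `x1` overlap, so `h_x1` recovers $h$ at `x1`. The index range at `x2` is disjoint from the one at `x_star`, so no match exists and `x2` lies outside $X^*(z)$ in the notation of the lemma.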
We then use this lemma in conjunction with the connectedness of A(z) to show relative identification on each of the sets X (z):
Lemma 4.2. Suppose that Assumptions 3.1-3.2 hold. Then, for all $z \in Z_0$, $h(x)$ is relatively identified on $X(z)$ as defined in eq. (3.7).
5.1. ARUM
Define the surplus function

$$G(a_0, \ldots, a_J, z) := E\left[\max_{j = 0, \ldots, J} U_j \,\middle|\, a(W, X) = a, Z = z\right] = E\left[\max_{j = 0, \ldots, J} \{\varepsilon_j + a_j\} \,\middle|\, Z = z\right]$$

for any given $(a_0, a_1, \ldots, a_J) \in \mathbb{R}^{J+1}$, where the second equality uses eq. (2.1.1) and Assumption 5.1(i). The Williams-Daly-Zachary Theorem (McFadden, 1981) then implies that the CCP's, as defined in (2.3), can be written in the form (1.1)-(1.2) with $\Lambda$ defined as the gradient of the surplus function,

$$\Lambda(a, z) := \left.\frac{\partial G(a, z)}{\partial a}\right|_{a_0 = 0}.$$
We conclude:
Corollary 5.1. Any ARUM of the form (2.1.1) that satisfies Assumptions 3.1-3.4 and 5.1(i) is identified.
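The link between the surplus function and the CCP's can be checked numerically in the best-known special case. The sketch below assumes i.i.d. standard Gumbel errors, an assumption made here only for illustration, under which the surplus function is the log-sum-exp of the indices up to an additive constant and its gradient is the multinomial logit formula. The gradient is approximated by central finite differences; all names are hypothetical.

```python
import math


def surplus(a):
    """E[max_j (a_j + eps_j)] for i.i.d. standard Gumbel eps, up to a constant."""
    m = max(a)
    return m + math.log(sum(math.exp(v - m) for v in a))


def logit_ccp(a):
    """Multinomial logit choice probabilities for the index vector a."""
    m = max(a)
    weights = [math.exp(v - m) for v in a]
    total = sum(weights)
    return [v / total for v in weights]


def numerical_gradient(f, a, step=1e-6):
    """Central finite-difference approximation to the gradient of f at a."""
    grad = []
    for j in range(len(a)):
        up = list(a)
        down = list(a)
        up[j] += step
        down[j] -= step
        grad.append((f(up) - f(down)) / (2.0 * step))
    return grad


a = [0.0, 0.3, -0.7]                 # a_0 = 0 is the outside-option normalization
grad = numerical_gradient(surplus, a)
ccp = logit_ccp(a)
max_gap = max(abs(gj - pj) for gj, pj in zip(grad, ccp))

# Lambda(a, z) keeps the components for alternatives j = 1, ..., J only.
lambda_value = grad[1:]
```

The gap between the numerical gradient and the logit probabilities is of the order of the finite-difference error.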
Next, we discuss each of Assumptions 3.1-3.4 in the context of ARUM and how these compare with existing ones found in the literature on identification of ARUM.

First, Assumption 3.1, injectivity of $\Lambda(\cdot, z)$ for each $z$, is implied by Assumption 5.1(ii), cf. Hofbauer and Sandholm (2002, Thm 2.1). However, Assumption 5.1(ii) is not necessary for injectivity to hold. A simple example is the binomial model, where the probability for alternative 0 is the cumulative distribution of $\varepsilon_1$. If the distribution includes point masses, then ties can occur, but this does not destroy injectivity. This is true for any tie-breaking rule. More generally, if the subdifferential of the surplus function is strictly cyclically monotone (Rockafellar, 1970), which does not require the existence of a density, then the utility maximizing choice probabilities under any tie-breaking rule are injective (Sørensen and Fosgerau, 2020).

Assumptions 3.2-3.3 impose restrictions on the joint variation of $(g(W), X)$. For Assumption 3.2 to hold, we need to identify $J$ regressors, $g(W)$, that exhibit enough joint continuous variation so that their joint support, conditional on $(X, Z)$, has non-empty interior in $\mathbb{R}^J$. One instance where this can be achieved is if we have observed alternative specific characteristics. In the case of demand modelling, one such choice would be a (transformation of) the (relative) prices of the different alternatives, while $X$ contains all remaining regressors, possibly including other alternative specific covariates. In this case, to control for potential endogeneity of prices, we could then include cost shifters in $Z$. Prices tend to exhibit continuous variation and Assumption 3.2 would be likely to hold. Assumption 3.3 requires other observed product characteristics and the agent's observed characteristics to exhibit sufficient variation conditional on the controls in $Z$ so that these have overlapping support across different values of $Z$.
As already mentioned in the introduction, there are few fully nonparametric identification results for ARUM. To our knowledge, the only results comparable to ours are found in Matzkin (1993). Her results also require the presence of alternative specific regressors but impose stronger conditions on these and other covariates. Moreover, her set-up does not include any control variables. On the other hand, she does not necessarily require that $a(W, X)$ is additive, which we assume throughout. Theorem 1 of Matzkin (1993) does allow for dependence between $(W, X)$ and $\varepsilon$, but in this case she requires the observed component of the utilities to be identical across alternatives and strictly increasing in one of the arguments. In our notation, this requires $a_j(W, X)$, $j = 1, \ldots, J$, to all be identical. We do not impose any such constraints. Her Theorem 2 requires full independence between $(W, X)$ and $\varepsilon$ but, on the other hand, imposes fewer restrictions on $a(W, X)$ compared to us. But in both cases, she identifies $\Lambda$ by letting different components of $W$ diverge to $+\infty$, which is an example of the "thin set identification" discussed earlier.
5.2. Perturbed discrete choice
We here demonstrate that the CCP's for the perturbed discrete choice model can again be expressed in the form (1.1)-(1.2) with $\Lambda$ defined in (2.4) being injective. This is done under the following restrictions: First, in order to rule out zero demands, the norm of the gradient $\nabla_q \Omega(q \mid z)$ has to approach infinity as $q$ approaches the boundary of the unit simplex. Second, $\Omega(q \mid z)$ is differentiable$^1$. Third, we normalize the outside option so that $g_0(w) = h_0(x) = 0$. Under these three restrictions, for each value of the control $z$, the demand solves the first-order condition for an interior solution,

$$a + \nabla_q \Omega(\Lambda(a, z) \mid z) = \lambda \iota,$$

where $\lambda$ is a scalar constant and $\iota \in \mathbb{R}^{J+1}$ is a vector consisting of ones. To show that $\Lambda$ is injective, consider this equation at $a_1$ and $a_2$ and assume that $\Lambda(a_1, z) = \Lambda(a_2, z)$. Define a matrix $M$ such that $M x = x - x_0 \iota$ for all $x = (x_0, \ldots, x_J) \in \mathbb{R}^{J+1}$. Pre-multiply this matrix onto the first-order condition to obtain that

$$a_1 + M \nabla_q \Omega(\Lambda(a_1, z) \mid z) = a_2 + M \nabla_q \Omega(\Lambda(a_2, z) \mid z),$$

which implies that $a_1 = a_2$ as required.
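The step from the first-order condition to $a_1 = a_2$ can be spelled out as follows. This is a sketch that uses the normalization $a_0 = g_0(w) + h_0(x) = 0$ stated above and takes $\iota$ to have $J + 1$ entries so that it conforms with $a$. The symbols $\lambda_k$ (the multiplier at $a_k$) and $q$ (the common value of $\Lambda(a_1, z)$ and $\Lambda(a_2, z)$) are shorthand introduced here.

```latex
\begin{align*}
M\iota &= \iota - 1 \cdot \iota = 0, \\
M a &= a - a_0 \iota = a \quad \text{since } a_0 = 0, \\
M\bigl(a_k + \nabla_q \Omega(\Lambda(a_k, z) \mid z)\bigr) &= \lambda_k M \iota = 0
  \;\Longrightarrow\; a_k = -M \nabla_q \Omega(\Lambda(a_k, z) \mid z), \quad k = 1, 2, \\
\Lambda(a_1, z) = \Lambda(a_2, z) = q &\;\Longrightarrow\; a_1 = -M \nabla_q \Omega(q \mid z) = a_2 .
\end{align*}
```

Both sides of the displayed equality in the text are therefore equal to zero, and the index is pinned down by the choice probabilities alone.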
5.3. Competing Risk
Define

$$\Lambda(a, z) := G(a, z) \cdot \frac{\partial G(a, z)}{\partial a},$$

where as before $a = (a_1, \ldots, a_J)$ while $G(a, z)$ is now defined as the expected log failure time,

$$G(a, z) := E\left[\ln Y \mid a(W, X) = a, Z = z\right] = -E\left[\max_{j = 1, \ldots, J} \{-a_j + \varepsilon_j\} \,\middle|\, Z = z\right],$$

where the second equality uses eq. (2.5) and Assumption 5.1(i). The Williams-Daly-Zachary Theorem (McFadden, 1981) then implies that $\Pi$, now defined by (2.6), can be written in the form (1.1)-(1.2). Injectivity of $\Lambda(a, z)$, as given in eq. (5.11), is obtained by recycling the arguments of the previous subsection, except that no normalization of one of the causes of failure is required since the level $G(a, z)$ is included.
Corollary 5.2. Any competing risk model of the form (2.5) that satisfies Assumptions 3.1-3.4 and 5.1(i) is identified.
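The rewriting of the log duration that underlies the expected log failure time can be verified draw by draw: since $\ln T_j = a_j - \varepsilon_j$ by (2.5), we have $\ln Y = \min_j \ln T_j = -\max_j \{-a_j + \varepsilon_j\}$. The short simulation below checks this identity. The normal error distribution and the index values are arbitrary choices made for illustration, and the function name is hypothetical.

```python
import random


def log_duration_gap(a, n_draws=1000, seed=1):
    """Largest gap between min_j ln T_j and -max_j(-a_j + eps_j) over simulated draws."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_draws):
        eps = [rng.gauss(0.0, 1.0) for _ in a]            # error law is an arbitrary choice
        log_t = [aj - ej for aj, ej in zip(a, eps)]       # ln T_j = a_j - eps_j, eq. (2.5)
        log_y = min(log_t)                                # ln Y = min_j ln T_j
        rewritten = -max(ej - aj for aj, ej in zip(a, eps))
        worst = max(worst, abs(log_y - rewritten))
    return worst


gap = log_duration_gap([0.2, -0.5, 1.0])
```

Taking expectations conditional on $Z = z$ of the rewritten expression gives the second form of $G(a, z)$.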
(^1) Note we do not require a Hessian.
Fosgerau, M. and D. L. McFadden (2012, 3). A theory of the perturbed consumer with general budgets. NBER Working Paper, 1–27.
Fosgerau, M., E. Melo, A. de Palma, and M. Shum (2019, 12). Discrete Choice and Rational Inattention: A General Equivalence Result. SSRN Electronic Journal.
Fudenberg, D., R. Iijima, and T. Strzalecki (2015). Stochastic Choice and Revealed Perturbed Utility. Econometrica 83 (6), 2371–2409.
Heckman, J. J. and B. E. Honoré (1989, 6). The identifiability of the competing risks model. Biometrika 76 (2), 325–330.
Hofbauer, J. and W. H. Sandholm (2002). On the global convergence of stochastic fictitious play. Econometrica 70 (6), 2265–2294.
Honoré, B. E. and A. Lleras-Muney (2006, 11). Bounds in Competing Risks Models and the War on Cancer. Econometrica 74 (6), 1675–1698.
Kalbfleisch, J. D. and R. L. Prentice (1980). The statistical analysis of failure time data, Volume 2nd. of Wiley Series in probability and statistics. Hoboken, New Jersey: Wiley.
Khan, S. and E. Tamer (2010, 11). Irregular Identification, Support Conditions, and Inverse Weight Estimation. Econometrica 78 (6), 2021–2042.
Lee, S. and A. Lewbel (2013, 10). Nonparametric identification of accelerated failure time competing risks models. Econometric Theory 29 (05), 905–919.
Lewbel, A., W. Shen, and H. M. Zhang (2000, 7). Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables. Journal of Econometrics 97 (530), 145–177.
Maddala, G. S. (1986). Limited-dependent and qualitative variables in econometrics. Cambridge: Cambridge University Press.
Manski, C. F. (1975, 8). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3 (3), 205–228.
Matějka, F. and A. McKay (2015, 1). Rational Inattention to Discrete Choices: A New Foundation for the Multinomial Logit Model. American Economic Review 105 (1), 272–298.
Matzkin, R. L. (1993, 7). Nonparametric identification and estimation of polychotomous choice models. Journal of Econometrics 58 (1-2), 137–168.
McFadden, D. (1981). Econometric Models of Probabilistic Choice. In C. Manski and D. McFadden (Eds.), Structural Analysis of Discrete Data with Econometric Applications, pp. 198–272. Cambridge, MA, USA: MIT Press.
McFadden, D. L. (1973). Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, pp. 105–142. New York: Academic Press.
Rockafellar, R. T. (1970). Convex Analysis. Princeton, N.J.: Princeton University Press.
Sørensen, J. R.-V. and M. Fosgerau (2020). How McFadden met Rockafellar and learnt to do more with less.