



An introduction to the concentration index, a statistical tool used to quantify income-related inequality in health variables. The note explains how to compute the concentration index and obtain a standard error for it using grouped data. It also discusses the interpretation of the index and its applications in health equity analysis.
The concentration index [1-3] and related concentration curve (see Technical Note #6) provide a means of quantifying the degree of income-related inequality in a specific health variable. For example, it could be used to quantify the degree to which health subsidies are better targeted towards the poor in some countries than others [4], or the degree to which child mortality is more unequally distributed to the disadvantage of poor children in one country than another [5], or the extent to which inequalities in adult health are more pronounced in some countries than in others [6]. Many other applications are possible. This Note describes how to compute the concentration index, and how to obtain a standard error for it. Both the grouped-data and micro-data cases are considered.
The concentration index is defined with reference to the concentration curve (q.v.), which graphs on the x-axis the cumulative percentage of the sample, ranked by living standards, beginning with the poorest, and on the y-axis the cumulative percentage of the health variable corresponding to each cumulative percentage of the distribution of the living standard variable. Figure 1 provides an example of a concentration curve, where the health variable is ill-health, which in this example is higher amongst the poor than amongst the better-off. The concentration index is defined as twice the area between the concentration curve, L(p), and the line of equality (the 45° line running from the bottom-left corner to the top-right). So, in the case where there is no income-related inequality, the concentration index is zero. The convention is that the index takes a negative value when the curve lies above the line of equality, indicating disproportionate concentration of the health variable among the poor, and a positive value when it lies below the line of equality. If the health variable is a 'bad' such as ill health, a negative value of the concentration index means ill health is higher among the poor.
Figure 1: Ill-health concentration curve
[Figure: concentration curve L(p) plotted with the line of equality. x-axis: cumulative % of persons, ranked by economic status (0% to 100%); y-axis: cumulative % of ill health (0% to 100%).]
The concentration index, C, is easily computed in a spreadsheet program using the following formula [7]:

$$C = (p_1 L_2 - p_2 L_1) + (p_2 L_3 - p_3 L_2) + \dots + (p_{T-1} L_T - p_T L_{T-1}),$$
where p is the cumulative percent of the sample ranked by economic status, L ( p ) is the corresponding concentration curve ordinate, and T is the number of socioeconomic groups.
Table 1 provides a worked example. It shows the number of births in each wealth group over the period 1982-92 in India. Expressing these as percentages of the total number of births, and cumulating them gives the cumulative percentage of births, ordered by wealth. This is what is plotted on the x -axis in the concentration curve diagram and gives us p. (See the Technical Note on the concentration curve for the concentration curve graph for these data.) Also shown are the under-five mortality rates (U5MR) for each of five wealth groups. Multiplying the U5MR by the number of births gives the number of deaths in each wealth group. Expressing these as a percentage of the total number of deaths, and cumulating them, gives the cumulative percentage of deaths for the corresponding percentage of births. This is what is plotted on the y -axis in Figure 1, and gives us L ( p ). The final column shows the terms in brackets in the formula above, there being T -1 terms in total. The sum of these is –0.1694, which is the concentration index. The negative concentration index reflects the higher mortality rates amongst poorer children.
Table 1: Under-five deaths in India, 1982-92

Wealth group    No. of births   Rel. % births   Cumul. % births   U5MR per 1000   No. of deaths   Rel. % deaths   Cumul. % deaths   Conc. index
Poorest                 29939             23%               23%           154.7            4632             30%               30%           -0.
2nd                     28776             22%               45%           152.9            4400             29%               59%           -0.
Middle                  26528             20%               66%           119.5            3170             21%               79%           -0.
4th                     24689             19%               85%            86.9            2145             14%               93%           -0.
Richest                 19739             15%              100%            54.3            1072              7%              100%            0.
Total/average          129671                                             118.8           15419                                         -0.1694
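The worked example can also be reproduced outside a spreadsheet. The following is a minimal Python sketch, not part of the original note, that applies the formula above to the births and U5MR figures in Table 1. The function names `cumulative_shares` and `concentration_index_grouped` are illustrative choices.

```python
# Grouped-data concentration index from the Table 1 inputs.
births = [29939, 28776, 26528, 24689, 19739]   # poorest -> richest
u5mr = [154.7, 152.9, 119.5, 86.9, 54.3]       # deaths per 1000 births


def cumulative_shares(values):
    """Return cumulative shares of the total, ending at 1.0."""
    total = sum(values)
    shares, running = [], 0.0
    for v in values:
        running += v
        shares.append(running / total)
    return shares


def concentration_index_grouped(p, L):
    """Sum the bracketed terms (p_t L_{t+1} - p_{t+1} L_t) for t = 1..T-1."""
    terms = [p[t] * L[t + 1] - p[t + 1] * L[t] for t in range(len(p) - 1)]
    return sum(terms), terms


# Deaths per group = U5MR x births / 1000.
deaths = [b * r / 1000.0 for b, r in zip(births, u5mr)]
p = cumulative_shares(births)   # x-axis: cumulative share of births
L = cumulative_shares(deaths)   # y-axis: cumulative share of deaths
C, terms = concentration_index_grouped(p, L)
print(round(C, 4))              # should agree with the index reported in the text
```

Each entry of `terms` corresponds to one bracketed term of the formula, so the list can be compared against the final column of the table.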
A standard error can be computed for C in the grouped-data case using a formula given in Kakwani et al. [2]. Let n denote the sample size, T the number of groups, $f_t$ the proportion of the sample in the t-th group, $\mu_t$ the mean value of the health variable amongst the t-th group, and C the concentration index. Let $R_t$ be the fractional rank of the t-th group, defined as

$$R_t = \sum_{\gamma=1}^{t-1} f_\gamma + \frac{1}{2} f_t,$$
and hence indicating the cumulative proportion of the population up to the midpoint of each group interval. The variance of C is given by eqn (14) in Kakwani et al.:
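$$\operatorname{var}(C) = \frac{1}{n}\left[\sum_{t=1}^{T} f_t a_t^2 - (1 + C)^2\right] + \frac{1}{n\mu^2}\sum_{t=1}^{T} f_t \sigma_t^2 (2R_t - 1 - C)^2,$$

where $\sigma_t^2$ is the variance of $\mu_t$, $\mu$ is the overall mean of the health variable,

$$a_t = \frac{\mu_t}{\mu}(2R_t - 1 - C) + 2 - q_{t-1} - q_t,$$

and

$$q_t = \frac{1}{\mu}\sum_{\gamma=1}^{t} \mu_\gamma f_\gamma$$

is the ordinate of the concentration curve L(p), with $q_0 = 0$.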
Case where variances of the group means are unknown
In many applications, the standard errors of the group means will be unknown. For example, the data might have been obtained from published tabulations by income quintile. In such a case, the second term in the expression for the variance of C will necessarily be assumed to be equal to zero. However, in addition, one needs to replace n by T in the denominator of the first term, since there are in effect only T observations, not n.
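As an illustration of the grouped-data variance, here is a hedged Python sketch. It assumes the usual Kakwani et al. form of the variance, with $a_t = (\mu_t/\mu)(2R_t - 1 - C) + 2 - q_{t-1} - q_t$ and $q_t$ the concentration curve ordinate. The function name `grouped_ci_variance` and its arguments are hypothetical. When `sigma2` is omitted, the second term is dropped and the first term is divided by T rather than n, as described above.

```python
import math


def grouped_ci_variance(f, mu, n=None, sigma2=None):
    """Return (C, var(C)) from group shares f and group means mu, poorest first.

    sigma2: variances of the group means, or None when they are unknown.
    n: sample size, needed only when sigma2 is supplied.
    """
    T = len(f)
    mean = sum(ft * mt for ft, mt in zip(f, mu))

    # Cumulative shares p_0 = 0, ..., p_T and curve ordinates q_0 = 0, ..., q_T.
    p, q = [0.0], [0.0]
    for ft, mt in zip(f, mu):
        p.append(p[-1] + ft)
        q.append(q[-1] + ft * mt / mean)

    # Fractional rank: share below the group plus half of the group's own share.
    R = [p[t] + 0.5 * f[t] for t in range(T)]

    # Concentration index from the grouped-data formula.
    C = sum(p[t] * q[t + 1] - p[t + 1] * q[t] for t in range(T))

    a = [mu[t] / mean * (2 * R[t] - 1 - C) + 2 - q[t] - q[t + 1] for t in range(T)]
    first = sum(f[t] * a[t] ** 2 for t in range(T)) - (1 + C) ** 2

    if sigma2 is None:
        # Unknown variances: second term set to zero, T replaces n.
        return C, first / T
    second = sum(f[t] * sigma2[t] * (2 * R[t] - 1 - C) ** 2 for t in range(T))
    return C, first / n + second / (n * mean ** 2)


# Table 1 inputs, treating the variances of the group means as unknown.
births = [29939, 28776, 26528, 24689, 19739]
u5mr = [154.7, 152.9, 119.5, 86.9, 54.3]
shares = [b / sum(births) for b in births]
C, var_C = grouped_ci_variance(shares, u5mr)
print(round(C, 4), math.sqrt(var_C))
```

With only five groups the resulting standard error is necessarily rough, which is the price of having T rather than n effective observations.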
Data from the 1998 Vietnam Living Standards Survey (VLSS) are used to measure inequality in malnutrition between poor and better-off children. Malnutrition is measured by the child's height-for-age percentile score (HAP) in a hypothetical population of well-nourished children assembled by the US National Center for Health Statistics (NCHS). Thus a score of 50 means the child in question is at the median height-for-age in the well-nourished reference population. We rank children by per capita household consumption (PCCONS). Initially, the commands below use sample weights (WT), as the 1998 VLSS is not nationally representative without them. These weights, or expansion factors, indicate the number of people in Vietnam that each observation represents.
The concentration index ( C ) can be computed very simply by making use of the “convenient covariance” result [8-10]:
$$C = \frac{2\,\operatorname{cov}(y_i, R_i)}{\mu},$$

where y is the health variable whose inequality is being measured, μ is its mean, $R_i$ is the i-th individual's fractional rank in the socioeconomic distribution (e.g. the person's rank in the income distribution), and cov(.,.) is the covariance. Where the data are weighted, a weighted covariance needs to be computed, and a weighted fractional rank needs to be generated [10].
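The covariance result translates directly into code. The sketch below is one possible Python reading of it, under stated assumptions: the weighted fractional rank is taken as the weight share below an individual plus half of that individual's own share, ties in the ranking variable are broken by input order, and the name `concentration_index_cov` is illustrative.

```python
def concentration_index_cov(y, rank_var, w=None):
    """C = 2 * cov_w(y, R) / mean_w(y), with R the weighted fractional rank."""
    n = len(y)
    w = [1.0] * n if w is None else [float(x) for x in w]
    total_w = sum(w)

    # Sort by the living-standards variable, poorest first (ties keep input order).
    order = sorted(range(n), key=lambda i: rank_var[i])

    # Weighted fractional rank: weight share below i plus half of i's own share.
    R = [0.0] * n
    below = 0.0
    for i in order:
        share = w[i] / total_w
        R[i] = below + 0.5 * share
        below += share

    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    mean_R = sum(wi * Ri for wi, Ri in zip(w, R)) / total_w
    cov = sum(
        wi * (yi - mean_y) * (Ri - mean_R) for wi, yi, Ri in zip(w, y, R)
    ) / total_w
    return 2.0 * cov / mean_y


# Toy data: all of the health variable is held by the richest of four people.
print(concentration_index_cov([0, 0, 0, 1], [10, 20, 30, 40]))  # 0.75
```

In the toy example the index equals 1 - 1/n, the largest value attainable with n equally weighted individuals. Collapsing the three poorest people into one observation with weight 3 gives the same answer, which is a quick check that the weighting is handled consistently.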
Stata commands for computing the concentration index
The command GLCURVE (a program downloadable from the Stata website) can be used to generate the fractional rank in the distribution of income or whatever measure of living standards is being used. This can be used for weighted data. The COR command (weighted if necessary), along with the means and covariance options, can then be used to obtain the mean of the health variable and the covariance between it and the fractional rank variable. In the malnutrition example, the GLCURVE command generates the fractional rank variable CONRNK from the PCCONS variable. The COR command then calculates the mean of the HAP variable and the covariance between the fractional rank variable CONRNK and HAP.
glcurve pccons [fw=wt], pvar(conrnk)
cor conrnk hap [fw=wt], c m
The covariance between HAP and CONRNK is 1.1505 and the mean of HAP is 14.024 (meaning the average Vietnamese child is only at the fourteenth percentile in the reference population). This gives a concentration index of 2 × 1.1505 / 14.024 = 0.1641, i.e. a tendency for better-off children in Vietnam to be taller (and better nourished) than poor children.
SPSS commands for computing the concentration index
The fractional rank variable can be computed by the RANK command. The CORRELATIONS command with the covariance option can be used to obtain the covariance between the health variable and the fractional rank variable. The DESCRIPTIVES command can then be used to calculate the mean of the health variable. All these commands need to be preceded by the WEIGHT command if the sample is weighted. The SPSS syntax below is for the malnutrition example.
WEIGHT BY wt.
RANK VARIABLES=pccons (A) /RFRACTION into RNKCON /PRINT=YES /TIES=MEAN.
CORRELATIONS /VARIABLES=rnkcon hap /STATISTICS XPROD /MISSING=PAIRWISE.
DESCRIPTIVES VARIABLES=hap rnkcon /STATISTICS=MEAN.
There are two ways to compute the standard error of C with micro-data. The second "convenient regression" method is easier to implement, and seems likely to be at least as precise. It also has the advantage of yielding an estimate of the concentration index itself. Neither, however, is appropriate with weighted data. In the example, we have assumed for illustrative purposes that the VLSS data are self-weighting. The value of C obtained ignoring the weighted character of the data is 0.1731.
The formula method
The first is to use the formula given in eqn (22) in Kakwani et al. [2]:
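$$\operatorname{var}(C) = \frac{1}{n}\left[\frac{1}{n}\sum_{i=1}^{n} a_i^2 - (1 + C)^2\right],$$

where

$$a_i = \frac{y_i}{\mu}(2R_i - 1 - C) + 2 - q_{i-1} - q_i$$

and

$$q_i = \frac{1}{\mu n}\sum_{\gamma=1}^{i} y_\gamma$$

is the ordinate of the concentration curve L(p), and $q_0 = 0$.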
This is easily computed in Stata with the following commands, which are for the malnutrition example.
glcurve hap, glvar(glhap) sortvar(lnpcexp) pvar(incrnk)
egen meany = mean(hap)
gen ccurve = glhap / meany
sort ccurve
gen cclag = ccurve[_n - 1]
gen a = (hap/meany) * (2*incrnk-1-.173122) + 2 - cclag - ccurve
gen asq = a^2
sum asq
The GLCURVE command generates GLHAP, which, divided through by the mean of the health variable HAP, gives the concentration curve ordinate CCURVE (the analogue of q or L(p)). The next two commands generate the lagged value of L(p), or $q_{i-1}$. Inserting the estimated value of C in the next command generates the variable a. The mean of $a^2$ is then obtained, which can then be used to compute var(C) manually using the formula above. In the VLSS example, the mean of $a^2$ is equal to 2.1741, which gives a value of se(C) equal to 0.0124.
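For readers without Stata, the same steps can be sketched in Python. This is an illustrative, unweighted implementation rather than a port of GLCURVE: the fractional rank is assumed to be (i - 0.5)/n for the i-th poorest individual, C is estimated inside the function instead of being typed in by hand, and the name `ci_standard_error` is hypothetical.

```python
import math


def ci_standard_error(y, rank_var):
    """Unweighted concentration index and its standard error via the a_i formula."""
    n = len(y)
    order = sorted(range(n), key=lambda i: rank_var[i])
    ys = [float(y[i]) for i in order]          # health variable, poorest first
    mean_y = sum(ys) / n

    # Assumed fractional rank (i - 0.5)/n for i = 1..n; its mean is exactly 0.5.
    R = [(i + 0.5) / n for i in range(n)]

    # C from the convenient covariance result.
    cov = sum((yi - mean_y) * (Ri - 0.5) for yi, Ri in zip(ys, R)) / n
    C = 2.0 * cov / mean_y

    q_prev, sum_a_sq = 0.0, 0.0                # q_0 = 0
    for yi, Ri in zip(ys, R):
        q_i = q_prev + yi / (mean_y * n)       # concentration curve ordinate
        a_i = (yi / mean_y) * (2 * Ri - 1 - C) + 2 - q_prev - q_i
        sum_a_sq += a_i ** 2
        q_prev = q_i

    var_C = (sum_a_sq / n - (1 + C) ** 2) / n
    return C, math.sqrt(max(var_C, 0.0))       # guard against tiny negative rounding


C_toy, se_toy = ci_standard_error([0, 0, 0, 1], [10, 20, 30, 40])
print(C_toy, se_toy)
```

Unlike the Stata listing, the lagged ordinate for the first observation is set to $q_0 = 0$ explicitly rather than left missing.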
The convenient regression method
The “convenient covariance” result above can be used to define a convenient regression for the concentration index [2], equal to
$$2\sigma_R^2\left[\frac{y_i}{\mu}\right] = \alpha + \beta R_i + u_i,$$
where $\sigma_R^2$ is the variance of the fractional rank variable. The estimate of β is equal to the concentration index, C. Estimating this equation is an alternative to (but equivalent to) the convenient covariance method. It also gives rise to an alternative interpretation of the concentration index as the slope of a line passing through the heads of a parade of people, ranked by their consumption or SES, with their height proportional to the value of their health variable, expressed as a fraction of the mean. The standard error of β provides an estimate of the standard error of C, but is inaccurate since the nature of the fractional rank variable induces a particular pattern of autocorrelation in the data. The formula above gets round this, but an alternative is to use the Newey-West [11] regression estimator, which corrects for autocorrelation, as well as any heteroscedasticity. The commands below implement this for the malnutrition example.
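The equivalence between the regression slope and the covariance formula can be checked numerically. The Python sketch below fits the convenient regression by ordinary least squares on unweighted data and returns the intercept and slope. It deliberately stops short of the Newey-West correction, so it illustrates only the point estimate, not the corrected standard error. The function name `convenient_regression` and the rank definition (i - 0.5)/n are assumptions made for illustration.

```python
def convenient_regression(y, rank_var):
    """OLS of 2*var(R)*y_i/mean(y) on the fractional rank R_i; returns (alpha, beta)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: rank_var[i])
    ys = [float(y[i]) for i in order]          # health variable, poorest first
    mean_y = sum(ys) / n

    R = [(i + 0.5) / n for i in range(n)]      # assumed fractional rank
    mean_R = sum(R) / n
    var_R = sum((r - mean_R) ** 2 for r in R) / n

    # Left-hand side of the convenient regression.
    lhs = [2.0 * var_R * yi / mean_y for yi in ys]
    mean_lhs = sum(lhs) / n

    # Simple-regression OLS: slope = cov(lhs, R) / var(R).
    cov = sum((l - mean_lhs) * (r - mean_R) for l, r in zip(lhs, R)) / n
    beta = cov / var_R
    alpha = mean_lhs - beta * mean_R
    return alpha, beta


alpha, beta = convenient_regression([0, 0, 0, 1], [10, 20, 30, 40])
print(beta)   # slope, equal to 2 * cov(y, R) / mean(y)
```

Because the left-hand side is scaled by $2\sigma_R^2$, the variance of the rank cancels in the slope, which is why β reduces to the covariance expression for C.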