Abstract

'Classical' econometric theory assumes that observed data come from a stationary process, where means and variances are constant over time. Graphs of economic time series, and the historical record of economic forecasting, reveal the invalidity of such an assumption. Consequently, we discuss the importance of stationarity for empirical modeling and inference; describe the effects of incorrectly assuming stationarity; explain the basic concepts of non-stationarity; note some sources of non-stationarity; formulate a class of non-stationary processes (autoregressions with unit roots) that seem empirically relevant for analyzing economic time series; and show when an analysis can be transformed by means of differencing and cointegrating combinations so that stationarity becomes a reasonable assumption. We then describe how to test for unit roots and cointegration. Monte Carlo simulations and empirical examples illustrate the analysis.
Much of 'classical' econometric theory has been predicated on the assumption that the observed data come from a stationary process, meaning a process whose means and variances are constant over time. A glance at graphs of most economic time series, or at the historical track record of economic forecasting, suffices to reveal the invalidity of that assumption: economies evolve, grow, and change over time in both real and nominal terms, sometimes dramatically – and economic forecasts are often badly wrong, although that should occur relatively infrequently in a stationary process. Figure 1 shows some 'representative' time series to emphasize this point.[1]

The first panel (denoted a) reports the time series of broad money in the UK over 1868–1993 on a log scale, together with the corresponding price series (the UK data for 1868–1975 are from Friedman and Schwartz, 1982, extended to 1993 by Attfield, Demery and Duck, 1995). From elementary calculus, since ∂log y/∂y = 1/y, the log scale shows proportional changes: hence, the apparently small movements between the minor tic marks actually represent approximately 50% changes. Panel b shows real (constant-price) money on a log scale, together with real (constant-price) output, also on a log scale: the wider spacing of the tic marks reveals a much smaller range of variation but, again, the notion of a constant mean seems untenable. Panel c records long-run and short-run interest rates in natural scale, highlighting changes over time in the variability of economic series, as well as in their means, and indeed, of
* Financial support from the U.K. Economic and Social Research Council under grant R000234954, and from the Danish Social Sciences Research Council, is gratefully acknowledged. We are pleased to thank Campbell Watkins for helpful comments on, and discussion of, earlier drafts.
[1] Blocks of four graphs are lettered notionally as a, b; c, d in rows from the top left; six graphs are a, b, c; d, e, f; and so on.
Figure 1: Some 'representative' time series (panel a: UK nominal money and UK price level, 1875–2000, log scale; panel b: UK real money and UK real output; panel c: UK short-term and long-term interest rates; panel d: US real output and US real money; panel e: US long-term interest rate; panel f: UK and US annual inflation)
the relationships between them: all is quiet till 1929, then the two variables diverge markedly till the early 1950s, and thereafter rise together with considerable variation. Nor is the non-stationarity problem specific to the UK: panels d and e show the graphs comparative to b and c for the USA, using post-war quarterly data (from Baba, Hendry and Starr, 1992). Again, there is considerable evidence of change, although the last panel f, comparing UK and US annual inflation rates, suggests the UK may exhibit greater instability. It is hard to imagine any 'revamping' of the statistical assumptions such that these outcomes could be construed as drawings from stationary processes.[2]

Intermittent episodes of forecast failure (a significant deterioration in forecast performance relative to the anticipated outcome) confirm that economic data are not stationary: even poor models of stationary data would forecast on average as accurately as they fitted, yet that manifestly does not occur empirically. The practical problem facing econometricians is not a plethora of congruent models from which to choose, but to find any relationships that survive long enough to be useful. It seems clear that stationarity assumptions must be jettisoned for most observable economic time series. Four issues immediately arise:
Figure 2: An artificial variable with a stochastic trend
the level of any variable with a stochastic trend will 'inherit' that non-stationarity, and transmit it to other variables in turn: nominal wealth and exports spring to mind, and therefore income and expenditure, and so employment, wages, etc. Similar consequences follow for every source of stochastic trends, so the linkages in economies suggest that the levels of many variables will be non-stationary, sharing a set of common stochastic trends.

A non-stationary process is, by definition, one which violates the stationarity requirement, so its means and variances are non-constant over time. For example, a variable exhibiting a shift in its mean is a non-stationary process, as is a variable with a heteroscedastic variance over time. We will focus here on the non-stationarity caused by stochastic trends, and discuss its implications for empirical modeling.

To introduce the basic econometric concepts, we consider a simple regression model for a variable y_t containing a fixed (or deterministic) linear trend with slope β, generated from an initial value y_0 by:
y_t = y_0 + βt + u_t for t = 1, ..., T. (1)
To make the example more realistic, the error term u_t is allowed to be a first-order autoregressive process:
u_t = ρu_{t-1} + ε_t. (2)
That is, the current value of the variable u_t is affected by its value in the immediately preceding period (u_{t-1}) with a coefficient ρ, and by a stochastic 'shock' ε_t. We discuss the meaning of the autoregressive parameter ρ below. The stochastic 'shock' ε_t is distributed as IN[0, σ_ε²], denoting an independent (I), normal (N) distribution with a mean of zero (E[ε_t] = 0) and a variance V[ε_t] = σ_ε²: since these are constant parameters, an identical distribution holds at every point in time. A process such as {ε_t} is often called a normal 'white-noise' process. Of the three desirable properties of independence, identical distributions, and normality, the first two are clearly the most important. The process u_t in (2) is not independent, so we first consider its properties. In the following, we will use the notation u_t for an autocorrelated process, and ε_t for a white-noise process. The assumption of only first-order autocorrelation in u_t, as shown in (2), is for notational simplicity, and all arguments generalize to higher-order autoregressions. Throughout the paper, we will use lower-case letters to indicate logarithmic scale, so x_t = log(X_t). Dynamic processes are most easily studied using the lag operator L (see e.g., Hendry, 1995, chapter 4), such that Lx_t = x_{t-1}. Then, (2) can be written as u_t = ρLu_t + ε_t or:
u_t = ε_t/(1 − ρL). (3)
When |ρ| < 1, the term 1/(1 − ρL) in (3) can be expanded as (1 + ρL + ρ²L² + ···). Hence:
u_t = ε_t + ρε_{t-1} + ρ²ε_{t-2} + ···. (4)
Expression (4) can also be derived after repeated substitution in (2). It shows that u_t is the sum of all previous disturbances (shocks) ε_{t-i}, but that the effects of earlier disturbances decline with time because |ρ| < 1. However, now think of (4) as a process directly determining u_t – ignoring our derivation from (2) – and consider what happens when ρ = 1. In that case, u_t = ε_t + ε_{t-1} + ε_{t-2} + ···, so each disturbance persists indefinitely and has a permanent effect on u_t. Consequently, we say that u_t has the 'stochastic trend' Σ_{i=1}^t ε_i. The difference between a linear stochastic trend and a deterministic trend is that the increments of a stochastic trend are random, whereas those of a deterministic trend are constant over time, as Figure 2 illustrated. From (4), we notice that ρ = 1 is equivalent to the summation of the errors. In continuous time, summation corresponds to integration, so such processes are also called integrated, here of first order: we use the shorthand notation u_t ∼ I(1) when ρ = 1, and u_t ∼ I(0) when |ρ| < 1. From (4), when |ρ| < 1, we can derive the properties of u_t as:
E[u_t] = 0 and V[u_t] = σ_ε²/(1 − ρ²). (5)
Hence, the larger the value of ρ, the larger the variance of u_t. When ρ = 1, the variance of u_t becomes indeterminate and u_t becomes a random walk. Interpreted as a polynomial in L, (3) has a factor of 1 − ρL, which has a root of 1/ρ: when ρ = 1, (2) is called a unit-root process. While there may appear to be many names for the same notion, extensions yield important distinctions: for example, longer lags in (2) preclude u_t being a random walk, and processes can be integrated of order 2 (i.e., I(2)), so have several unit roots. Returning to the trend regression example, by substituting (2) into (1) we get:
y_t = y_0 + βt + ε_t/(1 − ρL)

and, multiplying through by the factor (1 − ρL):

(1 − ρL)y_t = (1 − ρL)βt + (1 − ρL)y_0 + ε_t. (6)
From (6), it is easy to see why the non-stationary process which results when ρ = 1 is often called a unit-root process, and why an autoregressive error imposes common-factor dynamics on a static regression model.
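As an aside for readers who wish to experiment, the following minimal Python sketch (our illustration; the seed, sample size, and ρ values are arbitrary, not from the paper) simulates the AR(1) process (2) and checks the variance formula (5), contrasting |ρ| < 1 with the unit-root case ρ = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma_eps = 10_000, 1.0

def ar1(rho, T, rng):
    """Simulate u_t = rho*u_{t-1} + eps_t from u_0 = 0, as in (2)."""
    eps = rng.normal(0.0, sigma_eps, T)
    u = np.empty(T)
    u[0] = eps[0]
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
    return u

# Stationary cases: the sample variance settles near sigma_eps^2/(1 - rho^2).
for rho in (0.5, 0.9):
    u = ar1(rho, T, rng)
    print(f"rho={rho}: sample var = {u.var():.2f}, "
          f"theory = {sigma_eps**2 / (1 - rho**2):.2f}")

# Unit-root case: no settling down -- the variance keeps growing with t,
# so the first and second halves of one realization differ markedly.
u1 = ar1(1.0, T, rng)
print(f"rho=1: var(first half) = {u1[:T//2].var():.1f}, "
      f"var(second half) = {u1[T//2:].var():.1f}")
```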
If changes to long-term interest rates (R_l) were predictable, and R_l > R_s (the short-term rate)
R_{l,t} = R_{l,t-1} + ε_t (10)
where E_{t-1}[ε_t | I_{t-1}] = 0 and ε_t is an ID[0, σ_ε²] process (where D denotes the relevant distribution, which need not be normal). The model in (10) has a unit coefficient on R_{l,t-1} and, as a dynamic relation, is a unit-root process. To discuss the implications for empirical modeling of having unit roots in the data, we first need to discuss the statistical properties of stationary and non-stationary processes.[5]
Figure 3: Monthly long-term interest rates in the USA, in levels and differences, 1950–1993
Equation (10) shows that the whole of R_{l,t-1} and ε_t influence R_{l,t}, and hence, in the next period, the whole of R_{l,t} influences R_{l,t+1}, and so on. Thus, the effect of ε_t persists indefinitely, and past
[5] Empirically, for the monthly data over 1950(1)–1993(12) on 20-year bond rates in the USA shown in Figure 3, the estimated coefficient of R_{l,t-1} in (10) is 0.994 with an estimated standard error of 0.004.
errors accumulate with no ‘depreciation’, so an equivalent formulation of (10) is:
R_{l,t} = ε_t + ε_{t-1} + ··· + ε_1 + ε_0 + ε_{-1} + ··· (11)
or alternatively:
R_{l,t} = ε_t + ε_{t-1} + ··· + ε_1 + R_{l,0} (12)
where the initial value R_{l,0} = ε_0 + ε_{-1} + ··· contains all information about the past behavior of the long-term interest rate up to time 0. In practical applications, time 0 corresponds to the first observation in the sample. Equation (11) shows that, theoretically, the unit-root assumption implies an ever-increasing variance of the time series (around a fixed mean), violating the constant-variance assumption of a stationary process. In empirical studies, the conditional model (12) is more relevant as a description of the sample variation, and shows that {R_{l,t} | R_{l,0}} has a finite variance, tσ², but this variance is non-constant since it changes with t = 1, ..., T. Cumulating random errors makes them smooth and, in fact, induces properties like those of economic variables, as first discussed by Working (1934) (so R_{l,t} should be smooth, at least in comparison to its first difference, and it is, as illustrated in Figure 3, panels a and b). From (12), taking R_{l,0} as a fixed number, one can see that:
E[R_{l,t}] = R_{l,0} (13)
and that:
V[R_{l,t}] = σ²t. (14)
Further, perhaps not so easily seen, when t > s, the covariance between drawings t − s periods apart is:
C[R_{l,t}, R_{l,t-s}] = E[(R_{l,t} − R_{l,0})(R_{l,t-s} − R_{l,0})] = σ²(t − s) (15)
and so:
corr²[R_{l,t}, R_{l,t-s}] = 1 − s/t. (16)
Consequently, when t is large, all the serial correlations for a random-walk process are close to unity, a feature of R_{l,t} as illustrated in Figure 3, panel e. Finally, even if {R_{l,t}} is the sum of a large number of errors, it will not be approximately normally distributed. This is because each observation, R_{l,t} | R_{l,0}, t = 1, ..., T, has a different variance. Figure 3, panel c shows the histogram of approximately 80 quarterly observations, for which the observed distribution is bimodal, and so does not look even approximately normal. We will comment on the properties of ∆R_{l,t} below. To summarize, the variance of a unit-root process increases over time, and successive observations are highly interdependent. The theoretical mean of the conditional process R_{l,t} | R_{l,0} is constant and equal to R_{l,0}. However, the theoretical mean and the empirical sample mean R̄_l are not even approximately equal when data are non-stationary (surprisingly, the sample mean divided by
is distributed as N[0, 1] in large samples: see Hendry, 1995, chapter 3).
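The moments (13)–(16) are easy to verify by simulation. The sketch below (ours, with arbitrary settings) generates many independent random walks and compares the cross-replication variance and serial correlation with the theoretical values σ²t and √(1 − s/t):

```python
import numpy as np

rng = np.random.default_rng(42)
M, T, sigma = 20_000, 400, 1.0      # replications, length, shock std. dev.

# Each row is one random walk R_t = R_{t-1} + eps_t with R_0 = 0, as in (10).
R = rng.normal(0.0, sigma, (M, T)).cumsum(axis=1)

t, s = 400, 100
print("V[R_t] : sample", round(R[:, t - 1].var(), 1), "theory", sigma**2 * t)
r = np.corrcoef(R[:, t - 1], R[:, t - s - 1])[0, 1]
print("corr   : sample", round(r, 3), "theory", round(np.sqrt(1 - s / t), 3))
```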
We now turn to the properties of a stationary process. As argued above, most economic time series are non-stationary, and at best become stationary only after differencing. Therefore, we will from the outset discuss stationarity either for a differenced variable {∆y_t} or for the IID errors {ε_t}.
Yule found that r_{xy} was almost normally distributed in case A, but became nearly uniformly distributed (except at the end points) in B. He was startled to discover that r_{xy} had a U-shaped distribution in C, so the correct null hypothesis (of no relation between x and y) was virtually certain to be rejected in favor of a near-perfect positive or negative link. Consequently, it seemed as if inference could go badly wrong once the data were non-stationary. Today, his three types of series are called integrated of orders zero, one, and two respectively (I(0), I(1), and I(2) as above). Differencing an I(1) series delivers an I(0) series, and so on. Section 5 replicates a simulation experiment that Yule undertook.

Yule's message acted as a significant discouragement to time-series work in economics, but gradually its impact faded. However, Granger and Newbold (1974) highlighted that a good fit with significant serial correlation in the residuals was a symptom associated with nonsense regressions. Hendry (1980) constructed a nonsense regression by using cumulative rainfall to provide a better explanation of price inflation in the UK than did the money stock. A technical analysis of the sources and symptoms of the nonsense-regressions problem was finally presented by Phillips (1986).

As economic variables have trended over time since the Industrial Revolution, ensuring non-stationarity, empirical economists usually made careful adjustments for factors such as population growth and changes in the price level. Moreover, they often worked with the logarithms of data (to ensure positive outcomes and models with constant elasticities), and thereby implicitly assumed constant proportional relations between non-stationary variables. For example, if β_1 ≠ 0 and u_t is stationary in the regression equation:
y_t = β_0 + β_1 x_t + u_t (18)
then y_t and x_t must contain the same stochastic trend, since otherwise u_t could not be stationary. Assume that y_t is aggregate consumption, x_t is aggregate income, and the latter is a random walk, i.e., x_t = Σε_i + x_0.[7] If aggregate income is linearly related to aggregate consumption in a causal way, then y_t would 'inherit' the non-stationarity from x_t, and u_t would be stationary unless there were other non-stationary variables than income causing consumption.

Assume now that, as before, y_t is non-stationary, but it is not caused by x_t and instead is determined by another non-stationary variable, say, z_t = Σν_i + z_0, unrelated to x_t. In this case, β_1 = 0 in (18) is the correct hypothesis, and hence what one would like to accept in statistical tests. Yule's problem in case B can be seen clearly: if β_1 were zero in (18), then y_t = β_0 + u_t; i.e., u_t contains Σν_i, so is non-stationary and, therefore, inconsistent with the stationarity assumption of the regression model. Thus, one cannot conduct standard tests of the hypothesis that β_1 = 0 in such a setting. Indeed, u_t being autocorrelated in (18), with u_t being non-stationary as the extreme case, is what induces the non-standard distributions of r_{xy}.

Nevertheless, Sargan (1964) linked static-equilibrium economic theory to dynamic empirical models by embedding (18) in an autoregressive-distributed lag model:
y_t = b_0 + b_1 y_{t-1} + b_2 x_t + b_3 x_{t-1} + ε_t. (19)
The dynamic model (19) can also be formulated in the so-called equilibrium-correction form by subtracting y_{t-1} from both sides and subtracting and adding b_2 x_{t-1} on the right-hand side of (19):
∆y_t = α_0 + α_1 ∆x_t − α_2 (y_{t-1} − β_1 x_{t-1} − β_0) + ε_t (20)
where α_1 = b_2, α_2 = (1 − b_1), β_1 = (b_2 + b_3)/(1 − b_1), and α_0 + α_2 β_0 = b_0. Thus, all coefficients in (20) can be derived from (19). Models such as (20) explain growth rates in y_t by the growth in x_t and the
[7] Recall that lower-case letters denote logarithms.
past disequilibrium between the levels. Think of a situation where consumption changes (∆y_t) as a result of a change in income (∆x_t), but also as a result of the previous period's consumption not being in equilibrium (i.e., y_{t-1} ≠ β_0 + β_1 x_{t-1}). For example, if previous consumption was too high, it has to be corrected downwards, and if it was too low, it has to be corrected upwards. The magnitude of the past disequilibrium is measured by (y_{t-1} − β_1 x_{t-1} − β_0), and the speed of adjustment towards this steady state by α_2. Notice that when ε_t, ∆y_t and ∆x_t are I(0), there are two possibilities:
The latter case can be seen from (20). When α_2 = 0, then ∆y_t = α_0 + α_1 ∆x_t + ε_t, and by integrating we get y_t = α_1 x_t + Σε_i (ignoring deterministic terms). Notice also that, by subtracting β_1 ∆x_t from both sides of (20) and collecting terms, we can derive the properties of the equilibrium error u_t = (y − β_1 x − β_0)_t:
(y − β_1 x − β_0)_t = α_0 + (α_1 − β_1)∆x_t + (1 − α_2)(y − β_1 x − β_0)_{t-1} + ε_t (21)
or:
u_t = ρu_{t-1} + α_0 + (α_1 − β_1)∆x_t + ε_t.
where ρ = (1 − α_2).[8] Thus, the equilibrium error is an autocorrelated process; the higher the value of ρ (equivalently, the smaller the value of α_2), the slower is the adjustment back to equilibrium, and the longer it takes for an equilibrium error to disappear. If α_2 = 0, there is no adjustment, and y_t does not return to any equilibrium value, but drifts as a non-stationary variable. To summarize: when α_2 ≠ 0 (so ρ ≠ 1), the 'equilibrium error' u_t = (y − β_1 x − β_0)_t is a stationary autoregressive process.

I(1) 'nonsense-regressions' problems will disappear in (20) because ∆y_t and ∆x_t are I(0) and, therefore, no longer trending. Standard t-statistics will be 'sensibly' distributed (assuming that ε_t is IID), irrespective of whether the past equilibrium error, u_{t-1}, is stationary or not.[9] This is because a stationary variable, ∆y_t, cannot be explained by a non-stationary variable, and α̂_2 ≈ 0 if u_{t-1} ∼ I(1). Conversely, when u_{t-1} ∼ I(0), then α̂_2 measures the speed of adjustment with which ∆y_t adjusts (corrects) towards each new equilibrium position.

Based on equations like (20), Hendry and Anderson (1977) noted that 'there are ways to achieve stationarity other than blanket differencing', and argued that terms like u_{t-1} would often be stationary even when the individual series were not. More formally, Davidson, Hendry, Srba and Yeo (1978) introduced a class of models based on (20) which they called 'error-correction' mechanisms (denoted ECMs). To understand the status of equations like (20), Granger (1981) introduced the concept of cointegration, in which a genuine relation exists despite the non-stationary nature of the original data, thereby introducing the obverse of nonsense regressions. Further evidence that many economic time series were better construed as non-stationary than stationary was presented by Nelson and Plosser (1982), who tested for the presence of unit roots and could not reject that hypothesis. Closing this circle, Engle and Granger (1987) proved that ECMs and cointegration were actually two names for the same thing.
[8] The change in the equilibrium y_t = β_0 + β_1 x_t is ∆y_t = β_1 ∆x_t, so these variables must have steady-state growth rates g_y and g_x related by g_y = β_1 g_x. But from (20), g_y = α_0 + α_1 g_x; hence we can derive that α_0 = −(α_1 − β_1) g_x, as occurs in (21).
[9] The resulting distribution is not actually a t-statistic as proposed by Student (1908): Section 6 shows that it depends in part on the Dickey–Fuller distribution. However, it is well behaved, unlike the 'nonsense regressions' case.
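A simulated example may help fix ideas. The sketch below (ours; all parameter values are assumptions for illustration, not estimates from the paper) generates a cointegrated pair via the equilibrium-correction form (20), then recovers the coefficients by OLS on ∆y_t:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
beta0, beta1, alpha0, alpha1, alpha2 = 0.0, 1.0, 0.0, 0.5, 0.3  # assumed

x = rng.normal(size=T).cumsum()              # x_t ~ I(1): a pure random walk
y = np.zeros(T)
for t in range(1, T):
    u_lag = y[t - 1] - beta0 - beta1 * x[t - 1]      # equilibrium error
    y[t] = y[t - 1] + alpha0 + alpha1 * (x[t] - x[t - 1]) \
           - alpha2 * u_lag + 0.1 * rng.normal()

# Estimate (20): regress dy_t on a constant, dx_t and the lagged error.
dy, dx = np.diff(y), np.diff(x)
u_lagged = (y - beta0 - beta1 * x)[:-1]
X = np.column_stack([np.ones(T - 1), dx, u_lagged])
coefs, *_ = np.linalg.lstsq(X, dy, rcond=None)
print("alpha0_hat, alpha1_hat, -alpha2_hat:", coefs.round(2))
# Expect roughly [0.0, 0.5, -0.3]; the coefficient on the lagged error is
# the equilibrium-correction term whose t-test appears in Figure 4, panel d.
```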
This, however, is not the case if u_t is autocorrelated: in particular, if u_t is I(1). In fact, at T = 100, from the Monte Carlo experiment, we would need a critical value of 14.8 to define a 5% rejection frequency under the null because:

P(|t_{β_1=0}| ≥ 14.8) ≈ 0.05 when u_t ∼ I(1),
so serious over-rejection occurs using (29). Instead of the conventional critical value of 2, we should use 15. Why does such a large distortion occur? We will take a closer look at each of the components in (26) and expand their formulae to see what happens when u_t is non-stationary. The intuition is that although β̂_1 is an unbiased estimator of β_1, so E[β̂_1] = 0, it has a very large variance, but the calculated SE[β̂_1] dramatically under-estimates the true value. From (28), SE[β̂_1] consists of two components: the residual standard error, σ̂_u, and the sum of squares Σ(x_t − x̄)². When β_1 = 0, the estimated residual variance σ̂_u² will in general be lower than Σ(y_t − ȳ)²/T. This is because the estimated value β̂_1 is usually different from zero (sometimes widely so) and, hence, will produce smaller residuals. Thus:

Σ(y_t − ŷ_t)² ≤ Σ(y_t − ȳ)², (30)

where ŷ_t = β̂_0 + β̂_1 x_t. More importantly, the sum of squares Σ(x_t − x̄)² is not an appropriate measure of the variation in x_t when x_t is non-stationary. This is so because x̄ (instead of x_{t-1}) is a very poor 'reference line' when x_t is trending, as is evident from the graphs in Figures 1 and 3, and our artificial example in Figure 2. When the data are stationary, the deviation from the mean is a good measure of how much x_t has changed, whereas when x_t is non-stationary, it is the deviation from the previous value that measures the stochastic change in x_t. Therefore:

Σ(x_t − x̄)² ≫ Σ(x_t − x_{t-1})², (31)

so both (30) and (31) work in the same direction of producing a serious downward bias in the estimated value of SE[β̂_1]. It is now easy to understand the outcome of the simulation study: because the correct standard error is extremely large, i.e., σ_{β_1} ≫ SE[β̂_1], the dispersion of β̂_1 around zero is also large, so big positive and negative values both occur, inducing many big 't-values'.

Figure 4 reports the frequency distributions of the t-tests from a simulation study by PcNaive (see Doornik and Hendry, 1998), using M = 10,000 drawings for T = 50. The shaded boxes are for ±2, which is the approximate 95% conventional confidence interval. The first panel (a) shows the distribution of the t-test on the coefficient of x_t in a regression of y_t on x_t when both variables are white noise and unrelated. This is numerically very close to the correct distribution of a t-variable. The second panel (denoted b, in the top row) shows the equivalent distribution for the nonsense regression based on (22)–(26). The third panel (c, left in the lower row) is for the distribution of the t-test on the coefficient of x_t in a regression of y_t on x_t, y_{t-1} and x_{t-1} when the data are generated as unrelated stationary first-order autoregressive processes. The final panel (d) shows the t-test on the equilibrium-correction coefficient α_2 in (20) for data generated by a cointegrated process (so α_2 ≠ 0 and β is known). The first and third panels are close to the actual distribution of Student's t; the former is as expected from statistical theory, whereas the latter shows that the outcome is approximately correct in dynamic models once the dynamics have been included in the equation specification. The second panel shows an outcome that is wildly different from t, with a distributional spread so wide that
Figure 4: Frequency distributions of nonsense-regression t-tests
most of the probability lies outside the usual region of ±2. While the last distribution is not centered on zero – because the true relation is indeed non-null – it is included to show that the range of the distribution is roughly correct. Thus, panels (c) and (d) show that, by themselves, neither dynamics nor unit-root non-stationarity induce serious distortions: the nonsense-regressions problem is due to incorrect model specification. Indeed, when y_{t-1} and x_{t-1} are added as regressors in case (b), the correct distribution results for t_{β_1=0}, delivering a panel very similar to (c), so the excess rejection is due to the wrong standard error being used in the denominator (which, as shown above, is badly downwards biased by the untreated residual autocorrelation).

Almost all available software packages contain a regression routine that calculates coefficient estimates, t-values, and R² based on OLS. Since the computer will calculate the coefficients independently of whether the variables are stationary or not (and without issuing a warning when they are not), it is important to be aware of the following implications for regressions with trending variables:
(i) Although E[β̂_1] = 0, nevertheless t_{β_1=0} diverges to infinity as T increases, so that conventionally-calculated critical values are incorrect (see Hendry, 1995, chapter 3).
(ii) R² cannot be interpreted as a measure of goodness-of-fit.
The first point means that one will too frequently reject the null hypothesis (β_1 = 0) when it is true. Even in the best case, when β_1 ≠ 0, i.e., when y_t and x_t are causally related, standard t-tests will be biased, with too frequent rejections of a null hypothesis such as β_1 = 1 when it is true.
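Point (i) is easy to demonstrate. A minimal Monte Carlo sketch (ours; the seed, T and M are arbitrary) regresses one random walk on an independent one and records how often |t_{β_1=0}| > 2:

```python
import numpy as np

rng = np.random.default_rng(7)
M, T = 5_000, 100
reject = 0
for _ in range(M):
    y = rng.normal(size=T).cumsum()          # two independent I(1) series
    x = rng.normal(size=T).cumsum()
    X = np.column_stack([np.ones(T), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (T - 2)                     # residual variance
    se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    reject += abs(b[1] / se_b1) > 2
print(f"rejection frequency: {reject / M:.2f} (nominal level: 0.05)")
```

With these settings the empirical rejection frequency comes out far above the nominal 5%, in line with the Monte Carlo evidence discussed above.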
Figure 5: Frequency distributions of unit-root estimators
However, always including a constant and a trend in the estimated model ensures that the test will have the correct rejection frequency under the null for most economic time series. The required critical values have been tabulated using Monte Carlo simulations by Dickey and Fuller (1979, 1981), and most time-series econometric software (e.g., PcGive) automatically provides appropriate critical values for unit-root tests in almost all relevant cases, provided the correct model is used (see discussion in the next section).
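Those critical values can also be reproduced by direct simulation, in the spirit of Dickey and Fuller's tabulations. A sketch (ours; settings arbitrary) estimates the t-ratio on (ρ − 1) in a constant-only DF regression for many simulated random walks and reads off the empirical 5% quantile:

```python
import numpy as np

rng = np.random.default_rng(3)
M, T = 10_000, 100
tstats = np.empty(M)
for m in range(M):
    y = rng.normal(size=T).cumsum()               # unit root true under H0
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones(T - 1), ylag])   # DF regression with constant
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    e = dy - X @ b
    se = np.sqrt((e @ e / (T - 3)) * np.linalg.inv(X.T @ X)[1, 1])
    tstats[m] = b[1] / se                         # t-ratio on (rho - 1)
print("empirical 5% critical value:", np.quantile(tstats, 0.05).round(2))
# Should come out near the tabulated value of about -2.9 (constant, T = 100),
# well below the -1.65 a conventional one-sided normal test would use.
```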
7 Testing for cointegration
When data are non-stationary purely due to unit roots, they can be brought back to stationarity by linear transformations, for example, by differencing, as in x_t − x_{t-1}: if x_t ∼ I(1), then by definition ∆x_t ∼ I(0). An alternative is to try a linear transformation like y_t − β_1 x_t − β_0, which induces cointegration when y_t − β_1 x_t − β_0 ∼ I(0). But unlike differencing, there is no guarantee that y_t − β_1 x_t − β_0 is I(0) for any value of β, as the discussion in Section 4 demonstrated. There are many possible tests for cointegration: the most general of them is the multivariate test based on the vector autoregressive (VAR) representation discussed in Johansen (1988). These procedures will be described in Part II. Here we only consider tests based on the static and the dynamic regression model, assuming that x_t can be treated as weakly exogenous for the parameters of the conditional model (see e.g., Engle, Hendry and Richard, 1983).[11] As discussed in Section 5, the condition that there exists a genuine causal link between the I(1) series y_t and x_t is that the residual
[11] Regression methods can be applied to model I(1) variables which are in fact linked (i.e., cointegrated). Most tests still have conventional distributions, apart from that corresponding to a test for a unit root.
Figure 6: Frequency distributions of stationary autoregression estimators
u_t ∼ I(0); otherwise a 'nonsense regression' has been estimated. Therefore, the Engle–Granger test procedure is based on testing that the residuals u_t from the static regression model (18) are stationary, i.e., that ρ < 1 in (2). As discussed in the previous section, the test of the null of a unit coefficient, using the DF test, implies using a non-standard distribution. Let û_t = y_t − β̂_1 x_t − β̂_0, where β̂ is the OLS estimate of the long-run parameter vector β; then the null hypothesis of the DF test is H_0: ρ = 1, or equivalently, H_0: ρ − 1 = 0, in:

û_t = ρû_{t-1} + ε_t (33)

or:

∆û_t = (ρ − 1)û_{t-1} + ε_t.
The test is based on the assumption that ε_t in (33) is white noise; if the AR(1) model in (33) does not deliver white-noise errors, then it has to be augmented by lagged differences of the residuals:

∆û_t = (ρ − 1)û_{t-1} + ψ_1 ∆û_{t-1} + ··· + ψ_m ∆û_{t-m} + ε_t. (34)
We call the test of H_0: ρ − 1 = 0 in (34) the augmented Dickey–Fuller test (ADF). A drawback of the DF-type test procedure (see Campos, Ericsson and Hendry, 1996, for a discussion of this drawback) is that the autoregressive model (33) for û_t is the equivalent of imposing a common dynamic factor on the static regression model:
(1 − ρL)y_t = β_0(1 − ρ) + β_1(1 − ρL)x_t + ε_t. (35)
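For readers who want to apply the two-step Engle–Granger procedure in practice, a sketch using Python's statsmodels follows (the library calls are real; the data are artificial and the seed arbitrary). Note that the p-value reported by adfuller assumes the ordinary DF distribution, so the statistic should instead be compared with Engle–Granger critical values, which allow for β having been estimated:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def engle_granger(y, x):
    """Step 1: static regression (18); step 2: DF-type test on residuals."""
    step1 = sm.OLS(y, sm.add_constant(x)).fit()
    u_hat = step1.resid
    # No constant in the residual regression, since u_hat has mean zero
    # (older statsmodels versions spell this option "nc" instead of "n").
    adf_stat, pvalue, *_ = adfuller(u_hat, regression="n", autolag="AIC")
    return step1.params, adf_stat

# Artificial cointegrated pair: a stationary error around y = 0.5 + x.
rng = np.random.default_rng(11)
x = rng.normal(size=300).cumsum()
y = 0.5 + x + rng.normal(size=300)
params, adf_stat = engle_granger(y, x)
print("long-run estimates:", params.round(2),
      "| ADF statistic:", round(adf_stat, 2))
```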
data in levels are graphed in Figure 7 on a log scale, and in (log) differences in Figure 8. The price levels exhibit ‘wandering’ behavior, though not very strongly, whereas the differenced series seem to fluctuate randomly around a fixed mean of zero, in a typically stationary manner. The bimodal frequency distribution of the price levels is also typical of non-stationary data, whereas the frequency distribution of the differences is much closer to normality, perhaps with a couple of outliers. We also notice the large autocorrelations of the price levels at long lags, suggesting non-stationarity, and the lack of such autocorrelations for the differenced prices, suggesting stationarity (the latter are shown with lines at ± 2 SE to clarify the insignificance of the autocorrelations at longer lags).
Figure 8: Gasoline prices at two locations in differences, their empirical density distribution and autocorrelogram
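The levels-versus-differences autocorrelogram comparison just described is easy to reproduce. A sketch (ours; the series is simulated as a random walk purely for illustration, since the actual gasoline data are not reproduced here):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
p = rng.normal(scale=0.02, size=580).cumsum()   # stand-in for a log price level

print("levels     :", np.round(acf(p, nlags=10), 2))           # decays slowly
print("differences:", np.round(acf(np.diff(p), nlags=10), 2))  # ~0 beyond lag 0
print("2SE band   : +/-", round(2 / np.sqrt(len(p)), 2))       # approx. 95% band
```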
We first report the estimates from the static regression model:

y_t = β_0 + β_1 x_t + u_t,
and then consider the linear dynamic model:
y_t = a_0 + a_1 x_t + a_2 y_{t-1} + a_3 x_{t-1} + ε_t. (38)
Consistent with the empirical data, we will assume that E[∆y_t] = E[∆x_t] = 0, i.e., there are no linear trends in the data. Without changing the basic properties of the model, we can then rewrite (38) in the equilibrium-correction form:
∆y_t = a_1 ∆x_t + (a_2 − 1)(y − β_0 − β_1 x)_{t-1} + ε_t (39)
where a_2 ≠ 1, and:

β_0 = a_0/(1 − a_2) and β_1 = (a_1 + a_3)/(1 − a_2). (40)
In formulation (39), the model embodies the lagged equilibrium error (y − β_0 − β_1 x)_{t-1}, which captures departures from the long-run equilibrium given by the static model. As demonstrated in (21), the equilibrium error will be a stationary process if (a_2 − 1) ≠ 0, with a zero mean:
E[y_t − β_0 − β_1 x_t] = 0, (41)
whereas if (a_2 − 1) = 0, there is no adjustment back to equilibrium, and the equilibrium error is a non-stationary process. The link of cointegration to the existence of a long-run solution is manifest here, since y_t − β_0 − β_1 x_t = u_t ∼ I(0) implies a well-behaved equilibrium, whereas when u_t ∼ I(1), no equilibrium exists. In (39), (∆y_t, ∆x_t) are I(0) when their corresponding levels are I(1), so with ε_t ∼ I(0), the equation is 'balanced' if and only if (y − β_0 − β_1 x)_t is I(0) as well. This type of 'balancing' occurs naturally in regression analysis when the model formulation permits it: we will demonstrate empirically in (43) that one does not need to actually write the model as in (39) to obtain the benefits. What matters is whether the residuals are uncorrelated or not. The estimates of the static regression model over 1987(24)–1998(29) are:
p_{a,t} = 0.018 + … p_{b,t} + û_t (42)
         (2.2)   (67.4)

R² = 0.89, σ̂_u = 0.050, DW = 0.18
where DW is the Durbin–Watson test statistic for first-order autocorrelation, and the 't-statistics' based on (26) are given in parentheses. Although the DW test statistic is small and suggests non-stationarity, the DF test of u_t in (42) supports stationarity (DF = −8.21**). Furthermore, the following mis-specification tests were calculated:
AR(1–7): F(7, 569) = 524.6 [0.00]**
ARCH(7): F(7, 562) = 213.2 [0.00]**
Normality: χ²(2) = 22.9 [0.00]**
The AR(1–m) statistic is a test of residual autocorrelation of order m, distributed as F(m, T), i.e., a test of H_0: u_t = ε_t against H_1: u_t = ρ_1 u_{t-1} + ··· + ρ_m u_{t-m} + ε_t. The test statistic for autocorrelated errors of order 1–7 is very large, and the null of no autocorrelation is clearly rejected. The ARCH(m) (see Engle,