Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

Heavy-tailed distributions, correlations, kurtosis and Taylor’s Law of fluctuation scaling

Joel E. Cohen

Laboratory of Populations, The Rockefeller University and Columbia University, New York, NY, USA

Earth Institute, Department of Statistics, Columbia University, New York, NY, USA

Department of Statistics, University of Chicago, Chicago, IL, USA

[email protected]

Richard A. Davis

Department of Statistics, Columbia University, New York, NY, USA

Gennady Samorodnitsky

School of Operations Research and Information Engineering, Cornell University, Ithaca, NY, USA

Published: https://doi.org/10.1098/rspa.2020.0610

    Abstract

Pillai & Meng (Pillai & Meng 2016 Ann. Stat. 44, 2089–2097; p. 2091) speculated that ‘the dependence among [random variables, rvs] can be overwhelmed by the heaviness of their marginal tails …’. We give examples of statistical models that support this speculation. While under natural conditions the sample correlation of regularly varying (RV) rvs converges to a generally random limit, this limit is zero when the rvs are the reciprocals of powers greater than one of arbitrarily (but imperfectly) positively or negatively correlated normals. Surprisingly, the sample correlation of these RV rvs multiplied by the sample size has a limiting distribution on the negative half-line. We show that the asymptotic scaling of Taylor’s Law (a power-law variance function) for RV rvs is, up to a constant, the same for independent and identically distributed observations as for reciprocals of powers greater than one of arbitrarily (but imperfectly) positively correlated normals, whether those powers are the same or different. The correlations and heterogeneity do not affect the asymptotic scaling. We analyse the sample kurtosis of heavy-tailed data similarly. We show that the least-squares estimator of the slope in a linear model with heavy-tailed predictor and noise unexpectedly converges much faster than when they have finite variances.

    1. Introduction

    (a) Motivations

This work has a practical motivation and a theoretical motivation. Both are driven by the need to cope with exceptionally large events such as outbreaks of human, animal and plant diseases; earth events like cyclones, tsunamis, earthquakes, volcanoes, fires, droughts and floods; and extreme fluctuations in finance, insurance, trade, production and employment, among others. Both motivations are enabled by the on-going development since the work of Paul Lévy in 1924 of a systematic theory of random events with infinite mean [1–3].

    For example, one practical motivation is the need to understand the effect of heavy-tailed data on an empirical regularity called Taylor’s Law (TL) or Taylor’s power law in the biological sciences, and fluctuation scaling in the physical sciences. TL asserts that the sample variance is approximately proportional to some power of the sample mean in a set of samples. Explicitly, for some constants a > 0 and b, both of which are independent of the sample i,

$$\text{sample variance of sample } i \approx a \times (\text{sample mean of sample } i)^{b}, \quad i = 1, 2, \ldots,$$
or equivalently
$$\log(\text{sample variance of sample } i) \approx \log(a) + b \times \log(\text{sample mean of sample } i), \quad i = 1, 2, \ldots,$$
or equivalently
$$\frac{\text{sample variance of sample } i}{(\text{sample mean of sample } i)^{b}} \approx a, \quad i = 1, 2, \ldots.$$
    These specifications of TL intentionally leave vague the error model hinted at by the approximation ≈. TL is very widely confirmed in many sciences, including ecology, infectious disease epidemiology, human demography, financial statistics, earth sciences and other physical sciences, as reviewed by Eisler et al. [4] and Taylor [5].
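As a concrete illustration of how TL is fitted in practice (an illustrative sketch of ours, not taken from any of the cited studies), one can estimate b by ordinary least squares on log–log coordinates; the tail index 0.8 and the sample counts below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 100 heavy-tailed samples and record (mean, variance) per sample.
# The Pareto/Lomax shape 0.8 and the sample sizes are illustrative assumptions.
means, variances = [], []
for _ in range(100):
    x = rng.pareto(0.8, size=10_000)          # tail index alpha = 0.8 < 1
    means.append(x.mean())
    variances.append(x.var())

# Taylor's Law: log(variance) ~ log(a) + b * log(mean); fit b by least squares.
b, log_a = np.polyfit(np.log(means), np.log(variances), 1)
print(f"estimated TL exponent b = {b:.2f}")
```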

    In statistics, a variance function is a function that specifies how the variance is related to the mean in a set of samples. To statisticians, TL is simply a power-law variance function. An early, perhaps the first, occurrence of a power-law variance function in ecology appeared in Bliss [6]. From 1941 to 2017, users of TL assumed, usually without saying so, that observations were drawn from distributions with finite mean and finite variance. At the same time, heavy-tailed data were increasingly recognized in insurance, income distributions, earthquake magnitudes, financial fluctuations, meteorology and other sciences. Heavy-tailed data distributions have some or all infinite moments, such as infinite variance or infinite mean. Heavy-tailed distributions have much greater probabilities of extremely large observations than light-tailed distributions, such as the normal or Gaussian distribution. But samples from a heavy-tailed distribution always have a finite sample mean and finite sample variance. It is important to know under what conditions samples from a heavy-tailed distribution obey TL and, when TL holds, how the exponent b of TL relates to the parameter(s) of the underlying heavy-tailed distribution. Brown et al. [7] showed that certain random samples (sets of independently and identically distributed [iid] observations) from heavy-tailed distributions with infinite mean obey TL. These results are greatly extended here. In particular, we ask whether and when TL holds if observations are not iid, either because the observations are dependent or because the observations come from different distributions.

In addition to its empirical motivation, this work has a theoretical motivation. Drton & Xiao [8] conjectured, and proved when m = 2, a surprising and beautiful proposition subsequently proved for any integer m > 1 by Pillai & Meng [9]. For integer m > 1, let X = (X1, …, Xm) and Y = (Y1, …, Ym) be independent copies of a multivariate normal vector N(0, Σ), where Σ = {σ_{ij}} ≥ 0 is any m × m covariance matrix with σ_{jj} > 0, 1 ≤ j ≤ m. Let (w1, …, wm) be weights, wj ≥ 0, j = 1, …, m, and $\sum_{j=1}^m w_j = 1$. Then $\sum_{j=1}^m w_j X_j/Y_j$ has the standard Cauchy distribution on the real line with probability density function (pdf) 1/[π(1 + z²)]. The surprise is that the covariance matrix Σ has no effect on the distribution of $\sum_{j=1}^m w_j X_j/Y_j$.

Pillai & Meng [9, p. 2091] remarked: ‘A theoretical speculation from this unexpected result is that for a set of random variables …, the dependence among them can be overwhelmed by the heaviness of their marginal tails …. We invite the reader to ponder with us whether this is a pathological phenomenon or something profound’. The present work supports the speculation of Pillai & Meng [9] by giving multiple examples in which heavy tails outweigh correlations and heterogeneity of distributions. Here is a preview of coming attractions.

    (b) Organization of this paper

    Section 2 reviews the definitions and some properties of regularly varying (RV), including heavy-tailed, random variables (rvs), those in which the survival function or upper tail is asymptotically a power function with negative exponent −α, where α > 0. An important classical limit theorem stated here is that sums of appropriately centred and normalized RV rvs converge to stable rvs with index α if α < 2. We illustrate the uses of this theorem by deriving the limiting stable laws that describe the asymptotic behaviour of the sums of squared rvs and the sums of squared deviations from the sample mean.

    Section 3 analyses the asymptotic behaviour of Taylor’s Law when the observations are iid heavy-tailed RV rvs; or when the observations are RV rvs with distinct tail indices (heterogeneity); or when the observations are the reciprocals of powers greater than one of arbitrarily (but imperfectly) positively or negatively dependent normals. These results provide far-reaching generalizations of Taylor’s Law. The ratio of the sample variance to a power of the sample mean, which is the focus of Taylor’s Law, is formally similar to the sample kurtosis, which is the ratio of the fourth central moment to a power (the square) of the sample variance. Thus, we use the methods developed to analyse Taylor’s Law to derive the asymptotic behaviour of the sample kurtosis when the data are iid RV rvs. In principle, these results could be extended to heterogeneous and dependent data.

    Section 4 shows that under natural conditions the sample correlation of a pair of RV rvs converges to a generally random limit, which we express in terms of stable laws. This limit is zero in case the rvs are the reciprocals of powers greater than one of two arbitrarily (but imperfectly) positively or negatively dependent normal rvs. Surprisingly, in this case the sample correlation multiplied by the sample size has a limiting distribution that is concentrated on the negative half-line, regardless of whether the two normals are positively or negatively dependent. These findings support the speculation of Pillai & Meng [9].

    Section 5 compares the behaviour of the least-squares estimator of η in a linear model Y = ηX + Z under two different assumptions about X and Z: first, that X and Z are heavy-tailed RV rvs with asymptotically equivalent tails; and second, that X is heavy-tailed but the noise variable Z is light-tailed, possibly Gaussian. In both cases the estimator converges much faster than in the case when X and Z have finite variances. When Z is light-tailed, the sample correlation between X and Y converges in probability to 1.

    Section 6 investigates by numerical simulation how rapidly some of the correlated, heterogeneous, RV rvs described in the preceding sections converge in distribution to their limit rvs expressed as functions of stable laws. The examples are drawn from Taylor’s Law, bivariate correlation, and the linear model.

For the sake of readability, §§2–5 give only a few simple proofs. Appendix A gives more detailed mathematical background on heavy-tailed distributions and point process methods. It then proves the major claims made regarding the limit theory for Taylor’s Law and the limit theory for the sample correlations of heavy-tailed data including the special example of the reciprocals of powers greater than one of correlated normals.

    2. Background on heavy tails and regular variation

For heavy-tailed data when the population variance is infinite, the limit behaviour of the sample mean, sample variance, extreme order statistics and related statistics, is widely studied [1,2,10–14]. When the variance is finite, the classical central limit theorem implies that the sample mean, centred by its mean and normalized by its standard deviation, converges to a normal distribution. When the variance is infinite, the suitably centred and normalized partial sums also converge in distribution when the data are regularly varying (RV), with Pareto upper tails. So regular variation is often the starting point for modelling various statistics of univariate heavy-tailed data.

To be specific, assume the random variable (rv) X is positive and has distribution function F(x) := P(X ≤ x), x > 0, and survival function $\bar F(x) := 1 - F(x) = P(X > x)$. We say that X (or F) is RV with index α > 0, and we shall henceforth write F ∈ RV(α), if

$$\lim_{t\to\infty} \frac{\bar F(tx)}{\bar F(t)} = x^{-\alpha} \quad \text{for all } x > 0. \tag{2.1}$$

If F satisfies (2.1), then F has a Pareto upper tail:

$$\bar F(x) = x^{-\alpha} L(x), \qquad \lim_{t\to\infty} \frac{L(tx)}{L(t)} = 1 \quad \text{for all } x > 0. \tag{2.2}$$

We say L(x) is a slowly varying function (at infinity). Examples of slowly varying functions are $L(x) = \log x$, $\log\log x$ and their reciprocals. Assumption (2.1) implies that X has finite moments only up to order α: $EX^p < \infty$ for p < α and $EX^p = \infty$ for p > α. From (2.2), $EX^\alpha = \int_0^\infty P(X^\alpha > x)\,dx = \int_0^\infty x^{-1}L(x^{1/\alpha})\,dx$, so that the finiteness of $EX^\alpha$ depends on the integrability of $x^{-1}L(x^{1/\alpha})$ for x large. For example, $EX^\alpha = \infty$ if L(x) is constant, whereas $EX^\alpha < \infty$ if L(x) ∼ (1/log x)^{1+ϵ} for some ϵ > 0. Since we are interested here only in heavy-tailed distributions with infinite variance, we shall assume that α < 2 throughout.

Let X1, …, Xn be iid with distribution F. The sum $\sum_{t=1}^n X_t$, appropriately centred and normalized, converges in distribution as the sample size n → ∞. To this end, define the sequence of normalizing constants {an: n = 1, 2, …} as the sequence of 1 − 1/n quantiles of F, i.e.

$$a_n = \inf\left\{x : F(x) \ge 1 - \frac{1}{n}\right\}. \tag{2.3}$$

If F is continuous, then $\bar F(a_n) = 1/n$. By replacing t with an in (2.1) it can be shown that

$$nP(X > a_n x) = n\bar F(a_n x) \to x^{-\alpha} \quad \text{for all } x > 0 \text{ as } n \to \infty \tag{2.4}$$

and $a_n = n^{1/\alpha}\tilde L(n)$ for some slowly varying function $\tilde L$. If F is Pareto, i.e. if $\bar F(x) = x^{-\alpha}$ for x ≥ 1, then $a_n = n^{1/\alpha}$.
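For the exact Pareto case, the normalizing constant can be checked numerically; the following sketch (ours, with arbitrary illustrative sizes) compares the empirical 1 − 1/n quantile of a large Pareto sample with n^{1/α}:

```python
import numpy as np

rng = np.random.default_rng(0)

# For the exact Pareto tail P(X > x) = x^(-alpha), x >= 1, (2.3) gives
# a_n = n^(1/alpha). Compare with the empirical 1 - 1/n quantile.
alpha, n = 0.5, 100
sample = rng.uniform(size=10**7) ** (-1 / alpha)      # exact Pareto draws
print(np.quantile(sample, 1 - 1/n), n ** (1 / alpha)) # both near 10^4
```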

    A classical limit theorem [15] describes sums of heavy-tailed rvs. Let Sα be a stable rv with index α ∈ (0, 2) and shape parameter β = 1. When α ∈ (0, 1), Sα takes only non-negative real values. When α ∈ [1, 2), Sα takes negative real values, but the left tail is lighter than that of a normal distribution. When α = 2, Sα is normally distributed.

    Define

    bn=0 for α<1andbn=E[X1{Xan}]forα[1,2). 2.5
    Then
    an1t=1n(Xtbn)dSα, 2.6
    where d denotes ‘converges in distribution to’. For α ∈ (1, 2), bn can be replaced by the (finite) mean EX.

    Example 2.1 (sums of squares).

Because X² ∈ RV(α/2), applying (2.6) to the sums of squares gives, for any α ∈ (0, 2),

$$a_n^{-2}\sum_{t=1}^n X_t^2 \xrightarrow{d} S_{\alpha/2} \quad \text{as } n \to \infty. \tag{2.7}$$
The sum of squared deviations from the sample mean $\bar X_n := n^{-1}\sum_{t=1}^n X_t$ has the same limit:
$$a_n^{-2}\sum_{t=1}^n (X_t - \bar X_n)^2 \xrightarrow{d} S_{\alpha/2} \quad \text{as } n \to \infty. \tag{2.8}$$
The convergences in (2.6) and (2.8) are joint, i.e.
$$\left(a_n^{-1}\sum_{t=1}^n (X_t - b_n),\; a_n^{-2}\sum_{t=1}^n (X_t - \bar X_n)^2\right) \xrightarrow{d} \left(S_\alpha, S_{\alpha/2}\right) \quad \text{as } n \to \infty. \tag{2.9}$$

    The remarkable feature of this example is that the limit rvs on the right sides of (2.7) and (2.8) are identical. To give insight into why subtracting the sample mean in (2.8) makes no difference to the limit rv, we offer a brief proof. For the remainder of this article, for the sake of readability, we shall defer many detailed proofs to appendix A. For example, the nature of the dependence between Sα and Sα/2 and the proof of (2.9) are given in theorem A.3.

    Proof of (2.8).

    The sum of squared deviations from the sample mean on the left of (2.8) decomposes into

$$a_n^{-2}\sum_{t=1}^n (X_t - \bar X_n)^2 = a_n^{-2}\sum_{t=1}^n (X_t - b_n)^2 - n^{-1}\left(a_n^{-1}\sum_{t=1}^n (X_t - b_n)\right)^2. \tag{2.10}$$
By virtue of (2.6), the second term on the right converges to zero in probability. Since $a_n^2 = n^{2/\alpha}\tilde L^2(n)$ and 2/α > 1, it follows that $a_n^2 > n^{1+\epsilon}$ for some small ϵ > 0 and all n large. Moreover, since for α > 1, bn → EX and for α = 1, bn is slowly varying [15], $nb_n/a_n^2 \to 0$ and $b_n/a_n \to 0$. Using these facts and expanding the first term on the right of (2.10) gives
$$a_n^{-2}\sum_{t=1}^n X_t^2 - 2a_n^{-1}b_n\, a_n^{-1}\sum_{t=1}^n (X_t - b_n) - na_n^{-2}b_n^2 = a_n^{-2}\sum_{t=1}^n X_t^2 + a_n^{-1}b_n\,O_p(1) - na_n^{-2}b_n^2 = a_n^{-2}\sum_{t=1}^n X_t^2 + o_p(1),$$
    from which (2.8) is immediate. ▪
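The following brief simulation (an illustrative sketch of ours, using an exact Pareto sample so that $\tilde L \equiv 1$ and $a_n = n^{1/\alpha}$) shows the left sides of (2.7) and (2.8) nearly coinciding for large n:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 1.5, 10**6
x = rng.uniform(size=n) ** (-1 / alpha)   # exact Pareto: P(X > x) = x^(-alpha), x >= 1
a_n = n ** (1 / alpha)                    # normalizing constant from (2.3)

raw = (x**2).sum() / a_n**2                    # left side of (2.7)
centred = ((x - x.mean())**2).sum() / a_n**2   # left side of (2.8)
print(raw, centred)                            # the two normalized sums nearly coincide
```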

    3. Heavy tails, Taylor’s Law and kurtosis

    (a) Taylor’s Law with iid heavy-tailed data

If the RV heavy-tailed iid rvs X1, …, Xn have tail index α < 1, then the centring constants are bn = 0 and the population mean and population variance are infinite. In one of its many forms, TL considers the ratio of the sample variance to a power b, 0 < b < ∞, of the sample mean,

$$W_n = \frac{n^{-1}\sum_{t=1}^n (X_t - \bar X_n)^2}{\bar X_n^b}. \tag{3.1}$$
The order of the numerator is $n^{-1}a_n^2$ while the order of the denominator is $n^{-b}a_n^b$. For Wn to converge in distribution, the order of the numerator and denominator must match, which requires that
$$b = \frac{2 - \alpha}{1 - \alpha}. \tag{3.2}$$
For this b,
$$\frac{1}{\tilde L^{2-b}(n)}\,W_n \xrightarrow{d} \frac{S_{\alpha/2}}{S_\alpha^b}, \tag{3.3}$$
where $\tilde L(n)$ is as defined for an following (2.4). The numerator $S_{\alpha/2}$ and the denominator $S_\alpha^b$ are dependent, and the dependence is described explicitly in theorem A.3. Brown et al. [7] obtained expression (3.2) for b and some of the moments of the limiting distribution.
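For intuition, a single draw of Wn under (3.1)–(3.3) can be simulated directly; this sketch (ours) again uses an exact Pareto sample, for which $\tilde L \equiv 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 0.5
b = (2 - alpha) / (1 - alpha)                   # = 3, from (3.2)
n = 10**6
x = rng.uniform(size=n) ** (-1 / alpha)         # iid Pareto, P(X > x) = x^(-alpha)

w_n = ((x - x.mean())**2).mean() / x.mean()**b  # W_n of (3.1)
print(w_n)  # approximately one draw from the limit law S_{alpha/2} / S_alpha^b
```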

    (b) Taylor’s Law with heterogeneous heavy-tailed data

    In the previous subsection, the heavy-tailed observations {Xt} are assumed to be iid. Here we analyse independent data (or rvs) that are heterogeneous, i.e. not identically distributed.

Suppose that the observations are independent but come from two rvs U and V, where U ∈ RV(α) with α ∈ (0, 1) and V is dominated by U in the sense that

$$\frac{P(V > x)}{P(U > x)} \to 0 \quad \text{as } x \to \infty. \tag{3.4}$$
Further assume that the data X1, …, Xn consist of a random number νn < n of observations distributed as U and n − νn observations distributed as V, where $\nu_n/n \xrightarrow{P} p \in (0, 1]$ as n → ∞. Here $\xrightarrow{P}$ means ‘converges in probability to’. This formulation can be interpreted quite generally: V could represent a union of several pooled rvs, each with tails lighter than the tail of U. Letting an be the 1 − 1/n quantile of FU, it is easy to show that
$$a_n^{-1}\sum_{t=1}^n X_t = a_n^{-1}\sum_{t=1}^{\nu_n} U_t + a_n^{-1}\sum_{t=1}^{n-\nu_n} V_t = a_n^{-1}\sum_{t=1}^{[np]} U_t + a_n^{-1}\sum_{t=1}^{[n(1-p)]} V_t + o_p(1).$$
Since P(V > x) = o(P(U > x)) for x large, the argument in the proof of theorem 4.1 in appendix A(c) can be used to show that
$$a_n^{-1}\sum_{t=1}^{[n(1-p)]} V_t \xrightarrow{P} 0.$$

If f(x), g(x) are real-valued functions of real x and g(x) > 0 for all sufficiently large x, then f(x) ∼ g(x) means lim_{x→∞} f(x)/g(x) = 1. On the other hand, since $a_{[np]}/a_n \to p^{1/\alpha}$,

$$a_n^{-1}\sum_{t=1}^{[np]} U_t = \frac{a_{[np]}}{a_n}\, a_{[np]}^{-1}\sum_{t=1}^{[np]} U_t \sim p^{1/\alpha}\, a_{[np]}^{-1}\sum_{t=1}^{[np]} U_t \xrightarrow{d} p^{1/\alpha} S_\alpha.$$
A similar result holds for the sample variance so that the convergence in distribution for TL in (3.3) also holds for such non-identically distributed data. The dependence between the numerator $S_{\alpha/2}$ and the denominator $S_\alpha^b$ is described explicitly in theorem A.3.

    (c) Taylor’s Law with correlated heavy-tailed data

Here we suppose the observations (or rvs) are pairwise correlated and identically distributed. We shall show that the scaling exponent b in TL remains the same as (3.2), but the limit changes. Specifically, let N0, N1, … be iid standard normal rvs and let 0 ≤ ρ < 1. Let $Z_i = \sqrt{\rho}\,N_0 + \sqrt{1-\rho}\,N_i$, i = 1, 2, …. Then for i = 1, …, each Zi is normal, EZi = 0, Var(Zi) = 1, and for i ≠ j ≥ 1, Cov(Zi, Zj) = ρ. Let 0 < α < 1 and define $X_i := 1/(Z_i^2)^{1/(2\alpha)}$, i = 1, 2, …. Define Wn in terms of Xi as in (3.1) and b as in (3.2). Then

$$W_n \xrightarrow{d} \frac{S_{\alpha/2}}{S_\alpha^b}\left[(1-\rho)\exp\left\{\frac{\rho}{1-\rho}N^2\right\}\right]^{1/(2(1-\alpha))}, \tag{3.5}$$
where $S_{\alpha/2}$, $S_\alpha$ are non-negative stable rvs given in (A 10) and (A 11) with tail indices α/2, α ∈ (0, 1), respectively, and shape parameter β = 1, and N is a standard normal rv independent of $S_{\alpha/2}$ and $S_\alpha$. As previously, the numerator $S_{\alpha/2}$ and the denominator $S_\alpha^b$ are dependent because Γj in (A 10) is the same as Γj in (A 11), for each j = 1, ….

To prove (3.5), it is enough to prove that for every fixed real z, if $Z_i := \sqrt{\rho}\,z + \sqrt{1-\rho}\,N_i$, i = 1, 2, …, then

$$W_n \xrightarrow{d} \frac{S_{\alpha/2}}{S_\alpha^b}\left[(1-\rho)\exp\left\{\frac{\rho}{1-\rho}z^2\right\}\right]^{1/(2(1-\alpha))}. \tag{3.6}$$
To prove (3.6), we observe that X1, X2, … are iid (because they are conditional on fixed z) and
$$P(X_1 > t) \sim \sqrt{\frac{2}{\pi}}\,(1-\rho)^{-1/2}\exp\left\{-\frac{\rho}{2(1-\rho)}z^2\right\}t^{-\alpha} \quad \text{as } t \to \infty.$$
In particular, the tail of X is multiplied by the factor of
$$h(\rho) = (1-\rho)^{-1/2}\exp\left\{-\frac{\rho}{2(1-\rho)}z^2\right\}$$
in comparison with the case ρ = 0 in §3a. That means that, in the limit, every term $\Gamma_j^{-1/\alpha}$ in both $S_{\alpha/2}$ and $S_\alpha$ ends up being multiplied by $h(\rho)^{1/\alpha}$. Therefore, the numerator in the limit in (3.3) gets multiplied by $h(\rho)^{2/\alpha}$, while the denominator gets multiplied by $h(\rho)^{(2-\alpha)/(\alpha(1-\alpha))}$. Since $2/\alpha - (2-\alpha)/(\alpha(1-\alpha)) = -1/(1-\alpha)$ and $h(\rho)^{-1/(1-\alpha)} = [(1-\rho)\exp\{\rho z^2/(1-\rho)\}]^{1/(2(1-\alpha))}$, doing the arithmetic proves (3.6).

    The heavy-tailed version of TL also holds widely for stationary mixing sequences. Here we assume that {Xt} is a strictly stationary time series with X1 ∈ RV(α), α ∈ (0, 1). If the time series satisfies the mixing and dependence conditions D and D′ in [16], then the limit theorem that holds for TL for the iid case also holds exactly for the stationary case. This follows directly from Davis [16, theorem 4 and its corollary]. The mixing condition D governs how fast certain events become independent as their time separation increases. Condition D′ prohibits clustering of pairs of nearby observations when both are large. While condition D is rather mild, condition D′ is more restrictive.

Alternatively, in mixing stationary time series that are RV, i.e. where all the finite-dimensional distributions are multivariate RV, the point process convergence in theorem A.1 can be extended, but now the limit point process includes clusters of points that are linked to the Poisson points $\Gamma_i^{-1/\alpha}$ [10,17]. From these point process convergence results, the limiting form of Taylor’s Law (3.3) is a ratio of the same two stable rvs with tail indices α/2 and α in the numerator and denominator as in (3.3), but the numerator and denominator may need to be multiplied by different constant scale factors.

The results in §§3b and c may be combined. Explicitly, as in §3b, suppose that U ∈ RV(α) with α ∈ (0, 1) and U dominates V in the sense of (3.4). Assume that the data X1, …, Xn consist of a random number νn < n of observations distributed as U and n − νn observations distributed as V, where $\nu_n/n \xrightarrow{P} p \in (0, 1]$ as n → ∞. In addition, as in §3c, suppose that the νn < n observations Xi distributed as U satisfy $X_i := 1/(Z_i^2)^{1/(2\alpha)} \in RV(\alpha)$ and, when Xi, Xj are both distributed as U, the {Zi} are correlated standard normals with Cov(Zi, Zj) = ρ if i ≠ j ≥ 1. The n − νn observations distributed as V may be arbitrarily dependent or independent. Then TL holds as described in (3.5). We simulate this model in §6a.

    (d) Sample kurtosis

The kurtosis is often used to measure the heaviness of the tail of a distribution function. For a random variable X with finite fourth moment and finite variance σ², the kurtosis is defined by $\kappa = E(X - EX)^4/\sigma^4$. A normal random variable has kurtosis 3. A rv with κ > 3 is said to be leptokurtic, indicating tails fatter than those of a normal rv. Since we are primarily concerned with heavy-tailed data in which the population variance is infinite, we focus our attention on the kurtosis κ as opposed to the excess kurtosis κ − 3.

If a rv has infinite second moment, then the population kurtosis does not exist. Nevertheless, given a sample X1, …, Xn, one can still compute the sample mean $\bar X_n$ and the sample kurtosis

$$\hat\kappa_n := \frac{n^{-1}\sum_{t=1}^n (X_t - \bar X_n)^4}{\left(n^{-1}\sum_{t=1}^n (X_t - \bar X_n)^2\right)^2} = \frac{n\sum_{t=1}^n (X_t - \bar X_n)^4}{\left(\sum_{t=1}^n (X_t - \bar X_n)^2\right)^2}. \tag{3.7}$$

    Assume that X1, …, Xn are iid as X ∈ RV(α) with α ∈ (0, 2). From the argument of example 2.1 (see also theorem A.3), it follows that

$$\left(a_n^{-2}\sum_{t=1}^n (X_t - \bar X_n)^2,\; a_n^{-4}\sum_{t=1}^n (X_t - \bar X_n)^4\right) \xrightarrow{d} \left(S_{\alpha/2}, S_{\alpha/4}\right), \tag{3.8}$$
where $S_\gamma = \sum_{i=1}^\infty \Gamma_i^{-1/\gamma}$. By the continuous mapping theorem, we obtain
$$n^{-1}\hat\kappa_n \xrightarrow{d} \frac{S_{\alpha/4}}{S_{\alpha/2}^2}. \tag{3.9}$$
By Cauchy’s inequality, the limit random variable on the right of (3.9) is bounded above by 1. As the tails get heavier, i.e. as α ↓ 0, the limit random variable becomes more concentrated at 1 because, for γ < 1,
$$S_\gamma = \Gamma_1^{-1/\gamma} + \Gamma_1^{-1/\gamma}\sum_{i=2}^\infty \left(\frac{\Gamma_i}{\Gamma_1}\right)^{-1/\gamma}.$$
Since each term in the sum is bounded by 1 and Γi ∼ i a.s. as i → ∞, it follows that $S_\gamma \sim \Gamma_1^{-1/\gamma}$ as γ ↓ 0. Thus, the limit random variable on the right of (3.9) converges to 1 as α ↓ 0.

If α ∈ (2, 4), then X has finite variance σ² but infinite fourth moment. In this case, the same argument shows that

$$\frac{n}{a_n^4}\,\hat\kappa_n \xrightarrow{d} \frac{S_{\alpha/4}}{[\sigma^2]^2}. \tag{3.10}$$
The rate of convergence, however, is roughly $n^{1-4/\alpha}L(n)$, where L is a slowly varying function, which is rather slow, especially as α ↑ 4.
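A short simulation sketch (ours) of (3.9): as α decreases, the scaled sample kurtosis concentrates near its upper bound 1. The ratio computed below is scale-invariant, so the data are rescaled only for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10**6
for alpha in (1.5, 0.5, 0.1):
    x = rng.uniform(size=n) ** (-1 / alpha)   # iid Pareto(alpha)
    x = x / x.max()                           # rescale; the ratio below is unchanged
    d = x - x.mean()
    scaled_kurtosis = (d**4).sum() / (d**2).sum()**2  # = kappa_n / n, cf. (3.7), (3.9)
    print(alpha, scaled_kurtosis)             # approaches 1 as alpha decreases
```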

    4. Heavy tails and sample correlations

    The correlation of a pair of rvs is defined as their covariance divided by the product of their standard deviations. It cannot be defined for pairs of rvs with infinite variance, such as all those in RV(α) with α ∈ (0, 2). Nevertheless, the sample correlation (4.4) is a well defined rv. For heavy-tailed time series, the sample autocorrelation can have unexpected and desirable properties [11,12]. Here we give conditions under which the sample correlation of dependent heavy-tailed rvs converges to zero (in probability or almost surely, depending on detailed assumptions) as the sample size gets large.

    (a) Bivariate regular variation and asymptotic independence

    We define regular variation for a pair (X, Y) of heavy-tailed rvs. Assume that X and Y are identically distributed with RV distribution F ∈ RV(α) as in (2.1). Our results require only that their distributions have asymptotically equivalent tails, i.e. lim s→∞P(X > s)/P(Y > s) = C ∈ (0, ∞). We say that X and Y are asymptotically independent if

$$P(X > t \mid Y > t) := \frac{P(X > t,\, Y > t)}{P(Y > t)} \to 0 \quad \text{as } t \to \infty. \tag{4.1}$$
Of course, if X and Y are independent then they are also asymptotically independent. When t is replaced by an defined in (2.3), asymptotic independence is equivalent to
$$nP(X > a_n,\, Y > a_n) \to 0 \quad \text{as } n \to \infty. \tag{4.2}$$

If X and Y are not asymptotically independent, we quantify their dependence in the tails of the distribution by assuming that there exists a measure ν( · ) on $([0,\infty)^2, \mathcal{B}([0,\infty)^2))$ such that, for $A_{x,y} = [0, x] \times [0, y]$ and $A_{x,y}^c$ the set complement of $A_{x,y}$,

$$nP(X > a_n x \text{ or } Y > a_n y) = nP\left(a_n^{-1}(X, Y) \in A_{x,y}^c\right) \to \nu(A_{x,y}^c) \quad \text{as } n \to \infty \tag{4.3}$$
for all x, y ≥ 0 with max{x, y} > 0.

If x = 0, y > 0, then $\nu(A_{0,y}^c) = y^{-\alpha}$. Similarly, $\nu(A_{x,0}^c) = x^{-\alpha}$. It follows that X and Y are asymptotically independent if and only if ν([x, ∞) × [y, ∞)) = 0 for all x > 0, y > 0.

    (b) Sample correlation of heavy-tailed data

Define the sample correlation between the pairs (Xt, Yt), t = 1, …, n, in the conventional way by

$$\rho_n := \frac{\sum_{t=1}^n \left(X_tY_t - \bar X_n\bar Y_n\right)}{\sqrt{\sum_{t=1}^n (X_t - \bar X_n)^2}\,\sqrt{\sum_{t=1}^n (Y_t - \bar Y_n)^2}}. \tag{4.4}$$

    Theorem 4.1.

    Assume the distribution of (X, Y) satisfies (4.3) and α ∈ (0, 2). Then

$$\rho_n \xrightarrow{d} U := \frac{S_{\alpha/2,0}}{\sqrt{S_{\alpha/2,1}\,S_{\alpha/2,2}}} \quad \text{as } n \to \infty, \tag{4.5}$$
where $S_{\alpha/2,0}$, $S_{\alpha/2,1}$ and $S_{\alpha/2,2}$ are stable rvs with joint distribution specified in appendix A(c), (A 18)–(A 20). Moreover,
$$\rho_n \xrightarrow{P} 0 \quad \text{as } n \to \infty \tag{4.6}$$
if and only if X and Y are asymptotically independent.

    Appendix A(c) proves this result. Although the stable rvs in the numerator and denominator of (4.5) have heavy tails, the ratio on the right of (4.5) does not have heavy tails because of self-normalization.

When $\rho_n \xrightarrow{P} 0$, i.e. when X and Y are asymptotically independent, it is often possible to find normalizing constants cn → ∞ such that cnρn converges in distribution to a non-degenerate rv. For example, when X and Y are independent, $c_n = n^{1/\alpha}L(n)$ for some slowly varying function L [11,12]. Theorem 4.5 gives another example in which X and Y are the reciprocals of powers of correlated bivariate normal rvs with mean 0 and variance 1. Such dependent X and Y are asymptotically independent.

    We now extend theorem 4.1 to the case when X and Y are RV rvs with different tail indices. Appendix A(c) gives the proof of theorem 4.2.

    Theorem 4.2.

    If X ∈ RV(α) and Y ∈ RV(β) with 0 < α < β < 2, then

$$\rho_n \xrightarrow{d} U := \frac{S_{\alpha\beta/(\alpha+\beta),0}}{\sqrt{S_{\alpha/2,1}\,S_{\beta/2,2}}} \quad \text{as } n \to \infty, \tag{4.7}$$
where $S_{\alpha\beta/(\alpha+\beta),0}$, $S_{\alpha/2,1}$ and $S_{\beta/2,2}$ are defined in (A 21), (A 19) and (A 22), respectively. If X and $Y^{\beta/\alpha}$ are asymptotically independent, then $\rho_n \xrightarrow{P} 0$ as n → ∞ since then U = 0 a.s.

    We give a quick and easy way to construct dependent, but asymptotically independent, rvs.

    Theorem 4.3.

Let U and V be two positive rvs with positive and continuous marginal densities in a neighbourhood of zero. Suppose that in a neighbourhood of the origin, U and V have a joint density function $f_{U,V}(u, v)$ satisfying $f_{U,V}(u, v) \le a(u^2 + v^2)^{-\theta/2}$ for some a > 0 and 0 ≤ θ < 1. Then, for any c > 1 (or α = 1/c ∈ (0, 1)), X := 1/U^c ∈ RV(1/c), Y := 1/V^c ∈ RV(1/c), X and Y are asymptotically independent, and, for a random sample (X1, Y1), …, (Xn, Yn) iid as (X, Y), we have $\rho_n \xrightarrow{P} 0$ as in (4.6).

    Proof.

    We have

$$P(X > x) = P(1/U^c > x) = P(U < x^{-1/c}) = \int_0^{x^{-1/c}} f_U(u)\,du \sim x^{-1/c}f_U(0) \quad \text{as } x \to \infty,$$
where fU(u) is the marginal pdf of U. It follows that X ∈ RV(1/c). A similar argument holds for Y. Next,
$$P(X > x,\, Y > x) = P\left(\frac{1}{U^c} > x,\, \frac{1}{V^c} > x\right) = \int_0^{x^{-1/c}}\!\!\int_0^{x^{-1/c}} f_{U,V}(u, v)\,du\,dv \le a\int_0^{x^{-1/c}}\!\!\int_0^{x^{-1/c}} (u^2 + v^2)^{-\theta/2}\,du\,dv \le 2\pi a\, x^{-(2-\theta)/c}.$$
Dividing the right side by $x^{-1/c}f_V(0)$ and letting x → ∞, the limit becomes zero because (2 − θ)/c > 1/c, proving (4.1). Then theorem 4.1 entails (4.6). ▪

    Corollary 4.4.

Suppose (U, V) has a bivariate normal distribution with means 0, variances 1 and correlation ρ ∈ (−1, 1). Set X := 1/|U|^c, Y := 1/|V|^c for c > 1. Then X ∈ RV(1/c), Y ∈ RV(1/c), and X and Y are asymptotically independent with $\rho_n \xrightarrow{P} 0$ as n → ∞. Further, $\rho_n \xrightarrow{a.s.} 0$ as n → ∞.
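The following sketch (ours) illustrates corollary 4.4: despite the strong correlation ρ = 0.9 between the underlying normals, the sample correlation of X and Y drifts toward 0 as n grows:

```python
import numpy as np

rng = np.random.default_rng(4)
rho, c = 0.9, 2.0
for n in (10**3, 10**5, 10**6):
    u = rng.standard_normal(n)
    v = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # corr(u, v) = rho
    x, y = np.abs(u) ** (-c), np.abs(v) ** (-c)
    print(n, np.corrcoef(x, y)[0, 1])  # sample correlation drifts toward 0
```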

    Theorem 4.5.

    Under the assumptions of corollary 4.4,

$$P(XY > x) \sim \frac{2}{c\pi}(1-\rho^2)^{-1/2}\,x^{-1/c}\log x \quad \text{as } x \to \infty.$$
Moreover,
$$n\rho_n \xrightarrow{d} -R_1R_2 \quad \text{as } n \to \infty, \tag{4.8}$$
where R1 and R2 are iid rvs with $R_1 = S_\alpha/\sqrt{S_{\alpha/2}}$, and $S_\alpha$ and $S_{\alpha/2}$ are given in (A 10) and (A 11) with α = 1/c < 1.

It is curious that, regardless of ρ ∈ (−1, 1), nρn converges in distribution to a rv that is negative almost surely. In this situation (α < 1), the mean is infinite and the tail of the product XY is similar to the tails of X and Y apart from a slowly varying function. The limit behaviour of the numerator of (4.4) is governed by the product of the partial sums of the Xis and the Yis; the sum of the products XtYt, in view of theorem 4.5, is of smaller order. In other words, ρn behaves essentially like the product of two t-statistics, one for the Xis and one for the Yis.

Since the mean here is infinite, it is reasonable also to define a version $\tilde\rho_n$ of the sample correlation without the correction for the mean:

$$\tilde\rho_n := \frac{\sum_{i=1}^n X_iY_i}{\sqrt{\sum_{i=1}^n X_i^2}\,\sqrt{\sum_{i=1}^n Y_i^2}}.$$

    Corollary 4.6.

    The assumptions of corollary 4.4 imply

$$\left(\frac{n}{\log n}\right)^{c}\tilde\rho_n \xrightarrow{d} (1-\rho^2)^{-c/2}\,\frac{S_1}{\sqrt{S_2S_3}} \quad \text{as } n \to \infty, \tag{4.9}$$
where S1, S2, S3 are independent stable rvs with representations given in (A 10) and indices α equal to 1/c, 1/(2c) and 1/(2c), respectively.

    Here the stable rvs in the numerator and denominator of (4.9) have heavy tails and are independent, so the ratio on the right of (4.9) does have heavy tails. We comment after (5.5) below on the confidence intervals implied by the similar situation there (a ratio of independent stable rvs).

    Whereas the limit rv in (4.8) is independent of the population correlation ρ, the limit rv in (4.9) is scaled by a function of the population correlation ρ. The rate of convergence (n/log n)c is rather fast, essentially of order n2 for c = 2. This is much faster than the standard n1/2 rate when the variance is finite. Corollary 4.4, theorem 4.5 and corollary 4.6 are proved in appendix A(d).

    5. Heavy tails in a linear model

    Suppose X and Z are two heavy-tailed rvs and

$$Y = \eta X + Z, \quad \eta > 0. \tag{5.1}$$
    Given a sample (X1, Y1), …, (Xn, Yn) of n independent observations, what is a useful way to estimate η? Surprisingly, even though the data have infinite variance, the least-squares estimator for η performs remarkably well. Specifically, the least-squares estimator is weakly consistent, as we now show.

    Assume that X and Z are independent positive rvs with distribution functions FX and FZ, respectively, with asymptotically equivalent tails, i.e.

$$\lim_{s\to\infty}\frac{\bar F_Z(s)}{\bar F_X(s)} = C \in [0, \infty]. \tag{5.2}$$
If X ∈ RV(α), then it is elementary that
$$\frac{P(Y > x)}{P(X > x)} \to \eta^\alpha + C \quad \text{as } x \to \infty.$$
Thus Y also has a heavy-tailed distribution with the same tail index α. However, if C = ∞, then Y inherits its heavy tail from the noise Z. Here we will assume C < ∞.

    Since the vector (X, Z) is bivariate regularly varying and since (X, Y) is a linear transformation of (X, Z), it follows directly from Basrak et al. [18, p. 113, Proposition A.1] that (X, Y) is bivariate RV.

The angular measure of (X, Y) in (A 5) has mass at $\arctan\eta$ and π/2, namely,

$$G(\cdot) = \frac{(\eta^2+1)^{\alpha/2}}{(\eta^2+1)^{\alpha/2} + C}\,\varepsilon_{\arctan\eta}(\cdot) + \frac{C}{(\eta^2+1)^{\alpha/2} + C}\,\varepsilon_{\pi/2}(\cdot). \tag{5.3}$$
It follows directly from theorem 4.1 that if (X1, Y1), …, (Xn, Yn) are iid observations from the linear model (5.1) with 0 < C < ∞, then $\rho_n \xrightarrow{d} U$ where U is given in (4.5). On the other hand, if C = 0, which would allow the noise variable Z to be light-tailed, including normal, then the form of the limit variable U in (4.5) (see also (A 14)–(A 20)) implies that $\rho_n \xrightarrow{P} 1$ as n → ∞. To see this, note that the distribution G in (5.3) for the angular part of the limit measure corresponds to point mass at $\arctan\eta$. It follows now that the rvs Θt appearing in (A 18)–(A 20) are all constant and equal to $\arctan\eta$ with probability one. As long as η > 0, then $\arctan\eta \in (0, \pi/2)$ and hence $S_{\alpha/2,0}^2/(S_{\alpha/2,1}S_{\alpha/2,2}) = 1$.

The least-squares estimator of η minimizes $\sum_{t=1}^n (Y_t - \eta X_t)^2$. It is given by

$$\hat\eta = \frac{\sum_{t=1}^n X_tY_t}{\sum_{t=1}^n X_t^2}.$$
Plugging Yt = ηXt + Zt into the summands in the numerator gives
$$\hat\eta - \eta = \frac{\sum_{t=1}^n X_tZ_t}{\sum_{t=1}^n X_t^2}. \tag{5.4}$$
Applying theorem 4.5 as in the proof of (4.5), it is easy to see that, if an is the 1 − 1/n quantile of FX and if α ∈ (0, 2), then $a_n^{-2}\sum_{t=1}^n X_tZ_t \xrightarrow{P} 0$ and $a_n^{-2}\sum_{t=1}^n X_t^2 \xrightarrow{d} S_{\alpha/2,1}$ as n → ∞, and hence $\hat\eta$ is weakly consistent for η (i.e., as n → ∞, $\hat\eta$ converges in probability to η for every η > 0).

If α ∈ (0, 1) and FX is asymptotic to a Pareto distribution, $\bar F_X(x) \sim Kx^{-\alpha}$, K ∈ (0, ∞), as x → ∞, then as in corollary 4.6,

$$\left(\frac{n}{\log n}\right)^{1/\alpha}(\hat\eta - \eta) \xrightarrow{d} C^{1/\alpha}\,\frac{S_1}{S_2} \quad \text{as } n \to \infty, \tag{5.5}$$
where S1 and S2 are independent stable rvs given by (A 10) with indices α and α/2, respectively. For example, if α = 1/2, then the rate of convergence of the least-squares estimate is extremely fast, approximately of order n², which is substantially faster than the convergence rate of order $n^{1/2}$ when the variances of X and Z are finite. In particular, an approximate 95% confidence interval for η, assuming C = 1 and α = 1/2, is $(\hat\eta - c_{0.975}(\log n/n)^2,\ \hat\eta - c_{0.025}(\log n/n)^2)$, where $c_\gamma$ is the γ-quantile of the distribution of S1/S2. The two minus signs in the lower and upper bounds of the confidence interval are the surprising consequence of the fact that $\hat\eta$ is always bigger than η, as can be seen from (5.4), even for finite n and as n → ∞. This inequality is an artifact of X and Z being positive. The width of this confidence interval is $(c_{0.975} - c_{0.025})(\log n/n)^2$. On the other hand, if the variances of X and Z are finite, the width of the 95% confidence interval for η based on the least-squares estimate is $3.92(\sigma_Z/\sigma_X)n^{-1/2}$. While the difference in quantiles $c_{0.975} - c_{0.025}$ coming from the ratio of two independent stable rvs can be large, this potentially large width is more than overcome by the fast rate of the scaling $(\log n/n)^2$ for large n.
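A compact simulation sketch (ours) of the weak consistency and fast convergence of $\hat\eta$, with X and Z generated as reciprocals of squared standard normals (which are 1/2-stable):

```python
import numpy as np

rng = np.random.default_rng(5)
eta = 1.0
for n in (10**2, 10**4, 10**6):
    x = 1 / rng.standard_normal(n)**2   # 1/2-stable (Lévy) variates
    z = 1 / rng.standard_normal(n)**2
    y = eta * x + z
    eta_hat = (x * y).sum() / (x**2).sum()
    # eta_hat - eta is always positive and shrinks roughly like (log n / n)^2
    print(n, eta_hat - eta, (np.log(n) / n)**2)
```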

    6. Simulations of Taylor’s Law, correlations and a linear model

The purpose of this section is to investigate by numerical simulation how rapidly some of the statistics based on dependent, heterogeneous, RV rvs described in §§3–5 converge in distribution to their limit rvs expressed as functions of stable laws.

    (a) Taylor’s Law with heterogeneous, dependent, heavy tails

We simulate the heterogeneous, dependent, RV rvs described in §§3b,c with three different values of the pairwise correlations ρ = 0, 0.5, 0.9999 of the standard normals Zt and with observations

$$X_t := \frac{1}{(Z_t^2)^{1/(2\alpha_t)}} \in RV(\alpha_t), \quad t = 1, 2, \ldots, n.$$
In this numerical example (figure 1), for every tenth value of t, αt = 0.1, and the remaining nine of every ten values have αt = 0.9. Despite having nine of every ten exponents equal to 0.9, the numerals p = 1, …, 7 plotted at the location of (sample mean, sample variance) = $(\bar X_n, \hat\sigma_n^2)$ for the three samples of each size $n = 10^p$ fall along the solid line with slope b = 2.1111 = (2 − 0.1)/(1 − 0.1) on log-log coordinates, which is the slope in (3.2) when α = 0.1. The solid lines in figure 1 are not fitted to the markers but are calculated a priori as the bth power of the abscissa. The values of αt > 0.1 have no effect on the slope because the mean and variance are determined by the largest observations, which arise from the RV rvs with the smallest tail exponent, here 0.1.
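The following sketch (ours) reproduces the essentials of this simulation for one sample; the printed quantities compare log sample variance with b times log sample mean, which should be roughly comparable apart from the additive constant log a:

```python
import numpy as np

rng = np.random.default_rng(6)
rho, n = 0.5, 10**5
shared = np.sqrt(rho) * rng.standard_normal()            # common component N_0
z = shared + np.sqrt(1 - rho) * rng.standard_normal(n)   # corr(Z_s, Z_t) = rho
alpha = np.where(np.arange(n) % 10 == 0, 0.1, 0.9)       # heterogeneous tail indices
x = (z**2) ** (-1 / (2 * alpha))

b = (2 - 0.1) / (1 - 0.1)                                # TL slope set by smallest alpha
print(np.log(x.var()), b * np.log(x.mean()))             # roughly comparable
```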
    Figure 1.

Figure 1. Scatterplots on log–log axes of (sample mean $\bar X_n$, sample variance $\hat\sigma_n^2$) in three samples of $X_t := 1/(Z_t^2)^{1/(2\alpha_t)} \in RV(\alpha_t)$, t = 1, 2, …, n of each sample size $n = 10^p$, p = 1, 2, …, 7 with pairwise correlation ρ = 0, 0.5, 0.9999 (panels (a), (b) and (c)). For example, in the left panel, the three appearances of the numeral ‘7’ represent $(\bar X_n, \hat\sigma_n^2)$ in three samples of size $n = 10^7$. The solid lines, calculated as the bth power of the abscissa, have the same slope b = 2.1111 = (2 − 0.1)/(1 − 0.1) on log–log axes, regardless of the value of the pairwise correlation ρ = 0, 0.5, 0.9999 (panels (a), (b) and (c)). (Online version in colour.)

    (b) Bivariate correlation

Here we simulate (X, Y) described in corollary 4.4 and theorem 4.5. In this case, although X and Y are generated by arbitrarily correlated (with |ρ| < 1) standard normal variates, X and Y are asymptotically independent stable laws. The purpose of these simulations is to shed light on this question: does the sample correlation ρn (4.4) of n iid copies of (X, Y) approach its almost sure limit, 0, or its rescaled limiting distribution (4.8) rapidly enough as n → ∞ that, for sample sizes plausibly encountered in empirical applications, the limiting value or rescaled limiting distribution are useful approximations? We ask a similar question below in §6c, where we simulate (X, Y) for the linear model in (5.1).

Here we generate n iid pairs (U, V) normally distributed with mean 0, variance 1 and correlation ρ ∈ (−1, +1), for sample sizes $n = 10^p$, p = 1, 2, 3, 4, and ρ = +0.9. From each (U, V) we compute (X, Y) := (1/|U|², 1/|V|²). The n × 2 matrix in which each row is one (X, Y) pair of 1/2-stable rvs constitutes one simulation. For each sample size n, we generate $s = 10^4$ simulations. For each simulation, we record the sample correlation ρn between the n realizations of X and the n realizations of Y and calculate nρn (the left side of (4.8)).

Figure 2 shows that the frequency histogram of the sample correlation ρn changes from a bimodal distribution with one mode at negative values and another near ρn = 1 when n = 10 (top left) to a unimodal distribution to the left of 0 for the large sample size $n = 10^4$ (bottom left).

    Figure 2.

Figure 2. For sample sizes $n = 10^p$, p = 1, 2, 3, 4, we generate n iid pairs (U, V) normally distributed with mean 0, variance 1 and correlation ρ = +0.9 and then (X, Y) := (1/|U|², 1/|V|²). The n × 2 matrix in which each row is one (X, Y) pair constitutes one simulation. For each sample size n, we generate $s = 10^4$ simulations. For each simulation, we record the sample correlation ρn between the n realizations of X and the n realizations of Y. The left column of this figure shows the frequency histogram of the $s = 10^4$ simulations of ρn, for sample sizes from n = 10 (top row) to $n = 10^4$ (bottom row). The histogram becomes increasingly concentrated on (−1, 0). The right column of this figure shows, on the ordinate with cross symbols, the order statistics of −R1R2, the limit rv on the right side of (4.8), and, on the abscissa and in the diagonal solid line of slope 1, the corresponding order statistics of nρn (see text for details of computation). For samples of size n = 10, the sample correlation ρn was negative in 3868 of 10 000 samples. For samples of size $n = 10^4$, the sample correlation ρn was negative in 9510 of 10 000 samples. The plots on the right include only and all the negative values of ρn and the corresponding order statistics of −R1R2. Equation (4.8) predicts, as n → ∞, that the fraction of simulations in which ρn < 0 should approach 1, and that the cross markers should approach the diagonal line. The right panels support both predictions. (Online version in colour.)

The right side of (4.8) is −R1R2 where R1 and R2 are iid rvs with $R_1 = S_\alpha/\sqrt{S_{\alpha/2}}$, and $S_\alpha$ and $S_{\alpha/2}$ are given in (A 10) and (A 11) with α = 1/c = 1/2. To simulate each value of $S_\alpha$ and its corresponding $S_{\alpha/2}$, we generate $10^5$ iid unit exponential variates Ei, compute the $10^5$ partial sums $\Gamma_t := \sum_{i=1}^t E_i$, t = 1, …, $10^5$, raise each Γt to the power −1/α (for $S_\alpha$) or −2/α (for $S_{\alpha/2}$), and sum the $10^5$ values of $\Gamma_t^{-1/\alpha}$ or $\Gamma_t^{-2/\alpha}$. From each such pair $(S_\alpha, S_{\alpha/2})$ using the same values of Γt, we compute one value of $R_1 = S_\alpha/\sqrt{S_{\alpha/2}}$. An independent repetition of this procedure generates one iid value of R2 and thence one value of −R1R2.
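In code, the recipe of the preceding paragraph reads as follows (an illustrative sketch of ours; the truncation at $10^5$ terms matches the text):

```python
import numpy as np

rng = np.random.default_rng(7)

def r_variate(alpha: float, m: int = 10**5) -> float:
    """One draw of R = S_alpha / sqrt(S_{alpha/2}) from a common Gamma series."""
    gamma = rng.exponential(size=m).cumsum()   # Gamma_1 < Gamma_2 < ... < Gamma_m
    s_alpha = (gamma ** (-1 / alpha)).sum()    # truncated series for S_alpha
    s_half = (gamma ** (-2 / alpha)).sum()     # truncated series for S_{alpha/2}
    return s_alpha / np.sqrt(s_half)

alpha = 0.5                                    # c = 2
print(-r_variate(alpha) * r_variate(alpha))    # one draw of -R1*R2 in (4.8)
```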

For each sample size n, we sort the s simulated values of nρn (the left side of (4.8)) in increasing order, and we sort the s simulated values of −R1R2 (the right side of (4.8)) in increasing order. If the distribution of nρn approaches the distribution of −R1R2 as n → ∞, as theorem 4.5 asserts, then the empirical quantile–quantile plot of −R1R2 on the vertical axis and nρn on the horizontal axis should fall along a diagonal straight line of slope 1 through the origin.

However, the absolute error nρn − (−R1R2) of the approximation in (4.8) is unbounded as n → ∞ because there is a non-zero (though asymptotically vanishing) probability that ρn will be near 1, in which case nρn will be near n while −R1R2 is negative with probability 1, so the error could be close to n. Consequently, including positive values of ρn in the quantile–quantile plots in the right column of figure 2 would obscure the comparison of nρn with −R1R2 on (−∞, 0], where asymptotically an increasing proportion of the values of ρn will fall. For each sample size n, we report the number of simulations with ρn < 0 in the lower right corner of each panel in the right column of figure 2 and we restrict the quantile–quantile plot to these values and to the corresponding order statistics of −R1R2. As (4.8) predicts, as n → ∞, the fraction of simulations in which ρn < 0 approaches 1 and the order statistics of nρn approach the corresponding order statistics of −R1R2.

    (c) Linear model

    We simulate the linear model (5.1) with slope η = 1 under two different assumptions: (i) that the independent variable X and the noise (or random perturbation) variable Z are iid 1/2-stable, so that in (5.2) we have C = 1; and (ii) that X is 1/2-stable and Z is standard normal, which is light-tailed, and X and Z are independent, so that in (5.2) C = 0.

Under (i), assuming X and Z are iid α-stable with α = 1/2 and η = C = 1, the angular measure (5.3) simplifies to

$$G(\cdot) = \frac{2^{1/4}}{2^{1/4}+1}\,\varepsilon_{\arctan 1}(\cdot) + \frac{1}{2^{1/4}+1}\,\varepsilon_{\pi/2}(\cdot) \approx 0.5432\,\varepsilon_{0.7854}(\cdot) + 0.4568\,\varepsilon_{1.5708}(\cdot).$$
The limiting distribution of ρn, given by the right side of (4.5), depends on the three stable rvs S1/4,0, S1/4,1 and S1/4,2 with joint distribution specified in appendix A(c), (A 18)–(A 20). To simulate their joint distribution, for each realization ω, each Γt is a cumulative sum of t iid exponential rvs with parameter 1, as described in greater detail in §6b. From (5.3), in the limit of large n, each $\Theta_t = \arctan(1) \approx 0.7854$ with probability 0.5432 and $\Theta_t = \pi/2 \approx 1.5708$ with probability 0.4568, as calculated numerically just above. We sum the summands on the right sides of (A 18)–(A 20) until the sums quasi-converge.
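The following sketch (ours) implements this recipe for one realization, truncating the series at $10^5$ terms rather than monitoring quasi-convergence:

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, p_diag = 0.5, 0.5432          # mass at arctan(1); remaining mass at pi/2

gamma = rng.exponential(size=10**5).cumsum()
theta = np.where(rng.uniform(size=gamma.size) < p_diag, np.pi / 4, np.pi / 2)
w = gamma ** (-2 / alpha)            # Gamma_t^(-2/alpha) weights in (A 18)-(A 20)

s0 = (w * np.cos(theta) * np.sin(theta)).sum()
s1 = (w * np.cos(theta) ** 2).sum()
s2 = (w * np.sin(theta) ** 2).sum()
print(s0 / np.sqrt(s1 * s2))         # one draw of the limit U in (4.5)
```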

    Under (ii), with α = 1/2, η = 1, C = 0, the angular measure (5.3) simplifies to

$$G(\cdot) = \varepsilon_{\arctan 1}(\cdot) \approx \varepsilon_{0.7854}(\cdot)$$
    and the limiting distribution of ρn is a point mass at 1.

As above, $n = 10^p$, p = 1, 2, 3, 4, denotes the sample size or number of realizations of (X, Y). For each sample size n, we simulate s = 1000 samples, each with n copies of (X, Y), and plot the frequency histogram of the s values of the sample correlation coefficient ρn for these s samples. We compare this frequency histogram with the limiting distribution calculated in §5.

To simulate s samples of size n of the 1/2-stable rv X, we let each element of the n × s matrix X be independently distributed as the reciprocal of the square of a standard normal. Each column of X represents one sample of size n of X.

Under assumption (i) that X and Z are iid 1/2-stable, we also simulate s samples of size n of the 1/2-stable rv Z by exactly the same procedure, independently of X. Each column of the n × s matrix Z represents one sample of size n of Z. Then Y = ηX + Z gives an n × s matrix containing s samples of size n of Y according to (5.1). We assume η = 1 here, so Y = X + Z. The top row of figure 3 shows one sample of size $n = 10^4$ under assumption (i). The modal values of arctan(Y/X) from s iid simulations of samples of size $n = 10^4$ (second row of figure 3) approximate arctan(1) on the left and π/2 on the right (lower left panel), with increasing concentration at these modal values when the radius in polar coordinates R := (X² + Y²)^{1/2} falls in the upper decile of all $10^7$ values of R (lower right panel). For samples of increasing size, the sample correlation coefficient ρn is increasingly concentrated on the interval [0, 1], with modal frequencies at the extremes (figures 4 and 6).

    Figure 3.

Figure 3. Left to right, top to bottom: (a) Scatterplot on linear scales of (X, Y), Y = X + Z with X, Z iid 1/2-stable, in a single sample of size $n = 10^4$. (b) Scatterplot on logarithmic scales of the same sample shown in (a). (c) Frequency histogram of $n \times s = 10^4 \times 10^3 = 10^7$ values of arctan(Y/X) from the pooled s simulations of samples of size n. The modal values approximate arctan(1) on the left and π/2 on the right. (d) Frequency histogram of arctan(Y/X) from the $10^6$ values of (X, Y) for which the radius in polar coordinates R := (X² + Y²)^{1/2} falls in the upper decile of all $10^7$ values of the radius. The values of arctan(Y/X) are more concentrated at arctan(1) and π/2. (Online version in colour.)

    Figure 4.

Figure 4. Frequency histograms of the sample correlation coefficient ρn of samples of (X, Y), Y = ηX + Z when Z is 1/2-stable (heavy-tailed) as in figure 3. We assume η = 1. Top to bottom: sample sizes are (a) n = 10, (b) $n = 10^2$, (c) $n = 10^3$ and (d) $n = 10^4$. Modal sample correlation coefficients are near 0 and near 1.

Under assumption (ii) that X is 1/2-stable, Z is standard normal, and X and Z are independent, we generate another independent n × s matrix X as above and an n × s matrix N with iid standard normal elements. Then Y = ηX + N = X + N. The top row of figure 5 shows one simulated sample of size $n = 10^4$ under assumption (ii). The top right panel shows on log–log coordinates only the pairs (X, Y) of this sample where X > 0, Y > 0. Unlike the previous case of heavy-tailed noise, here the modal values of arctan(Y/X) from the pooled s simulations of samples of size $n = 10^4$ (lower row of figure 5) centre on and are near $\arctan(1) = \pi/4 \approx 0.7854$, with increasing concentration at this modal value when the radius in polar coordinates R := (X² + Y²)^{1/2} falls in the upper decile of all $10^7$ values of R (lower right panel). Under assumption (ii), for samples of increasing size, the sample correlation coefficient ρn is concentrated around 1, with a range that decreases very rapidly toward 0 (figure 6): these results are consistent with the theory and with figure 5c,d, where the histogram of arctan(Y/X) is heavily concentrated at π/4.

    Figure 5.

Figure 5. Left to right, top to bottom: (a) Scatterplot on linear scales of (X, Y) where Y = X + U with X iid 1/2-stable, U iid standard normal. We have $s = 10^3$ samples each of size $n = 10^4$, giving a total of $n \times s = 10^4 \times 10^3 = 10^7$ pairs (X, Y). (b) Scatterplot on logarithmic scales of only the 8 887 329 pairs (X, Y) where X > 0, Y > 0. The same simulations appear in (a) and (b) but only the positive pairs are selected for (b). (c) Frequency histogram of $10^7$ values of arctan(Y/X) from s iid simulations of samples of size n. The modal values centre on $\arctan(1) = \pi/4 \approx 0.7854$. (d) Frequency histogram of arctan(Y/X) from the $10^6$ values of (X, Y) for which the radius in polar coordinates R := (X² + Y²)^{1/2} falls in the upper decile of all $10^7$ values. The values are more concentrated at arctan(1). (Online version in colour.)

    Figure 6.

Figure 6. Column 1: sample size n. Column 2: median of the s = 1000 values of the sample correlation coefficient ρn of (X, Y), Y = ηX + Z with X, Z iid 1/2-stable, η = 1. The noise term is heavy-tailed. Column 3: range (highest minus lowest) of the s = 1000 values of the sample correlation coefficient in column 2. Column 4: median of the s = 1000 values of the sample correlation coefficient of (X, Y), Y = ηX + U with X iid 1/2-stable, U iid standard normal, η = 1. The noise term is light-tailed. Column 5: range (highest minus lowest) of the s = 1000 values of the sample correlation coefficient in column 4. With heavy-tailed noise, the sample correlation coefficient increasingly covers the full interval [0, 1] as sample size n increases. With light-tailed noise, the sample correlation coefficient is increasingly tightly concentrated near 1 as sample size n increases.

    Data accessibility

    This article does not contain any additional data.

    Authors' contributions

    All three authors contributed ideas, mathematical analysis, writing and editing. Cohen did simulations. All authors approved the final manuscript and agree to be held accountable for all aspects of the work.

    Competing interests

    We declare we have no competing interests.

    Funding

    J.E.C.’s research was partially supported by Columbia University’s Earth Institute. R.A.D.’s research was partially supported by NSF grant no. DMS 2015379 to Columbia University. G.S.’s research was partially supported by ARO grant no. W911NF-18-10318 to Cornell University.

    Acknowledgements

    We thank Hongyuan Cao for drawing attention to Pillai & Meng [9] and for correcting two typographical errors in a previous draft. We thank Abootaleb Shirvani, Svetlozar Rachev and an anonymous referee for thoughtful reviews.

    Appendix A

    (a) Heavy-tailed distributions and point process methods: background

    Point process methods provide indispensable insight into limit theory for statistics arising from heavy-tailed data. The methods have been championed by Resnick [1,2,19], and others. We will discuss the relevant theory in the setting of §4 and leave the general formulation to the references in Resnick [2].

Here the Euclidean space E is defined as E = (0, ∞] or $[0, \infty]^2 \setminus \{(0, 0)\}$. A point measure ξ on E is a non-negative integer-valued measure on the Borel σ-field, $\mathcal{B}(E)$, having the form

$$\xi(\cdot) = \sum_{t=1}^\infty \varepsilon_{x_t}(\cdot),$$
where $\varepsilon_x(\cdot)$ is the delta measure with unit mass at x, and $x_t \in E$. That is, for $A \in \mathcal{B}(E)$,
$$\varepsilon_x(A) = \begin{cases} 1, & x \in A,\\ 0, & x \notin A. \end{cases}$$
    We say that a point measure is Radon on E if it is finite on all compact sets.

We define the vague topology on the class $M_p(E)$ of Radon point measures in terms of the space $C_K(E)$ of continuous functions on E with compact support. If $\xi_n, \xi \in M_p(E)$, then $\xi_n \xrightarrow{v} \xi$ if and only if for every $f \in C_K(E)$,

$$\xi_n(f) := \int_E f\,d\xi_n \to \xi(f) := \int_E f\,d\xi.$$

A point process is a random element of $M_p(E)$ defined on some probability space $(\Omega, \mathcal{F}, P)$. A Poisson point process ξ is defined in terms of a Radon intensity measure μ and is often called a Poisson random measure, denoted by PRM(μ) [1]. Such a point process satisfies the following assumptions:

    (i)

For relatively compact $A \in \mathcal{B}(E)$, ξ(A) has a Poisson distribution with mean μ(A).

    (ii)

For relatively compact and disjoint $A, B \in \mathcal{B}(E)$, ξ(A) and ξ(B) are independent.

The Laplace functional $L_\xi(f)$ of a point process $\xi(\cdot) = \sum_{t=1}^\infty \varepsilon_{X_t}(\cdot)$ is

$$L_\xi(f) := E\left(\exp\{-\xi(f)\}\right) = E\left(\exp\left\{-\sum_{t=1}^\infty f(X_t)\right\}\right) \tag{A 1}$$
for non-negative $f \in C_K(E)$. When ξ is a PRM(μ), the Laplace functional becomes [1, Sect. 5.3.2]
$$L_\xi(f) = \exp\left\{-\int_E \left(1 - e^{-f}\right)d\mu\right\}. \tag{A 2}$$
A sequence of point processes ξn converges in distribution to the point process ξ if and only if their respective Laplace functionals also converge, i.e.
$$\xi_n \xrightarrow{d} \xi \quad \text{if and only if} \quad L_{\xi_n}(f) \to L_\xi(f) \ \text{ for all non-negative } f \in C_K(E). \tag{A 3}$$
    A powerful result about the convergence of point processes of heavy-tailed one-dimensional data is:

    Theorem A.1.

Suppose {Xt} is an iid sequence of observations with distribution F ∈ RV(α) as in (2.1), and define an by (2.3). Let {Et} be an iid sequence of unit exponential rvs, and define $\Gamma_t := \sum_{i=1}^t E_i$, t = 1, 2, …. Then the sequence of point processes Nn( · ) defined on the left in (A 4) converges in distribution to the point process N( · ) defined on the right in (A 4) as n → ∞:

$$N_n(\cdot) := \sum_{t=1}^n \varepsilon_{a_n^{-1}X_t}(\cdot) \xrightarrow{d} N(\cdot) := \sum_{t=1}^\infty \varepsilon_{\Gamma_t^{-1/\alpha}}(\cdot). \tag{A 4}$$
The limit point process N is a PRM($\alpha x^{-\alpha-1}\,dx$).

    This result follows directly from Resnick [1, theorem 5.3].

We recall from elementary stochastic processes that the Γt’s are points of a homogeneous Poisson process on (0, ∞) with rate one. By a change of variables, the points $\Gamma_t^{-1/\alpha}$ are then points of a Poisson process with intensity measure $\alpha x^{-\alpha-1}\,dx$.
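As a numerical check (an illustrative sketch of ours), the expected number of points $\Gamma_t^{-1/\alpha}$ exceeding a level x0 equals $x_0^{-\alpha}$, because those points correspond to arrivals $\Gamma_t < x_0^{-\alpha}$ of a unit-rate Poisson process:

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, x0, reps = 0.7, 2.0, 2000
counts = []
for _ in range(reps):
    gamma = rng.exponential(size=1000).cumsum()       # Poisson arrival times
    counts.append(int((gamma ** (-1 / alpha) > x0).sum()))
# Points above x0 correspond to Gamma_t < x0^(-alpha), a Poisson count with
# mean x0^(-alpha), matching the intensity alpha * x^(-alpha-1) dx.
print(np.mean(counts), x0 ** (-alpha))
```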

To consider sample correlations between pairs of rvs, we extend regular variation to bivariate random vectors. The non-negative bivariate random vector (X, Y) is said to be RV if the radial part R = (X² + Y²)^{1/2} is RV(α) and the angular part $\Theta = \arctan(Y/X)$ becomes independent of R as R → ∞. More precisely, if $\tilde a_n$ is the 1 − 1/n quantile of R and G is a distribution function on [0, π/2], then for all r > 0,

$$nP(R > \tilde a_n r) \to r^{-\alpha} \quad \text{and} \quad P(\Theta \in \cdot \mid R > \tilde a_n) \xrightarrow{d} G(\cdot) \quad \text{as } n \to \infty. \tag{A 5}$$
The limiting relations (A 5) are equivalent to
$$nP\left((\tilde a_n^{-1}R, \Theta) \in (\cdot, \cdot)\right) \xrightarrow{v} \tilde\nu(dr, d\theta) := \alpha r^{-\alpha-1}\,dr \times G(d\theta) \quad \text{as } n \to \infty \tag{A 6}$$
in polar coordinates or, in the original Euclidean coordinates,
$$nP\left(\tilde a_n^{-1}(X, Y) \in \cdot\right) \xrightarrow{v} \nu(dx, dy) \quad \text{as } n \to \infty, \tag{A 7}$$
where $\xrightarrow{v}$ represents vague convergence of measures on (0, ∞]² and ν(dx, dy) is the measure in rectangular coordinates corresponding to the polar coordinate version $\tilde\nu(dr, d\theta)$.

    The point process convergence that drives much of the limit theory for bivariate heavy-tailed data is:

    Theorem A.2.

    Suppose {(Xt, Yt), t = 1, 2, …} is an iid sequence of observations with distribution that satisfies (A 6) or (A 7). Then the sequence of point processes Nn( · ) converges in distribution as n → ∞ to a PRM(ν) N( · ), where

$$N_n(\cdot) := \sum_{t=1}^n \varepsilon_{\tilde a_n^{-1}(X_t, Y_t)}(\cdot) \xrightarrow{d} N(\cdot) := \sum_{t=1}^\infty \varepsilon_{\Gamma_t^{-1/\alpha}(\cos\Theta_t,\, \sin\Theta_t)}(\cdot) \quad \text{as } n \to \infty. \tag{A 8}$$
Here {Θt} is an iid sequence with distribution G(dθ) in (A 6) that is also independent of the sequence {Γt}.

    The proof of this result is in Resnick [1, theorem 5.3].

The limit point process N( · ) in theorem A.2 has an instructive representation. The $\Gamma_j^{-1/\alpha}$ and Θj correspond to the limit of the radial and angular parts, respectively, of the pairs $\tilde a_n^{-1}(X_j, Y_j)$. To verify that N is a PRM(ν), we use Laplace functionals. Specifically, by conditioning on the Poisson points Γj and by using the independence of the Θj, we have

$$L_N(f) = E\left(\exp\left\{-\sum_{j=1}^\infty f\left(\Gamma_j^{-1/\alpha}(\cos\Theta_j, \sin\Theta_j)\right)\right\}\right) = E\left(\prod_{j=1}^\infty \int_{[0,\pi/2]} \exp\left\{-f\left(\Gamma_j^{-1/\alpha}(\cos\theta, \sin\theta)\right)\right\}G(d\theta)\right) = E\left(\exp\left\{-\sum_{j=1}^\infty h\left(\Gamma_j^{-1/\alpha}\right)\right\}\right),$$
where $h(r) := -\log\int_{[0,\pi/2]}\exp\{-f(r(\cos\theta, \sin\theta))\}\,G(d\theta)$, so that
$$L_N(f) = \exp\left\{-\int_{(0,\infty)}\left(1 - \exp\{-h(r)\}\right)\alpha r^{-\alpha-1}\,dr\right\} = \exp\left\{-\int\left(1 - \exp\{-f(x, y)\}\right)\nu(dx, dy)\right\}$$
as desired.

    (b) Limit theory for Taylor’s Law

    The following theorem describes the joint convergence of the partial sums and partial sums of squares in example 2.1 and the limiting behaviour of Taylor’s Law.

    Theorem A.3.

Suppose {Xt} is an iid sequence of observations with F ∈ RV(α) as in (2.1). Define an by (2.3) and bn by (2.5). Then

$$\left(a_n^{-1}\sum_{t=1}^n (X_t - b_n),\; a_n^{-2}\sum_{t=1}^n (X_t - \bar X_n)^2\right) \xrightarrow{d} \left(S_\alpha, S_{\alpha/2}\right) \quad \text{as } n \to \infty. \tag{A 9}$$
The limit rvs $S_\alpha$ and $S_{\alpha/2}$ are stable and defined through the points of the point process N in theorem A.1. Namely,
$$S_\alpha = \begin{cases} \displaystyle\sum_{j=1}^\infty \left(\Gamma_j^{-1/\alpha} - E\left[\Gamma_j^{-1/\alpha}1\{\Gamma_j^{-1/\alpha} \le 1\}\right]\right), & \text{if } \alpha \in [1, 2),\\[2mm] \displaystyle\sum_{j=1}^\infty \Gamma_j^{-1/\alpha}, & \text{if } \alpha \in (0, 1), \end{cases} \tag{A 10}$$
and
$$S_{\alpha/2} = \sum_{j=1}^\infty \Gamma_j^{-2/\alpha}. \tag{A 11}$$
For α ∈ (0, 1), Taylor’s Law (3.3) follows: for b = (2 − α)/(1 − α),
$$\frac{na_n^{-2}}{n^b a_n^{-b}}\,W_n = \frac{1}{\tilde L^{2-b}(n)}\,W_n \xrightarrow{d} \frac{\sum_{i=1}^\infty \Gamma_i^{-2/\alpha}}{\left(\sum_{i=1}^\infty \Gamma_i^{-1/\alpha}\right)^b} \quad \text{as } n \to \infty. \tag{A 12}$$

    Proof.

This is a special case of a standard result in, for example, LePage et al. [20, theorems 1 and 1′], together with the derivation of (2.8) from the decomposition (2.10). ▪

The representations in (A 10) and (A 11) are often referred to as the LePage–Woodroofe–Zinn (LWZ) representation of a stable distribution. Davis [16] gives the companion result when {Xt} is a stationary time series satisfying certain dependence conditions.

    (c) Limit theory for the sample correlation

    Proof of theorem 4.1.

The condition (4.3) implies the vague convergence in (A 7). Indeed, (4.3) implies that (A 7) holds for sets of the form $A_{x,y}^c = ([0, x] \times [0, y])^c$, which can then be extended to all two-dimensional Borel sets that are bounded away from (0, 0). Moreover, the constants an and $\tilde a_n$ are related via

$$nP(R > \tilde a_n) \to 1 \quad \text{and} \quad nP(R > a_n) \to \nu\left(\{(x, y): x^2 + y^2 > 1\}\right) \quad \text{as } n \to \infty,$$
which implies
$$\tilde a_n \sim \nu^{1/\alpha}\left(\{(x, y): x^2 + y^2 > 1\}\right)a_n \quad \text{as } n \to \infty. \tag{A 13}$$

For 0 < ϵ < 1 fixed, consider the function $f_\epsilon(x, y) = xy\,1\{\epsilon < xy \le 1/\epsilon\}$, which has compact support on E and is continuous except on a set of ν measure 0. By the weak convergence of Nn to N in theorem A.2, it follows that

$$N_n(f_\epsilon) = \tilde a_n^{-2}\sum_{t=1}^n X_tY_t\,1\{\tilde a_n^2\epsilon < X_tY_t \le \tilde a_n^2/\epsilon\} \xrightarrow{d} N(f_\epsilon) = \sum_{t=1}^\infty \Gamma_t^{-2/\alpha}\cos\Theta_t\sin\Theta_t\,1\{\epsilon < \Gamma_t^{-2/\alpha}\cos\Theta_t\sin\Theta_t \le 1/\epsilon\} \quad \text{as } n \to \infty. \tag{A 14}$$
Since Γt ∼ t a.s. by the strong law of large numbers, the series $\sum_{t=1}^\infty \Gamma_t^{-2/\alpha}$ is summable a.s. Hence
$$N(f_\epsilon) \xrightarrow{a.s.} N(f_0) = \sum_{t=1}^\infty \Gamma_t^{-2/\alpha}\cos\Theta_t\sin\Theta_t \quad \text{as } \epsilon \to 0. \tag{A 15}$$
Moreover, for any δ > 0,
$$P\left(|N_n(f_\epsilon) - N_n(f_0)| > \delta\right) \le P\left(\tilde a_n^{-2}\sum_{t=1}^n X_tY_t\,1\{X_tY_t > \tilde a_n^2/\epsilon\} > \frac{\delta}{2}\right) + P\left(\tilde a_n^{-2}\sum_{t=1}^n X_tY_t\,1\{X_tY_t \le \tilde a_n^2\epsilon\} > \frac{\delta}{2}\right) = I + II.$$
Term I is bounded by
$$P\left(\bigcup_{t=1}^n \left\{\tilde a_n^{-2}X_tY_t > \frac{1}{\epsilon}\right\}\right) \le nP\left(\tilde a_n^{-2}X_1Y_1 > \frac{1}{\epsilon}\right) \to \nu\left\{(x, y): xy > \frac{1}{\epsilon}\right\} \to 0 \quad \text{as first } n \to \infty, \text{ then } \epsilon \to 0. \tag{A 16}$$
Hence $\lim_{\epsilon\to0}\limsup_{n\to\infty} I = 0$. Markov’s inequality bounds term II by
$$II \le 2\delta^{-1}\,n\tilde a_n^{-2}\,E\left[X_1Y_1\,1\{X_1Y_1 \le \tilde a_n^2\epsilon\}\right].$$
Next
$$\frac{P(XY > a_n^2)}{P(X^2 > a_n^2)} = \frac{nP(XY > a_n^2)}{nP(X^2 > a_n^2)} \to \nu\left(\{(x, y): xy > 1\}\right) \quad \text{as } n \to \infty,$$
where the limit is positive if X and Y are asymptotically dependent, or zero if X and Y are asymptotically independent. Either way, it follows that there exists a finite positive constant C such that, for all z > 0,
$$P(XY > z) \le C\,P(X^2 > z).$$
Consequently
$$E\left[XY\,1\{XY \le \tilde a_n^2\epsilon\}\right] \le \int_0^{\tilde a_n^2\epsilon} P(XY > z)\,dz \le C\int_0^{\tilde a_n^2\epsilon} P(X^2 > z)\,dz.$$
Now from Karamata’s theorem [1, theorem 2.1], we have $\int_0^{z} P(X^2 > s)\,ds \sim \frac{2}{2-\alpha}\,zP(X^2 > z)$ as z → ∞ and hence for some K ∈ (0, ∞),
$$\lim_{\epsilon\to0}\limsup_{n\to\infty} II \le \lim_{\epsilon\to0}\limsup_{n\to\infty} 2C\delta^{-1}\,n\tilde a_n^{-2}\int_0^{\tilde a_n^2\epsilon} P(X^2 > z)\,dz = \lim_{\epsilon\to0} K\epsilon^{1-\alpha/2} = 0. \tag{A 17}$$
Combining (A 14), (A 16) and (A 17), we conclude that
$$\tilde a_n^{-2}\sum_{t=1}^n X_tY_t \xrightarrow{d} S_{\alpha/2,0} := \sum_{t=1}^\infty \Gamma_t^{-2/\alpha}\cos\Theta_t\sin\Theta_t \quad \text{as } n \to \infty. \tag{A 18}$$
Essentially, the same point process argument, although somewhat easier, can be used to show
$$\tilde a_n^{-2}\sum_{t=1}^n X_t^2 \xrightarrow{d} S_{\alpha/2,1} := \sum_{t=1}^\infty \Gamma_t^{-2/\alpha}\cos^2\Theta_t \quad \text{as } n \to \infty \tag{A 19}$$
and
$$\tilde a_n^{-2}\sum_{t=1}^n Y_t^2 \xrightarrow{d} S_{\alpha/2,2} := \sum_{t=1}^\infty \Gamma_t^{-2/\alpha}\sin^2\Theta_t \quad \text{as } n \to \infty, \tag{A 20}$$
where the convergences in (A 18)–(A 20) are joint. Applying the continuous mapping theorem gives (4.5).

    Finally, by the form of the limit, one sees that the numerator Sα/2,0 = 0 a.s. if and only if the angular measure concentrates at 0 and π/2, which is equivalent to asymptotic independence. ▪

    Proof of theorem 4.2.

    If $X\in RV(\alpha)$ and $Y\in RV(\beta)$ with $0<\alpha<\beta<2$, then $Y^{\beta/\alpha}\in RV(\alpha)$. So replacing the condition on $(X,Y)$ in theorem 4.1 with the assumption that $(X,Y^{\beta/\alpha})$ is bivariate RV, theorem A.2 implies that the same point process convergence in (A 8) holds with $Y_t$ replaced by $Y_t^{\beta/\alpha}$. Using the same proof as for theorem 4.1 with the function $f_\epsilon(x,y)=xy^{\alpha/\beta}\,\mathbf 1\{\epsilon<xy^{\alpha/\beta}\le 1/\epsilon\}$ implies the analogue of (A 18), namely,

    $$\tilde a_n^{-1-\alpha/\beta}\sum_{t=1}^{n}X_tY_t\xrightarrow{d}S_{\alpha\beta/(\alpha+\beta),0}:=\sum_{t=1}^{\infty}\Gamma_t^{-1/\alpha-1/\beta}\cos\Theta_t\sin^{\alpha/\beta}\Theta_t\quad\text{as } n\to\infty.\tag{A 21}$$
    From the same point process result, we obtain
    $$\tilde a_n^{-2\alpha/\beta}\sum_{t=1}^{n}(Y_t-\bar Y_n)^2\xrightarrow{d}S_{\beta/2,2}:=\sum_{t=1}^{\infty}\Gamma_t^{-2/\beta}\sin^{2\alpha/\beta}\Theta_t\quad\text{as } n\to\infty.\tag{A 22}$$
    The remainder of the proof is identical to that for theorem 4.1. ▪

    (d) Limit theory for correlation of heavy-tailed data from correlated normal random variables

    In this section, we prove theorem 4.5 and corollaries 4.6 and 4.4.

    Proof of theorem 4.5 and corollary 4.6.

    Recall that $X:=1/|U|^c$, $Y:=1/|V|^c$ for some $c>1$, where $(U,V)$ is bivariate normal with means 0, variances 1 and correlation $\rho\in(-1,1)$. Let $\phi(x):=(2\pi)^{-1/2}\exp(-x^2/2)$, $x\in(-\infty,+\infty)$, be the standard normal pdf. It follows directly from proposition 4.3 that

    $$P(X>x)\sim x^{-1/c}\,2\phi(0)=x^{-1/c}\sqrt{2/\pi}.$$
    Now setting $a_n=(2/\pi)^{c/2}n^{c}$, it follows that $nP(X>a_n)\to 1$ as $n\to\infty$.

    Establishing

    $$\Pr(XY>x)\sim\frac{2}{c\pi}(1-\rho^2)^{-1/2}\,x^{-1/c}\log x\quad\text{as } x\to\infty\tag{A 23}$$
    is equivalent to proving
    $$\Pr(|UV|<\epsilon)\sim\frac{2}{\pi}(1-\rho^2)^{-1/2}\,\epsilon\log\left(\frac{1}{\epsilon}\right)\quad\text{as }\epsilon\to 0.\tag{A 24}$$
    The probability on the left side of (A 24) can be written
    $$\Pr(|UV|<\epsilon)=\left(\frac{1}{2\pi}\right)^{1/2}\int_{-\infty}^{\infty}e^{-x^2/2}\left[\Phi\left(\frac{\epsilon/|x|-\rho x}{(1-\rho^2)^{1/2}}\right)-\Phi\left(\frac{-\epsilon/|x|-\rho x}{(1-\rho^2)^{1/2}}\right)\right]dx,\tag{A 25}$$
    where $\Phi$ is the standard normal cdf. For an upper bound, notice that the integral on the right side of (A 25) does not exceed
    $$2\epsilon+\int_{|x|>\epsilon}e^{-x^2/2}\left[\Phi\left(\frac{\epsilon/|x|-\rho x}{(1-\rho^2)^{1/2}}\right)-\Phi\left(\frac{-\epsilon/|x|-\rho x}{(1-\rho^2)^{1/2}}\right)\right]dx\le 2\epsilon+\left(\frac{1}{2\pi}\right)^{1/2}\int_{|x|>\epsilon}e^{-x^2/2}\,\frac{2\epsilon}{|x|(1-\rho^2)^{1/2}}\,dx,
    $$
    since
    $$\Phi(b)-\Phi(a)\le\left(\frac{1}{2\pi}\right)^{1/2}(b-a)$$
    for any $a<b$. We conclude that
    $$P(|UV|<\epsilon)\le(2/\pi)^{1/2}\epsilon+\frac{2}{\pi}(1-\rho^2)^{-1/2}\,\epsilon\int_\epsilon^\infty\frac{1}{x}\,e^{-x^2/2}\,dx.$$
    Since, by L'Hôpital's rule,
    $$\int_\epsilon^\infty\frac{1}{x}\,e^{-x^2/2}\,dx\sim\log\left(\frac{1}{\epsilon}\right)\quad\text{as }\epsilon\to 0,$$
    we conclude that
    $$\limsup_{\epsilon\to 0}\frac{P(|UV|<\epsilon)}{\epsilon\log(1/\epsilon)}\le\frac{2}{\pi}(1-\rho^2)^{-1/2}.\tag{A 26}$$

    It remains to obtain a matching lower bound for the right side of (A 25). Now we use the bound, for any $a<b$,

    $$\Phi(b)-\Phi(a)\ge\left(\frac{1}{2\pi}\right)^{1/2}(b-a)\exp\left\{-\frac{\max(a^2,b^2)}{2}\right\}.$$
    Choose $\theta=\theta(\epsilon)\downarrow 0$ as $\epsilon\downarrow 0$ such that
    $$\log\left(\frac{1}{\theta}\right)=o\left(\log\left(\frac{1}{\epsilon}\right)\right)\quad\text{as }\epsilon\to 0.\tag{A 27}$$
    Let $M$ be a large number. We have by (A 25),
    $$P(|UV|<\epsilon)\ge\frac{1}{2\pi}\int_{\epsilon M\le|x|\le\theta}e^{-x^2/2}\,\frac{2\epsilon}{|x|(1-\rho^2)^{1/2}}\exp\left\{-\frac{1}{2}\left(\frac{\epsilon}{|x|}+|\rho x|\right)^2\right\}dx=\frac{2}{\pi}(1-\rho^2)^{-1/2}\,\epsilon\int_{\epsilon M}^{\theta}\frac{1}{x}\,e^{-x^2/2}\exp\left\{-\frac{\epsilon^2}{2x^2}-\epsilon|\rho|-\frac{\rho^2x^2}{2}\right\}dx\ge\frac{2}{\pi}(1-\rho^2)^{-1/2}\,\epsilon\exp\left\{-\frac{1}{2M^2}-\epsilon-\theta^2\right\}\int_{\epsilon M}^{\theta}\frac{1}{x}\,dx.$$
    Since $\int_{\epsilon M}^{\theta}x^{-1}\,dx=\log(1/\epsilon)-\log(1/\theta)-\log M\sim\log(1/\epsilon)$ by (A 27), we conclude that
    $$\liminf_{\epsilon\to 0}\frac{P(|UV|<\epsilon)}{\epsilon\log(1/\epsilon)}\ge\frac{2}{\pi}(1-\rho^2)^{-1/2}\exp\left\{-\frac{1}{2M^2}\right\}.$$
    Letting $M\to\infty$ provides a lower bound matching (A 26) and hence proves (A 24), and with it (A 23).
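    The asymptotic (A 24) is easy to probe by simulation, bearing in mind that the logarithmic factor makes the convergence slow. A minimal Monte Carlo sketch (the values of $\rho$, the sample size and the thresholds are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, m = 0.6, 10_000_000                     # illustrative correlation and sample size
U = rng.standard_normal(m)
V = rho * U + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(m)
absUV = np.abs(U * V)

for eps in (1e-2, 1e-3, 1e-4):
    emp = (absUV < eps).mean()               # Monte Carlo estimate of P(|UV| < eps)
    asym = (2.0 / np.pi) * (1.0 - rho ** 2) ** -0.5 * eps * np.log(1.0 / eps)
    print(f"eps={eps:g}: empirical {emp:.3e}  asymptotic {asym:.3e}  ratio {emp / asym:.3f}")
```

    The empirical-to-asymptotic ratio drifts towards 1 only slowly as $\epsilon$ decreases, consistent with the $o(\log(1/\epsilon))$ corrections absorbed in the proof.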

    The normalizing constant associated with the distribution of $XY$ is given by $\tilde a_n=K(n\log n)^c$, where $K=(2/\pi)^c(1-\rho^2)^{-c/2}$, so that $nP(XY>\tilde a_n)\to 1$ as $n\to\infty$. Note that $\tilde a_n/a_n\to\infty$. Also, for $M>0$ fixed, large $n$, and all $x,y>0$,

    $$nP(XY>\tilde a_nx,\,X>a_ny)\le nP(X>a_nM)+nP(XY>\tilde a_nx,\,X>a_ny,\,X\le a_nM)\le nP(X>a_nM)+nP\left(Y>\frac{\tilde a_n}{a_n}\frac{x}{M},\,X>a_ny\right).\tag{A 28}$$
    As $n\to\infty$, the first term converges to $M^{-1/c}$, while the second term is
    $$nP\big(|U|\le y^{-1/c}a_n^{-1/c},\ |V|\le\mathrm{const}\,(\tilde a_n/a_n)^{-1/c}\big)\le Cn\,a_n^{-1/c}(\tilde a_n/a_n)^{-1/c}\,\varphi_\rho(0,0),$$
    where $C$ is a generic constant and $\varphi_\rho$ is the joint density of $(U,V)$. Since $na_n^{-1/c}\to(2/\pi)^{-1/2}$ and $\tilde a_n/a_n\to\infty$ as $n\to\infty$, it follows that this term goes to 0 as $n\to\infty$. Consequently, the $\limsup_n$ followed by the limit as $M\to\infty$ of the left side of (A 28) is 0. In other words, $X$ and $XY$ are asymptotically independent. From this property [12], we have the point process convergence
    $$N_n:=\sum_{t=1}^{n}\varepsilon_{(a_n^{-1}X_t,\,a_n^{-1}Y_t,\,\tilde a_n^{-1}X_tY_t)}\xrightarrow{d}N:=\sum_{i=1}^{\infty}\left[\varepsilon_{(\Gamma_{i,1}^{-c},\,0,\,0)}+\varepsilon_{(0,\,\Gamma_{i,2}^{-c},\,0)}+\varepsilon_{(0,\,0,\,\Gamma_{i,3}^{-c})}\right]\quad\text{as } n\to\infty,\tag{A 29}$$
    where $\{\Gamma_{i,1}\}$, $\{\Gamma_{i,2}\}$ and $\{\Gamma_{i,3}\}$ are iid copies of the points $\{\Gamma_i\}$ defined in theorem A.1.

    From this point process, by the same argument as used in the proof of theorem 4.1 (see appendix A(c)), we obtain

    $$\left(a_n^{-2}\sum_{t=1}^{n}X_t^2,\ a_n^{-2}\sum_{t=1}^{n}Y_t^2,\ \tilde a_n^{-1}\sum_{t=1}^{n}X_tY_t\right)\xrightarrow{d}\big(S_{1/(2c),1},\,S_{1/(2c),2},\,S_{1/c}\big)\quad\text{as } n\to\infty,\tag{A 30}$$
    where the limit rvs are independent and given by
    $$\big(S_{1/(2c),1},\,S_{1/(2c),2},\,S_{1/c}\big):=\left(\sum_{i=1}^{\infty}\Gamma_{i,1}^{-2c},\ \sum_{i=1}^{\infty}\Gamma_{i,2}^{-2c},\ \sum_{i=1}^{\infty}\Gamma_{i,3}^{-c}\right).\tag{A 31}$$
    Since $a_n^2/\tilde a_n=(1-\rho^2)^{c/2}(n/\log n)^c$, we have by the continuous mapping theorem
    $$\left(\frac{n}{\log n}\right)^{c}\tilde\rho_n\xrightarrow{d}(1-\rho^2)^{-c/2}\,\frac{S_{1/c}}{\sqrt{S_{1/(2c),1}\,S_{1/(2c),2}}}\quad\text{as } n\to\infty,\tag{A 32}$$
    as desired.

    To prove corollary 4.6 and (4.9), we note that, by proposition 4.3, $X$ and $Y$ are asymptotically independent. By the earlier argument for (2.10), it follows that, as $n\to\infty$,

    $$\left(a_n^{-1}\sum_{t=1}^{n}X_t,\ a_n^{-2}\sum_{t=1}^{n}(X_t-\bar X_n)^2,\ a_n^{-1}\sum_{t=1}^{n}Y_t,\ a_n^{-2}\sum_{t=1}^{n}(Y_t-\bar Y_n)^2\right)\xrightarrow{d}\big(S_{1/c,1},\,S_{1/(2c),1},\,S_{1/c,2},\,S_{1/(2c),2}\big),$$
    where $S_{1/c,i}$, $S_{1/(2c),i}$ are given by (A 10) and (A 11) with $\alpha=1/c$ and are independent for $i=1,2$. Then by the continuous mapping theorem
    $$\frac{\sum_{i=1}^{n}X_i\sum_{i=1}^{n}Y_i}{\sqrt{\sum_{t=1}^{n}(X_t-\bar X_n)^2\sum_{t=1}^{n}(Y_t-\bar Y_n)^2}}\xrightarrow{d}R_1R_2,\tag{A 33}$$
    where $R_i=S_{1/c,i}/\big(S_{1/(2c),i}\big)^{1/2}$, $i=1,2$. Also, since $na_n^{-2}\tilde a_n\to 0$ as $n\to\infty$, it follows that $na_n^{-2}\sum_{i=1}^{n}X_iY_i\xrightarrow{P}0$ as $n\to\infty$. ▪
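    Corollary 4.6's conclusion that $n\rho_n$ has a limiting distribution on the negative half-line, however strong the correlation of $(U,V)$, can be seen in a short simulation (a sketch under our illustrative choices of $c$, $\rho$, sample sizes and replicate count; it is not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)
c, rho, n_rep = 2.0, 0.9, 200     # power c > 1, normal correlation, replicates

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    vals = np.empty(n_rep)
    for r in range(n_rep):
        U = rng.standard_normal(n)
        V = rho * U + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
        X, Y = 1.0 / np.abs(U) ** c, 1.0 / np.abs(V) ** c
        vals[r] = n * np.corrcoef(X, Y)[0, 1]   # n times the sample correlation
    print(f"n={n:>6}: median of n*rho_n = {np.median(vals):+.3f}, "
          f"fraction negative = {np.mean(vals < 0):.2f}")
```

    Even with $\rho=0.9$, the sample correlation is eventually negative and of order $1/n$: the cross-product term $na_n^{-2}\sum X_tY_t$ vanishes, so the centering term $-n\bar X_n\bar Y_n$ dominates the numerator, exactly as in the proof of (A 33).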

    Proof of corollary 4.4.

    Let $0<\alpha<\gamma<1$, and let $S\in RV(\gamma)$. Then for some $c>0$, $P(S>x)\sim cx^{-\gamma}$ as $x\to\infty$, so the assumption $\bar F_X\in RV(\alpha)$ implies that $P(X>x)/P(S>x)\to\infty$ as $x\to\infty$. Therefore, a constant $C$ exists such that $P(X>x)>P(S>x)$ for all real $x\ge C$. Therefore, $P(X+C>x)>P(S>x)$ for all $x\ge C>0$, while for all $x<C$, by definition of $C$, we have $P(X+C>x)=1\ge P(S>x)$. Consequently, a constant $C$ exists such that $P(X+C>x)\ge P(S>x)$ for all $x\in\mathbb R$. That is, by definition, $X+C$ is stochastically larger than $S$, or $X+C\ge_{st}S$. When $0<\gamma<\alpha<1$, the argument works with $S$ and $X$ exchanged; hence there exists a constant $C>0$ such that $S+C\ge_{st}X$.

    For the next results, we require a well-known inequality (A 34): if γ ∈ (0, 1), then for some c > 0, a γ-stable rv S satisfies

    $$P(S\le\epsilon)\le\exp\big(-c\,\epsilon^{\gamma/(\gamma-1)}\big)\quad\text{as }\epsilon\to 0.\tag{A 34}$$
    We prove (A 34) here for ease of reference. For any $\theta>0$, if $S$ is $\gamma$-stable, then its Laplace transform is $E\exp(-\theta S)=\exp(-\theta^\gamma)$ [7]. For any $\epsilon>0$ and $\theta>0$, Markov's inequality gives
    $$P(S\le\epsilon)=P\big(e^{-\theta S}\ge e^{-\theta\epsilon}\big)\le e^{+\theta\epsilon}\,E\exp(-\theta S)=\exp(-\theta^\gamma+\theta\epsilon).\tag{A 35}$$
    Now pick any $a\in(0,1)$ and set $\theta=a\epsilon^{1/(\gamma-1)}$, to get $\theta^\gamma=a^\gamma\epsilon^{\gamma/(\gamma-1)}$ and $\theta\epsilon=a\epsilon^{1/(\gamma-1)+1}=a\epsilon^{\gamma/(\gamma-1)}$. Therefore $-\theta^\gamma+\theta\epsilon=-a^\gamma\epsilon^{\gamma/(\gamma-1)}+a\epsilon^{\gamma/(\gamma-1)}=a\epsilon^{\gamma/(\gamma-1)}(1-a^{\gamma-1})$. Since $\gamma\in(0,1)$ and $a\in(0,1)$, we have $a^{1-\gamma}<1$, so $1-a^{\gamma-1}<0$ and $c:=-a(1-a^{\gamma-1})>0$ in (A 34), which is now proved.
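    The bound (A 34) can be checked against simulated draws of $S$. The sketch below uses Kanter's representation of a positive $\gamma$-stable rv with Laplace transform $\exp(-\theta^\gamma)$, namely $S=\sin(\gamma U)\,(\sin U)^{-1/\gamma}\big(\sin((1-\gamma)U)/E\big)^{(1-\gamma)/\gamma}$ with $U$ uniform on $(0,\pi)$ and $E$ a unit exponential; this simulation device is standard but is not used in the text, and the values of $\gamma$ and $a$ are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
gamma, a = 0.5, 0.5
c = -a * (1.0 - a ** (gamma - 1.0))          # the constant c from the proof of (A 34)

# Kanter's representation: S has Laplace transform E exp(-theta*S) = exp(-theta^gamma).
m = 2_000_000
U = rng.uniform(0.0, np.pi, size=m)
E = rng.exponential(size=m)
S = (np.sin(gamma * U) / np.sin(U) ** (1.0 / gamma)
     * (np.sin((1.0 - gamma) * U) / E) ** ((1.0 - gamma) / gamma))

for eps in (0.2, 0.1, 0.05):
    emp = (S <= eps).mean()                  # Monte Carlo estimate of P(S <= eps)
    bound = np.exp(-c * eps ** (gamma / (gamma - 1.0)))
    print(f"eps={eps}: P(S <= eps) ~ {emp:.3e}  <=  bound {bound:.3e}")
```

    For $\gamma=1/2$ and $a=1/2$ the proof's constant is $c\approx 0.207$, whereas minimizing (A 35) over $\theta$ gives the sharper exponent $1/(4\epsilon)$; the Monte Carlo estimates sit comfortably below the bound.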

    Let $X_i$, $i=1,2,\ldots$ be iid copies of $X\in RV(\alpha)$. We consider two cases: first that $0<\tau<1/\alpha$, then that $1/\alpha<\tau<\infty$.

    Suppose first that 0 < τ < 1/α. We claim that

    $$n^{-\tau}\sum_{i=1}^{n}X_i\xrightarrow{a.s.}\infty\quad\text{as } n\to\infty.\tag{A 36}$$
    We may and will assume that $\tau\ge 1$. Choose $\alpha<\gamma<1/\tau$, and choose $C>0$ such that $X+C\ge_{st}S$, where $S$ is $\gamma$-stable. Let $S_1,S_2,\ldots$ be iid copies of $S$. Then for any $t>0$,
    $$P\left(n^{-\tau}\sum_{i=1}^{n}X_i\le t\right)\le P\left(\sum_{i=1}^{n}S_i-Cn\le tn^{\tau}\right)\le P\left(\sum_{i=1}^{n}S_i\le(t+C)n^{\tau}\right)=P\big(n^{1/\gamma}S_1\le(t+C)n^{\tau}\big)=P\big(S_1\le(t+C)n^{-(1/\gamma-\tau)}\big)\le\exp\big\{-c\,n^{(1-\gamma\tau)/(1-\gamma)}\big\}$$
    for some c = c(t) > 0. Therefore, for any t > 0,
    $$\sum_{n=1}^{\infty}P\left(n^{-\tau}\sum_{i=1}^{n}X_i\le t\right)<\infty,$$
    so by the first Borel–Cantelli lemma [21, p. 201], a.s. only finitely many $n$ satisfy $n^{-\tau}\sum_{i=1}^{n}X_i\le t$; since $t>0$ is arbitrary, (A 36) holds.

    Next, suppose that τ > 1/α. We claim that

    $$n^{-\tau}\sum_{i=1}^{n}X_i\xrightarrow{a.s.}0\quad\text{as } n\to\infty.\tag{A 37}$$
    Choose $1/\tau<\gamma<\alpha$, and choose $C>0$ such that $S+C\ge_{st}X$, where $S$ is a $\gamma$-stable rv. Let $S_1,S_2,\ldots$ be iid copies of $S$. Then for any $t>0$, for all $n$ large enough,
    $$P\left(n^{-\tau}\sum_{i=1}^{n}X_i>t\right)\le P\left(\sum_{i=1}^{n}S_i+Cn>tn^{\tau}\right)\le P\left(\sum_{i=1}^{n}S_i>\frac{tn^{\tau}}{2}\right)=P\left(n^{1/\gamma}S_1>\frac{tn^{\tau}}{2}\right)=P\left(S_1>\frac{tn^{\tau-1/\gamma}}{2}\right)\le c\,n^{-(\gamma\tau-1)}$$
    for some c > 0. Therefore, for any t > 0,
    $$\sum_{m=1}^{\infty}P\left((2^m)^{-\tau}\sum_{i=1}^{2^m}X_i>t\right)<\infty,$$
    so by the first Borel–Cantelli lemma [21, p. 201],
    $$(2^m)^{-\tau}\sum_{i=1}^{2^m}X_i\xrightarrow{a.s.}0\quad\text{as } m\to\infty.$$
    Therefore, with $m_n=\lfloor\log_2 n\rfloor$, so that $2^{m_n}\le n<2^{m_n+1}$, we have
    $$n^{-\tau}\sum_{i=1}^{n}X_i\le(2^{m_n})^{-\tau}\sum_{i=1}^{2^{m_n+1}}X_i=2^{\tau}\,(2^{m_n+1})^{-\tau}\sum_{i=1}^{2^{m_n+1}}X_i\to 0\quad\text{a.s. as } n\to\infty,$$
    proving (A 37).

    To prove that $\rho_n\to 0$ a.s. as $n\to\infty$, we pick any $\delta\in(c,2c)$ and multiply the numerator $N_n$ and the denominator $D_n$ of $\rho_n$ by $n^{-\delta}$ to get

    $$\rho_n=\frac{n^{-\delta}N_n}{n^{-\delta}D_n}.$$
    It follows from (A 37) that $n^{-\delta}N_n\to 0$ a.s. as $n\to\infty$, and from (A 36) that $n^{-\delta}D_n\to\infty$ a.s. as $n\to\infty$. Therefore, $\rho_n\to 0$ a.s. as $n\to\infty$. ▪
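    The dichotomy (A 36)/(A 37) that drives this proof is visible along a single simulated sample path (a heuristic illustration only; the tail index $\alpha$, the exponents $\tau$ and the sample sizes are our choices):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = 0.5                                  # so 1/alpha = 2 separates the regimes
X = rng.pareto(alpha, size=10 ** 6) + 1.0    # one path of iid Pareto: P(X > x) = x^{-alpha}
csum = np.cumsum(X)

for tau in (1.5, 2.5):                       # tau < 1/alpha versus tau > 1/alpha
    vals = [csum[n - 1] / n ** tau for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6)]
    regime = "(A 36): grows" if tau < 1.0 / alpha else "(A 37): vanishes"
    print(f"tau={tau} {regime}: " + "  ".join(f"{v:.3g}" for v in vals))
```

    Individual paths are dominated by a few enormous summands, so the monotone trend emerges only across several decades of $n$; but $n^{-\tau}\sum_i X_i$ visibly grows for $\tau<1/\alpha$ and decays for $\tau>1/\alpha$.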

    Shapiro [22] proved a special case of our results, namely, that powers of reciprocals of rvs with a continuous density that is positive at 0 have Pareto-like tails.

    Footnotes

    1 Here $E$ corresponds to a one-point compactification of $[0,\infty)$ or $[0,\infty)^2$ with the origin removed. Relatively compact sets are sets bounded away from the origin, i.e. $A\in\mathcal B(E)$ is relatively compact if, for some small $\epsilon>0$, $A\subset\{x:\epsilon\le x\}$ or, in the two-dimensional case, $A\subset\{(x,y):\epsilon\le\|(x,y)\|\}$.

    Published by the Royal Society. All rights reserved.

    References