# Heavy-tailed distributions, correlations, kurtosis and Taylor’s Law of fluctuation scaling

## Abstract

Pillai & Meng (Pillai & Meng 2016 *Ann. Stat.* **44**, 2089–2097; p. 2091) speculated that ‘the dependence among [random variables, rvs] can be overwhelmed by the heaviness of their marginal tails …’. We give examples of statistical models that support this speculation. While under natural conditions the sample correlation of regularly varying (RV) rvs converges to a generally random limit, this limit is zero when the rvs are the reciprocals of powers greater than one of arbitrarily (but imperfectly) positively or negatively correlated normals. Surprisingly, the sample correlation of these RV rvs multiplied by the sample size has a limiting distribution on the negative half-line. We show that the asymptotic scaling of Taylor’s Law (a power-law variance function) for RV rvs is, up to a constant, the same for independent and identically distributed observations as for reciprocals of powers greater than one of arbitrarily (but imperfectly) positively correlated normals, whether those powers are the same or different. The correlations and heterogeneity do not affect the asymptotic scaling. We analyse the sample kurtosis of heavy-tailed data similarly. We show that the least-squares estimator of the slope in a linear model with heavy-tailed predictor and noise unexpectedly converges much faster than when they have finite variances.

### 1. Introduction

#### (a) Motivations

This work has a practical motivation and a theoretical motivation. Both are driven by the need to cope with exceptionally large events such as outbreaks of human, animal and plant diseases; earth events like cyclones, tsunamis, earthquakes, volcanoes, fires, droughts and floods; and extreme fluctuations in finance, insurance, trade, production and employment, among others. Both motivations are enabled by the on-going development since the work of Paul Lévy in 1924 of a systematic theory of random events with infinite mean [1–3].

For example, one practical motivation is the need to understand the effect of heavy-tailed data on an empirical regularity called Taylor’s Law (TL) or Taylor’s power law in the biological sciences, and fluctuation scaling in the physical sciences. TL asserts that the sample variance is approximately proportional to some power of the sample mean in a set of samples. Explicitly, for some constants *a* > 0 and *b*, both of which are independent of the sample *i*,

$\text{sample variance of sample } i\approx a\,{(\text{sample mean of sample } i)}^{b}.$

TL is named after the ecologist L. R. Taylor; see Taylor *et al.* [4] and Taylor [5].

In statistics, a variance function is a function that specifies how the variance is related to the mean in a set of samples. To statisticians, TL is simply a power-law variance function. An early, perhaps the first, occurrence of a power-law variance function in ecology appeared in Bliss [6]. From 1941 to 2017, users of TL assumed, usually without saying so, that observations were drawn from distributions with finite mean and finite variance. At the same time, heavy-tailed data were increasingly recognized in insurance, income distributions, earthquake magnitudes, financial fluctuations, meteorology and other sciences. Heavy-tailed data distributions have some or all infinite moments, such as infinite variance or infinite mean. Heavy-tailed distributions have much greater probabilities of extremely large observations than light-tailed distributions, such as the normal or Gaussian distribution. But samples from a heavy-tailed distribution always have a finite sample mean and finite sample variance. It is important to know under what conditions samples from a heavy-tailed distribution obey TL and, when TL holds, how the exponent *b* of TL relates to the parameter(s) of the underlying heavy-tailed distribution. Brown *et al.* [7] showed that certain random samples (sets of independently and identically distributed [iid] observations) from heavy-tailed distributions with infinite mean obey TL. These results are greatly extended here. In particular, we ask whether and when TL holds if observations are not iid, either because the observations are dependent or because the observations come from different distributions.

In addition to its empirical motivation, this work has a theoretical motivation. Drton & Xiao [8] conjectured, and proved when *m* = 2, a surprising and beautiful proposition subsequently proved for any integer *m* > 1 by Pillai & Meng [9]. For integer *m* > 1, let *X* = (*X*_{1}, …, *X*_{m}) and *Y* = (*Y*_{1}, …, *Y*_{m}) be independent copies of a multivariate normal vector *N*(0, Σ), where Σ = {*σ*_{ij}} ≥ 0 is any *m* × *m* covariance matrix with *σ*_{jj} > 0, 1 ≤ *j* ≤ *m*. Let (*w*_{1}, …, *w*_{m}) be weights, *w*_{j} ≥ 0, *j* = 1, …, *m*, and ${\mathrm{\Sigma}}_{j=1}^{m}{w}_{j}=1$. Then ${\mathrm{\Sigma}}_{j=1}^{m}{w}_{j}{X}_{j}/{Y}_{j}$ has the standard Cauchy distribution on the real line with probability density function (pdf) 1/[*π*(1 + *z*^{2})]. The surprise is that the covariance matrix Σ has no effect on the distribution of ${\mathrm{\Sigma}}_{j=1}^{m}{w}_{j}{X}_{j}/{Y}_{j}$.

Pillai & Meng [9, p. 2091] remarked: ‘A theoretical speculation from this unexpected result is that for a set of random variables …, the dependence among them can be overwhelmed by the heaviness of their marginal tails …. We invite the reader to ponder with us whether this is a pathological phenomenon or something profound’. The present work supports the speculation of Pillai & Meng [9] by giving multiple examples in which heavy tails outweigh correlations and heterogeneity of distributions. Here is a preview of coming attractions.

#### (b) Organization of this paper

Section 2 reviews the definitions and some properties of regularly varying (RV), including heavy-tailed, random variables (rvs), those in which the survival function or upper tail is asymptotically a power function with negative exponent −*α*, where *α* > 0. An important classical limit theorem stated here is that sums of appropriately centred and normalized RV rvs converge to stable rvs with index *α* if *α* < 2. We illustrate the uses of this theorem by deriving the limiting stable laws that describe the asymptotic behaviour of the sums of squared rvs and the sums of squared deviations from the sample mean.

Section 3 analyses the asymptotic behaviour of Taylor’s Law when the observations are iid heavy-tailed RV rvs; or when the observations are RV rvs with distinct tail indices (heterogeneity); or when the observations are the reciprocals of powers greater than one of arbitrarily (but imperfectly) positively or negatively dependent normals. These results provide far-reaching generalizations of Taylor’s Law. The ratio of the sample variance to a power of the sample mean, which is the focus of Taylor’s Law, is formally similar to the sample kurtosis, which is the ratio of the fourth central moment to a power (the square) of the sample variance. Thus, we use the methods developed to analyse Taylor’s Law to derive the asymptotic behaviour of the sample kurtosis when the data are iid RV rvs. In principle, these results could be extended to heterogeneous and dependent data.

Section 4 shows that under natural conditions the sample correlation of a pair of RV rvs converges to a generally random limit, which we express in terms of stable laws. This limit is zero in case the rvs are the reciprocals of powers greater than one of two arbitrarily (but imperfectly) positively or negatively dependent normal rvs. Surprisingly, in this case the sample correlation multiplied by the sample size has a limiting distribution that is concentrated on the negative half-line, regardless of whether the two normals are positively or negatively dependent. These findings support the speculation of Pillai & Meng [9].

Section 5 compares the behaviour of the least-squares estimator of *η* in a linear model *Y* = *ηX* + *Z* under two different assumptions about *X* and *Z*: first, that *X* and *Z* are heavy-tailed RV rvs with asymptotically equivalent tails; and second, that *X* is heavy-tailed but the noise variable *Z* is light-tailed, possibly Gaussian. In both cases the estimator converges much faster than in the case when *X* and *Z* have finite variances. When *Z* is light-tailed, the sample correlation between *X* and *Y* converges in probability to 1.

Section 6 investigates by numerical simulation how rapidly some of the correlated, heterogeneous, RV rvs described in the preceding sections converge in distribution to their limit rvs expressed as functions of stable laws. The examples are drawn from Taylor’s Law, bivariate correlation, and the linear model.

For the sake of readability, §§2–5 give only a few simple proofs. Appendix A gives more detailed mathematical background on heavy-tailed distributions and point process methods. It then proves the major claims made regarding the limit theory for Taylor’s Law and the limit theory for the sample correlations of heavy-tailed data including the special example of the reciprocals of powers greater than one of correlated normals.

### 2. Background on heavy tails and regular variation

For heavy-tailed data when the population variance is infinite, the limit behaviour of the sample mean, sample variance, extreme order statistics and related statistics, is widely studied [1,2,10–14]. When the variance is finite, the classical central limit theorem implies that the sample mean, centred by its mean and normalized by its standard deviation, converges to a normal distribution. When the variance is infinite, the suitably centred and normalized partial sums also converge in distribution when the data are regularly varying (RV), with Pareto upper tails. So regular variation is often the starting point for modelling various statistics of univariate heavy-tailed data.

To be specific, assume the random variable (rv) *X* is positive and has distribution function *F*(*x*): = *P*(*X* ≤ *x*), *x* > 0 and survival function $\overline{F}(x):=1-F(x)=P(X>x)$. We say that *X* (or *F*) is RV with index *α* > 0, and we shall henceforth write *F* ∈ *RV*(*α*), if

$\lim_{t\to \mathrm{\infty}}\frac{\overline{F}(tx)}{\overline{F}(t)}={x}^{-\alpha}\quad\text{for all } x>0.$ (2.1)

If *F* satisfies (2.1), then *F* has a Pareto upper tail:

$\overline{F}(x)={x}^{-\alpha}L(x),$ (2.2)

where *L*(*x*) is a slowly varying function (at infinity). Examples of slowly varying functions are $L(x)=\mathrm{log}x,\hspace{0.17em}\mathrm{log}\mathrm{log}x$ and their reciprocals. Assumption (2.1) implies that *X* has finite moments only up to order *α*: *EX*^{p} < ∞ for *p* < *α* and *EX*^{p} = ∞ for *p* > *α*. From (2.2), $E{X}^{\alpha}={\int}_{0}^{\mathrm{\infty}}P({X}^{\alpha}>x)\hspace{0.17em}\text{d}x={\int}_{0}^{\mathrm{\infty}}{x}^{-1}L({x}^{1/\alpha})\hspace{0.17em}\text{d}x$, so that the finiteness of $E{X}^{\alpha}$ depends on the integrability of *x*^{−1}*L*(*x*^{1/α}) for *x* large. For example, $E{X}^{\alpha}=\mathrm{\infty}$ if *L*(*x*) is constant, whereas $E{X}^{\alpha}<\mathrm{\infty}$ if *L*(*x*) ∼ (1/log *x*)^{1+ϵ} for some *ϵ* > 0. Since we are interested here only in heavy-tailed distributions with infinite variance, we shall assume that *α* < 2 throughout.

Let *X*_{1}, …, *X*_{n} be iid with distribution *F*. The sum $\sum _{t=1}^{n}{X}_{t}$, appropriately centred and normalized, converges in distribution as the sample size *n* → ∞. To this end, define the sequence of normalizing constants {*a*_{n}: *n* = 1, 2, …} as the sequence of 1 − 1/*n* quantiles of *F*, i.e.

${a}_{n}:={F}^{\leftarrow}(1-1/n)=\mathrm{inf}\{x:F(x)\ge 1-1/n\},\quad n=1,2,\dots.$ (2.3)

If *F* is continuous, then $\overline{F}({a}_{n})=1/n$. By replacing *t* with *a*_{n} in (2.1) it can be shown that

$n\overline{F}({a}_{n}x)\to {x}^{-\alpha}\quad\text{as } n\to \mathrm{\infty},\ \text{for every } x>0.$ (2.4)

If *F* is Pareto, i.e. if $\overline{F}(x)={x}^{-\alpha}$ for *x* ≥ 1, then *a*_{n} = *n*^{1/α}.
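For the Pareto case the normalizing constants are available in closed form, and the defining property $\overline{F}(a_n)=1/n$ can be checked numerically. The following sketch (the helper names are ours) verifies it:

```python
# Check: for a Pareto tail F-bar(x) = x**(-alpha), x >= 1,
# the 1 - 1/n quantile is a_n = n**(1/alpha), so F-bar(a_n) = 1/n exactly.

def pareto_survival(x, alpha):
    """Survival function of the Pareto distribution with F-bar(x) = x**(-alpha), x >= 1."""
    return x ** (-alpha)

def a_n(n, alpha):
    """Normalizing constant: the 1 - 1/n quantile of the Pareto(alpha) distribution."""
    return n ** (1.0 / alpha)

alpha = 0.5
for n in (10, 1000, 10**6):
    an = a_n(n, alpha)
    assert abs(pareto_survival(an, alpha) - 1.0 / n) < 1e-12

# When alpha = 1/2, a_n grows like n**2:
print(a_n(100, 0.5))  # 10000.0
```

The rapid growth of *a*_{n} for small *α* (here quadratic in *n*) is the source of the fast convergence rates that appear in §§4 and 5.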

A classical limit theorem [15] describes sums of heavy-tailed rvs. Let ${S}_{\alpha}$ be a stable rv with index *α* ∈ (0, 2) and shape parameter *β* = 1. When *α* ∈ (0, 1), ${S}_{\alpha}$ takes only non-negative real values. When *α* ∈ [1, 2), ${S}_{\alpha}$ takes negative real values, but the left tail is lighter than that of a normal distribution. When *α* = 2, ${S}_{\alpha}$ is normally distributed.

Define the centring constants

${b}_{n}:=\begin{cases}0, & \alpha \in (0,1),\\ E\left(X\,\mathbf{1}\{X\le {a}_{n}\}\right), & \alpha \in [1,2).\end{cases}$ (2.5)

Then [15]

$\frac{1}{{a}_{n}}\left(\sum _{t=1}^{n}{X}_{t}-n{b}_{n}\right)\stackrel{d}{\to}{S}_{\alpha}\quad\text{as } n\to \mathrm{\infty}.$ (2.6)

If *α* ∈ (1, 2), *b*_{n} can be replaced by the (finite) mean *EX*.

### Example 2.1 (sums of squares).

Because *X*^{2} ∈ *RV*(*α*/2), applying (2.6) to the sums of squares gives, for any *α* ∈ (0, 2),

$\frac{1}{{a}_{n}^{2}}\sum _{t=1}^{n}{X}_{t}^{2}\stackrel{d}{\to}{S}_{\alpha /2}$ (2.7)

and

$\frac{1}{{a}_{n}^{2}}\sum _{t=1}^{n}{({X}_{t}-{\overline{X}}_{n})}^{2}\stackrel{d}{\to}{S}_{\alpha /2},$ (2.8)

where ${\overline{X}}_{n}:={n}^{-1}\sum _{t=1}^{n}{X}_{t}$ is the sample mean; no centring is needed in (2.7) because *α*/2 < 1. Jointly with (2.6),

$\left(\frac{1}{{a}_{n}}\left(\sum _{t=1}^{n}{X}_{t}-n{b}_{n}\right),\ \frac{1}{{a}_{n}^{2}}\sum _{t=1}^{n}{X}_{t}^{2}\right)\stackrel{d}{\to}({S}_{\alpha},{S}_{\alpha /2}).$ (2.9)

The remarkable feature of this example is that the limit rvs on the right sides of (2.7) and (2.8) are identical. To give insight into why subtracting the sample mean in (2.8) makes no difference to the limit rv, we offer a brief proof. For the remainder of this article, for the sake of readability, we shall defer many detailed proofs to appendix A. For example, the nature of the dependence between ${S}_{\alpha}$ and *S*_{α/2} and the proof of (2.9) are given in theorem A.3.

### Proof of (2.8).

The sum of squared deviations from the sample mean on the left of (2.8) decomposes into

$\sum _{t=1}^{n}{({X}_{t}-{\overline{X}}_{n})}^{2}=\sum _{t=1}^{n}{X}_{t}^{2}-\frac{1}{n}{\left(\sum _{t=1}^{n}{X}_{t}\right)}^{2}.$ (2.10)

Since *a*_{n} = *n*^{1/α}*L*(*n*) for some slowly varying *L*, we have ${a}_{n}\ge {n}^{1/\alpha -\epsilon}$ for every *ϵ* > 0 and all *n* large. Moreover, since for *α* > 1, *b*_{n} → *EX* and for *α* = 1, *b*_{n} is slowly varying [15], $n{b}_{n}/{a}_{n}^{2}\to 0$ and *b*_{n}/*a*_{n} → 0. Using these facts and expanding the second term on the right of (2.10) gives

$\frac{1}{{a}_{n}^{2}}\cdot \frac{1}{n}{\left(\sum _{t=1}^{n}{X}_{t}\right)}^{2}=\frac{1}{n}{\left(\frac{\sum _{t=1}^{n}{X}_{t}-n{b}_{n}}{{a}_{n}}+\frac{n{b}_{n}}{{a}_{n}}\right)}^{2}\stackrel{P}{\to}0,$

so that ${a}_{n}^{-2}\sum _{t=1}^{n}{({X}_{t}-{\overline{X}}_{n})}^{2}$ has the same limit ${S}_{\alpha /2}$ as the uncentred sum in (2.7). ▪
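A small simulation illustrates why the centring correction is asymptotically negligible. This is a sketch with our own helper names; Python’s `random.paretovariate(alpha)` draws from exactly the Pareto law $\overline{F}(x)=x^{-\alpha}$, *x* ≥ 1, used above:

```python
import random

random.seed(12345)

alpha = 0.5          # tail index; infinite mean since alpha < 1
n = 100_000
x = [random.paretovariate(alpha) for _ in range(n)]

sum_sq = sum(v * v for v in x)                      # statistic in (2.7)
mean = sum(x) / n
sum_sq_centred = sum((v - mean) ** 2 for v in x)    # statistic in (2.8)

# The two statistics differ by n * mean**2, which by Cauchy-Schwarz
# is at most sum_sq, and in practice is a tiny fraction of it:
frac = n * mean ** 2 / sum_sq
assert 0.0 <= frac < 1.0     # always true (Cauchy-Schwarz)
assert frac < 0.1            # the correction is a small fraction of sum_sq here
```

The largest single observation dominates both sums, which is why the sample-mean correction, of smaller order, does not change the limit rv.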

### 3. Heavy tails, Taylor’s Law and kurtosis

#### (a) Taylor’s Law with iid heavy-tailed data

If the RV heavy-tailed iid rvs *X*_{1}, …, *X*_{n} have tail index *α* < 1, then the centring constants are *b*_{n} = 0 and the population mean and population variance are infinite. In one of its many forms, TL considers the ratio of the sample variance divided by a power *b*, 0 < *b* < ∞, of the sample mean,

${W}_{n}:=\frac{{\hat{\sigma}}_{n}^{2}}{{({\overline{X}}_{n})}^{b}},\qquad {\hat{\sigma}}_{n}^{2}:=\frac{1}{n}\sum _{t=1}^{n}{({X}_{t}-{\overline{X}}_{n})}^{2},\quad {\overline{X}}_{n}:=\frac{1}{n}\sum _{t=1}^{n}{X}_{t}.$ (3.1)

For *W*_{n} to converge in distribution, the order of the numerator and denominator must match, which requires that

$b=\frac{2-\alpha}{1-\alpha}.$ (3.2)

With this *b*,

$\frac{n}{{a}_{n}^{2}}{\hat{\sigma}}_{n}^{2}\Big/{\left(\frac{n}{{a}_{n}}{\overline{X}}_{n}\right)}^{b}\stackrel{d}{\to}\frac{{S}_{\alpha /2}}{{S}_{\alpha}^{b}}\quad\text{as } n\to \mathrm{\infty},$ (3.3)

with *b* given by (3.2) and *a*_{n} following (2.3) and (2.4). If *F* is Pareto, the combined normalization in (3.3) equals one and *W*_{n} itself converges in distribution. The numerator *S*_{α/2} and the denominator ${S}_{\alpha}^{b}$ are dependent, and the dependence is described explicitly in theorem A.3. Brown *et al.* [7] obtained expression (3.2) for *b* and some of the moments of the limiting distribution.
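The heavy-tailed form of TL is easy to probe numerically. The following sketch (our own variable names; the tail index 0.5 is an illustrative choice) computes the ratio of sample variance to the *b*th power of the sample mean, with *b* from (3.2), for Pareto samples of growing size:

```python
import random

def taylor_ratio(x, b):
    """Sample variance divided by the b-th power of the sample mean, as in (3.1)."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    return var / m ** b

random.seed(7)
alpha = 0.5
b = (2 - alpha) / (1 - alpha)   # Taylor's Law exponent (3.2); here b = 3
assert b == 3.0

for n in (10**3, 10**4, 10**5):
    w = taylor_ratio([random.paretovariate(alpha) for _ in range(n)], b)
    assert w > 0.0
    # w fluctuates from sample to sample (the limit is a random variable),
    # but with this b it neither collapses to 0 nor blows up as n grows.
```

With any other exponent *b*, the ratio would drift systematically to 0 or to ∞ as *n* increases, which is the order-matching argument behind (3.2).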

#### (b) Taylor’s Law with heterogeneous heavy-tailed data

In the previous subsection, the heavy-tailed observations {*X*_{t}} are assumed to be iid. Here we analyse independent data (or rvs) that are heterogeneous, i.e. not identically distributed.

Suppose that the observations are independent but come from two rvs *U* and *V*, where *U* ∈ *RV*(*α*) with *α* ∈ (0, 1) and *V* is dominated by *U* in the sense that

$P(V>x)=o(P(U>x))\quad\text{as } x\to \mathrm{\infty}.$ (3.4)

Assume the data *X*_{1}, …, *X*_{n} consist of a random number *ν*_{n} < *n* of observations distributed as *U* and *n* − *ν*_{n} observations distributed as *V*, where ${\nu}_{n}/n\stackrel{P}{\to}p\in (0,1]$ as *n* → ∞. Here $\stackrel{P}{\to}$ means ‘converges in probability to’. This formulation can be interpreted quite generally: *V* could represent a union of several pooled rvs, each with tails lighter than the tail of *U*. Letting *a*_{n} be the 1 − 1/*n* quantile of *F*_{U}, it is easy to show that, with probability tending to one, the largest of the *n* observations come from the observations distributed as *U*. Since *P*(*V* > *x*) = *o*(*P*(*U* > *x*)) for *x* large, the argument in the proof of theorem 4.1 in appendix A(c) can be used to show that the limit (3.3) continues to hold with the same exponent *b* as in (3.2). If *f*(*x*), *g*(*x*) are real-valued functions of real *x* and *g*(*x*) > 0 for all sufficiently large *x*, then *f*(*x*) ∼ *g*(*x*) means lim _{x→∞} *f*(*x*)/*g*(*x*) = 1. Then, on the other hand, the tail of a randomly chosen observation satisfies *P*(*X*_{1} > *x*) ∼ *pP*(*U* > *x*), so the heterogeneity changes only constant scale factors in the limit, not the exponent *b*; as before, the dependence between the numerator *S*_{α/2} and the denominator ${S}_{\alpha}^{b}$ is described explicitly in theorem A.3.

#### (c) Taylor’s Law with correlated heavy-tailed data

Here we suppose the observations (or rvs) are pairwise correlated and identically distributed. We shall show that the scaling exponent *b* in TL remains the same as (3.2), but the limit changes. Specifically, let *N*_{0}, *N*_{1}, … be iid standard normal rvs and let 0 ≤ *ρ* < 1. Let ${Z}_{i}=\sqrt{\rho}{N}_{0}+\sqrt{1-\rho}{N}_{i},\hspace{0.17em}i=1,2,\dots $. Then for *i* = 1, …, each *Z*_{i} is normal, *EZ*_{i} = 0, Var(*Z*_{i}) = 1, and for *i* ≠ *j* ≥ 1, Cov(*Z*_{i}, *Z*_{j}) = *ρ*. Let 0 < *α* < 1 and define ${X}_{i}:=1/{({Z}_{i}^{2})}^{1/(2\alpha )},\hspace{0.17em}i=1,2,\dots $. Define *W*_{n} in terms of *X*_{i} as in (3.1) and *b* as in (3.2). Then

${W}_{n}\stackrel{d}{\to}{[h(\rho ,N)]}^{-1/(1-\alpha )}\,\frac{{S}_{\alpha /2}}{{S}_{\alpha}^{b}},\qquad h(\rho ,N):={(1-\rho )}^{-1/2}\mathrm{exp}\left(-\frac{\rho {N}^{2}}{2(1-\rho )}\right),$ (3.5)

where *S*_{α/2} and ${S}_{\alpha}$ are stable rvs with indices *α*/2, *α* ∈ (0, 1), respectively, and shape parameter *β* = 1, and *N* is a standard normal rv independent of *S*_{α/2} and ${S}_{\alpha}$. As previously, the numerator *S*_{α/2} and the denominator ${S}_{\alpha}^{b}$ are dependent because *Γ*_{j} in (A 10) is the same as *Γ*_{j} in (A 11), for each *j* = 1, ….

To prove (3.5), it is enough to prove that for every fixed real *z*, if ${Z}_{i}:=\sqrt{\rho}z+\sqrt{1-\rho}{N}_{i},$ *i* = 1, 2, …, then

${W}_{n}\stackrel{d}{\to}{[h(\rho ,z)]}^{-1/(1-\alpha )}\,\frac{{S}_{\alpha /2}}{{S}_{\alpha}^{b}},\qquad h(\rho ,z):={(1-\rho )}^{-1/2}\mathrm{exp}\left(-\frac{\rho {z}^{2}}{2(1-\rho )}\right).$ (3.6)

For fixed *z*, *X*_{1}, *X*_{2}, … are iid (because they are conditional on fixed *z*), and, because *Z*_{i} is then normal with mean $\sqrt{\rho}z$ and variance 1 − *ρ*, the survival function of each *X* is multiplied by the factor *h*(*ρ*, *z*) relative to the case *ρ* = 0 in §3a. That means that, in the limit, every term ${\mathit{\Gamma}}_{j}^{-1/\alpha}$ in both *S*_{α/2} and ${S}_{\alpha}$ ends up being multiplied by *h*(*ρ*, *z*)^{1/α}. Therefore, the numerator in the limit in (3.3) gets multiplied by *h*(*ρ*, *z*)^{2/α}, while the denominator gets multiplied by *h*(*ρ*, *z*)^{(2−α)/(α(1−α))}. Doing the arithmetic proves (3.6).
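The arithmetic in the last step can be written out. Writing *h* for the multiplicative factor above, the net exponent is

```latex
\[
\frac{h^{2/\alpha}}{h^{(2-\alpha)/(\alpha(1-\alpha))}}
  = h^{\frac{2(1-\alpha)-(2-\alpha)}{\alpha(1-\alpha)}}
  = h^{\frac{-\alpha}{\alpha(1-\alpha)}}
  = h^{-1/(1-\alpha)} .
\]
```

so the correlation enters the limit only through this scalar factor, leaving the exponent *b* of TL unchanged.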

The heavy-tailed version of TL also holds widely for stationary mixing sequences. Here we assume that {*X*_{t}} is a strictly stationary time series with *X*_{1} ∈ *RV*(*α*), *α* ∈ (0, 1). If the time series satisfies the mixing and dependence conditions *D* and *D*′ in [16], then the limit theorem that holds for TL for the iid case also holds exactly for the stationary case. This follows directly from Davis [16, theorem 4 and its corollary]. The mixing condition *D* governs how fast certain events become independent as their time separation increases. Condition *D*′ prohibits clustering of pairs of nearby observations when both are large. While condition *D* is rather mild, condition *D*′ is more restrictive.

Alternatively, in mixing stationary time series that are RV, i.e. where all the finite-dimensional distributions are multivariate RV, the point process convergence in theorem A.1 can be extended, but now the limit point process includes clusters of points that are linked to the Poisson points ${\mathit{\Gamma}}_{i}^{-1/\alpha}$ [10,17]. From these point process convergence results, the limiting form of Taylor’s Law (3.3) is a ratio of the same two stable rvs with tail indices *α*/2 and *α* in the numerator and denominator as in (3.3), but the numerator and denominator may need to be multiplied by different constant scale factors.

The results in §§3b and c may be combined. Explicitly, as in §3b, suppose that *U* ∈ *RV*(*α*) with *α* ∈ (0, 1) and *U* dominates *V* in the sense of (3.4). Assume that the data *X*_{1}, …, *X*_{n} consist of a random number *ν*_{n} < *n* of observations distributed as *U* and *n* − *ν*_{n} observations distributed as *V*, where ${\nu}_{n}/n\stackrel{P}{\to}p\in (0,1]$ as *n* → ∞. In addition, as in §3c, suppose that the *ν*_{n} < *n* observations *X*_{i} distributed as *U* satisfy ${X}_{i}:=1/{({Z}_{i}^{2})}^{1/(2\alpha )}\in RV(\alpha )$ and, when *X*_{i}, *X*_{j} are both distributed as *U*, then {*Z*_{i}} are correlated standard normals with Cov(*Z*_{i}, *Z*_{j}) = *ρ* if *i* ≠ *j* ≥ 1. The *n* − *ν*_{n} observations distributed as *V* may be arbitrarily dependent or independent. Then TL holds as described in (3.5). We simulate this model in §6a.

#### (d) Sample kurtosis

The kurtosis is often used to measure the heaviness of the tail of a distribution function. For a random variable *X* with finite fourth moment and finite variance *σ*^{2}, the kurtosis is defined by *κ* = *E*(*X* − *EX*)^{4}/*σ*^{4}. A normal random variable has kurtosis 3. A rv with *κ* > 3 is said to be leptokurtic, indicating tails fatter than those of a normal rv. Since we are primarily concerned with heavy-tailed data in which the population variance is infinite, we focus our attention on the kurtosis *κ* as opposed to the excess kurtosis *κ* − 3.

If a rv has infinite second moment, then the population kurtosis does not exist. Nevertheless, given a sample *X*_{1}, …, *X*_{n}, one can still compute the sample mean ${\overline{X}}_{n}$ and the sample kurtosis

${\hat{\kappa}}_{n}:=\frac{{n}^{-1}\sum _{t=1}^{n}{({X}_{t}-{\overline{X}}_{n})}^{4}}{{\left({n}^{-1}\sum _{t=1}^{n}{({X}_{t}-{\overline{X}}_{n})}^{2}\right)}^{2}}.$ (3.7)

Assume that *X*_{1}, …, *X*_{n} are iid as *X* ∈ *RV*(*α*) with *α* ∈ (0, 2). From the argument of example 2.1 (see also theorem A.3), it follows that

$\frac{1}{{a}_{n}^{4}}\sum _{t=1}^{n}{({X}_{t}-{\overline{X}}_{n})}^{4}\stackrel{d}{\to}{S}_{\alpha /4}$ (3.8)

and hence

$\frac{{\hat{\kappa}}_{n}}{n}\stackrel{d}{\to}\frac{{S}_{\alpha /4}}{{S}_{\alpha /2}^{2}}.$ (3.9)

As *α* ↓ 0, the limit random variable becomes more concentrated at 1 because, for *γ* < 1, ${S}_{\gamma}=\sum _{i=1}^{\mathrm{\infty}}{\mathit{\Gamma}}_{i}^{-1/\gamma}$ (see (A 10)). Since *Γ*_{i} ∼ *i* a.s. as *i* → ∞, it follows that ${S}_{\gamma}\sim {\mathit{\Gamma}}_{1}^{-1/\gamma}$ as *γ* ↓ 0. Thus, the limit random variable on the right of (3.9) converges to 1 as *α* ↓ 0.
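The behaviour of the sample kurtosis on heavy-tailed data can be seen in a quick sketch (our own helper names; the distributions and sample size are illustrative). Note that for any sample the ratio $\hat{\kappa}_n/n$ lies in (0, 1], because the sum of fourth powers of the deviations is at most the square of the sum of their squares:

```python
import random

def sample_kurtosis(x):
    """Sample kurtosis: fourth central moment over squared second central moment."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2)

random.seed(42)
n = 50_000
heavy = [random.paretovariate(0.5) for _ in range(n)]   # alpha = 0.5: infinite mean
light = [random.gauss(0.0, 1.0) for _ in range(n)]

k_heavy = sample_kurtosis(heavy)
k_light = sample_kurtosis(light)

assert 0.0 < k_heavy / n <= 1.0   # kappa-hat is at most n for any sample
assert k_heavy > 100.0            # far beyond the normal value 3
assert 2.0 < k_light < 4.0        # close to 3 for Gaussian data
```

The heavy-tailed sample's kurtosis grows with *n* (roughly linearly, per (3.9)), while the Gaussian sample's stays near 3.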

If *α* ∈ (2, 4), then *X* has finite variance *σ*^{2} but infinite fourth moment. In this case, the same argument shows that

$\frac{n}{{a}_{n}^{4}}\,{\hat{\kappa}}_{n}\stackrel{d}{\to}\frac{{S}_{\alpha /4}}{{\sigma}^{4}}.$ (3.10)

The normalization $n/{a}_{n}^{4}$ is of order *n*^{1−4/α}*L*(*n*), where *L* is a slowly varying function, so ${\hat{\kappa}}_{n}$ diverges at the rate *n*^{4/α−1} up to a slowly varying factor, which is rather slow, especially as *α* ↑ 4.

### 4. Heavy tails and sample correlations

The correlation of a pair of rvs is defined as their covariance divided by the product of their standard deviations. It cannot be defined for pairs of rvs with infinite variance, such as all those in *RV*(*α*) with *α* ∈ (0, 2). Nevertheless, the *sample* correlation (4.4) is a well defined rv. For heavy-tailed time series, the sample autocorrelation can have unexpected and desirable properties [11,12]. Here we give conditions under which the sample correlation of dependent heavy-tailed rvs converges to zero (in probability or almost surely, depending on detailed assumptions) as the sample size gets large.

#### (a) Bivariate regular variation and asymptotic independence

We define regular variation for a pair (*X*, *Y*) of heavy-tailed rvs. Assume that *X* and *Y* are identically distributed with RV distribution *F* ∈ *RV*(*α*) as in (2.1). Our results require only that their distributions have asymptotically equivalent tails, i.e. lim _{s→∞}*P*(*X* > *s*)/*P*(*Y* > *s*) = *C* ∈ (0, ∞). We say that *X* and *Y* are asymptotically independent if

$\lim_{t\to \mathrm{\infty}}\frac{P(X>t,\,Y>t)}{P(X>t)}=0.$ (4.1)

If *X* and *Y* are independent then they are also asymptotically independent. When *t* is replaced by *a*_{n} defined in (2.3), asymptotic independence is equivalent to

$\lim_{n\to \mathrm{\infty}}nP(X>{a}_{n},\,Y>{a}_{n})=0.$ (4.2)

If *X* and *Y* are not asymptotically independent, we quantify their dependence in the tails of the distribution by assuming that there exists a measure *ν*( · ) on ${([0,\mathrm{\infty})}^{2},\mathcal{B}{([0,\mathrm{\infty})}^{2}))$ such that, for *A*_{x,y} = [0, *x*] × [0, *y*] and ${A}_{x,y}^{c}$ the set complement of *A*_{x,y},

$\lim_{n\to \mathrm{\infty}}nP\left(\left(\frac{X}{{a}_{n}},\frac{Y}{{a}_{n}}\right)\in {A}_{x,y}^{c}\right)=\nu ({A}_{x,y}^{c})$ (4.3)

for all *x*, *y* ≥ 0 with max{*x*, *y*} > 0.

If *x* = 0, *y* > 0, then $\nu ({A}_{0,y}^{c})={y}^{-\alpha}$. Similarly, $\nu ({A}_{x,0}^{c})={x}^{-\alpha}$. It follows that *X* and *Y* are asymptotically independent if and only if *ν*([*x*, ∞) × [*y*, ∞)) = 0 for all *x* > 0, *y* > 0.

#### (b) Sample correlation of heavy-tailed data

Define the sample correlation between the pairs (*X*_{i}, *Y*_{i}), *i* = 1, …, *n*, in the conventional way by

${\rho}_{n}:=\frac{\sum _{i=1}^{n}({X}_{i}-{\overline{X}}_{n})({Y}_{i}-{\overline{Y}}_{n})}{\sqrt{\sum _{i=1}^{n}{({X}_{i}-{\overline{X}}_{n})}^{2}}\,\sqrt{\sum _{i=1}^{n}{({Y}_{i}-{\overline{Y}}_{n})}^{2}}}.$ (4.4)

#### Theorem 4.1.

*Assume the distribution of* (*X*, *Y*) *satisfies* (4.3) *and α* ∈ (0, 2). *Then*

${\rho}_{n}\stackrel{d}{\to}U:=\frac{{S}_{\alpha /2,0}}{\sqrt{{S}_{\alpha /2,1}\,{S}_{\alpha /2,2}}},$ (4.5)

*where S*_{α/2,0}, *S*_{α/2,1} *and S*_{α/2,2} *are stable rvs with joint distribution specified in appendix A*(c), (A 18)–(A 20). *Moreover*,

${\rho}_{n}\stackrel{P}{\to}0$ (4.6)

*if and only if X and Y are asymptotically independent*.

Appendix A(c) proves this result. Although the stable rvs in the numerator and denominator of (4.5) have heavy tails, the ratio on the right of (4.5) does not have heavy tails because of self-normalization.

When ${\rho}_{n}\stackrel{P}{\to}0$, i.e. when *X* and *Y* are asymptotically independent, it is often possible to find normalizing constants *c*_{n} → ∞ such that *c*_{n}*ρ*_{n} converges in distribution to a non-degenerate rv. For example, when *X* and *Y* are independent, *c*_{n} = *n*^{1/α}*L*(*n*) for some slowly varying function *L* [11,12]. Theorem 4.5 gives another example in which *X* and *Y* are the reciprocals of powers of correlated bivariate normal rvs with mean 0 and variance 1. Such dependent *X* and *Y* are asymptotically independent.

We now extend theorem 4.1 to the case when *X* and *Y* are RV rvs with different tail indices. Appendix A(c) gives the proof of theorem 4.2.

#### Theorem 4.2.

*If X* ∈ *RV*(*α*) *and Y* ∈ *RV*(*β*) *with* 0 < *α* < *β* < 2, *then*

${\rho}_{n}\stackrel{d}{\to}U:=\frac{{S}_{\alpha \beta /(\alpha +\beta ),0}}{\sqrt{{S}_{\alpha /2,1}\,{S}_{\beta /2,2}}},$ (4.7)

*where S*_{αβ/(α+β),0}, *S*_{α/2,1} *and S*_{β/2,2} *are defined in* (A 21), (A 19) *and* (A 22), *respectively. If X and Y*^{β/α} *are asymptotically independent, then* ${\rho}_{n}\stackrel{P}{\to}0$ *as n* → ∞ *since then U* = 0 *a.s*.

We give a quick and easy way to construct dependent, but asymptotically independent, rvs.

#### Theorem 4.3.

*Let U and V be two positive rvs with positive and continuous marginal densities in a neighbourhood of zero. Suppose that in a neighbourhood of the origin, U and V have a joint density function f*_{U,V} (*u*, *v*) *satisfying f*_{U,V} (*u*, *v*) ≤ *a*(*u*^{2} + *v*^{2})^{−θ/2} *for some a* > 0 *and* 0 ≤ *θ* < 1. *Then, for any c* > 1 (*or α* = 1/*c* ∈ (0, 1)), *X*: = 1/*U*^{c} ∈ *RV*(1/*c*), *Y*: = 1/*V*^{c} ∈ *RV*(1/*c*), *X and Y are asymptotically independent, and, for a random sample* (*X*_{1}, *Y*_{1}), …, (*X*_{n}, *Y*_{n}) *iid as* (*X*, *Y*), *we have* ${\rho}_{n}\stackrel{P}{\to}0$ *as in* (4.6).

#### Proof.

We have

$P(X>x)=P(U<{x}^{-1/c})={\int}_{0}^{{x}^{-1/c}}{f}_{U}(u)\hspace{0.17em}\text{d}u\sim {x}^{-1/c}{f}_{U}(0)\quad\text{as } x\to \mathrm{\infty},$

where *f*_{U}(*u*) is the marginal pdf of *U*. It follows that *X* ∈ *RV*(1/*c*). A similar argument holds for *Y*. Next,

$P(X>x,\,Y>x)=P(U<{x}^{-1/c},\,V<{x}^{-1/c})\le a{\int}_{0}^{{x}^{-1/c}}{\int}_{0}^{{x}^{-1/c}}{({u}^{2}+{v}^{2})}^{-\theta /2}\hspace{0.17em}\text{d}u\hspace{0.17em}\text{d}v\le {a}^{\prime}{x}^{-(2-\theta )/c}$

for some constant *a*′ > 0. Dividing by $P(Y>x)\sim$ *x*^{−1/c}*f*_{V}(0) and letting *x* → ∞, the limit becomes zero because (2 − *θ*)/*c* > 1/*c* when *θ* < 1, proving (4.1). Then theorem 4.1 entails (4.6). ▪

#### Corollary 4.4.

*Suppose* (*U*, *V*) *has a bivariate normal distribution with means* 0, *variances* 1 *and correlation ρ* ∈ ( − 1, 1). *Set X*: = 1/|*U*|^{c}, *Y*: = 1/|*V*|^{c} *for c* > 1. *Then X* ∈ *RV*(1/*c*), *Y* ∈ *RV*(1/*c*), *and X and Y are asymptotically independent with* ${\rho}_{n}\stackrel{P}{\to}0$ *as n* → ∞. *Further*, ${\rho}_{n}\stackrel{a.s.}{\to}0$ *as n* → ∞.

#### Theorem 4.5.

*Under the assumptions of corollary* 4.4,

$\frac{\sum _{i=1}^{n}{X}_{i}{Y}_{i}}{n\,{\overline{X}}_{n}{\overline{Y}}_{n}}\stackrel{P}{\to}0\quad as\ n\to \mathrm{\infty}.$

*Moreover*,

$n{\rho}_{n}\stackrel{d}{\to}-{R}_{1}{R}_{2},$ (4.8)

*where R*_{1} *and R*_{2} *are iid rvs with* ${R}_{1}={S}_{\alpha}/\sqrt{{S}_{\alpha /2}}$ *and* ${S}_{\alpha}$ *and S*_{α/2} *are given in* (A 10) *and* (A 11) *with α* = 1/*c* < 1.

It is curious that, regardless of *ρ* ∈ ( − 1, 1), *nρ*_{n} converges in distribution to a rv that is negative almost surely. In this situation (*α* < 1), the mean is infinite and the tail of the product *XY* is similar to the tails of *X* and *Y* apart from a slowly varying function. The limit behaviour of the numerator of (4.4) is therefore governed by the product of the partial sums of the *X*_{i}s and *Y*_{i}s; the sum of the products *X*_{i}*Y*_{i} is, in view of theorem 4.5, of smaller order. In other words, *ρ*_{n} behaves essentially like the product of two *t*-statistics, one for the *X*_{i}s and one for the *Y*_{i}s.
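A sketch of this phenomenon (our own code; the choices ρ = 0.9, *c* = 2 and the sample sizes are illustrative): even when the underlying normals are strongly correlated, the sample correlation of the reciprocal powers is small, consistent with corollary 4.4 and theorem 4.5.

```python
import math
import random

def sample_corr(xs, ys):
    """Conventional sample correlation, as in (4.4)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)) \
        * math.sqrt(sum((y - my) ** 2 for y in ys))
    return num / den

random.seed(2024)
rho, c, n = 0.9, 2.0, 5_000
corrs = []
for _ in range(20):
    us, vs = [], []
    for _ in range(n):
        u = random.gauss(0.0, 1.0)
        # v is standard normal with correlation rho with u
        v = rho * u + math.sqrt(1 - rho ** 2) * random.gauss(0.0, 1.0)
        us.append(1.0 / abs(u) ** c)
        vs.append(1.0 / abs(v) ** c)
    corrs.append(sample_corr(us, vs))

# rho_n is near 0 despite correlation 0.9 between the normals,
# and the typical sign at moderate n is negative, as theorem 4.5 predicts.
assert all(abs(r) < 0.8 for r in corrs)
assert sum(abs(r) for r in corrs) / len(corrs) < 0.2
```

The heavy tails of *X* and *Y* overwhelm the strong dependence of *U* and *V*, in the spirit of the speculation of Pillai & Meng [9].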

Since the mean here is infinite, it is reasonable also to define a version ${\stackrel{~}{\rho}}_{n}$ of the sample correlation without the correction for the mean:

${\stackrel{~}{\rho}}_{n}:=\frac{\sum _{i=1}^{n}{X}_{i}{Y}_{i}}{\sqrt{\sum _{i=1}^{n}{X}_{i}^{2}}\,\sqrt{\sum _{i=1}^{n}{Y}_{i}^{2}}}.$

#### Corollary 4.6.

*The assumptions of corollary* 4.4 *imply*

${\left(\frac{n}{\mathrm{log}n}\right)}^{c}{\stackrel{~}{\rho}}_{n}\stackrel{d}{\to}g(\rho )\,\frac{{S}_{1}}{\sqrt{{S}_{2}{S}_{3}}},$ (4.9)

*where g*(*ρ*) ∈ (0, ∞) *is a constant depending on the correlation ρ, and S*_{1}, *S*_{2}, *S*_{3} *are independent stable rvs with representations given in* (A 10) *and indices α equal to* 1/*c*, 1/(2*c*) *and* 1/(2*c*), *respectively*.

Here the stable rvs in the numerator and denominator of (4.9) have heavy tails and are independent, so the ratio on the right of (4.9) does have heavy tails. We comment after (5.5) below on the confidence intervals implied by the similar situation there (a ratio of independent stable rvs).

Whereas the limit rv in (4.8) is independent of the population correlation *ρ*, the limit rv in (4.9) is scaled by a function of the population correlation *ρ*. The rate of convergence (*n*/log *n*)^{c} is rather fast, essentially of order *n*^{2} for *c* = 2. This is much faster than the standard *n*^{1/2} rate when the variance is finite. Corollary 4.4, theorem 4.5 and corollary 4.6 are proved in appendix A(d).

### 5. Heavy tails in a linear model

Suppose *X* and *Z* are two heavy-tailed rvs and

$Y=\eta X+Z\quad\text{for some } \eta >0.$ (5.1)

Given a sample (*X*_{1}, *Y*_{1}), …, (*X*_{n}, *Y*_{n}) of *n* independent observations, what is a useful way to estimate *η*? Surprisingly, even though the data have infinite variance, the least-squares estimator for *η* performs remarkably well. Specifically, the least-squares estimator is weakly consistent, as we now show.

Assume that *X* and *Z* are independent positive rvs with distribution functions *F*_{X} and *F*_{Z}, respectively, with asymptotically equivalent tails, i.e.

$\lim_{x\to \mathrm{\infty}}\frac{P(Z>x)}{P(X>x)}=C\in [0,\mathrm{\infty}].$ (5.2)

If *X* ∈ *RV*(*α*), then it is elementary that *Y* also has a heavy-tailed distribution with the same tail index *α*. However, if *C* = ∞, then *Y* inherits its heavy tail from the noise *Z*. Here we will assume *C* < ∞.

Since the vector (*X*, *Z*) is bivariate regularly varying and since (*X*, *Y*) is a linear transformation of (*X*, *Z*), it follows directly from Basrak *et al.* [18, p. 113, Proposition A.1] that (*X*, *Y*) is bivariate RV.

The angular measure of (*X*, *Y*) in (A 5) has mass at $\mathrm{arctan}\eta $ and *π*/2, namely,

$G=p\,{\delta}_{\mathrm{arctan}\eta}+(1-p)\,{\delta}_{\pi /2},\qquad p=\frac{{(1+{\eta}^{2})}^{\alpha /2}}{{(1+{\eta}^{2})}^{\alpha /2}+C}.$ (5.3)

If (*X*_{1}, *Y*_{1}), …, (*X*_{n}, *Y*_{n}) are iid observations from the linear model (5.1) with 0 < *C* < ∞, then ${\rho}_{n}\stackrel{d}{\to}U$ where *U* is given in (4.5). On the other hand, if *C* = 0, which would allow the noise variable *Z* to be light-tailed, including normal, then the form of the limit variable *U* in (4.5) (see also (A 14)–(A 20)) implies that ${\rho}_{n}\stackrel{P}{\to}1$ as *n* → ∞. To see this, note that the distribution *G* in (5.3) for the angular part of the limit measure corresponds to point mass at $\mathrm{arctan}\eta $. It follows now that the rvs *Θ*_{i} appearing in (A 18)–(A 20) are all constant and equal to $\mathrm{arctan}\eta $ with probability one. As long as *η* > 0, then $\mathrm{arctan}\eta \in (0,\pi /2)$ and hence ${S}_{\alpha /2,0}^{2}/({S}_{\alpha /2,1}{S}_{\alpha /2,2})=1$.

The least-squares estimator of *η* minimizes $\sum _{t=1}^{n}{({Y}_{t}-\eta {X}_{t})}^{2}$. It is given by

$\hat{\eta}=\frac{\sum _{t=1}^{n}{X}_{t}{Y}_{t}}{\sum _{t=1}^{n}{X}_{t}^{2}}.$ (5.4)

Substituting *Y*_{t} = *ηX*_{t} + *Z*_{t} into the summands in the numerator gives

$\hat{\eta}=\eta +\frac{\sum _{t=1}^{n}{X}_{t}{Z}_{t}}{\sum _{t=1}^{n}{X}_{t}^{2}}.$ (5.5)

If *a*_{n} is the 1 − 1/*n* quantile of *F*_{X} and if *α* ∈ (0, 2), then ${a}_{n}^{-2}\sum _{t=1}^{n}{X}_{t}{Z}_{t}\stackrel{P}{\to}0$ and ${a}_{n}^{-2}\sum _{t=1}^{n}{X}_{t}^{2}\stackrel{d}{\to}{S}_{\alpha /2,1}$ as *n* → ∞, and hence $\hat{\eta}$ is weakly consistent for *η* (i.e., as *n* → ∞, $\hat{\eta}$ converges in probability to *η* for every *η* > 0).
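A sketch of this consistency (our own code; the Pareto tails and η = 2 are illustrative choices): with positive heavy-tailed *X* and *Z*, the estimate always overshoots η, by an amount that shrinks as *n* grows.

```python
import random

def eta_hat(xs, ys):
    """Least-squares slope for the no-intercept model Y = eta * X + Z, as in (5.4)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

random.seed(99)
alpha, eta = 0.5, 2.0
for n in (10**3, 10**4):
    xs = [random.paretovariate(alpha) for _ in range(n)]
    zs = [random.paretovariate(alpha) for _ in range(n)]   # noise with the same tail (C = 1)
    ys = [eta * x + z for x, z in zip(xs, zs)]
    est = eta_hat(xs, ys)
    # est = eta + sum(X*Z)/sum(X**2) > eta always, since X and Z are positive (see (5.5));
    # the excess is typically minute because sum(X**2) dwarfs sum(X*Z).
    assert est > eta
```

The one-sided error is exactly the phenomenon behind the two minus signs in the confidence interval discussed below.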

If *α* ∈ (0, 1) and *F*_{X} is asymptotic to a Pareto distribution ${\overline{F}}_{X}(x)\sim K{x}^{-\alpha}$, *K* ∈ (0, ∞), as *x* → ∞, then as in corollary 4.6,

${\left(\frac{n}{\mathrm{log}n}\right)}^{1/\alpha}(\hat{\eta}-\eta )\stackrel{d}{\to}\frac{{S}_{1}}{{S}_{2}},$ (5.6)

where *S*_{1} and *S*_{2} are independent stable rvs given by (A 10) with indices *α* and *α*/2, respectively. For example, if *α* = 1/2, then the rate of convergence of the least-squares estimate is extremely fast, approximately of order *n*^{2}, which is substantially faster than the convergence rate of order $\sqrt{n}$ when the variances of *X* and *Z* are finite. In particular, an approximate 95% confidence interval for *η*, assuming *C* = 1 and *α* = 1/2, is $(\hat{\eta}-{c}_{0.975}{(\mathrm{log}n/n)}^{2},\hat{\eta}-{c}_{0.025}{(\mathrm{log}n/n)}^{2})$, where ${c}_{\gamma}$ is the *γ*-quantile of the distribution of *S*_{1}/*S*_{2}. The two minus signs in the lower and upper bounds of the confidence interval are the surprising consequence of the fact that $\hat{\eta}$ is always bigger than *η*, as can be seen from (5.5), even for finite *n* and as *n* → ∞. This inequality is an artifact of *X* and *Z* being positive. The width of this confidence interval is (*c*_{0.975} − *c*_{0.025})(log *n*/*n*)^{2}. On the other hand, if the variances of *X* and *Z* are finite, the width of the 95% confidence interval for *η* based on the least-squares estimate is 3.92(*σ*_{Z}/*σ*_{X})*n*^{−1/2}. While the difference in quantiles *c*_{0.975} − *c*_{0.025} coming from the ratio of two independent stable rvs can be large, this potentially large width is more than overcome by the fast scaling (log *n*/*n*)^{2} for large *n*.

### 6. Simulations of Taylor’s Law, correlations and a linear model

The purpose of this section is to investigate by numerical simulation how rapidly some of the statistics based on dependent heterogeneous, RV rvs described in §§3–5 converge in distribution to their limit rvs expressed as functions of stable laws.

#### (a) Taylor’s Law with heterogeneous, dependent, heavy tails

We simulate the heterogeneous, dependent, RV rvs described in §§3b,c with three different values of the pairwise correlations *ρ* = 0, 0.5, 0.9999 of the standard normals *Z*_{t}, and with observations *t* = 1, 2, …, *n* in which one of every ten values of *t* has *α*_{t} = 0.1 and the remaining nine of every ten values have *α*_{t} = 0.9. Despite having nine of every ten exponents equal to 0.9, the numerals *p* = 1, …, 7 plotted at the location of (sample mean, sample variance) = $({\overline{X}}_{n},{\hat{\sigma}}_{n}^{2})$ for the three samples of each size *n* = 10^{p} fall along the solid line with slope *b* = 2.1111 = (2 − 0.1)/(1 − 0.1) on log-log coordinates, which is the slope in (3.2) when *α* = 0.1. The solid lines in figure 1 are not fitted to the markers but are calculated *a priori* as the *b*th power of the abscissa. The values of *α*_{t} > 0.1 have no effect on the slope because the mean and variance are determined by the largest observations, which arise from the RV rvs with the smallest tail exponent, here 0.1.
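The *ρ* = 0 case of this simulation can be sketched as follows, using the construction ${X}_{t}=1/|{Z}_{t}{|}^{1/{\alpha}_{t}}$ of §3 (tail index *α*_{t}); the seed and the range of sample sizes are our choices.

```python
import numpy as np

rng = np.random.default_rng(1)
b = (2 - 0.1) / (1 - 0.1)        # predicted Taylor's Law slope 2.1111 from (3.2)

for p in range(1, 6):
    n = 10**p
    # one in ten tail exponents is 0.1, the remaining nine are 0.9
    alphas = np.where(np.arange(n) % 10 == 0, 0.1, 0.9)
    X = 1.0 / np.abs(rng.standard_normal(n)) ** (1.0 / alphas)
    mean, var = X.mean(), X.var(ddof=1)
    # (log10 mean, log10 variance) should track a line of slope b
    print(p, np.log10(mean), np.log10(var))
```

The printed pairs scatter widely from run to run, as expected for infinite-mean data, but their trend follows the slope *b* determined by the smallest exponent 0.1.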

#### (b) Bivariate correlation

In §6b, we simulate (*X*, *Y*) described in corollary 4.4 and theorem 4.5. In this case, although *X* and *Y* are generated by arbitrarily correlated (with |*ρ*| < 1) standard normal variates, *X* and *Y* are asymptotically independent stable laws. The purpose of these simulations is to shed light on this question: Does the sample correlation *ρ*_{n} (4.4) of *n* iid copies of (*X*, *Y*) approach its almost sure limit, 0, or its rescaled limiting distribution (4.8) rapidly enough as *n* → ∞ that, for sample sizes plausibly encountered in empirical applications, the limiting value or rescaled limiting distribution are useful approximations? We ask a similar question below in §6c, where we simulate (*X*, *Y*) for the linear model in (5.1).

Here we generate *n* iid pairs (*U*, *V*) normally distributed with mean 0, variance 1, and correlation *ρ* ∈ ( − 1, + 1), for sample sizes *n* = 10^{p}, *p* = 1, 2, 3, 4, and *ρ* = +0.9. From each (*U*, *V*) we compute (*X*, *Y*): = (1/|*U*|^{2}, 1/|*V*|^{2}). The *n* × 2 matrix in which each row is one (*X*, *Y*) pair of 1/2-stable rvs constitutes one simulation. For each sample size *n*, we generate *s* = 10^{4} simulations. For each simulation, we record the sample correlation *ρ*_{n} between the *n* realizations of *X* and the *n* realizations of *Y* and calculate *nρ*_{n} (the left side of (4.8)).
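One such simulation can be sketched as follows (*ρ* and *n* are from the text; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.9, 10_000

# correlated standard normals (U, V) with correlation rho
U = rng.standard_normal(n)
V = rho * U + np.sqrt(1 - rho**2) * rng.standard_normal(n)
X, Y = 1.0 / U**2, 1.0 / V**2    # the 1/2-stable pair (X, Y)

rho_n = np.corrcoef(X, Y)[0, 1]  # sample correlation
print(rho_n, n * rho_n)          # n*rho_n is the left side of (4.8)
```

Repeating this *s* times and histogramming *ρ*_{n} and *nρ*_{n} reproduces the frequency distributions discussed below.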

Figure 2 shows that the frequency histogram of the sample correlation *ρ*_{n} changes from a bimodal distribution with one mode at negative values and another near *ρ*_{n} = 1 when *n* = 10 (top left) to a unimodal distribution to the left of 0 for large sample sizes *n* = 10^{4} (bottom left).

The right side of (4.8) is −*R*_{1}*R*_{2} where *R*_{1} and *R*_{2} are iid rvs with ${R}_{1}={S}_{\alpha}/\sqrt{{S}_{\alpha /2}}$ and ${S}_{\alpha}$ and *S*_{α/2} are given in (A 10) and (A 11) with *α* = 1/*c* = 1/2. To simulate each value of ${S}_{\alpha}$ and its corresponding *S*_{α/2}, we generate 10^{5} iid unit exponential variates *E*_{i}, compute the 10^{5} partial sums ${\mathit{\Gamma}}_{t}:=\sum _{i=1}^{t}{E}_{i},t=1,\dots ,{10}^{5}$, raise each *Γ*_{t} to the power −1/*α* (for ${S}_{\alpha}$) or −2/*α* (for *S*_{α/2}), and sum the 10^{5} values of ${\mathit{\Gamma}}_{t}^{-1/\alpha}$ or ${\mathit{\Gamma}}_{t}^{-2/\alpha}$. From each such pair $({S}_{\alpha},{S}_{\alpha /2})$ using the same values of *Γ*_{t}, we compute one value of ${R}_{1}={S}_{\alpha}/\sqrt{{S}_{\alpha /2}}$. An independent repetition of this procedure generates one iid value of *R*_{2} and thence one value of −*R*_{1}*R*_{2}.
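The construction of *R*_{1} just described can be sketched as follows; the truncation at 10^{5} points follows the text. By positivity of the summands, ${(\sum _{t}{\mathit{\Gamma}}_{t}^{-1/\alpha})}^{2}\ge \sum _{t}{\mathit{\Gamma}}_{t}^{-2/\alpha}$, so *R*_{1} ≥ 1 always.

```python
import numpy as np

rng = np.random.default_rng(3)

def lwz_pair(rng, alpha=0.5, m=10**5):
    # one draw of (S_alpha, S_{alpha/2}) built from the same Poisson points
    # Gamma_t, following the series representations (A 10) and (A 11)
    gammas = np.cumsum(rng.exponential(size=m))
    return np.sum(gammas ** (-1.0 / alpha)), np.sum(gammas ** (-2.0 / alpha))

S_a, S_a2 = lwz_pair(rng)
R1 = S_a / np.sqrt(S_a2)   # R1 >= 1 by the Cauchy-Schwarz-type inequality above
print(S_a, S_a2, R1)
```

An independent call of `lwz_pair` yields *R*_{2}, and then −*R*_{1}*R*_{2} is one draw from the right side of (4.8).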

For each sample size *n*, we sort the *s* simulated values of *nρ*_{n} (the left side of (4.8)) in increasing order, and we sort the *s* simulated values of −*R*_{1}*R*_{2} (the right side of (4.8)) in increasing order. If the distribution of *nρ*_{n} approaches the distribution of −*R*_{1}*R*_{2} as *n* → ∞, as theorem 4.5 asserts, then the empirical quantile–quantile plot of −*R*_{1}*R*_{2} on the vertical axis and *nρ*_{n} on the horizontal axis should fall along a diagonal straight line of slope 1 through the origin.

However, the absolute error *nρ*_{n} − ( − *R*_{1}*R*_{2}) of the approximation in (4.8) is unbounded as *n* → ∞ because there is a non-zero (though asymptotically vanishing) probability that *ρ*_{n} will be near 1, in which case *nρ*_{n} will be near *n* while −*R*_{1}*R*_{2} is negative with probability 1, so the error could be close to *n*. Consequently, including positive values of *ρ*_{n} in the quantile–quantile plots in the right column of figure 2 would obscure the comparison of *nρ*_{n} with −*R*_{1}*R*_{2} on ( − ∞, 0], where asymptotically an increasing proportion of the values of *ρ*_{n} will fall. For each sample size *n*, we report the number of simulations with *ρ*_{n} < 0 in the lower right corner of each panel in the right column of figure 2 and we restrict the quantile–quantile plot to these values and to the corresponding order statistics of −*R*_{1}*R*_{2}. As (4.8) predicts, as *n* → ∞, the fraction of simulations in which *ρ*_{n} < 0 approaches 1 and the order statistics of *nρ*_{n} approach the corresponding order statistics of −*R*_{1}*R*_{2}.

#### (c) Linear model

We simulate the linear model (5.1) with slope *η* = 1 under two different assumptions: (i) that the independent variable *X* and the noise (or random perturbation) variable *Z* are iid 1/2-stable, so that in (5.2) we have *C* = 1; and (ii) that *X* is 1/2-stable and *Z* is standard normal, which is light-tailed, and *X* and *Z* are independent, so that in (5.2) *C* = 0.

Under (i), assuming *X* and *Z* are iid *α*-stable with *α* = 1/2 and *η* = *C* = 1, the angular measure (5.3) simplifies to a distribution concentrated on the two points arctan(1) and *π*/2, with probabilities ${2}^{1/4}/(1+{2}^{1/4})\approx 0.5432$ and $1/(1+{2}^{1/4})\approx 0.4568$, respectively. The limiting distribution of *ρ*_{n}, given by the right side of (4.5), depends on the three stable rvs *S*_{1/4,0}, *S*_{1/4,1} and *S*_{1/4,2} with joint distribution specified in appendix A(c), (A 18)–(A 20). To simulate their joint distribution, for each realization *ω*, each *Γ*_{t} is a cumulative sum of *t* iid exponential rvs with parameter 1, as described in greater detail in §6b. From (5.3), in the limit of large *n*, each ${\mathit{\Theta}}_{t}=\mathrm{arctan}(1)\approx 0.7854$ with probability 0.5432 and *Θ*_{t} = *π*/2 ≈ 1.5708 with probability 0.4568, as calculated just above. We sum the summands on the right sides of (A 18)–(A 20) until the sums quasi-converge.

Under (ii), with *α* = 1/2, *η* = 1, *C* = 0, the angular measure (5.3) simplifies to a point mass at $\mathrm{arctan}(1)=\pi /4$, and the limiting distribution of *ρ*_{n} is a point mass at 1.

As above, *n* = 10^{p}, *p* = 1, 2, 3, 4, denotes the sample size or number of realizations of (*X*, *Y*). For each sample size *n*, we simulate *s* = 1000 samples each with *n* copies of (*X*, *Y*) and plot the frequency histogram of the *s* values of the sample correlation coefficient *ρ*_{n} for these *s* samples. We compare this frequency histogram with the limiting distribution calculated in §5.

To simulate *s* samples of size *n* of the 1/2-stable rv *X*, we let the elements of the *n* × *s* matrix **X** be iid, each distributed as the reciprocal of the square of a standard normal. Each column of **X** represents one sample of size *n* of *X*.

Under assumption (i) that *X* and *Z* are iid 1/2-stable, we also simulate *s* samples of size *n* of the 1/2-stable rv *Z* by exactly the same procedure, independently of **X**. Each column of the *n* × *s* matrix **Z** represents one sample of size *n* of *Z*. Then **Y** = *η***X** + **Z** gives an *n* × *s* matrix containing *s* samples of size *n* of *Y* according to (5.1). We assume *η* = 1 here, so **Y** = **X** + **Z**. The top row of figure 3 shows one sample of size *n* = 10^{4} under assumption (i). The modal values of $\mathrm{arctan}(Y/X)$ from *s* iid simulations of samples of size *n* = 10^{4} (second row of figure 3) approximate $\mathrm{arctan}(1)$ on the left and *π*/2 on the right (lower left panel), with increasing concentration at these modal values when the radius in polar coordinates *R*: = (*X*^{2} + *Y*^{2})^{1/2} falls in the upper decile of all 10^{7} values of *R* (lower right panel). For samples of increasing size, the sample correlation coefficient *ρ*_{n} is increasingly concentrated on the interval [0, 1], with modal frequencies at the extremes (figures 4 and 6).
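A condensed sketch of assumption (i), with a single pooled sample rather than the paper's *s* iid simulations (seed and sample size are our choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
X = 1.0 / rng.standard_normal(n) ** 2   # 1/2-stable predictor
Z = 1.0 / rng.standard_normal(n) ** 2   # iid 1/2-stable noise, assumption (i)
Y = X + Z                               # eta = 1

theta = np.arctan(Y / X)                # angular part
R = np.hypot(X, Y)                      # radial part
big = R >= np.quantile(R, 0.9)          # upper decile of the radius
# theta lies in (pi/4, pi/2) since Z > 0; for large R it concentrates
# near arctan(1) ~ 0.7854 and near pi/2 ~ 1.5708
print(np.quantile(theta[big], [0.1, 0.5, 0.9]))
```

A histogram of `theta[big]` reproduces the two modes of the angular measure described under (i) above.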

Under assumption (ii) that *X* is 1/2-stable, *Z* is standard normal, and *X* and *Z* are independent, we generate another independent *n* × *s* matrix **X** as above and an *n* × *s* matrix **N** with iid standard normal elements *N*. Then **Y** = *η***X** + **N** = **X** + **N**. The top row of figure 5 shows one simulated sample of size *n* = 10^{4} under assumption (ii). The top right panel shows on log–log coordinates only the pairs (*X*, *Y*) of this sample where *X* > 0, *Y* > 0. Unlike the previous case of heavy-tailed noise, here the modal values of $\mathrm{arctan}(Y/X)$ from the pooled *s* simulations of samples of size *n* = 10^{4} (lower row of figure 5) centre on and are near $\mathrm{arctan}(1)=\pi /4\approx 0.7854$, with increasing concentration at this modal value when the radius in polar coordinates *R*: = (*X*^{2} + *Y*^{2})^{1/2} falls in the upper decile of all 10^{7} values of *R* (lower right panel). Under assumption (ii), for samples of increasing size, the sample correlation coefficient *ρ*_{n} is concentrated around 1, with a range that decreases very rapidly toward 0 (figure 6): these results are consistent with the theory and figure 5*c*,*d* where the histogram of $\mathrm{arctan}(Y/X)$ is heavily concentrated at *π*/4.
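The contrast under assumption (ii) can be sketched the same way; with light-tailed noise, the angle centres on *π*/4 and the sample correlation sits near 1 (seed and sample size are again our choices).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
X = 1.0 / rng.standard_normal(n) ** 2   # 1/2-stable predictor
N = rng.standard_normal(n)              # light-tailed noise, assumption (ii)
Y = X + N                               # eta = 1, C = 0

theta = np.arctan(Y / X)
rho_n = np.corrcoef(X, Y)[0, 1]
print(np.median(theta))                 # near arctan(1) = pi/4
print(rho_n)                            # concentrated near 1
```

The concentration of *ρ*_{n} near 1 here, versus its spread over [0, 1] under (i), is the numerical signature of the heavy- versus light-tailed noise.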

### Data accessibility

This article does not contain any additional data.

### Authors' contributions

All three authors contributed ideas, mathematical analysis, writing and editing. Cohen did simulations. All authors approved the final manuscript and agree to be held accountable for all aspects of the work.

### Competing interests

We declare we have no competing interests.

### Funding

J.E.C.’s research was partially supported by Columbia University’s Earth Institute. R.A.D.’s research was partially supported by NSF grant no. DMS 2015379 to Columbia University. G.S.’s research was partially supported by ARO grant no. W911NF-18-10318 to Cornell University.

## Acknowledgements

We thank Hongyuan Cao for drawing attention to Pillai & Meng [9] and for correcting two typographical errors in a previous draft. We thank Abootaleb Shirvani, Svetlozar Rachev and an anonymous referee for thoughtful reviews.

## Appendix A

**(a) Heavy-tailed distributions and point process methods: background**

Point process methods provide indispensable insight into limit theory for statistics arising from heavy-tailed data. The methods have been championed by Resnick [1,2,19], and others. We will discuss the relevant theory in the setting of §4 and leave the general formulation to the references in Resnick [2].

Here the Euclidean space *E* is defined as *E* = (0, ∞] or ${(0,\mathrm{\infty}]}^{2}\setminus (0,0)$.^{1} A point measure *ξ* on *E* is a non-negative integer-valued measure on the Borel *σ*-field, $\mathcal{B}(E)$, having the form $\xi (\cdot )=\sum _{i}{\epsilon}_{{x}_{i}}(\cdot )$, where ${\epsilon}_{x}( \cdot )$ is the delta measure with unit mass at *x*, and *x*_{i} ∈ *E*. That is, for $A\in \mathcal{B}(E)$, $\xi (A)=\#\{i:{x}_{i}\in A\}$. A point measure *ξ* is Radon on *E* if it is finite on all compact sets.

We define the vague topology on the class *M*_{p}(*E*) of Radon point measures in terms of the space *C*_{K}(*E*) of continuous functions on *E* with compact support. If *ξ*_{n}, *ξ* ∈ *M*_{p}(*E*), then ${\xi}_{n}\stackrel{v}{\to}\xi $ if and only if for every *f* ∈ *C*_{K}(*E*), ${\int}_{E}f\hspace{0.17em}\text{d}{\xi}_{n}\to {\int}_{E}f\hspace{0.17em}\text{d}\xi $.

A point process is a random element of *M*_{p}(*E*) defined on some probability space $(\mathit{\Omega},\mathcal{F},P)$. A Poisson point process *ξ* is defined in terms of a Radon intensity measure *μ* and is often called a Poisson random measure, denoted by PRM(*μ*) [1]. Such a point process satisfies the following assumptions:

(i) For relatively compact $A\in \mathcal{B}(E)$, *ξ*(*A*) has a Poisson distribution with mean *μ*(*A*).

(ii) For relatively compact and disjoint $A,B\in \mathcal{B}(E)$, *ξ*(*A*) and *ξ*(*B*) are independent rvs.

The Laplace functional ${L}_{\xi}(f)$ of a point process $\xi (\cdot )=\sum _{t=1}^{\mathrm{\infty}}{\epsilon}_{{X}_{t}}(\cdot )$ is ${L}_{\xi}(f)=E\mathrm{exp}(-\sum _{t=1}^{\mathrm{\infty}}f({X}_{t}))$ for non-negative *f* ∈ *C*_{K}(*E*). When *ξ* is a PRM(*μ*), the Laplace functional becomes [1, Sect. 5.3.2] ${L}_{\xi}(f)=\mathrm{exp}(-{\int}_{E}(1-{\text{e}}^{-f(x)})\mu (\text{d}x))$. A sequence of point processes *ξ*_{n} converges in distribution to the point process *ξ* if and only if their respective Laplace functionals also converge, i.e. ${L}_{{\xi}_{n}}(f)\to {L}_{\xi}(f)$ for every *f* ∈ *C*_{K}(*E*).

## Theorem A.1.

*Suppose* {*X*_{t}} *is an iid sequence of observations with distribution F* ∈ *RV*(*α*) *as in* (2.1), *and define a*_{n} *by* (2.3). *Let* {*E*_{t}} *be an iid sequence of unit exponential rvs, and define* ${\mathit{\Gamma}}_{t}:=\sum _{i=1}^{t}{E}_{i},t=1,2,\dots $. *Then the sequence of point processes N*_{n}( · ) *defined on the left in* (A 4) *converges in distribution to the point process N*( · ) *defined on the right in* (A 4) *as n* → ∞:

$${N}_{n}(\cdot )=\sum _{t=1}^{n}{\epsilon}_{{X}_{t}/{a}_{n}}(\cdot )\Rightarrow N(\cdot )=\sum _{t=1}^{\mathrm{\infty}}{\epsilon}_{{\mathit{\Gamma}}_{t}^{-1/\alpha}}(\cdot ).\quad (\text{A 4})$$

*The limit point process N is a* PRM(*αx*^{−α−1} d*x*).

This result follows directly from Resnick [1, theorem 5.3].

We recall from elementary stochastic processes that the *Γ*_{t}’s are points of a homogeneous Poisson process on (0, ∞) with rate one. By a change of variables, the points ${\mathit{\Gamma}}_{t}^{-1/\alpha}$ are then points of a Poisson process with intensity measure *αx*^{−α−1} d*x*.
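This change of variables can be checked by simulation: ${\mathit{\Gamma}}_{t}^{-1/\alpha}>x$ if and only if ${\mathit{\Gamma}}_{t}<{x}^{-\alpha}$, so the number of points exceeding *x* is Poisson with mean ${\int}_{x}^{\mathrm{\infty}}\alpha {u}^{-\alpha -1}\hspace{0.17em}\text{d}u={x}^{-\alpha}$. The sketch below (our illustration; parameters are arbitrary) compares the empirical mean count with this value.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, x = 0.5, 2.0
reps, m = 2000, 10**4

counts = []
for _ in range(reps):
    gammas = np.cumsum(rng.exponential(size=m))
    pts = gammas ** (-1.0 / alpha)   # points with intensity alpha*u^(-alpha-1) du
    counts.append(np.sum(pts > x))   # count of points above x
# mean count above x should be close to x^(-alpha) = 2^(-1/2) ~ 0.707
print(np.mean(counts), x ** (-alpha))
```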

To consider sample correlations between pairs of rvs, we extend regular variation to bivariate random vectors. The nonnegative bivariate random vector (*X*, *Y*) is said to be RV if the radial part *R* = (*X*^{2} + *Y*^{2})^{1/2} is *RV*(*α*) and the angular part $\mathit{\Theta}=\mathrm{arctan}(Y/X)$ becomes independent of *R* as *R* → ∞. More precisely, if ${\stackrel{~}{a}}_{n}$ is the 1 − 1/*n* quantile of *R* and *G* is a distribution function on [0, *π*/2], then for all *r* > 0,

$$nP(R>{\stackrel{~}{a}}_{n}r,\hspace{0.17em}\mathit{\Theta}\in \cdot )\stackrel{v}{\to}{r}^{-\alpha}G(\cdot ).\quad (\text{A 6})$$

Equivalently, in rectangular coordinates,

$$nP({\stackrel{~}{a}}_{n}^{-1}(X,Y)\in \cdot )\stackrel{v}{\to}\nu (\cdot )\quad (\text{A 7})$$

on $E={(0,\mathrm{\infty}]}^{2}\setminus (0,0)$, where *ν*(d*x*, d*y*) is the measure in rectangular coordinates corresponding to the polar coordinate version $\stackrel{~}{\nu}(\text{d}r,\text{d}\theta )$.

The point process convergence that drives much of the limit theory for bivariate heavy-tailed data is:

## Theorem A.2.

*Suppose* {(*X*_{t}, *Y*_{t}), *t* = 1, 2, …} *is an iid sequence of observations with distribution that satisfies* (A 6) *or* (A 7). *Then the sequence of point processes* ${N}_{n}(\cdot )=\sum _{t=1}^{n}{\epsilon}_{{\stackrel{~}{a}}_{n}^{-1}({X}_{t},{Y}_{t})}(\cdot )$ *converges in distribution as n* → ∞ *to a* PRM(*ν*) *N*( · ), *where*

$$N(\cdot )=\sum _{t=1}^{\mathrm{\infty}}{\epsilon}_{{\mathit{\Gamma}}_{t}^{-1/\alpha}(\mathrm{cos}{\mathit{\Theta}}_{t},\hspace{0.17em}\mathrm{sin}{\mathit{\Theta}}_{t})}(\cdot ).\quad (\text{A 8})$$

*Here* {*Θ*_{t}} *is an iid sequence with distribution G*(d*θ*) *in* (A 6) *that is also independent of the sequence* {*Γ*_{t}}.

The proof of this result is in Resnick [1, theorem 5.3].

The limit point process *N*( · ) in theorem A.2 has an instructive representation. The ${\mathit{\Gamma}}_{j}^{-1/\alpha}$ and *Θ*_{j} correspond to the limit of the radial and angular parts, respectively, of the pairs ${\stackrel{~}{a}}_{n}^{-1}({X}_{j},{Y}_{j})$. To verify that *N* is a PRM(*ν*), we use Laplace functionals. Specifically, by conditioning on the Poisson points *Γ*_{j} and by using the independence of the *Θ*_{i}, we have that the Laplace functional of *N* coincides with that of a PRM(*ν*).

**(b) Limit theory for Taylor’s Law**

The following theorem describes the joint convergence of the partial sums and partial sums of squares in example 2.1 and the limiting behaviour of Taylor’s Law.

## Theorem A.3.

*Suppose* {*X*_{t}} *is an iid sequence of observations with F* ∈ *RV*(*α*) *as in* (2.1). *Define a*_{n} *by* (2.3) *and b*_{n} *by* (2.5). *Then*

$$\left({a}_{n}^{-1}\sum _{t=1}^{n}{X}_{t},\hspace{0.28em}{b}_{n}^{-1}\sum _{t=1}^{n}{X}_{t}^{2}\right)\Rightarrow ({S}_{\alpha},{S}_{\alpha /2}).\quad (\text{A 9})$$

*The limit rvs* ${S}_{\alpha}$ *and S*_{α/2} *are stable and defined through the points of the point process N in theorem* A.1. *Namely*,

$${S}_{\alpha}=\sum _{t=1}^{\mathrm{\infty}}{\mathit{\Gamma}}_{t}^{-1/\alpha}\quad (\text{A 10})$$

*and*

$${S}_{\alpha /2}=\sum _{t=1}^{\mathrm{\infty}}{\mathit{\Gamma}}_{t}^{-2/\alpha}.\quad (\text{A 11})$$

*For α* ∈ (0, 1), *Taylor's Law* (3.3) *follows: for b* = (2 − *α*)/(1 − *α*), *the ratio* ${\hat{\sigma}}_{n}^{2}/{\overline{X}}_{n}^{b}$ *converges in distribution to a positive rv determined by* ${S}_{\alpha}$ *and S*_{α/2}.

## Proof.

This is a special case of a standard result in, for example, LePage *et al.* [20, theorems 1 and 1’], where the derivation of (2.10) from (2.8) is required. ▪

The representations in (A 10) and (A 11) are often referred to as the LePage–Woodroofe–Zinn (LWZ) representation of a stable distribution. Davis [16] gives the companion result when {*X*_{t}} is a stationary time series satisfying certain dependence conditions.

**(c) Limit theory for the sample correlation**

## Proof of theorem 4.1.

The condition (4.3) implies the vague convergence in (A 7). Indeed, (4.3) implies that (A 7) holds for sets of the form ${A}_{x,y}^{c}={([0,x]\times [0,y])}^{c}$, which can then be extended to all two-dimensional Borel sets that are bounded away from (0, 0). Moreover, the constants *a*_{n} and ${\stackrel{~}{a}}_{n}$ are asymptotically proportional.

For 0 < *ϵ* < 1 fixed, consider the function *f*_{ϵ}(*x*, *y*) = *xy***1**_{{ϵ<xy≤1/ϵ}}, which has compact support on *E* and is continuous except on a set of *ν* measure 0. By the weak convergence of *N*_{n} to *N* in theorem A.2, it follows that ${N}_{n}({f}_{\u03f5})\Rightarrow N({f}_{\u03f5})$ as *n* → ∞. Since *Γ*_{t} ∼ *t* a.s. by the strong law of large numbers, the series $\sum _{t=1}^{\mathrm{\infty}}{\mathit{\Gamma}}_{t}^{-2/\alpha}$ is summable a.s. Hence, for *δ* > 0, the remainder can be split into two terms *I* and *II*, each of which is asymptotically negligible as *ϵ* ↓ 0. The limit is positive if *X* and *Y* are asymptotically dependent, or zero if *X* and *Y* are asymptotically independent. Either way, it follows that there exists a finite positive constant *C* bounding the relevant tail probabilities for all *z* > 0 as *z* → ∞, and hence, for some *K* ∈ (0, ∞), the stated limit follows.

Finally, by the form of the limit, one sees that the numerator *S*_{α/2,0} = 0 a.s. if and only if the angular measure concentrates at 0 and *π*/2, which is equivalent to asymptotic independence. ▪

## Proof of theorem 4.2.

If *X* ∈ *RV*(*α*) and *Y* ∈ *RV*(*β*) with 0 < *α* < *β* < 2, then *Y*^{β/α} ∈ *RV*(*α*). So replacing the condition on (*X*, *Y*) in theorem 4.1 with the assumption that (*X*, *Y*^{β/α}) is bivariate RV, theorem A.2 implies that the same point process convergence in (A 8) holds with *Y*_{t} replaced by ${Y}_{t}^{\beta /\alpha}$. Using the same proof as for theorem 4.1 with the function ${f}_{\u03f5}(x,y)=x{y}^{\alpha /\beta}{\mathbf{1}}_{\{\u03f5<x{y}^{\alpha /\beta}\le 1/\u03f5\}}$ implies the analogue of (A 18).

**(d) Limit theory for correlation of heavy-tailed data from correlated normal random variables**

In this section, we prove theorem 4.5, corollaries 4.6 and 4.4.

## Proof of theorem 4.5 and corollary 4.6.

Recall that *X*: = 1/|*U*|^{c}, *Y*: = 1/|*V*|^{c} for some *c* > 1 where (*U*, *V*) is bivariate normal with means 0, variances 1, and correlation *ρ* ∈ ( − 1, 1). Let *ϕ*(*x*): = (2*π*)^{−1/2}exp ( − *x*^{2}/2), *x* ∈ ( − ∞, + ∞), be the standard normal pdf. It follows directly from proposition 4.3 that $P(X>x)\sim {(2/\pi )}^{1/2}{x}^{-1/c}$ as *x* → ∞. With ${a}_{n}={(2/\pi )}^{c/2}{n}^{c}$, it follows that *nP*(*X* > *a*_{n}) → 1 as *n* → ∞.

Establishing the tail behaviour (A 23) of *XY* takes more work; here *Φ* is the standard normal cdf. For an upper bound, notice that the integral in the right side of (A 25) admits a bound valid for any *a* < *b*. We conclude the upper bound (A 26).

It remains to obtain a matching lower bound of the right side of (A 25). Now we use the bound, for any *a* < *b*, together with a *θ* = *θ*(*ϵ*) ↓ 0 as *ϵ* ↓ 0 chosen so that the corresponding inequality holds. Let *M* be a large number. By (A 25), we obtain a bound which, after letting *M* → ∞, provides a lower bound matching (A 26) and, hence, proves (A 23).

The normalizing constant associated with the distribution of *XY* is given by ${\stackrel{~}{a}}_{n}=K{(n\mathrm{log}n)}^{c}$, where *K* = (2/*π*)^{c}(1 − *ρ*^{2})^{−c/2}, so that $nP(XY>{\stackrel{~}{a}}_{n})\to 1$ as *n* → ∞. Note that ${\stackrel{~}{a}}_{n}/{a}_{n}\to \mathrm{\infty}$. Also, for *M* > 0 fixed, large *n*, and all *x*, *y* > 0, the probability on the left side of (A 28) splits into two terms. As *n* → ∞, the first term converges to *M*^{−1/c}, while the second term is bounded by an integral of the joint density ${\phi}_{\rho}$ of (*U*, *V*), multiplied by a generic constant *C*. Since $n{a}_{n}^{-1/c}\to {(2/\pi )}^{-1/2}$ and ${\stackrel{~}{a}}_{n}/{a}_{n}\to \mathrm{\infty}$ as *n* → ∞, it follows that this term goes to 0 as *n* → ∞. Consequently, the limit superior as *n* → ∞ followed by the limit as *M* → ∞ of the left side of (A 28) is 0. In other words, *X* and *XY* are asymptotically independent. From this property [12], we have a point process convergence in which {*Γ*_{i,1}}, {*Γ*_{i,2}} and {*Γ*_{i,3}} are iid copies of the points {*Γ*_{i}} defined in theorem A.1.

From this point process, by the same argument as used in the proof of theorem 4.1 (see appendix A(c)), we obtain the limiting distribution of the sample correlation.

To prove corollary 4.6 and (4.9), we note that by proposition 4.3, *X* and *Y* are asymptotically independent. By the earlier argument for (2.10), it follows that, as *n* → ∞, the normalized partial sums and sums of squares of the *X*_{i} and the *Y*_{i} converge jointly to the stable rvs *S*_{1/c,i}, *S*_{1/(2c),i}, which are given by (A 10) and (A 11) with *α* = 1/*c* and are independent for *i* = 1, 2. The continuous mapping theorem then yields the convergence of the standardized sample moments for *i* = 1, 2. Also, since $n{a}_{n}^{-2}{\stackrel{~}{a}}_{n}\to 0$ as *n* → ∞, it follows that $n{a}_{n}^{-2}\sum _{i=1}^{n}{X}_{i}{Y}_{i}\stackrel{P}{\to}0$ as *n* → ∞. ▪

## Proof of corollary 4.4.

Let 0 < *α* < *γ* < 1, and let *S* ∈ *RV*(*γ*). Then for some *c* > 0, *P*(*S* > *x*) ∼ *cx*^{−γ} as *x* → ∞ so the assumption ${\overline{F}}_{X}\in RV(\alpha )$ implies that *P*(*X* > *x*)/*P*(*S* > *x*) → ∞ as *x* → ∞. Therefore, a constant *C* exists such that *P*(*X* > *x*) > *P*(*S* > *x*) for all real *x* ≥ *C*. Therefore, *P*(*X* + *C* > *x*) > *P*(*S* > *x*) for all *x* ≥ *C* > 0, while for all *x* < *C*, by definition of *C* we have *P*(*X* + *C* > *x*) = 1 ≥ *P*(*S* > *x*). Consequently, a constant *C* exists such that *P*(*X* + *C* > *x*) ≥ *P*(*S* > *x*) for all $x\in \mathbb{R}$. That is, by definition, *X* + *C* is stochastically larger than *S*, or *X* + *C* ≥ _{st} *S*. When 0 < *γ* < *α* < 1, the argument works with *S* and *X* exchanged, hence there exists a constant *C* > 0 such that *S* + *C* ≥ _{st} *X*.

For the next results, we require a well-known inequality (A 34): if *γ* ∈ (0, 1), then for some *c* > 0, a *γ*-stable rv *S* satisfies

$$P(S\le \u03f5)\le \mathrm{exp}(-c{\u03f5}^{-\gamma /(1-\gamma )}),\hspace{1em}\u03f5>0.\quad (\text{A 34})$$

To prove (A 34), note that for any *θ* > 0, if *S* is *γ*-stable, then its Laplace transform is $E\mathrm{exp}(-\theta S)=\mathrm{exp}(-{\theta}^{\gamma})$ [7]. For any $\u03f5>0,\hspace{0.17em}\theta >0,$ using Markov’s inequality in the middle step, we have

$$P(S\le \u03f5)=P({\text{e}}^{-\theta S}\ge {\text{e}}^{-\theta \u03f5})\le {\text{e}}^{\theta \u03f5}E{\text{e}}^{-\theta S}=\mathrm{exp}(-{\theta}^{\gamma}+\theta \u03f5).$$

Fix *a* ∈ (0, 1) and set *θ* = *aϵ*^{1/(γ−1)} to get $-{\theta}^{\gamma}=-{a}^{\gamma}{\u03f5}^{\gamma /(\gamma -1)}$ and *θϵ* = *aϵ*^{1/(γ−1)+1} = *aϵ*^{γ/(γ−1)}. Therefore $-{\theta}^{\gamma}+\theta \u03f5=-{a}^{\gamma}{\u03f5}^{\gamma /(\gamma -1)}+a{\u03f5}^{\gamma /(\gamma -1)}=a{\u03f5}^{\gamma /(\gamma -1)}(1-{a}^{\gamma -1})$. Since *γ* ∈ (0, 1) and *a* ∈ (0, 1), we have *a*^{1−γ} < 1, so 1 − *a*^{γ−1} < 0 and *c*: = − *a*(1 − *a*^{γ−1}) > 0 in (A 34), which is now proved.
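For *γ* = 1/2 the inequality (A 34) can be checked directly: a positive rv with Laplace transform $\mathrm{exp}(-{\theta}^{1/2})$ is the Lévy rv 1/(2*N*^{2}) for a standard normal *N*, and *γ*/(1 − *γ*) = 1. The constant *c* = 1/8 in the sketch below is one admissible choice (ours), since $P(S\le \u03f5)=2\mathit{\Phi}(-1/\sqrt{2\u03f5})\le \mathrm{exp}(-1/(4\u03f5))$.

```python
import numpy as np

rng = np.random.default_rng(7)
# gamma = 1/2: S = 1/(2 N^2) has Laplace transform exp(-theta^{1/2})
S = 1.0 / (2.0 * rng.standard_normal(10**6) ** 2)

for eps in (0.5, 0.2, 0.1):
    p_emp = np.mean(S <= eps)       # empirical P(S <= eps)
    bound = np.exp(-0.125 / eps)    # (A 34) with gamma = 1/2 and c = 1/8
    assert p_emp < bound
    print(eps, p_emp, bound)
```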

Let *X*_{i}, *i* = 1, 2, … be iid copies of *X* ∈ *RV*(*α*). We consider two cases: first that 0 < *τ* < 1/*α*, then that 1/*α* < *τ* < ∞.

Suppose first that 0 < *τ* < 1/*α*. We claim that ${n}^{-\tau}\sum _{i=1}^{n}{X}_{i}\to \mathrm{\infty}$ a.s. as *n* → ∞; it suffices to consider *τ* ≥ 1. Choose *α* < *γ* < 1/*τ*, and choose *C* > 0 such that *X* + *C* ≥ _{st} *S*, where *S* is *γ*-stable. Let *S*_{1}, *S*_{2}, … be iid copies of *S*. Then for any *t* > 0,

$$P\left({n}^{-\tau}\sum _{i=1}^{n}{X}_{i}\le t\right)\le P\left({n}^{-\tau}\sum _{i=1}^{n}{S}_{i}\le t+C{n}^{1-\tau}\right)\le \mathrm{exp}(-c{n}^{(1/\gamma -\tau )\gamma /(1-\gamma )})$$

for some *c* = *c*(*t*) > 0, by (A 34) and the fact that $\sum _{i=1}^{n}{S}_{i}\stackrel{d}{=}{n}^{1/\gamma}S$. Therefore, for any *t* > 0, these probabilities are summable in *n*, and the Borel–Cantelli lemma proves the claim.

Next, suppose that *τ* > 1/*α*. We claim that ${n}^{-\tau}\sum _{i=1}^{n}{X}_{i}\to 0$ a.s. as *n* → ∞. Choose 1/*τ* < *γ* < *α*, and choose *C* > 0 such that *S* + *C* ≥ _{st} *X*, where *S* is a *γ*-stable rv. Let *S*_{1}, *S*_{2}, … be iid copies of *S*. Then for any *t* > 0, for all *n* large enough,

$$P\left({n}^{-\tau}\sum _{i=1}^{n}{X}_{i}>t\right)\le P\left({n}^{-\tau}\sum _{i=1}^{n}{S}_{i}>t-C{n}^{1-\tau}\right)\le c{n}^{1-\gamma \tau}$$

for some *c* > 0. Therefore, for any *t* > 0, summing these probabilities along the subsequence *n* = 2^{k} and using the monotonicity of the partial sums, the Borel–Cantelli lemma proves the claim.

To prove that *ρ*_{n} → 0 a.s. as *n* → ∞, we pick any *δ* ∈ (*c*, 2*c*) and multiply the numerator *N*_{n} and the denominator *D*_{n} of *ρ*_{n} by *n*^{−δ}. It follows that *n*^{−δ}*N*_{n} → 0 a.s. as *n* → ∞, and from (A 36) that *n*^{−δ}*D*_{n} → ∞ a.s. as *n* → ∞. Therefore, *ρ*_{n} → 0 a.s. as *n* → ∞. ▪

Shapiro [22] proved a special case of our results, namely, powers of reciprocals of rvs with a continuous density that is positive at 0 have Pareto-like tails.

## Footnotes

1 Here *E* corresponds to a one-point compactification of [0, ∞) or [0, ∞)^{2} with the origin removed. Relatively compact sets are sets bounded away from the origin, i.e. $A\in \mathcal{B}(E)$ is relatively compact if, for some small *ϵ* > 0, *A* ⊂ {*x* :*ϵ* ≤ *x*} or in the two-dimensional case $A\subset \{(x,y)\hspace{0.17em}:\u03f5\le ||(x,y)||\}$.

### References

- 1. Resnick SI. 2007 **Heavy-tail phenomena: probabilistic and statistical modeling**. Berlin, Germany: Springer Science & Business Media.
- 2. Resnick SI. 2008 **Extreme values, regular variation and point processes**. Berlin, Germany: Springer.
- 3. Zolotarev VM. 1986 **One-dimensional stable distributions**. Translations of Mathematical Monographs, vol. 65. Providence, RI: American Mathematical Society.
- 4. Eisler Z, Bartos I, Kertész J. 2008 Fluctuation scaling in complex systems: Taylor’s law and beyond. **Adv. Phys.** **57**, 89–142. (doi:10.1080/00018730801893043)
- 5. Taylor RA. 2019 **Taylor’s power law: order and pattern in nature**. New York, NY: Academic Press.
- 6. Bliss CI. 1941 Statistical problems in estimating populations of Japanese beetle larvae. **J. Econ. Entomol.** **34**, 221–232. (doi:10.1093/jee/34.2.221)
- 7. Brown M, Cohen JE, de la Peña VH. 2017 Taylor’s law, via ratios, for some distributions with infinite mean. **J. Appl. Probab.** **54**, 657–669. (doi:10.1017/jpr.2017.25)
- 8. Drton M, Xiao H. 2016 Wald tests of singular hypotheses. **Bernoulli** **22**, 38–59. (doi:10.3150/14-BEJ620)
- 9. Pillai NS, Meng X-L. 2016 An unexpected encounter with Cauchy and Lévy. **Ann. Stat.** **44**, 2089–2097. (doi:10.1214/15-AOS1407)
- 10. Davis RA, Hsing T. 1995 Point process and partial sum convergence for weakly dependent random variables with infinite variance. **Ann. Probab.** **23**, 879–917. (doi:10.1214/aop/1176988294)
- 11. Davis RA, Resnick S. 1985 More limit theory for the sample correlation function of moving averages. **Stoch. Process. Appl.** **20**, 257–279. (doi:10.1016/0304-4149(85)90214-5)
- 12. Davis RA, Resnick S. 1986 Limit theory for the sample correlation function of moving averages. **Ann. Stat.** **14**, 533–558. (doi:10.1214/aos/1176349937)
- 13. Samorodnitsky G. 2016 **Stochastic processes and long range dependence**. Springer Series in Operations Research and Financial Engineering, vol. 26. Berlin, Germany: Springer.
- 14. Samorodnitsky G, Taqqu M. 1994 **Stable non-Gaussian random processes: stochastic models with infinite variance**. London, UK: Chapman and Hall.
- 15. Feller W. 1971 **An introduction to probability theory and its applications**, vol. 2. New York, NY: John Wiley and Sons.
- 16. Davis RA. 1983 Stable limits for partial sums of dependent random variables. **Ann. Probab.** **11**, 262–269. (doi:10.1214/aop/1176993595)
- 17. Davis RA, Mikosch T. 1998 Limit theory for the sample ACF of stationary process with heavy tails with applications to ARCH. **Ann. Stat.** **26**, 2049–2080. (doi:10.1214/aos/1024691368)
- 18. Basrak B, Davis RA, Mikosch T. 2002 Regular variation of GARCH processes. **Stoch. Process. Appl.** **99**, 95–115. (doi:10.1016/S0304-4149(01)00156-9)
- 19. Resnick SI. 1986 Point processes, regular variation and weak convergence. **Adv. Appl. Probab.** **18**, 66–138. (doi:10.1017/S0001867800015597)
- 20. LePage R, Woodroofe M, Zinn J. 1981 Convergence to a stable distribution via order statistics. **Ann. Probab.** **9**, 624–632. (doi:10.1214/aop/1176994367)
- 21. Feller W. 1968 **An introduction to probability theory and its applications**, vol. 1. New York, NY: John Wiley and Sons.
- 22. Shapiro JM. 1977 On domains of normal attraction to stable distributions. **Houston J. Math.** **3**, 539–542.