Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

Dependency in multisensory integration: a copula-based analysis

Published: https://doi.org/10.1098/rsta.2018.0364

    Abstract

    The notion of copula has attracted attention from the field of contextuality and probability. A copula is a function that joins a multivariate distribution to its one-dimensional marginal distributions. Thereby, it allows characterizing the multivariate dependency separately from the specific choice of margins. Here, we demonstrate the use of copulas by investigating the structure of dependency between processing stages in a stochastic model of multisensory integration, which describes the effect of stimulation by several sensory modalities on human reaction times. We derive explicit terms for the covariance and Kendall's tau between the processing stages and point out the specific role played by two stochastic order relations, the usual stochastic order and the likelihood ratio order, in determining the sign of dependency.

    This article is part of the theme issue ‘Contextuality and probability in quantum mechanics and beyond’.

    1. Introduction: copulas

    The theory of copulas has stirred a lot of interest in recent years in several areas of statistics, including actuarial science and finance (e.g. [1]). More recently, it has also captured the attention of researchers in contextuality and probability. The main reasons for this development are the following properties: copulas allow one (i) to study the structure of stochastic dependency in a ‘scale-free’ manner, i.e. independent of the specific marginal distributions and (ii) to construct families of multivariate distributions with specified properties. Briefly, a copula is a function that joins a multivariate distribution to its one-dimensional marginal distribution functions. A formal definition for any finite dimension n is the following.

    Definition 1.1.

    A function C : [0, 1]^n → [0, 1] is called an n-dimensional copula if there is a probability space (Ω, F, P) supporting a vector of standard uniform random variables (U1, …, Un) such that

    C(u1, …, un) = P(U1 ≤ u1, …, Un ≤ un),  u1, …, un ∈ [0, 1].

    The following theorem by Sklar [2] laid the foundation of many subsequent studies (for a proof, e.g. [3]).

    Theorem 1.2 (Sklar's theorem, 1959).

    Let F(x1, …, xn) be an n-variate distribution function with margins F1(x1), …, Fn(xn); then there exists an n-copula C : [0, 1]^n → [0, 1] that satisfies

    F(x1, …, xn) = C(F1(x1), …, Fn(xn)),  (x1, …, xn) ∈ ℝ^n.

    If all univariate margins F1, …, Fn are continuous, then the copula is unique. If F1⁻¹, …, Fn⁻¹ are the quantile functions of the margins, then for any (u1, …, un) ∈ [0, 1]^n

    C(u1, …, un) = F(F1⁻¹(u1), …, Fn⁻¹(un)).
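    To make Sklar's theorem concrete, the following sketch (Python; the bivariate normal example and all numerical choices are ours, for illustration only) applies the probability-integral transform to bivariate normal samples and compares the resulting empirical copula with the value obtained directly from Sklar's formula C(u, v) = F(F1⁻¹(u), F2⁻¹(v)).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Bivariate normal F with correlation rho; its margins are standard normal.
rho = 0.6
cov = [[1.0, rho], [rho, 1.0]]
rng = np.random.default_rng(1)
x = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

# Probability-integral transform: U_i = F_i(X_i) is standard uniform,
# and (U1, U2) is distributed according to the copula C of F.
u = norm.cdf(x)

# Empirical copula at (0.3, 0.7) vs. Sklar's formula C(u,v) = F(F1^{-1}(u), F2^{-1}(v)).
emp = np.mean((u[:, 0] <= 0.3) & (u[:, 1] <= 0.7))
exact = multivariate_normal(cov=cov).cdf([norm.ppf(0.3), norm.ppf(0.7)])
print(round(emp, 3), round(exact, 3))
```

    The monotone marginal transforms leave the dependence structure untouched, which is exactly the 'scale-free' property (i) mentioned above.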

    Here, we demonstrate the use of copulas by investigating the structure of dependency between processing stages in a stochastic model of reaction time (RT) for multisensory integration. Next, we briefly present the time-window-of-integration (TWIN) model developed to describe the effect of multisensory stimulation on RT [4]. Then, we consider the model from the perspective of copula theory, limited to the case of n = 2, and derive explicit terms for the covariance and Kendall's tau between the processing stages without making assumptions about parametric distribution families. Moreover, we point out the specific role played by two stochastic order relations, the so-called usual stochastic order and the likelihood ratio order, in determining the sign of dependency.

    2. Time-window-of-integration model

    When stimulus information, perceived via several sensory modalities, indicates the occurrence of some event, an observer typically responds faster to the stimulus complex than to unimodal information alone. For example, when a flash occurring randomly to the left or right of a fixation point is presented together with an auditory stimulus (e.g. a click) in close temporal proximity, the time to start moving the eyes towards the stimulus location (i.e. saccadic RT) is typically reduced by 40–100 ms, depending on the specific experimental set-up. This is considered an instance of multisensory integration, a topic that has attracted a lot of attention from researchers in psychology and neuroscience [5]. The occurrence of multisensory integration (MI) critically depends on the temporal arrangement of the stimulus components as well. Specifically, the speed-up of reaction time to a visual–auditory stimulus occurs only if stimuli from both modalities are registered by the sensory system within a certain time interval (the 'time window of integration'), and it is typically greatest when the visual stimulus precedes the auditory one by an interval equal to the difference between the RT to the visual stimulus alone and the RT to the auditory stimulus alone (e.g. [6,7]). In the time-window-of-integration model of MI [4], RT is assumed to be some combination of random variables representing various sub-processing times defined with respect to a common probability space. Specifically, we let V and A denote processing times for visual and auditory stimuli, respectively, in a first peripheral processing stage, followed by a second, more central stage. Let (W1, W2) be the random vector with W1 representing some function of A and V, e.g. min(A, V) in the 'redundant-signals task' [6], and W2 denoting the random duration of the second stage, which includes central processing such as stimulus identification and response preparation.
The purpose of this note is to investigate the effect of a time window on the stochastic dependency between W1 and W2 at a rather general level. Observable RT in the auditory–visual condition is taken as

    T = W1 + W2.  (2.1)
    The time-window assumption holds that the occurrence of multisensory integration depends on V and A terminating sufficiently close in time. Let I denote the event that this happens, i.e.
    I(ω) = {|V − A| < ω},  (2.2)
    with ω denoting the ‘width’ of the time window, assumed to be a positive constant (we write I(ω) = I if no confusion arises). W1 and W2 are defined on the same probability space but they are not necessarily statistically independent. With π=P[I], expected RT in the auditory–visual condition can then be decomposed as follows:
    E[T] = E[W1 + W2]
         = π E[W1 + W2 | I] + (1 − π) E[W1 + W2 | C]
         = π (E[W1 | I] + E[W2 | I]) + (1 − π) (E[W1 | C] + E[W2 | C])
         = E[W1 | C] + E[W2 | C] − π (E[W1 | C] − E[W1 | I]) − π (E[W2 | C] − E[W2 | I]),
    where E[Wi|I] and E[Wi|C] (i = 1, 2) denote expected processing times in the i-th stage conditioned on I and its complement, C, respectively. Setting
    Δi = E[Wi | C] − E[Wi | I],  i = 1, 2,
    for the magnitude of the integration effect in the i-th stage results in
    E[T] = E[W1 | C] + E[W2 | C] − π × (Δ1 + Δ2).  (2.3)
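    Since (2.3) is an identity following from the law of total expectation, it can be verified directly in a simulation. The sketch below (Python) uses purely illustrative distributional assumptions — exponential peripheral times, a window width ω = 0.2 and a faster second stage on integration trials — none of which are prescribed by the TWIN model.

```python
import numpy as np

# TWIN-style trial simulation (illustrative distributions, not part of the model).
rng = np.random.default_rng(2)
n, omega = 500_000, 0.2
V = rng.exponential(1.0, n)          # visual peripheral time
A = rng.exponential(1.2, n)          # auditory peripheral time
W1 = np.minimum(V, A)                # first stage: redundant-signals task
I = np.abs(V - A) < omega            # integration event
W2 = np.where(I, rng.exponential(0.8, n), rng.exponential(1.0, n))
T = W1 + W2

pi = I.mean()
d1 = W1[~I].mean() - W1[I].mean()    # Delta_1
d2 = W2[~I].mean() - W2[I].mean()    # Delta_2
# Right-hand side of (2.3); it matches E[T] exactly for the empirical measure.
rhs = W1[~I].mean() + W2[~I].mean() - pi * (d1 + d2)
print(round(T.mean(), 3), round(rhs, 3))
```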
    The term π × (Δ1 + Δ2) is interpreted as a measure of the expected combined MI effect in both stages, with positive values corresponding to facilitation, negative ones to inhibition. An important assumption often made in the TWIN model is that π and (Δ1 + Δ2) can be estimated independently by varying certain conditions of the experimental context (for details, e.g. [8,9]). Up to now, both mathematical analysis and empirical testing of the model have mostly been limited to the means (i.e. expectation) of the random processing times. For further insight, it is worthwhile to extend the analysis to higher moments, like variance of T and covariance between W1 and W2:
    V[T] = V[W1 + W2] = V[W1] + V[W2] + 2 C[W1, W2],  (2.4)
    where C stands for the covariance. The TWIN model assumes that W1 and W2 are conditionally independent, with conditioning on either I or its complement, C. Thus, in general, C[W1,W2] may be different from zero. Obviously, explicit expressions for C[W1,W2] are only obtainable if specific assumptions about the distribution of W1 and W2 are added like, e.g., ex-Gaussian [9]. The goal pursued here is to characterize the dependency between W1 and W2 without assuming that the random variables follow some specified parametric family of distributions. First, linear dependence will be studied, followed by characterizing stochastic dependency at a more general level with the help of concepts from copula theory.

    3. Linear dependency

    (a) Conditional independence and mixture distribution

    We introduce the bivariate distribution function for the non-negative random vector (W1, W2) of processing times:

    H(w1, w2) = P[W1 ≤ w1, W2 ≤ w2],  (3.1)
    with marginal distributions H(w1, ∞) = H1(w1) and H(∞, w2) = H2(w2). Conditioning on the events I and C, this distribution is expressed as a bivariate mixture,
    H(w1, w2) = π HI(w1, w2) + (1 − π) HC(w1, w2),  (3.2)
    where HI and HC denote the conditional distributions of W1 and W2 with respect to I and C, respectively. By conditional independence, HI and HC can be written as products of their marginal distributions,
    HI(w1, w2) = FI(w1) GI(w2)  and  HC(w1, w2) = FC(w1) GC(w2),
    where F and G refer to the first and second stage (conditional) distributions, respectively. Inserting these into equation (3.2) yields
    H(w1, w2) = π FI(w1) GI(w2) + (1 − π) FC(w1) GC(w2).  (3.3)
    The marginal distributions for H are
    H1(w1) = H(w1, ∞) = π FI(w1) + (1 − π) FC(w1)
    and
    H2(w2) = H(∞, w2) = π GI(w2) + (1 − π) GC(w2).  (3.4)
    From now on, all distribution functions are assumed to be absolutely continuous with strictly positive densities and to have finite first and second moments.

    (b) Covariance of W1 and W2

    For the reader's convenience, we list three useful properties about conditional second moments. Let X, Y, Z be random variables defined on some probability space. Conditional covariance, assuming all moments exist, is defined as

    C[X, Y | Z] := E[(X − E[X | Z]) × (Y − E[Y | Z]) | Z] = E[XY | Z] − E[X | Z] E[Y | Z].  (3.5)
    Setting X = Y yields the conditional variance
    V[X | Z] = E[X² | Z] − (E[X | Z])².  (3.6)
    It can be shown that [10]
    (a)

    EZ[E[X|Z]]=E[X] (‘law of total expectation’);

    (b)

    C[X,Y]=C[EZ[X|Z],EZ[Y|Z]]+EZ[C[X,Y|Z]] (‘law of total covariance’);

    (c)

    V[X]=EZ[V[X|Z]]+VZ[E[X|Z]] (‘conditional variance’ formula).

    The subscripts of E and V are dropped if no confusion arises. We define Z as an indicator function for the ‘integration’ event:

    Z = 1 if I occurs, and Z = 0 if C occurs.  (3.7)
    Covariance between W1 and W2 is then evaluated using the above ‘law of total covariance’ (b):
    C[W1, W2] = C[E[W1 | Z], E[W2 | Z]] + EZ[C[W1, W2 | Z]]
              = C[E[W1 | Z], E[W2 | Z]]  (by conditional independence)
              = EZ[E[W1 | Z] E[W2 | Z]] − EZ[E[W1 | Z]] EZ[E[W2 | Z]].  (3.8)
    Hence, after some algebra, covariance can be written as
    C[W1, W2] = π(1 − π)(e10 − e11)(e20 − e21) = π(1 − π) Δ1 Δ2,  (3.9)
    where, for simpler notation, we have introduced the following abbreviations:
    e11 = E[W1 | I];  e10 = E[W1 | C];  e21 = E[W2 | I];  e20 = E[W2 | C].
    Given π is different from zero or one, covariance between W1 and W2 is positive if and only if for both variables the mean is (strictly) larger, or smaller, under the event of ‘no integration’ than under ‘integration’. If the means are equal for W1, or W2, or both, covariance becomes zero. Finally, covariance is negative if the ordering of means with respect to ‘integration/no integration’ differs between the two variables.
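    Equation (3.9) can be checked by Monte Carlo simulation. In the sketch below (Python), the gamma stage distributions are an illustrative assumption; the formula itself is distribution-free, requiring only the conditional independence of the two stages.

```python
import numpy as np

# Monte Carlo check of (3.9): with conditionally independent stages,
# C[W1, W2] = pi * (1 - pi) * Delta_1 * Delta_2.
# Gamma margins are an illustrative choice, not part of the model.
rng = np.random.default_rng(3)
n, pi = 1_000_000, 0.3
Z = rng.random(n) < pi               # Z = 1 on integration trials
W1 = np.where(Z, rng.gamma(2.0, 0.4, n), rng.gamma(2.0, 0.6, n))
W2 = np.where(Z, rng.gamma(3.0, 0.3, n), rng.gamma(3.0, 0.5, n))

d1 = W1[~Z].mean() - W1[Z].mean()    # Delta_1
d2 = W2[~Z].mean() - W2[Z].mean()    # Delta_2
cov_mc = np.cov(W1, W2)[0, 1]
cov_theory = pi * (1 - pi) * d1 * d2
print(round(cov_mc, 4), round(cov_theory, 4))
```

    Here both stages are faster under integration (Δ1, Δ2 > 0), so the covariance comes out positive, in line with the sign rule just stated.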

    Note that an alternative way to obtain the covariance is to appeal to Hoeffding's lemma (see below), by inserting the distributions and rearranging them.

    (c) From local to global dependency

    Although we have just characterized the sign of the covariance between W1 and W2 in terms of conditional expectations, one may also ask about properties of the conditional distributions FI, GI, FC, GC that (i) are defined locally and (ii) are sufficient to determine the direction of dependence.

    The key tool is to refer to the well-known Hoeffding's lemma [11] formulated here for (W1, W2).

    Lemma 3.1 (Hoeffding).

    C[W1, W2] = ∫0^∞ ∫0^∞ [H(w1, w2) − H1(w1) H2(w2)] dw1 dw2.

    It turns out that the usual stochastic order relation suffices to determine the direction of dependence between W1 and W2.

    Definition 3.2.

    Let X and Y be two random variables such that

    P(X > x) ≤ P(Y > x)  for all x ∈ (−∞, ∞).

    Then X is said to be smaller than Y in the usual stochastic order, denoted as

    X ≤st Y.

    Theorem 3.3.

    Let FC(w1) > FI(w1) and GC(w2) > GI(w2) for all w1, w2 ∈ [0, ∞); then C[W1, W2] is positive. If both order relations are reversed, the covariance is again positive. Reversing only one of the inequalities makes the covariance negative.

    Proof.

    In lemma 3.1 replace H, H1 and H2 according to equations (3.3) and (3.4); after algebraic rearrangement, the integrand becomes, up to the positive factor π(1 − π),

    FC(w1) GC(w2) − FI(w1) GC(w2) − FC(w1) GI(w2) + FI(w1) GI(w2);

    thus, the covariance is positive if

    [FC(w1) − FI(w1)] × [GC(w2) − GI(w2)] > 0;

    the theorem follows by reversing these steps; the case of negative covariance is analogous. ▪

    To summarize: if, for both i = 1 and i = 2, the conditional distributions of Wi under integration and under no integration are strictly ordered in the usual stochastic order ≤st, then the covariance between W1 and W2 is positive when the ordering points in the same direction for both stages, and negative when the two stages are ordered in opposite directions.

    (d) Variance

    For completeness, the corresponding equations for the variances of W1 and W2 are presented next. They are derived using the expression for a conditional variance. For i = 1, 2,

    V[Wi] = EZ[V[Wi | Z]] + VZ[E[Wi | Z]] = π V[Wi | I] + (1 − π) V[Wi | C] + π(1 − π) Δi².  (3.10)

    The last identity reveals that the variance of both the first and the second processing stage is composed of a weighted average of the conditional variances (weighted by the probabilities of I and C occurring) plus an additional, non-negative term π(1 − π) Δi². The latter represents the part of variability that is due to the mixture generated by the occurrence of I or C. Obviously, this effect is maximal for π = 0.5.
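    Equation (3.10) is the law of total variance specialized to the binary mixture, and it holds exactly for the empirical distribution as well. A quick numerical confirmation (Python; the exponential margins are an arbitrary illustrative choice):

```python
import numpy as np

# Check of (3.10): V[W_i] = pi*V[W_i|I] + (1-pi)*V[W_i|C] + pi*(1-pi)*Delta_i^2,
# using the empirical mixing weight and conditional moments (exact in-sample).
rng = np.random.default_rng(4)
n, pi = 1_000_000, 0.5               # pi = 0.5 maximizes the mixture term
Z = rng.random(n) < pi
W1 = np.where(Z, rng.exponential(0.6, n), rng.exponential(1.0, n))

p = Z.mean()                         # empirical mixing weight
d1 = W1[~Z].mean() - W1[Z].mean()    # empirical Delta_1
rhs = p * W1[Z].var() + (1 - p) * W1[~Z].var() + p * (1 - p) * d1 ** 2
print(round(W1.var(), 4), round(rhs, 4))
```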

    4. Nonlinear dependency

    By Sklar's theorem [2], there exists a copula C such that

    H(w1, w2) = C(H1(w1), H2(w2)),  (4.1)
    and, assuming the marginals are continuous, C is unique. It is also known that a binary mixture of copulas is again a copula [3]. However, although H is a mixture of product distribution functions, its copula need not be a product copula, or even a mixture of product copulas, as the following example demonstrates.

    Example 4.1 (D. Pfeifer 2013, personal communication).

    H(x, y) = ½ xy (x + y) = ½ x y² + ½ x² y = ½ F1(x) G1(y) + ½ F2(x) G2(y),
    with F1(x) = x, F2(x) = x2, G1(y) = y2, G2(y) = y for 0≤x, y≤1. Thus, H is a mixture of product distribution functions. For the marginals,
    H1(x) = H(x, 1) = ½ x (x + 1)  and  H2(y) = H(1, y) = ½ y (y + 1),
    for 0≤x, y≤1. However, for copula C one has
    C(u, v) = H(H1⁻¹(u), H2⁻¹(v)) = H(−½ + ½√(1 + 8u), −½ + ½√(1 + 8v)) = (1/16)(√(1 + 8u) + √(1 + 8v) − 2)(√(1 + 8u) − 1)(√(1 + 8v) − 1)

    for 0 ≤ u, v ≤ 1, which is not a mixture of product copulas.
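    The example can be verified numerically. The sketch below (Python) implements H, the inverse of the common margin H1(x) = ½x(x + 1), and the resulting copula, and confirms that C differs from the product copula uv while satisfying the boundary conditions of a copula.

```python
import numpy as np

# H(x, y) = xy(x + y)/2 on [0,1]^2: a mixture of two product distributions.
def H(x, y):
    return 0.5 * x * y * (x + y)

def Hinv(u):
    # Inverse of the margin H1(x) = x(x + 1)/2 on [0, 1].
    return 0.5 * (-1.0 + np.sqrt(1.0 + 8.0 * u))

def C(u, v):
    # Copula of H via Sklar's theorem: C(u, v) = H(H1^{-1}(u), H2^{-1}(v)).
    return H(Hinv(u), Hinv(v))

# C is a genuine copula (boundary checks) but differs from the product u*v.
print(round(C(0.5, 0.5), 4), 0.5 * 0.5)   # 0.2361 vs. 0.25
print(C(1.0, 1.0), C(0.3, 0.0))           # 1.0 and 0.0
```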

    Inserting the expression for the marginals as mixtures, equation (4.1) can be written as

    H(w1, w2) = C(H1(w1), H2(w2)) = C(π FI(w1) + (1 − π) FC(w1), π GI(w2) + (1 − π) GC(w2)).  (4.2)
    With Hi(wi) = ui, i = 1, 2, the copula is
    C(u1, u2) = H(H1⁻¹(u1), H2⁻¹(u2)).  (4.3)
    In contrast to example 4.1, here the inverse functions can, in general, not be obtained explicitly. Rather than introducing specific distributional families for FI, GI, FC, GC, we will study a general measure of dependency that extends our results for linear dependency.

    (a) Kendall's tau for TWIN model

    Let (Wi1, Wi2), i = 1, 2, be two independent and identically distributed vectors with joint distribution function H and copula C. At the population level, Kendall's tau is defined as the probability of ‘concordance’ minus the probability of ‘discordance’ (e.g. [1,3]), i.e.

    τ(W1, W2) = P[(W11 − W21)(W12 − W22) > 0] − P[(W11 − W21)(W12 − W22) < 0].  (4.4)

    Since P[(W11 − W21)(W12 − W22) < 0] = 1 − P[(W11 − W21)(W12 − W22) > 0],

    τ(W1, W2) = 2 P[(W11 − W21)(W12 − W22) > 0] − 1
              = 4 P[W11 > W21, W12 > W22] − 1
              = 4 ∫∫_[0,1]² C(u1, u2) dC(u1, u2) − 1  (4.5)
              = 4 ∫∫_[0,1]² C(u1, u2) ∂²C(u1, u2)/∂u1∂u2 du1 du2 − 1.  (4.6)

    Theorem 4.2.

    For the random processing times inTWIN, Kendall's tau equals

    τ(W1, W2) = 2π(1 − π)(2V − 1),  (4.7)

    with

    V = ∫ FC(w1) dFI(w1) ∫ GC(w2) dGI(w2) + ∫ FI(w1) dFC(w1) ∫ GI(w2) dGC(w2),

    where integration is over the positive reals, ℝ+.

    Proof.

    Given that

    P[(W11 − W21)(W12 − W22) > 0] = P[W11 > W21, W12 > W22] + P[W11 < W21, W12 < W22],
    these probabilities can be evaluated by integrating over the distribution of one of the vectors, (W11, W12) or (W21, W22). For example,
    P[W11 > W21, W12 > W22] = P[W21 < W11, W22 < W12]
        = ∫∫_ℝ+² P[W21 ≤ w1, W22 ≤ w2] dH(w1, w2)
        = ∫∫_ℝ+² P[W21 ≤ w1, W22 ≤ w2] dC(H1(w1), H2(w2))
        = ∫∫_ℝ+² H(w1, w2) ∂²H(w1, w2)/∂w1∂w2 dw1 dw2.  (4.8)
    Let fI(w1), gI(w2), fC(w1), gC(w2) denote the densities corresponding to FI(w1), GI(w2), FC(w1), GC(w2), respectively. Then,
    ∂²H(w1, w2)/∂w1∂w2 = h(w1, w2) = π fI(w1) gI(w2) + (1 − π) fC(w1) gC(w2).  (4.9)
    Inserting explicit expressions for H(w1, w2) and its second partial derivative into equation (4.8) results in
    P[W11 > W21, W12 > W22] = ∫∫_ℝ+² [π FI(w1) GI(w2) + (1 − π) FC(w1) GC(w2)] × [π fI(w1) gI(w2) + (1 − π) fC(w1) gC(w2)] dw1 dw2.  (4.10)
    Dropping the arguments w1 and w2 for simpler notation, expanding terms under the integral and rearranging them yields for the integrand over w1 and w2:
    FC×fC×GC×gC + π(−2 FC×fC×GC×gC + FC×fI×GC×gI + FI×fC×GI×gC) + π²(FC×fC×GC×gC − FC×fI×GC×gI − FI×fC×GI×gC + FI×fI×GI×gI).
    Observing that, e.g., the integral over FC × fC × GC × gC amounts to
    ∫0^∞ (∫0^∞ FC(w1) fC(w1) dw1) GC(w2) gC(w2) dw2 = 1/2 × 1/2 = 1/4,
    the integral of equation (4.10) results in
    P[W11 > W21, W12 > W22] = 1/4 − π/2 + πV + π²(1/4 − V + 1/4),  (4.11)
    where
    V ≡ ∫∫_ℝ+² [FC×fI×GC×gI + FI×fC×GI×gC] dw1 dw2.  (4.12)
    Because of symmetry, P[W11>W21,W12>W22]=P[W11<W21,W12<W22], so that
    τ(W1, W2) = 4 × (1/4 − π/2 + πV + π²(1/4 − V + 1/4)) − 1 = 2π(1 − π)(2V − 1).
     ▪
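    Theorem 4.2 can be checked by simulation once explicit stage distributions are assumed. In the sketch below (Python), exponential conditional distributions — an illustrative choice, not part of the model — make the integrals in V available in closed form (∫FC dFI = λC/(λI + λC) for rates λ), so that the sample Kendall's tau can be compared with equation (4.7).

```python
import numpy as np
from scipy.stats import kendalltau

# Monte Carlo check of (4.7) under exponential stage distributions (assumed).
rng = np.random.default_rng(6)
n, pi = 200_000, 0.4
lI1, lC1 = 2.0, 1.0                  # rates for W1|I and W1|C
lI2, lC2 = 3.0, 1.5                  # rates for W2|I and W2|C

Z = rng.random(n) < pi
W1 = np.where(Z, rng.exponential(1/lI1, n), rng.exponential(1/lC1, n))
W2 = np.where(Z, rng.exponential(1/lI2, n), rng.exponential(1/lC2, n))

# Closed form: int F_C dF_I = lC/(lI + lC), int F_I dF_C = lI/(lI + lC).
V = (lC1/(lI1+lC1)) * (lC2/(lI2+lC2)) + (lI1/(lI1+lC1)) * (lI2/(lI2+lC2))
tau_theory = 2 * pi * (1 - pi) * (2 * V - 1)
tau_mc, _ = kendalltau(W1, W2)
print(round(tau_mc, 3), round(tau_theory, 3))
```

    Both stages are faster under integration here, and the resulting tau is positive, anticipating the order-based sign result of §4(b).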

    Corollary 4.3.

    Assume π is different from zero or one; then

    (i)

    If FC = FI or GC = GI, then τ(W1, W2) = 0;

    (ii)

    τ(W1, W2)≥ − (1/2).

    Proof.

    For (i): Assume FC = FI; then,

    V = ∫0^∞ (∫0^∞ FC(w1) fC(w1) dw1) [GC(w2) gI(w2) + GI(w2) gC(w2)] dw2 = 1/2 × 1 = 1/2,

    since the second integral is ∫0^∞ d[GC(w2) GI(w2)] = 1. The case GC = GI follows by symmetry. For (ii): as an integral over a linear combination of non-negative functions, V is non-negative. Thus, the smallest value for τ is obtained by setting V = 0 and π = 1/2. ▪

    The corollary shows that Kendall's τ is non-zero only if both pairs of marginal distributions, (FI, FC) and (GI, GC), consist of non-identical distributions; thus, for τ to be non-zero, integration must affect the distributions of both the first- and the second-stage processing times. In contrast, for the covariance C[W1, W2] (equation (3.9)) to be non-zero, integration must affect the expected values in both stages. In other words, if, for example, FI and FC have equal means but different variances, the processing times W1 and W2 will be uncorrelated but may exhibit nonlinear dependency (non-zero τ).
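    The last observation can be made concrete (Python sketch; the specific margins are assumed for illustration only): choosing stage-1 conditional distributions with equal means but different shapes forces Δ1 = 0, hence zero covariance by (3.9), while Kendall's tau remains positive because V ≠ 1/2.

```python
import numpy as np
from scipy.stats import kendalltau

# Stage 1: Exp(1) vs Gamma(4, 1/4) -- equal means (both 1), different shapes.
# Stage 2: exponentials with different means. Conditional independence holds.
rng = np.random.default_rng(7)
n, pi = 400_000, 0.5
Z = rng.random(n) < pi
W1 = np.where(Z, rng.exponential(1.0, n), rng.gamma(4.0, 0.25, n))
W2 = np.where(Z, rng.exponential(0.5, n), rng.exponential(1.0, n))

cov_mc = np.cov(W1, W2)[0, 1]
tau_mc, _ = kendalltau(W1, W2)

# Closed forms for the integrals in V:
# a1 = int F_C dF_I = P(X_C < X_I) = E[exp(-X_C)] = (1 + 1/4)^(-4) for the gamma,
# a2 = int G_C dG_I = r_C / (r_C + r_I) for exponential rates r_C = 1, r_I = 2.
a1, a2 = 1.25 ** -4, 1.0 / 3.0
V = a1 * a2 + (1 - a1) * (1 - a2)
tau_theory = 2 * pi * (1 - pi) * (2 * V - 1)
print(round(cov_mc, 3), round(tau_mc, 3), round(tau_theory, 3))
```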

    (b) From local to global dependency

    As shown in the last section, Kendall's tau for the TWIN model may be positive or negative. In analogy to the linear dependence, the question arises whether there exists an order relation on the random variables that determines the sign of dependency. The answer is in the affirmative: it is again a well-known stochastic order relation (e.g. [12], p. 42).

    Definition 4.4.

    Let X and Y be continuous random variables with densities f and g such that

    g(t)/f(t) is increasing in t over the union of the supports of X and Y

    (here, a/0 is taken to be equal to ∞ whenever a > 0) or, equivalently,

    f(x) g(y) ≥ f(y) g(x)  for all x ≤ y.

    Then X is said to be smaller than Y in the likelihood ratio order, denoted as

    X ≤lr Y.

    The key here is the following representation of Kendall's tau by Nelsen [13]:

    Lemma 4.5 ([13]).

    Let X and Y be continuous random variables with joint density function g(x, y); with −∞ < x1 < x2 < ∞ and −∞ < y1 < y2 < ∞,

    τ(X, Y) = 2 ∫∫∫∫_{x1 < x2, y1 < y2} [g(x2, y2) g(x1, y1) − g(x1, y2) g(x2, y1)] dx1 dy1 dx2 dy2.  (4.13)

    Note that the function g is called positively likelihood ratio dependent (PLR), or totally positive of order 2 (TP2), if the above integrand is non-negative. With these preparations, we state the following theorem for the TWIN model random variables. Note that we write, e.g.,

    [W1 | I] ≤lr [W1 | C]  and  [W2 | I] ≤lr [W2 | C],
    for the likelihood ratio order defined by the densities fC(w1), fI(w1) and gC(w2), gI(w2), respectively.

    Theorem 4.6.

    Let [W1 | I] ≤lr [W1 | C] and [W2 | I] ≤lr [W2 | C]; then τ(W1, W2) ≥ 0. If both order relations are reversed, then τ(W1, W2) ≥ 0 again. If only one of the order relations is reversed, then τ(W1, W2) ≤ 0.

    Proof.

    Consider the integrand of equation (4.13) with g(x, y) replaced by the joint density h(w1, w2) of (W1, W2). For w1 ≤ w1′ and w2 ≤ w2′, the integrand becomes

    h(w1′, w2′) h(w1, w2) − h(w1, w2′) h(w1′, w2).

    Inserting according to equation (4.9),

    h(w1, w2) = π fI(w1) gI(w2) + (1 − π) fC(w1) gC(w2),

    yields for the integrand, after routine algebraic manipulation,

    π × (1 − π) × [fC(w1′) fI(w1) − fC(w1) fI(w1′)] × [gC(w2′) gI(w2) − gC(w2) gI(w2′)].

    The terms in brackets will be non-negative if both

    fC(w1′) fI(w1) ≥ fC(w1) fI(w1′)  and  gC(w2′) gI(w2) ≥ gC(w2) gI(w2′),

    that is, [W1 | I] ≤lr [W1 | C] and [W2 | I] ≤lr [W2 | C]. Thus, the integrand will be non-negative and τ(W1, W2) ≥ 0. The other claims of the theorem follow trivially. ▪
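    The sign pattern of theorem 4.6 can be illustrated with exponential stage densities (an assumption made only for this sketch): for exponentials, [W|I] ≤lr [W|C] holds precisely when the rate under I exceeds the rate under C, and reversing the order in a single stage flips the sign of Kendall's tau.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(8)
n, pi = 300_000, 0.5
Z = rng.random(n) < pi               # integration indicator, shared across calls

def tau(rates1, rates2):
    # rates1 = (rate of W1|I, rate of W1|C); likewise rates2 for W2.
    W1 = np.where(Z, rng.exponential(1/rates1[0], n), rng.exponential(1/rates1[1], n))
    W2 = np.where(Z, rng.exponential(1/rates2[0], n), rng.exponential(1/rates2[1], n))
    t, _ = kendalltau(W1, W2)
    return t

t_same = tau((2.0, 1.0), (3.0, 1.5))    # [W|I] <=lr [W|C] in BOTH stages
t_mixed = tau((2.0, 1.0), (1.5, 3.0))   # lr order reversed in stage 2 only
print(round(t_same, 3), round(t_mixed, 3))
```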

    5. Conclusion

    According to the TWIN model, RT is an additive combination of two random variables, the first- and second-stage processing times W1 and W2, with the joint distribution being a binary mixture with indicator function Z for the event of integration (see equation (3.7)). We derived explicit terms for two measures of dependency between W1 and W2, covariance and Kendall's tau. Specifically, we showed that Kendall's tau cannot be smaller than −1/2 and equals zero whenever integration has no effect in either the first or the second stage. Moreover, comparing processing times under integration with those without integration with respect to two stochastic order relations—the usual stochastic order and the likelihood ratio order—determines the sign of dependency.

    It is noteworthy that our statistical results do not at all depend on the exact definition of event I. It has to be a binary event but can be arbitrary otherwise. Obviously, P(I) = π is an increasing function of window width parameter ω. Because of the factor π(1 − π) occurring in both terms for the dependency between W1 and W2, this implies that its strength will be maximal for a value of ω yielding π = 0.5.

    As mentioned in the introductory section, numerical values for the covariance or Kendall's tau cannot be obtained without explicit assumptions about the distribution of W1 and W2. The simple reason is that these random variables are not observable; only their sum is, according to the model. One may thus wonder whether any of the results derived here will allow empirical testing of some aspects of the model.

    We will not cover this issue in full depth here because it would require presenting more of the current state of research in multisensory integration. However, our results are likely to be relevant in the context of some controversy about the role of variability, specifically comparing the variance of RT to unimodal versus multisensory stimuli. First, from equation (2.4), it is clear that, in TWIN, the covariance between W1 and W2 directly modulates the (observable) variance of their sum. Moreover, this covariance is a function of the probability of integration π, which in turn depends on the size of the time window of integration ω. There are a number of well-tested experimental manipulations to influence the time window (e.g. [8,9]), so this will allow deriving certain hypotheses about RT variance in the context of additional assumptions. Second, it is known that the usual stochastic order is closed under convolution for (conditionally) independent random variables; then, according to theorem 3.3, the covariance will be positive/negative under a strict stochastic ordering of the response times under integration versus no integration. An analogous observation holds for the likelihood ratio order and Kendall's tau with the additional requirement of logconcave densities (e.g. [12]). Thus, for certain broad families of distributions, further predictions about RT variability can be derived. This general route of investigation seems promising but needs further scrutiny.

    Data accessibility

    This article has no additional data.

    Authors' contributions

    H.C. and A.D. conceived of the study, and H.C. drafted the manuscript. Both authors read and approved the manuscript.

    Competing interests

    We declare we have no competing interests.

    Funding

    A.D. was supported by grant no. DI 506/12-1 (German Science Foundation/DFG). H.C. was supported by a grant from German Science Foundation/DFG (SFB/TRR31) and Cluster of Excellence (DFG) ‘Hearing4all’ of Oldenburg University.

    Footnotes

    One contribution of 16 to a theme issue ‘Contextuality and probability in quantum mechanics and beyond’.

    Published by the Royal Society. All rights reserved.

    References