Journal of The Royal Society Interface
You have accessResearch articles

A continuous statistical-geometric framework for normative and impaired gaits



    A quantitative analysis of human gait patterns in space–time provides an opportunity to observe variability within and across individuals of varying motor capabilities. Impaired gait significantly affects independence and quality of life, and thus a large part of clinical research is dedicated to improving gait through rehabilitative therapies. Evaluation of these paradigms relies on understanding the characteristic differences in the kinematics and underlying biomechanics of impaired and unimpaired locomotion, which has motivated quantitative measurement and analysis of the gait cycle. Previous analysis has largely been limited to a statistical comparison of manually selected pointwise metrics identified through expert knowledge. Here, we use a recent statistical-geometric framework, elastic functional data analysis (FDA), to decompose kinematic data into continuous ‘amplitude’ (spatial) and ‘phase’ (temporal) components, which can then be integrated with established dimensionality reduction techniques. We demonstrate the utility of elastic FDA through two unsupervised applications to post-stroke gait datasets. First, we distinguish between unimpaired, paretic and non-paretic gait presentations. Then, we use FDA to reveal robust, interpretable groups of differential response to exosuit assistance. The proposed methods aim to benefit clinical practice for post-stroke gait rehabilitation, and more broadly, to automate the quantitative analysis of motion.

    1. Introduction

    Human gait biomechanics vary widely based on individual physiological parameters such as age, height, weight and muscle strength [1,2]. Gait can also be affected by neuromotor disorders such as stroke [3] or Parkinson’s disease [4], which can further reduce patient quality of life and community participation [5]. Characterizing gait can provide valuable information for the diagnosis, understanding and treatment of impairment [6], to the extent that gait speed has been termed the ‘sixth vital sign’ [7]. Thus, the development of methods that enable a quantitative analysis of gait, at the population and individual levels, along with associated clinical and biomechanical interpretations, has become an important focus of research in gait diagnostics and rehabilitation.

    Common methods for analysing gait presentations from different groups can be broadly categorized into either statistical comparisons of point metrics or dimensionality reduction methods on engineered feature sets. In both cases, the chosen measures encapsulate signal amplitude (the magnitudes of local extrema), signal phase (relative timing within the gait cycle) or a combination of both [8]. These methods, while effective, rely heavily on expert domain knowledge to both identify metrics of interest and to define significant changes within them [9], which can be labour and time intensive. Moreover, as these methods are confined to a predefined, finite-dimensional space, they are limited in their ability to distinguish between gait characteristics that span across domains, such as multi-joint or spatiotemporal variance. As a result, insight into the biomechanical locomotor strategies used by individuals is limited to the space of chosen metrics. These challenges are further compounded when analysing the effects of an intervention, such as robotic assistance, due to the redundancy in the musculoskeletal system and high heterogeneity of clinical gait and neurophysiology.

    In clinical settings, functional measures focus on high-level outcomes such as walking speed, symmetry and capacity, through standardized tests such as the 6-min and 10-m walk tests [10]. These measures require minimal equipment and are thus highly accessible to clinicians. However, there is growing appreciation for studying the geometry of gait through kinematics (e.g. joint angles) and dynamics (e.g. joint torques) and understanding the biomechanical mechanisms underlying a specific gait presentation. For example, people post-stroke often increase the high-level outcome of walking speed through compensatory mechanisms, such as hip circumduction or hip hiking, rather than through increased ankle torque production [11,12]. Such biomechanical measures are typically obtained by using motion capture and force ergometry, in conjunction with inverse dynamics techniques. Both clinical and biomechanical measures have been used to successfully differentiate between gait from clinical and unimpaired populations [8], stratify gait from within a single population [13] and evaluate the efficacy of assistive devices [14].

    With access to these kinematic and dynamic measurements, one is naturally led to ask if we can move beyond point metrics to a holistic analysis of gait, using the full time series data. For instance, some studies have directly applied methods such as principal component analysis (PCA) [15] or statistical parametric mapping [16] to gait time series and demonstrated continuous biomechanical differences between groups of participants. However, continuous analysis holds an inherent challenge due to variation in relative timings within the gait cycle. Even when data are time normalized to account for variable stride duration, misalignment in measurements across strides can introduce noise into summary calculations, such as the mean and variance curves. One solution to mitigate this issue is to align events within the gait cycle with algorithms such as dynamic time warping (DTW) before further analysis. Such algorithms stretch and contract the time axis of a signal to optimally match a reference signal [17,18]. Multiple studies have shown that DTW-based analysis can improve stride segmentation [19,20], enable recognition of unimpaired individuals by gait [21] and yield clinically meaningful metrics [22]. However, DTW is designed for pairwise time series alignment, rather than for collections of strides. Moreover, while DTW addresses the challenges of temporal variability within data, it ultimately causes temporal information to be removed from analysis. Finally, this algorithm lacks a rigorous theoretical framework in which desirable properties of alignment, such as symmetry, are met. To mitigate these issues, we look into elastic functional data analysis (FDA), a recent statistical framework under which to perform alignment, termed elastic alignment, and calculate representative templates of temporally variable data [2325]. Promising recent work has shown that elastic FDA can enhance statistical analysis of impaired gait, but has yet to be extended for automating extraction of biomechanically interpretable features from gait [26]. In this work, we build on existing literature by integrating ideas from continuous gait analysis, time series alignment and spatiotemporal functional analysis.

    Specifically, we demonstrate how elastic alignment can be used alongside common dimensionality reduction techniques to enhance clinical gait analysis at both the population and individual levels, while removing the need for manual feature selection. We conduct this proof of concept using gait kinematics as it both serves as a representation of postural geometry during gait and is an accessible measure that can be estimated through low-cost wearable sensors [27]. First, we show that the decomposition of a signal into its amplitude and phase through elastic FDA, together with PCA, provides interpretable characterizations of unimpaired, post-stroke paretic (more impaired) and post-stroke non-paretic (less impaired) gait presentations. We then show that the integration of elastic FDA with unsupervised clustering methods [28] reveals two types of user responses to gait assistance with a soft exosuit and improves robustness to intra-subject variability relative to conventional methods. We discuss the mathematical background in §2, describe our data collection and pre-processing methods before presenting application-specific details in §3 and present results in §§4 and 5. Finally, we conclude with a discussion on the utility of this approach and future directions in §6.

    2. Mathematical background

    Elastic FDA presents a theoretical foundation for time series analysis through decomposition into an amplitude component, which describes the magnitudes assumed by a function (e.g. joint angles), and a phase component, which captures its relative temporal structure [29]. More formally, consider a set of N time series, or functions, {f1, f2, …, fN}, sampled at T + 1 equally spaced times t = {0, 1/T, 2/T, …, (T − 1)/T, 1}. The distortion of a function along the horizontal axis can be formulated by introducing a warping functionγ(t), which represents a transformation between two time axes: t~=γ(t). Then, time warping can be represented as the composition of an original function fi with γ, to obtain the warped function gi(t) = fi(γ(t)). Notably, a valid warping function γ(t) must satisfy three requirements: (1) it transforms the finite time interval t ∈ [0, 1], such that γ(0) = 0 and γ(1) = 1; (2) warping must be a reversible operation, i.e. γ should be invertible and therefore strictly monotonic; and (3) the function and its inverse γ−1 should be smooth to have theoretical guarantees and reduce artifacts.

    The problem of pairwise alignment of two functions f1 and f2 can be formulated as finding the warping function γ* which minimizes some cost, E[f1, f2(γ)], that encodes the desired properties of alignment. Ideally, we would like warping to exhibit inverse symmetry, with optimal alignment of f2 to f1 given by (γ*)−1. Moreover, γ* should optimally align f1γ~ to f2γ~ for any other warping function γ~ to exhibit invariance to simultaneous warping. While a common choice for the cost function E is the Euclidean (L2) norm, this cost satisfies neither inverse symmetry nor invariance to simultaneous warping [23]. Instead, we consider the elastic FDA framework proposed by Srivastava et al. [24]. In this framework, we first define the square-root slope function operator:

    Conceptually, this function is a normalized representation of the slope f′(t). Then, the corresponding cost function satisfies the desired properties described earlier (see [23] for details):

    We can now calculate a mean time series, the Karcher mean(f¯K) [23], that is informed by both amplitude and phase distributions. The iterative calculation of the Karcher mean also yields the corresponding warping functions {γi} for each original function in {fi}, such that fi(γi(t)) are aligned with f¯K. These warping functions form the phase component of the dataset, while the aligned functions form the amplitudes.

    In figure 1, we provide intuition for elastic FDA alongside an overview of the kinematic data used in this work. Specifically, figure 1b (top row) demonstrates amplitude–phase decomposition over a set of Gaussian curves that exhibit variation in the amplitudes along the vertical axis, and in the phases along the horizontal axis. We align a set of functions {fi(t)} (middle column) to obtain the aligned functions {fi(γi(t))} (left column) and the corresponding warping functions {γi(t)} (right column). A point on the warping function below the identity line (γ(t) = t) indicates earlier timing compared to the mean, while a point above the identity line indicates later timing. For example, we see that the blue original function attains the highest and earliest peak; consequently, it has the largest amplitude and the lowest warping function. Conversely, the green and purple functions have similar peak magnitudes but are shifted in time; we see that their amplitudes align closely, while their phases are offset throughout the time interval.

    Figure 1.

    Figure 1. Elastic functional analysis of joint angles. (a) Joint angle conventions for the hip (θh), knee (θk) and ankle (θa). (b) Demonstration of elastic alignment as described in §2. The top row demonstrates the process with Gaussian functions of varying means and standard deviations, and the bottom three rows show example data from the three joints of a representative unimpaired individual walking overground. The middle column (shaded) consists of the original curves, while the left and right columns consist of the aligned functions (amplitudes) and warping functions (phases), respectively. The composition of an original function fi(t) with the associated warping function γi(t) yields the aligned function gi(t) = fi(γi(t)). We observe improved alignment for this participant’s data in certain regions of the gait cycle such as maximum plantarflexion and knee extension.

    In the context of the current study, the set of time series {fi(t)} represents joint angles across strides, from 0% to 100% of the gait cycle. Figure 1b (rows 2–4) shows the results of the amplitude–phase analysis for the hip, knee and ankle angles during overground walking for a representative unimpaired individual. We observe improved alignment in amplitude, particularly at maximum ankle plantarflexion (∼70% GC) and knee extension (∼50% GC). These regions of large improvements in alignment correspond to regions of high variance in {γi} and indicate sections of the gait cycle that experience high temporal variability across strides, which we can now quantify.

    3. Data collection

    3.1. Participants

    Overground gait data from unimpaired young adults (N = 10, 5 female; 26 ± 4 yrs (mean ± s.d.); 68 ± 17 kg) and from chronic post-stroke patients (>6 months post-stroke; N = 21, 9 female; 78 ± 19 yrs; 54 ± 11 kg) were used for this work. All unimpaired participants reported no previous history of musculoskeletal injury or disease, and all participants provided written informed consent prior to participation. The study procedures were approved by the Harvard Longwood Medical Area and Boston University Institutional Review Boards.

    3.2. Experimental protocol

    Participants walked around an oval overground track at their self-selected walking speeds for 6 min. Lower limb kinematics and ground reaction forces were measured using optical motion capture (Qualisys Inc., Sweden) and instrumented force plates (Bertec, Columbus, OH), respectively.

    Post-stroke participants completed an additional 6-min walk while wearing a mobile, unilateral soft ankle exosuit [30] (figure 4a). The soft exosuit applied assistive stance-phase plantarflexion and swing-phase dorsiflexion torques at the paretic ankle (see [31] for hardware and controller details).

    3.3. Software

    Inverse dynamics were calculated with biomechanics software, Visual 3D (C-Motion, Germantown, MD). The data were then segmented into strides using heel strikes detected by the force plates and interpolated to 101 points. All ensuing analyses were performed in MATLAB (The MathWorks Inc.). All alignment and Karcher mean calculations were performed using the ‘fdasrvf’ MATLAB package developed by J. Derek Tucker [32].

    3.4. Data pre-processing

    We focused on sagittal-plane joint kinematics for this study, using standard joint angle conventions (figure 1a). The range of motion (ROM) of each joint during gait varies widely, with the ankle, knee and hip spanning approximately 30°, 60° and 40°, respectively, in unimpaired individuals [33]. The ROM further varies across people post-stroke depending upon the severity of impairment [34]. Thus, to weight changes at each joint equally, we divided each joint angle curve by a measure of its ROM. For comparison between unimpaired and post-stroke populations (Application I), the average unimpaired ROM was used to normalize the data. For evaluation of individual response to exosuit assistance (Application II), data were normalized by the baseline (no exosuit) ROM for each participant. In the latter application, the normalization aimed to also mitigate any biases from correlations between participant impairment level and their capacity to change gait kinematics. Further detail on application-specific normalization is provided in §§4.1.1 and 5.1.1.

    4. Application I: differentiating unimpaired and post-stroke gait

    First, we aimed to use this continuous and unsupervised framework to understand the spatial and temporal differences that characterize post-stroke gait kinematics relative to unimpaired gait, using a combination of elastic alignment and PCA.

    4.1. Methods

    4.1.1. Pre-processing

    We analysed data from the right side of the 10 unimpaired participants and the paretic and non-paretic sides of the 21 post-stroke individuals as they walked without any device, for a total of 52 sets of gait measurements, {fij}. For each joint j of each set i, we used elastic alignment to calculate the Karcher mean, fij¯K, across strides. After normalization by the mean unimpaired joint-level ROM, we concatenated the hip, knee and ankle Karcher means to obtain a single representative gait vector per set. Finally, we aligned all such gait vectors, {fi}, yielding an overall fi¯K for the dataset, a set of amplitude functions {fiγi} and a set of warping functions {γi}.

    4.1.2. Principal component analysis

    In this work, we leveraged PCA, an established method for dimensionality reduction [35], to understand structure within the resultant amplitudes and phases. Specifically, we construct a data matrix X in which each column contains a separate observation, subtract the mean X¯ and calculate the singular value decomposition of the result using the form XX¯=USVT. The columns Up, termed the principal components (PCs), form the basis vectors for the lower-dimensional space, and each data sample i is associated with coefficients Vi, representing coordinates within this space. The diagonal matrix S contains singular values, which represent the relative significance of the components. We separately applied PCA to the aligned amplitudes ({fiγi}) and phases ({γi}) to differentiate among unimpaired, paretic and non-paretic gait.

    To understand how each component manifests across the gait cycle, we scaled each PC to span the range of coefficients observed in the unimpaired, paretic and non-paretic data, while keeping the other components fixed. Formally, we visualized:

    where VC,p is a vector containing the coefficients for strides in category C, corresponding to PC p, and gC,p,β is the resulting curve for scaling factor β. We considered C ∈ {‘unimpaired’, ‘paretic’, ‘non-paretic’}, p ∈ {1, 2, 3}, and chose β{1,12,0,12,1} to act over one standard deviation of VC,p.

    To analyse the phase, we repeated this approach and generated gC,p,β for each PC in the space of warping functions {γi}. We then composed the unimpaired Karcher mean with gC,p,β to understand how variation in the warping functions translated to variation in the gait cycle. Finally, for further visualization of the amplitude and phase PCs, animations of gait defined by gC,p,0 for each category and PC are provided in the electronic supplementary material.

    4.1.3. Statistical evaluation

    The separability of the three categories (unimpaired, paretic and non-paretic) in the PC coefficient space quantifies how much the PCs correspond to the distinguishing characteristics of each category. For each PC, we performed a Kruskal–Wallis test at a significance level of α = 0.05 to test for differences in the coefficient distributions across categories. If a significant main effect of category was observed, we conducted posthoc pairwise Wilcoxon rank-sum tests with Tukey–Kramer corrections.

    4.2. Results

    4.2.1. Amplitude

    In figure 2a, we applied PCA to the set of amplitudes and found that in the three-dimensional PC coefficient space, the three categories formed largely separable clusters. The first, second and third PCs accounted for 52%, 20% and 10% of the variation in the data, respectively. Specifically, we found that coefficients of PC1 for paretic strides were significantly different from those for unimpaired (p = 0.002) and non-paretic (p < 0.001) strides. Similarly, in figure 2b, we see that PC2 separated unimpaired from paretic (p = 0.002) and non-paretic (p < 0.001) strides, and that PC3 separated unimpaired and paretic strides (p = 0.001).

    Figure 2.

    Figure 2. Amplitude-based PCA. (a) Coefficients of the first three PCs obtained by performing PCA on the aligned amplitude curves across the pooled unimpaired (Inline Graphic), non-paretic (Inline Graphic) and paretic (Inline Graphic) data, as described in §4.1.2. The shaded regions represent 2-standard-deviation covariance ellipsoids for each category of gait. (b) Coefficient distributions for the first three PCs across the three categories. Coefficients are significantly different between paretic and unimpaired, between paretic and non-paretic data in PC1 (p ≤ 0.002), between unimpaired and impaired in PC2 (p ≤ 0.002) and between unimpaired and paretic in PC3 (p = 0.001). (c) Effect of the first three PCs on joint angle time series data for each category, as described in equation (4.1).

    In figure 2c, we isolated the effects of each PC as described in equation (4.1). PC1 underscored the reduced paretic hip and knee flexion across the gait cycle and reduced ankle excursion in both dorsiflexion and plantarflexion, indicating lower ROM in the paretic limb than in the non-paretic or unimpaired limbs. PC2 showed that impaired gait is associated with lower absolute ankle angle across the gait cycle, along with reduced hip extension. Finally, PC3 captured variation in peak ankle plantarflexion and peak hip extension, with the coefficients for non-paretic gait falling between those for unimpaired and paretic gait. For an intuitive picture of how these differences manifest within the gait cycle, we also generated gait animations for these PCA components (see electronic supplementary material).

    4.2.2. Phase

    In figure 3a, applying PCA to the phase data also led to separability across categories in the PC coefficient space. The first, second and third PCs accounted for 44%, 27% and 8% of the variation in the data, respectively. The coefficients for PC1 were significantly different across all three categories—unimpaired and paretic (p = 0.025), unimpaired and non-paretic (p = 0.029) and paretic and non-paretic (p < 0.001)—with the unimpaired samples occupying a narrow intermediate range, and the paretic and non-paretic samples covering either side. PC2 differentiated unimpaired from impaired data (p < 0.001). Finally, PC3 did not provide additional separation between categories.

    Figure 3.

    Figure 3. Phase-based PCA. (a) Coefficients of the first three PCs obtained by performing PCA on the warping curves γi across the pooled unimpaired (Inline Graphic), non-paretic (Inline Graphic) and paretic (Inline Graphic) data, as described in §4.1.2. The shaded regions represent 2-standard-deviation covariance ellipsoids for each category. (b) Coefficient distributions for the first three PCs across the three categories. Coefficient distributions were different across all three categories in PC1 (p ≤ 0.029) and between the unimpaired and impaired categories in PC2 (p ≤ 0.001). (c) Effect of the first three phase PCs on joint kinematics for each category, as described in equation (4.1).

    By examining the gait cycle visualizations, we observed that PC1 extracted differences in the stance-to-swing ratio, with paretic strides spending less time in stance relative to unimpaired data, and the non-paretic strides compensating with increased time in stance. PC2 identified differences in the early–mid stance phase while PC3 represented minor variation in the relative timing of late swing phase. While this third component did not differentiate between categories, we found that paretic strides exhibited higher variation in the corresponding coefficients. Once again, we generated gait animations to visualize these differences within the gait cycle (see electronic supplementary material).

    5. Application II: evaluating individual response to exosuit assistance

    Functional alignment can also be used to reduce the effects of individual variance and noise during group-level analysis. In this application, we aimed to mitigate the effects of intra-subject variability through alignment and enable unsupervised stratification of user response to exosuit assistance with clustering. We posited that this approach would (i) extract distinct categories of user response to exosuit assistance from geometric representations of their gait and (ii) result in improved robustness compared to analysis without alignment. A summary of the methods is shown in figure 4a.

    Figure 4.

    Figure 4. Effects of exosuit assistance on post-stroke gait biomechanics using elastic functional data analysis (FDA). (a) Collection and processing methods for evaluating individual response to exosuit assistance (ankle data for one subject shown). People post-stroke walked along an overground path at their comfortable walking speeds with (EXO+) and without (EXO) a unilateral soft assistive ankle exosuit, while optical motion capture, force plate and exosuit sensor data were collected. For each subject, kinematics for each joint were aligned using FDA to generate a representative curve of EXO+ and EXO gait, then divided by the EXO range of motion and finally subtracted from each other to obtain a set of difference curves {δs} = EXO+ − EXO for clustering (see §5.1.2). Two clusters resulted from this process. (b) Metrics used in evaluating dynamic gait quality (see §5.1.3). People post-stroke typically present with low ankle torque (τa), low walking speed (v) and high circumduction (Δxf) [34]. (c) Average joint angle time series for the quality-based group (QBG) and speed-based group (SBG) during EXO+ and EXO. The magnitude of change between EXO+ and EXO differs between the two clusters, with the largest distinction at the ankle joint. (d) Changes between EXO+ and EXO in walking speed, peak paretic net and biological ankle torque, and peak hip circumduction in QBG and SBG. The clusters show significant (p < 0.05) differences in their response to exosuit assistance. Although both groups increased net ankle torque, the two groups did not significantly differ in the magnitude of increase (p = 0.083). SBG also increased in biological ankle torque with an associated increase in walking speed. Conversely, QBG reduced biological ankle torque without any change in walking speed, but reduced hip circumduction (see §5.2.1).

    5.1. Methods

    5.1.1. Pre-processing

    We analysed data from the paretic limb as post-stroke participants walked with (EXO+) and without (EXO) the soft ankle exosuit, for a total of 42 sets of gait measurements, {fij}. As in Application I, for each joint j of each set i, we used elastic alignment to calculate the Karcher mean, fijK¯, across strides. We then normalized each element of fijK¯ by the ROM observed in joint j during EXO for the corresponding subject. The normalized Karcher means were concatenated across joints to generate a single representative gait vector per set fiK¯. Finally, we generated a difference curve between EXO+ and EXO gait, δs=fs,EXO+¯fs,EXO¯, to serve as the input vector for each subject s.

    5.1.2. Clustering

    Clustering is an unsupervised learning paradigm that separates a set of observations, in our case {δs}, into distinct subgroups. Here, we applied agglomerative hierarchical clustering as it is deterministic for a given set of data and thus is more robust for small datasets. With this method, each observation begins in its own subgroup, after which the two closest subgroups are merged, and the process continues until all observations are placed in the same subgroup.

    We used Euclidean distance for comparing observations, and Ward’s linkage method [36] for computing inter-cluster distances dab:

    where μa and μb are the means of clusters a and b, which contain na and nb elements, respectively. dab quantifies the additional variance resulting from merging clusters a and b. Thus, this distance minimizes the squared total within-cluster variance at each merging step.

    To define the optimal number of subgroups, we maximized the silhouette coefficient (SC) [37], a measure of the mean inter-cluster (μ¯) to intra-cluster (x¯) distance ratio:

    We evaluated SC for two to six clusters to determine the final number of subgroups.

    We then leveraged the ability of clustering to detect outliers in our data to reduce the sensitivity of the algorithm [38]. We used an automated iterative process to identify outliers as observations that were clustered alone, i.e. a difference curve δs was an outlier if the distance between δs and every member of {δr} for rs was greater than the distance between all pairs of elements in {δr}. We removed observations until the smallest generated cluster contained at least two samples. Finally, we applied a hierarchical clustering algorithm to the resulting cleaned dataset of normalized difference curves to investigate categories of response to exosuit assistance.

    5.1.3. Evaluation

    Biomechanics We hypothesized that stroke survivors would use the combination of plantarflexion and dorsiflexion exosuit assistance to increase gait speed with a reliance on hip circumduction, a common compensatory mechanism, maintain walking speed with reduced circumduction, or both increase gait speed and reduce circumduction [39,40]. We further hypothesized that clustering on {δs} would elucidate these categories of gait response. Thus, we evaluated changes in gait kinematics, speed, joint torques and hip circumduction. The net change in joint kinematics due to exosuit assistance was defined as the mean absolute value of δj,s for each joint j and subject s. Net ankle torque (τa,net) was computed through inverse dynamics, and biological ankle torque (τa,bio) was calculated as the difference between τa,net and exosuit-applied torque [41]. Maximum hip circumduction (Δxf) was measured from lateral movement of the heel marker. A decrease in Δxf indicates an improvement in the quality of gait. Given the small sample size, we used the non-parametric Kruskal–Wallis test [42] to check for significant differences in user response to assistance across the resultant clusters, with a significance level of α = 0.05.

    Robustness To evaluate whether using alignment influenced sensitivity to intra-subject stride-by-stride variability, we used the evaluation score introduced by Gloumakov et al. [43]. Briefly, the evaluation score assumes that observations from one individual are likely to be more similar to each other than to observations from any other individual. The score is the percentage of pairs of intra-subject strides that are grouped together when clustering on all strides from all subjects. For example, the evaluation score with one cluster is 100%, as all strides are placed in the same group. As the number of clusters increases, the evaluation score decreases monotonically. This score can then be used to compare the performance of different clustering algorithms up to k clusters, where k is the number of individuals in the dataset.

    We reran the clustering process using individual strides from each subject and condition, with and without alignment. Since the number of strides varied by condition and subject, we first constructed the set of all possible difference curves for each subject using every pairwise combination of EXO+ and EXO strides. To avoid biasing towards individuals with more strides, we randomly selected n curves for each subject, where n = 15 was the size of the smallest set of generated difference curves. We repeated this process 25 times to get a distribution of evaluation scores for 1 to 21 clusters. We further used this metric to compare the performance of hierarchical versus two common non-hierarchical clustering algorithms, k-means and k-medoids, with and without alignment.

    5.2. Results

    5.2.1. Biomechanics

    As hypothesized, clustering the aligned difference curves as described in §5.1.2 resulted in separable clusters based on exosuit-driven changes in sagittal-plane gait kinematics (figure 4a). Two subjects were removed as outliers and two clusters, with 8 and 11 subjects, respectively, were identified from the cleaned dataset. The two groups adopted different functional strategies in response to the exosuit assistance, one that increased walking speed, hereafter referred to as the speed-based group (SBG), and one that reduced hip circumduction, hereafter referred to as the quality-based group (QBG). Individuals in SBG showed significantly higher increases in comfortable walking speed (p = 0.026) and peak τa,bio (p = 0.013) than in QBG (figure 4d). Specifically, with exosuit assistance, individuals in SBG increased walking speed by 0.08 ± 0.04 m s−1 (mean ± standard error; 13.5±6%), which is between the thresholds for small and substantial meaningful changes in the literature [44]. Meanwhile, those in QBG showed negligible changes of −0.01 ± 0.02 m s−1 (0.8±2%). Although both SBG and QBG increased peak τa,net by 0.18 ± 0.03 N m kg−1 (30.1±5.9%) and 0.11 ± 0.02 N m kg−1 (11.4±3.1%), respectively, the magnitude of change was not significantly different between groups (p = 0.083). However, peak τa,bio increased by 0.04 ± 0.03 N m kg−1 (9.5±4.1%) in SBG, but decreased by 0.08 ± 0.03 N m kg−1 (5.2±2.7%) in QBG. Conversely, subjects in QBG demonstrated larger improvements in hip circumduction than those in SBG (p = 0.039). Specifically, while maximum hip circumduction worsened by 6.4 ± 4 mm (20.9±12.8%) in SBG, it improved by 9.4 ± 6.3 mm (16.6±12.6%) in QBG, both of which are of the same order of magnitude of change observed in past exosuit studies [39]. These results suggest that (i) this approach may differentiate between subcategories of response to external assistance and (ii) individuals may use assistance to improve high-level gait function or low-level gait quality.

    Subjects in SBG also showed significantly more kinematic deviation from EXO during EXO+ than those in QBG at the ankle (p < 0.001) and knee (p = 0.002) joints, but not at the hip (p = 0.283) (figure 4c). The average absolute change of the normalized joint angle (degrees/subject-specific ROM) was 0.19 ± 0.02, 0.10 ± 0.01 and 0.10 ± 0.02 at the ankle, knee and hip, respectively, for those in SBG, and 0.10 ± 0.01, 0.05 ± 0.01 and 0.07 ± 0.01 for those in QBG. Rescaling by the original normalization factors, these changes approximately corresponded to 5.0° ± 1.9°, 5.0° ± 2.3° and 3.4° ± 1.7° for SBG, and 2.5° ± 1.1°, 2.2° ± 1.1° and 2.7° ± 1.9° for QBG. Finally, we applied the FDA framework to align kinematics during EXO+ and EXO for each subject and investigated the resultant warping functions for the two groups (electronic supplementary material, figure S1). We observed negligible differences between the temporal responses of the two groups. However, both showed the largest changes in phase at the ankle joint, followed by the knee joint and finally the hip joint.

    5.2.2. Robustness

    The mean evaluation score was higher when using aligned gait data compared to the original data across the range of clusters tested (electronic supplementary material, figure S2). Moreover, the magnitude of improvement increased as the number of clusters grew, and at 21 clusters, we obtained a mean evaluation score of 78% with FDA versus 65% without. The difference in evaluation score was also present when using non-hierarchical clustering algorithms, with FDA leading to improvements at 21 clusters of 12% in both k-means and k-medoids.

    Hierarchical clustering contributed towards increased robustness compared to using k-means and k-medoids, across all numbers of clusters tested (electronic supplementary material, figure S2). At two clusters, both k-means and k-medoids achieved evaluation scores of 89% with FDA, while hierarchical clustering achieved 92%. With 21 clusters and alignment, the scores of k-means and k-medoids were 68% and 70%, respectively, versus 78% with our approach.

    6. Discussion

    In this work, we have described a framework for gait analysis integrating elastic alignment with conventional dimensionality reduction techniques, and presented results from two complementary applications for post-stroke gait. In our first application, we used elastic alignment followed by PCA to understand the primary modes of gait variation in both the spatial (amplitude) and temporal (phase) dimensions, demonstrating the advantages of continuous, unsupervised analysis. In our second application, we combined alignment with hierarchical clustering to reveal stratification of user response to gait assistance with a soft exosuit and demonstrated robustness to individual variation. Within both applications, this approach yielded biomechanical interpretations that align with and build upon prior work.

    Unlike most conventional approaches, this framework automatically leverages information from the full gait cycle while remaining biomechanically interpretable. Specifically, we found that the principal modes of variation in our dataset of unimpaired, paretic and non-paretic gait corresponded to statistically significant differences across the three categories, in both spatial and temporal domains. These results are consistent with recent work that employed a similar approach to identify statistically significant amplitude and phase differences between the centre-of-pressure trajectories of unimpaired individuals and patellofemoral pain syndrome patients [26]. Notably, the gait characteristics corresponding to regions of high spatiotemporal variance in our data were similar to manually selected features and point metrics from prior clinical studies. Analysis of amplitude PC1 was consistent with prior work showing that peak hip flexion, knee flexion at toe-off and peak knee flexion during swing are reduced in the paretic limb due to muscular weakness and low propulsion in late stance [45,46]. Similarly, PC2 of the amplitudes identified knee hyperextension, a common characteristic of post-stroke gait, as a differentiating factor [46]. Phase PCA also extracted a key temporal difference between paretic and non-paretic gait, i.e. the paretic limb spends less time in stance relative to unimpaired gait, while the non-paretic limb compensates with higher stance time. This observation has been attributed clinically to muscle weakness and reduced stability in this population [46]. From phase PC2, we found a slower rise in ankle angle in people post-stroke during early stance (∼10–20%GC), which aligns with observations of reduced tibial progression attributable to reduced walking speed and knee extensor strength [47]. For comparison, we also applied PCA without alignment by using standard arithmetic means instead of Karcher means for each participant (electronic supplementary material, figure S4). Although we still observed statistical separation, the components were difficult to interpret due to the interactions of spatial and temporal characteristics. While PCA with alignment could also be extended to examine amplitude–phase interactions through analysis of correlation in principal coefficients, we chose to focus on interpretation of individual components and leave more complex analysis of pairwise interactions for future work. These results suggest the potential for this framework to automate identification of key spatiotemporal point metrics, thereby bridging the gap with conventional techniques.

    We also demonstrated the utility of FDA with continuous time series gait data at the individual level. We introduced the concept of difference curves to cluster on changes in joint kinematic trajectories between EXO+ and EXO gait, thereby capturing characteristics of user strategy in response to external assistance. While hierarchical clustering has been used to successfully subgroup post-stroke joint kinetics [13] and upper limb motions in unimpaired individuals [48], here we extended these approaches to evaluate user response to exosuit assistance and used a continuous representation of change for this analysis. Similar to our first application, our geometric approach to representing gait enabled us to account for variations in posture of the user throughout the gait cycle without the need for feature engineering. Interestingly, despite only using sagittal-plane geometry as the input to the algorithm, we observed significant differences in both dynamic and out-of-plane measures, which then allowed for functional subgrouping of individuals into quality-based and speed-based groups. For a comparison, we considered whether clustering on changes in walking speed alone would yield informative clusters, as increased speed is both a desired outcome and a factor that may affect gait kinematics [49]. However, we found that while this approach resulted in groups that differed in their walking speed response, no statistically significant differences were found in other biomechanical metrics (electronic supplementary material, figure S3), suggesting that using the full time series data could offer a more holistic understanding of user response. Unlike the group-level analysis of the first application, comparing phase information across subgroups did not provide much additional insight. Given that the magnitude of change in phase can be two orders smaller than the time period of a stride (∼1–2%GC), it is possible that this dataset was too small to identify changes at that resolution.

    An additional strength of our approach is its capacity to reduce stride-to-stride variability in data during pre-processing. The observed improvements in robustness in our application are consistent with those observed by Gloumakov et al., who used a similar alignment step before clustering upper limb motions [43]. This outcome is particularly beneficial for clinical applications in which the signal-to-noise ratio is low, hindering the ability to find structure across data samples. Thus, we expect that this approach will provide most utility in unsupervised investigations of high-dimensional data, such as in the case of gait, where domain knowledge may be less accessible. Furthermore, as the field largely recognizes that each individual is unique and that a single model cannot capture population-level behaviour, there is a need for methods that identify individual locomotor strategies. The approach presented here may begin to enable a theoretical method for simultaneously mitigating the effects of variation in the data while also capturing salient features of the data across strides.

    This work is a proof of concept that demonstrates the feasibility of the proposed continuous and unsupervised framework for gait analysis. Still, there are a few considerations that future work must account for to enable broader translation. First, the small sample size of the dataset used in this work limits the generalizability of the proposed method, and additional investigation is needed to understand and quantify the tradeoffs between dataset size and validity of the method. With more clinical trials with assistive and rehabilitative devices underway, the field can soon begin to evaluate whether similar techniques can be applied to these larger datasets to obtain separable clusters of response. Moreover, all strides corresponding to a single subject were collected in one experimental visit, and thus we expect future work will evaluate the feasibility of these methods for longitudinal applications, such as categorizing recovery trajectories during rehabilitation. Longitudinal studies would further provide data to address the inability of the current work to separate the effects of age and neuromotor impairment. Finally, we note that while we opted to focus on sagittal-plane gait kinematics given their rising accessibility in real-world environments, these methods are applicable to complex functions and thus can be explored with neurophysiological signals such as electromyography. Hence, we expect that future work will focus on expanding the applications of elastic alignment combined with dimensionality reduction to incorporate larger datasets, longer timescales and more complex physiological input.

    In summary, this article presents the use of elastic alignment alongside unsupervised dimensionality reduction methods to characterize post-stroke gait and understand the response of stroke survivors to soft exosuit assistance. Our work aims to enable improved understanding of impaired gait, inform the evaluation of assistive devices and ultimately advance physical rehabilitation for individuals with motor impairment.


    All participants provided written informed consent prior to participation. The study procedures were approved by the Harvard Longwood Medical Area and Boston University Institutional Review Boards.

    Data accessibility

    All data needed to support the conclusions of this paper are included in the main text or electronic supplementary material. Source data are available in the electronic supplementary material [50].

    Authors' contributions

    K.S.: conceptualization, data curation, formal analysis, investigation, methodology, validation, visualization, writing—original draft, writing—review and editing; I.T.: conceptualization, data curation, formal analysis, investigation, software, validation, visualization, writing—original draft, writing—review and editing; L.B.: investigation, methodology; D.A.R.: investigation, methodology; L.N.A.: funding acquisition, resources, supervision, writing—review and editing; C.J.W.: funding acquisition, resources, supervision, writing—review and editing; L.M.: conceptualization, funding acquisition, project administration, resources, supervision, writing—original draft, writing—review and editing.

    All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Conflict of interest declaration

    Patents describing the exosuit components documented in this article have been filed with the US Patent Office of which C.J.W. is an inventor. Harvard University has entered into a licensing and collaboration agreement with ReWalk Robotics. C.J.W. is a paid consultant for ReWalk Robotics.


    This work was supported by the National Institute of Health under award nos. BRG-R01HD088619 and R21 AR076686-01A1, the National Science Foundation under award no. CMMI-1925085, the Harvard University John A. Paulson School of Engineering and Applied Sciences and the Wyss Institute for Biologically Inspired Engineering. I.T. is supported by the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard (award no. 1764269) and the Harvard Quantitative Biology Initiative. K.S. and L.B. are partially supported by the National Science Foundation Graduate Research Fellowship (grant no. DGE1650114). D.A.R. and L.N.A. were partially supported by the American Heart Association (grant nos. 18TPA34170171 and 18IPA34170487). L.M. was partially supported by the Simons Foundation and the Henri Seydoux Fund.


    The authors thank Dr Jaehyun Bae, Dr Lizeth Sloot, Dr Franchino Porciuncula, Teresa Baker, Regina Sloutsky, Dabin Choe, Christopher Siviy, Nicolas Menard, Michael Rouleau and Lexine Schumm for their assistance in data collection and processing.


    These authors contributed equally to this study.

    Electronic supplementary material is available online at

    Published by the Royal Society. All rights reserved.